Pandas Extract Numbers from Column into New Columns


Using extractall

df[['x', 'y', 'w', 'h']] = df['rect'].str.extractall(r'(\d+)').unstack().loc[:, 0]

The right-hand side evaluates to:
match    0    1    2    3
0      120  168  260  120
1      120  168  260  120
2      120  168  260  120
3      120  168  260  120
4      120  168  260  120
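
The question's data is not shown above, so here is a minimal end-to-end sketch of the same idea on a made-up frame. The strings in rect are hypothetical; only the column names come from the answer. Selecting column 0 of the extractall result before unstacking is equivalent to the .unstack().loc[:, 0] form above.

```python
import pandas as pd

# Hypothetical input: the original data is not shown, so this shape is assumed.
df = pd.DataFrame({"rect": ["<Rect (120,168),260 by 120>",
                            "<Rect (10,20),30 by 40>"]})

# One row per match, indexed by (original row, match number).
matches = df["rect"].str.extractall(r"(\d+)")

# Take capture group 0 and pivot the match number into columns.
wide = matches[0].unstack()
wide.columns = ["x", "y", "w", "h"]

# extractall returns strings, so convert before doing arithmetic.
df = df.join(wide.astype(int))
print(df)
```

Note that this assumes every row contains exactly four numbers; rows with fewer matches would produce NaN and make astype(int) fail.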

Extract number from column to make a new column in Pandas

Here is my solution; you can copy and paste it:

df['Rate_New'] = df.Rate.apply(lambda x: float(x.replace("$","").replace("/Wh","")))

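Or this, without apply or attribute access. Pass regex=False so that "$" is treated as a literal character rather than the regex end-of-string anchor (older pandas versions default to regex=True):

df["Rate"].str.replace("$", "", regex=False).str.replace("/Wh", "", regex=False)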

Here is a version using a regex, again without attribute access or apply:

repl = lambda m: m.group(1)
df["Rate"].str.replace(r'\$(.+?)\/Wh', repl, regex=True)
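
The two str.replace variants return strings rather than numbers. If a numeric column is the goal, one possible sketch is to capture the number with str.extract and convert it with pd.to_numeric. The sample values below are hypothetical and only follow the "$<number>/Wh" shape the answers assume.

```python
import pandas as pd

# Hypothetical data in the "$<number>/Wh" shape assumed by the answers above.
df = pd.DataFrame({"Rate": ["$0.25/Wh", "$1.5/Wh", "$12/Wh"]})

# Capture the number between "$" and "/Wh"; expand=False returns a Series.
df["Rate_New"] = pd.to_numeric(
    df["Rate"].str.extract(r"\$(\d+(?:\.\d+)?)/Wh", expand=False)
)
print(df)
```

Rows that do not match the pattern come back as NaN instead of raising, which is the main difference from the float(...) version.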

Extract numbers from string column from Pandas DF

Just so I understand, you're trying to avoid capturing decimal parts of numbers, right? (The (?:\.\d+)? part.)

First off, you need to use pd.Series.str.extractall if you want all the matches; extract stops after the first.

Using your df, try this code:

# Get a multiindexed dataframe using extractall
expanded = df.Info.str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot the index labels
df_2 = expanded.unstack()

# Drop the multiindex
df_2.columns = df_2.columns.droplevel()


# Add the columns to the original dataframe (inplace or make a new df)
df_combined = pd.concat([df, df_2], axis=1)

The result, df_combined, holds the original columns plus one column per match number (0, 1, 2, ...).
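
For a runnable version of these steps, here is a small sketch on a made-up frame. Only the column name Info comes from the answer; the strings are hypothetical. It also shows that the pattern keeps a decimal such as 12.5 in one piece, because the optional (?:\.\d+)? part sits inside the single capture group.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame; only the name "Info" is from the answer.
df = pd.DataFrame({"Info": ["width 12.5 height 7", "no numbers here", "id 42"]})

# Multi-indexed result: one row per (original row, match number).
expanded = df["Info"].str.extractall(r"(\d+(?:\.\d+)?)")

# Pivot match numbers into columns and drop the capture-group level.
df_2 = expanded.unstack()
df_2.columns = df_2.columns.droplevel()

# Rows without any match are absent from df_2, so concat fills them with NaN.
df_combined = pd.concat([df, df_2], axis=1)
print(df_combined)
```

The extracted values are still strings; apply pd.to_numeric per column if numbers are needed.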

Python - extract the largest number from the string in the column into a new column

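Use extractall to get all the digit groups, convert them to integers, then take the maximum per original row (index level 0). The max(level=0) shortcut was removed in pandas 2.0, so group on the level instead:

# use pat = r'(\d+)' if you also want digits embedded in text, e.g. `078`
pat = r'\b(\d+)\b'
df['Number'] = df['col1'].str.extractall(pat)[0].astype(int).groupby(level=0).max()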

Output:

                 col1  col2  Number
0      tom 11 abc 100    10     100
1  nick12 text 1 1000    15    1000
2  juli078 aq 199 299    14     299
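
To reproduce that table, the input frame can be rebuilt from the col1 and col2 values shown in the output. This is a sketch that uses groupby(level=0) to take the maximum per original row.

```python
import pandas as pd

# Input rebuilt from the col1/col2 values shown in the output above.
df = pd.DataFrame({
    "col1": ["tom 11 abc 100", "nick12 text 1 1000", "juli078 aq 199 299"],
    "col2": [10, 15, 14],
})

# \b...\b only matches standalone numbers, so "12" in "nick12" is skipped.
pat = r"\b(\d+)\b"
numbers = df["col1"].str.extractall(pat)[0].astype(int)

# Level 0 of the MultiIndex is the original row label.
df["Number"] = numbers.groupby(level=0).max()
print(df)
```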

How to Extract Numbers from String Column in Pandas with decimal?

If you want to match the numbers followed by OZ, you could write the pattern as:

(\d*\.?\d+)\s*OZ\b

Explanation

  • ( Capture group 1 (the value will be picked up by str.extract)
  • \d*\.?\d+ Match optional digits, an optional dot and 1+ digits
  • ) Close group 1
  • \s*OZ\b Match optional whitespace chars and then OZ followed by a word boundary


import pandas as pd

data = [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ",
]

df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)

Output

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22

python pandas extracting numbers within text to a new column

Use Regex.

Ex:

import pandas as pd

df = pd.DataFrame({"A": ["hellothere_3.43", "hellothere_3.9"]})
df["B"] = df["A"].str.extract(r"(\d*\.?\d+)", expand=True)
print(df)

Output:

                 A     B
0  hellothere_3.43  3.43
1   hellothere_3.9   3.9

How do I extract numbers from the strings in a pandas column of 'object'?

I would use str.extract here, with expand=False so that it returns a Series that pd.to_numeric can convert:

df['x'] = pd.to_numeric(df['x'].str.extract(r'^(\d+)', expand=False))

The challenge with trying to use a pure substring approach is that we don't necessarily know how many characters to take. Regex gets around this problem.
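
As a quick check of this approach, here is a sketch on a made-up object column. The values and the x_num column name are hypothetical and chosen only to show a matching and a non-matching row.

```python
import pandas as pd

# Hypothetical object column: leading digits followed by arbitrary text.
df = pd.DataFrame({"x": ["12abc", "7 items", "none"]})

# expand=False makes str.extract return a Series, which pd.to_numeric accepts.
df["x_num"] = pd.to_numeric(df["x"].str.extract(r"^(\d+)", expand=False))
print(df)
```

Strings that do not start with a digit yield NaN, so the resulting column is float rather than int.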

Extract numbers in one column and add to another, if more than one number add them with a pattern (Pandas)

Quick solution using a couple of list comprehensions and a regular expression.

import pandas as pd
import re

df = pd.DataFrame({
    'CLIENT BENEFIT': ['Client Sav'] * 3,
    'SPECIFIC BENEFIT (EX. 15%, CPA)': [''] * 3,
    'NOTES': ['a & string 10 some characters %&/',
              'a number 25 / another number 5 random stuff /(%',
              'hi 5']})

# Find every number in each note. A single number is kept as-is;
# several numbers are joined as "a% / b%".
numbers = [re.findall(r'-?\d+\.?\d*', s) for s in df['NOTES']]
df['SPECIFIC BENEFIT (EX. 15%, CPA)'] = [
    ''.join(l) if len(l) < 2 else ('% / '.join(l) + '%')
    for l in numbers
]

print(df[['CLIENT BENEFIT', 'SPECIFIC BENEFIT (EX. 15%, CPA)']])

Output:

  CLIENT BENEFIT SPECIFIC BENEFIT (EX. 15%, CPA)
0     Client Sav                              10
1     Client Sav                        25% / 5%
2     Client Sav                               5

