Pandas Extract Number from String
Give it a regex capture group:
df.A.str.extract(r'(\d+)', expand=False)
Gives you:
0 1
1 NaN
2 10
3 100
4 0
Name: A, dtype: object
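As a minimal sketch, the output above can be reproduced with a small frame. The sample values in column A are hypothetical (the original frame is not shown), chosen only so that the second row contains no digits. Passing expand=False makes str.extract return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the original frame is not shown in the source.
df = pd.DataFrame({"A": ["1 a", "b", "10 c", "100 d", "0 e"]})

# One capture group plus expand=False returns a Series holding the first
# match per row, with NaN where the pattern does not match.
numbers = df["A"].str.extract(r"(\d+)", expand=False)
print(numbers)
```

Note that the extracted values are still strings at this point, not numbers.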
How to Extract Numbers from String Column in Pandas with decimal?
If you want to match the numbers followed by OZ, you could write the pattern as:
(\d*\.?\d+)\s*OZ\b
Explanation
( : Capture group 1 (the value will be picked up by str.extract)
\d*\.?\d+ : Match optional digits, an optional dot and 1+ digits
) : Close group 1
\s*OZ\b : Match optional whitespace chars and then OZ, followed by a word boundary
import pandas as pd
data= [
"tld los 16OZ",
"HSJ14 OZ",
"hqk 28.3 OZ",
"rtk .7 OZ",
"ahdd .92OZ",
"aje 0.22 OZ"
]
df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)
Output
        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22
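The extracted values are strings, so they usually need a conversion before any arithmetic. A minimal sketch, reusing three of the products from the example and the same pattern; the conversion step with astype(float) is an addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["tld los 16OZ", "rtk .7 OZ", "aje 0.22 OZ"]})

# str.extract returns strings; expand=False yields a Series for a single group.
ounces = df["Product"].str.extract(r"(\d*\.?\d+)\s*OZ\b", expand=False)

# Convert to floats so the values can be used in arithmetic.
df["Numbers"] = ounces.astype(float)
print(df)
```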
Extract int from string in Pandas
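You can convert the column to string and extract the integer using a regular expression. Note that astype(int) raises an error if any row contains no digits.
df['B'].astype(str).str.extract(r'(\d+)', expand=False).astype(int)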
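When some rows may contain no digits at all, a plain astype(int) fails on the resulting NaN. One possible workaround, sketched here with a hypothetical column B, is to go through pd.to_numeric and pandas' nullable "Int64" dtype.

```python
import pandas as pd

# Hypothetical column mixing strings, a plain integer and a value with no digits.
df = pd.DataFrame({"B": ["abc12", 7, "x", "42nd"]})

# Convert everything to string first, then take the first run of digits.
digits = df["B"].astype(str).str.extract(r"(\d+)", expand=False)

# to_numeric turns the strings into numbers (NaN where nothing matched);
# the nullable "Int64" dtype keeps integers while allowing missing values.
df["B_int"] = pd.to_numeric(digits).astype("Int64")
print(df)
```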
Extract only numbers from string with python
Your regex doesn't do what you think it does. What you have is a character class, which matches any one of the characters in the set ?: \t\r\n\f\v0-9+. So when the regex encounters the first non-matching character (P for your sample data) it stops. It's probably simpler to use replace to get rid of every character that is neither whitespace nor a digit:
df = pd.DataFrame({'data':['86531 86530 86529PIP 91897PIP']})
df['data'].str.replace(r'[^\s\d]', '', regex=True)
Which for your data will give:
86531 86530 86529 91897
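An alternative to deleting the unwanted characters is to collect the digit runs directly. This is a sketch using str.findall, which is a swapped-in technique and not the approach from the answer above; the new column name "numbers" is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"data": ["86531 86530 86529PIP 91897PIP"]})

# findall returns a list of every run of digits found in each row.
numbers = df["data"].str.findall(r"\d+")
print(numbers[0])

# Join the matches back into one space-separated string if needed.
df["numbers"] = numbers.str.join(" ")
print(df["numbers"][0])
```

Keeping the intermediate lists is handy when the individual numbers are needed later, for example to convert each one to int.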
How do I extract numbers from the strings in a pandas column of 'object'?
I would use str.extract here, with expand=False so that it returns a Series that pd.to_numeric can convert:
df['x'] = pd.to_numeric(df['x'].str.extract(r'^(\d+)', expand=False))
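A small runnable sketch of this approach follows. The sample values are hypothetical, picked to show that rows without leading digits end up as NaN.

```python
import pandas as pd

# Hypothetical data: leading digits followed by text, plus one row with none.
df = pd.DataFrame({"x": ["12abc", "7 days", "none", "305kg"]})

# ^ anchors the match to the start of the string; expand=False returns a
# Series, which is the kind of input pd.to_numeric expects.
df["x"] = pd.to_numeric(df["x"].str.extract(r"^(\d+)", expand=False))
print(df["x"].tolist())
```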
The challenge with trying to use a pure substring approach is that we don't necessarily know how many characters to take. Regex gets around this problem.
Extract only numbers and only string from pandas dataframe
Your code is on the right track; you just need to account for the decimals and the possibility of integers. Pass regex=True to str.replace, since recent pandas versions treat the pattern as a literal string by default:
df_num['colors_num'] = df_num.Colors.str.extract(r'(\d+[.\d]*)')
df_num['animals_num'] = df_num.Animals.str.extract(r'(\d+[.\d]*)')
df_num['colors_str'] = df_num.Colors.str.replace(r'(\d+[.\d]*)', '', regex=True)
df_num['animals_text'] = df_num.Animals.str.replace(r'(\d+[.\d]*)', '', regex=True)
       Colors     Animals  colors_num  animals_num  colors_str  animals_text
0     lila1.5      hu11nd         1.5           11        lila          hund
1     rosa2.5     12welpe         2.5           12        rosa         welpe
2     gelb3.5     13katze         3.5           13        gelb         katze
3       grün4  s14chlange           4           14        grün      schlange
4        rot5     vo15gel           5           15         rot         vogel
5    schwarz6   16papagei           6           16     schwarz       papagei
6       grau7       ku17h           7           17        grau           kuh
7       weiß8     18ziege           8           18        weiß         ziege
8      braun9     19pferd           9           19       braun         pferd
9  hellblau10      esel20          10           20    hellblau          esel
Extract numbers from strings in python
Assuming you expect only one number per value in the column, you could try using str.extract here:
df["some_col"] = df["some_col"].str.extract(r'(\d+(?:\.\d+)?)')
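If a value can hold more than one number, str.extract keeps only the first match. A possible sketch for that case uses str.extractall with the same pattern; the sample Series and the grouping into lists are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical values: two numbers, no number, and a single number.
s = pd.Series(["a 1.5 b 20", "no digits", "x3"])

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the first (and only) group.
matches = s.str.extractall(r"(\d+(?:\.\d+)?)")[0]

# Collect the matches of each original row into a list.
per_row = matches.groupby(level=0).agg(list)
print(per_row)
```

Rows with no match (index 1 here) are simply absent from the result, so reindex against the original index if every row must be represented.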
How to extract numbers from a string in Python?
If you want to extract only positive integers, try the following:
>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]
I would argue that this is better than a regex approach because you don't need another module, and it's more readable because you don't need to parse (and learn) the regex mini-language.
This will not recognize floats, negative integers, or integers in hexadecimal format. It also skips digits embedded in a word, such as h3110 above. If you can't accept these limitations, use a regular expression instead.
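For comparison, a regex-based sketch that also picks up floats and negative numbers is shown below. The pattern and the extended sample string (with a trailing -5) are assumptions made for illustration. Unlike the split/isdigit version, it also matches digits embedded in words such as h3110.

```python
import re

# Same sample text as above, extended with a hypothetical negative number.
txt = "h3110 23 cat 444.4 rabbit 11 2 dog -5"

# Optional minus sign, one or more digits, optional fractional part.
pattern = r"-?\d+(?:\.\d+)?"

# findall returns every non-overlapping match as a string.
found = re.findall(pattern, txt)
values = [float(m) for m in found]
print(found)
```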