Pandas Extract Number from String

Give it a regex capture group:

df.A.str.extract(r'(\d+)', expand=False)

Gives you:

0      1
1    NaN
2     10
3    100
4      0
Name: A, dtype: object
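
A minimal runnable sketch of the same idea. The sample values in column A are hypothetical, chosen only to reproduce the output above. In current pandas, str.extract returns a DataFrame by default, so expand=False is passed to get a Series back:

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the output shown above.
df = pd.DataFrame({"A": ["a1", "b", "c10", "d100", "e0"]})

# One capture group + expand=False -> a Series of the first match per row.
# Rows with no digits come back as NaN.
numbers = df["A"].str.extract(r"(\d+)", expand=False)
print(numbers)
```

Note that the extracted values are still strings, not integers.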

How to Extract Numbers from String Column in Pandas with decimal?

If you want to match numbers followed by OZ, you could write the pattern as:

(\d*\.?\d+)\s*OZ\b

Explanation

( Capture group 1 (the value will be picked up by str.extract)
\d*\.?\d+ Match optional digits, an optional dot and 1+ digits
) Close group 1
\s*OZ\b Match optional whitespace chars and then OZ followed by a word boundary


import pandas as pd

data = [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ"
]

df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)

Output

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22
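
The Numbers column above still holds strings. If you need real floats, one option is pd.to_numeric. This is a sketch on a shortened copy of the data; the "no weight" row is a made-up addition to show what happens when nothing matches:

```python
import pandas as pd

# Shortened sample; "no weight" is a hypothetical row with no match.
products = pd.Series(["tld los 16OZ", "hqk 28.3 OZ", "rtk .7 OZ", "no weight"])

# Same pattern as above; expand=False returns a Series.
extracted = products.str.extract(r"(\d*\.?\d+)\s*OZ\b", expand=False)

# Convert the matched text to floats; unmatched rows stay NaN.
ounces = pd.to_numeric(extracted)
print(ounces)
```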

Extract int from string in Pandas

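Extract the digits with a regular expression, then cast the result to int. If the column is not already of string type, call .astype(str) on it first. The cast fails if any row has no digits, because the missing match is NaN.

df['B'].str.extract(r'(\d+)').astype(int)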
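
If some rows may contain no digits at all, a plain int cast will raise. One way around that, sketched here with hypothetical sample data, is to go through pd.to_numeric and the nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical sample data; the middle row has no digits.
df = pd.DataFrame({"B": ["item 12", "none", "x7y"]})

digits = df["B"].str.extract(r"(\d+)", expand=False)

# to_numeric turns the strings into numbers (NaN where nothing matched);
# the nullable "Int64" dtype keeps integers and stores the gap as <NA>.
as_int = pd.to_numeric(digits).astype("Int64")
print(as_int)
```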

Extract only numbers from string with python

Your regex doesn't do what you think it does. What you have is a character class, which matches any one of the characters in the set ?: \t\r\n\f\v0-9+. So when the regex encounters the first non-matching character (P in your sample data), it stops. It's probably simpler to use replace to remove every character that is neither whitespace nor a digit:

df = pd.DataFrame({'data':['86531 86530 86529PIP 91897PIP']})
df['data'].str.replace(r'[^\s\d]', '', regex=True)

Which for your data will give:

86531 86530 86529 91897
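
If you would rather have the numbers as separate items than as one cleaned string, str.findall is another option. A small sketch on the same sample row, with the replace approach alongside for comparison:

```python
import pandas as pd

df = pd.DataFrame({"data": ["86531 86530 86529PIP 91897PIP"]})

# Strip everything that is neither whitespace nor a digit.
cleaned = df["data"].str.replace(r"[^\s\d]", "", regex=True)

# Alternative: collect every run of digits into a list per row.
number_lists = df["data"].str.findall(r"\d+")

print(cleaned.iloc[0])
print(number_lists.iloc[0])
```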

How do I extract numbers from the strings in a pandas column of 'object'?

I would use str.extract here:

df['x'] = pd.to_numeric(df['x'].str.extract(r'^(\d+)', expand=False))

The challenge with trying to use a pure substring approach is that we don't necessarily know how many characters to take. Regex gets around this problem.
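
To illustrate the point about substrings, here is a small comparison. The values in column x are made up for the example: a fixed-width slice only works for one of the three rows, while the regex takes however many leading digits there are.

```python
import pandas as pd

# Hypothetical values with 1, 2 and 3 leading digits.
df = pd.DataFrame({"x": ["5kg", "12kg", "130kg"]})

# Fixed-width slice: only right when the number is exactly 2 digits long.
fixed_slice = df["x"].str[:2]

# Regex: takes all leading digits, whatever their count.
by_regex = pd.to_numeric(df["x"].str.extract(r"^(\d+)", expand=False))

print(fixed_slice.tolist())
print(by_regex.tolist())
```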

Extract only numbers and only string from pandas dataframe

Your code is on the right track; you just need to account for decimals and for the possibility of integers:

df_num['colors_num'] = df_num.Colors.str.extract(r'(\d+[.\d]*)')
df_num['animals_num'] = df_num.Animals.str.extract(r'(\d+[.\d]*)')
df_num['colors_str'] = df_num.Colors.str.replace(r'(\d+[.\d]*)', '', regex=True)
df_num['animals_text'] = df_num.Animals.str.replace(r'(\d+[.\d]*)', '', regex=True)

       Colors     Animals  colors_num  animals_num  colors_str  animals_text
0     lila1.5      hu11nd         1.5           11        lila          hund
1     rosa2.5     12welpe         2.5           12        rosa         welpe
2     gelb3.5     13katze         3.5           13        gelb         katze
3       grün4  s14chlange           4           14        grün      schlange
4        rot5     vo15gel           5           15         rot         vogel
5    schwarz6   16papagei           6           16     schwarz       papagei
6       grau7       ku17h           7           17        grau           kuh
7       weiß8     18ziege           8           18        weiß         ziege
8      braun9     19pferd           9           19       braun         pferd
9  hellblau10      esel20          10           20    hellblau          esel
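
For reference, a self-contained version you can run, using only two of the rows above. It assumes a recent pandas, where str.replace treats the pattern as a literal string unless regex=True is passed:

```python
import pandas as pd

# Two rows taken from the table above.
df_num = pd.DataFrame({
    "Colors": ["lila1.5", "grün4"],
    "Animals": ["hu11nd", "s14chlange"],
})

pattern = r"(\d+[.\d]*)"

# Numeric part: first match of the pattern in each value.
df_num["colors_num"] = df_num["Colors"].str.extract(pattern, expand=False)
df_num["animals_num"] = df_num["Animals"].str.extract(pattern, expand=False)

# Text part: remove the numeric part; regex=True is required here.
df_num["colors_str"] = df_num["Colors"].str.replace(pattern, "", regex=True)
df_num["animals_text"] = df_num["Animals"].str.replace(pattern, "", regex=True)

print(df_num)
```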

Extract numbers from strings in python

Assuming you expect at most one number per value in the column, you could try using str.extract here:

df["some_col"] = df["some_col"].str.extract(r'(\d+(?:\.\d+)?)')
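
The "one number per value" assumption matters because str.extract only returns the first match. If a value can hold several numbers, str.extractall is one way to get them all. A sketch with made-up sample strings:

```python
import pandas as pd

# Hypothetical sample: the first value contains two numbers.
s = pd.Series(["width 3.5 height 12", "no numbers", "7"])

pattern = r"(\d+(?:\.\d+)?)"

# extract: first match only, NaN where there is none.
first = s.str.extract(pattern, expand=False)

# extractall: one row per match, indexed by (original row, match number).
# Rows without any match do not appear in the result.
all_matches = s.str.extractall(pattern)[0]

print(first)
print(all_matches)
```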

How to extract numbers from a string in Python?

If you want to extract only positive integers, try the following:

>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example because you don't need another module and it's more readable because you don't need to parse (and learn) the regex mini-language.

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
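
If floats and negative numbers do matter, one possible sketch (not necessarily what the answer mentioned above does) keeps the split-into-tokens idea but checks each token against a small regex. The sample string is the one from above with a minus sign added to 11 for illustration. Hexadecimal is still not handled.

```python
import re

# Same sample text as above, with "-11" instead of "11" for illustration.
txt = "h3110 23 cat 444.4 rabbit -11 2 dog"

# Optional minus sign, digits, optional fractional part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

# fullmatch requires the whole token to be a number, so "h3110" is skipped.
numbers = [float(tok) for tok in txt.split() if NUMBER.fullmatch(tok)]
print(numbers)
```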
