How to Extract Numbers from a String in Python

How to extract numbers from a string in Python?

If you want to extract only non-negative integers, try the following:

>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example: it needs no additional module, and it is more readable because you don't have to learn (and parse) the regex mini-language.

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
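If the numbers are separated by whitespace, one way to lift the float and negative-integer limitations while still avoiding regex is to try converting each token. This is only a sketch under that assumption: the helper name extract_numbers is invented for illustration, hexadecimal is still not handled, and it is not the other answer referred to above.

```python
import math


def extract_numbers(text):
    """Return ints and floats found as whitespace-separated tokens (hypothetical helper)."""
    numbers = []
    for token in text.split():
        try:
            numbers.append(int(token))   # handles "23", "-7", "+4"
            continue
        except ValueError:
            pass
        try:
            value = float(token)         # handles "444.4", "-0.5", "1e3"
        except ValueError:
            continue                     # not a number, e.g. "h3110" or "cat"
        if math.isfinite(value):         # skip tokens such as "nan" or "inf"
            numbers.append(value)
    return numbers


print(extract_numbers("h3110 23 cat 444.4 rabbit 11 2 dog -7"))
# [23, 444.4, 11, 2, -7]
```

Like the split-based one-liner, this ignores digits embedded in words such as "h3110".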

Extract Number from String in Python

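You can keep only the digit characters using the str.isdigit method. In Python 3, filter returns an iterator rather than a string, so join the result before converting it:

>>> int("".join(filter(str.isdigit, str1)))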
3158
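As a self-contained illustration of the same idea, the sketch below wraps the filtering in a function and returns None when the string has no digits. The helper name digits_to_int and the sample input are made up here (the original str1 is not shown), and str.isdecimal is swapped in for str.isdigit because int() accepts every character that isdecimal accepts.

```python
def digits_to_int(text):
    """Concatenate every decimal digit in text and convert it (hypothetical helper)."""
    digits = "".join(ch for ch in text if ch.isdecimal())
    return int(digits) if digits else None


# "abc31x58" is an invented input, not the str1 from the original question.
print(digits_to_int("abc31x58"))   # 3158
print(digits_to_int("no digits"))  # None
```

Note that this approach merges all digits into one number, so "a1 b2" becomes 12 rather than 1 and 2.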

Is there a better way to extract numbers from a string in Python 3?

Here's one way you can do the regex search that @Barmar suggested:

>>> import re
>>> int(re.search(r"\d+", "V70N-HN")[0])
70
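re.search returns None when nothing matches, so subscripting the result directly raises a TypeError on input without digits. A minimal sketch of a guarded version follows; the function name first_int is an assumption for illustration.

```python
import re


def first_int(text):
    """Return the first run of digits in text as an int, or None (hypothetical helper)."""
    match = re.search(r"\d+", text)
    return int(match[0]) if match else None


print(first_int("V70N-HN"))  # 70
print(first_int("VN-HN"))    # None
```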

Extracting numbers from a string with a special structure with regular expressions

Given what you said in your comment to me, I believe a more appropriate solution for your problem might be this:

import re

s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'

pattern = re.compile(r"Resolution: (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) \((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)")

Because of the named capture groups, you can get each value individually:

m = pattern.match(s)
print(m.group('resolution')) # 1200
print(m.group('time')) # 16.255
print(m.group('gflops')) # 7.920
# ...

But it won't match any string that isn't formatted exactly like the one you supplied. For example:

assert pattern.match("90234.12 °C on Core 12") is None
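If you want all the values at once with proper numeric types, the match object's groupdict() method returns the named groups as a dictionary of strings. The sketch below builds on the same pattern; the function name parse_line and the choice of which fields become int or float are assumptions for illustration.

```python
import re

PATTERN = re.compile(
    r"Resolution: (?P<resolution>\d+), Time: (?P<time>\d+\.\d+) "
    r"\((?P<gflops>\d+\.\d+) GFlop => (?P<mflops>\d+\.\d+) MFlop/s, "
    r"residual (?P<residual>\d+\.\d+), (?P<iterations>\d+) iterations\)"
)

INT_FIELDS = {"resolution", "iterations"}  # every other field is parsed as float


def parse_line(line):
    """Return a dict of typed values, or None if the line does not match (hypothetical helper)."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return {
        name: int(value) if name in INT_FIELDS else float(value)
        for name, value in m.groupdict().items()
    }


s = 'Resolution: 1200, Time: 16.255 (7.920 GFlop => 1487.23 MFlop/s, residual 0.007113, 500 iterations)'
print(parse_line(s))
```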

Get only numbers from string in python

You can use a regex:

import re
just = 'Standard Price:20000'
price = re.findall(r"\d+", just)[0]

Or:

price = just.split(":")[1]
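Both variants leave price as a string, and both raise an IndexError when the input has no digits or no colon. A small sketch that converts the value and handles those cases might look like this; parse_price is a hypothetical name, and it assumes the number is everything after the first colon.

```python
def parse_price(text):
    """Return the integer after the first ':' in text, or None (hypothetical helper)."""
    _label, sep, value = text.partition(":")
    value = value.strip()
    if sep and value.isdecimal():
        return int(value)
    return None


print(parse_price("Standard Price:20000"))  # 20000
print(parse_price("Standard Price"))        # None
```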

Extract numbers from string with backslash

You have confused the REPRESENTATION of your string with the CONTENT of your string. The string '183\118\40' contains 6 characters, NONE of which are backslashes. The "\11" is an octal character constant. Octal 11 is decimal 9, which is the tab character. The "\40" is also an octal character constant. Octal 40 is decimal 32, which is space.

If you really want that literal string, you need one of:

st = '183\\118\\40'
st = r'183\118\40'

Note that this only happens because you have typed it as a Python string constant. If you read that line in from a file, the backslashes are not interpreted as escapes, and it will work just fine.
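The difference between the two literals can be checked directly. This short sketch only uses the strings from the question; the variable names plain and raw are chosen here for illustration.

```python
plain = '183\118\40'   # "\11" and "\40" are read as octal escapes
raw = r'183\118\40'    # raw string: backslashes are kept literally

print(len(plain))       # 6
print(repr(plain))      # '183\t8 '
print(len(raw))         # 10
print(raw.split('\\'))  # ['183', '118', '40']
```

Once the backslashes are really in the string, splitting on them and calling int() on each part yields the three numbers.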

How to Extract Numbers from String Column in Pandas with decimal?

If you want to match the numbers followed by OZ, you could write the pattern as:

(\d*\.?\d+)\s*OZ\b

Explanation

  • ( Capture group 1 (the value will be picked up by str.extract)
  • \d*\.?\d+ Match optional digits, an optional dot and 1+ digits
  • ) Close group 1
  • \s*OZ\b Match optional whitespace chars and then OZ followed by a word boundary

import pandas as pd

data = [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ",
]

df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)

Output

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22
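The Numbers column holds strings, not floats. To check the pattern and the conversion without pandas, the same regex can be run with the standard re module, as in the sketch below (the names OZ_RE and values are chosen here for illustration). In pandas itself, calling .astype(float) on the extracted column should give the equivalent numeric result.

```python
import re

OZ_RE = re.compile(r'(\d*\.?\d+)\s*OZ\b')

data = [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ",
]

values = []
for product in data:
    match = OZ_RE.search(product)
    # Convert the captured text to float; keep None where nothing matched.
    values.append(float(match.group(1)) if match else None)

print(values)  # [16.0, 14.0, 28.3, 0.7, 0.92, 0.22]
```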

