Extract Number from String in Python

How to extract numbers from a string in Python?

If you want to extract only positive integers, try the following:

>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example: you don't need another module, and it's more readable because you don't need to learn (and parse) the regex mini-language.

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
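
For the cases this approach misses, a regular expression is one option. The following is only a minimal sketch and not the answer referred to above; the pattern and the helper name extract_numbers are assumptions made for illustration. It picks up signed integers and simple decimals, and it deliberately ignores hexadecimal and exponent notation.

```python
import re

# Assumed pattern: optional sign, then either digits with an optional
# fractional part ("444.4", "23") or a bare fraction (".5").
NUMBER_RE = re.compile(r"[-+]?(?:\d+(?:\.\d+)?|\.\d+)")

def extract_numbers(text):
    """Return every number found in text as a float (hypothetical helper)."""
    return [float(m) for m in NUMBER_RE.findall(text)]

txt = "h3110 23 cat 444.4 rabbit 11 2 dog -7"
print(extract_numbers(txt))
# [3110.0, 23.0, 444.4, 11.0, 2.0, -7.0]
```

Note that, unlike the split/isdigit version, this also returns the 3110 embedded in "h3110", because the pattern does not require the number to be a separate word.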

Extract Number from String in Python

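You can keep only the digit characters using the str.isdigit method. In Python 3, filter returns an iterator rather than a string, so join the result before converting it:

>>> str1 = "3158 reviews"  # assumed sample input; the original value is not shown
>>> int("".join(filter(str.isdigit, str1)))
3158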
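
One caveat: filtering by str.isdigit keeps every digit in the string and joins them into a single number, so it only suits strings that contain exactly one number. A small sketch of that behaviour, using a made-up helper name digits_only:

```python
def digits_only(text):
    """Join every digit character in text into one int (hypothetical helper)."""
    return int("".join(filter(str.isdigit, text)))

print(digits_only("order 42"))         # 42
print(digits_only("2 cats, 11 dogs"))  # 211, not [2, 11]
```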

Is there a better way to extract numbers from a string in Python 3?

Here's one way you can do the regex search that @Barmar suggested:

>>> import re
>>> int(re.search(r"\d+", "V70N-HN")[0])
70
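
re.search returns None when nothing matches, so indexing the result with [0] raises a TypeError for strings without digits. A minimal sketch of a guard, assuming you want a fallback value instead of an exception (first_int and its default parameter are hypothetical names):

```python
import re

def first_int(text, default=None):
    """Return the first run of digits in text as an int, or default if none."""
    match = re.search(r"\d+", text)
    return int(match[0]) if match else default

print(first_int("V70N-HN"))    # 70
print(first_int("no digits"))  # None
```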

Get only numbers from string in python

You can use a regex:

import re
just = 'Standard Price:20000'
price = re.findall(r"\d+", just)[0]  # '20000' (a string)

Or split on the colon:

price = just.split(":")[1]  # '20000' (a string)

Extract numbers from an Array which has more than one string element

Use re.search, which returns the first match of the pattern: one or more digits followed by three zeros.

import re

my_array = ['STK72184 4/28/2022 50 from Exchange Balance, 50 from Earning Balance & 10 from Bonus 5000 Regular 10/20/2023 Approved 4/28/2022',
'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 5000 Regular 10/19/2023 Approved 4/27/2022',
'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 15000 Regular 10/19/2023 Approved 4/27/2022',
'STK722222 4/26/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 10000 Regular 10/18/2023 Approved 4/26/2022']

# If you want strings:
nums = [re.search(r'\d+000', s)[0] for s in my_array]
print(nums)
# ['5000', '5000', '15000', '10000']

# If you want integers:
nums = [int(re.search(r'\d+000', s)[0]) for s in my_array]
print(nums)
# [5000, 5000, 15000, 10000]

How to Extract Numbers from String Column in Pandas with decimal?

If you want to match the numbers followed by OZ, you could write the pattern as:

(\d*\.?\d+)\s*OZ\b

Explanation

  • ( Capture group 1 (the value will be picked up by str.extract)
  • \d*\.?\d+ Match optional digits, an optional dot and 1+ digits
  • ) Close group 1
  • \s*OZ\b Match optional whitespace chars and then OZ followed by a word boundary
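
The pattern can be tried out with the standard re module before applying it to a DataFrame. This is a rough sketch; the helper name ounces and the last sample string are made up for illustration.

```python
import re

# Same pattern as above, compiled once for reuse.
OZ_RE = re.compile(r"(\d*\.?\d+)\s*OZ\b")

def ounces(text):
    """Return the number in front of OZ as a float, or None (hypothetical helper)."""
    m = OZ_RE.search(text)
    return float(m.group(1)) if m else None

for s in ["tld los 16OZ", "HSJ14 OZ", "rtk .7 OZ", "no weight here"]:
    print(s, "->", ounces(s))
# tld los 16OZ -> 16.0
# HSJ14 OZ -> 14.0
# rtk .7 OZ -> 0.7
# no weight here -> None
```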


import pandas as pd

data = [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ",
]

df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d+)\s*OZ\b')
print(df)

Output

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22

Extract numbers only from the strings in which a keyword is mentioned

Use a list comprehension with re.search and an if clause. The second example shows that a regex-based search can be quite powerful at pulling out just the patterns you want, so I almost always prefer it to an exact string match (except when performance is critical). Also, I renamed array to lst: this data structure is called a list in Python, and an array in some other languages.

import re

my_lst = ['STK72184 4/28/2022 50 from Exchange Balance, 50 from Earning Balance & 10 from Bonus 25000 Regular 10/20/2023 Approved 4/28/2022',
'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 5000 Regular 10/19/2023 Closed 4/27/2022',
'STK725721 4/27/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 15000 Regular 10/19/2023 Closed 4/27/2022',
'STK722222 4/26/2022 50 from Exchange Balance, 40 from Earning Balance & 10 from Bonus Balance 10000 Regular 10/18/2023 Approved 4/26/2022']

nums = [int(re.search(r'\d+000', s)[0]) for s in my_lst if re.search(r'Approved', s)]
print(nums)
# [25000, 10000]

nums = [int(re.search(r'\d+000', s)[0]) for s in my_lst if re.search(r'4/2[67]', s)]
print(nums)
# [5000, 15000, 10000]
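
The comprehensions above assume that every row passing the filter also contains an amount; otherwise re.search returns None and the [0] lookup fails. A possible variant, assuming Python 3.8+ for the := operator, skips such rows. Here the keyword test is a plain substring check rather than a regex, the function name amounts_with_keyword is hypothetical, and the rows are shortened, made-up examples.

```python
import re

AMOUNT_RE = re.compile(r"\d+000")

def amounts_with_keyword(rows, keyword):
    """Collect the first amount from each row that mentions keyword."""
    return [
        int(m[0])
        for row in rows
        # Keep the row only if it has the keyword AND an amount was found.
        if keyword in row and (m := AMOUNT_RE.search(row))
    ]

rows = [
    "STK1 4/28/2022 10 from Bonus 25000 Regular Approved",
    "STK2 4/27/2022 10 from Bonus Balance 5000 Regular Closed",
    "STK3 4/26/2022 no amount listed Approved",
]
print(amounts_with_keyword(rows, "Approved"))
# [25000]
```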

