Retrieve Number from the String Pattern Using Regular Expression

How to extract numbers from a string in Python?

If you want to extract only positive integers, try the following:

>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]

I would argue that this is better than the regex example because you don't need another module and it's more readable because you don't need to parse (and learn) the regex mini-language.

This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
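If floats and negative numbers matter, a regex can pick them up as well. The following is a minimal sketch, not the answer referred to above: the name NUMBER and the pattern itself are illustrative assumptions. It treats a number as an optional sign, digits, and an optional fractional part, and it does not attempt hexadecimal or exponent notation.

```python
import re

# Hypothetical pattern: optional sign, digits, optional fractional part.
# The lookbehind and lookahead stop digits glued to letters (e.g. "h3110")
# from being reported as numbers.
NUMBER = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d+)?(?!\w)")

txt = "h3110 23 cat 444.4 rabbit -11 2 dog"

# Convert each matched token to float when it has a dot, otherwise to int.
numbers = [float(s) if "." in s else int(s) for s in NUMBER.findall(txt)]
print(numbers)
```

The trade-off is the one described above: this needs the re module and a pattern that has to be read carefully, in exchange for covering more number formats.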

How to extract numbers from a string using regular expressions?

The following pattern:

(\d+(?>\.\d+)*)\w+?(\d+)

Will match this:

AppName5.2.6dbVer44Oracle.Group
       \__________/  <-- match
       \___/     \/  <-- captures

It captures the two values you're interested in as groups 1 and 2.

Use it like this:

var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
    var first = match.Groups[1].Value;
    var second = match.Groups[2].Value;
    // ...
}

Pattern explanation:

(           # start of group 1
  \d+       # a series of digits
  (?>       # start of atomic group
    \.\d+   # dot followed by digits
  )*        # .. 0 to n times
)
\w+?        # some word characters (as few as possible)
(\d+)       # a series of digits captured in group 2
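For readers following the Python examples elsewhere on this page, here is a rough equivalent of the C# snippet. It is a sketch under one stated assumption: the atomic group (?>...) is replaced by a plain non-capturing group (?:...), because Python's re module only accepts atomic groups from version 3.11 onward. For this input both forms capture the same text.

```python
import re

# Same idea as the C# pattern, with the atomic group swapped for a
# non-capturing group so it runs on any Python 3 version.
pattern = re.compile(r"(\d+(?:\.\d+)*)\w+?(\d+)")

match = pattern.search("AppName5.2.6dbVer44Oracle.Group")
if match:
    first, second = match.groups()   # group 1 and group 2
    print(first, second)
```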

Regular expression to extract number and string

You are using re.match, which tries to match the pattern at the beginning (i.e. from the first character) of your string.
Here, "initial-string/" prevents it from matching.

You can either include "initial-string/" in your pattern, or use re.search which will match starting at any position in your string.

Note that it's also better to use raw strings (r'my string with \backslashes') to avoid the potential need for escaping in your pattern.

import re

string = 'initial-string/fixed-string-124-jeff-thompson'
result = re.search(r'fixed-string-([0-9]*)-(.*)', string)
result.groups()
# ('124', 'jeff-thompson')

or

result = re.match(r'initial-string/fixed-string-([0-9]*)-(.*)', string)
result.groups()
# ('124', 'jeff-thompson')
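To see the difference between the two functions side by side, here is a small sketch that runs the same pattern through both. The variable names anchored and anywhere are just illustrative.

```python
import re

text = "initial-string/fixed-string-124-jeff-thompson"
pattern = r"fixed-string-([0-9]*)-(.*)"

anchored = re.match(pattern, text)    # only tries to match at position 0
anywhere = re.search(pattern, text)   # scans forward for the first match

print(anchored)            # None, because the text starts with "initial-string/"
print(anywhere.groups())
```

Because re.match returns None here, calling .groups() on its result would raise an AttributeError, which is a common symptom of this mistake.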

Extract string and number from a string which is in multiple format using regex in python?

I would use:

import re

inp = "some text hello-21234-a-12345.tgz some more text"
parts = re.findall(r'\b([^\s-]+(?:-[^-]+)*)-(\d+)(?:-[^-]+)*\.\w+\b', inp)
print("FolderName: " + parts[0][0])
print("Version: " + parts[0][1])

This prints:

FolderName: hello-21234-a
Version: 12345

Retrieve number from the string pattern using regular expression

I'm not sure of the exact syntax in Ruby, but the regular expression would be "(\d+)", meaning a run of one or more digits. You can try it out here: http://www.rubular.com/

Updated:
I believe the syntax is /(\d+)/.match(your_string)

Using regular expression to extract number

There are two approaches: a more readable one that splits the string first and takes the first item matching your required pattern, or a less readable one that uses a single regex.

In Python:

import re
s = 'Total revenue for 201603 is 3000 €'

# Split approach: test each whitespace-separated item on its own
rx = re.compile(r'^(?=\d+(?:[_-]\d+)?$)[\d_-]{6,7}$')
res = [x for x in s.split() if rx.search(x)]
if res:
    print(res[0])

# Pure regex approach:
rx = re.compile(r'(?<!\S)(?=\d+(?:[_-]\d+)?(?!\S))[\d_-]{6,7}(?!\S)')
res = rx.search(s)
if res:
    print(res.group())

So, in the first approach, the string is split on whitespace, a ^(?=\d+(?:[_-]\d+)?$)[\d_-]{6,7}$ pattern is applied to each item, and if there are any matches, the first one is returned. The pattern matches:

  • ^ - start of string
  • (?=\d+(?:[_-]\d+)?$) - a positive lookahead that makes sure there are 1+ digits, optionally followed by _ or - and 1+ more digits, up to the end of string,
  • [\d_-]{6,7} - matches 6 or 7 characters, each a digit, - or _
  • $ - end of string.

The second approach uses a regex only: the ^ anchor is replaced with (?<!\S) and $ is replaced with (?!\S), which act as whitespace boundaries. (?<!\S) is a negative lookbehind that requires a whitespace or start of string right before the current position, and (?!\S) is a negative lookahead that requires a whitespace or end of string right after the current position.
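The difference between these whitespace boundaries and the more familiar \b word boundary is easiest to see on a small input. The sketch below uses a made-up string and a simplified \d{6} pattern rather than the full pattern from the answer.

```python
import re

s = "code=201603 total 201604"   # hypothetical input

# \b only needs a word/non-word transition, so "=" counts as a boundary.
word_bounded = re.findall(r"\b\d{6}\b", s)

# (?<!\S) and (?!\S) demand whitespace (or the string edge) on both sides.
space_bounded = re.findall(r"(?<!\S)\d{6}(?!\S)", s)

print(word_bounded)
print(space_bounded)
```

Only the second pattern rejects 201603, because the character before it is "=", which is not whitespace.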

Extracting all numbers in a string that are surrounded by a certain pattern in R

You may use

string  <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
regmatches(string, gregexpr("images/stimuli/\\K\\d+(?=\\.png)", string, perl=TRUE))[[1]]
# => [1] "32" "36"

NOTE: If there can be anything, not just numbers, you may replace \\d+ with .*?.

regmatches combined with gregexpr extracts all matches found in the input.

The regex matches:

  • images/stimuli/ - a literal string
  • \K - a match reset operator discarding all text matched so far
  • \d+ - 1+ digits
  • (?=\.png) - a positive lookahead that requires a .png substring immediately to the right without adding it to the match (. is a special character, so it needs escaping).
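For comparison, here is how the same extraction might look in Python. This is a sketch, not a translation of the R code: Python's built-in re module does not support \K, so a capture group takes its place, and the input string is shortened for readability.

```python
import re

html = ("<img src='images/stimuli/32.png' style='height: 265px;'>"
        "<img src='images/stimuli/36.png' style='height: 265px;'>")

# The capture group stands in for \K: with exactly one group in the pattern,
# findall returns only the text that group matched.
ids = re.findall(r"images/stimuli/(\d+)(?=\.png)", html)
print(ids)
```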

Use regular expression to extract numbers before specific words

Code

import re
units = '|'.join(["hours", "hour", "hrs", "days", "day", "minutes", "minute", "min"]) # possible units
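number = r'\d+[.,]?\d*' # pattern for number (raw string so \d is not treated as a string escape)
plus_minus = r'\+\/\-' # plus minus (defined for reference; not used in the pattern below)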

cases = fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'

pattern = re.compile(cases)

Tests

print(pattern.findall('2 Approximately 5.1 hours 100 ays 1 s'))   
# Output: ['5.1']

print(pattern.findall('2 Approximately 10.2 +/- 30hours'))
# Output: ['10.2']

print(pattern.findall('The mean half-life for Cetuximab is 114 hours (range 75-188 hours).'))
# Output: ['114', '75']

print(pattern.findall('102 +/- 30 hours in individuals with rheumatoid arthritis and 68 hours in healthy adults.'))
# Output: ['102', '68']

print(pattern.findall("102 +/- 30 hrs"))
# Output: ['102']

print(pattern.findall("102-130 hrs"))
# Output: ['102']

print(pattern.findall("102hrs"))
# Output: ['102']

print(pattern.findall("102 hours"))
# Output: ['102']

Explanation

The code above relies on the fact that raw strings (r'...') and string interpolation (f'...') can be combined as:

fr'...'

per PEP 498

The cases string:

fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'

Its parts, in sequence:

  • fr'({number})' - capturing group '(\d+[.,]?\d*)' for integers or floats
  • r'(?:[\s\d\-\+\/]*)' - non-capturing group for allowable characters between number and units (i.e. space, +, -, digit, /)
  • fr'(?:{units})' - non-capturing group for units
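To make the interpolation concrete, the sketch below builds a shortened version of the pattern and shows the final string the regex engine receives. The shortened unit list and the extra digits variable are illustrative additions; the latter shows that a literal regex quantifier such as {2,3} needs doubled braces inside an fr-string.

```python
import re

units = "|".join(["hours", "hour", "hrs"])   # shortened list for illustration
number = r"\d+[.,]?\d*"

# {number} and {units} are substituted; backslashes stay literal.
cases = fr"({number})(?:[\s\d\-\+\/]*)(?:{units})"
print(cases)

# Hypothetical extra: literal braces must be doubled in an f-string.
digits = fr"\d{{2,3}}"
print(digits)

print(re.findall(cases, "102 +/- 30 hrs"))
```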

Get numbers from string with regex

Try this:

(\d+)

What language are you using to parse these strings?

If you let me know I can help you with the code you would need to use this regular expression.
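As an illustration only, assuming the language is Python (the question does not say), the expression could be applied like this. The input text is made up.

```python
import re

text = "order 66 shipped 3 items on day 12"   # hypothetical input

# With a single capture group, findall returns that group's text per match.
numbers = re.findall(r"(\d+)", text)
print(numbers)

# Convert to integers if numeric values are needed.
values = [int(n) for n in numbers]
```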

Find numbers after specific text in a string with RegEx

Try this expression:

"Error importing row no\. (\d+):"

Here you need to understand the quantifiers and escape sequences:

  • . matches any character; as you want only numbers, use \d; if you mean the literal period character, you must escape it with a backslash (\.)
  • ? zero or one occurrence; this isn't what you want, because an error on row 10 would yield only the "1"
  • + one or more occurrences; this will suffice for us
  • * zero or more occurrences; take care when using it as .* because it can consume your entire input
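The effect of the quantifier choice can be checked quickly in Python. This is a sketch with a made-up log line; the second pattern drops the trailing colon on purpose so that the ? variant can still match and show the truncated result.

```python
import re

line = "Error importing row no. 10: invalid value"   # hypothetical log line

# + takes every consecutive digit of the row number.
one_or_more = re.search(r"Error importing row no\. (\d+):", line)

# ? takes at most one digit, so row 10 is cut down to its first digit.
zero_or_one = re.search(r"Error importing row no\. (\d?)", line)

print(one_or_more.group(1))
print(zero_or_one.group(1))
```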

