How to extract numbers from a string in Python?
If you want to extract only positive integers, try the following:
>>> txt = "h3110 23 cat 444.4 rabbit 11 2 dog"
>>> [int(s) for s in txt.split() if s.isdigit()]
[23, 11, 2]
I would argue that this is better than the regex example because you don't need another module and it's more readable because you don't need to parse (and learn) the regex mini-language.
This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
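If floats and negative integers do matter, one possible sketch is shown below. It is not the answer referred to above, just an illustration: the pattern and the sample string (the same text with a minus sign added) are assumptions. The whitespace lookarounds keep tokens such as "h3110" out of the result, as the split-based version does.

```python
import re

# Sample text from above, with one value made negative for illustration.
txt = "h3110 23 cat 444.4 rabbit -11 2 dog"

# Optional sign, digits, optional fractional part.
# (?<!\S) and (?!\S) require whitespace (or the string edge) on both sides.
pattern = re.compile(r"(?<!\S)[-+]?\d+(?:\.\d+)?(?!\S)")

# Convert each token to float if it has a decimal point, else to int.
numbers = [float(tok) if "." in tok else int(tok) for tok in pattern.findall(txt)]
print(numbers)  # [23, 444.4, -11, 2]
```

Hexadecimal literals are still not handled by this sketch.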
How to extract numbers from a string using regular expressions?
The following pattern:
(\d+(?>\.\d+)*)\w+?(\d+)
Will match this:
AppName5.2.6dbVer44Oracle.Group
       \__________/     <-- match
       \___/     \/     <-- captures
It captures the two values you're interested in as capture groups 1 and 2.
Use it like this:
var match = Regex.Match(input, @"(\d+(?>\.\d+)*)\w+?(\d+)");
if (match.Success)
{
    var first = match.Groups[1].Value;
    var second = match.Groups[2].Value;
    // ...
}
Pattern explanation:
( # Start of group 1
\d+ # a series of digits
(?> # start of atomic group
\.\d+ # dot followed by digits
)* # .. 0 to n times
)
\w+? # some word characters (as few as possible)
(\d+) # a series of digits captured in group 2
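For comparison, here is a rough Python equivalent of the C# snippet above. It is only a sketch: the atomic group (?>...) is swapped for a plain non-capturing group (?:...), because Python's re module only supports atomic groups from version 3.11 on. For this input both forms should capture the same values.

```python
import re

# Same pattern as above, with (?:...) in place of the atomic group (?>...).
pattern = re.compile(r"(\d+(?:\.\d+)*)\w+?(\d+)")

match = pattern.search("AppName5.2.6dbVer44Oracle.Group")
first, second = match.groups() if match else (None, None)
print(first, second)  # 5.2.6 44
```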
Regular expression to extract number and string
You are using re.match, which tries to match the pattern at the beginning (i.e. from the first character) of your string.
Here, "initial-string/" prevents it from matching.
You can either include "initial-string/" in your pattern, or use re.search which will match starting at any position in your string.
Note that it's also better to use raw strings (r'my string with \backslashes') to avoid the potential need for escaping in your pattern.
import re
string = 'initial-string/fixed-string-124-jeff-thompson'
result = re.search(r'fixed-string-([0-9]*)-(.*)', string)
result.groups()
# ('124', 'jeff-thompson')
or
result = re.match(r'initial-string/fixed-string-([0-9]*)-(.*)', string)
result.groups()
# ('124', 'jeff-thompson')
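To make the difference concrete, this short sketch runs the same pattern through both functions. The variable names are only illustrative.

```python
import re

text = 'initial-string/fixed-string-124-jeff-thompson'
pattern = r'fixed-string-([0-9]*)-(.*)'

# re.match anchors at index 0, where the text starts with "initial-string/".
anchored = re.match(pattern, text)

# re.search scans forward and finds the pattern further into the string.
scanned = re.search(pattern, text)

print(anchored)          # None
print(scanned.groups())  # ('124', 'jeff-thompson')
```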
Extract string and number from a string which is in multiple format using regex in python?
I would use:
import re
inp = "some text hello-21234-a-12345.tgz some more text"
parts = re.findall(r'\b([^\s-]+(?:-[^-]+)*)-(\d+)(?:-[^-]+)*\.\w+\b', inp)
print("FolderName: " + parts[0][0])
print("Version: " + parts[0][1])
This prints:
FolderName: hello-21234-a
Version: 12345
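Indexing parts[0] raises an IndexError when the input contains no match. A minimal sketch of a guarded variant follows; the helper name split_name_version is hypothetical, and it uses re.search in place of re.findall since only the first match is needed.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r'\b([^\s-]+(?:-[^-]+)*)-(\d+)(?:-[^-]+)*\.\w+\b')

def split_name_version(text):
    """Return (folder_name, version) for the first match, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.groups() if match else None

found = split_name_version("some text hello-21234-a-12345.tgz some more text")
missing = split_name_version("no archive name here")
print(found)    # ('hello-21234-a', '12345')
print(missing)  # None
```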
Retrieve number from the string pattern using regular expression
I'm not sure on the syntax in Ruby, but the regular expression would be "(\d+)" meaning a string of digits of size 1 or more. You can try it out here: http://www.rubular.com/
Updated:
I believe the syntax is /(\d+)/.match(your_string)
Using regular expression to extract number
There are two approaches: a more readable one that splits the string first and then takes the first item matching the required pattern, or a less readable one that uses a single regex.
See the Python demo:
import re
s = 'Total revenue for 201603 is 3000 €'
rx = re.compile(r'^(?=\d+(?:[_-]\d+)?$)[\d_-]{6,7}$')
res = [x for x in s.split() if rx.search(x)]
if res:
    print(res[0])
# Pure regex approach:
rx = re.compile(r'(?<!\S)(?=\d+(?:[_-]\d+)?(?!\S))[\d_-]{6,7}(?!\S)')
res = rx.search(s)
if res:
    print(res.group())
So, in the first approach, the string is split on whitespace and the ^(?=\d+(?:[_-]\d+)?$)[\d_-]{6,7}$ pattern is applied to each item. If there are any matches, the first one is returned. The pattern matches:
- ^ - start of string
- (?=\d+(?:[_-]\d+)?$) - a positive lookahead that makes sure there are 1+ digits, then optionally _ or - followed by 1+ digits, up to the end of string
- [\d_-]{6,7} - matches 6 or 7 characters, each a digit, - or _
- $ - end of string.
The second approach involves regex only: the ^ anchor is replaced with (?<!\S) and $ is replaced with (?!\S), which act as whitespace boundaries. (?<!\S) is a negative lookbehind that requires a whitespace or start of string right before the current position, and (?!\S) is a negative lookahead that requires a whitespace or end of string right after the current position.
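The effect of these whitespace boundaries is easiest to see next to the ordinary word boundary \b. The sample string below is made up for illustration and is not from the question.

```python
import re

sample = "a1 22 3.5 44, 55"  # hypothetical input

# Whitespace boundaries: the digits must be a whole whitespace-delimited token.
ws_bounded = re.findall(r"(?<!\S)\d+(?!\S)", sample)

# Word boundaries: digits next to punctuation such as "." or "," still match.
word_bounded = re.findall(r"\b\d+\b", sample)

print(ws_bounded)    # ['22', '55']
print(word_bounded)  # ['22', '3', '5', '44', '55']
```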
Extracting all numbers in a string that are surrounded by a certain pattern in R
You may use
string <- "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'><img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
regmatches(string, gregexpr("images/stimuli/\\K\\d+(?=\\.png)", string, perl=TRUE))[[1]]
# => [1] "32" "36"
NOTE: If there can be anything, not just numbers, you may replace \\d+ with .*?.
regmatches combined with gregexpr extracts all matches found in the input.
The regex matches:
- images/stimuli/ - a literal string
- \K - a match reset operator discarding all text matched so far
- \d+ - 1+ digits
- (?=\.png) - a positive lookahead that requires a .png substring immediately to the right (. is a special character, it needs escaping).
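For readers working in Python rather than R, a rough equivalent is sketched below. Python's re module has no \K operator, so a capture group is swapped in to play the same role; re.findall then returns only the captured digits.

```python
import re

html = (
    "<img src='images/stimuli/32.png' style='width:341.38790035587186px;height: 265px;'>"
    "<img src='images/stimuli/36.png' style='width:341.38790035587186px;height: 265px;'>"
)

# The group replaces \K: only the digits between the prefix and ".png" are returned.
ids = re.findall(r"images/stimuli/(\d+)(?=\.png)", html)
print(ids)  # ['32', '36']
```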
Use regular expression to extract numbers before specific words
Code
import re
units = '|'.join(["hours", "hour", "hrs", "days", "day", "minutes", "minute", "min"]) # possible units
number = r'\d+[.,]?\d*' # pattern for number (raw string avoids invalid-escape warnings)
plus_minus = r'\+\/\-' # plus minus (not used in the pattern below)
cases = fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'
pattern = re.compile(cases)
Tests
print(pattern.findall('2 Approximately 5.1 hours 100 ays 1 s'))
# Output: ['5.1']
print(pattern.findall('2 Approximately 10.2 +/- 30hours'))
# Output: ['10.2']
print(pattern.findall('The mean half-life for Cetuximab is 114 hours (range 75-188 hours).'))
# Output: ['114', '75']
print(pattern.findall('102 +/- 30 hours in individuals with rheumatoid arthritis and 68 hours in healthy adults.'))
# Output: ['102', '68']
print(pattern.findall("102 +/- 30 hrs"))
# Output: ['102']
print(pattern.findall("102-130 hrs"))
# Output: ['102']
print(pattern.findall("102hrs"))
# Output: ['102']
print(pattern.findall("102 hours"))
# Output: ['102']
Explanation
The code above relies on the fact that raw strings (r'...') and string interpolation (f'...') can be combined as fr'...', per PEP 498.
The cases string:
fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'
Its parts, in sequence:
- fr'({number})' - capturing group '(\d+[.,]?\d*)' for integers or floats
- r'(?:[\s\d\-\+\/]*)' - non-capturing group for allowable characters between number and units (i.e. space, +, -, digit, /)
- fr'(?:{units})' - non-capturing group for units
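If the unit is needed as well as the number, one possible variation is to turn the last non-capturing group into a capturing one, so that findall returns (number, unit) pairs. This is a sketch and not part of the original answer; the name pairs_pattern and the sample sentence are assumptions.

```python
import re

units = "|".join(["hours", "hour", "hrs", "days", "day", "minutes", "minute", "min"])
number = r"\d+[.,]?\d*"

# Same structure as the pattern above, but the unit group now captures.
# Longer alternatives ("hours") come before shorter ones ("hour"), so the
# full unit is captured rather than a prefix of it.
pairs_pattern = re.compile(fr"({number})(?:[\s\d\-\+\/]*)({units})")

pairs = pairs_pattern.findall("102 +/- 30 hrs, then 2.5 days")
print(pairs)  # [('102', 'hrs'), ('2.5', 'days')]
```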
Get numbers from string with regex
Try this:
(\d+)
What language are you using to parse these strings?
If you let me know I can help you with the code you would need to use this regular expression.
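Since the language is left open here, this is what the expression could look like in Python, purely as an illustration (the sample string is made up). Note that a bare (\d+) picks up every run of digits, including those embedded in words or on either side of a decimal point.

```python
import re

txt = "h3110 23 cat 444.4 rabbit 11 2 dog"  # hypothetical input

# With a single capture group, findall returns the captured text of each match.
runs = re.findall(r"(\d+)", txt)
print(runs)  # ['3110', '23', '444', '4', '11', '2']
```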
Find numbers after specific text in a string with RegEx
Try this expression:
"Error importing row no\. (\d+):"
Here you need to understand the quantifiers and escape sequences:
- . - any character; as you want only numbers, use \d; if you meant the period character you must escape it with a backslash (\.)
- ? - zero or one character; this isn't what you want, as you can have an error on line 10 and it would take only the "1"
- + - one or many; this will suffice for us
- * - any character count; you must take care when using this with .* as it can consume your entire input
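The difference between ? and + can be checked with a short Python sketch. The question does not name a language, so Python is an assumption here, and the log line is a made-up example.

```python
import re

line = "Error importing row no. 10: bad value"  # hypothetical log line

# + takes one or more digits, so the whole row number is captured.
one_or_more = re.search(r"Error importing row no\. (\d+):", line)

# ? takes at most one digit, so only the first digit is captured.
zero_or_one = re.search(r"Error importing row no\. (\d?)", line)

print(one_or_more.group(1))  # 10
print(zero_or_one.group(1))  # 1
```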