Standalone numbers Regex?
Using lookaround, you can restrict your capturing to only digits which are not surrounded by other digits or decimal points:
(?<![0-9.])(\d+)(?![0-9.])
Alternatively, if you only want to match standalone numbers (e.g. if you don't want to match the 123 in abc123def):
(?<!\S)\d+(?!\S)
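To see how the two patterns above differ, here is a minimal sketch using Python's re module. The sample text and the variable names are made up for illustration.

```python
import re

# Made-up sample text, not taken from the original question.
text = "abc123def 42 3.14 7 v2.0 1000"

# Digits that do not touch other digits or decimal points.
not_in_decimal = re.compile(r"(?<![0-9.])(\d+)(?![0-9.])")
# Digits delimited by whitespace (or the string edges) on both sides.
whitespace_delimited = re.compile(r"(?<!\S)\d+(?!\S)")

loose = not_in_decimal.findall(text)
strict = whitespace_delimited.findall(text)
print(loose)   # ['123', '42', '7', '1000']
print(strict)  # ['42', '7', '1000']
```

The first pattern still picks up the 123 inside abc123def, while the second one only accepts numbers that stand alone between spaces.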
Regex for the first standalone number
Update #2
Your regex has redundant parts that you could remove, e.g. s|^([^.]+).*$|\1|, which replaces a line with itself. If you are sure there is only one such number in your string, the regex below is enough; otherwise, check the other solutions to capture the first one:
sed -r "s/^.* ([0-9]+) .*/\1/"
Simulating lazy version (preferred way):
- POSIX ERE (using the -r option)
This works like the greedy version, but it is a must if your string may contain more than one such number.
Regex:
([0-9]+) .*|.
Usage:
$ sed -r "s/ ([0-9]+) .*|./\1/g" <<< " 54 foo 43 "
54
- POSIX BRE
If you want to go with the oldest regex flavor still in use (POSIX BRE), then this is your choice. It works the same as the regex above, but is written in BRE.
Regex:
\(\( \([0-9]*\) .*\)*.\)*
Usage:
$ sed "s/\(\( \([0-9]*\) .*\)*.\)*/\3/g" <<< " 54 foo 43 "
54
In the lazy versions, the global g modifier must be set.
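For comparison, in an engine that offers a search function the lazy-matching workaround is not needed, because the search stops at the leftmost match. A minimal sketch in Python (the language and the function name first_standalone_number are assumptions, not part of the original answer):

```python
import re

def first_standalone_number(line):
    """Return the first space-delimited run of digits in line, or None."""
    # re.search returns the leftmost match, so no lazy simulation is needed.
    match = re.search(r" ([0-9]+) ", line)
    return match.group(1) if match else None

print(first_standalone_number(" 54 foo 43 "))  # 54
```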
Getting standalone numbers and not numeric-related codes
The obvious way to do it is this: (?<!AC)\d+ - a bunch of digits that is not preceded by AC. However, that fails, because it matches 0001234, as it is preceded by 0, and not AC. The missing piece is that you also have to assert that it is not preceded by a digit:
(?<!AC)(?<!\d)\d+
Depending on the possible input strings, a word boundary assertion can also do a similar job:
(?<!AC)\b\d+
Your code ((?<!AC\d{8})\d+) fails because it means "a bunch of digits not preceded by ACXXXXXXXX" (where X is a digit). In AC00001234, the digits are not preceded by AC and eight more digits, so they match. You could kind of fix it by asserting it after the match: \d+(?<!AC\d{8}), but that fails for a similar reason - it will disqualify 00001234, but it does not disqualify 0000123, because there is no AC and eight digits in front of its end - only seven! So you still need a boundary assertion:
\d+(?<!AC\d{8})\b
However, this is less clear than the first two solutions (and also requires you to know the length of the ACXXXXXXXX string).
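The behaviour of these patterns can be checked side by side. The following is a rough sketch in Python; the sample strings and the dictionary keys are made up for illustration.

```python
import re

# Made-up inputs: one AC code, one standalone number, one bare digit run.
samples = ["AC00001234", "order 1234", "0001234"]

patterns = {
    "naive": r"(?<!AC)\d+",               # leaks the tail of the AC code
    "lookbehind": r"(?<!AC)(?<!\d)\d+",   # also forbids a preceding digit
    "boundary": r"(?<!AC)\b\d+",          # word boundary instead
    "trailing": r"\d+(?<!AC\d{8})\b",     # assertion placed after the match
}

results = {
    name: [re.findall(pattern, s) for s in samples]
    for name, pattern in patterns.items()
}
for name, found in results.items():
    print(name, found)
```

Only the naive pattern returns something for AC00001234 (the trailing 0001234); the other three leave it alone and still find the standalone numbers.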
Regex to identify standalone numbers
Use the Replace method of the RegExp object:
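Set RE = CreateObject("VBScript.RegExp") ' Create the RegExp object
RE.Global = True ' Replace every match, not just the first
RE.Pattern = "\b\d+(\s|$)"
result = RE.Replace(addr, "") ' Remove all matches from string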
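To illustrate what this pattern removes, here is the same replacement sketched in Python. The address string is made up, and Python is only used here for demonstration.

```python
import re

# Made-up address; "4B" is not purely numeric and should survive.
addr = "12 Main Street 4B 90210"

# Remove digit runs that are followed by whitespace or the end of the string.
stripped = re.sub(r"\b\d+(\s|$)", "", addr)
print(repr(stripped))  # 'Main Street 4B '
```

Note that the whitespace after each number is removed together with it, so a space may be left dangling at the end of the string.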
Stata Regex for 'standalone' numbers in string
Following up on the loop suggested in the comments, you could do something like the following:
clear
input id str40 string
1 "9884 7-test 58 - 489"
2 "67-tty 783 444"
3 "j3782 3hty"
end
gen N_words = wordcount(string) // # words in each string
qui sum N_words
global max_words = r(max) // max # words in all strings
split string, gen(part) parse(" ") // split string at space (p.s. space is the default)
gen string2 = ""
forval i = 1/$max_words {
* add in parts that contain at least one letter
replace string2 = string2 + " " + part`i' if regexm(part`i', "[a-zA-Z]") & !missing(string2)
replace string2 = part`i' if regexm(part`i', "[a-zA-Z]") & missing(string2)
}
drop part* N_words
where the result would be
. list
     +----------------------------------------+
     | id                 string      string2 |
     |----------------------------------------|
  1. |  1   9884 7-test 58 - 489       7-test |
  2. |  2         67-tty 783 444       67-tty |
  3. |  3             j3782 3hty   j3782 3hty |
     +----------------------------------------+
Note that I have assumed that you want all words that contain at least one letter. You may need to adjust the regexm here for your specific use case.
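The same word-by-word filter can be sketched outside Stata as well. The following Python version is only an illustration of the logic (split on spaces, keep words with at least one letter); the function name keep_words_with_letters is hypothetical.

```python
import re

def keep_words_with_letters(s):
    """Keep only the space-separated words that contain at least one letter."""
    return " ".join(w for w in s.split() if re.search(r"[a-zA-Z]", w))

rows = ["9884 7-test 58 - 489", "67-tty 783 444", "j3782 3hty"]
cleaned = [keep_words_with_letters(r) for r in rows]
print(cleaned)  # ['7-test', '67-tty', 'j3782 3hty']
```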
Regex Python Extract number
Without regexp:
text = ['C1412DRE, New York 2695','Direction 12','Main Street 6254 C13D']
joined = ' '.join(text)  # named joined to avoid shadowing the built-in str
[int(s) for s in joined.split() if s.isdigit()]
[2695, 12, 6254]
With regexp:
import re
re.findall(r'\b\d+\b', joined)
['2695', '12', '6254']
and convert them to integers:
[int(s) for s in re.findall(r'\b\d+\b', joined)]
[2695, 12, 6254]
https://docs.python.org/3/library/re.html
A great playground where you can try your regexp, with code generation: https://regex101.com/r/4kUHhq/1
Regular expression to include numbers but not others
You can try something like this: \b(?<!-)\d+(?!-)\b. This looks for numbers which are neither preceded nor followed by a -, using a negative lookbehind and a negative lookahead.
Note: The \b is there to ensure that, given 12-34, the expression does not match 1 (since it is not followed by a -) and 4 (since it is not preceded by a -).
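The effect of the \b anchors can be checked with a short sketch in Python. The sample string is made up, and Python is an assumption since the answer does not name an engine.

```python
import re

text = "12-34 56 7-8 90"  # made-up sample

# With word boundaries: only numbers with no hyphen on either side.
with_boundaries = re.findall(r"\b(?<!-)\d+(?!-)\b", text)
# Without them, backtracking lets fragments of 12-34 slip through.
without_boundaries = re.findall(r"(?<!-)\d+(?!-)", text)

print(with_boundaries)     # ['56', '90']
print(without_boundaries)  # ['1', '4', '56', '90']
```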
Python regex match only if standalone
Looks like a perfect job for Negative Lookbehind and Negative Lookahead:
re.sub(r'''(?<![^\s]) [+-]?[.,;]? (\d+[.,;']?)+% (?![^\s.,;!?'"])''',
'@percent@', string, flags=re.VERBOSE)
(?<![^\s]) means "no non-space character is allowed immediately before the current position" (add more characters to the class if you need to allow them there).
(?![^\s.,;!?'"]) means "no character other than a space, period, etc. is allowed immediately after the current position".
Demo: https://regex101.com/r/khV7MZ/1.
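As a quick check of how the two lookarounds behave, here is a small self-contained sketch. The sample sentence and the name PERCENT are made up for illustration.

```python
import re

# Same pattern as above, compiled once; whitespace is ignored in VERBOSE mode.
PERCENT = re.compile(
    r'''(?<![^\s]) [+-]?[.,;]? (\d+[.,;']?)+% (?![^\s.,;!?'"])''',
    flags=re.VERBOSE,
)

text = "up 5% and -3.5%, but not a5% or 5%x"  # made-up sample
replaced = PERCENT.sub("@percent@", text)
print(replaced)  # up @percent@ and @percent@, but not a5% or 5%x
```

The a5% is skipped because a letter precedes it, and 5%x is skipped because a letter follows it.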