How can I match "anything up until this sequence of characters" in a regular expression?
You didn't specify which flavor of regex you're using, but this will
work in any of the most popular ones that can be considered "complete".
/.+?(?=abc)/
How it works
The .+?
part is the un-greedy version of .+
(one or more of
anything). When we use .+
, the engine will basically match everything.
Then, if there is something else in the regex it will go back in steps
trying to match the following part. This is the greedy behavior,
meaning as much as possible to satisfy.
When using .+?
, instead of matching all at once and going back for
other conditions (if any), the engine will match the next characters by
step until the subsequent part of the regex is matched (again if any).
This is the un-greedy, meaning match the fewest possible to
satisfy.
/.+X/ ~ "abcXabcXabcX" /.+/ ~ "abcXabcXabcX"
^^^^^^^^^^^^ ^^^^^^^^^^^^
/.+?X/ ~ "abcXabcXabcX" /.+?/ ~ "abcXabcXabcX"
^^^^ ^
Following that we have (?=
{contents}
)
, a zero width
assertion, a look around. This grouped construction matches its
contents, but does not count as characters matched (zero width). It
only returns if it is a match or not (assertion).
Thus, in other terms the regex /.+?(?=abc)/
means:
Match any characters as few as possible until a "abc" is found,
without counting the "abc".
Regular expression: Match everything after a particular word until multiple occurence of carriage return and new line
To get the line after Examination(s):
you can use
re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
See the regex demo. Details:
Examination\(s\):
- a literalExamination(s):
string\s*
- zero or more whitespaces([^\r\n]+)
- Group 1: one or more chars other than CR and LF chars.
See the Python demo:
import re
texts = ["Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable",
"Examination(s):\r\nMathematics 2nd Paper\r\nblahblah",
"Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks"]
for text in texts:
m = re.search(r'Examination\(s\):\s*([^\r\n]+)', text)
print(f'--- {repr(text)} ---')
if m:
print(m.group(1))
Output:
--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\nTimeTable' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\nblahblah' ---
Mathematics 2nd Paper
--- 'Examination(s):\r\nMathematics 2nd Paper\r\n\r\n\r\nmarks' ---
Mathematics 2nd Paper
Regexp match everything after a word
var test = 'data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQAB';
var regex = /(?<=base64).+/;
var r = test.match(regex);
console.log(r);
get everything after a particular string in python using regular expression
If you want to use regular expression, you use re.findall
:
re.findall('(?<=com/).*$', "www.example.com/thedubaimall")
# ['thedubaimall']
Some speed tests with @DeepSpace's suggestion:
%timeit re.findall('(?<=com/).*$', "www.example.com/thedubaimall")
# The slowest run took 7.57 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 1.29 µs per loop
%timeit re.findall('com/(.*)', "www.example.com/thedubaimall")
# The slowest run took 6.48 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 992 ns per loop
%timeit "www.example.com/thedubaimall".partition(".com/")[2]
# The slowest run took 7.87 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 204 ns per loop
Looks like @DeepSpace's suggestion is a bit faster, and @Kevin's answer is much faster.
How to match everything after a certain word
A different way to approach this is to split the string on your target word and return the first part.
my_string="New species of fish found at Arkansas http://example"
print(my_string.split("http",1)[0])
#New species of fish found at Arkansas
Using Regex selecting text match everything after a word or patterns (similar topic but text is not fix patterns except 1 character)
- Ctrl+H
- Find what:
v\d[0-9a-z]*\K.*$
- Replace with:
LEAVE EMPTY
- UNCHECK Match case
- CHECK Wrap around
- CHECK Regular expression
- UNCHECK
. matches newline
- Replace all
Explanation:
v # a "v"
\d # a digit
[0-9a-z]* # 0 or more alphanum
\K # forget all we have seen until this position
.* # 0 or more any character but newline
$ # end of line
Screenshot (before):
Screenshot (after):
regex to match a word and everything after it?
re.match
matches only at the beginning of the string. Use re.search
to match at any position. (See search()
vs. match()
)
>>> import re
>>> pat = re.compile(r'(?:/bdata:/b)?\w$')
>>> string = " dnfhndkn data: ndknfdjoj pop"
>>> res = re.search(pat,string)
>>> res
<_sre.SRE_Match object at 0x0000000002838100>
>>> res.group()
'p'
To match everything, you need to change \w
with .*
. Also remove /b
.
>>> import re
>>> pat = re.compile(r'(?:data:).*$')
>>> string = " dnfhndkn data: ndknfdjoj pop"
>>> res = re.search(pat,string)
>>> print res.group()
data: ndknfdjoj pop
Regex help match everything after given word which ends with some file extension
If you want to match either slashes or backslashes, you need to say so.
[\\/]assets[\\/].*\.(png|jpe?g|gif|js(on)?)$
The leading \.*
wasn't contributing anything useful, so I took it out.
In Javascript, you'll need to backslash-escape the literal backslash, even in a character class.
Related Topics
Replacing Blank Values (White Space) With Nan in Pandas
Python Json.Loads Shows Valueerror: Extra Data
Python + Beautifulsoup: How to Get 'Href' Attribute of 'A' Element
Bold Formatting in Python Console
How to Determine Whether a Pandas Column Contains a Particular Value
How to Merge 2 CSV Files Together by Multiple Columns in Python
Python - Having Trouble Opening a File With Spaces
How to Remove the Double Quote When the Value Is Empty in Spark
How to Split But Ignore Separators in Quoted Strings, in Python
How to Sort a Single String Output in Ascii Descending Order Through a Function
Python: Scaling Numbers Column by Column With Pandas
Datetime.Datetime Has No Attribute Datetime
Python - Split Array into Multiple Arrays
How to Uniqify a List of Dict in Python
Simple Digit Recognition Ocr in Opencv-Python
Python Creating Dictionary from Excel Data