How to Use a Regex to Search Backwards Effectively

How to use a regex to search backwards effectively?

Java's regular expression engine cannot search backwards. In fact, the only regex engine I know of that can is the one in .NET (via RegexOptions.RightToLeft).

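Instead of searching backwards, iterate over all the matches in a loop (searching forward). If a match ends at or before the position you want, remember it. Once a match extends past that position, exit the loop and return the last match you remembered. In Java (mine is a little rusty, so treat this as a sketch):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns the last match that ends at or before offset, or "" if there is none.
static String lastMatchBefore(Pattern pattern, CharSequence input, int offset) {
    Matcher matcher = pattern.matcher(input);
    String storedMatch = "";
    while (matcher.find()) {
        if (matcher.end() > offset) {   // end() is exclusive
            break;
        }
        storedMatch = matcher.group();
    }
    return storedMatch;                 // also reached when the matches run out
}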
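
For comparison, the same forward-iteration idea can be sketched in Python. The function name last_match_before and the sample string are made up for illustration; only re.finditer and the match object's end() come from the standard library.

```python
import re

def last_match_before(pattern, text, offset):
    """Return the last match that ends at or before offset, or None."""
    last = None
    for m in re.finditer(pattern, text):
        if m.end() > offset:   # end() is exclusive, like Java's Matcher.end()
            break              # this match extends past the offset: stop
        last = m
    return last

m = last_match_before(r"\d+", "a1 b22 c333 d4444", 8)
print(m.group() if m else None)
```

Returning the match object rather than the matched string keeps the start and end positions available to the caller.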

Searching backwards with regex

I think you want:

(\w*)\s*,?$

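Match group one contains the last word on the line; the rest of the pattern allows optional whitespace and an optional comma after it.

Anchoring the expression to the end of the line with $ has the same effect as searching from the end, even though the engine still scans the input from left to right.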
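
A quick way to check the end-anchored pattern is to run it over a few lines. This is a minimal Python sketch; the helper name last_word and the sample lines are hypothetical, and the pattern is the one given above.

```python
import re

# The end-anchored pattern from the answer above.
pattern = re.compile(r"(\w*)\s*,?$")

def last_word(line):
    # search() always succeeds here, because the pattern can match
    # the empty string at the very end of the line.
    return pattern.search(line).group(1)

for line in ["alpha beta gamma", "alpha beta gamma ,", "alpha, beta,"]:
    print(repr(line), "->", last_word(line))
```

Because every part of the pattern is optional, group 1 can be empty, for example when the line ends in punctuation other than a comma.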

Regex to return all characters until / searching backwards

Another option is to use a positive lookbehind such as (?<=//):

>>> re.search(r'(?<=//).+(?= \" target)', 
... 'http://domain.com.uy " target').group(0)
'domain.com.uy'

Note that this also matches slashes within the URL itself, which may or may not be what you want:

>>> re.search(r'(?<=//).+(?= \" target)',
... 'http://example.com/path/to/whatever " target').group(0)
'example.com/path/to/whatever'

If you just wanted the bare domain, without any path or query parameters, you could use r'(?<=//)([^/]+)(/.*)?(?= \" target)' and capture group 1:

>>> re.search(r'(?<=//)([^/]+)(/.*)?(?= \" target)',
... 'http://example.com/path/to/whatever " target').groups()
('example.com', '/path/to/whatever')

Regex match anything backward till first occurrence of string

This one does the job: it matches everything between Tick\d+ and TO_BE_MATCHED, as long as there is no other Tick\d+ in between:

import re

text_str = '''0000 :TRACE|####### Tick1 ####### | file1.c:604
0001 :TRACE|log1 | file2.c:400
0002 :TRACE|log2 | file3.c:611
0003 :TRACE|####### Tick2 ####### | file1.c:604
0004 :TRACE|log3 | file2.c:498
0005 :TRACE|log4 | file3.c:676
0006 :TRACE|TO_BE_MATCHED | file4.c:555
0007 :TRACE|log5 | file5.c:676
0008 :TRACE|####### Tick3 ####### | file1.c:604'''

# Each character consumed after the tick must not begin another "Tick<digits>".
regex = r"Tick\d+(?:(?!Tick\d+).)*TO_BE_MATCHED"

# re.DOTALL lets "." match newlines, so the match can span several log lines.
match = re.search(regex, text_str, re.DOTALL)

if match:
    print(match.group(0))
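
To see why the negative lookahead matters, it may help to compare it with a plain lazy quantifier. The shortened one-line log below is a made-up stand-in for the real one, used only to keep the output readable.

```python
import re

# Hypothetical, shortened log in the same shape as the one above.
log = "Tick1 log1 Tick2 log3 TO_BE_MATCHED log5 Tick3"

# Lazy matching still starts at the FIRST tick and runs across Tick2.
lazy = re.search(r"Tick\d+.*?TO_BE_MATCHED", log, re.DOTALL)

# The lookahead refuses to step over another tick, so the match
# can only start at the tick closest to TO_BE_MATCHED.
tempered = re.search(r"Tick\d+(?:(?!Tick\d+).)*TO_BE_MATCHED", log, re.DOTALL)

print(lazy.group(0))
print(tempered.group(0))
```

The lazy version begins at Tick1, while the lookahead version begins at Tick2.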

Reverse regular expression search

How about:

$string =~ /.*foo(\d+)/;

Clarification:

$string =~ /.*      # Match any character zero or more times, greedily.
            foo     # Match 'foo'.
            (\d+)   # Match and capture one or more digits.
           /x;

The greedy "any character" match first consumes as much of the string as it can, including the earlier "foo"s. It then backtracks only as far as needed for foo(\d+) to match, so the capture comes from the last "foo".

Example:

#!perl -w

use strict;
use 5.010;

my $string = "foo1 foo2 foo3";
$string =~ /.*foo(\d+)/;
say $1;

Output:

% perl regex.pl
3
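
The greedy-prefix trick is not specific to Perl. As a rough cross-check in Python, using the same sample string, the result can be compared against collecting every match going forward and keeping the final one:

```python
import re

string = "foo1 foo2 foo3"

# Greedy prefix: .* swallows the earlier "foo"s, so the capture is the last one.
greedy = re.search(r".*foo(\d+)", string).group(1)

# Cross-check: find every match going forward and keep the final one.
forward = re.findall(r"foo(\d+)", string)[-1]

print(greedy, forward)
```

Both approaches give the same capture here. Note that "." does not match newlines by default, so the greedy prefix only looks within a single line unless re.DOTALL is passed.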

C++ reverse regex_search

If you have a user-provided regex that you cannot change, but you still need the rightmost match, wrap the pattern in ^.*( and ) (use [\s\S]* instead of .* to match across linebreaks) and read the contents of capture group 1:

"^.*(:.*)"

The pattern above matches:

  • ^ - the start of string
  • .* - matches any 0+ characters other than linebreak characters (if you use [\s\S]*, all chars will be matched) as many as possible (because * is a greedy quantifier)
  • (:.*) - a capturing group that matches : and then any 0+ characters other than linebreak characters.

Note that the first .* initially grabs as many characters as possible, up to the end of the line (which is the end of the string if there are no linebreaks). Backtracking then kicks in: the regex engine gives characters back one at a time until the subsequent subpattern (here, the user pattern) can match. The captured user subpattern is therefore the one that starts at the rightmost possible position.

An example (basic) C++ program showing how this can work:

#include <regex>
#include <string>
#include <iostream>
using namespace std;

int main() {
    string user_pattern(":.*");
    string s("test:55:last");
    // Wrap the user pattern so that the greedy prefix pushes it to the right.
    regex r("^.*(" + user_pattern + ")");
    smatch matches;
    if (regex_search(s, matches, r)) {
        cout << matches[1].str();   // prints ":last"
    }
    return 0;
}
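
The difference between the .* and [\s\S]* prefixes shows up on multi-line input. Here is a minimal Python sketch of the wrapping technique using the [\s\S]* variant; the function name rightmost_match and the second sample string are assumptions made for illustration.

```python
import re

def rightmost_match(user_pattern, text):
    """Return the rightmost match of user_pattern in text, or None.

    Caveat: the wrapper adds a capturing group, so numbered
    backreferences inside user_pattern shift by one.
    """
    m = re.search(r"^[\s\S]*(" + user_pattern + ")", text)
    return m.group(1) if m else None

print(rightmost_match(":.*", "test:55:last"))
print(rightmost_match(":.*", "a:1\nb:2\nc"))
```

With [\s\S]* the prefix can cross linebreaks, so the second call finds the ":" on the second line. A plain .* prefix would stay on the first line.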

