Using Regex Lookbehinds in C++11

C++11 <regex> uses ECMAScript's (ECMA-262) regex syntax by default, so it does not support lookbehind (the other grammars C++11 supports, namely basic, extended, awk, grep and egrep, don't have lookbehind either).

If your use case requires lookbehind, you may consider using Boost.Regex instead.

C++ Clang Can't Parse this Lookbehind Regex

C++11 uses ECMAScript's regular expression syntax, in which lookbehind is not supported.

An equivalent of the regular expression from the question (written as a C++ string literal) would be the following:

\\((.*)

Note: the capturing group (.*) retains everything that follows the opening parenthesis.

What is an alternative for lookbehind with C++ RegEx?

Note that (?<=<)(?<!>) is equal to (?<=<): since a < is required immediately to the left of the current location, there cannot be a > there. Likewise, (?!<)(?=>) is equal to (?=>): since > must be immediately to the right, there cannot be a <. The first .*? will not match the shortest possible substring; it will simply advance to the first q that is followed by zero or more characters up to the first >. So the pattern hardly works for you, even in an engine that supports lookbehind.

I'd rather use the regex <([^<>q]*q[^<>]*)> with a capturing group, consuming the literal < and > symbols at the start and end of the expression:

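#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex r("<([^<>q]*q[^<>]*)>");
    std::string s = "<adqsdq<><abc>5<abq>6<qaz> <hjfffffffk>";
    // Group 1 holds the text between < and > that contains a q
    for (std::sregex_iterator i = std::sregex_iterator(s.begin(), s.end(), r);
         i != std::sregex_iterator();
         ++i)
    {
        std::cout << (*i).str(1) << std::endl;
    }
}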

Output: abq and qaz

C++11 std::regex lookbehind alternative

You may use a non-word boundary here:

std::regex exprZero{R"(\B\([^(),]+\))"};

The \B\([^(),]+\) pattern matches a ( that is either at the start of the string or right after a non-word char (a char other than a letter, digit or _). Then [^(),]+ consumes one or more chars other than (, ) and a comma, and finally \) matches a ) char.

Replacement for lookbehind in std::regex

The pattern in the question is missing a closing ] at the end, and \w already matches \d.

You might use an alternation that asserts either the start of the string or a position where \b does not match, and then assert that there is no word char to the right:

(?:^|\B)TOKEN(?!\w)

After the question was updated, (?<=[^\w]|^)TOKEN(?=[^\w]|$) can be written as (?<=\W|^)TOKEN(?=\W|$), or, more briefly and without the lookbehind, as:

\bTOKEN(?!\w)

Translate C# regex with lookbehinds to C++

You can change the regex to not use lookbehind: [A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[A-Za-z](?=[^A-Za-z]).

In the end, the original regex was looking for the beginning of a new word, so it had to look behind for the end of the previous word. Instead, we can look for the end of a word and look ahead for the beginning of the next word. Then we only have to "move" each match position by +1.

#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    const std::sregex_iterator End;

    // This code does not handle an empty string correctly (it yields
    // a single empty piece), so treat "" as a special case.
    std::string str = "ThisIsAPascalStringX";

    // Each alternative matches the last character of a word,
    // looking ahead at the first character of the next word.
    std::regex rx("[A-Z](?=[A-Z][a-z])|[^A-Z](?=[A-Z])|[A-Za-z](?=[^A-Za-z])");

    std::vector<std::string> pieces;

    size_t lastStartPosition = 0;

    for (auto i(std::sregex_iterator(str.begin(), str.end(), rx)); i != End; ++i)
    {
        // The next word starts one character after the match
        size_t startPosition = static_cast<size_t>(i->position()) + 1;

        pieces.push_back(str.substr(lastStartPosition, startPosition - lastStartPosition));
        lastStartPosition = startPosition;
    }

    pieces.push_back(str.substr(lastStartPosition));

    std::cout << "<-- start" << std::endl;

    for (auto& s : pieces)
    {
        std::cout << s << std::endl;
    }

    std::cout << "<-- end" << std::endl;
}

c++11 regex lookahead exclude word

Having to take off soon, I'll answer with an (in my opinion) silly solution (there must be much better ones :P).

((?!Cost)....|^.{0,3})Price

If Price is preceded by at least 4 characters, make sure those 4 characters aren't Cost. Alternatively, make sure there are at most 3 characters between the start of the string and Price.
