C++11 Regex Matching
See GCC's libstdc++ C++11 implementation status page: <regex> is not supported as of GCC 4.8.
Edit for posterity: as mentioned in the comments, the regex library is now implemented in libstdc++ and should be available in GCC 4.9 and later.
C++11 regex matching a full word that does not end with a period?
You need to make sure the word is followed with a word boundary:
std::regex rex(R"(\w+\b(?!\.))");
Otherwise, backtracking occurs and your pattern finds jo in joe.
I also advise using raw string literals when defining a regex; that way you get rid of excessive backslashes.
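To see the backtracking effect for yourself, here is a minimal sketch. The helper name find_all and the sample text "joe. bob" are made up for illustration; only the two patterns come from the discussion above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every match of `pattern` in `text`, left to right.
// (find_all is a hypothetical helper, not part of <regex>.)
std::vector<std::string> find_all(const std::string& text, const std::string& pattern) {
    const std::regex rex(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), rex), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

With the text "joe. bob", find_all(text, R"(\w+(?!\.))") should give "jo" and "bob", because \w+ backs off one character to satisfy the lookahead. find_all(text, R"(\w+\b(?!\.))") should give only "bob", since the word boundary forbids stopping in the middle of joe.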
Regex grouping matches with C++ 11 regex library
Your regular expression is incorrect because neither capture group does what you want. The first matches a single character from the set [a-zA-Z0-9] followed by <space>:, which works for single-character usernames but nothing else. The second capture group will always be empty because you're looking for zero or more characters while also specifying that the match should not be greedy, which means a zero-character match is a valid result.
Fixing both of these, your regex becomes
std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
But simply instantiating a regex and a match_results object does not produce matches; you need to apply a regex algorithm. Since you only want to match part of the input string, the appropriate algorithm to use in this case is regex_search.
std::regex_search(s, matches, rgx);
Putting it all together:
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string s{R"(
tХB:Username!Username@Username.tcc.domain.com Connected
tХB:Username!Username@Username.tcc.domain.com WEBMSG #Username :this is a message
tХB:Username!Username@Username.tcc.domain.com Status: visible
)"};
    std::regex rgx("WEBMSG #([a-zA-Z0-9]+) :(.*)");
    std::smatch matches;
    // regex_search looks for the pattern anywhere in the input.
    if (std::regex_search(s, matches, rgx)) {
        std::cout << "Match found\n";
        // matches[0] is the whole match; matches[1] and up are the capture groups.
        for (std::size_t i = 0; i < matches.size(); ++i) {
            std::cout << i << ": '" << matches[i].str() << "'\n";
        }
    } else {
        std::cout << "Match not found\n";
    }
}
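If you want to check the point about the non-greedy group directly, a small sketch like the following may help. The helper second_group and the "<no match>" marker are hypothetical, and the lazy pattern in the usage note is only my guess at what the question's original regex looked like.

```cpp
#include <regex>
#include <string>

// Return the text captured by group 2, or "<no match>" if the search fails.
// (second_group is a hypothetical helper for illustration.)
std::string second_group(const std::string& line, const std::string& pattern) {
    const std::regex rgx(pattern);
    std::smatch m;
    if (!std::regex_search(line, m, rgx)) {
        return "<no match>";
    }
    return m[2].str();
}
```

For the line "WEBMSG #Username :this is a message", the pattern "WEBMSG #([a-zA-Z0-9]+) :(.*?)" should yield an empty string, because the lazy group is satisfied with zero characters, while "WEBMSG #([a-zA-Z0-9]+) :(.*)" should yield "this is a message".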
How to match a sequence of whitespaces with c++11 regex
Just change \s* to \s+ in your regex, because \s* also matches an empty string (that is, \s* matches zero or more whitespace characters). You also don't need a capturing group.
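A quick way to convince yourself of the difference is to look at the length of the first match. This is only a sketch; first_match_length is a name I made up, and -1 is an arbitrary "no match" marker.

```cpp
#include <regex>
#include <string>

// Length of the first match of `pattern` in `text`, or -1 if nothing matches.
// (first_match_length is a hypothetical helper for illustration.)
long first_match_length(const std::string& text, const std::string& pattern) {
    const std::regex rex(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, rex)) {
        return -1;
    }
    return static_cast<long>(m.length(0));
}
```

On "abc", the pattern R"(\s*)" should report a match of length 0 (an empty match at the very start), whereas R"(\s+)" should report no match at all. On "a   b", R"(\s+)" should report length 3.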
matching text ranges with C++11 regexes
Based on the rules described in [re.grammar], we have:
During matching of a regular expression finite state machine against a sequence of characters, two characters c and d are compared using the following rules:
1. if (flags() & regex_constants::icase) the two characters are equal if traits_inst.translate_nocase(c) == traits_inst.translate_nocase(d);
2. otherwise, if flags() & regex_constants::collate the two characters are equal if traits_inst.translate(c) == traits_inst.translate(d);
3. otherwise, the two characters are equal if c == d.
This applies to your pattern2: we're matching a sequence of characters and we have flags() & icase, so we do a nocase comparison. Since each character in the sequence matches, it "works".
However, with pattern, we don't have a sequence of characters. So we instead use this rule:
During matching of a regular expression finite state machine against a sequence of characters, comparison of a collating element range c1-c2 against a character c is conducted as follows: if flags() & regex_constants::collate is false then the character c is matched if c1 <= c && c <= c2, otherwise c is matched in accordance with the following algorithm:
string_type str1 = string_type(1,
    flags() & icase ?
        traits_inst.translate_nocase(c1) : traits_inst.translate(c1);
string_type str2 = string_type(1,
    flags() & icase ?
        traits_inst.translate_nocase(c2) : traits_inst.translate(c2);
string_type str = string_type(1,
    flags() & icase ?
        traits_inst.translate_nocase(c) : traits_inst.translate(c);
return traits_inst.transform(str1.begin(), str1.end())
        <= traits_inst.transform(str.begin(), str.end())
    && traits_inst.transform(str.begin(), str.end())
        <= traits_inst.transform(str2.begin(), str2.end());
Since you don't have collate set, the character is matched literally against the range a-z. There is no accounting for icase here, which is why it "doesn't work." If you provide collate, however:
std::regex pattern("[a-z]+",
                   std::regex_constants::icase | std::regex_constants::collate);
Then we use the algorithm described, which will do a no-case comparison, and the result will be "works". Both compilers are correct - though I find the expected behavior confusing in this case.
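If you want to probe what your own standard library does, a small helper along these lines might be useful. This is a sketch, not a conformance test: matches_range is a made-up name, and what it returns for uppercase input with icase alone depends on the implementation, as discussed above.

```cpp
#include <regex>
#include <string>

// True if the whole of `input` matches [a-z]+ under the given syntax flags.
// (matches_range is a hypothetical helper for illustration.)
bool matches_range(const std::string& input, std::regex::flag_type flags) {
    const std::regex pattern("[a-z]+", flags);
    return std::regex_match(input, pattern);
}
```

Calling matches_range("HELLO", std::regex_constants::icase) and then matches_range("HELLO", std::regex_constants::icase | std::regex_constants::collate) on each compiler shows whether it follows the literal range rule or the collate algorithm for case-insensitive ranges. Lowercase input such as "hello" should match either way.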
How to handle or avoid exceptions from C++11 regex matching functions (§28.11)?
C++11 §28.6 states:
The class regex_error defines the type of objects thrown as exceptions to report errors from the regular expression library.
This means that the <regex> library should not throw anything else by itself. You are correct that constructing a regex_error, which inherits from runtime_error, may throw bad_alloc due to out-of-memory conditions, so your error handling code must also check for this. Unfortunately, this makes it impossible to determine which regex_error construction actually threw the bad_alloc.
For the regular expression algorithms in §28.11, §28.11.1 states:
The algorithms described in this subclause may throw an exception of type regex_error. If such an exception e is thrown, e.code() shall return either regex_constants::error_complexity or regex_constants::error_stack.
This means that if the functions in §28.11 ever throw a regex_error, it shall hold one of these codes and nothing else. However, note also that things you pass to the <regex> library, such as allocators, might also throw; for example, the allocator of match_results may throw when results are added to the given match_results container. Also note that §28.11 has shorthand functions which "as if" construct match_results, such as
template <class BidirectionalIterator, class charT, class traits>
bool regex_match(BidirectionalIterator first, BidirectionalIterator last,
                 const basic_regex<charT, traits> & e,
                 regex_constants::match_flag_type flags =
                     regex_constants::match_default);
template <class BidirectionalIterator, class charT, class traits>
bool regex_search(BidirectionalIterator first, BidirectionalIterator last,
                  const basic_regex<charT, traits> & e,
                  regex_constants::match_flag_type flags =
                      regex_constants::match_default);
and possibly others. Since these might construct and use match_results with the standard allocator internally, they might throw anything std::allocator throws. Therefore your simple example of regex_match(anyString, regex(".")) might also throw due to construction and usage of the default allocator.
Another caveat: for some <regex> functions and classes, it is currently impossible to determine whether a bad_alloc was thrown by some allocator or during construction of a regex_error exception.
In general, if you need something with better exception specifications, avoid using <regex>. If you only require simple pattern matching, you're better off rolling your own safe match/search/replace functions, because it is impossible to constrain your regular expressions to avoid these exceptions in a portable or forwards-compatible manner; even an empty regular expression "" might give you an exception.
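If you do stay with <regex>, the handling described above could be sketched roughly as follows. The MatchOutcome enum and the safe_match wrapper are names I invented for illustration; the sketch assumes the default allocator and simply reports what was caught, since the source of a bad_alloc cannot be identified.

```cpp
#include <new>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
enum class MatchOutcome { Matched, NotMatched, RegexError, OutOfMemory };

// Wrap regex construction and matching so that neither regex_error
// nor bad_alloc escapes to the caller.
MatchOutcome safe_match(const std::string& text, const std::string& pattern) {
    try {
        const std::regex rex(pattern);  // throws regex_error for an invalid pattern
        return std::regex_match(text, rex) ? MatchOutcome::Matched
                                           : MatchOutcome::NotMatched;
    } catch (const std::regex_error&) {
        // From the algorithms, e.code() is error_complexity or error_stack;
        // from the constructor it describes the syntax problem.
        return MatchOutcome::RegexError;
    } catch (const std::bad_alloc&) {
        // Thrown by an allocator or while building a regex_error;
        // there is no way to tell which.
        return MatchOutcome::OutOfMemory;
    }
}
```

For example, safe_match("a", ".") should report Matched, and safe_match("a", "(") should report RegexError because the pattern has an unmatched parenthesis.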
PS: Note that the C++11 standard is rather poorly written in some aspects, lacking complete cross-referencing. For example, there is no explicit notice under the clauses for the methods of match_results that they may throw anything, whereas §28.10.1.1 states (emphasis mine):
In all match_results constructors, a copy of the Allocator argument shall be used for any memory allocation performed by the constructor or member functions during the lifetime of the object.
So take care when browsing the standards like a lawyer! ;-)
C++11: Safe practice with regex of two possible number of matches
m.size() will always be the number of marked subexpressions in your expression plus 1 (for the whole expression).
In your code you have 4 marked subexpressions; whether these are matched or not has no effect on the size of m.
If you want to know whether there are milliseconds, you can check:
m[4].matched
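Here is a minimal sketch of that check. The question's actual regex isn't shown here, so the timestamp pattern below (HH:MM:SS with an optional .milliseconds part) is a stand-in I made up, as are the names kTimestamp, TimestampInfo and inspect.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical pattern with four marked subexpressions:
// hours, minutes, seconds, and optional milliseconds.
const char* const kTimestamp = R"((\d+):(\d+):(\d+)(?:\.(\d+))?)";

struct TimestampInfo {
    std::size_t groups;     // m.size() after the match attempt
    bool has_milliseconds;  // m[4].matched
};

TimestampInfo inspect(const std::string& text) {
    const std::regex rex(kTimestamp);
    std::smatch m;
    if (!std::regex_match(text, m, rex)) {
        return TimestampInfo{0, false};  // m is empty after a failed match
    }
    return TimestampInfo{m.size(), m[4].matched};
}
```

Both inspect("12:34:56") and inspect("12:34:56.789") should report 5 groups; only the second should report has_milliseconds as true.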
C++ regex match, not matching
regex_match fails unless the ENTIRE string matches the pattern. Note that the brd ff:ff:ff:ff:ff:ff part of the string isn't being matched. All you need to do, then, is to append a .* to the pattern:
^\\d{1}:\\s+(\\w+).*?link\\/ether\\s{1}([a-z0-9:]+).*
Also, for that example, the loop isn't necessary. You can use:
if (std::regex_match(line, pieces, interface_address)) {
    std::string name = pieces[1];
    std::string address = pieces[2];
    std::cout << name << address << std::endl;
}
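To compare the two behaviours side by side, something like the sketch below could be used. The sample line in the usage note is invented to resemble the input discussed above, and the names kWithoutTail, kWithTail, whole_match and partial_match are hypothetical.

```cpp
#include <regex>
#include <string>

// The pattern from above as raw string literals, without and with the trailing .*
const char* const kWithoutTail = R"(^\d{1}:\s+(\w+).*?link\/ether\s{1}([a-z0-9:]+))";
const char* const kWithTail    = R"(^\d{1}:\s+(\w+).*?link\/ether\s{1}([a-z0-9:]+).*)";

// regex_match succeeds only if the pattern covers the whole line.
bool whole_match(const std::string& line, const char* pattern) {
    return std::regex_match(line, std::regex(pattern));
}

// regex_search succeeds if the pattern matches any part of the line.
bool partial_match(const std::string& line, const char* pattern) {
    return std::regex_search(line, std::regex(pattern));
}
```

For a line such as "2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 link/ether 00:1a:2b:3c:4d:5e brd ff:ff:ff:ff:ff:ff", whole_match should fail with kWithoutTail and succeed with kWithTail, while partial_match should succeed even with kWithoutTail. So switching to regex_search is an alternative to appending .* to the pattern.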