C++: What Regex Library Should I Use

C++: what regex library should I use?

Thanks for all the suggestions.

I tried out a few things today, and with the stuff we're trying to do, I opted for the simplest solution where I don't have to download any other 3rd-party library. In the end, I #include <regex.h> and used the standard C POSIX calls regcomp() and regexec(). Not C++, but in a pinch this proved to be the easiest.
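If you want to keep the POSIX calls but still get C++-style cleanup, one option is a small wrapper that calls regfree() in its destructor. This is only a sketch: the class name PosixRegex and its search() method are made up for illustration, and it assumes a POSIX system that ships <regex.h>. The regcomp(), regexec(), regerror() and regfree() calls and the REG_EXTENDED and REG_NOSUB flags are the standard POSIX ones.

```cpp
// Sketch only: PosixRegex is a hypothetical helper, not part of any library.
// Requires a POSIX system that provides <regex.h>.
#include <regex.h>

#include <stdexcept>
#include <string>

class PosixRegex {
public:
    // Compiles the pattern as a POSIX extended regular expression.
    // REG_NOSUB: we only want a yes/no answer, not submatch positions.
    explicit PosixRegex(const std::string& pattern) {
        int rc = regcomp(&re_, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
        if (rc != 0) {
            char msg[128];
            regerror(rc, &re_, msg, sizeof msg);
            throw std::runtime_error(msg);
        }
    }

    // Releases whatever regcomp() allocated.
    ~PosixRegex() { regfree(&re_); }

    // regex_t owns resources, so copying is disabled in this sketch.
    PosixRegex(const PosixRegex&) = delete;
    PosixRegex& operator=(const PosixRegex&) = delete;

    // Returns true if the pattern matches anywhere in text.
    bool search(const std::string& text) const {
        return regexec(&re_, text.c_str(), 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
};
```

With this, something like PosixRegex re("^a[[:alnum:]]"); followed by re.search("abc") reports a match, and the pattern buffer is freed automatically when re goes out of scope.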

Does C or C++ have a standard regex library?

C++11 now finally does have a standard regex library - std::regex.
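A minimal sketch of what using it can look like, assuming a C++11 compiler. The function names is_iso_date and first_date_parts are invented for this example; std::regex, std::smatch, std::regex_match and std::regex_search are the standard library names.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns true if the WHOLE input has the shape YYYY-MM-DD.
// (Shape check only; it does not validate month or day ranges.)
bool is_iso_date(const std::string& s) {
    static const std::regex re(R"(\d{4}-\d{2}-\d{2})");
    return std::regex_match(s, re);
}

// Finds the first YYYY-MM-DD anywhere in text and returns its three
// captured parts (year, month, day), or an empty vector if none is found.
std::vector<std::string> first_date_parts(const std::string& text) {
    static const std::regex re(R"((\d{4})-(\d{2})-(\d{2}))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return {};
    }
    return {m[1].str(), m[2].str(), m[3].str()};
}
```

std::regex_match requires the entire string to match, while std::regex_search looks for a match anywhere in it, which is why the two helpers behave differently on input with surrounding text.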

If you do not have access to a C++11 implementation, a good alternative is Boost.Regex. It isn't completely equivalent to std::regex (e.g. the "empty()" method is not in std::regex), but it's a very mature regex implementation for C++ nonetheless.

How to use regex in C/Where to find the files?

C by itself doesn't have regex, but there are multiple libraries providing this functionality, like:

  • PCRE and PCRE2 - http://www.pcre.org/
  • libgnurx - https://github.com/TimothyGu/libgnurx
  • TRE - http://laurikari.net/tre/about/
  • sregex - https://github.com/openresty/sregex
  • slre - https://github.com/cesanta/slre
  • liblightgrep - https://github.com/strozfriedberg/liblightgrep
  • RxSpencer - https://github.com/garyhouston/rxspencer
  • RE2 - https://github.com/google/re2/
  • Oniguruma - https://github.com/kkos/oniguruma
  • Onigmo - https://github.com/k-takata/Onigmo
  • Hyperscan - https://www.hyperscan.io/

And there are probably more regex libraries out there.

I have been able to compile all of the above from source for Windows using MinGW-w64.

The most commonly used are PCRE, PCRE2, and libgnurx, but Oniguruma and Hyperscan are interesting alternatives.

If you're using C++ there is also std::regex or boost::regex.

Regular expressions in C: examples?

Regular expressions actually aren't part of ANSI C. It sounds like you might be talking about the POSIX regular expression library, which comes with most (all?) *nixes. Here's an example of using POSIX regexes in C:

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    regex_t regex;
    int reti;
    char msgbuf[100];

    /* Compile regular expression */
    reti = regcomp(&regex, "^a[[:alnum:]]", 0);
    if (reti) {
        fprintf(stderr, "Could not compile regex\n");
        exit(1);
    }

    /* Execute regular expression */
    reti = regexec(&regex, "abc", 0, NULL, 0);
    if (!reti) {
        puts("Match");
    }
    else if (reti == REG_NOMATCH) {
        puts("No match");
    }
    else {
        regerror(reti, &regex, msgbuf, sizeof(msgbuf));
        fprintf(stderr, "Regex match failed: %s\n", msgbuf);
        exit(1);
    }

    /* Free memory allocated to the pattern buffer by regcomp() */
    regfree(&regex);
    return 0;
}

Alternatively, you may want to check out PCRE, a library for Perl-compatible regular expressions in C. The Perl syntax is pretty much the same syntax used in Java, Python, and a number of other languages. The POSIX syntax is the syntax used by grep, sed, vi, etc.
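To make the difference between the two syntax families concrete, here is a small sketch that writes the same "one or more digits" check both ways. It uses C++11's std::regex rather than PCRE or <regex.h>, purely because std::regex can be switched between grammars: its default ECMAScript grammar accepts the Perl-style \d shorthand, and std::regex::extended selects POSIX extended syntax. The two function names are made up for the example.

```cpp
#include <regex>
#include <string>

// Perl-style shorthand class (\d), using std::regex's default
// ECMAScript grammar.
bool all_digits_perl_style(const std::string& s) {
    static const std::regex re(R"(\d+)");
    return std::regex_match(s, re);
}

// POSIX-style bracket class ([[:digit:]]), using the POSIX extended grammar.
bool all_digits_posix_style(const std::string& s) {
    static const std::regex re("[[:digit:]]+", std::regex::extended);
    return std::regex_match(s, re);
}
```

Both helpers accept and reject the same inputs; only the way the character class is spelled differs.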

Why is std::regex notoriously much slower than other regular expression libraries?

Is that because of the C++ standard requirements or it is just that that particular implementation is not very well optimized?

The answer is yes. Kinda.

There is no question that libstdc++'s implementation of <regex> is not well optimized. But there is more to it than that. It's not that the standard requirements inhibit optimizations so much as the standard requirements inhibit changes.

The regex library is defined through a bunch of templates. This allows people to choose between char and wchar_t, which is in theory good. But there's a catch.

Template libraries are used by copy-and-pasting the code directly into the code compiling against those libraries. Because of how templates get included, even types that nobody outside of the template library knows about are effectively part of the library's ABI. If you change them, two libraries compiled against different versions of the standard library cannot work with each other. And because the template parameter for regex is its character type, those implementation details touch basically everything about the implementation.

The minute libstdc++ (and other standard library implementations) started shipping an implementation of C++ regular expressions, they bound themselves to a specific implementation that could not be changed in a way that impacted the ABI of the library. And while they could cause another ABI break to fix it, standard library implementers don't like breaking ABI because people don't upgrade to standard libraries that break their code.

When C++11 forbade basic_string copy-on-write implementations, libstdc++ had an ABI problem. Their COW string was widely used, and changing it would make code that compiled against the new one break when used with code compiled against the old one. It took years before libstdc++ bit the bullet and actually implemented C++11 strings.

If std::regex had been defined without templates, implementations could use traditional mechanisms to hide implementation details. The ABI for the interface to external code could be fixed and unchanging, with only the implementation of the functions behind that ABI changing from version to version.
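One such traditional mechanism is the pimpl ("pointer to implementation") idiom. The sketch below is hypothetical: Pattern, Impl and matches() are invented names, and std::regex is used inside only as a stand-in engine so the example compiles. It is not how any standard library is actually written. The point is that callers only ever see a class holding one opaque pointer, so the engine behind it could be swapped without changing the layout that client code was compiled against.

```cpp
#include <memory>
#include <regex>
#include <string>

// ---- What would live in the public header (hypothetical "pattern.h") ----
class Pattern {
public:
    explicit Pattern(const std::string& expr);
    ~Pattern();  // defined below, where Impl is a complete type

    Pattern(const Pattern&) = delete;
    Pattern& operator=(const Pattern&) = delete;

    // Returns true if the whole text matches the expression.
    bool matches(const std::string& text) const;

private:
    struct Impl;                  // opaque to callers
    std::unique_ptr<Impl> impl_;  // the only data member callers compile against
};

// ---- What would live in the library's own source file ----
struct Pattern::Impl {
    std::regex re;  // stand-in engine; could be replaced without touching the header
    explicit Impl(const std::string& expr) : re(expr) {}
};

Pattern::Pattern(const std::string& expr) : impl_(new Impl(expr)) {}
Pattern::~Pattern() = default;

bool Pattern::matches(const std::string& text) const {
    return std::regex_match(text, impl_->re);
}
```

A class template cannot hide its internals this way as easily, because its member definitions have to be visible to every translation unit that instantiates it.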

is it possible to use regex in c++?

Before C++11, vanilla C++ had no support for regular expressions (C++11 added std::regex). There are also several libraries available that support regexes. Boost is a popular one.

Check out Boost's Regex implementation.

  • http://www.onlamp.com/pub/a/onlamp/2006/04/06/boostregex.html
  • http://www.boost.org/doc/libs/1_39_0/libs/regex/doc/html/boost_regex/syntax.html

POSIX-compatible regex library for Visual Studio C

The one library I've found that compiles with basically no effort, and is also the smallest, is: https://code.google.com/p/slre/. It's pretty basic but is good enough for my purposes. Thanks for the help, though.


