How to Escape a String for Use in Boost Regex

The following characters have special meaning in a regular expression and need to be escaped:

. ^ $ | ( ) [ ] { } * + ? \

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
const std::string rep("\\\\&");
std::string result = regex_replace(url_to_escape, esc, rep,
boost::match_default | boost::format_sed);

(The flag boost::format_sed tells regex_replace to interpret the replacement string using sed's syntax. In sed, an unescaped & stands for the text matched by the whole expression, so the replacement emits a backslash followed by the matched character.)

Or, if you are not comfortable with sed's replacement string format, change the flag to boost::format_perl and use the familiar $& to refer to the text matched by the whole expression.

const std::string rep("\\\\$&");
std::string result = regex_replace(url_to_escape, esc, rep,
boost::match_default | boost::format_perl);

c++11/regex - search for exact string, escape

You will have to escape all special characters in the string with \. The most straightforward approach is to use another regular expression to sanitize the input string before building the regex from it.

// matches any characters that need to be escaped in RegEx
std::regex specialChars { R"([-[\]{}()*+?.,\^$|#\s])" };

std::string input = ">> "+ s1 +" <<";
std::string sanitized = std::regex_replace( input, specialChars, R"(\$&)" );

// "sanitized" can now safely be used in another expression
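
As a minimal sketch of the same idea in self-contained form, the snippet below wraps the sanitizing step in a helper and then searches for the exact string. The names escapeRegex and containsLiteral are hypothetical, and the character class is narrowed to the ECMAScript metacharacters rather than the wider set used above; treat it as an illustration, not a definitive implementation.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape every ECMAScript metacharacter in `text`.
std::string escapeRegex(const std::string& text) {
    // Character class listing the metacharacters; '[', ']' and '\' are escaped inside it.
    static const std::regex specialChars{R"([.^$|()\[\]{}*+?\\])"};
    // "$&" in the replacement stands for the whole match, so each hit becomes "\<char>".
    return std::regex_replace(text, specialChars, R"(\$&)");
}

// Hypothetical helper: true if `needle` occurs literally somewhere in `haystack`.
bool containsLiteral(const std::string& haystack, const std::string& needle) {
    const std::regex exact{escapeRegex(needle)};
    return std::regex_search(haystack, exact);
}
```

With these helpers, containsLiteral("price: $5.00 (approx)", "$5.00 (approx)") treats the dollar sign, dot and parentheses as ordinary characters instead of regex syntax.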

How to parse escape element '\' and unicode character '\u' using boost regex in C++

Try using [^u] in your first regex to match any character that is not u.

boost::regex re("\\\\[^u]");  // matches \ not followed by u
boost::regex uni("\\\\u"); // matches \u

It's probably best to use one regex expression.

boost::regex re("\\\\(u)?"); // matches \ with or without u

Then check whether the capture group m[1] matched:

boost::smatch m; // assumes buf is a std::string
if (boost::regex_search(buf, m, re) && m[1].matched) {
// unicode
}
else {
// not unicode
}

It's better to use regular expressions for pattern matching. They seem more complex, but once you get used to them they are easier to maintain and less bug-prone than iterating over strings one character at a time.
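
The single-expression approach can be sketched with std::regex as follows. The struct EscapeCounts and the function countEscapes are hypothetical names introduced for illustration. The sketch only distinguishes \u from every other backslash and makes no attempt to treat a doubled backslash as one escaped character.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: how many "\u" sequences and other backslashes were seen.
struct EscapeCounts {
    int unicode = 0;  // occurrences of \u
    int other = 0;    // a backslash followed by anything else (or by nothing)
};

// Hypothetical helper: classify every backslash in `text` with one regex.
EscapeCounts countEscapes(const std::string& text) {
    // A backslash, optionally followed by a captured 'u'.
    static const std::regex re{R"(\\(u)?)"};
    EscapeCounts counts;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        if ((*it)[1].matched)
            ++counts.unicode;  // group 1 took part in the match, so this is \u
        else
            ++counts.other;
    }
    return counts;
}
```

Checking the matched flag of the capture group avoids comparing strings and works even when the group did not take part in the match.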

boost regex pattern for special characters

I would recommend using a negated character set that excludes the characters that are not allowed, for example:

[^\\!\\?]+

Then test whether the whole input matches.
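
A minimal sketch of that check using std::regex, assuming the forbidden characters are ! and ? as in the pattern above. The function name hasNoForbiddenChars is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is non-empty and contains neither '!' nor '?'.
bool hasNoForbiddenChars(const std::string& text) {
    // Negated set: one or more characters, none of which is '!' or '?'.
    static const std::regex allowed{R"([^!?]+)"};
    // regex_match succeeds only if the pattern covers the entire input.
    return std::regex_match(text, allowed);
}
```

Because regex_match requires the whole string to match, a single forbidden character anywhere in the input makes the check fail.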

std::regex escape special characters for use in regex

File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.

Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).

Fortunately, we can solve our regular expression problem with more regular expressions:

std::string EscapeForRegularExpression(const std::string &s) {
static const std::regex metacharacters(R"([\\\.\^\$\-\+\(\)\[\]\{\}\|\?\*])");
return std::regex_replace(s, metacharacters, "\\$&");
}

(File paths can't contain * or ?, but I've included them to keep the function general.)

If you don't abide by the "no raw loops" guideline, a probably faster implementation would avoid regular expressions:

std::string EscapeForRegularExpression(const std::string &s) {
static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
std::string out;
out.reserve(s.size());
for (auto ch : s) {
// strchr also finds the terminating NUL, so rule that case out first
if (ch != '\0' && std::strchr(metacharacters, ch))
out.push_back('\\');
out.push_back(ch);
}
return out;
}

Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.
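
To illustrate how the escaped result might be used, here is a self-contained sketch that escapes the example path and builds a pattern from it. EscapeLiteral is a renamed copy of the loop-based function with a guard for embedded NUL characters, and EndsWithPath is a hypothetical helper added only for this example.

```cpp
#include <cstring>
#include <regex>
#include <string>

// Loop-based escaper in the spirit of the version above.
std::string EscapeLiteral(const std::string& s) {
    static const char metacharacters[] = R"(\.^$-+()[]{}|?*)";
    std::string out;
    out.reserve(s.size());
    for (char ch : s) {
        // strchr also finds the terminating NUL, so rule that case out first.
        if (ch != '\0' && std::strchr(metacharacters, ch))
            out.push_back('\\');
        out.push_back(ch);
    }
    return out;
}

// Hypothetical helper: true if `line` ends with the literal text `path`.
bool EndsWithPath(const std::string& line, const std::string& path) {
    // The escaped path is safe to combine with real regex syntax such as '$'.
    const std::regex pattern(EscapeLiteral(path) + "$");
    return std::regex_search(line, pattern);
}
```

Without the escaping step, the parentheses in the path would form a capture group and the dot would match any character, so the pattern could accept strings that are not the path at all.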


