Boost C++ Regex - How to Get Multiple Matches

Boost C++ regex - how to get multiple matches

You can use the boost::sregex_token_iterator like in this short example:

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main() {
    std::string text("abc abd");
    boost::regex regex("ab.");

    // The final argument 0 selects the whole match (submatch 0).
    boost::sregex_token_iterator iter(text.begin(), text.end(), regex, 0);
    boost::sregex_token_iterator end;

    for ( ; iter != end; ++iter) {
        std::cout << *iter << '\n';
    }

    return 0;
}

The output from this program is:

abc
abd
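
If you also need to know where each match starts, a regex iterator can be used instead of the token iterator. The following is a minimal sketch that is not part of the original answer. It uses std::regex (C++11) so that it compiles without Boost, on the assumption that boost::sregex_iterator and boost::smatch can be used the same way. The helper name find_with_positions is made up for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (offset, matched text) for every match.
std::vector<std::pair<std::size_t, std::string>>
find_with_positions(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::pair<std::size_t, std::string>> found;

    // Each dereferenced iterator is a std::smatch describing one match.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        found.emplace_back(static_cast<std::size_t>(it->position(0)), it->str(0));
    }
    return found;
}
```

Called with "abc abd" and "ab.", this should report "abc" at offset 0 and "abd" at offset 4.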

Boost C++ regex - how to return all matches

The problem is that * is greedy. Change to using the non-greedy version (note the ?):

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main(int ac, char* av[])
{
    std::string strTotal("SolutionAN ANANANA SolutionBN");
    // (.*?) is lazy: it stops at the first 'N' that lets the match succeed.
    boost::regex regex("Solu(.*?)N");

    boost::sregex_token_iterator iter(strTotal.begin(), strTotal.end(), regex, 0);
    boost::sregex_token_iterator end;

    for ( ; iter != end; ++iter) {
        std::cout << *iter << std::endl;
    }
}
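
To see the difference between the greedy and the lazy quantifier side by side, here is a small sketch. It is an illustration rather than the original poster's code: it uses std::regex so that it builds without Boost (assuming the Boost token iterator takes the same arguments), and all_matches is a hypothetical helper name. Its third parameter is the submatch index passed to the token iterator, so 1 returns only the text captured by the parentheses.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects one sub-match (0 = whole match) from every match.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern,
                                     int submatch = 0) {
    const std::regex re(pattern);
    std::vector<std::string> result;

    for (std::sregex_token_iterator it(text.begin(), text.end(), re, submatch), end;
         it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

For "SolutionAN ANANANA SolutionBN", the greedy pattern "Solu(.*)N" yields a single match spanning the whole string, while "Solu(.*?)N" yields "SolutionAN" and "SolutionBN". With submatch 1 the lazy pattern yields "tionA" and "tionB".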

Why doesn't Boost.Regex find multiple matches in one string?

You're using the wrong thing -- regex_match is intended to check whether a (single) regex matches the entirety of a sequence of characters. As such, you need to either specify a regex that matches the whole input, or use something else. For your situation, it probably makes the most sense to just modify the regex as you've already done (group it and add a Kleene star). If you wanted to iterate over the individual terms of the polynomial, you'd probably want to use something like a regex_token_iterator.
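
The difference between matching the whole input and searching inside it can be shown with two tiny wrappers. This is a sketch using std::regex, on the assumption that boost::regex_match and boost::regex_search behave the same way for this purpose. The function names are made up for the example.

```cpp
#include <regex>
#include <string>

// True only if the pattern accounts for the entire input.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// True if the pattern matches anywhere inside the input.
bool matches_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

With the input "abc abd", the pattern "ab." fails matches_whole but passes matches_somewhere. Grouping and repeating it, as in "(ab. ?)+", lets matches_whole succeed because the pattern now covers the whole string.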

Edit: Of course, since you're embedding this into C++, you also have to double all your backslashes. Looking at it, I'm also a little confused about the regex you're using -- it doesn't look to me like it should really work quite right. Just for example, it seems to require a "+", "-" or "^" at the beginning of a term, but the first term won't normally have that. I'm also somewhat uncertain why there would be a "^" at the beginning of a term. Since the exponent is normally omitted when it's one, it's probably better to allow it to be omitted. Taking those into account, I get something like: "[-+]?(\d*)x(\^[0-9])*".

Incorporating that into some code, we can get something like this:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    std::string poly = "4x^2+3x^1+2x";

    // std::regex is the C++11 spelling of what TR1 called std::tr1::regex.
    std::regex term("[-+]?(\\d*)x(\\^[0-9])*");

    // With no submatch argument, the token iterator yields each whole match.
    std::copy(std::sregex_token_iterator(poly.begin(), poly.end(), term),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}

At least for me, that prints out each term individually:

4x^2
+3x^1
+2x

Note that for the moment, I've just printed out each complete term, and modified your input to show off the ability to recognize a term that doesn't include a power (explicitly, anyway).

Edit: to collect the results into a vector instead of sending them to std::cout, you'd do something like this:

#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::string poly = "4x^2+3x^1+2x";

    // std::regex is the C++11 spelling of what TR1 called std::tr1::regex.
    std::regex term("[-+]?(\\d*)x(\\^[0-9])*");
    std::vector<std::string> terms;

    std::copy(std::sregex_token_iterator(poly.begin(), poly.end(), term),
              std::sregex_token_iterator(),
              std::back_inserter(terms));

    // Now terms[0] is the first term, terms[1] the second, and so on.

    return 0;
}
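
If you want the numbers inside each term rather than the term as text, a regex iterator gives access to the capture groups of every match. The sketch below is only an illustration under stated assumptions: the pattern is a variant written for this example (separate groups for sign, digits and exponent), the helper name parse_terms is hypothetical, and constant terms without an x are simply ignored.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (coefficient, exponent) for each term in x.
std::vector<std::pair<int, int>> parse_terms(const std::string& poly) {
    // Assumed pattern: optional sign, optional digits, 'x', optional "^digits".
    static const std::regex term("([-+]?)(\\d*)x(?:\\^(\\d+))?");
    std::vector<std::pair<int, int>> terms;

    for (std::sregex_iterator it(poly.begin(), poly.end(), term), end; it != end; ++it) {
        const std::smatch& m = *it;
        // An empty digit group means an implicit coefficient of 1.
        int coeff = m[2].length() > 0 ? std::stoi(m[2].str()) : 1;
        if (m[1].str() == "-") coeff = -coeff;
        // An unmatched exponent group means the term is x^1.
        const int exponent = m[3].matched ? std::stoi(m[3].str()) : 1;
        terms.emplace_back(coeff, exponent);
    }
    return terms;
}
```

For "4x^2+3x^1+2x" this should produce the pairs (4, 2), (3, 1) and (2, 1).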

Boost C++ regex - how to return all matches

You can use this regex

(?s)Solution.*?(?=Solution|$)

.*? matches zero or more characters lazily, i.e. it consumes as little as possible.
Without the ? it would be greedy and consume as much as possible.

x(?=yz) is a lookahead which matches x only if it is followed by yz

$ is the end of the string

By default, . does not match a newline character. Use the (?s) modifier within the regex, or the mod_s option, to make . match newlines.
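
Here is a sketch of the lookahead idea in code. Note the swap: it uses std::regex so that it builds without Boost, and because the default ECMAScript grammar of std::regex has no (?s) modifier, the pattern uses [\s\S] in place of . so that the lazy part can also cross line breaks. The helper name split_solutions is hypothetical. How $ behaves in front of a newline is something to verify on your own implementation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns each chunk that starts with "Solution" and
// runs up to (but not including) the next "Solution" or the end of input.
std::vector<std::string> split_solutions(const std::string& text) {
    // [\s\S]*? is lazy; the lookahead (?=Solution|$) checks what follows
    // without consuming it, so the next chunk can start there.
    static const std::regex re("Solution[\\s\\S]*?(?=Solution|$)");
    std::vector<std::string> chunks;

    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        chunks.push_back(it->str());
    }
    return chunks;
}
```

For the input "SolutionA x SolutionB y" this should return "SolutionA x " and "SolutionB y".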

How to match multiple results using std::regex

This can be done with the C++11 <regex> library.

Two methods:

  1. You can use () in the regex to define captures (sub-expressions).

Like this:

    string var = "first second third forth";

    const regex r("(.*) (.*) (.*) (.*)");
    smatch sm;

    if (regex_search(var, sm, r)) {
        // sm[0] is the whole match; the captures start at index 1.
        for (size_t i = 1; i < sm.size(); i++) {
            cout << sm[i] << endl;
        }
    }

See it live: http://coliru.stacked-crooked.com/a/e1447c4cff9ea3e7


  2. You can use sregex_token_iterator():

    string var = "first second third forth";

    regex wsaq_re("\\s+");
    copy(sregex_token_iterator(var.begin(), var.end(), wsaq_re, -1),
         sregex_token_iterator(),
         ostream_iterator<string>(cout, "\n"));

See it live: http://coliru.stacked-crooked.com/a/677aa6f0bb0612f0
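
The first method only works when the input has exactly four words, because the number of groups is fixed in the pattern. The second method handles any number of words. A self-contained sketch of it as a reusable function might look like this. The name split_on_whitespace is made up, and skipping empty pieces is a choice made for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits text on runs of whitespace.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    static const std::regex separator("\\s+");
    std::vector<std::string> words;

    // Submatch index -1 selects the text between matches, not the matches.
    for (std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
         it != end; ++it) {
        // Skip empty pieces, which may appear if the input starts with whitespace.
        if (it->length() > 0) words.push_back(it->str());
    }
    return words;
}
```

Called on "first second third forth" it should return the four words in order.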

regex - multiple matches after a specific word

You can use

(?:\G(?!^)|c).*?\Kaa

See the regex demo. Details:

  • (?:\G(?!^)|c) - either the end of the previous successful match (\G(?!^)) or (|) a c char
  • .*? - any zero or more chars other than line break chars, as few as possible
  • \K - forget the text matched so far
  • aa - an aa string.

C++ boost::regex multiples captures

  1. Firstly, ^ is the symbol for the beginning of a line. Secondly, \ must be escaped in a C++ string literal. So you should change each (^-?\d*\.?\d+) group to (-?\\d*\\.\\d+). (Probably (-?\\d+(?:\\.\\d+)?) is better.)

  2. Your regular expression searches for the number,number,number,number pattern, not for each individual number. You add only the first substring to matches and ignore the others. To fix this, you can replace your expression with (-?\\d*\\.\\d+), or just add all the matches stored in what to your matches vector:

while (boost::regex_search(start, end, what, ex))
{
    // what[0] is the whole match; the capture groups start at index 1.
    for (std::size_t j = 1; j < what.size(); ++j)
    {
        std::string stest(what[j].first, what[j].second);
        matches.push_back(stest);
    }
    // Continue searching after the end of this match.
    start = what[0].second;
}
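
The loop above leaves out its declarations, so here is a self-contained sketch of the same idea. It is written against std::regex so that it compiles without Boost, assuming the Boost types can be substituted one for one. The function name collect_captures and the guard against empty matches are additions for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: gathers every capture group from every match.
std::vector<std::string> collect_captures(const std::string& input,
                                          const std::regex& ex) {
    std::vector<std::string> matches;
    std::string::const_iterator start = input.begin();
    const std::string::const_iterator end = input.end();
    std::smatch what;

    while (start != end && std::regex_search(start, end, what, ex)) {
        for (std::size_t j = 1; j < what.size(); ++j) {
            if (what[j].matched) matches.push_back(what[j].str());
        }
        // An empty match at the current position would never advance; stop.
        if (what[0].second == start) break;
        start = what[0].second;
    }
    return matches;
}
```

With a two-group pattern such as "(-?\\d+(?:\\.\\d+)?),(-?\\d+(?:\\.\\d+)?)" and the input "1.5,2.5;3.5,4.5", the result should contain all four numbers, not just the first pair.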

How can I access all matches of a repeated capture group, not just the last one?

This is what I've found so far:

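string text = "alpha beta";
string::const_iterator begin = text.begin();
string::const_iterator end = text.end();
boost::match_results<string::const_iterator> what;
// Compile the pattern once, outside the loop.
const boost::regex word("([a-z]+)");
while (regex_search(begin, end, what, word)) {
    // what[1] is the only capture group; print its full range, one per line.
    cout << string(what[1].first, what[1].second) << '\n';
    begin = what[0].second;
}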

And it works as expected. Maybe someone knows a better solution?


