Regex Replace with Callback in C++11

regex replace with callback in c++11?

I wanted this kind of function and didn't like the answer "use boost". The problem with Benjamin's answer is that it provides all the tokens, so you don't know which token is a match, and it doesn't let you use capture groups. The version below does both:

// clang++ -std=c++11 -stdlib=libc++ -o test test.cpp
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

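// Note: adding overloads to namespace std is formally undefined behavior;
// it is done here only so the call sites look like std::regex_replace.
namespace std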
{

template<class BidirIt, class Traits, class CharT, class UnaryFunction>
std::basic_string<CharT> regex_replace(BidirIt first, BidirIt last,
const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
std::basic_string<CharT> s;

typename std::match_results<BidirIt>::difference_type
positionOfLastMatch = 0;
auto endOfLastMatch = first;

auto callback = [&](const std::match_results<BidirIt>& match)
{
auto positionOfThisMatch = match.position(0);
auto diff = positionOfThisMatch - positionOfLastMatch;

auto startOfThisMatch = endOfLastMatch;
std::advance(startOfThisMatch, diff);

s.append(endOfLastMatch, startOfThisMatch);
s.append(f(match));

auto lengthOfMatch = match.length(0);

positionOfLastMatch = positionOfThisMatch + lengthOfMatch;

endOfLastMatch = startOfThisMatch;
std::advance(endOfLastMatch, lengthOfMatch);
};

std::regex_iterator<BidirIt> begin(first, last, re), end;
std::for_each(begin, end, callback);

s.append(endOfLastMatch, last);

return s;
}

template<class Traits, class CharT, class UnaryFunction>
std::string regex_replace(const std::string& s,
const std::basic_regex<CharT,Traits>& re, UnaryFunction f)
{
return regex_replace(s.cbegin(), s.cend(), re, f);
}

} // namespace std

using namespace std;

std::string my_callback(const std::smatch& m) {
int int_m = atoi(m.str(0).c_str());
return std::to_string(int_m + 1);
}

int main(int argc, char *argv[])
{
cout << regex_replace("my values are 9, 19", regex("\\d+"),
my_callback) << endl;

cout << regex_replace("my values are 9, 19", regex("\\d+"),
[](const std::smatch& m){
int int_m = atoi(m.str(0).c_str());
return std::to_string(int_m + 1);
}
) << endl;

return 0;
}
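
To illustrate the capture-group point, here is a minimal sketch of the same idea kept outside namespace std. The names replace_with and swap_pairs are hypothetical, not part of any library, and the sketch assumes std::string input and a pattern that never matches the empty string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a standard function): calls f for every match
// and splices its return value into the output.
template <class F>
std::string replace_with(const std::string& text, const std::regex& re, F f)
{
    std::string out;
    auto tail = text.cbegin();  // end of the previous match
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);  // text between the previous match and this one
        out += f(m);                   // replacement chosen by the callback
        tail = m[0].second;
    }
    out.append(tail, text.cend());     // text after the last match
    return out;
}

// Example callback: turn "key=value" into "value=key" using groups 1 and 2.
std::string swap_pairs(const std::string& text)
{
    static const std::regex re("(\\w+)=(\\w+)");
    return replace_with(text, re, [](const std::smatch& m) {
        return m.str(2) + "=" + m.str(1);
    });
}
```

With this sketch, swap_pairs("a=1, b=2") gives "1=a, 2=b", because the callback sees the full std::smatch rather than a bare token.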

C++ Regex Replace One by One

It will not work as you planned: std::regex_replace replaces every match with the same string. You could use backreferences ($1, $2, $3 and so on), but only if the number of patterns in the string is known.

You will also have difficulty with counting: the replacement string is created once, so the counter will always have the same value.
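
A small sketch of that counting pitfall, assuming the same g<digits>() pattern used in this answer (the function name rename_naively is made up for illustration):

```cpp
#include <regex>
#include <string>

// Sketch of the pitfall: the format string is built once, before
// std::regex_replace runs, so gate_id is incremented only a single time.
std::string rename_naively(const std::string& text, int& gate_id)
{
    static const std::regex re("g\\d+\\(\\)");
    const std::string fmt = "g" + std::to_string(gate_id++) + "()";
    return std::regex_replace(text, re, fmt);
}
```

Starting with gate_id at 0, calling this on "g500() g600()" produces "g0() g0()" and leaves gate_id at 1, which is why a per-match loop is needed.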

So we need a different approach, using std::regex_search. We search for our pattern, then take the prefix and append the new g().

Then we continue the operation in a loop with the suffix.

And that's it.

See the following example:

#include<iostream>
#include<regex>
#include<string>

void renameGates(std::string& qt_prims, int& gate_id){
// Define a std::regex to search for g, then some digits and parentheses
std::regex re("g\\d+\\(\\)");

// Here we will receive the submatches
std::smatch sm{};

// Make a local copy
std::string tmp{qt_prims};

// Reset resulting value
qt_prims.clear();

// Search all g-numbers
while (std::regex_search(tmp, sm, re)) {
// Build resulting string
qt_prims = qt_prims + std::string(sm.prefix()) + "g" + std::to_string(gate_id++) + "()";
// Continue to search with the rest of the string
tmp = sm.suffix();
}
// Append whatever is left over; after a failed search sm has no usable
// suffix, but tmp still holds the unprocessed rest of the string
qt_prims += tmp;

// Debug output
std::cout << qt_prims << "\n";

}
int main(){
std::string str="g500() g600() g200()\n g1()";
int x=0;
renameGates(str,x);
}

C++ regex: Conditional replace

Use std::regex_token_iterator:

#include <algorithm>
#include <regex>
#include <string>
#include <sstream>
#include <set>
#include <map>

std::string replacer(std::string text) {
std::string output_text;
std::set<std::string> keywords = { "foo", "bar" };
std::map<std::string, int> ids = {};

int counter = 0;
auto callback = [&](std::string const& m){
std::istringstream iss(m);
std::string n;
if (iss >> n)
{
if (keywords.find(m) != keywords.end()) {
output_text += m + " ";
}
else {
if (ids.find(m) != ids.end()) {
output_text += "ID" + std::to_string(ids[m]) + " ";
}
else {
// not found
ids[m] = counter;
output_text += "ID" + std::to_string(counter++) + " ";
}
}
}
else
{
output_text += m;
}
};

std::regex re("\\b\\w*\\b");
std::sregex_token_iterator
begin(text.begin(), text.end(), re, { -1, 0 }),
end;
std::for_each(begin, end, callback);
return output_text;
}

Conditionally replace regex matches in string

The C++ (C++0x, C++11, TR1) regular expressions do not work reliably in every implementation (GCC in particular shipped incomplete regex support at the time), so it is better to use Boost for a while.

You can check whether your compiler supports the regular expressions you need:

#include <string>
#include <iostream>
#include <regex>

using namespace std;

int main(int argc, char * argv[]) {
string test = "test replacing \"these characters\"";
regex reg("[^\\w]+");
test = regex_replace(test, reg, "_");
cout << test << endl;
}

The above works in Visual Studio 2012 RC.

Edit 1: Replacing with two different strings in one pass (depending on the match) won't work here, I think. In Perl, this could easily be done with evaluated replacement expressions (the /e switch).

Therefore, you'll need two passes, as you already suspected:

 ...
string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+"), "_");
test = regex_replace(test, regex("\\W+"), "");
...

Edit 2:

If it would be possible to use a callback function tr() in regex_replace, then you could modify the substitution there, like:

 string output = regex_replace(test, regex("\\s+|\\W+"), tr);

with tr() doing the replacement work:

 string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }

the problem would have been solved. Unfortunately, the C++11 standard library's regex_replace has no such overload, but Boost has one. The following works with Boost and uses one pass:

...
#include <boost/regex.hpp>
using namespace boost;
...
string tr(const smatch &m) { return m[0].str()[0] == ' ' ? "_" : ""; }
...

string test = "test replacing \"these characters\"";
test = regex_replace(test, regex("\\s+|\\W+"), tr); // <= works in Boost
...

Maybe some day this will work with C++11 or whatever number comes next.
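
In the meantime, a single pass is also possible with the standard library alone by walking the matches with std::sregex_iterator. The following is only a sketch under that assumption; the function name sanitize is hypothetical, and like tr() above it decides by looking at the first character of each match.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Sketch: whitespace runs become "_", other non-word runs are dropped,
// all in one pass over the input.
std::string sanitize(const std::string& text)
{
    static const std::regex re("\\s+|\\W+");
    std::string out;
    auto tail = text.cbegin();  // end of the previous match
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(tail, m[0].first);        // keep the text before this match
        const char c = *m[0].first;          // matches are never empty here
        if (std::isspace(static_cast<unsigned char>(c)))
            out += '_';                      // whitespace run -> underscore
        tail = m[0].second;                  // anything else is dropped
    }
    out.append(tail, text.cend());           // keep the text after the last match
    return out;
}
```

For the test string used above, this sketch returns "test_replacing_these_characters".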

Regards

rbo

C++11 regex replace

regex_replace of C++11 regular expressions does not have the capability you are asking for — the replacement format argument must be a string. Some regular expression APIs allow replacement to be a function that receives a match, and which could perform exactly the substitution you need.

But regular expressions are not the only way to solve a problem, and in C++ it's not exactly hard to look for two fixed strings and replace the characters in between:

#include <cstring>  // strlen
#include <string>

const char* const PREFIX = "<SensitiveData>";
const char* const SUFFIX = "</SensitiveData>";

void replace_sensitive(std::string& xml) {
size_t start = 0;
while (true) {
size_t pref, suff;
if ((pref = xml.find(PREFIX, start)) == std::string::npos)
break;
if ((suff = xml.find(SUFFIX, pref + strlen(PREFIX))) == std::string::npos)
break;
// replace stuff between prefix and suffix with '.'
for (size_t i = pref + strlen(PREFIX); i < suff; i++)
xml[i] = '.';
start = suff + strlen(SUFFIX);
}
}
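
As a hedged variant of the same idea, the sketch below returns a masked copy instead of modifying its argument and uses std::string constants in place of strlen. The name mask_sensitive is made up for illustration.

```cpp
#include <algorithm>
#include <string>

// Sketch: returns a copy of xml with the text between each
// <SensitiveData>...</SensitiveData> pair overwritten by '.'.
std::string mask_sensitive(std::string xml)
{
    const std::string prefix = "<SensitiveData>";
    const std::string suffix = "</SensitiveData>";
    std::size_t start = 0;
    while (true) {
        const std::size_t pref = xml.find(prefix, start);
        if (pref == std::string::npos) break;
        const std::size_t body = pref + prefix.size();  // first masked character
        const std::size_t suff = xml.find(suffix, body);
        if (suff == std::string::npos) break;           // unterminated tag: stop
        std::fill(xml.begin() + body, xml.begin() + suff, '.');
        start = suff + suffix.size();                   // continue after the close tag
    }
    return xml;
}
```

Because only the characters between the tags are overwritten, the string keeps its length and the tags themselves stay intact.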

Regular expression replace with callback

You may like regex-applicative, which offers:

match :: RE Char String -> String -> Maybe String

You can replace specific parts of the match in the code that builds the value of type RE Char String. For example, here is a function which finds a string of a and b characters and reverses them:

import Text.Regex.Applicative
asAndBs = many (psym ( `elem` "ab"))
noAsAndBs = many (psym (`notElem` "ab"))
transformation = concat <$> sequenceA [noAsAndBs, reverse <$> asAndBs, noAsAndBs]

Some example runs in ghci:

> match transformation "ntoheuuaaababbboenuth"
Just "ntoheuubbbabaaaoenuth"
> match transformation "aoesnuthaosneut"
Nothing

To handle your updated question: here is a transformation which looks for a string of a and b characters and asks the user what to replace them with. It reuses asAndBs and noAsAndBs from before, only modifying the transformation applied to them. I also include an example driver of queryTransform just to show how it might be used. The basic idea is to build up, rather than a flat replacement string, an IO action which produces the replacement string. It is then the job of the consumer who calls match to execute that IO action as appropriate.

import Data.Functor.Compose
queryTransform = getCompose . (concat <$>) . sequenceA . map Compose $
[ pure <$> noAsAndBs
, getLine <$ asAndBs
, pure <$> noAsAndBs
]
runQueryTransform = getLine >>= sequenceA . match queryTransform

I hope you recognize the parallels between the queryTransform structure and the transformation structure from before (in particular, note that the (concat <$>) . sequenceA construct is just like before). Here are some examples in ghci:

> runQueryTransform
oeunthaaabbbaboenuth
replacement
Just "oeunthreplacementoenuth"
> runQueryTransform
aoeunthaoeunth
Nothing

BOOST regex - no prototype for u32regex_replace() with callback function

I have to answer my own question with a workaround.

I had figured no callback overload of u32regex_replace was available, since I couldn't find one.

After looking at the normal regex_replace and the u32 code in icu.cpp, it's apparent that the author uses regex_iterator for just about everything.

So this almost duplicates a regex_replace with a Formatter fmt functor.

// Fictitious formatter string.
std::wstring sReplace = _T( "$1$2" );

// Callback functor.
std::wstring Callback( const boost::wsmatch& m )
{
// Do stuff here, that's why it's a callback!
return m.format( sReplace );
}

// ------------------------------------------

// Fictitious test regex.
boost::u32regex Regex = boost::make_u32regex( _T("(?<=(\\w))(?=(\\w))") );

// Fictitious input string (declared before the iterator that refers to it;
// const so that begin()/end() yield const_iterators).
const std::wstring sInput = _T( "This is a sentence" );

// Create a u32regex_iterator via make_u32regex_iterator.
boost::u32regex_iterator<std::wstring::const_iterator>
i(boost::make_u32regex_iterator( sInput, Regex)), j;

// Maintain a last iterator to use for the ending.
std::wstring::const_iterator last = sInput.begin();

// Clear the output string.
sOutput = _T("");

// Do global replace with callback.
while(i != j)
{
sOutput.append( (*i).prefix() ); // append the text between the previous match and this one
sOutput.append( Callback( (*i) ) ); // append callback string
last = (*i)[0].second; // save to 'last' the end of this match
++i;
}

// Append any trailing text.
sOutput.append( last, sInput.end() );

Find $number and then replace it with $number+1?

Consider the following approach:

#include <iostream>
#include <string>
#include <vector>
#include <regex>
using std::string;
using std::regex;
using std::sregex_token_iterator;
using std::cout;
using std::endl;
using std::vector;


int main()
{
regex re("(\\$[0-9]+)");
string s = "!$@#$34$1%^&$5*$1$!%$91$12@$3";
sregex_token_iterator it1(s.begin(), s.end(), re);
sregex_token_iterator it2(s.begin(), s.end(), re, -1);
sregex_token_iterator reg_end;
vector<string> vec;
string new_str;
cout << s << endl;
for (; it1 != reg_end; ++it1){
string temp;
temp = "$" + std::to_string(std::stoi(it1->str().substr(1)) + 1);
vec.push_back(temp);
}
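vector<string>::size_type i = 0;
for (; it2 != reg_end; ++it2) {
new_str += it2->str();
// The last unmatched piece has no replacement after it
// when the input does not end with a match
if (i < vec.size())
new_str += vec[i++];
}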

cout << new_str << endl;

}

The result is:

!$@#$34$1%^&$5*$1$!%$91$12@$3
!$@#$35$2%^&$6*$2$!%$92$13@$4

