C++ - Split String by Regex

C++ - Split string by regex

You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.

The answer you linked to in your comments, Split a string in C++?, can easily be changed so that it doesn't include any empty elements if there are multiple spaces.

#include <sstream>
#include <string>
#include <vector>

// Splits s on delim and appends the non-empty pieces to elems.
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        if (item.length() > 0) {
            elems.push_back(item);
        }
    }
    return elems;
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}

By checking that item.length() > 0 before pushing item onto the elems vector, you no longer get empty elements when your input contains several delimiters in a row (spaces, in your case).

How to Split String With Regular Expression (iOS, Objective-C)

You may first replace all the matches with a character that does not otherwise occur in the input, say the \x00 null character, and then split on it:

NSError *error = nil;
NSString *str = @"LR00001";
// options takes a bit mask, so pass 0 (not nil) for "no options"
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=\\D)(?=\\d)" options:0 error:&error];
NSString *modifiedString = [regex stringByReplacingMatchesInString:str options:0 range:NSMakeRange(0, [str length]) withTemplate:@"\x00"];
NSArray *chunks = [modifiedString componentsSeparatedByString: @"\x00"];
for(NSString *c in chunks) {
    NSLog(@"%@", c);
}

It prints LR and 00001.
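
The same replace-then-split idea can be sketched in C++ for comparison. This is only an illustration under a few assumptions: the function name split_before_digits and the choice of '\x1F' as the sentinel are made up here, and because std::regex has no lookbehind, the preceding non-digit is captured and written back instead.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Sketch: mark every non-digit -> digit boundary with a sentinel, then split on it.
// Assumes the sentinel character never occurs in the input.
std::vector<std::string> split_before_digits(const std::string& input) {
    const char sentinel = '\x1F';  // ASCII unit separator, assumed unused
    // std::regex (ECMAScript grammar) has no lookbehind, so the preceding
    // non-digit is captured and written back together with the sentinel.
    const std::regex boundary(R"((\D)(?=\d))");
    const std::string marked =
        std::regex_replace(input, boundary, std::string("$1") + sentinel);

    std::vector<std::string> chunks;
    std::stringstream ss(marked);
    std::string chunk;
    while (std::getline(ss, chunk, sentinel)) {
        chunks.push_back(chunk);
    }
    return chunks;
}
```

Calling split_before_digits("LR00001") gives the two chunks LR and 00001.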

Split string by regex in VC++

The C++11 standard has std::regex. It is also included in TR1 for Visual Studio 2010. In fact, TR1 has been available since VS2008, where it lives in the std::tr1 namespace. So you don't need Boost.Regex for VS2008 or later.

Splitting can be performed using regex_token_iterator:

#include <iostream>
#include <string>
#include <regex>

int main()
{
    // With a C++11 compiler, use std:: in place of std::tr1::.
    const std::string s("The-meaning-of-life-and-everything");
    const std::tr1::regex separator("-");
    const std::tr1::sregex_token_iterator endOfSequence;

    // -1 selects the text between matches of the separator.
    std::tr1::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
    while(token != endOfSequence)
    {
        std::cout << *token++ << std::endl;
    }
}
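
For a C++11 or later compiler, the same token-iterator approach can be wrapped in a small helper that returns the pieces as a vector. This is a minimal sketch; the name regex_split is hypothetical and not part of any library.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: collect the pieces between matches of `separator` (submatch index -1).
std::vector<std::string> regex_split(const std::string& s, const std::regex& separator) {
    std::sregex_token_iterator first(s.begin(), s.end(), separator, -1);
    std::sregex_token_iterator last;  // default-constructed = end of sequence
    return std::vector<std::string>(first, last);
}
```

For example, regex_split("The-meaning-of-life-and-everything", std::regex("-")) returns six strings, from The to everything.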

If you also need the separator itself, you can obtain it through the sub_match object that token points to; it is a pair containing the start and end iterators of the token.

while(token != endOfSequence)
{
    const std::tr1::sregex_token_iterator::value_type& subMatch = *token;
    if(subMatch.first != s.begin())
    {
        // The character just before the token is the separator.
        const char sep = *(subMatch.first - 1);
        std::cout << "Separator: " << sep << std::endl;
    }

    std::cout << *token++ << std::endl;
}

This sample covers the case of a single-character separator. If the separator can be an arbitrary substring, you need to do some more complex iterator work and possibly store the previous token's sub_match object.
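
One way to get multi-character separators without manual iterator work is to ask the token iterator for both submatch -1 (the text before each match) and submatch 0 (the match itself). The sketch below assumes a C++11 compiler; the name split_keep_separators is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: return tokens and separators interleaved, in input order.
// Submatch -1 selects the text before each match, 0 selects the match itself.
std::vector<std::string> split_keep_separators(const std::string& s,
                                               const std::regex& separator) {
    const int submatches[] = { -1, 0 };
    std::sregex_token_iterator first(s.begin(), s.end(), separator, submatches);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

With the input "a--b-c" and the separator pattern "-+", the result is a, --, b, -, c. If the input starts with a separator, the first element is an empty token.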

Or you can use regex groups and place separators in first group and the real token in second:

const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separatorAndStr("(-*)([^-]*)");
const std::tr1::sregex_token_iterator endOfSequence;

// Separators will be the 0th, 2nd, 4th... tokens
// Real tokens will be the 1st, 3rd, 5th... tokens
int subMatches[] = { 1, 2 };
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
while(token != endOfSequence)
{
    std::cout << *token++ << std::endl;
}

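I'm not sure it is 100% correct (for one thing, the pattern can also match the empty string, which may produce empty tokens at the end of the input), but it illustrates the idea.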

Split string with regular expression - Objective-C

NSString *expression = @"Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
NSRegularExpression *testExpression = [NSRegularExpression regularExpressionWithPattern:@"(.+)/([0-9\\.]+) \\(([^)]*).*"
    options:NSRegularExpressionCaseInsensitive error:nil];
NSArray *matches = [testExpression matchesInString:expression
    options:0
    range:NSMakeRange(0, [expression length])];
NSLog(@"%@",matches);

NSMutableArray *array = [@[] mutableCopy];
[matches enumerateObjectsUsingBlock:^(NSTextCheckingResult *obj, NSUInteger idx, BOOL *stop) {
    // Range 0 is the whole match; the capture groups start at index 1.
    for (int i = 1; i < [obj numberOfRanges]; ++i) {
        NSRange range = [obj rangeAtIndex:i];
        NSString *string = [expression substringWithRange:range];
        if ([string rangeOfString:@";"].location == NSNotFound) {
            [array addObject: string];
        } else {
            NSArray *a = [string componentsSeparatedByString:@";"];
            for (NSString *s in a) {
                [array addObject: [s stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]];
            }
        }
    }
}];

array contains

<__NSArrayM 0x10010d540>(
Mozilla,
4.0,
compatible,
MSIE 5.0,
Windows NT,
DigExt
)

@"(.+)/([0-9\\.]+) \\(([^)]*).*"
  ^__^                             capture group 1
       ^_________^                 capture group 2
                     ^             the char (
                      ^_____^      capture group 3
  • Capture group 1 captures all characters up to the /.
  • Capture group 2 captures all digits and dots. Inside a character class the dot is already literal, so escaping it with \\ is optional but harmless.
  • \\( says that a ( will follow, but as we don't enclose it in a capture group, it is matched and then ignored.
  • Capture group 3 ([^)]*) says "anything but )".

Now we iterate over the capture groups with their ranges. We start at index 1, as index 0 gives the range of the complete match.
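
For readers working in C++, the capture-group walk described above might look roughly like the sketch below. It uses the same pattern with std::regex; the function name parse_user_agent is hypothetical, and trimming is limited to plain spaces.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Sketch: same pattern as above, applied with std::regex.
// Each capture group is split on ';' and trimmed of surrounding spaces.
std::vector<std::string> parse_user_agent(const std::string& ua) {
    static const std::regex pattern(R"((.+)/([0-9.]+) \(([^)]*).*)");
    std::vector<std::string> parts;
    std::smatch m;
    if (!std::regex_match(ua, m, pattern)) {
        return parts;  // no match: empty result
    }
    // Index 0 is the whole match, so start at 1.
    for (std::size_t i = 1; i < m.size(); ++i) {
        std::stringstream group(m[i].str());
        std::string piece;
        while (std::getline(group, piece, ';')) {
            const std::size_t b = piece.find_first_not_of(' ');
            const std::size_t e = piece.find_last_not_of(' ');
            if (b != std::string::npos) {
                parts.push_back(piece.substr(b, e - b + 1));
            }
        }
    }
    return parts;
}
```

Applied to the user-agent string from the example, this yields the same six entries as the array shown above.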


([1-9.]+)

This will not match 0, because the range starts at 1. (The dot is fine as it is: inside a character class it is a literal dot, not "any character".) You want

([0-9\\.]+)

Splitting a string with regex, ignoring delimiters that occur within braces

If QRegExp supports lookaheads, you can check whether a match is inside braces: after the final word boundary, look ahead for a closing } with no opening { in between, and reject the match if one is found.

\band\b(?![^{]*})

In a C++ string literal the backslashes need to be escaped (\\b), or you can use a raw string literal as @SMeyer commented.
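
As an illustration of the pattern in a raw string literal, here is a minimal sketch that uses std::regex rather than QRegExp (an assumption made so the example needs no Qt). The function name split_outside_braces is hypothetical, and the closing brace is escaped in the pattern to be on the safe side.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: split on the word "and" unless a '}' follows with no '{' in between,
// i.e. unless the "and" appears to sit inside braces.
std::vector<std::string> split_outside_braces(const std::string& s) {
    // Raw string literal, so the backslashes need no extra escaping.
    static const std::regex delimiter(R"(\band\b(?![^{]*\}))");
    std::sregex_token_iterator first(s.begin(), s.end(), delimiter, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For the input "a and {b and c} and d", the pieces are "a ", " {b and c} " and " d"; the and inside the braces is left alone. Surrounding spaces are kept, so trim them if needed.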

Parse (split) a string in C++ using string delimiter (standard C++)

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or up to the end of the string if n is npos).


If the string contains the delimiter more than once, then after extracting one token you can erase it (delimiter included) and continue with the next extraction. If you want to preserve the original string, work on a copy, or take the remainder with substr(pos + delimiter.length()) instead of erasing:

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

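#include <iostream>
#include <string>

int main() {
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";

    size_t pos = 0;
    std::string token;
    // Print each token, then erase it and its delimiter from the front of s.
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());
    }
    std::cout << s << std::endl; // whatever remains is the last token
}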

Output:

scott
tiger
mushroom
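
If the original string has to stay intact, the same find/substr loop can track a start position instead of erasing. The following is a minimal sketch; the name split_by is made up here, and a non-empty delimiter is assumed.

```cpp
#include <string>
#include <vector>

// Sketch: split on a multi-character delimiter without modifying the input.
// Assumes `delimiter` is non-empty.
std::vector<std::string> split_by(const std::string& s, const std::string& delimiter) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start));  // remainder after the last delimiter
    return tokens;
}
```

split_by("scott>=tiger>=mushroom", ">=") returns the same three tokens as the loop above, and s is left unchanged.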

How to Split the string with multiple characters in C#?

Use string.Split() and pass it an array of characters to split on.

"Name_20160204_102-10002".Split(new char[] {'_', '-'});

Which gives the output:

["Name",
"20160204",
"102",
"10002"]

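For comparison, a rough C++ counterpart of splitting on a set of characters can be built on std::string::find_first_of. This is only a sketch; the name split_any is hypothetical, and empty pieces between adjacent delimiters are kept.

```cpp
#include <string>
#include <vector>

// Sketch: split on any single character contained in `delimiters`.
// Adjacent delimiters produce empty tokens.
std::vector<std::string> split_any(const std::string& s, const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find_first_of(delimiters, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    tokens.push_back(s.substr(start));  // remainder after the last delimiter
    return tokens;
}
```

split_any("Name_20160204_102-10002", "_-") returns Name, 20160204, 102 and 10002.
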
How can I split string by same sequence of symbols using REGEX?

You can match them with

(?:^')?(.')\1*(?:.$)?

The regex matches optional ' at the beginning with (?:^')?, then matches and captures any symbol other than a newline followed by a ' (with (.')), followed by itself any number of times (with \1*) and then followed by an optional any symbol but a newline at the end of the string (with (?:.$)?).

Output:

'a'a'a'a'
b'
c'c'
a'a'
d'
e'e'e'e
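
The pattern can be tried with std::regex as in the sketch below. The function name match_runs is hypothetical. The input string used in the usage note is an assumption too: it is inferred from the output listed above, since the original input is not shown here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: collect every run of a repeated "symbol + quote" unit.
std::vector<std::string> match_runs(const std::string& s) {
    // \1* repeats whatever group 1 captured, so each match is one run.
    static const std::regex run(R"((?:^')?(.')\1*(?:.$)?)");
    std::vector<std::string> runs;
    for (std::sregex_iterator it(s.begin(), s.end(), run), end; it != end; ++it) {
        runs.push_back(it->str());
    }
    return runs;
}
```

With the assumed input 'a'a'a'a'b'c'c'a'a'd'e'e'e'e, the function returns six runs, starting with 'a'a'a'a' and ending with e'e'e'e.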

Splitting a string in C#

Use the Regex.Matches method instead:

// Needs using System.Linq; and using System.Text.RegularExpressions;
string[] result =
    Regex.Matches(str, @"\[.*?\]").Cast<Match>().Select(m => m.Value).ToArray();
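
The "collect the matches instead of splitting" idea carries over to C++ with std::sregex_iterator. A minimal sketch, with the hypothetical name bracketed_items:

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: return every "[...]" group, using a non-greedy match between brackets.
std::vector<std::string> bracketed_items(const std::string& s) {
    static const std::regex item(R"(\[.*?\])");
    std::vector<std::string> items;
    for (std::sregex_iterator it(s.begin(), s.end(), item), end; it != end; ++it) {
        items.push_back(it->str());
    }
    return items;
}
```

For "[a][bc] x [d]" this returns [a], [bc] and [d]; text outside the brackets is skipped.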

How to split strings using regular expressions

Actually, this is easy enough to do with Match alone:

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException)
{
    // The regular expression has a syntax error
}

Output :

green
yellow,green
white
orange
blue,black

Explanation :

@"
             # Match either the regular expression below (attempting the next alternative only if this one fails)
(?<=         # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
   ""        #    Match the character “""” literally
)
\b           # Assert position at a word boundary
[a-z,]       # Match a single character present in the list below
             #    A character in the range between “a” and “z”
             #    The character “,”
   +         #    Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b           # Assert position at a word boundary
(?=          # Assert that the regex below can be matched, starting at this position (positive lookahead)
   ""        #    Match the character “""” literally
)
|            # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z]        # Match a single character in the range between “a” and “z”
   +         #    Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"

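std::regex has no lookbehind, so the pattern above cannot be reused as-is in C++. A different technique that gives the same fields is to match either a quoted field or an unquoted run and read the capture groups, as in the sketch below. The function name quoted_fields and the pattern are illustrative assumptions, not a translation of the C# code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: extract comma-separated fields, keeping commas inside double quotes.
std::vector<std::string> quoted_fields(const std::string& s) {
    // Group 1: contents of a double-quoted field.
    // Group 2: an unquoted run without commas or quotes.
    static const std::regex field("\"([^\"]*)\"|([^,\"]+)");
    std::vector<std::string> fields;
    for (std::sregex_iterator it(s.begin(), s.end(), field), end; it != end; ++it) {
        const std::smatch& m = *it;
        fields.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return fields;
}
```

On the sample input this returns green, yellow,green, white, orange and blue,black. Unlike the C# pattern, unquoted fields may contain any characters other than commas and quotes, not just letters.
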
