C++ - Split string by regex
You don't need to use regular expressions if you just want to split a string by multiple spaces. Writing your own regex library is overkill for something that simple.
The answer you linked to in your comments, Split a string in C++?, can easily be changed so that it doesn't include any empty elements if there are multiple spaces.
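#include <sstream>
#include <string>
#include <vector>

// Splits s on delim, appending only non-empty pieces to elems.
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        // Skip the empty pieces produced by consecutive delimiters.
        if (item.length() > 0) {
            elems.push_back(item);
        }
    }
    return elems;
}

// Convenience overload that returns a fresh vector.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;
}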
By checking that item.length() > 0 before pushing item onto the elems vector, you will no longer get extra empty elements if your input contains multiple consecutive delimiters (spaces in your case).
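When the delimiter is whitespace specifically, stream extraction already behaves this way: operator>> skips any run of whitespace before each word, so empty elements never appear. A minimal sketch, using the hypothetical name split_ws; note that, unlike the char-delimiter version, it also treats tabs and newlines as separators.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split on runs of whitespace.
// operator>> skips all leading whitespace (spaces, tabs, newlines) before
// reading each word, so consecutive separators never yield empty elements.
std::vector<std::string> split_ws(const std::string& s) {
    std::istringstream iss(s);
    std::vector<std::string> words;
    std::string word;
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

For example, split_ws("  the  quick   fox ") yields {"the", "quick", "fox"}.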
How to Split String With Regular Expression ios objectivec
You may first replace all the matches with an unused symbol, say the \x00 null char, and then split on it:
NSError *error = nil;
NSString *str = @"LR00001";
// Zero-width match at every non-digit/digit boundary (options is a bit mask, so pass 0, not nil).
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"(?<=\\D)(?=\\d)" options:0 error:&error];
// Insert a \x00 marker at each boundary...
NSString *modifiedString = [regex stringByReplacingMatchesInString:str options:0 range:NSMakeRange(0, [str length]) withTemplate:@"\x00"];
// ...and split on the marker.
NSArray *chunks = [modifiedString componentsSeparatedByString:@"\x00"];
for (NSString *c in chunks) {
    NSLog(@"%@", c);
}
It prints LR and 00001.
See the online demo.
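The same replace-then-split idea can be sketched in C++ with std::regex, under a couple of stated assumptions. std::regex's default ECMAScript grammar has no lookbehind, so instead of matching the zero-width boundary, the sketch captures the non-digit/digit pair and re-emits it with a marker in between. It also uses the unit-separator character \x1F as the marker (a hypothetical choice, assumed never to occur in the input). split_at_digit_boundary is an illustrative name.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split wherever a non-digit is followed by a digit.
std::vector<std::string> split_at_digit_boundary(const std::string& s) {
    // Capture the two characters around the boundary, because ECMAScript
    // std::regex does not support (?<=...) lookbehind.
    static const std::regex boundary(R"((\D)(\d))");
    // Re-emit both characters with the assumed marker '\x1F' between them.
    const std::string marked = std::regex_replace(s, boundary, "$1\x1F$2");

    // Split on the marker.
    std::vector<std::string> chunks;
    std::stringstream ss(marked);
    std::string chunk;
    while (std::getline(ss, chunk, '\x1F')) {
        chunks.push_back(chunk);
    }
    return chunks;
}
```

With "LR00001" this returns {"LR", "00001"}, matching the Objective-C output.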
Split string by regex in VC++
The C++11 standard has std::regex. It is also included in TR1 for Visual Studio 2010. Actually, TR1 has been available since VS2008; it's hidden under the std::tr1 namespace. So you don't need Boost.Regex for VS2008 or later.
Splitting can be performed using regex_token_iterator:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    // On VS2008/VS2010 the same classes are also available as std::tr1::regex etc.
    const std::string s("The-meaning-of-life-and-everything");
    const std::regex separator("-");
    const std::sregex_token_iterator endOfSequence;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
    while (token != endOfSequence)
    {
        std::cout << *token++ << std::endl;
    }
}
If you also need the separator itself, you can obtain it from the sub_match object pointed to by token; it is a pair containing the start and end iterators of the token.
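// Assumes s, endOfSequence and a freshly constructed token from the example above.
while (token != endOfSequence)
{
    const std::sregex_token_iterator::value_type& subMatch = *token;
    // subMatch.first points at the token's first character, so the
    // character just before it is the (single-char) separator.
    if (subMatch.first != s.begin())
    {
        const char sep = *(subMatch.first - 1);
        std::cout << "Separator: " << sep << std::endl;
    }
    std::cout << *token++ << std::endl;
}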
This is a sample for the case when you have a single-char separator. If the separator itself can be any substring, you need to do some more complex iterator work and possibly store the previous token's sub_match object.
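For separators longer than one character, one option is to iterate over the separator matches themselves with std::sregex_iterator and slice out the text between consecutive matches. This swaps in a different technique from the answer's token-iterator approach: each match is then the separator, and the gap before it is the token. The sketch assumes the separator regex never matches the empty string; split_keep_separators is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split s on every match of sep, keeping separators.
// The result alternates token, separator, token, ..., token.
std::vector<std::string> split_keep_separators(const std::string& s,
                                               const std::regex& sep) {
    std::vector<std::string> out;
    auto last = s.cbegin();  // end of the previous separator match
    for (std::sregex_iterator it(s.begin(), s.end(), sep), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.emplace_back(last, m[0].first);  // token before this separator
        out.push_back(m.str());              // the separator itself
        last = m[0].second;
    }
    out.emplace_back(last, s.cend());  // trailing token
    return out;
}
```

Because only the end of the previous match is remembered, separators of any length work without looking back into the string.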
Or you can use regex groups, placing the separators in the first group and the real token in the second:
#include <iostream>
#include <regex>
#include <string>

int main()
{
    const std::string s("The-meaning-of-life-and-everything");
    // Group 1: a run of separators; group 2: the token after it.
    // ([^-]+) needs at least one char, so there is no empty match at the end.
    const std::regex separatorAndStr("(-*)([^-]+)");
    const std::sregex_token_iterator endOfSequence;
    // Separators will be the 0th, 2nd, 4th... tokens
    // Real tokens will be the 1st, 3rd, 5th... tokens
    int subMatches[] = { 1, 2 };
    std::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
    while (token != endOfSequence)
    {
        std::cout << *token++ << std::endl;
    }
}
Not sure it is 100% correct, but just to illustrate the idea.
Split string with regular expression - objective-C
NSString *expression = @"Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
NSRegularExpression *testExpression = [NSRegularExpression regularExpressionWithPattern:@"(.+)/([0-9\\.]+) \\(([^)]*).*"
                                                                               options:NSRegularExpressionCaseInsensitive
                                                                                 error:nil];
NSArray *matches = [testExpression matchesInString:expression
                                           options:0
                                             range:NSMakeRange(0, [expression length])];
NSLog(@"%@", matches);

NSMutableArray *array = [@[] mutableCopy];
[matches enumerateObjectsUsingBlock:^(NSTextCheckingResult *obj, NSUInteger idx, BOOL *stop) {
    // Start at 1: range 0 is the whole match, 1..n are the capture groups.
    for (NSUInteger i = 1; i < [obj numberOfRanges]; ++i) {
        NSRange range = [obj rangeAtIndex:i];
        NSString *string = [expression substringWithRange:range];
        if ([string rangeOfString:@";"].location == NSNotFound) {
            [array addObject:string];
        } else {
            // Group 3 holds "a; b; c": split it and trim each piece.
            NSArray *a = [string componentsSeparatedByString:@";"];
            for (NSString *s in a) {
                [array addObject:[s stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]];
            }
        }
    }
}];
array contains:
<__NSArrayM 0x10010d540>(
Mozilla,
4.0,
compatible,
MSIE 5.0,
Windows NT,
DigExt
)
@"(.+)/([0-9\\.]+) \\(([^)]*).*"
  ^__^ capture group 1
       ^_________^ capture group 2
                     ^ the char (
                      ^_____^ capture group 3
- Capture group 1 captures everything up to the /.
- Capture group 2 captures digits and dots. The dot is escaped with \\ here, although inside a character class a dot is already literal, so the escape is harmless but optional. \\( says that a ( will follow, but as we don't enclose it in our capture groups, we don't care much about it.
- Capture group 3, ([^)]*), says "any characters except )".
Now we iterate over the capture groups with their ranges. We start at index 1, as index 0 gives the range of the complete expression.
Also note that ([1-9.]+) will not match 0, because the range starts at 1; you want ([0-9\\.]+).
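The same capture-group approach can be sketched in C++ with std::regex_search and std::smatch. This assumes input shaped like the example above; parse_user_agent and trim are hypothetical helpers, and trim only strips spaces and tabs.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strip leading/trailing spaces and tabs.
std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: collect capture groups 1..3, splitting any group
// that contains ';' and trimming the pieces.
std::vector<std::string> parse_user_agent(const std::string& ua) {
    static const std::regex re(R"((.+)/([0-9.]+) \(([^)]*).*)");
    std::vector<std::string> out;
    std::smatch m;
    if (!std::regex_search(ua, m, re)) return out;
    for (std::size_t i = 1; i < m.size(); ++i) {  // skip group 0 (whole match)
        const std::string group = m.str(i);
        if (group.find(';') == std::string::npos) {
            out.push_back(group);
        } else {
            std::stringstream ss(group);
            std::string part;
            while (std::getline(ss, part, ';')) out.push_back(trim(part));
        }
    }
    return out;
}
```

On the example string this yields the same six elements as the Objective-C array: Mozilla, 4.0, compatible, MSIE 5.0, Windows NT, DigExt.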
Splitting a string with regex, ignoring delimiters that occur within braces
If lookaheads are supported by QRegExp, you can check whether you are inside braces by looking ahead, at the final word boundary, for a closing } with no opening { in between.
\band\b(?![^{]*})
See this demo at regex101
In a C++ string literal the backslashes need to be escaped ("\\band\\b(?![^{]*})"), or try a raw string literal as @SMeyer commented.
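The same negative lookahead also works with C++11 std::regex (ECMAScript grammar), which the sketch below uses in place of QRegExp. It assumes braces are not nested, and it also consumes the whitespace around each "and"; split_outside_braces is a hypothetical name. The closing brace is written as \} to be safe, since } is a syntax character in the ECMAScript grammar.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on the word "and" (plus surrounding spaces),
// except inside {...}. The negative lookahead rejects an "and" that is
// followed by a '}' with no '{' in between, i.e. one inside braces.
std::vector<std::string> split_outside_braces(const std::string& s) {
    static const std::regex sep(R"(\s*\band\b(?![^{]*\})\s*)");
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

The \b anchors keep words such as "band" or "sand" from being treated as separators.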
Parse (split) a string in C++ using string delimiter (standard C++)
You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.
Example:
std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, starting the search at pos, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or up to the end of the string, whichever comes first).
If the string contains multiple delimiters, after you have extracted one token, you can remove it (delimiter included) to proceed with subsequent extractions (if you want to preserve the original string, work on a copy of it):
s.erase(0, s.find(delimiter) + delimiter.length());
This way you can easily loop to get each token.
Complete Example
#include <iostream>
#include <string>

int main() {
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";

    size_t pos = 0;
    std::string token;
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());  // drop the token and its delimiter
    }
    std::cout << s << std::endl;  // whatever remains is the last token
}
Output:
scott
tiger
mushroom
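If you would rather leave s untouched, a variant can track where the next token starts and pass that position as the pos argument of find(). A sketch; split_by_string is a hypothetical name, and an empty delimiter is assumed to mean "no split" so the loop cannot spin forever.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: non-destructive split on a string delimiter.
// s is never modified; start marks where the next token begins.
std::vector<std::string> split_by_string(const std::string& s,
                                         const std::string& delim) {
    if (delim.empty()) return {s};  // assumption: empty delimiter = no split
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delim.length();
    }
    tokens.push_back(s.substr(start));  // last token after the final delimiter
    return tokens;
}
```

Adjacent delimiters produce empty tokens here, e.g. "a>=>=b" gives {"a", "", "b"}.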
How to Split the string with multiple characters in C#?
Use string.Split() and pass it an array of characters to split on.
"Name_20160204_102-10002".Split(new char[] {'_', '-'});
Which gives the output:
["Name",
"20160204",
"102",
"10002"]
How can I split string by same sequence of symbols using REGEX?
You can match them with
(?:^')?(.')\1*(?:.$)?
See regex demo
The regex matches an optional ' at the beginning with (?:^')?, then matches and captures any symbol other than a newline followed by a ' (with (.')), followed by that same captured pair any number of times (with \1*), and then by an optional symbol other than a newline at the end of the string (with (?:.$)?).
Output:
'a'a'a'a'
b'
c'c'
a'a'
d'
e'e'e'e
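The answer does not show its input. The sketch below assumes it was 'a'a'a'a'b'c'c'a'a'd'e'e'e'e, the concatenation of the output lines. std::regex's ECMAScript grammar supports the \1 backreference and the non-capturing groups used here, so the pattern can be used unchanged with std::sregex_iterator. match_runs is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every run of a repeated "<char>'" pair.
// (?:^')?  optional leading quote at the very start of the input
// (.')\1*  a char plus quote, then the same pair repeated
// (?:.$)?  optional trailing char at the end of the input
std::vector<std::string> match_runs(const std::string& s) {
    static const std::regex re(R"((?:^')?(.')\1*(?:.$)?)");
    std::vector<std::string> runs;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        runs.push_back(it->str());
    }
    return runs;
}
```

On the assumed input this returns the six lines shown in the output above.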
Splitting a string in C#
Use the Regex.Matches method instead:
string[] result =
Regex.Matches(str, @"\[.*?\]").Cast<Match>().Select(m => m.Value).ToArray();
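The C++ counterpart is to iterate over the matches with std::sregex_iterator rather than split. A sketch assuming the same \[.*?\] pattern; find_bracketed is a hypothetical name, and both brackets are escaped because they are syntax characters in ECMAScript.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return every [...] group in str.
// The lazy .*? stops at the first ']' after each '['.
std::vector<std::string> find_bracketed(const std::string& str) {
    static const std::regex re(R"(\[.*?\])");
    std::vector<std::string> result;
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```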
How to split strings using regular expressions
Actually, this is easy enough to do with just a match:
// requires: using System; using System.Text.RegularExpressions;
string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException)
{
    // Syntax error in the regular expression
}
Output:
green
yellow,green
white
orange
blue,black
Explanation:
@"
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
"" # Match the character “""” literally
)
\b # Assert position at a word boundary
[a-z,] # Match a single character present in the list below
# A character in the range between “a” and “z”
# The character “,”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
"" # Match the character “""” literally
)
| # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"