Split String Based on Regex

How to split strings using regular expressions

This is simple enough that you can match the fields directly with Match instead of splitting:

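using System;
using System.Text.RegularExpressions;

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    // Either a quoted run of letters and commas (quotes excluded), or a plain run of letters
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException ex)
{
    // Thrown by the Regex constructor if the pattern has a syntax error
    Console.WriteLine(ex.Message);
}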

Output:

green
yellow,green
white
orange
blue,black
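
The same match-rather-than-split idea carries over to other regex engines. Below is a minimal sketch in Python, assuming the standard re module and the same input string; the names subject and fields are only illustrative and do not come from the original answer.

```python
import re

subject = 'green,"yellow,green",white,orange,"blue,black"'

# Same two alternatives as the C# pattern: a quoted run of letters and commas
# (the quotes are excluded by the lookarounds), or a plain run of letters.
pattern = re.compile(r'(?<=")\b[a-z,]+\b(?=")|[a-z]+', re.IGNORECASE)

# findall returns the full text of every match, in order.
fields = pattern.findall(subject)
print(fields)
```

Because the pattern has no capture groups, findall returns whole matches, which mirrors the Match/NextMatch loop above.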

Explanation:

@"
            # Match either the regular expression below (attempting the next alternative only if this one fails)
(?<=        # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
   ""       # Match the character " literally (written as "" inside a C# verbatim string)
)
\b          # Assert position at a word boundary
[a-z,]      # Match a single character present in the list below
            #    A character in the range between "a" and "z"
            #    The character ","
   +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b          # Assert position at a word boundary
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
   ""       # Match the character " literally (written as "" inside a C# verbatim string)
)
|           # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z]       # Match a single character in the range between "a" and "z"
   +        # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"

Split String based on multiple Regex matches

The matching changes because:

  • In the first part, you call .group().split() where .group() returns the full match which is a string.

  • In the second part, you call re.compile("...").split() where re.compile returns a regular expression object.
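
The difference between the two split methods can be seen in a small sketch. The sample string and variable names here are made up for illustration and are not taken from the question.

```python
import re

s = "a1b22c"

# str.split treats its argument as a literal delimiter, not as a pattern.
m = re.match(r".+", s)
literal_pieces = m.group().split(r"\d+")

# The split method of a compiled pattern treats the delimiter as a regex.
regex_pieces = re.compile(r"\d+").split(s)

print(literal_pieces)  # ['a1b22c']
print(regex_pieces)    # ['a', 'b', 'c']
```

The literal text \d+ never occurs in the string, so str.split returns it unchanged, while the compiled pattern splits on each run of digits.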

In the pattern, the part [a-zA-Z0-9]+[ ] matches only a single word. And if the part [0-9]([-][0-9]+)? is meant to be captured, note that its first (single) digit is currently outside the capture group.

You could write the pattern with 4 capture groups:

^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)


import re

pattern = r"^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)"
s = "Text Table 6-2: Management of children study and actions"
m = re.match(pattern, s)
if m:
    print(m.groups())

Output

('Text ', 'Table', '6-2', 'Management of children study and actions')

If you want points 1 and 2 (everything before the colon) as one string, you can use 2 capture groups instead.

^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)


The output will be

('Text Table 6-2', 'Management of children study and actions')
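
For completeness, here is a short sketch of running the 2-group pattern on the same sample string. The names label and description are chosen here for readability and are not part of the original answer.

```python
import re

pattern = r"^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)"
s = "Text Table 6-2: Management of children study and actions"

m = re.match(pattern, s)
if m:
    # Group 1 is everything before the colon, group 2 is the text after it.
    label, description = m.groups()
    print(label)
    print(description)
```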

How to split a string based on two regex formats?

The problem is that String.split() gives you only the pieces between delimiters. The delimiters themselves -- the substrings that match the pattern -- are omitted. But you don't have actual delimiters in your string. Rather, you want to split at transitions between digits and non-digits. These can be matched via zero-width assertions:

string.split("(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])");

That is:

  • the position not preceded by a digit (?<![0-9]) and followed by a digit (?=[0-9])

or (|)

  • the position preceded by a digit (?<=[0-9]) and not followed by a digit (?![0-9])
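
The same zero-width pattern can be tried in Python as a rough sketch, assuming Python 3.7 or later (earlier versions do not split on empty matches). One difference from Java is worth noting: the pattern also matches at the very start or end of a string that begins or ends with a digit, and Python's re.split then yields empty pieces, so this sketch filters them out. The function name is hypothetical.

```python
import re

# Zero-width split points: digit/non-digit transitions in either direction.
BOUNDARY = re.compile(r"(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])")

def split_on_digit_boundaries(text):
    # Drop the empty pieces produced by matches at the start or end of text.
    return [piece for piece in BOUNDARY.split(text) if piece]

print(split_on_digit_boundaries("abc123def45"))  # ['abc', '123', 'def', '45']
```

A simpler alternative in Python, if only the pieces matter, is re.findall(r"\d+|\D+", text), which collects the digit and non-digit runs directly.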

Split string based on regex

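I suggest splitting on whitespace that is not at the start of the string and is followed by an uppercase letter, unless that letter is itself followed by whitespace:

l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)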
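
To see what this pattern does, here is a small sketch on a made-up input. The sample string is an assumption, since the original question's data is not shown here.

```python
import re

# Split on whitespace that is not at the start of the string and is followed
# by an uppercase letter, unless that letter is itself followed by whitespace.
splitter = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

s = "Hello World A Test"  # hypothetical sample input
parts = splitter.split(s)
print(parts)  # ['Hello', 'World A', 'Test']
```

The single-letter word "A" stays attached to "World" because the (?!.\s) lookahead rejects a split point whose next character is followed by whitespace.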


Splitting strings based on regex expression

# If the string matches a certain pattern, split it in two.
[array] $tokens =
  if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
  else { $str }

# Test if all tokens exist as elements in the array.
# -> $true, in this case.
$allTokensContainedInArray =
  (Compare-Object $array $tokens).SideIndicator -notcontains '=>'

  • The regex-based -match operator tests whether $str consists of exactly 3 digits followed by a letter and a single digit and, if so, uses capture groups ((...)) and the automatic $Matches variable to split the string into the 3-digit part and the rest.

  • Compare-Object then tests (case-insensitively) whether the array elements derived from the input string are all contained in the reference array, in any order, while allowing the reference array to contain additional elements.


If you want to restrict the input strings to those matching one of the regex patterns before even attempting the lookup in the array:

# If no pattern matches, $tokens will be $null
[array] $tokens =
  if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
  elseif ($str -match '^\d{3}$') { $str }
  elseif ($str -match '^[a-z]\d$') { $str }
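
For readers more at home in Python, the logic above might look roughly like the following. This is a sketch of the same idea, not a translation of PowerShell semantics: the function names and the sample reference list are hypothetical, and re.IGNORECASE is used to approximate the case-insensitive default of -match.

```python
import re

def tokenize(s):
    """Split '123a4'-style strings in two; pass the two partial forms through."""
    m = re.fullmatch(r"(\d{3})([a-z]\d)", s, re.IGNORECASE)
    if m:
        return list(m.groups())
    if re.fullmatch(r"\d{3}|[a-z]\d", s, re.IGNORECASE):
        return [s]
    return None  # no pattern matched

def all_tokens_present(reference, tokens):
    # Case-insensitive membership test; extra reference elements are allowed.
    known = {item.lower() for item in reference}
    return all(token.lower() in known for token in tokens)

reference = ["123", "A4", "999"]  # hypothetical reference array
tokens = tokenize("123a4")
print(tokens, all_tokens_present(reference, tokens))
```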

Split string based on a regular expression

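By wrapping the pattern in parentheses ( ), you create a capture group, and re.split includes the text matched by capture groups in its result. If you simply remove the parentheses, you will not have this problem.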

>>> str1 = "a    b     c      d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
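
The effect of the parentheses can be checked side by side. The sketch below uses a shorter made-up string, and the variable names are illustrative.

```python
import re

text = "a  b   c"

# With a capture group, re.split also returns the text each delimiter matched.
with_group = re.split(r"( +)", text)

# Without the parentheses, only the pieces between delimiters come back.
without_group = re.split(r" +", text)

print(with_group)
print(without_group)
```

If the parentheses are needed for grouping only, a non-capturing group (?: +) keeps the delimiters out of the result as well.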

However, there is no need for regex here: str.split with no delimiter specified splits on whitespace for you. This would be the best way in this case.

>>> str1.split()
['a', 'b', 'c', 'd']

If you really want regex, you can use this (\s matches any whitespace character, which is clearer, and the raw string keeps the backslash intact):

>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']

Or you can find all runs of non-whitespace characters:

>>> re.findall(r'\S+',str1)
['a', 'b', 'c', 'd']

split string with RegEx pattern with words and numbers

You can get there by adding a whitespace match (\s) to your expression in two places:

re.findall(r"(?:^|(?<=\d\.))(?:\s)([\sa-zA-Z0-9]+)(?:\s\d\.|$)", test_string)

The output is: ['Fruit 12 oranges', 'vegetables 7 carrot', 'NFL 246 SHIRTS']

JavaScript split string with .match(regex)

Use a non-capturing group in the split regex. With a non-capturing group, the text matched by the group is not included in the resulting array.

var string4 = 'one split two splat three splot four';
var splitString4 = string4.split(/\s+(?:split|splat|splot)\s+/);
console.log(splitString4);

Splitting a string by a regular expression

Yes, a regex can easily do this :)

# SELECT regexp_split_to_table(
'I use Python, SQL, C++. I need: apples and oranges',
'[ .,:;]+');
┌───────────────────────┐
│ regexp_split_to_table │
├───────────────────────┤
│ I │
│ use │
│ Python │
│ SQL │
│ C++ │
│ I │
│ need │
│ apples │
│ and │
│ oranges │
└───────────────────────┘
(10 rows)
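
Outside the database, the same character-class split can be sketched with Python's re.split. This is an illustration of the pattern, not of how regexp_split_to_table is implemented.

```python
import re

text = "I use Python, SQL, C++. I need: apples and oranges"

# Split on any run of spaces, periods, commas, colons, or semicolons.
words = re.split(r"[ .,:;]+", text)
print(words)
```

If the input ended with one of the delimiter characters, re.split would return a trailing empty string, which may need to be filtered out.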

