How to split strings using regular expressions
Actually, this is easy enough to do with Match instead of Split:
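using System;
using System.Text.RegularExpressions;

string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    // Either a quoted run of letters/commas (quotes excluded by lookarounds) or a plain word
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException ex)
{
    // Syntax error in the regular expression
    Console.WriteLine(ex.Message);
}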
Output:
green
yellow,green
white
orange
blue,black
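For comparison, here is a minimal sketch of the same match-instead-of-split idea in Python. It assumes Python 3 and the standard re module, and reuses the C# pattern unchanged with re.findall. The variable names are illustrative.

```python
import re

subject = 'green,"yellow,green",white,orange,"blue,black"'

# Alternative 1: letters/commas directly inside double quotes (the quotes are
# checked by lookarounds, so they are not part of the match).
# Alternative 2: any plain run of letters.
pattern = re.compile(r'(?<=")\b[a-z,]+\b(?=")|[a-z]+', re.IGNORECASE)

# With no capturing groups, findall returns the full matches.
tokens = pattern.findall(subject)
for token in tokens:
    print(token)
```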
Explanation:
@"
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
"" # Match the character “""” literally
)
\b # Assert position at a word boundary
[a-z,] # Match a single character present in the list below
# A character in the range between “a” and “z”
# The character “,”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
"" # Match the character “""” literally
)
| # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
Split String based on multiple Regex matches
The matching changes because in the first part you call .group().split(). Here .group() returns the full match as a plain string, so this is str.split(). In the second part you call re.compile("...").split(). Here re.compile returns a regular expression object, so this is the pattern's own split() method, which splits on the regex.
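A small sketch makes the difference concrete. The input text and both patterns below are hypothetical, chosen only to show which split() is being called in each case.

```python
import re

text = "Table 6-2: Management"  # hypothetical sample input

m = re.match(r"\w+ \d+-\d+", text)
# m.group() returns a plain str, so .split() here is str.split(),
# which splits on runs of whitespace when called without arguments.
from_match = m.group().split()

# re.compile() returns a Pattern object, so .split() here is Pattern.split(),
# which treats the compiled regex itself as the delimiter.
from_pattern = re.compile(r"[ :]+").split(text)

print(from_match)    # ['Table', '6-2']
print(from_pattern)  # ['Table', '6-2', 'Management']
```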
In the pattern, the part [a-zA-Z0-9]+[ ] matches only a single word. If [0-9]([-][0-9]+)? is meant to be a capture group, note that the first (single) digit is currently not part of the group.
You could write the pattern with 4 capture groups:
^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)
import re

pattern = r"^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)"
s = "Text Table 6-2: Management of children study and actions"
m = re.match(pattern, s)
if m:
    print(m.groups())
Output:
('Text ', 'Table', '6-2', 'Management of children study and actions')
If you want parts 1 and 2 as one string, you can use 2 capture groups instead:
^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)
The output will be:
('Text Table 6-2', 'Management of children study and actions')
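Here is a minimal sketch of applying the 2-group pattern to several lines at once. The sample lines are hypothetical. The second line shows that the optional prefix can be absent, and the last line shows that a line without a label returns None from match().

```python
import re

# Two groups: the label (with any prefix) and the caption text.
pattern = re.compile(
    r"^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)"
)

# Hypothetical sample lines; the last one has no label and will not match.
lines = [
    "Text Table 6-2: Management of children study and actions",
    "Figure 3: Overview",
    "Introduction",
]

results = []
for line in lines:
    m = pattern.match(line)
    results.append(m.groups() if m else None)

print(results)
```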
How to split a string based on two regex formats?
The problem is that String.split() gives you only the pieces between delimiters. The delimiters themselves (the substrings that match the pattern) are omitted. Your string, however, has no actual delimiters. What you want is to split at the transitions between digits and non-digits, and these positions can be matched with zero-width assertions:
string.split("(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])");
That is, either
- the position after a non-digit (?<![0-9]) and before a digit (?=[0-9]), or (|)
- the position after a digit (?<=[0-9]) and before a non-digit (?![0-9]).
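The same zero-width split can be sketched in Python. This assumes Python 3.7 or later, since earlier versions of re.split do not split on empty matches. Python keeps the empty strings that a zero-width match at the very start or end produces, so the sketch filters them out. The re.findall variant, which matches the runs instead of the gaps, is shown as an alternative technique.

```python
import re

# Zero-width split points: non-digit -> digit, or digit -> non-digit.
TRANSITION = re.compile(r"(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])")

def split_digit_runs(s):
    # Unlike Java's String.split, Python's re.split keeps the empty strings
    # produced by zero-width matches at the very start or end, so drop them.
    return [part for part in TRANSITION.split(s) if part]

def split_digit_runs_findall(s):
    # Alternative: match the digit / non-digit runs themselves.
    return re.findall(r"[0-9]+|[^0-9]+", s)

print(split_digit_runs("abc123def45"))  # ['abc', '123', 'def', '45']
print(split_digit_runs("12ab3"))        # ['12', 'ab', '3']
```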
Split string based on regex
I suggest:
l = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)
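The original input s isn't shown, so the sample string below is hypothetical. This sketch annotates each part of the suggested pattern and shows its effect. A capitalized one-letter word followed by more text (such as "A" here) does not start a new piece.

```python
import re

# Split on whitespace that:
#   (?<!^)    is not at the very start of the string,
#   (?=[A-Z]) is followed by an uppercase letter, and
#   (?!.\s)   is not followed by a single character and then whitespace,
#             so a one-letter capitalized word followed by more text
#             does not start a new piece.
splitter = re.compile(r"(?<!^)\s+(?=[A-Z])(?!.\s)")

s = "Hello World This is A Test"  # hypothetical sample input
l = splitter.split(s)
print(l)  # ['Hello', 'World', 'This is A', 'Test']
```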
Splitting strings based on regex expression
# If the string matches a certain pattern, split it in two.
[array] $tokens =
if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
else { $str }
# Test if all tokens exist as elements in the array.
# -> $true, in this case.
$allTokensContainedInArray =
(Compare-Object $array $tokens).SideIndicator -notcontains '=>'
The regex-based -match operator is used to test whether $str starts with 3 digits followed by a letter and a single digit. If so, it splits the string into the part with the 3 digits and the rest, via capture groups ((...)) and the automatic $Matches variable.

The above uses Compare-Object to test (case-insensitively) whether the array elements derived from the input string are all contained in the reference array, in any order, while allowing the reference array to contain additional elements.
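For readers not on PowerShell, here is a rough Python analogue of the same two steps. It is a sketch under assumptions, not a translation of the cmdlets. re.fullmatch with re.IGNORECASE stands in for -match with the ^...$ anchors. A casefolded set-subset check stands in for Compare-Object, so duplicate tokens are treated as one. The reference array contents are hypothetical.

```python
import re

def tokens_of(s):
    # 3 digits followed by a letter and one digit; IGNORECASE mimics -match,
    # which is case-insensitive by default.
    m = re.fullmatch(r"(\d{3})([a-z]\d)", s, re.IGNORECASE)
    return [m.group(1), m.group(2)] if m else [s]

def all_tokens_contained(s, reference):
    # Case-insensitive containment check; extra reference elements are allowed.
    wanted = {t.casefold() for t in tokens_of(s)}
    available = {r.casefold() for r in reference}
    return wanted <= available

array = ["123", "a1", "x9"]  # hypothetical reference array
print(all_tokens_contained("123A1", array))  # True
print(all_tokens_contained("123c1", array))  # False
```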
If you want to limit input strings to those matching one of the regex patterns before even attempting the lookup in the array:
# If no pattern matches, $tokens will be $null
[array] $tokens =
if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
elseif ($str -match '^\d{3}$') { $str }
elseif ($str -match '^[a-z]\d$') { $str }
Split string based on a regular expression
By using the parentheses ( and ), you create a capturing group, and re.split includes the text matched by capturing groups in the result list. If you simply remove them, you will not have this problem.
>>> str1 = "a b c d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
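The effect of the parentheses is easy to see side by side. This sketch uses the same str1 and shows that a non-capturing group (?:...) behaves like no group at all.

```python
import re

str1 = "a b c d"

# A capturing group makes re.split() return the delimiters as well.
with_group = re.split("( +)", str1)
# A non-capturing group (or no group) returns only the pieces.
without_group = re.split("(?: +)", str1)

print(with_group)     # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
print(without_group)  # ['a', 'b', 'c', 'd']
```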
However, there is no need for a regex here: str.split without a delimiter splits on runs of whitespace for you. This is the best way in this case.
>>> str1.split()
['a', 'b', 'c', 'd']
If you really want a regex, you can use this (\s represents any whitespace character, which makes the intent clearer):
>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']
Or you can find all runs of non-whitespace characters:
>>> re.findall(r'\S+',str1)
['a', 'b', 'c', 'd']
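The three approaches differ once the input has leading or trailing whitespace, which is one reason str.split() is the simplest choice. The padded sample string below is hypothetical.

```python
import re

padded = "  a b  c "  # hypothetical input with leading/trailing spaces

by_str = padded.split()                 # drops empty pieces at both ends
by_re_split = re.split(r"\s+", padded)  # keeps empty strings at both ends
by_findall = re.findall(r"\S+", padded) # only the non-whitespace runs

print(by_str)       # ['a', 'b', 'c']
print(by_re_split)  # ['', 'a', 'b', 'c', '']
print(by_findall)   # ['a', 'b', 'c']
```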
split string with RegEx pattern with words and numbers
By just adding (?:\s) twice to your expression:
re.findall(r"(?:^|(?<=\d\.))(?:\s)([\sa-zA-Z0-9]+)(?:\s\d\.|$)", test_string)
The output is: ['Fruit 12 oranges', 'vegetables 7 carrot', 'NFL 246 SHIRTS']
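The original test_string isn't shown. The input below is a hypothetical "N. text" list, chosen so that this sketch reproduces the output above. The (?<=\d\.) lookbehind can start a new match right after the " 2." and " 3." that the previous match consumed.

```python
import re

# Hypothetical input: numbered items "N. text" separated by spaces.
test_string = "1. Fruit 12 oranges 2. vegetables 7 carrot 3. NFL 246 SHIRTS"

items = re.findall(
    r"(?:^|(?<=\d\.))(?:\s)([\sa-zA-Z0-9]+)(?:\s\d\.|$)", test_string
)
print(items)  # ['Fruit 12 oranges', 'vegetables 7 carrot', 'NFL 246 SHIRTS']
```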
JavaScript split string with .match(regex)
Use a non-capturing group in the split regex. With a non-capturing group, the matched separators are not included in the resulting array.
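const string4 = 'one split two splat three splot four';
// (?:...) groups the alternatives without capturing, so the separators are dropped
const splitString4 = string4.split(/\s+(?:split|splat|splot)\s+/);
console.log(splitString4); // ['one', 'two', 'three', 'four']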
Splitting a string by a regular expression
Yes, a regex can easily do this with PostgreSQL's regexp_split_to_table :)
SELECT regexp_split_to_table(
    'I use Python, SQL, C++. I need: apples and oranges',
    '[ .,:;]+');
┌───────────────────────┐
│ regexp_split_to_table │
├───────────────────────┤
│ I │
│ use │
│ Python │
│ SQL │
│ C++ │
│ I │
│ need │
│ apples │
│ and │
│ oranges │
└───────────────────────┘
(10 rows)
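The same character class works with Python's re.split. This sketch assumes you want a list rather than table rows. Because this text has no leading or trailing delimiter, no empty strings appear in the result.

```python
import re

text = "I use Python, SQL, C++. I need: apples and oranges"
# Same character class as the SQL example: one or more spaces or . , : ;
words = re.split(r"[ .,:;]+", text)
print(words)
print(len(words))  # 10
```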