Split string based on a regular expression
By using ( and ), you are capturing the group; if you simply remove them, you will not have this problem.
>>> str1 = "a b c d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']
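To make the effect of the parentheses concrete, here is a minimal sketch. The pattern "( +)" is only an assumption about what the original pattern looked like; the point is that re.split includes the text matched by any capture group in its result.

```python
import re

str1 = "a b c d"

# With a capture group, the matched separators are kept in the result.
with_group = re.split(r"( +)", str1)
print(with_group)  # ['a', ' ', 'b', ' ', 'c', ' ', 'd']

# A non-capturing group (?:...) groups without capturing,
# so the separators are dropped again.
without_group = re.split(r"(?: +)", str1)
print(without_group)  # ['a', 'b', 'c', 'd']
```

If grouping is needed for alternation or quantifiers, the non-capturing form is usually the safer choice with re.split.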
However, there is no need for a regex: str.split without any delimiter specified will split this on whitespace for you. This would be the best way in this case.
>>> str1.split()
['a', 'b', 'c', 'd']
If you really want a regex, you can use this (\s represents whitespace, and it's clearer). Use a raw string so that the backslash reaches the regex engine unchanged:
>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']
or you can find all runs of non-whitespace characters:
>>> re.findall(r'\S+', str1)
['a', 'b', 'c', 'd']
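The three approaches agree on "a b c d", but they can differ once the input has leading or trailing whitespace. A small sketch, using a made-up input string (an assumption, not from the question):

```python
import re

padded = "  a b   c "

# re.split produces empty strings where the string starts or ends with a separator.
via_re_split = re.split(r"\s+", padded)
print(via_re_split)  # ['', 'a', 'b', 'c', '']

# str.split() with no argument and re.findall on \S+ both ignore the padding.
via_str_split = padded.split()
via_findall = re.findall(r"\S+", padded)
print(via_str_split)  # ['a', 'b', 'c']
print(via_findall)    # ['a', 'b', 'c']
```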
Split String based on multiple Regex matches
The matching changes because:
In the first part, you call .group().split(), where .group() returns the full match, which is a string.
In the second part, you call re.compile("...").split(), where re.compile returns a regular expression object.
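The difference between the two split calls can be sketched as follows. The sample text and both patterns here are hypothetical and chosen only to show which object each split method belongs to.

```python
import re

text = "Table 6-2: Management of children"

# .group() returns a plain str, so .split() here is str.split (splits on whitespace).
m = re.match(r"\w+ \d+-\d+", text)
whole_match = m.group()
from_str_split = whole_match.split()
print(from_str_split)  # ['Table', '6-2']

# re.compile returns a pattern object, so .split() here splits `text`
# wherever the compiled pattern matches.
compiled = re.compile(r":\s+")
from_pattern_split = compiled.split(text)
print(from_pattern_split)  # ['Table 6-2', 'Management of children']
```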
In the pattern, the part [a-zA-Z0-9]+[ ] will match only a single word, and if the part [0-9]([-][0-9]+)? should be in a capture group, note that the first (single) digit is currently not part of the capture group.
You could write the pattern with 4 capture groups:
^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)
import re
pattern = r"^(.*? )?((?:[Ll]ist|[Tt]able|[Ff]igure))\s+(\d+(?:-\d+)?):\s+(.+)"
s = "Text Table 6-2: Management of children study and actions"
m = re.match(pattern, s)
if m:
    print(m.groups())
Output
('Text ', 'Table', '6-2', 'Management of children study and actions')
If you want point 1 and 2 as one string, then you can use 2 capture groups instead.
^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)
The output will be
('Text Table 6-2', 'Management of children study and actions')
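For completeness, the 2-group variant can be run the same way as the 4-group one. This sketch reuses the pattern and sample string given above; only the variable names are new.

```python
import re

# Two capture groups: (optional prefix + keyword + number) and (the title text).
pattern_two = r"^((?:.*? )?(?:[Ll]ist|[Tt]able|[Ff]igure)\s+\d+(?:-\d+)?):\s+(.+)"
s = "Text Table 6-2: Management of children study and actions"

m2 = re.match(pattern_two, s)
two_groups = m2.groups() if m2 else None
print(two_groups)
# ('Text Table 6-2', 'Management of children study and actions')
```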
Splitting a string by a regular expression
Yes, a regex can easily do this :)
# SELECT regexp_split_to_table(
'I use Python, SQL, C++. I need: apples and oranges',
'[ .,:;]+');
┌───────────────────────┐
│ regexp_split_to_table │
├───────────────────────┤
│ I │
│ use │
│ Python │
│ SQL │
│ C++ │
│ I │
│ need │
│ apples │
│ and │
│ oranges │
└───────────────────────┘
(10 rows)
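The same character-class pattern also works outside the database. As a rough Python counterpart (a sketch, not part of the original SQL answer), re.split returns the pieces as a list instead of rows:

```python
import re

sentence = "I use Python, SQL, C++. I need: apples and oranges"

# Split on one or more of: space, period, comma, colon, semicolon.
# '+' is not in the class, so "C++" stays intact.
words = re.split(r"[ .,:;]+", sentence)
print(words)
# ['I', 'use', 'Python', 'SQL', 'C++', 'I', 'need', 'apples', 'and', 'oranges']
```

If the input could start or end with a separator, the result would contain empty strings at the edges, which may need filtering.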
How to split strings using regular expressions
Actually, this is easy enough to do with just Match:
// Requires: using System; using System.Text.RegularExpressions;
string subjectString = @"green,""yellow,green"",white,orange,""blue,black""";
try
{
    Regex regexObj = new Regex(@"(?<="")\b[a-z,]+\b(?="")|[a-z]+", RegexOptions.IgnoreCase);
    Match matchResults = regexObj.Match(subjectString);
    while (matchResults.Success)
    {
        Console.WriteLine("{0}", matchResults.Value);
        // matched text: matchResults.Value
        // match start: matchResults.Index
        // match length: matchResults.Length
        matchResults = matchResults.NextMatch();
    }
}
catch (ArgumentException ex)
{
    // Thrown by the Regex constructor if the pattern has a syntax error
    Console.WriteLine(ex.Message);
}
Output:
green
yellow,green
white
orange
blue,black
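The same match-instead-of-split idea carries over to other regex engines. As a sketch (the Python translation is mine, not the original answer's), the identical pattern with re.findall gives the same five items:

```python
import re

subject = 'green,"yellow,green",white,orange,"blue,black"'

# Either a quoted run of letters and commas (quotes excluded via lookarounds),
# or a plain run of letters.
item_pattern = re.compile(r'(?<=")\b[a-z,]+\b(?=")|[a-z]+', re.IGNORECASE)

items = item_pattern.findall(subject)
print(items)
# ['green', 'yellow,green', 'white', 'orange', 'blue,black']
```

Note that this pattern only handles letters and commas inside the fields; for general CSV data a dedicated CSV parser is usually the more robust choice.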
Explanation:
@"
# Match either the regular expression below (attempting the next alternative only if this one fails)
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
"" # Match the character “""” literally
)
\b # Assert position at a word boundary
[a-z,] # Match a single character present in the list below
# A character in the range between “a” and “z”
# The character “,”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b # Assert position at a word boundary
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
"" # Match the character “""” literally
)
| # Or match regular expression number 2 below (the entire match attempt fails if this one fails to match)
[a-z] # Match a single character in the range between “a” and “z”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
String.split() *not* on regular expression?
A general solution using just Java SE APIs is:
String separator = ...
s.split(Pattern.quote(separator));
The quote method returns a regex that will match the argument string as a literal.
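The same concern exists wherever a split function takes a regex. As an illustration in Python (an analogy, not the Java API), re.escape plays the role of Pattern.quote, and the sample string is an assumption:

```python
import re

dotted = "a.b.c"
separator = "."

# Unescaped, "." is a regex metacharacter that matches every character,
# so nothing but empty strings is left between the matches.
unescaped = re.split(separator, dotted)
print(unescaped)  # ['', '', '', '', '', '']

# re.escape turns the separator into a pattern that matches it literally.
escaped = re.split(re.escape(separator), dotted)
print(escaped)  # ['a', 'b', 'c']
```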
How to Split String Using Regex Expressions
Swift did not have native regular expressions when this was written (Swift 5.7 later added a Regex type). Foundation, however, provides NSRegularExpression.
import Foundation
let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"
let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])
// NSRegularExpression works with objective-c NSString, which are utf16 encoded
let matches = regex.matches(in: toSearch, range: NSMakeRange(0, toSearch.utf16.count))
// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
    // Swift 3 API: rangeAt(_:) was renamed to range(at:) in Swift 4
    let range = current.rangeAt(0)
    let start = String.UTF16Index(range.location)
    // if there's a next, use its starting location as the ending of our match
    // otherwise, go to the end of the searched string
    let end = next.map { $0.rangeAt(0) }.map { String.UTF16Index($0.location) } ?? String.UTF16Index(toSearch.utf16.count)
    return String(toSearch.utf16[start..<end])!
}
dump(results)
Running this will output
▿ 3 elements
- "323 ECO Economics Course "
- "451 ENG English Course "
- "789 MAT Mathematical Topography"
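The underlying idea (find every match, then slice from each match start to the next match start) is not specific to Swift. A minimal Python sketch of the same technique, reusing the sample string and pattern from above:

```python
import re

to_search = ("323 ECO Economics Course 451 ENG English Course "
             "789 MAT Mathematical Topography")
pattern = r"[0-9]{3} [A-Z]{3}"

# Start offset of every match of the course-code pattern.
starts = [m.start() for m in re.finditer(pattern, to_search)]

# Each piece ends where the next match starts; the last one runs to the end.
ends = starts[1:] + [len(to_search)]

results = [to_search[s:e] for s, e in zip(starts, ends)]
print(results)
```

Any text before the first match is dropped by this approach, which is the same behavior as the Swift version.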
Splitting strings based on regex expression
# If the string matches a certain pattern, split it in two.
[array] $tokens =
if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
else { $str }
# Test if all tokens exist as elements in the array.
# -> $true, in this case.
$allTokensContainedInArray =
(Compare-Object $array $tokens).SideIndicator -notcontains '=>'
The regex-based -match operator is used to test whether $str consists of 3 digits followed by a letter and a single digit and, if so, via capture groups ((...)) and the automatic $Matches variable, splits the string into the part with the 3 digits and the rest.
The above uses Compare-Object to test (case-insensitively) if the array elements derived from the input string are all contained in the reference array, in any order, while allowing the reference array to contain additional elements.
If you want to limit all input strings to those matching one of the regex patterns, before even attempting a lookup in the array:
# If no pattern matches, $tokens will be $null
[array] $tokens =
if ($str -match '^(\d{3})([a-z]\d)$') { $Matches.1, $Matches.2 }
elseif ($str -match '^\d{3}$') { $str }
elseif ($str -match '^[a-z]\d$') { $str }
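For readers more familiar with Python, the first snippet's logic can be sketched roughly as follows. The function names tokenize and all_tokens_present and the sample data are hypothetical, and re.IGNORECASE is used because PowerShell's -match is case-insensitive by default.

```python
import re

def tokenize(s):
    """Split '123a1'-style strings into ['123', 'a1']; return other strings unchanged."""
    m = re.fullmatch(r"(\d{3})([a-z]\d)", s, flags=re.IGNORECASE)
    return list(m.groups()) if m else [s]

def all_tokens_present(reference, tokens):
    """Case-insensitive check that every token occurs in the reference list."""
    known = {item.lower() for item in reference}
    return all(token.lower() in known for token in tokens)

reference = ["123", "A1", "999"]  # hypothetical reference array
print(tokenize("123a1"))                                 # ['123', 'a1']
print(all_tokens_present(reference, tokenize("123a1")))  # True
print(all_tokens_present(reference, tokenize("456")))    # False
```

As with Compare-Object above, extra elements in the reference list do not affect the result.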
split string based on regular expression value in python
From the re documentation: re.split(pattern, string, maxsplit=0, flags=0)
The split() function expects the third positional argument to be the maxsplit argument. Your code passes re.I as maxsplit and passes no flags. You should pass flags as a keyword argument, like so:
exp_split = re.split(r'( in )',exp, flags=re.I)
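To see why the positional form misbehaves silently, note that re.I is an integer flag. The sketch below uses a made-up input string (an assumption) and spells out the equivalent maxsplit call instead of passing the flag positionally:

```python
import re

exp = "a IN b in c in d in e"

# re.I is an integer flag whose value is 2.
flag_value = int(re.I)
print(flag_value)  # 2

# Passing re.I as the third positional argument therefore behaves like
# maxsplit=2 with no flags: at most two splits, and case-sensitive.
as_maxsplit = re.split(r"( in )", exp, maxsplit=flag_value)
print(as_maxsplit)  # ['a IN b', ' in ', 'c', ' in ', 'd in e']

# With flags as a keyword argument, every " in " matches, in any case.
as_flags = re.split(r"( in )", exp, flags=re.I)
print(as_flags)  # ['a', ' IN ', 'b', ' in ', 'c', ' in ', 'd', ' in ', 'e']
```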
How to split a string based on two regex formats?
The problem is that String.split() gives you only the pieces between delimiters. The delimiters themselves -- the substrings that match the pattern -- are omitted. But you don't have actual delimiters in your string. Rather, you want to split at transitions between digits and non-digits. These can be matched via zero-width assertions:
string.split("(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])");
That is
- the position after a non-digit (or at the start of the string) (?<![0-9]) and before a digit (?=[0-9])
or (|)
- the position after a digit (?<=[0-9]) and before a non-digit (or at the end of the string) (?![0-9])
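The same zero-width pattern can be tried in Python as a rough sketch. This assumes Python 3.7 or later, where re.split splits on empty matches; the sample string is made up. Unlike Java's String.split, Python keeps empty strings at the edges, so they are filtered out here.

```python
import re

s = "abc123def45"
boundary = r"(?<![0-9])(?=[0-9])|(?<=[0-9])(?![0-9])"

# Split at every digit/non-digit transition, dropping empty edge pieces.
parts = [piece for piece in re.split(boundary, s) if piece]
print(parts)  # ['abc', '123', 'def', '45']

# A different technique with the same result here: match the runs
# themselves instead of the positions between them.
runs = re.findall(r"[0-9]+|[^0-9]+", s)
print(runs)  # ['abc', '123', 'def', '45']
```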