Product code looks like abcd2343, how to split by letters and numbers?
import re
s = 'abcd2343 abw34324 abc3243-23A'
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A']
Or, if you want each piece to end just before the next run of digits begins:
re.findall(r'\d*\D+', s)
> ['abcd', '2343 abw', '34324 abc', '3243-', '23A']
\d+ matches 1-or-more digits.
\d*\D+ matches 0-or-more digits followed by 1-or-more non-digits.
\d+|\D+ matches 1-or-more digits or 1-or-more non-digits.
Consult the docs for more about Python's regex syntax.
re.split(pat, s) will split the string s using pat as the delimiter. If pat begins and ends with parentheses (so as to be a "capturing group"), then re.split will return the substrings matched by pat as well. For instance, compare:
re.split(r'\d+', s)
> ['abcd', ' abw', ' abc', '-', 'A'] # <-- just the non-matching parts
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A'] # <-- both the non-matching parts and the captured groups
In contrast, re.findall(pat, s) returns only the parts of s that match pat:
re.findall(r'\d+', s)
> ['2343', '34324', '3243', '23']
Thus, if s ends with a digit, you could avoid ending with an empty string by using re.findall(r'\d+|\D+', s) instead of re.split(r'(\d+)', s):
s = 'abcd2343 abw34324 abc3243-23A 123'
re.split(r'(\d+)', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123', '']
re.findall(r'\d+|\D+', s)
> ['abcd', '2343', ' abw', '34324', ' abc', '3243', '-', '23', 'A ', '123']
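To make the comparison concrete, here is a small sketch that checks the two approaches against each other on strings that start with, end with, or lack digits. The helper name split_runs is made up for this illustration and is not part of the re module.

```python
import re

def split_runs(s):
    """Return the alternating digit / non-digit runs of s, with no empty strings."""
    # \d+|\D+ matches either a run of digits or a run of non-digits,
    # so every character of s lands in exactly one piece.
    return re.findall(r'\d+|\D+', s)

for sample in ['abcd2343', '2343abcd', 'ab12cd34', '']:
    # re.split with a capturing group gives the same pieces once the
    # empty strings at the edges are filtered out.
    via_split = [piece for piece in re.split(r'(\d+)', sample) if piece]
    assert split_runs(sample) == via_split
    print(repr(sample), '->', split_runs(sample))
```

Filtering the output of re.split works just as well; re.findall simply avoids the extra step.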
Splitting letters from numbers within a string
Use itertools.groupby together with the str.isalpha method:
Docstring:
groupby(iterable[, keyfunc]) -> create an iterator which returns
(key, sub-iterator) grouped by each value of key(value).
Docstring:
S.isalpha() -> bool
Return True if all characters in S are alphabetic
and there is at least one character in S, False otherwise.
In [1]: from itertools import groupby
In [2]: s = "125A12C15"
In [3]: [''.join(g) for _, g in groupby(s, str.isalpha)]
Out[3]: ['125', 'A', '12', 'C', '15']
Or possibly re.findall or re.split from the regular expressions module:
In [4]: import re
In [5]: re.findall(r'\d+|\D+', s)
Out[5]: ['125', 'A', '12', 'C', '15']
In [6]: re.split(r'(\d+)', s) # note that you may have to filter out the empty
# strings at the start/end if using re.split
Out[6]: ['', '125', 'A', '12', 'C', '15', '']
In [7]: re.split(r'(\D+)', s)
Out[7]: ['125', 'A', '12', 'C', '15']
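groupby also yields the key for each run, which the list comprehension above throws away. The following is a sketch that keeps it, so each piece is labeled as a digit run or not. The function name tagged_runs is hypothetical, and str.isdecimal is used as the key (instead of str.isalpha) so that digit runs convert cleanly with int().

```python
from itertools import groupby

def tagged_runs(s):
    """Yield (is_digit_run, text) pairs for consecutive runs in s."""
    for is_digit, group in groupby(s, key=str.isdecimal):
        yield is_digit, ''.join(group)

runs = list(tagged_runs("125A12C15"))
print(runs)

# Convert digit runs to int and leave everything else as strings.
parsed = [int(text) if is_digit else text for is_digit, text in runs]
print(parsed)
```

The choice of key matters for input that contains punctuation: with str.isalpha, a string such as '12-34' stays in one group because neither the digits nor '-' are alphabetic, whereas keying on digits splits it into '12', '-', '34'.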
As for performance, the regex approaches were faster than groupby in this quick benchmark:
In [8]: %timeit re.findall('\d+|\D+', s*1000)
100 loops, best of 3: 2.15 ms per loop
In [9]: %timeit [''.join(g) for _, g in groupby(s*1000, str.isalpha)]
100 loops, best of 3: 8.5 ms per loop
In [10]: %timeit re.split('(\d+)', s*1000)
1000 loops, best of 3: 1.43 ms per loop
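The timings above come from IPython's %timeit and will differ by machine and Python version. To re-run the comparison as a plain script, something like the following sketch with the standard timeit module should do; the dictionary name candidates and the repeat counts are arbitrary choices for this example.

```python
import re
import timeit
from itertools import groupby

s = "125A12C15" * 1000

# The three approaches from the session above, wrapped as callables.
candidates = {
    "findall": lambda: re.findall(r'\d+|\D+', s),
    "groupby": lambda: [''.join(g) for _, g in groupby(s, str.isalpha)],
    "split": lambda: re.split(r'(\d+)', s),
}

for name, func in candidates.items():
    # Best of 3 repeats, 20 calls each, reported per call in milliseconds.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    print(f"{name:8s} {best * 1e3:.3f} ms per call")
```

Keep in mind that re.split returns extra empty strings at the edges here, so it is doing slightly different work from the other two.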
Any way to split strings in Python at the place where an integer appears?
What about using a regex, i.e. the re module's split function? Something like this could work:
import re
string = 'string01string02string23string4string500string'
strlist = re.split(r'(\d+)', string)
print(strlist)
['string', '01', 'string', '02', 'string', '23', 'string', '4', 'string', '500', 'string']
In your case, I think you would then need to join each text piece with the number that follows it, so something like this:
cmb = [i+j for i,j in zip(strlist[::2], strlist[1::2])]
print(cmb)
['string01', 'string02', 'string23', 'string4', 'string500']
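Note that zip stops at the shorter input, so the trailing 'string' (which has no number after it) is silently dropped from cmb. If that last piece should be kept, one option is itertools.zip_longest, sketched here with the same variable names:

```python
import re
from itertools import zip_longest

string = 'string01string02string23string4string500string'
strlist = re.split(r'(\d+)', string)

# Pair each text piece with the number that follows it; the last text piece
# has no number, so zip_longest pads it with '' instead of dropping it.
pairs = zip_longest(strlist[::2], strlist[1::2], fillvalue='')
# Skip the all-empty pair that appears when the input ends with a number.
cmb = [text + number for text, number in pairs if text or number]
print(cmb)
# ['string01', 'string02', 'string23', 'string4', 'string500', 'string']
```

For this input, re.findall(r'\D*\d+|\D+', string) produces the same list in a single call.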
Converting string of letters and numbers into array
import re

s = '2A3M4D8'
s = re.split(r'(\d+)', s)
s = list(filter(None, s))  # drop the empty strings re.split leaves at the edges
print(s)
stack = []
res = 0
letter = ''
for x in s:
    if x.isnumeric():
        stack.append(int(x))
        # Once an operator is pending and its second operand has arrived, apply it.
        if letter != '':
            print(stack)
            if letter == 'M':
                res = stack[0] * stack[1]
            elif letter == 'A':
                res = stack[0] + stack[1]
            elif letter == 'D':
                res = stack[0] / stack[1]
            # The result becomes the left operand of the next operation.
            stack = []
            print(res)
            stack.append(res)
            print(stack)
            res = 0
            letter = ''
    else:
        letter = x
print(stack[0])
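The same left-to-right evaluation can be written more compactly with a lookup table of operators. This is only a sketch: the function name evaluate and the OPS table are made up here, the meaning of the letters (A = add, M = multiply, D = divide) is inferred from the branches in the loop above, and the input is assumed to be well formed.

```python
import operator
import re

# Assumed meaning of the letters, taken from the if/elif branches above.
OPS = {'A': operator.add, 'M': operator.mul, 'D': operator.truediv}

def evaluate(expr):
    """Evaluate e.g. '2A3M4D8' strictly left to right (no operator precedence)."""
    tokens = re.findall(r'\d+|[A-Za-z]', expr)
    result = int(tokens[0])
    # The remaining tokens come in (letter, number) pairs.
    for letter, number in zip(tokens[1::2], tokens[2::2]):
        result = OPS[letter](result, int(number))
    return result

print(evaluate('2A3M4D8'))  # ((2 + 3) * 4) / 8 -> 2.5
```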
Split alphanumeric strings by space and keep separator for just first occurrence
Here is one way. We can use re.findall with the pattern [A-Za-z]+|[0-9]+, which matches each run of letters or run of digits in turn. Then join the resulting list with spaces to get your output:
import re
inp = "Brijesh Tiwari810663 A14082014RGUBWA"
output = ' '.join(re.findall(r'[A-Za-z]+|[0-9]+', inp))
print(output) # Brijesh Tiwari 810663 A 14082014 RGUBWA
Edit: For your updated requirement, use re.sub with just one replacement:
inp = "Johnson12 is at club39"
output = re.sub(r'\b([A-Za-z]+)([0-9]+)\b', r'\1 \2', inp, count=1)
print(output) # Johnson 12 is at club39
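To see what limiting the replacement count changes, here is a short sketch that runs the same substitution with and without the limit. The two function names are invented for the example.

```python
import re

PATTERN = r'\b([A-Za-z]+)([0-9]+)\b'

def space_first_only(text):
    """Insert a space between letters and digits in the first matching word only."""
    return re.sub(PATTERN, r'\1 \2', text, count=1)

def space_all(text):
    """Apply the same substitution to every matching word."""
    return re.sub(PATTERN, r'\1 \2', text)

inp = "Johnson12 is at club39"
print(space_first_only(inp))  # Johnson 12 is at club39
print(space_all(inp))         # Johnson 12 is at club 39
```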
Split string when first occurrence of a number
Try splitting on the first occurrence of [ ](?=\d):
import re
text = "MARIA APARECIDA 99223-2000 / 98450-8026"
parts = re.split(r' (?=\d)', text, maxsplit=1)
print(parts)
This prints:
['MARIA APARECIDA', '99223-2000 / 98450-8026']
Note that the regex pattern used splits and consumes a single space, but does not consume the digit that follows (lookaheads do not advance the position in the input).
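If the first number is not guaranteed to be preceded by a space, an alternative is to locate the first digit with re.search and slice the string there. This is a sketch of that different technique, not the approach above, and the function name split_at_first_number is hypothetical.

```python
import re

def split_at_first_number(text):
    """Split text into (before, rest) at the first digit; rest is '' if no digit."""
    match = re.search(r'\d', text)
    if match is None:
        return text, ''
    head, tail = text[:match.start()], text[match.start():]
    # Trim the separator whitespace that preceded the number, if any.
    return head.rstrip(), tail

print(split_at_first_number("MARIA APARECIDA 99223-2000 / 98450-8026"))
# ('MARIA APARECIDA', '99223-2000 / 98450-8026')
print(split_at_first_number("no digits here"))
# ('no digits here', '')
```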
split string on numeric/non-numeric boundary
You can combine positive lookahead and lookbehind in a regex to find the boundaries (delimiters) at which to split the given string. Splitting on such zero-width matches requires Python 3.7 or later. Use:
import re
matches = re.split(r'(?<=\D)(?=\d)|(?<=\d)(?=\D)', string)  # string is the input to split
The resulting matches for the given strings will be:
['abc', '0', 'foo!bar'] # 'abc0foo!bar'
['100', '.', '200', '.', '300'] # '100.200.300'
['123'] # '123'
['foo'] # 'foo'
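Explanation:
The first alternative matches the empty position between a non-digit and a digit; the second matches the position between a digit and a non-digit.
(?<=\D) is a positive lookbehind; \D matches any character that's not a digit.
(?=\d) is a positive lookahead; \d matches a digit ([0-9]; for str patterns Python also matches other Unicode decimal digits unless re.ASCII is set).
(?<=\d) is a positive lookbehind; \d matches a digit.
(?=\D) is a positive lookahead; \D matches any character that's not a digit.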
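Put together as a runnable sketch over the four sample strings (the constant name BOUNDARY is chosen for this example; Python 3.7 or later is needed because the pattern only ever matches empty strings):

```python
import re

# Zero-width boundary between a non-digit and a digit, in either order.
BOUNDARY = re.compile(r'(?<=\D)(?=\d)|(?<=\d)(?=\D)')

for string in ['abc0foo!bar', '100.200.300', '123', 'foo']:
    print(BOUNDARY.split(string))
```

Because the lookarounds consume no characters, nothing is removed from the input: joining the pieces with ''.join gives back the original string.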