How to split strings into text and number?
I would approach this by using re.match in the following way:
import re

match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)  # ('foofo', '21')
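One thing to keep in mind is that re.match only anchors at the start, so a string like 'foofo21x' would still match on its prefix. If you want to accept only "letters then digits" and nothing else, a small wrapper around re.fullmatch may help. This is just a sketch; the helper name split_text_number is my own and not part of the re module.

```python
import re

# Hypothetical helper name, chosen for illustration only.
def split_text_number(s):
    """Return (text, digits) if s is letters followed by digits, else None."""
    # fullmatch requires the whole string to fit the pattern, unlike match.
    match = re.fullmatch(r"([a-z]+)([0-9]+)", s, re.I)
    if match is None:
        return None
    return match.groups()

print(split_text_number("foofo21"))   # ('foofo', '21')
print(split_text_number("foofo21x"))  # None
print(split_text_number("21foofo"))   # None
```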
How to split a string into numbers and characters
A re.findall approach might be appropriate here. We can match runs of either all non-digit or all digit characters, in alternation.
import re

string = 'Hello, welcome to my world001'
parts = re.findall(r'\D+|\d+', string)
print(parts)  # ['Hello, welcome to my world', '001']
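If you need the numeric parts as actual integers rather than strings, the same \D+|\d+ idea can be combined with a capturing group so you know which alternative matched. A minimal sketch, assuming you are fine with losing leading zeros ('001' becomes 1); the function name split_runs is made up for this example.

```python
import re

# Hypothetical helper name, chosen for illustration only.
def split_runs(s):
    """Split s into alternating runs; digit runs are converted to int."""
    parts = []
    # Group 1 is set only when the digit alternative matched.
    for m in re.finditer(r"(\d+)|\D+", s):
        if m.group(1) is not None:
            parts.append(int(m.group(1)))
        else:
            parts.append(m.group(0))
    return parts

print(split_runs('Hello, welcome to my world001'))  # ['Hello, welcome to my world', 1]
print(split_runs('abc123def45'))                    # ['abc', 123, 'def', 45]
```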
Separate integers and text in a string
Split your string into an array by integer:
myArray = datastring.split(/([0-9]+)/)
Then the first element of myArray will be something like fullData and the second will be a string of digits such as 1 or 10.
If your string was fullData10foo, then you would have the array ['fullData', '10', 'foo'] (split always returns strings, so the number is '10', not 10).
You could also use, on the string fullData10:
.split(/(?=\d+)/)
which will yield ["fullData", "1", "0"], or
.split(/(\d+)/)
which will yield ["fullData", "10", ""].
Additionally, chain .filter(Boolean) to get rid of any empty strings ("").
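For comparison, here is roughly how the same split variants behave in Python's re module. This is only a sketch to show the parallel with the JavaScript calls above; the variable names are mine. Note that splitting on a zero-width lookahead needs Python 3.7 or later.

```python
import re

data = 'fullData10'

# Capturing group: the digits are kept, plus a trailing empty string.
with_empty = re.split(r'(\d+)', data)
print(with_empty)   # ['fullData', '10', '']

# Dropping empty strings, similar to .filter(Boolean) in JavaScript.
cleaned = [p for p in with_empty if p]
print(cleaned)      # ['fullData', '10']

# Lookahead: splits before every single digit (Python 3.7+).
per_digit = re.split(r'(?=\d)', data)
print(per_digit)    # ['fullData', '1', '0']

# Text on both sides of the number gives three parts.
three_parts = [p for p in re.split(r'(\d+)', 'fullData10foo') if p]
print(three_parts)  # ['fullData', '10', 'foo']
```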
Split string into letters and numbers, keep symbols
Try this:
compiled = re.compile(r'[A-Za-z]+|-?\d+\.\d+|\d+|\W')
compiled.findall("$100.0thousand")
# ['$', '100.0', 'thousand']
Here's an Advanced Edition™
advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')
The difference is:
compiled.findall("$$$-100thousand") # ['$', '$', '$', '-', '100', 'thousand']
advanced_edition.findall("$$$-100thousand") # ['$$$', '-100', 'thousand']
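To make the token kinds a bit more visible, you could label each piece after matching. A rough sketch that reuses the advanced pattern unchanged; the classify function and the 'word' / 'number' / 'symbol' labels are my own invention, not part of the answer above.

```python
import re

# Same pattern as advanced_edition above.
TOKEN = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')
NUMBER = re.compile(r'-?\d+(?:\.\d+)?')
WORD = re.compile(r'[A-Za-z]+')

# Hypothetical helper; the labels are illustrative only.
def classify(text):
    """Return (kind, token) pairs for every token found in text."""
    result = []
    for tok in TOKEN.findall(text):
        if NUMBER.fullmatch(tok):
            kind = 'number'
        elif WORD.fullmatch(tok):
            kind = 'word'
        else:
            kind = 'symbol'
        result.append((kind, tok))
    return result

print(classify("$$$-100thousand"))
# [('symbol', '$$$'), ('number', '-100'), ('word', 'thousand')]
print(classify("$100.0thousand"))
# [('symbol', '$'), ('number', '100.0'), ('word', 'thousand')]
```

Be aware that underscores and non-ASCII letters match none of the alternatives in either pattern, so findall silently skips them.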
Python - Splitting numbers and letters into sub-strings with regular expression
What's wrong with re.findall?
>>> s = '125km'
>>> re.findall(r'[A-Za-z]+|\d+', s)
['125', 'km']
[A-Za-z]+ matches one or more letters, | means "or", and \d+ matches one or more digits.
Alternatively, use re.split with a list comprehension to drop the empty strings:
>>> [i for i in re.split(r'([A-Za-z]+)', s) if i]
['125', 'km']
>>> [i for i in re.split(r'(\d+)', s) if i]
['125', 'km']
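The three variants agree on '125km', but they may differ once the input contains anything other than letters and digits: findall drops whatever matches neither alternative, while split keeps it in the pieces between separators. A small sketch with a made-up input to show the difference:

```python
import re

s = '125 km/h'  # example input chosen for illustration

# findall keeps only what the pattern matches; the space and '/' vanish.
found = re.findall(r'[A-Za-z]+|\d+', s)
print(found)            # ['125', 'km', 'h']

# split on letters keeps everything between the letter runs.
split_on_letters = [i for i in re.split(r'([A-Za-z]+)', s) if i]
print(split_on_letters)  # ['125 ', 'km', '/', 'h']

# split on digits leaves the rest of the string in one piece.
split_on_digits = [i for i in re.split(r'(\d+)', s) if i]
print(split_on_digits)   # ['125', ' km/h']
```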
Any way to split strings in Python at the place where an integer appears?
What about using a regex, i.e. the re package in Python, combined with its split function? Something like this could work:
import re
string = 'string01string02string23string4string500string'
strlist = re.split(r'(\d+)', string)
print(strlist)
# ['string', '01', 'string', '02', 'string', '23', 'string', '4', 'string', '500', 'string']
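In your case I think you would then need to join each text element with the number that follows it, so something like this:
cmb = [i+j for i,j in zip(strlist[::2], strlist[1::2])]
print(cmb)
# ['string01', 'string02', 'string23', 'string4', 'string500']
Note that zip stops at the shorter list, so the trailing 'string' with no number after it is dropped.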
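If the goal is just the text-plus-number chunks, re.findall can probably get there in one step without the split and zip. A sketch under the assumption that the string starts with text (leading digits would be skipped by both patterns below):

```python
import re

string = 'string01string02string23string4string500string'

# A run of non-digits followed by a run of digits, taken as one chunk.
pairs = re.findall(r'\D+\d+', string)
print(pairs)
# ['string01', 'string02', 'string23', 'string4', 'string500']

# Making the digits optional also keeps the trailing text.
with_tail = re.findall(r'\D+\d*', string)
print(with_tail)
# ['string01', 'string02', 'string23', 'string4', 'string500', 'string']
```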
How to Split text by Numbers and Group of words
You can try matching the pieces with this regex:
([\d,]+|[a-zA-Z]+ *[a-zA-Z]*) // note the space between + and *
- [\d,]+ // matches one or more digits and commas
- [a-zA-Z]+ *[a-zA-Z]* // matches a word, followed by spaces (if any), followed by another word (if any)
As a Java string:
String regEx = "[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*";
You use it like this:
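import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name SplitDemo is only a placeholder.
public class SplitDemo {
    public static void main(String[] args) {
        String input = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB";
        System.out.println("Return Value :");
        // Digits with commas, or a word optionally followed by spaces and a second word
        Pattern pattern = Pattern.compile("[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*");
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            System.out.println(">" + m.group(0) + "<");
            result.add(m.group(0));
        }
    }
}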
The following is a detailed explanation of the regex, autogenerated by https://regex101.com
1st Alternative: [0-9,]+
  Match a single character present in the list below: [0-9,]+
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
    , matches the character , literally (case sensitive)
2nd Alternative: [a-zA-Z]+ *[a-zA-Z]*
  Match a single character present in the list below: [a-zA-Z]+
    + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
    a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
    A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  " *" matches the space character literally (case sensitive)
    * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  Match a single character present in the list below: [a-zA-Z]*
    * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
    A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
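The pattern explained above is not Java-specific, so you can try it out quickly in Python as well. A minimal sketch with the same input string; the variable names are my own.

```python
import re

text = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB"

# Digits with commas, or a word optionally followed by spaces and a second word.
pattern = re.compile(r"[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*")
result = pattern.findall(text)
print(result)
# ['2', 'Marine Cargo', '14,642', '10,528', '16,016', 'more text', '8,609', 'argA ', '2,106', 'argB']
```

Notice that 'argA ' keeps a trailing space: the optional spaces are consumed even when no second word follows, so you may want to call strip() on each item.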