How to Split Strings into Text and Number

How to split strings into text and number?

I would approach this by using re.match in the following way:

import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)
# ('foofo', '21')
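Note that re.match only anchors at the start of the string, so 'foofo21xyz' also matches and the trailing text is silently ignored. Here is a minimal sketch that assumes you want the whole string to be letters followed by digits, with the number returned as an int. The helper name split_text_number is hypothetical:

```python
import re

# Same pattern as above. re.fullmatch (hypothetical helper below) requires
# the *whole* string to match, unlike re.match, which only anchors at the start.
PATTERN = re.compile(r"([a-z]+)([0-9]+)", re.I)

def split_text_number(s):
    """Return (text, int) if s is letters followed by digits, else None."""
    match = PATTERN.fullmatch(s)
    if match is None:
        return None
    text, digits = match.groups()
    return text, int(digits)

print(split_text_number('foofo21'))          # ('foofo', 21)
print(split_text_number('foofo21xyz'))       # None
print(PATTERN.match('foofo21xyz').groups())  # ('foofo', '21')
```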

How to split a string into numbers and characters

A regex findall approach might be appropriate here: match runs of non-digit characters or runs of digit characters, which alternate through the string.

import re
string = 'Hello, welcome to my world001'
parts = re.findall(r'\D+|\d+', string)
print(parts)  # ['Hello, welcome to my world', '001']
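If you would rather avoid regular expressions, one alternative is itertools.groupby keyed on str.isdigit, which produces the same alternating runs. This is a sketch. split_digit_runs is a hypothetical name. Also, str.isdigit is not identical to \d for some Unicode characters: superscript digits such as '²' count as digits for isdigit but not for \d. The two approaches agree for ASCII input:

```python
from itertools import groupby

def split_digit_runs(s):
    """Split s into alternating runs of digit and non-digit characters."""
    # groupby starts a new group whenever str.isdigit(char) changes value.
    return [''.join(run) for _, run in groupby(s, key=str.isdigit)]

print(split_digit_runs('Hello, welcome to my world001'))
# ['Hello, welcome to my world', '001']
```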

Separate integers and text in a string

In JavaScript, split your string into an array on runs of digits. The capturing group keeps the digits in the result:

const myArray = datastring.split(/([0-9]+)/);

The first element of myArray will then be the text before the first number (for example fullData), and the second will be the digits as a string (for example "1" or "10").

If your string was fullData10foo, you would get the array ['fullData', '10', 'foo']. Note that the number is a string, not an integer.

For a string such as fullData10, you could also use:

  • .split(/(?=\d+)/), which splits before every digit and yields ["fullData", "1", "0"]

  • .split(/(\d+)/), which yields ["fullData", "10", ""]

  • .filter(Boolean) chained after the split, to get rid of any empty strings ("")
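The same splits can be reproduced in Python with re.split, which also keeps capturing-group matches in the result. This sketch is for comparison only. The variable names data and parts are illustrative, and splitting on a zero-width pattern such as a lookahead requires Python 3.7 or later:

```python
import re

data = 'fullData10foo'

# Capturing group: the digits stay in the result, as strings.
print(re.split(r'(\d+)', data))                            # ['fullData', '10', 'foo']

# A match at the very end leaves a trailing empty string ...
print(re.split(r'(\d+)', 'fullData10'))                    # ['fullData', '10', '']

# ... which a comprehension drops, like .filter(Boolean) in JavaScript.
print([p for p in re.split(r'(\d+)', 'fullData10') if p])  # ['fullData', '10']

# Zero-width lookahead: split before every digit (Python 3.7+).
print(re.split(r'(?=\d+)', 'fullData10'))                  # ['fullData', '1', '0']

# Convert the digit runs to int to get ['fullData', 10, 'foo'].
parts = [int(p) if p.isdigit() else p for p in re.split(r'(\d+)', data) if p]
print(parts)                                               # ['fullData', 10, 'foo']
```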

Split string into letters and numbers, keep symbols

Try this:

import re
compiled = re.compile(r'[A-Za-z]+|-?\d+\.\d+|\d+|\W')
compiled.findall("$100.0thousand")
# ['$', '100.0', 'thousand']

Here's an Advanced Edition™ that also groups consecutive symbols and keeps a leading minus sign attached to its number:

advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')

The difference is:

compiled.findall("$$$-100thousand")  # ['$', '$', '$', '-', '100', 'thousand']
advanced_edition.findall("$$$-100thousand") # ['$$$', '-100', 'thousand']
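If you also need to know what kind of token each piece is, you can wrap the same three alternatives in named groups and read them back with Match.lastgroup. This is a minimal sketch. TOKEN_RE, tokenize and the group names word, number and symbol are all hypothetical. Like findall, finditer silently skips characters that no alternative matches, for example '_' or accented letters:

```python
import re

# The alternatives of advanced_edition, each wrapped in a named group.
TOKEN_RE = re.compile(
    r'(?P<word>[A-Za-z]+)'
    r'|(?P<number>-?\d+(?:\.\d+)?)'
    r'|(?P<symbol>(?:[^\w-]+|-(?!\d))+)'
)

def tokenize(text):
    """Return (kind, value) pairs; numbers are converted to int or float."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup          # name of the group that matched
        value = m.group()
        if kind == 'number':
            value = float(value) if '.' in value else int(value)
        tokens.append((kind, value))
    return tokens

print(tokenize("$$$-100thousand"))
# [('symbol', '$$$'), ('number', -100), ('word', 'thousand')]
print(tokenize("$100.0thousand"))
# [('symbol', '$'), ('number', 100.0), ('word', 'thousand')]
```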

Python - Splitting numbers and letters into sub-strings with regular expression

What's wrong with re.findall?

>>> import re
>>> s = '125km'
>>> re.findall(r'[A-Za-z]+|\d+', s)
['125', 'km']

[A-Za-z]+ matches one or more ASCII letters, | means "or", and \d+ matches one or more digits.

OR

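Use a list comprehension over re.split. The capturing group keeps the delimiter, and the if i filter drops the empty strings: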

>>> [i for i in re.split(r'([A-Za-z]+)', s) if i]
['125', 'km']
>>> [i for i in re.split(r'(\d+)', s) if i]
['125', 'km']

Any way to split strings in Python at the place where an integer appears?

What about using a regular expression, i.e. Python's re module with re.split? Something like this could work:

import re
string = 'string01string02string23string4string500string'

strlist = re.split(r'(\d+)', string)
print(strlist)
# ['string', '01', 'string', '02', 'string', '23', 'string', '4', 'string', '500', 'string']

In your case you would then need to join each text element with the number that follows it. zip stops at the shorter slice, so the final 'string', which has no number after it, is dropped:

cmb = [i + j for i, j in zip(strlist[::2], strlist[1::2])]
print(cmb)
# ['string01', 'string02', 'string23', 'string4', 'string500']
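Alternatively, re.findall can build the combined pieces directly. In this sketch, \D*\d+ matches a possibly empty run of non-digits followed by a run of digits. That pairs each text with the number after it, as the zip version does, and likewise leaves out the trailing 'string' that has no digits after it:

```python
import re

string = 'string01string02string23string4string500string'

# Each match: optional non-digits followed by at least one digit.
cmb = re.findall(r'\D*\d+', string)
print(cmb)
# ['string01', 'string02', 'string23', 'string4', 'string500']
```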

How to Split Text by Numbers and Groups of Words

You can try matching with this regex:

([\d,]+|[a-zA-Z]+ *[a-zA-Z]*) // note the space between + and *.
  • [0-9,]+ // matches one or more digits or commas
  • [a-zA-Z]+ *[a-zA-Z]* // matches a word, followed by optional spaces, followed by an optional second word.

    String regEx = "[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*";

Use it like this:

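import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitNumbersAndWords {
    public static void main(String[] args) {
        String input = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB";
        System.out.println("Return Value :");

        // Runs of digits/commas, or a word optionally followed by a space and a second word
        Pattern pattern = Pattern.compile("[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*");

        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            System.out.println(">" + m.group(0) + "<");
            result.add(m.group(0));
        }
    }
}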

The following detailed explanation of the regex is autogenerated by https://regex101.com:


1st Alternative [0-9,]+
Match a single character present in the list below [0-9,]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
0-9 a single character in the range between 0 (index 48) and 9 (index 57) (case sensitive)
, matches the character , literally (case sensitive)

2nd Alternative [a-zA-Z]+ *[a-zA-Z]*
Match a single character present in the list below [a-zA-Z]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
' ' matches the space character literally (case sensitive)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character present in the list below [a-zA-Z]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)
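For comparison, here is a Python sketch of the same pattern. One detail worth knowing: the ' *' part can match a space that is not followed by a second word. A single word followed by a number therefore keeps its trailing space (the input above produces "argA " rather than "argA"), so the sketch strips each token. It also converts comma-grouped numbers to int, assuming commas are only thousands separators; it does not validate the grouping, and a token made only of commas is kept as a string. The names NUM_OR_WORDS and split_numbers_and_words are hypothetical:

```python
import re

# Same pattern as the Java example: digit/comma runs, or up to two words.
NUM_OR_WORDS = re.compile(r'[0-9,]+|[a-zA-Z]+ *[a-zA-Z]*')

def split_numbers_and_words(text):
    """Return tokens, with comma-grouped numbers converted to int."""
    tokens = []
    for token in NUM_OR_WORDS.findall(text):
        token = token.strip()             # ' *' can leave a trailing space, e.g. 'argA '
        digits = token.replace(',', '')   # assumes commas are thousands separators
        tokens.append(int(digits) if digits.isdigit() else token)
    return tokens

text = "2 Marine Cargo 14,642 10,528 16,016 more text 8,609 argA 2,106 argB"
print(split_numbers_and_words(text))
# [2, 'Marine Cargo', 14642, 10528, 16016, 'more text', 8609, 'argA', 2106, 'argB']
```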

