Split Character Data into Numbers and Letters

For your regex you have to use:

gsub("[[:digit:]]","",my.data)

A POSIX character class such as [:digit:] is only valid inside a bracket expression, which is why the pattern is written with double brackets: [[:digit:]].

Split a character to letters and numbers

We can use regex lookarounds to split between the letters and numbers

v1 <- strsplit(str1, "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", perl = TRUE)[[1]]
v1[c(TRUE, FALSE)]
#[1] "A" "B" "C"

as.numeric(v1[c(FALSE, TRUE)])
#[1] 1 10 5

data

str1 <- "A1B10C5"
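
For readers working in Python, the same lookaround split can be sketched as follows. This is a minimal sketch, not part of the original answer: the helper name split_letters_numbers is hypothetical, and it assumes Python 3.7 or later (where re.split accepts zero-width matches) and an input that starts with letters and strictly alternates.

```python
import re

# Zero-width split points: a letter followed by a digit, or a digit followed by a letter.
BOUNDARY = re.compile(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")

def split_letters_numbers(text):
    """Hypothetical helper: return (letters, numbers) for an alternating string."""
    parts = BOUNDARY.split(text)             # splitting on empty matches needs Python 3.7+
    letters = parts[0::2]                    # even positions hold the letter runs
    numbers = [int(p) for p in parts[1::2]]  # odd positions hold the digit runs
    return letters, numbers

letters, numbers = split_letters_numbers("A1B10C5")
print(letters)  # ['A', 'B', 'C']
print(numbers)  # [1, 10, 5]
```

The slicing mirrors the c(TRUE, FALSE) and c(FALSE, TRUE) indexing used in the R code.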

Split string into letters and numbers, keep symbols

Try this:

import re

compiled = re.compile(r'[A-Za-z]+|-?\d+\.\d+|\d+|\W')
compiled.findall("$100.0thousand")
# ['$', '100.0', 'thousand']

Here's an Advanced Edition™

advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')

The difference is:

compiled.findall("$$$-100thousand")  # ['$', '$', '$', '-', '100', 'thousand']
advanced_edition.findall("$$$-100thousand") # ['$$$', '-100', 'thousand']
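
A few extra inputs may help show what the -(?!\d) branch does. The sketch below only reuses the advanced_edition pattern from above; the test strings are made up for illustration.

```python
import re

advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')

# A hyphen directly followed by a digit is treated as the sign of the number.
print(advanced_edition.findall("b-5"))    # ['b', '-5']

# A hyphen not followed by a digit is grouped with the neighbouring symbols.
print(advanced_edition.findall("a - b"))  # ['a', ' - ', 'b']

# Caveat: a subtraction written without spaces is also read as a negative number.
print(advanced_edition.findall("5-3"))    # ['5', '-3']
```

If "5-3" should come back as three tokens, the pattern would need to look at what precedes the hyphen as well, which this version does not do.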

How to split a string into numbers and characters

A regex find-all approach fits here: match alternating runs of non-digit characters (\D+) and digit characters (\d+).

import re

string = 'Hello, welcome to my world001'
parts = re.findall(r'\D+|\d+', string)
print(parts) # ['Hello, welcome to my world', '001']
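
If the digit runs are wanted as integers rather than strings, the same pattern can be wrapped in a small helper. This is a sketch under the assumption that leading zeros do not matter; split_runs is a hypothetical name, not something from the original answer.

```python
import re

def split_runs(text):
    """Hypothetical helper: split into non-digit and digit runs, converting digit runs to int."""
    tokens = re.findall(r"\D+|\d+", text)
    # A token matched by \d+ consists only of decimal digits, so isdecimal() identifies it.
    return [int(t) if t.isdecimal() else t for t in tokens]

print(split_runs("Hello, welcome to my world001"))  # ['Hello, welcome to my world', 1]
print(split_runs("abc123def45"))                    # ['abc', 123, 'def', 45]
```

Note that int("001") is 1, so keep the tokens as strings when the zero padding carries meaning.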

tidyr separate column values into character and numeric using regex

You may use a (?<=[a-z])(?=[0-9]) lookaround based regex with tidyr::separate:

> tidyr::separate(df, A, into = c("name", "value"), "(?<=[a-z])(?=[0-9])")
    name value
1    enc     0
2    enc    10
3    enc    25
4    enc   100
5  harab     0
6  harab    25
7  harab   100
8  requi     0
9  requi    25
10 requi   100

The (?<=[a-z])(?=[0-9]) pattern matches a location in the string right between a lowercase ASCII letter ((?<=[a-z])) and a digit ((?=[0-9])). The (?<=...) is a positive lookbehind that requires the presence of its pattern immediately to the left of the current location, and (?=...) is a positive lookahead that requires the presence of its pattern immediately to the right of the current location. Because both assertions are zero-width, no characters are consumed, so the letters and digits are kept intact when splitting.
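
To see that the pattern matches a position rather than any characters, here is a small Python sketch. It assumes an input such as "enc10", reconstructed from the output table above, and Python 3.7 or later for splitting on empty matches.

```python
import re

boundary = re.compile(r"(?<=[a-z])(?=[0-9])")

m = boundary.search("enc10")
print(m.span())         # (3, 3): start and end coincide, so the match is zero-width
print(repr(m.group()))  # '': no characters were consumed

# Splitting at that position therefore keeps every character of the input.
print(boundary.split("enc10"))  # ['enc', '10']
```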

Alternatively, you may use extract:

extract(df, A, into = c("name", "value"), "^([a-z]+)(\\d+)$")

Output:

    name value
1    enc     0
2    enc    10
3    enc    25
4    enc   100
5  harab     0
6  harab    25
7  harab   100
8  requi     0
9  requi    25
10 requi   100

The ^([a-z]+)(\\d+)$ pattern matches:

  • ^ - start of input
  • ([a-z]+) - Capturing group 1 (column name): one or more lowercase ASCII letters
  • (\\d+) - Capturing group 2 (column value): one or more digits
  • $ - end of string.
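
The same anchored pattern can be tried in Python to see which strings it accepts. This is only a sketch: extract_name_value is a hypothetical helper, and the sample strings other than those implied by the output above are made up.

```python
import re

# Same structure as the tidyr pattern: letters, then digits, anchored at both ends.
PATTERN = re.compile(r"^([a-z]+)(\d+)$")

def extract_name_value(text):
    """Hypothetical helper: return (name, value) or None when the whole string does not fit."""
    m = PATTERN.match(text)
    return m.groups() if m else None

print(extract_name_value("enc10"))     # ('enc', '10')
print(extract_name_value("harab100"))  # ('harab', '100')
print(extract_name_value("Enc10"))     # None: uppercase letter is outside [a-z]
print(extract_name_value("enc10x"))    # None: trailing text violates the $ anchor
```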

Split a string into letters and digits in C

In your code you write current <= strlen(MMOC_cod), but since indexing starts at 0 the last valid index is strlen(MMOC_cod) - 1, so the loop must run only while current is strictly less than strlen(MMOC_cod).

I prefer to use memcpy when I know the length:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define BUFF_SIZE 256

int main(void)
{
    char s[BUFF_SIZE];
    char warehouse[BUFF_SIZE];
    char productNo[BUFF_SIZE];
    char qualifiers[BUFF_SIZE];

    printf("Hello World enter your MMOC like >ATL1203S14< \n");
    /* %255s reads at most BUFF_SIZE - 1 characters, leaving room for '\0' */
    if (scanf("%255s", s) != 1)
        return 1;
    size_t n = strlen(s);

    /* leading letters -> warehouse */
    size_t i = 0;
    while (isalpha((unsigned char)s[i])) {
        warehouse[i] = s[i];
        i++;
    }
    warehouse[i] = '\0';

    /* following digits -> product number */
    size_t j = 0;
    while (isdigit((unsigned char)s[i]))
        productNo[j++] = s[i++];
    productNo[j] = '\0';

    /* the remainder has a known length, so memcpy is enough */
    memcpy(qualifiers, &s[i], n - i);
    qualifiers[n - i] = '\0';

    printf("warehouse: %s\n", warehouse);
    printf("product Number: %s\n", productNo);
    printf("qualifiers: %s\n", qualifiers);
    return 0;
}

Output:

warehouse: ATL
product Number: 1203
qualifiers: S14

NB: The width in scanf("%255s", s) limits the read to BUFF_SIZE - 1 characters, so longer input is truncated instead of overflowing s. A length check placed after an unbounded scanf("%s", s) would come too late, because the overflow has already happened.
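
For comparison, the same scan can be sketched in Python, where the strict i < n bound is written out explicitly. This is an illustrative sketch only: split_mmoc is a hypothetical name, and the isascii() checks (Python 3.7+) are an assumption made to keep the classification limited to ASCII letters and digits.

```python
def split_mmoc(code):
    """Hypothetical helper: split into (leading letters, following digits, remainder)."""
    n = len(code)
    i = 0
    # Strictly less than n: index n is already past the last character.
    while i < n and code[i].isascii() and code[i].isalpha():
        i += 1
    j = i
    while j < n and code[j].isascii() and code[j].isdigit():
        j += 1
    return code[:i], code[i:j], code[j:]

print(split_mmoc("ATL1203S14"))  # ('ATL', '1203', 'S14')
```

Slicing replaces the manual copying and terminators, so no buffer sizes are involved.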

How to split decimal numbers followed by letters?

Using strsplit with positive lookahead/lookbehind. The [a-z%] denotes the range of letters from a to z as well as the % sign and should be expanded if there are other possibilities.

r1 <- do.call(rbind, strsplit(A, "(?<=\\d)(?=[a-z%])", perl=TRUE))
res1 <- setNames(as.data.frame(cbind(A, r1)), LETTERS[1:3])
res1
# A B C
# 1 -0.00023--0.00243unitincrease -0.00023--0.00243 unitincrease
# 2 -0.00176-0.02176pmol/Lincrease(replication) -0.00176-0.02176 pmol/Lincrease(replication)
# 3 0.00180-0.01780%varianceunitdecrease 0.00180-0.01780 %varianceunitdecrease

You may also want to get the numbers,

res2 <- type.convert(as.data.frame(
  do.call(rbind, strsplit(A, "(?<=\\d)-|(?<=\\d)(?=[a-z%])", perl=TRUE))))
res2
#         V1       V2                          V3
# 1 -0.00023 -0.00243                unitincrease
# 2 -0.00176  0.02176 pmol/Lincrease(replication)
# 3  0.00180  0.01780       %varianceunitdecrease

where:

str(res2)
# 'data.frame': 3 obs. of 3 variables:
# $ V1: num -0.00023 -0.00176 0.0018
# $ V2: num -0.00243 0.02176 0.0178
# $ V3: Factor w/ 3 levels "%varianceunitdecrease",..: 3 2 1
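
The second pattern also works unchanged in Python (3.7 or later, since one branch is zero-width). The sketch below is an assumption-laden illustration: split_range is a hypothetical helper, and the inputs are the strings shown in column A above.

```python
import re

# A hyphen that follows a digit separates the two numbers;
# a digit followed by a letter or % marks where the unit text starts.
SPLIT = re.compile(r"(?<=\d)-|(?<=\d)(?=[a-z%])")

def split_range(text):
    """Hypothetical helper: return (low, high, unit); raises ValueError if not exactly 3 parts."""
    low, high, unit = SPLIT.split(text)
    return float(low), float(high), unit

print(split_range("-0.00023--0.00243unitincrease"))
# (-0.00023, -0.00243, 'unitincrease')
print(split_range("0.00180-0.01780%varianceunitdecrease"))
# (0.0018, 0.0178, '%varianceunitdecrease')
```

A leading minus sign is never used as a separator because the lookbehind requires a digit in front of the hyphen.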

How to split strings into text and number?

I would approach this by using re.match in the following way:

import re

match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)
# ('foofo', '21')

Split string on both sides of a number

One option is extract from tidyr

library(tidyr)
library(dplyr)
df1 %>%
  extract(data, into = c("first.letter", "number", "last.letter"),
          "^([A-Z])(\\d+)([A-Z])$")
#  first.letter number last.letter
#1            X      3           Y
#2            X     33           U
#3            Y    231           Z

Or with separate

df1 %>%
  separate(data, into = c("first.letter", "number", "last.letter"),
           sep = "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])")
#  first.letter number last.letter
#1            X      3           Y
#2            X     33           U
#3            Y    231           Z

Or another option is strsplit and then rbind

do.call(rbind, strsplit(df1$data, 
"(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", perl = TRUE))

data

df1 <- structure(list(data = c("X3Y", "X33U", "Y231Z")), 
class = "data.frame", row.names = c(NA, -3L))

