split character data into numbers and letters
For your regex you have to use:
gsub("[[:digit:]]","",my.data)
The [:digit:] character class only makes sense inside a set of [].
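For comparison, a minimal Python sketch of the same digit-stripping idea. Python's re module has no POSIX classes such as [:digit:], so this assumes \d (restricted to 0-9 with re.ASCII) is an acceptable stand-in. The values in my_data are made up for illustration.

```python
import re

# Hypothetical sample values standing in for my.data.
my_data = ["abc123", "x9y8", "2024"]

# \d plays the role of [[:digit:]]; re.ASCII limits it to 0-9.
digits = re.compile(r"\d", re.ASCII)

# Drop every digit, keeping everything else.
letters_only = [digits.sub("", value) for value in my_data]

# Drop everything except digits, using a negated set.
digits_only = [re.sub(r"[^0-9]", "", value) for value in my_data]

print(letters_only)  # ['abc', 'xy', '']
print(digits_only)   # ['123', '98', '2024']
```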
Split a character to letters and numbers
We can use regex lookarounds to split between the letters and numbers
v1 <- strsplit(str1, "(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", perl = TRUE)[[1]]
v1[c(TRUE, FALSE)]
#[1] "A" "B" "C"
as.numeric(v1[c(FALSE, TRUE)])
#[1] 1 10 5
data
str1 <- "A1B10C5"
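The same lookaround split can be sketched in Python. This assumes Python 3.7 or later, where re.split accepts patterns that match the empty string; the variable names are illustrative only.

```python
import re

str1 = "A1B10C5"

# Zero-width boundaries: letter-then-digit or digit-then-letter.
boundary = re.compile(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])")

# Splitting on empty matches needs Python 3.7+.
parts = boundary.split(str1)

letters = parts[0::2]                     # even positions hold the letters
numbers = [int(p) for p in parts[1::2]]   # odd positions hold the numbers

print(letters)  # ['A', 'B', 'C']
print(numbers)  # [1, 10, 5]
```

The slicing relies on the string starting with a letter and strictly alternating, which holds for the sample above but is an assumption about the input in general.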
Split string into letters and numbers, keep symbols
Try this:
import re
compiled = re.compile(r'[A-Za-z]+|-?\d+\.\d+|\d+|\W')
compiled.findall("$100.0thousand")
# ['$', '100.0', 'thousand']
Here's an Advanced Edition™
advanced_edition = re.compile(r'[A-Za-z]+|-?\d+(?:\.\d+)?|(?:[^\w-]+|-(?!\d))+')
The difference is:
compiled.findall("$$$-100thousand") # ['$', '$', '$', '-', '100', 'thousand']
advanced_edition.findall("$$$-100thousand") # ['$$$', '-100', 'thousand']
How to split a string into numbers and characters
A regex findall approach might be appropriate here. We can match alternating runs of non-digit characters and digit characters.
import re
string = 'Hello, welcome to my world001'
parts = re.findall(r'\D+|\d+', string)
print(parts) # ['Hello, welcome to my world', '001']
tidyr separate column values into character and numeric using regex
You may use a (?<=[a-z])(?=[0-9]) lookaround-based regex with tidyr::separate:
> tidyr::separate(df, A, into = c("name", "value"), "(?<=[a-z])(?=[0-9])")
    name value
1    enc     0
2    enc    10
3    enc    25
4    enc   100
5  harab     0
6  harab    25
7  harab   100
8  requi     0
9  requi    25
10 requi   100
The (?<=[a-z])(?=[0-9]) pattern matches a location in the string right in between a lowercase ASCII letter ((?<=[a-z])) and a digit ((?=[0-9])). The (?<=...) is a positive lookbehind that requires the presence of some pattern immediately to the left of the current location, and (?=...) is a positive lookahead that requires the presence of its pattern immediately to the right of the current location. Both are zero-width, so no characters are consumed, and the letters and digits are kept intact when splitting.
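To make the zero-width behaviour concrete, here is a small Python sketch. The sample string is taken from the output above, and splitting on an empty match assumes Python 3.7 or later.

```python
import re

pattern = re.compile(r"(?<=[a-z])(?=[0-9])")
text = "harab25"

# Each match is empty: start == end, so nothing is consumed.
spans = [m.span() for m in pattern.finditer(text)]
pieces = pattern.split(text)

print(spans)   # [(5, 5)]
print(pieces)  # ['harab', '25']
```

The only match sits between "b" and "2". The position between "2" and "5" fails the lookbehind, so the number is not broken up.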
Alternatively, you may use extract:
extract(df, A, into = c("name", "value"), "^([a-z]+)(\\d+)$")
Output:
    name value
1    enc     0
2    enc    10
3    enc    25
4    enc   100
5  harab     0
6  harab    25
7  harab   100
8  requi     0
9  requi    25
10 requi   100
The ^([a-z]+)(\\d+)$ pattern matches:
^ - start of input
([a-z]+) - Capturing group 1 (column name): one or more lowercase ASCII letters
(\\d+) - Capturing group 2 (column value): one or more digits
$ - end of string.
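The capture-group approach translates directly to Python's re.fullmatch, which plays the role of the ^ and $ anchors. In this sketch the input values are reconstructed from the output shown above, "bad-row" is an invented non-matching example, and split_name_value is a hypothetical helper name.

```python
import re

# Values reconstructed from the output above, plus one invented bad row.
values = ["enc0", "enc10", "harab25", "requi100", "bad-row"]

# Group 1: lowercase ASCII letters. Group 2: digits.
pattern = re.compile(r"([a-z]+)(\d+)")

def split_name_value(value):
    """Return (name, value), or (None, None) if the whole string does not match."""
    m = pattern.fullmatch(value)
    return (m.group(1), m.group(2)) if m else (None, None)

rows = [split_name_value(v) for v in values]
print(rows)
# [('enc', '0'), ('enc', '10'), ('harab', '25'), ('requi', '100'), (None, None)]
```

Returning (None, None) for a non-matching string is a design choice of this sketch, similar to the NA that extract produces when the groups do not match.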
split a string into letters and digits in c
In your code you write current <= strlen(MMOC_cod), but you should iterate only while current is strictly less than strlen(MMOC_cod), since indexing starts at 0.
I prefer to use memcpy when I know the length:
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define BUFF_SIZE 256
int main(void)
{
    char s[BUFF_SIZE];
    char warehouse[BUFF_SIZE];
    char productNo[BUFF_SIZE];
    char qualifiers[BUFF_SIZE];
    printf("Hello World enter your MMOC like >ATL1203S14< \n");
    /* The field width (BUFF_SIZE - 1) keeps scanf from writing past the end of s. */
    if (scanf("%255s", s) != 1)
        return 1;
    size_t n = strlen(s);
    /* Leading letters: warehouse code. */
    size_t i = 0;
    while (isalpha((unsigned char)s[i])) {
        warehouse[i] = s[i];
        i++;
    }
    warehouse[i] = '\0';
    /* Following digits: product number. */
    size_t j = 0;
    while (isdigit((unsigned char)s[i]))
        productNo[j++] = s[i++];
    productNo[j] = '\0';
    /* Whatever remains: qualifiers. The length is known, so memcpy is enough. */
    memcpy(qualifiers, &s[i], n - i);
    qualifiers[n - i] = '\0';
    printf("warehouse: %s\n", warehouse);
    printf("product Number: %s\n", productNo);
    printf("qualifiers: %s\n", qualifiers);
    return 0;
}
Output:
warehouse: ATL
product Number: 1203
qualifiers: S14
NB: The %255s width limits the input to BUFF_SIZE - 1 characters, so a longer token is truncated instead of overflowing s. Checking strlen(s) after an unbounded scanf("%s", s) would come too late, because the overflow has already happened by then.
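For comparison, the same three-way split (leading letters, following digits, remainder) can be sketched in Python with a single regex. split_mmoc is a hypothetical helper name, and the field meanings are taken from the printf labels in the C code.

```python
import re

def split_mmoc(code):
    """Split into (leading letters, following digits, remainder)."""
    # Every group is optional, so fullmatch succeeds for any string.
    m = re.fullmatch(r"([A-Za-z]*)([0-9]*)(.*)", code, re.DOTALL)
    return m.groups()

warehouse, product_no, qualifiers = split_mmoc("ATL1203S14")
print(warehouse)   # ATL
print(product_no)  # 1203
print(qualifiers)  # S14
```

As in the C version, a string that does not start with letters simply yields an empty first field rather than an error.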
How to split decimal numbers followed by letters?
Using strsplit with positive lookahead/lookbehind. The [a-z%] denotes the range of letters from a to z as well as the % sign and should be expanded if there are other possibilities.
r1 <- do.call(rbind, strsplit(A, "(?<=\\d)(?=[a-z%])", perl=TRUE))
res1 <- setNames(as.data.frame(cbind(A, r1)), LETTERS[1:3])
res1
# A B C
# 1 -0.00023--0.00243unitincrease -0.00023--0.00243 unitincrease
# 2 -0.00176-0.02176pmol/Lincrease(replication) -0.00176-0.02176 pmol/Lincrease(replication)
# 3 0.00180-0.01780%varianceunitdecrease 0.00180-0.01780 %varianceunitdecrease
You may also want to get the numbers as separate numeric columns:
res2 <- type.convert(as.data.frame(
do.call(rbind, strsplit(A, "(?<=\\d)-|(?<=\\d)(?=[a-z%])", perl=TRUE))))
res2
# V1 V2 V3
# 1 -0.00023 -0.00243 unitincrease
# 2 -0.00176 0.02176 pmol/Lincrease(replication)
# 3 0.00180 0.01780 %varianceunitdecrease
where:
str(res2)
# 'data.frame': 3 obs. of 3 variables:
# $ V1: num -0.00023 -0.00176 0.0018
# $ V2: num -0.00243 0.02176 0.0178
# $ V3: Factor w/ 3 levels "%varianceunitdecrease",..: 3 2 1
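A rough Python equivalent of the second split, for readers working outside R. The input list is reconstructed from column A of the output above, and splitting on the empty digit/unit boundary assumes Python 3.7 or later.

```python
import re

# Input reconstructed from column A of the output above.
A = [
    "-0.00023--0.00243unitincrease",
    "-0.00176-0.02176pmol/Lincrease(replication)",
    "0.00180-0.01780%varianceunitdecrease",
]

# Split on a hyphen that follows a digit (the range separator),
# or on the zero-width boundary between a digit and a unit character.
splitter = re.compile(r"(?<=\d)-|(?<=\d)(?=[a-z%])")

rows = []
for item in A:
    low, high, unit = splitter.split(item)
    rows.append((float(low), float(high), unit))

for row in rows:
    print(row)
```

A leading minus sign is never treated as a separator because it is not preceded by a digit, which is how -0.00243 keeps its sign in the first row.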
How to split strings into text and number?
I would approach this by using re.match in the following way:
import re
match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    print(items)
# ('foofo', '21')
Split string on both sides of a number
One option is extract from tidyr:
library(tidyr)
library(dplyr)
df1 %>%
extract(data, into = c("first.letter", "number", "last.letter"),
"^([A-Z])(\\d+)([A-Z])$")
# first.letter number last.letter
#1 X 3 Y
#2 X 33 U
#3 Y 231 Z
Or with separate
df1 %>%
separate(data, into = c("first.letter", "number", "last.letter"),
sep= "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])")
# first.letter number last.letter
#1 X 3 Y
#2 X 33 U
#3 Y 231 Z
Or another option is strsplit and then rbind:
do.call(rbind, strsplit(df1$data,
"(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", perl = TRUE))
data
df1 <- structure(list(data = c("X3Y", "X33U", "Y231Z")),
class = "data.frame", row.names = c(NA, -3L))