R: Split String into Numeric and Return the Mean as a New Column in a Data Frame

You could use sapply to loop through the list returned by strsplit, handling each of the list elements:

sapply(strsplit(df$a, split = ", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
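To store the result as a new column, as the question title asks, the same call can be assigned back to the data frame. The following is a minimal sketch: the sample `df` and the column name `mean_a` are hypothetical, chosen only so that the result matches the output above, since the original data is not shown.

```r
# Hypothetical input: the original data frame is not shown in the source
df <- data.frame(a = c("2, 3", "4, 6", "7, 8"), stringsAsFactors = FALSE)

# vapply() is a stricter sapply(): it errors unless every result is one number
df$mean_a <- vapply(
  strsplit(df$a, split = ", ", fixed = TRUE),
  function(x) mean(as.numeric(x)),
  numeric(1)
)
df$mean_a
# [1] 2.5 5.0 7.5
```

Using `fixed = TRUE` treats the separator as a literal string rather than a regular expression, which is enough when the delimiter is always exactly a comma and one space.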

Split R string into individual characters

You could use

data.frame(Reduce(rbind, strsplit(df$V1, "")))

This returns

     X1 X2 X3 X4 X5 X6
init  g  g  g  g  c  c
X     c  c  c  c  t  t
X.1   t  t  t  t  t  t
X.2   a  a  a  a  a  a

or

data.frame(do.call(rbind, strsplit(df$V1, "")))

which returns

  X1 X2 X3 X4 X5 X6
1  g  g  g  g  c  c
2  c  c  c  c  t  t
3  t  t  t  t  t  t
4  a  a  a  a  a  a
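Both variants rely on every string having the same number of characters, because rbind() recycles shorter vectors rather than failing. The odd row names in the Reduce() version most likely come from rbind() being called pairwise with Reduce()'s internal accumulator, while do.call() passes all vectors in one call. A small sketch of the do.call() variant with an explicit length check; the input `df` is hypothetical and chosen to match the output above:

```r
# Hypothetical input chosen to match the output shown above
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

chars <- strsplit(df$V1, "")   # list of single-character vectors

# rbind() recycles shorter vectors, so make sure all lengths agree first
stopifnot(length(unique(lengths(chars))) == 1L)

res <- data.frame(do.call(rbind, chars), stringsAsFactors = FALSE)
dim(res)
# [1] 4 6
```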

Split a column, get the mean of the split columns, and update the result

Another option using data.table

library(data.table)
cols <- c("colA", "colB")
for (j in cols) {
  tmp <- vapply(strsplit(test.val[[j]], "-"),
                FUN = function(i) mean(as.numeric(i)),
                FUN.VALUE = numeric(1))
  set(test.val, j = j, value = tmp)
}
test.val
#    id colA colB
#1:  1  125   15
#2:  2  200   25
#3:  3  300   10

Given a vector

x <- c("100-150", "200", "300")

the result of strsplit is a list of character vectors

strsplit(x, "-")
#[[1]]
#[1] "100" "150"

#[[2]]
#[1] "200"

#[[3]]
#[1] "300"

We wrap this in vapply, which converts each character vector to numeric and then calculates its mean. The FUN.VALUE argument (numeric(1)) guarantees that every element yields exactly one number.

vapply(strsplit(x, "-"), function(x) mean(as.numeric(x)), numeric(1))
# [1] 125 200 300

We use this result to replace every column specified in cols using data.table's set function.
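If data.table is not available, roughly the same update can be written in base R. This is a sketch, not the answer's method: the sample `test.val` is hypothetical (built to be consistent with the output above) and `range_mean` is a helper name introduced here for illustration.

```r
# Hypothetical data consistent with the output shown above
test.val <- data.frame(id = 1:3,
                       colA = c("100-150", "200", "300"),
                       colB = c("10-20", "25", "10"),
                       stringsAsFactors = FALSE)
cols <- c("colA", "colB")

# Mean of the "-"-separated numbers in each string (hypothetical helper)
range_mean <- function(x) {
  vapply(strsplit(x, "-", fixed = TRUE),
         function(i) mean(as.numeric(i)),
         numeric(1))
}

# Replace each selected column with its per-row mean
test.val[cols] <- lapply(test.val[cols], range_mean)
test.val$colA
# [1] 125 200 300
```

Unlike set(), this copies the data frame columns instead of updating them by reference, so the data.table version is likely preferable for large tables. Either way, splitting on "-" assumes the values are never negative.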

Split columns in a dataframe into a column that contains text not numbers and a column that contains numbers not text in R

Here's a dplyr solution using regular expressions (str_extract_all comes from stringr):

library(stringr)
library(dplyr)
df %>%
  mutate(
    a.text = gsub("(^|\\s)\\d+", "", a),
    a.num = str_extract_all(a, "\\d+"),
    b.text = gsub("(^|\\s)\\d+", "", b),
    b.num = str_extract_all(b, "\\d+")
  ) %>%
  select(c(4:7, 3))
a.text a.num b.text b.num c
1 There are programs 5 four 2
2 - adult programs,- youth programs 2, 3 we don't collect this 6
3 25 from us, more from others 5 5
4 8
5 there are a number of programs 2
6 other agencies run our programs NA
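The two regular expressions can be tried out on a plain vector without any packages. The sketch below uses base R's regmatches() and gregexpr() in place of str_extract_all(); the input strings are hypothetical, and the trimws() and toString() steps are additions made here to keep the result easy to read.

```r
# Hypothetical input strings mixing text and numbers
a <- c("There are 5 programs", "2 adult, 3 youth")

# Text part: drop every run of digits that starts the string or follows a space
a_text <- trimws(gsub("(^|\\s)\\d+", "", a))

# Number part: all runs of digits, one character vector per input string
a_num <- regmatches(a, gregexpr("\\d+", a))

# Collapse each set of matches so it fits in an ordinary character column
a_num_str <- vapply(a_num, toString, character(1))

a_text
# [1] "There are programs" "adult, youth"
a_num_str
# [1] "5"    "2, 3"
```

Note that the removal pattern only matches digits preceded by whitespace or the start of the string, while the extraction pattern matches digits anywhere, so the two columns can disagree for input such as "abc123".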

R: My data frame has 2 columns that have a string of numbers in each row, is there a way to split the string and add the values of each column?

Here are a couple of approaches.

This uses a function list_reduction from SOfun.

df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
B = c("5,6,7,8", "14,15,16,17,18"))

## Grab `list_reduction` from "SOfun"
source("https://raw.githubusercontent.com/mrdwab/SOfun/master/R/list_reduction.R")

## Split the list
df_list <- lapply(df, function(x) type.convert(strsplit(as.character(x), ",", fixed = TRUE)))
df["C"] <- list_reduction(df_list, "+", flatten = TRUE)
df
# A B C
# 1 1,2,3,4 5,6,7,8 6, 8, 10, 12
# 2 9,10,11,12,13 14,15,16,17,18 23, 25, 27, 29, 31

This uses cSplit from "splitstackshape":

library(splitstackshape)
library(data.table)
cSplit(as.data.table(df, keep.rownames=TRUE), c("A", "B"), ",", "long")[
, C := A + B][, lapply(.SD, toString), "rn"]
# rn A B C
# 1: 1 1, 2, 3, 4 5, 6, 7, 8 6, 8, 10, 12
# 2: 2 9, 10, 11, 12, 13 14, 15, 16, 17, 18 23, 25, 27, 29, 31
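If neither SOfun nor splitstackshape is an option, the same element-wise addition can be sketched in base R with Map(). The helper name `to_numbers` is introduced here for illustration, and the sketch assumes that the two strings in each row contain the same number of values.

```r
df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
                 B = c("5,6,7,8", "14,15,16,17,18"),
                 stringsAsFactors = FALSE)

# Split a comma-separated string column into a list of numeric vectors
to_numbers <- function(x) lapply(strsplit(x, ",", fixed = TRUE), as.numeric)

# Add the two lists row by row, then collapse each sum back into a string
sums <- Map(`+`, to_numbers(df$A), to_numbers(df$B))
df$C <- vapply(sums, toString, character(1))
df$C
# "6, 8, 10, 12" and "23, 25, 27, 29, 31"
```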

How can I split a character string in a dataframe into multiple columns

You can use separate() from tidyr. With fill = "left", rows that have fewer pieces than columns are padded with NA on the left:

tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
# ID x1 x2
# 1 1 < 0.1
# 2 2 <NA> 100
# 3 3 A 2.5
# 4 4 <NA> 200

If you absolutely need to remove the NA values, then you can do

tdy <- tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
tdy[is.na(tdy)] <- ""

and then we have

tdy
# ID x1 x2
# 1 1 < 0.1
# 2 2 100
# 3 3 A 2.5
# 4 4 200
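For comparison, a base R sketch of the same left-filled split. The input `df` is hypothetical (reconstructed from the output above), and the code assumes no value contains more than one space.

```r
# Hypothetical input reconstructed from the output shown above
df <- data.frame(ID = 1:4,
                 x = c("< 0.1", "100", "A 2.5", "200"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$x, " ", fixed = TRUE)

# Left-pad each split with NA so every row has exactly two pieces
# (assumes at most one space per value)
padded <- lapply(parts, function(p) c(rep(NA_character_, 2L - length(p)), p))

split_cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(split_cols) <- c("x1", "x2")
out <- cbind(df["ID"], split_cols)
out$x1
# [1] "<" NA  "A" NA
```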

R dataframe: How to split by 2 columns and calculate the mean

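You can use dplyr for that, together with gather() from tidyr to reshape the value columns (from C to the last column) into long format first:

library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
df %>%
  gather("Col", "Numbers", C:length(.)) %>%
  group_by(A, B) %>%
  summarise(mean = mean(Numbers))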
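The same reshape-then-average idea can be sketched in base R with aggregate(). The data frame below is hypothetical, since the question's data is not shown; it assumes that A and B are the grouping columns and that every column from the third onward is numeric.

```r
# Hypothetical data: A and B define the groups, the remaining columns hold values
df <- data.frame(A = c("x", "x", "y"),
                 B = c("p", "p", "q"),
                 C = c(1, 3, 10),
                 D = c(5, 7, 20),
                 stringsAsFactors = FALSE)

value_cols <- names(df)[3:ncol(df)]

# Long format: one row per (A, B, value), like gather() produces
long <- data.frame(A = rep(df$A, times = length(value_cols)),
                   B = rep(df$B, times = length(value_cols)),
                   Numbers = unlist(df[value_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Mean of all values within each (A, B) combination
res <- aggregate(Numbers ~ A + B, data = long, FUN = mean)
res$Numbers
# [1]  4 15
```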

How to get R to create new column (named from left part of string in old column), and then put right part of string from old column into new column

Starting with

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))

The naive approach would be

data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
# COLOR
# 1 RED
# 2 RED
# 3 BLUE
# 4 GREEN
# 5 BLUE

But I'm assuming you have a more generic need. Let's assume that you have some more things to parse out of that, such as

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
# oldColumn1
# 1 COLOR: RED
# 2 COLOR: RED
# 3 COLOR: BLUE
# 4 COLOR: GREEN
# 5 COLOR: BLUE
# 6 SIZE: 1
# 7 SIZE: 3
# 8 SIZE: 5

then we can generalize it with

tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
# COLOR SIZE
# 1 RED 1
# 2 RED 3
# 3 BLUE 5
# 4 GREEN <NA>
# 5 BLUE <NA>
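If reshape2 is not installed, the wide table can also be assembled in base R by splitting the values by key and padding the shorter groups with NA. This is a sketch of an alternative, not the answer's method; the vector `x` repeats the sample data from above.

```r
x <- c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE",
       "SIZE: 1", "SIZE: 3", "SIZE: 5")

# Capture the key (before the colon) and the value (after it)
tmp <- strcapture("(.*)\\s*:\\s*(.*)", x, list(k = "", v = ""))

# One character vector of values per key
vals <- split(tmp$v, tmp$k)

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(vals))
wide <- as.data.frame(lapply(vals, `length<-`, n), stringsAsFactors = FALSE)
wide$SIZE
# [1] "1" "3" "5" NA  NA
```

As in the dcast() version, values are paired up purely by their position within each key, so the rows carry no meaning beyond that ordering.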

--

Edit: alternative with updated data:

do.call(cbind, lapply(dat, function(X) {
  nm <- sub(":.*", "", X[1])
  out <- data.frame(trimws(sub(".*:", "", X)))
  names(out) <- nm
  out
}))
# COLOR SIZE DESIGNSTYLE
# 1 RED LARGE STYLED
# 2 RED MEDIUM ORIGINAL MAKER
# 3 BLUE XLARGE COUTURE
# 4 GREEN MEDIUM COUTURE
# 5 BLUE SMALL STYLED

Replace strings of numbers separated by commas with the median in R

We can split the 'a' column with strsplit on a comma followed by zero or more spaces (,\\s*), loop over the resulting list, convert each element to numeric, take the median, and assign the result back to the same column:

df$a <- sapply(strsplit(df$a, ",\\s*"), function(x) median(as.numeric(x)))
df$a
#[1] 4 6 4 6
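Since the mean and median variants differ only in the summary function, the pattern can be wrapped in a small helper. The function name `summarise_split` and the sample data are hypothetical; the data was chosen so that the medians match the output above.

```r
# Hypothetical helper: split each string, convert to numeric, summarise with `fun`
summarise_split <- function(x, pattern, fun) {
  vapply(strsplit(x, pattern),
         function(piece) fun(as.numeric(piece)),
         numeric(1))
}

# Hypothetical data with inconsistent spacing after the commas
df <- data.frame(a = c("2, 4, 6", "5,7", "4", "1, 6, 11"),
                 b = 1:4,
                 stringsAsFactors = FALSE)

df$a_median <- summarise_split(df$a, ",\\s*", median)
df$a_mean   <- summarise_split(df$a, ",\\s*", mean)
df$a_median
# [1] 4 6 4 6
```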

Or, using the tidyverse, we can use separate_rows to split the 'a' column into one row per value, converting the type along the way, and then take the median within each group (this assumes 'b' identifies the original rows):

library(dplyr)
library(tidyr)
df %>%
  separate_rows(a, convert = TRUE) %>%
  group_by(b) %>%
  summarise(a = median(a))

split values and then operate with them using R

Try this. Note: Added 'col.names' to suppress default handling of rownames.

x <- c("1", "2", "3", "2:3", "4", "5", "3:2")
datos <- data.frame(1:7, 1:7, x = x)
newframe <- cbind(datos[1:2],
                  read.table(text = as.character(datos[[3]]), sep = ":",
                             fill = TRUE, colClasses = "numeric",
                             col.names = c("V3", "V4")))

newframe
#   X1.7 X1.7.1 V3 V4
# 1    1      1  1 NA
# 2    2      2  2 NA
# 3    3      3  3 NA
# 4    4      4  2  3
# 5    5      5  4 NA
# 6    6      6  5 NA
# 7    7      7  3  2
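Once the values sit in separate numeric columns, operating on them is ordinary column arithmetic. As one possible follow-up (the question's intended operation is not stated, so the row sum and the column name `total` are assumptions), the sketch below adds the split values per row and treats a missing second value as zero:

```r
x <- c("1", "2", "3", "2:3", "4", "5", "3:2")

# Same split as above: two numeric columns, NA where there is no second value
parts <- read.table(text = x, sep = ":", fill = TRUE,
                    colClasses = "numeric", col.names = c("V3", "V4"))

# Row-wise sum, ignoring the NA placeholders
parts$total <- unname(rowSums(parts[c("V3", "V4")], na.rm = TRUE))
parts$total
# [1] 1 2 3 5 4 5 5
```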

