R: Split String into Numeric and Return the Mean as a New Column in a Data Frame

You could use sapply to loop through the list returned by strsplit, handling each of the list elements:

sapply(strsplit(df$a, split = ", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
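
To store the result as a new column, as the question title asks, the same call can be assigned back to the data frame. This is a minimal sketch: the contents of df and the column name mean_a are assumptions chosen to reproduce the output above, and vapply is swapped in for sapply so the return type is checked.

```r
# Assumed example data; the original df is not shown in the answer
df <- data.frame(a = c("2, 3", "4, 6", "7, 8"), stringsAsFactors = FALSE)

# Split each string on ", ", convert the pieces to numeric, and average them
df$mean_a <- vapply(strsplit(df$a, ", ", fixed = TRUE),
                    function(x) mean(as.numeric(x)),
                    numeric(1))

df$mean_a
# [1] 2.5 5.0 7.5
```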

Split R string into individual characters

You could use

data.frame(Reduce(rbind, strsplit(df$V1, "")))

This returns

     X1 X2 X3 X4 X5 X6
init  g  g  g  g  c  c
X     c  c  c  c  t  t
X.1   t  t  t  t  t  t
X.2   a  a  a  a  a  a

Alternatively, do.call gives cleaner row names:

data.frame(do.call(rbind, strsplit(df$V1, "")))

which returns

  X1 X2 X3 X4 X5 X6
1  g  g  g  g  c  c
2  c  c  c  c  t  t
3  t  t  t  t  t  t
4  a  a  a  a  a  a
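
For reference, here is a self-contained sketch of the do.call approach. The contents of df$V1 are an assumption reconstructed from the printed output, and the approach presumes that all strings have the same length (otherwise rbind recycles the shorter vectors).

```r
# Assumed input, reconstructed from the output shown above
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

# strsplit with "" yields one character vector per string
chars <- strsplit(df$V1, "")

# Bind the vectors row-wise into a character matrix, one column per position
char_mat <- do.call(rbind, chars)
dim(char_mat)
# [1] 4 6

res <- data.frame(char_mat, stringsAsFactors = FALSE)
```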

Split a column, get the mean of the split columns, and update the result

Another option uses data.table:

library(data.table)
cols <- c("colA", "colB")
for(j in cols) {
  tmp <- vapply(strsplit(test.val[[j]], "-"), 
                FUN = function(i) mean(as.numeric(i)), 
                FUN.VALUE = numeric(1))
  set(test.val, j = j, value = tmp)
}
test.val
#   id colA colB
#1:  1  125   15
#2:  2  200   25
#3:  3  300   10

Given a vector

x <- c("100-150", "200", "300")

the result of strsplit is a list of character vectors

strsplit(x, "-")
#[[1]]
#[1] "100" "150"

#[[2]]
#[1] "200"

#[[3]]
#[1] "300"

We wrap this in vapply and calculate the mean of each element after converting it to numeric.

vapply(strsplit(x, "-"), function(x) mean(as.numeric(x)), numeric(1))
# [1] 125 200 300

We use this result to replace every column specified in cols using data.table's set function.
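
If data.table is not available, the same column-wise replacement can be sketched in base R. The data frame below is an assumption chosen to match the output shown above, and range_mean is a hypothetical helper name. Note that splitting on "-" would misread negative numbers.

```r
# Assumed input; only the result is shown in the original answer
test.val <- data.frame(id = 1:3,
                       colA = c("100-150", "200", "300"),
                       colB = c("10-20", "20-30", "10"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: mean of the "-"-separated numbers in each string
range_mean <- function(x) {
  vapply(strsplit(x, "-", fixed = TRUE),
         function(p) mean(as.numeric(p)),
         numeric(1))
}

cols <- c("colA", "colB")
# Replace each listed column with its numeric means
test.val[cols] <- lapply(test.val[cols], range_mean)
```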

Split columns in a dataframe into a column that contains text not numbers and a column that contains numbers not text in R

Here's a dplyr solution using regular expressions:

library(stringr)
library(dplyr)
df %>%
  mutate(
    a.text = gsub("(^|\\s)\\d+", "", a),
    a.num = str_extract_all(a, "\\d+"),
    b.text = gsub("(^|\\s)\\d+", "", b),
    b.num = str_extract_all(b, "\\d+") 
  ) %>% 
  select(c(4:7,3))
                              a.text a.num                     b.text b.num  c
1                 There are programs     5                       four        2
2  - adult programs,- youth programs  2, 3      we don't collect this        6
3                                       25  from us, more from others     5  5
4                                                                            8
5     there are a number of programs                                         2
6    other agencies run our programs                                        NA
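
The same text/number split can be sketched in base R with regmatches and gregexpr. The input vector here is an assumption for illustration, not the data from the question, and the numbers are collapsed into one string per row rather than kept as a list column.

```r
# Assumed example strings mixing text and numbers
a <- c("There are 5 programs", "2 adult, 3 youth", "25")

# Text part: drop every run of digits, plus the whitespace before it
a_text <- gsub("(^|\\s)\\d+", "", a)

# Numeric part: extract all digit runs and join them per element
a_num <- vapply(regmatches(a, gregexpr("\\d+", a)), toString, character(1))

a_num
# [1] "5"    "2, 3" "25"
```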

R: My data frame has 2 columns that have a string of numbers in each row, is there a way to split the string and add the values of each column?

Here are a couple of approaches.

This uses a function list_reduction from SOfun.

df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
                 B = c("5,6,7,8", "14,15,16,17,18"))
                 
## Grab `list_reduction` from "SOfun"
source("https://raw.githubusercontent.com/mrdwab/SOfun/master/R/list_reduction.R")

## Split the list
df_list <- lapply(df, function(x) type.convert(strsplit(as.character(x), ",", fixed = TRUE)))
df["C"] <- list_reduction(df_list, "+", flatten = TRUE)
df
#               A              B                  C
# 1       1,2,3,4        5,6,7,8       6, 8, 10, 12
# 2 9,10,11,12,13 14,15,16,17,18 23, 25, 27, 29, 31

This uses cSplit from "splitstackshape":

library(splitstackshape)
library(data.table)
cSplit(as.data.table(df, keep.rownames=TRUE), c("A", "B"), ",", "long")[
  , C := A + B][, lapply(.SD, toString), "rn"]
#    rn                 A                  B                  C
# 1:  1        1, 2, 3, 4         5, 6, 7, 8       6, 8, 10, 12
# 2:  2 9, 10, 11, 12, 13 14, 15, 16, 17, 18 23, 25, 27, 29, 31
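
A third option, sketched here in base R without extra packages, pairs up the split values with mapply. It assumes that A and B hold the same number of values in each row; otherwise the shorter vector would be recycled.

```r
df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
                 B = c("5,6,7,8", "14,15,16,17,18"),
                 stringsAsFactors = FALSE)

# Split both columns, add the pieces element-wise, and join them back up
df$C <- mapply(function(a, b) toString(as.numeric(a) + as.numeric(b)),
               strsplit(df$A, ",", fixed = TRUE),
               strsplit(df$B, ",", fixed = TRUE),
               USE.NAMES = FALSE)

df$C
# [1] "6, 8, 10, 12"       "23, 25, 27, 29, 31"
```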

How can I split a character string in a dataframe into multiple columns

You can use separate() from tidyr

tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
#   ID   x1  x2
# 1  1    < 0.1
# 2  2 <NA> 100
# 3  3    A 2.5
# 4  4 <NA> 200

If you absolutely need to remove the NA values, then you can do

tdy <- tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
tdy[is.na(tdy)] <- ""

and then we have

tdy
#   ID x1  x2
# 1  1  < 0.1
# 2  2    100
# 3  3  A 2.5
# 4  4    200
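
Without tidyr, a left-filled split can be sketched with strsplit. The input below is an assumption reconstructed from the output above, and the sketch only handles strings with at most one space.

```r
# Assumed input, reconstructed from the printed result
df <- data.frame(ID = 1:4,
                 x = c("< 0.1", "100", "A 2.5", "200"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$x, " ", fixed = TRUE)

# x1 is the prefix when there are two pieces, otherwise an empty string
df$x1 <- vapply(parts, function(p) if (length(p) > 1) p[1] else "", character(1))
# x2 is always the last piece
df$x2 <- vapply(parts, function(p) p[length(p)], character(1))
df$x <- NULL
```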

R dataframe: How to split by 2 columns and calculate the mean

You can use dplyr and tidyr for that. For example:

library(dplyr)
library(tidyr)
df %>%
  gather("Col","Numbers", C:length(.)) %>%
  group_by(A, B) %>%
  summarise(mean = mean(Numbers))
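
The same reshape-then-aggregate idea can be sketched in base R. The data frame is an assumption (two grouping columns A and B followed by numeric columns), since the question's data is not shown here.

```r
# Assumed input: A and B are grouping columns, the rest are numeric
df <- data.frame(A = c("x", "x", "y"),
                 B = c("p", "p", "q"),
                 C = c(1, 3, 5),
                 D = c(2, 4, 6),
                 stringsAsFactors = FALSE)

value_cols <- setdiff(names(df), c("A", "B"))

# Stack the numeric columns into one long "Numbers" column
long <- data.frame(A = rep(df$A, times = length(value_cols)),
                   B = rep(df$B, times = length(value_cols)),
                   Numbers = unlist(df[value_cols], use.names = FALSE))

# Mean of all stacked values within each A/B combination
res <- aggregate(Numbers ~ A + B, data = long, FUN = mean)
```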

How to get R to create new column (named from left part of string in old column), and then put right part of string from old column into new column

Starting with

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))

The naive approach would be

data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
#   COLOR
# 1   RED
# 2   RED
# 3  BLUE
# 4 GREEN
# 5  BLUE

But I'm assuming you have a more generic need. Let's assume that you have some more things to parse out of that, such as

quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
#     oldColumn1
# 1   COLOR: RED
# 2   COLOR: RED
# 3  COLOR: BLUE
# 4 COLOR: GREEN
# 5  COLOR: BLUE
# 6      SIZE: 1
# 7      SIZE: 3
# 8      SIZE: 5

then we can generalize it with

tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
#   COLOR SIZE
# 1   RED    1
# 2   RED    3
# 3  BLUE    5
# 4 GREEN <NA>
# 5  BLUE <NA>
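
To see what the intermediate key/value table looks like before reshaping, here is a small sketch on a shortened, assumed input. The trimws call is a defensive extra in case any whitespace survives the capture.

```r
# Shortened example input in the same "KEY: value" shape
x <- c("COLOR: RED", "COLOR: BLUE", "SIZE: 1")

# Capture the key (before the colon) and the value (after it)
tmp <- strcapture("(.*)\\s*:\\s*(.*)", x, list(k = "", v = ""))
tmp$v <- trimws(tmp$v)

# Group the values by key
by_key <- split(tmp$v, tmp$k)
```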

Edit: alternative with updated data:

do.call(cbind, lapply(dat, function(X) {
  nm <- sub(":.*", "", X[1])
  out <- data.frame(trimws(sub(".*:", "", X)))
  names(out) <- nm
  out
}))
#   COLOR   SIZE    DESIGNSTYLE
# 1   RED  LARGE         STYLED
# 2   RED MEDIUM ORIGINAL MAKER
# 3  BLUE XLARGE        COUTURE
# 4 GREEN MEDIUM        COUTURE
# 5  BLUE  SMALL         STYLED

Replace strings of numbers separated by commas with the median in R

We can split the 'a' column with strsplit on a comma followed by zero or more spaces (",\\s*"), loop over the resulting list, convert each element to numeric, take the median, and assign the result back to the same column:

df$a <- sapply(strsplit(df$a, ",\\s*"), function(x) median(as.numeric(x)))
df$a
#[1] 4 6 4 6
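
A reproducible sketch of the same step follows. The data frame is an assumption chosen to give the output above, with deliberately inconsistent spacing after the commas to show why the pattern uses \\s*; vapply is swapped in for sapply so the result is guaranteed to be numeric.

```r
# Assumed input with inconsistent spacing after the commas
df <- data.frame(a = c("3, 4, 5", "6", "2,4,6", "5,  7"),
                 b = 1:4,
                 stringsAsFactors = FALSE)

# Split on a comma plus any whitespace, then take the median of each row
df$a <- vapply(strsplit(df$a, ",\\s*"),
               function(x) median(as.numeric(x)),
               numeric(1))

df$a
# [1] 4 6 4 6
```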

Or, using the tidyverse, we can use separate_rows to split the 'a' column into multiple rows while converting the type, then group by 'b' and take the median:

library(dplyr)
library(tidyr)
df %>% 
     separate_rows(a, convert = TRUE) %>%
     group_by(b) %>% 
     summarise(a = median(a))

split values and then operate with them using R

Try this. Note: Added 'col.names' to suppress default handling of rownames.

x <- c("1", "2", "3", "2:3", "4", "5", "3:2")
datos <- data.frame(1:7, 1:7, x = x)
newframe <- cbind(datos[1:2],
                  read.table(text = as.character(datos[[3]]), sep = ":",
                             fill = TRUE, colClasses = "numeric",
                             col.names = c("V3", "V4")))

newframe
  X1.7 X1.7.1 V3 V4
1    1      1  1 NA
2    2      2  2 NA
3    3      3  3 NA
4    4      4  2  3
5    5      5  4 NA
6    6      6  5 NA
7    7      7  3  2
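
Once the values are split into numeric columns, they can be operated on like any other columns. As one possible example (the choice of a row sum and the column name total are assumptions, since the question's intended operation is not shown), missing second values can be skipped with na.rm:

```r
x <- c("1", "2", "3", "2:3", "4", "5", "3:2")

# Split on ":" into two numeric columns; short rows are padded with NA
parts <- read.table(text = x, sep = ":", fill = TRUE,
                    colClasses = "numeric", col.names = c("V3", "V4"))

# Row-wise sum, ignoring the NA padding
parts$total <- rowSums(parts, na.rm = TRUE)

parts$total
# [1] 1 2 3 5 4 5 5
```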
