R: split string into numeric and return the mean as a new column in a data frame
You could use sapply to loop through the list returned by strsplit, handling each of the list elements:
sapply(strsplit(df$a, split = ", "), function(x) mean(as.numeric(x)))
# [1] 2.5 5.0 7.5
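Since df is not shown in the answer, here is a minimal self-contained sketch. The sample data is an assumption, chosen so the result matches the values above, and mean_a is a hypothetical name for the new column.

```r
# Hypothetical sample data; the original df is not shown
df <- data.frame(a = c("2, 3", "4, 6", "7, 8"), stringsAsFactors = FALSE)

# Split each string on ", ", convert the pieces to numeric, and average them
df$mean_a <- sapply(strsplit(df$a, split = ", "),
                    function(x) mean(as.numeric(x)))
df$mean_a
```

If the column is a factor rather than character, wrapping it in as.character() before strsplit avoids a "non-character argument" error.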
Split R string into individual characters
You could use
data.frame(Reduce(rbind, strsplit(df$V1, "")))
This returns
X1 X2 X3 X4 X5 X6
init g g g g c c
X c c c c t t
X.1 t t t t t t
X.2 a a a a a a
or
data.frame(do.call(rbind, strsplit(df$V1, "")))
which returns
X1 X2 X3 X4 X5 X6
1 g g g g c c
2 c c c c t t
3 t t t t t t
4 a a a a a a
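The two calls build the same columns and differ only in the row names they produce. A self-contained sketch of the do.call variant follows; the input strings are an assumption reconstructed from the output above.

```r
# Hypothetical input, assumed to match the strings behind the output above
df <- data.frame(V1 = c("ggggcc", "cccctt", "tttttt", "aaaaaa"),
                 stringsAsFactors = FALSE)

# strsplit() with an empty pattern splits each string into single characters
chars <- strsplit(as.character(df$V1), "")

# do.call(rbind, ...) stacks the character vectors into a matrix, one row per string
m <- do.call(rbind, chars)
res <- data.frame(m)
dim(res)
```

This relies on all strings having the same length; with unequal lengths rbind recycles the shorter vectors and emits a warning.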
Split a column, get the mean of the split columns, and update the result
Another option using data.table
library(data.table)
cols <- c("colA", "colB")
for (j in cols) {
  tmp <- vapply(strsplit(test.val[[j]], "-"),
                FUN = function(i) mean(as.numeric(i)),
                FUN.VALUE = numeric(1))
  set(test.val, j = j, value = tmp)
}
test.val
# id colA colB
#1: 1 125 15
#2: 2 200 25
#3: 3 300 10
Given a vector
x <- c("100-150", "200", "300")
the result of strsplit is a list of character vectors:
strsplit(x, "-")
#[[1]]
#[1] "100" "150"
#[[2]]
#[1] "200"
#[[3]]
#[1] "300"
We wrap this in vapply and calculate the mean of each element after converting each vector to numeric:
vapply(strsplit(x, "-"), function(x) mean(as.numeric(x)), numeric(1))
# [1] 125 200 300
We use this result to replace every column specified in cols using data.table's set function.
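As an alternative to the loop, the same update can be sketched with data.table's := and .SDcols. The input table is an assumption chosen to reproduce the output shown earlier, and range_mean is a hypothetical helper name.

```r
library(data.table)

# Hypothetical input, chosen to reproduce the output shown above
test.val <- data.table(id   = 1:3,
                       colA = c("100-150", "200", "300"),
                       colB = c("10-20", "25", "10"))
cols <- c("colA", "colB")

# Mean of the "-"-separated numbers in each string
range_mean <- function(x) {
  vapply(strsplit(x, "-"), function(i) mean(as.numeric(i)), numeric(1))
}

# Update all listed columns by reference in one step
test.val[, (cols) := lapply(.SD, range_mean), .SDcols = cols]
test.val
```

Both forms modify test.val by reference; the loop with set is usually preferred when there are very many columns, while := tends to read more compactly.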
Split columns in a dataframe into a column that contains text not numbers and a column that contains numbers not text in R
Here's a dplyr solution using regular expressions:
library(stringr)
library(dplyr)
df %>%
  mutate(
    a.text = gsub("(^|\\s)\\d+", "", a),
    a.num = str_extract_all(a, "\\d+"),
    b.text = gsub("(^|\\s)\\d+", "", b),
    b.num = str_extract_all(b, "\\d+")
  ) %>%
  select(c(4:7, 3))
a.text a.num b.text b.num c
1 There are programs 5 four 2
2 - adult programs,- youth programs 2, 3 we don't collect this 6
3 25 from us, more from others 5 5
4 8
5 there are a number of programs 2
6 other agencies run our programs NA
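The same text/number split can be sketched in base R, which may help if stringr is not available. This is not the answer's method but a swapped-in equivalent using regmatches and gregexpr; the input vector is hypothetical.

```r
# Hypothetical input column; the original df is not shown
a <- c("There are 5 programs", "2 - adult programs, 3 - youth programs", "none")

# Text part: drop every run of digits (and the whitespace before it)
a_text <- gsub("(^|\\s)\\d+", "", a)

# Number part: regmatches() + gregexpr() return a list with all digit runs per string
a_num_list <- regmatches(a, gregexpr("\\d+", a))

# Collapse each list element to one string so it fits in an ordinary column
a_num <- vapply(a_num_list, toString, character(1))

data.frame(a_text, a_num)
```

Note that str_extract_all also returns a list, so a.num and b.num in the dplyr version are list columns; collapsing with toString as above is one way to flatten them.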
R: My data frame has 2 columns that have a string of numbers in each row, is there a way to split the string and add the values of each column?
Here are a couple of approaches.
This uses the function list_reduction from SOfun:
df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
B = c("5,6,7,8", "14,15,16,17,18"))
## Grab `list_reduction` from "SOfun"
source("https://raw.githubusercontent.com/mrdwab/SOfun/master/R/list_reduction.R")
## Split the list
df_list <- lapply(df, function(x) type.convert(strsplit(as.character(x), ",", fixed = TRUE)))
df["C"] <- list_reduction(df_list, "+", flatten = TRUE)
df
# A B C
# 1 1,2,3,4 5,6,7,8 6, 8, 10, 12
# 2 9,10,11,12,13 14,15,16,17,18 23, 25, 27, 29, 31
This uses cSplit from "splitstackshape":
library(splitstackshape)
library(data.table)
cSplit(as.data.table(df, keep.rownames=TRUE), c("A", "B"), ",", "long")[
, C := A + B][, lapply(.SD, toString), "rn"]
# rn A B C
# 1: 1 1, 2, 3, 4 5, 6, 7, 8 6, 8, 10, 12
# 2: 2 9, 10, 11, 12, 13 14, 15, 16, 17, 18 23, 25, 27, 29, 31
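If neither package is available, a base R sketch with mapply gives the same column C. This is an alternative technique rather than either approach above, and to_num is a hypothetical helper name.

```r
df <- data.frame(A = c("1,2,3,4", "9,10,11,12,13"),
                 B = c("5,6,7,8", "14,15,16,17,18"),
                 stringsAsFactors = FALSE)

# Split a comma-separated string column into a list of numeric vectors
to_num <- function(x) lapply(strsplit(x, ",", fixed = TRUE), as.numeric)

# Add the vectors pairwise, row by row, and paste each sum back into one string
df$C <- mapply(function(a, b) toString(a + b), to_num(df$A), to_num(df$B))
df
```

This assumes A and B hold the same number of values in each row; otherwise the shorter vector is recycled.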
How can I split a character string in a dataframe into multiple columns
You can use separate() from tidyr:
tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
# ID x1 x2
# 1 1 < 0.1
# 2 2 <NA> 100
# 3 3 A 2.5
# 4 4 <NA> 200
If you absolutely need to remove the NA values, then you can do
tdy <- tidyr::separate(df, x, c("x1", "x2"), " ", fill = "left")
tdy[is.na(tdy)] <- ""
and then we have
tdy
# ID x1 x2
# 1 1 < 0.1
# 2 2 100
# 3 3 A 2.5
# 4 4 200
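For comparison, the same split can be sketched in base R with sub and grepl. The input data frame is an assumption reconstructed from the output above, and this is an alternative to separate(), not part of the original answer.

```r
# Hypothetical input, assumed to match the output shown above
df <- data.frame(ID = 1:4, x = c("< 0.1", "100", "A 2.5", "200"),
                 stringsAsFactors = FALSE)

has_prefix <- grepl(" ", df$x, fixed = TRUE)

res <- data.frame(
  ID = df$ID,
  # Everything before the space, or "" when there is no space
  x1 = ifelse(has_prefix, sub(" .*", "", df$x), ""),
  # Everything after the last space (the whole string if there is none)
  x2 = sub(".* ", "", df$x),
  stringsAsFactors = FALSE
)
res
```

Here x2 stays character; as.numeric(res$x2) converts it if the values are needed as numbers.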
R dataframe: How to split by 2 columns and calculate the mean
You can use dplyr for that, together with gather() from tidyr:
library(dplyr)
library(tidyr)
df %>%
  gather("Col", "Numbers", C:length(.)) %>%
  group_by(A, B) %>%
  summarise(mean = mean(Numbers))
Best,
Colin
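In current tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same idea follows; the data frame is hypothetical, with A and B as grouping columns and every other column treated as numeric.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two grouping columns (A, B) and numeric columns from C onward
df <- data.frame(A = c("x", "x", "y"),
                 B = c("p", "p", "q"),
                 C = c(1, 2, 3),
                 D = c(3, 4, 5))

res <- df %>%
  # Stack every column except A and B into name/value pairs
  pivot_longer(-c(A, B), names_to = "Col", values_to = "Numbers") %>%
  group_by(A, B) %>%
  # One mean per (A, B) combination
  summarise(mean = mean(Numbers), .groups = "drop")
res
```

The result has one row per combination of A and B, with the mean taken over all stacked values in that group.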
How to get R to create new column (named from left part of string in old column), and then put right part of string from old column into new column
Starting with
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE")), class = "data.frame", row.names = c(NA, -5L))
The naive approach would be
data.frame(COLOR = trimws(sub("COLOR:", "", quux$oldColumn1)))
# COLOR
# 1 RED
# 2 RED
# 3 BLUE
# 4 GREEN
# 5 BLUE
But I'm assuming you have a more generic need. Let's assume that you have some more things to parse out of that, such as
quux <- structure(list(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE", "COLOR: GREEN", "COLOR: BLUE", "SIZE: 1", "SIZE: 3", "SIZE: 5")), class = "data.frame", row.names = c(NA, -8L))
quux
# oldColumn1
# 1 COLOR: RED
# 2 COLOR: RED
# 3 COLOR: BLUE
# 4 COLOR: GREEN
# 5 COLOR: BLUE
# 6 SIZE: 1
# 7 SIZE: 3
# 8 SIZE: 5
then we can generalize it with
tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k="", v=""))
tmp$ign <- ave(rep(1L, nrow(tmp)), tmp$k, FUN = seq_along)
reshape2::dcast(tmp, ign ~ k, value.var = "v")[,-1,drop=FALSE]
# COLOR SIZE
# 1 RED 1
# 2 RED 3
# 3 BLUE 5
# 4 GREEN <NA>
# 5 BLUE <NA>
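If reshape2 is not installed, the reshaping step can be sketched in base R with split and padding. This is a swapped-in alternative to dcast, assuming the same quux data as above.

```r
quux <- data.frame(oldColumn1 = c("COLOR: RED", "COLOR: RED", "COLOR: BLUE",
                                  "COLOR: GREEN", "COLOR: BLUE",
                                  "SIZE: 1", "SIZE: 3", "SIZE: 5"),
                   stringsAsFactors = FALSE)

# Capture the key (before the colon) and the value (after it)
tmp <- strcapture("(.*)\\s*:\\s*(.*)", quux$oldColumn1, list(k = "", v = ""))

# Group the values by key, then pad the shorter groups with NA
by_key <- split(tmp$v, tmp$k)
n <- max(lengths(by_key))
out <- as.data.frame(lapply(by_key, `length<-`, n), stringsAsFactors = FALSE)
out
```

All values stay character here; columns such as SIZE can be converted afterwards with type.convert or as.numeric.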
--
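Edit: alternative with updated data (dat is assumed to be a data frame in which every column holds "KEY: value" strings sharing one key):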
do.call(cbind, lapply(dat, function(X) {
nm <- sub(":.*", "", X[1])
out <- data.frame(trimws(sub(".*:", "", X)))
names(out) <- nm
out
}))
# COLOR SIZE DESIGNSTYLE
# 1 RED LARGE STYLED
# 2 RED MEDIUM ORIGINAL MAKER
# 3 BLUE XLARGE COUTURE
# 4 GREEN MEDIUM COUTURE
# 5 BLUE SMALL STYLED
Replace strings of numbers separated by commas with the median in R
We can split the 'a' column with strsplit on , followed by zero or more spaces (\\s*), loop over the list, convert to numeric, get the median, and assign it back to the same column:
df$a <- sapply(strsplit(df$a, ",\\s*"), function(x) median(as.numeric(x)))
df$a
#[1] 4 6 4 6
Or using tidyverse, we can use separate_rows to split the 'a' column and expand the rows while converting the type, then take the median by group:
library(dplyr)
library(tidyr)
df %>%
  separate_rows(a, convert = TRUE) %>%
  group_by(b) %>%
  summarise(a = median(a))
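A self-contained sketch of the tidyverse route is shown here. The data frame is hypothetical, chosen so the medians match the output above, and it assumes column b identifies each original row uniquely.

```r
library(dplyr)
library(tidyr)

# Hypothetical input, chosen so the medians match the output shown above
df <- data.frame(a = c("3, 4, 5", "6", "2,4,6", "5, 7"),
                 b = c("w", "x", "y", "z"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # One row per number; convert = TRUE turns the pieces into numbers
  separate_rows(a, convert = TRUE) %>%
  group_by(b) %>%
  # as.numeric() keeps the result type the same for every group
  summarise(a = median(as.numeric(a)))
res$a
```

If b is not unique per row, values from different rows sharing the same b would be pooled into one median, so adding a row id before splitting may be safer.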
split values and then operate with them using R
Try this. Note: Added 'col.names' to suppress default handling of rownames.
x <- c("1", "2", "3", "2:3", "4", "5", "3:2")
datos <- data.frame(1:7, 1:7, x = x)
newframe <- cbind(datos[1:2],
                  read.table(text = as.character(datos[[3]]), sep = ":",
                             fill = TRUE, colClasses = "numeric",
                             col.names = c("V3", "V4")))
> newframe
X1.7 X1.7.1 V3 V4
1 1 1 1 NA
2 2 2 2 NA
3 3 3 3 NA
4 4 4 2 3
5 5 5 4 NA
6 6 6 5 NA
7 7 7 3 2
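Once the values are in numeric columns, operating on them is ordinary column arithmetic. The sketch below sums the pieces of each value; the choice of operation is an assumption, since the question's goal is not shown here.

```r
x <- c("1", "2", "3", "2:3", "4", "5", "3:2")

# Parse the strings as a two-column, ":"-separated table; short rows are filled with NA
parts <- read.table(text = x, sep = ":", fill = TRUE,
                    colClasses = "numeric", col.names = c("V3", "V4"))

# Example operation (an assumption about the goal): sum the pieces of each value
total <- rowSums(parts, na.rm = TRUE)
total
```

Swapping rowSums for rowMeans, or for an expression such as parts$V3 * parts$V4, follows the same pattern.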