Using strsplit within dplyr::mutate (without tibble::data_frame) raises Evaluation error: non-character argument
The problem you encountered arises because the string column was automatically converted to a factor, so you cannot apply strsplit() to it: strsplit() only accepts character vectors. My solution simply converts MediaName to character type first.
library(dplyr)
df <- df %>%
  # as.character() on a factor returns its labels, not the integer codes
  dplyr::mutate(MediaName = as.character(MediaName)) %>%
  dplyr::mutate(TrialId = ifelse(Phase == "Familiarisation",
                                 sapply(strsplit(MediaName, "_"), "[", 2),
                                 sapply(strsplit(MediaName, "_"), "[", 1)))
solution <- c("A1", "B2", "A2", "B1", "A1", "B2", "A2", "B1", "HC", "TC", "RC")
identical(solution, df$TrialId)
[1] TRUE
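To see the failure and the fix in isolation, here is a minimal sketch. The data frame `df_demo` and its values are made up for illustration, and `stringsAsFactors = TRUE` is set explicitly because it has not been the default since R 4.0.0.

```r
# Hypothetical data; stringsAsFactors = TRUE reproduces the old default behaviour
df_demo <- data.frame(MediaName = c("A1_B2", "HC_x"), stringsAsFactors = TRUE)

is.factor(df_demo$MediaName)
#> [1] TRUE

# strsplit() refuses a factor; capture the error message instead of stopping
err_msg <- tryCatch(strsplit(df_demo$MediaName, "_"),
                    error = function(e) conditionMessage(e))
err_msg  # the message complains about a non-character argument

# Converting to character first makes the split work
parts <- strsplit(as.character(df_demo$MediaName), "_")
first_part <- sapply(parts, "[", 1)
first_part
#> [1] "A1" "HC"
```

On R 4.0.0 or later, columns created by data.frame() stay character unless you ask for factors, so the conversion step is only needed for data that already contains factors.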
Create new column with dplyr mutate and substring of existing column
You could use stringr::str_extract:
library(stringr)
df %>%
dplyr::mutate(new_id = str_extract(id, "[^_]+$"))
#> id x new_id
#> 1 abcd_123_ABC 1 ABC
#> 2 abc_5234_NHYK 2 NHYK
The regex says: match one or more (+) of the characters that aren't _ (the negating [^ ]), followed by end of string ($).
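The same pattern can be tried outside a data frame. The following is a small sketch in base R, assuming the two example ids from the output above; `regmatches()`/`regexpr()` and `sub()` are swapped in here for illustration and are not part of the original answer.

```r
ids <- c("abcd_123_ABC", "abc_5234_NHYK")

# Same regex as above: the run of non-underscore characters at the end of the string
last_piece <- regmatches(ids, regexpr("[^_]+$", ids))
last_piece
#> [1] "ABC"  "NHYK"

# Alternative: delete everything up to and including the last underscore
also_last <- sub(".*_", "", ids)
also_last
#> [1] "ABC"  "NHYK"
```

One difference worth knowing: regmatches() silently drops elements that do not match, whereas str_extract() returns NA for them, which is usually what you want inside mutate().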
str_split for column values and then turn it into vector in R
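We can loop-apply unlist over the columns to get a list of vectors. Note that separate_rows() and pivot_wider() come from tidyr, so it needs to be loaded as well.
library(dplyr)
library(tidyr)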
dat %>%
separate_rows(everything(), sep = "/")%>%
pivot_wider(names_from = ID, values_from = gene_ids, values_fn = list)%>%
lapply(unlist)
$A
[1] "101739" "20382" "13006" "212377" "114714" "66622" "140917"
$B
[1] "75717" "103573" "14852" "18141" "12567" "26429" "20842" "17975" "12545"
How to split a number into its digits in R
demog <- data.frame(socioEcon = c(12,34))
library(dplyr)
demog %>%
mutate(socioEcon = as.character(socioEcon)) %>%
  tidyr::separate(socioEcon, c("A", "B"), sep = 1, remove = FALSE)
socioEcon A B
1 12 1 2
2 34 3 4
Adding multiple columns in a dplyr mutate call
You can use separate() from tidyr in combination with dplyr:
tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)
x y y1 y2
1 1 BAR.baz BAR baz
2 2 FOO.foo FOO foo
3 3 BAZ.baz BAZ baz
4 4 BAZ.foo BAZ foo
5 5 BAZ.bar BAZ bar
6 6 FOO.baz FOO baz
7 7 BAR.bar BAR bar
8 8 BAZ.baz BAZ baz
9 9 FOO.bar FOO bar
10 10 BAR.foo BAR foo
Setting remove=TRUE (the default) will remove column y from the result.
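As a quick check of the remove argument, here is a minimal sketch. The two-row data frame `tst_demo` is a made-up stand-in for tst, which is not defined in this answer.

```r
library(tidyr)

# Hypothetical stand-in for tst
tst_demo <- data.frame(x = 1:2, y = c("BAR.baz", "FOO.foo"))

# remove = FALSE keeps the original column next to the new ones
kept <- separate(tst_demo, y, c("y1", "y2"), sep = "\\.", remove = FALSE)
names(kept)
#> [1] "x"  "y"  "y1" "y2"

# remove = TRUE drops y after splitting
dropped <- separate(tst_demo, y, c("y1", "y2"), sep = "\\.", remove = TRUE)
names(dropped)
#> [1] "x"  "y1" "y2"
```

The separator is written as "\\." because sep is interpreted as a regular expression, where an unescaped dot would match any character.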
mutate two new columns based on splitting another
dplyr doesn't handle elements of list columns in the same way as it does vector columns, so call dplyr::rowwise() before you mutate/unlist:
library(dplyr)
library(stringr)
orig <- tibble(sourceMedium = c('apples / pears', 'red / blue', 'green / grey',
'wet / dry', 'ear / nose', 'mac / linux'))
wrangled <- orig %>%
dplyr::mutate(tempcol = stringr::str_split(sourceMedium, ' / ')) %>%
dplyr::rowwise() %>%
dplyr::mutate(source = unlist(tempcol)[1], medium = unlist(tempcol)[2]) %>%
dplyr::select(-tempcol)
wrangled
Gives the following output:
Source: local data frame [6 x 3]
Groups: <by row>
# A tibble: 6 × 3
sourceMedium source medium
<chr> <chr> <chr>
1 apples / pears apples pears
2 red / blue red blue
3 green / grey green grey
4 wet / dry wet dry
5 ear / nose ear nose
6 mac / linux mac linux
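If the intermediate list column is not needed, the split can also be done without rowwise(). This is only a sketch of one alternative, not the approach of the answer above: it assumes every value contains exactly one " / " separator and uses stringr::str_split_fixed(), which returns a character matrix instead of a list. The vector `sm` is a shortened copy of the example data.

```r
library(stringr)

sm <- c("apples / pears", "red / blue", "green / grey")

# One row per input string, one column per piece (at most 2 pieces here)
pieces <- str_split_fixed(sm, " / ", 2)

wrangled_alt <- data.frame(sourceMedium = sm,
                           source = pieces[, 1],
                           medium = pieces[, 2])
wrangled_alt$source
#> [1] "apples" "red"    "green"
wrangled_alt$medium
#> [1] "pears" "blue"  "grey"
```

Because the result is a plain matrix, its columns can be indexed for all rows at once, which is why no row-by-row grouping is required.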
Dplyr mutate duplicates list values when trying to index
I would recommend using the package stringr, which is part of the tidyverse and thus works seamlessly with dplyr.
data %>% mutate(Y = str_extract(date, "^\\d{4}"),
M = str_extract(date, "[A-Za-z]{3}"))
# index date R D Y M
# 1 1 2018 Jan 2-7 35 50 2018 Jan
# 2 2 2017 Dec 4-11 41 45 2017 Dec
# 3 3 2017 Nov 2-8 39 46 2017 Nov
# 4 4 2017 Oct 5-11 39 46 2017 Oct
# 5 5 2017 Sep 6-10 45 47 2017 Sep
# 6 6 2017 Aug 2-6 43 46 2017 Aug
str_extract allows you to extract substrings based on a pattern -- here, we use two different regular expressions. The first matches 4 consecutive digits (\\d{4}) at the start of the string (^). The second simply takes the first run of 3 consecutive letters ([A-Za-z]{3}), which is safe given the structure of your dates.
If you'd still like to use strsplit with mutate, however, you can add a call to rowwise:
data %>% rowwise() %>% mutate(Y = strsplit(date, split = " ")[[1]][1],
M = strsplit(date, split = " ")[[1]][2])
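The reason rowwise() matters here can be seen on a plain vector. In this sketch, `dates` holds the first two date strings from the output above: strsplit() returns a list with one element per input string, so `[[1]][1]` always picks the first piece of the first string, and mutate() then recycles that single value down the column. Indexing every list element with sapply() or vapply() is a vectorised alternative, shown here as an illustration rather than as part of the original answer.

```r
dates <- c("2018 Jan 2-7", "2017 Dec 4-11")

parts <- strsplit(dates, split = " ")  # list of length 2

# Only the first string's year: a single value, hence the duplication
first_only <- parts[[1]][1]
first_only
#> [1] "2018"

# One value per string: take piece 1 (year) and piece 2 (month) of each element
years  <- sapply(parts, `[`, 1)
months <- vapply(parts, `[`, character(1), 2)
years
#> [1] "2018" "2017"
months
#> [1] "Jan" "Dec"
```

Inside mutate(), the same idea would read `Y = sapply(strsplit(date, " "), "[", 1)`, with no rowwise() call needed.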
R Dplyr and string values, how to split and get the second element? vapply/sapply
Welcome to SO. It can be done in multiple ways. Try this:
## some data
df <- data.frame(height=c(11,12),time = c("1999-9-9 00:00:00","1999-9-9 00:00:02"),stringsAsFactors = FALSE)
df
#> height time
#> 1 11 1999-9-9 00:00:00
#> 2 12 1999-9-9 00:00:02
## In base R
df2<- df
df2$hms <- do.call(rbind,strsplit(df2$time," "))[,2]
df2[df2$hms=="00:00:00",]
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
## In tidyverse
library(dplyr)
df3 <- df %>%
mutate(hms = gsub(".*(..:..:..).*","\\1",time)) %>%
filter(hms == "00:00:00")
df3
#> height time hms
#> 1 11 1999-9-9 00:00:00 00:00:00
Created on 2018-10-04 by the reprex package (v0.2.1)