Using Strsplit and Subset in Dplyr and Mutate

Using strsplit within dplyr::mutate (without tibble::data_frame) raises Evaluation error: non-character argument

The error occurs because MediaName was automatically converted to a factor when the data frame was created, and strsplit() only accepts character vectors. My solution simply converts MediaName to character before splitting.

library(dplyr)

df <- df %>%
  # strsplit() needs a character vector, so convert the factor first
  dplyr::mutate(MediaName = as.character(MediaName)) %>%
  dplyr::mutate(TrialId = ifelse(Phase == "Familiarisation",
                                 sapply(strsplit(MediaName, "_"), "[", 2),
                                 sapply(strsplit(MediaName, "_"), "[", 1)))

solution <- c("A1", "B2", "A2", "B1", "A1", "B2", "A2", "B1", "HC", "TC", "RC")
identical(solution, df$TrialId)
[1] TRUE
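
The failure is easy to reproduce in base R. The sketch below uses a made-up factor (not the asker's data) to show that strsplit() rejects a factor and works once the values are converted with as.character():

```r
# Hypothetical values, for illustration only
media <- factor(c("A1_B2", "HC_TC"))

# strsplit() on a factor raises an error; capture its message instead of stopping
res <- tryCatch(strsplit(media, "_"), error = function(e) conditionMessage(e))
print(res)

# Converting to character first makes the split work
parts <- strsplit(as.character(media), "_")
second <- sapply(parts, "[", 2)  # second piece of each string
print(second)
```

as.character() on a factor returns the labels, which is what you want here; it only becomes a trap when the labels are numbers you intend to convert back to numeric.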

Create new column with dplyr mutate and substring of existing column

You could use stringr::str_extract:

library(stringr)

df %>%
  dplyr::mutate(new_id = str_extract(id, "[^_]+$"))

#>              id x new_id
#> 1  abcd_123_ABC 1    ABC
#> 2 abc_5234_NHYK 2   NHYK

The regex says, match one or more (+) of the characters that aren't _ (the negating [^ ]), followed by end of string ($).
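
If you want to check the pattern without loading stringr, the same regex can be tried with base R's regexpr() and regmatches(). This is only a sketch of an equivalent, not what str_extract() does internally, and the two ids are copied from the output above:

```r
id <- c("abcd_123_ABC", "abc_5234_NHYK")

# regexpr() finds the first match per string; regmatches() pulls it out
m <- regexpr("[^_]+$", id)
new_id <- regmatches(id, m)
print(new_id)
```

One difference to keep in mind: regmatches() silently drops strings with no match, whereas str_extract() returns NA for them, so the base version is only safe in a mutate() when every id matches.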

str_split for column values and then turn it into vector in R

We can split the values into rows, pivot to one list column per ID, and then apply unlist() to each column to get a list of vectors.

library(dplyr)
library(tidyr)  # separate_rows() and pivot_wider() come from tidyr

dat %>%
  separate_rows(everything(), sep = "/") %>%
  pivot_wider(names_from = ID, values_from = gene_ids, values_fn = list) %>%
  lapply(unlist)

$A
[1] "101739" "20382" "13006" "212377" "114714" "66622" "140917"

$B
[1] "75717" "103573" "14852" "18141" "12567" "26429" "20842" "17975" "12545"
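
For comparison, here is a rough base R sketch of the same idea. The original dat is not shown in the answer, so the data frame below is an assumption made up for illustration (an ID column and a gene_ids column with "/"-separated values):

```r
# Hypothetical input; the real `dat` is not shown in the answer
dat <- data.frame(
  ID = c("A", "A", "B"),
  gene_ids = c("101739/20382", "13006", "75717/103573"),
  stringsAsFactors = FALSE
)

# One character vector of pieces per row
pieces <- strsplit(dat$gene_ids, "/", fixed = TRUE)

# Group the per-row pieces by ID, then flatten each group into one vector
genes <- lapply(split(pieces, dat$ID), unlist)
print(genes)
```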

How to split a number into its digits in R

demog <- data.frame(socioEcon = c(12,34))

library(dplyr)
demog %>%
  mutate(socioEcon = as.character(socioEcon)) %>%
  tidyr::separate(socioEcon, c("A", "B"), sep = 1, remove = FALSE)

  socioEcon A B
1        12 1 2
2        34 3 4
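
The same split can be sketched in base R by splitting each number's character form on the empty string. This assumes every value has the same number of digits (two here), otherwise rbind() would recycle values:

```r
socioEcon <- c(12, 34)

# Splitting on "" yields one element per character
digits <- strsplit(as.character(socioEcon), "")

# Convert each piece to integer and stack the rows into a matrix
digit_mat <- do.call(rbind, lapply(digits, as.integer))
colnames(digit_mat) <- c("A", "B")
print(digit_mat)
```

Unlike separate(), which leaves A and B as character columns unless you ask for conversion, this version returns integers.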

Adding multiple columns in a dplyr mutate call

You can use separate() from tidyr in combination with dplyr:

tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)

    x       y  y1  y2
1   1 BAR.baz BAR baz
2   2 FOO.foo FOO foo
3   3 BAZ.baz BAZ baz
4   4 BAZ.foo BAZ foo
5   5 BAZ.bar BAZ bar
6   6 FOO.baz FOO baz
7   7 BAR.bar BAR bar
8   8 BAZ.baz BAZ baz
9   9 FOO.bar FOO bar
10 10 BAR.foo BAR foo

Setting remove = TRUE (the default) drops the original column y.
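
The sep = "\\." is escaped because sep is treated as a regular expression, where a bare dot matches any character. A small base R sketch with strsplit(), which interprets its pattern the same way, shows what goes wrong without the escape (the two strings are taken from the y column above):

```r
y <- c("BAR.baz", "FOO.foo")

# Unescaped "." is a wildcard, so every character counts as a separator
bad <- strsplit(y[1], ".")[[1]]
print(bad)  # only empty strings are left

# Either escape the dot or ask for a literal (fixed) match
good_regex <- strsplit(y, "\\.")
good_fixed <- strsplit(y, ".", fixed = TRUE)
print(good_regex)
```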

mutate two new columns based on splitting another

dplyr does not treat the elements of a list column the same way it treats the elements of an atomic vector column, so call dplyr::rowwise() before you mutate and unlist:

library(dplyr)
library(stringr)

orig <- tibble(sourceMedium = c('apples / pears', 'red / blue', 'green / grey',
                                'wet / dry', 'ear / nose', 'mac / linux'))

wrangled <- orig %>%
  dplyr::mutate(tempcol = stringr::str_split(sourceMedium, ' / ')) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(source = unlist(tempcol)[1], medium = unlist(tempcol)[2]) %>%
  dplyr::select(-tempcol)
wrangled

Gives the following output:

Source: local data frame [6 x 3]
Groups: <by row>

# A tibble: 6 × 3
    sourceMedium source medium
           <chr>  <chr>  <chr>
1 apples / pears apples  pears
2     red / blue    red   blue
3   green / grey  green   grey
4      wet / dry    wet    dry
5     ear / nose    ear   nose
6    mac / linux    mac  linux

Dplyr mutate duplicates list values when trying to index

I would recommend using the package stringr, which is part of the tidyverse, and thus works seamlessly with dplyr.

data %>%
  mutate(Y = str_extract(date, "^\\d{4}"),
         M = str_extract(date, "[A-Za-z]{3}"))

#   index          date  R  D    Y   M
# 1     1  2018 Jan 2-7 35 50 2018 Jan
# 2     2 2017 Dec 4-11 41 45 2017 Dec
# 3     3  2017 Nov 2-8 39 46 2017 Nov
# 4     4 2017 Oct 5-11 39 46 2017 Oct
# 5     5 2017 Sep 6-10 45 47 2017 Sep
# 6     6  2017 Aug 2-6 43 46 2017 Aug

str_extract lets you extract a substring based on a pattern; here we use two different regular expressions. The first matches 4 consecutive digits (\\d{4}) at the start of the string (^). The second simply takes the first run of 3 consecutive letters ([A-Za-z]{3}), which is safe given the structure of your dates.

If you'd still like to use strsplit with mutate, however, you can add a call to rowwise:

data %>%
  rowwise() %>%
  mutate(Y = strsplit(date, split = " ")[[1]][1],
         M = strsplit(date, split = " ")[[1]][2])
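
To see why the values get duplicated without rowwise(), it helps to look at what strsplit() returns for the whole column. The sketch below uses two of the dates from the output above and plain vectors instead of a data frame:

```r
date <- c("2018 Jan 2-7", "2017 Dec 4-11")

# strsplit() returns a list with one character vector per row
pieces <- strsplit(date, split = " ")

# [[1]] selects only the first row's pieces, so this is a single value
# that mutate() would recycle into every row
first_row_only <- pieces[[1]][1]
print(first_row_only)

# Indexing inside each list element gives one value per row instead
Y <- sapply(pieces, "[", 1)
M <- sapply(pieces, "[", 2)
print(Y)
print(M)
```

The sapply() form also works inside a plain mutate() without rowwise(), which is usually faster on larger data frames.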

R Dplyr and string values, how to split and get the second element? vapply/sapply

Welcome to SO. It can be done in multiple ways. Try this:

## some data
df <- data.frame(height = c(11, 12),
                 time = c("1999-9-9 00:00:00", "1999-9-9 00:00:02"),
                 stringsAsFactors = FALSE)

df
#>   height              time
#> 1     11 1999-9-9 00:00:00
#> 2     12 1999-9-9 00:00:02

## In base R

df2 <- df
# Split on the space and keep the second piece (the time of day)
df2$hms <- do.call(rbind, strsplit(df2$time, " "))[, 2]
df2[df2$hms == "00:00:00", ]
#>   height              time      hms
#> 1     11 1999-9-9 00:00:00 00:00:00

## In tidyverse

library(dplyr)
df3 <- df %>%
  mutate(hms = gsub(".*(..:..:..).*", "\\1", time)) %>%
  filter(hms == "00:00:00")

df3
#>   height              time      hms
#> 1     11 1999-9-9 00:00:00 00:00:00

Created on 2018-10-04 by the reprex package (v0.2.1)
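
Since the question mentions vapply/sapply, here is a minimal sketch of that route as well, using the same two timestamps as a plain vector. It is one possible variant, not part of the reprex above:

```r
time <- c("1999-9-9 00:00:00", "1999-9-9 00:00:02")

# vapply() checks that every result is a single string,
# so an unexpected shape fails loudly instead of returning a list
hms <- vapply(strsplit(time, " ", fixed = TRUE),
              function(p) p[2],
              character(1))
print(hms)
```

Inside mutate() this would read mutate(hms = vapply(strsplit(time, " ", fixed = TRUE), function(p) p[2], character(1))); a string without a second piece yields NA rather than an error.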


