Tidyverse Pivot_Longer Several Sets of Columns, But Avoid Intermediate Mutate_Wider Steps

tidyverse pivot_longer several sets of columns, but avoid intermediate mutate_wider steps

You can use :

tidyr::pivot_longer(dat, cols = -id, 
names_to = c('.value', 'brand'),
names_sep = "_")


# id brand Q1r1 Q1r2
# <chr> <chr> <dbl> <dbl>
#1 A pepsi 1 1
#2 A cola 0 0
#3 B pepsi 0 1
#4 B cola 0 1
#5 C pepsi 1 1
#6 C cola 1 1

Pivot_wider with multiple (parallel) columns

library(dplyr)
library(tidyr)
pivot_longer(
df, -c(NAME, ISIN),
names_pattern = "(.*)([0-9])$", names_to = c("segm", ".value")
) %>%
rename(region = "0", sales= "1")
# # A tibble: 4 x 5
# NAME ISIN segm region sales
# <chr> <chr> <chr> <chr> <dbl>
# 1 APPLE US0378331005 WC1960 United States 109197000
# 2 APPLE US0378331005 WC1961 Other Foreign 125010000
# 3 MICROSOFT US5949181045 WC1960 United States 83953000
# 4 MICROSOFT US5949181045 WC1961 Other countries 84135000

(You can add %>% mutate(segm = gsub(".*(.)$", "\\1", segm)) to clean up segm, if you need/want.)

Stack dataframe columns with two distinct suffix into two columns, preferably using tidyverse

We don't need both pivot_longer and pivot_wider. it can be done within pivot_longer itself by specifying the names_to and the names_sep argument

library(dplyr)
library(tidyr)
test.tbl %>%
pivot_longer(cols = everything(), names_to = c('grp', '.value'),
names_sep = "_") %>%
select(-grp)

-output

# A tibble: 40 x 2
# diff avg
# <dbl> <dbl>
# 1 -0.626 1.51
# 2 0.919 1.36
# 3 -0.165 0.398
# 4 2.40 0.476
# 5 0.184 0.390
# 6 0.782 -0.103
# 7 -0.253 -0.612
# 8 -0.0392 -0.710
# 9 -0.836 -0.621
#10 0.0746 0.388
# … with 30 more rows

complex(ish) pivot_longer in R

As deschen recommended, I played around with the regex a bit and this pivot longer call does work as expected. (and is a bit cleaner than having to manually create the pivot_longer_spec and also works if there are an unequal amount of mem_was_there_x and new_number_yn_y (it will just insert missings, where applicable)

df1 %>% 
pivot_longer(
cols = c(matches("^mem_was_there"), matches("^new_number_yn")),
names_to = c('.value', 'nr'),
names_pattern = "([A-Za-z_]+_)([0-9]*)")

How to use Pivot_longer to reshape from wide-type data to long-type data with multiple variables

You can try :

tidyr::pivot_longer(df, cols = -ID_IE, 
names_to = c('.value', 'grade'),
names_pattern = '(.*)(\\d+)')

# A tibble: 8 x 4
# ID_IE grade BLS_tchrG ELS_tchrG
# <dbl> <chr> <dbl> <dbl>
#1 2135 2 1 1
#2 2135 7 1 1
#3 2101 2 0 0
#4 2101 7 2 0
#5 2103 2 0 0
#6 2103 7 3 0
#7 2111 2 1 1
#8 2111 7 4 1

data

Tried on this data :

df <- data.frame(ID_IE = c(2135, 2101, 2103, 2111), BLS_tchrG2 = c(1, 0, 0, 1), 
BLS_tchrG7 = 1:4,
ELS_tchrG2 = c(1, 0, 0, 1), ELS_tchrG7 = c(1, 0, 0, 1))

Long to wide format using variable names

We can usepivor_longer %>% pivot_wider. separateis not needed if we set the appropriate parameters to pivor_longer.

library(tidyr)

dataset %>%
pivot_longer(cols = matches('time\\d+$'), names_to = c('sport', 'time'), names_pattern = '(.*)\\.(.*)') %>%
pivot_wider(names_from = sport, values_from = value)

# A tibble: 15 × 5
id time basketball volleyball vollyeball
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 time1 2 2 NA
2 1 time2 3 3 NA
3 1 time3 1 NA 1
4 2 time1 5 3 NA
5 2 time2 4 4 NA
6 2 time3 8 NA 8
7 3 time1 4 4 NA
8 3 time2 5 3 NA
9 3 time3 4 NA 12
10 4 time1 3 0 NA
11 4 time2 3 1 NA
12 4 time3 3 NA 2
13 5 time1 3 1 NA
14 5 time2 2 3 NA
15 5 time3 1 NA 3

Achieving the equivalent of rbind using tidyr

We can use pivot_longer where we specify the names_sep as _ and the names_to with ".value" and a grouping name

library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with('var'),
names_to = c('grp', '.value'), names_sep="_")
# group day grp mean sd
# <chr> <int> <chr> <dbl> <dbl>
# 1 a 1 var1 -0.63 5.49
# 2 a 1 var2 -0.62 5.82
# 3 a 2 var1 0.18 5.74
# 4 a 2 var2 -2.21 5.59
# 5 a 3 var1 -0.84 5.58
# 6 a 3 var2 1.12 5.92
# 7 b 1 var1 1.6 4.69
# 8 b 1 var2 -0.04 5.78
# 9 b 2 var1 0.33 6.51
#10 b 2 var2 -0.02 5.07
#11 b 3 var1 -0.82 5.39
#12 b 3 var2 0.94 3.01

we could remove the 'grp' column later

df %>% 
pivot_longer(cols = starts_with('var'),
names_to = c('grp', '.value'), names_sep="_") %>%
select(-grp)


Related Topics



Leave a reply



Submit