Stacking Multiple Columns Using Pivot Longer in R

Pivoting multiple sets of columns using pivot_longer in R

The brackets around the matched pattern represents that we are capturing that pattern as a group. In the below code, we capture one or more lower-case letters ([a-z]+) followed by a _ (not inside the brackets, thus it is removed) and the second capture group matches one or more digits (\\d+), and this will be matched with the corresponding values of names_to - i.e. .value represents the value of the column, thus we get the columns 'x' and 'y' with the values and the second will be a new column names that returs the suffix digits of the column names i.e. 'time'

library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"),
names_pattern = "^([a-z]+)_(\\d+)")

-output

# A tibble: 20 × 4
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 -0.823 0.954
2 1 2 0.937 2.30
3 2 1 0.644 0.513
4 2 2 -0.281 0.0256
5 3 1 -1.11 0.0575
6 3 2 -0.248 -0.512
7 4 1 -1.04 0.578
8 4 2 -0.414 0.609
9 5 1 1.29 1.60
10 5 2 -1.78 0.759
11 1 1 -0.578 0.0430
12 1 2 -1.00 0.868
13 2 1 0.0900 -2.10
14 2 2 -0.795 -0.434
15 3 1 0.143 -1.13
16 3 2 0.420 0.145
17 4 1 -0.252 0.236
18 4 2 1.56 -0.0472
19 5 1 -0.256 -1.21
20 5 2 0.624 1.02

In the OP's code, there are two groups ((.) and (.)) and only one element in names_to, thus it fails along with the fact that there is _ between the 'x', 'y' and the digit. Also, by default, the names_pattern will be in regex mode and some characters are thus in metacharacter mode i.e. . represents any character and not the literal .

Pivot longer in dplyr for mutiple value columns

How about like this:

  library(tidyverse)
dat <- structure(list(total = c(9410, 12951.1794783802),
op = c(3896.66666666667, 6976.57663230241),
ox = c(2200, 4920.84776902887),
ox15 = c(183.333333333333, 694.262648008611),
ox30 = c(133.333333333333, 368.090117767537),
hy = c(283.333333333333, 1146.14924596984),
hy10 = c(NA, 433.993925588459)),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame"))

dat %>%
mutate(obs = 1:n()) %>%
pivot_longer(-obs, names_to="var", values_to="vals") %>%
pivot_wider(names_from="obs", values_from="vals", names_prefix="obs_")
#> # A tibble: 7 × 3
#> var obs_1 obs_2
#> <chr> <dbl> <dbl>
#> 1 total 9410 12951.
#> 2 op 3897. 6977.
#> 3 ox 2200 4921.
#> 4 ox15 183. 694.
#> 5 ox30 133. 368.
#> 6 hy 283. 1146.
#> 7 hy10 NA 434.

Created on 2022-02-03 by the reprex package (v2.0.1)

Stacking multiple columns using pivot longer in R

Would first rename columns and insert underscore right before number, then use that as separator in pivot_longer.

library(tidyverse)

df %>%
rename_at(-c(1:2), ~ str_replace(., "(\\w+)(\\d)", "\\1_\\2")) %>%
pivot_longer(cols = -c(1:2), names_to = c(".value", "num"), names_sep = "_")

Edit (2/7/20):

With your updated dataset, it appears that some of the variable column names don't have a number at the end. We can add 0 for those.

Also, I assume you want: group, ID, subect, x, y, speed that are repeated (with the first group in column 5 separated from its related variables in columns 7-11).

df1 %>%
rename_at(c(5,7:11), ~ paste0(., "0")) %>%
rename_at(-c(1:4, 6, 24), ~ str_replace(., "(\\w+)(\\d+)", "\\1_\\2")) %>%
pivot_longer(cols = -c(1:4, 6, 24), names_to = c(".value", "val"), names_sep = "_")

Output (Revised):

# A tibble: 18 x 13
time state reference num X speed.group2 val group ID subect x y speed
<dbl> <fct> <chr> <chr> <lgl> <lgl> <chr> <chr> <int> <int> <int> <int> <chr>
1 0 Alive;: 1880439 1 NA NA 0 1 1 21 3514 -1644 5.23
2 0 Alive;: 1880439 1 NA NA 1 0 13 9 882 -1468 1.70
3 0 Alive;: 1880439 1 NA NA 2 4 24 -1 5550 4400 0.00
4 0 Alive;: 1880439 NA NA NA 0 NA NA NA NA NA NA
5 0 Alive;: 1880439 NA NA NA 1 NA NA NA NA NA NA
6 0 Alive;: 1880439 NA NA NA 2 NA NA NA NA NA NA
7 0 Alive;: 1880439 2 NA NA 0 1 2 7 2807 -510 3.24
8 0 Alive;: 1880439 2 NA NA 1 4 14 -1 5550 4400 0.00
9 0 Alive;: 1880439 2 NA NA 2 0 25 1 5031 -74 0.54
10 0 Alive;: 1880439 3 NA NA 0 4 4 -1 5550 4400 0.00
11 0 Alive;: 1880439 3 NA NA 1 1 15 13 3004 1633 3.06
12 0 Alive;: 1880439 3 NA NA 2 1 26 18 3936 190 1.06
13 0 Alive;: 1880439 4 NA NA 0 0 5 2 3956 1297 2.31
14 0 Alive;: 1880439 4 NA NA 1 1 16 14 761 559 2.92
15 0 Alive;: 1880439 4 NA NA 2 0 27 5 3972 686 0.95
16 0 Alive;: 1880439 5 NA NA 0 0 6 6 3686 -55 3.57
17 0 Alive;: 1880439 5 NA NA 1 0 17 11 3317 1443 3.30
18 0 Alive;: 1880439 5 NA NA 2 0 28 10 3623 356 2.49

pivot longer with multiple columns and values

You can use names_pattern here.

tidyr::pivot_longer(df, 
cols = -c(id, type),
names_to = c('.value', 'year'),
names_pattern = '([a-z]+)(\\d+)')

# A tibble: 18 x 5
# type id year age bool
# * <int> <int> <chr> <int> <int>
# 1 1 1 2000 20 1
# 2 1 1 2001 21 1
# 3 1 1 2002 22 1
# 4 1 2 2000 35 2
# 5 1 2 2001 36 2
# 6 1 2 2002 37 2
# 7 1 3 2000 24 1
# 8 1 3 2001 25 1
# 9 1 3 2002 26 1
#10 2 1 2000 32 2
#11 2 1 2001 33 2
#12 2 1 2002 34 2
#13 2 2 2000 66 2
#14 2 2 2001 67 2
#15 2 2 2002 68 2
#16 2 3 2000 14 1
#17 2 3 2001 15 1
#18 2 3 2002 16 1

data

df <- structure(list(type = c(1L, 1L, 1L, 2L, 2L, 2L), id = c(1L, 2L, 
3L, 1L, 2L, 3L), age2000 = c(20L, 35L, 24L, 32L, 66L, 14L), age2001 = c(21L,
36L, 25L, 33L, 67L, 15L), age2002 = c(22L, 37L, 26L, 34L, 68L,
16L), bool2000 = c(1L, 2L, 1L, 2L, 2L, 1L), bool2001 = c(1L,
2L, 1L, 2L, 2L, 1L), bool2002 = c(1L, 2L, 1L, 2L, 2L, 1L)),
class = "data.frame", row.names = c(NA, -6L))

Is there way to pivot_longer to multiple values columns in R?

We don't need multiple calls if we specify the names_to as a vector of values i.e. .value - returns the value of the columns and 'group' the column with the suffix of column names. Here, we use names_sep as . to split at the .

library(tidyr)
pivot_longer(df, cols = -ids, names_to = c(".value", "group"),
names_sep = "\\.")

-output

# A tibble: 4 × 4
ids group mean se
<chr> <chr> <int> <int>
1 protein1 group1 982 3
2 protein1 group2 657 7
3 protein2 group1 663 9
4 protein2 group2 215 1

NOTE: values are different as sample was used in creation of input data without a set.seed specified

Pivot_longer for multiple columns of repeated measurements data

This probably adds nothing new to the already posted solutions, the only difference is the regex used for the names_pattern argument.

  • If you notice some of your column names are separated by one _ whereas others are separated by two _. \\w+ captures any word character, now if I specify we have a number after this with \\d+ as in time3 in time3_age, we tell pivot_longer to store this part of the column names corresponding to time3 in time column. Then the rest of the column names are used for the variable names we are trying to measure line age, systolicBP and med_hypt.
  • It should be noted that if we use \\w+\\d+ instead of \\w+ only the rest will be captured as column names whether it is med_hypt with underscore or systolicBP without underscore. But if we use only \\w+ it could also capture med and the resulting column will be hypt instead of med_hypt.
  • In the end since I defined two capture groups, I have to define either names_pattern or names_sep in a way to specify how each of them are defined and separated.
library(dplyr)

wide_data %>%
pivot_longer(!c(id, sex), names_to = c("time", ".value"),
names_pattern = "(\\w+\\d+)_(\\w+)")

# A tibble: 30 x 6
id sex time age systolicBP med_hypt
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 12002 women time1 71.2 102 0
2 12002 women time2 74.2 NA 0
3 12002 women time3 78 NA 0
4 17001 men time1 67.9 152 0
5 17001 men time2 69.2 146 0
6 17001 men time3 74.2 160. 0
7 17002 women time1 66.5 NA 0
8 17002 women time2 67.8 NA 0
9 17002 women time3 72.8 NA 0
10 42001 men time1 57.7 170 0
# ... with 20 more rows

r - Pivot Longer distributing values into multiple columns

using pivot_longer:

sampleData %>%
pivot_longer(-ID, names_to='.value', names_pattern='([A-Za-z]+)', values_drop_na=TRUE)

# A tibble: 5 x 4
ID ExamCode ExamGrade AdminYear
<dbl> <dbl> <dbl> <dbl>
1 123 4 4 22
2 123 26 5 22
3 456 26 3 22
4 456 83 3 21
5 789 26 5 22

Pivot data into two different columns simultaneously using pivot_longer() in R?

Edit

Turns out, you can do it in one pivot_longer:

df %>% 
pivot_longer(-id,
names_to = c("variable", ".value"),
names_pattern = "(.*)\\.(.*)")%>%
rename(activation = act, fixation = fix)

with the same result.


Don't know how to do it in one go, but you could use

library(tidyr)
library(dplyr)

df %>%
pivot_longer(-id,
names_to = c("variable", "class"),
names_pattern = "(.*)\\.(.*)") %>%
pivot_wider(names_from = "class") %>%
rename(activation = act, fixation = fix)

This returns

# A tibble: 4 x 4
id variable activation fixation
<dbl> <chr> <dbl> <dbl>
1 1 v1 0.4 1
2 1 v2 0.5 0
3 2 v1 0.8 0
4 2 v2 0.7 1


Related Topics



Leave a reply



Submit