Pivot_Longer into Multiple Columns

pivot_longer into multiple columns

Here is a solution following a method similar to the one @Fnguyen used, but using the newer pivot_longer and pivot_wider functions:

library(dplyr)
library(tidyr)

longer <- pivot_longer(dat, cols = -1, names_pattern = "(.*)(..)$", names_to = c("limit", "name")) %>%
  mutate(limit = ifelse(limit == "", "value", limit))

answer <- pivot_wider(longer, id_cols = c(group, name), names_from = limit, values_from = value, names_repair = "check_unique")

Most of the selecting, separating, mutating and renaming is taking place within the pivot function calls.

Update:
The regular expression "(.*)(..)$" means:

( ) ( ) Look for two parts,

(.*) the first part should have zero or more characters

(..) the second part should have exactly 2 characters at the end of the string (the "$")
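To see how that pattern splits a name, here is a minimal sketch using base R's sub(). The column names below are hypothetical (the question's real names are not shown here); they are only chosen so that some names have a prefix and some do not.

```r
# Hypothetical column names, assumed for illustration only.
nms <- c("x1", "lowerx1", "upperx1")
pattern <- "(.*)(..)$"

# First capture group: everything before the last two characters.
limit <- sub(pattern, "\\1", nms)
# Second capture group: the last two characters.
name <- sub(pattern, "\\2", nms)

data.frame(column = nms, limit = limit, name = name)
```

A name with no prefix yields an empty string for the first group, which is why the answer's mutate() step replaces "" with "value".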

Is there a way to pivot_longer to multiple value columns in R?

We don't need multiple calls if we pass names_to a vector. The special ".value" entry tells pivot_longer to use that part of each column name as the name of an output value column, and 'group' names the new column that holds the suffix of the column names. Here, names_sep = "\\." splits each column name at the literal dot.

library(tidyr)
pivot_longer(df, cols = -ids, names_to = c(".value", "group"),
names_sep = "\\.")

-output

# A tibble: 4 × 4
ids group mean se
<chr> <chr> <int> <int>
1 protein1 group1 982 3
2 protein1 group2 657 7
3 protein2 group1 663 9
4 protein2 group2 215 1

NOTE: the values differ from those in the question because the input data was created with sample() without calling set.seed().
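For a reproducible version, here is a small sketch with made-up data. The column names (mean.group1, se.group1, ...) and the values are assumptions chosen to match the shape of the output above, not the OP's actual input.

```r
library(tidyr)

# Made-up input, assumed for illustration: one mean/se pair per group.
df <- data.frame(
  ids         = c("protein1", "protein2"),
  mean.group1 = c(10, 30),
  se.group1   = c(1, 3),
  mean.group2 = c(20, 40),
  se.group2   = c(2, 4)
)

# ".value" turns the part before the dot into output columns (mean, se);
# the part after the dot goes into the new 'group' column.
long <- pivot_longer(df, cols = -ids,
                     names_to = c(".value", "group"),
                     names_sep = "\\.")
long
```

Each input row becomes one output row per group, with mean and se side by side.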

Pivot_longer for multiple columns of repeated measurements data

This probably adds nothing new to the already posted solutions, the only difference is the regex used for the names_pattern argument.

  • Notice that some of your column names contain one _ whereas others contain two. Because \\w matches letters, digits and the underscore, \\w+ on its own cannot tell where the time prefix ends. Requiring digits after it with \\d+ (as in time3 in time3_age) tells pivot_longer to store that prefix, e.g. time3, in the time column. The rest of each column name is then used as the name of the measured variable, like age, systolicBP and med_hypt.
  • With (\\w+\\d+) as the first group, everything after the following _ is captured as the variable name, whether it contains an underscore (med_hypt) or not (systolicBP). With only (\\w+), the greedy match could extend through med, and the resulting column would be hypt instead of med_hypt.
  • Finally, since names_to has two elements, I have to supply either names_pattern or names_sep to specify how each column name is split into those two parts.

library(dplyr)
library(tidyr)

wide_data %>%
  pivot_longer(!c(id, sex), names_to = c("time", ".value"),
               names_pattern = "(\\w+\\d+)_(\\w+)")

# A tibble: 30 x 6
id sex time age systolicBP med_hypt
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 12002 women time1 71.2 102 0
2 12002 women time2 74.2 NA 0
3 12002 women time3 78 NA 0
4 17001 men time1 67.9 152 0
5 17001 men time2 69.2 146 0
6 17001 men time3 74.2 160. 0
7 17002 women time1 66.5 NA 0
8 17002 women time2 67.8 NA 0
9 17002 women time3 72.8 NA 0
10 42001 men time1 57.7 170 0
# ... with 20 more rows
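The difference between the two patterns can be checked directly on a single column name. This is only a sketch with base R's sub() and perl = TRUE, not what pivot_longer runs internally; the name time1_med_hypt is taken from the columns shown above.

```r
nm <- "time1_med_hypt"

# With digits required before the separator, the split happens after "time1".
with_digits <- sub("(\\w+\\d+)_(\\w+)", "\\2", nm, perl = TRUE)

# With \\w+ alone, the greedy match runs up to the last underscore.
without_digits <- sub("(\\w+)_(\\w+)", "\\2", nm, perl = TRUE)

with_digits     # "med_hypt"
without_digits  # "hypt"
```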

Pivoting multiple sets of columns using pivot_longer in R

The parentheses around a pattern indicate that we are capturing it as a group. In the code below, the first group captures one or more lower-case letters ([a-z]+). It is followed by a _ that sits outside the parentheses and is therefore dropped. The second group captures one or more digits (\\d+). The groups are matched, in order, with the elements of names_to: .value means the captured text becomes the name of an output value column, so we get the columns 'x' and 'y', and the second element creates a new column, 'time', that holds the digit suffix of the column names.

library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"),
names_pattern = "^([a-z]+)_(\\d+)")

-output

# A tibble: 20 × 4
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 -0.823 0.954
2 1 2 0.937 2.30
3 2 1 0.644 0.513
4 2 2 -0.281 0.0256
5 3 1 -1.11 0.0575
6 3 2 -0.248 -0.512
7 4 1 -1.04 0.578
8 4 2 -0.414 0.609
9 5 1 1.29 1.60
10 5 2 -1.78 0.759
11 1 1 -0.578 0.0430
12 1 2 -1.00 0.868
13 2 1 0.0900 -2.10
14 2 2 -0.795 -0.434
15 3 1 0.143 -1.13
16 3 2 0.420 0.145
17 4 1 -0.252 0.236
18 4 2 1.56 -0.0472
19 5 1 -0.256 -1.21
20 5 2 0.624 1.02

The OP's code fails for two reasons: the pattern has two capture groups ((.) and (.)) but names_to has only one element, and the pattern does not account for the _ between 'x' or 'y' and the digit. Also, names_pattern is interpreted as a regular expression, so some characters are metacharacters: . matches any character rather than a literal dot.
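As a self-contained check, here is a sketch on a tiny made-up data frame. The values are assumptions for illustration; only the column naming scheme (x_1, y_1, x_2, y_2) follows the answer.

```r
library(tidyr)

# Made-up data, assumed for illustration only.
data <- data.frame(
  aid = 1:2,
  x_1 = c(0.1, 0.2), y_1 = c(1.1, 1.2),
  x_2 = c(0.3, 0.4), y_2 = c(1.3, 1.4)
)

# Two capture groups, two elements in names_to.
long <- pivot_longer(data, cols = -aid,
                     names_to = c(".value", "time"),
                     names_pattern = "^([a-z]+)_(\\d+)")
long
```

Note that 'time' comes back as character, as in the output above; convert it afterwards if a numeric column is needed.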

Using pivot_longer with multiple column classes

We could use names_pattern after rearranging the substrings in the column names so that the digits come last:

library(dplyr)
library(tidyr)
library(stringr)
df_wide %>%
# rename the columns by rearranging the digits at the end
# "_(\\d+)(_.*)" - captures the digits (\\d+) after the _
# and the rest of the characters (_.*)
# replace with the backreference (\\2, \\1) of captured groups rearranged
rename_with(~ str_replace(., "_(\\d+)(_.*)", "\\2_\\1"), -resp_id) %>%
pivot_longer(cols = -resp_id, names_to = c( ".value", "question_number"),
names_pattern = "(.*)_(\\d+$)")

-output

# A tibble: 6 × 4
resp_id question_number question_info question_answer
<dbl> <chr> <chr> <dbl>
1 1 1 "What is your eye color?" 1
2 1 2 "What is your hair color?" 2
3 2 1 "Are you over 6 ft tall?" 1
4 2 2 "" NA
5 3 1 "What is your hair color?" 0
6 3 2 "Are you under 40?" 1
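The renaming step can be tried on its own. This sketch assumes the wide column names look like question_1_info and question_1_answer, which is inferred from the output columns above rather than taken from the OP's data.

```r
library(stringr)

# Assumed original column names (inferred, not from the question itself).
old_names <- c("question_1_info", "question_1_answer", "question_2_info")

# Move the digits to the end: "_1_info" becomes "_info_1".
new_names <- str_replace(old_names, "_(\\d+)(_.*)", "\\2_\\1")
new_names
```

After this step every name ends in _<digits>, so "(.*)_(\\d+$)" can split it into the .value part and the question number.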

pivot_longer into several pairs of columns

With tidyverse, we can pivot on the two sets of columns that start with belief and norm. We then use a regex to split each column name at the first underscore (since some column names have multiple underscores). Essentially, the first part of the name (belief or norm) becomes its own value column (via .value), and the second part (the animal name) goes into a single column named animal.

library(tidyverse)

df_raw %>%
pivot_longer(cols = c(starts_with("belief"), starts_with("norm")),
names_to = c('.value', 'animal'),
names_pattern = '(.*?)_(.*)') %>%
rename(belief_rating = belief, norm_rating = norm)

Output

id age gender animal belief_rating norm_rating
<chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 b2x8 41 2 dog 1 10
2 b2x8 41 2 bull_frog 4 4
3 b2x8 41 2 fish 3 2
4 m89w 19 1 dog 3 3
5 m89w 19 1 bull_frog 6 1
6 m89w 19 1 fish 2 2
7 32x8 38 3 dog 1 8
8 32x8 38 3 bull_frog 5 9
9 32x8 38 3 fish 2 1
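The lazy quantifier (.*?) is what makes the split happen at the first underscore. A quick sketch with base R's sub() and perl = TRUE, using a column name of the form implied by the output above (belief_bull_frog is an assumed name):

```r
nm <- "belief_bull_frog"

# Lazy first group: stops at the first underscore.
lazy_value  <- sub("(.*?)_(.*)", "\\1", nm, perl = TRUE)
lazy_animal <- sub("(.*?)_(.*)", "\\2", nm, perl = TRUE)

# Greedy first group: runs up to the last underscore instead.
greedy_value  <- sub("(.*)_(.*)", "\\1", nm, perl = TRUE)
greedy_animal <- sub("(.*)_(.*)", "\\2", nm, perl = TRUE)

c(lazy_value, lazy_animal)      # "belief" "bull_frog"
c(greedy_value, greedy_animal)  # "belief_bull" "frog"
```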

pivot_longer multiple variables of different kinds

In this case one has to use names_to combined with names_pattern:

library(dplyr)
library(tidyr)
> head(x,3)
case X1990 flag.1990 X2000 flag.2000
1 1 0.2772497942 a 0.1751129 c
2 2 0.0005183129 b 0.4407503 d
3 3 0.5106083730 a 0.9071830 c
> x %>%
pivot_longer(cols = -case,
names_to = c(".value", "year"),
names_pattern = "([^\\.]*)\\.*(\\d{4})")
# A tibble: 20 x 4
case year X flag
<int> <chr> <dbl> <chr>
1 1 1990 0.277 a
2 1 2000 0.175 c
3 2 1990 0.000518 b
4 2 2000 0.441 d
5 3 1990 0.511 a
6 3 2000 0.907 c
7 4 1990 0.0140 b
8 4 2000 0.851 d
9 5 1990 0.0647 a
10 5 2000 0.734 c
11 6 1990 0.955 b
12 6 2000 0.574 d
13 7 1990 0.0865 a
14 7 2000 0.482 c
15 8 1990 0.290 b
16 8 2000 0.331 d
17 9 1990 0.881 a
18 9 2000 0.158 c
19 10 1990 0.123 b
20 10 2000 0.480 d

Pivot data into two different columns simultaneously using pivot_longer() in R?

Edit

It turns out you can do it in a single pivot_longer call:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-id,
               names_to = c("variable", ".value"),
               names_pattern = "(.*)\\.(.*)") %>%
  rename(activation = act, fixation = fix)

This gives the same result as the two-step version below.


Original answer: a two-step alternative is to pivot longer and then wider:

library(tidyr)
library(dplyr)

df %>%
pivot_longer(-id,
names_to = c("variable", "class"),
names_pattern = "(.*)\\.(.*)") %>%
pivot_wider(names_from = "class") %>%
rename(activation = act, fixation = fix)

This returns

# A tibble: 4 x 4
id variable activation fixation
<dbl> <chr> <dbl> <dbl>
1 1 v1 0.4 1
2 1 v2 0.5 0
3 2 v1 0.8 0
4 2 v2 0.7 1
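To confirm that the two approaches agree, here is a sketch on a small data frame. The input is reconstructed from the output shown above, so the column names (v1.act, v1.fix, ...) are an assumption about the OP's data.

```r
library(dplyr)
library(tidyr)

# Input reconstructed from the output above (assumed column names).
df <- data.frame(
  id     = c(1, 2),
  v1.act = c(0.4, 0.8), v1.fix = c(1, 0),
  v2.act = c(0.5, 0.7), v2.fix = c(0, 1)
)

# One step: ".value" spreads act/fix into columns directly.
one_step <- df %>%
  pivot_longer(-id,
               names_to = c("variable", ".value"),
               names_pattern = "(.*)\\.(.*)")

# Two steps: pivot everything long, then widen by class.
two_step <- df %>%
  pivot_longer(-id,
               names_to = c("variable", "class"),
               names_pattern = "(.*)\\.(.*)") %>%
  pivot_wider(names_from = "class")

all.equal(as.data.frame(one_step), as.data.frame(two_step))
```

The one-step form avoids building the fully long intermediate table.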

Pivot_longer to maintain two columns and make the rest long

If you want A and B to remain as they are when reshaping to long format, exclude them from cols:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -c(A, B), names_to = 'Number', values_to = 'Value') %>%
  # convert the character column Number to integer
  type.convert(as.is = TRUE) %>%
  mutate(Variable = case_when(Number %in% c(1, 2) ~ 'WW',
                              Number %in% c(34, 39) ~ 'MM',
                              TRUE ~ 'EE')) %>%
  select(One = A, two = B, Number, Variable, Value)

# A tibble: 18 x 5
# One two Number Variable Value
# <chr> <chr> <int> <chr> <dbl>
# 1 A AA 1 WW 1.9
# 2 A AA 2 WW 1.9
# 3 A AA 34 MM 3.9
# 4 A AA 39 MM 2.9
# 5 A AA 158 EE 2.9
# 6 A AA 190 EE 22.1
# 7 B BB 1 WW 6.8
# 8 B BB 2 WW 6.8
# 9 B BB 34 MM 0.3
#10 B BB 39 MM 2.3
#11 B BB 158 EE 3
#12 B BB 190 EE 7.4
#13 C CC 1 WW 4.7
#14 C CC 2 WW 4.7
#15 C CC 34 MM 2.7
#16 C CC 39 MM 2.9
#17 C CC 158 EE 45
#18 C CC 190 EE 56

