pivot_longer into multiple columns
Here is a solution following a similar method to the one @Fnguyen used, but using the newer pivot_longer and pivot_wider construct:
library(dplyr)
library(tidyr)
longer <- pivot_longer(dat, cols = -1, names_pattern = "(.*)(..)$",
                       names_to = c("limit", "name")) %>%
  # an empty prefix is relabelled as "value"
  mutate(limit = ifelse(limit == "", "value", limit))
answer <- pivot_wider(longer, id_cols = c(group, name), names_from = limit,
                      values_from = value, names_repair = "check_unique")
Most of the selecting, separating, mutating and renaming is taking place within the pivot function calls.
Update:
The regular expression "(.*)(..)$" means:
( ) ( )  look for two parts (capture groups),
(.*)     the first part has zero or more characters,
(..)$    the second part has exactly 2 characters, anchored at the end of the string by "$".
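To see how this pattern splits names, here is a small base-R sketch. The column names A1, lowA1 and highA1 are hypothetical stand-ins, since the original dat is not shown.

```r
# Hypothetical column names; the real `dat` is not shown in the answer.
col_names <- c("A1", "lowA1", "highA1")

# regexec() returns the full match followed by each capture group.
matches <- regmatches(col_names, regexec("(.*)(..)$", col_names))

# Keep only the two capture groups and label them the way names_to does.
parts <- t(sapply(matches, function(m) m[2:3]))
colnames(parts) <- c("limit", "name")
print(parts)
```

A name with nothing in front of its last two characters yields an empty first group, which is the case the mutate() step relabels as "value".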
Is there way to pivot_longer to multiple values columns in R?
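We don't need multiple calls if we specify names_to as a vector, i.e. c(".value", "group"). The special ".value" entry tells pivot_longer to use that part of each column name as the name of an output value column (here mean and se), while 'group' becomes a column holding the suffix of the column names. Here, we use names_sep as "\\." to split at the literal ".".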
library(tidyr)
pivot_longer(df, cols = -ids, names_to = c(".value", "group"),
             names_sep = "\\.")
-output
# A tibble: 4 × 4
ids group mean se
<chr> <chr> <int> <int>
1 protein1 group1 982 3
2 protein1 group2 657 7
3 protein2 group1 663 9
4 protein2 group2 215 1
NOTE: the values differ because sample was used to create the input data without a set.seed call.
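For a reproducible version, the sketch below builds a small input with fixed values instead of sample. The column names (mean.group1, se.group1, and so on) are an assumption inferred from the output, not taken from the original question.

```r
library(tidyr)

# Assumed input layout: <statistic>.<group>, with fixed values for reproducibility.
df <- data.frame(
  ids         = c("protein1", "protein2"),
  mean.group1 = c(10L, 20L),
  se.group1   = c(1L, 2L),
  mean.group2 = c(30L, 40L),
  se.group2   = c(3L, 4L)
)

# ".value" turns the part before the dot into output columns (mean, se);
# the part after the dot goes into the new "group" column.
long <- pivot_longer(df, cols = -ids,
                     names_to = c(".value", "group"),
                     names_sep = "\\.")
print(long)
```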
Pivot_longer for multiple columns of repeated measurements data
This probably adds nothing new to the already posted solutions; the only difference is the regex used for the names_pattern argument.
- If you notice, some of your column names are separated by one _ whereas others are separated by two. \\w+ captures any word characters; if I then specify that a number follows with \\d+, as in time3 in time3_age, we tell pivot_longer to store this part of the column names (time3) in the time column. The rest of each column name is used for the variables we are trying to measure, like age, systolicBP and med_hypt.
- It should be noted that if we use \\w+\\d+ instead of \\w+ for the first group, only the rest will be captured as column names, whether it is med_hypt with an underscore or systolicBP without one. But if we use only \\w+, the first group could also capture med, and the resulting column would be hypt instead of med_hypt.
- In the end, since I defined two capture groups, I have to define either names_pattern or names_sep in a way that specifies how each of them is defined and separated.
library(dplyr)
library(tidyr)  # pivot_longer() lives in tidyr

wide_data %>%
  pivot_longer(!c(id, sex), names_to = c("time", ".value"),
               names_pattern = "(\\w+\\d+)_(\\w+)")
# A tibble: 30 x 6
id sex time age systolicBP med_hypt
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 12002 women time1 71.2 102 0
2 12002 women time2 74.2 NA 0
3 12002 women time3 78 NA 0
4 17001 men time1 67.9 152 0
5 17001 men time2 69.2 146 0
6 17001 men time3 74.2 160. 0
7 17002 women time1 66.5 NA 0
8 17002 women time2 67.8 NA 0
9 17002 women time3 72.8 NA 0
10 42001 men time1 57.7 170 0
# ... with 20 more rows
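The difference between the two patterns discussed in the bullets can be checked directly on a few column names. This is a base-R sketch using names of the same shape as in the answer; perl = TRUE is used here only so that the greedy backtracking behaves as described.

```r
# Column names of the same shape as in the question.
nm <- c("time1_age", "time1_systolicBP", "time1_med_hypt")

# First group is only \\w+: it is greedy and also matches "_", so it
# swallows everything up to the LAST underscore.
greedy_rest <- sub("(\\w+)_(\\w+)", "\\2", nm, perl = TRUE)

# First group must end in digits: it stops at "time1", so the rest of
# the name stays intact.
anchored_rest <- sub("(\\w+\\d+)_(\\w+)", "\\2", nm, perl = TRUE)

print(greedy_rest)    # "age" "systolicBP" "hypt"
print(anchored_rest)  # "age" "systolicBP" "med_hypt"
```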
Pivoting multiple sets of columns using pivot_longer in R
The parentheses around a matched pattern mean that we capture that pattern as a group. In the code below, the first capture group matches one or more lower-case letters ([a-z]+), followed by a _ (outside the parentheses, so it is dropped), and the second capture group matches one or more digits (\\d+). The groups are matched with the corresponding elements of names_to: .value means the captured text becomes the name of a value column, so we get the columns 'x' and 'y' with the values, and the second element is a new column, 'time', that returns the suffix digits of the column names.
library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"),
             names_pattern = "^([a-z]+)_(\\d+)")
-output
# A tibble: 20 × 4
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 -0.823 0.954
2 1 2 0.937 2.30
3 2 1 0.644 0.513
4 2 2 -0.281 0.0256
5 3 1 -1.11 0.0575
6 3 2 -0.248 -0.512
7 4 1 -1.04 0.578
8 4 2 -0.414 0.609
9 5 1 1.29 1.60
10 5 2 -1.78 0.759
11 1 1 -0.578 0.0430
12 1 2 -1.00 0.868
13 2 1 0.0900 -2.10
14 2 2 -0.795 -0.434
15 3 1 0.143 -1.13
16 3 2 0.420 0.145
17 4 1 -0.252 0.236
18 4 2 1.56 -0.0472
19 5 1 -0.256 -1.21
20 5 2 0.624 1.02
In the OP's code, there are two groups, (.) and (.), but only one element in names_to, so it fails; in addition, the pattern does not account for the _ between the 'x'/'y' and the digit. Also, names_pattern is interpreted as a regular expression by default, so some characters act as metacharacters, i.e. . represents any character and not the literal .
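A minimal reproducible sketch of the same call follows. The data frame here is a made-up stand-in with fixed values (the original data was random), so only the shape of the result is meaningful.

```r
library(tidyr)

# Hypothetical small input with the same naming scheme: <letter>_<digit>.
data <- data.frame(
  aid = 1:2,
  x_1 = c(0.1, 0.2), x_2 = c(0.3, 0.4),
  y_1 = c(1.1, 1.2), y_2 = c(1.3, 1.4)
)

# Two capture groups, two entries in names_to: the letters become the
# value columns (x, y) and the digits go into "time".
long <- pivot_longer(data, cols = -aid,
                     names_to = c(".value", "time"),
                     names_pattern = "^([a-z]+)_(\\d+)")
print(long)
```

Note that time comes back as a character column, because it is extracted from the column names.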
Using pivot_longer with multiple column classes
We could use names_pattern after rearranging the substrings in the column names:
library(dplyr)
library(tidyr)
library(stringr)
df_wide %>%
  # rename the columns by moving the digits to the end
  # "_(\\d+)(_.*)" - captures the digits (\\d+) after the _
  # and the rest of the characters (_.*)
  # replace with the backreferences of the captured groups, swapped (\\2_\\1)
  rename_with(~ str_replace(., "_(\\d+)(_.*)", "\\2_\\1"), -resp_id) %>%
  pivot_longer(cols = -resp_id, names_to = c(".value", "question_number"),
               names_pattern = "(.*)_(\\d+$)")
-output
# A tibble: 6 × 4
resp_id question_number question_info question_answer
<dbl> <chr> <chr> <dbl>
1 1 1 "What is your eye color?" 1
2 1 2 "What is your hair color?" 2
3 2 1 "Are you over 6 ft tall?" 1
4 2 2 "" NA
5 3 1 "What is your hair color?" 0
6 3 2 "Are you under 40?" 1
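The renaming step can be inspected on its own. The input names below (question_1_info and so on) are an assumption inferred from the output columns, since df_wide itself is not shown.

```r
library(stringr)

# Assumed original column names of the form <stem>_<digit>_<suffix>.
old_names <- c("question_1_info", "question_1_answer",
               "question_2_info", "question_2_answer")

# Swap the digit and the suffix so the digit ends up last.
new_names <- str_replace(old_names, "_(\\d+)(_.*)", "\\2_\\1")
print(new_names)

# The pattern used in pivot_longer() then splits each renamed column into
# a value-column name and a question number.
pieces <- str_match(new_names, "(.*)_(\\d+$)")[, 2:3]
print(pieces)
```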
pivot_longer into several pairs of columns
With tidyverse, we can pivot on the two sets of columns that start with belief and norm. We can then use a regex to split each name at the first underscore (since some column names have multiple underscores). Essentially, we put belief or norm (the first group in the column name) into their own columns (i.e., .value), and the second group (i.e., the animal names) into one column named animal.
library(tidyverse)
df_raw %>%
  pivot_longer(cols = c(starts_with("belief"), starts_with("norm")),
               names_to = c('.value', 'animal'),
               names_pattern = '(.*?)_(.*)') %>%
  rename(belief_rating = belief, norm_rating = norm)
Output
id age gender animal belief_rating norm_rating
<chr> <dbl> <dbl> <chr> <dbl> <dbl>
1 b2x8 41 2 dog 1 10
2 b2x8 41 2 bull_frog 4 4
3 b2x8 41 2 fish 3 2
4 m89w 19 1 dog 3 3
5 m89w 19 1 bull_frog 6 1
6 m89w 19 1 fish 2 2
7 32x8 38 3 dog 1 8
8 32x8 38 3 bull_frog 5 9
9 32x8 38 3 fish 2 1
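The lazy quantifier in (.*?) is what makes the split happen at the first underscore. A short base-R sketch, using column names consistent with the output above, compares it with a greedy first group:

```r
# Column names consistent with the output (bull_frog has its own underscore).
nm <- c("belief_dog", "belief_bull_frog", "norm_bull_frog")

# Lazy first group: stops at the FIRST underscore.
lazy_prefix <- sub("^(.*?)_(.*)$", "\\1", nm, perl = TRUE)
lazy_rest   <- sub("^(.*?)_(.*)$", "\\2", nm, perl = TRUE)

# Greedy first group: runs up to the LAST underscore, which would split
# "bull_frog" in the wrong place.
greedy_prefix <- sub("^(.*)_(.*)$", "\\1", nm, perl = TRUE)

print(lazy_prefix)    # "belief" "belief" "norm"
print(lazy_rest)      # "dog" "bull_frog" "bull_frog"
print(greedy_prefix)  # "belief" "belief_bull" "norm_bull"
```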
pivot_longer multiple variables of different kinds
In this case one has to use names_to combined with names_pattern:
library(dplyr)
library(tidyr)
> head(x,3)
case X1990 flag.1990 X2000 flag.2000
1 1 0.2772497942 a 0.1751129 c
2 2 0.0005183129 b 0.4407503 d
3 3 0.5106083730 a 0.9071830 c
> x %>%
    pivot_longer(cols = -case,
                 names_to = c(".value", "year"),
                 names_pattern = "([^\\.]*)\\.*(\\d{4})")
# A tibble: 20 x 4
case year X flag
<int> <chr> <dbl> <chr>
1 1 1990 0.277 a
2 1 2000 0.175 c
3 2 1990 0.000518 b
4 2 2000 0.441 d
5 3 1990 0.511 a
6 3 2000 0.907 c
7 4 1990 0.0140 b
8 4 2000 0.851 d
9 5 1990 0.0647 a
10 5 2000 0.734 c
11 6 1990 0.955 b
12 6 2000 0.574 d
13 7 1990 0.0865 a
14 7 2000 0.482 c
15 8 1990 0.290 b
16 8 2000 0.331 d
17 9 1990 0.881 a
18 9 2000 0.158 c
19 10 1990 0.123 b
20 10 2000 0.480 d
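Because x is only shown via head(), here is a self-contained sketch with a two-row stand-in of the same layout (values rounded from the printed rows). It illustrates that value columns of different types (numeric X, character flag) can be pivoted together.

```r
library(tidyr)

# Two-row stand-in for `x`, following the layout printed by head(x, 3).
x <- data.frame(
  case      = 1:2,
  X1990     = c(0.277, 0.000518),
  flag.1990 = c("a", "b"),
  X2000     = c(0.175, 0.441),
  flag.2000 = c("c", "d")
)

# Group 1: everything that is not a dot ("X" or "flag");
# optional dots are skipped; group 2: the four-digit year.
long <- pivot_longer(x, cols = -case,
                     names_to = c(".value", "year"),
                     names_pattern = "([^\\.]*)\\.*(\\d{4})")
print(long)
```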
Pivot data into two different columns simultaneously using pivot_longer() in R?
Edit: it turns out you can do it in a single pivot_longer call:
library(tidyr)
library(dplyr)

df %>%
  pivot_longer(-id,
               names_to = c("variable", ".value"),
               names_pattern = "(.*)\\.(.*)") %>%
  rename(activation = act, fixation = fix)
with the same result as the original two-step answer below.
Original answer: I don't know how to do it in one go, but you could use
library(tidyr)
library(dplyr)
df %>%
  pivot_longer(-id,
               names_to = c("variable", "class"),
               names_pattern = "(.*)\\.(.*)") %>%
  pivot_wider(names_from = "class") %>%
  rename(activation = act, fixation = fix)
This returns
# A tibble: 4 x 4
id variable activation fixation
<dbl> <chr> <dbl> <dbl>
1 1 v1 0.4 1
2 1 v2 0.5 0
3 2 v1 0.8 0
4 2 v2 0.7 1
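To compare the one-step and two-step versions side by side, the sketch below rebuilds an input from the printed result. The wide column names (v1.act, v1.fix, and so on) are an assumption based on the pattern and the rename call.

```r
library(tidyr)
library(dplyr)

# Input reconstructed from the printed output; the names are assumed.
df <- data.frame(
  id     = c(1, 2),
  v1.act = c(0.4, 0.8), v1.fix = c(1, 0),
  v2.act = c(0.5, 0.7), v2.fix = c(0, 1)
)

# One step: the part after the dot becomes the value columns directly.
one_step <- df %>%
  pivot_longer(-id, names_to = c("variable", ".value"),
               names_pattern = "(.*)\\.(.*)") %>%
  rename(activation = act, fixation = fix)

# Two steps: go fully long first, then widen by the class column.
two_step <- df %>%
  pivot_longer(-id, names_to = c("variable", "class"),
               names_pattern = "(.*)\\.(.*)") %>%
  pivot_wider(names_from = "class") %>%
  rename(activation = act, fixation = fix)

print(one_step)
print(all.equal(as.data.frame(one_step), as.data.frame(two_step)))
```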
Pivot_longer to maintain two columns and make the rest long
If you want the data in long format with A and B remaining as they are, exclude them from cols:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = -c(A, B), names_to = 'Number', values_to = 'Value') %>%
  # convert Number from character to integer
  type.convert(as.is = TRUE) %>%
  mutate(Variable = case_when(Number %in% c(1, 2) ~ 'WW',
                              Number %in% c(34, 39) ~ 'MM',
                              TRUE ~ 'EE')) %>%
  select(One = A, two = B, Number, Variable, Value)
# A tibble: 18 x 5
# One two Number Variable Value
# <chr> <chr> <int> <chr> <dbl>
# 1 A AA 1 WW 1.9
# 2 A AA 2 WW 1.9
# 3 A AA 34 MM 3.9
# 4 A AA 39 MM 2.9
# 5 A AA 158 EE 2.9
# 6 A AA 190 EE 22.1
# 7 B BB 1 WW 6.8
# 8 B BB 2 WW 6.8
# 9 B BB 34 MM 0.3
#10 B BB 39 MM 2.3
#11 B BB 158 EE 3
#12 B BB 190 EE 7.4
#13 C CC 1 WW 4.7
#14 C CC 2 WW 4.7
#15 C CC 34 MM 2.7
#16 C CC 39 MM 2.9
#17 C CC 158 EE 45
#18 C CC 190 EE 56
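A one-row sketch of the same pipeline, with an input reconstructed from the first rows of the output (an assumption, since df is not shown). It uses check.names = FALSE so that the numeric column names survive in the data frame.

```r
library(dplyr)
library(tidyr)

# Assumed wide input: identifier columns A, B plus numerically named columns.
df <- data.frame(A = "A", B = "AA",
                 `1` = 1.9, `34` = 3.9, `158` = 2.9,
                 check.names = FALSE)

long <- df %>%
  pivot_longer(cols = -c(A, B), names_to = "Number", values_to = "Value") %>%
  type.convert(as.is = TRUE) %>%   # "1", "34", "158" become integers
  mutate(Variable = case_when(Number %in% c(1, 2)   ~ "WW",
                              Number %in% c(34, 39) ~ "MM",
                              TRUE                  ~ "EE"))
print(long)
```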