Pivoting multiple sets of columns using pivot_longer in R
The brackets around the matched pattern represents that we are capturing that pattern as a group. In the below code, we capture one or more lower-case letters ([a-z]+
) followed by a _
(not inside the brackets, thus it is removed) and the second capture group matches one or more digits (\\d+
), and this will be matched with the corresponding values of names_to
- i.e. .value
represents the value of the column, thus we get the columns 'x' and 'y' with the values and the second will be a new column names that returs the suffix digits of the column names i.e. 'time'
library(tidyr)
pivot_longer(data, cols = -aid, names_to = c(".value", "time"),
names_pattern = "^([a-z]+)_(\\d+)")
-output
# A tibble: 20 × 4
aid time x y
<int> <chr> <dbl> <dbl>
1 1 1 -0.823 0.954
2 1 2 0.937 2.30
3 2 1 0.644 0.513
4 2 2 -0.281 0.0256
5 3 1 -1.11 0.0575
6 3 2 -0.248 -0.512
7 4 1 -1.04 0.578
8 4 2 -0.414 0.609
9 5 1 1.29 1.60
10 5 2 -1.78 0.759
11 1 1 -0.578 0.0430
12 1 2 -1.00 0.868
13 2 1 0.0900 -2.10
14 2 2 -0.795 -0.434
15 3 1 0.143 -1.13
16 3 2 0.420 0.145
17 4 1 -0.252 0.236
18 4 2 1.56 -0.0472
19 5 1 -0.256 -1.21
20 5 2 0.624 1.02
In the OP's code, there are two groups ((.)
and (.)
) and only one element in names_to
, thus it fails along with the fact that there is _
between the 'x', 'y' and the digit. Also, by default, the names_pattern
will be in regex mode and some characters are thus in metacharacter
mode i.e. .
represents any character
and not the literal .
Pivot longer in dplyr for mutiple value columns
How about like this:
library(tidyverse)
dat <- structure(list(total = c(9410, 12951.1794783802),
op = c(3896.66666666667, 6976.57663230241),
ox = c(2200, 4920.84776902887),
ox15 = c(183.333333333333, 694.262648008611),
ox30 = c(133.333333333333, 368.090117767537),
hy = c(283.333333333333, 1146.14924596984),
hy10 = c(NA, 433.993925588459)),
row.names = c(NA, -2L),
class = c("tbl_df", "tbl", "data.frame"))
dat %>%
mutate(obs = 1:n()) %>%
pivot_longer(-obs, names_to="var", values_to="vals") %>%
pivot_wider(names_from="obs", values_from="vals", names_prefix="obs_")
#> # A tibble: 7 × 3
#> var obs_1 obs_2
#> <chr> <dbl> <dbl>
#> 1 total 9410 12951.
#> 2 op 3897. 6977.
#> 3 ox 2200 4921.
#> 4 ox15 183. 694.
#> 5 ox30 133. 368.
#> 6 hy 283. 1146.
#> 7 hy10 NA 434.
Created on 2022-02-03 by the reprex package (v2.0.1)
Stacking multiple columns using pivot longer in R
Would first rename columns and insert underscore right before number, then use that as separator in pivot_longer
.
library(tidyverse)
df %>%
rename_at(-c(1:2), ~ str_replace(., "(\\w+)(\\d)", "\\1_\\2")) %>%
pivot_longer(cols = -c(1:2), names_to = c(".value", "num"), names_sep = "_")
Edit (2/7/20):
With your updated dataset, it appears that some of the variable column names don't have a number at the end. We can add 0
for those.
Also, I assume you want: group
, ID
, subect
, x
, y
, speed
that are repeated (with the first group
in column 5 separated from its related variables in columns 7-11).
df1 %>%
rename_at(c(5,7:11), ~ paste0(., "0")) %>%
rename_at(-c(1:4, 6, 24), ~ str_replace(., "(\\w+)(\\d+)", "\\1_\\2")) %>%
pivot_longer(cols = -c(1:4, 6, 24), names_to = c(".value", "val"), names_sep = "_")
Output (Revised):
# A tibble: 18 x 13
time state reference num X speed.group2 val group ID subect x y speed
<dbl> <fct> <chr> <chr> <lgl> <lgl> <chr> <chr> <int> <int> <int> <int> <chr>
1 0 Alive;: 1880439 1 NA NA 0 1 1 21 3514 -1644 5.23
2 0 Alive;: 1880439 1 NA NA 1 0 13 9 882 -1468 1.70
3 0 Alive;: 1880439 1 NA NA 2 4 24 -1 5550 4400 0.00
4 0 Alive;: 1880439 NA NA NA 0 NA NA NA NA NA NA
5 0 Alive;: 1880439 NA NA NA 1 NA NA NA NA NA NA
6 0 Alive;: 1880439 NA NA NA 2 NA NA NA NA NA NA
7 0 Alive;: 1880439 2 NA NA 0 1 2 7 2807 -510 3.24
8 0 Alive;: 1880439 2 NA NA 1 4 14 -1 5550 4400 0.00
9 0 Alive;: 1880439 2 NA NA 2 0 25 1 5031 -74 0.54
10 0 Alive;: 1880439 3 NA NA 0 4 4 -1 5550 4400 0.00
11 0 Alive;: 1880439 3 NA NA 1 1 15 13 3004 1633 3.06
12 0 Alive;: 1880439 3 NA NA 2 1 26 18 3936 190 1.06
13 0 Alive;: 1880439 4 NA NA 0 0 5 2 3956 1297 2.31
14 0 Alive;: 1880439 4 NA NA 1 1 16 14 761 559 2.92
15 0 Alive;: 1880439 4 NA NA 2 0 27 5 3972 686 0.95
16 0 Alive;: 1880439 5 NA NA 0 0 6 6 3686 -55 3.57
17 0 Alive;: 1880439 5 NA NA 1 0 17 11 3317 1443 3.30
18 0 Alive;: 1880439 5 NA NA 2 0 28 10 3623 356 2.49
pivot longer with multiple columns and values
You can use names_pattern
here.
tidyr::pivot_longer(df,
cols = -c(id, type),
names_to = c('.value', 'year'),
names_pattern = '([a-z]+)(\\d+)')
# A tibble: 18 x 5
# type id year age bool
# * <int> <int> <chr> <int> <int>
# 1 1 1 2000 20 1
# 2 1 1 2001 21 1
# 3 1 1 2002 22 1
# 4 1 2 2000 35 2
# 5 1 2 2001 36 2
# 6 1 2 2002 37 2
# 7 1 3 2000 24 1
# 8 1 3 2001 25 1
# 9 1 3 2002 26 1
#10 2 1 2000 32 2
#11 2 1 2001 33 2
#12 2 1 2002 34 2
#13 2 2 2000 66 2
#14 2 2 2001 67 2
#15 2 2 2002 68 2
#16 2 3 2000 14 1
#17 2 3 2001 15 1
#18 2 3 2002 16 1
data
df <- structure(list(type = c(1L, 1L, 1L, 2L, 2L, 2L), id = c(1L, 2L,
3L, 1L, 2L, 3L), age2000 = c(20L, 35L, 24L, 32L, 66L, 14L), age2001 = c(21L,
36L, 25L, 33L, 67L, 15L), age2002 = c(22L, 37L, 26L, 34L, 68L,
16L), bool2000 = c(1L, 2L, 1L, 2L, 2L, 1L), bool2001 = c(1L,
2L, 1L, 2L, 2L, 1L), bool2002 = c(1L, 2L, 1L, 2L, 2L, 1L)),
class = "data.frame", row.names = c(NA, -6L))
Is there way to pivot_longer to multiple values columns in R?
We don't need multiple calls if we specify the names_to
as a vector of values i.e. .value
- returns the value of the columns and 'group' the column with the suffix of column names. Here, we use names_sep
as .
to split at the .
library(tidyr)
pivot_longer(df, cols = -ids, names_to = c(".value", "group"),
names_sep = "\\.")
-output
# A tibble: 4 × 4
ids group mean se
<chr> <chr> <int> <int>
1 protein1 group1 982 3
2 protein1 group2 657 7
3 protein2 group1 663 9
4 protein2 group2 215 1
NOTE: values are different as sample
was used in creation of input data without a set.seed
specified
Pivot_longer for multiple columns of repeated measurements data
This probably adds nothing new to the already posted solutions, the only difference is the regex
used for the names_pattern
argument.
- If you notice some of your column names are separated by one
_
whereas others are separated by two_
.\\w+
captures any word character, now if I specify we have a number after this with\\d+
as intime3
intime3_age
, we tellpivot_longer
to store this part of the column names corresponding totime3
intime
column. Then the rest of the column names are used for the variable names we are trying to measure lineage
,systolicBP
andmed_hypt
. - It should be noted that if we use
\\w+\\d+
instead of\\w+
only the rest will be captured as column names whether it ismed_hypt
with underscore orsystolicBP
without underscore. But if we use only\\w+
it could also capture med and the resulting column will behypt
instead ofmed_hypt
. - In the end since I defined two capture groups, I have to define either
names_pattern
ornames_sep
in a way to specify how each of them are defined and separated.
library(dplyr)
wide_data %>%
pivot_longer(!c(id, sex), names_to = c("time", ".value"),
names_pattern = "(\\w+\\d+)_(\\w+)")
# A tibble: 30 x 6
id sex time age systolicBP med_hypt
<dbl> <fct> <chr> <dbl> <dbl> <dbl>
1 12002 women time1 71.2 102 0
2 12002 women time2 74.2 NA 0
3 12002 women time3 78 NA 0
4 17001 men time1 67.9 152 0
5 17001 men time2 69.2 146 0
6 17001 men time3 74.2 160. 0
7 17002 women time1 66.5 NA 0
8 17002 women time2 67.8 NA 0
9 17002 women time3 72.8 NA 0
10 42001 men time1 57.7 170 0
# ... with 20 more rows
r - Pivot Longer distributing values into multiple columns
using pivot_longer
:
sampleData %>%
pivot_longer(-ID, names_to='.value', names_pattern='([A-Za-z]+)', values_drop_na=TRUE)
# A tibble: 5 x 4
ID ExamCode ExamGrade AdminYear
<dbl> <dbl> <dbl> <dbl>
1 123 4 4 22
2 123 26 5 22
3 456 26 3 22
4 456 83 3 21
5 789 26 5 22
Pivot data into two different columns simultaneously using pivot_longer() in R?
Edit
Turns out, you can do it in one pivot_longer
:
df %>%
pivot_longer(-id,
names_to = c("variable", ".value"),
names_pattern = "(.*)\\.(.*)")%>%
rename(activation = act, fixation = fix)
with the same result.
Don't know how to do it in one go, but you could use
library(tidyr)
library(dplyr)
df %>%
pivot_longer(-id,
names_to = c("variable", "class"),
names_pattern = "(.*)\\.(.*)") %>%
pivot_wider(names_from = "class") %>%
rename(activation = act, fixation = fix)
This returns
# A tibble: 4 x 4
id variable activation fixation
<dbl> <chr> <dbl> <dbl>
1 1 v1 0.4 1
2 1 v2 0.5 0
3 2 v1 0.8 0
4 2 v2 0.7 1
Related Topics
Breaks for Scale_X_Date in Ggplot2 and R
How to Minimize Size of Object of Class "Lm" Without Compromising It Being Passed to Predict()
Update Plot Within Observer Loop in Shiny Application
Knitr: Opts_Chunk$Set() Not Working in Rscript Command
Is There a Limit for the Possible Number of Nested Ifelse Statements
How to Print a Variable Inside a for Loop to the Console in Real Time as the Loop Is Running
Additional Metrics in Caret - Ppv, Sensitivity, Specificity
Add Missing Rows to a Data Table
Extracting Data from Text Files
Robust and Clustered Standard Error in R for Probit and Logit Regression
Scoping of Variables in Aes(...) Inside a Function in Ggplot
Connect R and Vertica Using Rodbc
How to Create a Bar and Line Plot with R Dygraphs
Rename Columns by Pattern in R
Adding Multiple Lag Variables Using Dplyr and for Loops
Write.Table Writes Unwanted Leading Empty Column to Header When Has Rownames