Gather multiple sets of columns
This approach seems pretty natural to me:
df %>%
gather(key, value, -id, -time) %>%
extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
spread(question, value)
First gather all question columns, use extract()
to separate into question
and loop_number
, then spread()
question back into the columns.
#> id time loop_number Q3.2 Q3.3
#> 1 1 2009-01-01 1 0.142259203 -0.35842736
#> 2 1 2009-01-01 2 0.061034802 0.79354061
#> 3 1 2009-01-01 3 -0.525686204 -0.67456611
#> 4 2 2009-01-02 1 -1.044461185 -1.19662936
#> 5 2 2009-01-02 2 0.393808163 0.42384717
r - gather multiple columns in multiple key columns with tidyr
Here's one way to do it -
df %>%
gather(key = "age", value = "age_values", age1, age2) %>%
gather(key = "weight", value = "weight_values", weight1, weight2) %>%
filter(substring(age, 4) == substring(weight, 7))
subject age age_values weight weight_values
1 a age1 33 weight1 90
2 b age1 35 weight1 67
3 a age2 43 weight2 70
4 b age2 45 weight2 87
Gather multiple columns with gather in R
Pivoting can be tough in the beginning.
The new version of gather()
is pivot_longer()
.
Here is how you can achieve your expected output.
First, you could just tell the function to pivot everything as default, using only the time as your identifier:
pivot_longer(df, -time) %>% head(5)
#> # A tibble: 30 x 3
#> time name value
#> <dbl> <chr> <dbl>
#> 1 1 value_model_a -0.560
#> 2 1 ci_low_model_a 1.72
#> 3 1 ci_high_model_a 1.22
#> 4 1 value_model_b 1.79
#> 5 1 ci_low_model_b -1.07
This is a start, but you can go further by setting a names separator. You could also use a regex using names_pattern
.
df_l = pivot_longer(df, -time, names_sep="_model_", names_to=c("name", "model"))
df_l
#> # A tibble: 30 x 4
#> time name model value
#> <dbl> <chr> <chr> <dbl>
#> 1 1 value a -0.560
#> 2 1 ci_low a 1.72
#> 3 1 ci_high a 1.22
#> 4 1 value b 1.79
#> 5 1 ci_low b -1.07
#> 6 1 ci_high b -1.69
#> 7 2 value a -0.230
#> 8 2 ci_low a 0.461
#> 9 2 ci_high a 0.360
#> 10 2 value b 0.498
#> # ... with 20 more rows
Finally, your expected output can be achieved by using pivot_wider()
with default values (which I explicitely wrote for academic purpose):
pivot_wider(df_l, names_from = "name", values_from = "value")
#> # A tibble: 10 x 5
#> time model value ci_low ci_high
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 1 a -0.560 1.72 1.22
#> 2 1 b 1.79 -1.07 -1.69
#> 3 2 a -0.230 0.461 0.360
#> 4 2 b 0.498 -0.218 0.838
#> 5 3 a 1.56 -1.27 0.401
#> 6 3 b -1.97 -1.03 0.153
#> 7 4 a 0.0705 -0.687 0.111
#> 8 4 b 0.701 -0.729 -1.14
#> 9 5 a 0.129 -0.446 -0.556
#> 10 5 b -0.473 -0.625 1.25
Created on 2021-03-03 by the reprex package (v1.0.0)
How to specify multiple columns with gather() function to tidy data
In the gather
function, value
specifies the name of value column in the result; To specify which columns to gather, you can use start_column:end_column
syntax, this will gather all columns from the start_column to end_column; In your case, it would be X0tot4:X20tot24
:
df %>% gather(key = 'Age.group', value = 'Value.name', X0tot4:X20tot24)
# V V
# V V
# V V
# Country Country.Code Year Age.group Value.name
#1 Viet Nam 704 1955 X0tot4 4606
#2 Viet Nam 704 1960 X0tot4 5842
#3 Viet Nam 704 1965 X0tot4 6571
#4 Viet Nam 704 1970 X0tot4 7065
#5 Viet Nam 704 1975 X0tot4 7658
#6 Viet Nam 704 1980 X0tot4 7991
#7 Viet Nam 704 1985 X0tot4 8630
How can you gather() multiple columns at the same time in dplyr (R)?
I’m assuming the issue here is with manually specify all the different variable names. Luckily, tidyverse
has the ?select_helpers
which make it easier to select columns based on different rules.
Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with
to get all columns starting with x
or y
and gather them together in one step. Then we can use ends_with
to select the value columns from those gather steps and filter and drop them.
Finally, we replace all values of x1
, y1
etc. with their true values in one step using mutate_all
and a lookup table
# Make lookup table to match X and Y variables with Values
# the initial values should be the `names` (first) and the values to change them to
# should be the `values` (after the =)
lookup <- c('x1' = 'Green',
'x2' = 'Yellow',
'x3' = 'Orange',
'y1' = 'Yes',
'y2' = 'No',
'y3' = 'Maybe')
tb1 %>%
gather(X, Xval, starts_with('x')) %>% # Gather all variables that start with ‘x'
gather(Y, Yval, starts_with('y')) %>% # Gather all variables that start with ‘y'
filter_at(vars(ends_with('val')), # Looking in columns ending with ‘val'
all_vars(!is.na(.))) %>% %>% # Drop rows if ANY of these cols are NA
select(-ends_with('val')) %>% # Drop columns ending in ‘val'
mutate_all(~lookup[.]) # Replace value from lookup table in all cols
# A tibble: 3 x 2
X Y
<chr> <chr>
1 Green No
2 Yellow Maybe
3 Orange Maybe
One tricky thing with select_helpers is knowing when you an use them alone and when you need to “register” them with vars
. In gather
and select
, you can use them as is. In mutate
, filter
, summarize
, etc. you need to surround them with vars
tidyr:: gather multiple columns different types
I assume your expected output is incomplete as I don't see any entries for ID = 2
and ID = 3
.
You could do the following
df %>%
gather(k, v, -ID) %>%
separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
select(-tmp) %>%
spread(ss, v)
# ID X_num abc xyz
#1 1 1 1 1
#2 1 2 2 2
#3 1 3 2 1
#4 2 1 1 2
#5 2 2 1 0
#6 2 3 1 NA
#7 3 1 1 2
#8 3 2 1 1
#9 3 3 NA 0
Gather and mutate multiple columns in R
The problem arises because the pre_post
column you create is incorrect for tests 2 and 3, which you can verify by inspecting the df_2
dataframe. For example row 5 from your output is:
test_1_post 2 test_2_pre 6 test_3_pre 12 post
Whilst a single data pipeline is appropriate is many circumstances, this problem is more easily solved by splitting up the tables and using a union_all
, as per the following:
library(tidyverse)
library(tableone)
test_1_pre <- c(1,5,8,2)
test_1_post <- c(2,7,3,6)
test_2_pre <- c(6,3,6,5)
test_2_post <- c(9,8,9,1)
test_3_pre <- c(12,2,4,6)
test_3_post <- c(4,7,6,6)
df <- data.frame(test_1_pre, test_1_post, test_2_pre,
test_2_post, test_3_pre, test_3_post)
get_dfs <- function(df, suffix) {
df %>%
select(ends_with(suffix)) %>%
mutate(pre_post = suffix) %>%
# Drop _pre / _post suffixes in test column names ahead of union
rename_with(.fn = function(x) gsub(paste0("_", suffix), "", x),
.cols = starts_with("test"))
}
df_2 <- union_all(get_dfs(df, "pre"), get_dfs(df, "post"))
vars_df <- c("test_1", "test_2", "test_3")
table_df <- CreateTableOne(vars = vars_df,
data = df_2,
strata = "pre_post")
table_df
The output given is:
# Stratified by pre_post
# post pre p test
# n 4 4
# test_1 (mean (SD)) 4.50 (2.38) 4.00 (3.16) 0.809
# test_2 (mean (SD)) 6.75 (3.86) 5.00 (1.41) 0.427
# test_3 (mean (SD)) 5.75 (1.26) 6.00 (4.32) 0.915
This contains more conservative p-values arising from the different estimated standard errors, as compared with @Frank's answer.
Moreover, the structure of df_2
is cleaner:
# test_1 test_2 test_3 pre_post
# 1 6 12 pre
# 5 3 2 pre
# 8 6 4 pre
# 2 5 6 pre
# 2 9 4 post
# 7 8 7 post
# 3 9 6 post
# 6 1 6 post
Gather multiple key and value columns
Using tidyr::pivot_longer
-
tidyr::pivot_longer(df,
cols = -ID,
names_to = '.value',
names_pattern = '(.*)\\d+')
# A tibble: 36 x 3
# ID Date Fcast
# <int> <date> <dbl>
# 1 1 2000-01-01 0.452
# 2 1 2001-01-01 0.242
# 3 1 2002-01-01 -0.540
# 4 2 2000-02-01 1.54
# 5 2 2001-02-01 0.178
# 6 2 2002-02-01 0.883
# 7 3 2000-03-01 -0.987
# 8 3 2001-03-01 1.40
# 9 3 2002-03-01 0.675
#10 4 2000-04-01 -0.632
# … with 26 more rows
how do I gather 2 sets of columns in tidyr
If we are using gather
, we can do this in two steps. First, we reshape from 'wide' to 'long' format for the column names that starts with 'category' and in the next step, we do the same with the numeric column names by selecting with matches
. The matches
can regex patterns, so a pattern of ^[0-9]+$
means we match one or more numbers ([0-9]+
) from the start (^
) to the end ($
) of string. We can remove the columns that are not needed with select
.
library(tidyr)
library(dplyr)
gather(df, key, category, starts_with('category_')) %>%
gather(key2, year, matches('^[0-9]+$')) %>%
select(-starts_with('key'))
Or using the devel version of data.table
, this would be much easier as the melt
can take multiple patterns for measure
columns. We convert the 'data.frame' to 'data.table' (setDT(df)
), use melt
and specify the patterns
with in the measure
argument. We also have options to change the column names of the 'value' column. The 'variable' column is set to NULL as it was not needed in the expected output.
library(data.table)#v1.9.5+
melt(setDT(df), measure=patterns(c('^category', '^[0-9]+$')),
value.name=c('category', 'year'))[, variable:=NULL][]
gather() with two key columns
It is not a good idea to have duplicate column names in the data so I'll rename one of them.
names(df)[4] <- 'US_1'
gather
has been retired and replaced withpivot_longer
.This is not a traditional reshape because the data in the 1st row needs to be treated differently than rest of the rows so we can perform the reshaping separately and combine the result to get one final dataframe.
library(dplyr)
library(tidyr)
df1 <- df %>% slice(-1L) %>% pivot_longer(cols = -Country)
df %>%
slice(1L) %>%
pivot_longer(-Country, values_to = 'org_id') %>%
select(-Country) %>%
inner_join(df1, by = 'name') %>%
rename(Country = name, date = Country) -> result
result
# Country org_id date value
# <chr> <int> <chr> <int>
#1 US 332 02-15-20 25
#2 US 332 03-15-20 30
#3 Canada 778 02-15-20 35
#4 Canada 778 03-15-20 10
#5 US_1 920 02-15-20 54
#6 US_1 920 03-15-20 60
data
df <- structure(list(Country = c("org_id", "02-15-20", "03-15-20"),
US = c(332L, 25L, 30L), Canada = c(778L, 35L, 10L), US = c(920L,
54L, 60L)), class = "data.frame", row.names = c(NA, -3L))
Related Topics
R - Test If a String Vector Contains Any Element of Another List
Extract a Dplyr Tbl Column as a Vector
Subtract Value from Previous Row by Group
Pull Out P-Values and R-Squared from a Linear Regression
How to Save Warnings and Errors as Output from a Function
Concatenating Two Text Columns in Dplyr
Splitting a Dataframe into Several Dataframes
Using Ggplot2, How to Insert a Break in the Axis
How to Filter Multiple Columns With Same Condition in R
Sum Across Multiple Columns With Dplyr
Split Comma-Separated Strings in a Column into Separate Rows
Add Legend to Ggplot2 Line Plot
How to Prevent Ifelse() from Turning Date Objects into Numeric Objects
Test If a Vector Contains a Given Element
How to Use R'S Ellipsis Feature When Writing Your Own Function
Dictionary Style Replace Multiple Items
Cluster Analysis in R: Determine the Optimal Number of Clusters