Gather Multiple Sets of Columns

Gather multiple sets of columns

This approach seems pretty natural to me:

df %>%
  gather(key, value, -id, -time) %>%
  extract(key, c("question", "loop_number"), "(Q.\\..)\\.(.)") %>%
  spread(question, value)

First gather all question columns, use extract() to separate into question and loop_number, then spread() question back into the columns.

#>    id       time loop_number         Q3.2        Q3.3
#> 1   1 2009-01-01           1  0.142259203 -0.35842736
#> 2   1 2009-01-01           2  0.061034802  0.79354061
#> 3   1 2009-01-01           3 -0.525686204 -0.67456611
#> 4   2 2009-01-02           1 -1.044461185 -1.19662936
#> 5   2 2009-01-02           2  0.393808163  0.42384717

r - gather multiple columns in multiple key columns with tidyr

Here's one way to do it -

df %>%
  gather(key = "age", value = "age_values", age1, age2) %>%
  gather(key = "weight", value = "weight_values", weight1, weight2) %>%
  filter(substring(age, 4) == substring(weight, 7))

  subject  age age_values  weight weight_values
1       a age1         33 weight1            90
2       b age1         35 weight1            67
3       a age2         43 weight2            70
4       b age2         45 weight2            87

Gather multiple columns with gather in R

Pivoting can be tough in the beginning.

The new version of gather() is pivot_longer().

Here is how you can achieve your expected output.

First, you could just tell the function to pivot everything as default, using only the time as your identifier:

pivot_longer(df, -time) %>% head(5)
#> # A tibble: 30 x 3
#>     time name             value
#>    <dbl> <chr>            <dbl>
#>  1     1 value_model_a   -0.560
#>  2     1 ci_low_model_a   1.72 
#>  3     1 ci_high_model_a  1.22 
#>  4     1 value_model_b    1.79 
#>  5     1 ci_low_model_b  -1.07

This is a start, but you can go further by setting a names separator. You could also use a regex using names_pattern.

df_l = pivot_longer(df, -time, names_sep="_model_", names_to=c("name", "model"))
df_l
#> # A tibble: 30 x 4
#>     time name    model  value
#>    <dbl> <chr>   <chr>  <dbl>
#>  1     1 value   a     -0.560
#>  2     1 ci_low  a      1.72 
#>  3     1 ci_high a      1.22 
#>  4     1 value   b      1.79 
#>  5     1 ci_low  b     -1.07 
#>  6     1 ci_high b     -1.69 
#>  7     2 value   a     -0.230
#>  8     2 ci_low  a      0.461
#>  9     2 ci_high a      0.360
#> 10     2 value   b      0.498
#> # ... with 20 more rows

Finally, your expected output can be achieved by using pivot_wider() with default values (which I explicitely wrote for academic purpose):

pivot_wider(df_l, names_from = "name", values_from = "value")
#> # A tibble: 10 x 5
#>     time model   value ci_low ci_high
#>    <dbl> <chr>   <dbl>  <dbl>   <dbl>
#>  1     1 a     -0.560   1.72    1.22 
#>  2     1 b      1.79   -1.07   -1.69 
#>  3     2 a     -0.230   0.461   0.360
#>  4     2 b      0.498  -0.218   0.838
#>  5     3 a      1.56   -1.27    0.401
#>  6     3 b     -1.97   -1.03    0.153
#>  7     4 a      0.0705 -0.687   0.111
#>  8     4 b      0.701  -0.729  -1.14 
#>  9     5 a      0.129  -0.446  -0.556
#> 10     5 b     -0.473  -0.625   1.25

^{Created on 2021-03-03 by the reprex package (v1.0.0)}

How to specify multiple columns with gather() function to tidy data

In the gather function, value specifies the name of value column in the result; To specify which columns to gather, you can use start_column:end_column syntax, this will gather all columns from the start_column to end_column; In your case, it would be X0tot4:X20tot24:

df %>% gather(key = 'Age.group', value = 'Value.name', X0tot4:X20tot24)
#                        V                     V
#                              V               V
#                                    V         V
#      Country Country.Code Year Age.group Value.name
#1   Viet Nam           704 1955    X0tot4       4606
#2   Viet Nam           704 1960    X0tot4       5842
#3   Viet Nam           704 1965    X0tot4       6571
#4   Viet Nam           704 1970    X0tot4       7065
#5   Viet Nam           704 1975    X0tot4       7658
#6   Viet Nam           704 1980    X0tot4       7991
#7   Viet Nam           704 1985    X0tot4       8630

How can you gather() multiple columns at the same time in dplyr (R)?

I’m assuming the issue here is with manually specify all the different variable names. Luckily, tidyverse has the ?select_helpers which make it easier to select columns based on different rules.

Instead of renaming the variables at the beginning, we can rename them at the end. This lets us use starts_with to get all columns starting with x or y and gather them together in one step. Then we can use ends_with to select the value columns from those gather steps and filter and drop them.

Finally, we replace all values of x1, y1 etc. with their true values in one step using mutate_all and a lookup table

# Make lookup table to match X and Y variables with Values
  # the initial values should be the `names` (first) and the values to change them to
  # should be the `values` (after the =)
lookup <- c('x1' = 'Green',
            'x2' = 'Yellow',
            'x3' = 'Orange',
            'y1' = 'Yes',
            'y2' = 'No',
            'y3' = 'Maybe')

tb1 %>%
    gather(X, Xval, starts_with('x')) %>%    # Gather all variables that start with ‘x'
    gather(Y, Yval, starts_with('y')) %>%    # Gather all variables that start with ‘y'
    filter_at(vars(ends_with('val')),        # Looking in columns ending with ‘val'
              all_vars(!is.na(.))) %>% %>%    # Drop rows if ANY of these cols are NA
    select(-ends_with('val')) %>%            # Drop columns ending in ‘val'
    mutate_all(~lookup[.])                   # Replace value from lookup table in all cols

# A tibble: 3 x 2
  X      Y 
  <chr>  <chr>
1 Green  No   
2 Yellow Maybe
3 Orange Maybe

One tricky thing with select_helpers is knowing when you an use them alone and when you need to “register” them with vars. In gather and select, you can use them as is. In mutate, filter, summarize, etc. you need to surround them with vars

tidyr:: gather multiple columns different types

I assume your expected output is incomplete as I don't see any entries for ID = 2 and ID = 3.

You could do the following

df %>%
    gather(k, v, -ID) %>%
    separate(k, into = c("tmp", "X_num", "ss"), sep = "_") %>%
    select(-tmp) %>%
    spread(ss, v)
#  ID X_num abc xyz
#1  1     1   1   1
#2  1     2   2   2
#3  1     3   2   1
#4  2     1   1   2
#5  2     2   1   0
#6  2     3   1  NA
#7  3     1   1   2
#8  3     2   1   1
#9  3     3  NA   0

Gather and mutate multiple columns in R

The problem arises because the pre_post column you create is incorrect for tests 2 and 3, which you can verify by inspecting the df_2 dataframe. For example row 5 from your output is:

test_1_post      2  test_2_pre      6  test_3_pre     12     post

Whilst a single data pipeline is appropriate is many circumstances, this problem is more easily solved by splitting up the tables and using a union_all, as per the following:

library(tidyverse)
library(tableone)

test_1_pre <- c(1,5,8,2)
test_1_post <- c(2,7,3,6)
test_2_pre <- c(6,3,6,5)
test_2_post <- c(9,8,9,1)
test_3_pre <- c(12,2,4,6)
test_3_post <- c(4,7,6,6)

df <- data.frame(test_1_pre, test_1_post, test_2_pre,
                 test_2_post, test_3_pre, test_3_post)

get_dfs <- function(df, suffix) {
  df %>% 
    select(ends_with(suffix)) %>% 
    mutate(pre_post = suffix) %>% 
    # Drop _pre / _post suffixes in test column names ahead of union
    rename_with(.fn = function(x) gsub(paste0("_", suffix), "", x),
                .cols = starts_with("test"))
}

df_2 <- union_all(get_dfs(df, "pre"), get_dfs(df, "post"))

vars_df <- c("test_1", "test_2", "test_3")

table_df <- CreateTableOne(vars = vars_df,
                           data = df_2,
                           strata = "pre_post")

table_df

The output given is:

# Stratified by pre_post
# post        pre         p      test
# n                     4           4                   
# test_1 (mean (SD)) 4.50 (2.38) 4.00 (3.16)  0.809     
# test_2 (mean (SD)) 6.75 (3.86) 5.00 (1.41)  0.427     
# test_3 (mean (SD)) 5.75 (1.26) 6.00 (4.32)  0.915

This contains more conservative p-values arising from the different estimated standard errors, as compared with @Frank's answer.

Moreover, the structure of df_2 is cleaner:

# test_1 test_2 test_3 pre_post
#      1      6     12      pre
#      5      3      2      pre
#      8      6      4      pre
#      2      5      6      pre
#      2      9      4     post
#      7      8      7     post
#      3      9      6     post
#      6      1      6     post

Gather multiple key and value columns

Using tidyr::pivot_longer -

tidyr::pivot_longer(df, 
                    cols = -ID, 
                    names_to = '.value', 
                    names_pattern = '(.*)\\d+')

# A tibble: 36 x 3
#      ID Date        Fcast
#   <int> <date>      <dbl>
# 1     1 2000-01-01  0.452
# 2     1 2001-01-01  0.242
# 3     1 2002-01-01 -0.540
# 4     2 2000-02-01  1.54 
# 5     2 2001-02-01  0.178
# 6     2 2002-02-01  0.883
# 7     3 2000-03-01 -0.987
# 8     3 2001-03-01  1.40 
# 9     3 2002-03-01  0.675
#10     4 2000-04-01 -0.632
# … with 26 more rows

how do I gather 2 sets of columns in tidyr

If we are using gather, we can do this in two steps. First, we reshape from 'wide' to 'long' format for the column names that starts with 'category' and in the next step, we do the same with the numeric column names by selecting with matches. The matches can regex patterns, so a pattern of ^[0-9]+$ means we match one or more numbers ([0-9]+) from the start (^) to the end ($) of string. We can remove the columns that are not needed with select.

library(tidyr)
library(dplyr) 
gather(df, key, category, starts_with('category_')) %>%
     gather(key2, year, matches('^[0-9]+$')) %>%
     select(-starts_with('key'))

Or using the devel version of data.table, this would be much easier as the melt can take multiple patterns for measure columns. We convert the 'data.frame' to 'data.table' (setDT(df)), use melt and specify the patterns with in the measure argument. We also have options to change the column names of the 'value' column. The 'variable' column is set to NULL as it was not needed in the expected output.

library(data.table)#v1.9.5+
melt(setDT(df), measure=patterns(c('^category', '^[0-9]+$')), 
           value.name=c('category', 'year'))[, variable:=NULL][]

gather() with two key columns

It is not a good idea to have duplicate column names in the data so I'll rename one of them.

names(df)[4] <- 'US_1'

gather has been retired and replaced with pivot_longer.
This is not a traditional reshape because the data in the 1st row needs to be treated differently than rest of the rows so we can perform the reshaping separately and combine the result to get one final dataframe.

library(dplyr)
library(tidyr)

df1 <- df %>% slice(-1L) %>% pivot_longer(cols = -Country)

df %>% 
  slice(1L) %>%
  pivot_longer(-Country, values_to = 'org_id') %>%
  select(-Country) %>%
  inner_join(df1, by = 'name') %>%
  rename(Country = name, date = Country) -> result

result

#  Country org_id date     value
#  <chr>    <int> <chr>    <int>
#1 US         332 02-15-20    25
#2 US         332 03-15-20    30
#3 Canada     778 02-15-20    35
#4 Canada     778 03-15-20    10
#5 US_1       920 02-15-20    54
#6 US_1       920 03-15-20    60

data

df <- structure(list(Country = c("org_id", "02-15-20", "03-15-20"), 
    US = c(332L, 25L, 30L), Canada = c(778L, 35L, 10L), US = c(920L, 
    54L, 60L)), class = "data.frame", row.names = c(NA, -3L))

Gather Multiple Sets of Columns