Gathering Wide Columns into Multiple Long Columns Using Pivot_Longer

I have found the answer to my question:

pivot_longer transforms the wide columns starting with 'hf', 'ac', 'cs' and 'se' into long format, with one new column per prefix.

names_to parameters:

.value = tells pivot_longer that this part of each original column name (the prefix before "_") names the output column that receives the cell values
these values are pivoted to long format and stored in the new columns "hf", "ac", "cs" and "se"
column "level" holds the original column endings (e.g. the numbers 1-6) pivoted to long format
names_pattern = regex with two capture groups, one before and one after the "_", matched in order to the entries of names_to

library(dplyr)  # provides the %>% pipe

df3 <- df %>%
  tidyr::pivot_longer(
    cols = c(
      starts_with("hf"),
      starts_with("ac"),
      starts_with("cs"),
      starts_with("se")
    ),
    names_to = c(".value", "level"),
    names_pattern = "(.*)_(.*)"
  )
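
To see what .value does on a small case, here is a minimal sketch. The data frame df_toy and its values are made up for illustration; only the hf_/ac_ naming scheme is taken from the answer above.

```r
library(tidyr)

# Hypothetical toy data: two measures (hf, ac) recorded at two levels (1, 2)
df_toy <- data.frame(
  id   = c("a", "b"),
  hf_1 = c(1, 2),
  hf_2 = c(3, 4),
  ac_1 = c(5, 6),
  ac_2 = c(7, 8)
)

# The first capture group (.value) becomes the output column name,
# the second capture group is stored in the new "level" column
long_toy <- pivot_longer(
  df_toy,
  cols = c(starts_with("hf"), starts_with("ac")),
  names_to = c(".value", "level"),
  names_pattern = "(.*)_(.*)"
)

print(long_toy)
```

Each id ends up with one row per level, with hf and ac side by side as separate columns.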

Pivot_longer for multiple columns of repeated measurements data

This probably adds nothing new to the already posted solutions, the only difference is the regex used for the names_pattern argument.

Notice that some of your column names contain one _ while others contain two. \\w matches any word character (letters, digits and the underscore), so \\w+ alone cannot tell where the time prefix ends. Requiring digits after it with \\d+, as in time3 in time3_age, tells pivot_longer to store that part of the column name (time3) in the time column. The rest of the column name becomes the name of the measured variable, such as age, systolicBP or med_hypt.
With \\w+\\d+ as the first group, everything after the following _ is captured as the variable name, whether it contains an underscore (med_hypt) or not (systolicBP). With \\w+ alone, the greedy match would also swallow med, and the resulting column would be hypt instead of med_hypt.
Finally, since names_to has two entries, I have to supply either names_pattern or names_sep to specify how the column names are split between them.
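
The difference between the two patterns can be checked outside of pivot_longer. This is only a sketch using stringr::str_match on three column names of the kind described above; the vector cols is an assumption, not the full set of columns in wide_data.

```r
library(stringr)

# Assumed sample of column names following the timeN_variable scheme
cols <- c("time1_age", "time1_systolicBP", "time1_med_hypt")

# First group must end in digits, so it stops at "time1"
anchored <- str_match(cols, "(\\w+\\d+)_(\\w+)")
anchored[, 2]  # "time1" "time1" "time1"
anchored[, 3]  # "age" "systolicBP" "med_hypt"

# Plain greedy \\w+ runs up to the LAST underscore
greedy <- str_match(cols, "(\\w+)_(\\w+)")
greedy[3, 2]   # "time1_med"
greedy[3, 3]   # "hypt"
```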

library(dplyr)
library(tidyr)

wide_data %>%
  pivot_longer(!c(id, sex), names_to = c("time", ".value"), 
               names_pattern = "(\\w+\\d+)_(\\w+)")

# A tibble: 30 x 6
      id sex   time    age systolicBP med_hypt
   <dbl> <fct> <chr> <dbl>      <dbl>    <dbl>
 1 12002 women time1  71.2       102         0
 2 12002 women time2  74.2        NA         0
 3 12002 women time3  78          NA         0
 4 17001 men   time1  67.9       152         0
 5 17001 men   time2  69.2       146         0
 6 17001 men   time3  74.2       160.        0
 7 17002 women time1  66.5        NA         0
 8 17002 women time2  67.8        NA         0
 9 17002 women time3  72.8        NA         0
10 42001 men   time1  57.7       170         0
# ... with 20 more rows

Pivot_longer: Rotating multiple columns of data with same data types

You were on the right path. Renaming is needed because the name columns are the only ones without a suffix to identify them. .value marks the part of the original column name that becomes the name of a new column. If you remove everything up to the last underscore, what remains are the new column names, which you can capture with a regex in names_pattern.

library(dplyr)
library(tidyr)

df %>%  
  rename(contact_1_name=contact_1, 
         contact_2_name=contact_2) %>%
  pivot_longer(cols = everything(), 
               names_to = '.value', 
               names_pattern = '.*_(\\w+)')

#  prefix name              loc        
#  <chr>  <chr>             <chr>      
#1 Mr.    Bob Johnson       Earth      
#2 Dr.    Tommy Two Tones   London     
#3 Mrs.   Robert Johnson    New York   
#4 Mr.    Tommy Three Tones Geneva     
#5 Dr.    Bobby Johnson     Los Angeles
#6 Mrs.   Tommy No Tones    Paris
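
Since the input data is not shown, here is a self-contained sketch of the same steps. The tibble contacts, its column names and its values are hypothetical, chosen so that only the name columns (contact_1, contact_2) lack a suffix.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: two contacts per row, name columns have no suffix
contacts <- tibble(
  contact_1_prefix = c("Mr.", "Dr."),
  contact_1        = c("Ann Lee", "Raj Patel"),
  contact_1_loc    = c("Oslo", "Lima"),
  contact_2_prefix = c("Mrs.", "Mr."),
  contact_2        = c("Kim Park", "Sam Cole"),
  contact_2_loc    = c("Rome", "Kyiv")
)

contacts_long <- contacts %>%
  # give the name columns a suffix so every column ends in _<field>
  rename(contact_1_name = contact_1,
         contact_2_name = contact_2) %>%
  # keep only the text after the last underscore as the new column name
  pivot_longer(cols = everything(),
               names_to = ".value",
               names_pattern = ".*_(\\w+)")

print(contacts_long)
```

The result has one row per contact and the three columns prefix, name and loc.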

Using pivot_longer with multiple paired columns in the wide dataset

You want to use .value in the names_to argument:

input %>%
  pivot_longer(
    -event, 
    names_to = c(".value", "item"), 
    names_sep = "_"
  ) %>% 
  select(-item)

# A tibble: 4 x 3
  event url   name 
  <int> <fct> <fct>
1     1 g1    dc   
2     1 g2    sf   
3     2 g3    nyc  
4     2 g4    la

From this article on pivoting:

Note the special name .value: this tells pivot_longer() that that part of the column name specifies the “value” being measured (which will become a variable in the output).
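
To make the role of the second names_to entry visible, here is a small sketch that stops before select(-item). The tibble input_toy is an assumption reconstructed from the output shown above (character columns instead of factors); the real input may differ.

```r
library(dplyr)
library(tidyr)

# Hypothetical input with paired url_N / name_N columns
input_toy <- tibble(
  event  = 1:2,
  url_1  = c("g1", "g3"),
  name_1 = c("dc", "nyc"),
  url_2  = c("g2", "g4"),
  name_2 = c("sf", "la")
)

paired <- input_toy %>%
  pivot_longer(
    -event,
    names_to = c(".value", "item"),  # prefix -> column name, suffix -> item
    names_sep = "_"
  )

print(paired)
```

Here item holds the suffix ("1" or "2") that paired each url with its name; it can be dropped afterwards if it carries no meaning.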

Using pivot_longer to separate columns into long format

The data columns to be reshaped to 'long' format don't all follow the same naming pattern. Therefore, the steps are:

rename the columns that don't have ... or _ in their names by adding a prefix with paste/str_c
reshape to long format with pivot_longer, splitting the names with either names_sep or names_pattern; specify names_to as the vector c(".value", "trait"), in the same order as the prefix (which names the value columns) and the suffix (stored in 'trait') appear in the column names
once reshaped, create a grouping column from the values in 'trait' (some of them are numbers, so detect the non-numeric ones as a logical vector and take the cumulative sum), alongside the other grouping columns 'geno_name' and 'observation_id' (which do not identify rows uniquely on their own)
now summarise the remaining columns by taking the first element after ordering on NA status, i.e. the first non-NA value if there is one, otherwise NA
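
The two idioms in the last two steps can be tried on plain vectors first. This is only a sketch: the vector trait mimics what the 'trait' column might look like after reshaping, and first_non_na is a hypothetical helper name for the expression used inside summarise.

```r
library(stringr)

# Assumed content of 'trait' after pivot_longer: trait names mixed with
# the numeric suffixes of the unit/method columns
trait <- c("lipids", "3", "4", "density", "6", "7")

# A new group starts at every non-numeric entry
grp <- cumsum(str_detect(trait, "\\D+"))
grp  # 1 1 1 2 2 2

# Hypothetical helper: non-NA values sort first, so [1] is the first
# non-NA value, or NA when everything is missing
first_non_na <- function(v) v[order(is.na(v))][1]

first_non_na(c(NA, "g cm^-3", NA))  # "g cm^-3"
first_non_na(c(NA, NA))             # NA
```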

library(dplyr)
library(stringr)
library(tidyr)
x %>%
   rename_at(vars(names(.)[!str_detect(names(.), "[_.]+")]), 
       ~ str_c("value...", .)) %>%
   pivot_longer(cols = 3:ncol(.), 
      names_to = c(".value", "trait"), names_sep = "\\.+") %>% 
   group_by(geno_name, observation_id, 
       grp = cumsum(str_detect(trait, "\\D+"))) %>%
   summarise(across(everything(), ~ .[order(is.na(.))][1]),
         .groups = 'drop') %>%
   select(-grp)

Output

# A tibble: 2 x 6
#  geno_name observation_id trait   value unit    method     
#  <chr>              <dbl> <chr>   <dbl> <chr>   <chr>      
#1 MB mixed              10 lipids  NA    <NA>    <NA>       
#2 MB mixed              10 density  1.12 g cm^-3 3D scanning

data

x <- structure(list(geno_name = "MB mixed", observation_id = 10, lipids = NA, 
    unit...3 = NA, method...4 = NA, density = 1.125, unit...6 = "g cm^-3", 
    method...7 = "3D scanning"), class = "data.frame", row.names = c(NA, 
-1L))

Gather or pivot_longer on multiple columns?

Consider this approach

df %>% 
  pivot_longer(matches("\\d$"), names_to = c("name", "year"), names_pattern = "([^\\d]+)(\\d+)$") %>% 
  pivot_wider()

First, pivot the data frame to long format with the columns id, name, year and value: the names_pattern splits each original column name into its name part and its trailing year digits in the same step. Then pivot_wider() spreads name and value back out, giving one column per variable.

Output

# A tibble: 14 x 4
      id year  emp   marstat 
   <int> <chr> <chr> <chr>   
 1     1 1     ft    married 
 2     1 2     ft    divorced
 3     2 1     ft    married 
 4     2 2     ft    married 
 5     3 1     pt    divorced
 6     3 2     ft    divorced
 7     4 1     pt    single  
 8     4 2     ft    single  
 9     5 1     ft    single  
10     5 2     no    single  
11     6 1     no    single  
12     6 2     pt    married 
13     7 1     no    single  
14     7 2     ft    single
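
A possible alternative, not part of the answer above, is to use .value so that the pivot_wider() step is no longer needed. The sketch below compares both routes on a made-up tibble emp_wide; its column names follow the name-plus-digit scheme implied by the regex, and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: each variable is followed by a year digit
emp_wide <- tibble(
  id       = 1:2,
  emp1     = c("ft", "ft"),
  emp2     = c("ft", "pt"),
  marstat1 = c("married", "single"),
  marstat2 = c("divorced", "single")
)

# Route from the answer: pivot longer, then wider again
two_step <- emp_wide %>%
  pivot_longer(matches("\\d$"), names_to = c("name", "year"),
               names_pattern = "([^\\d]+)(\\d+)$") %>%
  pivot_wider()

# Alternative: let .value create the emp and marstat columns directly
one_step <- emp_wide %>%
  pivot_longer(matches("\\d$"), names_to = c(".value", "year"),
               names_pattern = "(\\D+)(\\d+)$")

print(one_step)
```

Both routes should give the columns id, year, emp and marstat; the single-step version simply avoids the intermediate name/value pair.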

How do we transform a dataset in R using pivot_longer with multiple columns

Probably not the most elegant solution, but I was able to solve my own problem using the steps below:

a <- df %>% 
  select(person,initial_event_date, type_initial) %>%
  mutate(visit_type = 'initial')
b <- df %>%
  filter(visit_prior == 'Y') %>%
  select(person, initial_event_date, prior_visit_type, day_cnt_prior) %>% 
  mutate(visit_type = 'visit_prior',
         day_cnt_prior = as.integer(day_cnt_prior))
c <- df %>% filter(visit_after == 'Y') %>%
  select(person, initial_event_date, visit_after_type, day_cnt_after) %>% 
  mutate(visit_type = 'visit_after',
         day_cnt_after = as.integer(day_cnt_after))

bind_rows(a,b,c) %>% 
  arrange(person) %>%
  # collapse the per-visit columns into one reason and one day-count column
  mutate(visit_reason = dplyr::coalesce(type_initial, prior_visit_type, visit_after_type),
         day_cnt      = dplyr::coalesce(day_cnt_after, day_cnt_prior)) %>% 
  select(person, initial_event_date,visit_type, visit_reason, day_cnt) %>% 
  # rows without a day count (e.g. the initial visits) get 0
  replace_na(list(day_cnt = 0))

Using pivot_longer to restructure wide data, with multiple columns, from a spreadsheet

If the goal is to return 'FullName' and the duplicated 'SOCW' columns stacked into single columns, select the columns of interest, then use pivot_longer with names_to = ".value" and a names_pattern that captures only the part of the column name before the . (SOCW followed by [^.]+), which drops the .1, .2 suffixes.

library(dplyr)
library(tidyr)
my_data %>% 
    select(FullName, starts_with("SOCW")) %>% 
    pivot_longer(cols = starts_with("SOCW"), names_to = ".value", 
         names_pattern = '^(SOCW[^.]+)')
# A tibble: 6 x 6
  FullName SOCW725 SOCW748 SOCW799 SOCW752 SOCW782
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Beavis B    3.5     3.22    2.56    3.33    4.2 
2 Beavis B    2.33    3.23   NA      NA      NA   
3 Beavis B    3.33   NA      NA      NA      NA   
4 El Guapo    3.25    3.02    2.75    4.33    4.15
5 El Guapo    3.33    3.42   NA       4      NA   
6 El Guapo    2.67   NA      NA      NA      NA

data.frame doesn't allow duplicate column names by default. It uses make.unique to modify the column names, appending .1, .2, etc. to each duplicate.
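
The renaming can be reproduced in base R. This is a small sketch with made-up values; only the SOCW725 name is taken from the data in this answer.

```r
# make.unique appends .1, .2, ... to repeated names
uniq <- make.unique(c("SOCW725", "SOCW748", "SOCW725", "SOCW725"))
uniq  # "SOCW725" "SOCW748" "SOCW725.1" "SOCW725.2"

# data.frame() applies the same fix to duplicated column names by default
dup <- data.frame(SOCW725 = 1, SOCW725 = 2)
names(dup)  # "SOCW725" "SOCW725.1"

# stripping the numeric suffix recovers the original name
stripped <- sub("\\.\\d+$", "", names(dup))
stripped  # "SOCW725" "SOCW725"
```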

If we need only three columns:

library(stringr)
my_data %>% 
   select(FullName, starts_with("SOCW")) %>% 
   pivot_longer(cols = starts_with("SOCW")) %>% 
   mutate(name = str_remove(name, "\\.\\d+$"))
# A tibble: 18 x 3
   FullName name    value
   <chr>    <chr>   <dbl>
 1 Beavis B SOCW725  3.5 
 2 Beavis B SOCW748  3.22
 3 Beavis B SOCW799  2.56
 4 Beavis B SOCW725  2.33
 5 Beavis B SOCW752  3.33
 6 Beavis B SOCW782  4.2 
 7 Beavis B SOCW725  3.33
 8 Beavis B SOCW748  3.23
 9 Beavis B SOCW752 NA   
10 El Guapo SOCW725  3.25
11 El Guapo SOCW748  3.02
12 El Guapo SOCW799  2.75
13 El Guapo SOCW725  3.33
14 El Guapo SOCW752  4.33
15 El Guapo SOCW782  4.15
16 El Guapo SOCW725  2.67
17 El Guapo SOCW748  3.42
18 El Guapo SOCW752  4

data

my_data <- structure(list(FullName = c("Beavis B", "El Guapo"), SOCW725 = c(3.5, 
3.25), SOCW748 = c(3.22, 3.02), SOCW799 = c(2.56, 2.75), Average = c(3.07, 
3.18), SOCW725.1 = c(2.33, 3.33), SOCW752 = c(3.33, 4.33), SOCW782 = c(4.2, 
4.15), Average.1 = c(3.5, 2.25), SOCW725.2 = c(3.33, 2.67), SOCW748.1 = c(3.23, 
3.42), SOCW752.1 = c(NA, 4L), Average.2 = c(3, 2.44)),
 class = "data.frame", row.names = c(NA, 
-2L))
