Using Pivot to Flip Data from Wide to Tall

Using PIVOT to Flip Data from Wide to Tall

You can unpivot the data using CROSS APPLY (VALUES). Here is an article to explain how this is done:

http://www.sqlservercentral.com/articles/CROSS+APPLY+VALUES+UNPIVOT/91234/

Basically the code is:

SELECT vend,
year,
month,
dols,
qty
FROM YourTable t
CROSS APPLY
(
VALUES
(1, I1_DOLS, I1_QTY),
(2, I2_DOLS, I2_QTY),
(3, I3_DOLS, I3_QTY)
) x (month, dols, qty);

See SQL Fiddle with Demo

Or you could use a UNION ALL query:

select vend, year, 1 month, [I1_DOLS] Dols, [I1_QTY] Qty
from yourtable
union all
select vend, year, 2 month, [I2_DOLS] Dols, [I2_QTY] Qty
from yourtable
union all
select vend, year, 3 month, [I3_DOLS] Dols, [I3_QTY] Qty
from yourtable

See SQL Fiddle with Demo

Or you can even apply both the UNPIVOT and the PIVOT function to transform the data:

select *
from
(
select vend,
year,
replace(replace(replace(col, 'I', ''), '_Dols', ''), '_Qty', '') month,
case when col like '%Dols%' then 'dols' else 'qty' end col_name,
value
from
(
select vend, year, [I1_DOLS], [I1_QTY], [I2_DOLS], [I2_QTY], [I3_DOLS], [I3_QTY]
from yourtable
) src
unpivot
(
value
for col in ([I1_DOLS], [I1_QTY], [I2_DOLS], [I2_QTY], [I3_DOLS], [I3_QTY])
) un
) unp
pivot
(
max(value)
for col_name in (dols, qty)
) piv

See SQL Fiddle with Demo.

All three will give the same result:

| VEND | YEAR | MONTH |   DOLS | QTY |
--------------------------------------
| 1234 | 2011 | 1 | 101587 | 508 |
| 1234 | 2011 | 2 | 203345 | 334 |
| 1234 | 2011 | 3 | 105938 | 257 |
| 1234 | 2012 | 1 | 257843 | 587 |
| 1234 | 2012 | 2 | 235883 | 247 |
| 1234 | 2012 | 3 | 178475 | 456 |
| 1011 | 2010 | 1 | 584737 | 432 |
| 1011 | 2010 | 2 | 587274 | 356 |
| 1011 | 2010 | 3 | 175737 | 563 |
| 1011 | 2011 | 1 | 517774 | 356 |
| 1011 | 2011 | 2 | 483858 | 456 |
| 1011 | 2011 | 3 | 481785 | 354 |

R pivot_longer(): tidyr wide to long manipulation reverse pivot summary to individual values

You can get the data in long format and use uncount to replicate rows.

library(tidyr)
pivot_longer(household.data, cols = -household.composition) %>% uncount(value)

# A tibble: 199 x 2
# household.composition name
# <chr> <chr>
# 1 Singles town.a
# 2 Singles town.a
# 3 Singles town.a
# 4 Singles town.a
# 5 Singles town.a
# 6 Singles town.a
# 7 Singles town.a
# 8 Singles town.a
# 9 Singles town.a
#10 Singles town.a
# … with 189 more rows

R, pivot wide to long while changing column names

Here is one option where we create a key/value dataset based on the multiple choice question and the column names, then do a join with the reshaped data to return the mapped column

library(dplyr)
library(tidyr)
library(stringr)
keydat <- tibble(name = str_c("oncore_roles___", 1:12),
Oncore_role = c("Calendar Build", "Protocol Management",
"Subject Managment", "Financials", "Principal Investigator",
"Protocol Management Finance", "Regulatory",
"Investigational Pharmacist", "Division Director", "CTO Signoff",
"Roles Administration", "Statistical Analysis"))
df %>%
pivot_longer(cols = starts_with('oncore_roles')) %>%
filter(value == 1) %>%
inner_join(keydat) %>%
select(-name)

-output

# A tibble: 6 × 7
fname employee_number job_role ActiveAccount CanAccess value Oncore_role
<chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 Linda 00000123456 Dept Research Admin Yes No 1 Calendar Build
2 Linda 00000123456 Dept Research Admin Yes No 1 Protocol Management
3 Linda 00000123456 Dept Research Admin Yes No 1 Subject Managment
4 Bob 654321 Research Regulatory Assistant Yes No 1 Principal Investigator
5 Bob 654321 Research Regulatory Assistant Yes No 1 Regulatory
6 Bob 654321 Research Regulatory Assistant Yes No 1 Statistical Analysis

Pivoting wide to long format and then nesting columns

A tidyverse approach to achieve your desired result may look like so:

library(tibble)

df_1 <-
tribble(~key, ~values.male, ~values.female, ~values.red, ~values.green, ~value,
"gender", 0.5, 0.5, NA, NA, NA,
"age", NA, NA, NA, NA, "50",
"color", NA, NA, TRUE, FALSE, NA,
"time_of_day", NA, NA, NA, NA, "noon")

library(tidyr)
library(dplyr)
library(purrr)

df_pivoted <- df_1 %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>%
group_by(key) %>%
nest() %>%
mutate(data = map(data, ~ if (all(.x$level == "value")) deframe(.x) else .x))
df_pivoted
#> # A tibble: 4 x 2
#> # Groups: key [4]
#> key data
#> <chr> <list>
#> 1 gender <tibble [2 × 2]>
#> 2 age <chr [1]>
#> 3 color <tibble [2 × 2]>
#> 4 time_of_day <chr [1]>

EDIT Following the clarification in your comments on the desired result we could simply get rid of the map statement as the end (which basically was meant for converting the tibbles for categories without levels to a vector) and add a mutate statement before nesting to replace the level with NA for categories without a level:

pivot_nest <- function(x) {
mutate(x, across(everything(), as.character)) %>%
pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>%
group_by(key) %>%
mutate(level = ifelse(all(level == "value"), NA_character_, level)) %>%
nest()
}

df_pivoted <- df_1 %>%
pivot_nest()
df_pivoted
#> # A tibble: 4 x 2
#> # Groups: key [4]
#> key data
#> <chr> <list>
#> 1 gender <tibble [2 × 2]>
#> 2 age <tibble [1 × 2]>
#> 3 color <tibble [2 × 2]>
#> 4 time_of_day <tibble [1 × 2]>
df_pivoted$data
#> [[1]]
#> # A tibble: 2 x 2
#> level value
#> <chr> <chr>
#> 1 male 0.5
#> 2 male 0.5
#>
#> [[2]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 50
#>
#> [[3]]
#> # A tibble: 2 x 2
#> level value
#> <chr> <chr>
#> 1 red TRUE
#> 2 red FALSE
#>
#> [[4]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> noon

df_2 <- tribble(~key, ~value, "age", "50", "income", "100000", "time_of_day", "noon")

df_pivoted2 <- df_2 %>%
pivot_nest()
df_pivoted2
#> # A tibble: 3 x 2
#> # Groups: key [3]
#> key data
#> <chr> <list>
#> 1 age <tibble [1 × 2]>
#> 2 income <tibble [1 × 2]>
#> 3 time_of_day <tibble [1 × 2]>
df_pivoted2$data
#> [[1]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 50
#>
#> [[2]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 100000
#>
#> [[3]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> noon

How to pivot from wide to long format based on multiple column name separators?

I experimented with the results I produced in the earlier solution and pivot_wider then pivot_longer and found how to make it work for pivot_longer.Your original approach was very close.

 dat %>%
pivot_longer(
cols = !c(part, type, sp),
names_to = c("var", "site", "side", "misc"),
names_sep = "_",
values_to = "value"
)

part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1

Reshaping data.frame from wide to long format

reshape() takes a while to get used to, just as melt/cast. Here is a solution with reshape, assuming your data frame is called d:

reshape(d, 
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)

How to pivot wide to long format for columns with different names?

One of the corresponding set of columns i.e Finding, 'Additional Finding' is slightly different than the other sets as those sets of column names have value 2 at the end. Thus, we change only the suffix part of the column 'Finding' columns and then use pivot_longer

library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>%
pivot_longer(cols = -c(Date, Site), names_to = c(".value"),
names_pattern = "(\\D+)\\d*$", values_drop_na = TRUE)

-output

# A tibble: 8 × 6
Date Site Finding `Topic Area` Type `Risk Ranking`
<dttm> <chr> <chr> <chr> <chr> <chr>
1 2022-03-29 00:00:00 A Yes A 1 Medium
2 2022-03-29 00:00:00 A Yes A 1 Medium
3 2022-03-30 00:00:00 B Yes B 2 Low
4 2022-03-30 00:00:00 B Yes B 2 Medium
5 2022-03-25 00:00:00 C Yes C 3 Medium
6 2022-03-25 00:00:00 C Yes C 2 Low
7 2022-03-25 00:00:00 D Yes D 4 Medium
8 2022-03-25 00:00:00 D Yes D 3 Medium

Pivot from long format to wide format in a dataframe

We could use pivot_wider from tidyr after creating a sequence column with rowid

library(dplyr)
library(tidyr0
library(data.table)
Moment %>%
mutate(rn = rowid(SectionCut)) %>%
pivot_wider(names_from = SectionCut, values_from = DF)

-output

# A tibble: 4 x 9
# N_l UG S Sample rn `1` `2` `3` Left
# <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#1 1 84 12 S00 1 0.367 0.000319 0.413 0.483
#2 2 84 12 S00 2 0.540 0.000505 0.611 0.646
#3 3 84 12 S00 3 0.591 0.000511 0.685 0.626
#4 4 84 12 S00 4 0.497 0.000395 0.595 NA

R Pivot multiple columns from wide to long

Here is an option with pivot_longer after rearranging the column names substring. We rename the columns that starts with 'Day' by rearraning the the words i.e. the substring 'a', 'b' in the column name is shifted to the end with the delimiter _ by capturing those words ((\\w+)) and then in the replacement for str_replace, changing the backreference order (\\1, \\2 etc)

library(dplyr)
library(stringr)
library(tidyr)
dat %>%
rename_at(vars(starts_with('Day')),
~ str_replace(., '(\\w+) (\\w+) (\\w+)', "\\1\\3_\\2")) %>%
pivot_longer(cols = starts_with('Day'),
names_to = c( ".value", "DayID"), names_sep="_")

-output

# A tibble: 6 x 5
# aColumn anotherColumn DayID DayTime DayValue
# <chr> <chr> <chr> <chr> <chr>
#1 stuff more stuff a 12:00 R
#2 stuff more stuff b 12:45 R
#3 stuff more stuff a 13:00 I
#4 stuff more stuff b 14:00 P
#5 stuff more stuff a 14:00 O
#6 stuff more stuff b 15:35 O

How to reshape/recast data in Excel from wide- to long-format

There is a simple R script that will help with this. The function accepts the path to your csv file and the number of header variables you have. In the example image/data I provided, there are 7 header variables. That is, the actual data (T9_est1) starts on the 8th column.

# Use the command below if you do not have the tidyverse package installed. 
# install.packages("tidyverse")
library(tidyverse)

read_data_long <- function(path_to_csv, header_vars) {
data_table <- read_csv(path_to_csv)
fields_to_melt <- names(data_table[,as.numeric(header_vars+1):ncol(data_table)])
melted <- gather(data_table, fields_to_melt, key = 'variable', value = 'values')
return(melted)
}

# Change the file path to where your data is and where you want it written to.
# Also change "7" to the number of header variables your data has.
melted_data <- read_data_long("path_to_input_file.csv", 7)
write_csv(melted_data, "new_path_to_melted_file.csv")

(Updated 7/25/18 with a more elegant solution; Again 9/28/18 with small change.)



Related Topics



Leave a reply



Submit