Using PIVOT to Flip Data from Wide to Tall
You can unpivot the data using CROSS APPLY (VALUES)
. Here is an article to explain how this is done:
http://www.sqlservercentral.com/articles/CROSS+APPLY+VALUES+UNPIVOT/91234/
Basically the code is:
SELECT vend,
year,
month,
dols,
qty
FROM YourTable t
CROSS APPLY
(
VALUES
(1, I1_DOLS, I1_QTY),
(2, I2_DOLS, I2_QTY),
(3, I3_DOLS, I3_QTY)
) x (month, dols, qty);
See SQL Fiddle with Demo
Or you could use a UNION ALL
query:
select vend, year, 1 month, [I1_DOLS] Dols, [I1_QTY] Qty
from yourtable
union all
select vend, year, 2 month, [I2_DOLS] Dols, [I2_QTY] Qty
from yourtable
union all
select vend, year, 3 month, [I3_DOLS] Dols, [I3_QTY] Qty
from yourtable
See SQL Fiddle with Demo
Or you can even apply both the UNPIVOT
and the PIVOT
function to transform the data:
select *
from
(
select vend,
year,
replace(replace(replace(col, 'I', ''), '_Dols', ''), '_Qty', '') month,
case when col like '%Dols%' then 'dols' else 'qty' end col_name,
value
from
(
select vend, year, [I1_DOLS], [I1_QTY], [I2_DOLS], [I2_QTY], [I3_DOLS], [I3_QTY]
from yourtable
) src
unpivot
(
value
for col in ([I1_DOLS], [I1_QTY], [I2_DOLS], [I2_QTY], [I3_DOLS], [I3_QTY])
) un
) unp
pivot
(
max(value)
for col_name in (dols, qty)
) piv
See SQL Fiddle with Demo.
All three will give the same result:
| VEND | YEAR | MONTH | DOLS | QTY |
--------------------------------------
| 1234 | 2011 | 1 | 101587 | 508 |
| 1234 | 2011 | 2 | 203345 | 334 |
| 1234 | 2011 | 3 | 105938 | 257 |
| 1234 | 2012 | 1 | 257843 | 587 |
| 1234 | 2012 | 2 | 235883 | 247 |
| 1234 | 2012 | 3 | 178475 | 456 |
| 1011 | 2010 | 1 | 584737 | 432 |
| 1011 | 2010 | 2 | 587274 | 356 |
| 1011 | 2010 | 3 | 175737 | 563 |
| 1011 | 2011 | 1 | 517774 | 356 |
| 1011 | 2011 | 2 | 483858 | 456 |
| 1011 | 2011 | 3 | 481785 | 354 |
R pivot_longer(): tidyr wide to long manipulation reverse pivot summary to individual values
You can get the data in long format and use uncount
to replicate rows.
library(tidyr)
pivot_longer(household.data, cols = -household.composition) %>% uncount(value)
# A tibble: 199 x 2
# household.composition name
# <chr> <chr>
# 1 Singles town.a
# 2 Singles town.a
# 3 Singles town.a
# 4 Singles town.a
# 5 Singles town.a
# 6 Singles town.a
# 7 Singles town.a
# 8 Singles town.a
# 9 Singles town.a
#10 Singles town.a
# … with 189 more rows
R, pivot wide to long while changing column names
Here is one option where we create a key/value dataset based on the multiple choice question and the column names, then do a join with the reshaped data to return the mapped column
library(dplyr)
library(tidyr)
library(stringr)
keydat <- tibble(name = str_c("oncore_roles___", 1:12),
Oncore_role = c("Calendar Build", "Protocol Management",
"Subject Managment", "Financials", "Principal Investigator",
"Protocol Management Finance", "Regulatory",
"Investigational Pharmacist", "Division Director", "CTO Signoff",
"Roles Administration", "Statistical Analysis"))
df %>%
pivot_longer(cols = starts_with('oncore_roles')) %>%
filter(value == 1) %>%
inner_join(keydat) %>%
select(-name)
-output
# A tibble: 6 × 7
fname employee_number job_role ActiveAccount CanAccess value Oncore_role
<chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 Linda 00000123456 Dept Research Admin Yes No 1 Calendar Build
2 Linda 00000123456 Dept Research Admin Yes No 1 Protocol Management
3 Linda 00000123456 Dept Research Admin Yes No 1 Subject Managment
4 Bob 654321 Research Regulatory Assistant Yes No 1 Principal Investigator
5 Bob 654321 Research Regulatory Assistant Yes No 1 Regulatory
6 Bob 654321 Research Regulatory Assistant Yes No 1 Statistical Analysis
Pivoting wide to long format and then nesting columns
A tidyverse approach to achieve your desired result may look like so:
library(tibble)
df_1 <-
tribble(~key, ~values.male, ~values.female, ~values.red, ~values.green, ~value,
"gender", 0.5, 0.5, NA, NA, NA,
"age", NA, NA, NA, NA, "50",
"color", NA, NA, TRUE, FALSE, NA,
"time_of_day", NA, NA, NA, NA, "noon")
library(tidyr)
library(dplyr)
library(purrr)
df_pivoted <- df_1 %>%
mutate(across(everything(), as.character)) %>%
pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>%
group_by(key) %>%
nest() %>%
mutate(data = map(data, ~ if (all(.x$level == "value")) deframe(.x) else .x))
df_pivoted
#> # A tibble: 4 x 2
#> # Groups: key [4]
#> key data
#> <chr> <list>
#> 1 gender <tibble [2 × 2]>
#> 2 age <chr [1]>
#> 3 color <tibble [2 × 2]>
#> 4 time_of_day <chr [1]>
EDIT Following the clarification in your comments on the desired result we could simply get rid of the map statement as the end (which basically was meant for converting the tibbles for categories without levels to a vector) and add a mutate statement before nesting to replace the level with NA for categories without a level
:
pivot_nest <- function(x) {
mutate(x, across(everything(), as.character)) %>%
pivot_longer(-key, names_to = "level", names_prefix = "^values\\.", values_drop_na = TRUE) %>%
group_by(key) %>%
mutate(level = ifelse(all(level == "value"), NA_character_, level)) %>%
nest()
}
df_pivoted <- df_1 %>%
pivot_nest()
df_pivoted
#> # A tibble: 4 x 2
#> # Groups: key [4]
#> key data
#> <chr> <list>
#> 1 gender <tibble [2 × 2]>
#> 2 age <tibble [1 × 2]>
#> 3 color <tibble [2 × 2]>
#> 4 time_of_day <tibble [1 × 2]>
df_pivoted$data
#> [[1]]
#> # A tibble: 2 x 2
#> level value
#> <chr> <chr>
#> 1 male 0.5
#> 2 male 0.5
#>
#> [[2]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 50
#>
#> [[3]]
#> # A tibble: 2 x 2
#> level value
#> <chr> <chr>
#> 1 red TRUE
#> 2 red FALSE
#>
#> [[4]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> noon
df_2 <- tribble(~key, ~value, "age", "50", "income", "100000", "time_of_day", "noon")
df_pivoted2 <- df_2 %>%
pivot_nest()
df_pivoted2
#> # A tibble: 3 x 2
#> # Groups: key [3]
#> key data
#> <chr> <list>
#> 1 age <tibble [1 × 2]>
#> 2 income <tibble [1 × 2]>
#> 3 time_of_day <tibble [1 × 2]>
df_pivoted2$data
#> [[1]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 50
#>
#> [[2]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> 100000
#>
#> [[3]]
#> # A tibble: 1 x 2
#> level value
#> <chr> <chr>
#> 1 <NA> noon
How to pivot from wide to long format based on multiple column name separators?
I experimented with the results I produced in the earlier solution and pivot_wider then pivot_longer and found how to make it work for pivot_longer.Your original approach was very close.
dat %>%
pivot_longer(
cols = !c(part, type, sp),
names_to = c("var", "site", "side", "misc"),
names_sep = "_",
values_to = "value"
)
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1
Reshaping data.frame from wide to long format
reshape()
takes a while to get used to, just as melt
/cast
. Here is a solution with reshape, assuming your data frame is called d
:
reshape(d,
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)
How to pivot wide to long format for columns with different names?
One of the corresponding set of columns i.e Finding, 'Additional Finding' is slightly different than the other sets as those sets of column names have value 2 at the end. Thus, we change only the suffix part of the column 'Finding' columns and then use pivot_longer
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
rename_with(~ str_c("Finding", seq_along(.x)), contains("Finding")) %>%
pivot_longer(cols = -c(Date, Site), names_to = c(".value"),
names_pattern = "(\\D+)\\d*$", values_drop_na = TRUE)
-output
# A tibble: 8 × 6
Date Site Finding `Topic Area` Type `Risk Ranking`
<dttm> <chr> <chr> <chr> <chr> <chr>
1 2022-03-29 00:00:00 A Yes A 1 Medium
2 2022-03-29 00:00:00 A Yes A 1 Medium
3 2022-03-30 00:00:00 B Yes B 2 Low
4 2022-03-30 00:00:00 B Yes B 2 Medium
5 2022-03-25 00:00:00 C Yes C 3 Medium
6 2022-03-25 00:00:00 C Yes C 2 Low
7 2022-03-25 00:00:00 D Yes D 4 Medium
8 2022-03-25 00:00:00 D Yes D 3 Medium
Pivot from long format to wide format in a dataframe
We could use pivot_wider
from tidyr
after creating a sequence column with rowid
library(dplyr)
library(tidyr0
library(data.table)
Moment %>%
mutate(rn = rowid(SectionCut)) %>%
pivot_wider(names_from = SectionCut, values_from = DF)
-output
# A tibble: 4 x 9
# N_l UG S Sample rn `1` `2` `3` Left
# <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
#1 1 84 12 S00 1 0.367 0.000319 0.413 0.483
#2 2 84 12 S00 2 0.540 0.000505 0.611 0.646
#3 3 84 12 S00 3 0.591 0.000511 0.685 0.626
#4 4 84 12 S00 4 0.497 0.000395 0.595 NA
R Pivot multiple columns from wide to long
Here is an option with pivot_longer
after rearranging the column names substring. We rename
the columns that starts with 'Day' by rearraning the the words i.e. the substring 'a', 'b' in the column name is shifted to the end with the delimiter _
by capturing those words ((\\w+)
) and then in the replacement for str_replace
, changing the backreference order (\\1
, \\2
etc)
library(dplyr)
library(stringr)
library(tidyr)
dat %>%
rename_at(vars(starts_with('Day')),
~ str_replace(., '(\\w+) (\\w+) (\\w+)', "\\1\\3_\\2")) %>%
pivot_longer(cols = starts_with('Day'),
names_to = c( ".value", "DayID"), names_sep="_")
-output
# A tibble: 6 x 5
# aColumn anotherColumn DayID DayTime DayValue
# <chr> <chr> <chr> <chr> <chr>
#1 stuff more stuff a 12:00 R
#2 stuff more stuff b 12:45 R
#3 stuff more stuff a 13:00 I
#4 stuff more stuff b 14:00 P
#5 stuff more stuff a 14:00 O
#6 stuff more stuff b 15:35 O
How to reshape/recast data in Excel from wide- to long-format
There is a simple R script that will help with this. The function accepts the path to your csv file and the number of header variables you have. In the example image/data I provided, there are 7 header variables. That is, the actual data (T9_est1) starts on the 8th column.
# Use the command below if you do not have the tidyverse package installed.
# install.packages("tidyverse")
library(tidyverse)
read_data_long <- function(path_to_csv, header_vars) {
data_table <- read_csv(path_to_csv)
fields_to_melt <- names(data_table[,as.numeric(header_vars+1):ncol(data_table)])
melted <- gather(data_table, fields_to_melt, key = 'variable', value = 'values')
return(melted)
}
# Change the file path to where your data is and where you want it written to.
# Also change "7" to the number of header variables your data has.
melted_data <- read_data_long("path_to_input_file.csv", 7)
write_csv(melted_data, "new_path_to_melted_file.csv")
(Updated 7/25/18 with a more elegant solution; Again 9/28/18 with small change.)
Related Topics
4 Byte Unsigned Int in SQL Server
SQL Server 'In' or 'Or' - Which Is Fastest
ASP.NET MVC: Execute Razor from Db String
T-SQL Get Number of Working Days Between 2 Dates
Splitting Variable Length Delimited String Across Multiple Rows (Sql)
Max Row Size in SQL Server 2012 with Varchar(Max) Fields
Postgresql, Select from Max Id
How to List All the Column Names in Netezza
Powershell SQL Select Output to Variable
SQL Pivot Select from List (In Select)
Odd SQL Server 2012 Identity Issue
Getting a Dynamically-Generated Pivot-Table into a Temp Table
How to Get the Latest Record for Each Item in Cosmosdb Using SQL
Entity Framework and Cross/Outer Apply