Convert R dataframe from long to wide format, but with unequal group sizes, for use with qcc
You can create a sequence column ('.id') using getanID from splitstackshape and use dcast from data.table to convert the long format to wide format. The output of splitstackshape is a data.table. When we load splitstackshape, data.table will also be loaded. So, if you already have the devel version of data.table, then the dcast from data.table can be used as well.
library(splitstackshape)
dcast(getanID(df1, 'time'), time~.id, value.var='measure')
# time 1 2 3 4 5
#1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA
#2: 2001 Q2 -1.4810269 0.18150972 NA NA NA
#3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502
Update
As @snoram mentioned in the comments, the rowid function from data.table makes it easier to use data.table alone:
library(data.table)
dcast(setDT(df1), time ~ rowid(time), value.var = "measure")
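To make the rowid approach concrete, here is a minimal sketch on made-up data. The values in df1 are hypothetical (the original input is not shown), and the final conversion to a plain matrix is only an assumption about what a downstream charting function might want.

```r
library(data.table)

# Hypothetical long data: three groups of unequal size
df1 <- data.frame(
  time    = c("2001 Q1", "2001 Q1", "2001 Q1", "2001 Q2", "2001 Q2", "2001 Q3"),
  measure = c(0.1, 0.5, 0.6, -1.5, 0.2, 1.7)
)

# rowid(time) numbers the observations within each group (1, 2, 3, ...),
# so each observation gets its own column; shorter groups are padded with NA
wide <- dcast(setDT(df1), time ~ rowid(time), value.var = "measure")

# Drop the grouping column to get a numeric matrix, one row per group
m <- as.matrix(wide[, !"time"])
rownames(m) <- wide$time
```

Each row of m is one group, and the number of columns equals the size of the largest group.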
Combine long-format data frames with different length and convert to wide format
Using data.table
library(data.table)
dcast(setDT(fd), id ~ paste0('x.time', time), value.var = 'x')
-output
id x.time1 x.time2 x.time3 x.time4 x.time5
1: 1 0 0 0 0 0
2: 2 NA NA NA NA 1
3: 3 NA NA 0 NA NA
4: 4 NA 0 0 NA NA
5: 5 0 NA NA NA NA
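The input fd is not shown above, so the following sketch assumes it was built by stacking two long data frames of different length. The data frames a and b and their values are hypothetical; rbindlist is used for the stacking step.

```r
library(data.table)

# Two hypothetical long-format data frames of different length
a <- data.frame(id = c(1, 1, 2), time = c(1, 2, 2), x = c(0, 0, 1))
b <- data.frame(id = c(2, 3),    time = c(1, 3),    x = c(1, 0))

# Stack them into one long table (rbindlist returns a data.table)
fd <- rbindlist(list(a, b))

# One column per time point; id/time combinations that never occur become NA
wide <- dcast(fd, id ~ paste0("x.time", time), value.var = "x")
```

If a value other than NA is wanted for the missing combinations, dcast also accepts a fill argument.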
Cannot accurately convert from long format to wide in r
We need to create a sequence column as there are duplicates
library(dplyr)
library(tidyr)
data_ige %>%
group_by(ID, date, test) %>%
mutate(rn = row_number()) %>%
ungroup %>%
spread(test, value) %>%
# or use pivot_wider(), as spread() is superseded
# pivot_wider(names_from = test, values_from = value) %>%
select(-rn)
# A tibble: 8 x 9
# ID date `1` `3` `4` `5` `6` `7` `8`
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 2008 0.035 NA NA NA NA NA NA
#2 A 2011 2.75 NA NA NA NA NA NA
#3 B 2011 9.99 3.65 0.68 0.02 0.17 0.5 NA
#4 C 2008 0 NA NA NA NA NA NA
#5 C 2011 NA NA NA NA NA NA 0.09
#6 D 2008 0 0 0 0 0 0.59 0
#7 D 2011 0 0.49 0.2 0.08 0.16 0.5 0.13
#8 D 2011 9.99 NA NA NA NA NA NA
data
data_ige <- structure(list(ID = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("A", "B", "C", "D"), class = "factor"), date = c(2008,
2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2008, 2011, 2008,
2008, 2008, 2008, 2008, 2008, 2008, 2011, 2011, 2011, 2011, 2011,
2011, 2011), test = c(1, 1, 1, 3, 4, 5, 6, 7, 8, 1, 1, 1, 3,
4, 5, 6, 7, 8, 1, 3, 4, 5, 6, 7, 8), value = c(0.035, 2.75, 9.99,
3.65, 0.68, 0.02, 0.17, 0.5, 0.09, 0, 0, 0, 0, 0, 0, 0, 0.59,
0, 9.99, 0.49, 0.2, 0.08, 0.16, 0.5, 0.13)),
class = "data.frame", row.names = c(NA,
-25L))
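The effect of the sequence column can be seen on a small subset. The sketch below uses three rows taken from ID "D", date 2011 in the data above, where test 1 occurs twice; it uses pivot_wider rather than spread.

```r
library(dplyr)
library(tidyr)

# Three rows for ID "D" in 2011; test 1 is duplicated
long <- tibble(
  ID    = c("D", "D", "D"),
  date  = c(2011, 2011, 2011),
  test  = c(1, 1, 3),
  value = c(0, 9.99, 0.49)
)

wide <- long %>%
  group_by(ID, date, test) %>%
  mutate(rn = row_number()) %>%   # 1 for the first duplicate, 2 for the second
  ungroup() %>%
  pivot_wider(names_from = test, values_from = value) %>%
  select(-rn)
```

Because rn is part of the row identifier during the pivot, the two test-1 values land on separate rows instead of colliding in a single cell.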
How can you turn to a long, tidy format a dataframe with unequal number of columns?
You don't need to pivot here; just bind the rows for each set of columns separately. You could do it manually:
library(tidyverse)
bind_rows(
df[,1:3],
df[,c(1,4:5)],
df[,c(1,6:7)]
)
Then just filter out the rows with NA values. If you have more column pairs, you can instead use purrr::map_dfr on a numeric vector of column indices to select the correct columns automatically and bind them together. Then use dplyr::filter(across(...)) to drop the rows that contain NA.
map_dfr(
seq(2,6,2),
~df[, c(1, .x, .x + 1)]
) %>%
filter(across(c(x,y), ~ !is.na(.x))) %>%
arrange(id, y, x)
#> # A tibble: 6 × 3
#> id x y
#> <chr> <dbl> <chr>
#> 1 T1 4 A
#> 2 T1 7 A
#> 3 T2 5 B
#> 4 T2 8 B
#> 5 T2 4 F
#> 6 T3 6 C
I added the final dplyr::arrange() call to match your output; you can adjust it to however you actually want to order your data.
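For a self-contained version, the sketch below makes up an input df with hypothetical column names (x1, y1, x2, y2). With names like these, each slice has to be renamed before binding, because rows are matched by column name. It also uses if_all(), which newer dplyr versions recommend in place of across() inside filter().

```r
library(dplyr)
library(purrr)

# Hypothetical wide input: one id column followed by (x, y) pairs
df <- tibble(
  id = c("T1", "T2", "T3"),
  x1 = c(4, 5, 6),  y1 = c("A", "B", "C"),
  x2 = c(7, 8, NA), y2 = c("A", "B", NA)
)

long <- map_dfr(
  seq(2, 4, 2),                       # start index of each (x, y) pair
  ~ setNames(df[, c(1, .x, .x + 1)],  # id plus one pair
             c("id", "x", "y"))       # common names so the rows line up
) %>%
  filter(if_all(c(x, y), ~ !is.na(.x))) %>%  # keep rows where x and y are both present
  arrange(id, y, x)
```

With more pairs, only the upper bound passed to seq() needs to change.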
Reshape long to wide with dates - R
We can use dcast
library(data.table)
dcast(setDT(df), id~paste0("date.", rowid(id)), value.var = "date")
# id date.1 date.2
#1: 1 2015-01-03 2012-03-04
#2: 2 2016-07-21 2016-09-08
Or using tidyverse
library(dplyr)
library(tidyr)
df %>%
group_by(id) %>%
mutate(i1 = paste0("date.", row_number())) %>%
spread(i1, date)
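Since spread is superseded in current tidyr, the same reshaping can also be sketched with pivot_wider. The input df below is reconstructed from the output shown above and is therefore an assumption about the original data.

```r
library(dplyr)
library(tidyr)

# Input reconstructed from the printed output (assumed, not the original data)
df <- data.frame(
  id   = c(1, 1, 2, 2),
  date = as.Date(c("2015-01-03", "2012-03-04", "2016-07-21", "2016-09-08"))
)

wide <- df %>%
  group_by(id) %>%
  mutate(i1 = paste0("date.", row_number())) %>%  # date.1, date.2 within each id
  ungroup() %>%
  pivot_wider(names_from = i1, values_from = date)
```

The new columns keep the Date class, so no conversion back from character is needed.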
dcast for huge dataframe [R]
The easy solution in this case turned out to be switching back to the old reshape package, which means using cast instead of dcast. Arun's comments are very useful, provided one can actually update.