Reshape Wide to Long with Character Suffixes Instead of Numeric Suffixes

This works; the list passed to varying specifies which columns go together:

reshape(dadmom, direction="long",  varying=list(c(2, 4), c(3, 5)), 
sep="", v.names=c("name", "inc"), timevar="dadmom",
times=c("d", "m"))

You actually have nested repeated measures here: both name and inc are recorded for mom and for dad. Because there is more than one series of repeated measures, you have to supply a list to varying that tells reshape which columns get stacked together into each long column.

So the two approaches to this problem are to provide a list as I did or to rename the columns the way the R beast likes them as you did.
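To make the role of the varying list concrete, here is a minimal plain-Python sketch of the stacking it describes. It is not R and not how reshape is implemented. The sample rows, the column layout (famid, named, incd, namem, incm) and the helper name stack_groups are assumptions made for illustration; positions are zero-based, so R's columns 2, 4 and 3, 5 become 1, 3 and 2, 4.

```python
# Hypothetical wide table: one row per family, columns assumed to be
# famid, named, incd, namem, incm (this layout is not given in the text).
columns = ["famid", "named", "incd", "namem", "incm"]
rows = [
    (1, "Bill", 30000, "Bess", 15000),
    (2, "Art", 22000, "Amy", 18000),
]


def stack_groups(rows, id_pos, varying, v_names, timevar, times):
    """Stack each group of columns in `varying` into one long column.

    varying[k][t] is the position of the column holding measure k at time t,
    mirroring varying=list(c(2, 4), c(3, 5)) with zero-based positions.
    """
    long_rows = []
    for t, time_label in enumerate(times):
        for row in rows:
            record = {columns[id_pos]: row[id_pos], timevar: time_label}
            for name, positions in zip(v_names, varying):
                record[name] = row[positions[t]]
            long_rows.append(record)
    return long_rows


long_rows = stack_groups(
    rows,
    id_pos=0,
    varying=[(1, 3), (2, 4)],  # names: named/namem, incomes: incd/incm
    v_names=["name", "inc"],
    timevar="dadmom",
    times=["d", "m"],
)

for record in long_rows:
    print(record)
```

Each inner tuple of varying is one series of repeated measures, which is why two series need a list with two entries.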

See my recent blogs on base reshape for more on this (particularly the second link deals with this):

reshape (part I)

reshape (part II)

Reshape data set from wide to long format grouped by variable suffix

Using reshape we can set the cutpoints with sep="_", which matches the underscore in the column names of the data below.

reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="_", direction="long")
#         ID YEAR MI FRAC
# 1.1995   1 1995  2    3
# 7.1995   7 1995  3   10
# 10.1995 10 1995  1    2
# 1.1996   1 1996  2    4
# 7.1996   7 1996 12    1
# 10.1996 10 1996  1    1

Data

d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
class = "data.frame")
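The name-splitting step can be pictured as follows: each varying column name is cut at the separator into a variable part and a time part, and rows are then stacked by time. Below is a rough plain-Python sketch of that idea using the same values as d, with the underscore in these column names as the separator. The helper name wide_to_long_by_sep is hypothetical, and this is only an illustration, not what reshape does internally.

```python
# Same values as the data frame d above, written as plain Python dicts.
wide = [
    {"ID": 1, "MI_1995": 2, "FRAC_1995": 3, "MI_1996": 2, "FRAC_1996": 4},
    {"ID": 7, "MI_1995": 3, "FRAC_1995": 10, "MI_1996": 12, "FRAC_1996": 1},
    {"ID": 10, "MI_1995": 1, "FRAC_1995": 2, "MI_1996": 1, "FRAC_1996": 1},
]


def wide_to_long_by_sep(rows, id_col, sep, time_col):
    """Split every non-id column name at `sep` into (variable, time) and stack."""
    long_rows = {}
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            variable, _, time = col.partition(sep)
            key = (int(time), row[id_col])
            record = long_rows.setdefault(
                key, {id_col: row[id_col], time_col: int(time)}
            )
            record[variable] = value
    # sorted() is stable, so rows keep their original order within each time.
    return [long_rows[k] for k in sorted(long_rows, key=lambda k: k[0])]


long_d = wide_to_long_by_sep(wide, "ID", "_", "YEAR")
for record in long_d:
    print(record)
```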

pandas wide_to_long suffix parameter

TLDR: Regex capturing groups can be used for the suffix parameter.

The suffix parameter tells pandas.wide_to_long which columns it should include in the transformation based on the suffix after the stub.

By default, wide_to_long assumes the suffixes are numeric (the default pattern is '\d+'), so columns A1, A2, A3, A4 work without specifying the suffix parameter, while Aone, Atwo, Athree, Afour are not matched.

It is also useful in the rarer case where your columns are A1, A2, A3, A4, A100 and you want to leave A100 out because it isn't actually related to the other A# columns.
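A quick way to reason about which columns a given suffix pattern would pick up is to test the column names with the standard re module. The sketch below approximates the selection as a full match of stub + sep + suffix; it is an illustration of the idea, not pandas' own code, and the helper name select_columns (as well as wrapping the suffix in a non-capturing group) is a choice made here.

```python
import re


def select_columns(columns, stub, suffix=r"\d+", sep=""):
    """Return columns whose whole name is stub + sep + text matching suffix."""
    pattern = re.compile(re.escape(stub) + re.escape(sep) + "(?:" + suffix + ")")
    return [c for c in columns if pattern.fullmatch(c)]


numeric = ["A1", "A2", "A3", "A4", "A100"]
words = ["Aone", "Atwo", "Athree", "Afour"]

print(select_columns(numeric, "A"))                # numeric default: all five
print(select_columns(numeric, "A", suffix=r"\d"))  # one digit only: skips A100
print(select_columns(words, "A"))                  # numeric default: nothing
print(select_columns(words, "A", suffix=r"\w+"))   # word suffixes are matched
```

Checking a pattern this way before calling wide_to_long can save a round of debugging an unexpectedly empty result.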

Here are some illustrative examples.

import pandas as pd
df = pd.DataFrame({'id': [1,2], 'A_1': ['a', 'b'],
'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#        A_person   A_
# id num
# 1  1       Mike    a
# 2  1        Amy    b
# 1  2       Mike   aa
# 2  2        Amy   bb
# 1  3       Mike  aaa
# 2  3        Amy  bbb

Because the default only matches numeric suffixes, 'A_person' was not melted; it was carried along as an ordinary column. If you want to include it in the conversion, use the suffix parameter. Let's tell it we want either numbers or words.

# raw string, so that \d and \w reach the regex engine unchanged
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'(\d+|\w+)')
#              A_
# id suffix
# 1  1          a
# 2  1          b
# 1  2         aa
# 2  2         bb
# 1  3        aaa
# 2  3        bbb
# 1  person  Mike
# 2  person   Amy

Now if your df starts without numeric suffixes, you can take care of that with the suffix parameter too. The default call returns an empty DataFrame because it only looks for numeric suffixes, but telling it to look for words gives you what you want.

df = pd.DataFrame({'id': [1,2], 'A_one': ['a', 'b'],
'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#Empty DataFrame
#Columns: [A_three, A_person, A_one, A_two, A_]
#Index: []

# raw string, so that \w reaches the regex engine unchanged
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'\w+')
#              A_
# id suffix
# 1  one        a
# 2  one        b
# 1  person  Mike
# 2  person   Amy
# 1  three    aaa
# 2  three    bbb
# 1  two       aa
# 2  two       bb

And if you don't want to include A_person, you can restrict the suffix pattern to the specific suffixes you want.

pd.wide_to_long(df, stubnames='A_', i='id', j='num', suffix='(one|two|three)')
#           A_person   A_
# id num
# 1  one        Mike    a
# 2  one         Amy    b
# 1  three      Mike  aaa
# 2  three       Amy  bbb
# 1  two        Mike   aa
# 2  two         Amy   bb

Basically, if you can capture it with regex, you can pass it to suffix to use only the columns you want.

Reshape from Long to Wide Format by Multiple Factors

base R:

One way can be:

reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
direction = 'wide', idvar = c('name', 'ind'))

         name    ind values.Q1 values.Q2 values.Q3 values.Q4
1   firstName height         2         9         1         2
5  secondName height        11        15        16        10
9   firstName weight         1         4         2         8
13 secondName weight         2         9         1         2

If you are open to other packages, consider the recast function from the reshape2 package:

reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
        name variable Q1 Q2 Q3 Q4
1  firstName   height  2  9  1  2
2  firstName   weight  1  4  2  8
3 secondName   height 11 15 16 10
4 secondName   weight  2  9  1  2

R: long to wide transformation using reduce and setting suffixes

Here is a base R method using split and do.call:

# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# perform wide transformation
do.call(data.frame, myList)

This puts the species names at the front. It would not be too hard to move them to the back using gsub.

Here is part of the result:

  setosa.Sepal.Length setosa.Sepal.Width setosa.Petal.Length setosa.Petal.Width
1                 5.1                3.5                 1.4                0.2
2                 4.9                3.0                 1.4                0.2
3                 4.7                3.2                 1.3                0.2
4                 4.6                3.1                 1.5                0.2
5                 5.0                3.6                 1.4                0.2
6                 5.4                3.9                 1.7                0.4
6 5.4 3.9 1.7 0.4

The other species are additional columns.
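The renaming mentioned above (moving the species from the front of each column name to the back) comes down to one regex substitution with two capture groups. Here is a small Python sketch of that substitution using re.sub; an equivalent pattern could presumably be passed to sub or gsub in R, but that variant is not tested here. The third name in the list is just an extra sample.

```python
import re

names = ["setosa.Sepal.Length", "setosa.Sepal.Width", "versicolor.Petal.Length"]

# Group 1 is everything before the first dot (the species), group 2 is the
# rest of the name; the replacement swaps the two around the dot.
renamed = [re.sub(r"^([^.]+)\.(.*)$", r"\2.\1", name) for name in names]

print(renamed)
```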

answer for Update #1

This gets a bit more complicated, though the first line is the same:

# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# add names to data.frames
myList <- lapply(names(myList),
function(i) {
setNames(myList[[i]],
c(paste0(head(names(myList[[i]]), -1), ".", i), "id"))
})

# merge the data.frames together
Reduce(function(x, y) {merge(x, y, by="id", all=TRUE)}, myList)

This results in the naming that you wanted with the Species appended to the end of each variable.
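For readers more at home in Python, the Reduce plus merge pattern is a left fold of a full outer join over a list of tables. The sketch below shows the same shape with functools.reduce on plain dicts keyed by id. The tiny tables, their values and the helper name outer_merge are made up for illustration, and unlike merge(..., all=TRUE) a missing cell is simply absent rather than filled with NA.

```python
from functools import reduce

# Hypothetical per-species tables keyed by id, with already-suffixed columns.
tables = [
    {1: {"Sepal.Length.setosa": 5.1}, 2: {"Sepal.Length.setosa": 4.9}},
    {1: {"Sepal.Length.versicolor": 7.0}, 2: {"Sepal.Length.versicolor": 6.4}},
    {1: {"Sepal.Length.virginica": 6.3}, 3: {"Sepal.Length.virginica": 7.1}},
]


def outer_merge(left, right):
    """Full outer join of two id-keyed tables (keeps ids found in either)."""
    merged = {}
    for key in sorted(left.keys() | right.keys()):
        merged[key] = {**left.get(key, {}), **right.get(key, {})}
    return merged


# Fold the pairwise join over the whole list, as Reduce does in R.
wide = reduce(outer_merge, tables)

for key, row in wide.items():
    print(key, row)
```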

How to Restructure R Data Frame in R

Here is a solution that uses tidyr. Specifically, the gather function combines the two employee columns into one. It also generates a column based on the original column headers (employee1 and employee2), called key, which we remove with select from dplyr.

library(tidyr)
library(dplyr)

df <- read.table(
text = "boss employee1 employee2
1 wil james andy
2 james dean bert
3 billy herb collin
4 tony mike david",
header = TRUE,
stringsAsFactors = FALSE
)

df2 <- df %>%
gather(key, employee, -boss) %>%
select(-key)

> df2
   boss employee
1   wil    james
2 james     dean
3 billy     herb
4  tony     mike
5   wil     andy
6 james     bert
7 billy   collin
8  tony    david

I would be shocked if there isn't a slicker, base solution but this should work for you.
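If it helps to see what the gather and select steps boil down to, here is the same restructuring as a plain-Python sketch on the data above: walk the employee columns one at a time, pair each value with its boss, and throw the column name away. This is only an illustration of the logic, not a replacement for the tidyr code.

```python
rows = [
    {"boss": "wil", "employee1": "james", "employee2": "andy"},
    {"boss": "james", "employee1": "dean", "employee2": "bert"},
    {"boss": "billy", "employee1": "herb", "employee2": "collin"},
    {"boss": "tony", "employee1": "mike", "employee2": "david"},
]

# Outer loop over columns, inner loop over rows: all of employee1 comes
# first, then all of employee2, which is the order gather produces.
pairs = [
    (row["boss"], row[col])
    for col in ("employee1", "employee2")
    for row in rows
]

for boss, employee in pairs:
    print(boss, employee)
```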

reshape from wide to long group of variables

If you've only got two locations, you can just chuck them in regex, accounting for the fact that they could be at the beginning or end of the name:

library(tidyverse)

df_wide %>%
gather(variable, value, -Month) %>%
mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable),
variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>%
spread(variable, value)
#> # A tibble: 24 x 6
#>    Month location BED_BUGS    BU_PCT  LOS_AVG TOTAL_OCCUPIED
#>  * <dbl>    <chr>    <dbl>     <dbl>    <dbl>          <dbl>
#>  1     1 Acapulco        3 0.6260116 4.307667           6498
#>  2     1     Cabo        5 0.6470034 5.223000          19216
#>  3     2 Acapulco        0 0.6777457 4.247500           6566
#>  4     2     Cabo        3 0.6167027 5.893571          17095
#>  5     3 Acapulco        1 0.6348126 4.327742           6809
#>  6     3     Cabo        5 0.6372108 5.229677          19556
#>  7     4 Acapulco        6 0.6548170 4.220000           6797
#>  8     4     Cabo        4 0.6357912 5.356667          18883
#>  9     5 Acapulco        5 0.6409659 4.162903           6875
#> 10     5     Cabo        2 0.6449006 5.344194          19792
#> # ... with 14 more rows
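The two sub calls do the real work here: the first keeps only the location, the second strips the location (and an adjacent underscore) to leave the measure name. Here is the same pair of substitutions as a Python sketch, so you can see what each one returns. The exact column names are assumptions, since the wide data is not shown; only the measure names are taken from the output above.

```python
import re

# Hypothetical wide column names; the location may lead or trail the measure.
variables = [
    "Cabo_BED_BUGS",
    "BU_PCT_Acapulco",
    "LOS_AVG_Cabo",
    "Acapulco_TOTAL_OCCUPIED",
]


def split_location(variable):
    """Return (location, measure) for one wide column name."""
    # Replace the whole name with just the captured location.
    location = re.sub(r".*(Cabo|Acapulco).*", r"\1", variable, count=1)
    # Delete the location plus an optional underscore on either side.
    measure = re.sub(r"_?(Cabo|Acapulco)_?", "", variable, count=1)
    return location, measure


for variable in variables:
    print(variable, "->", split_location(variable))
```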

