Reshape Wide to Long with Character Suffixes Instead of Numeric Suffixes


This works; the list passed to varying tells reshape which columns belong together:

reshape(dadmom, direction="long",  varying=list(c(2, 4), c(3, 5)), 
        sep="", v.names=c("name", "inc"), timevar="dadmom",
        times=c("d", "m"))

So you actually have nested repeated measures here: both name and inc are recorded for mom and for dad. Because you have more than one series of repeated measures, you have to supply a list to varying that tells reshape which columns get stacked on top of each other.

So the two approaches to this problem are to provide a list, as I did, or to rename the columns the way the R beast likes them, as you did.
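To make the "which group gets stacked on which" idea concrete, here is a minimal Python sketch of the same stacking logic. It is not how R's reshape is implemented. The column names (famid, named, incd, namem, incm) and the values are hypothetical stand-ins, since the dadmom data isn't shown here.

```python
# Hypothetical wide data: one row per family, two repeated-measure series.
wide = [
    {"famid": 1, "named": "Bill", "incd": 30000, "namem": "Bess", "incm": 15000},
    {"famid": 2, "named": "Art", "incd": 22000, "namem": "Amy", "incm": 18000},
]

# Plays the role of `varying` + `v.names`: each long column lists the wide
# columns that get stacked into it, in the same order as `times`.
varying = {"name": ["named", "namem"], "inc": ["incd", "incm"]}
times = ["d", "m"]


def wide_to_long(rows, id_col, varying, times, timevar):
    """Stack each group of wide columns into one long column per group."""
    long_rows = []
    for pos, time in enumerate(times):  # one block of rows per time value
        for row in rows:
            out = {id_col: row[id_col], timevar: time}
            for v_name, cols in varying.items():
                out[v_name] = row[cols[pos]]
            long_rows.append(out)
    return long_rows


long = wide_to_long(wide, "famid", varying, times, "dadmom")
for r in long:
    print(r)
```

Each entry in varying corresponds to one of the vectors in the list passed to reshape: the first column of every group goes with "d", the second with "m".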

See my recent blog posts on base reshape for more on this (the second link in particular deals with this):

reshape (part I)

reshape (part II)

Reshape data set from wide to long format grouped by variable suffix

Using reshape we can set the cutpoints with sep="_", since the column names have the form MI_1995.

reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="_", direction="long")
#         ID YEAR MI FRAC
# 1.1995   1 1995  2    3
# 7.1995   7 1995  3   10
# 10.1995 10 1995  1    2
# 1.1996   1 1996  2    4
# 7.1996   7 1996 12    1
# 10.1996 10 1996  1    1

Data

d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
                    FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
                    FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
               class = "data.frame")
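For intuition about what the cutpoint does, the following Python sketch splits each varying column name into a variable part and a time part. It only illustrates the idea, on the assumption that the underscore in names like MI_1995 is the separator. It is not R's actual name-guessing code, and split_name is a hypothetical helper.

```python
# Column names taken from the data above.
columns = ["MI_1995", "FRAC_1995", "MI_1996", "FRAC_1996"]


def split_name(col, sep="_"):
    """Split at the last separator: "MI_1995" -> ("MI", "1995")."""
    stub, _, time = col.rpartition(sep)
    return stub, time


pairs = [split_name(c) for c in columns]
v_names = sorted({stub for stub, _ in pairs})        # long-format value columns
times = sorted({int(time) for _, time in pairs})     # values of the time variable

print(pairs)
print(v_names, times)
```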

pandas wide_to_long suffix parameter

TLDR: Regex capturing groups can be used for the suffix parameter.

The suffix parameter tells pandas.wide_to_long which columns it should include in the transformation based on the suffix after the stub.

The default behavior of wide_to_long assumes that your column suffixes are numbers (the default suffix is the regex \d+), so columns A1, A2, A3, A4 will work fine without specifying the suffix parameter, while Aone, Atwo, Athree, Afour will not be matched.

It is also useful in the rarer case where your columns are A1, A2, A3, A4, A100 and you don't want to include A100 because it isn't actually related to the other A# columns.

Here are some illustrative examples.

import pandas as pd
df = pd.DataFrame({'id': [1,2], 'A_1': ['a', 'b'],
                  'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
                  'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#       A_person   A_
#id num              
#1  1       Mike    a
#2  1        Amy    b
#1  2       Mike   aa
#2  2        Amy   bb
#1  3       Mike  aaa
#2  3        Amy  bbb

Because the default behavior is to only consider numbers, 'A_person' was ignored. If you wanted to add that to the conversion, then you would use the suffix parameter. Let's tell it we want either numbers or words.

# raw string, so Python does not treat \d and \w as string escape sequences
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'(\d+|\w+)')
#             A_
#id suffix         
#1  1          a
#2  1          b
#1  2         aa
#2  2         bb
#1  3        aaa
#2  3        bbb
#1  person  Mike
#2  person   Amy

Now if your df starts without numeric suffixes, you can take care of that with the suffix parameter too. The default call will fail because it expects numbers, but telling it to look for words gives you what you want.

df = pd.DataFrame({'id': [1,2], 'A_one': ['a', 'b'],
                  'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
                  'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#Empty DataFrame
#Columns: [A_three, A_person, A_one, A_two, A_]
#Index: []

pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'\w+')
#             A_
#id suffix         
#1  one        a
#2  one        b
#1  person  Mike
#2  person   Amy
#1  three    aaa
#2  three    bbb
#1  two       aa
#2  two       bb

And if you don't want to include A_person, you can tell the suffix parameter to only include certain suffixes.

pd.wide_to_long(df, stubnames='A_', i='id', j='num', suffix='(one|two|three)')
#         A_person   A_
#id num                
#1  one       Mike    a
#2  one        Amy    b
#1  three     Mike  aaa
#2  three      Amy  bbb
#1  two       Mike   aa
#2  two        Amy   bb

Basically, if you can capture it with regex, you can pass it to suffix to use only the columns you want.
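As a rough way to preview which columns a given suffix would pick up, you can test the regex against the column names yourself. The sketch below uses only the standard library. matching_columns is a hypothetical helper, not part of pandas, and it assumes a column is selected when its whole name is stub + sep + suffix. The column names, including A_100, are made up to cover the cases discussed above.

```python
import re


def matching_columns(columns, stub, suffix=r"\d+", sep=""):
    """Return the columns whose full name is stub + sep + something matching suffix."""
    # Anchored at both ends, so a partial match is not enough.
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [c for c in columns if pattern.match(c)]


cols = ["id", "A_1", "A_2", "A_3", "A_100", "A_person"]

numeric = matching_columns(cols, "A_")                         # numeric suffixes
one_digit = matching_columns(cols, "A_", suffix=r"\d")         # leaves out A_100
words = matching_columns(cols, "A_", suffix=r"\w+")            # digits and words
chosen = matching_columns(cols, "A_", suffix=r"(1|2|person)")  # explicit list

print(numeric)
print(one_digit)
print(words)
print(chosen)
```

In this sketch an alternation such as 1|2|person has to be wrapped in parentheses, otherwise the anchors only apply to the first and last alternative.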

Reshape from Long to Wide Format by Multiple Factors

base R:

One way can be:

reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
        dir = 'wide', idvar = c('name', 'ind'))

         name    ind values.Q1 values.Q2 values.Q3 values.Q4
1   firstName height         2         9         1         2
5  secondName height        11        15        16        10
9   firstName weight         1         4         2         8
13 secondName weight         2         9         1         2

If using other packages, consider the recast function from the reshape2 package:

reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
        name variable Q1 Q2 Q3 Q4
1  firstName   height  2  9  1  2
2  firstName   weight  1  4  2  8
3 secondName   height 11 15 16 10
4 secondName   weight  2  9  1  2

R: long to wide transformation using reduce and setting suffixes

Here is a base R method using split and do.call:

# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# perform wide transformation
do.call(data.frame, myList)

This puts the species names at the front. It would not be too hard to move them to the back using gsub.

Here is part of the result:

  setosa.Sepal.Length setosa.Sepal.Width setosa.Petal.Length setosa.Petal.Width
1                  5.1                3.5                 1.4                0.2
2                  4.9                3.0                 1.4                0.2
3                  4.7                3.2                 1.3                0.2
4                  4.6                3.1                 1.5                0.2
5                  5.0                3.6                 1.4                0.2
6                  5.4                3.9                 1.7                0.4

The other species are additional columns.

answer for Update #1

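This gets a bit more complicated, though the first line is the same. Note that the code assumes iris has an id column, which is the last column once Species is dropped: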

# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# add names to data.frames
myList <- lapply(names(myList),
                 function(i) {
                       setNames(myList[[i]],
                         c(paste0(head(names(myList[[i]]), -1), ".", i), "id"))
                 })

# merge the data.frames together
Reduce(function(x, y) {merge(x, y, by="id", all=TRUE)}, myList)

This results in the naming that you wanted with the Species appended to the end of each variable.
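The same "append the group to each variable name, then merge on id" idea can be sketched in plain Python. This is only an illustration of the logic, not a translation of the R code. The records are illustrative and cover just two measurements.

```python
# Illustrative long records: one row per (id, species).
long = [
    {"id": 1, "species": "setosa", "Sepal.Length": 5.1, "Sepal.Width": 3.5},
    {"id": 1, "species": "virginica", "Sepal.Length": 6.3, "Sepal.Width": 3.3},
    {"id": 2, "species": "setosa", "Sepal.Length": 4.9, "Sepal.Width": 3.0},
]

wide = {}
for row in long:
    # One output record per id; create it on first sight.
    target = wide.setdefault(row["id"], {"id": row["id"]})
    for key, value in row.items():
        if key not in ("id", "species"):
            # Append the group label as a suffix, e.g. "Sepal.Length.setosa".
            target[f"{key}.{row['species']}"] = value

result = [wide[k] for k in sorted(wide)]
for r in result:
    print(r)
```

An id that is missing for some species simply lacks those keys, which plays the role of the NA values that merge(..., all=TRUE) would produce.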

How to Restructure R Data Frame in R

Here is a solution that uses tidyr. Specifically, the gather function is used to combine the two employee columns. This also generates a column based on the column headers (employee1 and employee2), which is called key. We remove that with select from dplyr.

library(tidyr)
library(dplyr)

df <- read.table(
  text = "boss employee1 employee2
  1   wil     james      andy
  2 james      dean      bert
  3 billy      herb    collin
  4  tony      mike     david",
  header = TRUE,
  stringsAsFactors = FALSE
)

df2 <- df %>%
  gather(key, employee, -boss) %>%
  select(-key)

> df2
   boss employee
1   wil    james
2 james     dean
3 billy     herb
4  tony     mike
5   wil     andy
6 james     bert
7 billy   collin
8  tony    david

I would be shocked if there isn't a slicker, base solution but this should work for you.
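For comparison, the stacking that gather followed by select(-key) performs can be written out by hand. This Python sketch uses the same data as a list of dicts and is only meant to show the logic.

```python
# Same data as the example above, as a list of dicts.
df = [
    {"boss": "wil", "employee1": "james", "employee2": "andy"},
    {"boss": "james", "employee1": "dean", "employee2": "bert"},
    {"boss": "billy", "employee1": "herb", "employee2": "collin"},
    {"boss": "tony", "employee1": "mike", "employee2": "david"},
]

# Stack column by column (all employee1 rows, then all employee2 rows),
# keeping only the boss and the value, i.e. dropping the "key".
df2 = [
    {"boss": row["boss"], "employee": row[col]}
    for col in ("employee1", "employee2")
    for row in df
]

for r in df2:
    print(r)
```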

reshape from wide to long group of variables

If you've only got two locations, you can just chuck them in a regex, accounting for the fact that they could be at the beginning or end of the name:

library(tidyverse)

df_wide %>% 
    gather(variable, value, -Month) %>% 
    mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable), 
           variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>% 
    spread(variable, value)
#> # A tibble: 24 x 6
#>    Month location BED_BUGS    BU_PCT  LOS_AVG TOTAL_OCCUPIED
#>  * <dbl>    <chr>    <dbl>     <dbl>    <dbl>          <dbl>
#>  1     1 Acapulco        3 0.6260116 4.307667           6498
#>  2     1     Cabo        5 0.6470034 5.223000          19216
#>  3     2 Acapulco        0 0.6777457 4.247500           6566
#>  4     2     Cabo        3 0.6167027 5.893571          17095
#>  5     3 Acapulco        1 0.6348126 4.327742           6809
#>  6     3     Cabo        5 0.6372108 5.229677          19556
#>  7     4 Acapulco        6 0.6548170 4.220000           6797
#>  8     4     Cabo        4 0.6357912 5.356667          18883
#>  9     5 Acapulco        5 0.6409659 4.162903           6875
#> 10     5     Cabo        2 0.6449006 5.344194          19792
#> # ... with 14 more rows
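To see what the two sub() calls do to a single column name, here is the same pair of substitutions in Python. The wide column names are hypothetical, since df_wide isn't shown. They are chosen so that the location appears both at the start and at the end.

```python
import re

# Hypothetical wide column names; the location may lead or trail.
names = [
    "Cabo_BED_BUGS",
    "BU_PCT_Acapulco",
    "LOS_AVG_Cabo",
    "Acapulco_TOTAL_OCCUPIED",
]

parsed = []
for name in names:
    # Keep only the captured location.
    location = re.sub(r".*(Cabo|Acapulco).*", r"\1", name)
    # Drop the location and an adjacent underscore on either side.
    variable = re.sub(r"_?(Cabo|Acapulco)_?", "", name)
    parsed.append((location, variable))

for item in parsed:
    print(item)
```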
