reshape wide to long with character suffixes instead of numeric suffixes
This works; the list passed to varying spells out which columns belong together:
reshape(dadmom, direction="long", varying=list(c(2, 4), c(3, 5)),
sep="", v.names=c("name", "inc"), timevar="dadmom",
times=c("d", "m"))
So you actually have nested repeated measures here: both name and inc, for mom and for dad. Because you have more than one series of repeated measures, you have to supply a list to varying that tells reshape which group gets stacked on the other group.
So the two approaches to this problem are to provide a list as I did or to rename the columns the way the R beast likes them as you did.
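To make the role of the list concrete, here is a rough sketch of the stacking idea in plain Python rather than R. The function, the column names (famid, named, incd, namem, incm) and the values are all made up for illustration; this is not how reshape is implemented, only a picture of what the list passed to varying means.

```python
def stack_groups(rows, id_col, varying, v_names, times, timevar):
    """Stack each group of wide columns into one long column.

    varying holds one tuple of wide column names per long variable,
    ordered the same way as times (hypothetical helper, illustration only).
    """
    long_rows = []
    for t_index, time in enumerate(times):
        for row in rows:
            new_row = {id_col: row[id_col], timevar: time}
            for v_name, columns in zip(v_names, varying):
                new_row[v_name] = row[columns[t_index]]
            long_rows.append(new_row)
    return long_rows


# Made-up wide data: one row per family, dad and mom side by side.
wide = [
    {"famid": 1, "named": "Bill", "incd": 30000, "namem": "Bess", "incm": 15000},
    {"famid": 2, "named": "Art", "incd": 22000, "namem": "Amy", "incm": 18000},
]

long = stack_groups(
    wide,
    id_col="famid",
    varying=[("named", "namem"), ("incd", "incm")],  # one group per long variable
    v_names=["name", "inc"],
    times=["d", "m"],
    timevar="dadmom",
)

for row in long:
    print(row)
```

Each tuple in varying plays the part of one c(...) vector in the R call: the first tuple lists the columns that become name, the second the columns that become inc.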
See my recent blog posts on base reshape for more on this (the second one in particular deals with this case):
reshape (part I)
reshape (part II)
Reshape data set from wide to long format grouped by variable suffix
Using reshape, we can set the cutpoints with sep="_", since the column names in d separate the variable from the year with an underscore.
reshape(d, idvar="ID", varying=2:5, timevar="YEAR", sep="_", direction="long")
# ID YEAR MI FRAC
# 1.1995 1 1995 2 3
# 7.1995 7 1995 3 10
# 10.1995 10 1995 1 2
# 1.1996 1 1996 2 4
# 7.1996 7 1996 12 1
# 10.1996 10 1996 1 1
Data
d <- structure(list(ID = c(1L, 7L, 10L), MI_1995 = c(2L, 3L, 1L),
FRAC_1995 = c(3L, 10L, 2L), MI_1996 = c(2L, 12L, 1L),
FRAC_1996 = c(4L, 1L, 1L)), row.names = c(NA, -3L),
class = "data.frame")
pandas wide_to_long suffix parameter
TLDR: Regex capturing groups can be used for the suffix parameter.
The suffix parameter tells pandas.wide_to_long which columns it should include in the transformation, based on the suffix that follows the stub.
The default behavior of wide_to_long assumes that your columns are labeled with numbers (the default suffix is the regex \d+), so for instance columns A1, A2, A3, A4 will work fine without specifying the suffix parameter, while Aone, Atwo, Athree, Afour will fail.
It also has other uses, such as the rare case where your columns are A1, A2, A3, A4, A100 and you don't want to include A100 because it isn't actually related to the other A# columns.
Here are some illustrative examples.
import pandas as pd
df = pd.DataFrame({'id': [1,2], 'A_1': ['a', 'b'],
'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
'A_person': ['Mike', 'Amy']})
pd.wide_to_long(df, stubnames='A_', i='id', j='num')
# A_person A_
#id num
#1 1 Mike a
#2 1 Amy b
#1 2 Mike aa
#2 2 Amy bb
#1 3 Mike aaa
#2 3 Amy bbb
Because the default behavior is to only consider numbers, 'A_person' was left out of the reshaping and simply carried along as an ordinary column. If you wanted to add that to the conversion, then you would use the suffix parameter. Let's tell it we want either numbers or words (note the raw string, so the backslashes reach the regex engine untouched).
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'(\d+|\w+)')
# A_
#id suffix
#1 1 a
#2 1 b
#1 2 aa
#2 2 bb
#1 3 aaa
#2 3 bbb
#1 person Mike
#2 person Amy
Now if your df starts without numeric suffixes, you can take care of that with the suffix parameter too. The default call returns an empty DataFrame because it expects numbers, but telling it to look for words gives you what you want.
df = pd.DataFrame({'id': [1,2], 'A_one': ['a', 'b'],
'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
'A_person': ['Mike', 'Amy']})
pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#Empty DataFrame
#Columns: [A_three, A_person, A_one, A_two, A_]
#Index: []
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'\w+')
# A_
#id suffix
#1 one a
#2 one b
#1 person Mike
#2 person Amy
#1 three aaa
#2 three bbb
#1 two aa
#2 two bb
And if you don't want to include A_person, you can tell the suffix parameter to only include certain suffixes.
pd.wide_to_long(df, stubnames='A_', i='id', j='num', suffix='(one|two|three)')
# A_person A_
#id num
#1 one Mike a
#2 one Amy b
#1 three Mike aaa
#2 three Amy bbb
#1 two Mike aa
#2 two Amy bb
Basically, if you can capture it with regex, you can pass it to suffix to use only the columns you want.
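As a rough illustration of that idea, here is a small standard-library sketch of how a stub plus a suffix regex can pick out columns, including the A100 case mentioned earlier. The helper matching_columns is hypothetical and is not part of pandas; it only approximates the selection by anchoring stub + sep + suffix against the whole column name, and pandas' internals may differ in detail.

```python
import re


def matching_columns(columns, stub, suffix=r"\d+", sep=""):
    """Return the columns whose full name is stub + sep + suffix.

    Hypothetical helper for illustration only; not a pandas function.
    The suffix is wrapped in a non-capturing group so that alternations
    such as one|two stay anchored (a choice made for this sketch).
    """
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}(?:{suffix})$")
    return [col for col in columns if pattern.match(col)]


columns = ["id", "A1", "A2", "A3", "A4", "A100", "Aone"]

# Digits-only suffix of any length: A100 is swept in as well.
print(matching_columns(columns, "A"))

# Exactly one digit: A100 and Aone are left out.
print(matching_columns(columns, "A", suffix=r"\d"))

# Letters only: just Aone.
print(matching_columns(columns, "A", suffix=r"[a-z]+"))
```

Under these assumptions, passing something like suffix=r'\d' to wide_to_long would be one way to keep A100 out of the reshaped columns, though it is worth checking the result on your own data.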
Reshape from Long to Wide Format by Multiple Factors
base R:
One way can be:
reshape(cbind(dat1[1:2], stack(dat1, 3:4)), timevar = 'timeperiod',
direction = 'wide', idvar = c('name', 'ind'))
name ind values.Q1 values.Q2 values.Q3 values.Q4
1 firstName height 2 9 1 2
5 secondName height 11 15 16 10
9 firstName weight 1 4 2 8
13 secondName weight 2 9 1 2
If using other packages, consider the recast function from the reshape2 package:
reshape2::recast(dat1, name+variable~timeperiod, id.var = c('name', 'timeperiod'))
name variable Q1 Q2 Q3 Q4
1 firstName height 2 9 1 2
2 firstName weight 1 4 2 8
3 secondName height 11 15 16 10
4 secondName weight 2 9 1 2
R: long to wide transformation using reduce and setting suffixes
Here is a base R method using split and do.call:
# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# perform wide transformation
do.call(data.frame, myList)
This puts the species names at the front. It would not be too hard to move them to the back using gsub.
Here is part of the result:
setosa.Sepal.Length setosa.Sepal.Width setosa.Petal.Length setosa.Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
The other species are additional columns.
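Answer for Update #1
This gets a bit more complicated, though the first line is the same. Note that the renaming step treats the last column of each data frame as the id (it is named "id" and used as the merge key), so the input needs to have such a column: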
# get list of data frame, drop the split factor (Species)
myList <- split(iris[, -which(names(iris) == "Species")], iris$Species)
# add names to data.frames
myList <- lapply(names(myList),
function(i) {
setNames(myList[[i]],
c(paste0(head(names(myList[[i]]), -1), ".", i), "id"))
})
# merge the data.frames together
Reduce(function(x, y) {merge(x, y, by="id", all=TRUE)}, myList)
This results in the naming that you wanted with the Species appended to the end of each variable.
How to Restructure R Data Frame in R
Here is a solution that uses tidyr. Specifically, the gather function is used to combine the two employee columns. This also generates a column based on the column headers (employee1 and employee2), which is called key. We remove that with select from dplyr.
library(tidyr)
library(dplyr)
df <- read.table(
text = "boss employee1 employee2
1 wil james andy
2 james dean bert
3 billy herb collin
4 tony mike david",
header = TRUE,
stringsAsFactors = FALSE
)
df2 <- df %>%
gather(key, employee, -boss) %>%
select(-key)
> df2
boss employee
1 wil james
2 james dean
3 billy herb
4 tony mike
5 wil andy
6 james bert
7 billy collin
8 tony david
I would be shocked if there isn't a slicker, base solution but this should work for you.
reshape from wide to long group of variables
If you've only got two locations, you can just chuck them in regex, accounting for the fact that they could be at the beginning or end of the name:
library(tidyverse)
df_wide %>%
gather(variable, value, -Month) %>%
mutate(location = sub('.*(Cabo|Acapulco).*', '\\1', variable),
variable = sub('_?(Cabo|Acapulco)_?', '', variable)) %>%
spread(variable, value)
#> # A tibble: 24 x 6
#> Month location BED_BUGS BU_PCT LOS_AVG TOTAL_OCCUPIED
#> * <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Acapulco 3 0.6260116 4.307667 6498
#> 2 1 Cabo 5 0.6470034 5.223000 19216
#> 3 2 Acapulco 0 0.6777457 4.247500 6566
#> 4 2 Cabo 3 0.6167027 5.893571 17095
#> 5 3 Acapulco 1 0.6348126 4.327742 6809
#> 6 3 Cabo 5 0.6372108 5.229677 19556
#> 7 4 Acapulco 6 0.6548170 4.220000 6797
#> 8 4 Cabo 4 0.6357912 5.356667 18883
#> 9 5 Acapulco 5 0.6409659 4.162903 6875
#> 10 5 Cabo 2 0.6449006 5.344194 19792
#> # ... with 14 more rows
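The two regex steps can also be tried in isolation. Below is a minimal sketch in Python (chosen to match the pandas examples earlier on this page) that applies the same two patterns to a couple of hypothetical column names; the real names in df_wide are not shown here, so Cabo_BED_BUGS and BU_PCT_Acapulco are assumptions.

```python
import re

LOCATIONS = r"(Cabo|Acapulco)"


def split_location(column):
    """Split a column name into (location, variable).

    Mirrors the two sub() calls above; count=1 matches R's sub(),
    which replaces only the first match.
    """
    location = re.sub(rf".*{LOCATIONS}.*", r"\1", column, count=1)
    variable = re.sub(rf"_?{LOCATIONS}_?", "", column, count=1)
    return location, variable


# Hypothetical column names, with the location at the start or at the end.
for name in ["Cabo_BED_BUGS", "BU_PCT_Acapulco"]:
    print(name, "->", split_location(name))
```

The optional underscores on both sides of the location are what let one pattern handle a prefix and a suffix alike.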