Melt Using Patterns When Variable Names Contain String Information - Avoid Coercion to Numeric

From data.table 1.14.1 (in development at the time of writing), the new function measure makes it much easier to melt data with concatenated variable names into the desired format (see ?measure).

The sep argument is used to split the column names into groups of measure.vars. In the ... argument, we then specify what happens to the values corresponding to the groups generated by sep.

In the OP's data, the variable names are of the form species_number, e.g. dog_one. Thus, we need two symbols in ... to specify how the groups before and after the separator should be treated: one for the species (dog or cat) and one for the numbers (one to three).

If a symbol in ... is set to value.name, then "melt returns multiple value columns (with names defined by the unique values in that group)". Thus, because you want multiple columns for each species, the first group defined by the separator, the first symbol in ... should be value.name.

The second group, after the separator, holds the numbers, so it is specified as the second symbol in .... We want a single column for the numbers, so here we specify the desired name of that output column, e.g. nr.

melt(B, measure.vars = measure(value.name, nr, sep = "_"))

#     idcol nr dog cat
# 1: 1 one 1 101
# 2: 2 one 2 102
# 3: 3 one 3 103
# 4: 4 one 4 104
# 5: 5 one 5 105
# 6: 1 two 6 106
# 7: 2 two 7 107
# 8: 3 two 8 108
# 9: 4 two 9 109
# 10: 5 two 10 110
# 11: 1 three 11 111
# 12: 2 three 12 112
# 13: 3 three 13 113
# 14: 4 three 14 114
# 15: 5 three 15 115
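
The answer does not show how B was built, so here is a minimal reproducible sketch. The input table is an assumption, reconstructed from the printed result above, and the call needs a data.table version that ships measure.

```r
library(data.table)

# Assumed input: reconstructed from the printed result, not taken from the question
B <- data.table(
  idcol = 1:5,
  dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

# The part before "_" names the value columns (dog, cat);
# the part after "_" is collected in a single column called nr
long <- melt(B, measure.vars = measure(value.name, nr, sep = "_"))

# nr keeps the suffixes as strings; nothing is coerced to numeric
str(long$nr)
```

idcol contains no separator, so it produces fewer groups than the other columns and is treated as an id column rather than a measure variable.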

Pre data.table 1.14.1

There might be easier ways, but this seems to work:

# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])

# melt
B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))

# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)

B2
# idcol variable dog cat
# 1: 1 one 1 101
# 2: 2 one 2 102
# 3: 3 one 3 103
# 4: 4 one 4 104
# 5: 5 one 5 105
# 6: 1 two 6 106
# 7: 2 two 7 107
# 8: 3 two 8 108
# 9: 4 two 9 109
# 10: 5 two 10 110
# 11: 1 three 11 111
# 12: 2 three 12 112
# 13: 3 three 13 113
# 14: 4 three 14 114
# 15: 5 three 15 115
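
Relabelling the factor levels only gives correct labels if the dog and cat columns appear in the same suffix order, because patterns numbers the matched columns by position within each pattern. A minimal sketch of that check, again on an assumed B reconstructed from the output:

```r
library(data.table)

# Assumed input with the same shape as B in the answer
B <- data.table(
  idcol = 1:5,
  dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

B2 <- melt(B, measure.vars = patterns("^dog", "^cat"),
           value.name = c("dog", "cat"))

# Suffixes per species, in column order
dog_suff <- sub("^dog_", "", grep("^dog", names(B), value = TRUE))
cat_suff <- sub("^cat_", "", grep("^cat", names(B), value = TRUE))

# Position i of 'variable' pairs the i-th dog column with the i-th cat column,
# so the relabelling is only safe when both suffix vectors agree
stopifnot(identical(dog_suff, cat_suff))

# Replace the positional levels by reference with the suffixes
setattr(B2$variable, "levels", dog_suff)
```

If the check fails, reordering the columns with setcolorder before melting would be one way to align them.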

Two related data.table issues:

melt.data.table should offer variable to match on the name, rather than the number

FR: expansion of melt functionality for handling names of output.


This is one of the (rare) instances where I believe good ol' base::reshape is cleaner. Its sep argument comes in handy here: both the names of the 'value' columns and the values of the 'variable' column are generated in one go:

reshape(data = B,
varying = names(B[ , -1]),
sep = "_",
direction = "long")
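
As a self-contained sketch of the same call, here it is on a plain data.frame. The data and the name B_df are assumptions, mirroring the shape of B. With direction = "long", reshape splits each name in varying at sep, uses the stubs as value columns and puts the suffixes into a column that is called time by default.

```r
# Assumed input: a plain data.frame with the same shape as B
B_df <- data.frame(
  idcol = 1:5,
  dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

long <- reshape(
  data = B_df,
  varying = names(B_df)[-1],  # every column except idcol
  sep = "_",                  # split "dog_one" into stub "dog" and suffix "one"
  direction = "long"
)

# The suffixes cannot be parsed as numbers, so they stay as strings
unique(long$time)
```

reshape also adds an id column and composite row names; those can be dropped afterwards if they are not wanted.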

Using the melt function

You can use melt like this:

library(data.table)
melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
value.name=c("SaleDate", "Result"))


# Name variable SaleDate Result
# 1: Fred 1 3/01/2019 352
# 2: Peter 1 10/08/2018 209
# 3: Fred 2 5/12/2018 953
# 4: Peter 2 20/06/2018 987
# 5: Fred 3 2/10/2018 965
# 6: Peter 3 21/02/2018 618
# 7: Fred 4 29/08/2018 125
# 8: Peter 4 16/07/2018 902
# 9: Fred 5 26/04/2018 264
#10: Peter 5 5/07/2018 71

To get the correct variable names, following the approach in the answer above, we can do

suff <- unique(sub('\\..*', '', names(df)[-1]))

B2 <- melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
value.name=c("SaleDate", "Result"))
setattr(B2$variable, "levels", suff)

B2
# Name variable SaleDate Result
# 1: Fred first 3/01/2019 352
# 2: Peter first 10/08/2018 209
# 3: Fred second 5/12/2018 953
# 4: Peter second 20/06/2018 987
# 5: Fred third 2/10/2018 965
# 6: Peter third 21/02/2018 618
# 7: Fred fourth 29/08/2018 125
# 8: Peter fourth 16/07/2018 902
# 9: Fred fifth 26/04/2018 264
#10: Peter fifth 5/07/2018 71

Or the tidyverse way would be

library(tidyverse)
df %>%
gather(key, value, -Name) %>%
group_by(key = sub(".*\\.", "", key)) %>%
mutate(row = row_number()) %>%
spread(key, value) %>%
select(-row)

variable values in data.table::melt with patterns

With data.table 1.14.1 (dev version as of 2021-05-18) it is possible to solve it using the newly incorporated measure function. Like this:

melt(df1, measure.vars= measure(value.name, date, pattern="(actual|pred)_(.*)"))

Material_code date actual pred
1: 111 202009 30 25
2: 112 202009 19 23
3: 111 202010 44 52
4: 112 202010 70 68
5: 111 202011 24 27
6: 112 202011 93 100
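
A reproducible sketch of this call follows. df1 is an assumption reconstructed from the printed result, and the call needs a data.table version that ships measure. By default the captured date group is returned as a character column; measure also accepts name = function pairs, so date = as.integer converts it when a numeric date is actually wanted.

```r
library(data.table)

# Assumed input: reconstructed from the printed result above
df1 <- data.table(
  Material_code = c(111, 112),
  actual_202009 = c(30, 19), actual_202010 = c(44, 70), actual_202011 = c(24, 93),
  pred_202009 = c(25, 23), pred_202010 = c(52, 68), pred_202011 = c(27, 100)
)

# Default: the second capture group is kept as character
res_chr <- melt(df1, measure.vars = measure(
  value.name, date, pattern = "(actual|pred)_(.*)"))

# Optional: convert the captured group while melting
res_int <- melt(df1, measure.vars = measure(
  value.name, date = as.integer, pattern = "(actual|pred)_(.*)"))
```

Columns that do not match the pattern, here Material_code, are kept as id columns.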

Check ?measure as well as the release news for more information.

data.table: split columns, then wide- to long-format

We can use the measure argument of melt with patterns in data.table, then map the resulting index back to the receiver names

library(data.table)
nm1 <- unique(sub(":.*", "", names(dt)[-(1:2)]))
melt(dt, measure = patterns("max", "min"),
value.name = c("max", "min"), variable.name = "receiver")[,
receiver := nm1[receiver]][]

-output

         date  location   receiver max min
1: 2021-01-01 Westpark receiver_a 20 10
2: 2021-01-02 Northpark receiver_a 30 15
3: 2021-01-03 Estpark receiver_a 25 20
4: 2021-01-04 Southpark receiver_a 15 5
5: 2021-01-01 Westpark receiver_b 15 15
6: 2021-01-02 Northpark receiver_b 45 45
7: 2021-01-03 Estpark receiver_b 10 10
8: 2021-01-04 Southpark receiver_b 50 50

Melt and cast data table using pattern

We can do this with splitstackshape. It gives the '.time_1' column automatically

library(splitstackshape)
merged.stack(dt, var.stubs=c("a", "b"), sep="_")
# id .time_1 a b
#1: 1 3 1 7
#2: 1 4 4 10
#3: 2 3 2 8
#4: 2 4 5 11
#5: 3 3 3 9
#6: 3 4 6 12

