Melt Using Patterns When Variable Names Contain String Information - Avoid Coercion to Numeric

From data.table 1.14.1 (in development at the time of writing), the new function measure makes it much easier to melt data with concatenated variable names into a desired format (see ?measure).

The sep argument splits each column name into groups, which define the measure.vars. In the ... argument, we further specify what happens to the values corresponding to the groups generated by sep.

In the OP's data, the variable names are of the form species_number, e.g. dog_one. Thus, we need two symbols in ... to specify how the groups before and after the separator should be treated: one for the species (dog or cat) and one for the numbers (one to three).

If a symbol in ... is set to value.name, then "melt returns multiple value columns (with names defined by the unique values in that group)". Because you want one value column per species, and the species is the first group defined by the separator, the first symbol in ... should be value.name.

The second group, after the separator, holds the numbers, so it is specified as the second symbol in .... We want a single column for the numbers, so here we give the desired name of that output column, e.g. nr.

melt(B, measure.vars = measure(value.name, nr, sep = "_"))

#     idcol    nr dog cat
#  1:     1   one   1 101
#  2:     2   one   2 102
#  3:     3   one   3 103
#  4:     4   one   4 104
#  5:     5   one   5 105
#  6:     1   two   6 106
#  7:     2   two   7 107
#  8:     3   two   8 108
#  9:     4   two   9 109
# 10:     5   two  10 110
# 11:     1 three  11 111
# 12:     2 three  12 112
# 13:     3 three  13 113
# 14:     4 three  14 114
# 15:     5 three  15 115
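For readers who want to run the call above, here is a minimal self-contained sketch. The contents of B are an assumption, reconstructed from the printed output (the original data are not shown here), and the code needs a data.table version that ships measure().

```r
library(data.table)

# Hypothetical B, reconstructed from the printed output above
B <- data.table(
  idcol     = 1:5,
  dog_one   = 1:5,
  dog_two   = 6:10,
  dog_three = 11:15,
  cat_one   = 101:105,
  cat_two   = 106:110,
  cat_three = 111:115
)

# First group (before "_") names the value columns,
# second group (after "_") goes into a single column called nr
long <- melt(B, measure.vars = measure(value.name, nr, sep = "_"))

# nr holds the suffixes as text; no numeric coercion of "one", "two", "three"
str(long)
```

idcol contains no "_", so it yields fewer groups than the other columns and is kept as an id column.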

Pre data.table 1.14.1

There might be easier ways, but this seems to work:

# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])

# melt
B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))
   
# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)

B2
#     idcol variable dog cat
#  1:     1      one   1 101
#  2:     2      one   2 102
#  3:     3      one   3 103
#  4:     4      one   4 104
#  5:     5      one   5 105
#  6:     1      two   6 106
#  7:     2      two   7 107
#  8:     3      two   8 108
#  9:     4      two   9 109
# 10:     5      two  10 110
# 11:     1    three  11 111
# 12:     2    three  12 112
# 13:     3    three  13 113
# 14:     4    three  14 114
# 15:     5    three  15 115
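Relabelling by position like this only gives correct labels if the dog and cat columns appear in the same suffix order. The sketch below makes that assumption explicit with a check before relabelling; B is again a hypothetical reconstruction from the printed output, and suff_dog and suff_cat are names introduced only for illustration.

```r
library(data.table)

# Hypothetical B, reconstructed from the printed output above
B <- data.table(idcol = 1:5,
                dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
                cat_one = 101:105, cat_two = 106:110, cat_three = 111:115)

B2 <- melt(B, measure.vars = patterns("^dog", "^cat"),
           value.name = c("dog", "cat"))

# Before relabelling, 'variable' is a factor of position indices
lev_before <- levels(B2$variable)

# Suffixes in column order, per group
suff_dog <- sub("^dog_", "", grep("^dog", names(B), value = TRUE))
suff_cat <- sub("^cat_", "", grep("^cat", names(B), value = TRUE))

# Positional relabelling is only safe if both groups agree on the order
stopifnot(identical(suff_dog, suff_cat))

# Map the position codes to suffixes; the column becomes character
B2[, variable := suff_dog[as.integer(variable)]]
B2
```

Unlike setattr on the levels, this replaces the factor with a character column; either form carries the same labels.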

Two related data.table issues:

melt.data.table should offer variable to match on the name, rather than the number

FR: expansion of melt functionality for handling names of output.

This is one of the (rare) instances where I believe good ol' base::reshape is cleaner. Its sep argument comes in handy here: both the names of the value columns and the entries of the time ('variable') column are generated in one go:

reshape(data = B,
        varying = names(B[ , -1]),
        sep = "_",
        direction = "long")
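A self-contained version of the reshape call, as a sketch: B is a hypothetical plain data.frame reconstructed from the printed output earlier, so the column names and values are assumptions.

```r
# Hypothetical B as a plain data.frame
B <- data.frame(idcol = 1:5,
                dog_one = 1:5, dog_two = 6:10, dog_three = 11:15,
                cat_one = 101:105, cat_two = 106:110, cat_three = 111:115)

long <- reshape(data = B,
                varying = names(B)[-1],
                sep = "_",
                direction = "long")

# 'time' holds the suffixes (one, two, three); because no idvar was given,
# reshape also adds an 'id' column
head(long)
```

The suffixes cannot be converted to numbers, so reshape keeps them as text in the time column.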

Using the melt function

You can use melt like this:

library(data.table)
melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
                value.name=c("SaleDate", "Result"))


#     Name variable   SaleDate Result
# 1:  Fred        1  3/01/2019    352
# 2: Peter        1 10/08/2018    209
# 3:  Fred        2  5/12/2018    953
# 4: Peter        2 20/06/2018    987
# 5:  Fred        3  2/10/2018    965
# 6: Peter        3 21/02/2018    618
# 7:  Fred        4 29/08/2018    125
# 8: Peter        4 16/07/2018    902
# 9:  Fred        5 26/04/2018    264
#10: Peter        5  5/07/2018     71

To get the correct labels in the variable column, using the same suffix-and-setattr approach as above, we can do

suff <- unique(sub('\\..*', '', names(df)[-1]))

B2 <- melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
                      value.name=c("SaleDate", "Result"))
setattr(B2$variable, "levels", suff)

B2
#     Name variable   SaleDate Result
# 1:  Fred    first  3/01/2019    352
# 2: Peter    first 10/08/2018    209
# 3:  Fred   second  5/12/2018    953
# 4: Peter   second 20/06/2018    987
# 5:  Fred    third  2/10/2018    965
# 6: Peter    third 21/02/2018    618
# 7:  Fred   fourth 29/08/2018    125
# 8: Peter   fourth 16/07/2018    902
# 9:  Fred    fifth 26/04/2018    264
#10: Peter    fifth  5/07/2018     71

Or the tidyverse way would be

library(tidyverse)
df %>%
  gather(key, value, -Name) %>%
  group_by(key = sub(".*\\.", "", key)) %>%
  mutate(row = row_number()) %>%
  spread(key, value) %>%
  select(-row)
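In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A possible one-step alternative uses the special ".value" entry in names_to. This is a sketch, not the original answer's method: df below is a hypothetical two-period subset reconstructed from the printed output, and the column names (first.sale, first.result, ...) are an assumption inferred from the regular expressions used above.

```r
library(tidyr)

# Hypothetical subset of df; column names are assumed
df <- data.frame(
  Name          = c("Fred", "Peter"),
  first.sale    = c("3/01/2019", "10/08/2018"),
  first.result  = c(352, 209),
  second.sale   = c("5/12/2018", "20/06/2018"),
  second.result = c(953, 987)
)

# The part before "." becomes the 'variable' column;
# the part after "." names the value columns (sale, result)
long <- pivot_longer(
  df,
  cols      = -Name,
  names_to  = c("variable", ".value"),
  names_sep = "\\."
)
long
```

Because ".value" creates one output column per suffix, the character sale dates and numeric results are never forced into a common type.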

variable values in data.table::melt with patterns

With data.table 1.14.1 (dev version as of 2021-05-18) it is possible to solve it using the newly incorporated measure function. Like this:

melt(df1, measure.vars= measure(value.name, date, pattern="(actual|pred)_(.*)"))

   Material_code   date actual pred
1:           111 202009     30   25
2:           112 202009     19   23
3:           111 202010     44   52
4:           112 202010     70   68
5:           111 202011     24   27
6:           112 202011     93  100

Check ?measure as well as the release news for more information.
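If the date part should end up as a number rather than text, measure also accepts name = conversion_function pairs. The sketch below assumes that df1 looks like the reconstruction shown (built from the printed output; the real data may differ) and uses as.integer purely as an illustration.

```r
library(data.table)

# Hypothetical df1, reconstructed from the printed output above
df1 <- data.table(
  Material_code = c(111, 112),
  actual_202009 = c(30, 19), pred_202009 = c(25, 23),
  actual_202010 = c(44, 70), pred_202010 = c(52, 68),
  actual_202011 = c(24, 93), pred_202011 = c(27, 100)
)

# First capture group names the value columns; the second is converted
# to integer and stored in 'date'
long <- melt(df1,
             measure.vars = measure(value.name, date = as.integer,
                                    pattern = "(actual|pred)_(.*)"))
long
```

Dropping the conversion (writing just date) leaves the captured text unconverted.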

data.table: split columns, then wide- to long-format

We can use melt with measure = patterns(...) in data.table, then map the resulting position index back to the receiver names:

library(data.table)
nm1 <- unique(sub(":.*", "", names(dt)[-(1:2)]))
melt(dt, measure = patterns("max", "min"),
    value.name = c("max", "min"), variable.name = "receiver")[, 
     receiver := nm1[receiver]][]

Output:

         date  location   receiver max min
1: 2021-01-01  Westpark receiver_a  20  10
2: 2021-01-02 Northpark receiver_a  30  15
3: 2021-01-03   Estpark receiver_a  25  20
4: 2021-01-04 Southpark receiver_a  15   5
5: 2021-01-01  Westpark receiver_b  15  15
6: 2021-01-02 Northpark receiver_b  45  45
7: 2021-01-03   Estpark receiver_b  10  10
8: 2021-01-04 Southpark receiver_b  50  50
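With a data.table version that provides measure(), the receiver names can likely be picked up directly from the column names instead of being re-attached afterwards. This sketch assumes the measure columns are named like receiver_a:max (inferred from the sub(":.*", ...) call above); dt is a hypothetical reconstruction from the printed output.

```r
library(data.table)

# Hypothetical dt; the "receiver:stat" column naming is an assumption
dt <- data.table(
  date     = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"),
  location = c("Westpark", "Northpark", "Estpark", "Southpark"),
  `receiver_a:max` = c(20, 30, 25, 15),
  `receiver_a:min` = c(10, 15, 20, 5),
  `receiver_b:max` = c(15, 45, 10, 50),
  `receiver_b:min` = c(15, 45, 10, 50)
)

# Part before ":" goes to 'receiver'; part after ":" names the value columns
long <- melt(dt, measure.vars = measure(receiver, value.name, sep = ":"))
long
```

Here the receiver column is read from the names themselves, so the result does not depend on the columns being in matching order.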

Melt and cast data table using pattern

We can do this with splitstackshape. It creates the '.time_1' column automatically:

library(splitstackshape)
merged.stack(dt, var.stubs=c("a", "b"), sep="_")
#   id .time_1 a  b
#1:  1       3 1  7
#2:  1       4 4 10
#3:  2       3 2  8
#4:  2       4 5 11
#5:  3       3 3  9
#6:  3       4 6 12
