Melt using patterns when variable names contain string information - avoid coercion to numeric
From data.table 1.14.1 (in development; installation), the new function measure makes it much easier to melt data with concatenated variable names to a desired format (see ?measure).
The sep (separator) argument is used to create different groups of measure.vars. In the ... argument, we further specify what happens to the values corresponding to the groups generated by sep.
In the OP, the variable names are of the form species_number, e.g. dog_one. Thus, we need two symbols in ... to specify how the groups before and after the separator should be treated: one for the species (dog or cat) and one for the numbers (one to three).
If a symbol in ... is set to value.name, then "melt returns multiple value columns (with names defined by the unique values in that group)". Thus, because you want one value column per species, and the species is the first group defined by the separator, the first symbol in ... should be value.name.
The second group, after the separator, holds the numbers, so it is specified by the second symbol in the ... argument. We want the numbers in a single column, so here we specify the desired name of that output column, e.g. "nr".
melt(B, measure.vars = measure(value.name, nr, sep = "_"))
# idcol nr dog cat
# 1: 1 one 1 101
# 2: 2 one 2 102
# 3: 3 one 3 103
# 4: 4 one 4 104
# 5: 5 one 5 105
# 6: 1 two 6 106
# 7: 2 two 7 107
# 8: 3 two 8 108
# 9: 4 two 9 109
# 10: 5 two 10 110
# 11: 1 three 11 111
# 12: 2 three 12 112
# 13: 3 three 13 113
# 14: 4 three 14 114
# 15: 5 three 15 115
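For a reproducible version of the call above, here is a minimal sketch. The table B is not shown in this answer, so the one below is reconstructed from the printed output and is an assumption. It also assumes an installed data.table version that exports measure().

```r
library(data.table)

# Hypothetical input, reconstructed from the output shown above
B <- data.table(
  idcol   = 1:5,
  dog_one = 1:5,     dog_two = 6:10,    dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

# Column names are split on "_":
#   first piece  -> names of the value columns (dog, cat)
#   second piece -> stored in the output column 'nr'
# idcol contains no "_", so it is left alone and used as an id column.
long <- melt(B, measure.vars = measure(value.name, nr, sep = "_"))
long
```

Note that nr comes back as a character column holding "one", "two" and "three", so the suffixes are neither turned into factor codes nor coerced to numbers.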
Pre data.table 1.14.1
There might be easier ways, but this seems to work:
# grab suffixes of 'variable' names
suff <- unique(sub('^.*_', '', names(B[ , -1])))
# suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])
# melt
B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))
# replace factor levels in 'variable' with the suffixes
setattr(B2$variable, "levels", suff)
B2
# idcol variable dog cat
# 1: 1 one 1 101
# 2: 2 one 2 102
# 3: 3 one 3 103
# 4: 4 one 4 104
# 5: 5 one 5 105
# 6: 1 two 6 106
# 7: 2 two 7 107
# 8: 3 two 8 108
# 9: 4 two 9 109
# 10: 5 two 10 110
# 11: 1 three 11 111
# 12: 2 three 12 112
# 13: 3 three 13 113
# 14: 4 three 14 114
# 15: 5 three 15 115
Two related data.table issues:
melt.data.table should offer variable to match on the name, rather than the number
FR: expansion of melt functionality for handling names of output.
This is one of the (rare) instances where I believe good ol' base::reshape is cleaner. Its sep argument comes in handy here: both the names of the 'value' columns and the levels of the 'variable' column are generated in one go:
reshape(data = B,
varying = names(B[ , -1]),
sep = "_",
direction = "long")
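A self-contained sketch of the same idea, for anyone who wants to run it. The input is an assumption (a plain data.frame rebuilt from the output printed earlier), and idvar and timevar are added here only to reuse the existing id column and to name the suffix column; they are not part of the original call.

```r
# Hypothetical input, reconstructed from the output shown earlier
B <- data.frame(
  idcol   = 1:5,
  dog_one = 1:5,     dog_two = 6:10,    dog_three = 11:15,
  cat_one = 101:105, cat_two = 106:110, cat_three = 111:115
)

res <- reshape(
  data      = B,
  varying   = names(B)[-1],  # every column except idcol
  sep       = "_",           # dog_one -> value column "dog", time "one"
  idvar     = "idcol",       # reuse the existing id column
  timevar   = "nr",          # name for the column holding one/two/three
  direction = "long"
)
res
```

Because the suffixes are not numeric, reshape keeps them as character values in the nr column.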
Using the melt function
You can use melt like this:
library(data.table)
melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
value.name=c("SaleDate", "Result"))
# Name variable SaleDate Result
# 1: Fred 1 3/01/2019 352
# 2: Peter 1 10/08/2018 209
# 3: Fred 2 5/12/2018 953
# 4: Peter 2 20/06/2018 987
# 5: Fred 3 2/10/2018 965
# 6: Peter 3 21/02/2018 618
# 7: Fred 4 29/08/2018 125
# 8: Peter 4 16/07/2018 902
# 9: Fred 5 26/04/2018 264
#10: Peter 5 5/07/2018 71
To get the variable names correct based on this answer we can do
suff <- unique(sub('\\..*', '', names(df)[-1]))
B2 <- melt(setDT(df), id="Name", measure=patterns("sale$", "result$"),
value.name=c("SaleDate", "Result"))
setattr(B2$variable, "levels", suff)
B2
# Name variable SaleDate Result
# 1: Fred first 3/01/2019 352
# 2: Peter first 10/08/2018 209
# 3: Fred second 5/12/2018 953
# 4: Peter second 20/06/2018 987
# 5: Fred third 2/10/2018 965
# 6: Peter third 21/02/2018 618
# 7: Fred fourth 29/08/2018 125
# 8: Peter fourth 16/07/2018 902
# 9: Fred fifth 26/04/2018 264
#10: Peter fifth 5/07/2018 71
Or the tidyverse way would be
library(tidyverse)
df %>%
gather(key, value, -Name) %>%
group_by(key = sub(".*\\.", "", key)) %>%
mutate(row = row_number()) %>%
spread(key, value) %>%
select(-row)
variable values in data.table::melt with patterns
With data.table 1.14.1 (dev version as of 2021-05-18) it is possible to solve this using the newly incorporated measure function, like this:
melt(df1, measure.vars= measure(value.name, date, pattern="(actual|pred)_(.*)"))
Material_code date actual pred
1: 111 202009 30 25
2: 112 202009 19 23
3: 111 202010 44 52
4: 112 202010 70 68
5: 111 202011 24 27
6: 112 202011 93 100
Check ?measure as well as the release news for more information.
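A runnable sketch of the pattern-based call. The input df1 is not shown in this answer, so its column names and values are inferred from the printed output and should be read as assumptions. The second call illustrates the documented option of naming a group and supplying a conversion function.

```r
library(data.table)

# Hypothetical input: names and values inferred from the output above
df1 <- data.table(
  Material_code = c(111, 112),
  actual_202009 = c(30, 19), pred_202009 = c(25, 23),
  actual_202010 = c(44, 70), pred_202010 = c(52, 68),
  actual_202011 = c(24, 93), pred_202011 = c(27, 100)
)

# First capture group -> value columns (actual, pred);
# second capture group -> 'date', kept as character by default
long_chr <- melt(
  df1,
  measure.vars = measure(value.name, date, pattern = "(actual|pred)_(.*)")
)

# Naming the group and giving a function converts it, here to integer
long_int <- melt(
  df1,
  measure.vars = measure(value.name, date = as.integer,
                         pattern = "(actual|pred)_(.*)")
)
long_chr
```

Leaving date as a plain symbol is the safer default when the suffix is a label rather than a number.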
data.table: split columns, then wide- to long-format
We can use the measure argument (short for measure.vars) with patterns in data.table
library(data.table)
nm1 <- unique(sub(":.*", "", names(dt)[-(1:2)]))
melt(dt, measure = patterns("max", "min"),
value.name = c("max", "min"), variable.name = "receiver")[,
receiver := nm1[receiver]][]
-output
date location receiver max min
1: 2021-01-01 Westpark receiver_a 20 10
2: 2021-01-02 Northpark receiver_a 30 15
3: 2021-01-03 Estpark receiver_a 25 20
4: 2021-01-04 Southpark receiver_a 15 5
5: 2021-01-01 Westpark receiver_b 15 15
6: 2021-01-02 Northpark receiver_b 45 45
7: 2021-01-03 Estpark receiver_b 10 10
8: 2021-01-04 Southpark receiver_b 50 50
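As an alternative sketch, the measure() function discussed earlier on this page can produce the receiver column directly, without the lookup through nm1. This assumes a data.table version that exports measure(), and the input below (including column names such as "receiver_a:max") is inferred from the code and output above rather than taken from the question.

```r
library(data.table)

# Hypothetical input, reconstructed from the output above
dt <- data.table(
  date     = as.Date("2021-01-01") + 0:3,
  location = c("Westpark", "Northpark", "Estpark", "Southpark"),
  a_max = c(20, 30, 25, 15), a_min = c(10, 15, 20, 5),
  b_max = c(15, 45, 10, 50), b_min = c(15, 45, 10, 50)
)
# Assumed original names: receiver and statistic joined by ":"
setnames(dt, 3:6, c("receiver_a:max", "receiver_a:min",
                    "receiver_b:max", "receiver_b:min"))

# Piece before ":" -> 'receiver' column; piece after ":" -> value columns.
# date and location contain no ":", so they stay as id columns.
out <- melt(dt, measure.vars = measure(receiver, value.name, sep = ":"))
out
```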
Melt and cast data table using pattern
We can do this with splitstackshape. It gives the '.time_1' column automatically.
library(splitstackshape)
merged.stack(dt, var.stubs=c("a", "b"), sep="_")
# id .time_1 a b
#1: 1 3 1 7
#2: 1 4 4 10
#3: 2 3 2 8
#4: 2 4 5 11
#5: 3 3 3 9
#6: 3 4 6 12