Reshaping Data with Count

You've mixed up the LHS and RHS of the formula.

Try:

library(reshape2)
dcast(dados_teste, customer ~ cat_one, value.var = "valor")
# Aggregation function missing: defaulting to length
# customer cama mesa
# 1 A 1 2
# 2 B 1 0
# 3 C 1 0
# 4 D 0 1

The "error" that you refer to is not an error at all. It is a message telling you that no aggregation function was supplied, so dcast() falls back to length and simply counts the values in each cell. Since counts are what you want here, the result is perfectly acceptable.

If you want to get rid of it, specify fun.aggregate = length.

dcast(dados_teste, customer ~ cat_one, 
value.var = "valor", fun.aggregate = length)
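Since the question's data is not reproduced here, the following is a minimal sketch with a made-up dados_teste. The column names match the call above, but the values in valor are assumptions chosen only so that the counts agree with the output shown.

```r
library(reshape2)

# Hypothetical stand-in for dados_teste; the original data is not shown
dados_teste <- data.frame(
  cat_one  = c("cama", "mesa", "mesa", "cama", "cama", "mesa"),
  customer = c("A", "A", "A", "B", "C", "D"),
  valor    = c(10, 20, 30, 40, 50, 60),   # arbitrary values, only counted
  stringsAsFactors = FALSE
)

# One row per customer, one column per category, each cell is a count
counts <- dcast(dados_teste, customer ~ cat_one,
                value.var = "valor", fun.aggregate = length)
counts
```

Because length() ignores the actual values, any column without missing rows would work as value.var here.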

If it's just counts of two columns that you're after, you could also look at table:

as.data.frame.matrix(table(dados_teste[c(2, 1)]))
# cama mesa
# A 1 2
# B 1 0
# C 1 0
# D 0 1
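The c(2, 1) above selects columns by position, which silently depends on the column order of the data. A sketch of the same idea selecting by name instead, again with a hypothetical dados_teste since the original data is not shown:

```r
# Hypothetical stand-in for dados_teste; the original data is not shown
dados_teste <- data.frame(
  cat_one  = c("cama", "mesa", "mesa", "cama", "cama", "mesa"),
  customer = c("A", "A", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Rows come from the first selected column, columns from the second
tab  <- table(dados_teste[c("customer", "cat_one")])
wide <- as.data.frame.matrix(tab)
wide
```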

Reshaping data from long to wide with both sums and counts

From data.table v1.9.6, it is possible to cast multiple value.var columns and also cast by providing multiple fun.aggregate functions. See below:

library(data.table)

df <- data.table(df)
dcast(df, id ~ type, fun.aggregate = list(length, sum), value.var = "val")
id val_length_A val_length_B val_length_C val_sum_A val_sum_B val_sum_C
1: 1 2 1 0 1 2 0
2: 2 1 1 1 0 0 4
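The input df is not shown in this answer. A minimal sketch with a hypothetical df, constructed so that the counts and sums agree with the output above, could look like this:

```r
library(data.table)

# Hypothetical input; values are assumptions chosen to match the output shown
df <- data.table(
  id   = c(1L, 1L, 1L, 2L, 2L, 2L),
  type = c("A", "A", "B", "A", "B", "C"),
  val  = c(0, 1, 2, 0, 0, 4)
)

# Each function in the list produces its own set of columns per type
wide <- dcast(df, id ~ type,
              fun.aggregate = list(length, sum), value.var = "val")
wide
```

Combinations that do not occur in the data (here id 1 with type C) come out as 0 for both the count and the sum.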

Reshape Count Data using R

As mentioned by erocoar and thelatemail, there is an ambiguity in the results without additional information. However, the OP has shown the expected result. From this, we can derive an additional rule: "Treat Type and Age independently and fill up in column order".

The solutions below try to reproduce the expected result by a series of reshaping operations. For reshaping, the melt() and dcast() functions from the data.table package are used. There are two variants.

1. Reshape in one go

library(data.table)
# reshape from wide to long
long <- melt(setDT(DT), id.vars = c("Developer", "Total"))
# create new variable var2 that classifies each measure as Type or Age
long[variable %like% "tached", var2 := "Type"][
  variable %in% c("New", "Old"), var2 := "Age"][
  , var2 := forcats::fct_inorder(var2)][]
   Developer Total variable value var2
1: Dev A 2 Attached 1 Type
2: Dev B 3 Attached 2 Type
3: Dev A 2 Detached 1 Type
4: Dev B 3 Detached 1 Type
5: Dev A 2 New 0 Age
6: Dev B 3 New 2 Age
7: Dev A 2 Old 2 Age
8: Dev B 3 Old 1 Age
# repeat row indices as many times as given by value
repeated_rows <- long[rep(1:.N, value)]
repeated_rows
    Developer Total variable value var2
1: Dev A 2 Attached 1 Type
2: Dev B 3 Attached 2 Type
3: Dev B 3 Attached 2 Type
4: Dev A 2 Detached 1 Type
5: Dev B 3 Detached 1 Type
6: Dev B 3 New 2 Age
7: Dev B 3 New 2 Age
8: Dev A 2 Old 2 Age
9: Dev A 2 Old 2 Age
10: Dev B 3 Old 1 Age
# partial reshape from long to wide
result <- dcast(repeated_rows, Developer + rowid(var2, Developer) ~ var2,
                value.var = "variable")[
  # remove helper column
  , var2 := NULL][]
result
   Developer     Type Age
1: Dev A Attached Old
2: Dev A Detached Old
3: Dev B Attached New
4: Dev B Attached New
5: Dev B Detached Old

2. Reshape Type and Age columns separately

library(data.table)
result <- merge(
  melt(setDT(DT), measure.vars = patterns("tached"),
       variable.name = c("Type"))[
    rep(1:.N, value), .(Developer, rn = rowid(Developer), Type)],
  melt(setDT(DT), measure.vars = patterns("New|Old"),
       variable.name = c("Age"))[
    rep(1:.N, value), .(Developer, rn = rowid(Developer), Age)]
)[, rn := NULL][]
result
   Developer     Type Age
1: Dev A Attached Old
2: Dev A Detached Old
3: Dev B Attached New
4: Dev B Attached New
5: Dev B Detached Old

Verify result

merge(
  dcast(result, Developer ~ Type),
  dcast(result, Developer ~ Age),
  by = "Developer")
   Developer Attached Detached New Old
1: Dev A 1 1 0 2
2: Dev B 2 1 2 1

Data

library(data.table)
DT <- fread(
'Developer Total Attached Detached New Old
"Dev A" 2 1 1 0 2
"Dev B" 3 2 1 2 1',
colClasses = list(integer = 2:6))

Transforming Aggregate Count Data into Individual Data in R

Here is another solution using base R:

inds <- rep(seq_len(nrow(df1)), df1$C)
df2 <- df1[inds,]

Change the last line to

df2 <- df1[inds, 1:2]

to drop column C.

This gives different row names than the answer by Nad Pat. Which to use is probably a matter of personal preference.
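To make the row-name point concrete, here is a small sketch with a hypothetical df1. Only the count column C comes from the code above; the columns A and B and all values are assumptions. Repeating row indices leaves suffixed row names on the duplicates, and resetting them is a one-liner.

```r
# Hypothetical input; column C holds the number of repetitions per row
df1 <- data.frame(
  A = c("x", "y", "z"),
  B = c("u", "v", "w"),
  C = c(2L, 0L, 3L),
  stringsAsFactors = FALSE
)

# Repeat each row index C times; rows with C == 0 are dropped
inds <- rep(seq_len(nrow(df1)), df1$C)
df2  <- df1[inds, 1:2]

# Duplicated rows carry suffixed row names; reset to plain 1..n if preferred
rownames(df2) <- NULL
df2
```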

R Long to wide with count and sum

Here's one way using the dplyr and tidyr packages. We first calculate the sum of Amt for each ID, reshape Method and Source into long format, count the rows for each value, and then spread the counts back into wide format.

library(dplyr)  
library(tidyr)

df %>%
  group_by(ID) %>%
  mutate(Amt = sum(Amt)) %>%
  pivot_longer(cols = c(Method, Source)) %>%
  count(ID, value, Amt, name) %>%
  pivot_wider(names_from = c(name, value), values_from = n, values_fill = 0)

# ID Amt Method_A Method_C Source_X Source_Y Source_Z Method_B Method_D Method_E
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 60 2 1 1 1 1 0 0 0
#2 2 25 0 0 0 1 1 1 1 0
#3 3 40 0 1 2 0 0 0 1 0
#4 4 20 0 0 0 0 2 0 0 2
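The input df is not shown in this answer, so here is the same pipeline on a small hypothetical data frame. The column names follow the code above; the values are assumptions for illustration only.

```r
library(dplyr)
library(tidyr)

# Hypothetical input; the original df is not shown
df <- data.frame(
  ID     = c(1L, 1L, 2L),
  Amt    = c(10L, 20L, 5L),
  Method = c("A", "C", "B"),
  Source = c("X", "Y", "X"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(ID) %>%
  mutate(Amt = sum(Amt)) %>%                   # total Amt per ID on every row
  pivot_longer(cols = c(Method, Source)) %>%   # one row per ID/name/value
  count(ID, value, Amt, name) %>%              # occurrences of each value per ID
  pivot_wider(names_from = c(name, value),
              values_from = n, values_fill = 0) %>%
  ungroup()
wide
```

The trailing ungroup() is an addition of this sketch; it only removes the leftover grouping by ID so that later steps operate on the whole table.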

Reshape a count variable

You can use the following code:

df <- structure(list(ID = 1:4, concept = c("a", "b", "c", "d"), count = c(1L, 
2L, 4L, 6L)), class = "data.frame", row.names = c(NA, -4L))

library(reshape2)
dcast(df, ID ~ concept, value.var = "count", fill = 0)
#> ID a b c d
#> 1 1 1 0 0 0
#> 2 2 0 2 0 0
#> 3 3 0 0 4 0
#> 4 4 0 0 0 6

Created on 2022-07-08 by the reprex package (v2.0.1)
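If you would rather not load reshape2, a base R sketch of the same reshape is possible with xtabs(), which sums the left-hand side of the formula within each ID/concept cell. This swaps in a different technique from the dcast() call above; note that the IDs end up as row names rather than as a column.

```r
df <- data.frame(
  ID      = 1:4,
  concept = c("a", "b", "c", "d"),
  count   = c(1L, 2L, 4L, 6L),
  stringsAsFactors = FALSE
)

# Sum of count per ID/concept combination; empty cells become 0
tab  <- xtabs(count ~ ID + concept, data = df)
wide <- as.data.frame.matrix(tab)
wide
```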

Reshaping count-summarised data into long form in R

Just use rep:

summry[rep(rownames(summry), summry$freq), c("value", "cat")]
# value cat
# 1 1 A
# 1.1 1 A
# 1.2 1 A
# 2 2 A
# 3 3 B
# 3.1 3 B

A variation of this approach can be found in expandRows from my "SOfun" package. If you had that loaded, you would be able to simply do:

expandRows(summry, "freq")
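For a self-contained run of the rep() approach without any package, here is a sketch with a hypothetical summry. The column names come from the code above, and the values are assumptions picked to be consistent with the output shown.

```r
# Hypothetical input, consistent with the output shown above
summry <- data.frame(
  value = c(1, 2, 3),
  cat   = c("A", "A", "B"),
  freq  = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Repeat each row name freq times, then index the data frame with the result
long <- summry[rep(rownames(summry), summry$freq), c("value", "cat")]
long

# Sanity check: one output row per counted observation
nrow(long) == sum(summry$freq)
```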

Count from long to wide format

In case you need it as a data.frame, here's an option with data.table:

library(data.table)
setDT(df)

dcast(df, id ~ text, fun.aggregate = length)
# id arrange stock
# 1: 1 1 2
# 2: 2 2 0

Easy way to convert long to wide format with counts

You can accomplish this with a simple table() statement. You can play with setting factor levels to get your responses the way you want.

sample.data$Decision <- factor(x = sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

table(Case = sample.data$Case, sample.data$Decision)

Case Referred Approved Declined
1 3 1 0
2 1 0 1
3 2 0 1
4 0 1 0
5 0 0 1
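As a runnable sketch, here is the same idea on a hypothetical sample.data whose values are assumptions chosen to be consistent with the table above. The factor levels decide the column order, and as.data.frame.matrix() turns the table into a regular wide data frame if you need one.

```r
# Hypothetical input, consistent with the counts shown above
sample.data <- data.frame(
  Case     = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5),
  Decision = c("Referred", "Referred", "Referred", "Approved",
               "Referred", "Declined",
               "Referred", "Referred", "Declined",
               "Approved", "Declined"),
  stringsAsFactors = FALSE
)

# Column order of the table follows the order of the factor levels
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

tab  <- table(Case = sample.data$Case, Decision = sample.data$Decision)
wide <- as.data.frame.matrix(tab)
wide
```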

