Faster Ways to Calculate Frequencies and Cast from Long to Wide

You don't need ddply for this. The dcast from reshape2 is sufficient:

dat <- data.frame(
  id = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

library(reshape2)
# value.var is named explicitly so dcast() does not have to guess it
dcast(dat, id ~ week, value.var = "week", fun.aggregate = length)

  id 1 2 3
1  1 2 1 1
2  2 0 0 1

Edit: For a base R solution (other than table, as posted by Joshua Ulrich), try xtabs:

xtabs(~id+week, data=dat)

   week
id  1 2 3
  1 2 1 1
  2 0 0 1
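
If the counts are needed as a data.frame rather than a table object, the contingency table can be converted in base R. The following is a minimal sketch using the same dat as above; the week_ column prefix is only an illustrative choice, not part of the original answer.

```r
dat <- data.frame(
  id = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

# table() and xtabs() produce the same counts
tab <- table(id = dat$id, week = dat$week)
xt <- xtabs(~ id + week, data = dat)
same_counts <- all(as.vector(tab) == as.vector(xt))

# Convert the 2-d table to a wide data.frame with one row per id
wide <- as.data.frame.matrix(xt)
names(wide) <- paste0("week_", names(wide))  # hypothetical naming scheme
wide <- cbind(id = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Note that as.data.frame() without the .matrix suffix would return the counts in long format (one row per id/week combination) instead.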

Count from long to wide format

If you need the result as a data.frame, here's an option with data.table (a data.table is also a data.frame):

library(data.table)
setDT(df)

dcast(df, id ~ text, fun.aggregate = length)
#    id arrange stock
# 1:  1       1     2
# 2:  2       2     0

Easy way to convert long to wide format with counts

You can accomplish this with a simple table() statement. You can play with setting factor levels to get your responses the way you want.

sample.data$Decision <- factor(x = sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))

table(Case = sample.data$Case, sample.data$Decision)

Case Referred Approved Declined
   1        3        1        0
   2        1        0        1
   3        2        0        1
   4        0        1        0
   5        0        0        1
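
To illustrate why setting the factor levels matters, here is a small sketch with made-up data (the values below are hypothetical, not the OP's). The levels fix the column order, and a level that never occurs still gets a column of zeros.

```r
# Hypothetical data for illustration only
sample.data <- data.frame(
  Case = c(1, 1, 2, 2, 3),
  Decision = c("Referred", "Approved", "Referred", "Referred", "Approved")
)

# Without explicit levels, columns appear in alphabetical order and
# "Declined" is missing because it never occurs in the data
tab_default <- table(Case = sample.data$Case, Decision = sample.data$Decision)
tab_default

# With explicit levels, the column order is fixed and "Declined"
# appears as a column of zeros
sample.data$Decision <- factor(sample.data$Decision,
                               levels = c("Referred", "Approved", "Declined"))
tab <- table(Case = sample.data$Case, Decision = sample.data$Decision)
tab
```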

long to wide format aggregate R tidyverse

Not really sure how you get the 3 count for GENEa and READSb, but assuming you want the count, you can try the following:


library(tidyverse)

df <- tibble(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#>   READS GENE  COMMENT
#>   <chr> <chr> <chr>
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA

df %>%
  count(READS, GENE) %>%
  pivot_wider(
    names_from = GENE, values_from = n,
    values_fill = list(n = 0)
  )
#> # A tibble: 3 x 4
#>   READS GENEa GENEb GENEc
#>   <chr> <int> <int> <int>
#> 1 READa     3     0     0
#> 2 READb     0     3     0
#> 3 READc     0     0     3

Created on 2019-12-13 by the reprex package (v0.3.0)

Many Hot encoder in R

Using tidyverse:

df %>%
  mutate(week = paste("week", week, sep = "")) %>%
  group_by(id, week) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  spread(key = week, value = n) %>%
  # funs() is deprecated in current dplyr; a formula lambda does the same job
  mutate_all(~ replace(., is.na(.), 0))

# A tibble: 5 x 6
     id week1 week2 week3 week4 week5
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  222.    0.    0.    0.    1.    0.
2  264.    0.    0.    1.    0.    1.
3  277.    0.    1.    0.    0.    0.
4  345.    1.    2.    0.    0.    1.
5  351.    0.    1.    0.    0.    0.

reshape two column data to sparse matrix in r long to wide

You can do something like this:

library(tidyverse)

dat <- tribble(~"ID", ~"Click",
               1, "A",
               1, "B",
               1, "E",
               2, "A",
               2, "Q",
               3, "B",
               3, "D",
               3, "F")

table(dat)
#> ID  A B D E F Q
#>   1 1 1 0 1 0 0
#>   2 1 0 0 0 0 1
#>   3 0 1 1 0 1 0

Created on 2019-02-25 by the reprex package (v0.2.1)

EDIT: To clarify my post: you don't need library(tidyverse), nor do you need to build your data with tribble(). The function you are looking for is table().
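
Since the question asks for a sparse matrix, one possible route is the sparse argument of xtabs(), which returns a sparse matrix from the Matrix package instead of a dense table. This is a minimal sketch, assuming the Matrix package is installed and using the same ID/Click data in a plain data.frame:

```r
library(Matrix)  # sparse matrix classes returned by xtabs(sparse = TRUE)

dat <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 3),
  Click = c("A", "B", "E", "A", "Q", "B", "D", "F")
)

# Same counts as table(dat), but stored as a sparse matrix
m <- xtabs(~ ID + Click, data = dat, sparse = TRUE)
m
dim(m)
```

For a small example like this the dense table is perfectly adequate; the sparse form is mainly of interest when there are many IDs and many distinct Click values, most of which never co-occur.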

Reshape data in R, cast function arguments

The OP asked for help with the arguments to the cast() function of the reshape package. However, the reshape package was superseded by the reshape2 package from the same package author. According to the package description, the reshape2 package is

A Reboot of the Reshape Package

Using reshape2, the desired result can be produced with

reshape2::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID ABL EGFR TP53
#1                  C10   1    1    0
#2                 C939   0    0    1

BTW: The data.table package has implemented (and enhanced) dcast() as well. So, the same result can be produced with

data.table::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length, 
value.var = "TARGET_TYPE")


Additional columns

The OP mentioned other columns in the data frame which should be shown together with the spread or wide data. Unfortunately, the OP hasn't supplied particular sample data, so we have to consider two use cases.

Case 1: Additional columns go along with the id column

The data could look like

wc
# PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1
#1 C10 ABL a
#2 C10 EGFR a
#3 C939 TP53 b

Note that the values in extra_col1 are in line with PARENT_MOL_CHEMBL_ID.

This is an easy case, because the formula in dcast() accepts ... which represents all other variables not used in the formula:

reshape2::dcast(wc, ... ~ TARGET_TYPE, fun.aggregate = length, 
value.var = "TARGET_TYPE")
# PARENT_MOL_CHEMBL_ID extra_col1 ABL EGFR TP53
#1 C10 a 1 1 0
#2 C939 b 0 0 1

The resulting data.frame does contain all other columns.

Case 2: Additional columns don't go along with the id column

Now, another column is added:

wc
# PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1 extra_col2
#1 C10 ABL a 1
#2 C10 EGFR a 2
#3 C939 TP53 b 3

Note that extra_col2 has two different values for C10. This will cause the simple approach to fail, because ... would then treat each distinct value of extra_col2 as part of the row key. So, a two-step approach has to be implemented: reshaping first and joining afterwards with the original data frame. The data.table package is now used for both steps:

library(data.table)
# reshape from long to wide, result has only one row per id column
wide <- dcast(setDT(wc), PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
              value.var = "TARGET_TYPE")
# right join, i.e., all rows of wc are included
wide[wc, on = "PARENT_MOL_CHEMBL_ID"]
#   PARENT_MOL_CHEMBL_ID ABL EGFR TP53 TARGET_TYPE extra_col1 extra_col2
#1:                  C10   1    1    0         ABL          a          1
#2:                  C10   1    1    0        EGFR          a          2
#3:                 C939   0    0    1        TP53          b          3

The result shows the aggregated values in wide format together with any other columns.
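
The same two-step idea (count first, join afterwards) can also be sketched in base R, without data.table. This is only an illustrative alternative, not the approach used in the answer; the data frame below recreates the Case 2 sample shown above.

```r
# Data as in Case 2 above
wc <- data.frame(
  PARENT_MOL_CHEMBL_ID = c("C10", "C10", "C939"),
  TARGET_TYPE = c("ABL", "EGFR", "TP53"),
  extra_col1 = c("a", "a", "b"),
  extra_col2 = 1:3
)

# Step 1: count TARGET_TYPE per id, giving one row per id
counts <- as.data.frame.matrix(table(wc$PARENT_MOL_CHEMBL_ID, wc$TARGET_TYPE))
counts$PARENT_MOL_CHEMBL_ID <- rownames(counts)

# Step 2: join the counts back onto every row of the original data
result <- merge(wc, counts, by = "PARENT_MOL_CHEMBL_ID", all.x = TRUE)
result
```

Unlike the data.table join, merge() puts the original columns first and the count columns last, and it sorts the rows by the key column.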

Manipulation of data frame using Group by or Aggregate in R

A simple way to do this is: table(df)

R aggregating a column values into rows

library(reshape2) # or you could use data.table's dcast function
dcast(df, ID + Zoo ~ Last_date)

# ID Zoo Feb_2018 Jan_2018 Nov_2017 Oct_2017
# 1 ABC-DEF DENVER 0 0 3 2
# 2 HG-IJK MEMPHIS 0 1 0 0
# 3 JK-LMO MEMPHIS 1 0 0 0

This gives a warning about not specifying the value variable or the aggregation function. You can be a little more explicit to avoid it:

dcast(df, ID + Zoo ~ Last_date, value.var = 'Last_date', fun.aggregate = length)

Data used

df <- data.table::fread("
ID Zoo Last_date
ABC-DEF DENVER Oct_2017
ABC-DEF DENVER Oct_2017
ABC-DEF DENVER Nov_2017
ABC-DEF DENVER Nov_2017
ABC-DEF DENVER Nov_2017
HG-IJK MEMPHIS Jan_2018
JK-LMO MEMPHIS Feb_2018
")

How to convert data from rows into a specific columns and count them up in R?

We can use table from base R

table(df1)

If there are many columns, subset the dataset by selecting those specific columns and then apply the table

table(df1[c("PLAYER", "SURFACE")])
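
As a small illustration of why the subsetting matters, here is a sketch with made-up data. Only the column names PLAYER and SURFACE come from the answer above; the values and the extra YEAR column are hypothetical.

```r
# Hypothetical data for illustration only
df1 <- data.frame(
  PLAYER = c("A", "A", "B", "B", "B"),
  SURFACE = c("Clay", "Grass", "Clay", "Clay", "Hard"),
  YEAR = c(2015, 2016, 2015, 2016, 2016)
)

# table(df1) would cross-tabulate all three columns into a 3-d array
full <- table(df1)
length(dim(full))

# Selecting two columns gives the 2-d player-by-surface counts
tab <- table(df1[c("PLAYER", "SURFACE")])
tab
```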

