# Faster Ways to Calculate Frequencies and Cast from Long to Wide

## Faster ways to calculate frequencies and cast from long to wide

You don't need `ddply` for this. The `dcast` from `reshape2` is sufficient:

``
dat <- data.frame(
    id = c(rep(1, 4), 2),
    week = c(1:3, 1, 3)
)
library(reshape2)
dcast(dat, id~week, fun.aggregate=length)
#   id 1 2 3
# 1  1 2 1 1
# 2  2 0 0 1
``

Edit: For a base R solution (other than `table`, as posted by Joshua Ulrich), try `xtabs`:

``
xtabs(~id+week, data=dat)
#    week
# id  1 2 3
#   1 2 1 1
#   2 0 0 1
``
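As a quick check that the two base R routes agree, here is a minimal sketch using the same `dat` as above. The object names `tab`, `xt`, and `same_counts` are only illustrative.

```r
# Same example data as above
dat <- data.frame(
  id = c(rep(1, 4), 2),
  week = c(1:3, 1, 3)
)

# Two base R ways to count id/week combinations
tab <- table(id = dat$id, week = dat$week)
xt  <- xtabs(~ id + week, data = dat)

# unclass() drops the class attributes so that only the
# underlying count matrices are compared cell by cell
same_counts <- all(unclass(tab) == unclass(xt))

print(tab)
print(same_counts)
```

`xtabs()` takes a formula and a data frame, which tends to read better when the columns already live in one object. `table()` takes the vectors directly.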

## Count from long to wide format

In case you need the result as a `data.frame`, here's an option with `data.table`:

``
library(data.table)
setDT(df)
dcast(df, id ~ text, fun.aggregate = length)
#    id arrange stock
# 1:  1       1     2
# 2:  2       2     0
``
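The input `df` is not shown in the answer, so the sketch below uses hypothetical data chosen to reproduce the counts printed above. It uses base R rather than `data.table`, purely to show that the contingency table can be turned into an ordinary wide data frame.

```r
# Hypothetical input, chosen to match the counts printed above
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  text = c("arrange", "stock", "stock", "arrange", "arrange")
)

# Count, then convert the contingency table into a wide data.frame
wide <- as.data.frame.matrix(table(df$id, df$text))

# Move the row names (the ids, as character strings) into a regular column
wide$id <- rownames(wide)
rownames(wide) <- NULL
wide <- wide[c("id", "arrange", "stock")]

print(wide)
```

Calling `as.data.frame()` on a table would return the long form (one row per combination) instead, which is why `as.data.frame.matrix()` is used here.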

## Easy way to convert long to wide format with counts

You can accomplish this with a simple `table()` call. Setting the factor levels explicitly lets you control the order in which the responses appear as columns.

``
sample.data$Decision <- factor(x = sample.data$Decision,
                               levels = c("Referred","Approved","Declined"))
table(Case = sample.data$Case,sample.data$Decision)
# Case Referred Approved Declined
#    1        3        1        0
#    2        1        0        1
#    3        2        0        1
#    4        0        1        0
#    5        0        0        1
``
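To illustrate what the factor levels change, here is a small sketch with a made-up `decision` vector (not the original `sample.data`). It compares the default behaviour of `table()` with the result after fixing the levels.

```r
# Hypothetical responses; "Declined" never occurs in this sample
decision <- c("Approved", "Referred", "Referred")

# Without explicit levels, table() sorts the values and omits unseen ones
default_tab <- table(decision)

# With explicit levels, the order is fixed and unseen levels are counted as 0
decision_f  <- factor(decision, levels = c("Referred", "Approved", "Declined"))
ordered_tab <- table(decision_f)

print(names(default_tab))
print(ordered_tab)
```

Fixing the levels is also what keeps a column for a response that happens not to occur for any case.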

## long to wide format aggregate R tidyverse

Not really sure how you get the 3 count for `GENEa` and `READSb`, but assuming you want the count, you can try the following:

``
library(tidyverse)
df <- tibble(
  READS = rep(c("READa", "READb", "READc"), each = 3),
  GENE = rep(c("GENEa", "GENEb", "GENEc"), each = 3),
  COMMENT = rep(c("CommentA", "CommentA", "CommentA"), each = 3)
)
df
#> # A tibble: 9 x 3
#>   READS GENE  COMMENT
#>   <chr> <chr> <chr>
#> 1 READa GENEa CommentA
#> 2 READa GENEa CommentA
#> 3 READa GENEa CommentA
#> 4 READb GENEb CommentA
#> 5 READb GENEb CommentA
#> 6 READb GENEb CommentA
#> 7 READc GENEc CommentA
#> 8 READc GENEc CommentA
#> 9 READc GENEc CommentA
df %>%
  count(READS, GENE) %>%
  pivot_wider(
    names_from = GENE, values_from = n,
    values_fill = list(n = 0)
  )
#> # A tibble: 3 x 4
#>   READS GENEa GENEb GENEc
#>   <chr> <int> <int> <int>
#> 1 READa     3     0     0
#> 2 READb     0     3     0
#> 3 READc     0     0     3
``

Created on 2019-12-13 by the reprex package (v0.3.0)

## Many Hot encoder in R

Using `tidyverse`:

``
df %>%
  mutate(week = paste("week", week, sep = "")) %>%
  group_by(id, week) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  spread(key = week, value = n) %>%
  mutate_all(funs(replace(., is.na(.), 0)))
# A tibble: 5 x 6
#      id week1 week2 week3 week4 week5
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1  222.    0.    0.    0.    1.    0.
# 2  264.    0.    0.    1.    0.    1.
# 3  277.    0.    1.    0.    0.    0.
# 4  345.    1.    2.    0.    0.    1.
# 5  351.    0.    1.    0.    0.    0.
``
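In current versions of the tidyverse, `spread()` is superseded by `pivot_wider()` and `funs()` is deprecated. A possible equivalent is sketched below. This is not the original answer's code: the input `df` is hypothetical (the answer does not show it), and the sketch assumes tidyr 1.1.0 or later for `names_sort` and a scalar `values_fill`.

```r
library(dplyr)
library(tidyr)

# Hypothetical long data: one row per (id, week) event
df <- data.frame(
  id   = c(222, 264, 264, 345, 345, 345),
  week = c(4, 3, 5, 1, 2, 2)
)

wide <- df %>%
  count(id, week) %>%          # one row per id/week pair with its count n
  pivot_wider(
    names_from   = week,
    values_from  = n,
    names_prefix = "week",     # week1, week2, ...
    names_sort   = TRUE,       # order the new columns by week
    values_fill  = 0           # pairs that never occur become 0 instead of NA
  )

print(wide)
```

Here `count()` replaces the `group_by()`/`summarise()`/`ungroup()` sequence, and `values_fill` replaces the separate step that swaps `NA` for 0.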

## reshape two column data to sparse matrix in r long to wide

You can do something like this:

``
library(tidyverse)
dat <- tribble(~"ID",  ~"Click",
               1,   "A",
               1,   "B",
               1,   "E",
               2,   "A",
               2,   "Q",
               3,   "B",
               3,   "D",
               3,   "F")
table(dat)
#> ID  A B D E F Q
#>   1 1 1 0 1 0 0
#>   2 1 0 0 0 0 1
#>   3 0 1 1 0 1 0
``

Created on 2019-02-25 by the reprex package (v0.2.1)

EDIT: To clarify my post: you don't need `library(tidyverse)`, nor do you need to build your data with `tribble()`. The function you are looking for is `table()`.
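Since the question asks for a sparse matrix, one further option worth sketching is the `sparse` argument of base R's `xtabs()`, which returns a sparse matrix for two factors. This is an addition to the answer rather than part of it, and it assumes the Matrix package (shipped with standard R distributions) is installed. The data frame is rebuilt with base R only.

```r
# Same ID/Click pairs as above, built with base R only
dat <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 3),
  Click = c("A", "B", "E", "A", "Q", "B", "D", "F")
)

# Dense contingency table, as in the answer
dense <- table(dat)

# Sparse version; sparse = TRUE requires the Matrix package
sp <- xtabs(~ ID + Click, data = dat, sparse = TRUE)

print(dim(dense))
print(class(sp))
```

For a handful of IDs the dense table is fine. The sparse form tends to matter only when there are many distinct IDs and click values and most combinations never occur.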

## Reshape data in R, cast function arguments

The OP asked for help with the arguments to the `cast()` function of the `reshape` package. However, the `reshape` package was superseded by the `reshape2` package from the same package author. According to the package description, the `reshape2` package is "A Reboot of the Reshape Package".

Using `reshape2`, the desired result can be produced with

``
reshape2::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID ABL EGFR TP53
#1                  C10   1    1    0
#2                 C939   0    0    1
``

BTW: The `data.table` package has implemented (and enhanced) `dcast()` as well. So, the same result can be produced with

``
data.table::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
                  value.var = "TARGET_TYPE")
``

The OP mentioned other columns in the data frame which should be shown together with the spread or wide data. Unfortunately, the OP hasn't supplied particular sample data, so we have to consider two use cases.

### Case 1: Additional columns go along with the id column

The data could look like

``
wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1
#1                  C10         ABL          a
#2                  C10        EGFR          a
#3                 C939        TP53          b
``

Note that the values in `extra_col1` are in line with `PARENT_MOL_CHEMBL_ID`.

This is an easy case, because the formula in `dcast()` accepts `...` which represents all other variables not used in the formula:

``
reshape2::dcast(wc, ... ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID extra_col1 ABL EGFR TP53
#1                  C10          a   1    1    0
#2                 C939          b   0    0    1
``

The resulting data.frame does contain all other columns.

### Case 2: Additional columns don't go along with the id column

``
wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1 extra_col2
#1                  C10         ABL          a          1
#2                  C10        EGFR          a          2
#3                 C939        TP53          b          3
``

Note that `extra_col2` has two different values for `C10`. This will cause the simple approach to fail, because `...` would then treat each distinct combination of the extra columns as its own row. So, a two-step approach has to be implemented: reshaping first and then joining with the original data frame. The `data.table` package is used for both steps here:

``
library(data.table)
# reshape from long to wide, result has only one row per id column
wide <- dcast(setDT(wc), PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
              value.var = "TARGET_TYPE")
# right join, i.e., all rows of wc are included
wide[wc, on = "PARENT_MOL_CHEMBL_ID"]
#   PARENT_MOL_CHEMBL_ID ABL EGFR TP53 TARGET_TYPE extra_col1 extra_col2
#1:                  C10   1    1    0         ABL          a          1
#2:                  C10   1    1    0        EGFR          a          2
#3:                 C939   0    0    1        TP53          b          3
``

The result shows the aggregated values in wide format together with any other columns.
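For readers without `data.table`, the same reshape-then-join idea can be sketched in base R. This is an alternative under stated assumptions, not the answer's method: `wc` is rebuilt here from the Case 2 printout, and `counts`, `wide`, and `result` are illustrative names.

```r
# Case 2 data, rebuilt from the printout above
wc <- data.frame(
  PARENT_MOL_CHEMBL_ID = c("C10", "C10", "C939"),
  TARGET_TYPE          = c("ABL", "EGFR", "TP53"),
  extra_col1           = c("a", "a", "b"),
  extra_col2           = 1:3
)

# Step 1: count target types per id and convert to a wide data.frame
counts <- table(wc$PARENT_MOL_CHEMBL_ID, wc$TARGET_TYPE)
wide   <- as.data.frame.matrix(counts)
wide$PARENT_MOL_CHEMBL_ID <- rownames(wide)

# Step 2: join the wide counts back onto every row of the original data
result <- merge(wide, wc, by = "PARENT_MOL_CHEMBL_ID")

print(result)
```

As in the `data.table` version, the counts are repeated on each original row that shares the id.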

## Manipulation of data frame using Group by or Aggregate in R

A simple way to do this is `table(df)`.

## R aggregating a column values into rows

``
library(reshape2) # or you could use data.table's dcast function
dcast(df, ID + Zoo ~ Last_date)
#        ID     Zoo Feb_2018 Jan_2018 Nov_2017 Oct_2017
# 1 ABC-DEF  DENVER        0        0        3        2
# 2  HG-IJK MEMPHIS        0        1        0        0
# 3  JK-LMO MEMPHIS        1        0        0        0
``

This gives a warning about not specifying the value variable or the aggregation function. You can be a little more verbose to avoid the warning:

``dcast(df, ID + Zoo ~ Last_date, value.var = 'Last_date', length)``

Data used

``
df <- data.table::fread("ID           Zoo            Last_date
ABC-DEF     DENVER          Oct_2017
ABC-DEF     DENVER          Oct_2017
ABC-DEF     DENVER          Nov_2017
ABC-DEF     DENVER          Nov_2017
ABC-DEF     DENVER          Nov_2017
HG-IJK      MEMPHIS         Jan_2018
JK-LMO      MEMPHIS         Feb_2018")
``

## How to convert data from rows into a specific columns and count them up in R?

We can use `table` from base R:

``table(df1)``

If there are many columns, subset the dataset to the columns of interest and then apply `table`:

``table(df1[c("PLAYER", "SURFACE")])``
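To show why the subsetting matters, here is a minimal sketch with made-up data. The column names `PLAYER` and `SURFACE` come from the answer; the values and the extra `MATCH_ID` column are hypothetical.

```r
# Hypothetical data; MATCH_ID stands in for "other columns" in the data set
df1 <- data.frame(
  PLAYER   = c("A", "A", "B", "B", "B"),
  SURFACE  = c("clay", "grass", "clay", "clay", "hard"),
  MATCH_ID = 1:5
)

# table(df1) would cross all three columns into a 3-way array,
# so restrict to the two columns of interest first
full_dims <- dim(table(df1))
counts    <- table(df1[c("PLAYER", "SURFACE")])

print(full_dims)
print(counts)
```

With the subset, the result is a plain player-by-surface count table, including zeros for combinations that never occur.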