Grouped Correlation with Dplyr (Works Only on Console)

What you experience is related to having both plyr and dplyr loaded at the same time. Both packages export a summarize() function, so whichever package was attached last masks the other, and you get conflicts unless you state explicitly which one you want. For the example data, this means:

require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))

Using dplyr as intended:

gp = group_by(xx, group)
dplyr::summarize(gp, cor(a, b))
#Source: local data frame [4 x 2]
#
# group cor(a, b)
#1 1 -0.02073084
#2 2 0.12803353
#3 3 0.06236264
#4 4 -0.06181904

Or using plyr, which ignores the dplyr grouping and returns a single correlation over all rows:

gp = group_by(xx, group)
plyr::summarize(gp, cor(a, b))
# cor(a, b)
#1 0.02739193

So either avoid loading both packages or specify the package by using package::function.
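
As a sanity check on the dplyr result, the per-group correlations can be recomputed with base R. This is only a sketch on the same simulated data; the object names by_dplyr and by_base are made up for the illustration.

```r
library(dplyr)

set.seed(123)
xx <- data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))

# Explicit namespace: always the dplyr version, regardless of load order
by_dplyr <- xx %>%
  group_by(group) %>%
  dplyr::summarize(r = cor(a, b))

# Base R cross-check: split by group, then correlate within each piece
by_base <- sapply(split(xx, xx$group), function(d) cor(d$a, d$b))

# TRUE when both routes agree on the four per-group correlations
all.equal(by_dplyr$r, unname(by_base))
```

If the two disagree, the summarize() call was most likely not the dplyr one.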

Calculate significance of correlation in grouped data with dplyr

Could this be what you want?

df %>%
  group_by(group) %>%
  summarize(cor.test(x, y)[["p.value"]])

The thing is that cor.test() returns a list (an object of class "htest") and not a single value, so you need to pick out the element of the list that you are interested in, here p.value.
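
The same idea extends to pulling more than one element out of the cor.test() result. A minimal sketch, assuming a data frame with columns group, x and y as in the snippet above; the simulated data and the output column names r and p_value are invented for the example.

```r
library(dplyr)

set.seed(42)
df <- data.frame(
  group = rep(c("g1", "g2"), each = 50),
  x = rnorm(100),
  y = rnorm(100)
)

res <- df %>%
  group_by(group) %>%
  summarize(
    # "estimate" is a named numeric of length 1; unname() drops the "cor" label
    r       = unname(cor.test(x, y)[["estimate"]]),
    p_value = cor.test(x, y)[["p.value"]]
  )

res
```

This calls cor.test() twice per group, which is fine for small data; for larger data you would probably store the test object once per group and extract from it.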

Correlation using funs in dplyr

An alternative approach is to just call the cor function once since this will calculate all required correlations. Repeated calls to cor might be a performance issue for a large data set. Code to do this and extract the correlation pairs with labels could look like:

library(dplyr)
library(reshape2)  # melt() is assumed to come from reshape2

# calculate correlations and display in matrix format
# (.[, -1] assumes the grouping column Universe is the first column)
cor_matrix <- df %>%
  group_by(Universe) %>%
  do(as.data.frame(cor(.[, -1], method = "spearman", use = "pairwise.complete.obs")))

# to add row names
cor_matrix1 <- cor_matrix %>%
  data.frame(row = rep(colnames(.)[-1], n_groups(.)))

# calculate correlations and display in column format,
# keeping only the upper triangle so each pair appears once
num_col <- ncol(df[, -1])
out_indx <- which(upper.tri(diag(num_col)))
cor_cols <- df %>%
  group_by(Universe) %>%
  do(melt(cor(.[, -1], method = "spearman", use = "pairwise.complete.obs"), value.name = "cor")[out_indx, ])
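
For a version that can be run without the original data, here is a sketch of the same "one cor() call per group" idea on the built-in iris data. It swaps do() and melt() for group_modify() and plain matrix indexing, so it is a variation on the approach above rather than the same code; the column names var1, var2 and cor are made up.

```r
library(dplyr)

cors <- iris %>%
  group_by(Species) %>%
  group_modify(~ {
    # .x holds the rows of one group without the grouping column
    m <- cor(.x, method = "spearman", use = "pairwise.complete.obs")
    # positions of the upper triangle, so each pair is reported once
    idx <- which(upper.tri(m), arr.ind = TRUE)
    data.frame(
      var1 = rownames(m)[idx[, "row"]],
      var2 = colnames(m)[idx[, "col"]],
      cor  = m[idx]
    )
  })

cors
```

With four numeric columns this gives six labelled pairs per species.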

Ifelse with conditional on grouped data

Another possible solution, based on a nested ifelse:

library(dplyr)

example2 <- tibble::tribble(
~Group, ~Code, ~Value,
"1", "A", 1,
"1", "B", 1,
"1", "C", 5,
"2", "A", 1,
"2", "B", 5
)

example2 %>%
  group_by(Group) %>%
  mutate(GroupStatus = ifelse("C" %in% Code,
                              ifelse(Value[Code == "C"] == 5, 1, 0), 0)) %>%
  ungroup()

#> # A tibble: 5 × 4
#> Group Code Value GroupStatus
#> <chr> <chr> <dbl> <dbl>
#> 1 1 A 1 1
#> 2 1 B 1 1
#> 3 1 C 5 1
#> 4 2 A 1 0
#> 5 2 B 5 0
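
If the nested ifelse feels hard to read, the same condition can probably be written with any(). This is a sketch on the same example data, rebuilt here with data.frame() so the block runs on its own; note that it returns an integer column rather than a double, and that it flags a group as soon as any "C" row has Value 5.

```r
library(dplyr)

example2 <- data.frame(
  Group = c("1", "1", "1", "2", "2"),
  Code  = c("A", "B", "C", "A", "B"),
  Value = c(1, 1, 5, 1, 5)
)

result <- example2 %>%
  group_by(Group) %>%
  # TRUE for the whole group if it contains a row with Code "C" and Value 5
  mutate(GroupStatus = as.integer(any(Code == "C" & Value == 5))) %>%
  ungroup()

result
```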

dplyr sample_n where n is the value of a grouped variable

One possible answer, but I'm not convinced it's the optimal answer: permute the rows of the data frame with dplyr::sample_frac (and a fraction of 1), then slice the required number of rows:

set.seed(1)
dg %>%
  dplyr::sample_frac(1) %>%
  dplyr::slice(1:unique(NDG))

This gives the correct output.

Source: local data frame [6 x 3]
Groups: GLB, NDG

Gene GLB NDG
1 A4GNT 3 1
2 AHSG 3 2
3 A4GNT 3 2
4 ACVR2B 10 1
5 AADAC 10 2
6 ACVR2B 10 2

And I suppose I can just write a function to do this in one line if necessary.
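
Such a function could look roughly like the sketch below. It assumes dplyr 1.0.0 or later (for slice_sample()) and a data frame that is already grouped; the function name sample_rows_by_column and the demo data dg_demo are invented for the illustration and are not the original dg.

```r
library(dplyr)

# Shuffle rows within each group, then keep as many rows per group
# as the value stored in the column named by n_col
sample_rows_by_column <- function(data, n_col) {
  data %>%
    slice_sample(prop = 1) %>%
    filter(row_number() <= first(.data[[n_col]]))
}

# Hypothetical grouped data: 5 genes per group, NDG says how many to keep
dg_demo <- data.frame(
  Gene = paste0("gene", 1:10),
  GLB  = rep(c(3, 10), each = 5),
  NDG  = rep(c(2, 3), each = 5)
) %>%
  group_by(GLB, NDG)

set.seed(1)
sampled <- sample_rows_by_column(dg_demo, "NDG")
sampled
```

Using filter(row_number() <= ...) instead of 1:unique(NDG) also avoids the 1:0 surprise when the requested count is zero.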

dbplyr group by dynamic variable names

Here are two potential approaches.

(1) Most similar to the approach you are already using, we first have to tell R that the character string should be treated as symbolic:

iris_table %>%
  group_by(!!!syms(grouping_variable)) %>%
  summarise(sum_petal_length = sum(Petal.Length))

Note that syms() first converts the character string into symbols, and !!! then splices them into group_by(). This approach uses some features of the rlang package that can be useful in other contexts. However, it is no longer the recommended approach for programming with dplyr.

(2) The recommended approach for doing this kind of programming with dplyr is:

iris_table %>%
  group_by(.data[[grouping_variable]]) %>%
  summarise(sum_petal_length = sum(Petal.Length))

Both of these approaches will give you the correct SQL translation when working with dbplyr:

library(dplyr)
library(dbplyr)  # tbl_lazy() and simulate_mssql()

data(iris)
iris_table <- tbl_lazy(iris, con = simulate_mssql())
# The grouping variable
grouping_variable <- "Species"

# approach 1
iris_table %>%
group_by(!!!syms(grouping_variable)) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 1
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`

# approach 2
iris_table %>%
group_by(.data[[grouping_variable]]) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 2
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`
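
Approach 2 also wraps neatly into a function. The sketch below runs on the local iris data frame rather than a remote table, so it only shows the .data[[ ]] pattern and not the SQL translation; the function name sum_by and the output column total are made up.

```r
library(dplyr)

# Group by one column and sum another, both given as character strings
sum_by <- function(data, group_var, value_var) {
  data %>%
    group_by(.data[[group_var]]) %>%
    summarise(total = sum(.data[[value_var]]))
}

out <- sum_by(iris, "Species", "Petal.Length")
out
```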

`group_by` and keep grouping levels as nested data frame's name

You need to add setNames in the map step:

library(tidyverse)

warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    models = map(data, ~ glm(breaks ~ wool, data = .x)),
    jt     = map(models, ~ emmeans::joint_tests(.x, data = .x$data)),
    means  = map(models, ~ emmeans::emmeans(.x, "wool", data = .x$data)),
    p_cont = setNames(
      map(means, ~ emmeans::contrast(.x, "pairwise", infer = c(TRUE, TRUE))),
      .$tension
    )
  )

If you want to name all the list outputs, use across:

warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    models = map(data, ~ glm(breaks ~ wool, data = .x)),
    jt     = map(models, ~ emmeans::joint_tests(.x, data = .x$data)),
    means  = map(models, ~ emmeans::emmeans(.x, "wool", data = .x$data)),
    p_cont = map(means, ~ emmeans::contrast(.x, "pairwise", infer = c(TRUE, TRUE))),
    across(models:p_cont, setNames, .$tension)
  ) -> result

result$jt

#$L
# model term df1 df2 F.ratio p.value
# wool 1 Inf 5.653 0.0174

#$M
# model term df1 df2 F.ratio p.value
# wool 1 Inf 1.253 0.2630

#$H
# model term df1 df2 F.ratio p.value
# wool 1 Inf 2.321 0.1277
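
To see the naming step in isolation, without emmeans, here is a stripped-down sketch on the same warpbreaks data. The list column mean_breaks is invented for the example, and the grouping column is referenced directly inside mutate() instead of through .$tension.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  # name each list element after the tension level of its row
  mutate(mean_breaks = setNames(map(data, ~ mean(.x$breaks)),
                                as.character(tension)))

names(nested$mean_breaks)
nested$mean_breaks[["L"]]
```

Once the elements are named, individual groups can be pulled out by level instead of by position.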

Why does group_by() in dplyr not work when switching from sum to counting unique occurrences?

The problem was the colNames() reactive that you defined and passed to your call to datatable. I commented those lines out and it works. The problem didn't arise with your sum data.frame because there the column names matched columns that were actually present in the data.frame, which is not the case for the length(unique()) data.frame: it only contains the grouping column and count.

library(dplyr)
library(DT)
library(shiny)
library(shinyWidgets)
library(tidyverse)

ui <-
fluidPage(
fluidRow(
column(width = 8,
h3("Data table:"),
tableOutput("data"),
h3("Sum the data table columns:"),

radioButtons(
inputId = "grouping",
label = NULL,
choiceNames = c("By period 1", "By period 2"),
choiceValues = c("Period_1", "Period_2"),
selected = "Period_1",
inline = TRUE
),
DT::dataTableOutput("sums")
)
)
)

server <- function(input, output, session) {
mydat <- reactive({
data.frame(
ID = c(115,115,111,88,120,16),
Period_1 = c("2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"),
Period_2 = c(1, 2, 3, 1, 1, 4),
ColA = c(1000.01, 20, 30, 40, 50, 60),
ColB = c(15.06, 25, 35, 45, 55, 65)
)
})

# colNames <- reactive({c(input$grouping, "Col A", "Col B") })

summed_data <- reactive({
print(input$grouping)
mydat() %>%
dplyr::filter(Period_2 == 1) %>%
dplyr::group_by(!!sym(input$grouping)) %>%
dplyr::summarise(count = length(unique(ID)))
})

# summed_data <- reactive({
# print(input$grouping)
# data() %>%
# group_by(across(all_of(input$grouping))) %>%
# select("ColA","ColB") %>%
# summarise(across(everything(), sum))
# })

output$data <- renderTable(mydat())

output$sums <- renderDT({
summed_data() %>%
datatable(
rownames = FALSE
# colnames = colNames()  # removed: the names no longer match the columns
)
})

}

shinyApp(ui, server)
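
Outside of Shiny, the summary step can be checked on its own. The sketch below uses the same data as mydat() and swaps length(unique(ID)) for dplyr's n_distinct(ID), which should give the same counts; the helper name count_ids is made up, and grouping is done with across(all_of()) instead of !!sym().

```r
library(dplyr)

mydat <- data.frame(
  ID = c(115, 115, 111, 88, 120, 16),
  Period_1 = c("2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"),
  Period_2 = c(1, 2, 3, 1, 1, 4),
  ColA = c(1000.01, 20, 30, 40, 50, 60),
  ColB = c(15.06, 25, 35, 45, 55, 65)
)

# Count distinct IDs per level of the column named in `grouping`
count_ids <- function(data, grouping) {
  data %>%
    filter(Period_2 == 1) %>%
    group_by(across(all_of(grouping))) %>%
    summarise(count = n_distinct(ID), .groups = "drop")
}

by_p1 <- count_ids(mydat, "Period_1")
names(by_p1)
by_p1
```

The result has two columns, the grouping column and count, which is why a three-element colnames vector no longer fits.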

