Grouped correlation with dplyr (works only on console)
What you experience is related to having both plyr and dplyr loaded at the same time. Both packages export a summarize function, so whichever package was attached last masks the other, and you can get the wrong one if you don't specify explicitly which package you want to use. For the example data, this means:
require(dplyr)
set.seed(123)
xx = data.frame(group = rep(1:4, 100), a = rnorm(400) , b = rnorm(400))
Using dplyr as intended:
gp = group_by(xx, group)
dplyr::summarize(gp, cor(a, b))
#Source: local data frame [4 x 2]
#
# group cor(a, b)
#1 1 -0.02073084
#2 2 0.12803353
#3 3 0.06236264
#4 4 -0.06181904
Or using plyr, which ignores the grouping and returns a single correlation for the whole data frame:
gp = group_by(xx, group)
plyr::summarize(gp, cor(a, b))
# cor(a, b)
#1 0.02739193
So either avoid loading both packages or specify the package by using package::function.
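As a quick check of which summarize is in play, here is a minimal sketch, assuming a fresh session in which only dplyr has been attached. It uses find() from utils to list the attached packages that define summarize, then compares the namespaced dplyr result with a plain base R computation. The names where, res and by_hand are illustrative only.

```r
library(dplyr)

set.seed(123)
xx <- data.frame(group = rep(1:4, 100), a = rnorm(400), b = rnorm(400))

# List every attached package that defines summarize(). If more than one
# appears, the first entry on the search path masks the others.
where <- find("summarize")

# Namespaced call: always dplyr's version, regardless of load order.
res <- xx %>%
  group_by(group) %>%
  dplyr::summarize(r = cor(a, b))

# Cross-check against a base R split/apply computation.
by_hand <- sapply(split(xx, xx$group), function(d) cor(d$a, d$b))
```

If where lists both package:plyr and package:dplyr, the package:: prefix is the safest way to get the grouped result.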
Calculate significance of correlation in grouped data with dplyr
Could this be what you want?
df %>%
  group_by(group) %>%
  summarize(cor.test(x, y)[["p.value"]])
The thing is that cor.test() returns a list and not a single value, so you need to pick out the list element that you are interested in.
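To make the list extraction concrete, the following sketch pulls both the estimate and the p-value out of cor.test() per group. The data frame is made up for illustration (the question's df is not shown here), and the column names estimate and p_value are assumptions.

```r
library(dplyr)

# Hypothetical data: three groups with a linear relation between x and y.
set.seed(42)
df <- data.frame(group = rep(c("a", "b", "c"), each = 30), x = rnorm(90))
df$y <- 0.5 * df$x + rnorm(90)

res <- df %>%
  group_by(group) %>%
  summarize(
    estimate = unname(cor.test(x, y)$estimate),  # correlation coefficient
    p_value  = cor.test(x, y)$p.value            # two-sided p-value
  )
```

Naming the summary columns also avoids the awkward default column name that the unnamed expression would produce.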
Correlation using funs in dplyr
An alternative approach is to call the cor function just once per group, since a single call calculates all the required correlations. Repeated calls to cor might become a performance issue for a large data set. Code to do this and extract the correlation pairs with labels could look like:
library(dplyr)
library(reshape2)  # melt() below is assumed to come from reshape2
#
# calculate correlations and display in matrix format
#
cor_matrix <- df %>% group_by(Universe) %>%
  do(as.data.frame(cor(.[,-1], method="spearman", use="pairwise.complete.obs")))
#
# to add row names
#
cor_matrix1 <- cor_matrix %>%
data.frame(row=rep(colnames(.)[-1], n_groups(.)))
#
# calculate correlations and display in column format
#
num_col=ncol(df[,-1])
out_indx <- which(upper.tri(diag(num_col)))
cor_cols <- df %>% group_by(Universe) %>%
do(melt(cor(.[,-1], method="spearman", use="pairwise.complete.obs"), value.name="cor")[out_indx,])
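The index trick used for out_indx can be tried on a single correlation matrix with base R only. This sketch uses the first three columns of the built-in iris data as a stand-in for one group; cor_pairs and the var1/var2 column names are illustrative.

```r
# One Spearman correlation matrix for three numeric columns.
m <- cor(iris[, 1:3], method = "spearman")

# Column-major positions of the cells above the diagonal,
# i.e. each variable pair exactly once.
idx <- which(upper.tri(m))

# Label each retained correlation with its row and column variable.
cor_pairs <- data.frame(
  var1 = rownames(m)[row(m)[idx]],
  var2 = colnames(m)[col(m)[idx]],
  cor  = m[idx]
)
```

For a 3 x 3 matrix, idx holds the positions 4, 7 and 8, which is why the same index vector can be reused for every group as long as the column set is identical.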
Ifelse with conditional on grouped data
Another possible solution, based on a nested ifelse:
library(dplyr)
example2 <- tibble::tribble(
~Group, ~Code, ~Value,
"1", "A", 1,
"1", "B", 1,
"1", "C", 5,
"2", "A", 1,
"2", "B", 5
)
example2 %>%
  group_by(Group) %>%
  mutate(GroupStatus = ifelse("C" %in% Code,
                              ifelse(Value[Code == "C"] == 5, 1, 0), 0)) %>%
  ungroup()
#> # A tibble: 5 × 4
#> Group Code Value GroupStatus
#> <chr> <chr> <dbl> <dbl>
#> 1 1 A 1 1
#> 2 1 B 1 1
#> 3 1 C 5 1
#> 4 2 A 1 0
#> 5 2 B 5 0
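For comparison, the same flag can probably be written without nesting by testing both conditions in one any() call. This is a sketch of an alternative, not the approach used above; it yields an integer column rather than a double, and it also copes with a group that has more than one "C" row.

```r
library(dplyr)

example2 <- tibble::tribble(
  ~Group, ~Code, ~Value,
  "1", "A", 1,
  "1", "B", 1,
  "1", "C", 5,
  "2", "A", 1,
  "2", "B", 5
)

out <- example2 %>%
  group_by(Group) %>%
  # TRUE for the whole group if any row has Code "C" with Value 5
  mutate(GroupStatus = as.integer(any(Code == "C" & Value == 5))) %>%
  ungroup()
```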
dplyr sample_n where n is the value of a grouped variable
One possible answer, though I'm not convinced it's the optimal one: permute the rows of the data frame with dplyr::sample_frac (and a fraction of 1), then slice the required number of rows:
set.seed(1)
dg %>%
  dplyr::sample_frac(1) %>%
  dplyr::slice(1:unique(NDG))
This gives the correct output:
Source: local data frame [6 x 3]
Groups: GLB, NDG
Gene GLB NDG
1 A4GNT 3 1
2 AHSG 3 2
3 A4GNT 3 2
4 ACVR2B 10 1
5 AADAC 10 2
6 ACVR2B 10 2
And I suppose I can just write a function to do this in one line if necessary.
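Such a helper might look like the sketch below. It assumes the data is already grouped and that the size column is constant within each group. It swaps in slice_sample(prop = 1) for the shuffle and a row_number() filter for the slice, so it is a variation on the idea rather than the exact code above. The data frame and the name sample_by_column are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for `dg`: NDG holds the number of rows wanted per group.
dg <- data.frame(
  Gene = paste0("gene", 1:12),
  GLB  = rep(c(3, 10), each = 6),
  NDG  = rep(c(1, 2), times = 6)
) %>%
  group_by(GLB, NDG)

# Shuffle rows within each group, then keep as many rows as size_col asks for.
sample_by_column <- function(grouped, size_col) {
  grouped %>%
    slice_sample(prop = 1) %>%
    filter(row_number() <= .data[[size_col]])
}

set.seed(1)
picked <- sample_by_column(dg, "NDG")
```

If a group has fewer rows than requested, this simply returns all of them.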
dbplyr group by dynamic variable names
Here are two potential approaches.
(1) Most similar to the approach you are already using, we first have to tell R that the character string should be treated as symbolic:
iris_table %>%
group_by(!!!syms(grouping_variable)) %>%
summarise(sum_petal_length = sum(Petal.Length))
Note that syms() is applied to the string first, and !!! then splices the resulting symbols into group_by(). This approach uses some features of the rlang package that can be useful in other contexts. However, it is no longer the recommended approach for programming with dplyr.
(2) The recommended approach for doing this kind of programming with dplyr is:
iris_table %>%
group_by(.data[[grouping_variable]]) %>%
summarise(sum_petal_length = sum(Petal.Length))
Both of these approaches will give you the correct SQL translation when working with dbplyr:
library(dplyr)
library(dbplyr)
data(iris)
iris_table = tbl_lazy(iris, con = simulate_mssql())
# The grouping variable
grouping_variable <- "Species"
# approach 1
iris_table %>%
group_by(!!!syms(grouping_variable)) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 1
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`
# approach 2
iris_table %>%
group_by(.data[[grouping_variable]]) %>%
summarise(sum_petal_length = sum(Petal.Length))
# translation from approach 2
# <SQL>
# SELECT `Species`, SUM(`Petal.Length`) AS `sum_petal_length`
# FROM `df`
# GROUP BY `Species`
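When the grouping column arrives as a function argument, approach 2 can be wrapped as in this sketch. It runs on the local iris data frame rather than a lazy table, and sum_petal_by is a hypothetical helper name.

```r
library(dplyr)

# Group by a column whose name is passed in as a string.
sum_petal_by <- function(data, grouping_variable) {
  data %>%
    group_by(.data[[grouping_variable]]) %>%
    summarise(sum_petal_length = sum(Petal.Length))
}

res <- sum_petal_by(iris, "Species")
```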
`group_by` and keep grouping levels as nested data frame's name
You need to add setNames in the map step:
library(tidyverse)
warpbreaks %>%
group_by(tension) %>%
nest() %>%
ungroup %>%
mutate(models=map(data,~glm(breaks~wool,data=.x)),
jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
p_cont = setNames(map(means,
~emmeans::contrast(.x, "pairwise",infer = c(T,T))),.$tension))
If you want to name all of the list columns, use across:
warpbreaks %>%
group_by(tension) %>%
nest() %>%
ungroup %>%
mutate(models=map(data,~glm(breaks~wool,data=.x)),
jt = map(models, ~emmeans::joint_tests(.x, data = .x$data)),
means=map(models,~emmeans::emmeans(.x,"wool",data=.x$data)),
p_cont = map(means, ~emmeans::contrast(.x, "pairwise",infer = c(T,T))),
across(models:p_cont, setNames, .$tension)) -> result
result$jt
#$L
# model term df1 df2 F.ratio p.value
# wool 1 Inf 5.653 0.0174
#$M
# model term df1 df2 F.ratio p.value
# wool 1 Inf 1.253 0.2630
#$H
# model term df1 df2 F.ratio p.value
# wool 1 Inf 2.321 0.1277
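The naming step can be tried in isolation, without emmeans, as in this reduced sketch. It fits a plain lm per tension level purely for illustration and names the resulting list column after the grouping levels; the name nested is an assumption.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  # setNames() attaches the tension level to each element of the list column
  mutate(models = setNames(map(data, ~ lm(breaks ~ wool, data = .x)),
                           as.character(tension)))

model_names <- names(nested$models)
```

With the names in place, a single model can be retrieved by level, for example nested$models[["M"]].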
Why does group_by() in dplyr not work when switching from sum to counting unique occurrences?
The problem was the colNames() reactive that you defined and passed to your datatable() call. I commented those lines out and it works. The problem didn't arise with your sum data.frame because there the column names were actually present in the data.frame, which is not the case for the length(unique()) data.frame.
library(dplyr)
library(DT)
library(shiny)
library(shinyWidgets)
library(tidyverse)
ui <-
fluidPage(
fluidRow(
column(width = 8,
h3("Data table:"),
tableOutput("data"),
h3("Sum the data table columns:"),
radioButtons(
inputId = "grouping",
label = NULL,
choiceNames = c("By period 1", "By period 2"),
choiceValues = c("Period_1", "Period_2"),
selected = "Period_1",
inline = TRUE
),
DT::dataTableOutput("sums")
)
)
)
server <- function(input, output, session) {
mydat <- reactive({
data.frame(
ID = c(115,115,111,88,120,16),
Period_1 = c("2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"),
Period_2 = c(1, 2, 3, 1, 1, 4),
ColA = c(1000.01, 20, 30, 40, 50, 60),
ColB = c(15.06, 25, 35, 45, 55, 65)
)
})
# colNames <- reactive({c(input$grouping, "Col A", "Col B") })
summed_data <- reactive({
print(input$grouping)
mydat() %>%
dplyr::filter(Period_2 == 1) %>%
dplyr::group_by(!!sym(input$grouping)) %>%
dplyr::summarise(count = length(unique(ID)))
})
# summed_data <- reactive({
# print(input$grouping)
# data() %>%
# group_by(across(all_of(input$grouping))) %>%
# select("ColA","ColB") %>%
# summarise(across(everything(), sum))
# })
output$data <- renderTable(mydat())
output$sums <- renderDT({
summed_data() %>%
datatable(
rownames = FALSE,
# colnames=colNames() # < add colNames()
)
})
}
shinyApp(ui, server)
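The mismatch can be seen outside Shiny. This sketch rebuilds the summary from the same sample data and compares the number of labels with the number of columns. The variable grouping stands in for input$grouping, n_distinct() replaces length(unique()), and the label "Unique IDs" is made up for illustration.

```r
library(dplyr)

mydat <- data.frame(
  ID = c(115, 115, 111, 88, 120, 16),
  Period_1 = c("2020-01", "2020-02", "2020-03", "2020-01", "2020-02", "2020-03"),
  Period_2 = c(1, 2, 3, 1, 1, 4)
)
grouping <- "Period_1"  # stands in for input$grouping

summed <- mydat %>%
  filter(Period_2 == 1) %>%
  group_by(across(all_of(grouping))) %>%
  summarise(count = n_distinct(ID))

# The commented-out reactive supplied three labels for a two-column result.
old_labels <- c(grouping, "Col A", "Col B")

# Labels sized to the actual summary.
new_labels <- c(grouping, "Unique IDs")
labels_match <- length(new_labels) == ncol(summed)
```

In the app, new_labels could then be passed as the colnames argument of datatable(), provided its length matches the number of columns in the summary.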