Conditional mean statement
If you want to exclude the non-smokers, you have a few options. The easiest is probably this:
mean(bwght[bwght$cigs>0,"cigs"])
With a data frame, the first index is the row and the second is the column. So you can subset using dataframe[1,2] to get the first row, second column. You can also put a logical condition in the row position. By using bwght$cigs>0 as the first index, you keep only the rows where cigs is greater than zero.
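The same idea (keep only the rows that pass a condition, then average) can be sketched in plain Python. The cigs values below are made-up placeholders, not the real bwght data.

```python
from statistics import mean

# Hypothetical cigarettes-per-day values; 0 means a non-smoker.
cigs = [0, 0, 10, 20, 0, 5]

# Keep only the smokers (cigs > 0), then average what is left.
smokers = [c for c in cigs if c > 0]
print(mean(smokers))
```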
Your other ones didn't work for the following reasons:
mean(bwght$cigs| bwght$cigs>0)
This is effectively a logical comparison. You're asking for the TRUE/FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean of that. R does accept logicals in mean(): it treats TRUE as 1 and FALSE as 0, so the result is the proportion of observations with cigs above zero, not the average number of cigarettes among smokers.
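To see why averaging a TRUE/FALSE vector gives a proportion rather than a mean number of cigarettes, here is the analogous computation in plain Python, where True counts as 1 and False as 0. The data are invented for illustration.

```python
# Hypothetical cigarettes-per-day values; 0 means a non-smoker.
cigs = [0, 0, 10, 20, 0, 5]

# Element-wise "cigs OR cigs > 0": True wherever cigs is non-zero.
flags = [bool(c) or c > 0 for c in cigs]

# Averaging the flags counts each True as 1, giving the share of smokers...
share_smokers = sum(flags) / len(flags)

# ...whereas the intended quantity averages cigs over the smokers only.
mean_cigs_smokers = sum(c for c in cigs if c > 0) / sum(flags)

print(share_smokers, mean_cigs_smokers)
```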
mean(bwght$cigs>0 | bwght$cigs=TRUE)
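This one won't even run: inside a function call, = names an argument rather than testing equality (that would be ==), so R stops with a syntax error. Even with ==, you would have the same problem as above: | returns a logical vector, and the mean of a logical vector is a proportion.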
if(bwght$cigs > 0){sum(bwght$cigs)}
By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In that example, you are using bwght$cigs > 0 as the if condition, but if() expects a single TRUE or FALSE, not a vector. Older versions of R only looked at the first element (with a warning); since R 4.2.0 a condition with more than one element is an error. Even when it ran, sum(bwght$cigs) would add up all the values, not just the smokers. R handles row-by-row work differently from SAS: use vectorised subsetting as above, or check out functions like lapply, tapply, and so on.
x <-as.numeric(bwght$cigs, rm="0")
mean(x)
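This won't drop the zeros either: as.numeric() has no argument for removing values (rm is not one of its arguments), with or without the quotes, so at best x is just cigs with the zeros still in it.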
Conditional Mean Statement Across Dataframe
The apply-family of functions is useful here:
# For each column, average the values greater than 6, ignoring NAs
sapply(df, function(x) mean(x[x > 6], na.rm = TRUE))
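A plain-Python sketch of the same column-by-column conditional mean, with None standing in for R's NA. The column names and data are hypothetical; the threshold of 6 is taken from the R call.

```python
from statistics import mean

# Hypothetical columns; None plays the role of NA.
df = {
    "x": [3, 7, 9, None, 12],
    "y": [8, 2, None, 10, 6],
}

def mean_above(values, threshold=6):
    """Mean of the non-missing values strictly greater than threshold."""
    kept = [v for v in values if v is not None and v > threshold]
    # R's mean() of an empty vector is NaN, so mirror that here.
    return mean(kept) if kept else float("nan")

# Apply the function to every column, like sapply() over a data frame.
result = {name: mean_above(col) for name, col in df.items()}
print(result)
```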
SQL: Conditional mean based on groups
You can calculate the median as an analytic (window) function and then aggregate. This needs a database that supports MEDIAN() with OVER, such as Oracle:
select group_id,
avg(case when value < value_median then value end) as avg_below_median
from (select t.*,
median(value) over (partition by group_id) as value_median
from t
) t
group by group_id;
Note: You could filter using a WHERE clause as well:
select group_id, avg(value) as avg_below_median
from (select t.*,
median(value) over (partition by group_id) as value_median
from t
) t
where value < value_median
group by group_id;
But the first method makes it simpler to add other expressions, such as an average above the median, to the same query.
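To check the logic outside the database, here is a plain-Python sketch of the same two steps: compute each group's median, then average the values strictly below it. The (group_id, value) rows are made up. statistics.median averages the two middle values for even-sized groups, which should match a continuous median such as Oracle's MEDIAN().

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical rows of the table t: (group_id, value).
rows = [(1, 10), (1, 20), (1, 30), (1, 40), (2, 5), (2, 7), (2, 100)]

# Step 1: collect values per group and compute each group's median
# (the analytic MEDIAN(value) OVER (PARTITION BY group_id)).
by_group = defaultdict(list)
for group_id, value in rows:
    by_group[group_id].append(value)
medians = {g: median(vals) for g, vals in by_group.items()}

# Step 2: average only the values below their group's median
# (AVG(CASE WHEN value < value_median THEN value END)).
avg_below_median = {}
for g, vals in by_group.items():
    below = [v for v in vals if v < medians[g]]
    # AVG over no rows is NULL in SQL; None mirrors that here.
    avg_below_median[g] = mean(below) if below else None

print(avg_below_median)
```

One difference between the two SQL versions shows up in tiny groups: a single-row group has nothing strictly below its median, so the CASE version returns that group with a NULL average while the WHERE version drops it. The sketch follows the CASE behaviour.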
How to calculate the mean with conditions?
Here's a quick data.table solution (assuming coef is a )
library(data.table)
setDT(df)[, .(MeanASmall = mean(b[-40 <= a & a < 0]),
MeanABig = mean(b[0 <= a & a <= 40]))]
# MeanASmall MeanABig
# 1: 33.96727 89.46
If the range of a is limited (that is, if every value of a lies between -40 and 40), you could do this quickly with base R too, by splitting on the sign of a:
sapply(split(df, df$a >= 0), function(x) mean(x$b))
# FALSE TRUE
# 33.96727 89.46000
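The same two conditional means, written out in plain Python. The a/b pairs below are invented; only the bounds -40, 0 and 40 come from the data.table call.

```python
from statistics import mean

# Hypothetical (a, b) pairs standing in for the columns of df.
a = [-35, -10, -1, 0, 15, 40]
b = [30.0, 35.0, 40.0, 80.0, 90.0, 100.0]

# b where -40 <= a < 0, and b where 0 <= a <= 40 (same bounds as the R code).
small = [bi for ai, bi in zip(a, b) if -40 <= ai < 0]
big = [bi for ai, bi in zip(a, b) if 0 <= ai <= 40]

mean_a_small = mean(small)
mean_a_big = mean(big)
print(mean_a_small, mean_a_big)
```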
Conditional mean over a Pandas DataFrame
Conditional mean is indeed a thing in pandas. You can use DataFrame.groupby():
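# numeric_only=True skips text columns; pandas 2.0+ raises an error on them otherwise
means = data2.groupby('voteChoice').mean(numeric_only=True)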
or maybe, in your case, the following would be more efficient:
means = data2.groupby('voteChoice')['socialIdeology2'].mean()
to drill down to the mean you're looking for. (The first version computes means for every numeric column.) This assumes that voteChoice is the name of the column you want to condition on.
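What groupby('voteChoice')['socialIdeology2'].mean() computes can be sketched in plain Python: bucket the rows by the grouping value and average one column per bucket. The rows are hypothetical; only the two column names come from the answer.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; only the column names come from the question.
data2 = [
    {"voteChoice": "A", "socialIdeology2": 2},
    {"voteChoice": "B", "socialIdeology2": 5},
    {"voteChoice": "A", "socialIdeology2": 4},
    {"voteChoice": "B", "socialIdeology2": 6},
]

# Bucket socialIdeology2 by voteChoice, then average each bucket.
buckets = defaultdict(list)
for row in data2:
    buckets[row["voteChoice"]].append(row["socialIdeology2"])
means = {choice: mean(vals) for choice, vals in buckets.items()}
print(means)
```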
How to calculate the mean in R with several conditions
This will compute stim_ending_t (6) x modality (3) = 18 group means.
First I generate some data like your analysis_v or analysis_a data frames:
library(dplyr)
library(tidyr)
analysis_v <- data.frame(stim_ending_t = rep(seq(1, 3.5, 0.5), each = 30),
visbility = rep(c(1, 0, 0), 60),
soundvolume = rep(c(0, 1, 0), 60),
key_resp_2.rt = runif(180, 1, 5))
Then I pipe the object into the code block:
analysis_v %>%
group_by(stim_ending_t, visbility, soundvolume) %>%
summarize(average = mean(key_resp_2.rt)) %>%
ungroup() %>%
mutate(key = case_when(visbility == 0 & soundvolume == 0 ~ "blank",
visbility == 0 & soundvolume == 1 ~ "only_sound",
visbility == 1 & soundvolume == 0 ~ "only_images")) %>%
select(-visbility, -soundvolume) %>%
spread(key, average)
Which results in the requested output format (your numbers will differ, because the example data come from runif() without a seed):
# A tibble: 6 x 4
stim_ending_t blank only_images only_sound
<dbl> <dbl> <dbl> <dbl>
1 1 3.28 3.55 2.84
2 1.5 2.64 3.11 2.32
3 2 3.27 3.72 2.42
4 2.5 2.14 3.01 2.30
5 3 2.47 3.03 3.02
6 3.5 2.93 2.92 2.78
You would need to repeat the code block using analysis_a to get those means.
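The group-then-reshape idea can be sketched in plain Python: average key_resp_2.rt for each (stim_ending_t, modality) pair, then lay the results out with one row per stim_ending_t and one column per modality. The rows are a tiny invented sample; the modality labels follow the case_when() mapping.

```python
from collections import defaultdict
from statistics import mean

# Map (visbility, soundvolume) to a label, as the case_when() call does.
MODALITY = {(0, 0): "blank", (0, 1): "only_sound", (1, 0): "only_images"}

# Hypothetical rows: (stim_ending_t, visbility, soundvolume, key_resp_2.rt).
rows = [
    (1.0, 1, 0, 3.0), (1.0, 1, 0, 4.0),
    (1.0, 0, 1, 2.0), (1.0, 0, 0, 3.5),
    (1.5, 1, 0, 3.1), (1.5, 0, 1, 2.5), (1.5, 0, 0, 2.6),
]

# group_by + summarize: mean reaction time per (stim_ending_t, modality).
groups = defaultdict(list)
for t, vis, snd, rt in rows:
    groups[(t, MODALITY[(vis, snd)])].append(rt)

# spread: one row per stim_ending_t, one column per modality.
wide = defaultdict(dict)
for (t, label), rts in groups.items():
    wide[t][label] = mean(rts)

for t in sorted(wide):
    print(t, wide[t])
```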
Calculate conditional mean in R with dplyr (like group by in SQL)
I think what you are looking for (if you want to use dplyr) is a combination of the functions group_by and mutate.
library(dplyr)
city <- c("a", "a", "b", "b", "c")
temp <- 1:5
df <- data.frame(city, temp)
df %>% group_by(city) %>% mutate(mean(temp))
Which would output:
city temp mean(temp)
(fctr) (int) (dbl)
1 a 1 1.5
2 a 2 1.5
3 b 3 3.5
4 b 4 3.5
5 c 5 5.0
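Unlike summarize(), mutate() keeps every row and attaches its group's mean. A plain-Python sketch of that, using the same city/temp data as the R example, computes the per-group means first and then looks up each row's mean.

```python
from collections import defaultdict
from statistics import mean

# Same data as the R example.
city = ["a", "a", "b", "b", "c"]
temp = [1, 2, 3, 4, 5]

# First pass: mean temp per city (the group_by step).
by_city = defaultdict(list)
for c, t in zip(city, temp):
    by_city[c].append(t)
city_mean = {c: mean(ts) for c, ts in by_city.items()}

# Second pass: attach the group mean to every row (the mutate step).
rows = [(c, t, city_mean[c]) for c, t in zip(city, temp)]
for row in rows:
    print(row)
```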
On a side note, I do not think 50,000 rows is that big a data set for dplyr. I would not worry too much unless this code is going to run inside a loop or you have more than a million rows. As Heroka suggested in the comments, data.table is usually the faster option when performance matters.
Python need to get the average or mean of a column of data when the value in a different column is between two values
Use the pandas between() method:
df.loc[df['ColB'].between(7, 8), 'ColA'].mean()
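Series.between(7, 8) is inclusive at both ends by default, so the expression averages ColA over rows where 7 <= ColB <= 8. The same filter in plain Python, with invented values:

```python
from statistics import mean

# Hypothetical column values.
col_a = [10.0, 20.0, 30.0, 40.0]
col_b = [6.5, 7.0, 8.0, 8.5]

# Inclusive on both ends, like Series.between(7, 8) with default settings.
selected = [a for a, b in zip(col_a, col_b) if 7 <= b <= 8]
print(mean(selected))
```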