Conditional Mean Statement

If you want to exclude the non-smokers, you have a few options. The easiest is probably this:

mean(bwght[bwght$cigs>0,"cigs"])

With a data frame, the first index selects rows and the second selects columns. So, you can subset using dataframe[1,2] to get the first row, second column. You can also use a logical condition in the row position. By using bwght$cigs>0 as the first element, you keep only the rows where cigs is greater than zero.

Your other ones didn't work for the following reasons:

mean(bwght$cigs| bwght$cigs>0)

This is effectively a logical comparison. You're asking for the TRUE / FALSE result of bwght$cigs OR bwght$cigs>0, and then taking the mean of that. mean() does accept logicals (TRUE counts as 1 and FALSE as 0), so what you get back is the proportion of rows where the condition holds, not the mean of cigs.

mean(bwght$cigs>0 | bwght$cigs=TRUE)

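Same problem. You use the | operator, which returns a logical vector, so at best R would be taking the mean of logicals. On top of that, a single = is not a comparison operator (that would be ==), so the line should fail before mean() is even called.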

if(bwght$cigs > 0){sum(bwght$cigs)}

By any chance, were you a SAS programmer originally? This looks like how I used to type at first. Basically, if() doesn't work the same way in R as it does in SAS. In that example, you are using bwght$cigs > 0 as the if condition, which won't work because if() expects a single TRUE or FALSE, not a whole vector. Older versions of R looked only at the first element of bwght$cigs > 0 and gave a warning; R 4.2.0 and later raise an error instead. R handles row-wise work differently from SAS: most operations are vectorized, so check out logical subsetting and functions like lapply, tapply, and so on.

x <-as.numeric(bwght$cigs, rm="0")
mean(x)

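As far as I can tell, this doesn't filter anything. as.numeric() has no rm argument, so the extra rm="0" is silently ignored, x ends up being cigs unchanged, and mean(x) is just the ordinary mean including the zeros. Dropping the quotes wouldn't change that.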

Conditional Mean Statement Across Dataframe

The apply-family of functions is useful here:

sapply(df, function(x) mean(x[x > 6], na.rm = TRUE))
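
For readers doing the same thing in pandas, here is a rough counterpart to the sapply call above. It is only a sketch: the data frame and its column names x, y, z are made up for illustration, and only the threshold 6 is taken from the R example.

```python
import pandas as pd

# Hypothetical numeric data frame; the threshold 6 matches the R example.
df = pd.DataFrame({
    "x": [1, 7, 9],
    "y": [8, 2, 10],
    "z": [3, 4, 5],
})

# where() keeps values that satisfy the condition and sets the rest to NaN;
# mean() skips NaN by default, so each column is averaged over values > 6 only.
col_means = df.where(df > 6).mean()
print(col_means)
```

A column with no values above the threshold (z here) comes back as NaN rather than raising an error.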

SQL: Conditional mean based on groups

You can calculate the median as an analytic function and then aggregate:

select group_id,
       avg(case when value < value_median then value end) as avg_below_median
from (select t.*,
             median(value) over (partition by group_id) as value_median
      from t
     ) t
group by group_id;

Note: You could filter using a where clause as well:

select group_id, avg(value) as avg_below_median
from (select t.*,
             median(value) over (partition by group_id) as value_median
      from t
     ) t
where value < value_median
group by group_id;

But the first method makes it simpler to add other expressions.
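
If the data lives in pandas rather than a database, the same "mean of the values below the group median" idea can be sketched as follows. The table contents are invented for illustration; only the names t, group_id, value, value_median and avg_below_median are carried over from the SQL.

```python
import pandas as pd

# Hypothetical table t with the column names used in the SQL above.
t = pd.DataFrame({
    "group_id": [1, 1, 1, 2, 2, 2],
    "value": [1.0, 2.0, 9.0, 10.0, 20.0, 60.0],
})

# Per-row group median, analogous to median(value) over (partition by group_id).
value_median = t.groupby("group_id")["value"].transform("median")

# Keep rows strictly below their group's median, then average per group.
avg_below_median = (
    t[t["value"] < value_median].groupby("group_id")["value"].mean()
)
print(avg_below_median)
```

This mirrors the where-clause version: a group with no values below its median simply drops out of the result, whereas the case-expression query would still return that group with a NULL average.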

How to calculate the mean with conditions?

Here's a quick data.table solution (assuming coef is a):

library(data.table)
setDT(df)[, .(MeanASmall = mean(b[-40 <= a & a < 0]),
              MeanABig = mean(b[0 <= a & a <= 40]))]
# MeanASmall MeanABig
# 1: 33.96727 89.46

If the values of a are always within the range -40 to 40, you could do this quickly with base R too:

sapply(split(df, df$a >= 0), function(x) mean(x$b))
# FALSE TRUE
# 33.96727 89.46000

Conditional mean over a Pandas DataFrame

Conditional mean is indeed a thing in pandas. You can use DataFrame.groupby():

means = data2.groupby('voteChoice').mean()

or maybe, in your case, the following would be more efficient:

means = data2.groupby('voteChoice')['socialIdeology2'].mean()

to drill down to the mean you're looking for. (The first case will calculate means for all columns.) This is assuming that voteChoice is the name of the column you want to condition on.
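
A minimal sketch of the groupby approach on made-up data. Only the names data2, voteChoice and socialIdeology2 come from the answer above; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: only the column names come from the question.
data2 = pd.DataFrame({
    "voteChoice": ["A", "A", "B", "B", "B"],
    "socialIdeology2": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# Mean of socialIdeology2 within each voteChoice group
# (returns a Series indexed by voteChoice).
means = data2.groupby("voteChoice")["socialIdeology2"].mean()
print(means)

# The conditional mean for a single group can also be taken with a boolean mask.
mean_a = data2.loc[data2["voteChoice"] == "A", "socialIdeology2"].mean()
print(mean_a)
```

The boolean-mask form is handy when you only care about one condition; groupby is the better fit when you want the mean for every group at once.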

How to calculate the mean in R with several conditions

This will compute stim_ending_t (6) x modality (3) = 18 group means.

First I generate some data like your analysis_v or analysis_a data frames:

library(dplyr)
library(tidyr)

analysis_v <- data.frame(stim_ending_t = rep(seq(1, 3.5, 0.5), each = 30),
                         visbility = rep(c(1, 0, 0), 60),
                         soundvolume = rep(c(0, 1, 0), 60),
                         key_resp_2.rt = runif(180, 1, 5))

Then I pipe the object into the code block:

analysis_v %>%
  group_by(stim_ending_t, visbility, soundvolume) %>%
  summarize(average = mean(key_resp_2.rt)) %>%
  ungroup() %>%
  mutate(key = case_when(visbility == 0 & soundvolume == 0 ~ "blank",
                         visbility == 0 & soundvolume == 1 ~ "only_sound",
                         visbility == 1 & soundvolume == 0 ~ "only_images")) %>%
  select(-visbility, -soundvolume) %>%
  spread(key, average)

Which results in the requested output format:

# A tibble: 6 x 4
  stim_ending_t blank only_images only_sound
          <dbl> <dbl>       <dbl>      <dbl>
1           1    3.28        3.55       2.84
2           1.5  2.64        3.11       2.32
3           2    3.27        3.72       2.42
4           2.5  2.14        3.01       2.30
5           3    2.47        3.03       3.02
6           3.5  2.93        2.92       2.78

You would need to repeat the code block using analysis_a to get those means.

Calculate conditional mean in R with dplyr (like group by in SQL)

I think what you are looking for (if you want to use dplyr) is a combination of the functions group_by and mutate.

library(dplyr)
city <- c("a", "a", "b", "b", "c")
temp <- 1:5
df <- data.frame(city, temp)

df %>% group_by(city) %>% mutate(mean(temp))

Which would output:

    city  temp mean(temp)
  (fctr) (int)      (dbl)
1      a     1        1.5
2      a     2        1.5
3      b     3        3.5
4      b     4        3.5
5      c     5        5.0

On a side note, I do not think 50,000 rows is that big of a data set for dplyr. I would not worry too much unless this code is going to be inside some kind of loop or you have 1M+ rows. As Heroka suggested in the comments, data.table is a better alternative when it comes to performance in most cases.

Python need to get the average or mean of a column of data when the value in a different column is between two values

Use the pandas Series.between method, which includes both endpoints by default:

df.loc[df['ColB'].between(7, 8), 'ColA'].mean()
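
Here is a small worked sketch of that one-liner. ColA and ColB are the column names from the answer; the values are invented so the effect of the inclusive bounds is visible.

```python
import pandas as pd

# Hypothetical data; ColA and ColB are the column names from the answer above.
df = pd.DataFrame({
    "ColA": [10.0, 20.0, 30.0, 40.0],
    "ColB": [6.5, 7.0, 7.5, 8.5],
})

# between() is inclusive on both ends by default, so ColB == 7.0 is kept.
mask = df["ColB"].between(7, 8)

# Average ColA over only the rows where ColB falls in [7, 8].
result = df.loc[mask, "ColA"].mean()
print(result)
```

Only the second and third rows pass the filter here, so the result is the mean of 20.0 and 30.0.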

