R: How Many Elements Satisfy a Condition

R: how many elements satisfy a condition?

If z consists only of TRUE or FALSE values, then simply:

length(which(z))
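
For example, with a hypothetical logical vector z:

z <- c(TRUE, FALSE, TRUE, TRUE, NA)
length(which(z))   # 3 -- which() ignores the NA and counts only the TRUE positions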

How to count how many elements satisfy a condition in an idiomatic way?

Suggestion 1: A slightly more idiomatic way would be to replace

length(data[data <= myoffsets[i]])

with

sum(data <= myoffsets[i])

This way you don't end up taking a subset of data for each value in myoffsets, only to compute its length and discard it.
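
For example, with a made-up vector x standing in for data, both expressions give the same count, but sum() skips building the intermediate subset:

x <- c(3, 7, 12, 18, 25)
length(x[x <= 12])   # 3
sum(x <= 12)         # 3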

Suggestion 2: The c() in the for loop is redundant. The following does exactly the same with fewer keystrokes: for(i in 1:length(myoffsets)).

Lastly, if you prefer to get rid of the explicit loop, something like this might be to your taste:

myres$x <- sapply(myoffsets, function(o) sum(data <= o))
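
Putting it together, here is a minimal self-contained sketch; data, myoffsets and myres are made-up stand-ins for the objects in the original question:

set.seed(42)
data <- runif(1000)                       # made-up data
myoffsets <- c(0.25, 0.5, 0.75)           # made-up offsets
myres <- data.frame(offset = myoffsets, x = NA_real_)

# loop version, counting with sum() on the logical vector
for (i in 1:length(myoffsets)) {
  myres$x[i] <- sum(data <= myoffsets[i])
}

# equivalent vectorised version without the explicit loop
myres$x <- sapply(myoffsets, function(o) sum(data <= o))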

Summarize count with a condition

Just change the last line:

df %>%
  group_by(year) %>%
  summarize(number_quadrats = n(),                              # total number of rows per group
            average_count = mean(count_numeric, na.rm = TRUE),  # average value
            number_p = sum(count == "p"))                       # rows where count equals "p"

By summing a logical (boolean) vector, you are essentially counting the number of times the condition is met: each TRUE counts as 1 and each FALSE as 0.
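
For example, with made-up values:

count <- c("p", "3", "p", "1")
sum(count == "p")   # 2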

What's the fastest way to find the number of elements that satisfy a condition?

Firstly, pre-compute values you use more than once, for example, r**2. With r=1000, this made it about 30% faster for me (~1.1s → ~.85s total run time).

r_sq = r**2

Next, to save memory, don't build a list from a filter when all you need to know is its length. Instead, sum over a map or, better yet, a generator expression:

q = sum(x**2 + y**2 <= r_sq for x, y in quardrant)
return (4*q - 4*r + 4) / r_sq

This saves a bit of time by not constructing a list, but as a bonus, using unpacking instead of indexing also saves a surprising amount of time -- about 7% for me (~.74s → ~.69s total run time).


Next, coming back to the first point, if you think about it, you're getting x and y from a product, which means you're calculating the square of each number 0..r, 2*r times. It'd be faster to calculate the squares ahead of time.

from itertools import product

quardrant_sq = product((x**2 for x in range(r+1)), repeat=2)
q = sum(a + b <= r_sq for a, b in quardrant_sq)

This gives a massive improvement, about 250% faster! (~.66s → ~.19s total run time).


Lastly, since you're dealing with only numbers, you could look into using NumPy to further optimize your code.

Remove elements from a list by condition

We need to extract the column within the loop. LDF is a list of data.frames/tibbles, so LDF$Value doesn't exist:

i1 <- sapply(LDF, function(x) sum(x$Value)) > 0
LDF[i1]

-output

[[1]]
# A tibble: 18 x 2
   Date           Value
   <date>         <dbl>
 1 2021-05-18   120000
 2 2021-05-20    40000
 3 2021-05-31    55000
 4 2021-05-31      -11.4
 5 2021-06-01  -115092.
 6 2021-06-09    30000
 7 2021-06-17    98400
 8 2021-07-01     1720
 9 2021-07-01    50000
10 2021-07-01   -50063.
11 2021-07-12    -2503.
12 2021-07-13   -20022.
13 2021-08-09    28619.
14 2021-08-25    45781.
15 2021-09-01    14954.
16 2021-09-10    -6017.
17 2021-09-15    -3311.
18 2021-09-16  -140373.

To check the elements that are deleted, negate (!) the logical vector:

which(!i1)

gives the positions, and

LDF[!i1]

returns the dropped elements.

Or use Filter as well:

Filter(\(x) sum(x$Value) > 0, LDF)

Or with keep from purrr

library(purrr)
keep(LDF, ~ sum(.x$Value) > 0)

Or the opposite, discard, returns the elements that would be dropped:

discard(LDF, ~ sum(.x$Value) > 0)
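
As a self-contained illustration, here is a minimal sketch with a made-up LDF of two small tibbles (values invented, not the data from the question):

library(tibble)
LDF <- list(
  tibble(Date = as.Date(c("2021-05-18", "2021-06-01")), Value = c(100, -20)),
  tibble(Date = as.Date(c("2021-07-01", "2021-07-02")), Value = c(-50, 10))
)

i1 <- sapply(LDF, function(x) sum(x$Value)) > 0
LDF[i1]       # keeps only the first tibble (sum = 80)
which(!i1)    # 2 -- the position of the element that is dropped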

Return table with count of elements that match a condition

Subset the data for the "PASS" value and then use table:

temp <- subset(df, Outcome == 'PASS')
table(temp$Participant, temp$Trial)

#       T01 T02 T03
#   P01   2   0   1
#   P02   1   2   1
#   P03   0   2   1
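
As a self-contained sketch, assuming a data frame shaped like the one in the question (values made up, so the counts differ from the table above):

df <- data.frame(
  Participant = c("P01", "P01", "P02", "P02", "P03"),
  Trial       = c("T01", "T02", "T01", "T02", "T01"),
  Outcome     = c("PASS", "FAIL", "PASS", "PASS", "FAIL")
)

temp <- subset(df, Outcome == 'PASS')
table(temp$Participant, temp$Trial)

#       T01 T02
#   P01   1   0
#   P02   1   1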

