Sum Object in a Column Between an Interval Defined by Another Column

Sum object in a column between an interval defined by another dataframe

Try:

mapply(myfn, C, D)
# [1] 7 31 12

The problem is that your function is not naturally vectorized. You can see that because your return value is a sum of the inputs, and sum is not a vectorized operation.

Beyond that, if you look at myfn, the expression A < c & A >= d doesn't make sense when c and d have more than one value. There, you are comparing each value in df to the corresponding value in your C and D vectors (so first value to first, second to second, etc.), instead of comparing all the values in df to each value in C and D in turn.

By using mapply, I'm basically looping through your function with as arguments a single value from C and D at a time.

Fortunately in your case it turns out that C,D have different number of elements than df, so you actually got a warning. If they were the same length you would not have gotten a warning and you would have gotten a single value answer, instead of the three you are presumably looking for.

There are better ways to do this, but the mapply approach is pretty trivial here and works with your code pretty much as is.

Sum object in a column between an interval defined by another column

You're almost there.

with(my.df, sum(my.df[A >= 1 & A < 3, "B"]))

EDIT

Chase challenged me to explain away the code at hand. When reading R code, it's best to read from inwards out. Here we go.

my.df is a data.frame (think Excel sheet) with columns A and B. Square brackets [] are used to subset anything from this object like so: [rows, columns]. For example, [1, ] would return the entire first row, and if you add a column number (or column name), you get value in first row in that column (e.g. [1, 2], where you would get value in the first row of the second column).
We will now subset rows in my.df with A >= 1 & A < 3. What we're saying here is we want to see those rows that have values in A bigger or equal to 1 and smaller than 3. This will give us all rows that satisfy this criterion. If we add , "B" it means we want to select column B. Since we already subsetted the result to contain only rows that fit the above criterion, by entering column name B we get values only in the column. Once you have those values from column B you sum them using sum().
Use of with function is there to make our life easier. If we hadn't used that, we would be forced to call columns by their full name my.df$A and my.df$B.

R - How to sum objects in a column between an interval defined by conditions on another column

Try the following, assuming your original dataframe is df:

df2 <- df # create a duplicate df to destroy
z <- data.frame(nrow=length(unique(df$A)), ncol=2) # output dataframe
names(z) <- c("A","B")
j <- 1 # output indexing variable
u <- unique(df$A) # unique vals of A
i <- u[1]
s <- TRUE # just for the while() loop
while(s){
z[j,] <- c(i,sum(df2[df2$A %in% c(i-1,i,i+1),2]))
df2 <- df2[!df2$A %in% c(i-1,i,i+1),]
j <- j + 1 # index the output
u <- u[!u %in% c(i-1,i,i+1)] # cleanup the u vector
if(length(u)==0) # conditionally exit the loop
s <- FALSE
else
i <- min(u) # reset value to sum by
}

I know that's kind of messy code, but it's a sort of tough problem given all of the different indices.

Why can't I sum values ​for a given period with intervals present in another table?

as I understood the question, all you need is to show for how many hours a user rented something divided by user's statuses (free, VIP, VIP+)

If so, you're almost there. I believe the problem is in "r.date between" condition.

when I changed it to

and r.date between h.date_from and h.date_to

the query gave me answer which seems to be correct.

PS. Not sure if you need to use left join here. As far as I can understand the logic, I think a regular join suits better

UPD: would this help?

SELECT p.name, 
h.group_name,
sum(case when r.type = '10' then r.hours else 0 end)
FROM client p
left join history h
on h.user_cid = p.c_id
join rent r
on r.owner_id = p.c_id
and r.date between h.date_from and h.date_to
where p.c_id = 'poa'
and r.date between '2021-01-01' and '2021-06-30'
group by p.name, h.group_name

UPD2: Are you sure you have tested on same data as in the question? On my test system it gave exactly result you needed

result

UPD3: here, check the result with some additional test data
Note for simplicity reason I have commented date limitation clause

result with additional data

How to sum the variables according to the times it falls into t1[i] t[j] t2[i]

k and j are the same in your loops, and the inner loop can be replaced with a vectorized version:

t3 <- head(t3,-1) # editing the error the OP left in place
nint <- length(t1)
N <- vector('list',nint)
Sums <- vector('integer',nint)
for (i in 1:nint){
N[[i]] <- numbers[which(findInterval(t3,c(t1[i],t2[i]))==1)]
Sums[i] <- sum(N[[i]])
}

Comment 1. This gives the same result as @bmoore's, with the numbers stored in N and then summed in Sums. You need N to be a list to do what you were intending, I think, while this line

N[i] <- numbers[i]

was overwriting a single value, instead of adding it to the vector as @holgrich did with c(N,numbers[i]).

Comment 2. findInterval can do unexpected things when t3 equals either t1[i] or t2[i], so you could instead use which(t3 > t1[i] & t3 < t2[i]) to state the inequalities explicitly.

Comment 3. Going without loops entirely, as in @bmoore's answer, is the more standard thing to do in R.

How to calculate the sum of values between a range of time in R?

You need to supply an origin to convert those numerics to Date Time objects. You will get an error otherwise telling you to do so. Afterwards, cutting based on this variable is simple.

cuts <- as.Date(date.per.year, origin = as.Date("1970/1/1"))
binned <- cut(df$Dates,
breaks = cuts)

N.B. Breakpoints are inclusive, so your df$Dates will be NA for the first and last several values.

You will notice that, for example, the unique levels of this date-time factor are

 unique(binned)
[1] <NA> 2003-02-27 2004-02-12 2005-02-01 2006-02-17
[6] 2007-02-20 2008-02-21 2009-03-18 2010-02-12 2011-02-23
[11] 2012-03-09 2013-03-28
11 Levels: 2003-02-27 2004-02-12 2005-02-01 ... 2013-03-28

As per group-wise sum, there are thousands of Stack Overflow posts which may help you accomplish this. You could for example,

  df %>% mutate(binned = cut(Dates, breaks =cuts)) %>% 
group_by(binned) %>% summarize(sum(rain))

# A tibble: 12 x 2
binned sum(rain)
<fctr> <dbl>
1 2003-02-27 7.996658
2 2004-02-12 -11.950646
3 2005-02-01 30.443479
4 2006-02-17 19.687989
5 2007-02-20 -2.088648
6 2008-02-21 33.837560
7 2009-03-18 -5.039810
8 2010-02-12 -5.235960
9 2011-02-23 -9.806273
10 2012-03-09 -3.887545
11 2013-03-28 30.446548
12 NA 36.634249

remember that the NA in row 12 represents the total sum of rain both before 2003-02-27 and after 2013-03-28.



Related Topics



Leave a reply



Submit