Adding New Column with Diff() Function When There Is One Less Row in R


Here are two approaches. Both put an NA in the first row of diff_qsec and put diff(qsec) in the remaining rows:

library(dplyr)  
mtcars %>% mutate(diff_qsec = qsec - lag(qsec)) # dplyr has its own version of lag

transform(mtcars, diff_qsec = c(NA, diff(qsec)))
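For reference, here is a base-R-only sketch on a small illustrative qsec vector (not the full mtcars column). It shows that prepending NA to diff() and subtracting a manually lagged copy give the same column. It also shows why stats::lag() cannot stand in for dplyr's lag() here: on a plain vector, stats::lag() only shifts the time-series attribute and leaves the values where they are.

```r
# Small illustrative qsec vector (values chosen for this sketch)
df <- data.frame(qsec = c(16.46, 17.02, 18.61, 19.44))

# Approach 1: prepend NA so diff() has as many elements as there are rows
df$diff_qsec <- c(NA, diff(df$qsec))

# Approach 2: subtract a manually lagged copy (shifted down by one, NA on top)
prev_qsec <- c(NA, head(df$qsec, -1))
df$diff_qsec2 <- df$qsec - prev_qsec

# stats::lag() only changes the "tsp" time attribute; the values stay in place,
# so it cannot replace dplyr::lag() in this calculation
stopifnot(identical(as.vector(stats::lag(df$qsec)), df$qsec))

print(df)
```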

Also, on the general issue of padding see: How can I pad a vector with NA from the front?

Calculate derivative diff() and keep length - add NA

From this answer to a question of mine.

If you are looking for a generic way to prepend NA:

pad <- function(x, n) {
    # number of NAs needed to bring x up to length n
    len.diff <- n - length(x)
    c(rep(NA, len.diff), x)
}

x <- 1:10
dif <- pad(diff(x, lag=1), length(x))
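Because pad() fills up to whatever length n is requested, the same helper also covers larger lags and higher-order differences. The following is a small sketch with made-up values. pad() is repeated so the block runs on its own.

```r
# pad() from above, repeated so this sketch is self-contained
pad <- function(x, n) {
  len.diff <- n - length(x)  # how many NAs are needed in front
  c(rep(NA, len.diff), x)
}

x <- c(1, 4, 9, 16, 25)

d_lag1   <- pad(diff(x, lag = 1), length(x))          # NA 3 5 7 9
d_lag2   <- pad(diff(x, lag = 2), length(x))          # NA NA 8 12 16
d_order2 <- pad(diff(x, differences = 2), length(x))  # NA NA 2 2 2

print(rbind(x, d_lag1, d_lag2, d_order2))
```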

But if you don't mind bringing in the zoo library, it is more concise to do:

library(zoo)
x <- 1:5
as.vector(diff(zoo(x), na.pad=TRUE)) # convert x to zoo first, then diff (that invokes zoo's diff which takes a na.pad=TRUE)
# NA 1 1 1 1 (same length as original x vector)

Calculate difference between values in consecutive rows by group

The package data.table can do this fairly quickly, using the shift function.

library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
# setDT(df)  # use this instead if df is already a data.frame

df[ , diff := value - shift(value), by = group]    
#   group value diff
#1:     1    10   NA
#2:     1    20   10
#3:     1    25    5
#4:     2     5   NA
#5:     2    10    5
#6:     2    15    5
setDF(df) #if you want to convert back to old data.frame syntax

Or, using the lag() function in dplyr:

library(dplyr)
df %>%
    group_by(group) %>%
    mutate(Diff = value - lag(value))
#   group value  Diff
#   <dbl> <dbl> <dbl>
# 1     1    10    NA
# 2     1    20    10
# 3     1    25     5
# 4     2     5    NA
# 5     2    10     5
# 6     2    15     5
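Without data.table or dplyr, base R's ave() can compute the same grouped difference. It applies FUN within each group and returns a vector aligned with the original rows. This is a minimal sketch on the same example data, and like the versions above it assumes the rows are already in the right order within each group.

```r
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))

# ave() splits value by group, applies FUN to each piece, and writes the
# results back into the original row positions
df$diff <- ave(df$value, df$group, FUN = function(v) c(NA, diff(v)))

print(df)  # diff column: NA 10 5 NA 5 5
```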


R apply() custom function to every row in data frame

Another approach is to modify your existing function so that it is vectorised.

t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = FALSE)
{
    if (!equal.variance)
    {
        se <- sqrt( (s1^2/n1) + (s2^2/n2) )
        # Welch-Satterthwaite df
        df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
    } else
    {
        # pooled standard deviation, scaled by the sample sizes
        se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
        df <- n1+n2-2
    }
    t <- (m1-m2-m0)/se
    # one column per input element: difference, SE, t statistic, two-tailed p-value
    # (for a one-tailed test, use a single tail of pt() without the factor 2)
    dat <- vapply(seq_along(m1),
                  function(x){c(m1[x]-m2[x], se[x], t[x], 2*pt(-abs(t[x]), df[x]))},
                  numeric(4))
    dat <- t(dat)  # one row per input; base::t() is still found despite the local numeric `t`
    dat <- as.data.frame(dat)
    names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
    return(dat)
}

This approach lets you pass in vectors for each of the inputs and returns a data frame with one row per element. vapply() builds a length-4 numeric vector for each index, which gives a 4-by-n matrix that is then transposed into rows.

With this version, you can simply call:

t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)

(or whatever you end up calling your variables)
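Every operation on the summary statistics is already element-wise, so the vapply() step is optional. A minimal sketch of the Welch branch written directly on vectors follows. welch_summary and the sample data are hypothetical, made up for this illustration. The result is cross-checked against stats::t.test(), whose default is the Welch test.

```r
# Hypothetical helper: Welch branch of t.test2 without vapply()
welch_summary <- function(m1, m2, s1, s2, n1, n2, m0 = 0) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite
  t_stat <- (m1 - m2 - m0) / se
  data.frame(`Difference of means` = m1 - m2,
             `Std Error` = se,
             t = t_stat,
             `p-value` = 2 * pt(-abs(t_stat), df),  # two-tailed
             check.names = FALSE)
}

# Two made-up pairs of samples, summarised into one row each
a1 <- c(5.1, 4.9, 6.2, 5.8, 6.0); b1 <- c(4.0, 4.5, 5.1, 4.2)
a2 <- c(12, 15, 11, 14);          b2 <- c(13, 16, 15, 17, 18)

res <- welch_summary(m1 = c(mean(a1), mean(a2)), m2 = c(mean(b1), mean(b2)),
                     s1 = c(sd(a1), sd(a2)),     s2 = c(sd(b1), sd(b2)),
                     n1 = c(length(a1), length(a2)),
                     n2 = c(length(b1), length(b2)))
print(res)

# Cross-check against stats::t.test() on the raw samples
tt1 <- t.test(a1, b1)
tt2 <- t.test(a2, b2)
```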

How to drop observations with inter-row difference being less than a specific value

If I understood correctly, you could do:

library(data.table)

z <- z[, filt := min(x), by = cumsum(c(1, +(x >= shift(x) + 0.5)[-1]))][
  , filt := ifelse(x == filt, shift(x, fill = x[1]), filt)][
  x - filt >= 0.5 | x == filt, ][
  , filt := NULL]

Explanation:

First, we calculate the minimum of x within each group.
The groups are created by cumsum(c(1, +(x >= shift(x) + 0.5)[-1])). For each row, x >= shift(x) + 0.5 checks whether x is at least 0.5 larger than the previous row, and the + sign turns the resulting TRUE/FALSE into 1/0. The first element is always NA (there is no previous row), so [-1] drops it, and a leading 1 is prepended to restore the original length. cumsum() then starts at 1 and increases by 1 every time a row is at least 0.5 above its predecessor; rows in between keep the current group number.
Next, each row gets a reference value in filt. Rows where x equals filt (the first row of each group, including groups with only one row) are compared with the immediately preceding row, via shift(x, fill = x[1]). All other rows are compared with the first row of their group, i.e. the last row that must be kept because it was at least 0.5 above its predecessor.
Finally, we keep the rows where x - filt >= 0.5, plus rows where x == filt (this always holds for the very first row, whose filt is filled with x[1]), and drop the helper column filt.

Output:

      x          t
1: 10.0 1970-01-28
2: 10.5 1970-02-02
3: 11.1 1970-02-03
4: 14.0 1970-02-04
5: 14.6 1970-02-08
6: 17.0 1970-02-09
7: 30.0 1970-02-11
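To see the steps on a tiny input without data.table, here is a base R translation of the same logic on made-up values, assumed sorted. shift(x) is replaced by c(NA, head(x, -1)) and the grouped min() by ave(). It is an illustrative sketch, not the original answer's code.

```r
x <- c(10, 10.2, 10.9, 11.0, 12.0)  # made-up values

prev <- c(NA, head(x, -1))                         # stand-in for shift(x)
grp  <- cumsum(c(1, +(x >= prev + 0.5)[-1]))       # new group on a jump >= 0.5
filt <- ave(x, grp, FUN = min)                     # filt := min(x) by group
filt <- ifelse(x == filt, c(x[1], head(x, -1)), filt)  # shift(x, fill = x[1])
kept <- x[x - filt >= 0.5 | x == filt]

print(grp)   # 1 1 2 2 3
print(kept)  # 10.0 10.9 12.0
```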
