Adding new column with diff() function when there is one less row in R
Here are two approaches. Both put an NA in the first row of diff_qsec and put diff(qsec) in the remaining rows:
library(dplyr)
mtcars %>% mutate(diff_qsec = qsec - lag(qsec)) # dplyr has its own version of lag
transform(mtcars, diff_qsec = c(NA, diff(qsec))) # base R: pad diff() with a leading NA
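If you need this for several columns, a small base-R helper keeps the padding in one place. This is only a sketch: `add_diff_col` and its `new_name` argument are hypothetical names, not part of base R or dplyr. It uses the same c(NA, diff(...)) padding as the transform() line above.

```r
# Hypothetical helper (not from base R or dplyr): append a column holding the
# row-to-row difference of `col`, padded with a leading NA so it keeps nrow(df) rows.
add_diff_col <- function(df, col, new_name = paste0("diff_", col)) {
  df[[new_name]] <- c(NA, diff(df[[col]]))
  df
}

out <- add_diff_col(mtcars, "qsec")
head(out[, c("qsec", "diff_qsec")], 3)
```

The result is the same as subtracting a copy of the column shifted down by one row, which is what dplyr's lag() does in the first approach.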
Also, on the general issue of padding see: How can I pad a vector with NA from the front?
Calculate derivative diff() and keep length - add NA
From this answer to a question of mine.
If you are looking for a generic way to prepend NAs:
pad <- function(x, n) {
len.diff <- n - length(x)
c(rep(NA, len.diff), x)
}
x <- 1:10
dif <- pad(diff(x, lag=1), length(x))
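Because diff(x, lag = k) returns length(x) - k values, the same pad() prepends k NAs for any lag. The sketch below also adds a stopifnot() guard; that guard is an assumption added here, not part of the original answer, so that an n smaller than length(x) fails clearly instead of passing a negative count to rep().

```r
# pad() as above, plus an added guard (an assumption, not in the original):
# n must be at least length(x), otherwise rep() would get a negative count.
pad <- function(x, n) {
  len.diff <- n - length(x)
  stopifnot(len.diff >= 0)
  c(rep(NA, len.diff), x)
}

x <- c(2, 5, 9, 14, 20)
d2 <- pad(diff(x, lag = 2), length(x))  # lag 2 drops two values, so two NAs are prepended
d2
# NA NA 7 9 11
```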
But if you don't mind bringing in the zoo library, it's better to do:
library(zoo)
x <- 1:5
as.vector(diff(zoo(x), na.pad = TRUE)) # convert x to zoo first so diff() uses zoo's method, which accepts na.pad = TRUE
# NA 1 1 1 1 (same length as original x vector)
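If you would rather not depend on zoo, similar padding can be done in base R. The sketch assumes a hypothetical helper, diff_pad(), that passes diff()'s own lag and differences arguments through; since diff() returns length(x) - lag * differences values, the gap is filled with leading NAs. It has not been checked against every option of zoo's diff method.

```r
# Hypothetical base-R helper: diff() padded with leading NAs so the result has length(x).
diff_pad <- function(x, lag = 1, differences = 1) {
  d <- diff(x, lag = lag, differences = differences)
  c(rep(NA, length(x) - length(d)), d)   # one NA per value that diff() dropped
}

diff_pad(1:5)                                   # NA 1 1 1 1, same as the zoo version
diff_pad(c(1, 4, 9, 16, 25), differences = 2)   # NA NA 2 2 2
```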
Calculate difference between values in consecutive rows by group
The data.table package can do this fairly quickly using the shift function.
library(data.table)
df <- data.table(group = rep(c(1, 2), each = 3), value = c(10,20,25,5,10,15))
# setDT(df)  # use this instead if df is already a data.frame (converts it by reference)
df[ , diff := value - shift(value), by = group]
# group value diff
#1: 1 10 NA
#2: 1 20 10
#3: 1 25 5
#4: 2 5 NA
#5: 2 10 5
#6: 2 15 5
setDF(df) #if you want to convert back to old data.frame syntax
Or, using the lag function from dplyr:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(Diff = value - lag(value))
# group value Diff
#   <dbl> <dbl> <dbl>
# 1 1 10 NA
# 2 1 20 10
# 3 1 25 5
# 4 2 5 NA
# 5 2 10 5
# 6 2 15 5
For alternatives from before data.table::shift and dplyr::lag were available, see the edit history of the original answer.
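Without data.table or dplyr, base R's ave() can apply the same NA-padded diff() within each group. This is a sketch using the example data from above; ave() returns its result in the original row order, so it also works when the groups are not stored contiguously.

```r
# Base-R alternative: ave() runs FUN separately on each group's values
# and puts the results back in the original row order.
df <- data.frame(group = rep(c(1, 2), each = 3),
                 value = c(10, 20, 25, 5, 10, 15))
df$diff <- ave(df$value, df$group, FUN = function(v) c(NA, diff(v)))
df$diff
# NA 10 5 NA 5 5
```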
R apply() custom function to every row in data frame
Another approach is to modify your existing function so that it is vectorised.
t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
  if(!equal.variance)
  {
    se <- sqrt( (s1^2/n1) + (s2^2/n2) )
    # welch-satterthwaite df
    df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
  } else
  {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
    df <- n1+n2-2
  }
  tstat <- (m1-m2-m0)/se  # named tstat so it does not mask base::t(), which is used below
  dat <- vapply(seq_along(m1),
                function(i){c(m1[i]-m2[i], se[i], tstat[i], 2*pt(-abs(tstat[i]),df[i]))},
                numeric(4)) # two-tailed p-value; for a one-tailed test (H1: m1 < m2) use pt(tstat[i], df[i])
  dat <- t(dat)  # vapply returns one column per input element; transpose to one row each
  dat <- as.data.frame(dat)
  names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
  return(dat)
}
This approach lets you pass vectors for each input and returns a data frame with one row per element of the inputs. It uses the vapply function to return a length-4 vector (difference of means, standard error, t, p-value) for each element.
With this approach, you can simply call:
t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)
(or whatever you end up calling your variables)
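Since every arithmetic step in t.test2 already works element-wise, the vapply() loop is not strictly needed: the result columns can be built directly with data.frame(). The sketch below uses a hypothetical name, t_test_summary, and made-up illustration data; computed from summary statistics, its t and p-value should match stats::t.test() run on the raw samples, for both the Welch and the pooled variant.

```r
# Hypothetical fully vectorised variant of t.test2: no per-row loop needed,
# because every operation below is applied element-wise to the input vectors.
t_test_summary <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = FALSE) {
  if (!equal.variance) {
    se <- sqrt(s1^2 / n1 + s2^2 / n2)
    # Welch-Satterthwaite degrees of freedom
    df <- (s1^2 / n1 + s2^2 / n2)^2 /
      ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
  } else {
    # pooled standard deviation, scaled by the sample sizes
    se <- sqrt((1 / n1 + 1 / n2) * ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
    df <- n1 + n2 - 2
  }
  tstat <- (m1 - m2 - m0) / se
  data.frame(`Difference of means` = m1 - m2,
             `Std Error` = se,
             t = tstat,
             `p-value` = 2 * pt(-abs(tstat), df),  # two-tailed
             check.names = FALSE)
}

x <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5)  # made-up illustration data
y <- c(4.1, 4.5, 3.9, 5.0, 4.4)
t_test_summary(mean(x), mean(y), sd(x), sd(y), length(x), length(y))
```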
How to drop observations with inter-row difference being less than a specific value
If I understood correctly, you could do:
library(data.table)
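z <- z[, filt := min(x), by = cumsum(c(1, +(x >= shift(x) + 0.5)[-1]))][  # group minimum
       , filt := ifelse(x == filt, shift(x, fill = x[1]), filt)][       # group start: use the previous row
       x - filt >= 0.5 | x == filt, ][                                 # keep rows far enough from filt
       , filt := NULL]                                                 # drop the helper column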
Explanation:
- First we calculate the minimum of x within each group.
- The groups are created by cumsum(c(1, +(x >= shift(x) + 0.5)[-1])). Therein, we check for each row whether x >= shift(x) + 0.5, i.e. whether the difference between x and the previous row is greater than or equal to 0.5. This evaluates to TRUE or FALSE, which the + sign turns into 1 or 0. The first element is always NA (there is no previous row), so we drop it with [-1]; since that leaves the vector one element short, we prepend a 1. Finally we apply cumsum: it starts at 1, increases by 1 at every row that is at least 0.5 larger than the previous row, and otherwise repeats the last number.
- The row equal to its group minimum (the first row of each group, and the only row of single-row groups) is compared with the exact previous row, via shift(x, fill = x[1]). Every other row is compared with the first row of its group, i.e. the row that opened the group because it was at least 0.5 larger than the row before it.
- After that we remove the rows that don't satisfy x - filt >= 0.5, but keep the row where x == filt (always the very first row, since fill = x[1] makes its filt equal to itself), and finally drop the helper column filt.
Output:
x t
1: 10.0 1970-01-28
2: 10.5 1970-02-02
3: 11.1 1970-02-03
4: 14.0 1970-02-04
5: 14.6 1970-02-08
6: 17.0 1970-02-09
7: 30.0 1970-02-11
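To see what each step of the data.table chain does on a plain vector, here is a base-R transcription of the same logic. It is a sketch, not the answer's code: keep_rows() is a hypothetical name, c(NA, head(x, -1)) stands in for shift(x), and ave(..., FUN = min) stands in for the grouped min(x). Like the original, it assumes x is sorted in increasing order, so that each group's minimum is its first row.

```r
# Hypothetical base-R transcription of the data.table chain above.
# Assumes x is sorted in increasing order.
keep_rows <- function(x, step = 0.5) {
  prev <- c(NA, head(x, -1))                        # stand-in for shift(x)
  grp  <- cumsum(c(1, +(x >= prev + step)[-1]))     # new group at each jump >= step
  filt <- ave(x, grp, FUN = min)                    # group minimum (its first row)
  # a group's first row is compared with the previous row instead
  # (stand-in for shift(x, fill = x[1]))
  filt <- ifelse(x == filt, c(x[1], head(x, -1)), filt)
  x - filt >= step | x == filt                      # logical index of rows to keep
}

x <- c(10, 10.2, 10.9, 11.0, 12.0, 12.1)   # made-up illustration values
keep_rows(x)      # TRUE FALSE TRUE FALSE TRUE FALSE
x[keep_rows(x)]   # 10.0 10.9 12.0
```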