How to Compare a Value in a Column to the Previous One Using R

How can I compare a value in a column to the previous one using R?

First, create the new column (dat is the name of your dataset):

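Status <- ave(dat[ , "Stimulus"],
              c(0, cumsum(abs(diff(dat[ , "Stimulus"])))),  # new group whenever Stimulus changes
              FUN = function(x)
                if (!x[1]) "PRE" else c(rep("OK", min(2, length(x))),          # first two rows of a run
                                        rep("POST", max(0, length(x) - 2))))  # rest of the run (none if the run is short)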

Now, combine both objects:

cbind(dat, Status)

The result:

      ID    V1     V2 Stimulus Status
 [1,]  1 74.80  803.0        0 PRE
 [2,]  1 75.98  790.9        0 PRE
 [3,]  1 75.95  791.1        0 PRE
 [4,]  1 65.70  918.7        0 PRE
 [5,]  1 59.63 1005.6       13 OK
 [6,]  1 59.44 1012.0       13 OK
 [7,]  1 59.62 1010.0       13 POST
 [8,]  1 63.85  942.4       13 POST
 [9,]  1 60.75  992.9        0 PRE
[10,]  1 59.62 1010.0        0 PRE
[11,]  1 61.68  974.0        0 PRE
[12,]  1 65.21  921.4       15 OK
[13,]  1 59.23 1012.0       15 OK
[14,]  1 61.23  979.5       15 POST
[15,]  1 70.80  849.2        0 PRE
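For comparison, here is a minimal base R sketch of the same labelling using rle(), which splits the Stimulus column into runs of identical values. It types in the Stimulus values from the output above by hand and, like the ave() code, assumes that a run with Stimulus 0 is always PRE.

```r
# Stimulus column typed in from the example output above
stimulus <- c(0, 0, 0, 0, 13, 13, 13, 13, 0, 0, 0, 15, 15, 15, 0)

runs <- rle(stimulus)                        # lengths and values of consecutive runs
pos_in_run <- sequence(runs$lengths)         # 1, 2, 3, ... restarting at each new run
run_value <- rep(runs$values, runs$lengths)  # the Stimulus value of each row's run

# Stimulus 0 -> PRE; otherwise the first two rows of a run are OK and the rest POST
Status <- ifelse(run_value == 0, "PRE",
                 ifelse(pos_in_run <= 2, "OK", "POST"))
print(Status)
```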

R: Compare each value with set of previous values in column

We can use rollapplyr() from zoo, but the window size has to match the number of values you want to compare against. To check the previous 4 values, set the window size to 5 and then check whether the last value in each window is higher than all the others.

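library(dplyr)
library(zoo)

df <- data.frame(X = c(1:5, 4:1, 2:4))  # the X values shown in the output below
k <- 5                                   # window = current value + the 4 previous values

df %>% mutate(Diff = rollapplyr(X, k, function(x) all(x[k] > x[-k]), fill = NA))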

#    X  Diff
#1   1    NA
#2   2    NA
#3   3    NA
#4   4    NA
#5   5  TRUE
#6   4 FALSE
#7   3 FALSE
#8   2 FALSE
#9   1 FALSE
#10  2 FALSE
#11  3 FALSE
#12  4  TRUE
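If you'd rather not depend on zoo, embed() from base R builds the same sliding windows as a matrix. This sketch reuses the X values from the output above and pads the first k - 1 rows with NA to mimic fill = NA; it is an alternative technique, not what rollapplyr() does internally.

```r
X <- c(1:5, 4:1, 2:4)  # the X column from the output above
k <- 5                 # window = current value + the 4 previous values

# embed() puts X[i] in column 1 and X[i-1], ..., X[i-k+1] in columns 2..k
w <- embed(X, k)
is_new_high <- apply(w, 1, function(r) all(r[1] > r[-1]))

# the first k - 1 rows have no full window, so pad them with NA (like fill = NA)
Diff <- c(rep(NA, k - 1), is_new_high)
print(data.frame(X, Diff))
```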

Create a column in R to compare values within a group and flag as greater than (1), less than (0) or equal (2)

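library(dplyr)

df %>%
  group_by(Round) %>%
  # rank - 1 gives 0 (lower) / 1 (higher) for two teams; if every score in the round ties, use 2
  mutate(Flag1 = replace(rank(Score) - 1, length(unique(Score)) == 1, 2))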

  Round Team  Score  Flag Flag1
  <int> <chr> <int> <int> <dbl>
1     1 Team1     4     0     0
2     1 Team2     8     1     1
3     2 Team1     9     1     1
4     2 Team2     2     0     0
5     3 Team1     6     2     2
6     3 Team2     6     2     2
7     4 Team1    14     1     1
8     4 Team2     9     0     0
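The same rule can be sketched in base R with ave(), which applies a function within each group and puts the results back in the original row order. The Round and Score vectors are typed in from the table above; as with the dplyr version, the 0/1 coding only matches the question's Flag when each round has exactly two teams.

```r
# Round/Score values typed in from the table above
Round <- c(1, 1, 2, 2, 3, 3, 4, 4)
Score <- c(4, 8, 9, 2, 6, 6, 14, 9)

# within each Round: 2 if all scores tie, otherwise rank - 1 (0 = lower, 1 = higher)
Flag <- ave(Score, Round, FUN = function(s) {
  if (length(unique(s)) == 1) 2 else rank(s) - 1
})
print(Flag)
```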

Comparing a variable with itself in the previous year in R

Use lag() to access the value one row up. As long as we group by GoodID and Week and sort by Year, that gives the previous year's price:

library(dplyr)

df %>%
  group_by(GoodID, Week) %>%
  arrange(Year) %>%
  mutate(Price_Last_Year = lag(Price)) %>%  # previous row within each GoodID/Week group
  ungroup()
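One caveat: lag() returns the previous row, which is the previous year only when no year is missing within a GoodID/Week group. If gaps are possible, joining on Year - 1 avoids picking up an older price. A minimal base R sketch, using a small made-up data frame in which GoodID 1 has no row for 2020:

```r
# hypothetical data: GoodID 1 has no row for 2020
df <- data.frame(
  GoodID = c(1, 1, 1, 2, 2),
  Week   = c(1, 1, 1, 1, 1),
  Year   = c(2018, 2019, 2021, 2018, 2019),
  Price  = c(10, 12, 15, 5, 6)
)

# shift every row forward one year so it lines up with the following year's row
prev <- transform(df, Year = Year + 1)
names(prev)[names(prev) == "Price"] <- "Price_Last_Year"

# left join on the keys: rows without an exact previous-year match get NA
out <- merge(df, prev, by = c("GoodID", "Week", "Year"), all.x = TRUE)
print(out)
```

Here the 2021 row gets NA, whereas lag() would have returned the 2019 price for it.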

R Generating values based on comparison of previous columns

I'd use the apply function with which.min :)

Set up our vector of names

person_names <- names(df[, 1:5])  # presumably the column names are the people's names

The 1:5 is just there in case you have other columns in your dataset you don't want considered for the minimum check.

Now we can use apply with a custom function that returns the name of whichever column has the lowest value in each row.

df$Min <- apply(df[,1:5], 1, function(x){person_names[which.min(x)]})

The custom function is as described above; apply simply applies a function to each row or column of a data frame or matrix. The second argument, 1, means rows; to work over columns instead, change it to 2.

which.min returns the position of the (first) minimum. person_names holds the names in column order, so indexing it with that position gives the name with the smallest value.

You could compress this all down into a one-line solution if you wanted to do away with the person_names variable.

df$Min <- apply(df[,1:5], 1, function(x){names(df[,1:5])[which.min(x)]})

If the five name columns are the only columns in your data, you can drop the 1:5; if the name columns are scattered, replace 1:5 with a vector of their column names or positions.

EDIT: I saw your comment on the other answer. To accommodate ties, I'll change the custom function so that it finds every column equal to the minimum value of x and pastes their names together with a custom separator. I'll also modify your data so Donna and Racheal tie in the second row.

df <- read.table(text = 'Amy Abe Donna Racheal Mike Min u
5 34 54 56 23 Amy 0
43 11 3 3 21 Donna 1
54 32 21 54 1 Mike 1
21 5 43 32 21 Abe 1
32 21 23 5 32 Racheal 0', header = TRUE)

person_names <- names(df[,1:5])

df$Min <- apply(df[, 1:5], 1, function(x) {
  paste(person_names[x == min(x)], collapse = ", ")  # every name tied for the row minimum
})

> df
  Amy Abe Donna Racheal Mike            Min u
1   5  34    54      56   23            Amy 0
2  43  11     3       3   21 Donna, Racheal 1
3  54  32    21      54    1           Mike 1
4  21   5    43      32   21            Abe 1
5  32  21    23       5   32        Racheal 0

I've set the collapse argument to ", ", which is the separator I've arbitrarily chosen. You could adjust this to just be a space " ", or a semi-colon, or whatever you wanted.

Again, that can be compressed to a one line answer by getting rid of the separate line for person_names.
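If a single name per row is enough (no tie reporting), base R's max.col() avoids calling a function once per row. This sketch negates the matrix so that each row's maximum becomes its minimum; with ties.method = "first", the tied second row reports only Donna. The data frame is typed in from the table above, without the Min and u columns.

```r
df <- data.frame(
  Amy     = c(5, 43, 54, 21, 32),
  Abe     = c(34, 11, 32, 5, 21),
  Donna   = c(54, 3, 21, 43, 23),
  Racheal = c(56, 3, 54, 32, 5),
  Mike    = c(23, 21, 1, 21, 32)
)

m <- as.matrix(df[, 1:5])
# max.col() finds each row's largest entry; negating m turns that into the smallest
df$Min <- colnames(m)[max.col(-m, ties.method = "first")]
print(df)
```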

Function that compares one column values against all other column values and returns matching one in R

Here's an approach where I match every row of df1 with every row of df2, then take x and y from z (as implied by your logic of comparing z-x to y; this is the same as comparing z-x-y to zero). Finally, I look at each row of df1 and keep the match with the lowest absolute difference.

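library(dplyr)

left_join(
  df1 %>% mutate(dummy = 1, row = row_number()),
  df2 %>% mutate(dummy = 1, row = row_number()),
  by = "dummy"                # same dummy value everywhere: every df1 row pairs with every df2 row
) %>%
  mutate(diff = z - x - y) %>%
  group_by(row.x) %>%
  slice_min(abs(diff)) %>%    # keep the closest match(es) for each df1 row
  ungroup()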

Result (I used set.seed(42) before generating df1 and df2):

# A tibble: 10 x 9
       n    nn       y      z dummy row.x      x row.y    diff
   <dbl> <dbl>   <dbl>  <dbl> <dbl> <int>  <dbl> <int>   <dbl>
 1     0     1    1.37   1.30     1     1 0.0361    20  -0.102
 2     1     1  -0.565   2.29     1     2   1.90     5   0.956
 3     2     1   0.363  -1.39     1     3  -1.76     8  0.0112
 4     3     1   0.633 -0.279     1     4 -0.851    18 -0.0607
 5     4     1   0.404 -0.133     1     5 -0.609    14  0.0713
 6     0     2  -0.106  0.636     1     6  0.705    12  0.0372
 7     1     2    1.51 -0.284     1     7  -1.78     2 -0.0145
 8     2     2 -0.0947  -2.66     1     8  -2.41    19  -0.148
 9     3     2    2.02  -2.44     1     9  -2.41    19   -2.04
10     4     2 -0.0627   1.32     1    10   1.21     4   0.168
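A base R alternative that avoids the dummy join is outer(), which builds the full matrix of z - y - x differences directly. The data below are small made-up values (not the set.seed(42) data), and which.min() keeps only the first match on ties, whereas slice_min() keeps all tied rows.

```r
# hypothetical data: df1 holds y and z, df2 holds the candidate x values
df1 <- data.frame(y = c(1, 0.5, -1), z = c(2, 0, 1))
df2 <- data.frame(x = c(0.9, -0.4, 2.1, 0))

# d[i, j] = z[i] - y[i] - x[j] for every pairing of a df1 row with a df2 row
d <- outer(df1$z - df1$y, df2$x, "-")

# for each df1 row, the df2 row with the smallest absolute difference (first one on ties)
best <- apply(abs(d), 1, which.min)

df1$x     <- df2$x[best]
df1$row.y <- best
df1$diff  <- d[cbind(seq_len(nrow(d)), best)]
print(df1)
```

Like the join, this still materializes nrow(df1) * nrow(df2) differences, so memory grows with both table sizes.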

R - Comparing values in a column and creating a new column with the results of this comparison. Is there a better way than looping?

There are a couple things to consider in your example.

First, to avoid a loop, you can create a copy of the vector that is shifted by one position. (There are about 20 ways to do this.) Then, when you compare the original vector with the shifted copy, R does an element-by-element comparison of each position against its neighbor.

Second, equality comparisons don't work with NA -- they always return NA. So NA == NA is not TRUE; it is NA! Again, there are about 20 ways to get around this, but here I have just replaced all the NAs in the temporary vector with a placeholder that works in the equality tests.

Finally, you have to decide what you want to do with the last value (which doesn't have a neighbor). Here I have put 1, which is your assignment for "doesn't match its neighbor".

So, depending on the range of values possible in b, you could do

v <- df$b
z <- length(v)
v[is.na(v)] <- "x"  # placeholder so NA vs NA compares as equal (note: this makes v character)
# 0 if a value equals the next one, 1 otherwise; the last value has no neighbor, so it gets 1
df$mov <- c(1 * !(v[1:(z - 1)] == v[2:z]), 1)
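Another option, sketched below, handles NA explicitly instead of replacing it with a placeholder, so the column keeps its numeric type and no real value can collide with the placeholder. The vector b is a hypothetical stand-in for df$b:

```r
b <- c(1, 1, NA, NA, 2, 3, 3)  # hypothetical stand-in for df$b, with some NAs

nxt  <- c(b[-1], NA)                          # each value's next neighbor (NA after the last value)
same <- (b == nxt) | (is.na(b) & is.na(nxt))  # an NA next to an NA counts as a match
same[is.na(same)] <- FALSE                    # a number next to an NA is not a match
same[length(b)] <- FALSE                      # the last value has no neighbor

mov <- as.integer(!same)  # 1 = differs from its neighbor, 0 = matches
print(mov)
```

This gives the same 0/1 result as the placeholder version for these values.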

Using if to compare column values

If your data frame only has two observations (i.e., rows), what you want is fairly straightforward:

# Create some sample data...
df <- data.frame(
  Country = c("Austria", "Bolivia"),
  Ratio = c(28.2, 7.8)
)

# Create a new variable in the data frame...
df$Rank <- ifelse(df$Ratio == max(df$Ratio), "Highest", "Lowest")
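With more than two rows, this code labels every row that isn't the maximum as "Lowest". Note that if() only accepts a single TRUE/FALSE, while ifelse() works element by element, so nesting ifelse() calls is one way to add a third label. The third country and the "Middle" label below are made up for illustration:

```r
# hypothetical third row added to the sample data
df <- data.frame(
  Country = c("Austria", "Bolivia", "Chile"),
  Ratio   = c(28.2, 7.8, 15.1)
)

# ifelse() is vectorized, so each row gets its own label; "Middle" is an illustrative label
df$Rank <- ifelse(df$Ratio == max(df$Ratio), "Highest",
                  ifelse(df$Ratio == min(df$Ratio), "Lowest", "Middle"))
print(df)
```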

comparing a value with next value in a column

You don't necessarily need a loop for this since most of the R functions are vectorized. Here is a way to do this in base R, dplyr and data.table without using a loop.

#Base R 
transform(df, N = ifelse(Interval != c(tail(Interval, -1), NA), criteria/Count, NA))

#dplyr
library(dplyr)
df %>% mutate(N = if_else(Interval != lead(Interval), criteria/Count, NA_real_))

#data.table
library(data.table)
setDT(df)[, N:= fifelse(Interval != shift(Interval, type = 'lead'), criteria/Count, NA_real_)]

All three return:

#   Interval Count criteria         N
#1         0     0        0        NA
#2         0     1        0        NA
#3         0     2        0        NA
#4         0     3        0 0.0000000
#5         1     4        1        NA
#6         1     5        2        NA
#7         1     6        3        NA
#8         1     7        4 0.5714286
#9         2     8        1        NA
#10        2     9        2 0.2222222
#11        3    10        3        NA

I return NA instead of a blank value because a blank ('') would turn the entire column into type character, and the numbers would no longer be usable as numbers. If you really want blanks, replace NA with '' in the code.

data

df <- structure(list(Interval = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 2L, 
2L, 3L), Count = 0:10, criteria = c(0L, 0L, 0L, 0L, 1L, 2L, 3L,
4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -11L))
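To see what the base R version is comparing, this sketch builds the shifted Interval vector explicitly from the data above. A row receives a value exactly when the next row's Interval is different; the last row is compared against NA, so the test is NA and N stays NA there.

```r
# the data above, rebuilt so the sketch runs on its own
df <- data.frame(Interval = c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3),
                 Count    = 0:10,
                 criteria = c(0, 0, 0, 0, 1, 2, 3, 4, 1, 2, 3))

nxt <- c(tail(df$Interval, -1), NA)   # Interval of the following row (NA for the last row)
is_last_in_run <- df$Interval != nxt  # TRUE where the next row has a different Interval
print(which(is_last_in_run))          # rows that receive a value in N (which() skips the NA)

df$N <- ifelse(is_last_in_run, df$criteria / df$Count, NA)
print(df)
```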

