How can I compare a value in a column to the previous one using R?
First, create the new column (dat is the name of your dataset):
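# Label each run of identical Stimulus values: zero runs are "PRE";
# in non-zero runs the first two rows are "OK" and the rest "POST".
Status <- ave(dat[ , "Stimulus"], c(0, cumsum(abs(diff(dat[ , "Stimulus"])))),
              FUN = function(x)
                if (!x[1]) "PRE" else c(rep("OK", min(2, length(x))),
                                        rep("POST", max(0, length(x) - 2))))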
Now, combine both objects:
cbind(dat, Status)
The result:
ID V1 V2 Stimulus Status
[1,] 1 74.80 803.0 0 PRE
[2,] 1 75.98 790.9 0 PRE
[3,] 1 75.95 791.1 0 PRE
[4,] 1 65.70 918.7 0 PRE
[5,] 1 59.63 1005.6 13 OK
[6,] 1 59.44 1012.0 13 OK
[7,] 1 59.62 1010.0 13 POST
[8,] 1 63.85 942.4 13 POST
[9,] 1 60.75 992.9 0 PRE
[10,] 1 59.62 1010.0 0 PRE
[11,] 1 61.68 974.0 0 PRE
[12,] 1 65.21 921.4 15 OK
[13,] 1 59.23 1012.0 15 OK
[14,] 1 61.23 979.5 15 POST
[15,] 1 70.80 849.2 0 PRE
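The same labelling can be cross-checked with run-length encoding. This is a minimal sketch, not the method used above: it assumes the Stimulus values shown in the result, and the names stimulus, runs, pos and status are made up for illustration.

```r
# Stimulus values copied from the result table above
stimulus <- c(0, 0, 0, 0, 13, 13, 13, 13, 0, 0, 0, 15, 15, 15, 0)

# rle() finds runs of identical values; sequence() numbers the rows within each run
runs <- rle(stimulus)
pos <- sequence(runs$lengths)

# Zero runs are "PRE"; in non-zero runs the first two rows are "OK", later rows "POST"
status <- ifelse(stimulus == 0, "PRE", ifelse(pos <= 2, "OK", "POST"))
print(status)
```

Because the position within each run is computed explicitly, runs shorter than two rows need no special handling.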
R: Compare each value with set of previous values in column
We can use rollapplyr (the right-aligned version of rollapply from zoo), but we need to adjust the window size based on the expected output. To check against the previous 4 values, set the window size to 5 and then check whether the last value is higher than all the previous ones.
library(dplyr)
library(zoo)
k <- 5
df %>% mutate(Diff = rollapplyr(X, k, function(x) all(x[k] > x[-k]), fill = NA))
# X Diff
#1 1 NA
#2 2 NA
#3 3 NA
#4 4 NA
#5 5 TRUE
#6 4 FALSE
#7 3 FALSE
#8 2 FALSE
#9 1 FALSE
#10 2 FALSE
#11 3 FALSE
#12 4 TRUE
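To see what the window comparison does without extra packages, here is a rough base R sketch using embed(). This is a swapped-in technique, not the rollapplyr call above. It assumes the X values from the output; x, m and is_peak are illustrative names.

```r
x <- c(1, 2, 3, 4, 5, 4, 3, 2, 1, 2, 3, 4)  # X values from the output above
k <- 5                                      # current value plus 4 previous values

# Each row of embed(x, k) is one window, most recent value first
m <- embed(x, k)

# Compare the current value (column 1) with the largest of the previous k - 1 values
is_peak <- m[, 1] > apply(m[, -1, drop = FALSE], 1, max)

# Pad with NA for the first k - 1 rows, which have no full window
Diff <- c(rep(NA, k - 1), is_peak)
print(Diff)
```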
Create a column in R to compare values within a group and flag as greater than (1), less than (0) or equal (2)
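library(dplyr)
# Within each round: rank - 1 gives 0 (lower) or 1 (higher); 2 if all scores are equal
df %>%
  group_by(Round) %>%
  mutate(Flag1 = replace(rank(Score) - 1, length(unique(Score)) == 1, 2))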
Round Team Score Flag Flag1
<int> <chr> <int> <int> <dbl>
1 1 Team1 4 0 0
2 1 Team2 8 1 1
3 2 Team1 9 1 1
4 2 Team2 2 0 0
5 3 Team1 6 2 2
6 3 Team2 6 2 2
7 4 Team1 14 1 1
8 4 Team2 9 0 0
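For reference, a self-contained sketch of the same idea in base R, using ave() instead of dplyr. The data frame is rebuilt from the printed result, and the approach assumes exactly two teams per round, as in the example.

```r
df <- data.frame(
  Round = c(1, 1, 2, 2, 3, 3, 4, 4),
  Team  = rep(c("Team1", "Team2"), times = 4),
  Score = c(4, 8, 9, 2, 6, 6, 14, 9)
)

# Within each round: 2 if all scores are equal, otherwise rank - 1
# (0 for the lower score, 1 for the higher one)
df$Flag1 <- ave(df$Score, df$Round, FUN = function(s) {
  if (length(unique(s)) == 1) rep(2, length(s)) else rank(s) - 1
})
print(df)
```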
Comparing a variable with itself in the previous year in R
Use lag() to access the value one row up. As long as we group by GoodID and Week and sort by Year, that gives the previous year's price (assuming every year is present for each group):
library(dplyr)
df %>%
  group_by(GoodID, Week) %>%
  arrange(Year) %>%
  mutate(Price_Last_Year = lag(Price)) %>%
  ungroup()
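A small base R sketch of the same lag-within-group idea, on made-up data (the values and the helper name lag1 are hypothetical). It sorts by Year first and then shifts Price down by one row within each GoodID and Week combination.

```r
# Hypothetical data: one good, two weeks, two years, deliberately unsorted
df <- data.frame(
  GoodID = c(1, 1, 1, 1),
  Week   = c(1, 1, 2, 2),
  Year   = c(2020, 2019, 2019, 2020),
  Price  = c(10, 8, 5, 6)
)

# Shift a vector down by one position, padding with NA
lag1 <- function(p) c(NA, head(p, -1))

# Sort so that "one row up" within a group means the earlier year
df <- df[order(df$Year), ]
df$Price_Last_Year <- ave(df$Price, df$GoodID, df$Week, FUN = lag1)
print(df)
```

Both lag() and this sketch return the previous available row, so a missing year would silently pull in an older price.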
R Generating values based on comparison of previous columns
I'd use the apply function with which.min :)
Set up our vector of names:
person_names <- names(df[, 1:5]) # Presumably the column names are the names
The 1:5 is just there in case you have other columns in your dataset that you don't want considered for the minimum check.
Now we can use apply with a custom function that returns the name of whichever column has the lowest value in each row.
df$Min <- apply(df[,1:5], 1, function(x){person_names[which.min(x)]})
Our custom function is as I described already; apply simply applies the function to each row or column of a data frame or matrix. The second argument, 1, indicates rows; if we wanted columns we could change that to 2.
which.min just returns the position of the minimum (the first one if there are ties). person_names holds our names in order, so the number which.min returns tells us which name has the smallest value.
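A quick illustration of which.min on a single row, assuming a named vector x made up for this example. Note that which.min reports only the first minimum, which matters when values tie.

```r
# One row of scores as a named vector (illustrative values)
x <- c(Amy = 43, Abe = 11, Donna = 3, Racheal = 3, Mike = 21)

idx <- which.min(x)               # position of the first minimum
first_min <- names(x)[idx]        # name at that position
all_min <- names(x)[x == min(x)]  # every name that ties for the minimum

print(idx)
print(first_min)
print(all_min)
```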
You could compress this all down into a one-line solution if you wanted to do away with the person_names variable.
df$Min <- apply(df[,1:5], 1, function(x){names(df[,1:5])[which.min(x)]})
If you only have the 5 name columns, drop the 1:5; if your name columns are scattered across the data frame, just replace it with a vector of your column names or numbers.
EDIT: I saw your comment on the other answer. To accommodate ties, I'll change the custom function so that it checks for all matches with the minimum value of x, then pastes them together with a custom separator. I'll also modify your data so Donna and Racheal tie in the second row.
df <- read.table(text = 'Amy Abe Donna Racheal Mike Min u
5 34 54 56 23 Amy 0
43 11 3 3 21 Donna 1
54 32 21 54 1 Mike 1
21 5 43 32 21 Abe 1
32 21 23 5 32 Racheal 0', header = TRUE)
person_names <- names(df[, 1:5])
df$Min <- apply(df[, 1:5], 1, function(x){paste(person_names[x == min(x)],
                                                collapse = ", ")})
> df
Amy Abe Donna Racheal Mike Min u
1 5 34 54 56 23 Amy 0
2 43 11 3 3 21 Donna, Racheal 1
3 54 32 21 54 1 Mike 1
4 21 5 43 32 21 Abe 1
5 32 21 23 5 32 Racheal 0
I've set the collapse argument to ", ", which is the separator I've arbitrarily chosen. You could change this to a space " ", a semicolon, or whatever you want.
Again, that can be compressed to a one-line answer by getting rid of the separate line for person_names.
Function that compares one column values against all other column values and returns matching one in R
Here's an approach where I match every row of df1 with every row of df2, then subtract x and y from z (as implied by your logic of comparing z-x to y; this is the same as comparing z-x-y to zero). Finally, I look at each row of df1 and keep the match with the lowest absolute difference.
library(dplyr)
left_join(
  df1 %>% mutate(dummy = 1, row = row_number()),
  df2 %>% mutate(dummy = 1, row = row_number()), by = "dummy") %>%
  mutate(diff = z - x - y) %>%
  group_by(row.x) %>%
  slice_min(abs(diff)) %>%
  ungroup()
Result (I used set.seed(42) before generating df1 and df2.)
# A tibble: 10 x 9
n nn y z dummy row.x x row.y diff
<dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int> <dbl>
1 0 1 1.37 1.30 1 1 0.0361 20 -0.102
2 1 1 -0.565 2.29 1 2 1.90 5 0.956
3 2 1 0.363 -1.39 1 3 -1.76 8 0.0112
4 3 1 0.633 -0.279 1 4 -0.851 18 -0.0607
5 4 1 0.404 -0.133 1 5 -0.609 14 0.0713
6 0 2 -0.106 0.636 1 6 0.705 12 0.0372
7 1 2 1.51 -0.284 1 7 -1.78 2 -0.0145
8 2 2 -0.0947 -2.66 1 8 -2.41 19 -0.148
9 3 2 2.02 -2.44 1 9 -2.41 19 -2.04
10 4 2 -0.0627 1.32 1 10 1.21 4 0.168
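The cross join and per-row selection can also be sketched in base R on a tiny made-up example. The data, the id1/id2 columns and the name best are assumptions for illustration; merge() with by = NULL produces every pairing of rows.

```r
# Hypothetical inputs: df1 holds y and z, df2 holds candidate x values
df1 <- data.frame(id1 = 1:2, y = c(1, 2), z = c(4, 10))
df2 <- data.frame(id2 = 1:3, x = c(3, 7, 8.5))

# by = NULL gives the Cartesian product: every row of df1 with every row of df2
pairs <- merge(df1, df2, by = NULL)
pairs$diff <- pairs$z - pairs$x - pairs$y

# For each df1 row keep the pairing whose difference is closest to zero
best <- do.call(rbind, lapply(split(pairs, pairs$id1), function(g) {
  g[which.min(abs(g$diff)), ]
}))
print(best)
```

Unlike slice_min, which keeps all tied rows by default, which.min keeps only the first of any tied pairings.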
R - Comparing values in a column and creating a new column with the results of this comparison. Is there a better way than looping?
There are a couple of things to consider in your example.
First, to avoid a loop, you can create a copy of the vector that is shifted by one position. (There are about 20 ways to do this.) Then when you test the vector against the shifted copy, R compares each position with its neighbor element by element.
Second, equality comparisons don't work with NA: they always return NA. So NA == NA is not TRUE, it is NA! Again, there are about 20 ways to get around this, but here I have just replaced all the NAs in the temporary vector with a placeholder that will work for the tests of equality.
Finally, you have to decide what you want to do with the last value (which doesn't have a neighbor). Here I have put 1, which is your assignment for "doesn't match its neighbor".
So, depending on the range of values possible in b, you could do
c = df$b
z = length(c)
c[is.na(c)] = 'x' # replace NA with a placeholder (must not occur in b) so the equality test works
df$mov = c(1 * !(c[1:(z-1)] == c[2:z]), 1) # compare each value with the next; append 1 for the last value
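If a character placeholder is undesirable (it coerces a numeric vector to character), one of the other ways is to treat two NAs as equal explicitly. A minimal sketch with made-up values for b; cur, nxt, same and mov are illustrative names.

```r
b <- c(1, 1, NA, NA, 2)   # hypothetical column values
z <- length(b)

cur <- b[-z]   # every value except the last
nxt <- b[-1]   # the same vector shifted by one position

# Equal if both are non-NA and match, or if both are NA
same <- (cur == nxt) | (is.na(cur) & is.na(nxt))
same[is.na(same)] <- FALSE   # exactly one NA: treat as a mismatch

mov <- c(1 * !same, 1)       # 1 = differs from the next value; last value gets 1
print(mov)
```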
Using if to compare column values
If your data frame only has two observations (i.e., rows), what you want is fairly straightforward:
# Create some sample data...
df <- data.frame(
  Country = c("Austria", "Bolivia"),
  Ratio = c(28.2, 7.8)
)
# Create a new variable in the data frame...
df$Rank <- ifelse(df$Ratio == max(df$Ratio), "Highest", "Lowest")
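With more than two rows, the single ifelse would label every non-maximum row "Lowest". One possible extension, sketched under assumptions: the third country, its value and the "Middle" label are invented for illustration and are not part of the original question.

```r
# Sample data with a hypothetical third row
df <- data.frame(
  Country = c("Austria", "Bolivia", "Chile"),
  Ratio = c(28.2, 7.8, 15.1)
)

# Nested ifelse: maximum first, then minimum, everything else in between
df$Rank <- ifelse(df$Ratio == max(df$Ratio), "Highest",
                  ifelse(df$Ratio == min(df$Ratio), "Lowest", "Middle"))
print(df)
```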
comparing a value with next value in a column
You don't necessarily need a loop for this since most R functions are vectorized. Here is a way to do this in base R, dplyr and data.table without using a loop.
#Base R
transform(df, N = ifelse(Interval != c(tail(Interval, -1), NA), criteria/Count, NA))
#dplyr
library(dplyr)
df %>% mutate(N = if_else(Interval != lead(Interval), criteria/Count, NA_real_))
#data.table
library(data.table)
setDT(df)[, N:= fifelse(Interval != shift(Interval, type = 'lead'), criteria/Count, NA_real_)]
All of which return:
# Interval Count criteria N
#1 0 0 0 NA
#2 0 1 0 NA
#3 0 2 0 NA
#4 0 3 0 0.0000000
#5 1 4 1 NA
#6 1 5 2 NA
#7 1 6 3 NA
#8 1 7 4 0.5714286
#9 2 8 1 NA
#10 2 9 2 0.2222222
#11 3 10 3 NA
I return NA instead of a blank value because a blank value would turn the entire column into type character and the numbers would no longer be useful. If you still want blanks, you can replace NA with '' in the answer.
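The type change is easy to verify. A short sketch with values taken from the N column above; n and blank are illustrative names.

```r
n <- c(NA, 0.5714286, NA, 0.2222222)  # part of the N column above

# Replacing NA with '' forces every element to become a string
blank <- ifelse(is.na(n), "", n)

print(class(n))               # numeric
print(class(blank))           # character
print(mean(n, na.rm = TRUE))  # arithmetic still works on the NA version
```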
data
df <- structure(list(Interval = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 2L,
2L, 3L), Count = 0:10, criteria = c(0L, 0L, 0L, 0L, 1L, 2L, 3L,
4L, 1L, 2L, 3L)), class = "data.frame", row.names = c(NA, -11L))