Identifying Where Value Changes in R Data.Frame Column

Identifying where value changes in R data.frame column

A simple solution is to use the lag function in dplyr:

which(df$value != dplyr::lag(df$value))
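If you'd rather not load dplyr, the same comparison can be sketched in base R by lining each element up against its predecessor. The data frame below is made up for illustration; only the column name value comes from the answer above. Note that in the dplyr version the first comparison is NA (there is nothing to lag to) and which() drops it, so row 1 is never reported as a change; the sketch behaves the same way.

```r
# Hypothetical example data; only the column name `value` is from the answer.
df <- data.frame(value = c("a", "a", "b", "b", "b", "c", "a"),
                 stringsAsFactors = FALSE)

n <- nrow(df)

# Compare elements 2..n with elements 1..(n - 1); add 1 to shift the
# positions back onto the original row numbers.
change_idx <- which(df$value[-1] != df$value[-n]) + 1

change_idx
```

Because this uses != rather than diff, it should also work on character columns, where diff would fail.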

How to determine when a change in value occurs in R

Like this:

df$ind[c(FALSE, diff(as.numeric(df$value)) == 100)]

Determine when columns of a data.frame change value and return indices of the change

In data.table 1.8.10 (the stable CRAN version at the time of writing), there's an unexported function called duplist that does exactly this. It's written in C, so it's very fast.

require(data.table) # 1.8.10
data.table:::duplist(x[, 3:5])
# [1] 1 4 5

If you're using the development version of data.table (1.8.11), there's a more memory-efficient version, renamed uniqlist, that does exactly the same job. It should probably be exported in the next release, since this seems to have come up on SO more than once. Let's see.

require(data.table) # 1.8.11
data.table:::uniqlist(x[, 3:5])
# [1] 1 4 5
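Since both functions are unexported and tied to particular data.table versions, here is a rough base R sketch of the same idea. It assumes that a row counts as a change when any of the chosen columns differs from the row before it, and that row 1 always starts a run (which is what the 1 in the output above suggests). The data frame x and its column names are hypothetical, and the sketch assumes the columns contain no NA values.

```r
# Hypothetical data standing in for x[, 3:5] in the answer.
x <- data.frame(p = c(1, 1, 1, 2, 2),
                q = c("u", "u", "u", "u", "v"),
                r = c(TRUE, TRUE, TRUE, TRUE, TRUE),
                stringsAsFactors = FALSE)

n <- nrow(x)

# For each column, flag rows 2..n that differ from the previous row,
# then combine the flags across columns with a logical OR.
differs <- Reduce(`|`, lapply(x, function(col) col[-1] != col[-n]))

# Row 1 is always treated as the start of the first run.
run_starts <- which(c(TRUE, differs))

run_starts
```

This is plain R rather than C, so expect it to be slower than the data.table internals on large tables.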

How to identify value change in one column in R?

Perhaps you could look at the transition of drug_type from "A" to "B", or include where the number of distinct drug_brand is greater than 1?

library(tidyverse)

df %>%
  group_by(id) %>%
  filter(any(drug_type == "B" & lag(drug_type) == "A") |
           n_distinct(drug_brand) > 1)

find the place where the variable in a dataframe changes its value

Here is a possible way. Use diff to find where column b changes, but be careful: the first value of b, by definition of change, hasn't changed. (diff returns a vector with one less element, so the index is padded with a leading FALSE.)

inx <- c(FALSE, diff(data$b) != 0)
data[inx, ]
#  a b
#4 4 1

After seeing the OP's comment on another post, the following code shows that this method also works when b starts with any value, not just zero.

data2 <- data.frame(a=c(1,2,3,4,5,6),b=c(1,1,1,0,0,0))
inx <- c(FALSE, diff(data2$b) != 0)
data2[inx, ]
#  a b
#4 4 0
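Another way to get the same rows, sketched here as an alternative rather than as part of the original answer, is run-length encoding: rle splits b into runs of equal values, and the cumulative run lengths give the row where each run starts. Unlike diff, this does not require b to be numeric.

```r
# Same data2 as above.
data2 <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(1, 1, 1, 0, 0, 0))

runs <- rle(data2$b)

# Start row of every run: 1, then 1 + length of the first run, and so on.
# The final cumulative value points past the end of the data, so drop it.
run_starts <- head(cumsum(c(1, runs$lengths)), -1)

# The first run is not a change, so drop its start row.
change_rows <- run_starts[-1]

data2[change_rows, ]
```

Here run_starts holds the start of every run (including row 1), which can be handy if you also need the run boundaries.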

How to create column that identifies another column where the row values change?

I suppose you could create a difference matrix for the first 4 columns (using your data frame df):

df_diff <- rbind(0, diff(as.matrix(df[1:4])))

Which would give you:

     A  B C D
[1,] 0  0 0 0
[2,] 0  0 0 1
[3,] 0 -6 0 1
[4,] 4  0 0 0

Then, using sapply over the row indices of your data frame together with the difference matrix, you could do the following:

df$F <- sapply(seq_len(nrow(df)), function(i) ifelse(df[i, 5] < 0,
                                                     names(which(df_diff[i, ] != 0))[1],
                                                     NA_character_))

This checks for negative values in column 5 (E). For each negative value, it selects the name of the first column with a non-zero entry in the difference matrix; otherwise it puts in NA. The new column F contains the result.

Output

  A B C D  E    F
1 0 6 0 0  0 <NA>
2 0 6 0 1 -5    D
3 0 0 0 2  4 <NA>
4 4 0 0 2 -1    A

Data

df <- structure(list(A = c(0, 0, 0, 4), B = c(6, 6, 0, 0), C = c(0, 
0, 0, 0), D = c(0, 1, 2, 2), E = c(0, -5, 4, -1), F = c(NA, "D",
NA, "A")), row.names = c(NA, -4L), class = "data.frame")

Finding the column index where a row changes values in R dataframe/Datatable

What about this:

x <- sapply(seq_len(nrow(df)), function(i) rle(df[i, ])$values)

Output of x:

[[1]]
  Col2 Col3
1    2    9

[[2]]
  Col1 Col2 Col3
2    2    7    6

[[3]]
  Col1 Col2 Col3
3    1    5    4

Then if you'd like the full range of before/after values, you could use:

lapply(x,function(i) paste0(i,collapse="->"))

[[1]]
[1] "2->9"

[[2]]
[1] "2->7->6"

[[3]]
[1] "1->5->4"

Identifying where Remark column changes in R data.frame based on time stamp

Try this

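library(dplyr)

# ID, Remarks and Date are the vectors from the question
d <- data.frame(ID, Remarks, Date, stringsAsFactors = FALSE)
# keep only the rows where Remarks differs from the previous row
d %>% filter(Remarks != lag(Remarks, default = ''))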

Output:

  ID     Remarks                Date
1  1      joined 2020/08/01 06:31:38
2  1     newrole 2020/08/01 13:17:07
3  1 transferred 2020/08/01 13:29:01
4  2      joined 2020/08/03 06:31:38
5  2     newrole 2020/08/04 06:31:38
6  2 transferred 2020/08/04 13:17:07
7  3      joined 2020/08/07 13:17:07
8  3     newrole 2020/08/07 13:29:01
9  3 transferred 2020/08/10 13:29:01

