Remove Duplicate Rows from Xts Object

Remove duplicate rows from xts object

Shouldn't it be index(mt.xts) rather than mt.xts$Index?
The following seems to work.

# Sample data
library(xts)
x <- xts(
1:10,
rep( seq.Date( Sys.Date(), by="day", length=5 ), each=2 )
)

# Remove rows with a duplicated timestamp
y <- x[ ! duplicated( index(x) ), ]

# Remove rows with a duplicated timestamp, but keep the latest one
z <- x[ ! duplicated( index(x), fromLast = TRUE ), ]
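To make the difference between the two calls concrete, here is a small sketch that uses fixed dates (an assumption for reproducibility, in place of Sys.Date()) and inspects which values survive in each case.

```r
library(xts)

# Ten values on five dates, so every timestamp occurs exactly twice
dates <- rep(as.Date("2024-01-01") + 0:4, each = 2)
x <- xts(1:10, order.by = dates)

# Keep the first row of each duplicated timestamp
y <- x[!duplicated(index(x)), ]

# Keep the last row of each duplicated timestamp
z <- x[!duplicated(index(x), fromLast = TRUE), ]

as.vector(coredata(y))  # 1 3 5 7 9
as.vector(coredata(z))  # 2 4 6 8 10
```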

Method to rbind xts objects that removes duplicate rows

I don't believe there is an xts method for this, but we can still make it work, in at least a couple of ways.

If you look at ?rbind.xts you'll see this:

Identical indexed series are bound in the order or the arguments passed to rbind.

We can use that to our advantage.

First some example data

library(xts)

structure(c(5, 4, 2, 2, 4, 3, 3, 5), class = c("xts", "zoo"), .indexCLASS
= "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index =
structure(c(949449600, 949536000, 949708800, 949795200, 949881600,
949968000, 950054400, 950227200), tzone = "UTC", tclass = "Date"), .Dim =
c(8L, 1L)) -> d1

structure(c(3, 3, 3, 4, 2, 3, 3, 5), class = c("xts", "zoo"), .indexCLASS
= "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index =
structure(c(948931200, 949104000, 949190400, 949449600, 949536000,
949622400, 949708800, 950054400), tzone = "UTC", tclass = "Date"), .Dim =
c(8L, 1L)) -> d2

If we then do an rbind() we'll get the duplicate values in the order we supplied d1 and d2. We can then use duplicated() to find the duplicates, and negate (!) that index to deselect them.

dat.bind <- rbind(d1, d2)

dat.bind.d1 <- dat.bind[!duplicated(time(dat.bind))]

To select the other set of duplicated values we can either switch the order of arguments in rbind(), or we can shift the boolean vector we created with duplicated() one to the left, and thereby deselect the first, rather than the second, of two identical values.

dat.bind.d2 <- dat.bind[c(!duplicated(time(dat.bind))[-1], TRUE)]
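As a hedged alternative to shifting the boolean vector by hand, duplicated() also accepts fromLast = TRUE, which flags the earlier of two identical timestamps instead of the later one. The sketch below uses two tiny made-up series (a and b are illustrative names, not the d1 and d2 above) and shows that both routes pick the same rows when each timestamp occurs at most twice.

```r
library(xts)

# Two small illustrative series that overlap on 2020-01-02 and 2020-01-03
a <- xts(c(1, 2, 3),    order.by = as.Date("2020-01-01") + 0:2)
b <- xts(c(20, 30, 40), order.by = as.Date("2020-01-01") + 1:3)

ab <- rbind(a, b)  # for equal timestamps, rows from a come before rows from b

# Keep the values from a where timestamps collide
keep_a <- ab[!duplicated(index(ab))]

# Keep the values from b: shift trick versus fromLast = TRUE
keep_b_shift    <- ab[c(!duplicated(index(ab))[-1], TRUE)]
keep_b_fromlast <- ab[!duplicated(index(ab), fromLast = TRUE)]

as.numeric(coredata(keep_a))           # 1 2 3 40
as.numeric(coredata(keep_b_fromlast))  # 1 20 30 40
```

The fromLast form does not depend on duplicates being adjacent pairs, so it may be the easier one to reason about.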

There is one caveat with this approach, and that is that d1 and d2 must not individually have duplicate indices. If we use merge() instead we don't have this limitation.

We do an outer join (meaning all values are included, with NAs filled in as necessary). Then we can simply replace the NAs in one column with the values at the same index in the other column.

dat.merged <- merge(d1, d2, join="outer")

dat.merged.d1 <- replace(dat.merged[, 1],
is.na(dat.merged[, 1]),
dat.merged[is.na(dat.merged[, 1]), 2])

dat.merged.d2 <- replace(dat.merged[, 2],
is.na(dat.merged[, 2]),
dat.merged[is.na(dat.merged[, 2]), 1])
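The same merge-then-fill idea can be checked on a tiny example. This is only a sketch: the series a and b are made up for illustration, and the fill step uses ifelse() on the core data instead of replace(), which is a substitution on my part and not the method shown above.

```r
library(xts)

# Two small illustrative series that overlap on 2020-01-02 and 2020-01-03
a <- xts(c(1, 2, 3),    order.by = as.Date("2020-01-01") + 0:2)
b <- xts(c(20, 30, 40), order.by = as.Date("2020-01-01") + 1:3)

# Outer join: one row per distinct timestamp, NA where a series has no value
m <- merge(a, b, join = "outer")
vals <- coredata(m)

# Prefer column 1 (a) and fall back to column 2 (b) where a is missing
prefer_a <- xts(ifelse(is.na(vals[, 1]), vals[, 2], vals[, 1]), order.by = index(m))

# Prefer column 2 (b) and fall back to column 1 (a) where b is missing
prefer_b <- xts(ifelse(is.na(vals[, 2]), vals[, 1], vals[, 2]), order.by = index(m))

as.numeric(coredata(prefer_a))  # 1 2 3 40
as.numeric(coredata(prefer_b))  # 1 20 30 40
```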

How to remove a row from zoo/xts object, given a timestamp

If z1 and z2 are your zoo objects then to rbind while removing any duplicates in z2:

rbind( z1, z2[ ! time(z2) %in% time(z1) ] )

Regarding deleting points in a zoo object having specified times, the above already illustrates this but in general if tt is a vector of times to delete:

z[ ! time(z) %in% tt ]

or if we knew there were a single element in tt then z[ time(z) != tt ] .
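A minimal sketch of both steps, using small made-up zoo series (the values and dates are assumptions chosen for illustration):

```r
library(zoo)

# z2 overlaps z1 on 2024-01-03
z1 <- zoo(c(1, 2, 3), as.Date("2024-01-01") + 0:2)
z2 <- zoo(c(30, 40),  as.Date("2024-01-01") + 2:3)

# Bind, dropping the rows of z2 whose time already exists in z1
combined <- rbind(z1, z2[!time(z2) %in% time(z1)])

# Delete the observations at a given vector of times
tt <- as.Date(c("2024-01-02", "2024-01-04"))
trimmed <- combined[!time(combined) %in% tt]

as.numeric(coredata(combined))  # 1 2 3 40
as.numeric(coredata(trimmed))   # 1 3
```

The %in% form also behaves sensibly when tt has several elements or none of them occur in the series, which the != form does not.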

how can we remove the rows from xts based on the seconds criteria

You can first truncate the timestamps to the minute and then remove duplicates. After truncation, a 30-second observation that follows an on-the-minute observation ends up with the same timestamp, so it is dropped as the duplicate:

library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S")+1:10*30), born=as.POSIXct("1899-05-08"))

# Truncate the timestamps in xts3 to the minute
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))

# Remove rows with duplicate timestamps, keeping the first of each
xts3_dup <- make.index.unique(xts3, drop = TRUE)

> xts3
[,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00 0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00 0.03005718
2021-11-04 06:00:00 0.64367384
2021-11-04 06:01:00 0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711

> xts3_dup
[,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00 0.64355622
2021-11-04 06:00:00 0.03005718
2021-11-04 06:01:00 0.74716446
2021-11-04 06:02:00 -0.57503711
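Because rnorm() makes the output above hard to verify by eye, here is a deterministic sketch of the same steps. The values 1:10 and the UTC time zone are assumptions for illustration. It also shows one possible way to keep the last observation of each minute instead, using duplicated() with fromLast = TRUE.

```r
library(xts)

# Ten observations, 30 seconds apart, starting at 05:57:30 UTC
times <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 1:10 * 30
x <- xts(1:10, order.by = times)

# Truncate every timestamp to the minute
index(x) <- as.POSIXct(trunc(index(x), units = "mins"))

# Keep the first observation of each minute
first_per_min <- make.index.unique(x, drop = TRUE)

# Keep the last observation of each minute
last_per_min <- x[!duplicated(index(x), fromLast = TRUE)]

as.vector(coredata(first_per_min))  # 1 2 4 6 8 10
as.vector(coredata(last_per_min))   # 1 3 5 7 9 10
```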

xts - Delete rows based on certain criteria

If the only values are 0, 2, and 3, diff() covers most of the rules in one go. We only need the records where the difference from the previous row is 1 (a 2 directly above a 3) or -1 (a 3 directly above a 2), so we select on the absolute value of diff(). We also need the first row where the value is 2. Combining the two selections gives xts3_filtered.

xts3_filtered <- c(xts3[first(which(xts3$code == 2))], xts3[abs(diff(xts3$code)) == 1])

                    code
2013-07-24 09:02:00 2
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3

Now we have a duplicate row because both rules select the record where the first 2 occurs. So we remove any duplicates with the following code

xts3_filtered[!duplicated(index(xts3_filtered))]
code
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
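The input data is not shown above, so the following is a sketch on hypothetical data (codes_xts and its values are made up so that the result matches the output shown). It works on a plain numeric vector and integer row positions, which is a deliberate simplification of the expression above.

```r
library(xts)

# Hypothetical input: one "code" column with the values 0, 2 and 3 only
times <- as.POSIXct("2013-07-24 09:01:00", tz = "UTC") + 60 * 0:6
codes_xts <- xts(cbind(code = c(3, 2, 2, 0, 0, 2, 3)), order.by = times)

codes <- as.numeric(coredata(codes_xts$code))

# Rule 1: the first row where the value is 2
first_two <- which(codes == 2)[1]

# Rule 2: rows whose value differs from the previous row by exactly 1
# (diff() is one element shorter than its input, hence the + 1)
changed <- which(abs(diff(codes)) == 1) + 1

# Combine both selections; the row at 09:02 is picked by both rules
filtered <- c(codes_xts[first_two], codes_xts[changed])

# Drop the duplicated timestamp
result <- filtered[!duplicated(index(filtered))]

as.numeric(coredata(result))  # 2 3
```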

xts - Delete specific rows without transform to other format

The following lines result in equal output.

xts3[-2, ]

xts3[index(xts3) != index(xts3[xts3$column.one == 2])]

column.one
2013-07-24 09:01:00 1
2013-07-24 09:03:00 3

But for xts / zoo time series it is better and safer to work with the index, as this gives finer control over which observations you select or drop.
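A small sketch comparing the two styles on made-up data shaped like the output above (the object x and its timestamps are assumptions). The index-based version uses %in% instead of !=, which is my substitution so that it still works when more than one row matches the condition.

```r
library(xts)

# Three observations, one minute apart
times <- as.POSIXct("2013-07-24 09:01:00", tz = "UTC") + 60 * 0:2
x <- xts(cbind(column.one = 1:3), order.by = times)

# Positional removal: drop the second row
by_position <- x[-2, ]

# Index-based removal: find the timestamps to drop, then exclude them
unwanted <- index(x)[as.vector(coredata(x$column.one)) == 2]
by_time <- x[!index(x) %in% unwanted]

as.vector(coredata(by_time))  # 1 3
```

The positional form silently removes the wrong row if the data is reordered or extended, while the index-based form keeps targeting the same timestamp.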

Removing rows according to duplicated index

The result of duplicated() is a logical vector, so to negate it you need a logical negation. In base R that is the ! operator:

ciao <- afspline[!duplicated(index(afspline)), ]

Base R has no not() function. If you prefer that spelling, the magrittr package exports not() as an alias for !:

library(magrittr)
ciao <- afspline[not(duplicated(index(afspline))), ]

rbind time series and drop identical dates

Subset the y's to omit those containing an index in x

z <- rbind(x,y[!(index(y) %in% index(x))])

How to subset xts object based upon [is not] condition

I'm not sure why you would expect my.object[!"2015/2015-03-01"] to work. Applying a logical function to a character string doesn't make sense.

Regardless, one way to accomplish what you want is to use the which.i argument to [.xts to find the integer indices. Then you can remove those observations from your xts object by using a negative i in another call to [.xts.

R> require(xts)
R> data(sample_matrix)
R> x <- as.xts(sample_matrix)
R> unwantedObs <- x["2007-01-04/2007-06-28", which.i=TRUE]
R> x[-unwantedObs,]
Open High Low Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-06-29 47.63629 47.77563 47.61733 47.66471
2007-06-30 47.67468 47.94127 47.67468 47.76719
R> # in one line:
R> #x[-x["2007-01-04/2007-06-28", which.i=TRUE],]
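The two steps can be wrapped in a small helper. This is a sketch only: drop_range is a hypothetical name, not part of xts. The guard for an empty match is there because negative subsetting with a zero-length index would return no rows instead of all of them.

```r
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

# Hypothetical helper: drop all observations that fall inside `range`,
# where `range` is an xts-style range string such as "2007-01-04/2007-06-28"
drop_range <- function(obj, range) {
  hits <- obj[range, which.i = TRUE]   # integer row positions inside the range
  if (length(hits) == 0) return(obj)   # nothing to drop: return obj unchanged
  obj[-hits, ]
}

kept <- drop_range(x, "2007-01-04/2007-06-28")
nrow(kept)  # 4
```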
