Remove duplicate rows from xts object
Shouldn't it be index(mt.xts) rather than mt.xts$Index?
The following seems to work.
# Sample data
library(xts)
x <- xts(
1:10,
rep( seq.Date( Sys.Date(), by="day", length=5 ), each=2 )
)
# Remove rows with a duplicated timestamp
y <- x[ ! duplicated( index(x) ), ]
# Remove rows with a duplicated timestamp, but keep the latest one
z <- x[ ! duplicated( index(x), fromLast = TRUE ), ]
Method to rbind xts objects that removes duplicate rows
I don't believe there is an xts method for this, but we can still make it work in at least a couple of ways.
If you look at ?rbind.xts you'll see this:
Identical indexed series are bound in the order or the arguments passed to rbind.
We can use that to our advantage.
First some example data
library(xts)
structure(c(5, 4, 2, 2, 4, 3, 3, 5), class = c("xts", "zoo"), .indexCLASS
= "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index =
structure(c(949449600, 949536000, 949708800, 949795200, 949881600,
949968000, 950054400, 950227200), tzone = "UTC", tclass = "Date"), .Dim =
c(8L, 1L)) -> d1
structure(c(3, 3, 3, 4, 2, 3, 3, 5), class = c("xts", "zoo"), .indexCLASS
= "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC", index =
structure(c(948931200, 949104000, 949190400, 949449600, 949536000,
949622400, 949708800, 950054400), tzone = "UTC", tclass = "Date"), .Dim =
c(8L, 1L)) -> d2
If we then do an rbind(), we'll get the duplicate values in the order we supplied d1 and d2. We can then use duplicated() to find the duplicates, and negate (!) the result to deselect them.
dat.bind <- rbind(d1, d2)
dat.bind.d1 <- dat.bind[!duplicated(time(dat.bind))]
To select the other set of duplicated values we can either switch the order of the arguments in rbind(), or shift the boolean vector we created with duplicated() one position to the left, thereby deselecting the first, rather than the second, of two identical timestamps.
dat.bind.d2 <- dat.bind[c(!duplicated(time(dat.bind))[-1], TRUE)]
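The shift trick relies on no timestamp occurring more than twice in the bound series. A possibly more robust variant uses the fromLast argument of duplicated(). The sketch below runs on two small made-up series, a1 and a2, which are hypothetical stand-ins for d1 and d2:

```r
library(xts)

# Hypothetical stand-ins for d1 and d2; both contain 2000-02-02
a1 <- xts(c(1, 2),   as.Date(c("2000-02-01", "2000-02-02")))
a2 <- xts(c(20, 30), as.Date(c("2000-02-02", "2000-02-03")))

# Rows with equal dates are bound in argument order: a1 first, then a2
ab <- rbind(a1, a2)

# Keep the first row per date (the value that came from a1)
keep.first <- ab[!duplicated(index(ab))]

# Keep the last row per date (the value that came from a2)
keep.last <- ab[!duplicated(index(ab), fromLast = TRUE)]
```

Here keep.first holds 1, 2, 30 and keep.last holds 1, 20, 30, so the choice between the two series comes down to the fromLast flag rather than to index arithmetic.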
There is one caveat with this approach: d1 and d2 must not individually have duplicate indices. If we use merge() instead, we don't have this limitation.
We do an outer join (meaning all values are included, with NAs filled in as necessary). Then we can simply replace the NAs in one column with the values at the same index in the other column.
dat.merged <- merge(d1, d2, join="outer")
dat.merged.d1 <- replace(dat.merged[, 1],
is.na(dat.merged[, 1]),
dat.merged[is.na(dat.merged[, 1]), 2])
dat.merged.d2 <- replace(dat.merged[, 2],
is.na(dat.merged[, 2]),
dat.merged[is.na(dat.merged[, 2]), 1])
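To make the effect of the outer join easier to see, here is a minimal sketch on two made-up series, m1 and m2 (hypothetical names, not part of the original data). It fills the gaps with ifelse() on the plain numeric values instead of replace(), which is just one of several ways to write the same step:

```r
library(xts)

# Hypothetical series; both contain 2000-02-02
m1 <- xts(c(1, 2),   as.Date(c("2000-02-01", "2000-02-02")))
m2 <- xts(c(20, 30), as.Date(c("2000-02-02", "2000-02-03")))

# Outer join: one row per distinct date, NA where a series has no value
m <- merge(m1, m2, join = "outer")

# Work on plain numeric vectors to keep the fill step explicit
v1 <- as.numeric(coredata(m[, 1]))
v2 <- as.numeric(coredata(m[, 2]))

# Prefer m1 and fall back to m2, and the other way round
prefer.m1 <- xts(ifelse(is.na(v1), v2, v1), order.by = index(m))
prefer.m2 <- xts(ifelse(is.na(v2), v1, v2), order.by = index(m))
```

Note that any NA that was already present in the input data is treated like a gap and filled from the other series, which may or may not be what you want.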
How to remove a row from zoo/xts object, given a timestamp
If z1 and z2 are your zoo objects, then to rbind them while removing any duplicates in z2:
rbind( z1, z2[ ! time(z2) %in% time(z1) ] )
Regarding deleting points in a zoo object at specified times, the above already illustrates this, but in general, if tt is a vector of times to delete:
z[ ! time(z) %in% tt ]
or, if we knew there was a single element in tt, then z[ time(z) != tt ].
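Both idioms can be tried on a small example. The series z1 and z2 and the dates in tt below are made up for illustration:

```r
library(zoo)

# Hypothetical series; 2020-01-03 appears in both
z1 <- zoo(c(1, 2, 3), as.Date("2020-01-01") + 0:2)
z2 <- zoo(c(30, 40),  as.Date("2020-01-01") + 2:3)

# Bind, dropping the rows of z2 whose time already exists in z1
zz <- rbind(z1, z2[!time(z2) %in% time(z1)])

# Delete the observations at a given set of times
tt <- as.Date(c("2020-01-02", "2020-01-04"))
z.del <- zz[!time(zz) %in% tt]
```

After this, zz holds the values 1, 2, 3, 40 and z.del holds the two observations for 2020-01-01 and 2020-01-03.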
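How can we remove rows from an xts object based on a seconds criterion
You can first truncate the timestamps to the minute and then remove the duplicates. After truncation, each observation taken at 30 seconds past the minute shares its timestamp with the observation from the start of that minute, so it is the one that gets dropped (an observation with no whole-minute counterpart, such as the first one here, is kept):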
library(xts)
xts3 <- xts(x=rnorm(10), order.by=as.POSIXct(strptime("2021-11-04 05:57:00", "%Y-%m-%d %H:%M:%S")+1:10*30), born=as.POSIXct("1899-05-08"))
# Truncate the timestamps in xts3 to the minute
index(xts3) <- as.POSIXct(trunc(index(xts3), units="mins"))
# Drop rows with a duplicated timestamp, keeping the first of each
xts3_dup <- make.index.unique(xts3, drop = TRUE)
> xts3
[,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:58:00 -2.56173420
2021-11-04 05:59:00 0.64355622
2021-11-04 05:59:00 -0.18794658
2021-11-04 06:00:00 0.03005718
2021-11-04 06:00:00 0.64367384
2021-11-04 06:01:00 0.74716446
2021-11-04 06:01:00 -0.29986731
2021-11-04 06:02:00 -0.57503711
> xts3_dup
[,1]
2021-11-04 05:57:00 -0.19766541
2021-11-04 05:58:00 -0.00902353
2021-11-04 05:59:00 0.64355622
2021-11-04 06:00:00 0.03005718
2021-11-04 06:01:00 0.74716446
2021-11-04 06:02:00 -0.57503711
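If the goal is strictly to keep only the observations that fall on a whole minute, another option is to test the seconds component directly. This is a sketch of a different technique from the truncation above, on made-up data (the names tm, s and on.minute are hypothetical):

```r
library(xts)

# Ten observations 30 seconds apart; the values are arbitrary
tm <- as.POSIXct("2021-11-04 05:57:00", tz = "UTC") + 30 * 1:10
s  <- xts(1:10, order.by = tm)

# Keep only the rows whose timestamp has a seconds component of 00
on.minute <- s[format(index(s), "%S") == "00"]
```

Unlike the truncation approach, this also drops the first observation (05:57:30), because it is never relabelled to 05:57:00; five rows remain, from 05:58:00 to 06:02:00.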
xts - Delete rows based on certain criteria
If you only have 0, 2, and 3 as values, you can use diff to cover most of the rules in one go. Only those records are needed where the difference is 1 (a 2 directly above a 3) or -1 (a 3 directly above a 2), so the absolute value of diff is what we need. We also need the first row where the value is 2. Combining the two gives the result xts3_filtered.
xts3_filtered <- c(xts3[first(which(xts3$code == 2))], xts3[abs(diff(xts3$code)) == 1])
code
2013-07-24 09:02:00 2
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
Now we have a duplicate row, because both rules select the record where the first 2 occurs. So we remove any duplicates with the following code:
xts3_filtered[!duplicated(index(xts3_filtered))]
code
2013-07-24 09:02:00 2
2013-07-24 09:07:00 3
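The duplicate can also be avoided up front by collecting row positions and de-duplicating them before subsetting. The sketch below uses made-up data (the codes and the object names x, first2, by.diff and res are assumptions, not the original xts3):

```r
library(xts)

# Hypothetical data: one observation per minute with codes 0, 2 and 3
times <- as.POSIXct("2013-07-24 09:00:00", tz = "UTC") + 60 * 0:7
codes <- c(0, 3, 2, 2, 0, 0, 2, 3)
x <- xts(cbind(code = codes), order.by = times)

v <- as.numeric(coredata(x$code))

# Rule 1: position of the first row where the code is 2
first2 <- which(v == 2)[1]

# Rule 2: rows whose code differs from the previous row by exactly 1;
# the first row has no predecessor, so it is never selected
by.diff <- which(c(FALSE, abs(diff(v)) == 1))

# Union of both rules; unique() removes the position picked by both
res <- x[sort(unique(c(first2, by.diff)))]
```

With these codes both rules pick the 09:02 row, and res ends up with two rows: code 2 at 09:02 and code 3 at 09:07.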
xts - Delete specific rows without transform to other format
The following lines result in equal output.
xts3[-2, ]
xts3[index(xts3) != index(xts3[xts3$column.one == 2])]
column.one
2013-07-24 09:01:00 1
2013-07-24 09:03:00 3
For xts / zoo time series, however, it is better and safer to work with the index, as this gives finer control over which observations you select.
Removing rows according to duplicated index
The result of duplicated() is a logical vector, so to negate it you have to use logical negation, i.e. the ! operator:
ciao <- afspline[!(duplicated(index(afspline))),]
Base R has no not() function; if the magrittr package is loaded, its not() alias for ! does the same:
ciao <- afspline[not(duplicated(index(afspline))),]
rbind time series and drop identical dates
Subset y to omit the observations whose index already appears in x:
z <- rbind(x,y[!(index(y) %in% index(x))])
How to subset xts object based upon [is not] condition
I'm not sure why you would expect my.object[!"2015/2015-03-01"] to work. Applying a logical function to a character string doesn't make sense.
Regardless, one way to accomplish what you want is to use the which.i argument to [.xts to find the integer indices. Then you can remove those observations from your xts object by using a negative i in another call to [.xts.
R> require(xts)
R> data(sample_matrix)
R> x <- as.xts(sample_matrix)
R> unwantedObs <- x["2007-01-04/2007-06-28", which.i=TRUE]
R> x[-unwantedObs,]
Open High Low Close
2007-01-02 50.03978 50.11778 49.95041 50.11778
2007-01-03 50.23050 50.42188 50.23050 50.39767
2007-06-29 47.63629 47.77563 47.61733 47.66471
2007-06-30 47.67468 47.94127 47.67468 47.76719
R> # in one line:
R> #x[-x["2007-01-04/2007-06-28", which.i=TRUE],]