Remove last N rows in data frame with the arbitrary number of rows
head with a negative index is convenient for this:
df <- data.frame( a = 1:10 )
head(df,-5)
# a
#1 1
#2 2
#3 3
#4 4
#5 5
P.S. Your seq() example may be written slightly less(?) awkwardly using the named arguments by and length.out (shortened to len), like this: -seq(nrow(df), by = -1, len = 5).
How to remove first and last N rows in a dataframe using R?
You can do this by specifying the range of rows you want to keep in the final dataset:
df_adj <- df[1001:( nrow(df) - 1000 ),]
Just make sure you have enough rows to perform this. A safer approach might be:
df_adj <- if( nrow(df) > 2000 ) df[1001:( nrow(df) - 1000 ),] else df
remove last n elements from a row in a dataframe
You may use str_extract from the stringr package:
head_col_last_1 <- str_extract(head_col$V1, "\\S+(?:\\s+\\S+){1,2}(?=\\s*$)")
The pattern matches:
\\S+ - 1+ non-whitespace chars
(?:\\s+\\S+){1,2} - one or two occurrences of
  \\s+ - 1+ whitespace chars
  \\S+ - 1+ non-whitespace chars
(?=\\s*$) - that are followed with 0+ whitespaces and the end of string.
Deleting a subset of rows based on other variables
A potential base R solution would be:
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
depth = rnorm(250 + 1000 + 150, 100, 10))
d$grp_counter <- do.call("c", lapply(tapply(d$depth, d$station, length), seq_len))
d$grp_length <- rep(tapply(d$depth, d$station, length), tapply(d$depth, d$station, length))
d <- d[d$grp_counter <= (d$grp_length - 50),]
d
# OR w/o auxiliary vars: subset(d, select = -c(grp_counter, grp_length))
Simultaneously remove the first and last rows of a data frame until reaching a row that does not have an NA
base R
r <- rle(complete.cases(df))
str(r, vec.len = 9)
# List of 2
# $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
# $ values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
# - attr(*, "class")= chr "rle"
r$values[ -c(1, length(r$values)) ] <- TRUE
str(r, vec.len = 9)
# List of 2
# $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
# $ values : logi [1:9] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
# - attr(*, "class")= chr "rle"
df[inverse.rle(r),]
# var1 var2 var3 var4
# 3 3 3 8 8
# 4 4 NA 9 9
# 5 5 2 10 10
# 6 6 NA 11 11
# 7 7 3 12 12
# 8 8 4 13 13
# 9 9 2 14 14
# 10 10 NA 15 15
# 11 11 4 16 16
dplyr
For your question of efficiency, you can adapt the rle solution to dplyr as well (that should be trivial), but I see no reason why the use of complete.cases and cumany/rev would be a problem. You can improve on your attempt by not calculating complete.cases(.) twice as you're doing, and storing it in an interim column instead.
library(dplyr)
df %>%
  # cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement there
  mutate(aux = complete.cases(cur_data())) %>%
  filter(cumany(aux) & rev(cumany(rev(aux))))
# var1 var2 var3 var4 aux
# 1 3 3 8 8 TRUE
# 2 4 NA 9 9 FALSE
# 3 5 2 10 10 TRUE
# 4 6 NA 11 11 FALSE
# 5 7 3 12 12 TRUE
# 6 8 4 13 13 TRUE
# 7 9 2 14 14 TRUE
# 8 10 NA 15 15 FALSE
# 9 11 4 16 16 TRUE
data.table
(Just an adaptation of the dplyr version.)
library(data.table)
setDT(df)
df[, aux := complete.cases(.SD)
][ cumsum(aux) > 0 & rev(cumsum(rev(aux)) > 0), ]
# var1 var2 var3 var4 aux
# <int> <num> <int> <int> <lgcl>
# 1: 3 3 8 8 TRUE
# 2: 4 NA 9 9 FALSE
# 3: 5 2 10 10 TRUE
# 4: 6 NA 11 11 FALSE
# 5: 7 3 12 12 TRUE
# 6: 8 4 13 13 TRUE
# 7: 9 2 14 14 TRUE
# 8: 10 NA 15 15 FALSE
# 9: 11 4 16 16 TRUE
Drop last n rows within pandas dataframe groupby
You can use groupby and drop as below:
n = 2
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
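To make the one-liner above concrete, here is a minimal runnable sketch. The sample data and the val column are assumptions made up for illustration; only the column names a and b come from the answer. Note that dropping by index label relies on the index being unique. The second variant, based on cumcount, is a swapped-in alternative (not part of the original answer) that avoids that requirement.

```python
import pandas as pd

# Hypothetical sample data; only the grouping columns 'a' and 'b' come from the answer.
df = pd.DataFrame({
    "a": ["x", "x", "x", "x", "y", "y", "y"],
    "b": [1, 1, 1, 1, 2, 2, 2],
    "val": [10, 11, 12, 13, 20, 21, 22],
})

n = 2

# tail(n) returns the last n rows of every group, keeping their original index labels,
# so those labels can be passed straight to drop().
last_rows = df.groupby(["a", "b"]).tail(n)
trimmed = df.drop(last_rows.index, axis=0)
print(trimmed)

# Alternative sketch: number the rows of each group from the end (0 = last row)
# and keep only rows that are at least n positions away from the end.
from_end = df.groupby(["a", "b"]).cumcount(ascending=False)
trimmed_alt = df[from_end >= n]
print(trimmed_alt)
```

Both variants keep the same rows here; groups with n or fewer rows disappear entirely.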
Remove last row of data frame until reaching a row that does not have an NA
We could use na.trim from the zoo package:
library(zoo)
library(dplyr)
df %>%
slice(1:nrow(na.trim(df, "right", is.na = "any")))
var1 var2 var3 var4
1 1 3 6 NA
2 2 6 7 7
3 3 3 8 8
4 4 NA 9 9
5 5 2 10 10
6 6 NA 11 11
7 7 3 12 12
8 8 4 13 13
9 9 2 14 14
10 10 NA 15 15
11 11 4 16 16
How to delete the last two rows of a df with pandas
It is better to select all rows except the last 2 with iloc:
df = df.iloc[:-2]
print (df)
name year reports
Cochice Jason 2012 4
Pima Molly 2012 24
Santa Cruz Tina 2013 31
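One caveat with negative slicing may be worth a sketch: when the number of rows to drop is a variable, df.iloc[:-n] returns an empty frame for n = 0, because -0 is just 0. The helper drop_last below is hypothetical (not a pandas function), and the last two rows of the sample data are assumptions added for illustration.

```python
import pandas as pd

# Sample data; the first three rows mirror the output above, the last two are assumed.
df = pd.DataFrame(
    {
        "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
        "year": [2012, 2012, 2013, 2014, 2014],
        "reports": [4, 24, 31, 2, 3],
    },
    index=["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
)


def drop_last(frame, n):
    """Hypothetical helper: return frame without its last n rows (n >= 0)."""
    # frame.iloc[:-0] equals frame.iloc[:0], which is empty, so compute an
    # explicit non-negative stop position instead of negating n.
    return frame.iloc[: max(len(frame) - n, 0)]


print(drop_last(df, 2))       # same rows as df.iloc[:-2]
print(len(drop_last(df, 0)))  # all rows are kept
print(len(df.iloc[:-0]))      # the naive slice keeps nothing
```

For a fixed, non-zero n, the plain df.iloc[:-n] from the answer is all you need.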
Remove rows in a group by until the last row meets some condition
It seems you could use drop_duplicates with a different rule depending on type:
out = pd.concat([df.query("type=='A'").drop_duplicates(subset=['id','type'], keep='first'),
df.query("type=='B'").drop_duplicates(subset=['id','type'], keep='last')]).sort_index()
Output:
id type
0 1 A
1 1 B
3 2 B
4 2 A
5 3 A
8 3 B
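The question's input frame is not shown here, so the following sketch uses an assumed input chosen to be consistent with the output above: per id, it keeps the first row of type A and the last row of type B.

```python
import pandas as pd

# Assumed input (not from the original question), built to reproduce the output above.
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "type": ["A", "B", "B", "B", "A", "A", "A", "B", "B"],
})

# For type A keep the first occurrence per id, for type B keep the last one.
first_a = df.query("type == 'A'").drop_duplicates(subset=["id", "type"], keep="first")
last_b = df.query("type == 'B'").drop_duplicates(subset=["id", "type"], keep="last")

# Concatenate the two parts and restore the original row order.
out = pd.concat([first_a, last_b]).sort_index()
print(out)
```

Splitting the frame by type first keeps each keep= rule simple; sort_index then puts the surviving rows back in their original order.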