Remove Last N Rows in Data Frame With the Arbitrary Number of Rows

head() with a negative n is convenient for this: head(df, -n) returns all rows except the last n.

df <- data.frame( a = 1:10 )
head(df, -5)
#   a
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5

P.S. Your seq() example can be written a little less awkwardly with the named arguments by and length.out (partially matched as len), like this: -seq(nrow(df), by = -1, len = 5).

How to remove first and last N rows in a dataframe using R?

You can do this by specifying the range of rows you want to keep in the final dataset:

df_adj <- df[1001:( nrow(df) - 1000 ),]

Make sure the data frame has more than 2000 rows: otherwise 1001:( nrow(df) - 1000 ) counts downward and selects the wrong rows. A safer approach might be:

df_adj <- if( nrow(df) > 2000 ) df[1001:( nrow(df) - 1000 ),] else df

Remove last n elements from a row in a dataframe

You may use str_extract() from the stringr package:

library(stringr)
head_col_last_1 <- str_extract(head_col$V1, "\\S+(?:\\s+\\S+){1,2}(?=\\s*$)")

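The pattern extracts the last two or three whitespace-separated tokens of each string (the backslashes are doubled because it is an R string literal). It matches:

  • \\S+ - one or more non-whitespace chars
  • (?:\\s+\\S+){1,2} - one or two occurrences of
    • \\s+ - one or more whitespace chars
    • \\S+ - one or more non-whitespace chars
  • (?=\\s*$) - a lookahead requiring that only optional whitespace and the end of the string follow.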
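
To try the pattern on sample input, the same regular expression can be run in Python's re module, which supports every construct used here. This is only an illustrative sketch: the sample strings are made up, and last_tokens is a hypothetical helper name that does not come from the original answer.

```python
import re

# Same pattern as above; a raw string avoids doubling the backslashes.
LAST_TOKENS = re.compile(r"\S+(?:\s+\S+){1,2}(?=\s*$)")

def last_tokens(text):
    """Return the last two or three whitespace-separated tokens, or None."""
    match = LAST_TOKENS.search(text)
    return match.group(0) if match else None

print(last_tokens("alpha beta gamma delta"))  # beta gamma delta
print(last_tokens("one two  "))               # one two
print(last_tokens("single"))                  # None
```

A string with a single token yields no match, because the pattern requires at least one whitespace-separated pair; str_extract returns NA in the corresponding case.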

Deleting a subset of rows based on other variables

A potential base R solution would be:

d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

# Row position within each station (assumes the rows are sorted by station)
d$grp_counter <- do.call("c", lapply(tapply(d$depth, d$station, length), seq_len))
# Number of rows in the station each row belongs to
d$grp_length <- rep(tapply(d$depth, d$station, length), tapply(d$depth, d$station, length))
# Keep everything except the last 50 rows of each station
d <- d[d$grp_counter <= (d$grp_length - 50),]
d

# OR w/o auxiliary vars: subset(d, select = -c(grp_counter, grp_length))

Simultaneously remove the first and last rows of a data frame until reaching a row that does not have an NA

base R

r <- rle(complete.cases(df))
str(r, vec.len = 9)
# List of 2
# $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
# $ values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
# - attr(*, "class")= chr "rle"
r$values[ -c(1, length(r$values)) ] <- TRUE
str(r, vec.len = 9)
# List of 2
# $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
# $ values : logi [1:9] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
# - attr(*, "class")= chr "rle"
df[inverse.rle(r),]
#    var1 var2 var3 var4
# 3     3    3    8    8
# 4     4   NA    9    9
# 5     5    2   10   10
# 6     6   NA   11   11
# 7     7    3   12   12
# 8     8    4   13   13
# 9     9    2   14   14
# 10   10   NA   15   15
# 11   11    4   16   16

dplyr

On your question about efficiency: you can adapt the rle solution to dplyr as well (that should be trivial), but I see no reason why using complete.cases with cumany/rev would be a problem. You can improve on your attempt by computing complete.cases(.) once and storing it in an interim column instead of calculating it twice.

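library(dplyr)
df %>%
  # cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement
  mutate(aux = complete.cases(cur_data())) %>%
  # keep rows from the first complete case through the last complete case
  filter(cumany(aux) & rev(cumany(rev(aux))))
#   var1 var2 var3 var4   aux
# 1    3    3    8    8  TRUE
# 2    4   NA    9    9 FALSE
# 3    5    2   10   10  TRUE
# 4    6   NA   11   11 FALSE
# 5    7    3   12   12  TRUE
# 6    8    4   13   13  TRUE
# 7    9    2   14   14  TRUE
# 8   10   NA   15   15 FALSE
# 9   11    4   16   16  TRUE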

data.table

(Just an adaptation of the dplyr version.)

library(data.table)
setDT(df)
df[, aux := complete.cases(.SD)
][ cumsum(aux) > 0 & rev(cumsum(rev(aux)) > 0), ]
#     var1  var2  var3  var4    aux
#    <int> <num> <int> <int> <lgcl>
# 1:     3     3     8     8   TRUE
# 2:     4    NA     9     9  FALSE
# 3:     5     2    10    10   TRUE
# 4:     6    NA    11    11  FALSE
# 5:     7     3    12    12   TRUE
# 6:     8     4    13    13   TRUE
# 7:     9     2    14    14   TRUE
# 8:    10    NA    15    15  FALSE
# 9:    11     4    16    16   TRUE

Drop last n rows within pandas dataframe groupby

You can combine groupby, tail, and drop: tail(n) picks the last n rows of each group, and drop removes them by index label:

n = 2
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
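
As a quick check of how this behaves, here is a minimal runnable sketch. The sample frame and the column v are made up for illustration. Because drop works on index labels, the approach assumes a unique index; the cumcount variant shown next to it is an alternative (not from the original answer) that filters by position within each group instead.

```python
import pandas as pd

# Hypothetical sample data: group (x, 1) has 4 rows, (x, 2) has 3, (y, 1) has 2.
df = pd.DataFrame({
    "a": ["x", "x", "x", "x", "x", "x", "x", "y", "y"],
    "b": [1, 1, 1, 1, 2, 2, 2, 1, 1],
    "v": range(9),
})

n = 2
# tail(n) returns the last n rows of every group with their original labels;
# drop() then removes those labels from the frame.
trimmed = df.drop(df.groupby(["a", "b"]).tail(n).index, axis=0)

# Alternative: count rows from the end of each group and keep those at
# position n or beyond, which does not depend on the index being unique.
alt = df[df.groupby(["a", "b"]).cumcount(ascending=False) >= n]

print(trimmed["v"].tolist())  # [0, 1, 4]
```

Note that a group with n or fewer rows, such as (y, 1) here, disappears from the result entirely.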

Remove last row of data frame until reaching a row that does not have an NA

We could use na.trim from the zoo package:

library(zoo)
library(dplyr)

df %>%
  slice(1:nrow(na.trim(df, "right", is.na = "any")))
#    var1 var2 var3 var4
# 1     1    3    6   NA
# 2     2    6    7    7
# 3     3    3    8    8
# 4     4   NA    9    9
# 5     5    2   10   10
# 6     6   NA   11   11
# 7     7    3   12   12
# 8     8    4   13   13
# 9     9    2   14   14
# 10   10   NA   15   15
# 11   11    4   16   16

How to delete the last two rows of a df with pandas

It is better to select all rows except the last 2 with iloc:

df = df.iloc[:-2]
print(df)
             name  year  reports
Cochice     Jason  2012        4
Pima        Molly  2012       24
Santa Cruz   Tina  2013       31
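
One caveat when the number of rows to drop is held in a variable: df.iloc[:-n] with n = 0 evaluates to df.iloc[:0], which is an empty frame rather than the full one. The sketch below shows one possible guard; drop_last is a hypothetical helper name and the data is made up.

```python
import pandas as pd

def drop_last(frame, n):
    """Return frame without its last n rows (n >= 0); n = 0 keeps every row."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Slice up to an explicit stop position instead of -n so that n = 0 works.
    return frame.iloc[:max(len(frame) - n, 0)]

df = pd.DataFrame({"reports": [4, 24, 31, 2, 3]})

print(len(drop_last(df, 2)))   # 3
print(len(drop_last(df, 0)))   # 5
print(len(df.iloc[:-0]))       # 0, the pitfall the helper avoids
print(len(drop_last(df, 10)))  # 0
```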

Remove rows in a group by until the last row meets some condition

It seems you could use drop_duplicates with a different rule depending on type, keeping the first row per id for type A and the last row per id for type B:

out = pd.concat([df.query("type=='A'").drop_duplicates(subset=['id','type'], keep='first'),
                 df.query("type=='B'").drop_duplicates(subset=['id','type'], keep='last')]).sort_index()

Output:

   id type
0   1    A
1   1    B
3   2    B
4   2    A
5   3    A
8   3    B
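
The input frame is not shown in the answer, so the sketch below uses a hypothetical one, chosen only because it is consistent with the output above. It splits the one-liner into named steps (first_a and last_b are illustrative names) to make the two rules visible.

```python
import pandas as pd

# Hypothetical input: not the original data, just one frame that yields the output above.
df = pd.DataFrame({
    "id":   [1, 1, 2, 2, 2, 3, 3, 3, 3],
    "type": ["A", "B", "B", "B", "A", "A", "A", "B", "B"],
})

# Type A: keep the first row of each id. Type B: keep the last row of each id.
first_a = df.query("type == 'A'").drop_duplicates(subset=["id", "type"], keep="first")
last_b = df.query("type == 'B'").drop_duplicates(subset=["id", "type"], keep="last")

# Restore the original row order after concatenating the two pieces.
out = pd.concat([first_a, last_b]).sort_index()

print(out.index.tolist())  # [0, 1, 3, 4, 5, 8]
```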

