Remove Last N Rows in Data Frame With the Arbitrary Number of Rows

Remove last N rows in data frame with the arbitrary number of rows

head() with a negative n is convenient for this; head(df, -n) returns everything except the last n rows:

df <- data.frame( a = 1:10 )
head(df,-5)
#  a
#1 1
#2 2
#3 3
#4 4
#5 5

P.S. Your seq() example can be written a little less awkwardly with the named arguments by and length.out (partially matched as len): -seq(nrow(df), by = -1, len = 5).

How to remove first and last N rows in a dataframe using R?

You can do this by specifying the range of rows you want to keep in the final dataset:

 df_adj <- df[1001:( nrow(df) - 1000 ),]

Just make sure the data frame has more than 2000 rows; otherwise 1001:(nrow(df) - 1000) counts downward and selects the wrong rows. A safer approach might be:

df_adj <- if( nrow(df) > 2000 ) df[1001:( nrow(df) - 1000 ),] else df

remove last n elements from a row in a dataframe

You may use str_extract() from the stringr package:

library(stringr)
head_col_last_1 <- str_extract(head_col$V1, "\\S+(?:\\s+\\S+){1,2}(?=\\s*$)")

The pattern matches:

\\S+ - one or more non-whitespace characters
(?:\\s+\\S+){1,2} - one or two occurrences of
- \\s+ - one or more whitespace characters
- \\S+ - one or more non-whitespace characters
(?=\\s*$) - a positive lookahead requiring that only optional whitespace and the end of the string follow.
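To see what the pattern actually captures, here is a minimal sketch that runs the same regular expression with Python's re module instead of stringr (a swapped-in tool; the original answer uses str_extract). The helper name last_tokens and the sample strings are made up for illustration. In a Python raw string the backslashes are not doubled.

```python
import re

# Same pattern as in the R answer; a raw string keeps the backslashes single.
PATTERN = re.compile(r"\S+(?:\s+\S+){1,2}(?=\s*$)")


def last_tokens(text):
    """Return the last two or three whitespace-separated tokens, or None if no match."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


print(last_tokens("a b c d e"))   # three or more tokens: the last three are returned
print(last_tokens("one two   "))  # trailing whitespace is allowed by the lookahead
print(last_tokens("single"))      # fewer than two tokens: no match
```

A string with only one token does not match, because the group (?:\s+\S+) must occur at least once.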

Deleting a subset of rows based on other variables

A potential base R solution would be:

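# Example data: three stations with 250, 1000 and 150 rows
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

# Row position within each station (assumes rows are sorted by station)
d$grp_counter <- do.call("c", lapply(tapply(d$depth, d$station, length), seq_len))
# Number of rows in each station, repeated for every row of that station
d$grp_length <- rep(tapply(d$depth, d$station, length), tapply(d$depth, d$station, length))
# Keep everything except the last 50 rows of each station
d <- d[d$grp_counter <= (d$grp_length - 50),]
d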

# To drop the auxiliary columns afterwards: subset(d, select = -c(grp_counter, grp_length))

Simultaneously remove the first and last rows of a data frame until reaching a row that does not have an NA

base R

r <- rle(complete.cases(df))
str(r, vec.len = 9)
# List of 2
#  $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
#  $ values : logi [1:9] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
#  - attr(*, "class")= chr "rle"
r$values[ -c(1, length(r$values)) ] <- TRUE
str(r, vec.len = 9)
# List of 2
#  $ lengths: int [1:9] 2 1 1 1 1 3 1 1 4
#  $ values : logi [1:9] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
#  - attr(*, "class")= chr "rle"
df[inverse.rle(r),]
#    var1 var2 var3 var4
# 3     3    3    8    8
# 4     4   NA    9    9
# 5     5    2   10   10
# 6     6   NA   11   11
# 7     7    3   12   12
# 8     8    4   13   13
# 9     9    2   14   14
# 10   10   NA   15   15
# 11   11    4   16   16

dplyr

On your question about efficiency, you can adapt the rle solution to dplyr as well (that should be trivial), but I see no reason why the use of complete.cases and cumany/rev would be a problem. You can improve on your attempt by computing complete.cases(.) once and storing it in an interim column, rather than calculating it twice as you're doing.

library(dplyr)
df %>%
  # Note: cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement
  mutate(aux = complete.cases(cur_data())) %>%
  filter(cumany(aux) & rev(cumany(rev(aux))))
#   var1 var2 var3 var4   aux
# 1    3    3    8    8  TRUE
# 2    4   NA    9    9 FALSE
# 3    5    2   10   10  TRUE
# 4    6   NA   11   11 FALSE
# 5    7    3   12   12  TRUE
# 6    8    4   13   13  TRUE
# 7    9    2   14   14  TRUE
# 8   10   NA   15   15 FALSE
# 9   11    4   16   16  TRUE

data.table

(Just an adaptation of the dplyr version.)

library(data.table)
setDT(df)
df[, aux := complete.cases(.SD)
  ][ cumsum(aux) > 0 & rev(cumsum(rev(aux)) > 0), ]
#     var1  var2  var3  var4    aux
#    <int> <num> <int> <int> <lgcl>
# 1:     3     3     8     8   TRUE
# 2:     4    NA     9     9  FALSE
# 3:     5     2    10    10   TRUE
# 4:     6    NA    11    11  FALSE
# 5:     7     3    12    12   TRUE
# 6:     8     4    13    13   TRUE
# 7:     9     2    14    14   TRUE
# 8:    10    NA    15    15  FALSE
# 9:    11     4    16    16   TRUE

Drop last n rows within pandas dataframe groupby

You can use groupby and drop as below:

n = 2
df.drop(df.groupby(['a','b']).tail(n).index, axis=0)
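As an illustration of how the one-liner above behaves, here is a small self-contained sketch. The sample data and the names last_rows and trimmed are assumptions made for the example; only the column names 'a' and 'b' come from the answer.

```python
import pandas as pd

# Hypothetical sample data; 'a' and 'b' mirror the grouping keys used above.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 2, 2, 2],
    "b": ["x", "x", "x", "x", "y", "y", "y"],
    "val": [10, 11, 12, 13, 20, 21, 22],
})

n = 2
# tail(n) returns the last n rows of every group, keeping their original index labels
last_rows = df.groupby(["a", "b"]).tail(n)
# drop() removes rows by label, so this relies on the index labels being unique
trimmed = df.drop(last_rows.index, axis=0)
print(trimmed)
```

Two things to keep in mind: a group with n or fewer rows disappears entirely, and if the index contains duplicate labels, drop() removes every row carrying a matching label, so resetting the index first may be safer.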

Remove last row of data frame until reaching a row that does not have an NA

We could use na.trim from the zoo package:

library(zoo)
library(dplyr)

df %>% 
  slice(1:nrow(na.trim(df, "right", is.na = "any")))

   var1 var2 var3 var4
1     1    3    6   NA
2     2    6    7    7
3     3    3    8    8
4     4   NA    9    9
5     5    2   10   10
6     6   NA   11   11
7     7    3   12   12
8     8    4   13   13
9     9    2   14   14
10   10   NA   15   15
11   11    4   16   16

How to delete the last two rows of a df with pandas

It is better to select all rows except the last 2 with iloc:

df = df.iloc[:-2]
print(df)
             name  year  reports
Cochice     Jason  2012        4
Pima        Molly  2012       24
Santa Cruz   Tina  2013       31
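When the number of rows to drop is a variable rather than the literal 2, the slice needs a little care: df.iloc[:-n] with n = 0 is df.iloc[:0], which returns an empty frame instead of the whole one. A minimal sketch of a guard, assuming a hypothetical helper named drop_last and made-up sample values:

```python
import pandas as pd


def drop_last(df, n):
    """Return df without its last n rows (hypothetical helper, not a pandas API)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Compute an explicit stop position instead of slicing with -n,
    # because -0 == 0 and df.iloc[:0] would be empty.
    stop = max(len(df) - n, 0)
    return df.iloc[:stop]


df = pd.DataFrame({"reports": [4, 24, 31, 2, 3]})  # made-up values
print(len(drop_last(df, 2)))   # two rows fewer than df
print(len(drop_last(df, 0)))   # unchanged length
print(len(drop_last(df, 10)))  # more than available: empty frame
```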

Remove rows in a group by until the last row meets some condition

It seems you could use drop_duplicates with a different rule depending on type:

out = pd.concat([df.query("type=='A'").drop_duplicates(subset=['id','type'], keep='first'), 
                 df.query("type=='B'").drop_duplicates(subset=['id','type'], keep='last')]).sort_index()

Output:
