Deleting Every N-Th Row in a Dataframe

Deleting every n-th row in a dataframe

You could create a function as follows:

Nth.delete <- function(dataframe, n) dataframe[-seq(n, to = nrow(dataframe), by = n), ]

Let's test it out

DF<-data.frame(A=1:15, B=rnorm(15), C=sample(LETTERS,15))
Nth.delete(DF, 3)

Pandas every nth row

I'd use iloc, which takes a row/column slice based on integer position, following normal Python slice syntax. If you want every 5th row:

df.iloc[::5, :]
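
If instead you want to drop every 5th row and keep the rest, a boolean mask over the positional index works too. This is a minimal sketch, not from the original answer; it assumes the usual pandas/NumPy imports and a small throwaway frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(20)})                # hypothetical 20-row frame

keep_every_5th = df.iloc[::5, :]                   # keeps positions 0, 5, 10, 15
drop_every_5th = df[np.arange(len(df)) % 5 != 0]   # keeps everything except 0, 5, 10, 15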

How to remove multiple columns every nth column in R?

Since these are column indexes, use - to remove those columns:

i1 <- rep(seq(3, ncol(df), 4), each = 2) + 0:1
df[,-i1]

Another option is to use a logical index that gets recycled across the columns:

df[!c(FALSE, FALSE, TRUE, TRUE)]

data

set.seed(24)
df <- as.data.frame(matrix(rnorm(12 * 4), 4, 12))

Remove nth row in R data frame?

There are a lot of different ways to do that; a simple one is this:

# make an index, e.g. every 3rd row
ind <- seq(1, nrow(df), by=3)

# make subset --> this keeps only the rows in `ind`
df[ind, ]

# --> this excludes all rows in `ind`
df[-ind, ]

hth

How do you remove every second row in a pandas dataframe?

I am assuming there are many ways to do this, but I just use iloc:

df = df.iloc[::2,:]

Try it and let me know if it worked for you.
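
If you want the opposite selection, i.e. keep the odd positions and drop positions 0, 2, 4, ..., you can start the slice at 1. A minimal sketch with the same iloc pattern, assuming the same df:

df = df.iloc[1::2, :]   # keep positions 1, 3, 5, ...; drop 0, 2, 4, ...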

Deleting every nth column from a dataframe in R

You can do this in a very simple way in base R.

example[, c(TRUE, TRUE, FALSE)]

The logical vector will be recycled as needed across the columns. If you want it to scale, you can do something like this:

n <- 3
example[, c(rep(TRUE, n - 1), FALSE)]

If you prefer dplyr, the equivalent is:

example %>%
  select(everything()[c(TRUE, TRUE, FALSE)])

How to delete every nth row of an array if one contains a zero?

I would do it in two passes. It is a lot cleaner, and it might even be faster under some circumstances. Here's an implementation without numpy; feel free to convert it to use array().

AA = (['0','A','B','C','D','E'],
      ['X','2','3','3','3','4'],
      ['Y','3','4','9','7','3'],
      ['Z','3','4','6','3','4'],
      ['X','2','3','3','3','4'],
      ['Y','3','4','8','7','0'],
      ['Z','3','4','6','3','4'],
      ['X','2','5','3','3','4'],
      ['Y','3','4','0','7','3'],
      ['Z','3','4','6','3','4'])

todrop = set(row[0] for row in AA[1:] if '0' in row)
filtered = list(row for row in AA[1:] if row[0] not in todrop)

Since row[0] does not contain the exact indicator label, write a simple function that will extract the label and use that instead of the entire row[0]. Details depend on what your data actually looks like.
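
As an illustration only, if the label happened to be the leading letters of row[0] (a purely hypothetical format; extract_label is not from the original answer), the two passes might look like this:

def extract_label(cell):
    # hypothetical helper: keep only the leading alphabetic characters,
    # e.g. 'X12' -> 'X'; adjust to whatever the real data looks like
    label = ''
    for ch in cell:
        if ch.isalpha():
            label += ch
        else:
            break
    return label

todrop = set(extract_label(row[0]) for row in AA[1:] if '0' in row)
filtered = [row for row in AA[1:] if extract_label(row[0]) not in todrop]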

Option 2: in case you really want to do it by counting the rows (which I don't recommend), save the row number modulo 3 instead of the row ID. It's about the same amount of work:

relabeled = list((n % 3, row) for n, row in enumerate(AA[1:]))
todrop = set(n for n, row in relabeled if '0' in row) # Will save {1} for Y
filtered = list(row for n, row in relabeled if n not in todrop)

Drop every nth column in pandas dataframe

The issue with your code is that each time you drop a column in your loop, you end up with a different set of columns, because you overwrite df after each iteration. When you then try to drop the next 3rd column of THAT new set of columns, you not only drop the wrong one, you eventually run out of columns. That's why you get the error you are getting.

iter1 -> 0,1,3,4,5,6,7,8,9,10 ... n #first you drop 2 which is 3rd col
iter2 -> 0,1,3,4,5,7,8,9,10 ... n #next you drop 6 which is 6th col (should be 5)
iter3 -> 0,1,3,4,5,7,8,9, ... n #next you drop 10 which is 9th col (should be 8)

What you want to do is calculate the indexes beforehand and then remove them in one go.


You can simply get the indexes of the columns you want to remove with range and then drop them all at once.

drop_idx = list(range(2,df.shape[1],3)) #Indexes to drop
df2 = df.drop(drop_idx, axis=1) #Drop them at once over axis=1

print('old columns->', list(df.columns))
print('idx to drop->', drop_idx)
print('new columns->',list(df2.columns))
old columns-> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
idx to drop-> [2, 5, 8]
new columns-> [0, 1, 3, 4, 6, 7, 9]

Note: This works only because your column names are the same as their positional indexes. If your column names are not like that, you will have to do an extra step of fetching the column names based on the indexes you want to drop.

drop_idx = list(range(2,df.shape[1],3))
drop_cols = [j for i,j in enumerate(df.columns) if i in drop_idx] #<--
df2 = df.drop(drop_cols, axis=1)
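
An equivalent way to do that mapping is to index df.columns directly with the positional indexes; a small sketch under the same assumptions, not from the original answer:

drop_cols = df.columns[drop_idx]     # positional indexing of the column labels
df2 = df.drop(columns=drop_cols)     # drop by label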

Skipping every nth row in pandas

One possible solution is to create a mask with modulo and filter by boolean indexing:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(10, 30)}, index=range(20))
#print (df)

b = df[np.mod(np.arange(df.index.size), 4) != 0]
print (b)
     a
1   11
2   12
3   13
5   15
6   16
7   17
9   19
10  20
11  21
13  23
14  24
15  25
17  27
18  28
19  29

Details:

print (np.mod(np.arange(df.index.size),4))
[0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3]

print (np.mod(np.arange(df.index.size),4)!=0)
[False True True True False True True True False True True True
False True True True False True True True]

If the index values are unique, you can use a slightly changed version of @jpp's solution from the comments; drop removes all rows whose labels match, so this drops every 4th row by its index label:

b = df.drop(df.index[::4], axis=0)

