Deleting every n-th row in a dataframe
You could create a function as follows:
Nth.delete <- function(dataframe, n) dataframe[-(seq(n, to = nrow(dataframe), by = n)), ]
Let's test it out:
DF<-data.frame(A=1:15, B=rnorm(15), C=sample(LETTERS,15))
Nth.delete(DF, 3)
Pandas every nth row
I'd use iloc, which takes a row/column slice, both based on integer position and following normal Python slice syntax. If you want every 5th row:
df.iloc[::5, :]
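Note that the slice above keeps every 5th row (positions 0, 5, 10, ...). A minimal sketch of both directions, keeping versus deleting, assuming a small made-up frame (`df`, `n`, `kept` and `dropped` are illustrative names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; any DataFrame behaves the same way.
df = pd.DataFrame({"a": range(10, 20)})
n = 5

# Keep every n-th row, starting with the first (positions 0, 5, ...).
kept = df.iloc[::n, :]

# Delete every n-th row instead, counting from 1 (positions 4, 9, ...),
# using a boolean mask built from the row positions.
positions = np.arange(len(df))
dropped = df[(positions + 1) % n != 0]

print(kept)
print(dropped)
```

The mask version works on positions, so it does not depend on what the index labels look like.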
How to remove multiple columns every nth column in R?
Since these are column indexes, negate them with - to remove those columns:
i1 <- rep(seq(3, ncol(df), 4) , each = 2) + 0:1
df[,-i1]
Another option is to use a logical index, which R recycles across the columns:
df[!c(FALSE, FALSE, TRUE, TRUE)]
data
set.seed(24)
df <- as.data.frame(matrix(rnorm(12 * 4), 4, 12))
Remove nth row in R data frame?
There are a lot of different ways to do that; a simple one is this:
# make an index, e.g. every 3rd row, starting from the first
ind <- seq(1, nrow(df), by=3)
# positive index --> this selects the rows in `ind`
df[ind, ]
# negative index --> this excludes all rows in `ind`
df[-ind, ]
hth
How do you remove every second row in a pandas dataframe?
There are probably many ways to do this, but I just use iloc:
df = df.iloc[::2,:]
Try it and let me know if it worked for you.
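Which half of the rows survives depends on where the slice starts. A small sketch, assuming a made-up five-row frame (the variable names are illustrative):

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [10, 11, 12, 13, 14]})

# Start at position 0: keeps positions 0, 2, 4 (drops the 2nd, 4th, ... row).
even_positions = df.iloc[::2, :]

# Start at position 1: keeps positions 1, 3 (drops the 1st, 3rd, ... row).
odd_positions = df.iloc[1::2, :]

# The original index labels are kept; renumber them if that is not wanted.
renumbered = even_positions.reset_index(drop=True)
```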
Deleting every nth column from a dataframe in r
You can do this very simply in base R, by indexing the columns with a logical vector:
example[, c(TRUE, TRUE, FALSE)]
The logical vector will repeat as needed for the columns. If you want it to scale, you can do something like this.
n <- 3
example[, c(rep(TRUE, n - 1), FALSE)]
If you prefer, the dplyr equivalent of this can be:
example %>%
select(everything()[c(TRUE, TRUE, FALSE)])
How to delete every nth row of an array if one contains a zero?
I would do it in two passes. It is a lot cleaner, and it might even be faster under some circumstances. Here's an implementation without numpy; feel free to convert it to use array().
AA = (['0','A','B','C','D','E'],
      ['X','2','3','3','3','4'],
      ['Y','3','4','9','7','3'],
      ['Z','3','4','6','3','4'],
      ['X','2','3','3','3','4'],
      ['Y','3','4','8','7','0'],
      ['Z','3','4','6','3','4'],
      ['X','2','5','3','3','4'],
      ['Y','3','4','0','7','3'],
      ['Z','3','4','6','3','4'])
todrop = set(row[0] for row in AA[1:] if '0' in row)
filtered = list(row for row in AA[1:] if row[0] not in todrop)
Since row[0] does not contain the exact indicator label, write a simple function that will extract the label and use that instead of the entire row[0]. Details depend on what your data actually looks like.
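As a rough sketch of such a label-extracting function: the code below assumes, purely for illustration, that the real row IDs look like 'X1', 'Y2' (a label followed by digits). The `extract_label` helper and the shortened sample data are hypothetical and not part of the original question.

```python
import re

# Hypothetical data: row IDs carry a numeric suffix ('X1', 'Y1', ...).
AA = (['0', 'A', 'B'],
      ['X1', '2', '3'],
      ['Y1', '3', '4'],
      ['X2', '2', '3'],
      ['Y2', '0', '4'])

def extract_label(row_id):
    """Strip trailing digits, e.g. 'Y2' -> 'Y' (assumed ID format)."""
    return re.sub(r'\d+$', '', row_id)

# Pass 1: collect the labels of all rows that contain a '0'.
todrop = set(extract_label(row[0]) for row in AA[1:] if '0' in row)

# Pass 2: keep only the rows whose label was not flagged.
filtered = [row for row in AA[1:] if extract_label(row[0]) not in todrop]

print(filtered)
```

If your IDs follow a different pattern, only `extract_label` needs to change; the two passes stay the same.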
Option 2: In case you really want to do it by counting the rows (which I don't recommend): Save the row numbers modulo 3, instead of the row ID. It's about the same amount of work:
relabeled = list((n % 3, row) for n, row in enumerate(AA[1:]))
todrop = set(n for n, row in relabeled if '0' in row) # Will save {1} for Y
filtered = list(row for n, row in relabeled if n not in todrop)
Drop every nth column in pandas dataframe
The issue with your code is that each time you drop a column in the loop, you end up with a different set of columns, because you overwrite df after each iteration. When you then try to drop the next 3rd column of THAT new set of columns, you not only drop the wrong one, you eventually run out of columns. That's why you get the error you are getting.
iter1 -> 0,1,3,4,5,6,7,8,9,10 ... n #first you drop 2 which is 3rd col
iter2 -> 0,1,3,4,5,7,8,9,10 ... n #next you drop 6 which is 6th col (should be 5)
iter3 -> 0,1,3,4,5,7,8,9, ... n #next you drop 10 which is 9th col (should be 8)
What you want to do is calculate the indexes beforehand and then remove them in one go.
You can get the indexes of the columns you want to remove with range and then drop them all at once.
drop_idx = list(range(2,df.shape[1],3)) #Indexes to drop
df2 = df.drop(drop_idx, axis=1) #Drop them at once over axis=1
print('old columns->', list(df.columns))
print('idx to drop->', drop_idx)
print('new columns->',list(df2.columns))
old columns-> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
idx to drop-> [2, 5, 8]
new columns-> [0, 1, 3, 4, 6, 7, 9]
Note: This works only because your column names are the same as their positional indexes. If your column names are different, you need an extra step that looks up the column names for the positions you want to drop.
drop_idx = list(range(2,df.shape[1],3))
drop_cols = [j for i,j in enumerate(df.columns) if i in drop_idx] #<--
df2 = df.drop(drop_cols, axis=1)
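Two shorter variants of the same idea, sketched under the assumption of a made-up frame with string column names (`df`, `df2`, `df3` and `keep` are illustrative): indexing `df.columns` with the list of positions, or skipping names altogether with a positional boolean mask.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string column names 'a' .. 'g'.
df = pd.DataFrame(np.zeros((2, 7)), columns=list("abcdefg"))

# Variant 1: look the names up by position, then drop by name.
drop_idx = list(range(2, df.shape[1], 3))      # positions 2 and 5
df2 = df.drop(columns=df.columns[drop_idx])

# Variant 2: purely positional boolean mask, no column names involved.
keep = (np.arange(df.shape[1]) + 1) % 3 != 0   # False at every 3rd position
df3 = df.iloc[:, keep]

print(list(df2.columns))
print(list(df3.columns))
```

The positional mask is the safer choice when column names may be duplicated, since dropping by name removes every column carrying that name.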
Skipping every nth row in pandas
One possible solution is to create a mask with modulo and filter by boolean indexing:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':range(10, 30)}, index=range(20))
#print (df)
b = df[np.mod(np.arange(df.index.size),4)!=0]
print (b)
     a
1   11
2   12
3   13
5   15
6   16
7   17
9   19
10  20
11  21
13  23
14  24
15  25
17  27
18  28
19  29
Details:
print (np.mod(np.arange(df.index.size),4))
[0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3]
print (np.mod(np.arange(df.index.size),4)!=0)
[False True True True False True True True False True True True
False True True True False True True True]
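If the index values are unique, you can use a slightly modified version of @jpp's solution from the comments (axis is passed by keyword, since recent pandas versions no longer accept it positionally):
b = df.drop(df.index[::4], axis=0)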
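Why the uniqueness of the index matters: `drop` removes rows by label, so with repeated labels it removes more rows than intended. A small sketch with a made-up frame whose index contains duplicates (all names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-unique index.
df = pd.DataFrame({"a": [10, 11, 12, 13]}, index=[0, 0, 1, 1])

# Label-based: df.index[::2] is [0, 1], so every row labelled 0 or 1 goes.
by_label = df.drop(df.index[::2], axis=0)

# Position-based mask: drops only positions 0 and 2, whatever the labels are.
by_position = df[np.arange(len(df)) % 2 != 0]

print(len(by_label))
print(by_position)
```

With a unique index both approaches give the same result; the modulo mask is the one to prefer when uniqueness is not guaranteed.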