How to Delete Rows from a Pandas DataFrame Based on a Conditional Expression

How to delete rows from a pandas DataFrame based on a conditional expression

When you do len(df['column name']) you are just getting one number, namely the number of rows in the DataFrame (i.e., the length of the column itself). If you want to apply len to each element in the column, use df['column name'].map(len). So try the following, which keeps only the rows whose value has a length less than 2:

df[df['column name'].map(len) < 2]
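
As a minimal sketch of the difference, assuming a column of strings (the data below is made up for illustration and is not from the original question):

```python
import pandas as pd

# Hypothetical data: a column of strings of varying length.
df = pd.DataFrame({"column name": ["a", "bb", "ccc", ""]})

# len() on the column gives a single number: the row count.
n_rows = len(df["column name"])

# map(len) applies len to every element and returns a Series of lengths.
lengths = df["column name"].map(len)

# Keep only the rows whose value is shorter than 2 characters.
short = df[lengths < 2]

# To delete those rows instead, invert the mask with ~.
rest = df[~(lengths < 2)]
```

Because the mask is an ordinary boolean Series, the same pattern works for any per-element function passed to map.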

Different ways to conditionally drop rows in Pandas

They are not the same.

df = df.drop(df[df.AE == "X"].index)

This drops rows by their index label. If the index is not unique, the labels of the rows where df['AE'] == "X" might be shared with other rows, and those rows are dropped as well.

df = df[df["AE"] != "X"]

Here we are slicing the DataFrame and keeping all rows where df["AE"] is different from "X". The index values play no part: we are not dropping rows, but keeping those that meet a criterion.
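
A small sketch of how the two can diverge, assuming a frame with a duplicated index label (the data is hypothetical):

```python
import pandas as pd

# Hypothetical frame in which the index label 0 appears twice.
df = pd.DataFrame({"AE": ["X", "Y", "Z"]}, index=[0, 0, 1])

# drop() removes every row whose label matches, so the "Y" row,
# which shares label 0 with the "X" row, is removed too.
dropped = df.drop(df[df.AE == "X"].index)

# Boolean indexing looks only at the values, so only the "X" row goes.
filtered = df[df["AE"] != "X"]
```

With a unique index the two results are the same, so the difference only matters when labels repeat.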

Delete Rows in Pandas DataFrame based on conditional expression

Use pandas.Series.str.startswith:

new_df = df[~df["text"].str.startswith("RT")]
print(new_df)

Output:

   index   text  is_retweet
0      0   Test       False
3      3  Test2       False
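
One thing to watch for: if the text column contains missing values, str.startswith returns NaN for them, and negating that mask with ~ can fail. A hedged sketch using the na parameter of str.startswith, with invented sample rows:

```python
import pandas as pd

# Hypothetical data; the "RT" row and the missing value are invented.
df = pd.DataFrame({
    "text": ["Test", "RT first", None, "Test2"],
    "is_retweet": [False, True, False, False],
})

# na=False treats missing text as "does not start with RT", so the
# mask stays boolean and can be safely negated with ~.
mask = df["text"].str.startswith("RT", na=False)
new_df = df[~mask]
```

Whether a row with missing text should be kept or removed depends on your data; pass na=True to drop such rows instead.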

Deleting DataFrame row in Pandas based on column value

If I'm understanding correctly, it should be as simple as:

df = df[df.line_race != 0]
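
Note that filtering this way keeps the original index labels, so the index will have gaps. A minimal sketch, assuming made-up data (the horse column is hypothetical), that also renumbers the index afterwards:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"line_race": [0, 5, 0, 7], "horse": ["a", "b", "c", "d"]})

# Keep rows where line_race is non-zero; labels 1 and 3 survive.
df = df[df.line_race != 0]

# Optionally renumber the index from 0 after filtering.
df = df.reset_index(drop=True)
```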

Delete some rows in dataframe based on condition in another column

Not the most beautiful of ways to do it but this should work.

df = df.loc[df['value'].groupby(df['name']).cumsum().groupby(df['name']).cumsum() <=1]
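
To see what the double cumsum does, here is a sketch that splits it into steps. It assumes value is a 0/1 flag, which the answer does not state, and uses made-up data:

```python
import pandas as pd

# Hypothetical data: 'value' is assumed to be a 0/1 flag.
df = pd.DataFrame({
    "name":  ["a", "a", "a", "a", "b", "b", "b"],
    "value": [0,   1,   0,   1,   1,   0,   0],
})

# First cumsum: running count of 1s seen so far within each name.
running = df["value"].groupby(df["name"]).cumsum()

# Second cumsum: stays <= 1 only up to and including the first 1.
twice = running.groupby(df["name"]).cumsum()

# Rows after the first 1 in each group are removed.
kept = df.loc[twice <= 1]
```

Under that assumption, each group keeps its rows up to and including the first row where value is 1.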

Remove rows from pandas DataFrame based on condition

General boolean indexing

df[df['Species'] != 'Cat']
# df[df['Species'].ne('Cat')]

   Index    Name Species
1      1    Jill     Dog
3      3   Harry     Dog
4      4  Hannah     Dog


df.query

df.query("Species != 'Cat'")

   Index    Name Species
1      1    Jill     Dog
3      3   Harry     Dog
4      4  Hannah     Dog

For information on the pd.eval() family of functions, their features and use cases, please visit Dynamic Expression Evaluation in pandas using pd.eval().



Series.isin

df[~df['Species'].isin(['Cat'])]

   Index    Name Species
1      1    Jill     Dog
3      3   Harry     Dog
4      4  Hannah     Dog
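
The three approaches can be checked against each other with a short sketch. The two 'Cat' rows below are hypothetical, since only the 'Dog' rows appear in the outputs:

```python
import pandas as pd

# Sample frame; the names of the 'Cat' rows are invented for illustration.
df = pd.DataFrame({
    "Index": [0, 1, 2, 3, 4],
    "Name": ["Tom", "Jill", "Felix", "Harry", "Hannah"],
    "Species": ["Cat", "Dog", "Cat", "Dog", "Dog"],
})

by_mask = df[df["Species"] != "Cat"]
by_query = df.query("Species != 'Cat'")
by_isin = df[~df["Species"].isin(["Cat"])]

# isin generalizes to several values at once.
no_pets = df[~df["Species"].isin(["Cat", "Dog"])]
```

All three return the same rows here; isin is mostly useful when there is more than one value to exclude.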

How to delete row in pandas dataframe based on condition if string is found in cell value of type list?

Since you are filtering on a column of lists, apply with a lambda is probably the easiest:

df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]

Quick comments on your attempts:

  • In df = df.drop(df["123.jpg" in df.jpgs].index), the expression "123.jpg" in df.jpgs evaluates to a single boolean: the in operator on a Series tests the index labels, not the values, and never looks inside the lists, which is not what you want.

  • df = df[df['jpgs'].str.contains('123.jpg') == False] goes in the right direction, but you are missing the regex=False keyword, as shown in Ibrahim's answer. Without it, the dot in '123.jpg' is treated as a regex wildcard.

  • df[df.jpgs.count("123.jpg") == 0] is also not applicable here, since count returns the total number of non-NaN values in the Series.
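
A minimal sketch of the apply approach alongside the in pitfall, assuming each cell of jpgs holds a Python list (the file names other than "123.jpg" are made up):

```python
import pandas as pd

# Hypothetical data: each cell in 'jpgs' is a list of file names.
df = pd.DataFrame({
    "jpgs": [["123.jpg", "a.jpg"], ["b.jpg"], ["c.jpg", "123.jpg"]],
})

# 'in' on a Series tests the index labels, not the values.
label_hit = 1 in df.jpgs           # 1 is an index label
value_hit = "123.jpg" in df.jpgs   # no such index label

# Element-wise membership test inside each list.
kept = df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]
```

Here label_hit is true and value_hit is false, even though "123.jpg" appears in two of the lists, and kept holds only the middle row.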

Drop rows on multiple conditions in pandas dataframe

drop is a method, but you are calling it with [], which is why you get:

'method' object is not subscriptable

Change to () (a normal method call) and it should work:

import pandas as pd

df = pd.DataFrame({"col_1": (0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0),
                   "col_2": (0.0, 0.24, 1.0, 0.0, 0.22, 3.11, 0.0),
                   "col_3": ("Mon", "Tue", "Thu", "Fri", "Mon", "Tue", "Thu")})

df_new = df.drop(df[(df['col_1'] == 1.0) & (df['col_2'] == 0.0)].index)
print(df_new)

Output

   col_1  col_2 col_3
0    0.0   0.00   Mon
1    0.0   0.24   Tue
2    1.0   1.00   Thu
4    0.0   0.22   Mon
5    1.0   3.11   Tue
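
The same rows can also be removed without drop by negating a combined boolean mask. This is an alternative sketch, not what the answer above uses:

```python
import pandas as pd

df = pd.DataFrame({"col_1": (0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0),
                   "col_2": (0.0, 0.24, 1.0, 0.0, 0.22, 3.11, 0.0),
                   "col_3": ("Mon", "Tue", "Thu", "Fri", "Mon", "Tue", "Thu")})

# Rows to remove: both conditions must hold. Each comparison needs its
# own parentheses because & binds more tightly than ==.
to_remove = (df["col_1"] == 1.0) & (df["col_2"] == 0.0)

# Keep everything else; no index lookup is involved.
df_new = df[~to_remove]
```

This version does not depend on the index labels being unique.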

