Remove Rows from Data Frame Where a Row Matches a String

How do I drop rows from a pandas data frame that contain a particular string in a particular column?

pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:

In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))

In [92]: df
Out[92]:
   A          C
0  5        foo
1  3        bar
2  5  fooXYZbar
3  6        bat

In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
   A    C
0  5  foo
1  3  bar
3  6  bat
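One caveat: str.contains treats its pattern as a regular expression by default and gives a missing result for missing cells. A minimal sketch, assuming a hypothetical fifth row whose C value is missing, that matches the text literally and counts missing cells as non-matches:

```python
import pandas as pd

# Same frame as above, plus a hypothetical row whose C value is missing.
df = pd.DataFrame({"A": [5, 3, 5, 6, 7],
                   "C": ["foo", "bar", "fooXYZbar", "bat", None]})

# regex=False matches "XYZ" literally; na=False counts missing cells as "no match".
has_xyz = df["C"].str.contains("XYZ", regex=False, na=False)

# Keep only the rows that do not contain the string.
filtered = df[~has_xyz]
print(filtered)
```

Without na=False the mask would hold a missing value for the last row, which cannot be used cleanly for boolean indexing.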

Remove Rows From Data Frame where a Row matches a String

Just use the == with the negation symbol (!). If dtfm is the name of your data.frame:

dtfm[!dtfm$C == "Foo", ]

Or, to move the negation into the comparison:

dtfm[dtfm$C != "Foo", ]

Or, even shorter using subset():

subset(dtfm, C!="Foo")

Remove dataframe rows containing a value from a list, where cells may themselves be lists

You can approach this in the following steps:

  1. Use pd.Series.explode() on each column to expand the lists of strings into multiple rows, so that every cell holds a single string.

  2. Check the exploded dataframe for strings in the to_delete list with .isin().

  3. Group by index level 0 (which holds the original row index from before the explode) and aggregate the match results of the exploded rows back into one row per original row, using .sum() under groupby().

  4. Apply .sum(axis=1) to count the matches in each row across all columns.

  5. Select the rows with 0 matches (the rows to retain) to form a boolean index.

  6. Finally, use .loc with that boolean index to keep only the rows without a match.



df.loc[df.apply(pd.Series.explode).isin(to_delete).groupby(level=0).sum().sum(axis=1).eq(0)]

Result:

         A        B          C           D           E
1  string2  string5  [string8]  [string13]  [string16]

The original dataframe can be rebuilt for testing with the following code:

import pandas as pd

# Columns A and B hold plain strings; C, D and E hold lists of strings.
data = {'A': ['string1', 'string2', 'string3'],
        'B': ['string4', 'string5', 'string6'],
        'C': [['string7', 'string10'], ['string8'], ['string9']],
        'D': [['string11', 'string 12'], ['string13'], ['string14']],
        'E': [['string15'], ['string16'], ['string17']]}

df = pd.DataFrame(data)
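The same filter can also be written without explode, by checking each cell of a row directly. This is a different technique from the one-liner above, shown only as a sketch: the to_delete values and the row_matches helper are hypothetical, since the question's own deletion list is not shown here.

```python
import pandas as pd

data = {'A': ['string1', 'string2', 'string3'],
        'B': ['string4', 'string5', 'string6'],
        'C': [['string7', 'string10'], ['string8'], ['string9']],
        'D': [['string11', 'string 12'], ['string13'], ['string14']],
        'E': [['string15'], ['string16'], ['string17']]}
df = pd.DataFrame(data)

# Hypothetical deletion list, chosen so that rows 0 and 2 match.
to_delete = {'string10', 'string14'}

def row_matches(row, targets):
    """Return True if any cell, or any item of a list cell, is in targets."""
    for cell in row:
        items = cell if isinstance(cell, list) else [cell]
        if any(item in targets for item in items):
            return True
    return False

# Boolean Series: True for the rows that should be dropped.
drop_mask = df.apply(lambda row: row_matches(row, to_delete), axis=1)
kept = df[~drop_mask]
print(kept)
```

A row-wise apply is slower than vectorized operations on large frames, but it handles plain-string and list cells in the same pass.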

Delete rows containing specific strings in R

This should do the trick:

df[- grep("REVERSE", df$Name),]

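Or a safer version, which still works when nothing matches (grep then returns an empty vector, and the negative index above would drop every row):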

df[!grepl("REVERSE", df$Name),]

How to delete row in pandas dataframe based on condition if string is found in cell value of type list?

Since you are filtering on a list column, apply with a lambda is probably the easiest approach:

df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]

Quick comments on your attempts:

  • In df = df.drop(df["123.jpg" in df.jpgs].index) the expression "123.jpg" in df.jpgs tests the column as a whole rather than each list inside it (on a Series, the in operator checks the index labels), which is not what you want.

  • df = df[df['jpgs'].str.contains('123.jpg') == False] goes in the right direction, but it is missing the regex=False keyword, as shown in Ibrahim's answer.

  • df[df.jpgs.count("123.jpg") == 0] is also not applicable here, since count returns the total number of non-NaN values in the Series.
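To make these points concrete, here is a small sketch with made-up data (the id column and the file names are assumptions, not taken from the question). It shows the per-list membership test and why regex=False matters when the search string contains a dot:

```python
import pandas as pd

# Hypothetical data: each cell of "jpgs" holds a list of file names.
df = pd.DataFrame({"id": [1, 2, 3],
                   "jpgs": [["123.jpg", "a.jpg"], ["b.jpg"], ["1234jpg"]]})

# Membership test per list: True for rows whose list lacks "123.jpg".
keep = df["jpgs"].apply(lambda names: "123.jpg" not in names)
result = df.loc[keep]
print(result)

# On plain strings, "." in a regex matches any character, hence regex=False.
names = pd.Series(["123.jpg", "1234jpg"])
as_regex = names.str.contains("123.jpg").tolist()
as_literal = names.str.contains("123.jpg", regex=False).tolist()
print(as_regex)    # [True, True]
print(as_literal)  # [True, False]
```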

Delete the rows matching specific strings in multiple columns with And condition

Try this:

df = df[(~df.Science.str.match('Poor')) | (~df.Maths.str.match('Bad'))]

  Student Science English Maths
0       A    Good    Good  Good
2       C     Avg    Good   Avg
4       E    Poor     Avg   Avg
5       D    Poor    Good  Good

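You can also have a look at this thread to see why the odd behaviour takes place.
It's because the condition describes what you want to keep in the dataframe, not what you want to drop. Dropping the rows where both conditions hold means keeping the rows where at least one of them fails, hence the | between the two negated conditions.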
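The equivalence between the "drop" form and the "keep" form can be checked directly. A minimal sketch with hypothetical grades (not the data from the question):

```python
import pandas as pd

# Hypothetical grades, not the data from the question.
df = pd.DataFrame({"Student": ["A", "B", "C", "D"],
                   "Science": ["Good", "Poor", "Avg", "Poor"],
                   "Maths": ["Good", "Bad", "Avg", "Good"]})

# Rows to drop: Science is Poor AND Maths is Bad.
to_drop = df["Science"].str.match("Poor") & df["Maths"].str.match("Bad")
kept = df[~to_drop]

# Equivalent "keep" form: Science is not Poor OR Maths is not Bad.
kept_alt = df[~df["Science"].str.match("Poor") | ~df["Maths"].str.match("Bad")]

print(kept)
print(kept.equals(kept_alt))  # True
```

Writing the drop condition first and negating it once tends to be easier to read than negating each part separately.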

How to delete ANY row containing specific string in pandas?

You can use isin with any.

df = df[~df.isin(['refused']).any(axis=1)]
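Note that isin compares whole cell values, so a cell such as "refused by proxy" would not count as a match. A sketch with hypothetical survey answers (the column names q1 and q2 are assumptions) contrasting the exact match with a substring search:

```python
import pandas as pd

# Hypothetical survey answers.
df = pd.DataFrame({"q1": ["yes", "refused", "no"],
                   "q2": ["no", "yes", "refused by proxy"]})

# isin compares whole cell values, so only the exact cell "refused" counts.
exact = df[~df.isin(["refused"]).any(axis=1)]

# Substring variant: cast each column to str and search it literally.
contains = df.apply(lambda col: col.astype(str).str.contains("refused", regex=False))
partial = df[~contains.any(axis=1)]

print(exact.index.tolist())    # [0, 2]
print(partial.index.tolist())  # [0]
```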

Remove Rows From Data Frame where a Row matches a value from a list

You could use anti_join from the dplyr package.

library(dplyr)

ANrule4 <- data.frame(Group_Account = c(2911, 2944, 2949, 1415, 1695, 1761, 1912, 2570))

listremove <- data.frame(Group_Account = c(2911, 2946, 2945, 2944, 2949))

# Keep the rows of ANrule4 that have no matching Group_Account in listremove
ANrule4 %>% anti_join(listremove, by = "Group_Account")

  Group_Account
1          1415
2          1695
3          1761
4          1912
5          2570

Removing rows from a dataframe that contain a string in a particular column

There are multiple ways you can do this:

Convert to numeric and remove the resulting NA values (as.numeric() warns about the coercion, which is expected here):

subset(df, !is.na(as.numeric(Score)))

#    ID Score
#1 1001     4
#2 1002    20
#5 1005    30

Or use grepl to find the values that contain any non-digit character and remove those rows:

subset(df, !grepl('\\D', Score))

This can be done with grep as well.

df[grep('\\D', df$Score, invert = TRUE), ]

data

df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v", 
"30")), class = "data.frame", row.names = c(NA, -5L))

