How to drop rows from a pandas data frame that contain a particular string in a particular column?
pandas has vectorized string operations, so you can just filter out the rows that contain the string you don't want:
In [91]: df = pd.DataFrame(dict(A=[5,3,5,6], C=["foo","bar","fooXYZbar", "bat"]))
In [92]: df
Out[92]:
A C
0 5 foo
1 3 bar
2 5 fooXYZbar
3 6 bat
In [93]: df[~df.C.str.contains("XYZ")]
Out[93]:
A C
0 5 foo
1 3 bar
3 6 bat
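A minimal sketch of the same filter with two optional keywords that are often useful in practice. The extra row with a missing value is an assumption added for illustration and is not part of the original example. `regex=False` treats the pattern as a literal substring, and `na=False` treats missing values as "no match", so the mask stays boolean.

```python
import pandas as pd

# Same frame as above, plus one hypothetical row with a missing value in C.
df = pd.DataFrame({"A": [5, 3, 5, 6, 7],
                   "C": ["foo", "bar", "fooXYZbar", "bat", None]})

# regex=False: match "XYZ" literally instead of as a regular expression.
# na=False: rows where C is missing count as "does not contain".
mask = df["C"].str.contains("XYZ", regex=False, na=False)

# Keep every row whose C does not contain the substring.
filtered = df[~mask]
print(filtered)
```

Without `na=False`, a missing value produces a missing entry in the mask, which cannot be negated and used for indexing reliably.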
Remove Rows From Data Frame where a Row matches a String
Just use == with the negation symbol (!). If dtfm is the name of your data.frame:
dtfm[!dtfm$C == "Foo", ]
Or, to move the negation into the comparison:
dtfm[dtfm$C != "Foo", ]
Or, even shorter, using subset():
subset(dtfm, C!="Foo")
Remove dataframe rows containing a specific value from a list
You can approach this in the following steps:
1. Use pd.Series.explode() on each column to expand the lists of strings into multiple rows, so that each row contains only strings (all lists have been exploded into rows).
2. Check the dataframe for strings in the to_delete list using .isin().
3. Group by index level 0 (which holds the original row index from before the explode) to aggregate the matches from the multiple rows back into one row (using .sum() under groupby()).
4. Use .sum(axis=1) to count the matching strings row-wise.
5. Check for rows with 0 matches (the rows to retain) and form a boolean index from them.
6. Finally, use .loc to keep only the rows without a match.
df.loc[df.apply(pd.Series.explode).isin(to_delete).groupby(level=0).sum().sum(axis=1).eq(0)]
Result:
A B C D E
1 string2 string5 [string8] [string13] [string16]
The original dataframe can be built for testing with the following code:
data = {'A': ['string1', 'string2', 'string3'],
'B': ['string4', 'string5', 'string6'],
'C': [['string7', 'string10'], ['string8'], ['string9']],
'D': [['string11', 'string 12'], ['string13'], ['string14']],
'E': [['string15'], ['string16'], ['string17']]}
df = pd.DataFrame(data)
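The steps above can also be sketched one column at a time, which makes each intermediate result easy to inspect. This is a variant, not the one-liner itself: it explodes each column separately, so it does not rely on the list columns having matching lengths within a row, and it uses .any() in place of the .sum() ... .eq(0) combination. The contents of to_delete are hypothetical, since the original list is not shown.

```python
import pandas as pd

data = {'A': ['string1', 'string2', 'string3'],
        'B': ['string4', 'string5', 'string6'],
        'C': [['string7', 'string10'], ['string8'], ['string9']],
        'D': [['string11', 'string 12'], ['string13'], ['string14']],
        'E': [['string15'], ['string16'], ['string17']]}
df = pd.DataFrame(data)

# Hypothetical list of strings whose rows should be removed.
to_delete = ['string10', 'string14']

# For each column: explode lists into rows, test membership,
# then collapse back to one flag per original row (index level 0).
hits = pd.DataFrame({
    col: df[col].explode().isin(to_delete).groupby(level=0).any()
    for col in df.columns
})

# A row is kept only if no column reported a match.
result = df.loc[~hits.any(axis=1)]
print(result)
```

With this choice of to_delete, rows 0 and 2 each contain one listed string, so only row 1 is retained.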
Delete rows containing specific strings in R
This should do the trick:
df[- grep("REVERSE", df$Name),]
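Or a safer version, which still works when nothing matches (with no matches, -grep() yields an empty index and drops every row):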
df[!grepl("REVERSE", df$Name),]
How to delete row in pandas dataframe based on condition if string is found in cell value of type list?
Since you filter on a list column, apply lambda would probably be the easiest:
df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]
Quick comments on your attempts:
In df = df.drop(df["123.jpg" in df.jpgs].index), the expression "123.jpg" in df.jpgs tests the Series itself (for a Series, in checks the index labels) rather than any of the lists, which is not what you want.
df = df[df['jpgs'].str.contains('123.jpg') == False] goes in the right direction, but you are missing the regex=False keyword, as shown in Ibrahim's answer.
df[df.jpgs.count("123.jpg") == 0] is also not applicable here, since count returns the total number of non-NaN values in the Series.
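A small sketch of the list-column filter, assuming a hypothetical frame in which each cell of jpgs holds a list of file names (the original data is not shown). It also illustrates why the bare in test on the Series does not look inside the lists.

```python
import pandas as pd

# Hypothetical data: each cell of "jpgs" is a list of file names.
df = pd.DataFrame({"id": [1, 2, 3],
                   "jpgs": [["123.jpg", "456.jpg"], ["789.jpg"], []]})

# `in` on a Series tests the index labels, not the values.
label_test = "123.jpg" in df.jpgs   # False: the labels are 0, 1, 2
index_test = 0 in df.jpgs           # True: 0 is an index label
print(label_test, index_test)

# Element-wise membership test inside each list; keep rows without the file.
kept = df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]
print(kept)
```

The lambda runs once per row, so the membership test is ordinary Python list membership rather than a vectorized string operation.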
Delete the rows matching specific strings in multiple columns with And condition
Try this
df = df[(~df.Science.str.match('Poor')) | (~df.Maths.str.match('Bad'))]
Student Science English Maths
0 A Good Good Good
2 C Avg Good Avg
4 E Poor Avg Avg
5 D Poor Good Good
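You can also have a look at this thread on why the odd behaviour takes place.
It's because the condition describes what you want to keep in the dataframe, not what you want to drop: dropping rows where Science is 'Poor' and Maths is 'Bad' is the same as keeping rows where Science is not 'Poor' or Maths is not 'Bad'.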
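To see why the "and" in the drop condition turns into "|" in the keep condition, here is a minimal sketch on a hypothetical marks table (the values are invented for illustration and are not the data from the question).

```python
import pandas as pd

# Hypothetical data, chosen so that only student B is both Poor and Bad.
df = pd.DataFrame({"Student": ["A", "B", "C", "D"],
                   "Science": ["Good", "Poor", "Poor", "Avg"],
                   "Maths": ["Good", "Bad", "Good", "Bad"]})

is_poor = df.Science.str.match("Poor")
is_bad = df.Maths.str.match("Bad")

# Rows to drop: Science is Poor AND Maths is Bad.
keep_negated_and = ~(is_poor & is_bad)

# De Morgan: not (a and b) == (not a) or (not b).
keep_or = (~is_poor) | (~is_bad)

# Common mistake: this keeps only rows that are neither Poor nor Bad.
keep_wrong = (~is_poor) & (~is_bad)

print(df[keep_or])
```

Here keep_negated_and and keep_or select the same rows (A, C and D), while keep_wrong keeps only A.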
How to delete ANY row containing specific string in pandas?
You can use isin with any.
df = df[~df.isin(['refused']).any(axis=1)]
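One point worth checking with this approach: isin compares whole cell values, so it only catches cells that are exactly 'refused'. The sketch below uses hypothetical survey data to contrast that with a substring test; the column names and values are assumptions for illustration, and the substring variant assumes every column holds strings.

```python
import pandas as pd

# Hypothetical survey answers.
df = pd.DataFrame({"q1": ["yes", "refused", "no"],
                   "q2": ["no", "yes", "refused to answer"]})

# Exact match: drops only rows where some cell equals "refused".
exact = df[~df.isin(["refused"]).any(axis=1)]

# Substring match: drops rows where some cell contains "refused".
contains = df.apply(lambda col: col.str.contains("refused", regex=False, na=False))
substring = df[~contains.any(axis=1)]

print(exact)
print(substring)
```

The exact version keeps the row with "refused to answer", while the substring version drops it.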
Remove Rows From Data Frame where a Row match from a list
You could use anti_join from the dplyr package.
library(dplyr)
ANrule4 <- data.frame(Group_Account = c(2911, 2944, 2949, 1415, 1695, 1761, 1912, 2570))
listremove <- data.frame(Group_Account = c(2911, 2946, 2945, 2944, 2949))
ANrule4 %>% anti_join(listremove, by = "Group_Account")
Group_Account
1 1415
2 1695
3 1761
4 1912
5 2570
Removing rows from dataframe that contains string in a particular column
There are multiple ways you can do this:
Convert to numeric and remove NA values (as.numeric() warns that NAs were introduced by coercion):
subset(df, !is.na(as.numeric(Score)))
# ID Score
#1 1001 4
#2 1002 20
#5 1005 30
Or use grepl to find the values that contain any non-digit characters and remove them:
subset(df, !grepl('\\D', Score))
This can be done with grep as well:
df[grep('\\D', df$Score, invert = TRUE), ]
data
df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v",
"30")), class = "data.frame", row.names = c(NA, -5L))