Deleting Rows from CSV Based on Cell Contents from Another CSV

Delete rows in CSV based on specific column value (Python)

Here's a solution using the csv module. It copies the header row and keeps only the rows whose last column is greater than 2000:

import csv

with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:

    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')

    # write headers
    writer.writerow(next(reader))

    # iterate and write rows based on condition
    for i in reader:
        if int(i[-1]) > 2000:
            writer.writerow(i)

Result:

id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
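
The code above writes the kept rows to a second file. If the goal is to delete rows from the original file itself, one common approach is to write the kept rows to a temporary file and then swap it into place. The following is a minimal sketch under that assumption; the function name `drop_rows_in_place` and its `keep` callback are hypothetical and not part of the original answer.

```python
import csv
import os
import tempfile


def drop_rows_in_place(path, keep):
    """Rewrite the CSV at `path`, keeping the header and rows where keep(row) is true."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # os.replace stays on one filesystem.
    with open(path, newline='') as fin, tempfile.NamedTemporaryFile(
            'w', newline='', dir=directory, delete=False, suffix='.tmp') as tmp:
        reader = csv.reader(fin, skipinitialspace=True)
        writer = csv.writer(tmp)
        header = next(reader, None)  # assumption: the first row is a header
        if header is not None:
            writer.writerow(header)
        writer.writerows(row for row in reader if keep(row))
    # Swap the filtered file into place only after both files are closed.
    os.replace(tmp.name, path)
```

With the data from this answer, a call such as `drop_rows_in_place('fin.csv', lambda row: int(row[-1]) > 2000)` would apply the same condition. The callback is called on every non-header row, so it has to cope with blank or malformed rows itself.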

How to remove rows from csv based on matching data

I verified that the following works on the kind of data you described:

import csv
from io import StringIO

# parse the data you're about to filter with
with open('filters.csv', newline='') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'w', newline='') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go through your rows and see if the pair (row[1], row[8]) is
# found in the previously parsed set of filters; if yes, skip the row
with open('data.csv', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print(out_f.getvalue())  # prints the resulting filtered CSV data

NOTE: {... for ... in ...} is set-comprehension syntax; on Python versions older than 2.7 you need the equivalent set(... for ... in ...) instead.

How to delete rows from a csv file based on a list of values from another file?

What about the following:

 awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv

How does this work?

An awk program is a series of pattern-action pairs, written as:

condition { action }
condition { action }
...

where condition is typically an expression and action a series of commands. Here, the two condition-action pairs read:

  • (NR==FNR){a[$1];next}: if the total record count NR equals the record count of the current file FNR (i.e. we are still reading the first file), create the key $1 in array a and skip to the next record without evaluating any later pair.
  • !($1 in a): if the first field is not a key in array a, perform the default action, which is to print the line. This pair is only reached for records of the second file, because every record of the first file hits next in the first pair.
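
The same two-pass idea can be sketched in Python: read the first file into a set of keys, then copy only those rows of the second file whose key is not in the set. This is an illustrative sketch, not a drop-in replacement for the awk command; the function name `drop_blacklisted` and the `key_index` parameter are hypothetical.

```python
import csv


def drop_blacklisted(blacklist_file, candidates_file, out_file, key_index=0):
    """Copy candidate rows whose key column does not appear in the blacklist."""
    # First pass: collect the keys to exclude (the role of a[$1] in awk).
    blocked = {row[key_index] for row in csv.reader(blacklist_file) if row}
    writer = csv.writer(out_file)
    # Second pass: keep only rows whose key was never seen (awk's !($1 in a)).
    for row in csv.reader(candidates_file):
        # Assumption: blank lines are skipped, unlike the awk version.
        if row and row[key_index] not in blocked:
            writer.writerow(row)
```

The three arguments are open file objects, for example `open('blacklist.csv', newline='')`, `open('candidates.csv', newline='')` and an output file opened with `'w', newline=''`. Unlike `awk -F,`, the csv module also handles quoted fields that contain commas.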

Using Python to delete rows in a csv file that contain certain chars

You should check whether the Symbol column contains any of the characters of interest. The str.contains method takes a regular expression:

bad_rows = df.Symbol.str.contains('[.^]')
df_clean = df[~bad_rows]
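
To see what the pattern `[.^]` matches without loading a DataFrame, here is a small sketch that uses the standard `re` module instead of pandas. Inside a character class, `.` is a literal dot, and `^` is literal because it is not the first character of the class. The sample symbols are made up for illustration.

```python
import re

# Character class: a literal '.' or a literal '^' anywhere in the string.
BAD_CHARS = re.compile(r'[.^]')

symbols = ['AAPL', 'BRK.B', '^GSPC', 'MSFT']  # made-up sample values

# Keep only the symbols that contain neither character.
clean = [s for s in symbols if not BAD_CHARS.search(s)]
print(clean)  # ['AAPL', 'MSFT']
```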

Delete rows of CSV file based on the value of a column

Instead of deleting records, think about which ones you want to print. I assume those are the rows whose fourth field is <=4. In idiomatic awk you can write this as:

$ awk -F, '$4<=4' file

1,don_rickles,Don Rickles,3
1,jim_varney,Jim Varney,4
1,tim_allen,Tim Allen,2
1,tom_hanks,Tom Hanks,1
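
For comparison, a rough Python counterpart of the awk one-liner might look like the sketch below. The function name `keep_rows_at_most` and its parameters are hypothetical, and the second sample row is made up to show a row being dropped. Note one deliberate difference: rows with a missing or non-numeric fourth field are skipped here, whereas awk would fall back to its own comparison rules.

```python
import csv
import io


def keep_rows_at_most(in_file, out_file, column=3, limit=4):
    """Copy rows whose numeric value in `column` (0-based) is <= limit."""
    writer = csv.writer(out_file)
    for row in csv.reader(in_file):
        try:
            value = float(row[column])
        except (IndexError, ValueError):
            # Assumption: rows with a missing or non-numeric value are dropped.
            continue
        if value <= limit:
            writer.writerow(row)


# The first row comes from the output above; the second is invented.
sample = io.StringIO("1,don_rickles,Don Rickles,3\n1,some_one,Some One,7\n")
out = io.StringIO()
keep_rows_at_most(sample, out)
print(out.getvalue(), end='')  # 1,don_rickles,Don Rickles,3
```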

