Delete rows in a CSV based on a specific column value in Python
Here's a solution using the csv module:
import csv

with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if int(i[-1]) > 2000:
            writer.writerow(i)
Result:
id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
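To try the same filter without touching files, here is a minimal sketch that wraps it in a function and runs it on in-memory data. The function name keep_rows_above and the 987 row are assumptions made for illustration; only the two rows shown in the result above come from the answer.

```python
import csv
import io

def keep_rows_above(fin, fout, threshold=2000):
    """Copy the header, then only the rows whose last column exceeds threshold."""
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, lineterminator="\n")
    writer.writerow(next(reader))  # header row is always kept
    for row in reader:
        if int(row[-1]) > threshold:
            writer.writerow(row)

# Made-up sample input; the 987 row is invented to show a row being dropped.
sample = io.StringIO(
    "id,type,state,location,number of students\n"
    "213,primary school,California,Los Angeles,3213\n"
    "402,primary school,Ohio,Dayton,987\n"
    "155,secondary school,Pennsylvania,Pittsburgh,2141\n"
)
result = io.StringIO()
keep_rows_above(sample, result)
print(result.getvalue())
```

Passing file objects rather than file names keeps the filter easy to test; with real files you would open them as in the answer above.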
How to remove rows from csv based on matching data
I verified the following to work as you need on the kind of data you provided/described:
import csv
from io import StringIO

# parse the data you're about to filter with
with open('filters.csv', newline='') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'w', newline='') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go through your rows and see if the pair (row[1], row[8]) is
# found in the previously parsed set of filters; if yes, skip the row
with open('data.csv', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print(out_f.getvalue())  # prints the resulting filtered CSV data
NOTE: the {... for ... in ...} is set-comprehension syntax, which requires Python 2.7 or later; on an older Python version, you need to change this to the equivalent set(... for ... in ...) for it to work.
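As a quick check of that equivalence, the following sketch builds the filter set both ways from a few made-up rows (the data is an assumption, standing in for filters.csv):

```python
import csv
import io

# Made-up filter rows, standing in for the contents of filters.csv
rows = list(csv.reader(io.StringIO("a,1\nb,2\na,1\n")))

as_comprehension = {(row[0], row[1]) for row in rows}
as_set_call = set((row[0], row[1]) for row in rows)

# Both forms build the same set; the duplicate ("a", "1") pair collapses
print(as_comprehension == as_set_call)
print(("a", "1") in as_comprehension)
```

Because the filters are stored in a set, each membership test on a (row[1], row[8]) pair is a hash lookup rather than a scan of the whole filter list.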
How to delete rows from a csv file based on a list values from another file?
What about the following:
awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action is a series of commands. Here, the two condition-action pairs read:
(NR==FNR){a[$1];next}
If the total record count NR equals the record count of the current file FNR (i.e. if we are reading the first file), store the first field as a key in array a and skip to the next record (do not do anything else).
!($1 in a)
If the first field is not a key in array a, perform the default action, which is to print the line. This only takes effect on the second file, because there the condition of the first pair does not hold, whereas for the first file, next skips this pair.
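For readers more at home in Python, the same two-pass logic can be sketched roughly as follows. This is not a byte-for-byte port of the awk one-liner: the function name drop_blacklisted and the sample data are made up, blank lines are skipped, and csv.reader honours quoted fields, whereas awk -F, splits on every comma.

```python
import csv
import io

def drop_blacklisted(blacklist, candidates):
    # First pass (awk: NR==FNR): remember the first field of every blacklist record.
    seen = {row[0] for row in csv.reader(blacklist) if row}
    # Second pass (awk: !($1 in a)): keep candidates whose first field was not seen.
    return [row for row in csv.reader(candidates) if row and row[0] not in seen]

# Made-up data standing in for blacklist.csv and candidates.csv
blacklist = io.StringIO("alice,x\nbob,y\n")
candidates = io.StringIO("alice,1\ncarol,2\nbob,3\ndave,4\n")

kept = drop_blacklisted(blacklist, candidates)
for row in kept:
    print(",".join(row))
```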
Using Python to delete rows in a csv file that contain certain chars
You should check whether the column Symbol contains any of the characters of interest. The contains method takes a regular expression:
bad_rows = df.Symbol.str.contains('[.^]')
df_clean = df[~bad_rows]
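To see what the pattern '[.^]' actually matches, here is a small sketch using the standard re module rather than pandas. It assumes str.contains performs a regex search anywhere in each string, which is my understanding of its default behaviour; the symbols are made up.

```python
import re

# Inside [...], '.' is a literal dot and a non-leading '^' is a literal caret
pattern = re.compile(r"[.^]")

# Made-up ticker symbols for illustration
symbols = ["AAPL", "BRK.B", "^GSPC", "MSFT"]

bad = [s for s in symbols if pattern.search(s)]
clean = [s for s in symbols if not pattern.search(s)]
print(bad)
print(clean)
```

Outside a character class, '.' would match any character and '^' would anchor to the start of the string, so the brackets are what make this pattern select only the two literal characters.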
Delete rows of CSV file based on the value of a column
Instead of deleting the records, think of which ones you're going to print. I guess it's <=4. In idiomatic awk you can write this as
$ awk -F, '$4<=4' file
1,don_rickles,Don Rickles,3
1,jim_varney,Jim Varney,4
1,tim_allen,Tim Allen,2
1,tom_hanks,Tom Hanks,1
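Since the question title asks about CSV files in general, here is a rough Python sketch of the same "keep what you want to print" idea. The first four rows are the output shown above; the example_actor row is invented to show a record being filtered out.

```python
import csv
import io

# First four rows come from the output above; the last one is a made-up
# record with a fourth field above 4, so that something gets dropped.
data = io.StringIO(
    "1,don_rickles,Don Rickles,3\n"
    "1,jim_varney,Jim Varney,4\n"
    "1,tim_allen,Tim Allen,2\n"
    "1,tom_hanks,Tom Hanks,1\n"
    "1,example_actor,Example Actor,5\n"
)

# Equivalent in spirit to awk -F, '$4<=4': keep rows whose fourth field is <= 4
kept = [row for row in csv.reader(data) if float(row[3]) <= 4]
for row in kept:
    print(",".join(row))
```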