Skip the Headers When Editing a CSV File Using Python

How to skip the headers when processing a csv file using Python?

Your reader variable is an iterable; by looping over it you retrieve the rows.

To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.

You can also simplify your code a little; use the opened files as context managers to have them closed automatically:

import csv

# Python 3: open CSV files in text mode with newline="" (the "rb"/"wb" modes are for Python 2)
with open("tmob_notcleaned.csv", newline="") as infile, open("tmob_cleaned.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the headers
    writer = csv.writer(outfile)
    for row in reader:
        # process each row
        writer.writerow(row)

# no need to close, the files are closed automatically when you get to this point.

If you want to write the header to the output file unprocessed, that's easy too: pass the return value of next() to writer.writerow():

headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)
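
To see both cases in one place, here is a minimal self-contained sketch. It uses in-memory io.StringIO buffers in place of real files, and the helper name copy_rows and the sample data are assumptions made for illustration; they are not part of the csv module.

```python
import csv
import io


def copy_rows(infile, outfile, keep_header=True):
    """Copy CSV rows from infile to outfile, handling the header separately.

    Hypothetical helper for illustration; infile and outfile are open
    text-mode file objects.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    headers = next(reader, None)  # None if the input is empty
    if headers is not None and keep_header:
        writer.writerow(headers)
    for row in reader:
        writer.writerow(row)  # process each row here


source = io.StringIO("id,name\n1,ABC\n2,DEF\n")
with_header = io.StringIO()
copy_rows(source, with_header)

source.seek(0)  # rewind the buffer so it can be read again
without_header = io.StringIO()
copy_rows(source, without_header, keep_header=False)
```

With real files, the two StringIO objects would be replaced by the infile and outfile handles from the with statement.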

Can't skip header row in csv file with python

I don't think you are using the next() function correctly.

Here's an example from the documentation:

import csv

with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))

csv.reader takes the open CSV file and returns an iterator that yields each row as a list of strings. So, if you want to skip the first row (the header row), simply make this change:

with open(aws_env_list) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in list(csv_reader)[1:]:
        """ do whatever action """

Adding [1:] to list(csv_reader) slices the list from index 1 onward, that is, every row from the second one to the end (index 0 is the first row). In essence you create a copy of the list that does not contain the first element. Note that list() reads the whole file into memory first.
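
If the file is large, a lazy alternative is itertools.islice, which skips rows without building a list first. The following is only a sketch: the in-memory buffer and its sample rows are made up and stand in for the opened file.

```python
import csv
import io
from itertools import islice

# In-memory stand-in for an open CSV file (sample data is made up).
csv_file = io.StringIO("name,env\nweb,prod\ndb,test\n")

csv_reader = csv.reader(csv_file, delimiter=",")
# islice(iterable, 1, None) skips the first item and yields the rest lazily.
rows = [row for row in islice(csv_reader, 1, None)]
```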

Pandas read_csv() conditionally skipping header row

If the headers in your CSV files follow a similar pattern, you can do something simple like sniffing out the first line before determining whether to skip the first row or not.

filename = '/path/to/file.csv'
skiprows = int('Created in' in next(open(filename)))
df = pd.read_csv(filename, skiprows=skiprows)

Good practice would be to use a context manager, so you could also do this:

import pandas as pd

filename = '/path/to/file.csv'
skiprows = 0
with open(filename) as f:  # read-only access is enough here
    for line in f:
        if line.startswith('Created '):
            skiprows = 1
        break  # only the first line needs to be inspected
df = pd.read_csv(filename, skiprows=skiprows)
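
The first-line check can also be wrapped in a small function so it is easy to test on its own. This is a sketch using only the standard library; the name count_banner_rows, the default marker text, and the sample file contents are assumptions for illustration.

```python
import os
import tempfile


def count_banner_rows(path, marker="Created in"):
    """Return 1 if the first line of the file contains marker, else 0."""
    with open(path) as f:
        first_line = next(f, "")  # "" if the file is empty
    return int(marker in first_line)


# Demo on throwaway files; the banner line is made up.
with tempfile.TemporaryDirectory() as tmp:
    with_banner = os.path.join(tmp, "with_banner.csv")
    with open(with_banner, "w") as f:
        f.write("Created in 2020 by some tool\na,b\n1,2\n")
    plain = os.path.join(tmp, "plain.csv")
    with open(plain, "w") as f:
        f.write("a,b\n1,2\n")
    banner_skip = count_banner_rows(with_banner)
    plain_skip = count_banner_rows(plain)
```

The returned value can then be passed as the skiprows argument of pd.read_csv().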

How to delete first row in a csv file using python

FILENAME = 'test.csv'
DELETE_LINE_NUMBER = 1

with open(FILENAME) as f:
    data = f.read().splitlines()  # Read csv file
with open(FILENAME, 'w') as g:
    # Keep the lines before and after DELETE_LINE_NUMBER (index 0 is the header)
    g.write('\n'.join(data[:DELETE_LINE_NUMBER] + data[DELETE_LINE_NUMBER+1:]))  # Write to file

Original test.csv:

ID, Name
0, ABC
1, DEF
2, GHI
3, JKL
4, MNO

After run:

ID, Name
1, DEF
2, GHI
3, JKL
4, MNO

(deleted 0, ABC)
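
Splitting on line breaks assumes that no field contains an embedded newline. As a hedged alternative, the same idea can be sketched with the csv module, which parses quoted fields properly. The function name delete_row and the sample data are assumptions for illustration; here row 0 (the header) is the one removed.

```python
import csv
import os
import tempfile


def delete_row(path, row_index):
    """Rewrite the CSV at path without the row at row_index (0 = header)."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    del rows[row_index]  # raises IndexError if the row does not exist
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")
    with open(path, "w", newline="") as f:
        f.write("ID,Name\n0,ABC\n1,DEF\n")
    delete_row(path, 0)  # drop the header row
    with open(path, newline="") as f:
        remaining = f.read()
```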

How to skip header of CSV file while appending it to another CSV file using python

The problem you have is recognizing the response, which lines to skip, and which lines to retain. You have implemented slicing to extract response lines after the first line, assuming that the actual content starts on the second line. Based upon your described symptoms, this is not (always) the case.

import glob
import logging
import os

import requests

# fully describe header here,
header = "STATION,STATION_ELEVATION,LATITUDE,LONGITUDE,..."

def isheader(line, header, delim=','):
    l = line.split(delim)  # may want to fold case
    h = header.split(delim)  # may want to fold case
    n = len(set(l) & set(h))  # number of header fields that also appear in the line
    return n == len(h)

# Link, path_to_data, State, date and St_Id come from the code in the question
response = requests.get(Link)
actual_file = glob.glob(path_to_data+'\\Data\\*')
new_target_file = path_to_data+'\\Data'+'\\'+State+'_'+date+'_'+St_Id+'.csv'
# Write to .CSV
if not os.path.exists(new_target_file):
    with open(actual_file[0], "a") as f:
        for x in response.text.split('\n')[1:]:
            if len(x) < 2: continue  # skip empty lines
            if isheader(x, header, ','): continue  # skip header
            f.write(x+'\n')
    # with performs close automatically
    os.rename(actual_file[0], new_target_file)
else:
    logging.warning("File already exists")

Take a look at this question for how to use with open,
How to open a file using the open with statement

Here is an example of how to compare two lists (such as a header list to a row list),
Comparing two lists in Python
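
As a rough sketch of such a comparison, the variant below folds case and strips whitespace before comparing the two field lists. Unlike a set intersection it is order-sensitive, which is a design choice rather than a requirement. The name is_header and the shortened header string are assumptions for illustration only.

```python
def is_header(line, header, delim=","):
    """Return True if line has the same fields as header, ignoring case and spaces."""
    fields = [field.strip().casefold() for field in line.split(delim)]
    expected = [field.strip().casefold() for field in header.split(delim)]
    return fields == expected


header = "STATION,LATITUDE,LONGITUDE"  # shortened, made-up header for the demo
lines = [
    "STATION,LATITUDE,LONGITUDE",
    "station, latitude, longitude",
    "US1,40.7,-74.0",
]
# Keep only the lines that are not a repeat of the header.
data_lines = [x for x in lines if not is_header(x, header)]
```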

Python CSV read in as dictionary with nth row as header

Since reader operates on an opened file object, you can just skip the lines yourself by calling readline() in advance:

from io import StringIO
from csv import DictReader

data = StringIO(
"""linewedon'twant
linewedon'twant
x,y
0,1
1,5
2,10
"""
)

with data as f:
    for _ in range(2):
        f.readline()

    reader = DictReader(f)
    for row in reader:
        print(row)

Obviously, in real code with data as f would be with open("csv") as f:; it is kept here (where it isn't needed) to keep the structure the same.
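
The same idea generalizes to a header on any row. Below is a minimal sketch that takes the number of lines to discard as a parameter; the function name dict_rows_after and the sample data are assumptions for illustration.

```python
import csv
import io


def dict_rows_after(f, skip):
    """Discard `skip` lines from f, then parse the rest with the next line as header."""
    for _ in range(skip):
        f.readline()
    return list(csv.DictReader(f))


# In-memory stand-in for an open file; the first two lines are unwanted.
data = io.StringIO("junk line\nanother junk line\nx,y\n0,1\n1,5\n")
rows = dict_rows_after(data, 2)
```

Note that DictReader returns every value as a string, so numeric columns still need converting.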

python csv header ignore while keep appending data to csv file

A slightly simpler alternative to Mr Evans's approach would be to use the following test in place of the test for existence:

fileEmpty = os.stat('collection1.dat').st_size == 0

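This obviates the need to do a seek, etc. Note that os.stat() raises FileNotFoundError if the file does not exist, so the file must already have been created, or the check needs an existence guard.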

EDIT: Complete code:

import random
import csv
import os.path
from time import gmtime, strftime

filename = '/home/robdata/collection1.dat'

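# Treat a missing file like an empty one; os.stat() alone would raise FileNotFoundError
fileEmpty = not os.path.exists(filename) or os.stat(filename).st_size == 0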

v = random.randint(0, 100)

with open(filename, "a") as csvfile:
    headers = ['DATE', 'value']
    writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n', fieldnames=headers)
    if fileEmpty:
        writer.writeheader()  # file is empty, write a header first

    writer.writerow({'DATE': strftime("%Y-%m-%d %H:%M:%S", gmtime()), 'value': v})
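
For repeated appends, the check and the write can be bundled into one function. This is a sketch under stated assumptions: the name append_row and the sample rows are made up, and a missing file is treated the same as an empty one.

```python
import csv
import os
import tempfile


def append_row(path, row, headers):
    """Append row (a dict) to the CSV at path, writing headers first if needed."""
    needs_header = not os.path.exists(path) or os.stat(path).st_size == 0
    with open(path, "a", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=headers, lineterminator="\n")
        if needs_header:
            writer.writeheader()
        writer.writerow(row)


# Demo in a throwaway directory: the header is written only once.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "collection1.dat")
    append_row(path, {"DATE": "2020-01-01", "value": 1}, ["DATE", "value"])
    append_row(path, {"DATE": "2020-01-02", "value": 2}, ["DATE", "value"])
    with open(path, newline="") as f:
        contents = f.read()
```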

funny behaviour when editing a csv file in excel and then doing some data filtering in pandas

Excel does not leave a file "untouched". It interprets and reformats the values of every file it opens (e.g. depending on the locale, a float value like "5.06" can be interpreted as a date and changed to "05 Jun"), and saving writes the changed values back. Depending on the expected datatype, these rows might then be displayed wrongly or be missing in your notebook.
It is better to use sed or awk to manipulate csv files (or a text editor for smaller files).


