How to Read and Write CSV Files With Python

How do I read and write CSV files with Python?

Here are some minimal, complete examples of how to read and write CSV files with Python.

Python 3: Reading a CSV file

Pure Python

import csv

# Define data
data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]

# Write CSV file (newline="" lets the csv module control the line endings)
with open("test.csv", "w", newline="") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(data)

# Read CSV file
with open("test.csv", newline="") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]

print(data_read)

After that, the contents of data_read are

[['1', 'A towel,', '1.0'],
['42', ' it says, ', '2.0'],
['1337', 'is about the most ', '-1'],
['0', 'massively useful thing ', '123'],
['-2', 'an interstellar hitchhiker can have.', '3']]

Please note that the csv module reads every field as a string. You need to convert the values to the column types yourself.
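As a rough sketch of that manual conversion, you could keep one converter per column and apply it to each row. The names converters and convert_row are made up for this illustration, and the column types (int, str, float) are only an assumption that happens to fit the example data:

```python
import csv
import io

# Hypothetical per-column converters; adjust to your own columns
converters = (int, str, float)


def convert_row(row, converters):
    """Apply one converter per column to a row of strings."""
    return tuple(conv(value) for conv, value in zip(converters, row))


# In-memory stand-in for the first two lines of test.csv
raw = io.StringIO('1,"A towel,",1.0\n42," it says, ",2.0\n')
typed_rows = [convert_row(row, converters) for row in csv.reader(raw)]
print(typed_rows)
```

A value that does not match its converter (for example an empty field in an int column) raises a ValueError, so real data may need extra handling.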

A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.

Related

  • How do I write data into csv format as string (not file)?
  • How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.
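For the in-memory case mentioned in the related questions, a minimal sketch could look like this. The helper name rows_to_csv_string is hypothetical; only csv.writer and io.StringIO come from the standard library:

```python
import csv
import io


def rows_to_csv_string(rows):
    """Serialize rows to CSV text without touching the file system."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv_string([(1, "A towel,", 1.0), (42, " it says, ", 2.0)])
print(csv_text)
```

The returned string can then be sent as a response body or passed to anything else that expects text. Note that csv.writer ends each line with \r\n by default.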

mpu

Have a look at my utility package mpu for a super simple and easy to remember one:

import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)

Pandas

import pandas as pd

# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts (one dict per row)
dicts = df.to_dict(orient="records")

See the read_csv docs for more information. Please note that by default pandas treats the first line as the header (header='infer'). If your file has no header line, pass header=None; you can also set the header row explicitly.

If you haven't heard of Seaborn, I recommend having a look at it.

Other

Reading CSV files is supported by a bunch of other libraries, for example:

  • dask.dataframe.read_csv
  • spark.read.csv

Created CSV file

1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3

Common file endings

.csv

Working with the data

After reading the CSV file into a list of tuples / dicts or a pandas DataFrame, you are simply working with that kind of data. Nothing about it is CSV-specific.
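To illustrate, here is a small sketch using csv.DictReader. The header line (id, text, value) is an assumption added for this example; the original test.csv has no header:

```python
import csv
import io

# Hypothetical header added on top of three rows of the example data
csv_text = (
    "id,text,value\n"
    '1,"A towel,",1.0\n'
    '42," it says, ",2.0\n'
    "1337,is about the most ,-1\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))

# From here on it is ordinary list / dict handling
positive = [row for row in rows if float(row["value"]) > 0]
total = sum(float(row["value"]) for row in rows)
print(len(positive), total)
```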

Alternatives

  • JSON: Nice for writing human-readable data; VERY commonly used (read & write)
  • CSV: Super simple format (read & write)
  • YAML: Nice to read, similar to JSON (read & write)
  • pickle: A Python serialization format (read & write)
  • MessagePack (Python package): More compact representation (read & write)
  • HDF5 (Python package): Nice for matrices (read & write)
  • XML: exists too *sigh* (read & write)

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python

How to read and write to CSV Files in Python

Here is a simple example:

data.csv is a CSV file with one column and multiple rows.

results.csv contains the mean and median of the input. It is a CSV file with one row and two columns (mean in the first column, median in the second).

Example:

import csv

import numpy as np
import pandas as pd

# load the data
data = pd.read_csv("data.csv", header=None)

# calculate the statistics for the first column, which holds the data
mean = np.mean(data.loc[:, 0])
median = np.median(data.loc[:, 0])

# write the results as a single row: mean, median
# (text mode with newline="" is what csv.writer expects in Python 3)
with open("results.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([mean, median])
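If you would rather avoid the numpy / pandas dependency, the same idea can be sketched with the standard library only. This is just one possible variant, not the approach from the answer above; summarize_column is a hypothetical helper, and the input is an in-memory stand-in for data.csv:

```python
import csv
import io
import statistics


def summarize_column(lines):
    """Return (mean, median) of the first column of CSV input."""
    values = [float(row[0]) for row in csv.reader(lines) if row]
    return statistics.mean(values), statistics.median(values)


# Stand-in for data.csv: one column, several rows
source = io.StringIO("1\n2\n3\n10\n")
mean, median = summarize_column(source)

# Stand-in for results.csv: one row, two columns
out = io.StringIO(newline="")
csv.writer(out).writerow([mean, median])
result_line = out.getvalue()
print(result_line)
```

With real files you would pass open("data.csv", newline="") and open("results.csv", "w", newline="") instead of the StringIO objects.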

Reading from a sensor and writing to a CSV file

Following Pranav Hosangadi's advice, I ended up handling SIGINT and SIGTERM manually:

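import serial
import signal

interrupted = False


def signal_handler(signum, frame):
    # Only set a flag; the main loop checks it and exits cleanly
    global interrupted
    interrupted = True


signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)

if __name__ == '__main__':
    filename = 'sensor.csv'  # hypothetical path; not given in the original
    ser = serial.Serial('COM4')
    with open(filename, 'w', newline='') as f:
        while not interrupted:
            if ser.in_waiting > 0:
                # readline() returns bytes; decode before writing as text
                temp = ser.readline().decode()
                f.write(temp)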
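To try the loop without the hardware, the logging part can be pulled into a function that takes any iterable of byte lines. This is only a sketch under assumptions: log_readings, should_stop and the comma-separated sample lines are invented for illustration, and your sensor may use a different line format:

```python
import csv
import io


def log_readings(lines, out, should_stop):
    """Write each raw sensor line as one CSV row until should_stop() is true."""
    writer = csv.writer(out)
    count = 0
    for raw in lines:
        if should_stop():
            break
        # Serial lines arrive as bytes, usually ending in \r\n
        fields = raw.decode("ascii", errors="replace").strip().split(",")
        writer.writerow(fields)
        count += 1
    return count


# Fake sensor output standing in for ser.readline() results
fake_lines = [b"21.5,40\r\n", b"21.7,41\r\n", b"21.9,39\r\n"]
buffer = io.StringIO(newline="")
written = log_readings(fake_lines, buffer, should_stop=lambda: False)
print(written)
```

In the real program, should_stop could simply return the interrupted flag set by the signal handler.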

Writing to a CSV file with only one header line

There are two issues with your code. First, header is already a list, yet you're enclosing it in another list: [header] is the same as [['cost', 'tax', 'percentage_of_pay']], which is a list of lists.

Second, you would normally write the header once at the start, then write the data in a loop, one row at a time.

You probably want something like:

import csv

with open('Sales-and-cost.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    # write the header exactly once, before any data
    writer.writerow(header)

    for row in rows:
        writer.writerow(row)

Where rows is a list of lists containing the output data, i.e.

rows = [
    [price_1, taxes_1, pay_percentage_1],
    [price_2, taxes_2, pay_percentage_2],
    [price_3, taxes_3, pay_percentage_3],
]
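If the file is appended to across several runs, one way to still end up with a single header line is to write the header only when the file is new or empty. The following is a sketch, not part of the original answer; append_rows and the sample values are hypothetical:

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to path, writing the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if needs_header:
            writer.writerow(header)
        writer.writerows(rows)


header = ["cost", "tax", "percentage_of_pay"]
path = os.path.join(tempfile.mkdtemp(), "Sales-and-cost.csv")

# Two separate calls, e.g. two runs of the script
append_rows(path, header, [[100, 8, 0.5]])
append_rows(path, header, [[200, 16, 0.25]])

with open(path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```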

