Reading Data from a CSV File in Python

Here is how I got the 2nd and 3rd columns:

import csv

path = 'c:\\temp\\'

with open(path + "xyz.CSV", "r", newline="") as file:
    reader = csv.reader(file)
    for line in reader:
        t = line[1], line[2]  # 2nd and 3rd columns (indices are 0-based)
        print(t)

Here are the results:

('col2', 'col3')
('empId1', '241682-27638-USD-CIGGNT ')
('empId2', '241682-27638-USD-OCGGINT ')
('empId3', '241942-37190-USD-GGDIV ')
('empId4', '241942-37190-USD-CHYOF ')
('empId5', '241942-37190-USD-EQPL ')
('empId6', '241942-37190-USD-INT ')
('empId7', '242066-15343-USD-CYJOF ')
('empId8', '242066-15343-USD-CYJOF ')
('empId9', '242066-15343-USD-CYJOF ')
('empId10', '241942-37190-USD-GGDIV ')
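
If the header row and the trailing spaces in the third column are unwanted, a small variation of the loop can drop them. This is only a sketch: the in-memory sample stands in for xyz.CSV, and the first-column values are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for xyz.CSV; only the 2nd and 3rd columns mirror the output above.
sample = io.StringIO(
    "col1,col2,col3\n"
    "x,empId1,241682-27638-USD-CIGGNT \n"
    "y,empId2,241682-27638-USD-OCGGINT \n"
)

reader = csv.reader(sample)
next(reader, None)  # skip the header row
pairs = [(row[1], row[2].strip()) for row in reader]  # strip trailing spaces

print(pairs)
```

With a real file, replace the StringIO object with the file handle returned by open(..., newline="").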

How do I read and write CSV files with Python?

Here are some minimal, complete examples of how to read and write CSV files with Python.

Pure Python:

import csv

# Define data
data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]

# Write CSV file (newline="" lets the csv module control line endings)
with open("test.csv", "w", newline="") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(data)

# Read CSV file
with open("test.csv", newline="") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]

print(data_read)

After that, the contents of data_read are

[['1', 'A towel,', '1.0'],
['42', ' it says, ', '2.0'],
['1337', 'is about the most ', '-1'],
['0', 'massively useful thing ', '123'],
['-2', 'an interstellar hitchhiker can have.', '3']]

Please note that the csv module reads every field as a string. You need to convert the values to the column types yourself.
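
One way to do that conversion is to keep one converter per column and apply them row by row. The sketch below is an illustration, not part of the original answer: the choice of int, str and float for the three columns is an assumption about the intended types, and the rows are held in memory instead of being read from test.csv.

```python
import csv
import io

# First three rows of the test.csv content, held in memory for the sketch.
raw = io.StringIO(
    '1,"A towel,",1.0\n'
    '42," it says, ",2.0\n'
    "1337,is about the most ,-1\n"
)

# One converter per column: int, str, float (an assumption about the intended types).
converters = (int, str, float)

typed_rows = [
    tuple(convert(value) for convert, value in zip(converters, row))
    for row in csv.reader(raw)
]

print(typed_rows)
```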

An earlier revision of this answer covered both Python 2 and 3. Python 2 support has since been dropped, which simplified the answer considerably.

Related

  • How do I write data into csv format as string (not file)?
  • How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.
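
As a rough illustration of the io.StringIO idea from the list above, csv.writer can write into an in-memory buffer instead of a file. The rows are taken from the example data; how the resulting string is served (for instance by Flask) is left out of this sketch.

```python
import csv
import io

rows = [(1, "A towel,", 1.0), (42, " it says, ", 2.0)]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

csv_text = buffer.getvalue()  # the CSV content as a str, no file involved
print(csv_text)
```

Note that csv.writer ends each row with "\r\n" by default, so the string contains those line endings.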

mpu

Have a look at my utility package mpu for a super simple and easy to remember one:

import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)

Pandas

import pandas as pd

# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts (one dict per row)
dicts = df.to_dict('records')

See the read_csv docs for more information. Please note that by default pandas uses the first line as the header; pass header=None (optionally together with names=[...]) if the file has no header line.

If you haven't heard of Seaborn, I recommend having a look at it.

Other

Reading CSV files is supported by a bunch of other libraries, for example:

  • dask.dataframe.read_csv
  • spark.read.csv

Created CSV file

1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3

Common file endings

.csv

Working with the data

Once the CSV file has been read into a list of tuples or dicts, or into a pandas DataFrame, you are simply working with that kind of data. Nothing from that point on is CSV-specific.
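
A minimal sketch of that idea with csv.DictReader from the standard library: each data row becomes a dict, and everything afterwards is ordinary list and dict handling. The column names and values here are made up for illustration.

```python
import csv
import io

# Hypothetical file content with a header row; the column names are made up.
raw = io.StringIO("id,text,score\n1,A towel,1.0\n42,it says,2.0\n")

records = list(csv.DictReader(raw))  # one dict per data row, keyed by the header

# From here on it is plain Python data handling.
ids = [int(r["id"]) for r in records]
total_score = sum(float(r["score"]) for r in records)

print(ids, total_score)
```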

Alternatives

  • JSON: Nice for writing human-readable data; VERY commonly used (read & write)
  • CSV: Super simple format (read & write)
  • YAML: Nice to read, similar to JSON (read & write)
  • pickle: A Python serialization format (read & write)
  • MessagePack (Python package): More compact representation (read & write)
  • HDF5 (Python package): Nice for matrices (read & write)
  • XML: exists too *sigh* (read & write)

For your application, the following might be important:

  • Support by other programming languages
  • Reading / writing performance
  • Compactness (file size)

See also: Comparison of data serialization formats

If you are instead looking for a way to write configuration files, you might want to read my short article, Configuration files in Python.

Reading rows from a CSV file in Python

You could do something like this:

with open("data1.txt") as f:
    lis = [line.split() for line in f]  # create a list of lists
for i, x in enumerate(lis):  # print the list items
    print("line{0} = {1}".format(i, x))

# output
line0 = ['Year:', 'Dec:', 'Jan:']
line1 = ['1', '50', '60']
line2 = ['2', '25', '50']
line3 = ['3', '30', '30']
line4 = ['4', '40', '20']
line5 = ['5', '10', '10']

or:

with open("data1.txt") as f:
    for i, line in enumerate(f):
        print("line {0} = {1}".format(i, line.split()))

# output
line 0 = ['Year:', 'Dec:', 'Jan:']
line 1 = ['1', '50', '60']
line 2 = ['2', '25', '50']
line 3 = ['3', '30', '30']
line 4 = ['4', '40', '20']
line 5 = ['5', '10', '10']

Edit:

with open('data1.txt') as f:
    print("{0}".format(f.readline().split()))  # header row
    for x in f:
        x = x.split()
        # first field is the label, the rest are summed as integers
        print("{0} = {1}".format(x[0], sum(map(int, x[1:]))))

# output
['Year:', 'Dec:', 'Jan:']
1 = 110
2 = 75
3 = 60
4 = 60
5 = 20
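
Note that str.split() with no arguments splits on whitespace, which suits data1.txt. For a comma-separated file with quoted fields, splitting on commas by hand goes wrong, whereas csv.reader respects the quoting. A short comparison (the sample line is made up):

```python
import csv

line = '1,"Smith, John",50'

naive = line.split(",")            # breaks the quoted field in two
parsed = next(csv.reader([line]))  # csv.reader accepts any iterable of strings

print(naive)
print(parsed)
```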

Reading data from a CSV file yields TypeError

Try this:

import numpy as np

# Rewrite the file in place so that space-separated records end up one per line
with open("ucov_users.csv", "r") as f:
    text = f.read().replace(" ", "\n")
with open("ucov_users.csv", "w") as f:
    f.write(text)

uncov_users = np.genfromtxt('ucov_users.csv', delimiter=',')
for i, j in uncov_users:
    ux_coor = i
    uy_coor = j
    print(ux_coor, uy_coor)

Reading csv files with new lines

This method will print in the format you requested and number your rows as well.

import pandas as pd

data = pd.read_csv('test.csv', header = None, names = ['Team Name', 'Number', 'Score'])

print(data)

Output:

    Team Name  Number  Score
0    Team One      23    NaN
1    Team Two     102    NaN
2  Team Three      44    NaN
3   Team Four      40    NaN

Python code to read a csv file and create a master csv file

Your attempt has three problems. First, it uses os.walk, which traverses subdirectories (perhaps this is not a problem because your folder does not have subdirectories, but you should use the correct function for your use case regardless). Second, you are opening a file in the current directory instead of the one actually returned by os.walk. Finally, the input from csv.reader cannot be None; either the line contains fewer fields (in which case you cannot access the second field at all, and trying will get you an IndexError), or it contains an empty string. (More fundamentally, your indentation seems to be broken, but since you are not asking about a syntax error, I'm guessing your actual code doesn't have this problem.)

Here's a quick refactoring to use glob.glob instead of os.walk, assuming that the input CSV files have an empty field where you were looking for None. (It would obviously not be hard to change it to if len(line) < 2: if you wanted to, or cover both conditions.)

import csv
import os
from glob import glob

# `path` must already hold the directory with the input CSV files
with open("SUMMARY.csv", 'w', encoding='utf-8') as output_file:
    writer = csv.writer(output_file)
    writer.writerow(['SCENARIO', 'STATUS'])
    for filename in glob(f"{path}/*.csv"):
        with open(filename, 'r', encoding='utf-8') as input_file:
            value = "FAIL"
            reader = csv.reader(input_file)
            for lineno, line in enumerate(reader, 1):
                if lineno != 2:
                    continue
                if line[1] != "":
                    value = "PASS"
                break  # only the second line matters
        writer.writerow([os.path.basename(filename).split(".")[0], value])

As a tangent, notice also how I avoid having two variables with almost the same name, csvfile and csv_file.

The logic writes "FAIL" if there is only one input line, too. (Refactored in response to a comment.)
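
To cover both conditions mentioned above (too few fields, or an empty second field), the per-file check could be pulled into a small helper. This is a sketch only; the function name second_row_status and the sample inputs are hypothetical and not part of the original code.

```python
import csv
import io


def second_row_status(rows):
    """Return "PASS" if the second row has a non-empty second field, else "FAIL"."""
    for lineno, line in enumerate(rows, 1):
        if lineno == 2:
            # Covers both cases: too few fields, or an empty second field.
            return "PASS" if len(line) >= 2 and line[1] != "" else "FAIL"
    return "FAIL"  # fewer than two rows


# Made-up inputs for illustration.
statuses = [
    second_row_status(csv.reader(io.StringIO(text)))
    for text in ("a,b\nx,ok\n", "a,b\nx,\n", "a,b\nx\n", "a,b\n")
]
print(statuses)
```

Inside the loop over files, the call would take the csv.reader built from input_file in place of the in-memory samples.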

Reading Data from csv-file in Python

You should fix the way the CSV file is produced. Currently it contains:

row_number,text,polarity
"""0"",""Bromwell High cartoon comedy. It ran time programs school life, """"Teachers"""". My 35 years teaching profession lead believe Bromwell High's satire much closer reality """"Teachers"""". The scramble survive financially, insightful students see right pathetic teachers' pomp, pettiness whole situation, remind schools I knew students. When I saw episode student repeatedly tried burn school, I immediately recalled ......... .......... High. A classic line: INSPECTOR: I'm sack one teachers. STUDENT: Welcome Bromwell High. I expect many adults age think Bromwell High far fetched. What pity isn't!"",""1"""

The header line is fine, but the data line is awful. First, it has an additional quote as its first and last character; second, all of the remaining quotes are doubled. You must preprocess the file first:

with open("test.csv", 'r') as fd, open("test2.csv", 'w', newline='\r\n') as out:
    for line in fd:
        if line.startswith('"'):
            line = line.strip()[1:-1].replace('""', '"')
            print(line, file=out)
        else:
            _ = out.write(line)

The test2.csv file should now be correct...
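
To check that the transformation does what is intended, the same string operations can be applied to a single line and the result parsed with csv.reader. This is a sketch: the helper name unwrap is hypothetical, and the data line is a shortened, made-up one with the same kind of damage.

```python
import csv


def unwrap(line):
    """Drop the outer quotes and un-double the inner ones, as in the loop above."""
    if line.startswith('"'):
        return line.strip()[1:-1].replace('""', '"')
    return line.rstrip("\n")


# Shortened, made-up data line with the same damage as the real one.
broken = '"""0"",""It ran with """"Teachers"""". A classic."",""1"""\n'

fixed = unwrap(broken)
fields = next(csv.reader([fixed]))

print(fixed)
print(fields)
```

After unwrapping, the line parses into three fields, and the quotes around Teachers survive as ordinary characters inside the text field.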


