Reading data from a CSV file in Python
Here is how I've got 2nd and 3rd columns:
import csv
path = 'c:\\temp\\'
with open(path + "xyz.CSV", "r", newline="") as file:
    reader = csv.reader(file)
    for line in reader:
        t = line[1], line[2]  # 2nd and 3rd columns
        print(t)
Here are the results:
('col2', 'col3')
('empId1', '241682-27638-USD-CIGGNT ')
('empId2', '241682-27638-USD-OCGGINT ')
('empId3', '241942-37190-USD-GGDIV ')
('empId4', '241942-37190-USD-CHYOF ')
('empId5', '241942-37190-USD-EQPL ')
('empId6', '241942-37190-USD-INT ')
('empId7', '242066-15343-USD-CYJOF ')
('empId8', '242066-15343-USD-CYJOF ')
('empId9', '242066-15343-USD-CYJOF ')
('empId10', '241942-37190-USD-GGDIV ')
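The values in the third column above carry a trailing space. A minimal sketch of picking the same two columns while stripping that whitespace could look like this. The in-memory `sample` and its `col1` values are made up for illustration and only stand in for the real file:

```python
import csv
import io

# Hypothetical sample standing in for xyz.CSV; only the layout matters.
sample = (
    "col1,col2,col3\n"
    "a,empId1,241682-27638-USD-CIGGNT \n"
    "b,empId2,241682-27638-USD-OCGGINT \n"
)

reader = csv.reader(io.StringIO(sample))
# Keep the 2nd and 3rd columns and strip surrounding whitespace.
pairs = [(row[1].strip(), row[2].strip()) for row in reader]
print(pairs)
```

To skip the header row, call `next(reader, None)` before the list comprehension.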
How do I read and write CSV files with Python?
Here are some minimal, complete examples of how to read and write CSV files with Python.
Pure Python:
import csv
# Define data
data = [
    (1, "A towel,", 1.0),
    (42, " it says, ", 2.0),
    (1337, "is about the most ", -1),
    (0, "massively useful thing ", 123),
    (-2, "an interstellar hitchhiker can have.", 3),
]
# Write CSV file (newline="" lets the csv module control line endings)
with open("test.csv", "wt", newline="") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(data)
# Read CSV file
with open("test.csv", newline="") as fp:
    reader = csv.reader(fp, delimiter=",", quotechar='"')
    # next(reader, None)  # skip the headers
    data_read = [row for row in reader]
print(data_read)
After that, the contents of data_read are:
[['1', 'A towel,', '1.0'],
['42', ' it says, ', '2.0'],
['1337', 'is about the most ', '-1'],
['0', 'massively useful thing ', '123'],
['-2', 'an interstellar hitchhiker can have.', '3']]
Please note that the csv module reads every field as a string. You need to convert to the column types manually.
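A minimal sketch of that manual conversion, assuming the column types are int, str and float (the `converters` tuple is a name chosen for this example, and the data is held in memory instead of being read from test.csv):

```python
import csv
import io

# Same kind of content as the test.csv written above, held in memory for the sketch.
csv_text = (
    '1,"A towel,",1.0\n'
    '42," it says, ",2.0\n'
    '1337,is about the most ,-1\n'
)

# Assumed column types for this example: int, str, float.
converters = (int, str, float)

typed_rows = [
    tuple(convert(field) for convert, field in zip(converters, row))
    for row in csv.reader(io.StringIO(csv_text))
]
print(typed_rows)
```

A conversion that fails (for example `int("abc")`) raises `ValueError`, so real data may need a try/except around the conversion.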
A Python 2+3 version was here before (link), but Python 2 support is dropped. Removing the Python 2 stuff massively simplified this answer.
Related
- How do I write data into csv format as string (not file)?
- How can I use io.StringIO() with the csv module?: This is interesting if you want to serve a CSV on-the-fly with Flask, without actually storing the CSV on the server.
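As a rough sketch of the idea behind both related questions, `csv.writer` can write into an `io.StringIO` buffer instead of a file. The variable names are chosen for this example:

```python
import csv
import io

rows = [(1, "A towel,", 1.0), (42, " it says, ", 2.0)]

# csv.writer accepts any object with a write() method, so a StringIO works.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

# The whole CSV document as one string; nothing touches the disk.
csv_string = buffer.getvalue()
print(csv_string)
```

Note that the writer terminates rows with `\r\n` by default, so the resulting string contains those line endings.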
mpu
Have a look at my utility package mpu for a super simple and easy-to-remember one:
import mpu.io
data = mpu.io.read('example.csv', delimiter=',', quotechar='"', skiprows=None)
mpu.io.write('example.csv', data)
Pandas
import pandas as pd
# Read the CSV into a pandas data frame (df)
# With a df you can do many things
# most important: visualize data with Seaborn
df = pd.read_csv('myfile.csv', sep=',')
print(df)
# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]
# or export it as a list of dicts (one dict per row)
dicts = df.to_dict("records")
See the read_csv docs for more information. Please note that pandas treats the first line as the header by default (header='infer'), but you can set it manually, too, for example with header=None.
If you haven't heard of Seaborn, I recommend having a look at it.
Other
Reading CSV files is supported by a bunch of other libraries, for example:
dask.dataframe.read_csv
spark.read.csv
Created CSV file
1,"A towel,",1.0
42," it says, ",2.0
1337,is about the most ,-1
0,massively useful thing ,123
-2,an interstellar hitchhiker can have.,3
Common file endings
.csv
Working with the data
After reading the CSV file into a list of tuples / dicts or a pandas DataFrame, you are simply working with that kind of data. Nothing about it is CSV-specific.
Alternatives
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- CSV: Super simple format (read & write)
- YAML: Nice to read, similar to JSON (read & write)
- pickle: A Python serialization format (read & write)
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
Reading rows from a CSV file in Python
You could do something like this:
with open("data1.txt") as f:
    lis = [line.split() for line in f]  # create a list of lists
for i, x in enumerate(lis):  # print the list items
    print("line{0} = {1}".format(i, x))
# output
line0 = ['Year:', 'Dec:', 'Jan:']
line1 = ['1', '50', '60']
line2 = ['2', '25', '50']
line3 = ['3', '30', '30']
line4 = ['4', '40', '20']
line5 = ['5', '10', '10']
or:
with open("data1.txt") as f:
    for i, line in enumerate(f):
        print("line {0} = {1}".format(i, line.split()))
# output
line 0 = ['Year:', 'Dec:', 'Jan:']
line 1 = ['1', '50', '60']
line 2 = ['2', '25', '50']
line 3 = ['3', '30', '30']
line 4 = ['4', '40', '20']
line 5 = ['5', '10', '10']
Edit:
with open('data1.txt') as f:
    print("{0}".format(f.readline().split()))  # header line
    for x in f:
        x = x.split()
        print("{0} = {1}".format(x[0], sum(map(int, x[1:]))))
# output
['Year:', 'Dec:', 'Jan:']
1 = 110
2 = 75
3 = 60
4 = 60
5 = 20
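The examples above split on whitespace with `str.split()`. If the fields are separated by exactly one space, the csv module can do the same job. A minimal sketch, assuming a made-up in-memory `sample` in place of data1.txt:

```python
import csv
import io

# Hypothetical stand-in for data1.txt, assuming single spaces between fields.
sample = "Year: Dec: Jan:\n1 50 60\n2 25 50\n3 30 30\n"

reader = csv.reader(io.StringIO(sample), delimiter=" ")
header = next(reader)  # first row holds the column labels
# Map the first field of each row to the sum of the remaining fields.
totals = {row[0]: sum(int(value) for value in row[1:]) for row in reader}
print(header)
print(totals)
```

With runs of several spaces or tabs between fields, `line.split()` as used above is the more forgiving choice.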
Reading data from a CSV file yields TypeError
Try this:
import numpy as np
# Rewrite the file so that every space-separated pair ends up on its own line
with open("ucov_users.csv", "r") as f:
    text = f.read().replace(" ", "\n")
with open("ucov_users.csv", "w") as f:
    f.write(text)
uncov_users = np.genfromtxt('ucov_users.csv', delimiter=',')
for i, j in uncov_users:
    ux_coor = i
    uy_coor = j
    print(ux_coor, uy_coor)
Reading csv files with new lines
This method will print in the format you requested and number your rows as well.
import pandas as pd
data = pd.read_csv('test.csv', header = None, names = ['Team Name', 'Number', 'Score'])
print(data)
Output:
    Team Name  Number  Score
0    Team One      23    NaN
1    Team Two     102    NaN
2  Team Three      44    NaN
3   Team Four      40    NaN
Python code to read a csv file and create a master csv file
Your attempt has three problems: it uses os.walk, which traverses subdirectories (perhaps this is not a problem because your folder does not have subdirectories, but you should use the correct function for your use case regardless), and you are opening a file in the current directory instead of the one actually returned by os.walk. Finally, the input from csv.reader cannot be None; either the line contains fewer fields (in which case you cannot access the second field at all, and trying will get you an IndexError), or it contains an empty string. (More fundamentally, your indentation seems to be broken, but since you are not asking about a syntax error, I'm guessing your actual code doesn't have this problem.)
Here's a quick refactoring to use glob.glob instead of os.walk, assuming that the input CSV files have an empty field where you were looking for None. (It would obviously not be hard to change it to if len(line) < 2: if you wanted to, or cover both conditions.)
import csv
import os
from glob import glob
# `path` is the directory holding the input CSV files; define it before running
with open("SUMMARY.csv", 'w', encoding='utf-8', newline='') as output_file:
    writer = csv.writer(output_file)
    writer.writerow(['SCENARIO', 'STATUS'])
    for filename in glob(f"{path}/*.csv"):
        with open(filename, 'r', encoding='utf-8', newline='') as input_file:
            value = "FAIL"
            reader = csv.reader(input_file)
            for lineno, line in enumerate(reader, 1):
                if lineno != 2:
                    continue
                if line[1] != "":
                    value = "PASS"
                break  # only the second line matters
        writer.writerow([os.path.basename(filename).split(".")[0], value])
Tangentially, notice also how I avoid having two variables with almost the same name, csvfile and csv_file.
The logic writes "FAIL" if there is only one input line, too. (Refactored in response to a comment.)
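Covering both conditions mentioned above (a second line with fewer than two fields, or with an empty second field) could be sketched as a small helper. The name `scenario_status` and the sample inputs are hypothetical and not part of the original code:

```python
import csv
import io


def scenario_status(rows):
    """Return "PASS" if the second row has a non-empty second field, else "FAIL"."""
    for lineno, line in enumerate(rows, 1):
        if lineno == 2:
            # Handles both a short row and an empty field.
            return "PASS" if len(line) >= 2 and line[1] != "" else "FAIL"
    return "FAIL"  # fewer than two rows


# Made-up inputs covering each case.
ok = scenario_status(csv.reader(io.StringIO("name,result\ncase1,done\n")))
empty = scenario_status(csv.reader(io.StringIO("name,result\ncase1,\n")))
short = scenario_status(csv.reader(io.StringIO("name,result\ncase1\n")))
single = scenario_status(csv.reader(io.StringIO("name,result\n")))
print(ok, empty, short, single)
```

In the refactoring above, the inner for loop could then be replaced by a single call such as `value = scenario_status(reader)`.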
Reading Data from csv-file in Python
You should fix the way the csv file is produced. Currently it contains:
row_number,text,polarity
"""0"",""Bromwell High cartoon comedy. It ran time programs school life, """"Teachers"""". My 35 years teaching profession lead believe Bromwell High's satire much closer reality """"Teachers"""". The scramble survive financially, insightful students see right pathetic teachers' pomp, pettiness whole situation, remind schools I knew students. When I saw episode student repeatedly tried burn school, I immediately recalled ......... .......... High. A classic line: INSPECTOR: I'm sack one teachers. STUDENT: Welcome Bromwell High. I expect many adults age think Bromwell High far fetched. What pity isn't!"",""1"""
The header line is fine, but the data line is awful. First, it has an additional quote as its first and last character; second, all quotes are doubled. You must first preprocess the file:
with open("test.csv", 'r') as fd, open("test2.csv", 'w', newline='\r\n') as out:
    for line in fd:
        if line.startswith('"'):
            line = line.strip()[1:-1].replace('""', '"')
            print(line, file=out)
        else:
            _ = out.write(line)
The test2.csv file should now be correct...
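To check the effect of that repair, the same transformation can be applied to one line in memory and the result parsed with csv.reader. This is only a sketch: the `broken` line is a shortened, made-up example with the same quoting damage, and `repair` is a hypothetical helper name:

```python
import csv
import io

# Shortened, hypothetical data line with the same quoting damage as above.
broken = '"""0"",""He said """"Teachers"""". Fine, really."",""1"""\n'


def repair(line):
    # Same transformation as in the loop above:
    # drop the outer quotes, then halve every doubled quote.
    if line.startswith('"'):
        return line.strip()[1:-1].replace('""', '"') + "\n"
    return line


fixed = repair(broken)
row = next(csv.reader(io.StringIO(fixed)))
print(row)
```

The repaired line parses into three fields, with the inner quotes around "Teachers" preserved as literal quote characters.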