Read Specific Columns from a CSV File with CSV Module

Read specific columns from a csv file with csv module?

The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print(content)

You want it to be this:

for row in reader:
    content = list(row[i] for i in included_cols)
    print(content)
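For reference, here is a minimal self-contained sketch of the corrected loop in Python 3. The sample data, the indices in included_cols, and the use of io.StringIO in place of a real file are all assumptions made for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for the real file
sample = io.StringIO("id,name,city,age\n1,Ann,Oslo,30\n2,Bob,Rome,25\n")

included_cols = [1, 3]  # assumed indices: the name and age columns

selected = []
reader = csv.reader(sample)
for row in reader:
    # keep only the wanted columns of this row
    content = [row[i] for i in included_cols]
    print(content)  # inside the loop, so every row is printed
    selected.append(content)
```

Because print sits inside the loop body, it runs once per row (including the header row here) rather than once after the loop ends.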

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

So if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It's a great module and I suggest you look into it. If your print statement was already inside the for loop and it was still only printing the last column, which shouldn't happen, let me know and I will revisit my assumption. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!

How to read specific columns in the csv file?

The third column has index 2 so you should be checking if row[2] is one of '2' or '5'. I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.

I don't see what you are using header for, but I assume you have more code that processes it somewhere. If you don't need header and just want to skip the first line, call next(reader) without assigning the result. I have kept header in my code on the assumption that you use it later.

We can use time.sleep(2) from the time module to help us write a row every 2 seconds.

Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.

Code

import csv
import time

select = {'2', '5'}
with open("in.txt") as f_in, open("out.txt", "w") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    header = next(reader)
    for row in reader:
        if row[2] in select:
            print(f"Writing {row[2:5]} at {time.time()}")
            writer.writerow(row[2:5])
            # f_out.flush() may need to be run here
            time.sleep(2)

Output

Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758

"out.txt"

2,552,525
5,552,525

Input

"in.txt"

0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg

How to read specific columns from multiple CSV files, and skip columns that do not exist in some of the files using Python Pandas

You could read only the column names from each csv file and check them against your desired columns as follows:

import csv
import pandas as pd

desired_col = ["user_id", "event_type"]  # I selected only two values

for file_name in csv_files:
    # read only the header row to get this file's column names
    with open(file_name, newline="") as f:
        csv_cols = next(csv.reader(f))
    # keep only the desired columns that exist in this file
    cols = [col for col in desired_col if col in csv_cols]
    df = pd.read_csv(file_name, usecols=cols)

So each time you read a new csv file, you first read its column names and then check desired_col against csv_cols.
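The header check on its own can be tried without pandas. Below is a small sketch that uses only the csv module. The helper name existing_columns and the in-memory sample files are hypothetical and not part of the original answer:

```python
import csv
import io

def existing_columns(csv_file, desired):
    """Return the desired column names that appear in the file's header row."""
    header = next(csv.reader(csv_file))
    return [col for col in desired if col in header]

desired_col = ["user_id", "event_type"]

# Two hypothetical files; the second one lacks "event_type"
file_a = io.StringIO("user_id,event_type,ts\n1,click,10\n")
file_b = io.StringIO("user_id,ts\n2,20\n")

cols_a = existing_columns(file_a, desired_col)
cols_b = existing_columns(file_b, desired_col)
print(cols_a)
print(cols_b)
```

The resulting list is what gets passed as usecols to pd.read_csv in the loop above, so a file that lacks a column does not raise an error.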

Read specific columns with pandas or other python module

An easy way to do this is using the pandas library like this.

import pandas as pd
fields = ['star_name', 'ra']

df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print(df.keys())
# See content in 'star_name'
print(df.star_name)

The fix here is skipinitialspace=True, which skips the spaces that follow each delimiter, including those in the header. So ' star_name' becomes 'star_name'.
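The csv module's reader accepts the same skipinitialspace option, which makes the effect easy to see without pandas. A minimal sketch, assuming a made-up header line with a space after each comma:

```python
import csv
import io

# Hypothetical data with a space after each comma
raw = "id, star_name, ra\n1, Sirius, 101.28\n"

plain = next(csv.reader(io.StringIO(raw)))
stripped = next(csv.reader(io.StringIO(raw), skipinitialspace=True))

print(plain)     # header names keep their leading space
print(stripped)  # spaces right after the delimiter are dropped
```

Without the option, a lookup for 'star_name' fails because the actual header name is ' star_name'.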

Read CSV file and store specific columns

As Trenton mentions in a comment,

sets are not ordered. A set is an unordered collection with no duplicate elements. Convert the set to a list and order it

Or you could use an ordered dictionary, which remembers the order you set its keys, and do everything in one loop.

import csv
from collections import OrderedDict

data = OrderedDict()
with open('/Users/anku/Documents/dev/Credentials.csv', mode='r', encoding='utf-8-sig') as csv_input:
    csv_file = csv.reader(csv_input)
    next(csv_file)  # skip the header row

    # get unique value of system types
    for row in csv_file:
        row_type = row[0]
        row_data = row[1:]
        # if the dict doesn't already have the key, set it to an empty list
        if row_type not in data:
            data[row_type] = []
        data[row_type].append(row_data)

Then you can see the data in the order you added it:

print(data.keys()) # Should print the keys in the order they were added
print(data.items())
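On Python 3.7 and later, a plain dict also preserves insertion order, so the same grouping can be sketched without OrderedDict. The sample rows below are made up, and io.StringIO stands in for the real Credentials.csv:

```python
import csv
import io

# Hypothetical rows: first column is the system type
sample = io.StringIO(
    "type,user,host\n"
    "db,alice,h1\n"
    "web,bob,h2\n"
    "db,carol,h3\n"
)

data = {}
reader = csv.reader(sample)
next(reader)  # skip the header row
for row in reader:
    # setdefault creates the empty list the first time a key is seen
    data.setdefault(row[0], []).append(row[1:])

print(list(data.keys()))
print(data)
```

The keys come back in the order in which each type first appeared in the file.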

Read specific columns in csv using python

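import csv
from collections import namedtuple

def read_csv(file, columns, type_name="Row"):
    try:
        make_row = namedtuple(type_name, columns)._make
    except ValueError:
        # column names that are not valid identifiers: fall back to plain tuples
        make_row = tuple
    rows = iter(csv.reader(file))
    header = next(rows)
    # position of each requested column in the header
    mapping = [header.index(x) for x in columns]
    for row in rows:
        yield make_row(row[i] for i in mapping)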

Example:

>>> import csv
>>> from collections import namedtuple
>>> from io import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...     try:
...         make_row = namedtuple(type_name, columns)._make
...     except ValueError:
...         make_row = tuple
...     rows = iter(csv.reader(file))
...     header = next(rows)
...     mapping = [header.index(x) for x in columns]
...     for row in rows:
...         yield make_row(row[i] for i in mapping)
...
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...     print(row)
...
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')

How to print a specific column in a csv file in python?

Reading from a csv file is symmetric to writing. The main difference is that, because you skipped the header line, you use a simple reader and get sequences instead of mappings:

import csv

with open('Mail_Txt.csv', 'r', encoding='utf-8', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for val in reader:
        print(val[0])
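If the file did have a header line, csv.DictReader would give mappings instead, and a column could be selected by name. A minimal sketch, assuming made-up column names address and subject and an in-memory file in place of Mail_Txt.csv:

```python
import csv
import io

# Hypothetical file contents with a header line
sample = io.StringIO("address,subject\na@example.com,Hello\nb@example.com,Bye\n")

addresses = []
for record in csv.DictReader(sample):
    # each record is a dict keyed by the header names
    addresses.append(record["address"])

print(addresses)
```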

Reading specific column from a csv file in python

One way of doing this would be

import csv

# replace the name with your actual csv file name
file_name = "data.csv"
second_column = []  # empty list to store second column values
with open(file_name, newline="") as f:
    csv_file = csv.reader(f)
    for line in csv_file:
        second_column.append(line[1])  # index 1 for second column
        print(line[1])

The second_column variable will hold the necessary values. Hope this helps.


