Read specific columns from a csv file with csv module?
The only way you would be getting only the last column from this code is if your print statement is not inside your for loop.
This is most likely the end of your code:
for row in reader:
    content = list(row[i] for i in included_cols)
print(content)
You want it to be this:
for row in reader:
    content = list(row[i] for i in included_cols)
    print(content)
Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name  # you can also use df['column_name']
So if you wanted to save all of the info in your column Names into a variable, this is all you need to do:
names = df.Names
It's a great module and I suggest you look into it. If your print statement was already inside the for loop and it was still only printing the last column, which shouldn't happen, let me know that my assumption was wrong. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!
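To make the csv-module approach concrete, here is a minimal sketch that keeps the selected columns of every row in a list instead of printing them. The sample data and the indices in included_cols are made up for illustration; they are not from the question.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    "ID,Name,Zip,Phone\n"
    "1,Ann,11111,555-0100\n"
    "2,Bob,22222,555-0101\n"
)

included_cols = [1, 3]  # assumed indices: Name and Phone

reader = csv.reader(sample)
selected = []
for row in reader:
    # Doing the work inside the loop keeps every row, not just the last one.
    selected.append([row[i] for i in included_cols])

print(selected)
```

Because the append happens inside the loop, selected ends up with one entry per line of the file, header included.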
How to read specific columns in the csv file?
The third column has index 2, so you should be checking if row[2] is one of '2' or '5'. I have done this by defining the set select = {'2', '5'} and checking if row[2] in select.
I don't see what you are using header for, but I assume you have more code that processes header somewhere. If you don't need header and just want to skip the first line, just do next(reader) without assigning it to header, but I have kept header in my code under the assumption you use it later.
We can use time.sleep(2) from the time module to help us write a row every 2 seconds.
Below, "in.txt" is the csv file containing the sample input you provided and "out.txt" is the file we write to.
Code
import csv
import time

select = {'2', '5'}
with open("in.txt") as f_in, open("out.txt", "w") as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    header = next(reader)
    for row in reader:
        if row[2] in select:
            print(f"Writing {row[2:5]} at {time.time()}")
            writer.writerow(row[2:5])
            # f_out.flush() may need to be run here
            time.sleep(2)
Output
Writing ['2', '552', '525'] at 1650526118.9760585
Writing ['5', '552', '525'] at 1650526120.9763758
"out.txt"
2,552,525
5,552,525
Input
"in.txt"
0,2,1,437,464,385,171,0:44:4,dog.jpg
1,1,3,452,254,444,525,0:56:2,cat.jpg
2,3,2,552,525,785,522,0:52:8,car.jpg
3,8,4,552,525,233,555,0:52:8,car.jpg
4,7,5,552,525,433,522,1:52:8,phone.jpg
5,9,3,552,525,555,522,1:52:8,car.jpg
6,6,6,444,392,111,232,1:43:4,dog.jpg
7,1,1,234,322,191,112,1:43:4,dog.jpg
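If you only want to check the selection logic without files or the 2-second delay, a small in-memory sketch could look like the following. The header line and the helper name pick_rows are hypothetical, added only for illustration; the data rows are copied from the sample input.

```python
import csv
import io

# The first line is a hypothetical header; the data rows come from "in.txt".
sample = io.StringIO(
    "id,a,label,x,y,w,h,time,file\n"
    "2,3,2,552,525,785,522,0:52:8,car.jpg\n"
    "3,8,4,552,525,233,555,0:52:8,car.jpg\n"
    "4,7,5,552,525,433,522,1:52:8,phone.jpg\n"
)

select = {"2", "5"}


def pick_rows(lines, wanted):
    """Return columns 2..4 of each row whose third column is in `wanted`."""
    reader = csv.reader(lines)
    next(reader)  # skip the header line without keeping it
    return [row[2:5] for row in reader if row[2] in wanted]


picked = pick_rows(sample, select)
print(picked)
```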
How to read specific columns from multiple CSV files, and skip columns that do not exist in some of the files using Python Pandas
You could try to read only the column names from the csv file and check them against your desired columns as follows:
import csv
import pandas as pd

desired_col = ["user_id", "event_type"]  # I selected only two values
for file_name in csv_files:  # csv_files is your list of file paths
    with open(file_name, newline="") as f:
        csv_cols = next(csv.reader(f))  # read only the csv column names
    cols = [col for col in desired_col if col in csv_cols]
    df = pd.read_csv(file_name, usecols=cols)
Then, each time you read a new csv file, you first read its column names and check desired_col against csv_cols.
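The same "keep only the desired columns that exist" check can be sketched with the standard library alone, which makes it easy to try out. This is not the pandas code above, just the same idea using csv.DictReader; the file names and contents below are invented for the example.

```python
import csv
import io

desired_col = ["user_id", "event_type"]

# Two hypothetical in-memory "files"; the second one lacks event_type.
files = {
    "a.csv": "user_id,event_type,extra\n1,click,x\n",
    "b.csv": "user_id,extra\n2,y\n",
}

result = {}
for name, text in files.items():
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is read from the first line of the file.
    cols = [c for c in desired_col if c in reader.fieldnames]
    # Keep only the desired columns that this file actually has.
    result[name] = [{c: row[c] for c in cols} for row in reader]

print(result)
```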
Read specific columns with pandas or other python module
An easy way to do this is using the pandas library like this.
import pandas as pd
fields = ['star_name', 'ra']
df = pd.read_csv('data.csv', skipinitialspace=True, usecols=fields)
# See the keys
print(df.keys())
# See content in 'star_name'
print(df.star_name)
The fix here was skipinitialspace, which removes the spaces after each delimiter, including those in the header, so ' star_name' becomes 'star_name'.
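The csv module's reader accepts an option with the same name, so the effect is easy to see without pandas. A minimal sketch, with made-up data:

```python
import csv
import io

text = "star_name, ra, dec\nSirius, 101.28, -16.71\n"  # hypothetical data

# Read only the header line, once without and once with the option.
plain = next(csv.reader(io.StringIO(text)))
stripped = next(csv.reader(io.StringIO(text), skipinitialspace=True))

print(plain)     # fields after the first keep their leading space
print(stripped)  # the space following each comma is dropped
```

Without the option, a lookup by 'ra' fails because the actual header name is ' ra'.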
Read CSV file and store specific columns
As Trenton mentions in a comment, sets are not ordered. A set is an unordered collection with no duplicate elements. Convert the set to a list and order it.
Or you could use an ordered dictionary, which remembers the order in which you set its keys, and do everything in one loop.
import csv
from collections import OrderedDict

data = OrderedDict()
with open('/Users/anku/Documents/dev/Credentials.csv', mode='r', encoding='utf-8-sig') as csv_input:
    csv_file = csv.reader(csv_input)
    next(csv_file)
    # get unique value of system types
    for row in csv_file:
        row_type = row[0]
        row_data = row[1:]
        # if the dict doesn't already have the key, set it to an empty list
        if row_type not in data:
            data[row_type] = []
        data[row_type].append(row_data)
Then you can see the data in the order you added it:
print(data.keys())  # Should print the keys in the order they were added
print(data.items())
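On Python 3.7 and later a plain dict also keeps insertion order, so the same grouping can be sketched without OrderedDict. The column layout (system type first) follows the code above; the sample rows are invented.

```python
import csv
import io

# Hypothetical rows: the first column is the system type.
text = (
    "type,user,password\n"
    "linux,alice,a1\n"
    "windows,bob,b2\n"
    "linux,carol,c3\n"
)

data = {}
reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header
for row in reader:
    # setdefault creates the empty list the first time a key is seen.
    data.setdefault(row[0], []).append(row[1:])

print(list(data))      # keys in the order they were first seen
print(data["linux"])
```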
Read specific columns in csv using python
def read_csv(file, columns, type_name="Row"):
    try:
        # _make builds a namedtuple from an iterable
        row_type = namedtuple(type_name, columns)._make
    except ValueError:
        # fall back to a plain tuple if the column names are not valid field names
        row_type = tuple
    rows = iter(csv.reader(file))
    header = next(rows)
    mapping = [header.index(x) for x in columns]
    for row in rows:
        row = row_type(row[i] for i in mapping)
        yield row
Example:
>>> import csv
>>> from collections import namedtuple
>>> from io import StringIO
>>> def read_csv(file, columns, type_name="Row"):
...     try:
...         row_type = namedtuple(type_name, columns)._make
...     except ValueError:
...         row_type = tuple
...     rows = iter(csv.reader(file))
...     header = next(rows)
...     mapping = [header.index(x) for x in columns]
...     for row in rows:
...         row = row_type(row[i] for i in mapping)
...         yield row
...
>>> testdata = """\
... AAA,bbb,ccc,DDD,eee,FFF,GGG,hhh
... 1,2,3,4,50,3,20,4
... 2,1,3,5,24,2,23,5
... 4,1,3,6,34,1,22,5
... 2,1,3,5,24,2,23,5
... 2,1,3,5,24,2,23,5
... """
>>> testfile = StringIO(testdata)
>>> for row in read_csv(testfile, "AAA GGG DDD".split()):
...     print(row)
...
Row(AAA='1', GGG='20', DDD='4')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='4', GGG='22', DDD='6')
Row(AAA='2', GGG='23', DDD='5')
Row(AAA='2', GGG='23', DDD='5')
How to print a specific column in a csv file in python?
Reading from a csv file is symmetric to writing. The main difference is that, as you have skipped the header line, you will use a simple reader and get sequences instead of mappings:
import csv

with open('Mail_Txt.csv', 'r', encoding='utf-8', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for val in reader:
        print(val[0])
Reading specific column from a csv file in python
One way of doing this would be:
import csv
# replace the name with your actual csv file name
file_name = "data.csv"
second_column = []  # empty list to store second column values
with open(file_name, newline="") as f:
    csv_file = csv.reader(f)
    for line in csv_file:
        second_column.append(line[1])
        print(line[1])  # index 1 for second column
The second_column variable will hold the necessary values. Hope this helps.
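If the file has a header line or blank lines, line[1] would either include the header or raise an IndexError. A hedged variant that skips both could look like this; the sample data is made up, and whether your file has a header is an assumption.

```python
import csv
import io

# Hypothetical data with a header and a trailing blank line.
text = "name,score\nann,10\nbob,12\n\n"

reader = csv.reader(io.StringIO(text))
next(reader)  # skip the header line (assumes the file has one)

# csv.reader yields an empty list for a blank line, so guard on the length.
second_column = [line[1] for line in reader if len(line) > 1]

print(second_column)
```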