Python CSV.Reader: How to Return to the Top of the File

Python csv.reader: How do I return to the top of the file?

You can seek the file directly. For example:

>>> import csv
>>> f = open("csv.txt")
>>> c = csv.reader(f)
>>> for row in c: print(row)
['1', '2', '3']
['4', '5', '6']
>>> f.seek(0)
0
>>> for row in c: print(row)  # again
['1', '2', '3']
['4', '5', '6']

How to return to the beginning of a DictReader?

You need to call seek(0) on the underlying file object, not on the DictReader, so keep a reference to that file around. This should work:

import csv

class CSVReader:
    def __init__(self):
        self.f = open('myfile.csv')
        self.companies = csv.DictReader(self.f)

    def company_at_node(self, node):
        for row in self.companies:
            if row['nodeid'] == node:
                print(row)
        self.f.seek(0)
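One caveat worth adding (my own note, not part of the original answer): a DictReader reads its field names only once, so after the seek the header line comes back as an ordinary data row on the next pass. A minimal sketch of skipping it manually, assuming the first line of myfile.csv is the header:

    def company_at_node(self, node):
        for row in self.companies:
            if row['nodeid'] == node:
                print(row)
        self.f.seek(0)
        next(self.f, None)  # consume the header line so it isn't yielded as data on the next pass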

If you seek(0) after reading a csv file, the line_num attribute does not reset. Why does this happen?

line_num is:

The number of lines read from the source iterator.

This is not the same as the location of the file pointer. line_num is set in C code and cannot be changed directly from Python code. There is no information in the commit message to explain why line_num isn't reset if the file pointer is reset.

If you want to track line numbers per iteration, you can use the enumerate built-in function instead, or implement your own counter.

>>> for i, a in enumerate(fileR):
...     print('line#', i, str(a))
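If you prefer to implement your own counter, a minimal sketch (the counted_rows name is purely illustrative) is a small generator that pairs each row with a running count; recreate it after seeking back to the start and the count starts over:

def counted_rows(reader):
    "Yield (line_number, row) pairs, counting from 1."
    count = 0
    for row in reader:
        count += 1
        yield count, row

for n, row in counted_rows(csv.reader(f)):
    print('line#', n, row)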

Best way to access the Nth line of a CSV file

You can use enumerate to iterate through the list until you find the right row:

for i, row in enumerate(reader):
    if i == line_number:
        print("This is the line.")
        print(row)
        break

You can also use itertools.islice which is designed for this type of scenario - accessing a particular slice of an iterable without reading the whole thing into memory. It should be a bit more efficient than looping through the unwanted rows.

import csv
import itertools

def get_csv_line(path, line_number):
    with open(path) as f:
        return next(itertools.islice(csv.reader(f), line_number, None))

But if your CSV file is small, just read the entire thing into a list, which you can then access with an index in the normal way. This also has the advantage that you can access several different rows in random order without having to reset the csv reader.

with open(path) as f:
    my_csv_data = list(csv.reader(f))

print(my_csv_data[line_number])

How do I read different sections of a CSV file when the first 5 lines sometimes have more than 1 column?

In both cases, I'd generalize your CSV as follows:

  • lines 1-4: special "lines" of text
  • line 5: garbage (discard)
  • lines 6-...: meaningful "rows"

Here's that general approach in code. The parse_special_csv function takes a filename as input and returns two lists:

  • the first is a list of "lines" (1-4); they're technically rows, but it's more about how you treat them/what you do with them
  • the second is a list of rows (lines 6-...)

My thinking is that once you have the data split out and the file is completely parsed, you'll know what to do with the lines and what to do with the rows:

import csv

def parse_special_csv(fname):
    lines = []
    rows = []
    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)

        # Treat lines 1-4 as just "lines"
        for i in range(4):
            row = next(reader)  # manually advance the reader
            lines.append(row[0])  # safe to index the first column, because *you know* these lines have column-like data

        # Discard line 5
        next(reader)

        # Treat the remaining lines as CSV rows
        for row in reader:
            rows.append(row)

    return lines, rows

lines, rows = parse_special_csv('sample1.csv')
print('sample1')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

lines, rows = parse_special_csv('sample2.csv')
print('sample2')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

And I get, based on your samples:

sample1
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]

sample2
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]

Also, next(reader) may look a little foreign, but it's the correct way to manually advance the CSV reader (and any iterator in Python, in general).
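A related detail (my note, not from the original answer): next() also accepts a default value, which avoids a StopIteration if the file turns out to be shorter than the header block you expect:

row = next(reader, None)  # returns None instead of raising StopIteration at end of file
if row is None:
    raise ValueError('file ended before the expected header lines')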

How to read a CSV file in reverse order in Python?

Pretty much the same way as for a text file: read the whole thing into a list and then go backwards:

import csv

with open('test.csv', 'r') as textfile:
    for row in reversed(list(csv.reader(textfile))):
        print(', '.join(row))

If you want to get fancy, you could write a lot of code that reads blocks starting at the end of the file and works backwards, emitting a line at a time, and then feed that to csv.reader, but that only works with a seekable file, i.e. disk files but not standard input.


Some of us have files that do not fit into memory. Could anyone come up with a solution that does not require storing the entire file in memory?

That's a bit trickier. Luckily, all csv.reader expects is an iterator-like object that returns a string (line) per call to next(). So we grab the technique Darius Bacon presented in "Most efficient way to search the last x lines of a file in python" to read the lines of a file backwards, without having to pull in the whole file:

import os

def reversed_lines(file):
    "Generate the lines of file in reverse order."
    part = ''
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == '\n' and part:
                yield part[::-1]
                part = ''
            part += c
    if part: yield part[::-1]

def reversed_blocks(file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)

and feed reversed_lines into the code to reverse the lines before they get to csv.reader, removing the need for reversed and list:

import csv

with open('test.csv', 'r') as textfile:
    for row in csv.reader(reversed_lines(textfile)):
        print(', '.join(row))

There is a more Pythonic solution possible, which doesn't require a character-by-character reversal of the block in memory (hint: just get a list of indices where there are line ends in the block, reverse it, and use it to slice the block), and uses chain out of itertools to glue the line clusters from successive blocks together, but that's left as an exercise for the reader.
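For what it's worth, here is one possible sketch along those lines (my own attempt, using plain yields rather than itertools.chain, and reusing the reversed_blocks helper above): collect the newline indices in each block, slice out the fully contained lines in reverse, and carry the leftover fragment over to the next (earlier) block.

def reversed_lines_sliced(file):
    "Generate the lines of file in reverse order by slicing blocks on newline indices."
    tail = ''  # partial line carried over from blocks later in the file
    for block in reversed_blocks(file):
        ends = [i for i, ch in enumerate(block) if ch == '\n']  # newline positions in this block
        if not ends:
            tail = block + tail
            continue
        last = block[ends[-1] + 1:] + tail  # line that starts after this block's final newline
        if last:
            yield last
        # lines fully contained in this block, last to first (each keeps its trailing '\n')
        for k in range(len(ends) - 1, 0, -1):
            yield block[ends[k - 1] + 1:ends[k] + 1]
        tail = block[:ends[0] + 1]  # may be the tail of a line that starts in an earlier block
    if tail:
        yield tail

Like the original reversed_lines, this sketch does not handle newlines inside quoted fields; the caveat below applies to it as well.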


It's worth noting that the reversed_lines() idiom above only works if the columns in the CSV file don't contain newlines.

Aargh! There's always something. Luckily, it's not too bad to fix this:

def reversed_lines(file):
    "Generate the lines of file in reverse order."
    part = ''
    quoting = False
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == '"':
                quoting = not quoting
            elif c == '\n' and part and not quoting:
                yield part[::-1]
                part = ''
            part += c
    if part: yield part[::-1]

Of course, you'll need to change the quote character if your CSV dialect doesn't use ".

Only reading the first N rows of a CSV file with csv.reader in Python

The shortest and most idiomatic way is probably to use itertools.islice:

import itertools
...
for row in itertools.islice(reader1, 200):
    ...
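Spelled out as a complete, runnable example (the data.csv filename and the 200-row limit are placeholders):

import csv
import itertools

with open('data.csv', newline='') as f:
    reader1 = csv.reader(f)
    for row in itertools.islice(reader1, 200):  # stop after the first 200 rows
        print(row)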

Top 3 values in a CSV column in Python

If the value of the status column is not a factor in choosing the top 3 customers, you can build a dictionary with the number of orders per customer.

Code:

import csv

with open("orders.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    orders_count = {}
    for line in reader:
        orders_count[line["customer_id"]] = orders_count.get(line["customer_id"], 0) + 1

customers = sorted(orders_count.items(), key=lambda customer: customer[1], reverse=True)
print(customers[:3])

Output:

[('10', 3), ('11', 3), ('13', 3)]

orders.csv:

order_id,customer_id,status
1,10,Successful
2,11,Successful
3,11,Successful
4,10,Waiting
5,12,Waiting
6,10,Successful
7,11,Wairing
8,13,Successful
9,13,Waiting
10,13,Successful

Explanation:

  • Read the csv file content using the DictReader class. Details can be found in the official documentation.
  • Created a dictionary orders_count to count the number of orders for each customer. I assume each customer has a unique customer id.
  • Sorted orders_count by value to get the list of customers ordered by total number of orders. reverse=True is used for a descending sort (a Counter-based variant is sketched after this list).
  • Finally, printed the top 3 customers with the most orders.
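As an alternative sketch (not part of the original answer), collections.Counter can do the counting, and its most_common method handles the sort:

import csv
from collections import Counter

with open("orders.csv") as csvfile:
    orders_count = Counter(line["customer_id"] for line in csv.DictReader(csvfile))

print(orders_count.most_common(3))  # top 3 (customer_id, order count) pairs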

