Python csv.reader: How do I return to the top of the file?
You can seek the file directly. For example:
>>> import csv
>>> f = open("csv.txt", newline="")
>>> c = csv.reader(f)
>>> for row in c: print(row)
['1', '2', '3']
['4', '5', '6']
>>> f.seek(0)
0
>>> for row in c: print(row)  # again
['1', '2', '3']
['4', '5', '6']
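Here is a self-contained version of the same session that you can use for experimenting. An io.StringIO with the same two rows stands in for csv.txt. Rewinding the file object is enough, because the reader pulls lines from it one at a time.

```python
import csv
import io

# io.StringIO stands in for csv.txt, with the same two rows
f = io.StringIO("1,2,3\n4,5,6\n")
c = csv.reader(f)

first = list(c)
f.seek(0)         # rewind the file object itself
second = list(c)  # the same reader reads the rows again
print(first == second)  # True
```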
How to return to the beginning of a DictReader?
You need to call seek(0) on the underlying file object, not on the DictReader. To do that, change your code so the file object stays accessible (here as self.f). This should work:
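import csv

class CSVReader:
    def __init__(self):
        # Keep the file object itself so it can be rewound later
        self.f = open('myfile.csv', newline='')
        self.companies = csv.DictReader(self.f)

    def company_at_node(self, node):
        for row in self.companies:
            if row['nodeid'] == node:
                print(row)
        # Rewind the file, not the DictReader
        self.f.seek(0)
        # The old DictReader has already consumed the header, so after the
        # seek it would return the header line as a data row; start a new one
        self.companies = csv.DictReader(self.f)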
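The seek(0) approach has one pitfall. A DictReader reads its header only once, so after you rewind the file, the same reader returns the header line as an ordinary row. The sketch below uses made-up data in an io.StringIO to show this. It also shows that creating a new DictReader after the seek avoids the problem.

```python
import csv
import io

f = io.StringIO("nodeid,name\n1,Acme\n2,Globex\n")  # made-up data
reader = csv.DictReader(f)
first_pass = list(reader)

f.seek(0)
# Reusing the old DictReader: its fieldnames are already set, so the
# header line comes back as if it were a data row
second_pass = list(reader)

f.seek(0)
# A fresh DictReader reads the header again and yields only data rows
fresh_pass = list(csv.DictReader(f))
```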
If you seek(0) after reading a CSV file, the line_num attribute does not reset. Why does this happen?
line_num is:
The number of lines read from the source iterator.
This is not the same as the location of the file pointer. line_num is set in C code and cannot be changed directly from Python code. There is no information in the commit message to explain why line_num isn't reset if the file pointer is reset.
If you want to track line numbers per iteration, you can use the enumerate built-in function instead, or implement your own counter.
>>> for i, a in enumerate(fileR):
...     print('line#', i, str(a))
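The following sketch makes that behaviour visible. It uses made-up data in an io.StringIO. line_num keeps counting across a seek(0), a newly created reader starts from zero, and enumerate(..., start=1) gives numbering per pass.

```python
import csv
import io

f = io.StringIO("a,b\n1,2\n")  # made-up two-line file
reader = csv.reader(f)

list(reader)
after_first = reader.line_num   # 2

f.seek(0)
list(reader)
after_second = reader.line_num  # 4: the counter is not reset by seek(0)

f.seek(0)
fresh = csv.reader(f)           # a new reader starts counting from zero
numbered = list(enumerate(fresh, start=1))
```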
Best way to access the Nth line of csv file
You can use enumerate to iterate through the rows until you find the right one:
for i, row in enumerate(reader):
    if i == line_number:
        print("This is the line.")
        print(row)
        break
You can also use itertools.islice, which is designed for this type of scenario: accessing a particular slice of an iterable without reading the whole thing into memory. It should be a bit more efficient than looping through the unwanted rows.
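import csv
import itertools

def get_csv_line(path, line_number):
    with open(path, newline='') as f:
        # Skip line_number rows (zero-based) and return the next one
        return next(itertools.islice(csv.reader(f), line_number, None))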
But if your CSV file is small, just read the entire thing into a list, which you can then access with an index in the normal way. This also has the advantage that you can access several different rows in random order without having to reset the csv reader.
with open(path) as f:
    my_csv_data = list(csv.reader(f))
print(my_csv_data[line_number])
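The following sketch combines the islice approach with a default value. Asking for a row past the end then returns None instead of raising StopIteration. get_csv_row is a hypothetical helper that takes an open file object rather than a path. The made-up sample data includes a quoted field that spans two physical lines. This shows that the index counts CSV rows, not lines in the file.

```python
import csv
import io
import itertools


def get_csv_row(f, row_number, default=None):
    """Return the row at zero-based index row_number, or default if the
    reader runs out first. Hypothetical helper, not part of the csv module."""
    rows = itertools.islice(csv.reader(f), row_number, None)
    return next(rows, default)


# Made-up data: row 1 holds a quoted field that spans two physical lines
data = 'id,note\n1,"two\nlines"\n2,plain\n'

print(get_csv_row(io.StringIO(data), 1))
print(get_csv_row(io.StringIO(data), 2))
print(get_csv_row(io.StringIO(data), 5))
```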
How do I read different sections of a CSV file when the first 5 lines sometimes have more than one column?
In both cases, I'd generalize your CSV as follows:
- lines 1-4: special "lines" of text
- line 5: garbage (discard)
- lines 6-...: meaningful "rows"
Here's that general approach in code. The parse_special_csv function takes a filename as input and returns two lists:
- the first is a list of "lines" (1-4); they're technically rows, but it's more about how you treat them/what you do with them
- the second is a list of rows (lines 6-...)
My thinking is that once the data is split out and the file is completely parsed, you'll know what to do with lines and what to do with rows:
import csv

def parse_special_csv(fname):
    lines = []
    rows = []

    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)

        # Treat lines 1-4 as just "lines"
        for i in range(4):
            row = next(reader)  # manually advance the reader
            lines.append(row[0])  # safe to index for first column, because *you know* these lines have column-like data

        # Discard line 5
        next(reader)

        # Treat the remaining lines as CSV rows
        for row in reader:
            rows.append(row)

    return lines, rows

lines, rows = parse_special_csv('sample1.csv')
print('sample1')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

lines, rows = parse_special_csv('sample2.csv')
print('sample2')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()
And I get, based on your samples:
sample1
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]
sample2
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]
Also, next(reader) may look a little foreign, but it's the correct way to manually advance the CSV reader^1 (and any iterator in Python, in general^2).
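The following small example advances a reader by hand with next(), using made-up data in an io.StringIO. If you pass a second argument to next(), it returns that default instead of raising StopIteration once the reader is exhausted. This helps if a file might be shorter than expected.

```python
import csv
import io

# Made-up data: one title line, a header, then data rows
reader = csv.reader(io.StringIO("Report title\nid,name\n1,Acme\n"))

title = next(reader)      # ['Report title']
header = next(reader)     # ['id', 'name']
data = list(reader)       # everything left: [['1', 'Acme']]

# With a default, next() returns it instead of raising StopIteration
after_end = next(reader, None)
```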
How to read a CSV file in reverse order in Python?
Pretty much the same way as for a text file: read the whole thing into a list and then go backwards:
import csv

with open('test.csv', 'r', newline='') as textfile:
    for row in reversed(list(csv.reader(textfile))):
        print(', '.join(row))
If you want to get fancy, you could write a lot of code that reads blocks starting at the end of the file and working backwards, emitting a line at a time, and then feed that to csv.reader. However, that only works with a file that supports seeking, i.e. disk files but not standard input.
Some of us have files that do not fit into memory. Could anyone come up with a solution that does not require storing the entire file in memory?
That's a bit trickier. Luckily, csv.reader only expects an iterator-like object that returns a string (line) per call to next(). So we borrow the technique Darius Bacon presented in "Most efficient way to search the last x lines of a file in python" to read the lines of a file backwards without having to pull in the whole file:
import os

def reversed_lines(file):
    "Generate the lines of file in reverse order."
    part = ''
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == '\n' and part:
                yield part[::-1]
                part = ''
            part += c
    if part:
        yield part[::-1]

def reversed_blocks(file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."
    # Assumes a single-byte encoding and '\n' line endings: in Python 3 text
    # mode, read(delta) counts decoded characters, not bytes, and seek() is
    # only documented for offsets returned by tell()
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)
and feed reversed_lines into the code to reverse the lines before they get to csv.reader, removing the need for reversed and list:
import csv

with open('test.csv', 'r') as textfile:
    for row in csv.reader(reversed_lines(textfile)):
        print(', '.join(row))
A more Pythonic solution is possible. It doesn't require a character-by-character reversal of the block in memory (hint: just get a list of indices where there are line ends in the block, reverse it, and use it to slice the block), and it uses chain from itertools to glue together the line clusters from successive blocks. That's left as an exercise for the reader.
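Below is one possible take on that exercise. It is a sketch, not the original answer's code. The function names reversed_byte_blocks and reversed_lines_sliced are hypothetical. The sketch differs from the hint in three ways:

- It uses bytes.splitlines to slice each block at its line ends, instead of building an explicit index list.
- It holds back the possibly incomplete first line of each block, instead of using itertools.chain.
- It opens the file in binary mode, which avoids Python 3's text-mode limits on seeking to arbitrary offsets.

It assumes UTF-8 or another ASCII-compatible encoding, so a '\n' or '\r' byte can never fall inside a multi-byte character. Like the reversed_lines() above, it does not handle newlines inside quoted fields.

```python
import csv
import io
import os
from typing import BinaryIO, Iterator


def reversed_byte_blocks(file: BinaryIO, blocksize: int = 4096) -> Iterator[bytes]:
    """Yield the file's bytes in blocks, starting from the end."""
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while here > 0:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)


def reversed_lines_sliced(file: BinaryIO, encoding: str = "utf-8",
                          blocksize: int = 4096) -> Iterator[str]:
    """Yield the file's lines, line endings included, last line first."""
    tail = b""  # bytes carried over from the block after this one
    for block in reversed_byte_blocks(file, blocksize):
        # splitlines() locates every '\n', '\r' or '\r\n' and slices there,
        # so nothing is reversed character by character
        pieces = (block + tail).splitlines(keepends=True)
        # The first piece may continue in the previous (earlier) block
        tail = pieces[0]
        for line in reversed(pieces[1:]):
            yield line.decode(encoding)
    if tail:
        yield tail.decode(encoding)


# Made-up sample data; a small blocksize forces lines to span blocks
sample = io.BytesIO("name,city\nÅsa,Malmö\nBob,Oslo\n".encode("utf-8"))
rows = list(csv.reader(reversed_lines_sliced(sample, blocksize=8)))
print(rows)
```

With a real file, open it with open('test.csv', 'rb') and pass the file object to csv.reader(reversed_lines_sliced(f)).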
It's worth noting that the reversed_lines() idiom above only works if the columns in the CSV file don't contain newlines.
Aargh! There's always something. Luckily, it's not too bad to fix this:
def reversed_lines(file):
    "Generate the lines of file in reverse order."
    part = ''
    quoting = False
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == '"':
                quoting = not quoting
            elif c == '\n' and part and not quoting:
                yield part[::-1]
                part = ''
            part += c
    if part:
        yield part[::-1]
Of course, you'll need to change the '"' test if your CSV dialect uses a different quote character.
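Here is a minimal sketch of that change, with the quote character turned into a parameter. quotechar is a hypothetical argument added for illustration. Its name mirrors csv.reader's own option, which you would set to the same value. The sketch runs on made-up data in an io.StringIO, where seek() and read() count characters. With a real Python 3 text file, seeking to arbitrary offsets is only dependable for single-byte encodings without newline translation.

```python
import csv
import io
import os


def reversed_blocks(file, blocksize=4096):
    "Generate blocks of file's contents in reverse order."
    file.seek(0, os.SEEK_END)
    here = file.tell()
    while 0 < here:
        delta = min(blocksize, here)
        here -= delta
        file.seek(here, os.SEEK_SET)
        yield file.read(delta)


def reversed_lines(file, quotechar='"'):
    "Generate the lines of file in reverse order, keeping quoted newlines."
    part = ''
    quoting = False
    for block in reversed_blocks(file):
        for c in reversed(block):
            if c == quotechar:
                # A doubled (escaped) quote toggles twice and cancels out
                quoting = not quoting
            elif c == '\n' and part and not quoting:
                yield part[::-1]
                part = ''
            part += c
    if part:
        yield part[::-1]


# Made-up data using single quotes, with a newline inside a quoted field
data = "id,note\n1,'multi\nline'\n2,plain\n"
rows = list(csv.reader(reversed_lines(io.StringIO(data), quotechar="'"),
                       quotechar="'"))
print(rows)
```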
Only reading the first N rows of a CSV file with csv.reader in Python
The shortest and most idiomatic way is probably to use itertools.islice:
import itertools
...
for row in itertools.islice(reader1, 200):
    ...
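Here is a self-contained sketch of the same idea, with made-up data in an io.StringIO in place of a real file. islice counts rows as csv.reader yields them. A header line therefore counts as one of the N unless you consume it first with next(). If the file has fewer rows than requested, islice simply stops early.

```python
import csv
import io
import itertools

data = "id,name\n1,Acme\n2,Globex\n3,Initech\n"  # made-up sample

# The first 2 rows; the header line counts as a row here
first_two = list(itertools.islice(csv.reader(io.StringIO(data)), 2))

# Consume the header first, then take the next 2 data rows
reader = csv.reader(io.StringIO(data))
header = next(reader)
next_two = list(itertools.islice(reader, 2))

# Asking for more rows than exist simply yields fewer, with no error
all_rows = list(itertools.islice(csv.reader(io.StringIO(data)), 200))
```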
Top 3 values in a CSV column in Python
If the value of the status column isn't a factor in choosing the top 3 customers, you can create a dictionary that counts the orders per customer.
Code:
import csv

with open("orders.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    orders_count = {}
    for line in reader:
        orders_count[line["customer_id"]] = orders_count.get(line["customer_id"], 0) + 1
    customers = sorted(orders_count.items(), key=lambda customer: customer[1], reverse=True)
    print(customers[:3])
Output:
[('10', 3), ('11', 3), ('13', 3)]
orders.csv:
order_id,customer_id,status
1,10,Successful
2,11,Successful
3,11,Successful
4,10,Waiting
5,12,Waiting
6,10,Successful
7,11,Wairing
8,13,Successful
9,13,Waiting
10,13,Successful
Explanation:
- Read the CSV file content using the DictReader class. Details can be found in the official documentation.
- Created a dictionary orders_count to count the number of orders of each customer. I assume each customer has a unique customer ID.
- Sorted orders_count by value to get the list of customers ordered by total number of orders; reverse=True gives a descending sort.
- Finally, printed the top 3 customers with the most orders.
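The same count can also be done with collections.Counter, whose most_common() method handles the sort-and-slice step. This is an alternative to the dictionary approach above, not the original answer's code. The sample orders.csv is inlined as a string so the sketch runs on its own. Counter documents that ties come back in first-encountered order, which matches what the stable sorted() call gives here.

```python
import csv
import io
from collections import Counter

# The sample orders.csv from above, inlined so the sketch runs on its own
ORDERS_CSV = """\
order_id,customer_id,status
1,10,Successful
2,11,Successful
3,11,Successful
4,10,Waiting
5,12,Waiting
6,10,Successful
7,11,Wairing
8,13,Successful
9,13,Waiting
10,13,Successful
"""

# Count one order per row, keyed by customer_id
counts = Counter(row["customer_id"] for row in csv.DictReader(io.StringIO(ORDERS_CSV)))
top3 = counts.most_common(3)
print(top3)  # [('10', 3), ('11', 3), ('13', 3)]
```

With the real file, replace io.StringIO(ORDERS_CSV) with a file object opened via open("orders.csv", newline="").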