Only Reading First N Rows of CSV File With CSV Reader in Python


The shortest and most idiomatic way is probably to use itertools.islice:

import itertools
...
for row in itertools.islice(reader1, 200):
    ...
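Filling in the elided parts, a complete, runnable version of that sketch might look like this (the in-memory `StringIO` data is a hypothetical stand-in for a real file on disk):

```python
import csv
import io
import itertools

# Hypothetical sample data standing in for an open CSV file
data = io.StringIO("a,1\nb,2\nc,3\nd,4\ne,5\n")

reader1 = csv.reader(data)
# islice stops after 2 rows without reading the rest of the file
first_two = list(itertools.islice(reader1, 2))
print(first_two)  # [['a', '1'], ['b', '2']]
```

Because `islice` is lazy, the reader never touches the remaining rows, which is exactly what you want for large files.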

Python Pandas: How to read only the first n rows of a CSV file?

If you only want to read the first 999,999 (non-header) rows:

read_csv(..., nrows=999999)

If you only want to read rows 1,000,000 ... 1,999,999:

read_csv(..., skiprows=1000000, nrows=999999)

nrows : int, default None
Number of rows of file to read. Useful for reading pieces of large files.

skiprows : list-like or integer
Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file

and for large files, you'll probably also want to use chunksize:

chunksize : int, default None
Return TextFileReader object for iteration

pandas.io.parsers.read_csv documentation
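The chunksize pattern isn't pandas-specific. A rough stdlib equivalent that yields fixed-size batches of rows from a csv.reader (the in-memory data is a hypothetical stand-in for a large file):

```python
import csv
import io
import itertools

def iter_chunks(reader, chunksize):
    """Yield lists of up to `chunksize` rows from a csv.reader."""
    while True:
        chunk = list(itertools.islice(reader, chunksize))
        if not chunk:
            return
        yield chunk

# Hypothetical stand-in for a large CSV on disk: 7 rows
data = io.StringIO("\n".join(f"row{i},{i}" for i in range(7)) + "\n")
sizes = [len(chunk) for chunk in iter_chunks(csv.reader(data), 3)]
print(sizes)  # chunks of 3, 3, and 1 rows
```

Each chunk can then be processed and discarded, keeping memory use bounded regardless of file size.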

Read the first n rows from a .csv and store a column into a list

You can use pandas to do this:

import pandas as pd

df = pd.read_csv("test.csv", nrows=2000, header=None)  # header=None keeps the first row from being read as column names
df_list = df.values.tolist()
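If pulling in pandas just for this feels heavy, the same result is available with the stdlib csv module. A sketch using hypothetical in-memory data in place of test.csv, extracting the first column of the first few rows into a list:

```python
import csv
import io
import itertools

# Hypothetical stand-in for test.csv
data = io.StringIO("10,a\n20,b\n30,c\n40,d\n")

reader = csv.reader(data)
# First column of the first 3 rows (note: csv values are strings)
first_col = [row[0] for row in itertools.islice(reader, 3)]
print(first_col)  # ['10', '20', '30']
```

Unlike pandas, the csv module does no type inference, so convert with int() or float() if numeric values are needed.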

How do I read different sections of a CSV file when the first 5 lines sometimes have more than one column?

In both cases, I'd generalize your CSV as follows:

  • lines 1-4: special "lines" of text
  • line 5: garbage (discard)
  • lines 6-...: meaningful "rows"

Here's that general approach in code. The parse_special_csv function takes a filename as input and returns two lists:

  • the first is a list of "lines" (1-4); they're technically rows, but it's more about how you treat them/what you do with them
  • the second is a list of rows (lines 6-...)

My thinking is that once the data is split out and the file is completely parsed, you'll know what to do with the lines and what to do with the rows:

import csv

def parse_special_csv(fname):
    lines = []
    rows = []
    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)

        # Treat lines 1-4 as just "lines"
        for i in range(4):
            row = next(reader)    # manually advance the reader
            lines.append(row[0])  # safe to index the first column, because *you know* these lines have column-like data

        # Discard line 5
        next(reader)

        # Treat the remaining lines as CSV rows
        for row in reader:
            rows.append(row)

    return lines, rows

lines, rows = parse_special_csv('sample1.csv')
print('sample1')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

lines, rows = parse_special_csv('sample2.csv')
print('sample2')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

And I get, based on your samples:

sample1
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]

sample2
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]

Also, next(reader) may look a little foreign, but it's the correct way to manually advance a csv reader (and any iterator in Python, in general).
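One caveat worth knowing: next() raises StopIteration if the file has fewer lines than expected. Passing a default as the second argument avoids that, which may be preferable if the header section isn't guaranteed to be present:

```python
import csv
import io

# Hypothetical one-row file
reader = csv.reader(io.StringIO("only,one,row\n"))
first = next(reader, None)   # a normal row
second = next(reader, None)  # reader exhausted: returns None instead of raising
print(first, second)
```

You can then check the result for None and handle the short file explicitly instead of catching StopIteration.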

How to read first 100 rows of a csv file in python appending comma, serial number and full stop marks?

Assuming that the crucial fields are separated by multiple spaces:

import re

with open('test.csv', 'r') as f:
    next(f)  # skip the header line
    pat = re.compile(r'\s{2,}')

    for i, row in enumerate(f, 1):
        print('{}. {}.'.format(i, pat.sub(', ', row.strip(), 1)))
        if i == 100:
            break

Regex \s{2,} details:

  • \s - whitespace character
  • {2,} - the previous item repeated two or more times (greedy, so longer runs are matched whole). So \s{2,} matches any run of two or more whitespace characters

Sample output:

1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
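The count argument of 1 passed to pat.sub also matters here: only the first multi-space run becomes the comma separator, and any later runs are left alone. A minimal check of that behavior:

```python
import re

pat = re.compile(r'\s{2,}')
line = "a  b  c"
# count=1: only the first run of 2+ spaces is replaced
result = pat.sub(', ', line, count=1)
print(result)  # 'a, b  c'
```

Dropping count would turn every multi-space run into ', ', which would split the answer portion of each line as well.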

How to read desired rows from large CSV files in python

You can use islice from itertools.

Here is a sample CSV file:

   X      Y
0  21  test3
1   8  test1
2  75  test1
3  26  test2
4  98  test3
5  63  test3
6  65  test3
7  39  test3
8  74  test1
9  26  test2

And suppose I want only rows 3 and 4

>>> import csv
>>> from itertools import islice
>>> with open('test.csv') as f:
...     rows = csv.reader(f)
...     rowiter = islice(rows, 3, 5)
...     for item in rowiter:
...         print(item)

which gives me the following output:

['2', '75', 'test1']
['3', '26', 'test2']

Update

import csv
from itertools import islice

input_file = 'trusted.csv'
start = 10
stop = start + 10
users = []

with open(input_file, encoding='UTF-8') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    rowiter = islice(rows, start, stop)
    for row in rowiter:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
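The loop above addresses columns by numeric index; when the column order is known, csv.DictReader with explicit fieldnames can express the same mapping by name. A sketch with hypothetical data — the real trusted.csv columns are assumed to be username, id, access_hash, name:

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for trusted.csv (no header row)
data = io.StringIO("alice,1,111,Alice\nbob,2,222,Bob\ncarol,3,333,Carol\n")

# fieldnames replaces row[0]..row[3] indexing with named access
reader = csv.DictReader(data, fieldnames=['username', 'id', 'access_hash', 'name'])
users = []
for row in islice(reader, 0, 2):  # take only rows 0-1
    row['id'] = int(row['id'])
    row['access_hash'] = int(row['access_hash'])
    users.append(dict(row))
print(users)
```

This avoids silent mix-ups if the index-to-field mapping ever drifts from the file layout.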

How to read first 1000 entries in a csv file

As you've discovered, a csv.reader does not support slicing. You can use itertools.islice() to accomplish this with any iterable. E.g.,

import csv
import itertools

entries = []
with open('mnist_train.csv', 'r') as f:
    mycsv = csv.reader(f)
    for row in itertools.islice(mycsv, 1000):
        entries.append(row)

