only reading first N rows of csv file with csv reader in python
The shortest and most idiomatic way is probably to use itertools.islice:
import itertools
...
# reader1 is the csv.reader you already created
for row in itertools.islice(reader1, 200):  # at most the first 200 rows
    ...
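Below is a self-contained sketch of the same idea. It uses io.StringIO in place of a real file so it runs as-is. The sample data and the function name first_n_rows are hypothetical:

```python
import csv
import io
import itertools


def first_n_rows(f, n):
    """Return at most the first n parsed rows from a CSV file object."""
    reader = csv.reader(f)
    # islice stops pulling rows after n, so the rest of the file is never parsed
    return list(itertools.islice(reader, n))


# Hypothetical in-memory CSV standing in for a real file
sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
print(first_n_rows(sample, 2))  # [['a', 'b'], ['1', '2']]
```

If the file has fewer than n rows, islice just stops early and does not raise an error.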
Python Pandas: How to read in only the first n rows of CSV files?
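If you only want to read the first 999,999 (non-header) rows:
read_csv(..., nrows=999999)
If you only want to read rows 1,000,000 ... 1,999,999 (that is, 1,000,000 rows) and keep the header line as the column names:
read_csv(..., skiprows=range(1, 1000000), nrows=1000000)
An integer such as skiprows=1000000 also skips the header line, so the first remaining data row would be used as the column names.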
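Here is a small sketch of these calls on an in-memory CSV with hypothetical data. It also shows the header caveat: an integer skiprows skips the header line too, while a range starting at 1 skips only data lines:

```python
import io

import pandas as pd

# Hypothetical CSV: a header line followed by six data rows
data = "id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n"

# First 2 data rows; the header line is parsed as column names as usual
head = pd.read_csv(io.StringIO(data), nrows=2)

# Data rows 3-4: skip file lines 1 and 2 (line 0 is the header, so it is kept)
middle = pd.read_csv(io.StringIO(data), skiprows=range(1, 3), nrows=2)

# With an integer, the header line is skipped too and "2,b" becomes the header
no_header = pd.read_csv(io.StringIO(data), skiprows=2, nrows=2)

print(head["id"].tolist())           # [1, 2]
print(middle["id"].tolist())         # [3, 4]
print("id" in no_header.columns)     # False
```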
nrows : int, default None
    Number of rows of file to read. Useful for reading pieces of large files
skiprows : list-like or integer
    Row numbers to skip (0-indexed) or number of rows to skip (int) at the start of the file
And for large files, you'll probably also want to use chunksize:
chunksize : int, default None
    Return TextFileReader object for iteration
pandas.io.parsers.read_csv documentation
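Here is a minimal sketch of chunked reading on a small in-memory CSV with hypothetical data. Each chunk is an ordinary DataFrame of at most chunksize rows, so a large file can be processed piece by piece without loading it all at once:

```python
import io

import pandas as pd

# Hypothetical CSV with a header and five data rows
data = "n\n1\n2\n3\n4\n5\n"

totals = []
# With chunksize, read_csv returns an iterator of DataFrames instead of one frame
for chunk in pd.read_csv(io.StringIO(data), chunksize=2):
    totals.append(int(chunk["n"].sum()))

print(totals)  # [3, 7, 5]
```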
Read from a .csv the first n rows and store the column into a list
You can use pandas to do this:
import pandas as pd
df = pd.read_csv("test.csv", nrows=2000, header=None)  # header=None stops the first row from being read as column names
df_list = df.values.tolist()  # list of rows, each row a list of values
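If only one column is needed, you can pull it out after reading. The sketch below uses hypothetical in-memory data. With header=None the columns are labelled 0, 1, 2 and so on, so the first column is df[0]:

```python
import io

import pandas as pd

# Hypothetical headerless CSV standing in for test.csv
data = "x,10\ny,20\nz,30\n"

# Read only the first 2 rows; header=None keeps "x,10" as data
df = pd.read_csv(io.StringIO(data), nrows=2, header=None)

first_column = df[0].tolist()   # a single column as a plain list
all_rows = df.values.tolist()   # every row as a list of values

print(first_column)      # ['x', 'y']
print(df[1].tolist())    # [10, 20]
```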
How do I read different sections of a CSV file when the first 5 lines sometimes has more than 1 columns?
In both cases, I'd generalize your CSV as follows:
- lines 1-4: special "lines" of text
- line 5: garbage (discard)
- lines 6-...: meaningful "rows"
Here's that general approach in code. The parse_special_csv function takes a filename as input and returns two lists:
- the first is a list of "lines" (1-4); they're technically rows, but it's more about how you treat them/what you do with them
- the second is a list of rows (lines 6-...)
My thinking is that once you have the data split out and the file is completely parsed, you'll know what to do with lines and what to do with rows:
import csv
def parse_special_csv(fname):
    lines = []
    rows = []
    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)
        # Treat lines 1-4 as just "lines"
        for i in range(4):
            row = next(reader)  # manually advance the reader
            lines.append(row[0])  # safe to index for first column, because *you know* these lines have column-like data
        # Discard line 5
        next(reader)
        # Treat the remaining lines as CSV rows
        for row in reader:
            rows.append(row)
    return lines, rows
lines, rows = parse_special_csv('sample1.csv')
print('sample1')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()
lines, rows = parse_special_csv('sample2.csv')
print('sample2')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()
And I get, based on your samples:
sample1
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]
sample2
lines:
[
'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
'line two content',
'line 3 content.',
'line 4 content.'
]
rows:
[
['1', 'TEAM', 'Bob Jones', 'Sar a require transport', 'A', '', '18:34:04hrs on 17/10/21'],
['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]
Also, next(reader) may look a little foreign, but it's the correct way to manually advance the CSV reader (and, in general, any iterator in Python).
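Below is a variant sketch of the same split. It assumes the same layout: four leading lines, one line to discard, then data. It takes an already opened file object instead of a filename, so you can try it on io.StringIO. It uses itertools.islice for the first four lines and passes a default to next(), so a file with only the leading lines does not raise StopIteration. The name split_special_csv and the sample text are hypothetical:

```python
import csv
import io
import itertools


def split_special_csv(f):
    """Split a CSV file object into 4 leading "lines" and the remaining rows."""
    reader = csv.reader(f)
    # First column of lines 1-4 (a blank line parses as [], so guard for it)
    lines = [row[0] if row else "" for row in itertools.islice(reader, 4)]
    # Discard line 5; the default None avoids StopIteration on short files
    next(reader, None)
    # Everything left is treated as ordinary CSV rows
    rows = list(reader)
    return lines, rows


sample = io.StringIO(
    "File header\n"
    "line two content\n"
    "line 3 content.\n"
    "line 4 content.\n"
    "garbage,to,discard\n"
    "1,TEAM,Bob Jones\n"
    "2,TEAM,Peter Smith\n"
)
lines, rows = split_special_csv(sample)
print(lines)  # ['File header', 'line two content', 'line 3 content.', 'line 4 content.']
print(rows)   # [['1', 'TEAM', 'Bob Jones'], ['2', 'TEAM', 'Peter Smith']]
```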
How to read first 100 rows of a csv file in python appending comma, serial number and full stop marks?
Assuming that the crucial fields are separated by multiple spaces:
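import re
with open('test.csv', 'r') as f:
    next(f)  # skip the header line
    pat = re.compile(r'\s{2,}')  # two or more whitespace characters
    for i, row in enumerate(f, 1):
        # replace only the first run of whitespace with ", " and append a full stop
        print('{}. {}.'.format(i, pat.sub(', ', row.strip(), 1)))
        if i == 100:
            break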
Regex \s{2,} details:
\s - whitespace character
{2,} - quantifier. {n,m}, where n >= 0 and m >= n, repeats the previous item between n and m times. Leaving m out, as in {2,}, means "at least n times" with no upper limit. Greedy, so repeating m times is tried before reducing the repetition to n times. Ex. a{2,4} matches aaaa, aaa or aa.
Sample output:
1. what is your name, i am maxi.
2. are you happy, yes i am.
3. what you do, i am a student.
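The sketch below packages the same logic as a function that takes any open text file. It is tested here on io.StringIO with hypothetical contents. It replaces the manual break with itertools.islice, which stops after limit lines. The name number_rows is illustrative:

```python
import io
import itertools
import re

# Two or more consecutive whitespace characters mark the field boundary
FIELD_GAP = re.compile(r"\s{2,}")


def number_rows(f, limit=100):
    """Return up to `limit` numbered sentences, skipping the header line."""
    next(f, None)  # skip the header line
    out = []
    for i, row in enumerate(itertools.islice(f, limit), 1):
        # Replace only the first gap (count=1) with ", " and add a full stop
        out.append("{}. {}.".format(i, FIELD_GAP.sub(", ", row.strip(), 1)))
    return out


# Hypothetical file contents: fields separated by runs of spaces
sample = io.StringIO(
    "question        answer\n"
    "what is your name    i am maxi\n"
    "are you happy    yes i am\n"
)
for line in number_rows(sample):
    print(line)
# 1. what is your name, i am maxi.
# 2. are you happy, yes i am.
```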
How to read desired rows from large CSV files in python
You can use islice from itertools.
Here is a sample CSV file (displayed as a table):
X Y
0 21 test3
1 8 test1
2 75 test1
3 26 test2
4 98 test3
5 63 test3
6 65 test3
7 39 test3
8 74 test1
9 26 test2
Suppose I want only rows 3 and 4 (0-indexed, with the header line counted as row 0):
>>> import csv
>>> from itertools import islice
>>> with open('test.csv', newline='') as f:
...     rows = csv.reader(f)
...     rowiter = islice(rows, 3, 5)
...     for item in rowiter:
...         print(item)
...
gives me the following output
['2', '75', 'test1']
['3', '26', 'test2']
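Here is a self-contained check of the indexing. It assumes, as the output above suggests, that the file starts with a header line ",X,Y" followed by the index, X and Y values:

```python
import csv
import io
from itertools import islice

# Hypothetical CSV mirroring the sample above (header line plus data lines)
data = ",X,Y\n0,21,test3\n1,8,test1\n2,75,test1\n3,26,test2\n4,98,test3\n"

reader = csv.reader(io.StringIO(data))
# islice(iterable, start, stop) is 0-indexed and excludes stop, like a slice;
# the header line counts as index 0 because csv.reader does not skip it
picked = list(islice(reader, 3, 5))
print(picked)  # [['2', '75', 'test1'], ['3', '26', 'test2']]
```

islice still reads and parses the skipped rows and then discards them. It saves memory, not parsing time.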
Update
import csv
from itertools import islice
input_file = 'trusted.csv'
start = 10
stop = start + 10
users = []
with open(input_file, encoding='UTF-8', newline='') as f:
    rows = csv.reader(f, delimiter=",", lineterminator="\n")
    rowiter = islice(rows, start, stop)  # rows 10-19 (0-indexed)
    for row in rowiter:
        user = {}
        user['username'] = row[0]
        user['id'] = int(row[1])
        user['access_hash'] = int(row[2])
        user['name'] = row[3]
        users.append(user)
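The same loop can be wrapped in a function so you can try it without trusted.csv. The name load_users and the sample rows are hypothetical. The column order (username, id, access_hash, name) is taken from the code above:

```python
import csv
import io
from itertools import islice


def load_users(f, start, count):
    """Build user dicts from rows start .. start+count-1 (0-indexed) of a CSV file object."""
    users = []
    for row in islice(csv.reader(f), start, start + count):
        users.append({
            "username": row[0],
            "id": int(row[1]),
            "access_hash": int(row[2]),
            "name": row[3],
        })
    return users


# Hypothetical rows in the column order the loop above expects
sample = io.StringIO(
    "alice,1,111,Alice\n"
    "bob,2,222,Bob\n"
    "carol,3,333,Carol\n"
)
print(load_users(sample, 1, 2))
```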
How to read first 1000 entries in a csv file
As you've discovered, a csv.reader does not support slicing. You can use itertools.islice() to accomplish this with any iterable object, e.g.:
import csv
import itertools
entries = []
with open('mnist_train.csv', 'r', newline='') as f:
    mycsv = csv.reader(f)
    for row in itertools.islice(mycsv, 1000):
        entries.append(row)
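Here is a quick sketch of the difference on a tiny hypothetical file. Subscripting the reader raises TypeError. itertools.islice accepts it, and if the file is shorter than requested, it simply returns every row there is:

```python
import csv
import io
import itertools

# Hypothetical three-row CSV in place of mnist_train.csv
f = io.StringIO("5,0,0\n0,0,0\n4,0,0\n")
reader = csv.reader(f)

try:
    reader[:2]  # reader objects are iterators, not sequences
except TypeError as exc:
    print("slicing failed:", exc)

# islice works on any iterable; asking for more rows than exist just stops early
entries = list(itertools.islice(reader, 1000))
print(len(entries))  # 3
```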