Read Large Data from CSV File in PHP

An excellent method to deal with large files is located at: https://stackoverflow.com/a/5249971/797620

This method is used at http://www.cuddlycactus.com/knownpasswords/ (page has been taken down) to search through 170+ million passwords in just a few milliseconds.

How to parse a csv file that contains 15 million lines of data in php

Iterating over a large dataset (file lines, etc.) and pushing every item into an array increases memory usage in direct proportion to the number of items handled.
So the bigger the file, the bigger the memory usage in this case.
If you need a function to format the CSV data before processing it, building it on top of generators sounds like a great idea (a sketch of such a formatting layer follows the example below).

Reading the PHP documentation on generators, it fits your case very well (emphasis mine):

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which
may cause you to exceed a memory limit, or require a considerable
amount of processing time to generate.

Something like this:


function csv_read($filename, $delimiter = ',')
{
    $header = [];
    $row = 0;
    # tip: don't do this on every csv_read() call, pass the handle as a param instead ;)
    $handle = fopen($filename, "r");

    if ($handle === false) {
        # note: because this function contains yield, the caller still receives a
        # (now empty) Generator object rather than false itself
        return false;
    }

    while (($data = fgetcsv($handle, 0, $delimiter)) !== false) {

        if (0 == $row) {
            # the first line holds the column names
            $header = $data;
        } else {
            # on demand usage
            yield array_combine($header, $data);
        }

        $row++;
    }
    fclose($handle);
}

And then:

$generator = csv_read('rdu-weather-history.csv', ';');

foreach ($generator as $item) {
    do_something($item);
}

The major difference here is that you do not load and consume all the data at once. You get items on demand (like a stream) and process them one at a time, which has a huge impact on memory usage.
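
As mentioned above, a formatting step can also be layered on top of the generator. A minimal sketch, assuming the csv_read() function defined above and that trimming every field is the (purely illustrative) formatting you want:

function csv_read_formatted($filename, $delimiter = ',')
{
    foreach (csv_read($filename, $delimiter) as $row) {
        # illustrative formatting step: trim whitespace from every field
        yield array_map('trim', $row);
    }
}

foreach (csv_read_formatted('rdu-weather-history.csv', ';') as $item) {
    do_something($item);
}

Because the wrapper yields as well, rows still flow through one at a time and memory usage stays flat.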


P.S.: The CSV file above was taken from: https://data.townofcary.org/api/v2/catalog/datasets/rdu-weather-history/exports/csv

How can I process a large CSV file line by line?

Save the file somewhere and then process it in chunks like this:

<?php
$filePath = 'big.csv';

// How many rows to process in each batch
$limit = 100;

$fileHandle = fopen($filePath, "r");
if ($fileHandle === FALSE) {
    die('Error opening ' . $filePath);
}

// Set up a variable to hold our current position in the file
$offset = 0;
while (!feof($fileHandle)) {
    // Go to where we were when we ended the last batch
    fseek($fileHandle, $offset);

    $i = 0;
    while (($currRow = fgetcsv($fileHandle)) !== FALSE) {
        $i++;

        // Do something with the current row
        print implode(', ', $currRow) . "\n";

        // If we hit our limit, remember where we are and start the next batch
        // (reaching the end of the file ends the inner loop via fgetcsv returning FALSE)
        if ($i >= $limit) {
            // Update our current position in the file
            $offset = ftell($fileHandle);

            // Break out of the row processing loop
            break;
        }
    }
}

// Close the file
fclose($fileHandle);
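
If the import has to resume across separate requests or CLI runs rather than within one loop, the same fseek()/ftell() idea still works; the only extra step is persisting $offset between runs. A minimal sketch under that assumption, using a hypothetical scratch file offset.txt to hold the last position:

<?php
$filePath   = 'big.csv';
$offsetFile = 'offset.txt'; // hypothetical scratch file holding the last byte position
$limit      = 100;

// Restore the position saved by the previous run (0 on the first run)
$offset = is_file($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$fileHandle = fopen($filePath, "r");
fseek($fileHandle, $offset);

$i = 0;
while ($i < $limit && ($currRow = fgetcsv($fileHandle)) !== FALSE) {
    $i++;
    print implode(', ', $currRow) . "\n";
}

// Remember where we stopped so the next run continues from here
file_put_contents($offsetFile, ftell($fileHandle));
fclose($fileHandle);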

Reading a large CSV file that contains commas with PHP

So I waited a minute to download the file, grabbed the first 5 records, and used a copy/paste of the fgetcsv example from the PHP manual.

First 5 records - https://termbin.com/23ti - saved as "sm_file.csv"

<?php

if (($handle = fopen("sm_file.csv", "r")) !== FALSE) {
    $data = array();
    $num = 0;
    while (($data[] = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num++;
    }
    fclose($handle);
    print_r($data);
}

?>

[0] => Array
(
[0] => از تاريخ وصل 01/07/1397 - با برنامه
[1] => تاريخ گزارش: 29/09/1397
[2] => شماره گزارش: (3-5)
[3] => صفحه 1
[4] => گزارش قطع و وصل فيدرهاي فشار متوسط (نمونه 3)
[5] => ملاحظات
[6] => شرايط جوي
[7] => عملكرد ريكلوزر
[8] => رله عامل
[9] => خاموشي (MWh)
[10] => بار فيدر (A)
[11] => مدت قطع
[12] => زمان وصل
[13] => تاريخ وصل
[14] => زمان قطع
[15] => تاريخ قطع
[16] => نوع اشكال بوجود آمده
[17] => فيدر فشار متوسط
[18] => پست فوق توزيع
[19] => شماره پرونده
[20] => رديف
[21] => ناحيه اسالم
[22] =>
[23] => آفتابي
[24] => ندارد
[25] => ندارد
[26] => 0.21
[27] => 3
[28] => 132
[29] => 11:30
[30] => 1397/07/04
[31] => 09:18
[32] => 1397/07/04
[33] => جهت كار در حريم شبكه
[34] => گيسوم
[35] => اسا لم
[36] => 96,042,429,972
[37] => 1
[38] => 61292.56
[39] => جمع کل بار فيدر:
[40] => 393.85
[41] => جمع کل خاموشي:
[42] => 92,725
[43] => جمع مدت قطع:
)

It looks like data element 36 is the one you are having issues with. As you can see, fgetcsv handles it fine; you just need to convert it from a string to a number as you process the data, by stripping the commas.

<?php

if (($handle = fopen("sm_file.csv", "r")) !== FALSE) {
    $data = array();
    while (($data[] = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // Strip the thousands separators from element 36 of the row just read
        $data[count($data) - 1][36] = str_replace(",", "", $data[count($data) - 1][36]);
    }
    fclose($handle);
    print_r($data);
}

?>

Which gives

[36] => 96042429972

As for how long it takes, here is your full file of 2k records:

User time (seconds): 0.12
System time (seconds): 0.09
Percent of CPU this job got: 43%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 41820
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2448
Voluntary context switches: 18
Involuntary context switches: 55
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

on a modest i5 with 8 GB RAM. Not seeing any issues.

PHP read part of large CSV file

After much thinking and reading, I finally think I found the solution to my problem. Correct me if this is a bad solution in terms of memory usage or from other perspectives.

First run

$buffer = part($path_to_file, 0, 100);

Next run

$buffer = part($path_to_file, $buffer['pointer'], 100);

Function

function part($path, $offset, $rows) {
    $buffer = array();
    $buffer['content'] = '';
    $buffer['pointer'] = $offset;
    $handle = fopen($path, "r");
    if ($handle) {
        fseek($handle, $offset);
        for ($i = 0; $i < $rows; $i++) {
            $buffer['content'] .= fgets($handle);
        }
        // ftell() returns the absolute byte offset to resume from on the next call
        // (mb_strlen() counts characters, not bytes, so it drifts on multi-byte data)
        $buffer['pointer'] = ftell($handle);
        fclose($handle);
    }
    return $buffer;
}

In my more object-oriented environment it looks more like this:

function part() {
    $handle = fopen($this->path, "r");
    if ($handle) {
        fseek($handle, $this->pointer);
        for ($i = 0; $i < 2; $i++) {
            if ($this->pointer != $this->filesize) {
                $this->content .= fgets($handle);
            }
        }
        // Track the absolute byte position so the next call resumes correctly
        $this->pointer = ftell($handle);
        fclose($handle);
    }
}

How to improve the speed of insertion of the csv data in a database in php?

Instead of inserting data into the database for every row, try inserting in batches.

You can always do a bulk insert that takes n entries (use 1000) and inserts them into the table in a single statement.

https://www.mysqltutorial.org/mysql-insert-multiple-rows/

This will reduce the number of DB calls, thereby reducing the overall time.
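
For illustration, a minimal sketch of what one such multi-row INSERT could look like with PDO; the $pdo connection, the table name weather, and its two columns are assumptions made up for the example:

// Assumption: $pdo is an existing PDO connection and $batch is a non-empty array of
// [date, temperature] rows collected from the CSV.
function insert_batch(PDO $pdo, array $batch): void
{
    // One placeholder group "(?, ?)" per row, all sent in a single INSERT statement
    $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO weather (`date`, `temperature`) VALUES $placeholders");
    $stmt->execute(array_merge(...$batch));
}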

And for 80k entries there is a possibility that you might exceed the memory limit too.

You can overcome that by using generators in PHP.
https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b

Although this example is in Laravel, the code that reads from the CSV (the part that uses a generator) is framework-independent, and the same logic can be used here.
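
Put together with the generator approach shown earlier, the whole import could look roughly like this; csv_read() is the generator defined above, insert_batch() is the sketch above, big.csv and the column names are placeholders, and 1000 is the suggested batch size:

$batch = [];
foreach (csv_read('big.csv') as $row) {
    // The column mapping is purely illustrative; adapt it to the real CSV header
    $batch[] = [$row['date'], $row['temperature']];
    if (count($batch) === 1000) {
        insert_batch($pdo, $batch); // one DB round trip per 1000 rows
        $batch = [];
    }
}
if ($batch) {
    insert_batch($pdo, $batch); // flush the final, smaller batch
}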


