Read large data from csv file in php
An excellent method to deal with large files is located at: https://stackoverflow.com/a/5249971/797620
This method is used at http://www.cuddlycactus.com/knownpasswords/ (page has been taken down) to search through 170+ million passwords in just a few milliseconds.
How to parse a csv file that contains 15 million lines of data in php
Iterating over a large dataset (file lines, etc.) and pushing each item into an array increases memory usage in direct proportion to the number of items handled.
So the bigger the file, the bigger the memory usage.
If you want a function that formats the CSV data before you process it, basing it on generators is a good idea.
The PHP documentation describes a case that fits yours very well:
"A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate."
Something like this:
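function csv_read($filename, $delimiter = ',')
{
    # tip: don't open the file on every call to csv_read(); pass the handle as a param instead ;)
    $handle = fopen($filename, "r");
    if ($handle === false) {
        # "return false" inside a generator never reaches the caller as false, so throw instead
        throw new RuntimeException("Cannot open $filename");
    }
    try {
        $header = null;
        while (($data = fgetcsv($handle, 0, $delimiter)) !== false) {
            if ($data === [null]) {
                continue; # blank line
            }
            if ($header === null) {
                $header = $data; # first row holds the column names
                continue;
            }
            if (count($data) !== count($header)) {
                continue; # array_combine() needs the same number of keys and values
            }
            # on demand usage
            yield array_combine($header, $data);
        }
    } finally {
        # runs even if the caller stops iterating early
        fclose($handle);
    }
}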
And then:
$generator = csv_read('rdu-weather-history.csv', ';');
foreach ($generator as $item) {
    do_something($item);
}
The major difference here is:
you do not load all of the data into memory and consume it at once. Instead, you get items on demand (like a stream) and process them one at a time. This has a huge impact on memory usage.
P.S.: The CSV file above was taken from: https://data.townofcary.org/api/v2/catalog/datasets/rdu-weather-history/exports/csv
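The tip in the code comment above (pass the handle in rather than opening the file inside the function) could look roughly like the sketch below. The function name csv_rows, the column names and the in-memory demo stream are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical variant of csv_read(): the caller owns the stream, so the
// same code works for a file, php://stdin, or an in-memory buffer.
function csv_rows($handle, string $delimiter = ','): Generator
{
    $header = null;
    while (($data = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($data === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        if ($header === null) {
            $header = $data; // first non-blank row holds the column names
            continue;
        }
        if (count($data) === count($header)) {
            yield array_combine($header, $data); // one row at a time
        }
    }
}

// Demo data held in memory instead of a real file
$handle = fopen('php://memory', 'w+');
fwrite($handle, "date;temp\n2020-01-01;12\n2020-01-02;9\n");
rewind($handle);

foreach (csv_rows($handle, ';') as $row) {
    echo $row['date'], ' => ', $row['temp'], "\n";
}
fclose($handle);
```

Because the caller opens and closes the stream, the generator itself has nothing to clean up, and only one row is held in memory at any time.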
How can I process a large CSV file line by line?
Save the file somewhere and then process it in chunks like this:
<?php
$filePath = 'big.csv';
// How many rows to process in each batch
$limit = 100;
$fileHandle = fopen($filePath, "r");
if ($fileHandle === FALSE)
{
    die('Error opening '.$filePath);
}
// Set up a variable to hold our current position in the file
$offset = 0;
while (!feof($fileHandle))
{
    // Go to where we were when we ended the last batch
    fseek($fileHandle, $offset);
    $i = 0;
    while (($currRow = fgetcsv($fileHandle)) !== FALSE)
    {
        $i++;
        // Do something with the current row
        print implode(', ', $currRow)."\n";
        // If we hit our batch limit
        if ($i >= $limit)
        {
            // Update our current position in the file
            $offset = ftell($fileHandle);
            // Break out of the row processing loop
            break;
        }
    }
    // fgetcsv() returned FALSE: end of file or a read error, so stop
    if ($currRow === FALSE)
    {
        break;
    }
}
// Close the file
fclose($fileHandle);
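Storing the offset pays off mainly when the batches run in separate requests or cron runs. A minimal sketch of that idea, assuming a hypothetical helper read_batch() and a small temporary demo file (neither is from the original answer):

```php
<?php
// Hypothetical helper: read up to $limit CSV rows starting at byte $offset
// and report where the next batch should start.
function read_batch(string $path, int $offset, int $limit): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Error opening $path");
    }
    fseek($handle, $offset);
    $rows = [];
    while (count($rows) < $limit
        && ($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    $next = ftell($handle); // absolute byte offset of the next unread row
    $done = feof($handle);  // true once a read has hit the end of the file
    fclose($handle);
    return ['rows' => $rows, 'offset' => $next, 'done' => $done];
}

// Demo with a small temporary file
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1,a\n2,b\n3,c\n4,d\n5,e\n");

$offset = 0;
do {
    $batch = read_batch($path, $offset, 2);
    echo count($batch['rows']), ' rows, next offset ', $batch['offset'], "\n";
    $offset = $batch['offset']; // a real job would persist this between runs
} while (!$batch['done']);
unlink($path);
```

The offset comes from ftell(), so it is a byte position that can be handed straight back to fseek() on the next run.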
Reading a large CSV file that contains commas with PHP
So I waited a minute for the file to download, grabbed the first 5 records, and copied and pasted the fgetcsv example from the PHP manual.
First 5 records - https://termbin.com/23ti - saved as "sm_file.csv"
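<?php
if (($handle = fopen("sm_file.csv", "r")) !== FALSE) {
    $data = array();
    $num = 0;
    // Assign to $row first; "$data[] = fgetcsv(...)" would also append the final FALSE
    while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $data[] = $row;
        $num++;
    }
    fclose($handle);
    print_r($data);
}
?>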
[0] => Array
(
[0] => از تاريخ وصل 01/07/1397 - با برنامه
[1] => تاريخ گزارش: 29/09/1397
[2] => شماره گزارش: (3-5)
[3] => صفحه 1
[4] => گزارش قطع و وصل فيدرهاي فشار متوسط (نمونه 3)
[5] => ملاحظات
[6] => شرايط جوي
[7] => عملكرد ريكلوزر
[8] => رله عامل
[9] => خاموشي (MWh)
[10] => بار فيدر (A)
[11] => مدت قطع
[12] => زمان وصل
[13] => تاريخ وصل
[14] => زمان قطع
[15] => تاريخ قطع
[16] => نوع اشكال بوجود آمده
[17] => فيدر فشار متوسط
[18] => پست فوق توزيع
[19] => شماره پرونده
[20] => رديف
[21] => ناحيه اسالم
[22] =>
[23] => آفتابي
[24] => ندارد
[25] => ندارد
[26] => 0.21
[27] => 3
[28] => 132
[29] => 11:30
[30] => 1397/07/04
[31] => 09:18
[32] => 1397/07/04
[33] => جهت كار در حريم شبكه
[34] => گيسوم
[35] => اسا لم
[36] => 96,042,429,972
[37] => 1
[38] => 61292.56
[39] => جمع کل بار فيدر:
[40] => 393.85
[41] => جمع کل خاموشي:
[42] => 92,725
[43] => جمع مدت قطع:
)
It looks like data element 36 is the one you are having issues with. As you can see, fgetcsv handles it fine; you just need to convert it from a string to a number as you process the data. Just strip the commas.
<?php
if (($handle = fopen("sm_file.csv", "r")) !== FALSE) {
    $data = array();
    while (($row = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // Strip the thousands separators from element 36, if the row has one
        if (isset($row[36])) {
            $row[36] = str_replace(",", "", $row[36]);
        }
        $data[] = $row;
    }
    fclose($handle);
    print_r($data);
}
?>
Which gives
[36] => 96042429972
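If the stripped value should end up as an actual number rather than a string, a small helper along these lines could do the conversion. This is only a sketch: the name parse_grouped_number is made up, and it assumes that commas are only ever thousands separators and that "." is the decimal point.

```php
<?php
// Hypothetical helper: turn "96,042,429,972" into a number.
// Assumption: "," is a thousands separator and "." is the decimal point.
function parse_grouped_number(string $value)
{
    $clean = str_replace(',', '', trim($value));
    if (!is_numeric($clean)) {
        return null; // not a number, e.g. a text column
    }
    return $clean + 0; // int when the value fits, float otherwise
}

var_dump(parse_grouped_number('96,042,429,972'));
var_dump(parse_grouped_number('61292.56'));
var_dump(parse_grouped_number('ندارد'));
```

Returning null for non-numeric input makes it safe to call on any column, so text fields can be left untouched.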
As for how long it takes, here are the numbers for your full file of 2k records:
User time (seconds): 0.12
System time (seconds): 0.09
Percent of CPU this job got: 43%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 41820
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2448
Voluntary context switches: 18
Involuntary context switches: 55
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
That was on a modest i5 with 8 GB of RAM. I am not seeing any issues.
PHP read part of large CSV file
After much thinking and reading I finally think I found the solution to my problem. Correct me if this is a bad solution because of memory usage or from other perspectives.
First run
$buffer = part($path_to_file, 0, 100);
Next run
$buffer = part($path_to_file, $buffer['pointer'], 100);
Function
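function part($path, $offset, $rows) {
    $buffer = array('content' => '', 'pointer' => $offset);
    $handle = fopen($path, "r");
    if ($handle === false) {
        return $buffer;
    }
    fseek($handle, $offset);
    for ($i = 0; $i < $rows; $i++) {
        $line = fgets($handle);
        if ($line === false) {
            break; // end of file
        }
        $buffer['content'] .= $line;
    }
    // ftell() gives the absolute byte offset that fseek() expects;
    // mb_strlen() would count characters of this chunk only
    $buffer['pointer'] = ftell($handle);
    fclose($handle);
    return $buffer;
}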
In my more object oriented environment it looks more like this:
function part() {
    $handle = fopen($this->path, "r");
    if ($handle === false) {
        return;
    }
    fseek($handle, $this->pointer);
    for ($i = 0; $i < 2; $i++) {
        $line = fgets($handle);
        if ($line === false) {
            break; // end of file
        }
        $this->content .= $line;
    }
    // Store the absolute byte offset for the next call
    $this->pointer = ftell($handle);
    fclose($handle);
}
How to improve the speed of insertion of the csv data in a database in php?
Instead of inserting a row into the database on every iteration, try inserting in batches.
You can do a bulk insert that takes n entries (say 1000) and inserts them into the table in a single statement.
https://www.mysqltutorial.org/mysql-insert-multiple-rows/
This reduces the number of DB calls and therefore the overall time.
With 80k entries there is also a chance that you will exceed the memory limit.
You can avoid that by using generators in PHP.
https://medium.com/@aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b
Although that article is about Laravel, the code that reads from the CSV (the part that uses a generator) is framework-independent, so the same logic can be used here.
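As a rough sketch of how batching and a multi-row INSERT fit together: the helper names in_batches and build_bulk_insert, the table name items and the columns name and qty are all made up for illustration, and the PDO call is shown only as a comment because no database is assumed here.

```php
<?php
// Group any iterable (for example a CSV generator) into arrays of at most $size items.
function in_batches(iterable $rows, int $size): Generator
{
    $batch = [];
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, possibly smaller batch
    }
}

// Build one multi-row INSERT with "?" placeholders for a batch.
// $table and $columns must come from trusted code, never from user input.
function build_bulk_insert(string $table, array $columns, array $batch): array
{
    $rowSql = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($batch), $rowSql))
    );
    $params = [];
    foreach ($batch as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column] ?? null; // missing column becomes NULL
        }
    }
    return [$sql, $params];
}

// Demo: three rows in batches of two (a real import would use ~1000)
$rows = [
    ['name' => 'a', 'qty' => '1'],
    ['name' => 'b', 'qty' => '2'],
    ['name' => 'c', 'qty' => '3'],
];
foreach (in_batches($rows, 2) as $batch) {
    [$sql, $params] = build_bulk_insert('items', ['name', 'qty'], $batch);
    echo $sql, "\n";
    // With a PDO connection in $pdo: $pdo->prepare($sql)->execute($params);
}
```

Since in_batches() accepts any iterable, it can be fed directly from a generator that reads the CSV, so at most one batch of rows is in memory at a time.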