PHP Array Performance

PHP array performance

Array access in PHP can certainly be slow. PHP implements arrays as hash tables: to access an element, it has to calculate a hash and traverse a linked list of buckets. A compiled language with real arrays will definitely be faster, because element access there is a direct memory read. For the interested: Code for hash access with string and with integer.
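
If you want to see the hashing cost yourself, here is a minimal timing sketch (my own, not from the original answer) that compares integer-keyed and string-keyed lookups; absolute numbers will vary by machine and PHP version:

$n = 1000000;
$int = [];
$str = [];
for ($i = 0; $i < $n; $i++) {
    $int[$i]    = $i; // sequential integer keys
    $str["k$i"] = $i; // string keys must be hashed on every lookup
}

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $x = $int[$i]; }
echo "integer keys: ", microtime(true) - $t, " sec\n";

$t = microtime(true);
for ($i = 0; $i < $n; $i++) { $x = $str["k$i"]; }
echo "string keys:  ", microtime(true) - $t, " sec\n";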

Concerning your code, there are several points I would optimize:

  • return directly, don't break twice.
  • put $file->get_width() and $file->get_height() into simple variables. I assume that the height and width don't change throughout the process. Remember: function calls in PHP are slow.
  • Use a one-dimensional array instead of nested arrays; you save one hash lookup per iteration that way. (In practice a one-dimensional array turns out to be only marginally faster, or even slightly slower; see the comparison of several ways of saving the data concerning performance and memory usage.) A sketch of this layout follows the rewritten function below.


function fits($bin, $x, $y, $w, $h) {
    // convert width/height into exclusive end coordinates
    $w += $x;
    $h += $y;

    for ($i = $x; $i < $w; ++$i) {
        for ($j = $y; $j < $h; ++$j) {
            if ($bin[$i][$j] !== 0) {
                return false; // occupied cell: no need to break twice
            }
        }
    }

    return true;
}

Though I'm not sure why you add $x to the width and $y to the height. Don't you want to iterate from the current coordinates to the image boundaries?
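
For what it's worth, a sketch of the one-dimensional layout mentioned above might look like this (fits_flat and $binHeight are my names; $binHeight stands for the length of each former inner array, hoisted into a variable once, outside the hot path):

function fits_flat($bin, $binHeight, $x, $y, $w, $h) {
    $w += $x;
    $h += $y;

    for ($i = $x; $i < $w; ++$i) {
        $offset = $i * $binHeight; // compute the row offset once per row
        for ($j = $y; $j < $h; ++$j) {
            if ($bin[$offset + $j] !== 0) {
                return false; // one hash lookup per cell instead of two
            }
        }
    }

    return true;
}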

php array performance when adding elements one by one vs when adding all of the data at once

If you have a handful of keys/values, it will make absolutely no difference. If you deal in arrays with 100K+ members, it does actually make a difference. Let's build some data first:

$r = [];
for ($i = 1; $i <= 100000; $i++) {
    $r[] = $i;              // for numerically indexed array
    // $r["k_{$i}"] = $i;   // for associative array
    // array_push($r, $i);  // with function call
}

This generates an array with 100,000 members, one by one. When added with a numeric (auto)index, this loop takes ~0.0025 sec on my laptop, with memory usage at ~6.8MB. If I use array_push, it takes ~0.0065 sec due to the function-call overhead. When $i is added with a named key, it takes ~0.015 sec, with memory usage at ~12.8MB. So: named keys are slower to define.
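
If you want to reproduce these measurements, a minimal harness along these lines should do (my sketch, mirroring how I assume the numbers above were taken; your figures will differ):

$t0 = microtime(true);
$m0 = memory_get_usage();

$r = [];
for ($i = 1; $i <= 100000; $i++) {
    $r[] = $i; // swap in the variant you want to measure
}

printf("time: %.4f sec, memory: %.1f MB\n",
    microtime(true) - $t0,
    (memory_get_usage() - $m0) / 1048576);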

But would it make a difference if you shaved 0.015 sec down to 0.012 sec? Or, at 10× the volume, 0.15 sec down to 0.12 sec, or even 0.075 sec? Not really. It only really starts becoming noticeable once you have 1M+ members. What you actually do with that volume of data will take much longer, and should be the primary focus of your optimization efforts.


Update: I prepared three files: one with the 100K integers from above defined in one set; another with the 100K integers defined separately, one by one; and one with the data serialized as JSON. I loaded each and logged the time. It turns out that there is a difference: the "in one set" definition loads in half the time of the separate definitions and is more memory-efficient. Further, deserializing the data from JSON is about 3× as fast as including a native PHP array definition.

  • "In One Set": 0.075 sec, 9.9MB
  • "As Separate": 0.150 sec, 15.8MB
  • "From JSON": 0.025 sec, 9.9MB
  • "From MySQL": 0.110 sec, 13.8MB*

Thus: if you define large arrays in native PHP format, define them in one go rather than bit by bit. And if you load bulk array data from a file, json_decode(file_get_contents('data.json'), true) loading JSON is significantly faster than include 'data.php'; with a native PHP array definition. Your mileage may vary with more complex data structures; however, I wouldn't expect the basic performance pattern to change. For reference: source data at BitBucket.
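
To make the two loading strategies concrete, here is roughly what they look like (the file names are my assumption, matching the description above):

// data.php contains a native PHP array definition "in one set":
//     <?php return ['k_1' => 1, 'k_2' => 2 /* ... */];
$data = include 'data.php';

// data.json contains the same data serialized as JSON:
//     {"k_1":1,"k_2":2}
$data = json_decode(file_get_contents('data.json'), true);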

A curious observation: generating the data from scratch, in our loop above, was actually much faster than loading/parsing it from a file with a ready-made array!

*MySQL: Key-value pairs were fetched from a two-column table with PDO into an array matching the sample data, using fetchAll(PDO::FETCH_UNIQUE | PDO::FETCH_COLUMN).
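
As a sketch, that PDO call would look something like this (connection details and table/column names are placeholders of mine):

$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->query('SELECT k, v FROM kv_pairs');

// FETCH_UNIQUE makes the first column the array key; FETCH_COLUMN
// keeps only the remaining column as the value: ['k_1' => 1, ...]
$data = $stmt->fetchAll(PDO::FETCH_UNIQUE | PDO::FETCH_COLUMN);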


Best practice: when defining data you actually need to work with, rather than "crude export/import" data that is never read or edited manually, construct your arrays in a manner that makes your code easy to maintain. I personally find it "cleaner" to keep simple arrays "contained":

$data = [
    'length' => 100,
    'width'  => 200,
    'foobar' => 'possibly'
];

Sometimes your array needs to "refer to itself" and the "bit-by-bit" format is necessary:

$data['length'] = 100;
$data['width'] = 200;
$data['square'] = $data['length'] * $data['width'];

If you build multidimensional arrays, I find it "cleaner" to separate each "root" dataset:

$data = [];
$data['shapes'] = ['square', 'triangle', 'octagon'];
$data['sizes'] = [100, 200, 300, 400];
$data['colors'] = ['red', 'green', 'blue'];

On a final note, by far the most limiting performance factor with PHP arrays is memory usage (see: array hashtable internals), which is unrelated to how you build your arrays. If you have massive datasets in arrays, make sure you don't keep unnecessary modified copies of them floating around beyond their scope of relevance; otherwise your memory usage will skyrocket.
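
To illustrate the point about copies (my sketch): a plain assignment is cheap thanks to copy-on-write, but the moment you modify the copy the full array is duplicated, so release copies as soon as they are no longer relevant:

$big = range(1, 1000000); // large dataset

$copy = $big;   // effectively free: both variables share one array
$copy[] = 42;   // copy-on-write kicks in; the full array is duplicated

unset($copy);   // release the modified copy once it has served its purpose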


Tested on Win10 / PHP 8.1.1 / MariaDB 10.3.11 @ Thinkpad L380.

The speed of variables vs arrays in php

Speed and memory are likely to be irrelevant. Write clean, direct code.

If you are going to be iterating or searching these values, use an array.

As a fundamental rule, I don't declare single-use variables. Only in fringe cases where readability is dramatically improved do I break this rule.

PHP big array performance?

An array of 500+ elements will not have a huge impact on performance. If your server has enough RAM, you do not need to worry even if it reaches something like 10,000 - 50,000+ elements.

But your concern has a valid point regarding scalability. Try to index your array with a known key, and always prefer an isset($array[<known_key>]) check over an is_array operation (as Joey stated).
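
For example (a sketch with assumed data), a known-key isset check is a single hash lookup instead of a scan:

$users = [
    42 => 'Alice',
    57 => 'Bob',
];

// single O(1) hash lookup instead of scanning the whole array
if (isset($users[42])) {
    $name = $users[42];
}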

Even after doing this, if you still face performance issues, it will be time to move to Memcache. (Going from a normal array to Memcache will be easy if you have a known-key array.)

Hope this helps!

PHP, in_array and fast searches (by the end) in arrays

I assume that in_array is a linear search from 0 to n-1.

The fastest search will be to store the values as the keys and use array_key_exists.

$a['foo'] = true;
$a['bar'] = true;

if (array_key_exists('foo', $a)) ...
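
If you already have a plain list of values, you can build such a lookup table once with array_flip (a sketch of mine, not from the original answer) and amortize the cost over many lookups:

$list = ['foo', 'bar', 'baz'];
$set  = array_flip($list); // ['foo' => 0, 'bar' => 1, 'baz' => 2]

if (isset($set['bar'])) {
    // found in O(1)
}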

But if that's not an option, you can make your own for indexed arrays quite easily:

function in_array_i($needle, array $a, $i = 0)
{
    $c = count($a);
    for (; $i < $c; ++$i) {
        if ($a[$i] == $needle) {
            return true;
        }
    }
    return false;
}

It will start at $i, which you can keep track of yourself in order to skip the first elements.

Or alternatively...

function in_array_i($needle, array $a, $i = 0)
{
    return in_array($needle, $i ? array_slice($a, $i) : $a);
}

You can benchmark to see which is faster.
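
A minimal benchmark sketch (assuming you rename the second variant to something like in_array_slice so both can coexist):

function bench(callable $fn, array $args, $reps = 100) {
    $t = microtime(true);
    for ($n = 0; $n < $reps; $n++) {
        $fn(...$args);
    }
    return microtime(true) - $t;
}

$a = range(1, 100000);
printf("manual loop: %.4f sec\n", bench('in_array_i', [99999, $a, 50000]));
printf("array_slice: %.4f sec\n", bench('in_array_slice', [99999, $a, 50000]));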

in_array vs strpos for performance in php

strpos is the fastest way to search for a text needle, per the php.net documentation for strstr():

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.
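
In practice that looks like the following; note the strict comparison, since strpos() returns 0 (which is falsy) when the needle sits at the very start of the haystack:

$haystack = 'in_array vs strpos for performance in php';
$needle   = 'strpos';

if (strpos($haystack, $needle) !== false) {
    // needle found; !== is required because a match at
    // position 0 would be treated as "not found" by a loose check
}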

PHP array performance - memory wise

Here is some code to test:

<?php

function testnochanges($arr1) {
    foreach ($arr1 as $val) {
        // read-only iteration: the array is never modified
    }
    return $arr1;
}

function testwithchanges($arr1) {
    $arr1[] = 1; // modifying the local copy triggers copy-on-write
    return $arr1;
}

echo "Stage 0: Mem usage is: " . memory_get_usage() . "<br />";

$arr = [];
for ($i = 0; $i < 100000; ++$i) {
    $arr[] = rand();
}

echo "Stage 1 (Array Created): Mem usage is: " . memory_get_usage() . "<br />";

$arrtest1 = testnochanges($arr);
echo "Stage 2 (Function did NO changes to array): Mem usage is: " . memory_get_usage() . "<br />";

$arrtest2 = testwithchanges($arr);
echo "Stage 3 (Function DID changes to array): Mem usage is: " . memory_get_usage() . "<br />";

?>

and here is the output after I run it:

Stage 0: Mem usage is: 330656
Stage 1 (Array Created): Mem usage is: 8855296
Stage 2 (Function did NO changes to array): Mem usage is: 8855352
Stage 3 (Function DID changes to array): Mem usage is: 14179864

On Stage 0 we can see that, before the array is created, PHP is already using some memory. After creating the first array (Stage 1) we see a big jump in memory usage, as expected. But after calling the testnochanges function and creating $arrtest1 on Stage 2, we see that memory usage barely changed. That's because we made no changes to $arr, so $arrtest1 and $arr still point to the same array. But on Stage 3, where we call the testwithchanges function, which appends an element to its copy of $arr, PHP performs copy-on-write: the returned array assigned to $arrtest2 now occupies a different part of memory, and again we see a big growth in memory usage.

Dry conclusion: if you copy an array to another variable and do not change it, memory usage stays the same, as both variables point to the same array. If you change the array, PHP performs copy-on-write and, of course, memory usage grows.

Good thing to read: Be wary of garbage collection, part 2.


