Memory Optimization in PHP Array


If you want a real indexed array, use SplFixedArray. It uses less memory. Also, PHP 5.3 has a much better garbage collector.

Other than that, well, PHP will use more memory than a more carefully written C/C++ equivalent.

Memory Usage for 1024x1024 integer array:

  • Standard array: 218,756,848
  • SplFixedArray: 92,914,208

as measured by memory_get_peak_usage()

$array = new SplFixedArray(1024 * 1024); // or array();
for ($i = 0; $i < 1024 * 1024; ++$i) {
    $array[$i] = 0;
}

echo memory_get_peak_usage();

Note that the same array in C using 64-bit integers would be 8 MB.

As others have suggested, you could pack the data into a string. This is slower but much more memory efficient. If using 8-bit values it's super easy:

$x = str_repeat(chr(0), 1024*1024);
$x[$i] = chr($v & 0xff); // store value $v into $x[$i]
$v = ord($x[$i]); // get value $v from $x[$i]

Here the memory will only be about 1.5MB (that is, when considering the entire overhead of PHP with just this integer string array).
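If you need values wider than 8 bits, you can pack several bytes per slot yourself. A sketch of my own (the helper names put16/get16 are hypothetical, not from the original answer), assuming little-endian packing of 16-bit values:

```php
<?php
// Two bytes per slot: store and fetch unsigned 16-bit values
// in a packed string, little-endian byte order.
$n = 1024;
$x = str_repeat("\0", $n * 2);

function put16(string &$s, int $i, int $v): void {
    $s[$i * 2]     = chr($v & 0xff);         // low byte
    $s[$i * 2 + 1] = chr(($v >> 8) & 0xff);  // high byte
}

function get16(string $s, int $i): int {
    return ord($s[$i * 2]) | (ord($s[$i * 2 + 1]) << 8);
}

put16($x, 5, 40000);
echo get16($x, 5), "\n"; // 40000
```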

For the fun of it, I created a simple benchmark of creating 1024x1024 8-bit integers and then looping through them once. The packed versions all used ArrayAccess so that the user code looked the same.

                   mem     write   read
array              218M    0.589s  0.176s
packed array       32.7M   1.85s   1.13s
packed spl array   13.8M   1.91s   1.18s
packed string      1.72M   1.11s   1.08s

The packed arrays used native 64-bit integers (only packing 7 bytes to avoid dealing with signed data) and the packed string used ord and chr. Obviously implementation details and computer specs will affect things a bit, but I would expect you to get similar results.

So while the array was 6x faster it also used 125x the memory as the next best alternative: packed strings. Obviously the speed is irrelevant if you are running out of memory. (When I used packed strings directly without an ArrayAccess class they were only 3x slower than native arrays.)
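The original wrapper code isn't shown, but a minimal sketch of what such an ArrayAccess front-end over a packed string might look like (the class name and details here are my own, assuming 8-bit values and PHP 8 signatures):

```php
<?php
// Hypothetical ArrayAccess wrapper over a packed byte string,
// similar in spirit to the wrappers benchmarked above.
class PackedByteArray implements ArrayAccess
{
    private string $data;

    public function __construct(int $size)
    {
        $this->data = str_repeat(chr(0), $size);
    }

    public function offsetExists(mixed $offset): bool
    {
        return $offset >= 0 && $offset < strlen($this->data);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return ord($this->data[$offset]);
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        $this->data[$offset] = chr($value & 0xff);
    }

    public function offsetUnset(mixed $offset): void
    {
        $this->data[$offset] = chr(0);
    }
}

$a = new PackedByteArray(1024);
$a[5] = 200;
echo $a[5], "\n"; // 200
```

The method-call overhead on every read and write is exactly what makes this roughly 3x slower than using the packed string directly.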

In short: if speed is of any concern, I would use something other than pure PHP to process this data.

Need an array-like structure in PHP with minimal memory usage

The most memory efficient you'll get is probably by storing everything in a string, packed in binary, and using manual indexing into it.

$storage = '';

$storage .= pack('l', 42);

// ...

// get the 10th entry (0-based index 9)
$int = current(unpack('l', substr($storage, 9 * 4, 4)));

This is feasible if the "array" can be initialised in one fell swoop and you're just reading from the structure. If you need to append to the string a lot, it becomes extremely inefficient. Even appending can be done efficiently, though, using a resource handle:

$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
...

This is very efficient. You can then read the buffer back into a variable and use it as a string, or you can continue to work with the resource using fseek.
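Putting the pieces together, a self-contained sketch of the read-back with fseek (the indexing scheme, 4 bytes per entry, is my own; the functions are standard PHP):

```php
<?php
// Write packed 32-bit integers into an in-memory stream, then seek
// back and read the Nth entry without building a PHP array.
$storage = fopen('php://memory', 'r+');

for ($i = 0; $i < 100; $i++) {
    fwrite($storage, pack('l', $i * 10));
}

// Read the 10th entry (0-based index 9): seek to its byte offset
// and unpack a single signed 32-bit integer.
fseek($storage, 9 * 4);
$int = current(unpack('l', fread($storage, 4)));
echo $int, "\n"; // 90

fclose($storage);
```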

How does PHP memory actually work

Note: the answer below applies to PHP versions prior to 7; PHP 7 introduced major internal changes, including to the value structures.

TL;DR

Your question is not actually about "how memory works in PHP" (by which, I assume, you meant "memory allocation"), but about "how arrays work in PHP", and those are two different questions. To summarize what's written below:

  • PHP arrays aren't "arrays" in the classical sense. They are hash-maps
  • The hash-map behind a PHP array has a specific structure and uses a lot of additional storage, such as internal link pointers
  • Each item in the hash-map also uses additional fields to store its information. And yes, it's not only whether keys are strings or integers that matters, but also what the key strings themselves are.
  • The string-key option in your case will "win" in terms of memory, because both options hash their keys into ulong (unsigned long) values, so the real difference lies in the values: the string-key option stores integer (fixed-length) values, while the integer-key option stores string (length-dependent) values. That may not always hold, though, due to possible collisions.
  • "Numeric-string" keys, such as '4', are treated as integer keys and hashed as if they were integers. Thus '4' => 'foo' and 4 => 'foo' are the same thing.
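The last point is easy to verify (a quick check of my own, not from the original answer):

```php
<?php
// Numeric-string keys are coerced to integers, so '4' and 4
// address the same slot in the hash-map.
$a = [];
$a['4'] = 'foo';
$a[4]   = 'bar';   // overwrites the previous entry

var_dump(count($a));       // int(1)
var_dump(array_keys($a));  // the single key is int(4)
```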

Also, an important note: the graphics here are copyright of the PHP internals book.

Hash-map for PHP arrays

PHP arrays and C arrays

You should realize one very important thing: PHP is written in C, where such a thing as an "associative array" simply does not exist. In C, an "array" is exactly that: a consecutive area of memory accessed by a consecutive offset. Your "keys" can only be numeric, integer, and consecutive, starting from zero. You can't have, for instance, 3, -6, 'foo' as your "keys" there.

So, to implement arrays as they exist in PHP, there's the hash-map option: it uses a hash function to hash your keys and transform them into integers, which can then be used as C-array offsets. That function, however, can never be a bijection between string keys and their integer hash results, and it's easy to understand why: the cardinality of the set of strings is much, much larger than the cardinality of the set of integers. Let's illustrate with an example: count all strings of length up to 10 built only from alphanumeric symbols (0-9, a-z and A-Z, 62 in total): that's 62^10 possible strings, around 8.39E+17. Compare that with the roughly 4E+9 values an unsigned 32-bit (long) integer can hold and you'll get the idea: there will be collisions.

PHP hash-map keys & collisions

Now, to resolve collisions, PHP places items that produce the same hash-function result into one linked list. So the hash-map is not just a "list of hashed elements"; instead it stores pointers to lists of elements (every element in a given list shares the same hash-function result). And this is where your question touches memory allocation: if your array has string keys that produce no collisions, then no additional pointers inside those lists are needed, so the memory footprint is reduced (it's a very small overhead, but since we're talking about precise memory allocation, it should be taken into account). Conversely, if your string keys produce many collisions, more pointers are created, so the total memory footprint will be a bit larger.

To illustrate those relations within those lists, here's a graphic:

Sample Image

Above is how PHP resolves collisions after applying the hash function. So one part of your question lies here: the pointers inside the collision-resolution lists. Elements of these linked lists are usually called buckets, and the array that contains pointers to the heads of those lists is internally called arBuckets. Due to a structure optimization (to make things like element deletion faster), each real list element holds two pointers, to the previous and to the next element. That only makes the memory difference between non-collision and collision arrays a little wider; it doesn't change the concept itself.

One more list: order

To fully support arrays as they behave in PHP, it's also necessary to maintain insertion order, and that is achieved with another internal list. Every array element is a member of that list too. It makes no difference in terms of memory allocation, since this list must be maintained in both options, but I'm mentioning it for the full picture. Here's the graphic:

Sample Image

In addition to pListLast and pListNext, pointers to the order-list head and tail are stored. Again, this is not directly related to your question, but below I'll dump the internal bucket structure, where these pointers are present.

Array element from inside

Now we're ready to look at what an array element, i.e. a bucket, is:

typedef struct bucket {
    ulong h;
    uint nKeyLength;
    void *pData;
    void *pDataPtr;
    struct bucket *pListNext;
    struct bucket *pListLast;
    struct bucket *pNext;
    struct bucket *pLast;
    char *arKey;
} Bucket;

Here we are:

  • h is the integer (ulong) value of the key, the result of the hash function. For integer keys it is simply the key itself (the hash function returns it unchanged)
  • pNext / pLast are the pointers inside the collision-resolution linked list
  • pListNext / pListLast are the pointers inside the order-resolution linked list
  • pData is a pointer to the stored value. Actually, the value isn't the same one inserted at array creation, it's a copy; and to avoid unnecessary overhead, PHP uses pDataPtr (so pData = &pDataPtr)

From this viewpoint you can see where the difference lies: since a string key is hashed (h is always a ulong, and therefore always the same size), it comes down to what is stored in the values. Your string-key array stores integer values, while your integer-key array stores string values, and that is what makes the difference. However, it isn't magic: you can't always "save memory" by storing string keys this way, because if your keys are long and there are many of them, the collision overhead will kick in (with very high probability, though of course not guaranteed). It only "works" for arbitrary short strings that don't cause many collisions.
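A rough way to see the trade-off in practice (my own sketch, not from the original answer; exact numbers vary with PHP version and key lengths):

```php
<?php
// Compare memory for short string keys with integer values
// vs. integer keys with string values, 100K entries each.
$before = memory_get_usage();
$a = [];
for ($i = 0; $i < 100000; $i++) {
    $a['k' . $i] = $i;        // string key, integer value
}
$stringKeys = memory_get_usage() - $before;
unset($a);

$before = memory_get_usage();
$b = [];
for ($i = 0; $i < 100000; $i++) {
    $b[$i] = 'k' . $i;        // integer key, string value
}
$intKeys = memory_get_usage() - $before;

printf("string keys: %d bytes, int keys: %d bytes\n", $stringKeys, $intKeys);
```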

Hash-table itself

We've now covered the elements (buckets) and their structure, but there's also the hash-table itself, which is, in fact, the array data structure. It's called _hashtable:

typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer; /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

I won't describe all the fields, since I've already provided a lot of information related to the question, but briefly:

  • arBuckets is what was described above: the bucket storage,
  • pListHead / pListTail are the pointers to the order-resolution list
  • nTableSize determines the size of the hash-table, and this is directly related to memory allocation: nTableSize is always a power of 2. So it doesn't matter whether you have 13 or 14 elements in the array: the actual table size will be 16. Take that into account when you want to estimate an array's size.
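The power-of-two growth can be observed directly (a sketch of my own; the exact resize points and byte counts depend on the PHP version):

```php
<?php
// Watch memory jump only when the hash-table grows: insertions
// within the current capacity don't increase usage, doublings do.
$a = [];
$prev = memory_get_usage();
for ($i = 1; $i <= 20; $i++) {
    $a[] = 0;
    $now = memory_get_usage();
    if ($now > $prev) {
        echo "table grew after inserting element $i\n";
    }
    $prev = $now;
}
```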

Conclusion

It's really difficult to predict whether one array will be larger than the other in your case. Yes, there are guidelines that follow from the internal structure, but if the string keys are comparable in length to the integer values (like 'four' and 'one' in your sample), the real difference will come down to things like how many collisions occurred and how many bytes were allocated to store each value.

But choosing the proper structure should be a matter of sense, not memory. If your intention is to build correspondingly indexed data, the choice will always be obvious. The post above has only one goal: to show how arrays actually work in PHP and where the memory-allocation difference in your sample can come from.

You may also want to check the article about arrays & hash-tables in PHP: Hash-tables in PHP from the PHP internals book; I've used some graphics from there. Also, to understand how values are allocated in PHP, check the zval Structure article; it may help you understand the differences between string and integer allocation for your arrays' values. I didn't include its explanations here, since the more important point for me is to show the array data structure and where the difference between string keys and integer keys lies for your question.

How do I optimize memory usage in PHP for this function?

It seems to me you are abusing the SOAP service. If your code fails at $googleFunction, I'd suggest setting $chunk to 100 or 200 objects.

The second thing is the $item->save() function. If you have access to the class, you should check whether there are HAS-A relationships between objects. The one place PHP leaks memory is with a construction like this:

class Foo {
    function __construct()
    {
        $this->bar = new Bar($this);
    }
}

class Bar {
    function __construct($foo = null)
    {
        $this->foo = $foo;
    }
}

for ($i = 0; $i < 10; $i++) {
    $foo = new Foo();
    unset($foo);
    echo number_format(memory_get_usage()) . "<br>";
}

So if you have objects, I suspect Active Record objects, created in a save() function, don't forget to unset them. The easy way is to add a destructor like this:

function __destruct()
{
    unset($this->bar);
}

This should help.
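Note that since PHP 5.3 the cycle collector can also reclaim such circular references without a manual __destruct. A sketch of my own showing the effect (object sizes and freed byte counts vary by version):

```php
<?php
// Build circular references ($foo->bar->foo points back to $foo),
// then let the cycle collector reclaim them.
class Foo { public $bar; }
class Bar { public $foo; }

for ($i = 0; $i < 1000; $i++) {
    $foo = new Foo();
    $foo->bar = new Bar();
    $foo->bar->foo = $foo;   // the cycle
    unset($foo);             // refcounts stay > 0, so the cycle lingers
}

$before = memory_get_usage();
gc_collect_cycles();         // PHP >= 5.3: breaks the cycles
$after = memory_get_usage();
echo "freed: ", $before - $after, " bytes\n";
```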

PHP arrays memory usage

When I repeat your experiment using your code, I get 18 072 000 bytes with the integer indexes and 16 471 960 bytes with the string indexes.

Using memory_get_peak_usage(true) instead produces very similar results. Calculating the difference in memory usage right before and right after the for loop, I get 18 087 936 bytes with integer indexes and 16 515 072 bytes with string indexes.

That's a small difference which could be explained by different internal memory management for array keys and for array values. Perhaps since array keys are limited to scalars and array values aren't, PHP can optimize somewhere.

In any case, like @ed-heal said, use the best data structure for whatever you're trying to do. The memory usage is probably not that important and if it is, PHP might not be the tool for the job.

php array performance when adding elements one by one vs when adding all of the data at once

If you have a handful of keys/values, it will make absolutely no difference. If you deal in arrays with 100K+ members, it does actually make a difference. Let's build some data first:

$r = [];
for ($i = 1; $i <= 100000; $i++) {
    $r[] = $i;               // for a numerically indexed array
    // $r["k_{$i}"] = $i;    // for an associative array
    // array_push($r, $i);   // with a function call
}

This generates an array with 100,000 members, one by one. When added with a numeric (auto-)index, this loop takes ~0.0025 sec on my laptop, with memory usage at ~6.8MB. If I use array_push, it takes ~0.0065 sec with the function overhead. When $i is added with a named key, it takes ~0.015 sec, with memory usage at ~12.8MB. So named keys are slower to define.

But would it make a difference if you shaved 0.015 sec down to 0.012 sec? Or, at 10x the volume, 0.15 sec to 0.12 sec, or even 0.075 sec? Not really. It only really starts becoming noticeable at 1M+ members. What you actually do with that volume of data will take much longer, and should be the primary focus of your optimization efforts.
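The timings above came from loops like the following (a reconstruction of my own, not the original benchmark script; your numbers will differ):

```php
<?php
// Minimal timing harness for the three insertion styles above.
function benchInsert(callable $fill, int $n = 100000): float
{
    $t = microtime(true);
    $fill($n);
    return microtime(true) - $t;
}

$auto  = benchInsert(function ($n) { $r = []; for ($i = 1; $i <= $n; $i++) { $r[] = $i; } });
$push  = benchInsert(function ($n) { $r = []; for ($i = 1; $i <= $n; $i++) { array_push($r, $i); } });
$named = benchInsert(function ($n) { $r = []; for ($i = 1; $i <= $n; $i++) { $r["k_{$i}"] = $i; } });

printf("auto: %.4fs  push: %.4fs  named: %.4fs\n", $auto, $push, $named);
```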


Update: I prepared three files: one with the 100K integers from above defined in one set; another with the 100K integers defined separately; and one serialized as JSON. I loaded each and logged the time. It turns out that there is a difference: the "in one set" definition is 50% faster and more memory-efficient, and deserializing the data from JSON is 3x faster than including a "native array":

  • "In One Set": 0.075 sec, 9.9MB
  • "As Separate": 0.150 sec, 15.8MB
  • "From JSON": 0.025 sec, 9.9MB
  • "From MySQL": 0.110 sec, 13.8MB*

So: if you define large arrays in native PHP format, define them in one go rather than bit by bit. If you load bulk array data from a file, loading JSON with json_decode(file_get_contents('data.json'), true) is significantly faster than include 'data.php'; with a native PHP array definition. Your mileage may vary with more complex data structures; however, I wouldn't expect the basic performance pattern to change. For reference: source data at BitBucket.

A curious observation: generating the data from scratch, in our loop above, was actually much faster than loading/parsing it from a file with a ready-made array!

MySQL: key-value pairs were fetched from a two-column table with PDO into an array matching the sample data, using fetchAll(PDO::FETCH_UNIQUE|PDO::FETCH_COLUMN).


Best practice: when defining data you need to work with, as opposed to "crude export/import" data that's never read or manually edited, construct your arrays in a manner that makes your code easy to maintain. I personally find it "cleaner" to keep simple arrays "contained":

$data = [
    'length' => 100,
    'width'  => 200,
    'foobar' => 'possibly'
];

Sometimes your array needs to "refer to itself" and the "bit-by-bit" format is necessary:

$data['length'] = 100;
$data['width'] = 200;
$data['square'] = $data['length'] * $data['width'];

If you build multidimensional arrays, I find it "cleaner" to separate each "root" dataset:

$data = [];
$data['shapes'] = ['square', 'triangle', 'octagon'];
$data['sizes'] = [100, 200, 300, 400];
$data['colors'] = ['red', 'green', 'blue'];

On a final note, by far the most limiting performance factor with PHP arrays is memory usage (see: array hashtable internals), which is unrelated to how you build your arrays. If you have massive datasets in arrays, make sure you don't keep unnecessary modified copies of them floating around beyond their scope of relevance; otherwise your memory usage will rocket.


Tested on Win10 / PHP 8.1.1 / MariaDB 10.3.11 @ Thinkpad L380.

Managing mega Arrays in PHP

Use SplFixedArray, because you really need it; see How big are PHP arrays (and values) really? (Hint: BIG!)

$t = 1e6;
$x = array();
for ($i = 0; $i < $t; $i++) {
    $x[$i] = 1 * rand(0, 10);
}

Output

Start memory usage: 256
End memory usage: 82688
Peak memory usage: 82688

and

$t = 1e6;
$x = new SplFixedArray($t);
for ($i = 0; $i < $t; $i++) {
    $x[$i] = 1 * rand(0, 10);
}

Output

Start memory usage: 256
End memory usage: 35584
Peak memory usage: 35584

But better still, I think you should consider a memory-based database like Redis.


