Why Is Calling a Function (Such as Strlen, Count etc) on a Referenced Value So Slow

Why is calling a function (such as strlen, count etc) on a referenced value so slow?

I found a bug report from 2005 that describes exactly this issue: http://bugs.php.net/bug.php?id=34540

So the problem seems to be that when passing a referenced value to a function that doesn't accept a reference, PHP needs to copy it.

This can be demonstrated with this test code:

<?php
function CalledFunc(&$aData)
{
// Do nothing
}

function TestFunc(&$aArray)
{
$aArray = range(0, 100000);
$fStartTime = microtime(true);

for ($iIter = 0; $iIter < 1000; $iIter++)
{
CalledFunc($aArray);
}

$fTaken = microtime(true) - $fStartTime;

print "took $fTaken seconds\n";
}

$aArray = array();
TestFunc($sData);
?>

This runs quickly, but if you change function CalledFunc(&$aData) to function CalledFunc($aData) you'll see a similar slow-down to the count example.

This is rather worrying, since I've been coding PHP for quite a while and I had no idea about this issue.

Fortunately there's a simple workaround that is applicable in many cases - use a temporary local variable inside the loop, and copy to the reference variable at the end.

How to get the length of a (by reference) array

I'm still not sure I understand why this is necessary. However, you've indexed the array as a numerical sequential array, so you can do this "trick" to get the size:

// Make sure the array pointer is at the end of the array
end($result); // Not sure if this is necessary based on your description
$size = key($result) + 1; // Get the numeric key of the last array element

Note that both end() and key() take their parameters by reference. However, I wouldn't be surprised if they're now operating on references of references, but this is something you can investigate.

Why is calling a function (such as strlen, count etc) on a referenced value so slow?

I found a bug report from 2005 that describes exactly this issue: http://bugs.php.net/bug.php?id=34540

So the problem seems to be that when passing a referenced value to a function that doesn't accept a reference, PHP needs to copy it.

This can be demonstrated with this test code:

<?php
function CalledFunc(&$aData)
{
// Do nothing
}

function TestFunc(&$aArray)
{
$aArray = range(0, 100000);
$fStartTime = microtime(true);

for ($iIter = 0; $iIter < 1000; $iIter++)
{
CalledFunc($aArray);
}

$fTaken = microtime(true) - $fStartTime;

print "took $fTaken seconds\n";
}

$aArray = array();
TestFunc($sData);
?>

This runs quickly, but if you change function CalledFunc(&$aData) to function CalledFunc($aData) you'll see a similar slow-down to the count example.

This is rather worrying, since I've been coding PHP for quite a while and I had no idea about this issue.

Fortunately there's a simple workaround that is applicable in many cases - use a temporary local variable inside the loop, and copy to the reference variable at the end.

In PHP (= 5.0), is passing by reference faster?

The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. Can only find this mention at time of writing though, and comments in the manual contain other links.

(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.

Why does glibc's strlen need to be so complicated to run quickly?

You don't need and you should never write code like that - especially if you're not a C compiler / standard library vendor. It is code used to implement strlen with some very questionable speed hacks and assumptions (that are not tested with assertions or mentioned in the comments):

  • unsigned long is either 4 or 8 bytes
  • bytes are 8 bits
  • a pointer can be cast to unsigned long long and not uintptr_t
  • one can align the pointer simply by checking that the 2 or 3 lowest order bits are zero
  • one can access a string as unsigned longs
  • one can read past the end of array without any ill effects.

What is more, a good compiler could even replace code written as

size_t stupid_strlen(const char s[]) {
size_t i;
for (i=0; s[i] != '\0'; i++)
;
return i;
}

(notice that it has to be a type compatible with size_t) with an inlined version of the compiler builtin strlen, or vectorize the code; but a compiler would be unlikely to be able to optimize the complex version.


The strlen function is described by C11 7.24.6.3 as:

Description


  1. The strlen function computes the length of the string pointed to by s.

Returns


  1. The strlen function returns the number of characters that precede the terminating null character.

Now, if the string pointed to by s was in an array of characters just long enough to contain the string and the terminating NUL, the behaviour will be undefined if we access the string past the null terminator, for example in

char *str = "hello world";  // or
char array[] = "hello world";

So really the only way in fully portable / standards compliant C to implement this correctly is the way it is written in your question, except for trivial transformations - you can pretend to be faster by unrolling the loop etc, but it still needs to be done one byte at a time.

(As commenters have pointed out, when strict portability is too much of a burden, taking advantage of reasonable or known-safe assumptions is not always a bad thing. Especially in code that's part of one specific C implementation. But you have to understand the rules before knowing how/when you can bend them.)


The linked strlen implementation first checks the bytes individually until the pointer is pointing to the natural 4 or 8 byte alignment boundary of the unsigned long. The C standard says that accessing a pointer that is not properly aligned has undefined behaviour, so this absolutely has to be done for the next dirty trick to be even dirtier. (In practice on some CPU architecture other than x86, a misaligned word or doubleword load will fault. C is not a portable assembly language, but this code is using it that way). It's also what makes it possible to read past the end of an object without risk of faulting on implementations where memory protection works in aligned blocks (e.g. 4kiB virtual memory pages).

Now comes the dirty part: the code breaks the promise and reads 4 or 8 8-bit bytes at a time (a long int), and uses a bit trick with unsigned addition to quickly figure out if there were any zero bytes within those 4 or 8 bytes - it uses a specially crafted number to that would cause the carry bit to change bits that are caught by a bit mask. In essence this would then figure out if any of the 4 or 8 bytes in the mask are zeroes supposedly faster than looping through each of these bytes would. Finally there is a loop at the end to figure out which byte was the first zero, if any, and to return the result.

The biggest problem is that in sizeof (unsigned long) - 1 times out of sizeof (unsigned long) cases it will read past the end of the string - only if the null byte is in the last accessed byte (i.e. in little-endian the most significant, and in big-endian the least significant), does it not access the array out of bounds!


The code, even though used to implement strlen in a C standard library is bad code. It has several implementation-defined and undefined aspects in it and it should not be used anywhere instead of the system-provided strlen - I renamed the function to the_strlen here and added the following main:

int main(void) {
char buf[12];
printf("%zu\n", the_strlen(fgets(buf, 12, stdin)));
}

The buffer is carefully sized so that it can hold exactly the hello world string and the terminator. However on my 64-bit processor the unsigned long is 8 bytes, so the access to the latter part would exceed this buffer.

If I now compile with -fsanitize=undefined and -fsanitize=address and run the resulting program, I get:

% ./a.out
hello world
=================================================================
==8355==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffffe63a3f8 at pc 0x55fbec46ab6c bp 0x7ffffe63a350 sp 0x7ffffe63a340
READ of size 8 at 0x7ffffe63a3f8 thread T0
#0 0x55fbec46ab6b in the_strlen (.../a.out+0x1b6b)
#1 0x55fbec46b139 in main (.../a.out+0x2139)
#2 0x7f4f0848fb96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
#3 0x55fbec46a949 in _start (.../a.out+0x1949)

Address 0x7ffffe63a3f8 is located in stack of thread T0 at offset 40 in frame
#0 0x55fbec46b07c in main (.../a.out+0x207c)

This frame has 1 object(s):
[32, 44) 'buf' <== Memory access at offset 40 partially overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (.../a.out+0x1b6b) in the_strlen
Shadow bytes around the buggy address:
0x10007fcbf420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf440: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10007fcbf470: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00[04]
0x10007fcbf480: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf4a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf4b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007fcbf4c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==8355==ABORTING

i.e. bad things happened.

Why are PHP function calls *so* expensive?

Function calls are expensive in PHP because there's lot of stuff being done.

Note that isset is not a function (it has a special opcode for it), so it's faster.

For a simple program like this:

<?php
func("arg1", "arg2");

There are six (four + one for each argument) opcodes:


1 INIT_FCALL_BY_NAME 'func', 'func'
2 EXT_FCALL_BEGIN
3 SEND_VAL 'arg1'
4 SEND_VAL 'arg2'
5 DO_FCALL_BY_NAME 2
6 EXT_FCALL_END

You can check the implementations of the opcodes in zend_vm_def.h. Prepend ZEND_ to the names, e.g. for ZEND_INIT_FCALL_BY_NAME and search.

ZEND_DO_FCALL_BY_NAME is particularly complicated. Then there's the the implementation of the function itself, which must unwind the stack, check the types, convert the zvals and possibly separate them and to the actual work...



Related Topics



Leave a reply



Submit