Php: What Is the Complexity [I.E O(1),O(N)] of the Function 'Count'

PHP: What is the complexity [i.e O(1),O(n)] of the function 'count'?

$ time php -r '$a=range(1,1000000); $b=0; for($i=0;$i<10;$i++) $b=count($a);'
real 0m0.458s
$ time php -r '$a=range(1,1000000); $a=array(1); $b=0; for($i=0;$i<10;$i++) $b=count($a);'
real 0m0.457s

Seems pretty O(1) to me.

Tested PHP version: PHP 5.3.3-1ubuntu9.1 with Suhosin-Patch (cli) (built: Oct 15 2010 14:00:18)

Is PHP's count() function O(1) or O(n) for arrays?

Well, we can look at the source:

/ext/standard/array.c

PHP_FUNCTION(count) calls php_count_recursive(), which in turn calls zend_hash_num_elements() for non-recursive array, which is implemented this way:

ZEND_API int zend_hash_num_elements(const HashTable *ht)
{
IS_CONSISTENT(ht);

return ht->nNumOfElements;
}

So you can see, it's O(1) for $mode = COUNT_NORMAL.

Return both query result and num rows

also you can use your function:

public function get_results()
{
$query = $this->db->get('data');
$result = $query->result();
return $result
}

then when call the function:

$result= $this->your_model->get_results();
$count = count($result);

Undefined offset with count()

Try this:

$arr = array();
for ($j=0;$j<=6;$j++) {
if (isset($MyArray[$i*7+$j])) $arr[] = count($MyArray[$i*7+$j]);
}
$result = max($arr);

I don't know exactly what $i refers to though...

List of Big-O for PHP functions

Since it doesn't seem like anyone has done this before I thought it'd be good idea to have it for reference somewhere. I've gone though and either via benchmark or code-skimming to characterize the array_* functions. I've tried to put the more interesting Big-O near the top. This list is not complete.

Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it's really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_exists at N=1 and N=1,000,000 is ~50% time increase.

Interesting Points:

  1. isset/array_key_exists is much faster than in_array and array_search
  2. +(union) is a bit faster than array_merge (and looks nicer). But it does work differently so keep that in mind.
  3. shuffle is on the same Big-O tier as array_rand
  4. array_pop/array_push is faster than array_shift/array_unshift due to re-index penalty

Lookups:

array_key_exists O(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.

isset( $array[$index] ) O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it's language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.

in_array O(n) - this is because it does a linear search though the array until it finds the value.

array_search O(n) - it uses the same core function as in_array but returns value.

Queue functions:

array_push O(∑ var_i, for all i)

array_pop O(1)

array_shift O(n) - it has to reindex all the keys

array_unshift O(n + ∑ var_i, for all i) - it has to reindex all the keys

Array Intersection, Union, Subtraction:

array_intersect_key if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

array_intersect if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)

array_intersect_assoc if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

array_diff O(π param_i_size, for all i) - That's product of all the param_sizes

array_diff_key O(∑ param_i_size, for i != 1) - this is because we don't need to iterate over the first array.

array_merge O( ∑ array_i, i != 1 ) - doesn't need to iterate over the first array

+ (union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn't have to renumber

array_replace O( ∑ array_i, for all i )

Random:

shuffle O(n)

array_rand O(n) - Requires a linear poll.

Obvious Big-O:

array_fill O(n)

array_fill_keys O(n)

range O(n)

array_splice O(offset + length)

array_slice O(offset + length) or O(n) if length = NULL

array_keys O(n)

array_values O(n)

array_reverse O(n)

array_pad O(pad_size)

array_flip O(n)

array_sum O(n)

array_product O(n)

array_reduce O(n)

array_filter O(n)

array_map O(n)

array_chunk O(n)

array_combine O(n)

I'd like to thank Eureqa for making it easy to find the Big-O of the functions. It's an amazing free program that can find the best fitting function for arbitrary data.

EDIT:

For those who doubt that PHP array lookups are O(N), I've written a benchmark to test that (they are still effectively O(1) for most realistic values).

php array lookup graph

$tests = 1000000;
$max = 5000001;

for( $i = 1; $i <= $max; $i += 10000 ) {
//create lookup array
$array = array_fill( 0, $i, NULL );

//build test indexes
$test_indexes = array();
for( $j = 0; $j < $tests; $j++ ) {
$test_indexes[] = rand( 0, $i-1 );
}

//benchmark array lookups
$start = microtime( TRUE );
foreach( $test_indexes as $test_index ) {
$value = $array[ $test_index ];
unset( $value );
}
$stop = microtime( TRUE );
unset( $array, $test_indexes, $test_index );

printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
unset( $stop, $start );
}

PHP Arrays - Remove duplicates ( Time complexity )

While I can't speak for the native array_unique function, I can tell you that your friends algorithm is faster because:

  1. He uses a single foreach loop as opposed to your double for() loop.
  2. Foreach loops tend to perform faster than for loops in PHP.
  3. He used a single if(! ) comparison while you used two if() structures
  4. The only additional function call your friend made was in_array whereas you called count() twice.
  5. You made three variable declarations that your friend didn't have to ($a, $current_is_unique, $current_index)

While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friends.

How to count the number of 1 bits set in 0, 1, 2, ..., n in O(n) time?

Here's a nice little observation that you can use to do this in time O(n). Imagine you want to know how many 1 bits are set in the number k, and that you already know how many 1 bits are set in the numbers 0, 1, 2, ..., k - 1. If you can find a way to clear any of the 1 bits that are set in the number k, you'd get some smaller number (let's call it m), and the number of bits set in k would then be equal to one plus the number of bits set in m. So provided that we can find a way to clear any 1 bit from the number k, we can use this pattern to solve the problem:

result[0] = 0  // No bits set in 0
for k = 1 to n:
let m = k, with some 1 bit cleared
result[k] = result[m] + 1

There's a famous bit twiddling trick where

 k & (k - 1)

yields the number formed by clearing the lowest 1 bit that's set in the number k, and does so in time O(1), assuming that the machine can do bitwise operations in constant time (which is usually a reasonable assumption). That means that this pseudocode should do it:

result[0] = 0
for k = 1 to n:
result[k] = result[k & (k - 1)] + 1

This does O(1) work per number O(n) total times, so the total work done is O(n).

Here's a different way to do this. Imagine, for example, that you know the counts of the bits in the numbers 0, 1, 2, and 3. You can use this to generate the counts of the bits of the numbers 4, 5, 6, and 7 by noticing that those numbers have bitwise representations which are formed by taking the bitwise representations of 0, 1, 2, and 3 and then prepending a 1. Similarly, if you then knew the bit counts of 0, 1, 2, 3, 4, 5, 6, and 7, you could generate the bit counts of 8, 9, 10, 11, 12, 13, 14, and 15 by noting that they too are formed by prepending a 1 bit to each of the lower numbers. That gives rise to this pseudocode, which for simplicity assumes that n is of the form 2k - 1 but could easily be adapted for a general n:

result[0] = 0;
for (int powerOfTwo = 1; powerOfTwo < n; powerOfTwo *= 2) {
for (int i = 0; i < powerOfTwo; i++) {
result[powerOfTwo + i] = result[i] + 1;
}
}

This also runs in time O(n). To see this, notice that across all iterations of all the loops here, each entry in the array is written to exactly once, with O(1) work done to determine which value is supposed to be put into the array at that slot.

Can hash tables really be O(1)?

You have two variables here, m and n, where m is the length of the input and n is the number of items in the hash.

The O(1) lookup performance claim makes at least two assumptions:

  • Your objects can be equality compared in O(1) time.
  • There will be few hash collisions.

If your objects are variable size and an equality check requires looking at all bits then performance will become O(m). The hash function however does not have to be O(m) - it can be O(1). Unlike a cryptographic hash, a hash function for use in a dictionary does not have to look at every bit in the input in order to calculate the hash. Implementations are free to look at only a fixed number of bits.

For sufficiently many items the number of items will become greater than the number of possible hashes and then you will get collisions causing the performance rise above O(1), for example O(n) for a simple linked list traversal (or O(n*m) if both assumptions are false).

In practice though the O(1) claim while technically false, is approximately true for many real world situations, and in particular those situations where the above assumptions hold.

Undefined offset with count()

Try this:

$arr = array();
for ($j=0;$j<=6;$j++) {
if (isset($MyArray[$i*7+$j])) $arr[] = count($MyArray[$i*7+$j]);
}
$result = max($arr);

I don't know exactly what $i refers to though...



Related Topics



Leave a reply



Submit