List of Big-O for PHP functions
Since it doesn't seem like anyone has done this before I thought it'd be good idea to have it for reference somewhere. I've gone though and either via benchmark or code-skimming to characterize the array_*
functions. I've tried to put the more interesting Big-O near the top. This list is not complete.
Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it's really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_exists
at N=1 and N=1,000,000 is ~50% time increase.
Interesting Points:
isset
/array_key_exists
is much faster thanin_array
andarray_search
+
(union) is a bit faster thanarray_merge
(and looks nicer). But it does work differently so keep that in mind.shuffle
is on the same Big-O tier asarray_rand
array_pop
/array_push
is faster thanarray_shift
/array_unshift
due to re-index penalty
Lookups:
array_key_exists
O(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.
isset( $array[$index] )
O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it's language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.
in_array
O(n) - this is because it does a linear search though the array until it finds the value.
array_search
O(n) - it uses the same core function as in_array but returns value.
Queue functions:
array_push
O(∑ var_i, for all i)
array_pop
O(1)
array_shift
O(n) - it has to reindex all the keys
array_unshift
O(n + ∑ var_i, for all i) - it has to reindex all the keys
Array Intersection, Union, Subtraction:
array_intersect_key
if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
array_intersect
if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)
array_intersect_assoc
if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
array_diff
O(π param_i_size, for all i) - That's product of all the param_sizes
array_diff_key
O(∑ param_i_size, for i != 1) - this is because we don't need to iterate over the first array.
array_merge
O( ∑ array_i, i != 1 ) - doesn't need to iterate over the first array
+
(union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn't have to renumber
array_replace
O( ∑ array_i, for all i )
Random:
shuffle
O(n)
array_rand
O(n) - Requires a linear poll.
Obvious Big-O:
array_fill
O(n)
array_fill_keys
O(n)
range
O(n)
array_splice
O(offset + length)
array_slice
O(offset + length) or O(n) if length = NULL
array_keys
O(n)
array_values
O(n)
array_reverse
O(n)
array_pad
O(pad_size)
array_flip
O(n)
array_sum
O(n)
array_product
O(n)
array_reduce
O(n)
array_filter
O(n)
array_map
O(n)
array_chunk
O(n)
array_combine
O(n)
I'd like to thank Eureqa for making it easy to find the Big-O of the functions. It's an amazing free program that can find the best fitting function for arbitrary data.
EDIT:
For those who doubt that PHP array lookups are O(N)
, I've written a benchmark to test that (they are still effectively O(1)
for most realistic values).
$tests = 1000000;
$max = 5000001;
for( $i = 1; $i <= $max; $i += 10000 ) {
//create lookup array
$array = array_fill( 0, $i, NULL );
//build test indexes
$test_indexes = array();
for( $j = 0; $j < $tests; $j++ ) {
$test_indexes[] = rand( 0, $i-1 );
}
//benchmark array lookups
$start = microtime( TRUE );
foreach( $test_indexes as $test_index ) {
$value = $array[ $test_index ];
unset( $value );
}
$stop = microtime( TRUE );
unset( $array, $test_indexes, $test_index );
printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
unset( $stop, $start );
}
Big O of two functions
Here is how I would write that function in a more straightforward approach: The inner for
loop is the perfect scenario for a do ... while
loop.
function foo( $j)
{
$ids = array();
for( $i = 0; $i < $j; $i++)
{
do
{
$id = gen_id();
} while( isset( $ids[$id]));
$ids[$id] = true;
}
return $ids;
}
Edit: Actually, this function is O($j)
/ O(n)
, since gen_id()
and isset()
are both constant O(1)
(isset()
is a hash table lookup). So, the inner efficiency of the for
loop is constant time O(1)
, making the whole function O(n)
.
Big-O of PHP arrays
You're correct in saying that Kendall Hopkins's statement is confusing / incorrect (though the rest of his post is a pretty awesome benchmarking reference).
Saying that an algorithm is O(1) for small inputs is a rather nonsensical statement. Saying an algorithm is O(1) means that the running time will always be less than a constant. Therefore, if we constrict our input space to a finite set (i.e. 'small' inputs) then any algorithm will be O(1), because we can just take the longest running time from that set as our constant.
What he means, is that, for an initial range of inputs, increasing the size of the array will make little difference to the running-time of the array.
If you want to learn more about big-O notation, then the [http://en.wikipedia.org/wiki/Analysis_of_algorithms wikipedia page] is a great jumping off point.
What's the Big O run time of the two following solutions
Question 1
In the solution two you have
Sort
Sort
Single loop
The complexity of the loop is O(n), so we have to look at the sort. According to PHP, it uses Quicksort, so the complexity is O(n log n)
Note: Like most PHP sorting functions, sort() uses an implementation of » Quicksort.
Therefore, we have O(n log n) + O(n log n) + O(n). We take the bigger which is O(n log n).
Question 2
In that case you could remove the sort
calls, and it would be O(n). If you keep them, it will continue being O(n log n).
Question 3
Yes, the complexity is O(n) + O(n), so the final complexity is O(n) as you said.
What would the Big O notation be for alphabetically sorting each element in a nested list comprehension
For the sake of analysis, let's replace the comprehensions with their equivalent loops as described in the docs.
Your first approach becomes
substr = []
for i in range(len(s)): # O(n)
for j in range(i + 1, len(s) + 1): # O(n)
substr.append("".join(sorted(s[i: j]))) # O(n*logn)
# Overall O(n^3 * logn)
Note that substr.append("".join(sorted(s[i: j])))
is not multiplicative, rather it is a sequential combination of the following operations (no actual assignments happen)
temp = s[i:j] # O(j-i), which worst case we can take O(n)
sorted_temp = sorted(temp) # O(n*logn)
joined_temp = "".join(sorted_temp) # O(n)
substr.append(joined_temp) # amortized O(1)
# Overall O(n*logn)
Now in the second approach the code becomes
substr = []
for i in range(len(s)): # O(n)
for j in range(i + 1, len(s) + 1): # O(n)
substr.append(s[i:j]) # O(n) for the slice
# first loop: O(n^3)
sortedSubstr = []
for s in substr: # O(n^2), we have appended n^2 times in the first loop
sortedSubstr.append("".join(sorted(s))) # O(n*logn)
# second loop: O(n^3 * logn)
# Overall O(n^3 * logn)
So as you can see, they should be the same time complexity. I almost missed substr
length being n^2
rather than n
, so that might be a pitfall.
References for time complexity of various operations:
- Time complexity of string slice
- Why is the time complexity of python's list.append() method O(1)?
Big-O notation name for O(n^c), where c >= 4
Generally, any n^c
for integer c
is called a polynomial time complexity. It matches with the polynomial class of problem as well and is denoted by P
. Here is the time complexity class.
calculate time complexity for this solution
Ok, now I see what you are doing! Yes, you are right, it's O(n²)
, because you're always looping through every element of data. Your data will decrease by one every loop, but because with O complexity we don't care about constants, we can say it's O(n) for each loop
. Multiplying (because one is inside the other) we have O(n²)
.
Related Topics
Performance of Foreach, Array_Map With Lambda and Array_Map With Static Function
How to Increase the Execution Timeout in PHP
Does PHP Allow Named Parameters So That Optional Arguments Can Be Omitted from Function Calls
What Is the Purpose of the Question Marks Before Type Declaration in PHP7 (String or Int)
How to Find the PHP.Ini File Used by the Command Line
Generating a Random Password in PHP
How to Get the File Extension in PHP
Woocommerce, Get Current Product Id
Get Url Query String Parameters
Show Values from a MySQL Database Table Inside a HTML Table on a Webpage
Login to Remote Site With PHP Curl
Html Element Array, Name="Something[]" or Name="Something"
Having Issue With Matching Rows in the Database Using Pdo
How to Create a Simple 'Hello World' Module in Magento