PHP Arrays – Remove duplicates ( Time complexity )


Okay this is not a question of "how to get all uniques" or "How to remove duplicates from my array in php". This is a question about the time complexity.

I figured that the array_unique is somewhat O(n^2 – n) and here's my implementation:

function array_unique2($array) 
    $to_return = array(); 
    $current_index = 0;

    for ( $i = 0 ; $i < count($array); $i++ ) 
        $current_is_unique = true; 

        for ( $a = $i+1; $a < count($array); $a++ ) 
            if ( $array[$i] == $array[$a] ) 
                $current_is_unique = false; 
        if ( $current_is_unique ) 
            $to_return[$current_index] = $array[$i];


    return $to_return; 

However when benchmarking this against the array_unique i got the following result:

Testing (array_unique2)… Operation took 0.52146291732788 s.

Testing (array_unique)… Operation took 0.28323101997375 s.

Which makes the array_unique twice as fast, my question is, why ( Both had the same random data ) ?

And a friend of mine wrote the following:

function array_unique2($a)
    $n = array();
    foreach ($a as $k=>$v)
        if (!in_array($v,$n))
    return $n;

which is twice as fast as the built in one in php.

I'd like to know, why?

What is the time-complexity of array_unique and in_array?

I removed the count($array) from both loops and just used a variable in the top of the function, that gained 2 seconds on 100 000 elements!

Best Solution

While I can't speak for the native array_unique function, I can tell you that your friends algorithm is faster because:

  1. He uses a single foreach loop as opposed to your double for() loop.
  2. Foreach loops tend to perform faster than for loops in PHP.
  3. He used a single if(! ) comparison while you used two if() structures
  4. The only additional function call your friend made was in_array whereas you called count() twice.
  5. You made three variable declarations that your friend didn't have to ($a, $current_is_unique, $current_index)

While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friends.