How Unique Is Uniqid

How unique is uniqid?

Update, March 2014:

First, it is important to note that uniqid is a bit of a misnomer, as it doesn't guarantee a unique ID.

Per the PHP documentation:

WARNING!

This function does not create random nor unpredictable string. This
function must not be used for security purposes. Use cryptographically
secure random function/generator and cryptographically secure hash
functions to create unpredictable secure ID.

And

This function does not generate cryptographically secure tokens, in
fact without being passed any additional parameters the return value
is little different from microtime(). If you need to generate
cryptographically secure tokens use openssl_random_pseudo_bytes().
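
As a minimal sketch of what the docs are steering you toward: on PHP 7.0 and later, random_bytes() draws from the operating system's CSPRNG and can be used in place of openssl_random_pseudo_bytes(). The helper name make_token() and the 16-byte default are my own assumptions for illustration, not something taken from the manual.

```php
<?php
// make_token() is a hypothetical helper name, not part of PHP.
// It returns an unpredictable hex token of 2 * $bytes characters.
function make_token(int $bytes = 16): string
{
    // random_bytes() throws an exception if no secure source is available.
    return bin2hex(random_bytes($bytes));
}

$token = make_token();   // 32 hex characters with the default of 16 bytes
echo $token, PHP_EOL;
```

Unlike uniqid(), nothing in this value is derived from the clock, so it cannot be guessed from the time of the request.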


Setting more_entropy to true generates a value that is more likely to be unique, at the cost of a slightly longer execution time. According to the docs:

If set to TRUE, uniqid() will add additional entropy (using the
combined linear congruential generator) at the end of the return
value, which increases the likelihood that the result will be unique.

Note that the docs say it "increases the likelihood that the result will be unique", not that it guarantees uniqueness.

You can strive 'endlessly' for uniqueness, up to a point, by layering on encryption routines, salts and the like; how far to go depends on the purpose.

I'd recommend looking at the comments on the main PHP topic, notably:

http://www.php.net/manual/en/function.uniqid.php#96898

http://www.php.net/manual/en/function.uniqid.php#96549

http://www.php.net/manual/en/function.uniqid.php#95001

I'd recommend working out why you need uniqueness. Is it for security (i.e. to add to an encryption/scrambling routine)? How unique does it need to be? Finally, consider speed. Which approach is suitable depends on the answers.

Multiple uniqid() calls not being unique

The result of uniqid() is not guaranteed to be unique, and your investigation with microtime() is indeed a clue as to why.

According to the manual page for uniqid(), it:

Gets a prefixed unique identifier based on the current time in microseconds.

So the main input is indeed the current "microtime". However, it also takes an extra parameter:

more_entropy
If set to TRUE, uniqid() will add additional entropy (using the combined linear congruential generator) at the end of the return value, which increases the likelihood that the result will be unique.

Note that even with this argument, the manual is careful not to guarantee uniqueness, but as with your manual use of rand(), it is adding an extra source of randomness which makes collisions vastly more unlikely.
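
To see what the extra parameter does to the return value, here is a quick sketch. The variable names are mine, and the digits differ on every run, so no sample output is shown.

```php
<?php
$plain = uniqid();          // timestamp-based hex string only
$more  = uniqid('', true);  // same, plus an appended entropy suffix

echo $plain, PHP_EOL;
echo $more, PHP_EOL;
echo strlen($plain), ' vs ', strlen($more), ' characters', PHP_EOL;
```

The manual describes the two results as 13 and 23 characters long respectively; the extra characters are the entropy suffix, which contains a dot.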

To confirm, we can look at the source code for the function, where we can see that the output without more_entropy set is indeed just a hex representation of the current microsecond timestamp. An interesting piece to notice is this:

#if HAVE_USLEEP && !defined(PHP_WIN32)
    if (!more_entropy) {
#if defined(__CYGWIN__)
        php_error_docref(NULL, E_WARNING, "You must use 'more entropy' under CYGWIN");
        RETURN_FALSE;
#else
        usleep(1);
#endif
    }
#endif

So, if you're not under Windows, the function will actually try to sleep for a microsecond in order to force subsequent values to be different.

This makes it a bad idea to call uniqid() many times in succession: even when it does produce distinct values, it does so slowly, since each call needs either a microsecond of sleep or a call to the random-number generator.

A better idea is to use it once to generate an arbitrary prefix, and then simply increment a counter for each item, which could look something like this:

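// Generate one prefix up front; $regex and $out come from the surrounding code.
$id_prefix = uniqid();
$id_suffix = 0;
$out = preg_replace_callback(
    $regex,
    // $id_suffix is captured by reference so the counter persists across matches.
    function ($matches) use ($id_prefix, &$id_suffix) {
        $id = $id_prefix . $id_suffix;
        $id_suffix++;
        return $matches[1] . '... some html ...' . $id . ' ... ';
    },
    $out
);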

How easily will uniqid() with more entropy create a duplicate?

From the source code, more_entropy adds nine random decimal digits, so you can expect a collision after 37,000 or so calls. (For how a billion turned into 37,000, see the birthday attack.) That of course ignores the fact that these digits are not actually random but generated by an LCG, and the same LCG is probably used in other places in the code, so the actual chance of collision is probably higher (by how much exactly, I have no idea).
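
For anyone wondering where the 37,000 comes from, here is a small sketch of the birthday approximation. It assumes, as the estimate above does, that the nine digits are uniformly random and are the only thing telling two IDs apart (i.e. the calls land on the same timestamp). The function name draws_for_collision() is made up for this example.

```php
<?php
// Number of draws from $space equally likely values at which the chance
// of at least one collision reaches $p (birthday approximation):
//   n ≈ sqrt(2 * space * ln(1 / (1 - p)))
function draws_for_collision(float $space, float $p = 0.5): float
{
    return sqrt(2.0 * $space * log(1.0 / (1.0 - $p)));
}

// Nine random decimal digits give 10^9 possible suffixes.
echo round(draws_for_collision(1e9)), PHP_EOL;   // prints 37233
```

So a billion possible suffixes only buys you a 50% chance of staying collision-free for roughly 37,000 draws, not anything close to a billion.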

It is also worth noting that uniqid does not actually guarantee microsecond resolution, as some PHP implementations (Windows, specifically) don't have access to a microsecond-precision clock.

In short, if you need a unique ID for anything security-sensitive, or collisions are costly, avoid uniqid. Otherwise, using it with more_entropy is probably fine (although the common pattern is to use uniqid(mt_rand(), true) to add even more extra entropy).

PHP - Is uniqid() a good practical solution to generate a unique and sequential key server side?

uniqid is nothing more than an interface to microtime (which is why it generates sequential IDs), and as such, it could be predictable and could create a duplicate.

This id should be sequential to allow for clustered indexing. It also needs to be unique, obviously. [...] Am I overthinking this?

Most databases, including MySQL, include transaction-safe sequence generators. MySQL's implementation, AUTO_INCREMENT, is pretty darn primitive, but also effective. An auto-inc primary key would ensure uniqueness and is, more importantly, not weird.

That said, just ensuring that the id column in the table is a primary key is a complete defense against duplicate IDs.

PHP uniqid() no longer unique after updating PHP to 5.3.26

Okay, I was able to fix my problem by doing the following.

Change:

 $id = uniqid('button');

To

 $id = str_replace('.','-',uniqid('button',true));

Passing true appends an extra number after a dot, which creates invalid DOM IDs (in my case), so I just replace the dot with a dash.

Why does PHP's uniqid function return only 13 digits and not 14?

Found this on http://www.php.net/manual/en/function.uniqid.php#95001

Makes sense to me. Comment if you need an explanation.

For the record, the underlying
function to uniqid() appears to be
roughly as follows:

$m=microtime(true);
sprintf("%8x%05x\n",floor($m),($m-floor($m))*1000000);

In other words, first 8 hex chars =
Unixtime, last 5 hex chars =
microseconds. This is why it has
microsecond precision. Also, it
provides a means by which to
reverse-engineer the time when a
uniqid was generated:

date("r",hexdec(substr(uniqid(),0,8)));

Increasingly as you go further down
the string, the number becomes "more
unique" over time, with the exception
of digit 9, where numeral prevalence
is 0..3>4>5..f, because of the
difference between 10^6 and 16^5 (this
is presumably true for the remaining
digits as well but much less
noticeable).
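
Building on the comment's one-liner, here is a hedged sketch that pulls out both parts of the ID. decode_uniqid() is a hypothetical helper of mine, and it assumes an ID produced by uniqid() with no prefix, laid out as the comment describes (8 hex chars of seconds, then 5 hex chars of microseconds).

```php
<?php
// decode_uniqid() is a hypothetical helper, not part of PHP.
// It assumes $id was generated by uniqid() with an empty prefix; any
// more_entropy suffix after the first 13 characters is ignored.
function decode_uniqid(string $id): array
{
    return [
        'seconds'      => (int) hexdec(substr($id, 0, 8)),  // Unix timestamp
        'microseconds' => (int) hexdec(substr($id, 8, 5)),  // 0..999999
    ];
}

$parts = decode_uniqid(uniqid());
echo date('r', $parts['seconds']), ' +', $parts['microseconds'], ' us', PHP_EOL;
```

This is also a good illustration of why the value is predictable: anyone holding the ID can read off when it was generated.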

uniqid() not generating unique values within a Foreach loop - PHP

My first guess is that the problem is uniqid. It generates its value from the current time in microseconds, so two calls can return the same value. As the documentation mentions, this function does not guarantee the uniqueness of the return value.

Since most systems adjust the system clock by NTP or like, system time is changed constantly. Therefore, it is possible that this function does not return a unique ID for the process/thread. Use more_entropy to increase the likelihood of uniqueness.

A [not so reliable] solution would be setting the more_entropy to true:

uniqid('', true)

but again, remember this approach is not reliable.

So use another method for generating your unique IDs (a UUID, for example, as explained in the comments here).
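
As a rough sketch of the UUID route, here is one way to build a random (version 4) UUID with nothing but random_bytes() (PHP 7.0+). The function name uuid_v4() is my own; in a real project you may prefer an established UUID library.

```php
<?php
// uuid_v4() is a hypothetical helper name, not part of PHP.
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    // Set the 4 version bits (0100) in byte 6.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the 2 variant bits (10) in byte 8.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // Format the 32 hex digits as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuid_v4(), PHP_EOL;
```

The remaining 122 bits come straight from the CSPRNG, so the result does not depend on the clock at all.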

Does randomUUID give a unique id?

If you get a UUID collision, go play the lottery next.

From Wikipedia:

Randomly generated UUIDs have 122 random bits. Out of a total of 128
bits, four bits are used for the version ('Randomly generated UUID'),
and two bits for the variant ('Leach-Salz').

With random UUIDs, the
chance of two having the same value can be calculated using
probability theory (Birthday paradox). Using the approximation

p(n) \approx 1 - e^{-\tfrac{n^2}{2x}}

these are the probabilities of an
accidental clash after calculating n UUIDs, with x = 2^122:

n                              probability
68,719,476,736 = 2^36          0.0000000000000004 (4 × 10^-16)
2,199,023,255,552 = 2^41       0.0000000000004 (4 × 10^-13)
70,368,744,177,664 = 2^46      0.0000000004 (4 × 10^-10)

To put these numbers into perspective,
the annual risk of someone being hit by a meteorite is estimated to be
one chance in 17 billion, which means the probability is about
0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In
other words, only after generating 1 billion UUIDs every second for
the next 100 years, the probability of creating just one duplicate
would be about 50%. The probability of one duplicate would be about
50% if every person on earth owns 600 million UUIDs.
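
If you want to check the table rows in the quote yourself, the approximation is easy to evaluate. This is only a sketch: the function name is mine, and it uses expm1() because 1 - exp(-t) loses all precision in floating point when t is this small.

```php
<?php
// Birthday approximation p(n) ≈ 1 - e^(-n^2 / (2x)) for n random UUIDs
// drawn from x = 2^122 possible values.
function uuid_collision_probability(float $n): float
{
    $x = 2.0 ** 122;
    // -expm1(-t) equals 1 - e^(-t) but stays accurate when t is tiny.
    return -expm1(-($n * $n) / (2.0 * $x));
}

foreach ([36, 41, 46] as $exp) {
    printf("n = 2^%d: p ~ %.1e\n", $exp, uuid_collision_probability(2.0 ** $exp));
}
```

The three results come out at roughly 4.4 × 10^-16, 4.5 × 10^-13 and 4.7 × 10^-10, in line with the rounded figures in the table.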


