Likelihood of Collision Using Most Significant Bits of a Uuid in Java

Likelihood of collision using most significant bits of a UUID in Java

According to the documentation, the static method UUID.randomUUID() generates a type 4 UUID.

This means that six bits are used for some type information and the remaining 122 bits are assigned randomly.

The six non-random bits are distributed with four in the most significant half of the UUID and two in the least significant half. So the most significant half of your UUID contains 60 bits of randomness, which means you on average need to generate 2^30 UUIDs to get a collision (compared to 2^61 for the full UUID).

So I would say that you are rather safe. Note, however that this is absolutely not true for other types of UUIDs, as Carl Seleborg mentions.

Incidentally, you would be slightly better off by using the least significant half of the UUID (or just generating a random long using SecureRandom).

On Multiple JVM, Java's UUID.randomUUID, what is probability for collision?

  1. What is possibility of duplicate UUID across JVMs.

It is possible, but the probability is vanishingly small. The Wikipedia page on the Birthday Problem has a probability table that can be used to estimate the likelihood of a collision.

For example, with 128 bit random UUIDs (and a high quality random number generator) the table says that you would need to generate 2.6 x 1010 UUIDs for the probability of a collision to reach 1 in 1018.

Earlier in the article you will find the mathematics on calculating ... and estimating ... the probabilities.


  1. Should we add some logic in code to verify collision before saving it to DB?

It really depends on the number of UUIDs you are likely to generate and store, and on the probability of collision that you are willing to accept.

However, if you are concerned by the possibility of a collision, you could just make the UUID columns a unique keys in the relevant database tables. It is more likely that a transaction will fail due to a hardware error than you will get a collision leading to a uniqueness constraint failure!


Followup questions:

I am not sure if this probability is for one generator or multiple?

The number of generators is not relevant, provided that they are >independent< random number generators.

As we have seen collision few hundred times with 1 million txns.

The mathematics don't lie. If you have seen a collision a few hundred times with 1 million transactions then something else is wrong. The assumptions are incorrect.

For example:

  • Maybe you are using a weak PRNG.
  • Maybe you are using a fixed seed or using a poor source of entropy when seeding the PRNGs.
  • Maybe you are modifying (e.g. shortening) the UUIDs in a way that drastically reduces their effective bit count.
  • Maybe something in your UUID generation methodology is causing UUIDs to be issued twice in a row ... sometimes.
  • Maybe objects are being duplicated when they shouldn't be ... and you end up with two copies of an object with the same UUID.
  • Maybe someone / something is faking UUIDs.

There are a lot of things that you need to check before you start doubting the mathematics.

My doubt is all 4 services are using same algorithm the probability will increase.

As I said, the number of generators does not alter the mathematics.

What do you say of chopping type-4 UUID in this manner

I suggest that you use either Random or SecureRandom to generate random bits and turn those into a number. That should be more portable.

I don't understand your point about chopping digits. Assuming that you are generating a 17 (decimal) digit number from enough bits from a long-cycle PRNG you should have a chance of 1 in 10**17 of a collision for any given generated pair of numbers. If the source is good, and you use enough bits then it is immaterial that you are "chopping" ...

It is not clear to me that 1 in 10**17 is good enough. It depends on how many numbers are going to exist (in your persistent store) at any given time. For instance, if you have 44 million numbers extant, the chance of a collision between at least one pair will be about 1%.

Try plugging some numbers into a Birthday Paradox Calculator.

EDIT: I think what you need is a generator that gives you 64 bit pseudo random numbers with a long cycle length AND an absolute guarantee of no repetitions for more numbers than you could ever possibly generate. It must also be possible to persist the state of the generator and resume it. Then, to get a 17 decimal digit "random" number, get the next value from the generator and test if it is in the range 0 ... 10**17 - 1. If it is, use it, if not repeat.

If you manage the generator correctly, you never get a repetition for the lifetime of the system, and therefore there is zero risk of a collision. But it is important that you use a PRNG (not a true RNG) and that you pick a PRNG that has the right properties.

From what I can tell, the Random class offers a PRNG with a cycle length of 2**48; i.e. you should get 2**48 numbers (e.g. using the getLong() method) before the numbers start to repeat. OTOH, SecureRandom gives either truly random or pseudo-random with a very long cycle count ... but with a small but non-zero chance of repeating a number on each call.

How big is the chance to get a Java UUID.randomUUID collision?

According to wikipedia, regarding the probability of duplicates in random UUIDs:

Only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.

I guess the same reasoning applies to Java's implementation of UUID. So no, you should not worry about this.



Related Topics



Leave a reply



Submit