Using Python's UUID to generate unique IDs, should I still check for duplicates?

Question

I'm using Python's UUID function to create unique IDs for objects to be stored in a database:

>>> import uuid
>>> print(uuid.uuid4())
2eec67d5-450a-48d4-a92f-e387530b1b8b

Is it ok to assume that this is indeed a unique ID?

Or should I check my database to make sure this ID has not already been generated before accepting it as valid?

This question may be informative. In essence, it is possible for the ID to be non-unique, but the chance of that is unbelievably small.
– BrenBarn, Commented Jun 1, 2014 at 18:39
@roippi It's not that easy. Part of the UUID is a timestamp. So, if the UUIDs were generated at different times, the probability of a duplicate is 0. Otherwise, it depends on the RNG that's being used. In Python 3, the RNG of libc or libuuid is used, if available. Otherwise, it tries os.urandom and falls back to the random module. So, in conclusion, you should be safe. (Sorry for being overly specific, but I thought that might be interesting, for you or for the OP or someone else.)
– Carsten, Commented Jun 1, 2014 at 18:52

Community · Accepted Answer · 2017-05-23 11:48:18Z

15

I would use uuid1, which has essentially zero chance of collisions on a single machine because it takes the date and time into account when generating the UUID (unless you are generating a great number of UUIDs at the same time).

You can actually reverse the UUID1 value to retrieve the original epoch time that was used to generate it.
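As a minimal sketch of that reversal: a version 1 UUID exposes its timestamp through the `time` attribute, counted in 100-nanosecond intervals since 15 October 1582. The helper name `uuid1_to_datetime` is made up for this illustration and is not part of the `uuid` module.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Zero point of UUID version 1 timestamps (start of the Gregorian calendar).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(u):
    """Return the UTC creation time embedded in a version 1 UUID.

    Hypothetical helper written for this example.
    """
    if u.version != 1:
        raise ValueError("only version 1 UUIDs carry a timestamp")
    # u.time counts 100-nanosecond ticks; ten ticks make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


u = uuid.uuid1()
created = uuid1_to_datetime(u)
print(u, created.isoformat())
```

The same attribute is present on a uuid4 value, but there it is just random bits, which is why the helper rejects anything other than version 1.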

uuid4 generates a random ID. Since it doesn't use monotonically increasing epoch time as an input (or include it in the output UUID), a value that was previously generated has a (very) small chance of being generated again in the future.

edited May 23, 2017 at 11:48

answered Jun 1, 2014 at 18:38

Martin Konecny

2 Comments

Serge Ballesta Over a year ago

Version 1 UUIDs are guaranteed to be unique only if they are built from an actual MAC address. Be careful if you use virtual machines, which generally have fake network cards.

Nicholas Pipitone Over a year ago

Not quite accurate: uuid1 has a higher probability of collision. uuid.uuid4() is, for practical purposes, guaranteed never to collide, since 122 of its 128 bits are random. Colliding on such a value is about as hard as hacking a specific public Bitcoin address (impossible, and it would also net you billions of dollars if you could). A uuid1 collision is feasible if your clock is wrong and you get unlucky, if you get UUIDs from two machines at the same time, or if you simply sample too quickly. The epoch time also feeds into the RNG that seeds uuid4.

Serge Ballesta · Accepted Answer · 2014-06-01 20:42:49Z

As long as you create all UUIDs on the same system, and unless there is a very serious flaw in the Python implementation (which I really cannot imagine), RFC 4122 states that they will all be distinct (edit: if using version 1, 3 or 5).

The only problem that could arise with UUIDs is if two systems create a UUID at exactly the same moment and:

use the same MAC address on their network card (really uncommon) and you are using UUID version 1
or use the same name and you are using UUID version 3 or 5
or get the same random number and you are using UUID version 4 (*)

So if you have a real MAC address or use an official DNS name or a unique LDAP DN, you can take for true that the generated uuids will be globally unique.
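To illustrate the name-based case, here is a small sketch showing that version 5 UUIDs are a pure function of namespace and name, so two systems collide exactly when they hash the same name. The domain names are placeholders chosen for the example.

```python
import uuid

# Same namespace and same name always produce the same version 5 UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

# A different name in the same namespace produces a different UUID.
c = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")

print(a == b)
print(a == c)
```

This is why a name that is already globally unique, such as an official DNS name, carries its uniqueness over to the UUID.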

So IMHO, you only have to check uniqueness if you want to protect your application against a malicious attack that deliberately tries to reuse an existing UUID.

EDIT: As stated by Martin Konecny, in uuid4 the timestamp part is random too and not monotonic. So the possibility of a collision is very limited but not 0.

Wolph · Accepted Answer · 2014-06-01 18:37:44Z

6

You should always have a duplicate check. Even though the odds are pretty good, a duplicate is always possible.

I would recommend just adding a unique key constraint in your database and retrying if an insert fails.
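A rough sketch of that approach, assuming SQLite through the standard `sqlite3` module. The table name `objects`, the function `insert_with_unique_id`, and its `id_factory` and `max_attempts` parameters are all invented for this example; other databases raise their own integrity error types.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY constraint is what rejects a duplicate ID.
conn.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, payload TEXT)")


def insert_with_unique_id(conn, payload, id_factory=None, max_attempts=3):
    """Insert a row under a fresh UUID, retrying if the ID already exists.

    Hypothetical helper; id_factory exists only so the retry path can be
    exercised with predictable values.
    """
    if id_factory is None:
        id_factory = lambda: str(uuid.uuid4())
    for _ in range(max_attempts):
        new_id = id_factory()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO objects (id, payload) VALUES (?, ?)",
                    (new_id, payload),
                )
            return new_id
        except sqlite3.IntegrityError:
            continue  # duplicate key: generate another ID and try again
    raise RuntimeError("could not generate a unique ID")


first_id = insert_with_unique_id(conn, "first object")
print(first_id)
```

Letting the database enforce uniqueness avoids a separate lookup before every insert, and the retry branch should almost never run with uuid4.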

answered Jun 1, 2014 at 18:37

Wolph

Nicholas Pipitone · Accepted Answer · 2024-02-24 12:17:46Z

The universally correct way to generate a UUID on a personal computer is uuid4; you should never use any other way unless you specifically want one of its properties (the specific desires are listed below).

uuid4 is also guaranteed, for all practical purposes, to be unique from UUIDs generated on other computers.

In addition, uuid4 is currently cryptographically secure, meaning that even if you expose your UUIDs over the internet, the PRNG and entropy are strong enough to prevent others from guessing future UUIDs you might generate. However, this is not guaranteed, so if cryptographic security matters for your application and you're handling sensitive user information, use the secrets module instead.
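For the security-sensitive case, a short sketch of what using `secrets` could look like. The token sizes here are arbitrary choices for illustration, not a recommendation from the answer.

```python
import secrets

# 32 random bytes, encoded as URL-safe text (suitable for links or API keys).
url_token = secrets.token_urlsafe(32)

# 16 random bytes as 32 hex characters, the same width as a UUID's hex form.
hex_token = secrets.token_hex(16)

print(url_token)
print(hex_token)
```

Unlike a UUID, these tokens have no version or variant bits, so every bit is random.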

Note that with V1's timestamp and MAC address, if your UUIDs are exposed over the internet, people can easily guess future timestamps and already know your MAC address, which lets them create collisions.

The uniqueness guarantee for uuid4 is based on the 122 bits of randomness in a version 4 UUID, which makes a collision about as difficult as hacking a Bitcoin wallet.
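To put a number on those 122 bits, here is a sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2 · 2^122)). The function name `collision_probability` is invented for this example, and the formula assumes the random bits are uniformly distributed.

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID


def collision_probability(n, bits=RANDOM_BITS):
    """Approximate chance of at least one collision among n random IDs."""
    pairs = n * (n - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-pairs / 2**bits)


# One billion UUIDs in a single table.
print(collision_probability(10**9))

# Roughly the count at which a collision becomes a coin flip.
print(collision_probability(2_710_000_000_000_000_000))
```

Under these assumptions it takes on the order of 2.7 × 10^18 random UUIDs before a collision is as likely as not.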

Version 1: UUIDs using timestamp and MAC address.
- If you generate two at the same timestamp, the generator adds 1 so the values still differ.
- Specific Desire: When looking through business logs, it can tell you when it was generated and what machine generated it.
Version 2: Not used.
Version 3: UUIDs based on the MD5 hash of given data (same idea as Version 5, but with MD5 instead of SHA-1).
Version 4: UUIDs with random data.
- Specific Desire: You want the strongest guarantee of uniqueness and security.
Version 5: UUIDs based on the SHA-1 hash of given data.
- Specific Desire: When you want to be able to associate UUIDs with the "given data".
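The versions listed above can be told apart at runtime, since every `uuid.UUID` object reports its version. A quick sketch, with a placeholder URL standing in for the "given data":

```python
import uuid

name = "https://example.com/item/1"  # placeholder data for versions 3 and 5

samples = {
    "uuid1": uuid.uuid1(),                     # timestamp + node (MAC)
    "uuid3": uuid.uuid3(uuid.NAMESPACE_URL, name),  # MD5 of namespace + name
    "uuid4": uuid.uuid4(),                     # random
    "uuid5": uuid.uuid5(uuid.NAMESPACE_URL, name),  # SHA-1 of namespace + name
}

for label, value in samples.items():
    # The version is also visible as the first digit of the third group.
    print(label, value, "version", value.version)
```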

When you use V4, entropy is used to seed the PRNG. This includes the timestamp, the MAC address, and anything else the system can collect (the CPU's internal RNG, startup time, temperature readings, disk usage).
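As a sketch of how that entropy ends up in a UUID: the operating system's pool is exposed through `os.urandom`, and 16 bytes from it can be turned into a version 4 UUID by hand. That `uuid.uuid4()` does something equivalent internally is an assumption about CPython's implementation, which may differ between releases.

```python
import os
import uuid

# 16 bytes drawn from the operating system's entropy pool.
random_bytes = os.urandom(16)

# version=4 overwrites the 6 version/variant bits, leaving 122 random bits.
manual_v4 = uuid.UUID(bytes=random_bytes, version=4)

print(manual_v4, manual_v4.version)
```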

Loosely speaking, V1 is a kind of V4 where the only entropy is the timestamp and the MAC address, and the PRNG is f(x) = x.

Nuance into the discussion of urandom's security is here
