How unique is UUID?

How safe is it to use a UUID to uniquely identify something (I'm using it for files uploaded to the server)? As I understand it, it is based on random numbers. However, it seems to me that given enough time, it would eventually repeat itself, just by pure chance. Is there a better system or a pattern of some type to alleviate this issue?


Very safe:

the annual risk of a given person being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

Caveat:

However, these probabilities only hold when the UUIDs are generated using sufficient entropy. Otherwise, the probability of duplicates could be significantly higher, since the statistical dispersion might be lower. Where unique identifiers are required for distributed applications, so that UUIDs do not clash even when data from many devices is merged, the randomness of the seeds and generators used on every device must be reliable for the life of the application. Where this is not feasible, RFC4122 recommends using a namespace variant instead.

Source: http://en.wikipedia.org/wiki/UUID#Random_UUID_probability_of_duplicates

The Wikipedia article has since had this section removed, but a useful reference on the same topic is available elsewhere: http://www.h2database.com/html/advanced.html#uuid
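The caveat about entropy can be made concrete with a small Python sketch. It is an illustration only: `weak_uuid4`, `device_a` and `device_b` are hypothetical names, and the fixed seed stands in for any device whose generator starts from a predictable state.

```python
import random
import uuid


def weak_uuid4(rng: random.Random) -> uuid.UUID:
    """Build a version-4-shaped UUID from a non-cryptographic PRNG.

    Shown only to illustrate the failure mode; not something to use in practice.
    """
    return uuid.UUID(int=rng.getrandbits(128), version=4)


# Two hypothetical devices whose generators start from the same seed,
# for example a clock that reads the same value on every boot.
device_a = random.Random(0)
device_b = random.Random(0)

ids_a = [weak_uuid4(device_a) for _ in range(3)]
ids_b = [weak_uuid4(device_b) for _ in range(3)]

# Identical seeds give identical "random" UUIDs on both devices.
print(ids_a == ids_b)

# uuid.uuid4() in CPython draws from the operating system's random source,
# so independent calls are not tied to a shared, predictable seed.
strong = {uuid.uuid4() for _ in range(1000)}
print(len(strong))
```

The point is that the tiny duplicate probability applies to the random source, not to the UUID format: a version 4 UUID is only as unique as the bits fed into it.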


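If "given enough time" means 100 years, and you are creating them at a rate of one billion per second, then yes, after 100 years you have roughly a 50% chance of a collision.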
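That figure can be checked with the usual birthday-problem approximation, p ≈ 1 − exp(−n(n−1) / (2N)), where N = 2^122 is the number of possible version 4 UUIDs. The sketch below assumes perfectly random bits; the function names are made up for this example.

```python
import math

RANDOM_BITS = 122  # random bits in a version 4 UUID (128 minus 6 fixed bits)


def collision_probability(n: float, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def count_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of values needed to reach collision probability p."""
    return math.sqrt(2.0 * (2.0 ** bits) * math.log(1.0 / (1.0 - p)))


SECONDS_PER_YEAR = 365.25 * 24 * 3600
rate = 1e9  # UUIDs generated per second

n_100_years = rate * 100 * SECONDS_PER_YEAR
p_100_years = collision_probability(n_100_years)
years_to_half = count_for_probability(0.5) / rate / SECONDS_PER_YEAR

print(f"P(duplicate) after 100 years at 1e9/s: {p_100_years:.2f}")
print(f"Years at 1e9/s to reach 50%: {years_to_half:.0f}")
print(f"P(duplicate) among 1e12 UUIDs: {collision_probability(1e12):.1e}")
```

Under this approximation the 100-year probability comes out somewhat above one half, and the 50% point is reached after roughly 86 years, so "about 50% after 100 years" is the right order of magnitude.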


There is more than one type of UUID, so "how safe" depends on which type (which the UUID specifications call "version") you are using.

  • Version 1 is the time-based plus MAC address UUID. The 128 bits contain 48 bits for the network card's MAC address (which is uniquely assigned by the manufacturer) and a 60-bit clock with a resolution of 100 nanoseconds. That clock wraps around 5236 AD (2^60 ticks of 100 ns is about 3,653 years, counted from 1582; the often-quoted figure of 3603 AD is too early), so these UUIDs are safe at least until then (unless you need more than 10 million new UUIDs per second or someone clones your network card). I say "at least" because the clock starts at 15 October 1582, so you have about 400 years after the clock wraps before there is even a small possibility of duplications.

  • Version 4 is the random number UUID. There are six fixed bits and the rest of the UUID is 122 bits of randomness. See Wikipedia or other analyses that describe how very unlikely a duplicate is.

  • Version 3 uses MD5 and Version 5 uses SHA-1 to create those 122 bits, instead of a random or pseudo-random number generator. So in terms of safety it is like Version 4: a statistical issue (as long as you make sure that the input the digest algorithm processes is always unique).

  • Version 2 is similar to Version 1, but with a smaller clock, so it is going to wrap around much sooner. But since Version 2 UUIDs are for DCE, you shouldn't be using these.

So for all practical purposes they are safe. If you are uncomfortable with leaving it up to probabilities (e.g. you are the type of person who worries about the earth getting destroyed by a large asteroid in your lifetime), just make sure you use a Version 1 UUID and it is guaranteed to be unique (in your lifetime, unless you plan to live past 5236 AD).

So why doesn't everyone simply use Version 1 UUIDs? That is because Version 1 UUIDs reveal the MAC address of the machine they were generated on and they can be predictable, two things which might have security implications for the application using those UUIDs.
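As a rough illustration of the versions described above, here is a minimal sketch using Python's standard `uuid` module. The URL used as the name is a made-up example, and other languages expose similar functions under different names.

```python
import uuid

# Hypothetical name for an uploaded file; any string that is unique per
# item would do for the name-based versions.
name = "https://example.com/uploads/report.pdf"

u1 = uuid.uuid1()                          # time + node (usually the MAC address)
u3 = uuid.uuid3(uuid.NAMESPACE_URL, name)  # MD5 of namespace + name
u4 = uuid.uuid4()                          # random
u5 = uuid.uuid5(uuid.NAMESPACE_URL, name)  # SHA-1 of namespace + name

for u in (u1, u3, u4, u5):
    print(u.version, u)

# A version 1 UUID exposes where and when it was made. Note that Python
# may fall back to a random 48-bit node if it cannot find a MAC address.
print(f"node: {u1.node:012x}")
print(f"timestamp (100 ns ticks since 1582-10-15): {u1.time}")

# Name-based UUIDs are deterministic: the same namespace and name always
# give the same UUID, so uniqueness depends on the names being unique.
print(uuid.uuid5(uuid.NAMESPACE_URL, name) == u5)
```

For naming uploaded files, version 4 is the usual choice when the identifier should reveal nothing about the server; version 1 trades that privacy for the timestamp-plus-node guarantee.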
