Generating v5 UUID. What is name and namespace?
I've read the man
page, but I do not undestand what name
and namespace
are for.
For version 3 and version 5 UUIDs the additional command line arguments namespace and name have to be given. The namespace is either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500"). The name is a string of arbitrary length.
The namespace:
The namespace is either a UUID in string representation or an
Does it mean that I need to store it (UUID v4) somewhere in relation with the generated UUID v5? In either case, why is this not done automatically?
The name is a string of arbitrary length.
name
a completely random string? What is the purpose of it then? Can it be decoded from the UUID v5?
Name and namespace can be used to create a hierarchy of (very probably) unique UUIDs.
Roughly speaking, a type 3 or type 5 UUID is generated by hashing together a namespace identifier with a name. Type 3 UUIDs use MD5 and type 5 UUIDs use SHA1. Only 128-bits are available and 5 bits are used to specify the type, so all of the hash bits don't make it into the UUID. (Also MD5 is considered cryptographically broken, and SHA1 is on its last legs, so don't use this to verify data that needs to be "very secure"). That said, it gives you a way of creating a repeatable/verifiable "hash" function mapping a possibly hierarchical name onto a probabilistically unique 128-bit value, potentially acting like a hierarchical hash or MAC.
Suppose you have a (key,value) store, but it only supports one namespace. You can generate a large number of distinct logical namespaces using type 3 or type 5 UUIDs. First, create a root UUID for each namespace. This could be a type 1 (host+timestamp) or type 4 (random) UUID so long as you stash it somewhere. Alternatively you could create one random UUID for your root (or use the null
UUID: 00000000-0000-0000-0000-000000000000
as root) and then create a reproducible UUID for each namespace using " uuid -v5 $ROOTUUID $NAMESPACENAME
". Now you can create unique UUIDs for keys within a namespace using " uuid -v5 $NAMESPACEUUID $KEY
". These UUIDs can be thrown into a single key-value store with high probability of avoiding collision. This process can be repeated recursively so that if for instance the "value" associated with a UUID key in turn represents some sort of logical "namespace" like a bucket, container or directory, then its UUID can be used in turn to generate more hierarchical UUIDs.
The generated type 3 or type 5 UUID holds a (partial) hash of the namespace id and name-within-namespace (key). It no more holds the namespace UUID than does a message MAC hold the contents of the message it is encoded from. The name is an "arbitrary" (octet) string from the perspective of the uuid algorithm. Its meaning however depends on your application. It could be a filename within a logical directory, object-id within an object-store, etcetera.
While this works well for a moderately large number of namespaces and keys, it eventually runs out of steam if you are aiming for a very large numbers of keys that are unique with very high probability. The Wikipedia entry for the Birthday Problem (aka Birthday Paradox) includes a table that gives the probabilities of at least one collision for various numbers of keys and table sizes. For 128-bits, hashing 26 billion keys this way has a probability of collision of p=10^-18
(negligible), but 26 trillion keys, increases the probability of at least one collision to p=10^-12
(one in a trillion), and hashing 26*10^15
keys, increases the probability of at least one collision to p=10^-6
(one in a million). Adjusting for 5 bits that encode the UUID type, it will run out somewhat faster, so a trillion keys have roughly a 1-in-a-trillion chance of having a single collision.
See http://en.wikipedia.org/wiki/Birthday_problem#Probability_table for the probability table.
See http://www.ietf.org/rfc/rfc4122.txt for more details on UUID encodings.
Type 3 and Type 5 UUIDs are just a technique of stuffing a hash into a UUID.
An SHA1 hash outputs 160 bits (20 bytes). The result of the hash is converted into a UUID. From the 20-bytes from SHA1:
SHA1 Digest: 74738ff5 5367 e958 9aee 98fffdcd1876 94028007
UUID (v5): 74738ff5-5367-5958-9aee-98fffdcd1876
^_low nibble is set to 5 to indicate type 5
^_first two bits set to 1 and 0, respectively
(Note that the first two bits of '9' are already 1 and 0, respectively, so this has no effect).
What do I hash?
You're probably wondering what is it that I'm supposed to hash. Basically you hash the concatenation of:
sha1([NamespaceUUID]+[AnyString]);
You prefix your string with a so-called namespace to prevent name conflicts.
The UUID RFC pre-defines four namespaces for you:
NameSpace_DNS
: {6ba7b810-9dad-11d1-80b4-00c04fd430c8} NameSpace_URL
: {6ba7b811-9dad-11d1-80b4-00c04fd430c8} NameSpace_OID
: {6ba7b812-9dad-11d1-80b4-00c04fd430c8} NameSpace_X500
:{6ba7b814-9dad-11d1-80b4-00c04fd430c8} So, you could hash together:
StackOverflowDnsUUID = sha1(Namespace_DNS + "stackoverflow.com");
StackOverflowUrlUUID = sha1(Namespace_URL + "stackoverflow.com");
The RFC then defines how to:
The basic gist is to only take the first 128 bits, stuff a 5
in the type record, and then set the first two bits of the clock_seq_hi_and_reserved
section to 1 and 0, respectively.
More examples
Now that you have a function that generates a so-called Name , you can have the function (in pseudo-code):
UUID NameToUUID(UUID NamespaceUUID, String Name)
{
byte[] hash = sha1(NamespaceUUID.ToBytes() + Name.ToBytes());
UUID result;
Copy(hash, result, 16);
result[6] &= 0x0F;
result[6] |= 0x50;
result[8] &= 0x3F;
result[8] |= 0x80;
return result;
}
(Note that the endian-ness of your system can affect indices of the above bytes)
You can have calls:
uuid = NameToUUID(Namespace_DNS, 'www.stackoverflow.com');
uuid = NameToUUID(Namespace_DNS, 'www.google.com');
uuid = NameToUUID(Namespace_URL, 'http://www.stackoverflow.com');
uuid = NameToUUID(Namespace_URL, 'http://www.google.com/search&q=rfc+4112');
uuid = NameToUUID(Namespace_URL, 'http://stackoverflow.com/questions/5515880/test-vectors-for-uuid-version-5-converting-hash-into-guid-algorithm');
Now back to your question
For version 3 and version 5 UUIDs the additional command line arguments namespace and name have to be given. The namespace is either a UUID in string representation or an identifier for internally pre-defined namespace UUIDs (currently known are "ns:DNS", "ns:URL", "ns:OID", and "ns:X500"). The name is a string of arbitrary length.
The namespace is whatever UUID you like. It can be one of the pre-defined ones, or you can make up your own, eg:
UUID Namespace_RectalForeignExtractedObject = '4d79546f-6e67-7565-496e-486572417373'
The name is a string of arbitrary length.
The name is just the text you want to have appended to the namespace, then hashed, and stuffed into a UUID:
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'screwdriver');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'toothbrush');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'broomstick');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'orange');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'axe handle');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'impulse body spray');
uuid = NameToUUID('8e884ace-bee4-11e4-8dfc-aa07a5b093db', 'iPod Touch');
Note : Any code released into public domain. No attribution required.
链接地址: http://www.djcxy.com/p/70112.html