The mystery of upper-case and lower-case GUID values

GUIDs are familiar values to every Windows developer, but they are widely used in distributed systems, too.

COM classes, interfaces, and type libraries are all identified using GUIDs. And, of course, .NET had to inherit this, so every assembly, interface, class, enumeration, structure, and delegate gets its own GUID, either automatically or by means of the Guid attribute.

Then, when you think about creating an installer for your application, Microsoft Installer requires an abundance of GUID values to identify components, features and other objects it can track.

As with much else, everything you might want to know about GUIDs can be read on Wikipedia (GUID).

What I want to talk about is GUID abuse and it has to do with converting GUIDs to strings.

Don’t.

Really.

A GUID is a 128-bit number generated so that it is, for all practical purposes, globally unique. The time-based variants get there with a timestamp that ticks in 100-nanosecond steps (or so).

If you convert it to text it becomes something else. It may be formatted in a gazillion different ways: braces, parentheses, hyphens, spaces, lowercase or uppercase letters, GUID text encoding, UUID canonical form, all are possible. UTF-32, anyone? Due to this variability, comparing GUIDs as text will become your nightmare. Ordinal comparison can and will fail, and culture-aware case-insensitive string comparison is actually quite slow.
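
To make the comparison problem concrete, here is a small sketch in Python (my choice for illustration; the same idea applies to Guid.Parse in .NET). It writes one GUID in three textual forms and compares them first as strings, then as parsed values. It only uses the standard uuid module.

```python
import uuid

# The same 128-bit value written three different ways.
texts = [
    "{600E2D98-C0BF-4d6b-BE53-88DE15A01346}",
    "600e2d98-c0bf-4d6b-be53-88de15a01346",
    "600E2D98C0BF4D6BBE5388DE15A01346",
]

# Ordinal string comparison treats every spelling as a different value.
distinct_strings = len(set(texts))

# Parsing first compares the underlying 128-bit number instead.
parsed = [uuid.UUID(t) for t in texts]
distinct_values = len(set(parsed))

print(distinct_strings, distinct_values)
```

The string set keeps all three spellings apart, while the parsed set collapses to a single value.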

On Windows, if you use the Guidgen tool that ships with Microsoft Visual Studio, you will find a mysterious format that has the third group in lowercase and all the others in uppercase hexadecimal. This doesn’t help avoid confusion.

{600E2D98-C0BF-4d6b-BE53-88DE15A01346}

The reason for this strange behavior is the format string resources the tool uses to convert the bytes to a string.

I think the uuidgen tool used to do the same thing, which would mean it was a feature of the UuidToString function, but the version on my machine (Visual Studio 2005 SP1) returns all lowercase (or uppercase with the -c switch).

So the actual reason for the confusion may have been a Microsoft intern in the early 90’s putting the wrong string resources into the SDK sample (and the Visual Studio tool).

I know GUIDs are tempting to use as primary keys in databases. Almost everybody does it. I have received invoices with GUID invoice numbers, I’ve even done it myself, and I may do it again in the future.

They have the fantastic property of being unique no matter who creates them, so you can create the primary key on the client instead of going through the chore of reading back the new value from a sequence or automatic id. And GUIDs survive replication and backup/restore cycles.
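
A minimal sketch of the client-side key idea, using Python with an in-memory SQLite database. The table and column names (invoice, invoice_line) are made up for this example. The point is that the key is known before the first INSERT, so nothing has to be read back from the database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Hypothetical tables, only here to illustrate the idea.
conn.execute("CREATE TABLE invoice (id CHAR(36) PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE invoice_line (invoice_id CHAR(36), description TEXT)")

# The client creates the key itself; no sequence or identity round trip.
invoice_id = str(uuid.uuid4())

conn.execute("INSERT INTO invoice VALUES (?, ?)", (invoice_id, 100))
conn.execute("INSERT INTO invoice_line VALUES (?, ?)", (invoice_id, "consulting"))

rows = conn.execute(
    "SELECT description FROM invoice_line WHERE invoice_id = ?",
    (invoice_id,),
).fetchall()
```

Parent and child rows can be written in any order, or even on different machines, because both sides already agree on the key.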

Many database management systems support a native GUID type, or at least something like RAW(16). But then again, it’s a pain to type SQL queries using binary data types.
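
For a RAW(16)-style column, the 16 bytes have to come from somewhere, and the byte order matters. The sketch below uses Python's uuid module to show both layouts: the big-endian one that matches the text form, and the mixed-endian one that .NET's Guid.ToByteArray produces. Which layout a given database driver expects is something to verify for your own setup; I am not assuming any particular one here.

```python
import uuid

value = uuid.UUID("600e2d98-c0bf-4d6b-be53-88de15a01346")

# Big-endian layout: the bytes appear in the same order as the hex digits.
raw = value.bytes
hex_literal = raw.hex().upper()

# Mixed-endian layout: the first three groups are byte-swapped.
raw_le = value.bytes_le

# Either layout converts back to the same value when read with the matching keyword.
round_trip = uuid.UUID(bytes=raw)
round_trip_le = uuid.UUID(bytes_le=raw_le)
```

Mixing the two layouts between writer and reader silently yields a different GUID, which is one more reason to pick a representation once and stick with it.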

So what can we do?

Have a certain amount of discipline when using GUIDs as text. Use a small char datatype like CHAR(36) (the braced form needs CHAR(38)), set the column’s code page/collation to US-ASCII, case-sensitive, and choose and stick with one of the standard formats, e.g. 600e2d98-c0bf-4d6b-be53-88de15a01346 or {600E2D98-C0BF-4D6B-BE53-88DE15A01346}. Nowadays I’d prefer the first format: it is shorter and doesn’t need tweaking of the output of Guid.ToString.
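
One way to enforce that discipline is to push every incoming GUID string through a single normalizing function before it gets anywhere near the database. Here is a sketch in Python; canonical_guid is a name I made up for this example. It relies on the standard uuid module, which accepts braces, missing hyphens, and either letter case, and always prints the lowercase hyphenated form.

```python
import uuid


def canonical_guid(text: str) -> str:
    """Return the 36-character lowercase, hyphenated form of a GUID string.

    Raises ValueError if the text is not a valid GUID.
    """
    return str(uuid.UUID(text.strip()))


guidgen_style = "{600E2D98-C0BF-4d6b-BE53-88DE15A01346}"
normalized = canonical_guid(guidgen_style)
print(normalized)
```

After normalization, a plain case-sensitive comparison on a CHAR(36) column is safe again.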

This entry was posted in Coding Horror.

6 Responses to The mystery of upper-case and lower-case GUID values

  1. ZHOU says:

    Thanks! I met the problem as well.

  2. Fergal says:

    Thanks Henry, I came across this problem with nhibernate and sqlite.net; it expects lowercase guids only.

  3. Basil Bourque says:

    Actually, the specification for UUID (the international standard form of GUID) states clearly:


    6.5.4 Software generating the hexadecimal representation of a UUID shall not use upper case letters.
    NOTE – It is recommended that the hexadecimal representation used in all human-readable formats be restricted to lower-case letters. Software processing this representation is, however, required to accept both upper and lower case letters as specified in 6.5.2.

    http://www.itu.int/rec/T-REC-X.667/en

  4. dragon376 says:

    Interesting… why is Apple NSUUID returning all uppercase then….

  5. karlgjertsen says:

    I know that a Guid is using hex values, hence the 0-9 and a-f, but I recently came across a comparison that was comparing strings. The case sensitivity caused headaches, so please don’t compare strings!

    If you want a way to create a Guid, there is also http://www.createaguid.com
