Sequential Guid Generator

Sequential Guid Generator

this person came up with something to make sequential guids, here's a link

http://developmenttips.blogspot.com/2008/03/generate-sequential-guids-for-sql.html

relevant code:

public class SequentialGuid {
Guid _CurrentGuid;
public Guid CurrentGuid {
get {
return _CurrentGuid;
}
}

public SequentialGuid() {
_CurrentGuid = Guid.NewGuid();
}

public SequentialGuid(Guid previousGuid) {
_CurrentGuid = previousGuid;
}

public static SequentialGuid operator++(SequentialGuid sequentialGuid) {
byte[] bytes = sequentialGuid._CurrentGuid.ToByteArray();
for (int mapIndex = 0; mapIndex < 16; mapIndex++) {
int bytesIndex = SqlOrderMap[mapIndex];
bytes[bytesIndex]++;
if (bytes[bytesIndex] != 0) {
break; // No need to increment more significant bytes
}
}
sequentialGuid._CurrentGuid = new Guid(bytes);
return sequentialGuid;
}

private static int[] _SqlOrderMap = null;
private static int[] SqlOrderMap {
get {
if (_SqlOrderMap == null) {
_SqlOrderMap = new int[16] {
3, 2, 1, 0, 5, 4, 7, 6, 9, 8, 15, 14, 13, 12, 11, 10
};
// 3 - the least significant byte in Guid ByteArray [for SQL Server ORDER BY clause]
// 10 - the most significant byte in Guid ByteArray [for SQL Server ORDERY BY clause]
}
return _SqlOrderMap;
}
}
}

Generated Sequential GUIDs are sometimes not sequential

I think the problem is that you assume that "sequential" means "increment by one". Although your example shows that this happens sometimes there's no actual guarantee in the documentation that it will. We could argue the definition of "sequence" or that maybe this function is named poorly or even wrong but at the end of the day, according to the documentation it appears to be working 100% exactly as defined:

Creates a GUID that is greater than any GUID previously generated by this function

For people that have a problem with the general concept of a "sequential GUID" use the definition of "sequence" that's about "following a logical order" and also try and remember the Melissa virus. This was mildly document by this line:

For security reasons, UuidCreate was modified so that it no longer uses a machine's MAC address to generate UUIDs. UuidCreateSequential was introduced to allow creation of UUIDs using the MAC address of a machine's Ethernet card.

So the "sequence" is basing the GUID off of the MAC address.

Back to the OP's question, my best guess for why you have GUIDs that don't increment by one each time is the simple fact that you are probably not the only one calling this function. This is a core Windows function and there's bound to be other components, programs, services, etc. that use it.

Sequential GUID generation in Java with SQL Server uniqueidentifier

The default option of newsequentialid() of course did not work, because hibernate does not use the default, it always sets a value issued by its generator.

By taking a quick look at the JUG library, it appears that it does not offer any means of generating GUIDs sequentially. I do not know why you thought that the generate() method of the generator obtained via Generators.timeBasedGenerator() would give you sequential GUIDs. A time based generator is simply a generator which takes the current time into account when generating GUIDs, but it is free to mangle the current time coordinate in any way it sees fit when embedding it into the GUID, so it does not guarantee that there will be anything sequential about the resulting GUIDs.

Generally, the terms "GUID" and "sequential" are incompatible with each other. You can either have keys that are GUIDs, or keys that are sequential, but under normal circumstances, you cannot have both.

So, are you sure that the keys must be GUIDs? Personally, I find GUIDs very hard to work with.

But if you must do any hacks necessary as to have sequential GUIDs, then my recommendation would be to write your own function which generates 36-character strings that look like GUIDs, but are sequential.

The sequential part should come from a SEQUENCE, which simply issues sequential integers. (I believe MS-SQL-Server supports them.)

You can read the IETF's UUID specification on how to construct GUIDs properly, but you do not have to follow it to the letter. For the most part, if it simply looks like a GUID, it is good enough.

If you can have a single global sequence for this, that's good. If you cannot have a single global sequence, then you need to somehow identify your sequences, and then take the identifier of each sequence into account when generating your GUIDs. (That would be the "node id" mentioned in the IETF documentation.)

I once had the unreasonable requirement that the rows that I was to transmit to a certain web service had to be identified by GUIDs, and there was too much red tape that was preventing me from contacting them to ask them "are you friggin' serious?" so I just transmitted GUIDs like the following:

|--- random part -----| |-- key ---|
314a9a1b-6295-11e5-8d2c-000000000001
314a9a1b-6295-11e5-8d2c-000000000002
314a9a1b-6295-11e5-8d2c-000000000003
314a9a1b-6295-11e5-8d2c-000000000004
314a9a1b-6295-11e5-8d2c-000000000005
...

They did not say a word.

What are the performance improvement of Sequential Guid over standard Guid?

GUID vs.Sequential GUID




A typical pattern it's to use Guid as PK for tables, but, as referred in other discussions (see Advantages and disadvantages of GUID / UUID database keys)
there are some performance issues.



This is a typical Guid sequence


f3818d69-2552-40b7-a403-01a6db4552f7

7ce31615-fafb-42c4-b317-40d21a6a3c60

94732fc7-768e-4cf2-9107-f0953f6795a5




Problems of this kind of data are:<

-

  • Wide distributions of values
  • Almost randomically ones
  • Index usage is very, very, very bad
  • A lot of leaf moving
  • Almost every PK need to be at least
    on a non clustered index
  • Problem happens both on Oracle and
    SQL Server




A possible solution is using Sequential Guid, that are generated as follows:



cc6466f7-1066-11dd-acb6-005056c00008

cc6466f8-1066-11dd-acb6-005056c00008

cc6466f9-1066-11dd-acb6-005056c00008



How to generate them From C# code:

[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

public static Guid SequentialGuid()
{
const int RPC_S_OK = 0;
Guid g;
if (UuidCreateSequential(out g) != RPC_S_OK)
return Guid.NewGuid();
else
return g;
}



Benefits

  • Better usage of index
  • Allow usage of clustered keys (to be
    verified in NLB scenarios)
  • Less disk usage
  • 20-25% of performance increase at a
    minimum cost




Real life measurement:
Scenario:

  • Guid stored as UniqueIdentifier
    types on SQL Server
  • Guid stored as CHAR(36) on Oracle
  • Lot of insert operations, batched
    together in a single transaction
  • From 1 to 100s of inserts depending
    on table
  • Some tables > 10 millions rows




Laboratory Test – SQL Server


VS2008 test, 10 concurrent users, no think time, benchmark process with 600 inserts in batch for leaf table

Standard Guid

Avg. Process duration: 10.5 sec

Avg. Request for second: 54.6

Avg. Resp. Time: 0.26


Sequential Guid

Avg. Process duration: 4.6 sec

Avg. Request for second: 87.1

Avg. Resp. Time: 0.12



Results on Oracle (sorry, different tool used for test) 1.327.613 insert on a table with a Guid PK


Standard Guid, 0.02 sec. elapsed time for each insert, 2.861 sec. of CPU time, total of 31.049 sec. elapsed


Sequential Guid, 0.00 sec. elapsed time for each insert, 1.142 sec. of CPU time, total of 3.667 sec. elapsed


The DB file sequential read wait time passed from 6.4 millions wait events for 62.415 seconds to 1.2 million wait events for 11.063 seconds.



It's important to see that all the sequential guid can be guessed, so it's not a good idea to use them if security is a concern, still using standard guid.

To make it short... if you use Guid as PK use sequential guid every time they are not passed back and forward from a UI, they will speed up operation and do not cost anything to implement.

Generate a sequential guid by myself

The code below shows how RavenDB achieves this on the server for eTags, you can see it in context here. The other piece of the code is here

public Guid CreateSequentialUuid()
{
var ticksAsBytes = BitConverter.GetBytes(currentEtagBase);
Array.Reverse(ticksAsBytes);
var increment = Interlocked.Increment(ref sequentialUuidCounter);
var currentAsBytes = BitConverter.GetBytes(increment);
Array.Reverse(currentAsBytes);
var bytes = new byte[16];
Array.Copy(ticksAsBytes, 0, bytes, 0, ticksAsBytes.Length);
Array.Copy(currentAsBytes, 0, bytes, 8, currentAsBytes.Length);
return bytes.TransfromToGuidWithProperSorting();
}

I don't think this is accessible via the API as it's an internal detail, but you could implement something similar yourself. Basically it relies on having a global counter (currentEtagBase and sequentialUuidCounter).

Sequential GUIDs

The Win32 UuidCreateSequential creates a Version 1 uuid.

Here's some sample version 1 uuid's created on my computer using UuidCreateSequential:

GuidToString                            Raw bytes
====================================== =================================================
{1BE8D85D-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 5D 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D85E-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 5E 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D85F-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 5F 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D860-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 60 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D861-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 61 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D862-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 62 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D863-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 63 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D864-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 64 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{1BE8D865-63D1-11E1-80DB-B8AC6FBE26E1} 1B E8 D8 65 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1
{220FB46C-63D1-11E1-80DB-B8AC6FBE26E1} 22 0F B4 6C 63 D1 11 E1 80 DB B8 AC 6F BE 26 E1

The first thing that's important to note that these uuid contain my machine's MAC address (B8AC6FBE26E1):

Sample Image

                        Node
======================= ============
1BE8D85D-63D1-11E1-80DB B8AC6FBE26E1
1BE8D85E-63D1-11E1-80DB B8AC6FBE26E1
1BE8D85F-63D1-11E1-80DB B8AC6FBE26E1
1BE8D860-63D1-11E1-80DB B8AC6FBE26E1
1BE8D861-63D1-11E1-80DB B8AC6FBE26E1
1BE8D862-63D1-11E1-80DB B8AC6FBE26E1
1BE8D863-63D1-11E1-80DB B8AC6FBE26E1
1BE8D864-63D1-11E1-80DB B8AC6FBE26E1
1BE8D865-63D1-11E1-80DB B8AC6FBE26E1
220FB46C-63D1-11E1-80DB B8AC6FBE26E1

So if you're hoping for different computers to generate guid's that are "close" to each other, you're going to be disappointed.

Let's look at the rest of the values.

Seven and a half bytes of the remaining 10 bytes are a timestamp; the number of 100ns intervals since 00:00:00 15 October 1582. Rearranging those timestamp bytes together:

Timestamp              Node
=============== ====== ============
1E163D11BE8D85D 1-80DB B8AC6FBE26E1
1E163D11BE8D85E 1-80DB B8AC6FBE26E1
1E163D11BE8D85F 1-80DB B8AC6FBE26E1
1E163D11BE8D860 1-80DB B8AC6FBE26E1
1E163D11BE8D861 1-80DB B8AC6FBE26E1
1E163D11BE8D862 1-80DB B8AC6FBE26E1
1E163D11BE8D863 1-80DB B8AC6FBE26E1
1E163D11BE8D864 1-80DB B8AC6FBE26E1
1E163D11BE8D865 1-80DB B8AC6FBE26E1
1E163D1220FB46C 1-80DB B8AC6FBE26E1

You can see that guid's created on the same machine by UuidCreateSequential will be together, as they are chronological.


The 1 you see is the version number, in this case meaning a time based uuid. There are 5 defined versions:

  • 1: time based version (UuidCreateSequential)
  • 2: DCE Security version, with embedded POSIX UIDs
  • 3: Name-based version that uses MD5 hashing
  • 4: Randomly or pseudo-randomly generated version (UuidCreate)
  • 5: Name-based version that uses SHA-1 hashing
  • 6: The version RFC 4122 forgot (unofficial)

Giving:

Timestamp       Version      Node
=============== ======= ==== ============
1E163D11BE8D85D 1 80DB B8AC6FBE26E1
1E163D11BE8D85E 1 80DB B8AC6FBE26E1
1E163D11BE8D85F 1 80DB B8AC6FBE26E1
1E163D11BE8D860 1 80DB B8AC6FBE26E1
1E163D11BE8D861 1 80DB B8AC6FBE26E1
1E163D11BE8D862 1 80DB B8AC6FBE26E1
1E163D11BE8D863 1 80DB B8AC6FBE26E1
1E163D11BE8D864 1 80DB B8AC6FBE26E1
1E163D11BE8D865 1 80DB B8AC6FBE26E1
1E163D1220FB46C 1 80DB B8AC6FBE26E1

The last word contains two things.

The lower 12 bits is the machine-specifc Clock Sequence number:

Timestamp       Version   Clock Sequence   Node
=============== ======= = ================ ============
1E163D11BE8D85D 1 8 0DB B8AC6FBE26E1
1E163D11BE8D85E 1 8 0DB B8AC6FBE26E1
1E163D11BE8D85F 1 8 0DB B8AC6FBE26E1
1E163D11BE8D860 1 8 0DB B8AC6FBE26E1
1E163D11BE8D861 1 8 0DB B8AC6FBE26E1
1E163D11BE8D862 1 8 0DB B8AC6FBE26E1
1E163D11BE8D863 1 8 0DB B8AC6FBE26E1
1E163D11BE8D864 1 8 0DB B8AC6FBE26E1
1E163D11BE8D865 1 8 0DB B8AC6FBE26E1
1E163D1220FB46C 1 8 0DB B8AC6FBE26E1

This machine-wide persistent value is incremented if:

  • you switched network cards
  • you generated a UUID less than 100 ns from the last one (and the timestamp would collide)

So, again, any guid's created by UuidCreateSequential will (ideally) have the same Clock Sequence number, making them "near" to each other.

The final 2 bits, is called a Variant, and is always set to binary 10:

Timestamp       Version Variant Clock Sequence   Node
=============== ======= ======= ================ ============
1E163D11BE8D85D 1 8 0DB B8AC6FBE26E1
1E163D11BE8D85E 1 8 0DB B8AC6FBE26E1
1E163D11BE8D85F 1 8 0DB B8AC6FBE26E1
1E163D11BE8D860 1 8 0DB B8AC6FBE26E1
1E163D11BE8D861 1 8 0DB B8AC6FBE26E1
1E163D11BE8D862 1 8 0DB B8AC6FBE26E1
1E163D11BE8D863 1 8 0DB B8AC6FBE26E1
1E163D11BE8D864 1 8 0DB B8AC6FBE26E1
1E163D11BE8D865 1 8 0DB B8AC6FBE26E1
1E163D1220FB46C 1 8 0DB B8AC6FBE26E1

So there you have it. Sequential guid's are sequential; and if you create them on the same machine they will be "near" to each other in a database.


But you want to know what actually happens with two sequential UUID's created on different computers.

Using our newfound knowledge of Version 1 guids, let's construct two guid's for the same timestamp from different machines, e.g.:

{1BE8D85D-63D1-11E1-80DB-B8AC6FBE26E1}
{1BE8D85D-63D1-11E1-80DB-123456789ABC}

First let's insert a bunch of guid's with sequential timestamps. First create a temporary table to store our guid's in, and cluster by the guid:

--DROP table #uuidOrderingTest
CREATE TABLE #uuidOrderingTest
(
uuid uniqueidentifier not null
)

CREATE clustered index IX_uuidorderingTest_uuid ON #uuidOrderingTest
(
uuid
)

Now insert the data:

INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D866-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D862-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D861-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85E-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D864-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D863-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85F-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85D-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D865-63D1-11E1-80DB-B8AC6FBE26E1}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D860-63D1-11E1-80DB-B8AC6FBE26E1}')

Note: i insert them in random timestamp order, to illustrate that SQL Server will cluster them.

Get the rows back and see what order they're in sequential (timestamp) order:

SELECT * FROM #uuidOrderingTest

uuid
------------------------------------
1BE8D85D-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D85E-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D85F-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D860-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D861-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D862-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D863-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D864-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D865-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D866-63D1-11E1-80DB-B8AC6FBE26E1

Now lets insert guid's with:

  • the same timestamps
  • but different node (i.e. MAC address):

Insert the new guids from a "different" computer:

INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D866-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D862-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D861-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85E-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D864-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D863-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85F-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D85D-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D865-63D1-11E1-80DB-123456789ABC}')
INSERT INTO #uuidOrderingTest (uuid) VALUES ('{1BE8D860-63D1-11E1-80DB-123456789ABC}')

And get the results:

uuid
------------------------------------
1BE8D85D-63D1-11E1-80DB-123456789ABC
1BE8D85E-63D1-11E1-80DB-123456789ABC
1BE8D85F-63D1-11E1-80DB-123456789ABC
1BE8D860-63D1-11E1-80DB-123456789ABC
1BE8D861-63D1-11E1-80DB-123456789ABC
1BE8D862-63D1-11E1-80DB-123456789ABC
1BE8D863-63D1-11E1-80DB-123456789ABC
1BE8D864-63D1-11E1-80DB-123456789ABC
1BE8D865-63D1-11E1-80DB-123456789ABC
1BE8D866-63D1-11E1-80DB-123456789ABC
1BE8D85D-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D85E-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D85F-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D860-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D861-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D862-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D863-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D864-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D865-63D1-11E1-80DB-B8AC6FBE26E1
1BE8D866-63D1-11E1-80DB-B8AC6FBE26E1

So there you have it. SQL Server order's Node before Timestamp. Uuid created from different machines will not be clustered together. Would have been better if it hadn't done so, but whatcha gonna do.

How to generate sequential GUIDs?

If you mean uuids based on a common prefix, such as MAC and time, have a look at uniqid. Here you can also find some code for RFC 4211 compliant UUIDS. If you want to be on the safe side, use this wrapper to libuuid: pecl uuid. Haven't tested it though, YMMV.

Sort Sequential Guids Generated by UuidCreateSequential

After researching, I can't sort the guid using the default sort or even using the default string representation from guid.ToString as the byte order is different.

to sort the guids generated by UuidCreateSequential I need to convert to either BigInteger or form my own string representation (i.e. hex string 32 characters) by putting bytes in most signification to least significant order as follows:

static void TestSortedSequentialGuid(int length)
{
Guid []guids = new Guid[length];
int[] ids = new int[length];

for (int i = 0; i < length; i++)
{
guids[i] = CreateSequentialGuid();
ids[i] = i;

// this simulates the delay between guids creation
// yes the guids will not be sequential as it interrupts generator
// (as it used the time internally)
// but still the guids should be in increasing order and hence they are
// sortable and that was the goal of the question
Thread.Sleep(60000);
}

var sortedGuidStrings = guids.Select(x =>
{
var bytes = x.ToByteArray();

//reverse high bytes that represents the sequential part (time)
string high = BitConverter.ToString(bytes.Take(10).Reverse().ToArray());

//set last 6 bytes are just the node (MAC address) take it as it is.
return high + BitConverter.ToString(bytes.Skip(10).ToArray());
}).ToArray();

// sort ids using the generated sortedGuidStrings
Array.Sort(sortedGuidStrings, ids);

for (int i = 0; i < length - 1; i++)
{
if (ids[i] > ids[i + 1])
{
Console.WriteLine("sorting using sortedGuidStrings failed!");
return;
}
}

Console.WriteLine("sorting using sortedGuidStrings succeeded!");
}

Sequential Guid in Java

See this article: http://www.informit.com/articles/article.aspx?p=25862&seqNum=7 (linked to Page 7).

It contains an algorithm for what the author refers to as "COMB" Guids; I reproduce his code (SQL) below:

SET @aGuid = CAST(CAST(NEWID() AS BINARY(10)) 
+ CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Trivial to convert this to Java, or your desired language. The obvious underlying principle is to make the date a component of the Guid. The entire article is a good read, as he does a nice analysis of the performance of the various approaches.



Related Topics



Leave a reply



Submit