Performance Value of Comb Guids

Performance value of COMB guids

I second that you'll see differences only when you have indexes (PK, FK or other kind of indexes, clustered or not clustered) on the Guid colume, because cost of standard guid versus newguid or comb guid is due to the high cost of re-ordering the index data every time an insert is performed.

See my question in which I corroborate this with some real life data from both SQL Server and Oracle.

What are the performance improvement of Sequential Guid over standard Guid?

GUID vs.Sequential GUID

A typical pattern it's to use Guid as PK for tables, but, as referred in other discussions (see Advantages and disadvantages of GUID / UUID database keys)
there are some performance issues.

This is a typical Guid sequence

f3818d69-2552-40b7-a403-01a6db4552f7

7ce31615-fafb-42c4-b317-40d21a6a3c60

94732fc7-768e-4cf2-9107-f0953f6795a5

Problems of this kind of data are:<

-

Wide distributions of values
Almost randomically ones
Index usage is very, very, very bad
A lot of leaf moving
Almost every PK need to be at least
on a non clustered index
Problem happens both on Oracle and
SQL Server

A possible solution is using Sequential Guid, that are generated as follows:

cc6466f7-1066-11dd-acb6-005056c00008

cc6466f8-1066-11dd-acb6-005056c00008

cc6466f9-1066-11dd-acb6-005056c00008

How to generate them From C# code:

[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

public static Guid SequentialGuid()
{
    const int RPC_S_OK = 0;
    Guid g;
    if (UuidCreateSequential(out g) != RPC_S_OK)
        return Guid.NewGuid();
    else
        return g;
}

Benefits

Better usage of index
Allow usage of clustered keys (to be
verified in NLB scenarios)
Less disk usage
20-25% of performance increase at a
minimum cost

Real life measurement:
Scenario:

Guid stored as UniqueIdentifier
types on SQL Server
Guid stored as CHAR(36) on Oracle
Lot of insert operations, batched
together in a single transaction
From 1 to 100s of inserts depending
on table
Some tables > 10 millions rows

Laboratory Test – SQL Server

VS2008 test, 10 concurrent users, no think time, benchmark process with 600 inserts in batch for leaf table

Standard Guid

Avg. Process duration: 10.5 sec

Avg. Request for second: 54.6

Avg. Resp. Time: 0.26

Sequential Guid

Avg. Process duration: 4.6 sec

Avg. Request for second: 87.1

Avg. Resp. Time: 0.12

Results on Oracle (sorry, different tool used for test) 1.327.613 insert on a table with a Guid PK

Standard Guid, 0.02 sec. elapsed time for each insert, 2.861 sec. of CPU time, total of 31.049 sec. elapsed

Sequential Guid, 0.00 sec. elapsed time for each insert, 1.142 sec. of CPU time, total of 3.667 sec. elapsed

The DB file sequential read wait time passed from 6.4 millions wait events for 62.415 seconds to 1.2 million wait events for 11.063 seconds.

It's important to see that all the sequential guid can be guessed, so it's not a good idea to use them if security is a concern, still using standard guid.

To make it short... if you use Guid as PK use sequential guid every time they are not passed back and forward from a UI, they will speed up operation and do not cost anything to implement.

A couple of questions about NHibernate's GuidCombGenerator

Re 1: there is no relevance to the actual day value, the two bytes used from the value simply roll over 65536 days after 1/1/1900. The only thing that matters is that the values are roughly sequential. The dbase is going to be a bit inefficient in the summer of 2079, nobody will notice.

Re 2: yes, makes no sense. But same story, the actual value doesn't matter.

The algorithm is questionable, messing with the guaranteed uniqueness of Guids is a tricky proposition. You'll have to rely on somebody in the nHibernate team having insider knowledge that this works without problems. If you change it, you're liable to break it.

Can't explain performance of Nhibernate using Guid vs. Native Keys

The answer is #2, you've written an experiement which doesnt correctly reflect the conditions in 'the accepted wisdom'. Three problems:

First you're inserting "one row per Session". This is not the case that 'the accepted wisdom' is concerned with. Take the documentation example of Cats, kittens and Mate where a Cat has one mate and many kittens. Newing up and inserting whole families of cats at a time is the case thats discussed, not one row at a time. The overhead of a transaction for each row is going to introduce a lot of noise into your test.

Second, "Assuming that there should be a linear relationship between the size of the table and the time it takes to insert a row" is also false. The BTree structure used to hold tables has a general-case insert time of about O(log n).

Third, the reason you're seeing 'slower' inserts on guids vs identities has to do with the details of BTree. Inserts into the middle of a tree are slower because much more data is potentially moved around and page splits are much less likely. The guid-comb algorithm and generator was created to mitigate this problem. Guid-comb ensures that guids created in the future will always be greater than guids created in the past thus forcing inserts to happen at the end of the table which is much more efficent. A similar strategy is used by the newsequentialid function in sql server.

To see the predicted performance insert several hundred rows per transaction in a parent-child relationship using identity, guid and guid-comb.

Sequential GUID in Linq-to-Sql?

COMBs are generated the following way:

DECLARE @aGuid UNIQUEIDENTIFIER

SET @aGuid = CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Which transcribed into C# would look like this:

    public static unsafe Guid CombGuid()
    {
        Guid guid = Guid.NewGuid();
        byte[] bytes = guid.ToByteArray();
        long ticks = DateTime.Now.Ticks;
        fixed( byte* pByte = bytes )
        {
            int*    pFirst  = (int *)(pByte + 10);
            short* pNext    = (short*)(pByte + 14);
            *pFirst = (int)(ticks & 0xFFFFFF00);
            *pNext  = (short)ticks;
        }

        return new Guid( bytes );
    }

Generated Sequential GUIDs are sometimes not sequential

I think the problem is that you assume that "sequential" means "increment by one". Although your example shows that this happens sometimes there's no actual guarantee in the documentation that it will. We could argue the definition of "sequence" or that maybe this function is named poorly or even wrong but at the end of the day, according to the documentation it appears to be working 100% exactly as defined:

Creates a GUID that is greater than any GUID previously generated by this function

For people that have a problem with the general concept of a "sequential GUID" use the definition of "sequence" that's about "following a logical order" and also try and remember the Melissa virus. This was mildly document by this line:

For security reasons, UuidCreate was modified so that it no longer uses a machine's MAC address to generate UUIDs. UuidCreateSequential was introduced to allow creation of UUIDs using the MAC address of a machine's Ethernet card.

So the "sequence" is basing the GUID off of the MAC address.

Back to the OP's question, my best guess for why you have GUIDs that don't increment by one each time is the simple fact that you are probably not the only one calling this function. This is a core Windows function and there's bound to be other components, programs, services, etc. that use it.

Performance Value of Comb Guids