How to Shorten a UUID to a Specific Length

How do I shorten and expand a UUID to 15 or fewer characters?

As AuxTaco pointed out, if you actually mean "alphanumeric" as in it matches /^[A-Za-z0-9]{0,15}$/ (an alphabet of 26 + 26 + 10 = 62 characters), then it is mathematically impossible. You can't fit 3 gallons of water in a 1-gallon bucket without losing something. A UUID is 128 bits, so to represent it in a 62-character alphabet you'd need at least 22 characters (log62(2^128) ≈ 21.5, rounded up to 22).
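A quick sanity check on that bound, as a couple of lines of Java (the arithmetic itself is language-neutral):

// Minimum output length to losslessly encode 128 bits in a 62-character alphabet
double bitsPerChar = Math.log(62) / Math.log(2);    // ~5.954 bits per character
int minLength = (int) Math.ceil(128 / bitsPerChar); // 22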

If you are more flexible on your charset and just need 15 Unicode characters that you can put in a text document, then my answer will help.


Note: For the first part of this answer, I thought the question said a length of 16, not 15, so the simpler answer won't quite work; the more complex version below still will.


In order to do so, you'd have to use some kind of two-way compression algorithm (similar to an algorithm that is used for zipping files).

However, the problem with trying to compress something like a UUID is you'd probably have lots of collisions.

A UUID v4 is 32 characters long (without dashes). It's hexadecimal, so its character space is 16 characters (0123456789ABCDEF).

That gives you a number of possible combinations of 16^32, approximately 3.4028237e+38 or 340,282,370,000,000,000,000,000,000,000,000,000,000. To make it recoverable after compression, you'd have to make sure you don't have any collisions (i.e., no 2 UUIDs turn into the same value). That's a lot of possible values (which is exactly why UUIDs are that large: the chance of 2 random UUIDs colliding is only about 1 in that big number).

To crunch that many possibilities into 16 characters, you'd have to have at least as many possible outputs. With 16 characters, you'd need an alphabet of 256 characters (the 16th root of that big number: 256^16 == 16^32). That's assuming you have an algorithm that never creates a collision.

One way to ensure you never have collisions would be to convert it from a base-16 number to a base-256 number. That would give you a 1-to-1 relation, ensuring no collisions and making it perfectly reversible. Normally, switching bases is easy in JavaScript: parseInt(someStr, radix).toString(otherRadix) (e.g., parseInt('00FF', 16).toString(20)). Unfortunately, JavaScript only supports radixes up to 36, so we'll have to do the conversion ourselves.

The catch with such a large base is representing it. You could arbitrarily pick 256 different characters, throw them in a string, and use that for a manual conversion. However, I don't think there are 256 different symbols on a standard US keyboard, even if you treat upper and lowercase as different glyphs.

A simpler solution would be to just use arbitrary character codes from 0 to 255 with String.fromCharCode().

Another small catch: if we tried to treat all of that as one big number, we'd have issues, because it's a really big number and JavaScript's Number type can't represent it exactly (it only has 53 bits of integer precision).

Instead of that, since we already have hexadecimal, we can just split it into pairs of hex digits, convert each pair, then spit them out. 32 hexadecimal digits = 16 pairs, so that'll (coincidentally) be perfect. (If you had to solve this for an arbitrary size, you'd have to do some extra math and converting to split the number into pieces, convert, then reassemble.)

const uuid = '1234567890ABCDEF1234567890ABCDEF';
// Split the 32 hex digits into 16 pairs, and map each pair (one byte) to a character code
const letters = uuid.match(/.{2}/g).map(pair => String.fromCharCode(parseInt(pair, 16)));
const str = letters.join('');
console.log(str);
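The question also asks about expanding back. The mapping is reversible, since each character code corresponds to exactly one byte; here is a minimal sketch of the reverse step (written in Java, which the answers below use; the JavaScript version is analogous):

// Turn each character code back into a pair of hex digits
static String expand(String s) {
    StringBuilder hex = new StringBuilder(s.length() * 2);
    for (char ch : s.toCharArray())
        hex.append(String.format("%02X", (int) ch));
    return hex.toString();
}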

How to reduce the length of a UUID generated using randomUUID()

If you don't need it to be unique, you can use any length you like.

For example, you can do this:

import java.util.Random;

Random rand = new Random();
char[] chars = new char[16];
for (int i = 0; i < chars.length; i++) {
    chars[i] = (char) rand.nextInt(65536);
    // Skip UTF-16 surrogate values, which are not valid characters on their own
    if (Character.isSurrogate(chars[i]))
        i--;
}
String s = new String(chars);

This will give you almost the same degree of randomness, but will use every possible (non-surrogate) character between \u0000 and \ufffd.

If you need printable ASCII characters you can make it as short as you like, but the likelihood of uniqueness drops significantly. What you can do is use base 36 instead of base 16:

UUID uuid = UUID.randomUUID();
String s = Long.toString(uuid.getMostSignificantBits(), 36) + '-' + Long.toString(uuid.getLeastSignificantBits(), 36);

This will be about 26 characters on average, at most 27 characters.

You can use base64 encoding and reduce it to 22 characters.
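For instance, a minimal sketch using java.util.Base64 (Java 8+), packing the UUID's two longs into 16 raw bytes first:

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

UUID uuid = UUID.randomUUID();
ByteBuffer buf = ByteBuffer.allocate(16);
buf.putLong(uuid.getMostSignificantBits());
buf.putLong(uuid.getLeastSignificantBits());
// 16 bytes -> 22 base64 characters once the trailing "==" padding is dropped
String s = Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());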

If you use base 94 you can get it down to 20 characters.
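There is no built-in base-94 codec, but a sketch using BigInteger and the 94 printable ASCII characters from '!' (0x21) to '~' (0x7E) shows the idea:

import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

static String toBase94(UUID uuid) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    buf.putLong(uuid.getMostSignificantBits());
    buf.putLong(uuid.getLeastSignificantBits());
    BigInteger n = new BigInteger(1, buf.array()); // the UUID as a non-negative 128-bit integer
    BigInteger base = BigInteger.valueOf(94);
    StringBuilder sb = new StringBuilder();
    do {
        sb.append((char) ('!' + n.mod(base).intValue()));
        n = n.divide(base);
    } while (n.signum() > 0);
    return sb.reverse().toString(); // at most ceil(128 / log2(94)) = 20 characters
}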

If you use the whole range of valid chars from \u0000 to \ufffd you can reduce it to just 9 characters or 17 bytes.

If you don't care about Strings, you can use 16 eight-bit bytes.

Generating 8-character only UUIDs

It is not possible, since a UUID is a 16-byte number by definition. But of course, you can generate 8-character-long unique strings (see the other answers).
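For illustration, a sketch of an 8-character random string drawn from a 62-character alphabet. Note it is random, not guaranteed unique: at about 47.6 bits of entropy it is far weaker than a UUID's 122 random bits, so collisions become likely much sooner.

import java.security.SecureRandom;

static final String ALPHABET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
static final SecureRandom RANDOM = new SecureRandom();

static String randomId8() {
    StringBuilder sb = new StringBuilder(8);
    for (int i = 0; i < 8; i++)
        sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
    return sb.toString();
}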

Also be careful with generating longer UUIDs and substring-ing them, since some parts of the ID may contain fixed bytes (e.g. this is the case with MAC, DCE and MD5 UUIDs).

Python uuid4: how to limit the length of unique chars

You can generate a short UUID with the shortuuid library (pip install shortuuid):

import shortuuid
shortuuid.uuid()
'vytxeTZskVKR7C7WgdSP3d'

Native solution, with a big risk of collisions:

Try:

from uuid import uuid4

x = uuid4()
str(x)[:8]

Output:

'ffc69c1b'


Shortening a Java UUID while preserving uniqueness

I use org.apache.commons.codec.binary.Base64 to convert a UUID into a URL-safe unique string that is 22 characters in length and has the same uniqueness as the UUID.

I posted my code on Storing UUID as base64 String
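The linked post has the full code; the gist is a sketch like the following with commons-codec (the byte packing mirrors the earlier java.util.Base64 example):

import java.nio.ByteBuffer;
import java.util.UUID;
import org.apache.commons.codec.binary.Base64;

UUID uuid = UUID.randomUUID();
ByteBuffer buf = ByteBuffer.allocate(16);
buf.putLong(uuid.getMostSignificantBits());
buf.putLong(uuid.getLeastSignificantBits());
// encodeBase64URLSafeString emits unpadded, URL-safe base64: 22 characters for 16 bytes
String id = Base64.encodeBase64URLSafeString(buf.array());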

Android: how to generate a shorter version of a UUID (13 chars) on the app side

It won't be a UUID in the strict sense anymore; UUID describes a very specific data structure. Using the low bits of a proper UUID is generally a bad idea; those were never meant to be unique. Single-machine tests will be inconclusive.

EDIT: now that I think of it, what exactly is "char" in the question? A decimal digit? A hex digit? A byte? An ASCII character? A Unicode character? If the latter, you can stuff a full proper UUID there. Just represent it as binary, not as a hexadecimal string. A UUID is 128 bits long. A Unicode codepoint is about 20 bits (the range runs to 0x10FFFF), ergo 13 of those would cover 260 bits, which is more than enough.

The Java char datatype is, effectively, slightly less than 16 bits. If by "13 chars" you mean a Java string of length 13 (or an array of 13 chars), you can still stuff a UUID there, with some trickery to avoid reserved UTF-16 surrogate pair values.


All that said, globally unique IDs are usually generated from a combination of the current time, a random number, and some kind of device-specific identifier, hashed together. That's how canonical UUIDs work. Depending on the exact nature of the size limit (which is vague in the question), a different hash algorithm might be advisable.


EDIT: about using the whole range of Unicode. First things first: you do realize that both "du3d2t5fdaib4" and "8efc9756-70ff-4a9f-bf45-4c693bde61a4" draw from a tiny alphabet? The second is hexadecimal: just the 16 characters 0-9 and a-f, plus dashes that are there only for readability and can be safely omitted; the first uses at most the 36 lowercase alphanumerics. Meanwhile, a single Java char can have one of 63488 possible values: any codepoint from 0 to 0xFFFF, except for the subrange 0xD800..0xDFFF. The string with all those crazy characters won't be nice-looking or even printable; it could look something like "芦№Π║ثЯ"; some of the characters might not display on Android because they're not in the system font, but it will be unique all right.

Is it a requirement that the unique string displays nicely?


If no, let's see. A UUID is two 64-bit Java longs. long is a signed datatype in Java; it would've been easier if it were unsigned, but there's no such thing. We can, however, treat the two longs as 4 ints, and make sure the ints are positive.

Now we have 4 positive ints to stuff into 13 characters. We also don't want to mess with arithmetic that straddles variable boundaries, so let's convert each integer into a 3-character chunk with no overlap. This wastes some bits, but oh well, we have some bits to spare. An int is 4 bytes long, while 3 Java characters are 6 bytes long.

When composing the chars, we would like to avoid the area between 0xD800 and 0xDFFF (the surrogate range). Also, we would want to avoid the codepoints from 0x00 to 0x1F - those are control characters, unprintable by design. Also, let's avoid character 0x20 - that's space. Now, I don't know exactly how the string will be used; whether or not it will go into a text format that doesn't allow for escaping, and therefore whether certain other characters should be avoided to make things simpler downstream.

A contiguous character range is easier to work with, so let's completely throw away the range upwards from 0xD800, too. That leaves us with 0xD7DF distinct codepoints, starting from 0x21. Three of those are more than enough to cover a 32-bit int (0xD7DF^3 > 2^32). The rule for converting an int into a character triple is straightforward: divide the int by 0xD7DF twice, take the remainders, add the remainders to the base codepoint (which is 0x21). This algorithm is your vanilla "convert an int to a string in base N", with the knowledge that there can be no more than three digits.

All things considered, here goes Java:

import java.util.UUID;

public static String uuidToWeirdString(UUID uuid)
{
    // Description of our alphabet: codepoints from 0x21 to 0xD7FF.
    // Declared as longs so that ALPHA_SIZE * ALPHA_SIZE below doesn't overflow int.
    final long ALPHA_SIZE = 0xD7DF, ALPHA_BASE = 0x21;

    // Convert the UUID to a pair of signed, potentially negative longs
    long low = uuid.getLeastSignificantBits(),
         high = uuid.getMostSignificantBits();

    // Convert to positive 32-bit ints, represented as signed longs.
    // The mask must be a long literal (0xffffffffL); an int 0xffffffff would
    // sign-extend to -1 and leave the values potentially negative.
    long[] parts = {
        (high >>> 32) & 0xffffffffL,
        high & 0xffffffffL,
        (low >>> 32) & 0xffffffffL,
        low & 0xffffffffL
    };

    // Convert ints to char triples: vanilla base-N digit extraction
    int nPart, pos = 0;
    char[] c = new char[12];
    for (nPart = 0; nPart < 4; nPart++)
    {
        long part = parts[nPart];
        c[pos++] = (char) (ALPHA_BASE + part / (ALPHA_SIZE * ALPHA_SIZE));
        c[pos++] = (char) (ALPHA_BASE + (part / ALPHA_SIZE) % ALPHA_SIZE);
        c[pos++] = (char) (ALPHA_BASE + part % ALPHA_SIZE);
    }
    return new String(c);
}

Feast your eyes on the beauty of the Unicode.
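For completeness, the mapping is reversible. A hypothetical inverse (weirdStringToUuid, not part of the original answer) just runs the base-0xD7DF digit extraction backwards:

import java.util.UUID;

public static UUID weirdStringToUuid(String s)
{
    final long ALPHA_SIZE = 0xD7DF, ALPHA_BASE = 0x21;

    // Rebuild the four 32-bit parts from the four character triples
    long[] parts = new long[4];
    for (int nPart = 0; nPart < 4; nPart++)
    {
        long part = 0;
        for (int i = 0; i < 3; i++) // most significant digit first
            part = part * ALPHA_SIZE + (s.charAt(nPart * 3 + i) - ALPHA_BASE);
        parts[nPart] = part;
    }

    // Reassemble the two halves and wrap them back into a UUID
    long high = (parts[0] << 32) | parts[1];
    long low = (parts[2] << 32) | parts[3];
    return new UUID(high, low);
}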


