Pseudo_Encrypt() Function in Plpgsql That Takes Bigint

pseudo_encrypt() function in plpgsql that takes bigint

4294967295 must be used as the bitmask to select 32 bits (instead of 4294967296).
That's the reason why currently you get the same value for different inputs.

I'd also suggest using bigint for the types of l2 and r2, they shouldn't really differ from r1 and l1

And, for better randomness, use a much higher multiplier in the PRNG function to get intermediate block that really occupy 32 bits, like 32767*32767 instead of 32767.

The complete modified version:

CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE bigint) returns bigint AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= (VALUE >> 32) & 4294967295::bigint;
r1:= VALUE & 4294967295;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

First results:


select x,pseudo_encrypt(x::bigint) from generate_series (1, 10) as x;
x | pseudo_encrypt
----+---------------------
1 | 3898573529235304961
2 | 2034171750778085465
3 | 169769968641019729
4 | 2925594765163772086
5 | 1061193016228543981
6 | 3808195743949274374
7 | 1943793931158625313
8 | 88214277952430814
9 | 2835217030863818694
10 | 970815170807835400
(10 rows)

How to customize the output of the Postgres Pseudo Encrypt function?

Alternative solution: use different ciphers

Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.

For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).

For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.

Otherwise, to just customize pseudo_encrypt, see below:

Explanations about pseudo_encrypt's implementation:

This function has 3 properties

  • global unicity of the output values
  • reversability
  • pseudo-random effect

The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366 and also 150889 and 714025.

Make sure when changing f(r1) that it stays a function in the mathematical sense, that is x=y implies f(x)=f(y), or in other words the same input must always produce the same output. Breaking this would break the unicity.

The purpose of these constants and this formula for f(r1) is to produce a reasonably good pseudo-random effect. Using postgres built-in random() or similar method is not possible because it's not a mathematical function as described above.

Why these arbitrary constants? In this part of the function:

r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;

The formula and the values 1366, 150889 and 714025 come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285.
The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.

Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran) is:

jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im; /* normalize into the 0..1 range */

where jran is the current random integer.

This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia, ic and im have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.

ia=1366, ic=150889 and im=714025 is one of the entries for a period of
229 bits, which is way more than needed.

Finally the multiplication by 32767 or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.

Generate unique random numbers in Postgresql with fixed length

Tweaking the existing function for N < 64 bits values

It's relatively simple to tweak the bigint variant to reduce the output to 2^N values, where N is even, and less than 64.

To get 13 decimal digits, consider the maximum N for which 2^N has 13 digits. That's N=42, with 2^42=4398046511104.

The algorithm works by breaking the input value into two halves with an equal number of bits, and make them flow through the Feistel network, essentially XOR'ing with the result of the round function and swapping halves at each iteration.

If at every stage of the process, each half is limited to 21 bits then the result combining both halves is guaranteed not to exceed 42 bits.

So here's my proposed variant:

CREATE OR REPLACE FUNCTION pseudo_encrypt42(VALUE bigint) returns bigint
AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
b21 int:=(1<<21)-1; -- 21 bits mask for a half-number => 42 bits total
BEGIN
l1:= VALUE >> 21;
r1:= VALUE & b21;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # (((((1366*r1+150889)%714025)/714025.0)*32767*32767)::int & b21);
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 21) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

The input must be less than (2^42)-1, otherwise the outputs will collide , as pseudo_encrypt42(x) = pseudo_encrypt42(x mod 2^42).

What can be done about the missing numbers between 2^42 and 10^13 ?

2^42 - 10^13 = 5601953488896 so that's quite a lot of missing numbers.
I don't know how to help with that in a single pass with the Feistel network. One workaround that might be acceptable, though, is to generate another set of unique values in 0..M and add 2^42 to them, so there's no risk of collision.

This another set could be obtained by the same function, just with the offset added. 4398046511104 + pseudo_encrypt42(x) is guaranteed to be between 4398046511104 and 2*4398046511104 = 8796093022208 unique values so that's closer to the goal. The same technique could be applied with several other ranges, not even necessarily of the same size.

However this workaround degrades the random-looking behavior , as instead of having a single output range where every number can be between 0 and X, you'd get N distinct output ranges of X/N numbers. With several distinct partitions like that, it's easy to guess in what partition the output will be, just not what value inside the partition.

How to generate random unique number in PostgreSQL using function

See the pseudo_encrypt function, which implements a permutation based on the Feistel network technique. Combined with a postgres sequence, this guarantees unicity of the result, as well as randomness to the human eye.

Example:

CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 16) & 65535;
r1:= VALUE & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;

create sequence seq maxvalue 2147483647;

create table tablename(
id int default pseudo_encrypt(nextval('seq')::int),
[other columns]
);

A variant with a 64-bit output space can be found at: pseudo_encrypt() function in plpgsql that takes bigint.


EDIT: pseudo_encrypt implements only one permutation, and it does not accept a user-supplied key. If you prefer having your own permutations, depending on secret keys, you may consider skip32 (a 32-bit block cipher based on Skipjack, with 10 bytes wide keys).

A plpgsql function (ported from Perl/C) is available at:
https://wiki.postgresql.org/wiki/Skip32

Decrypt Feistel Cipher in PostgreSQL

You could use this self-reversible variant in the first place:

CREATE FUNCTION rev_pseudo_encrypt(VALUE bigint) returns bigint AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 16) & 65535;
r1:= VALUE & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((r1::bigint<<16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;

It differs from the original version by taking bigint instead of int as input (but the input should still less than 2^32), and by the fact that the two 16 bits blocks are swapped in the 32 bits final result.

It has the property that rev_pseudo_encrypt(rev_pseudo_encrypt(x)) = x in addition to the uniquess of results.

Also there's the advantage that the input type is the same as the output type.

On the other hand, to reverse the values generated by the original version, their 16 bits blocks need to be swapped before being fed to the algorithm, and the result again swapped:

create function swap16(bigint) returns bigint as
'select (($1&65535)<<16)+(($1)>>16)'
language sql stable;

select pseudo_encrypt(1234);
pseudo_encrypt
----------------
223549288

select swap16(pseudo_encrypt(swap16(223549288)::int));
swap16
--------
1234


Related Topics



Leave a reply



Submit