pseudo_encrypt() function in plpgsql that takes bigint
4294967295
must be used as the bitmask to select 32 bits (instead of 4294967296
).
That's the reason why currently you get the same value for different inputs.
I'd also suggest using bigint
for the types of l2
and r2
, they shouldn't really differ from r1
and l1
And, for better randomness, use a much higher multiplier in the PRNG function to get intermediate block that really occupy 32 bits, like 32767*32767 instead of 32767.
The complete modified version:
CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE bigint) returns bigint AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
BEGIN
l1:= (VALUE >> 32) & 4294967295::bigint;
r1:= VALUE & 4294967295;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767*32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 32) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
First results:
select x,pseudo_encrypt(x::bigint) from generate_series (1, 10) as x;
x | pseudo_encrypt
----+---------------------
1 | 3898573529235304961
2 | 2034171750778085465
3 | 169769968641019729
4 | 2925594765163772086
5 | 1061193016228543981
6 | 3808195743949274374
7 | 1943793931158625313
8 | 88214277952430814
9 | 2835217030863818694
10 | 970815170807835400
(10 rows)
How to customize the output of the Postgres Pseudo Encrypt function?
Alternative solution: use different ciphers
Other cipher functions are now available on postgres wiki. They're going to be significantly slower, but aside from that, they're better candidates for generating customized random-looking series of unique numbers.
For 32 bit outputs, Skip32 in plpgsql will encrypt its input with a 10 bytes wide key, so you just have to choose your own secret key to have your own specific permutation (the particular order in which the 2^32 unique values will come out).
For 64 bit outputs, XTEA in plpgsql will do similarly, but using a 16 bytes wide key.
Otherwise, to just customize pseudo_encrypt
, see below:
Explanations about pseudo_encrypt
's implementation:
This function has 3 properties
- global unicity of the output values
- reversability
- pseudo-random effect
The first and second property come from the Feistel Network, and as already explained in @CodesInChaos's answer, they don't depend on the choice of these constants: 1366
and also 150889
and 714025
.
Make sure when changing f(r1)
that it stays a function in the mathematical sense, that is x=y
implies f(x)=f(y)
, or in other words the same input must always produce the same output. Breaking this would break the unicity.
The purpose of these constants and this formula for f(r1)
is to produce a reasonably good pseudo-random effect. Using postgres built-in random()
or similar method is not possible because it's not a mathematical function as described above.
Why these arbitrary constants? In this part of the function:
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
The formula and the values 1366
, 150889
and 714025
come from Numerical recipes in C (1992, by William H.Press, 2nd ed.), chapter 7: random numbers, specifically p.284 and 285.
The book is not directly indexable on the web but readable through an interface here: http://apps.nrbook.com/c/index.html .It's also cited as a reference in various source code implementing PRNGs.
Among the algorithms discussed in this chapter, the one used above is very simple and relatively effective. The formula to get a new random number from a previous one (jran
) is:
jran = (jran * ia + ic) % im;
ran = (float) jran / (float) im; /* normalize into the 0..1 range */
where jran
is the current random integer.
This generator will necessarily loop over itself after a certain number of values (the "period"), so the constants ia
, ic
and im
have to be chosen carefully for that period to be as large as possible. The book provides a table p.285 where constants are suggested for various lengths of the period.
ia=1366
, ic=150889
and im=714025
is one of the entries for a period of
229 bits, which is way more than needed.
Finally the multiplication by 32767
or 215-1 is not part of the PRNG but meant to produce a positive half-integer from the 0..1 pseudo-random float value. Don't change that part, unless to widen the blocksize of the algorithm.
Generate unique random numbers in Postgresql with fixed length
Tweaking the existing function for N < 64 bits values
It's relatively simple to tweak the bigint variant to reduce the output to 2^N
values, where N
is even, and less than 64.
To get 13 decimal digits, consider the maximum N
for which 2^N
has 13 digits. That's N=42, with 2^42=4398046511104
.
The algorithm works by breaking the input value into two halves with an equal number of bits, and make them flow through the Feistel network, essentially XOR'ing with the result of the round function and swapping halves at each iteration.
If at every stage of the process, each half is limited to 21
bits then the result combining both halves is guaranteed not to exceed 42 bits.
So here's my proposed variant:
CREATE OR REPLACE FUNCTION pseudo_encrypt42(VALUE bigint) returns bigint
AS $$
DECLARE
l1 bigint;
l2 bigint;
r1 bigint;
r2 bigint;
i int:=0;
b21 int:=(1<<21)-1; -- 21 bits mask for a half-number => 42 bits total
BEGIN
l1:= VALUE >> 21;
r1:= VALUE & b21;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # (((((1366*r1+150889)%714025)/714025.0)*32767*32767)::int & b21);
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((l1::bigint << 21) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;
The input must be less than (2^42)-1
, otherwise the outputs will collide , as pseudo_encrypt42(x) = pseudo_encrypt42(x mod 2^42)
.
What can be done about the missing numbers between 2^42 and 10^13 ?
2^42 - 10^13 = 5601953488896
so that's quite a lot of missing numbers.
I don't know how to help with that in a single pass with the Feistel network. One workaround that might be acceptable, though, is to generate another set of unique values in 0..M
and add 2^42
to them, so there's no risk of collision.
This another set could be obtained by the same function, just with the offset added. 4398046511104 + pseudo_encrypt42(x)
is guaranteed to be between 4398046511104
and 2*4398046511104 = 8796093022208
unique values so that's closer to the goal. The same technique could be applied with several other ranges, not even necessarily of the same size.
However this workaround degrades the random-looking behavior , as instead of having a single output range where every number can be between 0
and X
, you'd get N
distinct output ranges of X/N
numbers. With several distinct partitions like that, it's easy to guess in what partition the output will be, just not what value inside the partition.
How to generate random unique number in PostgreSQL using function
See the pseudo_encrypt function, which implements a permutation based on the Feistel network technique. Combined with a postgres sequence, this guarantees unicity of the result, as well as randomness to the human eye.
Example:
CREATE OR REPLACE FUNCTION pseudo_encrypt(VALUE int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 16) & 65535;
r1:= VALUE & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;
create sequence seq maxvalue 2147483647;
create table tablename(
id int default pseudo_encrypt(nextval('seq')::int),
[other columns]
);
A variant with a 64-bit output space can be found at: pseudo_encrypt() function in plpgsql that takes bigint.
EDIT: pseudo_encrypt
implements only one permutation, and it does not accept a user-supplied key. If you prefer having your own permutations, depending on secret keys, you may consider skip32
(a 32-bit block cipher based on Skipjack, with 10 bytes wide keys).
A plpgsql function (ported from Perl/C) is available at:
https://wiki.postgresql.org/wiki/Skip32
Decrypt Feistel Cipher in PostgreSQL
You could use this self-reversible variant in the first place:
CREATE FUNCTION rev_pseudo_encrypt(VALUE bigint) returns bigint AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
l1:= (VALUE >> 16) & 65535;
r1:= VALUE & 65535;
WHILE i < 3 LOOP
l2 := r1;
r2 := l1 # ((((1366.0 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
l1 := l2;
r1 := r2;
i := i + 1;
END LOOP;
RETURN ((r1::bigint<<16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;
It differs from the original version by taking bigint
instead of int
as input (but the input should still less than 2^32
), and by the fact that the two 16 bits blocks are swapped in the 32 bits final result.
It has the property that rev_pseudo_encrypt(rev_pseudo_encrypt(x)) = x
in addition to the uniquess of results.
Also there's the advantage that the input type is the same as the output type.
On the other hand, to reverse the values generated by the original version, their 16 bits blocks need to be swapped before being fed to the algorithm, and the result again swapped:
create function swap16(bigint) returns bigint as
'select (($1&65535)<<16)+(($1)>>16)'
language sql stable;
select pseudo_encrypt(1234);
pseudo_encrypt
----------------
223549288
select swap16(pseudo_encrypt(swap16(223549288)::int));
swap16
--------
1234
Related Topics
Designing a SQL Schema for a Combination of Many-To-Many Relationship (Variations of Products)
Postgresql Join with Array Type with Array Elements Order, How to Implement
How to Execute a Native SQL Script in JPA/Hibernate
SQL Server Cumulative Sum by Group
The Conversion of a Datetime2 Data Type to a Datetime Data Type Resulted in an Out-Of-Range
Hql: How to Perform an Inner Join on a Subquery
How to Transform Comma Separated Column into Multiples Rows in Db2
How to Compare Dates in SQL Server
Backup a Single Table with Its Data from a Database in SQL Server 2008
Why Can't I Use an Alias for an Aggregate in a Having Clause
Oracle SQL: Update If Exists Else Insert
Partition by with and Without Keep in Oracle
In SQL How to Get the Maximum Value for an Integer
How to Measure the Execution Time of a Query on Spark
Does Assigning Stored Procedure Input Parameters to Local Variables Help Optimize the Query
Error Trying to Call Stored Procedure with Prepared Statement