Create Random Number Sequence With No Repeats

Create Random Number Sequence with No Repeats

You may be interested in a linear feedback shift register.
We used to build these out of hardware, but I've also done them in software. It uses a shift register with some of the bits xor'ed and fed back to the input, and if you pick just the right "taps" you can get a sequence that's as long as the register size. That is, a 16-bit lfsr can produce a sequence 65535 long with no repeats. It's statistically random but of course eminently repeatable. Also, if it's done wrong, you can get some embarrassingly short sequences. If you look up the lfsr, you will find examples of how to construct them properly (which is to say, "maximal length").

Generating a random, non-repeating sequence of all integers in .NET

Is there a way in .NET

Actually, this can be done in most any language

to generate a sequence of all the 32-bit integers (Int32)

Yes.

in random order,

Here we need to agree on terminology since "random" is not what most people think it is. More on this in a moment.

without repetitions,

Yes.

and in a memory-efficient manner?

Yes.

Memory-efficient would mean using a maximum of just a few hundred mega bytes of main memory.

Ok, so would using almost no memory be acceptable? ;-)

Before getting to the suggestion, we need to clear up the matter of "randomness". Something that is truly random has no discernible pattern. Hence, running the algorithm millions of times in a row could theoretically return the same value across all iterations. If you throw in the concept of "must be different from the prior iteration", then it is no longer random. However, looking at all of the requirements together, it seems that all that is really being asked for is "differing patterns of distribution of the integers". And this is doable.

So how to do this efficiently? Make use of Modular multiplicative inverses. I used this to answer the following Question which had a similar requirement to generate non-repeating, pseudo-random, sample data within certain bounds:

Generate different random time in the given interval

I first learned about this concept here ( generate seemingly random unique numeric ID in SQL Server ) and you can use either of the following online calculators to determine your "Integer" and "Modular Multiplicative Inverses (MMI)" values:

  • http://planetcalc.com/3311/
  • http://www.cs.princeton.edu/~dsri/modular-inversion-answer.php

Applying that concept here, you would use Int32.MaxSize as the Modulo value.

This would give a definite appearance of random distribution with no chance for collisions and no memory needed to store already used values.

The only initial problem is that the pattern of distribution is always the same given the same "Integer" and "MMI" values. So, you could come up with differing patterns by either adding a "randomly" generated Int to the starting value (as I believe I did in my answer about generating the sample data in SQL Server) or you can pre-generate several combinations of "Integer" and corresponding "MMI" values, store those in a config file / dictionary, and use a .NET random function to pick one at the start of each run. Even if you store 100 combinations, that is almost no memory use (assuming it is not in a config file). In fact, if storing both as Int and the dictionary uses Int as an index, then 1000 values is approximately 12k?


UPDATE

Notes:

  • There is a pattern in the results, but it is not discernible unless you have enough of them at any given moment to look at in total. For most use-cases, this is acceptable since no recipient of the values would have a large collection of them, or know that they were assigned in sequence without any gaps (and that knowledge is required in order to determine if there is a pattern).
  • Only 1 of the two variable values -- "Integer" and "Modular Multiplicative Inverse (MMI)" -- is needed in the formula for a particular run. Hence:

    • each pair gives two distinct sequences
    • if maintaining a set in memory, only a simple array is needed, and assuming that the array index is merely an offset in memory from the base address of the array, then the memory required should only be 4 bytes * capacity (i.e. 1024 options is only 4k, right?)

Here is some test code. It is written in T-SQL for Microsoft SQL Server since that is where I work primarily, and it also has the advantage of making it real easy-like to test for uniqueness, min and max values, etc, without needing to compile anything. The syntax will work in SQL Server 2008 or newer. For SQL Server 2005, initialization of variables had not been introduced yet so each DECLARE that contains an = would merely need to be separated into the DECLARE by itself and a SET @Variable = ... for however that variable is being initialized. And the SET @Index += 1; would need to become SET @Index = @Index + 1;.

The test code will error if you supply values that produce any duplicates. And the final query indicates if there are any gaps since it can be inferred that if the table variable population did not error (hence no duplicates), and the total number of values is the expected number, then there could only be gaps (i.e. missing values) IF either or both of the actual MIN and MAX values are outside of the expected values.

PLEASE NOTE that this test code does not imply that any of the values are pre-generated or need to be stored. The code only stores the values in order to test for uniqueness and min / max values. In practice, all that is needed is the simple formula, and all that is needed to pass into it is:

  • the capacity (though that could also be hard-coded in this case)
  • the MMI / Integer value
  • the current "index"

So you only need to maintain 2 - 3 simple values.

DECLARE @TotalCapacity INT = 30; -- Modulo; -5 to +4 = 10 OR Int32.MinValue
-- to Int32.MaxValue = (UInt32.MaxValue + 1)
DECLARE @MMI INT = 7; -- Modular Multiplicative Inverse (MMI) or
-- Integer (derived from @TotalCapacity)

DECLARE @Offset INT = 0; -- needs to stay at 0 if min and max values are hard-set
-----------
DECLARE @Index INT = (1 + @Offset); -- start

DECLARE @EnsureUnique TABLE ([OrderNum] INT NOT NULL IDENTITY(1, 1),
[Value] INT NOT NULL UNIQUE);
SET NOCOUNT ON;

BEGIN TRY
WHILE (@Index < (@TotalCapacity + 1 + @Offset)) -- range + 1
BEGIN
INSERT INTO @EnsureUnique ([Value]) VALUES (
((@Index * @MMI) % @TotalCapacity) - (@TotalCapacity / 2) + @Offset
);
SET @Index += 1;
END;
END TRY
BEGIN CATCH
DECLARE @Error NVARCHAR(4000) = ERROR_MESSAGE();
RAISERROR(@Error, 16, 1);
RETURN;
END CATCH;

SELECT * FROM @EnsureUnique ORDER BY [OrderNum] ASC;
SELECT COUNT(*) AS [TotalValues],
@TotalCapacity AS [ExpectedCapacity],
MIN([Value]) AS [MinValue],
(@TotalCapacity / -2) AS [ExpectedMinValue],
MAX([Value]) AS [MaxValue],
(@TotalCapacity / 2) - 1 AS [ExpectedMaxValue]
FROM @EnsureUnique;

How do I create a list of random numbers without duplicates

This will return a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the saved list in memory. For example:

all_lines = f1.readlines()
for i in range(50):
lines = random.sample(all_lines, 40)

This way, you only need to actually read from the file once, before your loop. It's much more efficient to do this than to seek back to the start of the file and call f1.readlines() again for each loop iteration.

C - Generate random sequence with no repeats without shuffling

This can be done in O(n) time with two lists, one with the number (initialy) in order, and one in the resulting order.

You start with n elements in order in your source list. Then you select a random number mod n. That gives you the next element, which you place in the destination list.

Now the key part. If you were to pick a random number between 0 and n-1 each time, as you seem to think a shuffle does, you have an increasing chance of selecting a number you selected before. So how do you handle this? By decreasing the available list of number to select from.

In the source list, after selecting a number, you move the last element of the list to the index that was just used. You now have a list of n-1 numbers to chose from. So on the next iteration you take a random number mod n-1. Keep going until your source list only has one element.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define LEN 10

int main()
{
int a[LEN], b[LEN];
int i, val;
int count = LEN;

srand(time(NULL));

for (i=0;i<LEN;i++) {
a[i]=i+1;
}
for (i=0;i<LEN;i++) {
val = rand() % count;
b[i] = a[val];
a[val] = a[count-1];
count--;
}
for (i=0;i<LEN;i++) {
printf("%d ", b[i]);
}
printf("\n");

return 0;
}

EDIT:

Here's a slightly more efficient version that doesn't use two arrays and is therefore O(1) space:

int a[LEN];
int i, val, tmp;

srand(time(NULL));

for (i=0;i<LEN;i++) {
a[i]=i+1;
}
for (i=0;i<LEN-1;i++) {
val = (rand() % (LEN - 1 - i)) + i + 1;
tmp = a[i];
a[i] = a[val];
a[val] = tmp;
}
for (i=0;i<LEN;i++) {
printf("%d ", a[i]);
}
printf("\n");

Creating random numbers with no duplicates

The simplest way would be to create a list of the possible numbers (1..20 or whatever) and then shuffle them with Collections.shuffle. Then just take however many elements you want. This is great if your range is equal to the number of elements you need in the end (e.g. for shuffling a deck of cards).

That doesn't work so well if you want (say) 10 random elements in the range 1..10,000 - you'd end up doing a lot of work unnecessarily. At that point, it's probably better to keep a set of values you've generated so far, and just keep generating numbers in a loop until the next one isn't already present:

if (max < numbersNeeded)
{
throw new IllegalArgumentException("Can't ask for more numbers than are available");
}
Random rng = new Random(); // Ideally just create one instance globally
// Note: use LinkedHashSet to maintain insertion order
Set<Integer> generated = new LinkedHashSet<Integer>();
while (generated.size() < numbersNeeded)
{
Integer next = rng.nextInt(max) + 1;
// As we're adding to a set, this will automatically do a containment check
generated.add(next);
}

Be careful with the set choice though - I've very deliberately used LinkedHashSet as it maintains insertion order, which we care about here.

Yet another option is to always make progress, by reducing the range each time and compensating for existing values. So for example, suppose you wanted 3 values in the range 0..9. On the first iteration you'd generate any number in the range 0..9 - let's say you generate a 4.

On the second iteration you'd then generate a number in the range 0..8. If the generated number is less than 4, you'd keep it as is... otherwise you add one to it. That gets you a result range of 0..9 without 4. Suppose we get 7 that way.

On the third iteration you'd generate a number in the range 0..7. If the generated number is less than 4, you'd keep it as is. If it's 4 or 5, you'd add one. If it's 6 or 7, you'd add two. That way the result range is 0..9 without 4 or 6.

Unique (non-repeating) random numbers in O(1)?

Initialize an array of 1001 integers with the values 0-1000 and set a variable, max, to the current max index of the array (starting with 1000). Pick a random number, r, between 0 and max, swap the number at the position r with the number at position max and return the number now at position max. Decrement max by 1 and continue. When max is 0, set max back to the size of the array - 1 and start again without the need to reinitialize the array.

Update:
Although I came up with this method on my own when I answered the question, after some research I realize this is a modified version of Fisher-Yates known as Durstenfeld-Fisher-Yates or Knuth-Fisher-Yates. Since the description may be a little difficult to follow, I have provided an example below (using 11 elements instead of 1001):

Array starts off with 11 elements initialized to array[n] = n, max starts off at 10:

+--+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 2| 3| 4| 5| 6| 7| 8| 9|10|
+--+--+--+--+--+--+--+--+--+--+--+
^
max

At each iteration, a random number r is selected between 0 and max, array[r] and array[max] are swapped, the new array[max] is returned, and max is decremented:

max = 10, r = 3
+--------------------+
v v
+--+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 2|10| 4| 5| 6| 7| 8| 9| 3|
+--+--+--+--+--+--+--+--+--+--+--+

max = 9, r = 7
+-----+
v v
+--+--+--+--+--+--+--+--+--+--+--+
| 0| 1| 2|10| 4| 5| 6| 9| 8| 7: 3|
+--+--+--+--+--+--+--+--+--+--+--+

max = 8, r = 1
+--------------------+
v v
+--+--+--+--+--+--+--+--+--+--+--+
| 0| 8| 2|10| 4| 5| 6| 9| 1: 7| 3|
+--+--+--+--+--+--+--+--+--+--+--+

max = 7, r = 5
+-----+
v v
+--+--+--+--+--+--+--+--+--+--+--+
| 0| 8| 2|10| 4| 9| 6| 5: 1| 7| 3|
+--+--+--+--+--+--+--+--+--+--+--+

...

After 11 iterations, all numbers in the array have been selected, max == 0, and the array elements are shuffled:

+--+--+--+--+--+--+--+--+--+--+--+
| 4|10| 8| 6| 2| 0| 9| 5| 1| 7| 3|
+--+--+--+--+--+--+--+--+--+--+--+

At this point, max can be reset to 10 and the process can continue.

Generate non-repeating, no sequential numbers

The simplest way is to produce a collection (e.g. an array) of the numbers 1-15, and then shuffle it. (EDIT: By "collection of the numbers 1-15" I mean 1, 2, 3, 4, 5... 15. Not a collection of random numbers in the range 1-15. If I'd meant that, I'd have said so :)

You haven't given details of which platform you're on so we can't easily give sample code, but I'm a big fan of the modern variant of the Fisher-Yates shuffle. For example, in C#:

public static void Shuffle<T>(IList<T> collection, Random rng)
{
for (int i = collection.Count - 1; i > 0; i--)
{
int randomIndex = rng.Next(i + 1);
T tmp = collection[i];
collection[i] = collection[randomIndex];
collection[randomIndex] = tmp;
}
}

If you want to produce "more random" numbers (e.g. 15 distinct integers within the entire range of integers available to you) then it's probably easiest just to do something like this (again, C# but should be easy to port):

HashSet<int> numbers = new HashSet<int>();
while (numbers.Count < 15)
{
numbers.Add(rng.Next());
}
List<int> list = numbers.ToList();
// Now shuffle as before

The shuffling at the end is to make sure that any ordering which might come out of the set implementation doesn't affect the final result.



Related Topics



Leave a reply



Submit