best way to pick a random subset from a collection?
Jon Bentley discusses this in either 'Programming Pearls' or 'More Programming Pearls'. You need to be careful with your N of M selection process, but I think the code shown works correctly. Rather than randomly shuffle all the items, you can do the random shuffle only shuffling the first N positions - which is a useful saving when N << M.
Knuth also discusses these algorithms - I believe that would be Vol 3 "Sorting and Searching", but my set is packed pending a move of house so I can't formally check that.
Java best way to select random subset from an array of strings with no duplicates
You can use Collections.shuffle
which is exactly for these purposes.
For example:
String[] strings = new String[] {
"foo", "bar", "toto", "fox", "koukou", "bar", "Bar"
} ;
// Filter-out duplicates, then shuffle
final List<String> stringsList = Arrays.stream(strings)
.distinct().collect(Collectors.toList());
Collections.shuffle(stringsList);
// Take a sample
final int k = 3; // the sample size
List<String> sample = stringsList.subList(0, k); // this is a view, not another list
System.err.println(sample);
If we did not care about duplicates, things would be a bit simpler and shuffling could be performed on the original array (then, we could just pick first k elements):
Collections.shuffle(Arrays.asList(strings));
Efficient random sampling from a huge list
Elaborating on aix's answer above, to choose k
out of a stream of items, read the items one at a time. Keep the first k
items in a set S
.
Now when reading the m
-th item I
(m>k
now), keep it with probability k/m
. If you do keep it, select an item U
uniformly at random from S
, and replace U
with I
.
The proof that this yields all subsets of size k
with equal probability is based on induction on m
. Note that you don't need to know n
(the total number of items) in advance, and that S
at each step is suitable. The algorithm is "streaming" - it doesn't require storing all items, or making a second pass.
Get a random subset from a result set in java
List<Member> list = new LinkedList<Member>(memberSet);
Collections.shuffle(list);
Set<Member> randomSet = new HashSet<Member>(list.subList(0, 5));
Full example:
public static void main(String... args) {
Set<Member> memberSet = new HashSet<Member>();
for (int i = 0; i < 100; i++)
memberSet.add(new Member(i));
List<Member> list = new LinkedList<Member>(memberSet);
Collections.shuffle(list);
Set<Member> randomSet = new HashSet<Member>(list.subList(0, 5));
System.out.println(randomSet);
}
static class Member {
final int value;
public Member(int value) {
this.value = value;
}
@Override
public String toString() {
return "" + value;
}
}
Sampling a random subset from an array
I suggest shuffling a copy of the array using the Fisher-Yates shuffle and taking a slice:
function getRandomSubarray(arr, size) {
var shuffled = arr.slice(0), i = arr.length, temp, index;
while (i--) {
index = Math.floor((i + 1) * Math.random());
temp = shuffled[index];
shuffled[index] = shuffled[i];
shuffled[i] = temp;
}
return shuffled.slice(0, size);
}
var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
var fiveRandomMembers = getRandomSubarray(x, 5);
Note that this will not be the most efficient method for getting a small random subset of a large array because it shuffles the whole array unnecessarily. For better performance you could do a partial shuffle instead:
function getRandomSubarray(arr, size) {
var shuffled = arr.slice(0), i = arr.length, min = i - size, temp, index;
while (i-- > min) {
index = Math.floor((i + 1) * Math.random());
temp = shuffled[index];
shuffled[index] = shuffled[i];
shuffled[i] = temp;
}
return shuffled.slice(min);
}
How can you obtain a randomized subset of a list in Java, without using Collections.shuffle?
Shuffling the whole list would be a bit of a waste of resources if what you want is significantly smaller. I'd personally just pick n unique random numbers between 0..size and use the objects at those indexes for the randomised subset.
If you're talking about picking a random subset very close to the size of the whole collection, then chances are you're better of just calling Collections.shuffle()
and choosing the first n
entries. But if we're talking about ~5 / 50,000, definitely use the above approach.
Pick a unique random subset from a set of unique values
Create a random permutation of the range 0, 1, ..., N - 1
and pick the first M
of them; use those as indices into your original vector.
A random permutation is easily made with the standard library by using std::iota
together with std::random_shuffle
:
std::vector<Heavy> v; // given
std::vector<unsigned int> indices(V.size());
std::iota(indices.begin(), indices.end(), 0);
std::random_shuffle(indices.begin(), indices.end());
// use V[indices[0]], V[indices[1]], ..., V[indices[M-1]]
You can supply random_shuffle
with a random number generator of your choice; check the documentation for details.
Quick way of creating a random subset of an array using IntStream
In this case, you want to select streamSize
distinct elements from an input array.
You can use the Random#ints(randomNumberOrigin, randomNumberBound)
method:
Returns an effectively unlimited stream of pseudorandom int values, each conforming to the given origin (inclusive) and bound (exclusive).
This returns a stream of random integers in the specified range. To ensure distinct values, distinct()
is called and limit(...)
allows to only keep the number of elements we want.
Random random = new Random();
Player[] subset = random.ints(0, allPlayers.length)
.distinct()
.limit(subsetSize)
.mapToObj(i -> allPlayers[i])
.toArray(Player[]::new);
I would note however, that, even if this is a one-liner, it is not as efficient as JB Nizet's solution since this continues generating integers until a distinct one is found.
How can I randomly select an item from a list?
Use random.choice()
:
import random
foo = ['a', 'b', 'c', 'd', 'e']
print(random.choice(foo))
For cryptographically secure random choices (e.g., for generating a passphrase from a wordlist), use secrets.choice()
:
import secrets
foo = ['battery', 'correct', 'horse', 'staple']
print(secrets.choice(foo))
secrets
is new in Python 3.6. On older versions of Python you can use the random.SystemRandom
class:
import random
secure_random = random.SystemRandom()
print(secure_random.choice(foo))
Related Topics
How I Save and Retrieve an Image on My Server in a Java Webapp
Printing Even and Odd Using Two Threads in Java
What Does the Registernatives() Method Do
Comparing Strings by Their Alphabetical Order
How to Do an Integer.Parseint() for a Decimal Number
How to Find Out What Type Each Object Is in a Arraylist<Object>
Throw Checked Exceptions from Mocks with Mockito
How to Stop Selenium from Creating Temporary Firefox Profiles Using Web Driver
Memory Leak When Redeploying Application in Tomcat
What Does Maven Do, in Theory and in Practice? When Is It Worth to Use It
Why Doesn't String's Hashcode() Cache 0
Spring Boot Jackson Date and Timestamp Format
Spring Configure @Responsebody JSON Format
How to Load/Reference a File as a File Instance from the Classpath
Convert String to Int Array in Java
What Is the Right Action to Take Upon Closing Windows in Java/Swing