Take N Random Elements from a List<E>

Take n random elements from a ListE?

Two main ways.

  1. Use Random#nextInt(int):

    List<Foo> list = createItSomehow();
    Random random = new Random();
    Foo foo = list.get(random.nextInt(list.size()));

    It's however not guaranteed that successive n calls returns unique elements.

  2. Use Collections#shuffle():

    List<Foo> list = createItSomehow();
    Collections.shuffle(list);
    Foo foo = list.get(0);

    It enables you to get n unique elements by an incremented index (assuming that the list itself contains unique elements).


In case you're wondering if there's a Java 8 Stream approach; no, there isn't a built-in one. There's no such thing as Comparator#randomOrder() in standard API (yet?). You could try something like below while still satisfying the strict Comparator contract (although the distribution is pretty terrible):

List<Foo> list = createItSomehow();
int random = new Random().nextInt();
Foo foo = list.stream().sorted(Comparator.comparingInt(o -> System.identityHashCode(o) ^ random)).findFirst().get();

Better use Collections#shuffle() instead.

Select N random elements from a List efficiently (without toArray and change the list)

You are probably looking for something like Resorvoir Sampling.

Start with an initial array with first k elements, and modify it with new elements with decreasing probabilities:

java like pseudo code:

E[] r = new E[k]; //not really, cannot create an array of generic type, but just pseudo code
int i = 0;
for (E e : list) {
//assign first k elements:
if (i < k) { r[i++] = e; continue; }
//add current element with decreasing probability:
j = random(i++) + 1; //a number from 1 to i inclusive
if (j <= k) r[j] = e;
}
return r;

This requires a single pass on the data, with very cheap ops every iteration, and the space consumption is linear with the required output size.

How can I randomly select an item from a list?

Use random.choice():

import random

foo = ['a', 'b', 'c', 'd', 'e']
print(random.choice(foo))

For cryptographically secure random choices (e.g., for generating a passphrase from a wordlist), use secrets.choice():

import secrets

foo = ['battery', 'correct', 'horse', 'staple']
print(secrets.choice(foo))

secrets is new in Python 3.6. On older versions of Python you can use the random.SystemRandom class:

import random

secure_random = random.SystemRandom()
print(secure_random.choice(foo))

Pick multiple random elements from a list in Java

Try this:

public static List<String> pickNRandom(List<String> lst, int n) {
List<String> copy = new ArrayList<String>(lst);
Collections.shuffle(copy);
return n > copy.size() ? copy.subList(0, copy.size()) : copy.subList(0, n);
}

I'm assuming that there are no repeated elements in the input list, also I take the precaution of shuffling a copy for leaving the original list undisturbed. Use it like this:

List<String> randomPicks = pickNRandom(teamList, 3);

Select Random element from ArrayList print then remove It

Fast and simple, make use of Collections.shuffle

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {
public static void main(String[] args) {
List<String> list = new ArrayList<String>();
list.add("a");
list.add("b");
list.add("c");
list.add("d");
list.add("e");

while (!list.isEmpty()) {
// You don't need to do this on each iteration
// but this is going to give you a really
// random result
Collections.shuffle(list);
String value = list.remove(0);
System.out.println(value);
}
}
}

First run...

e
a
c
d
b

Second run...

a
e
d
b
c

Obviously, each run will give different results

Select N random elements from a ListT in C#

Iterate through and for each element make the probability of selection = (number needed)/(number left)

So if you had 40 items, the first would have a 5/40 chance of being selected. If it is, the next has a 4/39 chance, otherwise it has a 5/39 chance. By the time you get to the end you will have your 5 items, and often you'll have all of them before that.

This technique is called selection sampling, a special case of Reservoir Sampling. It's similar in performance to shuffling the input, but of course allows the sample to be generated without modifying the original data.

How to randomly replace elements of list with another list elements?

You could use numpy.random.choice to randomly select elements where you want the replacements to occur in the original list. Then zip those indexes against the values you want to use for the substitution and apply the replacements.

from numpy.random import choice

def swap_elements(x, t):
new_x = x[:]
for idx, value in zip(choice(range(len(x)), size=len(t), replace=False), t_list):
new_x[idx] = value
return new_x

Example usage

>>> x_list = ['a', 'b', 'c', 'd', 'e']
>>> t_list = ['z', 'y', 'u']
>>> swap_elements(x_list, t_list)
['y', 'u', 'z', 'd', 'e']
>>> swap_elements(x_list, t_list)
['y', 'b', 'z', 'u', 'e']
>>> swap_elements(x_list, t_list)
['y', 'b', 'z', 'u', 'e']
>>> swap_elements(x_list, t_list)
['a', 'u', 'z', 'y', 'e']

Getting random elements from a list while another thread is mutating it

COW list works as follows:

  • It has a single relevant field, which is Object[], containing the elements in the list.
  • Any call that mutates (so, clear, add, addAll, remove, etcetera) will copy that inner array, apply the mutation to the copy, then it sets its field to this new array. Given how java works, the 'old' array still exists, that field is just a pointer.
  • Notably any iterators (for (Foo f : theCowList) is one way to make one) copy over the 'pointer', which means any existing iterators just keep on truckin'. They won't cover whatever you added. They will cover whatever you removed - an iterator made from a cowList will iterate through all elements as there were when you made the iterator regardless of any mutations you perform afterwards. That is the 'point' of a COWList. If you don't need that functionality it is exceedingly unlikely that COWList is the right type for you.

COWList by its nature is atomic and safe for any single call. However, your operation is three calls: size() and then indexOf and size() again. There is no guarantee of atomicity here. Your concern is valid.

General mindset update

What you're doing is so-called 'check-then-act': You check the state of the object (is it empty?), and then you decide how to act.

This is the wrong way to go about it in concurrency land. The proper model is act-then-check: You assume fair weather, perform your action, and then react to invalid state afterwards. This also means it is required for the 'act' part to do the checking intrinsically. If no such intrinsic operation is available, neither model works, and that is the point: In such a case you must find a way to inject synchronization, or the operation is impossible to do safely in a concurrent operation. Example: "Add an element if it is not already in the list" is impossible in a concurrent fashion in a plain jane ArrayList.

A second important realization when working with concurrency is that if you want to shift the concurrency checking to the object itself, then the only way to interact with that object that is safe is single calls. You are making 3 calls. No good. You get just the one.

Here's an example of a different operation that could work here: Let's say you have a method that removes the first item from your list, throwing NSElemEx if the list is empty.

Your 'style' would write this:

public E pollFirst() {
// This code is wrong! do not use!
if (isEmpty()) throw new NoSuchElementException();
return remove(0);
}

This is wrong - you're mixed up 2 calls. You only get one. This is the right way:

public E pollFirst() {
try {
return remove(0);
} catch (ArrayIndexOutOfBoundsException e) {
throw new NoSuchElementException();
}
}

Note how we first act, assuming fair weather: We just remove the 0th element, not knowing if there is even such an element in the first place. If this fails, we react to the failure (if you call remove(0) on an empty COWList, an ArrayIndexOutOfBoundsException falls out, we react to that here).

So how do we do it?

The unfortunate answer is, not possible - COWList doesn't have a single call that does what you want, thus, you're dead in the water. COWList promises certain concurrency safeties; to do this, it has a package private (so you can't see it!) field called lock. Operations such as remove(int index) acquire this lock and thus will cause any threads to e.g. also call remove to freeze until the first thread is done.

You cannot use this lock. This causes all sorts of problems. We're just stuck at this point, and we need to first go back and correct a bunch of style errors you've committed.

Prefer composition over inheritance

You've decided to extend COWList. That means you're saying that any instance of your object acts in all ways like a COWList and offers additional features on top of that. The problem is, COWList and what you want are at odds. In particular, in order to befit the idea of 'you are a COWList with bolted on extras', you'd need to copy its locking behaviour: Your operation of grabbing a random element requires locking out other threads because COWList as an impl simply doesn't offer the API you'd need to do it without a lock, but you can't actually see the lock, which means you just can't do this.

The real problem, though, is that you really do not want to be 'a COWList with additional functionality'. That as a concept is rarely a good idea: COWList is already a complex beast, thus anything that is by definition 'a COWList with addons' is even more complicated. Forget COWList, what you want is much simpler: A datastructure to which you can add things concurrently and grab (concurrency-safe) random elements from it. That's all you want. Why bring COWList's rules into it? You're using COWlist here because it's convenient - most of the work is done for you. That's great, but, don't 'expose' that. Keep implementation details internalized. Thus, write it like so:

public class SomeList<E> extends AbstractList<E> {
private final CopyOnWriteArrayList<E> list = new CopyOnWriteArrayList<E>();
}

Here you just adopt the rules of List itself, but given that your type is named SomeList, I assume you intentionally want that. The mental 'load' of what AbstractList enforces is vastly simpler than what COWList enforces. Now you merely implement all methods you need to, usually as one-liners that just farm out the work to your COWList.

Now we're at least closer to doing it right: You can now make your own lock object or just define in the docs that all operations synchronize on the SomeList instance itself (which means you just add the synchronized keyword to every method - that's shorthand for wrapping the entire method body in synchronized (this) { body-goes-here }.

Sticking a field in a class and implementing many methods as one-liners that just invoke a similar method on said field is called composition, vs adding an extends clause and letting it serve as implementation for most of the methods you intend to expose, which is called inheritance. Prefer composition over inheritance.

Now that you're taking care of the synchronizing on your own, there is absolutely no need for COWList anymore. That inner list can just become a plain jane ArrayList. Much more efficient.

Efficiency

"Just synchronize on all the things" is a bit of a cop-out. synchronized (x) { ... } simply means that only one thread can be in such a block for any given x. Just 'blocking' threads is the easy way out. A really cool datastructure would do some or all of the work without holding the lock. Separately, 'remove item from the middle of a large arraylist' is a slow operation (because all elements that follow must be copied over to one slot earlier; arraylist fundamentally works as a continuous set of elements, hence why removing one in the middle is so pricey).

Thus, even the 'fixed' case of using composition over inheritance is quite inefficient: It uses locks (ouch), and removes elements from the middle of arraylists (double ouch).

Making concurrency-safe data structures that are still fast require three things:

[1] It is quite an artform. You need to come up with very creative ideas.

[2] It's trade-offs, so be sure to be very specific in what you data structure can and cannot do. You often end up with datastructures that take up (way) more space in memory in order to allow lock-free access. Also often seemingly simple operations such as 'how many elements are in you' are impossible to efficiently answer, so don't offer that functionality. There's a reason the java.util.concurrent package has so many classes in it: There is no god-class that can do it all and do it efficiently in both memory, CPU, and lock aspects.

[3] Be clear in the difference between 'trends' and 'guarantees'. For example, it's much easier to write a fast, efficient system that can give you (and remove) a random element from a humongous list of elements that 'trends towards' picking fairly from the bunch, vs. one that guarantees that it is exactly uniformly randomly chosen every time.

Part of the art is to know when you can make do with a trend instead of a guarantee, usually orders-of-magnitude speedups lie in such realizations.

Here are some ideas on how to make hypothetically really fast ones:

Do you add everything, and then you 'switch the mode' and never add another element again, instead only removing from there on out?

If that's the case, a much faster idea:

Have one object that is 'modal'. It starts out with just having an add method (and possibly addAll). Nothing else. Not even .get (don't add API you don't need, it only limits your flexibility when trying to optimize things!). It doesn't even promise concurrency-safe behaviour (calling add from 2 threads simultaneously will fail disastrously at times). When 'done', you call a build() method. This returns a new object that ONLY has poll() method that returns an arbitrary element in a concurrency-safe fashion.

If that's your API you can write it very efficiency and entirely lock free!

Your builder is just a really simple class that contains a plain jane arraylist and fills it:

class RandomPollerBuilder<E> {
private final ArrayList<E> data = new ArrayList<E>();
private boolean open = true;

public void add(E elem) {
if (!open) throw new IllegalStateException();
data.add(elem);
}

public RandomPoller<E> build() {
open = false;
data.trimToSize();
Collection.shuffle(data);
return new RandomPoller<E>(data);
}
}

The random poller then looks something like:

public class RandomPoller<E> {
private final AtomicInteger pos;
private final List<E> input;

RandomPoller<E>(List<E> input) {
this.input = input;
this.pos = new AtomicInteger(input.size());
}

public E poll() {
int idx = pos.decrementAndGet();
if (idx >= 0) return input.get(idx);
pos.incrementAndGet();
throw new NoSuchElementException();
}

public int size() {
return Math.max(0, pos.get());
}
}

Here we combine a boatload of properties:

  • We don't remove at all. Removal tends to be slow, if we can avoid it, why not?
  • We rely on AtomicInteger which gives us a way to guarantee a strict and complete ordering in a concurrent environment in a way that is considerably faster than locks.
  • We reduced the implementation to just the bits you really need. Every method is painful to write. I included a size() method to show this: The way the poll algorithm works is that it increments 0 to -1 or even worse (if multiple threads all simultaneously attempt to poll() an empty poller, that atomicinteger can go lower than -1, but eventually it gets back to 0, so no risk of 'rolling over' our atomic integer as no system is going to have 2 billion threads). We need to guard against this in our size() method.
  • It also means: If you don't particularly need the efficiency, then don't bother with any of this and make globally locking data structures. The cop-out is easy and works (it's just not efficient). Testing concurrency is extremely difficult (because things merely could go wrong in the right circumstance, they aren't guaranteed to. In tests you WANT broken code to actually fail, and therein lies the rub. That's virtually impossible to do!) - so don't go there unless you absolutely know what you are doing and you're sure you need it.


Related Topics



Leave a reply



Submit