Sampling a random subset from an array
I suggest shuffling a copy of the array using the Fisher-Yates shuffle and taking a slice:
function getRandomSubarray(arr, size) {
    // Work on a copy so the caller's array is not modified
    var shuffled = arr.slice(0), i = arr.length, temp, index;
    while (i--) {
        // Pick a random index in [0, i] and swap that element into position i
        index = Math.floor((i + 1) * Math.random());
        temp = shuffled[index];
        shuffled[index] = shuffled[i];
        shuffled[i] = temp;
    }
    return shuffled.slice(0, size);
}
var x = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];
var fiveRandomMembers = getRandomSubarray(x, 5);
Note that this is not the most efficient method for getting a small random subset of a large array, because it shuffles the whole array unnecessarily. For better performance, you could do a partial shuffle instead:
function getRandomSubarray(arr, size) {
    var shuffled = arr.slice(0), i = arr.length, min = i - size, temp, index;
    // Only the last `size` positions are shuffled; together they form the sample
    while (i-- > min) {
        index = Math.floor((i + 1) * Math.random());
        temp = shuffled[index];
        shuffled[index] = shuffled[i];
        shuffled[i] = temp;
    }
    return shuffled.slice(min);
}
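For readers working in Python, the same partial Fisher-Yates idea can be sketched as follows. The function name random_subarray is hypothetical, and the sketch assumes 0 <= size <= len(arr); in practice Python's random.sample already does sampling without replacement, so this is only meant to mirror the JavaScript version above step by step.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def random_subarray(arr: Sequence[T], size: int,
                    rng: Optional[random.Random] = None) -> List[T]:
    """Return `size` elements of `arr` chosen by a partial Fisher-Yates shuffle."""
    if not 0 <= size <= len(arr):
        raise ValueError("size must be between 0 and len(arr)")
    rng = rng if rng is not None else random.Random()
    shuffled = list(arr)              # copy, like arr.slice(0) in the JS version
    n = len(shuffled)
    # Visit positions n-1 down to n-size, i.e. only the last `size` slots
    for i in range(n - 1, n - size - 1, -1):
        j = rng.randint(0, i)         # uniform index in [0, i], both ends inclusive
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled[n - size:]
```

The loop does only `size` swaps, so the cost is proportional to the sample size plus the copy, rather than to a full shuffle.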
Efficient random sampling from a huge list
Elaborating on aix's answer above: to choose k out of a stream of items, read the items one at a time. Keep the first k items in a set S.
Now when reading the m-th item I (m > k now), keep it with probability k/m. If you do keep it, select an item U uniformly at random from S, and replace U with I.
The proof that this yields all subsets of size k with equal probability is based on induction on m. Note that you don't need to know n (the total number of items) in advance, and that S at each step is a valid uniform sample of the items read so far. The algorithm is "streaming": it doesn't require storing all items or making a second pass.
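The scheme described above (commonly known as reservoir sampling) can be sketched in Python as follows. This is a minimal version assuming the stream is any Python iterable; the name reservoir_sample is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Uniformly sample k items from an iterable of unknown length in one pass."""
    rng = rng if rng is not None else random.Random()
    sample: List[T] = []                         # this is the set S
    for m, item in enumerate(stream, start=1):   # m = number of items read so far
        if m <= k:
            sample.append(item)                  # keep the first k items
        elif rng.random() < k / m:               # keep the m-th item with probability k/m
            sample[rng.randrange(k)] = item      # replace a uniformly chosen U in S
    return sample
```

Only k items are held in memory at any time, and if the stream has fewer than k items, all of them are returned.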
Random subset of a fixed set in Julia
The best method I can think of for sampling without replacement is to use the sample method from StatsBase. Unfortunately, this method currently only works for indexable collections, so you would have to convert your Set to an Array first and your sample back to a Set.
using StatsBase
A = Set([1, 2, 3, 4, 5])
S = Set(sample(collect(A), 3, replace = false))
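For comparison, Python's standard library has the same restriction: since Python 3.11, random.sample requires a sequence, so a set must be converted first. A minimal sketch of the same round trip; sorting before sampling is an extra choice (not part of the Julia answer) that makes the result repeatable under a fixed seed, because set iteration order is not guaranteed.

```python
import random

A = {1, 2, 3, 4, 5}
rng = random.Random(0)             # seeded only so the run is repeatable
S = set(rng.sample(sorted(A), 3))  # Set -> list -> sample without replacement -> Set
print(S)
```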
Sample from randomly generated numbers?
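From your code, the resulting x is a numpy array. You can check this with type(x), which gives numpy.ndarray.
But the function random.sample(population, k) only accepts a sequence, such as a list, tuple, string, or range. (Older Python versions also accepted a set; that was deprecated in Python 3.9 and removed in Python 3.11.) A numpy array is not accepted as a sequence here, so your code could be: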
import random
import numpy as np
N = 1000
x = 1 + 500*np.random.rand(N)
x = list(x)
sp_x = random.sample(x, 100)
print(len(sp_x))
output: 100
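Alternatively, the conversion to a list can be avoided by sampling with NumPy itself. A sketch using the Generator API (numpy.random.default_rng, available since NumPy 1.17); replace=False gives sampling without replacement, like random.sample, and the result stays a NumPy array.

```python
import numpy as np

rng = np.random.default_rng()
N = 1000
x = 1 + 500 * rng.random(N)       # same distribution as 1 + 500*np.random.rand(N)
sp_x = rng.choice(x, size=100, replace=False)  # 100 distinct positions of x
print(type(sp_x), len(sp_x))
```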
NumPy: Fastest way to generate random subset that includes particular element
Approach #1
Well, the problem is such that i has to be there and the rest must be random, but uniformly distributed. So, we can use np.random.choice with replace=False on the rest and append i. Then, randomize it and that's our output. Also, no iterations are needed.
Hence, it would be:
import numpy as np

def create_rand_ar(n,k,i):
    sel_ar = np.r_[:i,i+1:n]  # every index except i
    sel_ar_incl_i = np.r_[i,np.random.choice(sel_ar, k-1, replace=False)]
    np.random.shuffle(sel_ar_incl_i) # skip if order does not matter
    return sel_ar_incl_i
To verify that i is always there and the rest have equal probabilities of being in the output, here's a run over a large number of iterations, checking the count of occurrences, which should be uniform:
In [84]: n = 7
...: k = 4
...: i = 2
In [85]: outputs = np.array([create_rand_ar(n,k,i) for _ in range(10000)])
In [87]: np.bincount(outputs.ravel())
Out[87]: array([ 5023, 5061, 10000, 4992, 4902, 5006, 5016])
Approach #2
Another way would be to create a uniform random array in [0,1) and set the i-th element to something < 0. Then, do an efficient argpartition and select the first k elements, which guarantees the inclusion of i, and that's our output. Hence, it would be:
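def create_rand_ar_v2(n,k,i):
    r = np.random.rand(n)
    r[i] = -1  # smallest value, so i is always among the k smallest
    # kth=k-1 places the k smallest values in the first k positions (also valid for k == n)
    return r.argpartition(k-1)[:k]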
Verify distribution:
In [168]: outputs = np.array([create_rand_ar_v2(n,k,i) for _ in range(10000)])
In [169]: np.bincount(outputs.ravel())
Out[169]: array([ 4946, 5055, 10000, 5071, 4972, 5038, 4918])
Timings:
In [165]: n = 7
...: k = 4
...: i = 2
In [166]: %timeit create_rand_ar(n,k,i)
10000 loops, best of 3: 107 µs per loop
In [167]: %timeit create_rand_ar_v2(n,k,i)
100000 loops, best of 3: 2.27 µs per loop
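A variant of Approach #1 that avoids building the np.r_ index array: draw k-1 distinct values from range(n-1) and shift every value >= i up by one, which maps them uniformly onto the indices other than i. This is a sketch using the newer Generator API; the name create_rand_ar_v3 is hypothetical, and it has not been timed against the versions above.

```python
import numpy as np


def create_rand_ar_v3(n, k, i, rng=None):
    """Return k distinct indices from range(n) that always include i."""
    rng = np.random.default_rng() if rng is None else rng
    others = rng.choice(n - 1, size=k - 1, replace=False)  # distinct values in [0, n-2]
    others[others >= i] += 1      # skip over i: values now lie in range(n) minus {i}
    out = np.concatenate(([i], others))
    rng.shuffle(out)              # skip if order does not matter
    return out
```

Because the shift is a bijection from range(n-1) onto the indices other than i, each of those indices is still selected with probability (k-1)/(n-1).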
Randomly sample sub-arrays from a 2D array in python
Here is a sampler that creates a sample cut from an array of any dimensionality. It uses functions to control where to start the cut and how wide the cut should be along each axis.
Here is an explanation of the parameters:
arr - the input numpy array.
loc_sampler_fn - the function you want to use to set the corner of the box. If you want the corner of the box to be sampled uniformly from anywhere along the axis, use np.random.uniform. If you want the corner to be closer to the center of the array, use np.random.normal. However, we need to tell the function what range to sample over. This brings us to the next parameter.
loc_dim_param - this passes the size of each axis to loc_sampler_fn. If we are using np.random.uniform for the location sampler, we want to sample from the entire range of the axis. np.random.uniform has two parameters, low and high, so by passing the length of the axis to high, it samples uniformly over the entire axis. In other words, if the axis has length 120, we want np.random.uniform(low=0, high=120), so we would set loc_dim_param='high'.
loc_params - this passes any additional parameters to loc_sampler_fn. Keeping with the example, we need to pass low=0 to np.random.uniform, so we pass the dictionary loc_params={'low':0}.
shape_sampler_fn, shape_dim_param, shape_params - these work the same way as the loc_* parameters, but set the width of the cut along each axis. For example, to draw widths uniformly, use shape_sampler_fn=np.random.uniform, with shape_dim_param=None since we are not using the size of the axis for anything, and shape_params={'low':3, 'high':11} (the draw is truncated with int(), so this gives widths from 3 to 10).
def box_sampler(arr,
                loc_sampler_fn,
                loc_dim_param,
                loc_params,
                shape_sampler_fn,
                shape_dim_param,
                shape_params):
    '''
    Extracts a sample cut from `arr`.
    Parameters:
    -----------
    arr : array
        The numpy array to sample a cut from.
    loc_sampler_fn : function
        The function to determine where the minimum coordinate
        for each axis should be placed.
    loc_dim_param : string or None
        The parameter in `loc_sampler_fn` that should use the axis's
        dimension size.
    loc_params : dict
        Parameters to pass to `loc_sampler_fn`.
    shape_sampler_fn : function
        The function to determine the width of the sample cut
        along each axis.
    shape_dim_param : string or None
        The parameter in `shape_sampler_fn` that should use the
        axis's dimension size.
    shape_params : dict
        Parameters to pass to `shape_sampler_fn`.
    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
        the sampled subsection with the same dimensionality of arr.
        slices :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    slices = []
    for dim in arr.shape:
        if loc_dim_param:
            loc_params.update({loc_dim_param: dim})
        if shape_dim_param:
            shape_params.update({shape_dim_param: dim})
        start = int(loc_sampler_fn(**loc_params))
        stop = start + int(shape_sampler_fn(**shape_params))
        slices.append(slice(start, stop))
    # NumPy requires a tuple (not a list) of slices for multidimensional indexing
    return slices, arr[tuple(slices)]
Example of a uniform cut on a 2D array with widths between 3 and 9:
import numpy as np
a = np.random.randint(0, 1+1, size=(100,150))
box_sampler(a,
np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':3, 'high':10})
# returns:
([slice(49, 55, None), slice(86, 89, None)],
array([[0, 0, 1],
[0, 1, 1],
[0, 0, 0],
[0, 0, 1],
[1, 1, 1],
[1, 1, 0]]))
Example of taking 2x2x2 chunks from a 10x20x30 3D array:
a = np.random.randint(0,2,size=(10,20,30))
box_sampler(a, np.random.uniform, 'high', {'low':0},
np.random.uniform, None, {'low':2, 'high':2})
# returns:
([slice(7, 9, None), slice(9, 11, None), slice(19, 21, None)],
array([[[0, 1],
[1, 0]],
[[0, 1],
[1, 1]]]))
Update based on the comments.
For your specific purpose, it looks like you want a rectangular sample where the starting corner is uniformly sampled from anywhere in the array, and the width of the sample along each axis is uniformly sampled, but can be limited. Here is a function that generates these samples. min_width and max_width can accept a tuple or list of integers, or a single integer.
import numpy as np

def uniform_box_sampler(arr, min_width, max_width):
    '''
    Extracts a sample cut from `arr`.
    Parameters:
    -----------
    arr : array
        The numpy array to sample a box from
    min_width : int or tuple
        The minimum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`
    max_width : int or tuple
        The maximum width of the box along a given axis.
        If a tuple of integers is supplied, it must have the
        same length as the number of dimensions of `arr`
    Returns:
    --------
    (slices, x) : A tuple of the slices used to cut the sample as well as
        the sampled subsection with the same dimensionality of arr.
        slices :: list of slice objects
        x :: array object with the same ndims as arr
    '''
    if isinstance(min_width, (tuple, list)):
        assert len(min_width)==arr.ndim, 'Dimensions of `min_width` and `arr` must match'
    else:
        min_width = (min_width,)*arr.ndim
    if isinstance(max_width, (tuple, list)):
        assert len(max_width)==arr.ndim, 'Dimensions of `max_width` and `arr` must match'
    else:
        max_width = (max_width,)*arr.ndim
    slices = []
    for dim, mn, mx in zip(arr.shape, min_width, max_width):
        start = int(np.random.uniform(0,dim))
        # int() truncates, so the width is an integer from mn to mx inclusive
        stop = start + int(np.random.uniform(mn, mx+1))
        slices.append(slice(start, stop))
    # NumPy requires a tuple (not a list) of slices for multidimensional indexing
    return slices, arr[tuple(slices)]
Example of generating a box cut that starts uniformly anywhere in the array, where the height is a random uniform draw from 1 to 4 and the width is a random uniform draw from 2 to 6 (just to show). In this case, the box was 3 by 4, starting at the 66th row and 19th column.
x = np.random.randint(0,2,size=(100,100))
uniform_box_sampler(x, (1,2), (4,6))
# returns:
([slice(65, 68, None), slice(18, 22, None)],
array([[1, 0, 0, 0],
[0, 0, 1, 1],
[0, 1, 1, 0]]))
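In both samplers above, the start is drawn from the whole axis, so a box that starts near the far edge is silently truncated by slicing (for example, a box starting at row 98 of a 100-row array has at most 2 rows). If every box must have the requested size, one option is to draw the width first and then restrict the start so the box fits. A sketch of that variant, assuming max_width never exceeds the axis length; the name uniform_box_sampler_inbounds is hypothetical.

```python
import numpy as np


def uniform_box_sampler_inbounds(arr, min_width, max_width, rng=None):
    """Sample a box with widths in [min_width, max_width] that lies fully inside arr."""
    rng = np.random.default_rng() if rng is None else rng
    # Accept a single int or one value per axis
    mins = np.broadcast_to(min_width, (arr.ndim,))
    maxs = np.broadcast_to(max_width, (arr.ndim,))
    slices = []
    for dim, mn, mx in zip(arr.shape, mins, maxs):
        if not 1 <= mn <= mx <= dim:
            raise ValueError("need 1 <= min_width <= max_width <= axis length")
        width = int(rng.integers(mn, mx, endpoint=True))          # inclusive of mx
        start = int(rng.integers(0, dim - width, endpoint=True))  # keeps start + width <= dim
        slices.append(slice(start, start + width))
    return slices, arr[tuple(slices)]
```

The trade-off is that start positions are no longer uniform over the whole axis: corners close to the far edge are only possible for small widths.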