Generating Multidimensional Data

Generating multidimensional data

Also check out the copula package. It generates data within a cube/hypercube with uniform margins, but with a correlation structure that you set. The generated variables can then be transformed to follow other marginal distributions while keeping the dependence between them, rather than being independent.
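
The copula package is R; the same idea can be sketched in Python with NumPy/SciPy. Everything below (the correlation value, the exponential margins) is only an illustration of the technique, not the package's API:

import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(0)

# Correlation of the underlying Gaussian copula (example value)
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])

# Correlated normals -> uniform margins on the unit square via the normal CDF
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=corr, size=1000)
u = norm.cdf(z)          # uniform margins, but still correlated

# Optionally transform the margins to other distributions (here exponential)
# while keeping the dependence structure
x = expon.ppf(u, scale=[1.0, 2.0])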

If you want more complex shapes, but are happy with the points being uniform and independent within the shape, then you can just do rejection sampling: generate points within a cube that contains your shape, test whether each point lies inside the shape, reject it if not, and keep going until you have enough points.
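
A minimal NumPy sketch of that rejection loop (the helper name and the unit-ball test are only illustrative):

import numpy as np

def sample_in_shape(n, dim, inside, low=-1.0, high=1.0, seed=None):
    # Draw uniformly in the bounding hypercube, keep the points the `inside`
    # predicate accepts, and repeat until we have n of them.
    rng = np.random.default_rng(seed)
    points = np.empty((0, dim))
    while len(points) < n:
        candidates = rng.uniform(low, high, size=(n, dim))
        points = np.vstack([points, candidates[inside(candidates)]])
    return points[:n]

# Example: uniform points inside the 3D unit ball
ball = sample_in_shape(1000, 3, inside=lambda p: (p ** 2).sum(axis=1) <= 1.0)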

How can I dynamically generate and populate a multi-dimensional array

You can call Array.CreateInstance with the actual element type, which is int in this case:

var indices = new[] { 2, 3 };                
var arr = Array.CreateInstance(typeof(int), indices);

Then you can populate the array with SetValue without getting an exception. For example:

var value = 1;
for (int i = 0; i < indices[0]; i++)
{
    for (int j = 0; j < indices[1]; j++)
    {
        arr.SetValue(value++, new[] { i, j });
    }
}

// arr = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]

How do I generate a multidimensional NumPy array with random numbers with the dimension of another array whose dimension is not declared?

Use

np.random.randn(*some_array.shape)
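
For instance (the (4, 5, 6) shape below is only an example):

import numpy as np

some_array = np.zeros((4, 5, 6))                  # shape never written out by hand
random_like = np.random.randn(*some_array.shape)  # same shape, standard normal values
print(random_like.shape)                          # (4, 5, 6)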

Generate n-dimensional random numbers in Python

NumPy has multidimensional equivalents of the functions in the standard random module.

The function you're looking for is numpy.random.normal, which takes a size argument giving the output shape.
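
A minimal example (the shape and parameters are illustrative):

import numpy as np

# 3x4x5 block of samples from a normal distribution with mean 0 and std 2
samples = np.random.normal(loc=0.0, scale=2.0, size=(3, 4, 5))
print(samples.shape)  # (3, 4, 5)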

Tensorflow: Creating a TensorFlow dataset using multi-dimensional input data with differing length. (Video Data)

To create a dataset of videos of different lengths, I suggest something like this:

import tensorflow as tf

file_names = [str(i) for i in range(20)]

def dummy_read_file(name):
    length = tf.random.uniform(shape=[], minval=10, maxval=40, dtype=tf.int32)
    return tf.random.normal(shape=[length, 2, 21, 3])

dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.map(lambda file_name: {"file_name": file_name, "video": dummy_read_file(file_name)})
dataset = dataset.padded_batch(4)

for batch in dataset.as_numpy_iterator():
print(batch["video"].shape)

# (4, 28, 2, 21, 3)
# (4, 24, 2, 21, 3)
# (4, 27, 2, 21, 3)
# (4, 23, 2, 21, 3)
# (4, 26, 2, 21, 3)

To make batches of videos with similar lengths for better performance,
replace dataset = dataset.padded_batch(4) as follows:

...
dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[5, 4, 3],
))
...

for batch in dataset.as_numpy_iterator():
print(batch["video"].shape)

# (4, 27, 2, 21, 3)
# (5, 16, 2, 21, 3)
# (5, 19, 2, 21, 3)
# (4, 26, 2, 21, 3)
# (2, 11, 2, 21, 3)

Or use tf.data.Dataset.bucket_by_sequence_length directly on the dataset in recent TensorFlow versions.
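
With the non-experimental method the call would look roughly like this (same arguments as above, just without the apply/experimental wrapper):

...
dataset = dataset.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[5, 4, 3],
)
...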

You can also try tf.RaggedTensor, but I cannot recommend it: it may be unstable for very big tensors (like an entire video dataset) and is practically useless for batches.

For further optimization, precompute the video lengths so that the bucketing can happen before the files are actually loaded.

How to create pandas dataframes with more than 2 dimensions?

Rather than using an n-dimensional Panel, you are probably better off with a two-dimensional representation of the data that uses a MultiIndex for the index, the columns, or both.

For example:

import numpy as np
import pandas as pd

np.random.seed(1618033)

# Set 3 axis labels/dims
years = np.arange(2000, 2010)                                  # Years
samples = np.arange(0, 20)                                     # Samples
patients = np.array(["patient_%d" % i for i in range(0, 3)])   # Patients

# Create a random 3D array to simulate data with the dims above
A_3D = np.random.random((years.size, samples.size, len(patients)))  # (10, 20, 3)

# Create the MultiIndex from years, samples and patients.
midx = pd.MultiIndex.from_product([years, samples, patients])

# Create sample data for each patient, and add the MultiIndex.
patient_data = pd.DataFrame(np.random.randn(len(midx), 3), index=midx)

>>> patient_data.head()
                          0         1         2
2000 0 patient_0  -0.128005  0.371413 -0.078591
       patient_1  -0.378728 -2.003226 -0.024424
       patient_2   1.339083  0.408708  1.724094
     1 patient_0  -0.997879 -0.251789 -0.976275
       patient_1   0.131380 -0.901092  1.456144

Once you have data in this form, it is relatively easy to juggle it around. For example:

>>> patient_data.unstack(level=0).head()  # Years.
                    0                                                                                              ...          2
                 2000      2001      2002      2003      2004      2005      2006      2007      2008      2009    ...       2000      2001      2002      2003      2004      2005      2006      2007      2008      2009
0 patient_0 -0.128005  0.051558  1.251120  0.666061 -1.048103  0.259231  1.535370  0.156281 -0.609149  0.360219    ...  -0.078591 -2.305314 -2.253770  0.865997  0.458720  1.479144 -0.214834 -0.791904  0.800452  0.235016
  patient_1 -0.378728 -0.117470 -0.306892  0.810256  2.702960 -0.748132 -1.449984 -0.195038  1.151445  0.301487    ...  -0.024424  0.114843  0.143700  1.732072  0.602326  1.465946 -1.215020  0.648420  0.844932 -1.261558
  patient_2  1.339083 -0.915771  0.246077  0.820608 -0.935617 -0.449514 -1.105256 -0.051772 -0.671971  0.213349    ...   1.724094  0.835418  0.000819  1.149556 -0.318513 -0.450519 -0.694412 -1.535343  1.035295  0.627757
1 patient_0 -0.997879 -0.242597  1.028464  2.093807  1.380361  0.691210 -2.420800  1.593001  0.925579  0.540447    ...  -0.976275  1.928454 -0.626332 -0.049824 -0.912860  0.225834  0.277991  0.326982 -0.520260  0.788685
  patient_1  0.131380  0.398155 -1.671873 -1.329554 -0.298208 -0.525148  0.897745 -0.125233 -0.450068 -0.688240    ...   1.456144 -0.503815 -1.329334  0.475751 -0.201466  0.604806 -0.640869 -1.381123  0.524899  0.041983

In order to select the data, please refer to the docs on MultiIndexing.
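
For example, a few common selections on the patient_data frame above (just a sketch; see the MultiIndexing docs for the full range of options):

# All rows for the year 2000
patient_data.loc[2000]

# A single row: year 2000, sample 0, patient_0
patient_data.loc[(2000, 0, "patient_0")]

# A cross-section: every row for patient_0, across all years and samples
patient_data.xs("patient_0", level=2)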

How to generate normal distributed multidimensional points

Package mnormt, function rmnorm()

set.seed(2)
require(mnormt)

# Build a random symmetric 2x2 matrix to use as the covariance
varcov <- matrix(rchisq(4, 2), 2)
varcov <- varcov + t(varcov)

# Draw 1000 bivariate normal points with means (0, 1)
rmnorm(1000, mean = c(0, 1), varcov = varcov)
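
For reference (not part of the original R answer), the NumPy equivalent is np.random.multivariate_normal; the mean and covariance below are just example values:

import numpy as np

mean = [0.0, 1.0]
cov = [[2.0, 0.5],
       [0.5, 1.0]]   # any symmetric positive semi-definite matrix works

points = np.random.multivariate_normal(mean, cov, size=1000)   # shape (1000, 2)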

