Generating multidimensional data
Also check out the copula package. It generates data within a cube or hypercube with uniform margins, but with a correlation structure that you specify. The generated variables can then be transformed to represent other shapes while remaining dependent rather than independent.
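To illustrate the idea (this is not the R copula package itself), here is a minimal sketch of a two-dimensional Gaussian copula using only the Python standard library. The function name gaussian_copula_pairs and the parameter rho are made up for this example.

```python
import math
import random
from statistics import NormalDist


def gaussian_copula_pairs(n, rho, seed=None):
    """Return n pairs (u, v) with uniform(0, 1) margins and Gaussian dependence."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, used for its CDF
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The normal CDF maps each coordinate to a uniform margin.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs


pairs = gaussian_copula_pairs(5000, rho=0.8, seed=1)

# Transform one uniform margin to another shape, here an exponential margin
# via its inverse CDF; the dependence between the coordinates is kept.
transformed = [(-math.log(1.0 - u), v) for u, v in pairs]
```

Both coordinates of pairs stay inside the unit square, and applying an inverse CDF to a margin changes its distribution without making the coordinates independent.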
If you want more complex shapes but are happy with points that are uniform and independent within the shape, you can just do rejection sampling: generate data within a cube that contains your shape, test whether each point is within your shape, reject the ones that are not, and keep going until there are enough points.
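The rejection sampling steps above can be sketched in Python as follows. The helper names rejection_sample and in_unit_ball are hypothetical, and the unit ball is just an example shape.

```python
import random


def rejection_sample(inside, lower, upper, n, seed=None):
    """Draw n points uniformly from the shape described by inside().

    inside: function taking a point (tuple) and returning True if it is in the shape.
    lower, upper: opposite corners of an axis-aligned box that contains the shape.
    """
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        # Propose a point uniformly in the bounding box.
        point = tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
        # Keep it only if it falls inside the shape.
        if inside(point):
            accepted.append(point)
    return accepted


def in_unit_ball(p):
    return sum(x * x for x in p) <= 1.0


points = rejection_sample(in_unit_ball, lower=(-1, -1, -1), upper=(1, 1, 1), n=1000, seed=0)
```

The acceptance rate equals the shape's volume divided by the box's volume (about 52% for the unit ball inside this cube), so a tight bounding box matters, especially as the number of dimensions grows.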
How can I dynamically generate and populate a multi-dimensional array
You can call Array.CreateInstance with the actual ElementType, which is int in this case.
var indices = new[] { 2, 3 };
var arr = Array.CreateInstance(typeof(int), indices);
Then you can populate the array with SetValue without any exception. For example:
var value = 1;
for (int i = 0; i < indices[0]; i++)
{
    for (int j = 0; j < indices[1]; j++)
    {
        arr.SetValue(value++, new[] { i, j });
    }
}
//arr = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]
How do I generate a multidimensional NumPy array with random numbers with the dimension of another array whose dimension is not declared?
Use
np.random.randn(*some_array.shape)
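A small self-contained illustration, where some_array is a stand-in for the array whose shape is not known in advance:

```python
import numpy as np

some_array = np.zeros((2, 3, 4))  # stand-in for an array of unknown shape

# randn takes each dimension as a separate argument, hence the * unpacking.
noise = np.random.randn(*some_array.shape)

# The newer Generator API accepts the shape tuple directly.
rng = np.random.default_rng(0)
noise2 = rng.standard_normal(some_array.shape)
```

Both results have the same shape as some_array, whatever its number of dimensions.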
Generate n-dimensional random numbers in Python
NumPy has multidimensional equivalents to the functions in the random module.
The function you're looking for is numpy.random.normal.
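As a short sketch of how the size argument controls the dimensions (the particular means and standard deviations are arbitrary example values):

```python
import numpy as np

# 1000 points in 3 dimensions, each coordinate drawn from N(0, 1).
points = np.random.normal(loc=0.0, scale=1.0, size=(1000, 3))

# loc and scale broadcast against size, so each dimension can have
# its own mean and standard deviation.
means = np.array([0.0, 5.0, -2.0])
stds = np.array([1.0, 0.1, 3.0])
scaled = np.random.normal(loc=means, scale=stds, size=(1000, 3))
```

Note that the coordinates produced this way are independent of each other; correlated dimensions need a multivariate normal instead.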
Tensorflow: Creating a TensorFlow dataset using multi-dimensional input data with differing length. (Video Data)
To create a dataset of videos of different lengths, I suggest something like this:
import tensorflow as tf

file_names = [str(i) for i in range(20)]

def dummy_read_file(name):
    # Stand-in for real video decoding: returns a random "video" of random length.
    length = tf.random.uniform(shape=[], minval=10, maxval=40, dtype=tf.int32)
    return tf.random.normal(shape=[length, 2, 21, 3])

dataset = tf.data.Dataset.from_tensor_slices(file_names)
dataset = dataset.map(lambda file_name: {"file_name": file_name, "video": dummy_read_file(file_name)})
dataset = dataset.padded_batch(4)

for batch in dataset.as_numpy_iterator():
    print(batch["video"].shape)
# (4, 28, 2, 21, 3)
# (4, 24, 2, 21, 3)
# (4, 27, 2, 21, 3)
# (4, 23, 2, 21, 3)
# (4, 26, 2, 21, 3)
To make batches of similar length for better performance, replace dataset = dataset.padded_batch(4) as follows:
...
dataset = dataset.apply(tf.data.experimental.bucket_by_sequence_length(
    element_length_func=lambda sample: tf.shape(sample["video"])[0],
    bucket_boundaries=[20, 30],
    bucket_batch_sizes=[5, 4, 3],
))
...
for batch in dataset.as_numpy_iterator():
    print(batch["video"].shape)
# (4, 27, 2, 21, 3)
# (5, 16, 2, 21, 3)
# (5, 19, 2, 21, 3)
# (4, 26, 2, 21, 3)
# (2, 11, 2, 21, 3)
Or use tf.data.Dataset.bucket_by_sequence_length in recent TensorFlow versions.
You can also try tf.RaggedTensor, but I cannot recommend it. It may be unstable for very big tensors such as an entire video dataset, and it is practically useless for batches.
For further optimization, precalculate the video lengths and do the bucketing before the files are actually loaded.
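One way this pre-bucketing might look, as a plain Python sketch. It assumes the lengths are already known (for example from video metadata); the function bucket_files_by_length and the file names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_files_by_length(lengths, boundaries):
    """Group file names into buckets using precomputed lengths.

    lengths: dict mapping file name -> number of frames (assumed known up front).
    boundaries: sorted list of bucket boundaries, e.g. [20, 30], which gives
        bucket 0: length < 20, bucket 1: 20 <= length < 30, bucket 2: length >= 30.
    """
    buckets = defaultdict(list)
    for name, length in lengths.items():
        buckets[bisect_right(boundaries, length)].append(name)
    return dict(buckets)


# Hypothetical precomputed lengths; no video file is opened here.
video_lengths = {"a.mp4": 12, "b.mp4": 25, "c.mp4": 31, "d.mp4": 19, "e.mp4": 20}
buckets = bucket_files_by_length(video_lengths, boundaries=[20, 30])
print(buckets)
```

Each bucket's file list could then be turned into its own dataset and padded separately, so that only files of similar length are read and batched together.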
How to create pandas dataframes with more than 2 dimensions?
Rather than using an n-dimensional Panel, you are probably better off using a two-dimensional representation of the data, with MultiIndexes for the index, the columns, or both.
For example:
import numpy as np
import pandas as pd

np.random.seed(1618033)

# Set 3 axis labels/dims
years = np.arange(2000, 2010)  # Years
samples = np.arange(0, 20)  # Samples
patients = np.array(["patient_%d" % i for i in range(0, 3)])  # Patients

# Create random 3D array to simulate data from dims above
A_3D = np.random.random((years.size, samples.size, len(patients)))  # (10, 20, 3)

# Create the MultiIndex from years, samples and patients.
midx = pd.MultiIndex.from_product([years, samples, patients])

# Create sample data for each patient, and add the MultiIndex.
patient_data = pd.DataFrame(np.random.randn(len(midx), 3), index=midx)
>>> patient_data.head()
                         0         1         2
2000 0 patient_0 -0.128005  0.371413 -0.078591
       patient_1 -0.378728 -2.003226 -0.024424
       patient_2  1.339083  0.408708  1.724094
     1 patient_0 -0.997879 -0.251789 -0.976275
       patient_1  0.131380 -0.901092  1.456144
Once you have data in this form, it is relatively easy to juggle it around. For example:
>>> patient_data.unstack(level=0).head() # Years.
                    0                                                                                            ...         2
                 2000      2001      2002      2003      2004      2005      2006      2007      2008      2009  ...      2000      2001      2002      2003      2004      2005      2006      2007      2008      2009
0 patient_0 -0.128005  0.051558  1.251120  0.666061 -1.048103  0.259231  1.535370  0.156281 -0.609149  0.360219  ... -0.078591 -2.305314 -2.253770  0.865997  0.458720  1.479144 -0.214834 -0.791904  0.800452  0.235016
  patient_1 -0.378728 -0.117470 -0.306892  0.810256  2.702960 -0.748132 -1.449984 -0.195038  1.151445  0.301487  ... -0.024424  0.114843  0.143700  1.732072  0.602326  1.465946 -1.215020  0.648420  0.844932 -1.261558
  patient_2  1.339083 -0.915771  0.246077  0.820608 -0.935617 -0.449514 -1.105256 -0.051772 -0.671971  0.213349  ...  1.724094  0.835418  0.000819  1.149556 -0.318513 -0.450519 -0.694412 -1.535343  1.035295  0.627757
1 patient_0 -0.997879 -0.242597  1.028464  2.093807  1.380361  0.691210 -2.420800  1.593001  0.925579  0.540447  ... -0.976275  1.928454 -0.626332 -0.049824 -0.912860  0.225834  0.277991  0.326982 -0.520260  0.788685
  patient_1  0.131380  0.398155 -1.671873 -1.329554 -0.298208 -0.525148  0.897745 -0.125233 -0.450068 -0.688240  ...  1.456144 -0.503815 -1.329334  0.475751 -0.201466  0.604806 -0.640869 -1.381123  0.524899  0.041983
To select data, please refer to the docs for MultiIndexing.
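A few common selection patterns, sketched on a frame with the same index layout as above. The data is regenerated here with unseeded random values, so the numbers will differ from the printed output.

```python
import numpy as np
import pandas as pd

years = np.arange(2000, 2010)
samples = np.arange(0, 20)
patients = ["patient_%d" % i for i in range(3)]
midx = pd.MultiIndex.from_product([years, samples, patients])
patient_data = pd.DataFrame(np.random.randn(len(midx), 3), index=midx)

# All rows for one year (the year level is dropped from the result).
year_2000 = patient_data.loc[2000]

# One patient across all years and samples: cross-section on index level 2.
patient_1 = patient_data.xs("patient_1", level=2)

# A range of years for one patient, using IndexSlice on the sorted index.
idx = pd.IndexSlice
subset = patient_data.loc[idx[2000:2004, :, "patient_0"], :]
```

Label slices such as 2000:2004 include both endpoints, so subset covers five years.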
How to generate normal distributed multidimensional points
Use the function rmnorm() from the package mnormt:
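set.seed(2)
require(mnormt)
# Build a symmetric 2x2 matrix to use as the covariance.
# Note: symmetry alone does not guarantee positive definiteness, which a valid covariance matrix needs.
varcov <- matrix(rchisq(4, 2), 2)
varcov <- varcov + t(varcov)
# Draw 1000 points from a bivariate normal with mean (0, 1).
rmnorm(1000, mean=c(0,1), varcov=varcov)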
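For comparison, a rough Python counterpart using NumPy rather than mnormt. The covariance here is built as A @ A.T, which is a different construction from the R snippet, chosen because it is always symmetric positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(2)

mean = np.array([0.0, 1.0])

# A covariance matrix must be symmetric positive semi-definite;
# building it as A @ A.T guarantees that.
A = rng.standard_normal((2, 2))
cov = A @ A.T

# 1000 points from a bivariate normal with the given mean and covariance.
points = rng.multivariate_normal(mean, cov, size=1000)
```

The result is an array of shape (1000, 2), one row per generated point.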