How to "Extract" Values from a Multidimensional Array in a Smart Way

How can I extract values from a multidimensional array in a smart way?

You can use Array#collect to execute a block for each element of the outer array. To get the first element of each inner array, pass a block that indexes it.

arr.collect {|ind| ind[0]}

In use:


arr = [["value1", "value1_other"], ["value2", "value2_other"], ["value3", "value3_other"]]
=> [["value1", "value1_other"], ["value2", "value2_other"], ["value3", "value3_other"]]
arr.collect {|ind| ind[0]}
=> ["value1", "value2", "value3"]

Instead of {|ind| ind[0]}, you can use Array#first to get the first element of each inner array:

arr.collect(&:first)

For the &:first syntax, read "Ruby/Ruby on Rails ampersand colon shortcut".

How to Flatten a Multidimensional Array?

You can use the Standard PHP Library (SPL) to "hide" the recursion.

$a = array(1,2,array(3,4, array(5,6,7), 8), 9);
$it = new RecursiveIteratorIterator(new RecursiveArrayIterator($a));
foreach($it as $v) {
    echo $v, " ";
}

prints

1 2 3 4 5 6 7 8 9 
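For comparison, here is a minimal sketch in Python (not PHP) of the recursion that the SPL iterators hide. The name flatten is only an illustration and is not part of any library; the sketch assumes the nesting consists of lists or tuples.

```python
def flatten(values):
    """Yield the leaves of arbitrarily nested lists/tuples, depth-first."""
    for value in values:
        if isinstance(value, (list, tuple)):
            # Recurse into the nested container and pass its leaves through.
            yield from flatten(value)
        else:
            yield value


nested = [1, 2, [3, 4, [5, 6, 7], 8], 9]
flat = list(flatten(nested))
print(*flat)  # prints: 1 2 3 4 5 6 7 8 9
```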

Extract indices of intersecting array from numpy 2D array in subarray

Consider the easy case when all the values are distinct:

A = np.arange(25).reshape(5,5)
ans = [1,3,4]
B = A[np.ix_(ans, ans)]

In [287]: A
Out[287]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [288]: B
Out[288]:
array([[ 6,  8,  9],
       [16, 18, 19],
       [21, 23, 24]])

If we test the first row of B with each row of A, we will eventually come to the
comparison of [6, 8, 9] with [5, 6, 7, 8, 9] from which we can glean the
candidate solution of indices [1, 3, 4].

We can generate a set of all possible candidate solutions by pairing the first
row of B with each row of A.

If there is only one candidate, then we are done, since we are given that B is a
submatrix of A and therefore there is always a solution.

If there is more than one candidate, then we can do the same thing with the
second row of B, and take the intersection of the candidate solutions -- After
all, a solution must be a solution for each and every row of B.

Thus we can loop through the rows of B and short-circuit once we find there
is only one candidate. Again, we are assuming that B is always a submatrix of A.
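The candidate-intersection idea can be sketched in plain Python for the easy case where all values are distinct. This is only an illustration under that assumption; row_candidates and candidates_for are hypothetical helper names, and nested lists stand in for the NumPy arrays.

```python
def row_candidates(row_a, row_b):
    """Return the sorted positions of row_b's values in row_a, or None
    if some value of row_b is missing. Assumes distinct values in row_a."""
    positions = {value: i for i, value in enumerate(row_a)}
    if all(b in positions for b in row_b):
        return tuple(sorted(positions[b] for b in row_b))
    return None


def candidates_for(A, row_b):
    """Collect the candidate index tuples that row_b yields against each row of A."""
    found = set()
    for row_a in A:
        candidate = row_candidates(row_a, row_b)
        if candidate is not None:
            found.add(candidate)
    return found


# Same data as the easy case above: A = arange(25).reshape(5, 5), ans = [1, 3, 4]
A = [[5 * r + c for c in range(5)] for r in range(5)]
B = [[6, 8, 9], [16, 18, 19], [21, 23, 24]]

# A solution must be a candidate for every row of B, so intersect the sets.
candidates = candidates_for(A, B[0])
for row_b in B[1:]:
    candidates &= candidates_for(A, row_b)

print(candidates)  # {(1, 3, 4)}
```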

The find_idx function below implements the idea described above:

import itertools as IT
import numpy as np

def find_idx_1d(rowA, rowB):
    # Candidate index tuples: one position in rowA for each value of rowB.
    result = []
    if np.in1d(rowB, rowA).all():
        result = [tuple(sorted(idx))
                  for idx in IT.product(*[np.where(rowA==b)[0] for b in rowB])]
    return result

def find_idx(A, B):
    candidates = set([idx for row in A for idx in find_idx_1d(row, B[0])])
    for Bi in B[1:]:
        if len(candidates) == 1:
            # stop when there is a unique candidate
            return candidates.pop()
        new = [idx for row in A for idx in find_idx_1d(row, Bi)]
        # a solution must be a candidate for every row of B
        candidates = candidates.intersection(new)
    if candidates:
        return candidates.pop()
    raise ValueError('no solution found')

Correctness: The two solutions you've proposed may not always return the correct result, particularly when there are repeated values. For example,

def is_solution(A, B, idx):
    return np.allclose(A[np.ix_(idx, idx)], B)

def find_idx_orig(A, B):
    index = []
    for j in range(len(B)):
        k = 0
        while k<len(A) and set(np.intersect1d(B[j],A[k])) != set(B[j]):
            k+=1
        index.append(k)
    return index

def find_idx_diag(A, B):
    index = []
    a = np.diag(A)
    b = np.diag(B)
    for j in range(len(b)):
        k = 0
        while a[j+k] != b[j] and k<len(A):
            k+=1
        index.append(k+j)
    return index

def counterexample():
    """
    Show find_idx_diag, find_idx_orig may not return the correct result
    """
    A = np.array([[1,2,0],
                  [2,1,0],
                  [0,0,1]])
    ans = [0,1]
    B = A[np.ix_(ans, ans)]
    assert not is_solution(A, B, find_idx_orig(A, B))
    assert is_solution(A, B, find_idx(A, B))

    A = np.array([[1,2,0],
                  [2,1,0],
                  [0,0,1]])
    ans = [1,2]
    B = A[np.ix_(ans, ans)]

    assert not is_solution(A, B, find_idx_diag(A, B))
    assert is_solution(A, B, find_idx(A, B))

counterexample()

Benchmark: Ignoring at our peril the issue of correctness, out of curiosity
let's compare these functions on the basis of speed.

def make_AB(n, m):
    A = symmetrize(np.random.random((n, n)))
    ans = np.sort(np.random.choice(n, m, replace=False))
    B = A[np.ix_(ans, ans)]
    return A, B

def symmetrize(a):
    "http://stackoverflow.com/a/2573982/190597 (EOL)"
    return a + a.T - np.diag(a.diagonal())

if __name__ == '__main__':
    counterexample()
    A, B = make_AB(500, 450)
    assert is_solution(A, B, find_idx(A, B))

In [283]: %timeit find_idx(A, B)
10 loops, best of 3: 74 ms per loop

In [284]: %timeit find_idx_orig(A, B)
1 loops, best of 3: 14.5 s per loop

In [285]: %timeit find_idx_diag(A, B)
100 loops, best of 3: 2.93 ms per loop

So find_idx is much faster than find_idx_orig, but not as fast as
find_idx_diag.

Matlab - Accessing a part of a multidimensional array

Using comma-separated lists, you can make this quicker and more readable:

% some test data
ind1 = [2 1 5 4];
ind2 = [3 20 5 7];
X = randi(99,20,20,20,20);

% get all subscripts in column format
vecs = arrayfun(@colon,ind1,ind2,'un',0);
% extract the values
result = X(vecs{:});

Multi-dimensional arrays in Bash

Bash does not support multidimensional arrays, nor hashes, and it seems that you want a hash whose values are arrays. This solution is not very elegant; a solution using an XML file would probably be better:

array=('d1=(v1 v2 v3)' 'd2=(v1 v2 v3)')
for elt in "${array[@]}";do eval $elt;done
echo "d1 ${#d1[@]} ${d1[@]}"
echo "d2 ${#d2[@]} ${d2[@]}"

EDIT: this answer is quite old. Bash 4 and later support hash tables; see also this answer for a solution without eval.

PHP Multidimensional Array Searching (Find key by specific value)

Very simple:

function myfunction($products, $field, $value)
{
    foreach($products as $key => $product)
    {
        if ( $product[$field] === $value )
            return $key;
    }
    return false;
}

Selecting Random Windows from Multidimensional Numpy Array Rows

Here's one leveraging np.lib.stride_tricks.as_strided -

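import numpy as np

def random_windows_per_row_strided(arr, W=3):
    # one random start column per row, so that a window of width W fits
    idx = np.random.randint(0,arr.shape[1]-W+1, arr.shape[0])
    strided = np.lib.stride_tricks.as_strided
    m,n = arr.shape
    s0,s1 = arr.strides
    # view of all length-W windows in each row, without copying the data
    windows = strided(arr, shape=(m,n-W+1,W), strides=(s0,s1,s1))
    return windows[np.arange(len(idx)), idx]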
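As a reference for what the strided version computes, here is a plain-Python sketch that picks one random window of width W from each row with an explicit loop. It is not the NumPy approach itself and will be much slower; the function name and the rng parameter are illustrative assumptions.

```python
import random


def random_windows_per_row(rows, W=3, rng=None):
    """For each row, return a contiguous slice of length W starting at a
    random column chosen so that the window fits inside the row."""
    rng = rng or random.Random()
    windows = []
    for row in rows:
        # Valid starts are 0 .. len(row) - W inclusive.
        start = rng.randrange(len(row) - W + 1)
        windows.append(row[start:start + W])
    return windows


# Four rows of ten consecutive integers each, so windows are easy to check by eye.
rows = [list(range(10 * r, 10 * r + 10)) for r in range(4)]
windows = random_windows_per_row(rows, W=3, rng=random.Random(0))
for window in windows:
    print(window)
```

Each printed window is three consecutive values taken from the corresponding row, which is the same selection the strided view indexes into.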

Runtime test on a bigger array with 100,000 rows -

In [469]: arr = np.random.rand(100000,100)

# @Psidom's soln
In [470]: %timeit select_random_windows(arr, window_size=3)
100 loops, best of 3: 7.41 ms per loop

In [471]: %timeit random_windows_per_row_strided(arr, W=3)
100 loops, best of 3: 6.84 ms per loop

# @Psidom's soln
In [472]: %timeit select_random_windows(arr, window_size=30)
10 loops, best of 3: 26.8 ms per loop

In [473]: %timeit random_windows_per_row_strided(arr, W=30)
100 loops, best of 3: 9.65 ms per loop

# @Psidom's soln
In [474]: %timeit select_random_windows(arr, window_size=50)
10 loops, best of 3: 41.8 ms per loop

In [475]: %timeit random_windows_per_row_strided(arr, W=50)
100 loops, best of 3: 10 ms per loop

How do you copy elements from a 2D array into a 1D array using pointers in C++

This is not modern-day C++, and obviously you should not be deleting your return value.

im trying to copy values from row row_num of the 2D array the_array
into the 1D array

First off, the_array in this context, being defined as a raw pointer and used for pointer arithmetic, is a 1D representation of a 2D array. Better to be accurate in our description.

the error im getting [...] is : operand of '*' must be a pointer

Ok, so this line:

*row = *(*(the_array + row_num) + i);

is kind of a mess. I think I see where you were going with it, but you're dereferencing (using * to the left of an expression) twice for no reason, which, by the way, is what causes the error. Remember that the_array is a raw pointer being treated as a 1D array. Hence, you can do all the pointer arithmetic inside the parentheses and then dereference the result just once. Your expression above takes the address (the_array + row_num), dereferences it to get the double stored in that cell, adds i to that double value, and then tries to dereference the sum, which is a temporary of type double and not a pointer. This is probably the line you were aiming for:

*row = *(the_array + row_num * col_size + i);

Because the 2D data is laid out contiguously in memory, row by row, you need to multiply the row index by the row size (which is also the column count; I take it that col_size is exactly that, otherwise you have no way of stepping between rows) to skip over the preceding rows, and then add the current cell index i to reach the specific cell inside the row. Then, as we said, dereference the whole thing.
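The address arithmetic can be checked with a small sketch in Python (not C++), where a flat list stands in for the contiguous row-major buffer. The name get_row mirrors the function in the question, but this is only an illustration of the index formula row_num * col_size + i, not of pointer handling.

```python
def get_row(flat, row_num, col_size):
    """Copy row row_num out of a row-major flat list with col_size columns."""
    start = row_num * col_size  # skip over the preceding rows
    return [flat[start + i] for i in range(col_size)]


# A 3x4 matrix stored row by row: rows are [0..3], [4..7], [8..11].
flat = list(range(12))
print(get_row(flat, 1, 4))  # [4, 5, 6, 7]
```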

Beyond the compilation issue and the address calculation, you should at least keep the address of row before incrementing it in the loop, so that you return it and not the pointer past the end of the row. Keeping as close as possible to your original function, you can do it using a second pointer like this:

double* get_row(double *the_array, int row_num, int col_size) {

    double* row = new double[col_size];
    double* rowPtr = row;

    for (int i = 0; i < col_size; i++) {
        *rowPtr = *(the_array + row_num * col_size + i);
        rowPtr++;
    }

    return row;
}

I'm trying to keep this as close as possible to your version so you can see the difference from what you did. Many things are better done differently. For example, why not use the [] operator in the first place, like this: row[i] = *(the_array + row_num * col_size + i); and then get rid of the second pointer altogether? There is no real need for the ++ as a second line in the loop if you're already iterating over i. Another question: did you have to allocate at all, or could you just return a pointer into the existing array at the right place?

It is too important not to mention that in almost any scenario you really shouldn't return a raw pointer like this. It too easily leaves the intent unclear as to who owns the allocation (and is responsible for deleting it eventually), and it raises the issue of how to prevent a memory leak if an exception is thrown. The preferred solution is to use smart pointers such as std::unique_ptr, an instance of which wraps the raw pointer and forces you to be very explicit if you want to pass that piece of memory around in an unsafe way.


