Passing Numpy Arrays to a C Function for Input and Output

Just pass all four arguments to the C function. Change your Python code from:

fun(ctypes.c_void_p(indata.ctypes.data), ctypes.c_void_p(outdata.ctypes.data))

To:

fun(ctypes.c_void_p(indata.ctypes.data), ctypes.c_int(5), ctypes.c_int(6),
    ctypes.c_void_p(outdata.ctypes.data))
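
Rather than wrapping each argument by hand at every call, the argument types can also be declared once. The following is a minimal sketch, assuming a hypothetical C signature void fun(const double *indata, int rows, int cols, double *outdata); the names double_2d, FUN_ARGTYPES and check_arguments are illustrative and not part of the original code. It uses numpy.ctypeslib.ndpointer to check dtype, dimensionality and contiguity before anything is handed to C:

```python
import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Assumed C signature (hypothetical, for illustration only):
#   void fun(const double *indata, int rows, int cols, double *outdata);
double_2d = ndpointer(dtype=np.float64, ndim=2, flags="C_CONTIGUOUS")
FUN_ARGTYPES = [double_2d, ctypes.c_int, ctypes.c_int, double_2d]


def check_arguments(indata, outdata):
    """Return True if both arrays would be accepted by the declared argtypes."""
    try:
        # from_param raises TypeError on a dtype, ndim or contiguity mismatch.
        double_2d.from_param(indata)
        double_2d.from_param(outdata)
    except TypeError:
        return False
    return indata.shape == outdata.shape


indata = np.ones((5, 6))
outdata = np.empty((5, 6))
print(check_arguments(indata, outdata))                     # True
print(check_arguments(indata.astype(np.float32), outdata))  # False: wrong dtype

# With a loaded library one would then write (not executed here):
#   lib.fun.argtypes = FUN_ARGTYPES
#   lib.fun.restype = None
#   lib.fun(indata, indata.shape[0], indata.shape[1], outdata)
```

Declaring argtypes this way lets ctypes reject a mismatched array with a TypeError instead of letting the C code read memory with the wrong layout.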

Passing a set of NumPy arrays into C function for input and output

To do this specifically with Numpy arrays, you could use:

import numpy as np
import ctypes

count = 5
size = 1000

#create some arrays
arrays = [np.arange(size,dtype="float32") for ii in range(count)] 

#get ctypes handles
ctypes_arrays = [np.ctypeslib.as_ctypes(array) for array in arrays]

#Pack into pointer array
pointer_ar = (ctypes.POINTER(ctypes.c_float) * count)(*ctypes_arrays)

ctypes.CDLL("./libfoo.so").foo(ctypes.c_int(count), pointer_ar, ctypes.c_int(size))

Where the C side of things might look like:

/* function to multiply all arrays by 2 */
void foo(int count, float** array, int size)
{
   int ii,jj;
   for (ii=0;ii<count;ii++){
      for (jj=0;jj<size;jj++)
         array[ii][jj] *= 2;    
   }

}
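
To check the pointer array without compiling anything, the C loop can be imitated from Python. This is only a sketch with small sizes chosen for illustration; the nested loop stands in for foo and is not how you would process the data in practice. It shows that writes made through pointer_ar land in the original NumPy buffers:

```python
import ctypes
import numpy as np

count, size = 3, 4  # small illustrative sizes

arrays = [np.arange(size, dtype=np.float32) for _ in range(count)]
ctypes_arrays = [np.ctypeslib.as_ctypes(array) for array in arrays]
pointer_ar = (ctypes.POINTER(ctypes.c_float) * count)(*ctypes_arrays)

# Python stand-in for the body of foo(): double every element in place.
for ii in range(count):
    for jj in range(size):
        pointer_ar[ii][jj] *= 2

print(arrays[0])  # [0. 2. 4. 6.]
```

Because as_ctypes shares memory with the array, the C function needs no return value; the results are already visible in arrays after the call.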

How to pass this numpy array to C with Ctypes?

You need to return the dynamically allocated memory, e.g. change your C code to something like:

#include <math.h>
#include <stdlib.h>
#include <stdio.h>

double tan1(double f) {
    return sin(f)/cos(f);
}

double *loop(double *arr, int n) {
    double *b = malloc(n * sizeof(double));
    for(int i = 0; i < n; i++) {
        b[i] = tan(arr[i]);
    }
    return b;
}

void freeArray(double *b) {
    free(b);
}

On the Python side you have to declare parameter and return types. As mentioned by others in comments, you should also free dynamically allocated memory. Note that on the C side, arrays always decay into pointers. Therefore, you need an additional parameter which tells you the number of elements in the array.

Also, if you return a pointer to double to the Python side, you must specify the size of the array. With np.frombuffer you can work with the data without making a copy of it.

import numpy as np
from ctypes import CDLL, POINTER, c_double, c_int

testlib = CDLL('./testlib.so')

n = 500
dtype = np.float64
input_array = np.array(np.linspace(0, 4 * np.pi, n), dtype=dtype)
input_ptr = input_array.ctypes.data_as(POINTER(c_double))

testlib.loop.argtypes = (POINTER(c_double), c_int)
testlib.loop.restype = POINTER(c_double * n)
testlib.freeArray.argtypes = POINTER(c_double * n),

result_ptr = testlib.loop(input_ptr, n)
result_array = np.frombuffer(result_ptr.contents)

# ...do some processing
for value in result_array:
    print(value)

# free buffer
testlib.freeArray(result_ptr)
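
One consequence of the zero-copy view is that result_array must not be used after freeArray has been called. The sketch below illustrates the difference between the view and a copy; it assumes a ctypes-owned buffer (buf) as a stand-in for the malloc'ed block, so no shared library is needed, and the names view and kept are illustrative:

```python
import ctypes
import numpy as np

n = 4
# Stand-in for the block returned by loop(). Here ctypes owns the memory;
# in the real case it comes from malloc and must be released with freeArray.
buf = (ctypes.c_double * n)(0.0, 1.0, 2.0, 3.0)
result_ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double * n))

view = np.frombuffer(result_ptr.contents, dtype=np.float64)  # shares memory
kept = view.copy()                                           # independent copy

buf[0] = 10.0   # change the underlying memory
print(view[0])  # 10.0, the view follows the buffer
print(kept[0])  # 0.0, the copy does not
```

If the values are needed after the buffer is freed, take the copy first and free afterwards.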

Pass a 2d numpy array to c using ctypes

This is probably a late answer, but I finally got it working. All credit goes to Sturla Molden at this link.

The key is to note that a double** can be passed as a one-dimensional array of type np.uintp that holds the address of each row. Therefore, we have

xpp = (x.ctypes.data + np.arange(x.shape[0]) * x.strides[0]).astype(np.uintp)
doublepp = np.ctypeslib.ndpointer(dtype=np.uintp)

And then use doublepp as the type, pass xpp in. See full code attached.

The C code:

// dummy.c 
#include <stdlib.h> 

__declspec(dllexport) void foobar(const int m, const int n,
                                  const double **x, double **y) 
{ 
    size_t i, j; 
    for(i=0; i<m; i++) 
        for(j=0; j<n; j++) 
            y[i][j] = x[i][j]; 
}

The Python code:

# test.py 
import numpy as np 
from numpy.ctypeslib import ndpointer 
import ctypes 

_doublepp = ndpointer(dtype=np.uintp, ndim=1, flags='C') 

_dll = ctypes.CDLL('dummy.dll') 

_foobar = _dll.foobar 
_foobar.argtypes = [ctypes.c_int, ctypes.c_int, _doublepp, _doublepp] 
_foobar.restype = None 

def foobar(x): 
    y = np.zeros_like(x) 
    xpp = (x.__array_interface__['data'][0] 
      + np.arange(x.shape[0])*x.strides[0]).astype(np.uintp) 
    ypp = (y.__array_interface__['data'][0] 
      + np.arange(y.shape[0])*y.strides[0]).astype(np.uintp) 
    m = ctypes.c_int(x.shape[0]) 
    n = ctypes.c_int(x.shape[1]) 
    _foobar(m, n, xpp, ypp) 
    return y 

if __name__ == '__main__': 
    x = np.arange(9.).reshape((3, 3)) 
    y = foobar(x)

Hope it helps,

Shawn
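
The row-pointer construction can be sanity-checked from Python alone. This is a small sketch, assuming each row is contiguous in memory (the check on strides below); row1 is an illustrative name. It reads one row back through the address stored in xpp:

```python
import ctypes
import numpy as np

x = np.arange(9.0).reshape((3, 3))

# Elements within a row must be adjacent for a plain double* per row to work.
assert x.strides[1] == x.itemsize

# One address per row: base address plus row index times the row stride.
xpp = (x.ctypes.data + np.arange(x.shape[0]) * x.strides[0]).astype(np.uintp)

# Interpret the address of row 1 as an array of doubles and read it back.
row1 = (ctypes.c_double * x.shape[1]).from_address(int(xpp[1]))
print(list(row1))  # [3.0, 4.0, 5.0]
```

Note that xpp only holds addresses; x itself must stay alive for as long as the C code uses them.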

Passing numpy ndarray as keras input

As discussed in the comments, it would be best if you created your arrays directly instead of having a DataFrame in the middle.

The problem is that even if X is a numpy array, it contains other arrays because Pandas returns an array for each row and each cell. An example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'RF':[np.array([1, 2, 3, 4, 5]), np.array([6, 2, 3, 4, 5])], 'DL': [np.array([1, 2, 3, 4, 5]), np.array([7, 2, 3, 4, 5])]})

print(df)

                RF               DL
0  [1, 2, 3, 4, 5]  [1, 2, 3, 4, 5]
1  [6, 2, 3, 4, 5]  [7, 2, 3, 4, 5]

X = df[['RF', 'DL']].to_numpy()

print(X)

[[array([1, 2, 3, 4, 5]) array([1, 2, 3, 4, 5])]
 [array([6, 2, 3, 4, 5]) array([7, 2, 3, 4, 5])]]

You can reshape to try to fix the problem. This should work:

# You have to reshape each cell in each row.
X = np.array([np.reshape(x, (1, len(X[0][0]))) 
              for i in range(len(X)) 
              for x in X[i]]).reshape(-1, 2*len(X[0][0])).astype(np.float32)

print(X)
print(X.shape)

[[1. 2. 3. 4. 5. 1. 2. 3. 4. 5.]
 [6. 2. 3. 4. 5. 7. 2. 3. 4. 5.]]
(2, 10)

Obviously, this assumes the shapes of the arrays in both the 'RF' and 'DL' columns are the same, which I believe is true because they are the output of predict_proba.
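
A shorter way to get the same result is to concatenate the cells of each row. This is a sketch under the same assumption (all cells have equal length); the object array below is built by hand to mimic df[['RF', 'DL']].to_numpy() so the example does not need pandas, and flat is an illustrative name:

```python
import numpy as np

# Object array mimicking the DataFrame output: each cell holds a 1-D array.
X = np.empty((2, 2), dtype=object)
X[0, 0] = np.array([1, 2, 3, 4, 5])
X[0, 1] = np.array([1, 2, 3, 4, 5])
X[1, 0] = np.array([6, 2, 3, 4, 5])
X[1, 1] = np.array([7, 2, 3, 4, 5])

# Join the cells of each row end to end, then stack the rows.
flat = np.array([np.concatenate(list(row)) for row in X], dtype=np.float32)

print(flat.shape)  # (2, 10)
```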

Passing 3-dimensional numpy array to C

I already mentioned this in a comment, but I hope fleshing it out a little helps make it clearer.

When you're working with NumPy arrays in C, it's good to be explicit about the typing of your arrays. Specifically, it looks like you're declaring your pointers as double ***list3, but the way you're creating l3 in your Python code you'll get an array with an integer dtype, npy_intp (I think). You can fix this by explicitly passing the dtype when creating your arrays.

import cmod, numpy
l2 = numpy.array([[1.0,2.0,3.0],
                  [4.0,5.0,6.0],
                  [7.0,8.0,9.0],
                  [3.0, 5.0, 0.0]], dtype="double")

l3 = numpy.array([[[2,7, 1, 11], [6, 3, 9, 12]],
                  [[1, 10, 13, 15], [4, 2, 6, 2]]], dtype="double")

cmod.func(l2, l3)
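
The dtype mismatch is easy to see from Python before the array ever reaches C. The following sketch uses a hypothetical helper, as_c_double, to coerce an array into the layout a double * expects; it is an illustration, not part of cmod:

```python
import numpy as np

# Without an explicit dtype, integer literals give an integer array.
l3_default = np.array([[[2, 7, 1, 11], [6, 3, 9, 12]],
                       [[1, 10, 13, 15], [4, 2, 6, 2]]])
print(l3_default.dtype)  # an integer dtype; the exact width is platform-dependent


def as_c_double(array):
    """Return a C-contiguous float64 version of array (copies only if needed)."""
    return np.ascontiguousarray(array, dtype=np.float64)


l3 = as_c_double(l3_default)
print(l3.dtype, l3.shape, l3.flags["C_CONTIGUOUS"])  # float64 (2, 2, 4) True
```

Running every input through such a helper at the Python boundary makes the C side free to assume one dtype and one memory layout.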

Another note: because of the way Python works, it's nearly impossible for "line A" and "line B" to have any effect on the C code whatsoever. I know that this seems to conflict with your empirical experience, but I'm pretty sure on this point.

I'm a little less sure about this, but based on my experience with C, bus errors and segfaults are not deterministic. They depend on memory allocation, alignment, and addresses. In some situations code seems to run fine 10 times and fails on the 11th run even though nothing has changed.

Have you considered using Cython? I know it's not an option for everyone, but if it is an option you could get nearly C-level speedups using typed memoryviews.

Numpy passing input array as `out` argument to ufunc

This is an old question, but there is an updated answer:

Yes, it is safe. In the Numpy documentation, we see that as of v1.13:

Operations where ufunc input and output operands have memory overlap are defined to be the same as for equivalent operations where there is no memory overlap. Operations affected make temporary copies as needed to eliminate data dependency. As detecting these cases is computationally expensive, a heuristic is used, which may in rare cases result in needless temporary copies. For operations where the data dependency is simple enough for the heuristic to analyze, temporary copies will not be made even if the arrays overlap, if it can be deduced copies are not necessary. As an example, np.add(a, b, out=a) will not involve copies.
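
The quoted behaviour can be checked with a short experiment. This sketch assumes NumPy 1.13 or later; the variable names are illustrative. It covers the simple in-place case and a case where the input and output slices genuinely overlap:

```python
import numpy as np

# Simple case from the documentation: the output is one of the inputs.
a = np.arange(5)
b = np.full(5, 10)
np.add(a, b, out=a)
print(a)  # [10 11 12 13 14]

# Overlapping slices: the result must match the no-overlap computation.
c = np.arange(5)
expected = c[:-1].copy() + c[1:].copy()
np.add(c[:-1], c[1:], out=c[1:])
same = np.array_equal(c[1:], expected)
print(same)  # True
```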

Operate on Numpy array from C extension without memory copy

Cython doesn't create new copies of NumPy arrays unless you specifically request it to do so using NumPy functions, so it is as efficient as it can be when dealing with NumPy arrays; see Working with NumPy.

Choosing between writing a raw C module and using Cython depends on the purpose of the module.
If you are writing a module that will only be used by Python to do a very small, specific task with NumPy arrays as fast as possible, then by all means use Cython: it automates registering the module correctly, handles memory for you, prevents common mistakes that people make when writing C code (like memory management problems), automates the compiler includes, and gives easier access to complicated functionality (like NumPy iterators).

However, if your module is going to be used from other languages, has to run independently of Python, has to be used from Python without any overhead, implements complex C data structures, and requires a lot of C functionality, then by all means create your own C extension (or even a DLL). You can pass pointers to NumPy arrays from Python (using numpy.ctypeslib.as_ctypes), pass the Python object itself and return it (but you must build a .pyd/.so instead of a plain DLL), or even create the NumPy array on the C side and have it managed by Python (but you will have to understand the NumPy C API).
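
For the ctypes route, the absence of a copy can be demonstrated in both directions. This is a minimal sketch using numpy.ctypeslib.as_ctypes and numpy.ctypeslib.as_array; the pointer ptr plays the role of whatever a C library would receive or hand back, and the names are illustrative:

```python
import ctypes
import numpy as np

a = np.arange(6, dtype=np.float64)

# NumPy -> C: a ctypes array over the same buffer, then a plain double*.
c_view = np.ctypeslib.as_ctypes(a)
ptr = ctypes.cast(c_view, ctypes.POINTER(ctypes.c_double))

# C -> NumPy: wrap the pointer again; the shape must be supplied by the caller.
b = np.ctypeslib.as_array(ptr, shape=(6,))

b[0] = 42.0
print(a[0])  # 42.0, since a and b refer to the same memory
```

Since nothing is copied, the original array a has to outlive every pointer and view derived from it.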
