What Exactly Is the Point of Memoryview in Python

When should a memoryview be used?

A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.
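The zero-copy sharing can be seen directly in plain Python. A minimal sketch (the payload string is just an illustration):

```python
# a memoryview shares memory with its source object rather than copying it
data = bytearray(b"large binary payload")
view = memoryview(data)

# writing through the view mutates the original object in place
view[0:5] = b"LARGE"
assert data == bytearray(b"LARGE binary payload")

# slicing the view still copies nothing; only offsets and length change
chunk = view[6:12]
assert chunk.tobytes() == b"binary"
```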

With it you can memory-map a very large file, slice a piece of that file, and do calculations on that piece (easiest if you are using NumPy).
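A sketch of that pattern using only the standard library (the file here is a small stand-in for a genuinely large one, and the offsets are arbitrary):

```python
import mmap
import os
import tempfile

# write a small stand-in for a "very large" file
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 16)   # 4096 bytes

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    view = memoryview(mm)
    piece = view[1024:1040]           # a zero-copy window into the mapped file
    checksum = sum(piece)             # 0 + 1 + ... + 15 == 120
    piece.release()                   # release views before the mmap closes
    view.release()

assert checksum == 120
```

Note that all memoryviews over the mapping must be released before the `mmap` object is closed, or CPython raises `BufferError`.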

I still don't understand the point of memoryview

Your first two benchmarks essentially nibble off a single byte from the left until there is nothing left.

For the bytes example this does N copies; for the memoryview there is never a copy, just an adjustment of the view's length.

Your last example isn't similar at all: instead of nibbling off a single byte, you nibble off a progressively larger number of bytes (b[1:], b[2:], b[3:], ...). Eventually the sequence is exhausted and you're slicing an empty string (more precisely, once i * (i + 1) / 2 > n). For example, with the 100,000-byte sequence, you're doing no-ops after 446 iterations.
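The first two benchmarks can be sketched like this (n is kept small here so both loops finish quickly):

```python
n = 10_000
data = bytes(n)

def nibble_bytes(b: bytes) -> None:
    # each b[1:] allocates a new object and copies the remaining bytes:
    # O(n**2) work overall
    while b:
        b = b[1:]

def nibble_view(b: bytes) -> None:
    # each mv[1:] just creates a new view with an adjusted offset and
    # length: O(n) work overall, no byte is ever copied
    mv = memoryview(b)
    while mv:
        mv = mv[1:]

nibble_bytes(data)
nibble_view(data)

# a slice of a view still refers to the original object -- no copy was made
assert memoryview(data)[1:].obj is data
```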

Buffers and Memoryview Objects explained for the non-C programmer

Here's a line from a hash function I wrote:

M = tuple(buffer(M, i, Nb) for i in range(0, len(M), Nb))

This will split a long string, M, into shorter 'strings' of length Nb, where Nb is the number of bytes / characters I can handle at a time. It does this WITHOUT copying any part of the string, as would happen if I made slices of the string like so:

M = tuple(M[i*Nb:i*Nb+Nb] for i in range(0, len(M), Nb))

I can now iterate over M just as I would had I sliced it:

H = key
for Mi in M:
    H = encrypt(H, Mi)

Basically, buffers and memoryviews are efficient ways to deal with the immutability of strings in Python, and with the general copying behavior of slicing. A memoryview is just like a buffer, except you can also write through it (when the underlying object is mutable), not just read.
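Note that `buffer()` is Python 2 only; in Python 3 the same zero-copy chunking is written with `memoryview`. A sketch with stand-in values for M and Nb:

```python
M = b"0123456789abcdef"   # stand-in for the long message
Nb = 4                    # bytes handled per block

# zero-copy chunking: each chunk is a window into M, not a copy
mv = memoryview(M)
chunks = tuple(mv[i:i + Nb] for i in range(0, len(M), Nb))

# every chunk shares M's memory, and together they cover all of M
assert all(c.obj is M for c in chunks)
assert b"".join(c.tobytes() for c in chunks) == M
```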

While the main buffer / memoryview doc is about the C implementation, the standard types page has a bit of info under memoryview: http://docs.python.org/library/stdtypes.html#memoryview-type

Edit: Found this in my bookmarks, http://webcache.googleusercontent.com/search?q=cache:Ago7BXl1_qUJ:mattgattis.com/2010/3/9/python-memory-views+site:mattgattis.com+python&hl=en&client=firefox-a&gl=us&strip=1 is a REALLY good brief writeup.

Edit 2: Turns out I got that link from When should a memoryview be used? in the first place; that question was never answered in detail and the link was dead, so hopefully this helps.

Is the memoryview object used correctly in this snippet?

You may just use the bytes object returned by f.read()

    with open('some_text.txt', 'rb') as f:
        b = f.read()

    print(b.__class__, id(b), len(b))
    data = memoryview(b)
    text = data.tobytes()
    print(text.__class__, id(text), len(text))

Possible output:

<class 'bytes'> 47642448 173227
<class 'bytes'> 47815728 173227

For CPython, id() returns the address of the object in memory, so data.tobytes() returns a copy in this case (the two ids differ).
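The same point can be shown without comparing ids, using an in-memory bytes object as a stand-in for the file contents:

```python
b = b"contents read from some file"   # stand-in for f.read()
mv = memoryview(b)

# slicing the view copies nothing: the slice still references b
assert mv[0:8].obj is b

# tobytes() materialises a new, independent bytes object (a copy)
text = mv.tobytes()
assert text == b and text is not b
```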

Consider using text mode instead:

with open('some_text.txt', 'r') as f:

Underlying mechanism of Python's memoryview

The implementation checks whether the source and destination ranges overlap:

    if (dptr + size < sptr || sptr + size < dptr)
        memcpy(dptr, sptr, size); /* no overlapping */
    else
        memmove(dptr, sptr, size);

memmove is specified to be safe for overlapping source and destination. How it ensures safety varies from case to case and implementation to implementation, but one technique is to work from right to left instead of left to right if left to right would overwrite not-yet-copied data.

Cython typed memoryviews: what they really are?

What is a memoryview:

When you write in a function:

cdef double[:] a

you end up with a __Pyx_memviewslice object:

typedef struct {
    struct __pyx_memoryview_obj *memview;
    char *data;
    Py_ssize_t shape[8];
    Py_ssize_t strides[8];
    Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

The memoryview contains a C pointer to some data which it (usually) doesn't directly own. It also contains a pointer to an underlying Python object (struct __pyx_memoryview_obj *memview;). If the data is owned by a Python object then memview holds a reference to it and ensures the Python object holding the data is kept alive as long as the memoryview is around.

The combination of the pointer to the raw data and the information on how to index it (shape, strides and suboffsets) allows Cython to do the indexing using the raw data pointer and some simple C arithmetic (which is very efficient). e.g.:

x=a[0]

gives something like:

(*((double *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_2 * __pyx_v_a.strides[0]) )));

In contrast, if you work with untyped objects and write something like:

a = np.array([1, 2, 3])  # note: no type declared
x = a[0]

the indexing is done as:

__Pyx_GetItemInt(__pyx_v_a, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1);

which itself expands to a whole bunch of Python C-API calls (so it is slow). Ultimately it calls a's __getitem__ method.


Compared to typed numpy arrays: there really isn't a huge difference.
If you do something like:

cdef np.ndarray[np.int32_t, ndim=1] new_arr

it behaves much like a memoryview in practice, with access to raw pointers, and the speed should be very similar.

The advantage to using memoryviews is that you can use a wider range of array types with them (such as the standard library array), so you're more flexible about the types your functions can be called with. This fits in with the general Python idea of "duck-typing" - that your code should work with any parameter that behaves the right way (rather than checking the type).

A second (small) advantage is that you don't need the numpy headers to build your module.

A third (possibly larger) advantage is that memoryviews can be initialised without the GIL while cdef np.ndarrays can't (http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support)

A slight disadvantage to memoryviews is that they seem to be slightly slower to set up.
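The "wider range of array types" point can be illustrated in plain Python, since it rests on the same buffer protocol that Cython's typed memoryviews use. A hypothetical helper (`total` is not from the original answer):

```python
from array import array

def total(buf) -> float:
    # accepts any C-contiguous buffer-protocol object holding doubles;
    # cast to 'B' first, then to 'd', since casts must go via a byte format
    view = memoryview(buf).cast("B").cast("d")
    return sum(view)

a = array("d", [1.0, 2.0, 3.0])
assert total(a) == 6.0
assert total(bytes(a)) == 6.0   # a raw bytes copy of the same data works too
```

This is the duck-typing idea from the text: the function never checks the argument's type, only that it exports a compatible buffer.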


Compared to just using malloced int pointers:

You won't get any speed advantage (but you won't lose much speed either). The minor advantages of using a memoryview instead are:

  1. You can write functions that can be used either from Python or internally within Cython:

    cpdef do_something_useful(double[:] x):
        # can be called from Python with any array type, or from Cython
        # with something that's already a memoryview
        ...
  2. You can let Cython handle the freeing of memory for this type of array, which could simplify your life for things that have an unknown lifetime. See http://docs.cython.org/src/userguide/memoryviews.html#cython-arrays and especially .callback_free_data.

  3. You can pass your data back to Python code (it'll get the underlying __pyx_memoryview_obj or something similar). Be very careful of memory management here (i.e. see point 2!).

  4. The other thing you can do is handle things like 2D arrays defined as a pointer to pointer (e.g. double**). See http://docs.cython.org/src/userguide/memoryviews.html#specifying-more-general-memory-layouts. I generally don't like this type of array, but if you have existing C code that already uses it, then you can interface with that (and pass it back to Python so your Python code can also use it).

python memoryview slower than expected

You can't think of Python like you would C or C++. The constant-factor overhead of an extra copy is far lower than the constant-factor overhead involved in supporting all of Python's dynamic features, especially with no JIT in CPython. You can't assume saving one copy is going to actually help once you take into account the other stuff you have to change to avoid that copy.

In this case, almost all of the work is in the list conversion. The copy you're saving is meaningless. Compare the timings for b[i:] and list(b[i:]), and you'll see the slicing is only a few percent of the runtime even when the slice performs a copy.

The copy you save doesn't matter because it's basically just a memcpy. In contrast, the list conversion needs to create an iterator over the bytestring or memoryview, call the iterator's tp_iternext slot repeatedly, obtain int objects corresponding to the raw bytes of memory, etc., which is far more expensive. It's even more expensive for the memoryview, because memoryview objects have to support multidimensional shapes and non-byte data types, and because the memoryview implementation doesn't have a dedicated __iter__ implementation, so it falls back to the generic sequence-based iteration, which is slower.

You can save some time by using the memoryview's tolist method instead of calling list. This skips a bunch of iteration protocol overhead and allows some checks to be done just once instead of once per item. In my tests, this is almost as fast as calling list on a bytestring.
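A sketch of the comparison (payload size and repeat counts are arbitrary):

```python
import timeit

data = bytes(range(256)) * 400      # ~100 KB payload
mv = memoryview(data)

# both conversions produce the same list of ints...
assert mv.tolist() == list(mv) == list(data)

# ...but tolist() does its type/shape checks once rather than going
# through the iterator protocol once per item
t_iter = timeit.timeit(lambda: list(mv), number=10)
t_tolist = timeit.timeit(lambda: mv.tolist(), number=10)
```

On CPython, `t_tolist` is typically close to the timing for `list` on a plain bytestring, while `list(mv)` is noticeably slower.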
