Good C++ Array Class for Dealing with Large Arrays of Data in a Fast and Memory Efficient Way


Have you tried using an std::deque? Unlike a std::vector, which uses one huge heap allocation, a deque usually allocates in small chunks, but still provides constant-time indexing via operator[].
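
For illustration, a minimal sketch of how this looks in use (the element count is arbitrary); the chunked allocation happens inside the deque implementation:

#include <deque>
#include <cstddef>

int main()
{
    // std::deque allocates many modest fixed-size blocks instead of one
    // huge contiguous buffer, so growth never reallocates and copies
    // everything the way std::vector does.
    std::deque<double> data;
    for (std::size_t i = 0; i < 10000000; ++i)
        data.push_back(static_cast<double>(i));

    // operator[] is still constant time: the implementation finds the
    // block and the offset within it.
    return data[5000000] > 0.0 ? 0 : 1;
}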

Arithmetic operation on very large static arrays in C/C++

Local variables will always be on the stack, no matter the optimization flags. And that array would be around 7 gigabytes, far larger than any possible stack.

The size may also be the reason it doesn't start: if you make it a global/static variable, you need more than 7 GB of virtual memory free and contiguous just to load the program.
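
As a hedged sketch of the usual workaround, the data can be put on the heap instead of in a multi-gigabyte local or static array (the element type and count here are illustrative):

#include <vector>
#include <cstddef>

int main()
{
    // Roughly 7 GB worth of doubles (illustrative figure from the question).
    const std::size_t n = 7ull * 1024 * 1024 * 1024 / sizeof(double);

    // double big[n];            // as a local this would blow the stack
    std::vector<double> big(n);  // heap allocation instead

    // Note: a vector still needs one contiguous block of virtual address
    // space; a chunked container such as std::deque avoids even that.
    big[0] = 1.0;
    return 0;
}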

Efficient Array Reallocation in C++

Based on a previous question, the approach that I took for handling large arrays that could grow and shrink with reasonable efficiency was to write a container similar to a deque that broke the array down into multiple pages of smaller arrays. So for example, say we have an array of n elements, we select a page size p, and create 1 + n/p arrays (pages) of p elements. When we want to re-allocate and grow, we simply leave the existing pages where they are, and allocate the new pages. When we want to shrink, we free the totally empty pages.

The downside is that array access is slightly slower: given an index i, you need the page (i / p) and the offset into the page (i % p) to get the element. I find this is still very fast, however, and it provides a good solution. Theoretically, std::deque should do something very similar, but for the cases I tried with large arrays it was very slow. See comments and notes on the linked question for more details.

There is also a memory inefficiency: given n elements, we are always holding p - n % p elements in reserve, i.e. we only ever allocate or deallocate complete pages. This was the best solution I could come up with in the context of large arrays with the requirement for resizing and fast access; while I don't doubt there are better solutions, I'd love to see them.
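
A rough sketch of the paging scheme described above; the class and the default page size are my own, not taken from the original code:

#include <cstddef>
#include <memory>
#include <vector>

// Minimal paged array: elements live in fixed-size pages, so growing never
// moves existing elements and shrinking frees whole pages.
template <typename T, std::size_t PageSize = 4096>
class paged_array
{
public:
    void resize(std::size_t n)
    {
        const std::size_t pages_needed = (n + PageSize - 1) / PageSize;
        while (pages_.size() < pages_needed)                  // grow: add new pages
            pages_.push_back(std::make_unique<T[]>(PageSize));
        while (pages_.size() > pages_needed)                  // shrink: free empty pages
            pages_.pop_back();
        size_ = n;
    }

    // Index split described above: page = i / p, offset = i % p.
    T& operator[](std::size_t i) { return pages_[i / PageSize][i % PageSize]; }

    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<T[]>> pages_;
    std::size_t size_ = 0;
};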

What are all the possible and memory efficient ways to store multiple images of fixed size and data types in Matlab?

cellArray is second best as it only needs pointers for each array of 8 Bytes extra (ie, 100*8 bytes more).

This is not true. Each array has a "header" (a block of memory that specifies its type, size, etc.). The header in R2017a is 104 bytes (I think it's a little larger in the latest release). A cell array holds arrays, so the difference you see in your test with the 3D array:

26225600 - 26214400 = 11200

is

100 * (104 + 8) = 11200

The cell array is an array of pointers (8 bytes each) to arrays (104 bytes + whatever their data is).

For an image, which is a fairly large block of data, this 112 byte overhead is negligible. Other considerations, such as speed of access, become more important.

In MATLAB, two arrays can point to the same data. So doing something like

I = C{4};

doesn't create a copy of the array at C{4}, instead the array I references it. But if you use a 3D array, then:

I = A(:,:,4);

does make a copy, because I cannot reference a subset of another array; it must reference the whole thing.

Thus, using a 3D array, processing individual images requires a lot of copying back and forth of pixel data, which would not be necessary in a cell array.

A struct array is not a relevant data structure here; it would be equivalent to the cell array, except that the indexing is more complicated (I don't know whether this translates to a runtime increase or not). That is, S(4).Image is more involved than C{4}. However, if you want to store additional information for each image, a struct array could be useful.

As you noticed, the struct array is only 64 bytes larger than the cell array; this stores the field name Image. Again, this amount of memory is not really worth worrying about.

Here is a short summary of other ways to handle data in MATLAB, none of which seem reasonable to me:

  • Custom object types: you are still dealing with normal arrays underneath, so there is no advantage or disadvantage here. These types are nice if you want to add methods specific to your images, but they don't change the way memory is handled. They do seem to add some time overhead.

  • Use tall arrays, suitable for very large data that doesn't fit in memory, but I don't think anybody would consider doing image analysis with such an array.

  • Use memory-mapped files, useful to speed up file access, but doesn't really help in this case.

  • Talk to Java or Python from MATLAB, and have them do the memory handling. But then you might as well skip MATLAB altogether and go to a different environment.

So I really think that the two meaningful options for handling multiple images are either a cell array (or other heterogeneous container such as a struct or custom object), or a 3D array. I would not consider anything else.

In summary: Use a cell array.

Using arrays or std::vectors in C++, what's the performance gap?

Using C++ arrays with new (that is, using dynamic arrays) should be avoided. There is the problem that you have to keep track of the size, and you need to delete them manually and do all sorts of housekeeping.

Using arrays on the stack is also discouraged because you don't have range checking, and passing the array around will lose any information about its size (array to pointer conversion). You should use std::array in that case, which wraps a C++ array in a small class and provides a size function and iterators to iterate over it.
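
For instance, a small sketch of the std::array alternative (the function and values are made up):

#include <array>
#include <cstddef>
#include <numeric>

// Unlike a raw C array, std::array keeps its size as part of the type, so
// passing it around loses nothing.
template <std::size_t N>
int sum(const std::array<int, N>& a)
{
    return std::accumulate(a.begin(), a.end(), 0);
}

int main()
{
    std::array<int, 4> values{1, 2, 3, 4};
    values.at(2) = 7;                  // range-checked access, throws if out of range
    return sum(values) == 14 ? 0 : 1;
}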

Now, std::vector vs. native C++ arrays (taken from the internet):

// Comparison of assembly code generated for basic indexing, dereferencing, 
// and increment operations on vectors and arrays/pointers.

// Assembly code was generated by gcc 4.1.0 invoked with g++ -O3 -S on a
// x86_64-suse-linux machine.

#include <vector>

struct S
{
    int padding;

    std::vector<int> v;
    int * p;
    std::vector<int>::iterator i;
};

int pointer_index (S & s) { return s.p[3]; }
// movq 32(%rdi), %rax
// movl 12(%rax), %eax
// ret

int vector_index (S & s) { return s.v[3]; }
// movq 8(%rdi), %rax
// movl 12(%rax), %eax
// ret

// Conclusion: Indexing a vector is the same damn thing as indexing a pointer.

int pointer_deref (S & s) { return *s.p; }
// movq 32(%rdi), %rax
// movl (%rax), %eax
// ret

int iterator_deref (S & s) { return *s.i; }
// movq 40(%rdi), %rax
// movl (%rax), %eax
// ret

// Conclusion: Dereferencing a vector iterator is the same damn thing
// as dereferencing a pointer.

void pointer_increment (S & s) { ++s.p; }
// addq $4, 32(%rdi)
// ret

void iterator_increment (S & s) { ++s.i; }
// addq $4, 40(%rdi)
// ret

// Conclusion: Incrementing a vector iterator is the same damn thing as
// incrementing a pointer.

Note: if you allocate arrays with new for non-class types (like plain int) or for classes without a user-defined constructor, and you don't want the elements initialized up front, new-allocated arrays can have a performance advantage, because std::vector initializes all elements to default values (0 for int, for example) on construction (credits to @bernie for reminding me).
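
A small illustration of that difference, as a sketch (the size is arbitrary):

#include <cstddef>
#include <vector>

int main()
{
    const std::size_t n = 100000000;

    // std::vector value-initializes its elements: every int is written as 0,
    // which costs a pass over the whole allocation.
    std::vector<int> v(n);

    // new[] without an initializer leaves plain ints uninitialized, so no
    // such pass is made at allocation time.
    int* p = new int[n];

    p[0] = v[0];
    delete[] p;
    return 0;
}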

Container to present multiple memory chunks as single continuous one

The memory is provided to you, you say. That sounds like you don't want to copy it. No problem, the STL philosophy is quite flexible. You don't actually need a container; they're just there for memory management and that's already taken care of.

What you do need is an iterator. There's no standard one; you'll have to write one yourself. There are just too many slight variations to provide a standard solution for this. But don't worry, it's fairly easy. You get the necessary typedefs if you inherit from std::iterator<std::forward_iterator_tag, value_type> (or define them yourself), so you only need to write operator* (straightforward) and operator++/operator--/operator+/operator- (which understand the chunks).
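
A rough sketch of such an iterator, assuming the chunks are handed over as (pointer, length) pairs; all names here are my own, and only the forward operations are shown:

#include <cstddef>
#include <iterator>
#include <vector>

// One borrowed chunk of memory: a pointer and the number of elements in it.
struct chunk
{
    int*        data;
    std::size_t size;
};

// Forward iterator that walks a sequence of chunks as if it were one
// contiguous range; operator++ hops to the next chunk when needed.
class chunk_iterator
{
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type        = int;
    using difference_type   = std::ptrdiff_t;
    using pointer           = int*;
    using reference         = int&;

    chunk_iterator(const std::vector<chunk>* chunks,
                   std::size_t chunk_index, std::size_t offset)
        : chunks_(chunks), chunk_(chunk_index), offset_(offset) {}

    reference operator*() const { return (*chunks_)[chunk_].data[offset_]; }

    chunk_iterator& operator++()
    {
        if (++offset_ == (*chunks_)[chunk_].size)  // reached the end of this chunk,
        {                                          // so move on to the next one
            ++chunk_;
            offset_ = 0;
        }
        return *this;
    }

    bool operator==(const chunk_iterator& o) const
    { return chunk_ == o.chunk_ && offset_ == o.offset_; }
    bool operator!=(const chunk_iterator& o) const { return !(*this == o); }

private:
    const std::vector<chunk>* chunks_;
    std::size_t               chunk_;
    std::size_t               offset_;
};

With begin = chunk_iterator(&chunks, 0, 0) and end = chunk_iterator(&chunks, chunks.size(), 0), the chunks can then be fed to standard algorithms such as std::accumulate as if they were one contiguous sequence.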

Performance of Arrays vs. Lists

Very easy to measure...

In a small amount of tight-loop processing code where I know the length is fixed, I use arrays for that extra tiny bit of micro-optimisation; arrays can be marginally faster if you use the indexer / for form, but IIRC it depends on the type of data in the array. But unless you need to micro-optimise, keep it simple and use List<T> etc.

Of course, this only applies if you are reading all of the data; a dictionary would be quicker for key-based lookups.

Here are my results using "int" (the second number is a checksum to verify they all did the same work):

(edited to fix bug)

List/for: 1971ms (589725196)
Array/for: 1864ms (589725196)
List/foreach: 3054ms (589725196)
Array/foreach: 1860ms (589725196)

based on the test rig:

using System;
using System.Collections.Generic;
using System.Diagnostics;

static class Program
{
    static void Main()
    {
        List<int> list = new List<int>(6000000);
        Random rand = new Random(12345);
        for (int i = 0; i < 6000000; i++)
        {
            list.Add(rand.Next(5000));
        }
        int[] arr = list.ToArray();

        int chk = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            int len = list.Count;
            for (int i = 0; i < len; i++)
            {
                chk += list[i];
            }
        }
        watch.Stop();
        Console.WriteLine("List/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            for (int i = 0; i < arr.Length; i++)
            {
                chk += arr[i];
            }
        }
        watch.Stop();
        Console.WriteLine("Array/for: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in list)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("List/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int rpt = 0; rpt < 100; rpt++)
        {
            foreach (int i in arr)
            {
                chk += i;
            }
        }
        watch.Stop();
        Console.WriteLine("Array/foreach: {0}ms ({1})", watch.ElapsedMilliseconds, chk);

        Console.ReadLine();
    }
}

Unusual heap size limitations in VS2003 C++

Could it be the case that the debugger is playing a trick on you in release mode? Neither single stepping nor the values of variables are reliable in release mode.

I tried your example in VS2003 in release mode, and when single stepping it does at first look like the code is landing on the return NULL line, but when I continue stepping it eventually continues into HeapAlloc. I would guess that it's this function that's failing. Looking at the disassembly, the check if (size > _HEAP_MAXREQ) reveals the following:

00401078  cmp         dword ptr [esp+4],0FFFFFFE0h 

so I don't think it's a problem with _HEAP_MAXREQ.

Is a char array more efficient than a char pointer in C?

When discussing performance in general, allocation, access time, and copy time are separate things. You seem mostly concerned about allocation.

But there are lots of misconceptions here. Arrays are used for storing. Pointers are used to point at things stored elsewhere. You cannot store any data in a pointer; you can only store the address of data allocated elsewhere.

So comparing pointers and arrays is pretty much nonsense, because they are separate things. It is similar to asking "should I live in my house at a street address, or should I live in the sign stating my street address?".

I understand that using char* gives a pointer to the first character in str1

No, it gives a pointer to a single character that is allocated somewhere else, though it doesn't point anywhere meaningful until you assign an address to it. In the case of arrays, it will typically be set to point at the first character of the array.

I recall that in some encodings it can be more

No, a char is by definition always 1 byte. Some exotic systems might have 16 bits per byte or so, but that is of no concern unless you program exotic DSPs and the like. As for other character encodings, there's wchar_t, which is a different topic entirely.

whereas in the case of char* we're really just saying allocate a pointer to a single byte

No, we tell it to allocate room for the pointer itself, which is typically between 2 and 8 bytes depending on the address bus width of the specific system.

But what happens if the string we assign to str1 is more than a single byte, how is that allocated?

However you like. You can point it at a read-only string literal, a variable with static storage duration, a local automatic variable, or dynamically allocated memory. The pointer itself doesn't know or care.

How much more work is needed to appropriately allocate that?

It depends on what you want to allocate.

Because of the uncertainty from the compiler's point of view when dealing with char pointers

What uncertainty is that? Pointers are pointers, and the compiler doesn't treat them much differently than other variables.

is it more efficient to use a char array when I either know the length ahead of time or want to limit the length to start with?

You need to use an array, because data cannot be stored in thin air. Again, data cannot be stored "in pointers".
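
To make the storage distinction concrete, a brief sketch (the variable names are illustrative):

#include <cstring>

int main()
{
    // The array itself is the storage: six bytes 'h','e','l','l','o','\0'
    // live directly in str1.
    char str1[] = "hello";

    // The pointer is only an address (typically 4 or 8 bytes); the characters
    // it points at live elsewhere, here in a read-only string literal.
    const char* p = "hello";

    // To hold a longer string, there must be storage with room for it:
    char buffer[32];
    std::strcpy(buffer, "a longer string");  // copies into the array's storage
    p = buffer;                              // the pointer merely points at it

    return (str1[0] == 'h' && *p == 'a') ? 0 : 1;
}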


