Filling an Array Concurrently

An Array is not raw memory: there are pointers inside it that absolutely can change under your feet.

Arrays are not thread-safe. Arrays are value types, which means that they support copy-on-write in a thread-safe way (so you can freely pass an array to another thread, and if it is copied there, that is ok), but you can't mutate the same array on multiple threads. An Array is not a C buffer. It's not promised to have contiguous memory. It's not even promised to allocate memory at all. Array could choose internally to store "I'm currently all zeros" as a special state and just return 0 for every subscript. (It doesn't, but it's allowed to.)

For this specific problem, you'd typically use vDSP methods like vDSP_vramp, but I understand this is just an example, and there may not be a vDSP method that solves the problem. Typically, though, I'd still focus on Accelerate/SIMD methods rather than dispatching to queues.

But if you are going to dispatch to queues, you'll need an UnsafeMutableBufferPointer to take control of the memory (and to ensure that the memory even exists):

pixels.withUnsafeMutableBufferPointer { pixelsPtr in
    DispatchQueue.concurrentPerform(iterations: threadCount) { thread in
        for number in thread * size ..< (thread + 1) * size {
            let floating = Float(number)
            pixelsPtr[number] = SIMD3(floating, floating, floating)
        }
    }
}

The "Unsafe" indicates that it's now your problem to make sure all the accesses are legal and that you're not creating race conditions.

Note the use of .concurrentPerform here. As @user3441734 reminds us, pixelsPtr is not promised to be valid once .withUnsafeMutableBufferPointer completes. .concurrentPerform is guaranteed not to return until all blocks are complete, so the pointer is guaranteed to be valid.

This could be done with DispatchGroup as well, but the .wait would need to be inside of withUnsafeMutableBufferPointer.

How to let different threads fill an array together?

Since you already know how many elements you are going to work with and never change the size of the vector, the easiest solution is to let each thread work on its own part of the vector (see the simple version at the end).

Update

To accommodate vastly varying calculation times, you should keep your current code, but avoid race conditions via a std::lock_guard. You will need a std::mutex that is shared by all threads, for example a global variable, or pass a reference to the mutex to each thread.

void fill(int RandSeed, std::mutex &nextItemMutex)
{
    Simulator sim{RandSeed};
    size_t workingIndex;
    while (true)
    {
        {
            // enter critical section
            std::lock_guard<std::mutex> nextItemLock(nextItemMutex);

            // acquire the next item
            if (LastAdded < Max)
            {
                workingIndex = LastAdded;
                LastAdded++;
            }
            else
            {
                break;
            }
            // lock is released when nextItemLock goes out of scope
        }

        // Do some work to bring foo to the desired state.
        // The duration of this work is subject to randomness.
        vec[workingIndex] = sim.GetResult(); // produces a SimResult
    }
}

The problem with this is that synchronisation is quite expensive. But it's probably cheap compared to the simulation you run, so it shouldn't be too bad.

Version 2:

To reduce the amount of synchronisation that is required, you could acquire blocks to work on, instead of single items:

void fill(int RandSeed, std::mutex &nextItemMutex, size_t blockSize)
{
    Simulator sim{RandSeed};
    size_t workingIndex;
    while (true)
    {
        {
            std::lock_guard<std::mutex> nextItemLock(nextItemMutex);

            if (LastAdded < Max)
            {
                workingIndex = LastAdded;
                LastAdded += blockSize;
            }
            else
            {
                break;
            }
        }

        for (size_t i = workingIndex; i < workingIndex + blockSize && i < Max; i++)
            vec[i] = sim.GetResult(); // produces a SimResult
    }
}

Simple Version

void fill(int RandSeed, size_t partitionStart, size_t partitionEnd)
{
    Simulator sim{RandSeed};
    for (size_t i = partitionStart; i < partitionEnd; i++)
    {
        // Do some work to bring foo to the desired state.
        // The duration of this work is subject to randomness.
        vec[i] = sim.GetResult(); // produces a SimResult
    }
}

int main()
{
    // launch std::async tasks that each fill their own partition
    auto fut1 = std::async(std::launch::async, fill, 1, 0, Max / 2);
    auto fut2 = std::async(std::launch::async, fill, 2, Max / 2, Max);

    // ...
}

How to fill numpy arrays simultaneously using multiple indexes in python

This should work, if I understood the structure of mylist correctly:

>>> idcs, vals = np.hstack(mylist)
>>> vals[idcs.argsort()]
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])

Edit: As Paul Panzer points out in the comments, the sorting operation is unnecessary. If you're not working with big data sets I doubt you will see a difference, but here is another method that should be linear time:

>>> idcs, vals = np.hstack(mylist)
>>> out = np.zeros(len(idcs))
>>> out[idcs.astype(int)] = vals
>>> out
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])

Though I don't like it as much because of the type conversion.

Edit: Another one, without type conversion:

>>> idcs, vals = map(np.hstack, zip(*mylist))
>>> out = np.zeros(len(idcs))
>>> out[idcs] = vals
>>> out
array([0.2, 0.1, 0.7, 0.4, 0.7, 0.7])

How to fill an array in multiple threads?

In modern C#, you should almost never have to use Thread objects directly; they are fraught with peril, and there are other language features that will do the job just as well (see async and the TPL). I'll show you a way to do it with the TPL.

Note: Due to the problem of false sharing, you need to rig things so that the different threads are working on different memory areas. Otherwise you will see no gain in performance-- indeed, performance could get considerably worse. In this example I divide the array into blocks of 4,000 bytes (1,000 elements) each and work on each block in a separate thread.

using System;
using System.Linq;
using System.Threading.Tasks;

var array = new int[10000];
var offsets = Enumerable.Range(0, 10).Select( x => x * 1000 );
Parallel.ForEach( offsets, offset => {
    // Random is not thread-safe, so give each block its own instance
    var random = new Random( offset );
    for ( int i = 0; i < 1000; i++ )
    {
        array[offset + i] = random.Next( -100, 100 );
    }
});

That all being said, I doubt you'll see much of a gain in performance in this example-- the array is much too small to be worth the additional overhead.

Is it possible to fill two arrays simultaneously?

From the docs for PHImageManager.requestImageForAsset:

By default, this method executes asynchronously. If you call it from a background thread you may change the synchronous property of the options parameter to YES to block the calling thread until either the requested image is ready or an error occurs, at which time Photos calls your result handler.

The issue is that you're invoking the requestImageForAsset in the order you want them, but they are completing asynchronously, which may populate the array in different orders depending on which size completes first.

If you want them in the same order, I'd recommend changing croppedImagesArray and originalImagesArray to dictionaries keyed by something that you can share between them. For example, if the actual order of the assets doesn't matter, as long as they are the same between the two collections, you could key by the asset's localIdentifier, then sort the dictionaries by that identifier when you enumerate the collections. If the actual order matters, pass some other sequence key (perhaps the current timestamp) and share it between the two completion handlers.

Fill array with multiple threads in C

I suspect the main bottleneck would be the calls to rand(). rand() isn't required to be thread-safe, so it can't be safely used in a multi-threaded program where multiple threads could call rand() concurrently. The glibc implementation uses an internal lock to protect against such uses, which effectively serializes the calls to rand() across all threads and thus severely limits the multi-threaded nature of your program. Instead, use rand_r(), which doesn't need to maintain any internal state (because the callers do) and can at least solve this aspect of your problem.

In general, if the threads don't do sufficient work, the thread creation and synchronization overhead can outweigh the gains from the concurrency available on multi-core systems.


