Get GNU Octave to Work with a Multicore Processor (Multithreading)


Solution

Octave itself is a single-threaded application that runs on one core. You can, however, get Octave to use libraries such as ATLAS that utilize multiple cores. So while Octave's interpreter only uses one core, when it hits a heavy operation it calls functions in ATLAS that spread the work across several CPU cores.

I was able to do this. First, compile ATLAS from source and make it available to your system so that Octave can find it and use its library functions. ATLAS tunes itself to your system and to the number of cores. If you then build Octave from source and point it at ATLAS, Octave will use it, so when Octave performs a heavy operation like a huge matrix multiplication, ATLAS decides how many cores to use.

I was unable to get this to work on Fedora, but on Gentoo it worked.

I used these two links:
ftp://ftp.gnu.org/gnu/octave/

http://math-atlas.sourceforge.net/

I ran the following Octave code before and after installing ATLAS:

tic                                    % start the timer
bigMatrixA = rand(3000000,80);         % 3,000,000 x 80 random matrix
bigMatrixB = rand(80,30);              % 80 x 30 random matrix
bigMatrixC = bigMatrixA * bigMatrixB;  % BLAS-backed matrix multiplication
toc                                    % print the elapsed time
disp("done");

The matrix multiplication goes much faster using multiple processors; judging by the elapsed times below, roughly six times faster than with a single core:

Without Atlas: Elapsed time is 3.22819 seconds.
With Atlas: Elapsed time is 0.529 seconds.

The three libraries I am using which speed things up are
blas-atlas,
cblas-atlas,
lapack-atlas.

If Octave can use these instead of the default BLAS and LAPACK libraries, then it will utilize multiple cores.

It is not easy and takes some programming skill to get Octave to compile from source with ATLAS.

Drawbacks to using ATLAS:

ATLAS incurs a lot of overhead to split your Octave work into multiple threads. Sure, it goes much faster if all you are doing is huge matrix multiplications, but most operations cannot be multithreaded by ATLAS. If extracting every bit of processing power and speed out of your cores is the top priority, you will have much better luck writing your program to run in parallel with itself: split your program into 8 equivalent programs that each work on 1/8th of the problem, run them all simultaneously, and reassemble the results when they are all done (a sketch of this approach follows below).
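
As a minimal sketch of that split-and-reassemble idea (assuming the Octave Forge parallel package is installed, which is not part of core Octave), each chunk of rows is multiplied in its own worker process and the partial results are concatenated at the end:

pkg load parallel                       % assumes the Octave Forge "parallel" package

A = rand(3000000,80);
B = rand(80,30);

nchunks = 8;                            % e.g. one chunk per core
edges = round(linspace(0, rows(A), nchunks + 1));
chunks = arrayfun(@(k) A(edges(k)+1:edges(k+1), :), 1:nchunks, "UniformOutput", false);

% each worker multiplies its own slice of rows; the results are reassembled
partial = parcellfun(nchunks, @(Ak) Ak * B, chunks, "UniformOutput", false);
C = vertcat(partial{:});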

ATLAS helps a single-threaded Octave program behave a little more like a multithreaded application, but it is no silver bullet. ATLAS will not make your single-threaded Octave program max out your 2-, 4-, 6-, or 8-core processor. You will notice a performance boost, but it will leave you searching for a better way to use all the cores. The answer is writing your program to run in parallel with itself, and that takes a lot of programming skill.

Suggestion

Put your energy into vectorizing your heaviest operations and distributing the work over n simultaneously running threads. If you are waiting too long for a process to run, the lowest-hanging fruit for speeding it up is most likely a more efficient algorithm or data structure.
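
As a rough illustration of vectorizing, both versions below compute the same squared differences; the loop executes one interpreted statement per element, while the vectorized form is handled in a single call to compiled array code (the data here is made up):

x = rand(1, 1e6);
y = rand(1, 1e6);

% loop version: one interpreted statement per element
d1 = zeros(1, numel(x));
for k = 1:numel(x)
  d1(k) = (x(k) - y(k))^2;
end

% vectorized version: a single call into compiled array code
d2 = (x - y).^2;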

Do any of Octave's minimization functions utilize multi-core/threaded processing?

This looks like the answer I actually needed. The minimizer may not run multithreaded, but the matrix operations in the function I'm minimizing can.
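
A minimal sketch of that idea, assuming a multithreaded BLAS such as ATLAS is in place: the minimizer itself (fminunc here) runs serially, but if the objective is dominated by large matrix products, those products can still use several cores on every evaluation. The sizes and cost function below are made up for illustration:

A = rand(20000, 50);                      % made-up data sizes
y = rand(20000, 1);
obj = @(theta) sum((A * theta - y).^2);   % cost dominated by the product A * theta

theta0 = zeros(50, 1);
[theta_opt, fval] = fminunc(obj, theta0); % fminunc is serial; A * theta is BLAS-backed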

Get GNU Octave to work with a multicore processor. (Multithreading)

How to make ATLAS available to Octave?

One possible way of increasing the usage of CPU cores is to vectorize your neural network implementation.

More information about vectorization can be found in the following tutorial.

http://ufldl.stanford.edu/wiki/index.php/Vectorization
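
For instance, a vectorized forward pass processes every training example at once as columns of a matrix instead of looping over examples; the layer sizes and weights below are made up purely for illustration:

sigmoid = @(z) 1 ./ (1 + exp(-z));

X  = rand(400, 5000);                 % 400 features x 5000 examples (made-up sizes)
W1 = randn(25, 400);   b1 = randn(25, 1);
W2 = randn(10, 25);    b2 = randn(10, 1);

A1 = sigmoid(W1 * X + b1);            % hidden activations for all examples at once
A2 = sigmoid(W2 * A1 + b2);           % output layer, 10 x 5000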

Is there a Scheme implementation with a multicore implementation of SRFI-18?

Guile 2.0 has a SRFI 18 implementation that uses POSIX threads. (Guile 1.8 had POSIX threads, but no SRFI 18.)

Why is ATLAS using just 1 core with Octave? (Linux Mint 17.2)

Because that is how the packaged ATLAS is designed and configured! You could change that locally by installing ATLAS from source, but that is non-trivial and you risk losing the packaging integration.

If you want multi-core LAPACK/BLAS, install the corresponding OpenBLAS packages (an open-source continuation of the older GotoBLAS). Mint will have these too.

Scheduling and Synchronization in Multicore CPU and in Single core CPU

there is no need for scheduling as all 4 threads of my program will be running in individual cores

This is not true in practice. The OS scheduler operates in both cases. Unless you pin threads to cores, threads can migrate from one core to another. In fact, even if you pin them, there are generally a few other threads ready to run on the machine (e.g. the SSH daemon, TTY sessions, graphical programs, kernel threads, etc.), so the OS still has to schedule them. There will be context switches, though far fewer than on a single-core processor.

there may be a need for synchronization since all 4 threads access the memory of the program (or a process) that is stored in the same space in the main memory.

This is true. Note that threads can also work on different memory areas (so that there is no need for synchronization except when they are joined). Note also that "main memory" includes the CPU caches here.

In a single-core CPU computer, if I run the same program that creates 4 threads, I will need both synchronization and scheduling since all threads must utilize the same core (or microprocessor).

Overall, yes. That being said, the term "scheduling" is unclear. There are multiple kinds of scheduling: preemptive vs. cooperative. Here, as a programmer, you do not need to do anything special, since the scheduling is done by the OS. Thus, it is a bit unexpected to say that you "need" scheduling. The OS will schedule the threads on the same core using preemption (by allocating different time slices to each thread on the same core).

MATLAB and using multiple cores to run calculations

Firstly, I would recommend re-running the bench command a few times to make sure MATLAB has fully loaded all the libraries etc. it needs. Much of MATLAB is loaded on demand, so it's always best to time the second or third run.

MATLAB automatically takes advantage of multiple cores when executing certain operations that are multithreaded, for example many elementwise operations such as + and .*, as well as BLAS-backed operations (and probably others). This page lists those things which are multithreaded.

Parallel Computing Toolbox is useful when MATLAB's intrinsic multithreading can't help (when it can, it's usually the fastest way to do things). It gives you explicit parallelism via PARFOR, SPMD, and distributed arrays.
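
For instance, a minimal PARFOR sketch (assuming Parallel Computing Toolbox is installed and a worker pool is open); the work inside the loop is made up and only needs to consist of independent iterations:

results = zeros(1, 8);
parfor k = 1:8
    % each iteration runs on its own worker; iterations must be independent
    results(k) = sum(svd(rand(500)));
end
disp(results);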


