Why Is a Multithreading Program Slower Than a Single-Thread Program, Though They Read Separate Txt Files

Why is a multithreading program slower than a single-thread program, even though the threads read separate txt files?

Python does not support real multi-threading: you always have the Global Interpreter Lock [more about GIL], which allows only one thread to execute Python bytecode at a time. So there is effectively only one running thread, plus the added overhead of managing the threads, which makes the threaded version slower in most cases.

There can be some speed-up in I/O operations, but not always. The threading module is more about a certain style of structuring a program, much like async programming (for which Python also has a dedicated module), than about raw performance. If you would like to see a real performance improvement, you should use Python's multiprocessing module, which does not suffer from the GIL; however, exchanging data between two processes is more complicated than between threads.

https://docs.python.org/3.7/library/multiprocessing.html
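As a rough illustration of the threads-versus-processes difference, here is a minimal sketch (the file names and the word-counting work are placeholders, not taken from the question) comparing concurrent.futures' ThreadPoolExecutor and ProcessPoolExecutor on the same CPU-heavy task:

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

FILES = ["a.txt", "b.txt", "c.txt", "d.txt"]  # placeholder file names

def count_words(path):
    # Reading the file is I/O (the GIL is released), but the counting
    # below is pure-Python CPU work and is serialised by the GIL when
    # it runs in threads.
    with open(path) as f:
        return sum(len(line.split()) for line in f)

def with_threads():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(count_words, FILES))

def with_processes():
    # Each worker process has its own interpreter and its own GIL, so
    # the CPU-bound part can run in parallel; the price is that results
    # must be pickled and sent back to the parent process.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(count_words, FILES))

if __name__ == "__main__":  # guard required when using process pools
    print(with_threads())
    print(with_processes())

On a machine with several cores, the process-pool version typically wins once the per-file work is CPU-heavy enough to outweigh the cost of starting the workers and pickling the results back.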

MultiThread runs slower than single process

There can be different factors:

  • Most important is avoiding disk access from multiple threads at the same time (but since you are on an SSD, you might get away with that). On a normal hard disk, however, switching from one file to another could cost you 10 ms of seek time (depending on how the data is cached).

  • 1000 threads is too many; try the number of cores * 2 instead. Too much time will be lost just switching contexts.

  • Try using a thread pool. The total times are between 110 ms and 130 ms, and part of that will come from creating the threads.

  • Do some more work in the test in general. Timing a 110 ms run isn't very accurate, and it also depends on what other processes or threads are running at that time.

  • Try switching the order of your tests to see if it makes a difference (caching could be an important factor):

    countLinesThread(num);
    countLinesOneProcess(num);

Also, depending on the system, currentTimeMillis() might have a resolution of 10 to 15 ms, so it isn't very accurate for timing short runs.

long start = System.currentTimeMillis();
long end = System.currentTimeMillis();

Why is my multi-threading slower than my single threading?

From the official Console documentation:

I/O operations that use these streams are synchronized, which means that multiple threads can read from, or write to, the streams. This means that methods that are ordinarily asynchronous, such as TextReader.ReadLineAsync, execute synchronously if the object represents a console stream.

This means that the Console class handles the thread synchronization, so if thread A and thread B are both trying to write to the console, the Console class arbitrates between them and only one at a time is able to write. The handling logic behind that is the reason why it takes longer.



UPDATE


I suggest you have a look at Parallel.ForEach.

Multi-threading program taking longer than single thread (Java)

A rule of thumb says that you need one CPU core for the operating system; the others can be used for the program. So you need at least 5 CPU cores for optimal performance.

The overhead of creating these few threads does not really matter. That would become more relevant if you started dozens of threads within milliseconds.

The main issue in your code is that it accesses data in a shared memory area for 90% of the total time. In this case we are talking about the ConcurrentLinkedQueue and the synchronized Monitor.keySet() method. While one thread accesses these objects, the other 3 threads must wait. When you run your program for a long time, you might notice that only a fraction of your total CPU power is used.

To improve the performance, I would recommend splitting the job queue into 4 packets before you start the threads, so each thread can process its own packet without waiting for the other threads. Also, each thread should collect its results in an individual container. Finally (after the threads have finished), you can combine the four results.

If your worker threads were more complicated, the problem would be less severe. For example, if access to the containers took only 10% of the overall time (while some calculation took 90%), then the overhead of the thread synchronization would also be much smaller relative to the total execution time.
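This answer is about Java, but the partitioning idea itself is language-independent. Here is a small sketch of the same pattern in Python, with made-up job and result names: each worker gets its own slice of the work and its own result container, and the containers are only merged once all workers are done.

from concurrent.futures import ProcessPoolExecutor

def process_packet(packet):
    # Each worker collects results in its own local container, so no
    # locking or shared queue is needed while the work is running.
    local_counts = {}
    for item in packet:
        local_counts[item] = local_counts.get(item, 0) + 1
    return local_counts

def split(jobs, n):
    # Split the job list into n roughly equal packets up front.
    return [jobs[i::n] for i in range(n)]

if __name__ == "__main__":
    jobs = ["a", "b", "a", "c", "b", "a"] * 1000  # made-up job list
    merged = {}
    with ProcessPoolExecutor(max_workers=4) as pool:
        for partial in pool.map(process_packet, split(jobs, 4)):
            # Combine the four per-worker results only at the end.
            for key, count in partial.items():
                merged[key] = merged.get(key, 0) + count
    print(merged["a"])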

Multi-threading technique taking longer than single thread C#

The extra performance gained from using multiple threads is being displaced by the overhead of starting the threads in the first place. This approach would be more useful for a more complex workload.

This unfortunately executes too quickly to benefit from multi threading.

while (counter != convertInput)
{
    if (counter == convertInput)
    {
        watch.Stop();
        return;
    }
    counter++;
}

Threads writing each their own file is slower than writing all files sequentially

There are several issues:

  1. Due to the Global Interpreter Lock (GIL), Python will not use more than one CPU core at a time for the data generation part, so your data generation won't be sped up by running multiple threads. You'll need multiprocessing to speed up the CPU-bound part (see the sketch at the end of this answer).
  2. But that's not really the core of the problem here, because the GIL is released when you do I/O like writing to disk. The core of the problem is that you're writing to ten different places at a time, which most likely causes the hard disk head to thrash around as it switches between ten different places on the disk. Serial writes are almost always fastest on a hard disk.
  3. Even if you have a CPU-bound operation and use multiprocessing, using ten threads won't give you any significant advantage in data generation unless you actually have ten CPU cores. If you use more threads than the number of CPU cores, you'll pay the cost of thread switching, but you'll never speed up the total runtime of a CPU-bound operation.

If you use more threads than available CPU cores, the total run time always increases, or at best stays the same. The only reason to use more threads than CPU cores is if you are consuming the result of the threads interactively or in a pipeline with other systems. There are edge cases where you can speed up a poorly designed, I/O-bound program by using threads, but a well-designed single-threaded program will most likely perform just as well or better.
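A minimal sketch of the approach described above, assuming a made-up generate_chunk() function and output file names: generate the data in a process pool (sidestepping the GIL for the CPU-bound part) and then write the files one after another from the main process, so the disk only ever sees serial writes.

from multiprocessing import Pool

def generate_chunk(seed):
    # Placeholder for the real, CPU-bound data generation; it runs in a
    # separate process, so it is not limited by the parent's GIL.
    return "".join(str((seed * i) % 97) for i in range(1_000_000))

if __name__ == "__main__":
    with Pool() as pool:  # defaults to one worker per CPU core
        chunks = pool.map(generate_chunk, range(10))

    # Write sequentially from the main process so the disk head is not
    # forced to jump between ten files at once.
    for i, chunk in enumerate(chunks):
        with open("out_%d.txt" % i, "w") as f:
            f.write(chunk)

Whether this beats the single-threaded version still depends on how expensive the generation is relative to the writes; for a purely I/O-bound job, the sequential single-threaded program is hard to beat, as the answer notes.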

PHP Pthread Multithread code slower than single thread

Your code is just incredibly inefficient. There are also a number of problems with it - I've made a quick breakdown of some of these things below.

Firstly, you are spinning up over 500 threads (9 * 56 = 504). This is going to be very slow because threading in PHP requires a shared-nothing architecture. This means that a new instance of PHP's interpreter will need to be created for each thread you create, where all classes, interfaces, traits, functions, etc, will need to be copied over to the new interpreter instance.

Perhaps more to the point, though, is that your 3 nested for loops are performing 54 million iterations (90 * 200 * 3000). Multiply this by the 504 threads being created, and you can soon see why things are becoming sluggish. Instead, use a thread pool (see pthreads' Pool class) with a more modest number of threads (try 8, and go from there), and cut down on the iterations being performed per thread.

Secondly, you are opening a file 90 times per thread (so a total of 90 * 504 = 45,360 times). You only need one file handle per thread.

Thirdly, utilising actual PHP arrays inside of Threaded objects makes them read-only. So with respect to the $this->c2_result property, the code inside of your nested while loop should not even work. Not to mention that the following check does not look for duplicates:

if(count(array_flip($this->c2_result)) != count($this->c2_result))

If you avoid casting the $this->c2_result property to an array (therefore making it a Volatile object), then the following code could instead replace your while loop:

$keys = array_rand($this->linesId, 7);
for ($i = 0; $i < 7; ++$i) {
    $this->c2_result[$this->linesId[$keys[$i]]] = true;
}

By setting the values as the keys in $this->c2_result, we can remove the subsequent in_array function call that searches through $this->c2_result. This is done by utilising a PHP array as a hash table, where the lookup time for a key is constant (O(1)), rather than the linear time required when searching for values (with in_array). This enables us to replace the following slow check:

if(!in_array($this->linesId2[$this->traceId],$this->c2_result))

with the following fast check:

if (!isset($this->c2_result[$this->linesId2[$this->traceId]]))

But with that said, you don't seem to be using the $this->c2_result property anywhere else. So (assuming you haven't purposefully redacted code that uses it), you could remove it altogether and simply replace the while loop and the check after it with the following:

$found = false;

foreach (array_rand($this->linesId, 7) as $key) {
    if ($this->linesId[$key] === $this->linesId2[$this->traceId]) {
        $found = true;
        break;
    }
}

if (!$found) {
    ++$b;
}

Beyond the above, you could also look at storing the data you're collecting in-memory (as some property on the Threaded object), to prevent expensive disk writes. The results could be aggregated at the end, before shutting down the pool.

Update based upon your update

You've said that the rand function is causing a major slowdown. Whilst it may be part of the problem, I believe it is actually all of the code inside of your third nested for loop. The code in there is very hot, because it gets executed 54 million times. I suggested above that you replace the following code:

$zex = 0;

while ($zex != 1) {
    $c2_result[0] = $lines[rand(0,324631)];
    $c2_result[1] = $lines[rand(0,324631)];
    $c2_result[2] = $lines[rand(0,324631)];
    $c2_result[3] = $lines[rand(0,324631)];
    $c2_result[4] = $lines[rand(0,324631)];
    $c2_result[5] = $lines[rand(0,324631)];
    $c2_result[6] = $lines[rand(0,324631)];

    $myArray = (array) $c2_result;
    $myArray2 = (array) $c2_result;
    $myArray = array_flip($myArray);

    if (count($myArray) != count($c2_result)) { //echo "duplicates\n";
        $zex = 0;
    } else { //echo "no duplicates\n";
        $zex = 1;
        //exit;
    }
}

if (!in_array($lines2[$this->traceId], $myArray2)) {
    $b++;
}

with a combination of array_rand and foreach. Upon some initial tests, it turns out that array_rand really is outstandingly slow. But my hash table solution to replace the in_array invocation still holds true. By leveraging a PHP array as a hash table (basically, storing values as keys), we get constant-time lookups (O(1)), as opposed to linear-time lookups (O(n)).

Try replacing the above code with the following:

$myArray = [];

$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;
$myArray[rand(0,324631)] = true;

while (count($myArray) !== 7) {
    $myArray[rand(0,324631)] = true;
}

if (!isset($myArray[$lines2[$this->traceId]])) {
    $b++;
}

For me, this resulted in a 120% speedup.

As for further performance, you can (as mentioned above, again) store the results in-memory (as a simple property) and perform a write of all results at the end of the run method.

Also, the garbage collector for pthreads is not deterministic, so it should not be used to retrieve data. Instead, a Threaded object should be injected into the worker thread, and the data to be collected should be saved to that object. Lastly, you should shut down the pool after garbage collection (which, again, should not be used in your case).


