Why is there a difference using std::thread::hardware_concurrency() and boost::thread::hardware_concurrency()?
After reviewing /usr/include/c++/4.6.2/thread, it can be seen that the implementation is simply:
// Returns a value that hints at the number of hardware thread contexts.
static unsigned int
hardware_concurrency()
{ return 0; }
So, problem solved: it's simply a feature that had not yet been implemented in GCC 4.6.2.
Handle std::thread::hardware_concurrency()
There is an alternative to using the GCC Common Predefined Macros: check whether std::thread::hardware_concurrency() returns zero, which means the feature is not (yet) implemented.
#include <thread>

unsigned int hardware_concurrency()
{
    unsigned int cores = std::thread::hardware_concurrency();
    return cores ? cores : my_hardware_concurrency();
}
You may be inspired by awgn's source code (GPL v2 licensed) to implement my_hardware_concurrency():

#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

auto my_hardware_concurrency()
{
    std::ifstream cpuinfo("/proc/cpuinfo");
    return std::count(std::istream_iterator<std::string>(cpuinfo),
                      std::istream_iterator<std::string>(),
                      std::string("processor"));
}
std::thread::hardware_concurrency() does not return the correct number of Logical Processors in AMD Ryzen threadripper 3990x
The idea of std::thread::hardware_concurrency() is to tell you how much concurrency std::threads can actually experience. Since std::thread can only put threads into your default processor group, you will get the number of logical processors in your default processor group. On Windows, this is not going to be greater than 64 unless you go to extreme measures.
"[A] system with 128 logical processors would have two processor groups with 64 processors in each group[.]"
"An application that requires the use of multiple groups so that it can run on more than 64 processors must explicitly determine where to run its threads and is responsible for setting the threads' processor affinities to the desired groups."
POSIX equivalent of boost::thread::hardware_concurrency
It uses C-compatible constructs, so why not just use the actual code? [libs/thread/src/*/thread.cpp]
Using the pthread library:

unsigned thread::hardware_concurrency()
{
#if defined(PTW32_VERSION) || defined(__hpux)
    return pthread_num_processors_np();
#elif defined(__APPLE__) || defined(__FreeBSD__)
    int count;
    size_t size = sizeof(count);
    return sysctlbyname("hw.ncpu", &count, &size, NULL, 0) ? 0 : count;
#elif defined(BOOST_HAS_UNISTD_H) && defined(_SC_NPROCESSORS_ONLN)
    int const count = sysconf(_SC_NPROCESSORS_ONLN);
    return (count > 0) ? count : 0;
#elif defined(_GNU_SOURCE)
    return get_nprocs();
#else
    return 0;
#endif
}
On Windows:

unsigned thread::hardware_concurrency()
{
    SYSTEM_INFO info = {{0}};
    GetSystemInfo(&info);
    return info.dwNumberOfProcessors;
}
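If Boost is not an option, the POSIX sysconf branch above can be lifted into a small standalone helper. A minimal sketch (assuming _SC_NPROCESSORS_ONLN is available, as it is on Linux and most modern Unixes):

```cpp
#include <unistd.h>  // sysconf, _SC_NPROCESSORS_ONLN

// Standalone version of the POSIX branch of the Boost code above.
// Returns 0 when the value cannot be determined, mirroring the
// convention used by std::thread::hardware_concurrency().
unsigned posix_hardware_concurrency()
{
    long count = sysconf(_SC_NPROCESSORS_ONLN);
    return count > 0 ? static_cast<unsigned>(count) : 0;
}
```

Note that _SC_NPROCESSORS_ONLN reports processors currently online, which can differ from the configured total (_SC_NPROCESSORS_CONF) on systems that hot-plug CPUs.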
thread::hardware_concurrency() as template parameter
There is no static way to get the number of logical CPUs on a system. This number can easily change between compilation and execution, e.g. if the binary is executed on a different system.
You could get the number of logical CPUs of the system that compiles the code from the build system, for instance using CMake's ProcessorCount module, and put that into a define.
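As a sketch of that approach (assuming a CMake-based build; the macro name BUILD_HOST_NPROC is an arbitrary choice), the build host's CPU count could be baked into a preprocessor define like this:

```cmake
# Query the CPU count of the machine running CMake (0 if unknown).
include(ProcessorCount)
ProcessorCount(NPROC)
if(NOT NPROC EQUAL 0)
  # Expose it to the code as a compile-time constant.
  add_compile_definitions(BUILD_HOST_NPROC=${NPROC})
endif()
```

Remember that this is the CPU count of the build machine, not of whatever machine eventually runs the binary.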
What does std::thread::hardware_concurrency return?
std::thread::hardware_concurrency() is a static member function of std::thread which:

Returns the number of concurrent threads supported by the implementation. The value should be considered only a hint.

So the 8 you get is your maximum number of concurrent threads.
When you do
std::cout << std::thread::hardware_concurrency << std::endl;
you are not calling the function, you are streaming the function itself. There is no operator<< overload for function pointers, so the pointer is implicitly converted to bool and the output is 1. That value is meaningless unless you actually wanted to pass the function pointer to something else.
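A small sketch to illustrate the difference (the helper names are arbitrary):

```cpp
#include <sstream>
#include <string>
#include <thread>

// Streaming the function NAME without parentheses does not print an
// address: the function pointer is implicitly converted to bool, so
// the output is "1" regardless of the hardware.
std::string stream_without_call()
{
    std::ostringstream os;
    os << std::thread::hardware_concurrency;  // no () -- bool conversion
    return os.str();
}

// Calling the function yields the actual hint (0 if unknown).
unsigned stream_with_call()
{
    return std::thread::hardware_concurrency();
}
```

Most compilers warn about the missing call (e.g. GCC's "address of function will always evaluate as true").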
std::thread::hardware_concurrency is one, should I implement threading on my system?
As the cppreference page says, the value should only be considered a hint. Note that it returns 0 only when it is actually unable to determine the number of available threads; 1 may genuinely be the number of usable threads for your application.
Unfortunately, the only reliable way to find out whether threading is worth it for your application is to actually implement it and benchmark. Depending on your workload and your threading implementation, you may see no change, a degradation of performance, or an improvement.
Make sure to consider the following questions before implementing/considering multi-threading in your application:
- Can your workload be parallelized?
- Would you need a lot of synchronization locks/mutexes/etc? If yes, then it might not be worth it.
- Maybe you can split the workload onto a GPU? Consider if your workload can fit into video memory and if it is fit for processing on a GPU.
- Would the estimated time for implementation of multi-threading be worth it?
- Make sure to create a small test application that would somewhat represent your use case.
- If your application previously ran in 5 seconds and is estimated to run in 2 seconds after adding multi-threading, is the speedup worth the implementation effort?
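As a starting point for such a small test application, here is a hedged sketch (threaded_sum is a made-up name) that splits a workload across hardware_concurrency() worker threads; timing it against the serial version gives the comparison the checklist above asks for:

```cpp
#include <numeric>
#include <thread>
#include <vector>

// Sum a vector using one worker thread per reported hardware thread.
// Falls back to a single thread when the hint is 0 (unknown).
long long threaded_sum(const std::vector<int>& v)
{
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;  // hint unavailable: run serially

    std::vector<long long> partial(n, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = v.size() / n;

    for (unsigned i = 0; i < n; ++i) {
        std::size_t b = i * chunk;
        std::size_t e = (i + 1 == n) ? v.size() : b + chunk;
        workers.emplace_back([&v, &partial, i, b, e] {
            partial[i] = std::accumulate(v.begin() + b, v.begin() + e, 0LL);
        });
    }
    for (auto& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

For a trivially small vector like this, thread creation overhead will likely dominate, which is exactly the kind of result the benchmark is meant to reveal.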