Why Is There a Difference Using std::thread::hardware_concurrency() and boost::thread::hardware_concurrency()

Why is there a difference using std::thread::hardware_concurrency() and boost::thread::hardware_concurrency()?

After reviewing /usr/include/c++/4.6.2/thread, it can be seen that the implementation is actually:

// Returns a value that hints at the number of hardware thread contexts.
static unsigned int
hardware_concurrency()
{ return 0; }

So, problem solved: it's just another feature that hasn't been implemented yet in GCC 4.6.2.

Handle std::thread::hardware_concurrency()

There is another approach besides using the GCC Common Predefined Macros: check whether std::thread::hardware_concurrency() returns zero, which means the feature is not (yet) implemented.

#include <thread>

// Prefer the standard hint; fall back to a custom detection when it returns 0.
unsigned int hardware_concurrency()
{
    unsigned int cores = std::thread::hardware_concurrency();
    return cores ? cores : my_hardware_concurrency();
}

You may be inspired by awgn's source code (GPL v2 licensed) to implement my_hardware_concurrency():

#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Count the "processor" entries in /proc/cpuinfo (Linux-specific).
auto my_hardware_concurrency()
{
    std::ifstream cpuinfo("/proc/cpuinfo");

    return std::count(std::istream_iterator<std::string>(cpuinfo),
                      std::istream_iterator<std::string>(),
                      std::string("processor"));
}

std::thread::hardware_concurrency() does not return the correct number of logical processors on an AMD Ryzen Threadripper 3990X

The idea of std::thread::hardware_concurrency() is to tell you how much concurrency std::thread can actually give you. Since std::thread can only put threads into your default processor group, you get the number of logical processors in your default processor group. On Windows that is not going to be greater than 64 unless you go to extreme measures.

"[A] system with 128 logical processors would have two processor groups with 64 processors in each group[.]"

"An application that requires the use of multiple groups so that it can run on more than 64 processors must explicitly determine where to run its threads and is responsible for setting the threads' processor affinities to the desired groups."

POSIX equivalent of boost::thread::hardware_concurrency

Boost's implementation uses C-compatible constructs, so why not just use the actual code? [libs/thread/src/*/thread.cpp]

Using the pthread library:

unsigned thread::hardware_concurrency()
{
#if defined(PTW32_VERSION) || defined(__hpux)
    return pthread_num_processors_np();
#elif defined(__APPLE__) || defined(__FreeBSD__)
    int count;
    size_t size = sizeof(count);
    return sysctlbyname("hw.ncpu", &count, &size, NULL, 0) ? 0 : count;
#elif defined(BOOST_HAS_UNISTD_H) && defined(_SC_NPROCESSORS_ONLN)
    int const count = sysconf(_SC_NPROCESSORS_ONLN);
    return (count > 0) ? count : 0;
#elif defined(_GNU_SOURCE)
    return get_nprocs();
#else
    return 0;
#endif
}

On Windows:

unsigned thread::hardware_concurrency()
{
    SYSTEM_INFO info = {{0}};
    GetSystemInfo(&info);
    return info.dwNumberOfProcessors;
}
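
If all you need is the POSIX branch, a standalone sketch can rely on sysconf directly, assuming _SC_NPROCESSORS_ONLN is available (as it is on Linux, macOS, and most BSDs). The function name posix_hardware_concurrency is just a placeholder:

#include <unistd.h>

// Minimal POSIX-only fallback: ask how many processors are currently online,
// and report 0 when the query fails, mirroring the "0 means unknown"
// convention used by the standard and Boost functions above.
unsigned posix_hardware_concurrency()
{
    long count = sysconf(_SC_NPROCESSORS_ONLN);
    return count > 0 ? static_cast<unsigned>(count) : 0;
}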

thread::hardware_concurrency() as template parameter

There is no static way to get the number of logical CPUs on a system. This number can easily change between compilation and execution, e.g. if the binary is executed on a different system.

You could get the number of logical CPUs of the system that compiles the code from the build system, for instance using CMake's ProcessorCount module, and put that into a define.
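
As a rough sketch of the consuming C++ side (NUM_BUILD_CPUS is a hypothetical macro name, and the CMake commands mentioned in the comments are just one way of defining it):

// Hypothetical consuming side: the build system is expected to pass
// -DNUM_BUILD_CPUS=<n>, e.g. from CMake's ProcessorCount(n) followed by
// add_compile_definitions(NUM_BUILD_CPUS=${n}).
#ifndef NUM_BUILD_CPUS
#define NUM_BUILD_CPUS 1 // conservative fallback when the build system provides nothing
#endif

#include <array>
#include <cstddef>

// The count is now a genuine compile-time constant usable as a template
// parameter; with the caveat from above that it describes the build machine,
// not necessarily the machine that later runs the binary.
template <std::size_t Workers = NUM_BUILD_CPUS>
struct static_pool
{
    std::array<int, Workers> per_worker_state{};
};

static_pool<> pool; // sized from the build-time processor count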

What does std::thread::hardware_concurrency() return?

std::thread::hardware_concurrency() is a static member function of std::thread which

Returns the number of concurrent threads supported by the implementation. The value should be considered only a hint.

So the 8 you get is your "max concurrent threads".

When you do

std::cout << std::thread::hardware_concurrency << std::endl;

you are not calling the function. The name decays to a function pointer, and since std::ostream has no overload for function pointers, the pointer is converted to bool, so this typically prints 1. That output is meaningless unless you actually call the function.
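
A minimal corrected example:

#include <iostream>
#include <thread>

int main()
{
    // Note the parentheses: this calls the function and prints the hint
    // (e.g. 8) instead of the result of merely naming the function.
    std::cout << std::thread::hardware_concurrency() << '\n';
}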

std::thread::hardware_concurrency() is one; should I implement threading on my system?

As the cppreference page says, the value should only be considered a hint. Note that it should return 0 only when the implementation is actually unable to compute the number of available threads, so 1 might really be the number of usable threads for your application.

Unfortunately the only real and good way to find out if it's worth to have threading for your application, is to actually implement threading and benchmark your application. Depending on your application workload and the implementation of threading in your application you may see either no change, a degradation of performance or an improvement in performance.

Make sure to consider the following questions before implementing multi-threading in your application:

  • Can your workload be parallelized?

    • Would you need a lot of synchronization locks/mutexes/etc? If yes, then it might not be worth it.
    • Maybe you can split the workload onto a GPU? Consider whether your workload fits into video memory and whether it is suited to processing on a GPU.
  • Would the estimated time for implementation of multi-threading be worth it?

    • Make sure to create a small test application that somewhat represents your use case; a minimal sketch of such a test follows this list.
    • If your application previously ran in 5 seconds and is estimated to run in 2 seconds after adding multi-threading, is the improvement worth the implementation effort?
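
A minimal sketch of such a benchmark (the workload and sizes below are arbitrary placeholders, not measurements from any particular system): sum a large array once serially and once split across std::thread::hardware_concurrency() worker threads, timing both. A memory-bound sum like this may well show little or no speedup, which is exactly the kind of answer the test is meant to give.

#include <chrono>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main()
{
    const std::vector<double> data(50'000'000, 1.0);
    const unsigned hint    = std::thread::hardware_concurrency();
    const unsigned workers = hint ? hint : 1; // 0 means "unknown", so assume 1

    auto t0 = std::chrono::steady_clock::now();
    double serial = std::accumulate(data.begin(), data.end(), 0.0);
    auto t1 = std::chrono::steady_clock::now();

    // Split the range into roughly equal chunks, one per worker thread.
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;
    const std::size_t chunk = data.size() / workers;
    for (unsigned i = 0; i < workers; ++i) {
        const auto first = data.begin() + i * chunk;
        const auto last  = (i + 1 == workers) ? data.end() : first + chunk;
        pool.emplace_back([first, last, &partial, i] {
            partial[i] = std::accumulate(first, last, 0.0);
        });
    }
    for (auto& t : pool) t.join();
    double parallel = std::accumulate(partial.begin(), partial.end(), 0.0);
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "serial:   " << ms(t1 - t0).count() << " ms (" << serial   << ")\n"
              << "threaded: " << ms(t2 - t1).count() << " ms (" << parallel << ")\n";
}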

