Is 1.0 a valid output from std::generate_canonical?

Is 1.0 a valid output from std::generate_canonical?

The problem is in mapping from the codomain of std::mt19937 (std::uint_fast32_t) to float: the algorithm described by the standard gives incorrect results (inconsistent with its own description of the algorithm's output) whenever the conversion loses precision and the current IEEE 754 rounding mode is anything other than round-toward-negative-infinity (note that the default is round-to-nearest).

The 7549723rd output of mt19937 with your seed is 4294967257 (0xffffffd9u). Rounded to 32-bit float this gives 0x1p+32, which equals the rounded-to-float value of mt19937's maximum, 4294967295 (0xffffffffu).
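
A standalone check of that claim (not part of the library; it just converts both values under the default round-to-nearest mode):

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint32_t out = 0xffffffd9u;        // the 7549723rd mt19937 output
        std::uint32_t max = 0xffffffffu;        // mt19937's maximum value
        float fout = static_cast<float>(out);   // rounds up to 0x1p+32
        float fmax = static_cast<float>(max);   // also rounds up to 0x1p+32
        std::printf("%a %a %d\n", fout, fmax, fout == fmax);  // 0x1p+32 0x1p+32 1
    }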

The standard could ensure correct behavior if it were to specify that, when converting from the output of the URNG to the RealType of generate_canonical, rounding is to be performed towards negative infinity; this would give a correct result in this case. As a matter of QoI (quality of implementation), it would be good for libstdc++ to make this change.
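
For illustration only, here is a sketch of how round-toward-negative-infinity tames the offending conversion. Whether the effect is observable depends on the compiler honoring the dynamic rounding mode at runtime (the volatile is an attempt to prevent constant folding):

    #include <cfenv>
    #include <cstdint>
    #include <cstdio>

    int main() {
        volatile std::uint32_t raw = 0xffffffd9u;  // volatile: force a runtime conversion
        std::fesetround(FE_DOWNWARD);              // round toward negative infinity
        float f = static_cast<float>(raw);         // 0x1.fffffep+31, strictly below 2^32
        std::printf("%a\n", f);
        std::fesetround(FE_TONEAREST);             // restore the default mode
    }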

With this change, 1.0 will no longer be generated; instead the boundary values 0x1.fffffep-N for 0 < N <= 8 will be generated more often (approximately 2^(8 - N - 32) per N, depending on the actual distribution of MT19937).

I would recommend not using float with std::generate_canonical directly; rather, generate the number in double and then round towards negative infinity:

    #include <cmath>   // std::nextafter
    #include <limits>  // std::numeric_limits
    #include <random>  // std::generate_canonical

    // rng is any URNG (e.g. std::mt19937); round down if the conversion rounded up.
    double rd = std::generate_canonical<double,
        std::numeric_limits<float>::digits>(rng);
    float rf = rd;
    if (rf > rd) {
        rf = std::nextafter(rf, -std::numeric_limits<float>::infinity());
    }

This problem can also occur with std::uniform_real_distribution<float>; the solution is the same: instantiate the distribution on double and round the result towards negative infinity in float.
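
A minimal sketch of that workaround (the helper name uniform_float is mine, not a standard facility):

    #include <cmath>
    #include <limits>
    #include <random>

    // Sample in double, then round down to float so upward rounding in the
    // conversion can never push the result up to hi.
    float uniform_float(std::mt19937& rng, float lo, float hi) {
        std::uniform_real_distribution<double> dist(lo, hi);
        double rd = dist(rng);
        float rf = static_cast<float>(rd);
        if (rf > rd) {
            rf = std::nextafter(rf, -std::numeric_limits<float>::infinity());
        }
        return rf;
    }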

Is generate_canonical output consistent across platforms?

The difficulties encountered in the linked question point to the basic problem: rounding mode. The clear intent of the mathematical definition of generate_canonical in the standard is that the URNG be called several times, each call producing a non-overlapping block of entropy with which to fill the result; that would be entirely consistent across platforms. The problem is that no indication is given as to what to do with the extra bits below the LSB. Depending on rounding mode and summation order, these can round upwards, spilling into the next block (which is what allows for a 1.0 result).

Now, the precise wording is "the instantiation’s results...are distributed as uniformly as possible as specified below". If the rounding mode is round-to-nearest, an implementation which produces 1.0 is not as uniform as possible (because 1-eps is less likely than 1-2*eps). But it's still "as specified below". So depending on how you parse that sentence, generate_canonical is either fully specified and consistent, or has delegated some extra un-discussed bits to the implementation.

In any case, the fact that certain implementations produce 1.0 makes it quite clear that the current behavior is not cross-platform consistent. If you want consistency, the most straightforward approach would be to wrap your URNG in an independent_bits_engine whose block size divides bits evenly, so there is never anything to round; see the sketch below.
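
For instance, a sketch for float (assuming bits = 24, the full float mantissa): the wrapped engine emits exactly 24 bits per call, so a single call fills the result and the division is exact:

    #include <cstdint>
    #include <random>

    int main() {
        // 24 = std::numeric_limits<float>::digits, so R = 2^24, b = 24, k = 1.
        std::independent_bits_engine<std::mt19937, 24, std::uint32_t> eng;
        // The result is exactly n / 2^24 for an integer 0 <= n < 2^24;
        // nothing is rounded, and 1.0f can never be produced.
        float x = std::generate_canonical<float, 24>(eng);
        (void)x;
    }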

How to use std::generate_canonical to generate a random number in the range [0,1)?

26.5.7.2 Function template generate_canonical [rand.util.canonical]

Each function instantiated from the template described in this section maps the result of one or more invocations of a supplied uniform random number generator g to one member of the specified RealType such that, if the values g_i produced by g are uniformly distributed, the instantiation's results t_j, 0 ≤ t_j < 1, are distributed as uniformly as possible as specified below.

    template<class RealType, size_t bits, class URNG>
    RealType generate_canonical(URNG& g);

Also, the Standard describes that this function returns S/R^k, where

    S = g_0 + g_1·R + ... + g_{k-1}·R^(k-1)   (each g_i = g() - g.min() lies in [0, R))
    R = g.max() - g.min() + 1
    k = max(1, ⌈b / log2(R)⌉), with b = min(bits, std::numeric_limits<RealType>::digits)

Since S ≤ R^k - 1, the exact value of S/R^k is strictly less than one. So this function should return a value from zero (inclusive) to one (exclusive). I think the Microsoft implementation is wrong here.
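
To see concretely how 1.0 can still come out (an illustration in plain C++, not the library's actual code): with mt19937 and float we have R = 2^32, b = 24, and k = 1, and performing the division in float rounds a near-maximal output up to one:

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint32_t s = 0xffffffd9u;      // the offending mt19937 output
        float result = s / 4294967296.0f;   // S / R^k in float: s first rounds
                                            // up to 0x1p+32, so this is 1.0f
        std::printf("%a\n", result);        // prints 0x1p+0
    }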

std::uniform_real_distribution inclusive range

This is easier to think about if you start by looking at integers. If you pass [-1, 1) you would expect to get -1, 0. Since you want to include 1, you would pass [-1, (1+1)), or [-1, 2). Now you get -1, 0, 1.

You want to do the same thing, but with doubles:

Borrowing from this answer:

    #include <cfloat>  // DBL_MAX
    #include <cmath>   // std::nextafter
    #include <random>
    #include <iostream>

    int main()
    {
        const double start = -1.0;
        const double stop = 1.0;

        std::random_device rd;
        std::mt19937 gen(rd());

        // Note: uniform_real_distribution does [start, stop),
        // but we want to do [start, stop].
        // Pass the next largest value instead.
        std::uniform_real_distribution<> dis(start, std::nextafter(stop, DBL_MAX));

        for (auto i = 0; i < 100; ++i)
        {
            std::cout << dis(gen) << "\n";
        }
        std::cout << std::endl;
    }

That is, find the next largest double value after the one you want, and pass that as the end value instead.

How efficient is it to make a temporary uniform random distribution each time round a loop?

Objects of std::uniform_real_distribution are lightweight, so it is not a problem to construct one each time around the loop.
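
A minimal sketch of the pattern (the bounds here are arbitrary, chosen just to show a per-iteration distribution):

    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 gen(std::random_device{}());
        for (int i = 0; i < 5; ++i) {
            // A fresh distribution each iteration costs only storing the bounds.
            std::uniform_real_distribution<double> dis(0.0, i + 1.0);
            std::cout << dis(gen) << '\n';
        }
    }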

Sometimes the hidden internal state of a distribution is important, but not in this case. The reset() function does nothing in all popular standard library implementations:

    void
    reset() { }

This is not true of std::normal_distribution, for example:

    void
    reset()
    { _M_saved_available = false; }

In C++ string, why is the behavior after the last character different when accessed by index and by at()?

Shouldn't the behavior be the same in both cases (accessing by index and with at()), i.e. either both cause abnormal termination or both exit normally?

No, they should not have the same behaviour. The behaviour is different intentionally. If it wasn't, then there would only be a need for one of them to exist.

The at member function performs bounds checks. Any access outside the bounds of the container results in an exception. This is the same as the at member function of std::array or std::vector, for example. Note that an uncaught exception will cause the program to terminate.

The subscript operator does not perform any out-of-bounds checks. Prior to C++11, any access to elements at indices > size() had undefined behaviour. Under no circumstance is the subscript operator guaranteed to throw an exception. This is the same as the subscript operator of an array, std::array, or std::vector, for example.

Since C++11, the behaviour of the subscript operator of std::string was changed such that reading the element at index == size() (i.e. one past the last element) is well defined and returns the null terminator. Only modifying the object through the returned reference has undefined behaviour. Reading other indices outside the bounds still has undefined behaviour.
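
A short demonstration of the difference (standard behavior since C++11):

    #include <iostream>
    #include <stdexcept>
    #include <string>

    int main() {
        std::string s = "abc";
        // Reading one past the end with operator[] is well defined:
        // it yields the null terminator.
        std::cout << static_cast<int>(s[s.size()]) << '\n';  // prints 0
        // at() is bounds-checked and throws for the same index.
        try {
            s.at(s.size());
        } catch (const std::out_of_range& e) {
            std::cout << "caught: " << e.what() << '\n';
        }
    }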

I do not know for a fact the rationale for not making a corresponding change to at to allow access to the null terminator, but I suspect that it was considered a backwards incompatible change: making UB well defined is always backwards compatible, while ceasing to throw an exception is not. Another possible reason is that it would have opened a route to UB (if the null terminator were modified), and the design of at is to keep it free from UB.

Getting strange characters in fields with output from json c++

You should save the result in a std::string and then call c_str() on that string to get a C-string. If you chain those calls and store only the pointer, or call asCString() directly, the temporary string object that owns the memory the C-string points to will already have been destroyed, and using the pointer invokes undefined behavior, which is not what you want.

I.e.:

    std::string runtime = result.get("Runtime", "NULL").asString();
    runtimeInput->value(runtime.c_str());  // runtime is still alive here, so the pointer is valid
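
For contrast, this is the problematic pattern described above (a sketch: p dangles as soon as the statement ends, because the temporary std::string returned by asString() is destroyed):

    // Don't do this: c_str() points into a temporary that dies immediately.
    const char* p = result.get("Runtime", "NULL").asString().c_str();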

