SQL Injection Attack Prevention: Where Do I Start

Efficient way to round double precision numbers to a lower precision given in number of bits

Dekker’s algorithm will split a floating-point number into high and low parts. If there are s bits in the significand (53 in IEEE 754 64-bit binary), then *x0 receives the high s-b bits, which is what you requested, and *x1 receives the remaining bits, which you may discard. In the code below, Scale should have the value 2^b. If b is known at compile time, e.g., the constant 43, you can replace Scale with 0x1p43. Otherwise, you must produce 2^b in some way.

This requires round-to-nearest mode. IEEE 754 arithmetic suffices, but other reasonable arithmetic may be okay too. It rounds ties to even, which is not what you requested (ties upward). Is that necessary?

This assumes that x * (Scale + 1) does not overflow. The operations must be evaluated in double precision (not greater).

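static const double Scale = 0x1p43;  /* 2^b; here b = 43, as in the example above */

void Split(double *x0, double *x1, double x)
{
    double d = x * (Scale + 1);  /* must not overflow */
    double t = d - x;
    *x0 = d - t;                 /* high s-b bits of x */
    *x1 = x - *x0;               /* remaining low bits */
}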
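When b is only known at run time, one way to produce 2^b is the standard ldexp function, which scales by a power of two exactly. The following is a minimal sketch under that assumption; the name split_bits and its parameter b are hypothetical, and the same requirements apply (round-to-nearest, double-precision evaluation, no overflow).

```c
#include <assert.h>
#include <math.h>

/* Sketch: Dekker-style split with the scale computed at run time.
   b = number of low significand bits moved into *x1 (hypothetical
   parameter). Assumes 0 < b < 53 and that x * (scale + 1) does not
   overflow. */
void split_bits(double *x0, double *x1, double x, int b)
{
    assert(b > 0 && b < 53);
    double scale = ldexp(1.0, b);   /* exactly 2^b */
    double d = x * (scale + 1.0);
    double t = d - x;
    *x0 = d - t;                    /* high 53-b bits, rounded to nearest */
    *x1 = x - *x0;                  /* remainder; x0 + x1 equals x */
}
```

With b = 43, for example, *x0 keeps the top 10 significand bits of x and *x1 holds whatever was rounded away.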

How to round a double/float to BINARY precision?

Yes, rounding off binary digits makes more sense than going through BigDecimal and can be implemented very efficiently if you are not worried about being within a small factor of Double.MAX_VALUE.

You can round a floating-point double value x with the following sequence in Java (untested):

double t = 9 * x; // 9 = 2^3 + 1; beware: this overflows if x is too close to Double.MAX_VALUE
double y = (x - t) + t; // the subtraction must be evaluated first

After this sequence, y should contain the rounded value. Adjust the distance between the two set bits in the constant 9 in order to adjust the number of bits that are rounded off. The value 3 rounds off one bit. The value 5 rounds off two bits. The value 17 rounds off four bits, and so on.

This sequence of instructions is attributed to Veltkamp and is typically used in “Dekker multiplication”. This page has some references.
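To see the effect of the constant, the sequence can be generalized to 2^k + 1 and the stored bits inspected afterwards. This is a sketch in C rather than Java, assuming IEEE 754 doubles, round-to-nearest, and normal (non-subnormal) values; both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Round off the k lowest significand bits of x using the constant
   2^k + 1 (k = 3 gives the 9 used above). Assumes 0 < k < 53 and
   that c * x does not overflow. */
double round_off_bits(double x, int k)
{
    assert(k > 0 && k < 53);
    double c = ldexp(1.0, k) + 1.0;
    double t = c * x;
    return (x - t) + t;
}

/* Hypothetical helper: returns the k lowest bits of the stored
   significand; 0 means those bits have been rounded off. */
uint64_t low_significand_bits(double x, int k)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* read the encoding portably */
    return bits & ((UINT64_C(1) << k) - 1);
}
```

For instance, low_significand_bits(0.1, 3) is nonzero, while the same call on round_off_bits(0.1, 3) returns 0.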

Rounding to specific digits fails with this double-precision value

Double is a binary floating-point type: values are stored in binary (like 11010.00110). Most decimal fractions have no exact binary representation, so the stored value is only the closest available approximation of the decimal number you wrote. Try, for example, this operation:

double d = 3.65d + 0.05d;

It will not result in 3.7 but in 3.6999999999999997. This is because each operand is stored as the closest available double, and the sum is rounded to the closest available double again.

The same happens in your case. Your variable contains the closest available double.
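The stored value can be made visible by printing 17 significant digits, which is enough to identify any double uniquely. A minimal sketch, written in C rather than C# and assuming IEEE 754 doubles; the function name sum_to_text is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Adds a and b, writes the result to buf with 17 significant digits,
   and returns the sum. */
double sum_to_text(double a, double b, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    double d = a + b;
    snprintf(buf, n, "%.17g", d);
    return d;
}
```

Calling sum_to_text(3.65, 0.05, buf, sizeof buf) returns a value that compares unequal to the literal 3.7, and buf shows the digits that are actually stored.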

For exact decimal arithmetic, double/float is not the best choice.
Use double/float when you need speed or a larger range of values and exact decimal results are not required. For instance, it is a perfectly good type for calculations in physics.
For precise decimal operations use, well, decimal.

Here is an article about float/decimal: http://csharpindepth.com/Articles/General/FloatingPoint.aspx


