Fast Way of Finding Most and Least Significant Bit Set in a 64-Bit Integer

Fastest way to get least significant bit position in a ulong (C#)?

I have measured the performance of all the answers.

The winner is the classic De Bruijn sequence approach, which was not present here.

private const ulong DeBruijnSequence = 0x37E84A99DAE458F;

private static readonly int[] MultiplyDeBruijnBitPosition =
{
    0, 1, 17, 2, 18, 50, 3, 57,
    47, 19, 22, 51, 29, 4, 33, 58,
    15, 48, 20, 27, 25, 23, 52, 41,
    54, 30, 38, 5, 43, 34, 59, 8,
    63, 16, 49, 56, 46, 21, 28, 32,
    14, 26, 24, 40, 53, 37, 42, 7,
    62, 55, 45, 31, 13, 39, 36, 6,
    61, 44, 12, 35, 60, 11, 10, 9,
};

/// <summary>
/// Search the mask data from the least significant bit (LSB) to the most significant bit (MSB) for a set bit (1)
/// using the De Bruijn sequence approach. Warning: will return zero for b = 0.
/// </summary>
/// <param name="b">Target number.</param>
/// <returns>Zero-based position of the LSB (from right to left).</returns>
private static int BitScanForward(ulong b)
{
    Debug.Assert(b > 0, "Target number should not be zero");
    return MultiplyDeBruijnBitPosition[((ulong)((long)b & -(long)b) * DeBruijnSequence) >> 58];
}

The fastest way would be to have the JIT compiler emit the Bit Scan Forward (bsf) instruction directly in place of the BitScanForward body, but that requires much more effort.
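
Note that on .NET Core 3.0 and later, System.Numerics.BitOperations.TrailingZeroCount and LeadingZeroCount are JIT-compiled to the hardware bit-scan instructions (tzcnt/bsf, lzcnt/bsr) where the CPU supports them, so a hand-rolled table is often unnecessary. A minimal sketch, assuming such a target framework (class and variable names are only illustrative):

using System;
using System.Numerics;

static class BitScanIntrinsics
{
    static void Main()
    {
        ulong b = 0b1011000; // 88

        // Zero-based position of the least significant set bit (returns 64 for b = 0).
        int lsb = BitOperations.TrailingZeroCount(b);

        // Zero-based position of the most significant set bit (yields -1 for b = 0).
        int msb = 63 - BitOperations.LeadingZeroCount(b);

        Console.WriteLine($"lsb = {lsb}, msb = {msb}"); // lsb = 3, msb = 6
    }
}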

Finding most significant and least significant bit set in a 64-bit integer?

If code portability is not an issue, you should try _BitScanForward64 and _BitScanReverse64. They're compiler intrinsics and map to a single, efficient assembler instruction.

Position of least significant bit that is set

Bit Twiddling Hacks offers an excellent collection of, er, bit twiddling hacks, with performance/optimisation discussion attached. My favourite solution for your problem (from that site) is «multiply and lookup»:

unsigned int v;  // find the number of trailing zeros in 32-bit v 
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[32] =
{
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];

Helpful references:

  • "Using de Bruijn Sequences to Index a 1 in a Computer Word" - Explanation about why the above code works.
  • "Board Representation > Bitboards > BitScan" - Detailed analysis of this problem, with a particular focus on chess programming

Find most significant bit in Swift

You can use the flsl() function ("find last set bit, long"):

let x = 9
let p = flsl(x)
print(p) // 4

The result is 4 because flsl() and the related functions number the bits starting at 1 for the least significant bit.

On Intel platforms you can use the _bit_scan_reverse intrinsic; in my test in a macOS application this translated to a BSR instruction.

import _Builtin_intrinsics.intel

let x: Int32 = 9
let p = _bit_scan_reverse(x)
print(p) // 3

Efficiently getting most and least significant bit in javascript

I realize the answer has already been selected for this question, but I found it an interesting problem, likely best solved with WebAssembly (WASM). Unfortunately, it appeared at first that WASM did not support i64 parameters, so I had to settle for sending the BigInt as lo / hi (i32 / i32) and stitching the two halves together into an i64 within WASM before looking for the MSB.


EDIT Jan 2021: Per https://webassembly.org/roadmap/, the latest browsers now support BigInt i64 parameters between JavaScript and WASM. This answer has been updated to include the following WASM code, lz64.wat, compiled into lz64.wasm:

(module
  (func $lz64 (param $val i64) (result i64)
    get_local $val
    i64.clz
  )
  (export "lz64" (func $lz64))
)

Compared with the WASM code below, which needed to stitch together two i32 parameters, the code above is greatly simplified (and much faster)!


The WASM i32 code is fairly straightforward, leveraging https://pengowray.github.io/wasm-ops/ to determine the proper opcodes. Additionally, I used https://webassembly.github.io/wabt/demo/wat2wasm/ to create, debug, and compile the WebAssembly Text (WAT) to WASM. The i64.clz op is where the actual magic happens. All the code before it stitches the two i32 numbers together to form the actual i64 number to check. Note too that WASM acts a bit like an RPN calculator...

lz64v32.wat, just below, is compiled into lz64v32.wasm:

(module
  (func $lz (param $lo i32) (param $hi i32) (result i32)
    get_local $hi
    i64.extend_i32_u
    i64.const 32
    i64.shl

    get_local $lo
    i64.extend_i32_u
    i64.add

    i64.clz
    i32.wrap_i64
  )
  (export "lz" (func $lz))
)

The listing below also includes the De Bruijn solution from another answer, in order to test the relative performance.

EDIT: I also stumbled onto What's the most efficient way of getting position of least significant bit of a number in javascript?, which points out the Math.clz32 function(!). I put together a few helper functions to support finding the MSB of a 64-bit BigInt, and ran tests with this third option as well.

BEWARE! Running the code below will create 10M BigInts to run through the Math.clz32, WASM, and De Bruijn solutions. On my Acer laptop, on a handful of runs, the results are...

  • Math.clz32 => 1.70s
  • WASM i32 => 2.06s
  • WASM i64 => 0.43s
  • De Bruijn => 13.70s

The WASM i64 solution is significantly faster!

var v232 = 2n ** 32n;

function clz32(n) {
  return Math.clz32(n | 0);
}

// https://stackoverflow.com/questions/61442006/whats-the-most-efficient-way-of-getting-position-of-least-significant-bit-of-a
function ctz32(n) {
  n |= 0;
  return n ? 31 - Math.clz32(n & -n) : 0;
}

function clz64(bn) {
  let result = clz32(Number(bn / v232) | 0);
  if (result === 32) {
    result += clz32(Number(bn % v232) | 0);
  }
  return result;
}

function arrayBufferToBase64(arrayBuffer) {
  return btoa(String.fromCharCode(...new Uint8Array(arrayBuffer)));
}

function base64ToArrayBuffer(base64) {
  let s = atob(base64);
  let arrayBuffer = new ArrayBuffer(s.length);
  var bufferView = new Uint8Array(arrayBuffer);
  for (let i = 0; i < s.length; i++) {
    bufferView[i] = s.charCodeAt(i);
  }
  return arrayBuffer;
}

async function compile(fn, preCompiled) {
  let wasmBuffer;
  if (preCompiled) {
    wasmBuffer = base64ToArrayBuffer(preCompiled);
  } else {
    let response = await fetch(fn);
    wasmBuffer = await response.arrayBuffer();
    console.log(arrayBufferToBase64(wasmBuffer));
  }

  return await WebAssembly.instantiate(wasmBuffer);
}

async function runTest() {

  let wasm64v32 = await compile('./lz64v32.wasm', 'AGFzbQEAAAABBwFgAn9/AX8DAgEABwYBAmx6AAAKEAEOACABrUIghiAArXx5pwsAGQRuYW1lAQUBAAJsegILAQACAAJsbwECaGk=');
  let wasm64 = await compile('./lz64.wasm', 'AGFzbQEAAAABBgFgAX4BfgMCAQAHCAEEbHo2NAAACgcBBQAgAHkLABgEbmFtZQEHAQAEbHo2NAIIAQABAAN2YWw=');
  let v232 = 2n ** 32n;
  let randomBigInts = new Array(10000000);
  console.log(`Building ${randomBigInts.length.toLocaleString()} BigInts...\n\n`);
  for (let i = 0; i < randomBigInts.length; i++) {
    randomBigInts[i] = 2n ** BigInt(Math.round(Math.random() * 64));
  }

  console.log(`Math.clz32 MSB start check on ${randomBigInts.length.toLocaleString()} BigInts.`);
  randomBigInts.forEach(v => {
    64 - clz64(v);
  });
  console.log('Done');

  let v = randomBigInts[0];
  console.log(`Math.clz32 test of msb ${('0'.repeat(64) + v.toString(2)).slice(-64)} is ${64 - clz64(v)}\n\n`);

  console.log(`WASM MSB start check on ${randomBigInts.length.toLocaleString()} BigInts via WASM, splitting 64 bit BigInt into 2 x i32 parameters.`);
  let lzf = wasm64v32.instance.exports.lz;
  randomBigInts.forEach(v => {
    64 - lzf(Number(v % v232), Number(v / v232));
  });
  console.log('Done');

  console.log(`WASM test of leading zeros ${('0'.repeat(64) + v.toString(2)).slice(-64)} is ${64 - lzf(Number(v % v232), Number(v / v232))}\n\n`);

  console.log(`WASM MSB start check on ${randomBigInts.length.toLocaleString()} BigInts via WASM using i64 parameter.`);
  let lzf64 = wasm64.instance.exports.lz64;
  randomBigInts.forEach(v => {
    64n - lzf64(v);
  });
  console.log('Done');

  console.log(`WASM test of leading zeros ${('0'.repeat(64) + v.toString(2)).slice(-64)} is ${64n - lzf64(v)}\n\n`);

  console.log(`DeBruijn MSB start check on ${randomBigInts.length.toLocaleString()} BigInts via deBruijn.`);
  randomBigInts.forEach(v => {
    msb(v);
  });
  console.log('Done');
  console.log(`DeBruijn test of leading zeros ${('0'.repeat(64) + v.toString(2)).slice(-64)} is ${msb(v)}\n\n`);

}

const deBruijn = [0, 48, -1, -1, 31, -1, 15, 51, -1, 63, 5, -1, -1, -1, 19, -1, 23, 28, -1, -1, -1, 40, 36, 46, -1, 13, -1, -1, -1, 34, -1, 58, -1, 60, 2, 43, 55, -1, -1, -1, 50, 62, 4, -1, 18, 27, -1, 39, 45, -1, -1, 33, 57, -1, 1, 54, -1, 49, -1, 17, -1, -1, 32, -1, 53, -1, 16, -1, -1, 52, -1, -1, -1, 64, 6, 7, 8, -1, 9, -1, -1, -1, 20, 10, -1, -1, 24, -1, 29, -1, -1, 21, -1, 11, -1, -1, 41, -1, 25, 37, -1, 47, -1, 30, 14, -1, -1, -1, -1, 22, -1, -1, 35, 12, -1, -1, -1, 59, 42, -1, -1, 61, 3, 26, 38, 44, -1, 56];
const multiplicator = BigInt("0x6c04f118e9966f6b");

const b1 = BigInt(1)
  , b2 = BigInt(2)
  , b4 = BigInt(4)
  , b8 = BigInt(8)
  , b16 = BigInt(16)
  , b32 = BigInt(32)
  , b57 = BigInt(57);

function msb(v) {
  v |= v >> b1;
  v |= v >> b2;
  v |= v >> b4;
  v |= v >> b8;
  v |= v >> b16;
  v |= v >> b32;
  return deBruijn[BigInt.asUintN(64, (BigInt.asUintN(64, (v * multiplicator))) >> b57)];
}

function lsb(v) {
  v = -v | v;
  return deBruijn[BigInt.asUintN(64, (BigInt.asUintN(64, (~(v) * multiplicator))) >> b57)];
}

runTest();

How does clearing least significant bit repeatedly work in this algorithm?

c - 1 clears the least significant set bit in the binary representation of c and sets all of the (previously unset) bits to the right of that bit.

When you bitwise-AND c - 1 with c, you effectively clear the least significant set bit along with everything to its right. In other words, the least significant set bit in c and all the bits to its right become zero.

You count this as one, and rightly so, because it was exactly one set bit from a ^ b.

Now, you continue this operation until c becomes zero; the number of operations is the number of set bits in c, which is the number of bits that differ between a and b.

To give you an example of what c - 1 does to the binary representation of c:

c = 6, in binary 110
c - 1 = 5, in binary 101
(c - 1) & c = 110 & 101 = 100

The above is one iteration of the loop.

Next iteration:
c = 4, in binary 100
c - 1 = 3, in binary 011
(c - 1) & c = 100 & 011 = 000

The above successfully counts the number of set bits in 6.

This optimization improves on the approach of right-shifting the number at each iteration and checking whether the least significant bit is set. With the shift-and-check approach, the number of iterations depends on the position of the most significant set bit: for a power of two such as 2^7, you iterate 8 times until the number becomes zero. With the optimized method, you iterate once per set bit; for 2^7 that is just one iteration.
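
To make the counting loop concrete, here is a minimal sketch in C# (the class and method names are only illustrative; c = a ^ b follows the question's setup):

using System;

static class BitDifference
{
    // Counts how many bit positions differ between a and b by repeatedly
    // clearing the least significant set bit of c = a ^ b.
    static int CountDifferingBits(ulong a, ulong b)
    {
        ulong c = a ^ b;   // set bits mark the positions where a and b differ
        int count = 0;
        while (c != 0)
        {
            c &= c - 1;    // clears the least significant set bit of c
            count++;
        }
        return count;
    }

    static void Main()
    {
        // 6 (binary 110) vs 0 differ in two positions, matching the worked example above.
        Console.WriteLine(CountDifferingBits(0b110, 0b000)); // 2
    }
}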


