Bitset to and from Integer/Long

BitSet to and from integer/long

The following code creates a bit set from a long value and vice versa:

public class Bits {

  public static BitSet convert(long value) {
    BitSet bits = new BitSet();
    int index = 0;
    while (value != 0L) {
      if (value % 2L != 0) {
        bits.set(index);
      }
      ++index;
      value = value >>> 1;
    }
    return bits;
  }

  public static long convert(BitSet bits) {
    long value = 0L;
    for (int i = 0; i < bits.length(); ++i) {
      value += bits.get(i) ? (1L << i) : 0L;
    }
    return value;
  }
}

EDITED: Now both directions, @leftbrain: of cause, you are right

Handling Long data with BitSet in Java

BitSet could be used by seeing which bits were previously set. That would then constitute a duplicate value. However, you can't set a bit position greater than Integer.MAX_VALUE (and it would be infeasible to handle such a large range for longs). So it would not work for the range you suggested. And I presume you would still want to record the duplicates.

I would use a Map<Long,Long> to do a frequency count. Then you can just determine the exact count of each value provided. And locating the next Key of a map is comparable to calculating which internal long value holds the desired bit. So I don't believe performance is a factor here.

If you simply want to eliminate duplicates, then just put them in a Set<Long>

Based on your comment, check out this simple test for holding one large value in a BitSet.

BitSet bitSet = new BitSet();
bitSet.set(Integer.MAX_VALUE);
long[] backingArray = bitSet.toLongArray();
System.out.printf("Size of backing array = %,d longs.%n",backingArray.length);

Prints

Size of backing array = 33,554,432 longs.

Convert a byte or int to bitset

If you want a BitSet, try:

final byte b = ...;
final BitSet set = BitSet.valueOf(new byte[] { b });

If you want a boolean[],

static boolean[] bits(byte b) {
  int n = 8;
  final boolean[] set = new boolean[n];
  while (--n >= 0) {
    set[n] = (b & 0x80) != 0;
    b <<= 1;
  }
  return set;
}

or, equivalently,

static boolean[] bits(final byte b) {
  return new boolean[] {
    (b &    1) != 0,
    (b &    2) != 0,
    (b &    4) != 0,
    (b &    8) != 0,
    (b & 0x10) != 0,
    (b & 0x20) != 0,
    (b & 0x40) != 0,
    (b & 0x80) != 0
  };
}

Java from BigInteger to BitSet and back

You can use the BigInteger.toByteArray() and BitSet.toByteArray() for these:

BigInteger bi = new BigInteger("31415926535");
bi = bi.multiply(new BigInteger("271828"));
System.out.println(bi);
BitSet bs = BitSet.valueOf(bi.toByteArray());
System.out.println(bs);
BigInteger bi2 = new BigInteger(bs.toByteArray());
System.out.println(bi2);

You can use the constructor of the BigInteger(byte[]) to convert the array of bytes into a BigInteger and the BitSet.valueOf(byte[]) to convert the data into the desired values.

For the given code, it outputs:

8539728478155980
{1, 2, 3, 4, 9, 10, 12, 14, 17, 18, 20, 22, 23, 25, 27, 28, 29, 30, 32, 33, 35, 37, 38, 42, 44, 45, 50, 51, 54, 55}
8539728478155980

Note that the 2-complement notation is used. Thus for negative numbers, it will generate additional ones. And a 2^64-1 will require more than 64 bits to represent. This also works with big endian. (see modified answer below).

EDIT: there is however one problem with this approach: if there are tailing zeros, these are not taken into account by the program. This can be important because they are part of the representation. You can solve this problem by adding a tailing bit:

public static BitSet convertTo (BigInteger bi) {
    byte[] bia = bi.toByteArray();
    int l = bia.length;
    byte[] bsa = new byte[l+1];
    System.arraycopy(bia,0,bsa,0,l);
    bsa[l] = 0x01;
    return BitSet.valueOf(bsa);
}

public static BigInteger convertFrom (BitSet bs) {
    byte[] bsa = bs.toByteArray();
    int l = bsa.length-0x01;
    byte[] bia = new byte[l];
    System.arraycopy(bsa,0,bia,0,l);
    return new BigInteger(bia);
}

And call it with:

BigInteger bi = new BigInteger("20000000");
System.out.println(bi);

BitSet bs = convertTo(bi);
System.out.println(bs);

BigInteger bi2 = convertFrom(bs);
System.out.println(bi2);

About the memory usage

In general both methods will use approximately the same number of bits. For the first implementation (without size marker and thus errors), it is possible that sometimes, the BitSet approach will use one byte less than the BigInteger approach. With the size marker (second approach), it is the opposite: in general the BitSet will use always one extra byte, except for some rare occasions.

How to input an integer's binary expression into a bitSet in java

BitSet has a static valueOf(long[]) method which

Returns a new bit set containing all the bits in the given long array.

So an array with one long will have 64 bits, an array with two longs will have 128 bits, etc.

If you only need to get a BitSet from a single int value, use it like so

Integer value = 42;
System.out.println(Integer.toBinaryString(value));
BitSet bitSet = BitSet.valueOf(new long[] { value });
System.out.println(bitSet);

It prints

101010
{1, 3, 5}

In other words, from the right to left in the representation above, the 2nd, 4th, and 6th bit are set.

Why is the internal data of BitSet in java stored as long[] instead of int[] in Java?

When querying or manipulating a single bit, there is no significant difference. You have to calculate the word index and read that word and, in case of an update, manipulate one bit of that word and write it back. That’s all the same for int[] and long[].

One could argue that doing it using a long instead of int could raise the amount of memory that has to be transferred for a single bit operation if you have a real 32 bit memory bus, but since Java was designed in the nineties of the last century, the designers decided that this is not an issue anymore.

On the other hand, you get a big win when processing multiple bits at once. When you perform operations like and, or or xor on an entire BitSet, you can perform the operation on an entire word, read 64 bits, at once when using a long array.

Similarly, when searching for the next set bit, if the bit is not within the word of the start position, subsequent words are first tested against zero, which is an intrinsic operation, even for most 32 bit CPUs, so you can skip 64 zero bits at once while the first non-zero word will definitely contain the next set bit, so only one bit extraction operation is needed for the entire iteration.

These benefits for bulk operations will outweigh any single-bit related drawbacks, if there ever are one. As said, most today’s CPU are capable of doing all operations on 64 bit words directly.

Does BitSet in java stores bits or integers?

Typically BitSet would be implemented using a long[]. Each long stores 64 consecutive possible bit positions. The array needs a size equal to the highest set bit index minus one (to allow for index 0), divided by 64 (rounded down). Set bits are represented as a binary 1 and bits present in the array but not set as a binary 0.

So the internal representation of your examples would be something like:

bs1 = new long[] { 0b00010111L }; // 23
bs2 = new long[] { 0b01111110L }; // 126
     // bit indexes: 76543210

(Bits 8-63 elided from constants - add all the zeros if your want.)

convert bitset to int in c++

Use to_ulong to convert it to unsigned long, then an ordinary cast to convert it to int.

int mybit_int;

mybit_int = (int)(mybit.to_ulong());

DEMO

Bitset to and from Integer/Long