Ruby - Return Byte Array Containing Two's Complement Representation of Bignum/Fixnum

The end condition is a bit tricky. Here it goes:

def to_byte_array(num)
  result = []
  begin
    result << (num & 0xff)
    num >>= 8
  end until (num == 0 || num == -1) && (result.last[7] == num[7])
  result.reverse
end

p [0, 1, 255, 256, -1, -128, -256].map{|i| to_byte_array(i)}
# => [[0], [1], [0, 255], [1, 0], [255], [128], [255, 0]]
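One way to sanity-check the end condition is to reverse the conversion. The following is only a sketch and not part of the original answer: from_byte_array is a hypothetical helper name, and it assumes a non-empty, big-endian byte array like the ones to_byte_array returns.

```ruby
# Hypothetical inverse of to_byte_array: rebuilds the integer from a
# big-endian two's complement byte array (assumed non-empty).
def from_byte_array(bytes)
  # Accumulate the bytes as an unsigned big-endian number.
  value = bytes.inject(0) { |acc, byte| (acc << 8) | byte }
  # If the sign bit of the first byte is set, the number is negative:
  # subtract 2**(8 * byte count) to get the signed value.
  value -= 1 << (8 * bytes.size) if bytes.first[7] == 1
  value
end

p [[0], [0, 255], [255], [128], [255, 0]].map { |b| from_byte_array(b) }
# => [0, 255, -1, -128, -256]
```

The leading 0 in [0, 255] is what keeps 255 from being read back as -1, which is exactly the case the sign-bit comparison in the loop condition guards against.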

Two's complement representation of a negative number, on a given number of hex digits

What about doing the subtraction yourself when the number is negative? For example, for -7 on a given number of hex digits:

sprintf("%#X", 16**digits - 7)
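The same idea can be wrapped so that it works for any value, not just -7. This is a sketch under assumptions: twos_complement_hex is a hypothetical helper name, it masks instead of subtracting (which gives the same result for negative numbers), and it assumes the number fits in the signed range for the chosen width.

```ruby
# Hypothetical helper: two's complement of num on a fixed number of hex digits.
# Assumes num fits in the signed range for that width.
def twos_complement_hex(num, digits)
  mask = 16**digits - 1 # e.g. 0xFFFF for 4 digits
  # For a negative num, num & mask equals 16**digits + num.
  format("%0#{digits}X", num & mask)
end

puts twos_complement_hex(-7, 4) # FFF9
puts twos_complement_hex(7, 4)  # 0007
```

Unlike the "%#X" format above, this omits the 0X prefix and pads positive values with zeros to the requested width.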

12345.class returning 'Integer' not 'Fixnum' in Ruby

It depends on the Ruby version. Since Ruby 2.4.0 there is only Integer; the separate Fixnum and Bignum classes were unified into it.

https://www.ruby-lang.org/en/news/2016/12/25/ruby-2-4-0-released/
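If code has to run on both sides of that change, checking against Integer rather than Fixnum or Bignum should behave the same everywhere, since Integer was already their common superclass. A small sketch (whole_number? is a hypothetical helper name):

```ruby
# Integer was already the common superclass of Fixnum and Bignum,
# so this check holds on both older and newer Rubies.
def whole_number?(value)
  value.is_a?(Integer)
end

puts whole_number?(12345)  # small value, a Fixnum before Ruby 2.4
puts whole_number?(2**64)  # large value, a Bignum before Ruby 2.4
puts whole_number?(1.5)    # a Float, so not an Integer
```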

Base-36 representation of Digest

Fixnum#to_s (Integer#to_s since Ruby 2.4) accepts a base as its argument, and so does String#to_i. Because of this, you can convert the base-16 string to an integer, then to a base-36 string:

i = hexstring.to_i(16)
base_36 = i.to_s(36)
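Put together with an actual digest, this might look like the sketch below. The helper names and the use of Digest::MD5 with the input 'hello' are assumptions for illustration; any hex digest should work the same way.

```ruby
require 'digest'

# Hypothetical helper: re-encode a hex string in base 36.
def hex_to_base36(hexstring)
  hexstring.to_i(16).to_s(36)
end

# Hypothetical helper: convert back to hex. Leading zeros are lost in the
# integer round trip, so pad to the original width.
def base36_to_hex(base_36, width)
  base_36.to_i(36).to_s(16).rjust(width, '0')
end

hexstring = Digest::MD5.hexdigest('hello')
base_36 = hex_to_base36(hexstring)

puts base_36
puts base36_to_hex(base_36, hexstring.length) == hexstring
```

Note that the base-36 string has no fixed length: a digest with leading zero nibbles produces a shorter result, which is why the reverse conversion needs the original width.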

Ruby C extensions API questions

Ruby Strings vs. C strings

Let's start with strings. Before trying to retrieve a string in C, it is a good habit to call StringValue(obj) on your VALUE first. This ensures that you are really dealing with a Ruby string: if the object is not already a string, StringValue coerces it into one by calling the object's to_str method. This makes things safer and prevents the occasional segfault you might get otherwise.

The next thing to watch out for is that Ruby strings are not \0-terminated the way your C code would expect, so functions like strlen will not work as expected. Ruby's strings carry their length information with them instead, which is why in addition to RSTRING_PTR(str) there is also the RSTRING_LEN(str) macro to determine the actual length.

So StringValuePtr returns the non-zero-terminated char * to you. This is great for buffers where you have a separate length, but not what you want for e.g. strlen. Use StringValueCStr instead: it will modify the string to be zero-terminated so that it is safe to use with C functions that expect zero-terminated input. Try to avoid this wherever possible, though, because this modification is slower than retrieving the non-zero-terminated string, which does not have to be modified at all. If you keep an eye on this, it's surprising how rarely you will actually need "real" C strings.
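The difference between a length-carrying string and a terminator-based one can be seen from the Ruby side as well. The sketch below is only an illustration: bytes_before_first_nul is a hypothetical helper that mimics what a strlen-style scan over the raw bytes would count.

```ruby
# Hypothetical helper: how many bytes a strlen-style scan would count,
# i.e. everything up to (not including) the first NUL byte.
def bytes_before_first_nul(str)
  str.bytes.index(0) || str.bytesize
end

s = "abc\0def"
puts s.bytesize                 # 7, the length Ruby stores with the string
puts bytes_before_first_nul(s)  # 3, where a terminator-based scan would stop
```

A Ruby string may legitimately contain \0 bytes, so any C code that relies on the terminator alone would silently see a truncated value here.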

self as an implicit VALUE argument

Another reason why your current code doesn't work as expected is that every C function to be called by Ruby gets passed self as an implicit VALUE.

No arguments in Ruby (e.g. obj.doit) translates to

VALUE doit(VALUE self)

A fixed number of arguments (>0, e.g. obj.doit(a, b)) translates to

VALUE doit(VALUE self, VALUE a, VALUE b)

Variable arguments in Ruby (e.g. obj.doit(a, b=nil)) translate to

VALUE doit(int argc, VALUE *argv, VALUE self)

in C. So what you were working on in your example is not the string passed to you by Ruby but actually the current value of self, that is, the object that was the receiver when you called that function. A correct definition for your example would be

static VALUE test(VALUE self, VALUE input)

I made it static to point out another rule that you should follow in your C extensions. Make your C functions public only if you intend to share them among several source files. Since that's almost never the case for functions that you attach to a Ruby class, you should declare them as static by default and only make them public if there is a good reason to do so.
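The three calling conventions listed above have a visible counterpart on the Ruby side in Method#arity. The sketch below uses a hypothetical Demo class written in plain Ruby. For a method written in C, the matching values passed as the last argument of rb_define_method would be 0, 2 and -1 respectively.

```ruby
# Hypothetical class with one method per calling convention.
class Demo
  def no_args; end
  def two_args(a, b); end
  def optional_arg(a, b = nil); end
end

p Demo.instance_method(:no_args).arity      # => 0
p Demo.instance_method(:two_args).arity     # => 2
# Negative arity marks optional arguments: -(required + 1).
p Demo.instance_method(:optional_arg).arity # => -2
```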

What is VALUE and where does it come from?

Now to the harder part. If you dig deep into Ruby internals, you will find the function rb_newobj in gc.c. There you can see that any newly created Ruby object becomes a VALUE by being cast to one from something called the freelist. It's defined as:

#define freelist objspace->heap.freelist

You can imagine the objspace as a huge map that stores each and every object that is alive at a given point in time in your code. This is also where the garbage collector does its work, and the heap struct in particular is the place where new objects are born. The "freelist" of the heap is in turn declared as an RVALUE *. This is the C-internal representation of the Ruby built-in types. An RVALUE is defined as follows:

typedef struct RVALUE {
    union {
        struct {
            VALUE flags;        /* always 0 for freed obj */
            struct RVALUE *next;
        } free;
        struct RBasic  basic;
        struct RObject object;
        struct RClass  klass;
        struct RFloat  flonum;
        struct RString string;
        struct RArray  array;
        struct RRegexp regexp;
        struct RHash   hash;
        struct RData   data;
        struct RTypedData   typeddata;
        struct RStruct rstruct;
        struct RBignum bignum;
        struct RFile   file;
        struct RNode   node;
        struct RMatch  match;
        struct RRational rational;
        struct RComplex complex;
    } as;
#ifdef GC_DEBUG
    const char *file;
    int   line;
#endif
} RVALUE;

That is, basically a union of core data types that Ruby knows about. Missing something? Yes, Fixnums, Symbols, nil and boolean values are not included there. It's because these kinds of objects are directly represented using the unsigned long that a VALUE boils down to in the end. I think the design decision there was (besides being a cool idea) that dereferencing a pointer might be slightly less performant than the bit shifts that are currently needed when transforming the VALUE to what it actually represents. Essentially

obj = (VALUE)freelist;

says: give me whatever freelist currently points to and treat it as an unsigned long. This is safe because freelist is a pointer to an RVALUE, and a pointer can also be safely interpreted as an unsigned long. This implies that every VALUE, except those carrying Fixnums, Symbols, nil or Booleans, is essentially a pointer to an RVALUE; the others are represented directly within the VALUE.
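The direct representation of small integers can be glimpsed from Ruby itself. The sketch below assumes CRuby (MRI), where the object_id of a small integer mirrors its tagged VALUE: the integer shifted left by one bit with the lowest bit set as the "this is an integer" marker. Other implementations may behave differently, and tagged_value_of is a hypothetical helper name.

```ruby
# CRuby-specific sketch: compute the tagged word for a small integer,
# i.e. the value shifted left by one with the lowest (tag) bit set.
def tagged_value_of(n)
  (n << 1) | 1
end

[0, 1, 5, -1].each do |n|
  puts "#{n}: object_id=#{n.object_id}, tagged=#{tagged_value_of(n)}"
end
```

Heap-allocated objects such as strings or arrays do not follow this pattern, which matches the split described above between pointers to an RVALUE and values stored in the VALUE itself.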

As for your last question, how to check what a VALUE stands for: you can use the TYPE(x) macro to check whether a VALUE's type is one of the "primitive" ones.

Ruby max integer

Ruby automatically converts integers to a large integer class when they overflow, so there's (practically) no limit to how big they can be.

If you are looking for the machine's size, i.e. 64- or 32-bit, I found this trick at ruby-forum.com:

machine_bytes = ['foo'].pack('p').size
machine_bits = machine_bytes * 8
machine_max_signed = 2**(machine_bits-1) - 1
machine_max_unsigned = 2**machine_bits - 1

If you are looking for the size of Fixnum objects (integers small enough to store in a single machine word), you can call 0.size to get the number of bytes. I would guess it should be 4 on 32-bit builds, but I can't test that right now. Also, the largest Fixnum is apparently 2**30 - 1 (or 2**62 - 1 on 64-bit builds), because one bit of the word is used to mark it as an integer instead of an object reference and another holds the sign.
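The Fixnum bounds can be derived from 0.size in the same spirit as the pointer-size trick above. This is a sketch that assumes CRuby and assumes that 0.size matches the width of the word used for tagged integers, with one tag bit and one sign bit. The constant names are made up for the example.

```ruby
# Sketch assuming CRuby: a small integer lives in one machine word,
# minus one tag bit and one sign bit.
WORD_BYTES  = 0.size
FIXNUM_BITS = WORD_BYTES * 8 - 2 # bits left for the magnitude
FIXNUM_MAX  = 2**FIXNUM_BITS - 1
FIXNUM_MIN  = -(2**FIXNUM_BITS)

puts WORD_BYTES
puts FIXNUM_MAX
puts FIXNUM_MIN
```

On Ruby 2.4 and later the boundary is no longer visible through the class of the value, since both sides of it are plain Integer objects.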
