Convert Ruby to Low Level Languages

Convert Ruby to low level languages?

Such a compiler would be an enormous piece of work. Even if it works, it still has to:

include the Ruby runtime
include the standard library (which wasn't built for performance but for usability)
allow for metaprogramming
do dynamic dispatch
etc.

All of these inflict tremendous runtime penalties, because a C compiler can neither understand nor optimize such abstractions. Ruby and other dynamic languages are not only slower because they are interpreted (or compiled to bytecode which is then interpreted), but also because they are dynamic.

Example

In C++, a method call can be inlined in most cases, because the compiler knows the exact type of this. If a subtype is passed, the method still can't change unless it is virtual, in which case a still very efficient lookup table is used.

In Ruby, classes and methods can change in any way at any time, thus a (relatively expensive) lookup is required every time.
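As a rough illustration of why the call target cannot be fixed ahead of time, here is a small sketch in which a method is replaced and another one is created while the program is running. The class Greeter and its methods are made up for this example.

```ruby
class Greeter
  def greet
    "hello"
  end
end

g = Greeter.new
first = g.greet

# Reopen the class at runtime and replace the method.
class Greeter
  def greet
    "goodbye"
  end
end

# The object created earlier picks up the new definition,
# so the call cannot have been bound to the old method body.
second = g.greet

# Methods can also be created from data while the program runs.
Greeter.send(:define_method, :shout) { greet.upcase }
third = g.shout

puts first, second, third
```

A compiler that wanted to inline greet would have to prove that none of this can happen, or be prepared to undo the optimization when it does.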

Languages like Ruby, Python or Perl have many features that simply are expensive, and most if not all relevant programs rely heavily on these features (of course, they are extremely useful!), so they cannot be removed or inlined.

Simply put: dynamic languages are very hard to optimize. Simply doing what an interpreter would do and compiling that to machine code doesn't cut it. It's possible to get incredible speed out of dynamic languages, as V8 proves, but you have to throw huge piles of money and offices full of clever programmers at it.

Advantages of Java over Ruby/JRuby

I don't know Ruby very well, but I can guess the following points:

Java has more documentation (books, blogs, tutorials, etc.); overall documentation quality is very good
Java has more tools (IDEs, build tools, compilers, etc.)
Java has better refactoring capabilities (due to the static type system, I guess)
Java has more widespread adoption than Ruby
Java has a well-specified memory model
As far as I know, Java has better support for threading and Unicode (JRuby may help here)
Java's overall performance is quite good as of late (due to HotSpot, the new G1 garbage collector, etc.)
Nowadays, Java has very attractive and cheap server hosting: App Engine

Are there any low-level languages that can be used in place of scripts?

I recently stumbled upon something called BinaryPHP, in which you code normally in PHP and then convert the script into C++ to be compiled with your favorite tool. That should make for a gentle learning curve for someone already familiar with PHP.

Why Are There Different Ruby Implementations?

Because different language implementors decide to focus on different areas: for instance, compatibility with the Java runtime (JRuby), experimenting with JIT compilation (Rubinius), targeting Ruby at the enterprise (REE), and so on.

This isn't unique to Ruby either, and it's healthy for a language: if a particular group sees potential for the language in a certain area, its implementation can help foster growth within that community.

Compress an Integer to a Base64 upper/lower case character in Ruby (to make an Encoded Short URL)

As others here have said, Ruby's Base64 encoding is not the same as converting an integer to a string using a base of 64. Ruby provides an elegant converter for this, but the maximum base is base-36. (See @jad's answer.)
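To make the difference concrete, here is a short sketch comparing the two. It uses pack("m0"), core Ruby's Base64 packer, as a stand-in for whatever Base64 call the question had in mind; the variable names are only for illustration.

```ruby
n = 123456

# Base64 *encoding* works on bytes, here the six ASCII digits of n,
# so the result is longer than the input rather than shorter.
as_base64 = [n.to_s].pack("m0")

# Radix conversion treats n as a number. Base 36 is the largest radix
# that Integer#to_s accepts.
as_base36 = n.to_s(36)

# Asking for base 64 directly is rejected.
radix_error = begin
  n.to_s(64)
rescue ArgumentError => e
  e.class
end

puts as_base64   # MTIzNDU2
puts as_base36   # 2n9c
puts radix_error # ArgumentError
```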

The following brings everything together in two methods for encoding and decoding as base-64.

def encode(int)
  chars = [*'A'..'Z', *'a'..'z', *'0'..'9', '_', '!']
  digits = int.digits(64).reverse
  digits.map { |i| chars[i] }.join
end

And to decode

def decode(str)
  chars = [*'A'..'Z', *'a'..'z', *'0'..'9', '_', '!']
  digits = str.chars.map { |char| chars.index(char) }.reverse
  output = digits.each_with_index.map do |value, index|
    value * (64 ** index)
  end
  output.sum
end

Give them a try:

puts output = encode(123456) #=> "eJA"
puts decode(output) #=> 123456

The compression is pretty good: an integer around 99 million (99,999,999) encodes down to 5 characters ("F9eD!" with the alphabet above).

The extra compression comes from including both upper and lower case characters, which makes base-64 inherently case-sensitive. If you want this to be case-insensitive, using the built-in base-36 method per Jad's answer is the way to go.
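For comparison, a minimal sketch of the case-insensitive base-36 route using only Integer#to_s and String#to_i. The helper names short_code and from_short_code are made up here.

```ruby
# Hypothetical helpers wrapping Ruby's built-in radix conversion.
def short_code(int)
  int.to_s(36)
end

def from_short_code(str)
  # String#to_i accepts both upper and lower case letters as digits.
  str.to_i(36)
end

code = short_code(123456)
puts code                          # 2n9c
puts from_short_code(code)         # 123456
puts from_short_code(code.upcase)  # 123456
```

The price is a slightly longer string than base-64 for large numbers, since each character carries less information.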

Credit to @stefan for help with this.

How to compile Ruby?

The simple answer is that you can't, at least with MRI 1.8 (the standard). This is because 1.8 works by walking the Abstract Syntax Tree. Ruby 1.9, JRuby, and Rubinius (like Python) instead compile the source to an Intermediate Representation (byte code). Since MRI Ruby 2.3 it has become easy to do this yourself; see this answer below.
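As a hedged sketch of what this looks like on MRI 2.3 or later: RubyVM::InstructionSequence can compile source to byte code, serialize it, and load it back. This is MRI-specific, it produces byte code rather than machine code, and the serialized form is not meant to be portable between Ruby versions.

```ruby
source = "21 * 2"

# Compile the source text to MRI byte code.
iseq = RubyVM::InstructionSequence.compile(source)

# Serialize the byte code to a String, which could be written to a file.
binary = iseq.to_binary

# Later: load the byte code again and run it without re-parsing the source.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval

puts result
```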

With Rubinius, you can do something as described in this post: http://rubini.us/2011/03/17/running-ruby-with-no-ruby/

In JRuby you can use the "Ahead Of Time" compiler through, I believe, jrubyc.

This isn't really the standard way of doing things and you're generally better off just letting your Ruby implementation handle it like it wants to. Rubinius, at least, will cache byte code after the first compilation, updating it as it needs to.

Converting Ruby to C#

I don't know C# at all, so anything I say about C# should be taken with a grain of salt. However, I will try to explain what goes on in that piece of Ruby code.

class << Cache

Ruby has something called singleton methods. These have nothing to do with the Singleton Software Design Pattern; they are just methods that are defined for one and only one object. So, you can have two instances of the same class, and add methods to only one of those two objects.

There are two different syntaxes for singleton methods. One is to just prefix the name of the method with the object, so def foo.bar(baz) would define a method bar only for object foo. The other syntax is called opening up the singleton class, and it looks syntactically similar to defining a class, because that's also what happens semantically: singleton methods actually live in an invisible class that gets inserted between the object and its actual class in the class hierarchy.

This syntax looks like this: class << foo. This opens up the singleton class of object foo and every method defined inside of that class body becomes a singleton method of object foo.

Why is this used here? Well, Ruby is a pure object-oriented language, which means that everything, including classes, is an object. Now, if methods can be added to individual objects, and classes are objects, this means that methods can be added to individual classes. In other words, Ruby has no need for the artificial distinction between regular methods and static methods (which are a fraud, anyway: they aren't really methods, just glorified procedures). What is a static method in C# is just a regular method on a class object's singleton class.

All of this is just a longwinded way of explaining that everything defined between class << Cache and its corresponding end becomes static.
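Both syntaxes, and the "static method" use of the second one, can be sketched like this. The objects a and b and the class Registry are invented for the example.

```ruby
a = Object.new
b = Object.new

# Syntax 1: prefix the method name with the object.
def a.label
  "only a has this"
end

# Syntax 2: open up the singleton class of the object.
class << b
  def other_label
    "only b has this"
  end
end

# Applied to a class object, this gives what C# would call static methods.
class Registry
  class << self
    def default_name
      "main"
    end
  end
end

puts a.label
puts b.respond_to?(:label)          # false: the method belongs to a alone
puts b.other_label
puts Registry.default_name
puts Registry.singleton_methods.inspect
```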

  STALE_REFRESH = 1
  STALE_CREATED = 2

In Ruby, every variable that starts with a capital letter is actually a constant. However, in this case we won't translate these as static const fields, but rather as an enum, because that's how they are used.

  # Caches data received from a block
  #
  # The difference between this method and usual Cache.get
  # is following: this method caches data and allows user
  # to re-generate data when it is expired w/o running
  # data generation code more than once so dog-pile effect
  # won't bring our servers down
  #
  def smart_get(key, ttl = nil, generation_time = 30.seconds)

This method has three parameters (four actually, we will see exactly why later), two of which are optional (ttl and generation_time). Both of them have a default value; however, in the case of ttl the default value isn't really used. It serves more as a marker to find out whether the argument was passed in or not.

30.seconds is an extension that the ActiveSupport library adds to the Integer class. In old versions of ActiveSupport it doesn't actually do anything; it just returns self (newer versions return an ActiveSupport::Duration, which still behaves like a number of seconds). It is used in this case just to make the method definition more readable. (There are other methods which do something more useful, e.g. Integer#minutes, which returns self * 60, and Integer#hours and so on.) We will use this as an indication that the type of the parameter should not be int but rather System.TimeSpan.

    # Fallback to default caching approach if no ttl given
    return get(key) { yield } unless ttl

This contains several complex Ruby constructs. Let's start with the easiest one: trailing conditional modifiers. If a conditional body contains only one expression, then the conditional can be appended to the end of the expression. So, instead of saying if a > b then foo end you can also say foo if a > b. So, the above is equivalent to unless ttl then return get(key) { yield } end.

The next one is also easy: unless is just syntactic sugar for if not. So, we are now at if not ttl then return get(key) { yield } end

Third is Ruby's truth system. In Ruby, truth is pretty simple. Actually, falseness is pretty simple, and truth falls out naturally: the special keyword false is false, and the special keyword nil is false; everything else is true. So, in this case the conditional will only be true if ttl is either false or nil. false isn't a terribly sensible value for a timespan, so the only interesting one is nil. The snippet would have been more clearly written like this: if ttl.nil? then return get(key) { yield } end. Since the default value for the ttl parameter is nil, this conditional is true if no argument was passed in for ttl. So, the conditional is used to figure out with how many arguments the method was called, which means that we are not going to translate it as a conditional but rather as a method overload.
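The three constructs discussed so far (trailing modifier, unless, and the truth rules) can be tried out with a small sketch. The method describe is hypothetical and only mirrors the shape of the guard in smart_get.

```ruby
def describe(ttl = nil)
  # Trailing modifier form of: if not ttl then return "no ttl given" end
  return "no ttl given" unless ttl
  "ttl is #{ttl.inspect}"
end

puts describe          # no ttl given   (default nil)
puts describe(nil)     # no ttl given
puts describe(false)   # no ttl given   (false is the only other falsy value)
puts describe(0)       # ttl is 0       (0 is truthy in Ruby)
puts describe("")      # ttl is ""      (so is the empty string)
```

Note that the guard cannot tell an omitted argument from an explicit nil, which is why treating it as "no argument passed" is an interpretation of intent.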

Now, on to the yield. In Ruby, every method can accept an implicit code block as an argument. That's why I wrote above that the method actually takes four arguments, not three. A code block is just an anonymous piece of code that can be passed around, stored in a variable, and invoked later on. Ruby inherits blocks from Smalltalk, but the concept dates all the way back to 1958, to Lisp's lambda expressions. At the mention of anonymous code blocks, or at the very latest now, at the mention of lambda expressions, you should know how to represent this implicit fourth method parameter: a delegate type, more specifically, a Func.

So, what does yield do? It transfers control to the block. It's basically just a very convenient way of invoking a block, without having to explicitly store it in a variable and then call it.
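A small sketch of the implicit block, its explicit equivalent, and the get(key) { yield } forwarding idiom. The method names twice, twice_explicit and forward are made up for illustration.

```ruby
# Implicit block: yield invokes whatever block the caller passed.
def twice
  [yield, yield]
end

# Explicit equivalent: capture the block in a variable and call it.
def twice_explicit(&block)
  [block.call, block.call]
end

# Forwarding: wrap our own block in a new block for another method,
# the same shape as get(key) { yield } in the cache code.
def forward
  twice { yield }
end

counter = 0
a = twice { counter += 1 }
b = twice_explicit { counter += 1 }
c = forward { counter += 1 }

puts a.inspect
puts b.inspect
puts c.inspect
```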

    # Create window for data refresh
    real_ttl = ttl + generation_time * 2
    stale_key = "#{key}.stale"

This #{foo} syntax is called string interpolation. It means "replace the token inside the string with whatever the result of evaluating the expression between the braces is". It's just a very concise version of String.Format(), which is exactly what we are going to translate it to.
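For example, with an invented key value, interpolation and Ruby's own format call (roughly the counterpart of String.Format()) give the same string:

```ruby
key = "user:42"

stale_key = "#{key}.stale"            # interpolation
same_key  = format("%s.stale", key)   # explicit formatting

# Any expression can go between the braces; its result is converted with to_s.
summary = "#{1 + 2} items"

puts stale_key
puts same_key
puts summary
```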

    # Try to get data from memcache
    value = get(key)
    stale = get(stale_key)

    # If stale key has expired, it is time to re-generate our data
    unless stale
      put(stale_key, STALE_REFRESH, generation_time) # lock
      value = nil # force data re-generation
    end

    # If no data retrieved or data re-generation forced, re-generate data and reset stale key
    unless value
      value = yield
      put(key, value, real_ttl)
      put(stale_key, STALE_CREATED, ttl) # unlock
    end

    return value
  end
end
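To watch the logic above run without memcache or ActiveSupport, here is a self-contained sketch. Everything around smart_get is an assumption made for the example: DemoCache is a hypothetical in-memory store, get and put are simplified, times are plain numbers of seconds, and the clock is injectable so expiry can be simulated without sleeping. The ttl-less fallback is left out.

```ruby
module DemoCache
  STALE_REFRESH = 1
  STALE_CREATED = 2

  class << self
    attr_writer :clock # injectable clock (assumption, for demonstration)

    def now
      (@clock || -> { Time.now.to_f }).call
    end

    def store
      @store ||= {}
    end

    def put(key, value, ttl)
      store[key] = [value, now + ttl]
    end

    def get(key)
      value, expires_at = store[key]
      return nil if expires_at.nil? || expires_at <= now
      value
    end

    # Same control flow as the smart_get shown above.
    def smart_get(key, ttl, generation_time = 30)
      real_ttl = ttl + generation_time * 2
      stale_key = "#{key}.stale"

      value = get(key)
      unless get(stale_key)
        put(stale_key, STALE_REFRESH, generation_time) # lock
        value = nil                                    # force re-generation
      end

      unless value
        value = yield
        put(key, value, real_ttl)
        put(stale_key, STALE_CREATED, ttl) # unlock
      end
      value
    end
  end
end

t = 0.0
DemoCache.clock = -> { t }
calls = 0

first  = DemoCache.smart_get("report", 60) { calls += 1; "v#{calls}" }
second = DemoCache.smart_get("report", 60) { calls += 1; "v#{calls}" }
t = 61.0 # the stale key has expired, the data key (120 s) has not
third  = DemoCache.smart_get("report", 60) { calls += 1; "v#{calls}" }

puts first, second, third, calls
```

The second call is served from the cache. The third call regenerates because the stale key has expired, even though the data itself is still present for any other caller that arrives while regeneration is in progress.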

This is my feeble attempt at translating the Ruby version to C#:

public class Cache<Tkey, Tvalue> {
    enum Stale { Refresh, Created }

    /* Caches data received from a delegate
     *
     * The difference between this method and usual Cache.get
     * is following: this method caches data and allows user
     * to re-generate data when it is expired w/o running
     * data generation code more than once so dog-pile effect
     * won't bring our servers down
    */
    public static Tvalue SmartGet(Tkey key, TimeSpan ttl, TimeSpan generationTime, Func<Tvalue> strategy)
    {
        // Create window for data refresh
        var realTtl = ttl + generationTime + generationTime;
        var staleKey = String.Format("{0}.stale", key);

        // Try to get data from memcache
        var value = Get(key);
        var stale = Get(staleKey);

        // If stale key has expired, it is time to re-generate our data
        if (stale == null)
        {
            Put(staleKey, Stale.Refresh, generationTime); // lock
            value = null; // force data re-generation
        }

        // If no data retrieved or data re-generation forced, re-generate data and reset stale key
        if (value == null)
        {
            value = strategy();
            Put(key, value, realTtl);
            Put(staleKey, Stale.Created, ttl); // unlock
        }

        return value;
    }

    // Fallback to default caching approach if no ttl given
    public static Tvalue SmartGet(Tkey key, Func<Tvalue> strategy) => 
        Get(key, strategy);

    // Simulate default argument for generationTime
    // C# 4.0 has default arguments, so this wouldn't be needed.
    public static Tvalue SmartGet(Tkey key, TimeSpan ttl, Func<Tvalue> strategy) => 
        SmartGet(key, ttl, new TimeSpan(0, 0, 30), strategy);

    // Convenience overloads to allow calling it the same way as 
    // in Ruby, by just passing in the timespans as integers in 
    // seconds.
    public static Tvalue SmartGet(Tkey key, int ttl, int generationTime, Func<Tvalue> strategy) => 
        SmartGet(key, new TimeSpan(0, 0, ttl), new TimeSpan(0, 0, generationTime), strategy);

    public static Tvalue SmartGet(Tkey key, int ttl, Func<Tvalue> strategy) => 
        SmartGet(key, new TimeSpan(0, 0, ttl), strategy);
}

Please note that I do not know C#, I do not know .NET, I have not tested this, I don't even know if it is syntactically valid. Hope it helps anyway.
