How to Do Sane "Set-Difference" in Ruby

How to do sane set-difference in Ruby?

The - operator applied to two arrays a and b gives the relative complement of b in a (items that are in a but not in b).

What you are looking for is the symmetric difference of two sets (the union of both relative complements between the two). This will do the trick:

a = [1, 2, 9]
b = [1, 2, 3]
(a - b) | (b - a)      # => [9, 3]

If you are operating on Set objects, you may use the overloaded ^ operator:

require 'set'          # not needed on Ruby 3.2+
c = Set[1, 2, 9]
d = Set[1, 2, 3]
c ^ d                  # => #<Set: {3, 9}>

For extra fun, you could also find the relative complement of the intersection in the union of the two sets:

(a | b) - (a & b)      # => [9, 3]
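
To check that the three formulations agree, here is a small sketch. The helper name symmetric_difference is hypothetical (it is not a core Array method), and the results are sorted only because the approaches may return elements in a different order.

```ruby
require 'set'

# Hypothetical helper (not a core method): symmetric difference of two arrays.
def symmetric_difference(x, y)
  (x - y) | (y - x)
end

a = [1, 2, 9]
b = [1, 2, 3]

via_complements  = symmetric_difference(a, b)
via_union        = (a | b) - (a & b)
via_set_operator = (Set.new(a) ^ Set.new(b)).to_a

# Element order can differ between approaches, so compare sorted copies.
puts via_complements.sort == via_union.sort        # prints true
puts via_union.sort == via_set_operator.sort       # prints true
```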

Performance of Sets vs. Arrays in Ruby

The real answer is: write the most readable and maintainable code, and optimize it only after you've shown it is a bottleneck. If you can find an algorithm that runs in linear time, you probably won't have to optimize it. Here it's easy to find...

Not quite sure which methods you are suggesting, but using my fruity gem:

require 'fruity'
require 'set'

enum = 1000.times

compare do
  uniq { enum.each_with_object([]){|x, array| array << x}.uniq }
  set  { enum.each_with_object(Set[]){|x, set| set << x}.to_a }
  join { enum.inject([]){|array, x| array | [x]} }
end

# set is faster than uniq by 10.0% ± 1.0%
# uniq is faster than join by 394x ± 10.0

Clearly, it makes no sense to build intermediate arrays as in the third method. Otherwise, it's not going to make a big difference, since both of the other approaches are O(n); that's the main thing.

BTW, sets, uniq and Array#| all use eql? and hash on your objects, not <=>. These need to be defined in a sane manner, because by default two objects are never eql? unless they have the same object_id (see this question).
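
As a minimal sketch of what "defined in a sane manner" can look like, the example below uses two hypothetical classes, Point and Bare, that exist only for illustration. Point defines eql? and hash from its coordinates; Bare keeps the defaults.

```ruby
require 'set'

# Hypothetical value class: equality and hashing are based on the coordinates.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias eql? ==

  # Objects that are eql? must return the same hash value.
  def hash
    [x, y].hash
  end
end

# Hypothetical class that keeps the default, identity-based eql? and hash.
class Bare
  def initialize(value)
    @value = value
  end
end

points = [Point.new(1, 2), Point.new(1, 2), Point.new(3, 4)]
puts points.uniq.size                          # prints 2
puts Set.new(points).size                      # prints 2
puts [Bare.new(1), Bare.new(1)].uniq.size      # prints 2 (nothing is deduplicated)
```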

Difference between @foo, self.foo, and foo?

Why is self.songs used instead of @songs

Using the method is more flexible. You're abstracting yourself from knowing how exactly it gets/stores data. The less you rely on implementation details, the easier it will be for you to change code later.

As a small example, consider this implementation of songs:

def songs
  @songs ||= []
  @songs
end

@songs may or may not have been assigned a value before this method is called, and the method doesn't care: it makes sure that @songs has a sane default value. The concept is called "lazy initialization", and it's very tedious and error-prone to do if you use instance variables directly.

So, when in doubt, always use methods.
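
To make the difference concrete, here is a small sketch. The Album class and its add_song and add_song_directly methods are hypothetical names chosen for illustration; only the songs accessor comes from the example above.

```ruby
# Hypothetical Album class, for illustration only.
class Album
  # Lazily initialize @songs the first time it is needed.
  def songs
    @songs ||= []
  end

  # Goes through the accessor, so it works even if @songs was never assigned.
  def add_song(title)
    songs << title
    self
  end

  # Touches the instance variable directly; fails while @songs is still nil.
  def add_song_directly(title)
    @songs << title
  end
end

album = Album.new
album.add_song("Intro")
puts album.songs.inspect                       # prints ["Intro"]

fresh = Album.new
begin
  fresh.add_song_directly("Intro")
rescue NoMethodError => e
  puts "direct access failed: #{e.class}"      # prints direct access failed: NoMethodError
end
```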

Override what Ruby thinks is the current time in Time.now?

You could use Mocha to change the return value of Time.now during a test (the 1.day helper below comes from ActiveSupport):

Time.stubs(:now).returns(Time.now - 1.day)
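
If Mocha is not available, the same idea can be sketched in plain Ruby. This is not how Mocha works internally; with_stubbed_now is a hypothetical helper written for this example, and it assumes Time.now is called without arguments inside the block.

```ruby
# Hypothetical helper (not part of Mocha or Ruby core): runs the block with
# Time.now temporarily replaced by a fixed value, then restores the original.
def with_stubbed_now(fixed_time)
  original_now = Time.method(:now)
  Time.define_singleton_method(:now) { fixed_time }
  yield
ensure
  # Put back a method that delegates to the saved original implementation.
  Time.define_singleton_method(:now) { original_now.call }
end

yesterday = Time.now - 24 * 60 * 60
seen = with_stubbed_now(yesterday) { Time.now }
puts seen == yesterday          # prints true
puts Time.now == yesterday      # prints false (the real clock is back)
```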

The Class/Object Paradox confusion

You can see the problem in this diagram:

[Diagram: Ruby Method Lookup Flow (source: phrogz.net)]

All object instances inherit from Object. All classes are objects, and Class is a class, therefore Class is an object. However, object instances inherit from their class, and Object is an instance of the Class class, therefore Object itself gets methods from Class.

As you can see in the diagram, however, there isn't a circular lookup loop, because there are two different inheritance 'parts' to every class: the instance methods and the 'class' methods. In the end, the lookup path is sane.

N.B.: This diagram reflects Ruby 1.8, and thus does not include the core BasicObject class introduced in Ruby 1.9.
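
The relationships described above can be checked directly in irb. The following is a small sketch for Ruby 1.9 or later (so BasicObject appears); Widget is a hypothetical class defined only for the example.

```ruby
# Every class is an instance of Class, including Object and Class itself.
p Object.class                          # => Class
p Class.class                           # => Class

# Class is also an object: its superclass chain leads back to Object.
p Class.superclass                      # => Module
p Module.superclass                     # => Object
p Object.superclass                     # => BasicObject
p Class.is_a?(Object)                   # => true

# Instance methods and 'class' methods are looked up along separate chains.
class Widget; end                       # hypothetical class for illustration
p Widget.superclass                     # => Object
p Widget.singleton_class.superclass     # => #<Class:Object>
```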

What is the right way to iterate through an array in Ruby?

This will iterate through all the elements:

array = [1, 2, 3, 4, 5, 6]
array.each { |x| puts x }

# Output:

1
2
3
4
5
6

This will iterate through all the elements giving you the value and the index:

array = ["A", "B", "C"]
array.each_with_index {|val, index| puts "#{val} => #{index}" }

# Output:

A => 0
B => 1
C => 2

I'm not quite sure from your question which one you are looking for.

class << self vs self.method with Ruby: what's better?

class << self is good at keeping all of your class methods in the same block. If methods are being added in def self.method form then there's no guarantee (other than convention and wishful thinking) that there won't be an extra class method tucked away later in the file.

def self.method is good at explicitly stating that a method is a class method, whereas with class << self you have to go and find the container yourself.

Which of these is more important to you is a subjective decision, and also depends on things like how many other people are working on the code and what their preferences are.
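
For reference, here is a side-by-side sketch of the two styles. Report and Invoice are hypothetical classes used only to show the syntax; both styles define ordinary class methods.

```ruby
# Hypothetical classes, for illustration only.
class Report
  # Explicit form: each definition states that it is a class method.
  def self.default_title
    "Untitled"
  end
end

class Invoice
  # Grouped form: everything inside this block is a class method.
  class << self
    def default_title
      "Untitled"
    end

    def currency
      "USD"
    end
  end
end

puts Report.default_title      # prints Untitled
puts Invoice.default_title     # prints Untitled
puts Invoice.currency          # prints USD
```

One practical difference worth knowing: a bare private inside class << self applies to the class methods that follow it, whereas methods written as def self.method need private_class_method to be made private.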
