Ruby Array Concat Versus + Speed

Ruby Array concat versus + speed?

According to the Ruby docs, the difference is:

Array#+ :

Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.

Array#concat :

Array#concat : Appends the elements of other_ary to self.

So the + operator will create a new array each time it is called (which is expensive), while concat only appends the new element.

What is the difference between #concat and += on Arrays?

+= would create a new array object, concat mutates the original object

a = [1,2]
a.object_id # => 19388760
a += [1]
a.object_id # => 18971360

b = [1,2]
b.object_id # => 18937180
b.concat [1]
b.object_id # => 18937180

Note the object_id for a changed while for b did not change

Ruby - Array.join versus String Concatenation (Efficiency)

Try it yourself with the Benchmark class.

require "benchmark"

n = 1000000
Benchmark.bmbm do |x|
x.report("concatenation") do
foo = ""
n.times do
foo << "foobar"
end
end

x.report("using lists") do
foo = []
n.times do
foo << "foobar"
end
string = foo.join
end
end

This produces the following output:

Rehearsal -------------------------------------------------
concatenation 0.300000 0.010000 0.310000 ( 0.317457)
using lists 0.380000 0.050000 0.430000 ( 0.442691)
---------------------------------------- total: 0.740000sec

user system total real
concatenation 0.260000 0.010000 0.270000 ( 0.309520)
using lists 0.310000 0.020000 0.330000 ( 0.363102)

So it looks like concatenation is a little faster in this case. Benchmark on your system for your use-case.

How to efficiently concatenate multiple arrays in Ruby?

If you've already determined that multiple concatenation is the fastest method, you can write it nicer using reduce:

[foo, bar, baz].reduce([], :concat)

ruby operator confusion with shovel () and += , Concating arrays

One difference is that because << works in place it is somewhat faster than +=. The following code

require 'benchmark'

a = ''
b= ''

puts Benchmark.measure {
100000.times { a << 'test' }
}

puts Benchmark.measure {
100000.times { b += 'test' }
}

yields

0.000000   0.000000   0.000000 (  0.004653)
0.060000 0.060000 0.120000 ( 0.108534)

Update

I originally misunderstood the question. Here's whats going on. Ruby variables only store references to objects, not the objects themselves. Here's simplified code that does the same thing as yours, and has the same issue. I've told it to print temp and words_array on each iteration of the loops.

def helper(word)
words_array = []

word.length.times do |i|
temp = ''
(i...word.length).each do |j|
temp << word[j]
puts "temp:\t#{temp}"
words_array << temp unless words_array.include?(temp)
puts "words:\t#{words_array}"
end
end

words_array
end

p helper("cat")

Here is what it prints:

temp:   c
words: ["c"]
temp: ca
words: ["ca"]
temp: cat
words: ["cat"]
temp: a
words: ["cat", "a"]
temp: at
words: ["cat", "at"]
temp: t
words: ["cat", "at", "t"]
["cat", "at", "t"]

As you can see, during each iteration of the inner loop after the first, ruby is simply replacing the last element of words_array. That is because words_array holds a reference to the string object referenced by temp, and << modifies that object in place rather than creating a new object.

On each iteration of the outer loop temp is set to a new object, and that new object is appended to words_array, so it doesn't replace the previous elements.

The += construct returns a new object to temp on each iteration of the inner loop, which is why it does behave as expected.

Efficiency of Arrays vs Ranges in Ruby

TL;DR

Ranges are generally faster and more memory-efficient than reifying Arrays. However, specific use cases may vary.

If in doubt, benchmark. You can use irb's relatively new measure command, or use the Benchmark module to compare and contrast different approaches. In general, reifying a Range as an Array takes more memory and is slower than comparing against a Range (or even a small Array of Range objects), but unless you loop over this code a lot this seems like a premature optimization.

Benchmarks

Using Ruby 3.1.0, the Range approach is around 3,655.77% faster on my system. For example:

require 'benchmark'

n = 100_000

Benchmark.bmbm do
_1.report("Range") do
n.times do
client_error = [200..299, 400..499]
client_error.include? 404
end
end

_1.report("Array") do
n.times do
client_error = [*(200..299), *(400..499)]
client_error.include? 404
end
end
end
Rehearsal -----------------------------------------
Range 0.022570 0.000107 0.022677 ( 0.022832)
Array 0.707742 0.041499 0.749241 ( 0.750012)
-------------------------------- total: 0.771918sec

user system total real
Range 0.020184 0.000043 0.020227 ( 0.020245)
Array 0.701911 0.037541 0.739452 ( 0.740037)

While the overall total times are better with Jruby and TruffleRuby, the performance differences between the approaches are only about 3-7x faster with Ranges. Meanwhile, Ruby 3.0.1 shows an approximate 37x speed improvement using a non-reified Range rather than an Array, so the Range approach is the clear winner here either way.

Your specific values will vary based on system specs, system load, and Ruby version and engine. For smaller values of n, I can't imagine it will make any practical difference, but you should definitely benchmark against your own systems to determine if the juice is worth the squeeze.

Can I make this Ruby code faster and/or use less memory?

You've got the right idea. However, Ruby has a built-in class that's perfect for building sets of unique items: Set.

animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]

unique_animals = Set.new

animals.each do |str|
unique_animals.merge(str.split)
end
# => cat
# horse
# dog
# bird
# sheep
# chicken
# cow

Or...

unique_animals = animals.reduce(Set.new) do |set, str|
set.merge(str.split)
end

Under the covers Set actually uses a Hash to store its items, but it acts more like an un-ordered Array and responds to all of the familiar Enumerable methods (each, map, select, etc.). If you need to turn it into a real Array, though, just use Set#to_a.



Related Topics



Leave a reply



Submit