Ruby Array concat versus + speed?
According to the Ruby docs, the difference is:
Array#+ :
Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.
Array#concat :
Array#concat : Appends the elements of other_ary to self.
So the +
operator will create a new array each time it is called (which is expensive), while concat
only appends the new element.
What is the difference between #concat and += on Arrays?
+=
would create a new array object, concat
mutates the original object
a = [1,2]
a.object_id # => 19388760
a += [1]
a.object_id # => 18971360
b = [1,2]
b.object_id # => 18937180
b.concat [1]
b.object_id # => 18937180
Note the object_id
for a
changed while for b
did not change
Ruby - Array.join versus String Concatenation (Efficiency)
Try it yourself with the Benchmark class.
require "benchmark"
n = 1000000
Benchmark.bmbm do |x|
x.report("concatenation") do
foo = ""
n.times do
foo << "foobar"
end
end
x.report("using lists") do
foo = []
n.times do
foo << "foobar"
end
string = foo.join
end
end
This produces the following output:
Rehearsal -------------------------------------------------
concatenation 0.300000 0.010000 0.310000 ( 0.317457)
using lists 0.380000 0.050000 0.430000 ( 0.442691)
---------------------------------------- total: 0.740000sec
user system total real
concatenation 0.260000 0.010000 0.270000 ( 0.309520)
using lists 0.310000 0.020000 0.330000 ( 0.363102)
So it looks like concatenation is a little faster in this case. Benchmark on your system for your use-case.
How to efficiently concatenate multiple arrays in Ruby?
If you've already determined that multiple concatenation is the fastest method, you can write it nicer using reduce:
[foo, bar, baz].reduce([], :concat)
ruby operator confusion with shovel () and += , Concating arrays
One difference is that because <<
works in place it is somewhat faster than +=
. The following code
require 'benchmark'
a = ''
b= ''
puts Benchmark.measure {
100000.times { a << 'test' }
}
puts Benchmark.measure {
100000.times { b += 'test' }
}
yields
0.000000 0.000000 0.000000 ( 0.004653)
0.060000 0.060000 0.120000 ( 0.108534)
Update
I originally misunderstood the question. Here's whats going on. Ruby variables only store references to objects, not the objects themselves. Here's simplified code that does the same thing as yours, and has the same issue. I've told it to print temp
and words_array
on each iteration of the loops.
def helper(word)
words_array = []
word.length.times do |i|
temp = ''
(i...word.length).each do |j|
temp << word[j]
puts "temp:\t#{temp}"
words_array << temp unless words_array.include?(temp)
puts "words:\t#{words_array}"
end
end
words_array
end
p helper("cat")
Here is what it prints:
temp: c
words: ["c"]
temp: ca
words: ["ca"]
temp: cat
words: ["cat"]
temp: a
words: ["cat", "a"]
temp: at
words: ["cat", "at"]
temp: t
words: ["cat", "at", "t"]
["cat", "at", "t"]
As you can see, during each iteration of the inner loop after the first, ruby is simply replacing the last element of words_array
. That is because words_array
holds a reference to the string object referenced by temp
, and <<
modifies that object in place rather than creating a new object.
On each iteration of the outer loop temp
is set to a new object, and that new object is appended to words_array
, so it doesn't replace the previous elements.
The +=
construct returns a new object to temp
on each iteration of the inner loop, which is why it does behave as expected.
Efficiency of Arrays vs Ranges in Ruby
TL;DR
Ranges are generally faster and more memory-efficient than reifying Arrays. However, specific use cases may vary.
If in doubt, benchmark. You can use irb's relatively new measure command, or use the Benchmark module to compare and contrast different approaches. In general, reifying a Range as an Array takes more memory and is slower than comparing against a Range (or even a small Array of Range objects), but unless you loop over this code a lot this seems like a premature optimization.
Benchmarks
Using Ruby 3.1.0, the Range approach is around 3,655.77% faster on my system. For example:
require 'benchmark'
n = 100_000
Benchmark.bmbm do
_1.report("Range") do
n.times do
client_error = [200..299, 400..499]
client_error.include? 404
end
end
_1.report("Array") do
n.times do
client_error = [*(200..299), *(400..499)]
client_error.include? 404
end
end
end
Rehearsal -----------------------------------------
Range 0.022570 0.000107 0.022677 ( 0.022832)
Array 0.707742 0.041499 0.749241 ( 0.750012)
-------------------------------- total: 0.771918sec
user system total real
Range 0.020184 0.000043 0.020227 ( 0.020245)
Array 0.701911 0.037541 0.739452 ( 0.740037)
While the overall total times are better with Jruby and TruffleRuby, the performance differences between the approaches are only about 3-7x faster with Ranges. Meanwhile, Ruby 3.0.1 shows an approximate 37x speed improvement using a non-reified Range rather than an Array, so the Range approach is the clear winner here either way.
Your specific values will vary based on system specs, system load, and Ruby version and engine. For smaller values of n, I can't imagine it will make any practical difference, but you should definitely benchmark against your own systems to determine if the juice is worth the squeeze.
Can I make this Ruby code faster and/or use less memory?
You've got the right idea. However, Ruby has a built-in class that's perfect for building sets of unique items: Set.
animals = ["cat horse", "dog", "cat dog bird", "dog sheep", "chicken cow"]
unique_animals = Set.new
animals.each do |str|
unique_animals.merge(str.split)
end
# => cat
# horse
# dog
# bird
# sheep
# chicken
# cow
Or...
unique_animals = animals.reduce(Set.new) do |set, str|
set.merge(str.split)
end
Under the covers Set actually uses a Hash to store its items, but it acts more like an un-ordered Array and responds to all of the familiar Enumerable methods (each
, map
, select
, etc.). If you need to turn it into a real Array, though, just use Set#to_a.
Related Topics
Package Configuration for Libffi Is Not Found in MACos While Installing Travis-Cli
Celluloid Async Inside Ruby Blocks Does Not Work
Naming Conventions for Boolean Attributes
How to Refresh a Page with Turbolinks
How Do Get a Random Datetime Rounded to Beginning of Hour in Rails
How to Break Out of a Map/Collect and Return Whatever Has Been Collected Up to That Point
How to Create a Rails 3 Route That Will Match All Requests and Direct to One Resource/Page
Approach for Installing System Service Implemented as Ruby Gem
Sending Http Post Request in Ruby by Net::Http
How to Make a Custom Environment in Rails a Default Environment
Calling a Class Method Within a Class
Change The Ruby Process Name in Top
Ruby Method, Proc, and Block Confusion
Rails Routing: Giving Default Values for Path Helpers
Have a Parent Class's Method Access the Subclass's Constants