Why Does Ruby Release Memory Only Sometimes?

Finding the cause of a memory leak in Ruby

It looks like you are entering The Lost World here. I don’t think the problem is with the C bindings in racc either.

Ruby memory management is both elegant and cumbersome. Ruby stores objects (called RVALUEs) in so-called heap pages of approximately 16KB each. At a low level, an RVALUE is a C struct containing a union of the standard Ruby object representations.

So, heap pages store RVALUE objects, each of which is at most 40 bytes in size. For objects such as String, Array, Hash, etc., this means that small objects can fit in a heap page, but as soon as they exceed a threshold, extra memory outside of the Ruby heap pages is allocated for them.
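You can observe this split with the objspace standard library (a minimal sketch; the exact embedding threshold varies across Ruby versions and platforms):

```ruby
require 'objspace'

small = "short"          # short enough to be embedded inside the 40-byte RVALUE
large = "x" * 10_000     # payload lives in extra memory outside the Ruby heap pages

# memsize_of reports the RVALUE plus any externally allocated payload
puts ObjectSpace.memsize_of(small)
puts ObjectSpace.memsize_of(large)
```

The large string reports roughly its payload size on top of the RVALUE, while the small one stays near the slot size.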

This extra memory is flexible; it will be freed as soon as the object gets GC’ed. That’s why your test case with big_var shows the memory up-down behaviour:

def report
  puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`
    .strip.split.map(&:to_i)[1].to_s + 'KB'
end
report
big_var = " " * 10000000
report
big_var = nil
report
ObjectSpace.garbage_collect
sleep 1
report
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 11788KB

But the heap pages themselves (see GC.stat[:heap_length]) are not released back to the OS once acquired. Look, I’ll make a humdrum change to your test case:

- big_var = " " * 10000000
+ big_var = 1_000_000.times.map(&:to_s)

And, voilá:

# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 57448KB

The memory is not released back to the OS anymore, because each element of the array I introduced fits within the RVALUE size and is stored in the Ruby heap.

If you examine the output of GC.stat after the GC has run, you’ll find that the GC.stat[:heap_used] value decreases as expected. Ruby now has a lot of empty heap pages, ready for future allocations.
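You can watch this in GC.stat yourself (a sketch; the key names changed across versions, e.g. :heap_used/:heap_length on Ruby 2.1 became :heap_allocated_pages on modern MRI):

```ruby
big = 1_000_000.times.map(&:to_s)
GC.start
pages_when_full = GC.stat[:heap_allocated_pages]  # :heap_used on Ruby 2.1

big = nil
GC.start
pages_after_gc = GC.stat[:heap_allocated_pages]

# The page count may dip a little on modern MRI, but the process keeps
# most pages around for reuse instead of returning them to the OS.
puts "#{pages_when_full} -> #{pages_after_gc}"
```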

Summing up: I don’t think the C code leaks. I think the problem is the base64 representation of a huge image in your CSS. I have no clue what’s happening inside the parser, but it looks like the huge string forces the Ruby heap count to increase.

Hope it helps.

Ruby process memory structure

It is likely that your application is allocating objects that are later reclaimed by the garbage collector. You can check this with a call to GC.stat.

Ruby (if you're running MRI) does not release memory back to the operating system in any meaningful way. Consequently, if you allocate 18GB of memory and 15GB of it gets garbage collected, you'll still end up with your ~3GB of heap data.

The Ruby MRI GC is not a compacting garbage collector, so as long as any live data remains in a heap page, that page will not be released. This leads to memory fragmentation, and to the values that you see in your app.
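A small sketch of the fragmentation effect: keep every other object alive, so live objects stay scattered across many heap pages, and without compaction none of those pages can be returned even though half the slots are free.

```ruby
strings = Array.new(200_000) { "x" * 8 }

# Free every other string; the survivors remain spread across the pages.
strings.each_index { |i| strings[i] = nil if i.even? }
GC.start

# Page count stays high because almost every page still holds live slots.
puts GC.stat[:heap_allocated_pages]
```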

Does Ruby's Regexp interpolation leak memory?

It was a memory leak!

https://bugs.ruby-lang.org/issues/15916

Should be fixed in one of the next releases of Ruby (2.6.4 or 2.6.5?)

How to deal with Ruby 2.1.2 memory leaks?

From your GC logs it appears the issue is not a ruby object reference leak as the heap_live_slot value is not increasing significantly. That would suggest the problem is one of:

  1. Data being stored outside the heap (Strings, Arrays etc)
  2. A leak in a gem that uses native code
  3. A leak in the Ruby interpreter itself (least likely)

It's interesting to note that the problem exhibits on both OSX and Heroku (Ubuntu Linux).

Object data and the "heap"

Ruby 2.1 garbage collection uses the reported "heap" only for objects that contain a tiny amount of data. When the data contained in an object exceeds a certain limit, it is moved to an area allocated outside of the heap. You can get the overall size of each data type with ObjectSpace:

require 'objspace'
ObjectSpace.count_objects_size({})

Collecting this along with your GC stats might indicate where memory is being allocated outside the heap. If you find a particular type, say :T_ARRAY, increasing a lot more than the others, you might need to look for an array you are forever appending to.
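For example, to snapshot the per-type byte totals and show the biggest consumers (a sketch using the objspace standard library):

```ruby
require 'objspace'

# Per-type byte totals, including a :TOTAL entry.
sizes = ObjectSpace.count_objects_size

# Print the five largest consumers; a type that keeps growing between
# snapshots is a good place to start hunting.
sizes.sort_by { |_type, bytes| -bytes }.first(5).each do |type, bytes|
  puts "#{type}: #{bytes}"
end
```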

You can use pry-byebug to drop into a console to troll around specific objects, or even look at all objects from the root:

ObjectSpace.memsize_of(some_object)
ObjectSpace.reachable_objects_from_root

There's a bit more detail on one of the Ruby developers' blogs and also in this SO answer. I like their JRuby/VisualVM profiling idea.
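To make those two calls concrete (a sketch; both require the objspace library):

```ruby
require 'objspace'

arr = Array.new(1_000) { |i| i.to_s }

# Shallow size of the array itself, not the strings it references.
puts ObjectSpace.memsize_of(arr)

# Root references grouped by category ("symbols", "global_tbl", ...);
# walking from here shows what is keeping objects alive.
roots = ObjectSpace.reachable_objects_from_root
puts roots.keys.first(5)
```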

Testing native gems

Use bundle to install your gems into a local path:

bundle install --path=.gems/

Then you can find those that include native code:

find .gems/ -name "*.c"

Which gives you: (in my order of suspiciousness)

  • digest-stringbuffer-0.0.2
  • digest-murmurhash-0.3.0
  • nokogiri-1.6.3.1
  • json-1.8.1

OSX has a useful dev tool called leaks that can tell you if it finds unreferenced memory in a running process. Not very useful for identifying where the memory comes from in Ruby but will help to identify when it is occurring.

First to be tested is digest-stringbuffer. Grab the example from the Readme and add in some GC logging with gc_tracer

require "digest/stringbuffer"
require "gc_tracer"
GC::Tracer.start_logging "gclog.txt"

module Digest
  class Prime31 < StringBuffer
    def initialize
      @prime = 31
    end

    def finish
      result = 0
      buffer.unpack("C*").each do |c|
        result += (c * @prime)
      end
      [result & 0xffffffff].pack("N")
    end
  end
end

And make it run lots

while true do
  a = []
  500.times do |i|
    a.push Digest::Prime31.hexdigest("abc" * (1000 + i))
  end
  sleep 1
end

Run the example:

bundle exec ruby ./stringbuffertest.rb &
pid=$!

Monitor the resident and virtual memory sizes of the ruby process, and the count of leaks identified:

while true; do
  ps=$(ps -o rss,vsz -p $pid | tail +2)
  leaks=$(leaks $pid | grep -c Leak)
  echo "$(date) m[$ps] l[$leaks]"
  sleep 15
done

And it looks like we've found something already:

Tue 26 Aug 2014 18:22:36 BST m[104776  2538288] l[8229]
Tue 26 Aug 2014 18:22:51 BST m[110524 2547504] l[13657]
Tue 26 Aug 2014 18:23:07 BST m[113716 2547504] l[19656]
Tue 26 Aug 2014 18:23:22 BST m[113924 2547504] l[25454]
Tue 26 Aug 2014 18:23:38 BST m[113988 2547504] l[30722]

Resident memory is increasing and the leaks tool is finding more and more unreferenced memory. Confirm that the GC heap size and object count still look stable:

tail -f gclog.txt | awk '{ print $1, $3, $4, $7, $13 }'
1581853040832 468 183 39171 3247996
1581859846164 468 183 33190 3247996
1584677954974 469 183 39088 3254580
1584678531598 469 183 39088 3254580
1584687986226 469 183 33824 3254580
1587512759786 470 183 39643 3261058
1587513449256 470 183 39643 3261058
1587521726010 470 183 34470 3261058

Then report the issue.

It appears to my very untrained C eye that they allocate both a pointer and a buffer but only clean up the buffer.

Looking at digest-murmurhash, it seems to only provide functions that rely on StringBuffer so the leak might be fine once stringbuffer is fixed.

When they have patched it, test again and move onto the next gem. It's probably best to use snippets of code from your implementation for each gem test rather than a generic example.

Testing MRI

First step would be to prove the issue on multiple machines under the same MRI to rule out anything local, which you've already done.

Then try the same Ruby version on a different OS, which you've done too.

Try the code on JRuby or Rubinius if possible. Does the same issue occur?

Try the same code on 2.0 or 1.9 if possible, see if the same problem exists.

Try the head development version from github and see if that makes any difference.

If nothing becomes apparent, submit a bug to Ruby detailing the issue and all the things you have eliminated. Wait for a dev to help out and provide whatever they need. They will most likely want to reproduce the issue, so try to set up the most concise/minimal example of it you can. Doing that will often help you identify what the issue is anyway.

How to avoid memory leaks in Ruby

The quick fix to memory problems is often to spike in calls to GC.start, which force-initiates the garbage collector. Sometimes Ruby gets very lazy about cleaning up garbage, and it can accumulate to a dangerous degree.
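You can confirm a forced run happened by watching the GC run counter (a minimal sketch):

```ruby
before = GC.stat[:count]
GC.start  # force a full garbage-collection run right now
puts GC.stat[:count] - before  # at least one full run has happened
```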

It's sometimes the case that you inadvertently create structures that are difficult to clean up: wildly inter-linked things that are, when analyzed more deeply, not actually retained. This makes life harder for the garbage collector. For example, a deep Hash-of-Hash structure with lots and lots of strings can take a lot more work to liberate than a simple Array.

If you're having memory problems you'll want to pay attention to how much garbage you're producing when doing operations. Look for ways of collapsing things to remove intermediate products. For instance, the classic case is this:

s = ''

10.times do |i|
  s += i.to_s
end

This creates a string of the form 01234... as the final product, but it also creates 10 other strings along the way holding the intermediate products. That's 11x as much garbage as this solution:

s = ''

10.times do |i|
  s << i.to_s
end

That creates a single string and appends to it repeatedly. Technically, the to_s operation on a number also creates garbage, so keep in mind that conversions aren't free either. This is why you see symbols like :name used in Ruby so frequently: you pay their cost once and only once, whereas every string "name" could be an independent object.
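You can measure the difference with the allocation counter in GC.stat (a sketch; the key name shown is the one used on modern MRI, and the helper name is mine):

```ruby
# Count how many objects a block allocates.
def allocations
  before = GC.stat[:total_allocated_objects]
  yield
  GC.stat[:total_allocated_objects] - before
end

concat_plus   = allocations { s = ''; 10.times { |i| s += i.to_s } }
concat_shovel = allocations { s = ''; 10.times { |i| s << i.to_s } }

# += allocates a fresh string on every iteration on top of the to_s
# results, so it should count noticeably more objects than <<.
puts "+= allocated #{concat_plus}, << allocated #{concat_shovel}"
```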

Ruby Memory Management

Don't do this:

def method(x)
  x.split(',')  # doesn't matter what the args are
end

or this:

def method(x)
  x.gsub(/a/, 'b')  # doesn't matter what the args are
end

Both will permanently leak memory in Ruby 1.8.5 and 1.8.6. (Not sure about 1.8.7, as I haven't tried it, but I really hope it's fixed.) The workaround is stupid and involves creating a local variable. You don't have to use the local, just create one...
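The workaround looked roughly like this (a sketch; the method name and split pattern here are placeholders of my own, and on a modern Ruby the local makes no difference):

```ruby
def split_words(x)
  parts = x.split(',')  # assigning through a local was the 1.8.x workaround
  parts
end

puts split_words("a,b,c").inspect
```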

Things like this are why I have lots of love for the Ruby language, but no respect for MRI.

What are your strategies to keep the memory usage low?

  1. Choose data structures that are efficient representations, scale well, and do what you need.
  2. Use algorithms that work on efficient data structures rather than bloated, but easier, ones.
  3. Look elsewhere. Ruby has a C bridge, and it's much easier to be memory conscious in C than in Ruby.

