Are Strings in Ruby Mutable

Are strings in Ruby mutable?

Yes, strings in Ruby, unlike in Python, are mutable.

s += "hello" does not append "hello" to s in place: it creates an entirely new string object and rebinds s to it. To append to a string in place, use <<, as in:

s = "hello"
s << " world"
s # => "hello world"
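The difference matters most when two variables refer to the same string. A minimal sketch (the variable names are made up for illustration, and .dup is only there so the string is unfrozen regardless of any frozen-literal setting):

```ruby
# Two variables referring to the same String object.
original = "hello".dup
alias_ref = original

# << mutates the shared object, so both variables see the change.
original << " world"
puts alias_ref                    # hello world
puts original.equal?(alias_ref)   # true

# += builds a new String and rebinds only `original`.
original += "!"
puts original                     # hello world!
puts alias_ref                    # hello world
puts original.equal?(alias_ref)   # false
```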

How do I create a mutable string in Ruby?

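Prefix the string literal with a unary + (the String#+@ method), which returns an unfrozen string even when string literals are frozen:

string = +'hello'
string << ' world'
puts(string)

Output:

hello world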
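What unary plus returns depends on the receiver. A small sketch, assuming Ruby 2.3 or later (the variable names are illustrative, and .dup / .freeze are used so the result does not depend on whether frozen string literals are enabled):

```ruby
mutable = "foo".dup       # an unfrozen String
frozen  = "foo".freeze    # a frozen String

# +@ returns the receiver itself when it is already unfrozen...
puts((+mutable).equal?(mutable))   # true

# ...and an unfrozen duplicate when the receiver is frozen.
thawed = +frozen
puts thawed.frozen?                # false
puts thawed.equal?(frozen)         # false

# The duplicate can be modified without touching the frozen original.
thawed << "bar"
puts thawed                        # foobar
puts frozen                        # foo

# -@ goes the other way and returns a frozen String,
# without freezing the receiver itself.
puts((-mutable).frozen?)           # true
puts mutable.frozen?               # false
```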

Are strings mutable in Ruby?

Yes, << mutates the same object, and + creates a new one. Demonstration:

irb(main):011:0> str = "hello"
=> "hello"
irb(main):012:0> str.object_id
=> 22269036
irb(main):013:0> str << " world"
=> "hello world"
irb(main):014:0> str.object_id
=> 22269036
irb(main):015:0> str = str + " world"
=> "hello world world"
irb(main):016:0> str.object_id
=> 21462360
irb(main):017:0>

Why did Matz choose to make Strings mutable by default in Ruby?

This is in line with Ruby's design, as you note. Immutable strings can be more efficient than mutable strings (less copying, because a string can be safely shared and re-used), but they make more work for the programmer. It is intuitive to see strings as mutable: you can concatenate them together. To deal with this, Java has traditionally translated concatenation (via +) of strings into the use of a StringBuilder object (StringBuffer before Java 5), and I'm sure there are other such hacks. Ruby chooses instead to make strings mutable by default at the expense of performance.

Ruby also has a number of destructive methods such as String#upcase! that rely on strings being mutable.
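As a quick illustration of the destructive/non-destructive pairing (a sketch with an illustrative variable name; .dup just guarantees an unfrozen string):

```ruby
name = "matz".dup

# Non-destructive: returns a new String and leaves the receiver untouched.
shout = name.upcase
puts name                  # matz
puts shout                 # MATZ

# Destructive: modifies the receiver in place and returns it...
result = name.upcase!
puts name                  # MATZ
puts result.equal?(name)   # true

# ...or returns nil when there was nothing to change.
second = name.upcase!
p second                   # nil
```

Because the bang variants can return nil, chaining further calls onto their return value is risky.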

Another possible reason is that Ruby is inspired by Perl, and Perl happens to use mutable strings.

Ruby also has Symbols and frozen Strings, both of which are immutable. As an added bonus, symbols are guaranteed to be unique per possible string value.
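A short sketch of what "frozen" means in practice. It rescues RuntimeError because FrozenError (Ruby 2.5 and later) is a subclass of it, so the same code should also work on older versions:

```ruby
greeting = "hello".freeze
puts greeting.frozen?      # true

# Any in-place modification of a frozen String raises an error.
error_class = nil
begin
  greeting << " world"
rescue RuntimeError => e
  error_class = e.class
end
puts error_class           # FrozenError on Ruby 2.5+

# dup returns an unfrozen copy that can be modified freely.
copy = greeting.dup
copy << " world"
puts copy                  # hello world
puts greeting              # hello

# Symbols are always frozen.
puts :hello.frozen?        # true
```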

How can I describe mutable strings when strings are immutable by default?

I had missed it. The recommended way is to apply the +@ method (unary plus) to the string literal. With the # frozen_string_literal: true magic comment in effect:

(+"foo").frozen? # => false
(-"foo").frozen? # => true
"foo".frozen? # => true

Ruby: How does concatenation affect the String in memory?


How come concatenating to a string does not change its object_id?

Because it's still the same string it was before.

My understanding was that Strings are immutable

No, they are not immutable. In Ruby, strings are mutable.

because Strings are essentially Arrays of Characters,

They are not. In Ruby, a string is mostly a factory for iterators (each_line, each_char, each_codepoint, each_byte). String implements a subset of the Array protocol, but that does not mean that it is an array.
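To make that concrete, here is a small sketch with an arbitrary sample string:

```ruby
text = "ab\ncd"

# Without a block, the each_* methods return Enumerators.
p text.each_line.to_a        # ["ab\n", "cd"]
p text.each_char.first(2)    # ["a", "b"]
p text.each_byte.first(2)    # [97, 98]
p text.each_codepoint.first  # 97

# Array-like indexing works...
p text[0]                    # "a"
p text[0, 2]                 # "ab"

# ...but a String is neither an Array nor an Enumerable.
p text.is_a?(Array)          # false
p text.is_a?(Enumerable)     # false
p text.respond_to?(:each)    # false
```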

and Arrays cannot be changed in memory since they are contiguous.

Wrong, arrays are mutable in Ruby.
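A minimal sketch showing that object identity survives growth, for arrays and strings alike. Whether the underlying buffer gets reallocated along the way is an implementation detail this code cannot observe and does not depend on:

```ruby
list = [1, 2]
list_id = list.object_id
list << 3
list.concat([4, 5])
p list                            # [1, 2, 3, 4, 5]
p list.object_id == list_id       # true

# The same holds for a String that grows well past its initial size.
buffer = "a".dup
buffer_id = buffer.object_id
1_000.times { buffer << "x" }
p buffer.length                   # 1001
p buffer.object_id == buffer_id   # true
```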

Yet, as demonstrated below: Instantiating a String then adding characters does not change its object_id. How does concatenation affect the String in memory?

The Ruby Language Specification does not prescribe any particular in-memory representation of strings. Any representation is fine, as long as it supports the semantics specified in the Ruby Language Specification.

Here are a couple of examples (source files) from some Ruby implementations:

  • Rubinius:
    • kernel/common/string.rb
    • kernel/bootstrap/string.rb
    • vm/builtin/string.cpp
  • Topaz:
    • topaz/objects/stringobject.py
  • Cardinal:
    • src/classes/String.pir
  • IronRuby:
    • Ruby/Builtins/MutableString.cs
  • JRuby:
    • core/src/main/java/org/jruby/RubyString.java

What's the difference between String.new and a string literal in Ruby?

== checks for equal content.

equal? checks for equal identity.

a = "hello"
b = "hello"

a == b # => true
a.equal?(b) # => false

In Ruby, string literals are not immutable by default, so creating a string with String.new and using a literal behave the same way here. In both cases Ruby creates a new string instance each time the expression is evaluated.

Both of these are thus the same

10.times { String.new }
# is the same as
10.times { "" }

Let's verify this

10.times { puts "".object_id }

Prints 10 different numbers

70227981403600
70227981403520
70227981403460
...

Why? Strings are mutable by default, and thus Ruby has to create a new copy each time a string literal is reached in the source code, even though those literals are rarely modified in practice.

Thus a Ruby program typically creates an excessive number of short-lived string objects, which puts a huge strain on garbage collection. It is not unusual for a Rails app to create 500,000 short-lived strings just to serve one request, and this is one of the main performance bottlenecks when scaling Rails to millions or even hundreds of millions of users.

To address that, Ruby 2.3 introduced the frozen_string_literal magic comment, which makes every string literal in the file frozen. Since this is not backwards compatible, it is opt-in per file with a pragma:

# frozen_string_literal: true

Let's verify this too

# frozen_string_literal: true
10.times { puts "".object_id }

Prints the same number 10 times

69898321746880
69898321746880
69898321746880
...
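The magic comment only takes effect at the top of a file, so it cannot be switched on midway through a snippet. As a rough per-string analogue, String#-@ returns a frozen, deduplicated string. This sketch assumes Ruby 2.5 or later, where that deduplication is documented, and uses a made-up "cache_key" string:

```ruby
# Each evaluation of "cache_key".dup allocates a fresh, unfrozen String.
fresh = Array.new(10) { "cache_key".dup }
puts fresh.map(&:object_id).uniq.size     # 10

# -@ returns a frozen, deduplicated String, so equal contents
# resolve to one shared object.
shared = Array.new(10) { -"cache_key" }
puts shared.map(&:object_id).uniq.size    # 1
puts shared.first.frozen?                 # true
```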

Fun fact: using an unfrozen string as a key in a hash also creates a copy of the string (the hash stores a frozen duplicate).

str = "key"
hash = {}
hash[str] = true
puts str.object_id
puts hash.keys.first.object_id

Prints two different numbers

70243164028580
70243132639660
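The copy protects the hash from later changes to the key. A sketch of that behaviour with illustrative variable names (the unary + only makes sure the key starts out unfrozen):

```ruby
# An unfrozen String key is duplicated, and the copy is frozen.
mutable_key = +"key"
hash = {}
hash[mutable_key] = true
stored = hash.keys.first
puts stored.equal?(mutable_key)          # false
puts stored.frozen?                      # true

# Mutating the original afterwards does not disturb the Hash.
mutable_key << "_changed"
puts hash.key?("key")                    # true
puts hash.key?("key_changed")            # false

# A String that is already frozen is stored as-is, with no copy.
frozen_key = "id".freeze
hash[frozen_key] = true
puts hash.keys.last.equal?(frozen_key)   # true
```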

Ruby immutability of strings and symbols (What if we store them in variables)

Ruby variables are references to objects, so when you call a method on a variable, the object it references is the receiver of that call. It may be clearer to look at the first image in the top-rated answer (below the accepted answer) here.

So, to figure out what's going on, let's dig into the documentation a bit and see what happens with your code snippet.

Ruby's Symbol class documentation:
https://ruby-doc.org/core-2.5.0/Symbol.html

Symbol objects represent names and some strings inside the Ruby interpreter. They are generated using the :name and :"string" literals syntax, and by the various to_sym methods. The same Symbol object will be created for a given name or string for the duration of a program's execution, regardless of the context or meaning of that name. Thus if Fred is a constant in one context, a method in another, and a class in a third, the Symbol :Fred will be the same object in all three contexts.

Ruby's Object#object_id documentation:
https://ruby-doc.org/core-2.5.1/Object.html#method-i-object_id

Returns an integer identifier for obj.

The same number will be returned on all calls to object_id for a given object, and no two active objects will share an id.

So here's what's happening step-by-step:

# We create two variables that refer to the same object, :foo
var1 = :foo
var2 = :foo

var1.object_id # => 2598748
var2.object_id # => 2598748
# Evaluated as:
# var1.object_id => :foo.object_id => 2598748
# var2.object_id => :foo.object_id => 2598748

As discussed in the first link above, Ruby is pass-by-value, but every value is a reference to an object, so your variables both evaluate to the same object. Since every symbol made of the same string ("foo" in this case) refers to the same object, and Object#object_id always returns the same id for the same object, you get the same id back.
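The same point can be checked without comparing raw ids. In this sketch, str1 and str2 are added for contrast and are not part of the original snippet:

```ruby
var1 = :foo
var2 = :foo
puts var1.equal?(var2)                  # true
puts var1.object_id == var2.object_id   # true
puts "foo".to_sym.equal?(var1)          # true

# Two Strings with equal contents are still distinct objects.
str1 = "foo".dup
str2 = "foo".dup
puts str1 == str2                       # true
puts str1.equal?(str2)                  # false

# Reassignment only rebinds the variable; the Symbol :foo is unaffected.
var1 = :bar
puts var2                               # foo
```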


