What Are the Ruby's Object#Taint and Object#Trust Methods

What are the Ruby's Object#taint and Object#trust methods?

taint and trust are part of Ruby's security model. In Ruby, each object has a few flags that it carries around with it, two of which are the Trusted flag and the Tainted flag. How these flags are acted on depends on something called the safe level. The safe level is stored in $SAFE.

Each thread and fiber in a program can have its own safe level. Safe levels range from 0 through 4, with 0 enforcing no security and 4 enforcing so much it should only be used when you're evaling code. You can't assign a lower value to $SAFE than it already has. Also, on UNIX systems where a Ruby script runs as setuid, Ruby automatically sets the safe level to 1.

Tainting

When an object has its tainted flag set, that means, roughly, that the object came from an unreliable source and therefore can't be used in sensitive operations. When the safe level is 0, the taint flag is ignored (it is still set, so you can inspect it if you want). There are a few methods related to tainting:

taint -- Make an object tainted. You can taint an object on all levels with the exception of safe level 4.
tainted? -- Check if an object is tainted.
untaint -- Remove tainting from an object. This can only be used in safe levels 0, 1, and 2.

Here's an example from the Pragmatic Programmers' Pickaxe book that shows tainting:

# internal data
# =============
x1 = "a string"
x1.tainted?     # => false
x2 = x1[2, 4]
x2.tainted?     # => false
x1 =~ /([a-z])/ # => 0
$1.tainted?     # => false
# external data
# =============
y1 = ENV["HOME"]
y1.tainted?      # => true
y2 = y1[2, 4]
y2.tainted?      # => true
y1 =~ /([a-z])/  # => 1
$1.tainted?      # => true

To summarize, you can't use dangerous methods on tainted data. So if you do this in safe level 3, you'd get an error:

eval(gets)

Trust

Trust is a lot simpler. Trust has to do with whether the object came from a trusted or untrusted source -- basically, whether it came from anything less than safe level 4, or safe level 4. I'm not sure exactly what effect Ruby's trust has, but take a look here:
http://www.ruby-forum.com/topic/1887006 .

Here are some more resources:
http://phrogz.net/ProgrammingRuby/taint.html -- Some great stuff on safe levels, but I think it's from 1.8 -- there is an updated version for 1.9, but only in the printed version of the book.

http://www.ruby-forum.com/topic/79295 -- On whether safe is safe enough.

Couldn't understand the difference between Object#taint and Object#trust in Ruby

Note: As @themarketka pointed out, as of Ruby 2.2.2, trust has been deprecated and made equivalent to tainting.

The difference is rather odd, and not particularly well documented.

NOTE: At $SAFE level 0, none of these markers do anything at all.

Tainting

Tainting tracks whether an object comes from a trusted source. A string read from standard input is tainted, but a string that's just assigned from a literal is not. At higher safe levels, various potentially dangerous operations on tainted data, such as eval and system, are prohibited (they raise SecurityError). Additionally, taint propagates: an object derived from a tainted object is itself tainted:

2.0.0p0 :001 > s = "Hi!"
 => "Hi!"
2.0.0p0 :002 > s.taint
 => "Hi!"
2.0.0p0 :003 > (s + "World").tainted?
 => true

So, if I do something like system("rm -rf #{gets.chomp}") (DO NOT EXECUTE) at a higher safe level, Ruby will complain as the combination of my untainted string ("rm -rf #{...}") and a tainted string (gets.chomp) creates a tainted string.

Trust

Trust, unlike tainting, applies to both code and objects. All running code is either trusted or untrusted, and all objects are either trusted or untrusted. Untrusted code can only modify untrusted objects, and it can only create untrusted objects. Code and objects created at safe levels 0-2 are trusted, but anything running or created at $SAFE level 3 or 4 is untrusted.

The Difference

The difference between tainting and trusting is subtle. Tainting is all about what operations you can conduct on data, but trust is about what data you can access. They protect different parts of the system. Additionally, while tainting always exists, and tainted objects can exist at any safe level, trust only comes into play at the so-called "sandboxing" $SAFE levels 3 and 4 which are almost exclusively used for sandboxing external code.

|| and && aren't methods on Object -- what are they?

Both | and || are operators. || is part of the language while | is implemented as a method by some classes (Array, FalseClass, Integer, NilClass and TrueClass).

In programming languages, | is used in general as the bitwise OR operator. It combines the bits of its integer operands and produces a new integer value. When used with non-integer operands, some languages convert them to integer, others prohibit such usage.

|| is the logical OR operator. It combines two boolean values (true or false) and produces another boolean value. When its operands are not boolean values, some languages convert them to boolean. Ruby (like JavaScript and some other languages) only tests the truthiness of the first operand: if it is truthy, the value of the expression is the first operand; otherwise it is the second operand. The result keeps its original type; it is not converted to boolean.

Each language uses its own rules to decide what non-boolean values are converted to false (usually the number 0, the empty string '' and null or undefined); all the other values are converted to true. The only "false" values in Ruby are false (boolean) and nil (non-boolean); all the other values (including 0) are "true".
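To make the last two paragraphs concrete, here is a minimal sketch you can paste into irb. The variable names are made up for illustration:

```ruby
# || returns one of its operands unchanged, not a converted boolean.
first_truthy  = "left" || "right"   # first operand is truthy, so it is the result
second_chosen = nil || "fallback"   # first operand is falsy, so the second is the result
both_falsy    = false || nil        # the second operand is returned even when it is falsy

# In Ruby only false and nil are falsy; 0 and "" count as true.
zero_result  = 0 || 1
empty_result = "" || "non-empty"

p first_truthy, second_chosen, both_falsy, zero_result, empty_result
```

Note that zero_result is 0 and empty_result is the empty string, which is where Ruby differs from languages that treat those values as false.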

Because true || anything is true and false && anything is false, many programming languages including Ruby implement short-circuit evaluation for logical expressions.

Using short-circuit evaluation, a logical expression is evaluated from left to right, one operand at a time, until the value of the expression can be determined without computing the remaining operands. In the examples above, the value of anything doesn't change the value of the entire expression, so it is not computed at all. If anything is a method call that takes considerable time to execute, short-circuit evaluation avoids calling it and saves execution time.
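A small sketch of short-circuiting in action. slow_check is a hypothetical stand-in for an expensive method call, and the calls array only exists to record whether it ran:

```ruby
# Hypothetical expensive call; it records every invocation in `calls`.
calls = []
slow_check = lambda do
  calls << :slow_check
  true
end

a = true || slow_check.call    # the left side decides the result, slow_check is skipped
b = false && slow_check.call   # same for &&: false on the left ends the evaluation
calls_after_short_circuit = calls.dup

c = false || slow_check.call   # the left side cannot decide, so slow_check runs

p a, b, calls_after_short_circuit, c, calls
```

After the first two expressions calls is still empty; only the third one actually invokes the lambda.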

As others already mentioned in comments to the question, implementing || as a method of some class is not possible. The value of its second operand must be evaluated in order to be passed as argument to the method and this breaks the short-circuiting behaviour.
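You can see the problem by trying it. In this sketch my_or is a hypothetical method that imitates ||, and note is a made-up helper that logs which operands get evaluated:

```ruby
# my_or is a hypothetical stand-in for "|| implemented as a method".
def my_or(left, right)
  left ? left : right
end

evaluated = []
# note logs the label of each operand as it is evaluated, then returns the value.
note = ->(label, value) { evaluated << label; value }

operator_result = note.call(:left, true) || note.call(:right, true)
after_operator  = evaluated.dup   # only :left was evaluated

evaluated.clear
method_result = my_or(note.call(:left, true), note.call(:right, true))
after_method  = evaluated.dup     # both arguments were evaluated before my_or even started

p after_operator, after_method
```

Both expressions return true, but the method version has already paid for the right operand by the time my_or gets control.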

The usual representation of the logical values in programming languages uses only one bit (and I guess Ruby does the same.) Results of | and || are the same for operands stored on one bit.

Ruby uses the | symbol to implement different flavors of the OR operation as follows:

bitwise OR for integers;
non-short-circuit logical OR for booleans and nil;
union for arrays.
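A quick sketch of the three flavors listed above, using literal values chosen only for illustration:

```ruby
# Integer#| combines the bits of both operands.
bitwise = 0b0101 | 0b0011

# TrueClass#|, FalseClass#| and NilClass#| always return true or false,
# unlike ||, which returns one of its operands.
logical_results = [
  false | true,
  false | false,
  nil | nil,
  nil | "text",
  true | nil
]

# Array#| returns the union, keeping the first occurrence of each element.
union = [1, 2, 3] | [3, 4, 1]

p bitwise, logical_results, union
```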

An expression like:

x = false | a | b | c

ensures that all a, b and c expressions are evaluated (no short-circuit) and the value of x is the logical OR of the logical values of a, b and c.

If a, b and c are method calls, to achieve the same result using the logical OR operator (||) the code needs to look like this:

aa = a
bb = b
cc = c
x = aa || bb || cc

This way each method is called no matter what values are returned by the methods called before it.

For TrueClass, FalseClass and NilClass, the | operator is useful when short-circuit evaluation is not desired.

Also, for Array (an array is just an ordered set), the | operator implements union, an operation that is the semantic equivalent of logical OR for sets.

What's the purpose of tainting Ruby objects?

It used to be a pretty standard practice when writing CGIs in Perl. There is even a FAQ on it. The basic idea was that the run time could guarantee that you did not implicitly trust a tainted value.

ruby, purpose of string replace method

You're correct that this has something to do with pointers. s = "world" would construct a new object and assign s a pointer to that object. Whereas s.replace "world" modifies the string object that s already points to.
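The difference is easiest to see when two variables refer to the same string. A minimal sketch (String.new is used so the strings are mutable even if frozen string literals are enabled):

```ruby
a = String.new("hello")
b = a                      # b and a point to the same String object

a = "world"                # assignment: a now points to a new object, b is untouched
after_assignment = b.dup   # snapshot of b, still "hello"

c = String.new("hello")
d = c
c.replace("world")         # replace: mutates the one object that both c and d point to

p after_assignment, d, c.equal?(d)
```

After the assignment a and b are different objects, whereas c and d are still the same object with new contents.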

One case where replace would make a difference is when the variable isn't directly accessible:

class Foo
  attr_reader :x
  def initialize
    @x = ""
  end
end

foo = Foo.new

# foo.x = "hello"     # this won't work (NoMethodError): attr_reader defines no writer, so we can't assign a new pointer to @x
foo.x.replace "hello" # but this will: it mutates the string that @x already points to

replace has nothing in particular to do with taintedness; the documentation is just stating that it handles tainted strings properly. There are better answers for explaining that topic: What are the Ruby's Object#taint and Object#trust methods?

are Ruby's logical operators methods, like binary operators are?

These are the operators that cannot be (re)defined:

&&, || (AND, OR)
.., ... (range)
?: (ternary)
rescue
= (and **=, &&=, &=, *=, +=, -=, <<=, >>=, ||=, |=, ^=)
defined?
not
and, or
if, unless, while, until

The others, like (incomplete list) !, ~, +, -, **, *, /, %, >>, ==, != are implemented as methods and can be redefined.
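For the operators that are methods, defining them looks like defining any other method. A small sketch, where Vec is a hypothetical class invented for this example:

```ruby
# Vec is a hypothetical class used only for illustration.
class Vec
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # a + b is syntactic sugar for a.+(b)
  def +(other)
    Vec.new(x + other.x, y + other.y)
  end

  def ==(other)
    other.is_a?(Vec) && x == other.x && y == other.y
  end

  # Unary minus is defined under the method name -@
  def -@
    Vec.new(-x, -y)
  end
end

v = Vec.new(1, 2)
sum      = v + Vec.new(3, 4)
explicit = v.+(Vec.new(3, 4))   # the same call, written as an ordinary method call
negated  = -v

p sum == explicit, negated == Vec.new(-1, -2)
```

There is no equivalent way to write def &&(other) or def ||(other); the parser handles those operators itself.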

Ruby: listing Fixnum methods does not include arithmetic operators

Fixnum is an instance of Class. Class doesn't define a * instance method (what would that even do), nor do Class's ancestors (Module, Object, Kernel, BasicObject).

Now, 1 on the other hand is an instance of Fixnum, and since Fixnum defines a * instance method, that instance method shows up when you ask 1 about its methods:

1.methods.sort
# => [:!, :!=, :!~, :%, :&, :*, :**, :+, :+@, :-, :-@, :/, :<, :<<, :<=, … ]

You can see that Fixnum defines an instance method named *:

Fixnum.instance_methods.sort
# => [:!, :!=, :!~, :%, :&, :*, :**, :+, :+@, :-, :-@, :/, :<, :<<, :<=, … ]
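If you want to check this without scanning the lists by eye, include? does the job. This sketch uses 1.class instead of naming Fixnum directly, since Fixnum was merged into Integer in Ruby 2.4 and the constant no longer exists on recent versions:

```ruby
klass = 1.class   # Integer on Ruby 2.4+, Fixnum on older versions

on_class_object = klass.methods.include?(:*)            # methods of the class object itself
on_instances    = klass.instance_methods.include?(:*)   # methods its instances respond to
on_one          = 1.methods.include?(:*)                # methods of the instance 1

p on_class_object, on_instances, on_one
```

The first check is false and the other two are true, which is exactly the distinction described above.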

Confused about the rule of Dangerous Method Bang

The bang is always used to mark the "more surprising" (I don't particularly like the definition that uses "dangerous") of a pair of methods that do the same (or almost same) thing in a slightly different manner.

In both of your cases, there is no second, less surprising, method, so in both your cases, you don't need and should not use a bang.
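For comparison, here is what a genuine pair looks like, using String#upcase and String#upcase! from the core library. The variable names are just for illustration:

```ruby
name = String.new("ruby")

shouted = name.upcase        # returns a new string, name is unchanged
before  = name.dup           # snapshot: still "ruby"

result_changed   = name.upcase!   # mutates name in place and returns it
result_unchanged = name.upcase!   # nothing left to change, so it returns nil

p shouted, before, name, result_unchanged
```

The bang version is the more surprising of the two on both counts: it modifies the receiver, and it returns nil when there was nothing to change.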

There are plenty of examples of methods that, e.g., mutate the receiver and don't have a bang:

Array
- Array#append
- Array#clear
- Array#delete
- Array#delete_at
- Array#delete_if
- Array#fill
- Array#insert
- Array#keep_if
- Array#pop
- Array#prepend
- Array#push
- Array#replace
- Array#shift
- Array#unshift
Hash
- Hash#clear
- Hash#delete
- Hash#delete_if
- Hash#keep_if
- Hash#replace
- Hash#rehash
- Hash#shift
- Hash#store
- Hash#update
IO
- Many methods in IO will in some way make changes to the I/O stream, such as advancing the file pointer (anything with read, write, print, or put in it, for example) or writing something to it (print, puts, anything with write in it).
Module
- Many methods in Module will in some way change the module; in fact, they would be pretty useless if they didn't! E.g. Module#alias_method, Module#define_method, Module#attr, Module#attr_reader, Module#attr_writer, Module#attr_accessor add methods to the module, and Module#prepend and Module#include modify the ancestry chain.
Random
- Methods that return random values will change the internal state of the pseudo-random number generator: Random#bytes and Random#rand.
String
- String#clear
- String#concat
- String#force_encoding
- String#insert
- String#prepend
- String#replace
- String#setbyte
Object
- Object#define_singleton_method
- Object#extend
- Object#freeze
- Object#instance_variable_set
- Object#remove_instance_variable
- Object#taint
- Object#trust
- Object#untaint
- Object#untrust

These are only some of the methods that I can think of off the top of my head that mutate their receiver. There are also other "dangerous" / "surprising" methods that are dangerous or surprising in a different way than mutating their receiver which don't have a bang: Module#private, Module#protected, and Module#public modify the way other things work which are evaluated in the same scope, e.g. method definitions. String#intern and String#to_sym mutate the global symbol table. Kernel#load, Kernel#require, and Kernel#require_relative mutate $LOADED_FEATURES. Many Regexp methods modify the thread-local global pseudo-variables $1, $2, $3, $4, $5, $6, $7, $8, and $9.
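Two of those bang-less side effects in a short sketch. Account and its methods are hypothetical names invented for this example:

```ruby
# Account is a hypothetical class used only for illustration.
class Account
  def balance
    100
  end

  private   # no bang, yet it changes how every following def in this scope is treated

  def audit_code
    42
  end
end

public_names  = Account.public_instance_methods(false)
private_names = Account.private_instance_methods(false)

"order 42" =~ /(\d+)/   # no bang either, yet the match sets $~, $1 and friends
captured = $1

p public_names, private_names, captured
```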

Obviously, the whole point of writer methods such as attribute writers (e.g. foo=) and indexing writers ([]=) is to mutate the receiver. There are also plenty of operator methods that mutate the receiver (e.g. Array#<<). However, in all of these cases, it doesn't make sense to add a bang to the name.

There is also one operator method whose name is the bang, namely BasicObject#!, but applying the rule about bang methods to this is obviously silly.

The! takeaway! is! that! bang! methods! should! only! be! used! for! marking! one! method! of! a! pair! if! you! use! bang! to! mark! every! potentially! unsafe! method! Ruby! would! get! very! annoying! to! read!

As a closing remark, I want to address a tiny part of your question (bold emphasis mine):

The names of potentially dangerous methods (i.e. methods that modify self or the arguments, exit! (doesn’t run the finalizers like exit does), etc) should end with an exclamation mark if there exists a safe version of that dangerous method.

Methods should never mutate their arguments. Period. That is so surprising and dangerous, no amount of exclamation marks is warning enough.
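A sketch of why this bites. Both helper methods below are hypothetical; the only difference between them is whether they touch the caller's object:

```ruby
# Hypothetical helper that mutates its argument: the caller's string changes too.
def shout_in_place(text)
  text.upcase!
  text
end

# Hypothetical helper that leaves its argument alone and returns a new string.
def shout(text)
  text.upcase
end

greeting = String.new("hi")

loud = shout(greeting)
after_shout = greeting.dup            # still "hi"

shout_in_place(greeting)
after_shout_in_place = greeting.dup   # now "HI", changed behind the caller's back

p loud, after_shout, after_shout_in_place
```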
