What Are Tainted Objects, and When Should We Untaint Them


What is Tainted?

User input is tainted, by definition. For example:

string = gets
string.tainted?
# => true

You can also manually taint an object.

string = 'Not yet tainted.'
string.tainted?
# => false

(string = 'Explicitly taint me!').taint
string.tainted?
# => true

Why Untaint an Object?

Generally, you should untaint an object only after you have validated or sanitized it. Untainting marks the object as "safe" for operations that you wouldn't want to run on untrusted strings or other objects. It is also necessary when your safe level requires an untainted object for the operation you want to perform.

Untainting an Object

The easiest way to untaint an object is to call the Object#untaint method on it. For example, if your string variable holds a tainted object, then:

(string = "Let's taint this string!").taint
string.untaint.tainted?
# => false
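The validate-then-untaint pattern described above can be sketched roughly as follows. The helper name sanitize_username and the USERNAME_PATTERN rule are hypothetical, chosen only for illustration. The respond_to? guard is there on the assumption that Object#untaint is missing on newer Rubies (it is believed to have been removed in 3.2), so the sketch still runs there.

```ruby
# Hypothetical whitelist: lowercase letters, digits and underscores, 1 to 32 chars.
USERNAME_PATTERN = /\A[a-z0-9_]{1,32}\z/

# Hypothetical helper: validate first, and only then mark the value as safe.
def sanitize_username(raw)
  name = raw.to_s.strip
  unless name.match?(USERNAME_PATTERN)
    raise ArgumentError, "invalid username: #{name.inspect}"
  end
  # Assumption: untaint only exists on older Rubies, so guard the call.
  name.untaint if name.respond_to?(:untaint)
  name
end

puts sanitize_username(" alice_01\n")
```

The point of the ordering is that untaint itself checks nothing. It only clears the flag, so the validation has to happen before the call.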

More About Tainted Objects

You can find out more about tainted objects in the "Locking Ruby in the Safe" chapter of Programming Ruby.

What are Ruby's Object#taint and Object#trust methods?

taint and trust are part of Ruby's security model. In Ruby, each object has a few flags that it carries around with it, two of which are the Trusted flag and the Tainted flag. How these flags are acted on depends on something called the safe level. The safe level is stored in $SAFE.

Each thread and fiber in a program can have its own safe level. Safe levels range from 0 through 4, with 0 enforcing no security and 4 enforcing so much it should only be used when you're evaling code. You can't assign a lower value to $SAFE than it already has. Also, on UNIX systems where a Ruby script runs as setuid, Ruby automatically sets the safe level to 1.

Tainting

When an object has its tainted flag set, that means, roughly, that the object came from an unreliable source and therefore can't be used in sensitive operations. When the safe level is 0, the taint flag is ignored, but it is still set, so you can pay attention to it if you want. There are a few methods related to tainting:

taint -- Make an object tainted. You can taint an object on all levels with the exception of safe level 4.
tainted? -- Check if an object is tainted.
untaint -- Remove tainting from an object. This can only be used in safe levels 0, 1, and 2.
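Whether these three methods do anything depends on the Ruby version you run. The probe below is a minimal sketch for checking this at runtime. The method name taint_support and the three symbols it returns are made up for this example. The version ranges in the comments are assumptions from memory, not something the code verifies.

```ruby
# Hypothetical probe: reports how the running Ruby treats the taint flag.
def taint_support
  probe = String.new("probe")
  # Assumption: Ruby 3.2 and later no longer define Object#taint at all.
  return :removed unless probe.respond_to?(:taint)

  probe.taint
  # Assumption: on Ruby 2.7 through 3.1 taint is a no-op, so the flag stays false.
  probe.tainted? ? :active : :noop
end

puts taint_support
```

If this prints anything other than active, the examples in this answer will not behave as shown on your interpreter.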

Here's an example from the Pragmatic Programmers' Pickaxe book that shows tainting:

# internal data
# =============
x1 = "a string"
x1.tainted?     # => false
x2 = x1[2, 4]
x2.tainted?     # => false
x1 =~ /([a-z])/ # => 0
$1.tainted?     # => false
# external data
# =============
y1 = ENV["HOME"]
y1.tainted?      # => true
y2 = y1[2, 4]
y2.tainted?      # => true
y1 =~ /([a-z])/  # => 1
$1.tainted?      # => true

To summarize, you can't use dangerous methods on tainted data. So if you do this in safe level 3, you'd get an error:

eval(gets)

Trust

Trust is a lot simpler. Trust has to do with whether the object came from a trusted or untrusted source -- basically, whether it came from anything less than safe level 4, or safe level 4. I'm not sure exactly what effect Ruby's trust has, but take a look here:
http://www.ruby-forum.com/topic/1887006 .

Here are some more resources:
http://phrogz.net/ProgrammingRuby/taint.html -- Some great stuff on safe levels, but I think it's from 1.8. There is an updated version for 1.9, but only in the printed version of the book.

http://www.ruby-forum.com/topic/79295 -- On whether safe is safe enough.

Couldn't understand the difference between Object#taint and Object#trust in Ruby

Note: As @themarketka pointed out, as of Ruby 2.2.2, trust has been deprecated and made equivalent to tainting.

The difference is rather odd, and not particularly well documented.

NOTE: At $SAFE level 0, none of these markers do anything at all.

Tainting

Tainting tracks whether an object comes from a trusted source. A string read from standard input is tainted, but a string literal that you assign yourself is not. At higher safe levels, potentially dangerous operations on tainted data, such as eval and system, are prohibited and raise a SecurityError. Additionally, taint propagates to objects derived from tainted ones:

2.0.0p0 :001 > s = "Hi!"
 => "Hi!"
2.0.0p0 :002 > s.taint
 => "Hi!"
2.0.0p0 :003 > (s + "World").tainted?
 => true

So, if I do something like system("rm -rf #{gets.chomp}") (DO NOT EXECUTE) at a higher safe level, Ruby will complain as the combination of my untainted string ("rm -rf #{...}") and a tainted string (gets.chomp) creates a tainted string.
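To make the propagation rule concrete without relying on Ruby's built-in flag, here is a plain-Ruby emulation. TrackedString and guarded_command are hypothetical names invented for this sketch, and this is not how Ruby implements tainting internally. The guard only returns the command text and never executes it.

```ruby
# Hypothetical wrapper that emulates a taint flag in plain Ruby.
class TrackedString
  attr_reader :value

  def initialize(value, tainted: false)
    @value = value.to_s
    @tainted = tainted
  end

  def tainted?
    @tainted
  end

  # Concatenation yields a tainted result if either operand is tainted.
  def +(other)
    other = TrackedString.new(other) unless other.is_a?(TrackedString)
    TrackedString.new(value + other.value, tainted: tainted? || other.tainted?)
  end
end

# Hypothetical guard standing in for a sensitive operation such as system.
# It refuses tainted input and otherwise just returns the command text.
def guarded_command(command)
  raise SecurityError, "refusing tainted command" if command.tainted?
  command.value
end

prefix = TrackedString.new("ls -l ")                               # written by us
user_input = TrackedString.new("/tmp; echo pwned", tainted: true)  # from outside
command = prefix + user_input

begin
  guarded_command(command)
rescue SecurityError => e
  puts "blocked: #{e.message}"
end
```

Note that SecurityError does not inherit from StandardError, so a bare rescue will not catch it. You have to name it explicitly, as above.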

Trust

Unlike tainting, trust applies to both code and objects. All running code is either trusted or untrusted, and so is every object. Untrusted code can only create and modify untrusted objects. Code and objects created at safe levels 0-2 are trusted, but anything running or created at $SAFE level 3 or 4 is untrusted.

The Difference

The difference between tainting and trusting is subtle. Tainting is all about what operations you can conduct on data, but trust is about what data you can access. They protect different parts of the system. Additionally, while tainting always exists, and tainted objects can exist at any safe level, trust only comes into play at $SAFE levels 3 and 4, which are used almost exclusively for sandboxing external code.

Data-tainting in JavaScript

Data Tainting (or Taint Checking) is a language feature wherein user-input data is flagged as tainted, and that flag propagates to all data derived from the input. As a result, code can implement runtime assertions to ensure that security-critical code is not called with tainted data (i.e., to prevent SQL injection and XSS attacks).

Whilst Netscape implemented it in the browser in v3 and v4, support for it sadly never materialized elsewhere, so @trejder is absolutely right that it should be avoided in JavaScript.
