Difference Between . and #

Difference between hash() and id()

There are three concepts to grasp when trying to understand id, hash and the == and is operators: identity, value and hash value. Not all objects have all three.

  1. All objects have an identity, though even this can be a little slippery in some cases. The id function returns a number corresponding to an object's identity (in CPython, it returns the memory address of the object, but other interpreters may return something else). If two objects (that exist at the same time) have the same identity, they're actually two references to the same object. The is operator compares items by identity: a is b is equivalent to id(a) == id(b).

    Identity can get a little confusing when you deal with objects that are cached somewhere in their implementation. For instance, the objects for small integers and strings in CPython are not remade each time they're used. Instead, existing objects are returned any time they're needed. You should not rely on this in your code though, because it's an implementation detail of CPython (other interpreters may do it differently or not at all).

  2. All objects also have a value, though this is a bit more complicated. Some objects do not have a meaningful value other than their identity (so value and identity may be synonymous, in some cases). Value can be defined as what the == operator compares, so any time a == b, you can say that a and b have the same value. Container objects (like lists) have a value that is defined by their contents, while some other kinds of objects will have values based on their attributes. Objects of different types can sometimes have the same values, as with numbers: 0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False (yep, bools are numbers in Python, for historic reasons).

    If a class doesn't define an __eq__ method (to implement the == operator), it will inherit the default version from object and its instances will be compared solely by their identities. This is appropriate when otherwise identical instances may have important semantic differences. For instance, two different sockets connected to the same port of the same host need to be treated differently if one is fetching an HTML webpage and the other is getting an image linked from that page, so they don't have the same value.

  3. In addition to a value, some objects have a hash value, which means they can be used as dictionary keys (and stored in sets). The function hash(a) returns the object a's hash value, a number based on the object's value. The hash of an object must remain the same for the lifetime of the object, so it only makes sense for an object to be hashable if its value is immutable (either because it's based on the object's identity, or because it's based on contents of the object that are themselves immutable).

    Multiple different objects may have the same hash value, though well designed hash functions will avoid this as much as possible. Storing objects with the same hash in a dictionary is much less efficient than storing objects with distinct hashes (each hash collision requires more work). Objects are hashable by default (since their default value is their identity, which is immutable). If you write an __eq__ method in a custom class, Python will disable this default hash implementation, since your __eq__ function will define a new meaning of value for its instances. You'll need to write a __hash__ method as well, if you want your class to still be hashable. If you inherit from a hashable class but don't want to be hashable yourself, you can set __hash__ = None in the class body.
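The three concepts above can be sketched in a few lines. This is only an illustration: the class names Socketish, Point and HashablePoint are made up for the example, and HashablePoint assumes its fields are never changed after construction.

```python
import decimal
import fractions

# Identity vs. value: two equal lists are still two distinct objects.
a = [1, 2, 3]
b = [1, 2, 3]
c = a  # a second reference to the same list

equal_value = a == b      # compares contents
same_object = a is b      # compares identity
alias_is_same = c is a

# Objects of different numeric types can share a value (and then a hash).
zeros = [0, 0.0, 0j, decimal.Decimal("0"), fractions.Fraction(0), False]
all_equal = all(z == 0 for z in zeros)
all_same_hash = len({hash(z) for z in zeros}) == 1


class Socketish:
    """Hypothetical class with no __eq__: compared by identity, hashable."""
    def __init__(self, host, port):
        self.host, self.port = host, port


class Point:
    """Defines __eq__ only, so Python sets its __hash__ to None."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class HashablePoint(Point):
    """Restores hashability; assumes x and y are treated as immutable."""
    def __hash__(self):
        return hash((self.x, self.y))
```

With these definitions, two Socketish instances built from the same arguments are unequal, hash(Point(1, 2)) raises TypeError, and HashablePoint(1, 2) can be found in a set that holds an equal instance.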

What's the difference between a hash with = and :

{ some_arbitrary_expression(some_argument, arg2) => another_arbitrary_expression(arg) }

Is the general syntax for Hash literals. Any object that responds to hash and eql? can be used as a key in a Hash.

{ some_valid_symbol: arbitrary_expression(arg1, arg2) }

Is the "new-style" Hash literal syntax for Symbol keys.

I always thought => is the old syntax for : but there seems to be really a difference.

I'm not sure where you were taught this, but I sure would like to know so I can warn others about this source. It has never been true, there are no current plans to make it true, and it probably never will be true. To my knowledge, there is nothing in the official documentation, in the RubySpec, or in any of the well-known books (The Ruby Programming Language, Programming Ruby) that says it either.

Can you please explain the difference and tell me how I can convert such a hash: { 'username': 'John' } to this { 'username' => 'John' } representation?

You can change the String representation of a Hash by monkey-patching Hash#to_s or Hash#inspect, but it is unclear what that would buy you.

If a method you are using expects a Hash whose keys are Strings and you pass it a Hash whose keys are Symbols, then changing the String representation of the Hash is not going to help you. You need to fix the source and make sure that your keys are Strings.

hashing vs hash function, don't know the difference

A hash function takes some input data (typically a bunch of binary bytes, but could be anything - whatever you make it to) and calculates a hash value, which is typically an integer number (but, again, can be anything). The process of doing this is called hashing.

The hash value is always the same size, no matter what the input looks like. Well, I suppose you could make a hash function that has a variable-size output, but I haven't seen one in the wild yet. It wouldn't be very practical. Thus, by its very nature, hashing is usually a one-way calculation. You can't normally get the original data back from the hash value, because there are many more possible input data combinations than there are possible hash values.

The main advantages are:

  • The hash value is always the same size
  • The same input will always generate the same output.
  • If it's a good hash function, different inputs will usually generate different outputs, but it's still possible that two different inputs generate the same output (this is called a hash collision).
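A quick way to see the first two properties, sketched in Python. SHA-256 from the standard hashlib module is my choice of hash function here (the answer does not name one), and digest_hex is a helper name invented for the example.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_input = digest_hex(b"a")               # 1 byte of input
long_input = digest_hex(b"a" * 1_000_000)    # about 1 MB of input
repeated = digest_hex(b"a")                  # same input hashed again
```

Both digests are 64 hex characters long regardless of input size, and hashing the same input twice gives the same string.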

If you have a cryptographic hash function you also get one more advantage:

  • From having only the hash value, it's infeasible to come up with input data that would hash to this value. Never mind that it's not the original input data, any kind of input data that would hash to the given output value is impossible to find in a useful timeframe.

The results of a hash function can be used in various ways. As mentioned in other answers, hash tables are one common use-case. Verifying data integrity is another case - for example, you download a file, then hash it, then check the hash value against the value that was specified in the webpage where you downloaded the file from. If they don't match, the file was not downloaded correctly. If you combine hash values with public-key cryptography you can get digital signatures. And I'm sure there are other uses to which the principle can be put.
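The download check described above might look roughly like this in Python. It is a sketch under assumptions: SHA-256 is assumed to be the algorithm the site publishes, and matches_published_hash is a hypothetical helper name.

```python
import hashlib
import hmac


def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """Hash the downloaded bytes and compare with the published hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(actual, published_hex.strip().lower())


# Stand-in for a real download and the value shown on the web page.
downloaded = b"pretend this is the file contents"
published = hashlib.sha256(downloaded).hexdigest()
```

Flipping even one byte of the downloaded data makes the check fail, which is the signal that the file was not transferred correctly.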

what is difference between crypto hash and hashtable hashes in python?

Cryptographic hash functions are different from hash table hash functions. One main difference is that cryptographic hash functions are designed to resist collision attacks. They are designed to be secure and practically irreversible. Hash table hash functions like hash are faster and are designed for quickly accessing items in memory or comparing items.

Consider two different scenarios. If you want to store passwords in a database, you must use something like pbkdf2, which is more secure and deliberately slow to compute in order to hinder brute-force attacks. In another case you just want to have a set of items and check whether an item exists in that set. You can simply store a 32-bit or 64-bit hash of the items (e.g. class instances) and compare hashes quickly instead of comparing the objects themselves.

For example, for the string "hello", it is much faster to compute and store 1267296259, since it is a 32-bit integer, whereas aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d (its SHA-1 digest) is more secure but slower to compute and larger to store.
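To make the contrast concrete, here is a small Python sketch. The long hex string quoted above is the SHA-1 digest of "hello". I do not know which 32-bit function produced the integer in the example, so zlib.crc32 stands in as an assumed fast, non-cryptographic hash, and the salt and iteration count for PBKDF2 are illustrative values only.

```python
import hashlib
import zlib

text = b"hello"

# Cryptographic digest: 160 bits, designed to resist collisions and inversion.
sha1_hex = hashlib.sha1(text).hexdigest()

# Fast 32-bit checksum, used here as a stand-in for a hash table style hash.
crc = zlib.crc32(text)

# Python's built-in hash() for str is randomized per process by default,
# so its value should never be stored or compared across runs.
builtin = hash("hello")

# Deliberately slow password hashing (illustrative salt and iteration count).
slow = hashlib.pbkdf2_hmac("sha256", text, b"example-salt", 100_000)
```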


Difference between an object and a hash?

There just isn't any. All three of those are literally equal.

What's the difference between Hash.new(0) and {}

Hash.new(0) sets the default value for any key to 0, while {} leaves the default as nil.

h1 = Hash.new(0)
h1.default # => 0
h1[:a] += 1 # => 1
h2 = {}
h2.default # => nil
h2[:a] += 1 # => NoMethodError: undefined method `+' for nil:NilClass

Fundamental difference between Hashing and Encryption algorithms

Well, you could look it up in Wikipedia... But since you want an explanation, I'll do my best here:

Hash Functions

They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).

The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.

Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.

Encryption Functions

They provide a 1:1 mapping between an arbitrary length input and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.

Use Cases

Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).

Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.

Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but don't want to store them plain text. So instead, store the encrypted version and keep the key as safe as possible.

Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So, you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.
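A minimal sketch of that sign-and-verify flow, using Python's standard hmac module (which implements the real HMAC construction, not a plain concatenation). The secret value and the function names sign and verify are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it is never sent with the message.
SECRET = b"shared-secret-for-illustration"


def sign(message: bytes, key: bytes = SECRET) -> str:
    """Return the HMAC-SHA256 tag of message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag)


cookie = b"user_id=42"
tag = sign(cookie)
```

The receiver accepts the cookie only if verify(cookie, tag) is true; a modified message or a different key produces a mismatch.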

A note on hashes for passwords:

A key feature of cryptographic hash functions is that they should be very fast to create, and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.

Adding a salt helps matters since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that when added to the known salt produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem since if they know the salt it's just a matter of running the dictionary through.

So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (thousands usually). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration), it actually increases the entropy (available space) for the output, reducing the chances of collisions. A trivial implementation is:

var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}

There are other, more standard implementations such as PBKDF2, BCrypt. But this technique is used by quite a few security related systems (such as PGP, WPA, Apache and OpenSSL).

The bottom line, hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...
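For comparison, a standard stretched hash is available in Python's standard library as hashlib.pbkdf2_hmac. The sketch below is illustrative rather than a recommendation: the iteration count, salt length, and the helper names hash_password and check_password are my assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value, tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, derived_key) for password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def check_password(password, salt, expected):
    """Re-derive the key with the stored salt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, expected)


salt, stored = hash_password("correct horse")
```

Only the salt and the derived key are stored; the password itself is never kept.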

Another note on trivial stretching

Do not under any circumstances feed the output of one hash directly back into the hash function:

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}

The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than the input space. To see why, let's look at what happens. To preface this, let's make the assumption that there's a 0.001% chance of collision from sha1() (it's much lower in reality, but for demonstration purposes).

hash1 = sha1(password + salt);

Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimation of the chance of a single collision with md5 is about 1/(2^128) or 1/(3x10^38). While that seems small, thanks to the birthday attack it's not really as small as it seems).
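The arithmetic in this paragraph can be checked with a few lines of Python. The 0.001% figure is the made-up demonstration rate from the text, not a real property of any hash function, and the "exact" figure below assumes the per-round collisions are independent.

```python
# Per-round collision chance assumed in the text: 0.001% as a fraction.
p = 0.001 / 100
rounds = 1000

# Linear accumulation as described: each round adds another p.
linear_estimate = rounds * p

# Chance of at least one collision in `rounds` independent rounds;
# for small p this is very close to the linear estimate.
at_least_one = 1 - (1 - p) ** rounds

# Size of the MD5 output space: 2 to the power 128, roughly 3.4 x 10^38.
md5_space = 2 ** 128
```

Both estimates come out at about 1%, matching the figure quoted in the text.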

Instead, by re-appending the salt and password each time, you're re-introducing data back into the hash function. So any collisions of any particular round are no longer collisions of the next round. So:

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}

Has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.
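The loop above is pseudocode; a runnable Python rendering might look like the following. It only illustrates the technique described here (the function name stretch and the default round count are my assumptions), and a standard construction such as PBKDF2 or bcrypt remains the safer choice in practice.

```python
import hashlib


def stretch(password: bytes, salt: bytes, rounds: int = 1000) -> bytes:
    """Iterate SHA-512, feeding password and salt back in on every round."""
    digest = hashlib.sha512(password + salt).digest()
    for _ in range(rounds):
        # Re-introduce password and salt so each round has fresh input.
        digest = hashlib.sha512(digest + password + salt).digest()
    return digest


stretched = stretch(b"hunter2", b"per-user-salt")
```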

Difference between hash() and hashSync() functions of BCrypt package of NodeJs

hashSync synchronously generates a hash for the given string. It returns the hashed string.

hash asynchronously generates a hash for the given string. It returns a promise if the callback is omitted, and you need to resolve the promise.

See https://www.npmjs.com/package/bcryptjs#hashsyncs-salt

What is the difference between hash()%n and n%hash()

Think of the modulus operator % as a way to distribute a set of numbers uniformly by reducing them to a smaller range. The set of numbers is, of course, the hash codes of the input keys. The small range is the capacity of the table.

This is a useful technique when you want to assign an index in a small table to store a high number.

The inverse operation sounds quite weird (and useless): taking into account that the hash codes are high numbers and n is small, n % hash would always return n, so it has no interest at all.

Java actually chooses indexes through hash & (length-1), which is not arithmetically equivalent to hash % length in general (the two agree only when length is a power of two and hash is non-negative), but it is an alternative formula, cheaper than modulus, to reduce and distribute (credits to @Zabuza).
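A short Python sketch of the three formulas discussed in this answer. The sample hash codes and the table size are arbitrary values chosen for illustration, the function names are invented, and only non-negative hash codes are used, since Python and Java treat negative operands of % differently.

```python
def index_mod(h: int, n: int) -> int:
    """Reduce a hash code to a table index with modulus."""
    return h % n


def index_mask(h: int, n: int) -> int:
    """Reduce with a bit mask; only valid when n is a power of two."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    return h & (n - 1)


hashes = [17, 1024, 987654321, 2 ** 31 - 1]  # arbitrary non-negative codes
n = 16                                       # table capacity

mod_indexes = [index_mod(h, n) for h in hashes]
mask_indexes = [index_mask(h, n) for h in hashes]
reversed_results = [n % h for h in hashes]   # always n when h > n
```

Here mod_indexes and mask_indexes are identical because 16 is a power of two; with a capacity such as 6, 10 % 6 is 4 while 10 & 5 is 0, so the mask shortcut would no longer match.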

What's the difference between Hash and NamedTuple in Crystal?

The API docs already explain this pretty well. From NamedTuple (emphasis mine):

A named tuple is a fixed-size, immutable, stack-allocated mapping of a fixed set of keys to values.

You can think of a NamedTuple as an immutable Hash whose keys (which are of type Symbol), and the types for each key, are known at compile time.

And further:

The compiler knows what types are in each key, so when indexing a named tuple with a symbol literal the compiler will return the value for that key and with the expected type. Indexing with a symbol literal for which there's no key will give a compile-time error.

In contrast, Hash:

A Hash is a generic collection of key-value pairs mapping keys of type K to values of type V.

Put in simple words, a hash is a data structure that can be changed at runtime and all keys/values can have any type as long as it matches the generic type arguments K/V.
A named tuple on the other hand is an immutable data structure which is completely known at compile time. If you access a key, the compiler knows its type. Having a named tuple is pretty much similar to just having the keys as variables with a common prefix:

foo = {bar: "bar", baz: 1}

foo_bar = "bar"
foo_baz = 1

NamedTuple just adds a few tools to use these variables as a coherent set.


