Struct Hash for Different Types


You can make a protocol containing all the shared data, so both S1 and S2 go through the same hasher.

I created a SharedData protocol containing the shared properties (only data in this case) and gave it a default Hashable implementation; S1 and S2 then conform to SharedData. It looks like this:

protocol SharedData: Hashable {
    var data: String { get }
}

extension SharedData {
    func hash(into hasher: inout Hasher) {
        hasher.combine(data)
    }
}

struct S1: SharedData {
    let data: String
    let otherData: String

    // ... Equality & Hash functions, both only use data
}

struct S2: SharedData {
    let data: String
    let otherData: Int

    // ... Equality & Hash functions, both only use data
}

And the comparison looks like this:

print(S1(data: "hi", otherData: "there").hashValue)
print(S2(data: "hi", otherData: 1).hashValue)

let set: Set<Int> = [S1(data: "Hello", otherData: "world!").hashValue, S1(data: "abc", otherData: "123").hashValue]

let obj2 = S2(data: "Hello", otherData: 0)
print(set.contains(obj2.hashValue))

Prints:

-5068166682035457964  // <- these change every program execution
-5068166682035457964
true

Unfortunately it's not possible to do the following, yet:

let set: Set<SharedData> = [S1(data: "Hello", otherData: "world!"), S1(data: "abc", otherData: "123")]

Error:

Protocol 'SharedData' as a type cannot conform to 'Hashable'

I believe SE-0309 (Unlock existentials for all protocols) could fix this.

Struct Field Hash function with other Fields of the same Struct Set - GoLang

There are a lot of ways you can do this, but seeing as you're looking for something simple, you could just serialise the data, hash the result, and assign it. The easiest way would be to marshal your Block type, hash the output, and assign that to the Hash field.

Personally, I prefer to make this more explicit by splitting out the data that makes up the hash and embedding that type into the block type itself, but that really is up to you. Be advised that JSON marshalling of maps may not be deterministic, so depending on what's in your Tx type, you may need some more work there (more on that below).

Anyway, with embedded types, it'd look like this:

// you'll rarely interact with this type directly, never outside of hashing
type InBlock struct {
    Index     int    `json:"index"`
    PrevHash  string `json:"PrevHash"`
    Txs       []Tx   `json:"txs"`
    Timestamp int64  `json:"timestamp"`
}

// almost identical to the existing block type
type Block struct {
    InBlock // embed the block fields
    Hash    string
}

Now, the hashing function can be turned into a receiver function on the Block type itself:

// CalculateHash computes the hash and sets it on the Hash field; it returns
// an error if we can't serialise the hash data
func (b *Block) CalculateHash() error {
    data, err := json.Marshal(b.InBlock) // marshal the InBlock data
    if err != nil {
        return err
    }
    hash := md5.Sum(data)
    b.Hash = hex.EncodeToString(hash[:])
    return nil
}

Now the only real difference is how you initialise your Block type:

block := Block{
    InBlock: InBlock{
        Index:     0,
        PrevHash:  "Genesis",
        Txs:       []Tx{},
        Timestamp: time.Now().Unix(),
    },
    Hash: "", // can be omitted
}
if err := block.CalculateHash(); err != nil {
    panic("something went wrong: " + err.Error())
}
// block now has the hash set

To access fields on your block variable, you don't need to specify InBlock: the Block type doesn't have any fields whose names mask the fields of the type it embeds, so this works:

txs := block.Txs
// just as well as this
txs := block.InBlock.Txs

Without embedding types, it would end up looking like this:

type Block struct {
    Index     int    `json:"index"`
    PrevHash  string `json:"PrevHash"`
    Txs       []Tx   `json:"txs"`
    Timestamp int64  `json:"timestamp"`
    Hash      string `json:"-"` // exclude from JSON marshalling
}

Then the hash stuff looks like this:

func (b *Block) CalculateHash() error {
    data, err := json.Marshal(b)
    if err != nil {
        return err
    }
    hash := md5.Sum(data)
    b.Hash = hex.EncodeToString(hash[:])
    return nil
}

Doing things this way, the underlying Block type can be used exactly as you are using it right now. The downside, at least in my opinion, is that debugging/dumping data in a human-readable format is a bit annoying: the hash is never included in a JSON dump because of the json:"-" tag. You could work around that by only including the Hash field in the JSON output when it is set, but that would really open the door to weird bugs where hashes don't get set properly.
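
As a minimal sketch of that workaround (only a sketch; it assumes the non-embedded Block from above, and the "hash" tag name is my own choice), you could swap the json:"-" tag for omitempty, so the hash only appears in JSON dumps once it is set:

type Block struct {
    Index     int    `json:"index"`
    PrevHash  string `json:"PrevHash"`
    Txs       []Tx   `json:"txs"`
    Timestamp int64  `json:"timestamp"`
    Hash      string `json:"hash,omitempty"` // included in JSON dumps once set
}

// Caveat: once Hash is set it becomes part of the marshalled data, so
// CalculateHash must run while the Hash field is still empty (or after
// clearing it), otherwise the hash feeds into its own input; that is
// exactly the kind of weird bug mentioned above.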



About the map comment

So iterating over maps is non-deterministic in golang. Determinism, as you probably know, is very important in blockchain applications, and maps are very commonly used data structures in general. When several nodes process the same workload, it's absolutely crucial that each of them produces the same hash (provided, obviously, that they do the same work). Say you had decided, for whatever reason, to define your block type with Txs as a map keyed by ID (so Txs map[uint64]Tx); in that case it wouldn't be guaranteed that the JSON output is the same on all nodes, and you'd need to marshal/unmarshal the data in a way that addresses this problem:

// a new type that you'll only use in custom marshalling
// Txs is a slice here, using a wrapper type to preserve the ID
type blockJSON struct {
    Index     int    `json:"index"`
    PrevHash  string `json:"PrevHash"`
    Txs       []TxID `json:"txs"`
    Timestamp int64  `json:"timestamp"`
    Hash      string `json:"-"`
}

// TxID is a type that preserves both the Tx and its ID
// Tx is a pointer to prevent copying the data later on
type TxID struct {
    Tx *Tx    `json:"tx"`
    ID uint64 `json:"id"`
}

// note the json tags are gone
type Block struct {
    Index     int
    PrevHash  string
    Txs       map[uint64]Tx // as a map
    Timestamp int64
    Hash      string
}

func (b Block) MarshalJSON() ([]byte, error) {
    cpy := blockJSON{
        Index:     b.Index,
        PrevHash:  b.PrevHash,
        Txs:       make([]TxID, 0, len(b.Txs)), // allocate slice
        Timestamp: b.Timestamp,
    }
    keys := make([]uint64, 0, len(b.Txs)) // slice of keys
    for k := range b.Txs {
        keys = append(keys, k) // add keys to the slice
    }
    // now sort the slice. I prefer Stable, but for int keys Sort
    // should work just fine
    sort.SliceStable(keys, func(i, j int) bool {
        return keys[i] < keys[j]
    })
    // now we can iterate over our sorted slice and append to the Txs
    // slice, ensuring the order is deterministic
    for _, k := range keys {
        tx := b.Txs[k] // copy: we can't take the address of a map element
        cpy.Txs = append(cpy.Txs, TxID{
            Tx: &tx,
            ID: k,
        })
    }
    // now we've copied over all the data, we can marshal it:
    return json.Marshal(cpy)
}

The same must be done for the unmarshalling, because the serialised data is no longer compatible with our original Block type:

func (b *Block) UnmarshalJSON(data []byte) error {
    wrapper := blockJSON{} // the intermediary type
    if err := json.Unmarshal(data, &wrapper); err != nil {
        return err
    }
    // copy over fields again
    b.Index = wrapper.Index
    b.PrevHash = wrapper.PrevHash
    b.Timestamp = wrapper.Timestamp
    b.Txs = make(map[uint64]Tx, len(wrapper.Txs)) // allocate map
    for _, tx := range wrapper.Txs {
        b.Txs[tx.ID] = *tx.Tx // copy over values to build the map
    }
    return nil
}

Instead of copying over field-by-field, especially because we don't really care whether the Hash field retains its value, you can just reassign the entire Block variable:

func (b *Block) UnmarshalJSON(data []byte) error {
    wrapper := blockJSON{} // the intermediary type
    if err := json.Unmarshal(data, &wrapper); err != nil {
        return err
    }
    *b = Block{
        Index:     wrapper.Index,
        PrevHash:  wrapper.PrevHash,
        Txs:       make(map[uint64]Tx, len(wrapper.Txs)),
        Timestamp: wrapper.Timestamp,
    }
    for _, tx := range wrapper.Txs {
        b.Txs[tx.ID] = *tx.Tx // populate the map
    }
    return nil
}

But yeah, as you can probably tell: avoid maps in types that you want to hash, or implement a different method to compute the hash in a more reliable way, as sketched below.
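
As a rough sketch of that second option (assuming the map-based Block from above; crypto/md5, encoding/binary, encoding/hex, and sort are assumed to be imported, and the Tx handling is left as a placeholder since it depends on your Tx type), you could feed the fields into the hash directly in a fixed order instead of going through JSON:

func (b *Block) CalculateHash() error {
    h := md5.New()
    // write the scalar fields in a fixed order; writes to a hash.Hash
    // never fail, so the binary.Write errors are ignored here
    binary.Write(h, binary.BigEndian, int64(b.Index))
    h.Write([]byte(b.PrevHash))
    binary.Write(h, binary.BigEndian, b.Timestamp)
    // iterate the Tx map in sorted key order so the input is deterministic
    keys := make([]uint64, 0, len(b.Txs))
    for k := range b.Txs {
        keys = append(keys, k)
    }
    sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
    for _, k := range keys {
        binary.Write(h, binary.BigEndian, k)
        // write whatever identifies a Tx here; that depends on the Tx type
    }
    b.Hash = hex.EncodeToString(h.Sum(nil))
    return nil // error kept in the signature for parity with the JSON version
}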

How to properly hash the custom struct?

boost::hash_combine is a really good way to hash different fields together.

If you don't have the Boost library, you can use this:

template <class T>
inline void hash_combine(std::size_t & s, const T & v)
{
    std::hash<T> h;
    s ^= h(v) + 0x9e3779b9 + (s << 6) + (s >> 2);
}

struct S {
    int field1;
    short field2;
    std::string field3;
    // ...
};

template <class T>
struct MyHash;

template <>
struct MyHash<S>
{
    std::size_t operator()(S const& s) const
    {
        std::size_t res = 0;
        hash_combine(res, s.field1);
        hash_combine(res, s.field2);
        hash_combine(res, s.field3);
        return res;
    }
};

And then you'd probably use std::unordered_set<S, MyHash<S>> s;, etc. (you'll also need an equality comparison for S).

What's a good hash function for struct with 3 unsigned chars and an int, for unordered_map?

What you do in your hash function depends on the values you've got, not necessarily so much on their types. If all four data members are fairly evenly distributed over their value ranges, I would combine the three characters into an unsigned long and return the result of XORing that with the int:

typedef unsigned long ulong;
// assuming a, b, c are the three unsigned chars and n is the int member
return n ^ (ulong(a << 16) | ulong(b << 8) | ulong(c));

It is certainly a hash function. Whether it is one which works well is a different question. You might also combine the result with std::hash<unsigned long>.

Defining hash function as part of a struct

std::unordered_set is defined within the std namespace, and it uses std::hash specializations to hash many different types. If you want to be able to use std::unordered_set<X> (without adding much info to the declaration), you must create a specialization of the std::hash template so that it can hash your structure.

You should be able to get it working by doing the following:

#include <unordered_set>

struct X {
    size_t operator()(void) const { return 0; }             // placeholder body
    bool operator()(const X &other) const { return true; }  // placeholder body
};

namespace std {
template<>
struct hash<X> {
    inline size_t operator()(const X& x) const {
        size_t value = 0; // your hash computation over x goes here
        return value;
    }
};
}

int main() {
    std::unordered_set<X> s;
}

Also, you must provide either a specialization of std::equal_to, or a comparison operator (operator==()) for your structure. You should add one of the following:

struct X {
    ...

    inline bool operator==(const X& other) const {
        // bool comparison = result of comparing 'this' to 'other'
        return comparison;
    }
};

Or:

namespace std {
template <>
struct equal_to<X> {
    inline bool operator()(const X& a, const X& b) const {
        // bool comparison = result of comparing 'a' to 'b'
        return comparison;
    }
};
}

Hashing function for nested struct in struct when using an unordered_set

You never defined hash<Address>! Instead, you have to call your own function object, operator()(name.address), by hand.


Simple XORing is maybe not the best solution. I strongly recommend you copy over Boost's hash_combine(), and put the whole load into namespace std:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
    std::hash<T> hasher;
    seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std
{
    template <> struct hash<Address>
    {
        inline size_t operator()(const Address & a) const
        {
            return hash<int>()(a.num);
        }
    };

    template <> struct hash<Name>
    {
        inline size_t operator()(const Name & a) const
        {
            size_t seed = 0;
            hash_combine(seed, a.first);
            hash_combine(seed, a.second);
            hash_combine(seed, a.address);
            return seed;
        }
    };
}

Now you can use the types directly (subject to implementing an equality comparator): std::unordered_set<Name> s;

Unique Hash from struct

Feng has a point: the accepted answer doesn't work, not only because there are no exported fields in the struct, but also because MD5 hashing is order-significant, see RFC 1321, section 3.4.

I'm not sure about its efficacy, but I've got something working by encoding the slice to bytes with encoding/gob and treating those bytes as the hash used in Compare.


package main

import (
    "bytes"
    "encoding/gob"
    "fmt"
)

type S struct {
    K1 string
    K2 int
}

func main() {
    sa := []S{{K1: "foo", K2: 1}, {K1: "bar", K2: 2}, {K1: "baz", K2: 3}}
    sb := []S{{K1: "baz", K2: 3}, {K1: "bar", K2: 2}, {K1: "foo", K2: 1}}
    sc := []S{}

    a := Hash(sa)
    b := Hash(sb)
    c := Hash(sc)

    fmt.Println(Compare(a, b))
    fmt.Println(Compare(a, c))
}

// Compare XORs every byte of both inputs together and reports whether the
// result is zero, i.e. whether the bytes cancel each other out.
func Compare(a, b []byte) bool {
    a = append(a, b...)
    c := 0
    for _, x := range a {
        c ^= int(x)
    }
    return c == 0
}

// Hash returns the gob encoding of the slice; those bytes act as the "hash".
func Hash(s []S) []byte {
    var b bytes.Buffer
    gob.NewEncoder(&b).Encode(s) // encoding error ignored for brevity
    return b.Bytes()
}

