How to Use a Hashmap in C++

Implementing a HashMap in C

Well if you know the basics behind them, it shouldn't be too hard.

Generally you create an array called "buckets" that contain the key and value, with an optional pointer to create a linked list.

When you access the hash table with a key, you process the key with a custom hash function which will return an integer. You then take the modulus of the result and that is the location of your array index or "bucket". Then you check the unhashed key with the stored key, and if it matches, then you found the right place.

Otherwise, you've had a "collision" and must crawl through the linked list and compare keys until you match. (note some implementations use a binary tree instead of linked list for collisions).

Check out this fast hash table implementation:

https://attractivechaos.wordpress.com/2009/09/29/khash-h/

What is the best way to use a HashMap in C++?

The standard library includes the ordered and the unordered map (std::map and std::unordered_map) containers. In an ordered map the elements are sorted by the key, insert and access is in O(log n). Usually the standard library internally uses red black trees for ordered maps. But this is just an implementation detail. In an unordered map insert and access is in O(1). It is just another name for a hashtable.

An example with (ordered) std::map:

#include <map>
#include <iostream>
#include <cassert>

int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}

Output:


23
Key: hello Value: 23

If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.

Otherwise, if you really need a hash-table (O(1) insert/access), check out std::unordered_map, which has a similar to std::map API (e.g. in the above example you just have to search and replace map with unordered_map).

The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).

Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:

#include <tr1/unordered_map>

std::tr1::unordered_map<std::string, int> m;

It is also part of boost, i.e. you can use the corresponding boost-header for better portability.

implement hash map in c

Creating a hash map requires a bit of work, but is definitely an interesting thing to do.

The general approach to build a hash map is to create an array called buckets that contains keys and values. However, this structure is not able to deal with collisions. A collision between two values may arise when different keys are assigned by the hash function to the same value.

To address this problem a third field, usually a pointer, is added to the bucket. When a collision arises, the new value is added in the data structure pointed at by the third field, usually a linked list or binary tree. For more information about how to resolve collisions, read this link.

An example of the Hash map Structure described above (note the collision at index 153): Hash map Structure

To access the hash table, a custom hash function, which returns an integer which identifies the array index location, is applied to the key. Finally, you check if the key used to access the element and the key stored in the array at the index returned by the hash function match. If they do, you have found the right element.

This is just an example; you can find different ways to implement a hash map.

In addition, this question has already been asked here.

Why are there no hashtables in the C standard library?

There is no hashtable in the standard C library because either:

  • no-one has submitted a proposal to the working group; or
  • the working group has deemed it unnecessary.

That's the way ISO works. Proposals are put forward and accepted or rejected.

You have to be careful what you add to the standard library since you have two conflicting groups. As a user, you might want every data structure under the sun to be added to the standard to make the language more useful.

But, as a language implementor (as an aside, these are probably the people that tend to make up most of the various working groups so their view is likely to have more impact), you don't really want the hassle of having to implement stuff that may not be used by everyone. All the stuff that was there when C89 appeared was to do with the fact that the primary purpose was to codify existing practice rather than introduce new practices. All iterations of the standards since then have been a little freer in what they can do but backwards compatibility is still an important issue.

Myself, I also have conflicts. I'd love to have all the features of the Java, C++ or Python libraries at my disposal in C. Of course, that would make it so much harder to learn everything for newcomers and, as one commenter stated, probably make it so any code monkey can pump out useful code, reducing my value in the process :-)

And I pretty much have all the data structures I'll ever need, from my long and (mostly) illustrious career. You're not limited to the standard library for this sort of stuff. There are plenty of third-party tools you can get to do the job and (like me) you can also roll your own.

If you want to know why certain decisions were made in each iteration, ISO (and ANSI originally, before ISO took over) usually publish rationale documents. The C89 one from ANSI can be found here. It contains this little beauty in the scope:

This Rationale focuses primarily on additions, clarifications, and changes made to the language as described in the Base Documents. It is not a rationale for the C language as a whole: the Committee was charged with codifying an existing language, not designing a new one. No attempt is made in this Rationale to defend the pre-existing syntax of the language, such as the syntax of declarations or the binding of operators.

I especially enjoy the admission that they're not responsible for any unholy mess that may have predated their attempts to standardise.

But, perhaps the real answer to your question lies in this bit, one of the guiding principles:


Keep the spirit of C. The Committee kept as a major goal to preserve the traditional spirit of C. There are many facets of the spirit of C, but the essence is a community sentiment of the underlying principles upon which the C language is based. Some of the facets of the spirit of C can be summarized in phrases like:

  • Trust the programmer.
  • Don't prevent the programmer from doing what needs to be done.
  • Keep the language small and simple.
  • Provide only one way to do an operation.
  • Make it fast, even if it is not guaranteed to be portable.

That third one is probably the main reason why the library wasn't massively expanded with the initial standardisation effort - that, and the fact that such an expansion from a committee would probably have resulted in ANSI C being labeled C2038 rather than C89.

C - How to assign multiple values to the same key in a hashmap?

This may not be the best possible solution, but thanks to the above comments I got something working. I am declaring multiple values as pointers to char. Please let me know if I can improve this.

I am yet to implement collision correction.

struct map {        /* creating a map structure */
struct map *next;
char *KEY;
char *value1;
char *value2;
char *value3;
};

I then pass these to the insert function.

struct map *insert(char *KEY, char *value1, char *value2, char *value3) {
/* install: put (name, defn) in */
/* Install uses lookup to determine whether the KEY being installed
is already present. Proceeds to create a new entry or update*/
struct map *np;
unsigned hashval;

if ((np = lookup(KEY)) == NULL) {
np = (struct map *) malloc(sizeof(*np));
if (np == NULL || (np->KEY = strdup(KEY)) == NULL) {
return NULL;
}
hashval = hash(KEY);
np->next = hashtab[hashval];
hashtab[hashval] = np;
}
else {
free((void *)np->value1); //type cast cannot convert from 'Value' to 'Value*'
free((void *)np->value2);
free((void *)np->value3);
}
if ((np->time = strdup(value1)) == NULL) { //map has no field value
return NULL;
}
if ((np->description = strdup(value2)) == NULL) { //map has no field value
return NULL;
}
if ((np->duration = strdup(value3)) == NULL) { //map has no field value
return NULL;
}

return np;
}

Printing these will give the key pointing the the four values.

key1->value1 value2 value3



Related Topics



Leave a reply



Submit