Trie Implementation

How to create a trie in Python

Unwind is essentially correct that there are many different ways to implement a trie; and for a large, scalable trie, nested dictionaries might become cumbersome -- or at least space inefficient. But since you're just getting started, I think that's the easiest approach; you could code up a simple trie in just a few lines. First, a function to construct the trie:

>>> _end = '_end_'
>>>
>>> def make_trie(*words):
... root = dict()
... for word in words:
... current_dict = root
... for letter in word:
... current_dict = current_dict.setdefault(letter, {})
... current_dict[_end] = _end
... return root
...
>>> make_trie('foo', 'bar', 'baz', 'barz')
{'b': {'a': {'r': {'_end_': '_end_', 'z': {'_end_': '_end_'}},
'z': {'_end_': '_end_'}}},
'f': {'o': {'o': {'_end_': '_end_'}}}}

If you're not familiar with setdefault, it simply looks up a key in the dictionary (here, letter or _end). If the key is present, it returns the associated value; if not, it assigns a default value to that key and returns the value ({} or _end). (It's like a version of get that also updates the dictionary.)

Next, a function to test whether the word is in the trie:

>>> def in_trie(trie, word):
... current_dict = trie
... for letter in word:
... if letter not in current_dict:
... return False
... current_dict = current_dict[letter]
... return _end in current_dict
...
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'baz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barz')
True
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'barzz')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'bart')
False
>>> in_trie(make_trie('foo', 'bar', 'baz', 'barz'), 'ba')
False

I'll leave insertion and removal to you as an exercise.

Of course, Unwind's suggestion wouldn't be much harder. There might be a slight speed disadvantage in that finding the correct sub-node would require a linear search. But the search would be limited to the number of possible characters -- 27 if we include _end. Also, there's nothing to be gained by creating a massive list of nodes and accessing them by index as he suggests; you might as well just nest the lists.

Finally, I'll add that creating a directed acyclic word graph (DAWG) would be a bit more complex, because you have to detect situations in which your current word shares a suffix with another word in the structure. In fact, this can get rather complex, depending on how you want to structure the DAWG! You may have to learn some stuff about Levenshtein distance to get it right.

Implementation of Trie data structure

The member key of Node is declared as

Node *key[2];

So it's an array of two pointers and given this line in Trie::insertUtil,

int tmp_chr = a[idx]-'0';  // A variable ignored in the following code, BTW.

I'll assume that the "strings" that the OP is trying to insert are composed only by the characters '0' and '1'.

Note that, in the posted code, the null-terminator needed in the used C-string is simply ignored, which is an error by itself, easily fixed by using a proper std::string instead.

The other issue is in the same loop:

for(int idx = 0; idx < 5; idx++)
{ // ^^^^^^^ It should stop before the null-terminator
// (...)
int tmp_chr = a[idx]-'0'; // Are you sure that there are only '0' or '1'?
if( !(temp->key[1]) )
{ // ^^^ 1 is wrong, here, it should be temp->key[tmp_chr]
temp->key[a[idx]-'0'] = new Node();
// ^^^^^^^^^^ Why not use tmp_chr here and in the following?
// ...
}
// ...
}

Trie implementation

Your has function should probably look like this:

if (c[val]!=null && word.length()>1) {
return c[val].has(word.substring(1)); //<-- Change is on this line
} else if (c[val].flag==true && word.length()==1) {
...etc

You perform the recursive call, but you really need to let that value propagate back out to the original caller.



Related Topics



Leave a reply



Submit