Explanation of Ruby Code for Building Trie Data Structures

You're probably getting lost in that mess of code, which takes an approach better suited to C++ than to Ruby. Here's the same thing in a more concise form that uses a specialized Hash for storage:

class Trie < Hash
  def initialize
    # Ensure that this is not a special Hash by disallowing
    # initialization options.
    super
  end

  def build(string)
    string.chars.inject(self) do |h, char|
      h[char] ||= { }
    end
  end
end

It works exactly the same but doesn't have nearly the same mess with pointers and such:

trie = Trie.new
trie.build('dogs')
puts trie.inspect
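Running that prints the nested hash the trie is built from, along the lines of {"d"=>{"o"=>{"g"=>{"s"=>{}}}}} (the exact inspect formatting depends on your Ruby version).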

Ruby's Enumerable module is full of amazingly useful methods like inject, which is precisely what you want in a situation like this.

Ruby trie implementation reference issue

Ruby's String#concat mutates the receiver in place rather than returning a new string. You probably want the + operator instead, so change the two lines inside collect's loop as shown below:

stringn = string + letter
collect(node.hash[letter], stringn)

Also, you probably want to either initialize @words to an empty array in print before calling collect, or make it a local variable in print and pass it into collect.
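To see the difference on its own (a quick standalone example, separate from the question's code):

string = "dog"

new_string = string + "s"   # + returns a new String
new_string                  #=> "dogs"
string                      #=> "dog"  (unchanged)

string.concat("s")          # concat mutates the receiver and returns it
string                      #=> "dogs"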

Are data structures used in higher level languages?

So since these higher level languages manage the memory for you, what would you use data structures for?

The main reason for using a data structure has nothing to do with garbage collection; it is about storing data in a way that is efficient for the operations you need. What matters most is how you organize the data, which is exactly what the language can't figure out for you automatically.

Sure, a high-level language will ship with several built-in data structures (and you should absolutely use those instead of rolling your own when they fit), but not every data structure you may need is provided.

Data structures organize data in memory so that the algorithms that operate on them can run efficiently.

For most tasks you won't need to implement your own data structures, but that depends entirely on what you are building.

I can understand the need for queues and stacks but would you ever need to use a binary tree in Ruby?

There are plenty of uses for a binary tree, just not in common everyday projects; for example, you might need one to implement Huffman coding.

Other structures have their own niches: you might want the space savings and fast prefix lookups of a trie, or need to store a huge amount of data with fast lookups using a B-tree. Different data structures are optimized for different things, and whether the language is modern or has garbage collection doesn't change that.

The trend, though, is that custom data structures are written and thought about less. A similar shift has happened with common algorithms: with something like LINQ you simply say that you want the data sorted; you don't spell out how to sort it.
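The same idea shows up in Ruby's standard library; a quick illustrative sketch (the people data here is made up):

people = [
  { name: "Ann", age: 34 },
  { name: "Bob", age: 29 }
]

# You describe *what* to sort by, not *how* to do the sorting:
people.sort_by { |person| person[:age] }
#=> [{:name=>"Bob", :age=>29}, {:name=>"Ann", :age=>34}]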

Match pattern in Ruby with Regexp

Here's a method to find the longest common prefix of an array of strings.

def _lcp(str1, str2)
  end_index = [str1.length, str2.length].min - 1
  end_index.downto(0) do |i|
    return str1[0..i] if str1[0..i] == str2[0..i]
  end
  ''
end

def lcp(strings)
  strings.inject do |acc, str|
    _lcp(acc, str)
  end
end

lcp [
  'http://www.example.com?id=123456',
  'http://www.example.com?id=234567',
  'http://www.example.com?id=987654'
]
#=> "http://www.example.com?id="

lcp [
  'http://www.example.com?id=123456',
  'http://www.example.com?id=123457'
]
#=> "http://www.example.com?id=12345"

Finding word frequency in huge data in a database

Finding information in huge data sets is done by parallelizing the work and using a cluster rather than a single machine.

What you are describing is a classic map-reduce problem, that can be handled using the following functions (in pseudo code):

map(doc):
  for each word in doc:
    emitIntermediate(word, "1")

reduce(word, list<counts>):
  emit(word, size(list))

A MapReduce framework, implemented in many languages, lets you scale the problem to a huge cluster without much effort, handling failures and worker management for you.

Here, doc is a single document; the model usually assumes a collection of documents. If you have only one huge document, you can of course split it into smaller documents and run the same algorithm.
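If you want to see the same word-count idea in plain Ruby on a single machine (just a sketch; a real cluster job would go through a MapReduce framework such as Hadoop):

# Single-machine sketch of the map and reduce steps for word counting.
def map_words(doc)
  # Emit a [word, 1] pair for every word in the document.
  doc.downcase.scan(/\w+/).map { |word| [word, 1] }
end

def reduce_counts(pairs)
  # Group the pairs by word and count how many pairs each word collected.
  pairs.group_by { |word, _| word }.transform_values(&:size)
end

pairs = map_words("the quick brown fox jumps over the lazy dog the end")
p reduce_counts(pairs)
#=> {"the"=>3, "quick"=>1, "brown"=>1, "fox"=>1, "jumps"=>1, "over"=>1, "lazy"=>1, "dog"=>1, "end"=>1}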

Ruby keep occurrence count in ordered data structure

So here is where I ended up, with a working solution. I used a normal array as a priority queue of sorts: rather than having the object's ID be the key and the value be how many times it's been accessed, I simply store the object IDs in an array.

With an array of IDs, when it comes time to 'increment' one, I simply delete it from the array and push it back onto the end, since arrays have implied indexes that preserve order.
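As a rough sketch of that delete-and-push trick (the names here are made up, not from the original code):

# Keep the most recently 'incremented' ID at the end of the array.
def touch(ids, id)
  ids.delete(id)  # remove the ID from wherever it currently sits
  ids.push(id)    # append it again, so position reflects recency
  ids
end

recently_used = [:a, :b, :c]
touch(recently_used, :a)
p recently_used #=> [:b, :c, :a]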

How to get the index positions (x,y) of the keys of a variably deep Trie in Ruby

Here's a simple recursive function that outputs the position of each key in the spreadsheet.

def to_coords(hash, x = 0, y = 0)
  hash.each do |k, v|
    puts "#{x},#{y} #{k}"
    x = to_coords(v, x, y + 1)
  end
  return x + (hash.empty? ? 1 : 0)
end

For your example, this outputs

0,0 Canada
0,1 Male
0,2 Children
1,2 Old
2,2 Teenager
3,1 Female
3,2 Children
4,2 Old
5,2 Teenager
6,0 France
6,1 Male
6,2 Children
7,2 Old
8,2 Teenager
9,1 Female
9,2 Children
10,2 Old
11,2 Teenager

You didn't give a full example of your input, so this will need to be tweaked a bit to fit your application. The basic idea is that if you are at the bottom level (Children, Old, Teenager), then each key just shifts the position over by one, hence the hash.empty? ? 1 : 0. If you are not at the bottom level, then iterating over the subhashes tells you which X value to use next.
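For reference, an input hash that produces the output above would look roughly like this (reconstructed from the answer, since the question's exact data isn't shown):

trie = {
  'Canada' => {
    'Male'   => { 'Children' => {}, 'Old' => {}, 'Teenager' => {} },
    'Female' => { 'Children' => {}, 'Old' => {}, 'Teenager' => {} }
  },
  'France' => {
    'Male'   => { 'Children' => {}, 'Old' => {}, 'Teenager' => {} },
    'Female' => { 'Children' => {}, 'Old' => {}, 'Teenager' => {} }
  }
}

to_coords(trie)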


