Interesting Tree/Hierarchical Data Structure Problem

Database Structure for Tree Data Structure

You mention the most commonly implemented, which is Adjacency List:
https://blogs.msdn.microsoft.com/mvpawardprogram/2012/06/25/hierarchies-convert-adjacency-list-to-nested-sets

There are other models as well, including materialized path and nested sets:
http://communities.bmc.com/communities/docs/DOC-9902

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.

Suitable tree data structure

Since there are only hundreds of nodes, just make the structure the same as you have described.

  • Each node has a unique reference to parent node.
  • Each node has a list of child node.
  • Each node has an id
  • There is a (external) map from id --> node. May not even necessary.

2 way traversal is possible, since parent and child nodes are know. Same for find parent and find child.

Find id can be done by traverse the whole tree, if no map. Or you can use the map to quickly find the node.

Add node is easy, since there is a list in each node. Rearrange is also easy, since you can just freely add/remove from list of child nodes and reassign the parent node.

I'm answering this question from a language-agnostic aspect. This is a tree structure without any limit, so implementation is not that popular.

Tree-like Datastructure (for use with VirtualTreeview)

The data structure you're requesting is very simple, it's so simple I'd recommend using the windows-provided TTreeView: it allows storing the text and an ID straight into the tree's node with no additional work.


Despite my recommendation to use the simpler TTreeView I'm going to provide my take on the data structure problem. First of all I'm going to use classes, not records. In your very short code sample you're mixing records and classes in a very unfrotunate way: When you make a copy of the TRoot record (assigning records makes complete copies, because records are allways treated as "values"), you're not making a "deep copy" of the tree: The complete copy of TRoot will contain the same Kids:TList as the original, because classes, unlike records, are references: you're coping the value of the reference.

An other problem when you have a record with an object field is life cycle management: A record doesn't have an destructor so you'll need an other mechanism to free the owned object (Kids:TList). You could replace the TList with an array of Tkid but then you'll need to be very careful when passing the monster record around, because you might end making deep copies of huge records when you least expect it.

In my opinion the most prudent thing to do is to base the data structure on classes, not records: class instances (objects) are passed around as references, so you can move them around all you want with no problems. You also get built-in life cycle management (the destructor)

The base class would look like this. You'll notice it can be used as either the Root or the Kid, because both Root and Kid share data: The both have a name and an ID:

TNodeClass = class
public
Name: string;
ID: Integer;
end;

If this class is used as an Root, it needs a way to store the Kids. I assume you're on Delphi 2010+, so you have generics. This class, complete with a list, looks like this:

type
TNode = class
public
ID: integer;
Name: string;
VTNode: PVirtualNode;
Sub: TObjectList<TNode>;

constructor Create(aName: string = ''; anID: integer = 0);
destructor Destroy; override;
end;

constructor TNode.Create(aName:string; anID: Integer);
begin
Name := aName;
ID := anID;

Sub := TObjectList<TNode>.Create;
end;

destructor TNode.Destroy;
begin
Sub.Free;
end;

You might not immediately realize this, but this class alone is enough to implement a multi-level tree! Here's some code to fill up the tree with some data:

Root := TNode.Create;

// Create the Contacts leaf
Root.Sub.Add(TNode.Create('Contacts', -1));
// Add some contacts
Root.Sub[0].Sub.Add(TNode.Create('Abraham', 1));
Root.Sub[0].Sub.Add(TNode.Create('Lincoln', 2));

// Create the "Recent Calls" leaf
Root.Sub.Add(TNode.Create('Recent Calls', -1));
// Add some recent calls
Root.Sub[1].Sub.Add(TNode.Create('+00 (000) 00.00.00', 3));
Root.Sub[1].Sub.Add(TNode.Create('+00 (001) 12.34.56', 4));

You need a recursive procedure to fill the virtual tree view using this type:

procedure TForm1.AddNodestoTree(ParentNode: PVirtualNode; Node: TNode);
var SubNode: TNode;
ThisNode: PVirtualNode;

begin
ThisNode := VT.AddChild(ParentNode, Node); // This call adds a new TVirtualNode to the VT, and saves "Node" as the payload

Node.VTNode := ThisNode; // Save the PVirtualNode for future reference. This is only an example,
// the same TNode might be registered multiple times in the same VT,
// so it would be associated with multiple PVirtualNode's.

for SubNode in Node.Sub do
AddNodestoTree(ThisNode, SubNode);
end;

// And start processing like this:
VT.NodeDataSize := SizeOf(Pointer); // Make sure we specify the size of the node's payload.
// A variable holding an object reference in Delphi is actually
// a pointer, so the node needs enough space to hold 1 pointer.
AddNodesToTree(nil, Root);

When using objects, different nodes in your Virtual Tree may have different types of objects associated with them. In our example we're only adding nodes of TNode type, but in the real world you might have nodes of types TContact, TContactCategory, TRecentCall, all in one VT. You'll use the is operator to check the actual type of the object in the VT node like this:

procedure TForm1.VTGetText(Sender: TBaseVirtualTree; Node: PVirtualNode;
Column: TColumnIndex; TextType: TVSTTextType; var CellText: string);
var PayloadObject:TObject;
Node: TNode;
Contact : TContact;
ContactCategory : TContactCategory;
begin
PayloadObject := TObject(VT.GetNodeData(Node)^); // Extract the payload of the node as a TObject so
// we can check it's type before proceeding.
if not Assigned(PayloadObject) then
CellText := 'Bug: Node payload not assigned'
else if PayloadObject is TNode then
begin
Node := TNode(PayloadObject); // We know it's a TNode, assign it to the proper var so we can easily work with it
CellText := Node.Name;
end
else if PayloadObject is TContact then
begin
Contact := TContact(PayloadObject);
CellText := Contact.FirstName + ' ' + Contact.LastName + ' (' + Contact.PhoneNumber + ')';
end
else if PayloadObject is TContactCategory then
begin
ContactCategory := TContactCategory(PayloadObject);
CellText := ContactCategory.CategoryName + ' (' + IntToStr(ContactCategory.Contacts.Count) + ' contacts)';
end
else
CellText := 'Bug: don''t know how to extract CellText from ' + PayloadObject.ClassName;
end;

And here's an example why to store VirtualNode pointer to your node instances:

procedure TForm1.ButtonModifyClick(Sender: TObject);
begin
Root.Sub[0].Sub[0].Name := 'Someone else'; // I'll modify the node itself
VT.InvalidateNode(Root.Sub[0].Sub[0].VTNode); // and invalidate the tree; when displayed again, it will
// show the updated text.
end;

You know have an working example for a simple tree data structure. You'll need to "grow" this data structure to suite your needs: the possibilities are endless! To give you some ideas, directions to explore:

  • You can turn the Name:string into a virtual method GetText:string;virtual and then create specialized descendants of TNode that override GetText to provide specialized behavior.
  • Create a TNode.AddPath(Path:string; ID:Integer) that allows you to do Root.AddPath('Contacts\Abraham', 1); - that is, a method that automatically creates all intermediary nodes to the final node, to allow easy creation of the tree.
  • Include an PVirtualNode into TNode itself so you can check rather the Node is "checked" in the Virtual Tree. This would be a bridge of the data-GUI separation.

C++ Tree Data Structure

You could store your nodes in a single vector and use relative pointers that are not changed when the vectors are resized:

typedef int32_t Offset;
struct Node {
Node(Offset p) : parent(p) {}
Offset parent = 0; // 0 means no parent, so root node
std::vector<Offset> children;
};

std::vector<Node> tree;
std::vector<uint32_t> free_list;

To add a node:

uint32_t index;
if (free_list.empty()) {
index = tree.size();
tree.emplace_back(parent_index - tree.size());
} else {
index = free_list.back();
free_list.pop_back();
tree[index].parent = parent_index - index;
}

tree[parent_index].children.push_back(index - parent_index);

To remove a node:

assert(node.children.empty());
if (node.parent) {
Node* parent = &node + node.parent;
auto victim = find(parent->children.begin(), parent->children.end(), -node.parent);
swap(*victim, parent->children.back()); // more efficient than erase from middle
parent->children.pop_back();
}
free_list.push_back(&node - tree.data());

How do I efficiently search this hierarchical structure?

Add all the nodes to dictionary with the code as key. (you can do it once), the look-up in dictionary is basically O(1).

void FillDictionary(Dictionary<string, Node> dictionary, Node node)
{
if (dictionary.ContainsKey(node.Code))
return;

dictionary.Add(node.Code, node);

foreach (Node child in node.Children)
FillDictionary(dictionary, child)
}

If you know the root, usage will be:

var dictionary = new Dictionary<string, Node>();
FillDictionary(dictionary, rootNode);

If you don't you can call the FillDictionary() method on all your nodes with the same dictionary.



Related Topics



Leave a reply



Submit