Fast Relational Method of Storing Tree Data (For Instance Threaded Comments on Articles)

Fast Relational method of storing tree data (for instance threaded comments on articles)

I really like how Drupal solves this problem. It assigns a thread id to each comment. This id starts at 1 for the first comment. If a reply is added to this comment, the id 1.1 is assigned to it. A reply to comment 1.1 is given the thread id 1.1.1. A sibling of comment 1.1 is given the thread id 1.2. You get the idea. Calculating these thread ids can be done easily with one query when a comment is added.

When the thread is rendered, all of the comments that belong to the thread are fetched in a single query, sorted by the thread id. This gives you the threads in the ascending order. Furthermore, using the thread id, you can find the nesting level of each comment, and indent it accordingly.

1
1.1
1.1.1
1.2
1.2.1

There are a few issues to sort out:

  • If one component of the thread id grows to 2 digits, sorting by thread id will not produce the expected order. An easy solution is ensuring that all components of a thread id are padded by zeros to have the same width.
  • Sorting by descending thread id does not produce the expected descending order.

Drupal solves the first issue in a more complicated way using a numbering system called vancode. As for the second issue, it is solved by appending a backslash (whose ASCII code is higher than digits) to thread ids when sorting by descending order. You can find more details about this implementation by checking the source code of the comments module (see the big comment before the function comment_get_thread).

How to represent the data for threaded comments(along with comment voting) in mongodb?

Just store the comments as you want them represented on your blog. You want threaded/nested comments? Then store them in a nested fashion:

postId: {
comments: [
{
id: "47cc67093475061e3d95369d" // ObjectId
title: "Title of comment",
body: "Comment body",
timestamp: 123456789,
author: "authorIdentifier",
upVotes: 11,
downVotes: 2,
comments: [
{
id: "58ab67093475061e3d95a684"
title: "Nested comment",
body: "Hello, this is a nested/threaded comment",
timestamp: 123456789,
author: "authorIdentifier",
upVotes: 11,
downVotes: 2,
comments: [
// More nested comments
]
}
]
},
{
// Another top-level comment
}
]
}

The postId refers to the blog post to which the comments belong and has been used as the key (or _id in MongoDB) of the document. Each comment has a unique id, in order to vote or comment on individual comments.

To get the aggregated votes, you'll need to write map-reduce functions somewhere along these lines:

function map() {
mapRecursive(this.comments)
}

function mapRecursive(comments) {
comments.forEach(
function (c) {
emit(comment.author, { upVotes: c.upVotes, downVotes: c.downVotes });
mapRecursive(c.comments);
}
);
}

function reduce(key, values) {
var upVotes = 0;
var downVotes = 0;

values.forEach(
function(votes) {
upVotes += votes.upVotes;
downVotes += votes.downVotes;
}
);

return { upVotes: upVotes, downVotes: downVotes };
}

I haven't tested these functions and they don't check for null values either. That's up to you :)

Mysql orderby hierarchical text, yet sort as a number?

Pad the numbers in lineage with zeros so you can properly order them as strings.

Of course for proper padding you need to know the max number you want to store there.

There are other ways to store hierarchic tree relations (parent links, nested tree sets). Your approach has some benefits for specific tasks, but, as you can see, also has its limits.



Related Topics



Leave a reply



Submit