Fast Relational method of storing tree data (for instance threaded comments on articles)
I really like how Drupal solves this problem. It assigns a thread id to each comment. This id starts at 1 for the first comment. If a reply is added to this comment, the id 1.1
is assigned to it. A reply to comment 1.1
is given the thread id 1.1.1
. A sibling of comment 1.1
is given the thread id 1.2
. You get the idea. Calculating these thread ids can be done easily with one query when a comment is added.
When the thread is rendered, all of the comments that belong to the thread are fetched in a single query, sorted by the thread id. This gives you the threads in the ascending order. Furthermore, using the thread id, you can find the nesting level of each comment, and indent it accordingly.
1
1.1
1.1.1
1.2
1.2.1
There are a few issues to sort out:
- If one component of the thread id grows to 2 digits, sorting by thread id will not produce the expected order. An easy solution is ensuring that all components of a thread id are padded by zeros to have the same width.
- Sorting by descending thread id does not produce the expected descending order.
Drupal solves the first issue in a more complicated way using a numbering system called vancode. As for the second issue, it is solved by appending a backslash (whose ASCII code is higher than digits) to thread ids when sorting by descending order. You can find more details about this implementation by checking the source code of the comments module (see the big comment before the function comment_get_thread).
How to represent the data for threaded comments(along with comment voting) in mongodb?
Just store the comments as you want them represented on your blog. You want threaded/nested comments? Then store them in a nested fashion:
postId: {
comments: [
{
id: "47cc67093475061e3d95369d" // ObjectId
title: "Title of comment",
body: "Comment body",
timestamp: 123456789,
author: "authorIdentifier",
upVotes: 11,
downVotes: 2,
comments: [
{
id: "58ab67093475061e3d95a684"
title: "Nested comment",
body: "Hello, this is a nested/threaded comment",
timestamp: 123456789,
author: "authorIdentifier",
upVotes: 11,
downVotes: 2,
comments: [
// More nested comments
]
}
]
},
{
// Another top-level comment
}
]
}
The postId
refers to the blog post to which the comments belong and has been used as the key (or _id
in MongoDB) of the document. Each comment has a unique id
, in order to vote or comment on individual comments.
To get the aggregated votes, you'll need to write map-reduce functions somewhere along these lines:
function map() {
mapRecursive(this.comments)
}
function mapRecursive(comments) {
comments.forEach(
function (c) {
emit(comment.author, { upVotes: c.upVotes, downVotes: c.downVotes });
mapRecursive(c.comments);
}
);
}
function reduce(key, values) {
var upVotes = 0;
var downVotes = 0;
values.forEach(
function(votes) {
upVotes += votes.upVotes;
downVotes += votes.downVotes;
}
);
return { upVotes: upVotes, downVotes: downVotes };
}
I haven't tested these functions and they don't check for null
values either. That's up to you :)
Mysql orderby hierarchical text, yet sort as a number?
Pad the numbers in lineage with zeros so you can properly order them as strings.
Of course for proper padding you need to know the max number you want to store there.
There are other ways to store hierarchic tree relations (parent links, nested tree sets). Your approach has some benefits for specific tasks, but, as you can see, also has its limits.
Related Topics
SQL Server: How to Get All Child Records Given a Parent Id in a Self Referencing Table
Declare Variable for a Query String
How to Find "Holes" in a Table
Move Data from One Table to Another, Postgresql Edition
Which Is Better: Bookmark/Key Lookup or Index Seek
Calling Stored Procedure from Another Stored Procedure SQL Server
How to Escape Non-Format Characters in Oracle's To_Char
SQL Bulk Insert with Firstrow Parameter Skips the Following Line
How to Generate Crud Stored Procedures from a Table in SQL Server Management Studio
Pivot/Crosstab Query in Oracle 10G (Dynamic Column Number)
The Best Way to Use a Db Table as a Job Queue (A.K.A Batch Queue or Message Queue)
Standard Use of 'Z' Instead of Null to Represent Missing Data
Tricks for Generating SQL Statements in Excel
Select Multiple Rows with the Same Value(S)
Between Clause Versus <= and >=
How to Count the Number of Times a Character Appears in a SQL Column
Export from SQL Server 2012 to .CSV Through Management Studio