How to Use My SQL Knowledge with Cloudant/Couchdb

How can I use my sql knowledge with Cloudant/CouchDB?

This community wiki page provides links to some SQL patterns showing how they may be implemented in Cloudant.

As this is a community wiki feel free to add new Q&A links here.

General database queries

  • Joining two documents by key?
  • How do I do the SQL equivalent of “DISTINCT” in CouchDB?
  • How do I do a triple join in CouchDB?
  • Inner joining by document field?
  • How do I do subqueries?
  • How to do a one-to-many join?
  • How to do many-to-many joins?
  • multiple WHERE and OR clauses
  • WHERE ... IN query
  • WHERE ... NOT IN query
  • Performing queries across databases
  • Can I do transactions and locks in CouchDB?
  • Cloudant SQL Join cookbook

  • Add your new Q&A links here ...

Aggregate database queries

  • COUNT(*) aggregate function?
  • SUM * GROUP BY queries?

  • Add your new Q&A links here ...

Full text queries (lucene) - Cloudant only

  • Add your new Q&A links here ...

See also

  • View cookbook for SQL jockeys

Cloudant: Searching across databases

It is not possible to perform joins across databases using CouchDB or Cloudant. You will need to either:

  • put all your data in a single database and query that
  • have separate databases and replicate the data from each to a single database and query that
  • have separate databases and perform the join functionality in your application tier

I've added this question to: How can I use my sql knowledge with Cloudant/CouchDB?

Performing a WHERE - IN query in CouchDB

You need to use views keys query parameter to get records with keys in specified set.

function(doc){
emit(doc.table.id, null);
}

And then

GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]

To retrieve document content in same time just add include_docs=True query parameter to your request.

UPD: Probably, you might be interested to retrieve documents by this reference ids (2,4,56). By default CouchDB views "maps" emitted keys with documents they belongs to. To tweak this behaviour you could use linked documents trick:

function(doc){
emit(doc.table.id, {'_id': doc.table.id});
}

And now request

GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]&include_docs=True

will return rows with id field that points to document that holds 2,4 and 56 keys and doc one that contains referenced document content.

Cloudant 1 to many function

You're getting in the right direction! You already got that right with the global IDs. Having the type of the document as part of the ID in some form is a very good idea, so that you don't get confused later (all documents are in the same "pot").

Here are some minor problems with your current solution (before getting to your actual question):

  • Don't emit the doc as value in emit(key, value). You can always ask for the document that belongs to a view row by querying with include_docs=true. Having the doc as view value increases the view indexes a lot. When you don't need a specific value, use emit(key, null).

  • You also don't need the ID in the emit value. You'll get the ID of the document that belongs to a view row as part of the row anyway.

View Collation

Now to your problem of aggregating the vehicles with their user. You got the basic pattern right. This pattern is called view collation, you can read more about it in the CouchDB docs (ignore that it is in the "Couchapp" section).

The trick with view collation is that you return two or more types of documents, but make sure that they are sorted in a way that allows for direct grouping. Thus it is important to understand how CouchDB sorts the view result. See the collation specification for more information on that one. An important key to understanding view collation is that rows with array keys are sorted by key elements. So when two rows have the same key[0], they sort by key[1]. If that's equal as well, key[2] is considered, and so on.

Your map function frist groups users and vehicles by user ID (key[0]). Your map function then uses the fact that 0 sorts before 1 in the second element of the key, so your view will contain the following:

  • user 1
  • vehicle of user 1
  • vehicle of user 1
  • vehicle of user 1
  • user 2
  • user 3
  • vehicle of user 3
  • user 4
  • etc.

As you can see, the vehicles of a user immediately follow their user. Thus you can group this result into aggregates without performing expensive sort or lookup operations.

Note that users are sorted according to their ID, and vehicles within users also according to their ID. This is because you use the IDs in the key array.

Creating Queries

Now that view isn't worth much if you can't query according to your needs. A view as you have it supports the following queries:

  • Get all users with their vehicles
  • Get a range of users with their vehicles
  • Get a single user with its vehicles
  • Get a single user without vehicles (you could also use the _all_docs view for that though)

Example query for "all users between user 1 and user 3 (inclusive) with their vehicles"

We want to query for a range, so we use startkey and endkey in the query:

startkey=["user:1", 0]
endkey=["user:3", 1, {}]

Note the use of {} as sentinel value, which is required so that the end key is larger than any row that has a key of ["user:3", 1, (anyConceivableVehicleId)]

How to treat Cloudant (CouchDB) as a document stack?

There are a number of options for retrieving documents from Cloudant. Views are the underlying technology that allow you to query, sort, and aggregate documents. In your particular example it sounds like you just want to get the most (or least) recent document. You can do this with a view, or in Cloudant you can simply create an index.

Suppose you have a date field called create_date. In Cloudant you can create an index like so (go to Query then click edit next to "Your available indexes"):

{
"index": {
"fields": [
"create_date"
]
},
"type": "json"
}

This will create a view and you will see it listed under "Design Documents". You can query that view in the dashboard as follows:

{
"selector": {
"create_date": {
"$gt": 0
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"create_date": "desc"
}
],
"limit": 1
}

Note, I have limited my query to 1 document. This will return the most recent document added to Cloudant. To retrieve the earliest document added to Cloudant change the sort to "create_date": "asc".

You can run this outside of the dashboard using an HTTP POST call to /db/_find/. See this link for more information:

https://docs.cloudant.com/cloudant_query.html#finding-documents-using-an-index

UPDATE: Using text indexes and bookmarks

The above approach assumes you are going to delete each document and re-run the query every time. If you used an ascending sort you would always process the documents in order, but if you used a descending sort you could process newer documents as they are inserted.

An alternative approach would be to use bookmarks (as suggested by the OP in the comments below). To do see first create a text index in Cloudant:

{
"index": {},
"type": "text"
}

Run the same query as above. The results will now include a bookmarks field similar to the following:

{
"docs":[{
"_id":"aa279ae2835f51d8ea13ee3e6ae3a210",
"_rev":"1-e90f3814f49b3e89158f8d2337de89cb"}
],
"bookmark": "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}

In subsequent queries you can pass the bookmark to traverse the documents in order:

{
"selector": {
"create_date": {
"$gt": 0
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"create_date": "desc"
}
],
"limit": 1,
"bookmark" : "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}

More information about bookmarks can be found here:

https://docs.cloudant.com/cloudant_query.html#working-with-indexes

Can i view Cloudant Local Documents via browser UI

The local documents are invisible from the UI and are only accessible via the API if you know the document key. For example, if you created this document:

{"_id":"_local/bob",
"name":"Bob",
"surname":"TheBuilder"}

... you can really only retrieve it by doing:

https://<service_url>/<dbname>/_local/bob

OR
you can see it in the UI by modifying the dashboard url like this:

https://<service_url>/dashboard.html#database/<dbname>/_local/bob

Using local documents is quite a niche use case. They are used by the Cloudant replicator to store state that doesn't need to be replicated.

When using PouchDB it can be useful to use local documents in an in-browser PouchDB database to store local state/config that you don't want replicated to the Cloud.

How to query PouchDB with SQL-like operators

As much as I wanted to use the pouchdb-find plugin, I couldn't find a way to achieve what I needed. Instead I used a workaround:

Change the document structure to store label IDs in an array

{_id: 'record/1', name: 'Record 1', labels: ['label/1', 'label/2', 'label/3']},
// may not be sorted when being stored
{_id: 'record/2', name: 'Record 2', labels: ['label/1', 'label/5', 'label/7', 'label/3']},
{_id: 'record/3', name: 'Record 3', labels: ['label/2', 'label/3', 'label/4', 'label/5']}

Create a design document

It will emit multiple complex keys for each record to represent all possible label-mappings in an ascending order. The map function will utilize a recursive process to generate the keys:

{
_id: '_design/records-with-labels',
views: {
'records-with-labels': {
map: function(doc) {
// important: sort them so that the required keys to be created are lesser
var labelIds = doc.labels.sort();
var lastIx = labelIds.length - 1;

var emitKey = function emitKey(currentKey, currentIx) {
console.log('emitting: ' + currentKey.join(',') + ' for ' + doc._id);
emit(currentKey, null);

var nextIx = currentIx + 1;

for (var jumpIx = nextIx + 1; jumpIx <= lastIx; jumpIx++) {
var jumpedLabelId = labelIds[jumpIx];
var jumpingKey = currentKey.concat([jumpedLabelId]);

console.log('emitting: ' + jumpingKey.join(',') + ' for ' + doc._id);
emit(jumpingKey, null);
}

if (nextIx > lastIx) {
return;
}

var nextLabelId = labelIds[nextIx];

currentKey.push(nextLabelId);

emitKey(currentKey, currentIx + 1);
};

labelIds.forEach(function(labelId, i) {
emitKey([labelId], i);
});

}.toString()
}
}
}

For example, the document record/1 will have these keys generated:

emitting: label/1 for record/1
emitting: label/1,label/3 for record/1
emitting: label/1,label/2 for record/1
emitting: label/1,label/2,label/3 for record/1
emitting: label/2 for record/1
emitting: label/2,label/3 for record/1
emitting: label/3 for record/1

Querying

I just need to ensure that the query labels are sorted in ascending order.

To query records that have 'label/1' and 'label/3':

Db.query('records-with-labels', {
key: ['label/1', 'label/3']
});

To query records that have 'label/3' or 'label/3':

Db.query('records-with-labels', {
keys: [['label/1'], ['label/3']]
});

This will give us duplicated records that has both labels, but a reduce function should help eliminating them.

Conclusion

For now I don't know if there is a better solution, but this is good enough for me because in my case, a record will not have too many labels.

If you have better suggestions, please comment or edit the answer.



Related Topics



Leave a reply



Submit