How can I use my sql knowledge with Cloudant/CouchDB?
This community wiki page provides links to some SQL patterns showing how they may be implemented in Cloudant.
As this is a community wiki feel free to add new Q&A links here.
General database queries
- Joining two documents by key?
- How do I do the SQL equivalent of “DISTINCT” in CouchDB?
- How do I do a triple join in CouchDB?
- Inner joining by document field?
- How do I do subqueries?
- How to do a one-to-many join?
- How to do many-to-many joins?
- multiple WHERE and OR clauses
- WHERE ... IN query
- WHERE ... NOT IN query
- Performing queries across databases
- Can I do transactions and locks in CouchDB?
Cloudant SQL Join cookbook
Add your new Q&A links here ...
Aggregate database queries
- COUNT(*) aggregate function?
SUM * GROUP BY queries?
Add your new Q&A links here ...
Full text queries (lucene) - Cloudant only
- Add your new Q&A links here ...
See also
- View cookbook for SQL jockeys
Cloudant: Searching across databases
It is not possible to perform joins across databases using CouchDB or Cloudant. You will need to either:
- put all your data in a single database and query that
- have separate databases and replicate the data from each to a single database and query that
- have separate databases and perform the join functionality in your application tier
I've added this question to: How can I use my sql knowledge with Cloudant/CouchDB?
Performing a WHERE - IN query in CouchDB
You need to use views keys
query parameter to get records with keys in specified set.
function(doc){
emit(doc.table.id, null);
}
And then
GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]
To retrieve document content in same time just add include_docs=True
query parameter to your request.
UPD: Probably, you might be interested to retrieve documents by this reference ids (2,4,56). By default CouchDB views "maps" emitted keys with documents they belongs to. To tweak this behaviour you could use linked documents trick:
function(doc){
emit(doc.table.id, {'_id': doc.table.id});
}
And now request
GET /db/_design/ddoc_name/_view/by_table_id?keys=[2,4,56]&include_docs=True
will return rows with id
field that points to document that holds 2
,4
and 56
keys and doc
one that contains referenced document content.
Cloudant 1 to many function
You're getting in the right direction! You already got that right with the global IDs. Having the type of the document as part of the ID in some form is a very good idea, so that you don't get confused later (all documents are in the same "pot").
Here are some minor problems with your current solution (before getting to your actual question):
Don't emit the doc as value in
emit(key, value)
. You can always ask for the document that belongs to a view row by querying withinclude_docs=true
. Having the doc as view value increases the view indexes a lot. When you don't need a specific value, useemit(key, null)
.You also don't need the ID in the emit value. You'll get the ID of the document that belongs to a view row as part of the row anyway.
View Collation
Now to your problem of aggregating the vehicles with their user. You got the basic pattern right. This pattern is called view collation, you can read more about it in the CouchDB docs (ignore that it is in the "Couchapp" section).
The trick with view collation is that you return two or more types of documents, but make sure that they are sorted in a way that allows for direct grouping. Thus it is important to understand how CouchDB sorts the view result. See the collation specification for more information on that one. An important key to understanding view collation is that rows with array keys are sorted by key elements. So when two rows have the same key[0]
, they sort by key[1]
. If that's equal as well, key[2]
is considered, and so on.
Your map function frist groups users and vehicles by user ID (key[0]
). Your map function then uses the fact that 0
sorts before 1
in the second element of the key, so your view will contain the following:
- user 1
- vehicle of user 1
- vehicle of user 1
- vehicle of user 1
- user 2
- user 3
- vehicle of user 3
- user 4
- etc.
As you can see, the vehicles of a user immediately follow their user. Thus you can group this result into aggregates without performing expensive sort or lookup operations.
Note that users are sorted according to their ID, and vehicles within users also according to their ID. This is because you use the IDs in the key array.
Creating Queries
Now that view isn't worth much if you can't query according to your needs. A view as you have it supports the following queries:
- Get all users with their vehicles
- Get a range of users with their vehicles
- Get a single user with its vehicles
- Get a single user without vehicles (you could also use the
_all_docs
view for that though)
Example query for "all users between user 1 and user 3 (inclusive) with their vehicles"
We want to query for a range, so we use startkey
and endkey
in the query:
startkey=["user:1", 0]
endkey=["user:3", 1, {}]
Note the use of {}
as sentinel value, which is required so that the end key is larger than any row that has a key of ["user:3", 1, (anyConceivableVehicleId)]
How to treat Cloudant (CouchDB) as a document stack?
There are a number of options for retrieving documents from Cloudant. Views are the underlying technology that allow you to query, sort, and aggregate documents. In your particular example it sounds like you just want to get the most (or least) recent document. You can do this with a view, or in Cloudant you can simply create an index.
Suppose you have a date field called create_date
. In Cloudant you can create an index like so (go to Query then click edit next to "Your available indexes"):
{
"index": {
"fields": [
"create_date"
]
},
"type": "json"
}
This will create a view and you will see it listed under "Design Documents". You can query that view in the dashboard as follows:
{
"selector": {
"create_date": {
"$gt": 0
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"create_date": "desc"
}
],
"limit": 1
}
Note, I have limited my query to 1 document. This will return the most recent document added to Cloudant. To retrieve the earliest document added to Cloudant change the sort to "create_date": "asc"
.
You can run this outside of the dashboard using an HTTP POST call to /db/_find/
. See this link for more information:
https://docs.cloudant.com/cloudant_query.html#finding-documents-using-an-index
UPDATE: Using text indexes and bookmarks
The above approach assumes you are going to delete each document and re-run the query every time. If you used an ascending sort you would always process the documents in order, but if you used a descending sort you could process newer documents as they are inserted.
An alternative approach would be to use bookmarks (as suggested by the OP in the comments below). To do see first create a text index in Cloudant:
{
"index": {},
"type": "text"
}
Run the same query as above. The results will now include a bookmarks
field similar to the following:
{
"docs":[{
"_id":"aa279ae2835f51d8ea13ee3e6ae3a210",
"_rev":"1-e90f3814f49b3e89158f8d2337de89cb"}
],
"bookmark": "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}
In subsequent queries you can pass the bookmark to traverse the documents in order:
{
"selector": {
"create_date": {
"$gt": 0
}
},
"fields": [
"_id",
"_rev"
],
"sort": [
{
"create_date": "desc"
}
],
"limit": 1,
"bookmark" : "g1AAAAD4eJzLYWBgYM5gTmHQSElKzi9KdUhJMtRLytVNSczRLS5JzEtJLEox1EvOyS9NScwr0ctLLckB6mBKUgCSSfb____PAvPdHK_uzd_TwMCQKJ1Fuml5LECSYQGQAhq4H2HiAWEHoIkKaCaaE23iAYiJ9xEmHhY7AHZjFgAnFk_X"
}
More information about bookmarks can be found here:
https://docs.cloudant.com/cloudant_query.html#working-with-indexes
Can i view Cloudant Local Documents via browser UI
The local documents are invisible from the UI and are only accessible via the API if you know the document key. For example, if you created this document:
{"_id":"_local/bob",
"name":"Bob",
"surname":"TheBuilder"}
... you can really only retrieve it by doing:
https://<service_url>/<dbname>/_local/bob
OR
you can see it in the UI by modifying the dashboard url like this:
https://<service_url>/dashboard.html#database/<dbname>/_local/bob
Using local documents is quite a niche use case. They are used by the Cloudant replicator to store state that doesn't need to be replicated.
When using PouchDB it can be useful to use local documents in an in-browser PouchDB database to store local state/config that you don't want replicated to the Cloud.
How to query PouchDB with SQL-like operators
As much as I wanted to use the pouchdb-find
plugin, I couldn't find a way to achieve what I needed. Instead I used a workaround:
Change the document structure to store label IDs in an array
{_id: 'record/1', name: 'Record 1', labels: ['label/1', 'label/2', 'label/3']},
// may not be sorted when being stored
{_id: 'record/2', name: 'Record 2', labels: ['label/1', 'label/5', 'label/7', 'label/3']},
{_id: 'record/3', name: 'Record 3', labels: ['label/2', 'label/3', 'label/4', 'label/5']}
Create a design document
It will emit multiple complex keys for each record to represent all possible label-mappings in an ascending order. The map
function will utilize a recursive process to generate the keys:
{
_id: '_design/records-with-labels',
views: {
'records-with-labels': {
map: function(doc) {
// important: sort them so that the required keys to be created are lesser
var labelIds = doc.labels.sort();
var lastIx = labelIds.length - 1;
var emitKey = function emitKey(currentKey, currentIx) {
console.log('emitting: ' + currentKey.join(',') + ' for ' + doc._id);
emit(currentKey, null);
var nextIx = currentIx + 1;
for (var jumpIx = nextIx + 1; jumpIx <= lastIx; jumpIx++) {
var jumpedLabelId = labelIds[jumpIx];
var jumpingKey = currentKey.concat([jumpedLabelId]);
console.log('emitting: ' + jumpingKey.join(',') + ' for ' + doc._id);
emit(jumpingKey, null);
}
if (nextIx > lastIx) {
return;
}
var nextLabelId = labelIds[nextIx];
currentKey.push(nextLabelId);
emitKey(currentKey, currentIx + 1);
};
labelIds.forEach(function(labelId, i) {
emitKey([labelId], i);
});
}.toString()
}
}
}
For example, the document record/1
will have these keys generated:
emitting: label/1 for record/1
emitting: label/1,label/3 for record/1
emitting: label/1,label/2 for record/1
emitting: label/1,label/2,label/3 for record/1
emitting: label/2 for record/1
emitting: label/2,label/3 for record/1
emitting: label/3 for record/1
Querying
I just need to ensure that the query labels are sorted in ascending order.
To query records that have 'label/1' and 'label/3':
Db.query('records-with-labels', {
key: ['label/1', 'label/3']
});
To query records that have 'label/3' or 'label/3':
Db.query('records-with-labels', {
keys: [['label/1'], ['label/3']]
});
This will give us duplicated records that has both labels, but a reduce function should help eliminating them.
Conclusion
For now I don't know if there is a better solution, but this is good enough for me because in my case, a record will not have too many labels.
If you have better suggestions, please comment or edit the answer.
Related Topics
SQL Server 2005 Row_Number() Without Order By
Pass String Variable in R Script to Use It in SQL Statement
Representing Ecommerce Products and Variations Cleanly in the Database
Query Runs Slow with Date Expression, But Fast with String Literal
How to Use an Alias in a Postgresql Order by Clause
SQL Server 2008 - Help Writing Simple Insert Trigger
How to Split String Using Delimiter Char Using T-Sql
Insert Select Statement in Oracle 11G
MySQL #1140 - Mixing of Group Columns
Alternate of Lead Lag Function in SQL Server 2008
Calculate the Number of Records for Each Date Between 2 Dates
Hierarchical Queries in SQL Server 2005
Why Do We Need Group by with Aggregate Functions
SQL Error "Ora-01722: Invalid Number"
Script to Kill All Connections to a Database (More Than Restricted_User Rollback)