Firestore: How to Get Random Documents in a Collection

Using randomly generated indexes and simple queries, you can randomly select documents from a collection or collection group in Cloud Firestore.

This answer is broken into 4 sections with different options in each section:

  1. How to generate the random indexes
  2. How to query the random indexes
  3. Selecting multiple random documents
  4. Reseeding for ongoing randomness

How to generate the random indexes

The basis of this answer is creating an indexed field that, when ordered ascending or descending, results in all the documents being randomly ordered. There are different ways to create this, so let's look at 2, starting with the most readily available.

Auto-Id version

If you are using the randomly generated automatic ids provided in our client libraries, you can use this same system to randomly select a document. In this case, the randomly ordered index is the document id.

Later in our query section, the random value you generate is a new auto-id (iOS, Android, Web) and the field you query is the __name__ field, and the 'low value' mentioned later is an empty string. This is by far the easiest method to generate the random index and works regardless of the language and platform.

By default, the document name (__name__) is only indexed ascending, and you also cannot rename an existing document short of deleting and recreating. If you need either of these, you can still use this method and just store an auto-id as an actual field called random rather than overloading the document name for this purpose.
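
As a rough illustration of generating such a value, here is a minimal TypeScript sketch. It assumes the auto-id format is 20 characters drawn from A-Z, a-z and 0-9; the helper name newRandomId is hypothetical, and Math.random() is used only for brevity (it is not cryptographically secure).

```typescript
// Alphabet and length assumed to match Firestore's client-side auto-ids
// (20 characters drawn from A-Z, a-z, 0-9).
const AUTO_ID_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
const AUTO_ID_LENGTH = 20;

// Hypothetical helper: builds a random id in the auto-id format.
function newRandomId(): string {
  let id = "";
  for (let i = 0; i < AUTO_ID_LENGTH; i++) {
    const position = Math.floor(Math.random() * AUTO_ID_ALPHABET.length);
    id += AUTO_ID_ALPHABET.charAt(position);
  }
  return id;
}

console.log(newRandomId());
```

The result can be used either as the value to compare against __name__ or as the value stored in a separate random field.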

Random Integer version

When you write a document, first generate a random integer in a bounded range and set it as a field called random. Depending on the number of documents you expect, you can use a different bounded range to save space or reduce the risk of collisions (which reduce the effectiveness of this technique).

You should consider which languages you need as there will be different considerations. While Swift is easy, JavaScript notably can have a gotcha:

  • 32-bit integer: Great for small datasets (around 10K documents, where a collision is unlikely)
  • 64-bit integer: Large datasets (note: a plain JavaScript number only represents integers exactly up to 2^53 - 1, so it cannot hold a full 64-bit value)

This will create an index with your documents randomly sorted. Later in our query section, the random value you generate will be another one of these values, and the 'low value' mentioned later will be the smallest value in your range (0 for non-negative integers).
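
To make the collision trade-off concrete, here is a small TypeScript sketch. It assumes an unsigned 32-bit range and uses the standard birthday-problem approximation; the function names are hypothetical.

```typescript
// Range size for an unsigned 32-bit random index (an assumption for this sketch).
const RANGE_32 = 2 ** 32;

// Returns a random integer in [0, range). Math.random() is not cryptographically
// secure, which is acceptable here because the value is only used for ordering.
function randomIndex(range: number = RANGE_32): number {
  return Math.floor(Math.random() * range);
}

// Birthday-problem approximation: probability that at least two of `n`
// documents share the same random value when drawing from `range` values.
function collisionProbability(n: number, range: number): number {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * range));
}

console.log(randomIndex());
console.log(collisionProbability(10000, RANGE_32)); // roughly 0.0116
```

Under these assumptions, 10K documents have roughly a 1% chance of containing any collision at all, which is why the 32-bit range is comfortable for small datasets.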

How to query the random indexes

Now that you have a random index, you'll want to query it. Below we look at some simple variants to select 1 random document, as well as options to select more than 1.

For all these options, you'll want to generate a new random value in the same form as the indexed values you created when writing the document, denoted by the variable random below. We'll use this value to find a random spot on the index.

Wrap-around

Now that you have a random value, you can query for a single document:

let postsRef = db.collection("posts")
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
    .order(by: "random")
    .limit(to: 1)

Check that this has returned a document. If it hasn't, query again but use the 'low value' for your random index. For example, if you used Random Integers then lowValue is 0:

let postsRef = db.collection("posts")
let queryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: lowValue)
    .order(by: "random")
    .limit(to: 1)

As long as the collection contains at least one document, this second query is guaranteed to return a document.
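
The control flow can be sketched in TypeScript against an in-memory array standing in for the index. This is only an illustration: firstAtOrAbove is a hypothetical stand-in for the Firestore query, and the sample data is made up.

```typescript
interface IndexedDoc {
  id: string;
  random: number;
}

// In-memory stand-in for:
//   where("random", ">=", start).orderBy("random").limit(1)
function firstAtOrAbove(docs: IndexedDoc[], start: number): IndexedDoc | undefined {
  return [...docs]
    .sort((a, b) => a.random - b.random)
    .find((d) => d.random >= start);
}

// Wrap-around: try from `random`; if nothing is found, retry from `lowValue`.
function pickWrapAround(
  docs: IndexedDoc[],
  random: number,
  lowValue = 0
): IndexedDoc | undefined {
  const hit = firstAtOrAbove(docs, random);
  return hit !== undefined ? hit : firstAtOrAbove(docs, lowValue);
}

const posts: IndexedDoc[] = [
  { id: "a", random: 10 },
  { id: "b", random: 50 },
  { id: "c", random: 90 },
];

const direct = pickWrapAround(posts, 60);
const wrapped = pickWrapAround(posts, 95);
console.log(direct ? direct.id : "none"); // "c"
console.log(wrapped ? wrapped.id : "none"); // "a" (wrapped around)
```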

Bi-directional

The wrap-around method is simple to implement and allows you to optimize storage with only an ascending index enabled. One downside is the possibility of values being unfairly shielded. For example, if the first 3 documents (A, B, C) out of 10K have random index values of A:409496, B:436496, C:818992 (drawn from a 32-bit range), then A and C each have just less than a 1/10K chance of being selected, whereas B is effectively shielded by the proximity of A and has only roughly a 1/160K chance.
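
The shielding effect can be checked numerically. The sketch below is an illustration only: it assumes random values are integers drawn uniformly from a 32-bit range, and the function name is hypothetical. Because it models just three documents, A's figure is dominated by the wrap-around share, so only B and C are printed.

```typescript
// Probability that each document is returned by a single wrap-around query,
// assuming `random` is drawn uniformly from the integers in [0, range) and
// `values` holds distinct stored random values (one per document).
// Results are in ascending order of the stored values.
function selectionProbabilities(values: number[], range: number): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted.map((v, i) => {
    if (i === 0) {
      // The first document catches draws in [0, v] plus every draw above the
      // last value, which wraps around to the start of the index.
      const last = sorted[sorted.length - 1];
      return (v + 1 + (range - 1 - last)) / range;
    }
    // Every other document catches draws in (previous value, v].
    return (v - sorted[i - 1]) / range;
  });
}

const RANGE = 2 ** 32; // assumption: 32-bit random values
const probabilities = selectionProbabilities([409496, 436496, 818992], RANGE);
console.log(Math.round(1 / probabilities[1])); // B: 159073, roughly 1 in 160K
console.log(Math.round(1 / probabilities[2])); // C: 11229
```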

Rather than querying in a single direction and wrapping around if a value is not found, you can instead randomly select between >= and <=, which reduces the probability of unfairly shielded values by half, at the cost of double the index storage.

If one direction returns no results, switch to the other direction:

// Descending: closest value at or below random
let descendingQueryRef = postsRef.whereField("random", isLessThanOrEqualTo: random)
    .order(by: "random", descending: true)
    .limit(to: 1)

// Ascending: closest value at or above random
let ascendingQueryRef = postsRef.whereField("random", isGreaterThanOrEqualTo: random)
    .order(by: "random")
    .limit(to: 1)
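
A minimal TypeScript sketch of the bi-directional selection, again using an in-memory array in place of the real index. The helper names are hypothetical, and the ascendingFirst parameter exists only so the direction can be fixed when testing.

```typescript
interface IndexedDoc {
  id: string;
  random: number;
}

// Stand-in for: where("random", ">=", start).orderBy("random").limit(1)
function firstAtOrAbove(docs: IndexedDoc[], start: number): IndexedDoc | undefined {
  return [...docs]
    .sort((a, b) => a.random - b.random)
    .find((d) => d.random >= start);
}

// Stand-in for: where("random", "<=", start).orderBy("random", "desc").limit(1)
function lastAtOrBelow(docs: IndexedDoc[], start: number): IndexedDoc | undefined {
  return [...docs]
    .sort((a, b) => b.random - a.random)
    .find((d) => d.random <= start);
}

// Picks a direction at random, then falls back to the other direction
// if the first one returns nothing.
function pickBidirectional(
  docs: IndexedDoc[],
  random: number,
  ascendingFirst: boolean = Math.random() < 0.5
): IndexedDoc | undefined {
  const primary = ascendingFirst ? firstAtOrAbove : lastAtOrBelow;
  const fallback = ascendingFirst ? lastAtOrBelow : firstAtOrAbove;
  const hit = primary(docs, random);
  return hit !== undefined ? hit : fallback(docs, random);
}
```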

Selecting multiple random documents

Often, you'll want to select more than 1 random document at a time. There are 2 different ways to adjust the above techniques depending on what trade-offs you want.

Rinse & Repeat

This method is straightforward. Simply repeat the process, including selecting a new random value each time.

This method will give you random sequences of documents without worrying about seeing the same patterns repeatedly.

The trade-off is it will be slower than the next method since it requires a separate round trip to the service for each document.

Keep it coming

In this approach, simply increase the limit to the desired number of documents. It's a little more complex, as the call might return anywhere from 0 to limit documents. You'll then need to get the missing documents in the same manner, but with the limit reduced to only the difference. If you know there are more documents in total than the number you are asking for, you can optimize by ignoring the edge case of never getting back enough documents on the second call (but not the first).

The trade-off with this solution is in repeated sequences. While the documents are randomly ordered, if you ever end up overlapping ranges you'll see the same pattern you saw before. There are ways to mitigate this concern discussed in the next section on reseeding.

This approach is faster than 'Rinse & Repeat', as you'll be requesting all the documents in a single call in the best case, or 2 calls in the worst case.
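
The two-call flow might look like the following TypeScript sketch, with an in-memory array standing in for the index. The names are hypothetical, and bounding the second call to values below random (so nothing is returned twice) is an addition of this sketch rather than something the text above prescribes.

```typescript
interface IndexedDoc {
  id: string;
  random: number;
}

// Stand-in for a range query ordered by `random`:
//   where("random", ">=", from), where("random", "<", below), limit(limit)
function rangeQuery(
  docs: IndexedDoc[],
  from: number,
  below: number,
  limit: number
): IndexedDoc[] {
  return [...docs]
    .sort((a, b) => a.random - b.random)
    .filter((d) => d.random >= from && d.random < below)
    .slice(0, limit);
}

function pickMany(
  docs: IndexedDoc[],
  random: number,
  limit: number,
  lowValue = 0
): IndexedDoc[] {
  // First call: everything from `random` upwards, capped at `limit`.
  const first = rangeQuery(docs, random, Infinity, limit);
  const missing = limit - first.length;
  if (missing === 0) {
    return first;
  }
  // Second call: wrap around to `lowValue`, asking only for the difference
  // and stopping before `random` so no document repeats.
  const second = rangeQuery(docs, lowValue, random, missing);
  return [...first, ...second];
}
```

If the collection holds fewer documents than the requested limit, the result is simply shorter than limit.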

Reseeding for ongoing randomness

While this method gives you documents randomly, if the document set is static the probability of each document being returned will be static as well. This is a problem, as some values might have unfairly low or high probabilities based on the initial random values they got. In many use cases this is fine, but in some you may want to increase the long-term randomness to have a more uniform chance of returning any 1 document.

Note that inserted documents will end up woven in between existing ones, gradually changing the probabilities, as will deleting documents. If the insert/delete rate is too small given the number of documents, there are a few strategies for addressing this.

Multi-Random

Rather than worrying about reseeding, you can always create multiple random indexes per document, then randomly select one of those indexes each time. For example, have the field random be a map with subfields 1 to 3:

{'random': {'1': 32456, '2':3904515723, '3': 766958445}}

Now you'll be querying against random.1, random.2, random.3 randomly, creating a greater spread of randomness. This essentially trades increased storage to save increased compute (document writes) of having to reseed.
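
A small TypeScript sketch of both halves of this idea: building the map when a document is written, and choosing which subfield to query. The function names and the choice of 3 subfields with 32-bit values are assumptions for illustration.

```typescript
const INDEX_COUNT = 3; // number of random subfields per document (assumption)

function randomInt32(): number {
  return Math.floor(Math.random() * 2 ** 32);
}

// Builds the map stored in the `random` field, e.g. { "1": ..., "2": ..., "3": ... }.
function makeRandomMap(count: number = INDEX_COUNT): Record<string, number> {
  const map: Record<string, number> = {};
  for (let i = 1; i <= count; i++) {
    map[String(i)] = randomInt32();
  }
  return map;
}

// Chooses which subfield to query this time, e.g. "random.2".
function pickRandomFieldPath(count: number = INDEX_COUNT): string {
  return `random.${1 + Math.floor(Math.random() * count)}`;
}

console.log(makeRandomMap());
console.log(pickRandomFieldPath());
```

The returned path would then be used as the field name in the queries from the earlier sections.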

Reseed on writes

Any time you update a document, re-generate the random value(s) of the random field. This will move the document around in the random index.

Reseed on reads

If the random values generated are not uniformly distributed (they're random, so this is expected), then the same document might be picked a disproportionate amount of the time. This is easily counteracted by updating the randomly selected document with new random values after it is read.

Since writes are more expensive and can hotspot, you can elect to only update on read a subset of the time (e.g., if (random(0, 100) === 0) update;).
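
The "update only a subset of the time" idea can be sketched as below. This is a hypothetical in-memory helper; in a real app the new value would be written back to the document. The rng parameter is injectable purely so the behaviour can be tested deterministically.

```typescript
interface IndexedDoc {
  id: string;
  random: number;
}

// Returns a copy of `doc` with a fresh 32-bit random value roughly `chance`
// of the time; otherwise returns `doc` unchanged.
function maybeReseed(
  doc: IndexedDoc,
  chance = 0.01,
  rng: () => number = Math.random
): IndexedDoc {
  if (rng() >= chance) {
    return doc; // most reads: no write needed
  }
  return { ...doc, random: Math.floor(rng() * 2 ** 32) };
}

const justRead: IndexedDoc = { id: "a", random: 12345 };
console.log(maybeReseed(justRead));
```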

Firestore get random document with where condition

I agree with Frank, the link has everything you need. You can chain the where conditions.

After following the steps, you can query like:

Firestore.firestore().collection("quotes")
    .whereField("categories", arrayContainsAny: ["change"])
    .whereField("random", isGreaterThanOrEqualTo: random)
    .order(by: "random")
    .limit(to: 25)

Edit: ah, I see your point. I would work around this issue by querying again.

  • If the result contains fewer documents than needed (25), run another query, something like:

Firestore.firestore().collection("quotes")
    .whereField("categories", arrayContainsAny: ["change"])
    .whereField("random", isLessThan: random)
    .order(by: "random")
    .limit(to: 25)

You can then combine results from both queries to get your 25.

Flutter Firebase Firestore. How to get a random document?

I believe you'll have to implement that yourself: pick a random index that falls within the number of documents in your collection, then pull that document. Note that this reads every document in the collection, so it only suits small collections.

Kind of like:

import 'dart:math';
import 'package:cloud_firestore/cloud_firestore.dart';

// (inside an async function)
// first get a snapshot of your collection
QuerySnapshot collection = await FirebaseFirestore.instance.collection('YOUR_COLLECTION').get();

// based on how many documents you have in your collection
// just pull one random index (nextInt throws if the collection is empty)
var random = Random().nextInt(collection.docs.length);

// then just get the document that falls under that random index
DocumentSnapshot randomDoc = collection.docs[random];

Get random documents from Firestore using Javascript

TypeScript (remove the types if you use JavaScript) version for getting random documents using random integers (and wrap-around). This answer uses a Cloud Function for testing, but essentially you just need to copy the getDocuments function.

// Assumed imports; adjust to your firebase-functions version
import * as functions from "firebase-functions";
import * as admin from "firebase-admin";

admin.initializeApp();

export const getRandomDoc = functions.https.onRequest(async (req, res): Promise<any> => {
  try {
    const randomDocs = await getDocuments(2)
    return res.send(randomDocs)
  } catch (err) {
    return res.status(500).send(err)
  }
});

const getDocuments = async (count: number): Promise<Array<FirebaseFirestore.DocumentData>> => {
  // Random integer between 0 and 10000 (both inclusive), matching the stored rInt range
  const randomNum = Math.floor(Math.random() * 10001)
  const snapshot = await admin.firestore().collection("col").where("rInt", ">=", randomNum).limit(count).get()
  if (snapshot.empty) {
    // Nothing at or above randomNum: retry with a new random number
    return getDocuments(count)
  }
  return snapshot.docs.map(d => d.data())
}

Whenever you add a new document to that collection, add the rInt field along with it, which is an integer between 0 and 10000 (both inclusive). You pass the number of documents you need to the getDocuments function. It will fetch N consecutive matched docs, though, as this uses the limit() method.

In the query we look for documents where rInt is greater than or equal to the random number generated using Math.random(), and limit the results to the count parameter passed to the function. If the snapshot is empty, we retry the function. (It is worth adding logic that makes this function repeat at most N times; otherwise the recursion can run indefinitely, for example when the collection is empty.)

Using .limit() as in the function above will end up returning N documents in a row. Because the query has a range filter on rInt, those documents come back ordered by rInt (without a filter or an orderBy, docs are ordered by their document ID). Making a separate request for each doc instead will increase the randomness.

/**
 * @param {number} count - number of documents still to retrieve
 * @param {number} loopNum - number of times the function has been repeated. Defaults to 0
 * @param {Array<any>} curDocs - array of documents matched so far. Defaults to an empty array
 * @returns array of matched document data (may hold fewer than requested after 5 attempts)
 */
const getDocuments = async (
  count: number,
  loopNum = 0,
  curDocs: Array<FirebaseFirestore.DocumentData> = []
): Promise<Array<FirebaseFirestore.DocumentData>> => {
  // Creating an array of requests to get documents
  const requests: Array<Promise<FirebaseFirestore.QuerySnapshot>> = []
  for (let i = 0; i < count; i++) {
    // New random number (0 to 10000 inclusive) for each request
    const randomNum = Math.floor(Math.random() * 10001)
    console.log(`Random Num: ${randomNum}`);
    // limit is set to 1 so each request will return 1 document only
    requests.push(admin.firestore().collection("col").where("rInt", ">=", randomNum).limit(1).get())
  }

  // Using Promise.all() to run all the requests in parallel
  const snapshots = await Promise.all(requests)

  // Removing empty snapshots and creating an array of doc data
  // (separate requests may return the same document more than once)
  const matchedDocs = snapshots.filter(s => !s.empty).map(s => s.docs[0].data())
  const allDocs = [...curDocs, ...matchedDocs]

  // If documents received are fewer than requested,
  // repeat the function for the missing ones
  if (matchedDocs.length < count) {
    // If the function has run 5 times,
    // return with whatever has matched so far
    if (loopNum + 1 === 5) {
      console.log("Returning docs matched so far")
      return allDocs
    }
    return getDocuments(count - matchedDocs.length, loopNum + 1, allDocs)
  }

  // All requested documents were found
  return allDocs
}

// Calling the function
getDocuments(5, 0, [])

Note that you can use the in operator if you are requesting 10 documents or fewer. The in operator combines up to 10 equality (==) clauses on the same field with a logical OR.


For the original version (Swift) by @Dan McGrath, check this answer.

Flutter Firestore retrieve random firestore document

In the case of the Auto-Id version, you need to create a random ID (for _randomIndex) in the same way that Firebase creates an ID for a new document, like so:

import 'dart:math';

String getRandomGeneratedId() {
  const int AUTO_ID_LENGTH = 20;
  const String AUTO_ID_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

  const int maxRandom = AUTO_ID_ALPHABET.length;
  final Random randomGen = Random();

  final StringBuffer buffer = StringBuffer();
  for (int i = 0; i < AUTO_ID_LENGTH; i++) {
    buffer.write(AUTO_ID_ALPHABET[randomGen.nextInt(maxRandom)]);
  }

  return buffer.toString();
}

Is there any standard firestore query to get random documents?

Clever use of startAt and limit!

As you can see in the reference, there are no built-in methods that would return random documents.

In order to avoid the loop, you can use Promise.all:

const indices = getRandomIndices(collection.size);
const docs = await Promise.all(indices.map(i => {
  return dbRef.startAt(i).limit(1).get();
}));

And for getRandomIndices, I suggest: create an array [0, 1, 2, ...], shuffle it as described in this SO answer, and take its first 5 elements.
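
One possible implementation of that suggestion, as a sketch: it uses a Fisher-Yates shuffle, which may or may not be the exact shuffle the linked answer describes, and the count parameter (defaulting to 5) is an assumption added here.

```typescript
// Hypothetical implementation of getRandomIndices: returns up to `count`
// distinct indices in [0, size), chosen by a Fisher-Yates shuffle.
function getRandomIndices(size: number, count = 5): number[] {
  // Build [0, 1, 2, ..., size - 1]
  const indices = Array.from({ length: size }, (_, i) => i);

  // Fisher-Yates: walk backwards, swapping each slot with a random earlier slot
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }

  return indices.slice(0, count);
}

console.log(getRandomIndices(100));
```

Shuffling the whole array costs time proportional to the collection size, which is fine for modest sizes but worth reconsidering for very large collections.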

Get random documents from Firestore

If you need 5 new random items for all your users in the application, then don't do that operation in Firestore; do it in the Realtime Database, where choosing such random items is much cheaper. Both databases work really well together in the same project. That being said, you can have a structure that looks like this:

Firebase-root
   |
   --- products
   |      |
   |      --- $productId: true
   |      |
   |      --- $productId: true
   |
   --- consumedProducts
          |
          --- $productId: true
          |
          --- $productId: true

There are two solutions to this problem. In the first, every time you get 5 new random IDs from the "products" node, add them also to the "consumedProducts" node. To avoid choosing the same IDs again, always check that the new IDs are not already present in the "consumedProducts" node. After a while, when "consumedProducts" contains the same IDs as the "products" node, you can simply remove it and start over again. The second solution is to add those 5 elements to "consumedProducts" and right after that delete them from the "products" node. When the "products" node becomes empty, do the same thing with the "consumedProducts" node.

Now, according to the logic of your app, you should decide which one is better to use, but remember to always keep the actual products in Firestore in sync with the corresponding IDs in the Realtime Database. For instance, if you add a new product in Firestore, add the corresponding ID to the Realtime Database node. The same should happen when you delete a product from Firestore.
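
The first solution can be sketched in TypeScript with two in-memory sets standing in for the "products" and "consumedProducts" nodes. The function name is hypothetical and no Realtime Database calls are shown.

```typescript
// Picks up to `n` random ids from `products` that are not yet in `consumed`,
// records them as consumed, and starts over once every product has been used.
function pickFresh(products: Set<string>, consumed: Set<string>, n: number): string[] {
  let available = Array.from(products).filter((id) => !consumed.has(id));
  if (available.length === 0) {
    consumed.clear(); // everything has been used: start over
    available = Array.from(products);
  }

  // Fisher-Yates shuffle so the first `n` entries are a random selection
  for (let i = available.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [available[i], available[j]] = [available[j], available[i]];
  }

  const picked = available.slice(0, n);
  picked.forEach((id) => consumed.add(id));
  return picked;
}

const productIds = new Set(["p1", "p2", "p3", "p4", "p5", "p6"]);
const consumedIds = new Set<string>();
console.log(pickFresh(productIds, consumedIds, 5));
```

When fewer than n unused IDs remain, this sketch returns only those; the reset happens on the following call.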


