Perform Join Query in Google Cloud Firestore

Perform JOIN query in google cloud firestore

Firestore does not have the concept of a server-side JOIN. All documents in a single read operation must come from the same collection.

That means that to get data from multiple collections, you will need to perform multiple read operations - at least one per collection, but possibly more. This is normal in most NoSQL databases, and not nearly as slow as many developers think for the amount of data you should read from a client-side app.

If the number of documents you need to read is prohibitive for your application, consider changing your data model to require fewer reads. Typically this means that you'll end up duplicating some of the data into a format that is more easy to read.

For example in your use-case you seem to have a social network. A common solution there is to store the complete feed for each user, so all the posts for people they follow, as a separate collection in the database.

So when a user writes a post, you write that post to the main posts collection, and also to the feed collection of each user that follows them. This operation is known as fanning out your data, and while it complicates the write operation and duplicates data, it makes the code that reads the data simpler, and much more scalable. Since in many applications read operations are far more common than write operations, many NoSQL data modelers consider this a valid trade-off.

This topic is incredibly broad and hard to do justice in a single answer, which is why I recommend you also:

  • read NoSQL data modeling
  • watch Getting to know Cloud Firestore

How to inner-join in firestore

As I was telling in the coments Firestore does not support multi collection querys cause its no relational DB. If you need to access multiple collections you would manage querys independently.

This is how I usually get related collections data (Sorry this is JS code but I dont know DART):

    var data = {};

//First you get users data
DocumentReference document = Firestore.collection("users")

document.get().then((snapshot) => {

//In this case I will store data in some object, so I can add events as an array for a key in each user object

snapshot.forEach((userDoc) => {
var userDocData = userDoc.data()

if (data[userDoc.id] == undefined) {
data[userDoc.id] = userDocData
}

})

//So in this moment data object contains users, now fill users with events data

//In this var you count how many async events have been downloaded, with results or not.
var countEvents = 0

Object.keys(data).forEach((userDocId) => {

//Here Im creating another query to get all events for each user

SnapshotReference eventsForCurrentUserRef = Firestore.collection("events").where("userId", "==", userDocId)

eventsForCurrentUserRef.get.then((eventsForUserSnapshot) => {
//Count events
countEvents++

eventsForUserSnapshot.forEach((eventDoc) => {

var eventDocData = eventDoc.data()

//Check if array exists, if not create it
if (data[eventDocData.userId].events == undefined) {
data[eventDocData.userId].events = []
}

data[eventDocData.userId].events.push(eventDocData)

})

if(countEvents == Object.keys(data).length){
//Lookup for events in every user has finished
}

})

})

})

How to join multiple documents in a Cloud Firestore query?

I would do 1 user doc call and the needed posts call.

let users = {} ;
let loadedPosts = {};
db.collection('users').get().then((results) => {
results.forEach((doc) => {
users[doc.id] = doc.data();
});
posts = db.collection('posts').orderBy('timestamp', 'desc').limit(3);
posts.get().then((docSnaps) => {
docSnaps.forEach((doc) => {
loadedPosts[doc.id] = doc.data();
loadedPosts[doc.id].userName = users[doc.data().uid].name;
});
});

SQL like joins in Firestore

From my understanding SQL like JOINS does not exist in Firestore, and you have to retrieve the data in multiple queries.

Yes, that's correct, a "JOIN" type operation is not supported by Cloud Firestore. A Firestore query can only get documents from a single collection at a time. If you need to get data from both collections and add the results to a list, you'll have to perform two different queries.

The key to solving this problem is to attempt to get the results of the second query only when the first query completes. In code, these queries should look like this:

creditcardsRef.get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
@Override
public void onComplete(@NonNull Task<QuerySnapshot> task) {
if (task.isSuccessful()) {
List<UserCardModel> userCardList = new ArrayList<>();
for (QueryDocumentSnapshot document : task.getResult()) {
userCardList.add(document.toObject(UserCardModel.class));

ownedcreditcardsRef.get().addOnCompleteListener(new OnCompleteListener<QuerySnapshot>() {
@Override
public void onComplete(@NonNull Task<QuerySnapshot> task) {
if (task.isSuccessful()) {
for (QueryDocumentSnapshot doc : task.getResult()) {
userCardList.add(doc.toObject(UserCardModel.class));
}

//Do what you need to do with the list
} else {
Log.d(TAG, "Error getting documents: ", task.getException());
}
}
});
}
} else {
Log.d(TAG, "Error getting documents: ", task.getException());
}
}
});

However, this approach might be considered a little bit expensive since you always read the content of both collections. A possible alternative you have is to introduce a third collection with data already "joined" from both collections. This practice is called denormalization and is a common practice when it comes to Firebase. For a better understanding, I recommend you see this video, Denormalization is normal with the Firebase Database. It's for Firebase Realtime Database, but the same rules apply to Cloud Firestore.

firebase firestore: Performing SQL-like joins

Bearing in mind two things about Firestore:

  • query execution time depends on the size of the result set, and not on the total number of documents
  • we cannot do join queries (this is a consequence of the constraint on execution time) *

a few solutions come to mind to handle your comments and users:

Data modeling

Duplicate the user info in the comments

This was your first approach and it is the suggested way for dealing with such cases in NoSQL, it is coined "de-normalization". As you pointed out, it means that when a user's name or avatar changes, you have to modify all associated comments with a Cloud Function.

However, as pointed out by @RobertKawecki (or is it Freddie? ), you should probably keep the initial name and avatar as it was the correct thing at the point of comment creation.

More details on this technique can be viewed in this Firebase presentation.

Store comments in user documents

You could reverse the problem and store comments in a subcollection in user documents. Then you can use collection group queries to retrieve all comments. Personally I might go this way ;)

Latency optimization

Now if you still don't want to duplicate the user data in the comments, performance can be improved over your initial code in a few ways.

The current latency of your function is 15 + 15 unit of time as you fetch 15 documents from the comments collection (I think you forgot the limit clause in your code), then 15 documents from the users collection (I think you forgot to add Promise.all in your code).

Use caching

Ok this one is basic and will bring little improvement. Suppose that in your 15 last comments, some belong to the same user. If you cache users while you retrieve them, you can spare querying multiple times for the same user. The latency will be reduced to 15 + (15 - number of duplicated users).

⛷️ Parallelize

In your function you fetch comments first, then fetch the needed user documents. You could parallelize the fetch using the stream API: stream your comments, as soon as you receive one, fetch the associated user document in parallel. The code is a little too involved to type it here. This would bring the latency to 15 + 1

☎️ Fetch the data from the front-end

This one can be a great latency saver. You can use Firestore front-end SDKs for web or mobile to query your database directly from the front-end. There you can use a snapshot listener on the comments collection, and fetch the associated user on each reveived comment (like in the parallelize solution), and display it right away.

This brings the latency to 2 . Hard to beat! And you also get offline support!

* As a side note, you can perform a join query if you maintain a join table, but it won't help you in your case as it requires another collection and Cloud Functions, and it wouldn't reduce latency. I wrote an article about it here.



Related Topics



Leave a reply



Submit