What Is the Correct Way to Structure This Kind of Data in Firestore

What is the correct way to structure this kind of data in Firestore?

You need to know that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. The best and correct solution is the one that fits your needs and makes your job easier. Also bear in mind that there is no single "correct data structure" in the world of NoSQL databases. All data is modeled to allow the use-cases that your app requires, which means that what works for one app may be insufficient for another. So there is no solution that is correct for everyone: an effective structure for a NoSQL database depends entirely on how you intend to query it.

The way you are structuring your data looks good to me. In general, there are two ways to achieve the same thing. The first is to keep a reference to the provider in the product object (as you already do); the second is to copy the entire provider object into the product document. The latter technique is called denormalization and is quite a common practice when it comes to Firebase. We often duplicate data in NoSQL databases to support queries that would not be possible otherwise. For a better understanding, I recommend watching this video: Denormalization is normal with the Firebase Database. It's about the Firebase Realtime Database, but the same principles apply to Cloud Firestore.
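To make the two options concrete, here is a minimal plain-Java sketch that builds both shapes of a product document as maps, which is one of the forms the Firestore SDKs accept when writing a document. The field names `name`, `price`, `providerId` and `provider` are hypothetical, and the reference is modeled as the provider's document ID string; storing a `DocumentReference` field would be the other common choice.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the two product-document shapes (field names are hypothetical).
class ProductShapes {

    // Option 1: store only a reference to the provider, here its document ID.
    static Map<String, Object> withReference(String name, double price, String providerId) {
        Map<String, Object> product = new HashMap<>();
        product.put("name", name);
        product.put("price", price);
        product.put("providerId", providerId); // resolving it costs a second read
        return product;
    }

    // Option 2 (denormalization): copy the whole provider object into the product.
    static Map<String, Object> withCopy(String name, double price, Map<String, Object> provider) {
        Map<String, Object> product = new HashMap<>();
        product.put("name", name);
        product.put("price", price);
        product.put("provider", new HashMap<>(provider)); // duplicated data that must be kept in sync
        return product;
    }
}
```

With the Android SDK, either map could be passed to `DocumentReference.set()`; the difference only shows up later, when reading or updating.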

Also, when you are duplicating data, there is one thing to keep in mind: just as you add the data, you also need to maintain it. In other words, if you want to update or delete a provider object, you need to do it in every place where it exists.
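The maintenance cost can be sketched in plain Java by modeling the `products` collection as an in-memory map from document ID to document fields, a hypothetical stand-in for real Firestore documents. Renaming a provider means touching every product that embeds a copy of it. With the real SDK, the same loop would roughly become a query such as `whereEqualTo("provider.id", providerId)` followed by a `WriteBatch` of updates.

```java
import java.util.Map;

// In-memory sketch: `products` maps a document ID to that document's fields,
// and each product embeds a copy of its provider under the hypothetical "provider" key.
class ProviderSync {

    // Renames a provider in every product that embeds a copy of it and returns
    // how many product documents had to be written.
    static int renameProvider(Map<String, Map<String, Object>> products,
                              String providerId, String newName) {
        int writes = 0;
        for (Map<String, Object> product : products.values()) {
            Object embedded = product.get("provider");
            if (embedded instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> provider = (Map<String, Object>) embedded;
                if (providerId.equals(provider.get("id"))) {
                    provider.put("name", newName); // one write per duplicated copy
                    writes++;
                }
            }
        }
        return writes;
    }
}
```

The returned count grows with the number of copies, which is exactly the cost discussed in the following paragraphs.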

You might now wonder which technique is best. In a very general sense, whether you should store references or duplicate data in a NoSQL database depends entirely on your project's requirements.

So you should ask yourself some questions about the data you want to either duplicate or simply keep as references:

Is the data static, or will it change over time?
If it changes, do you need to update every duplicated instance of the data so they all stay in sync? This is what I mentioned earlier.
When it comes to Firestore, are you optimizing for performance or for cost?

If your duplicated data needs to change and stay in sync at the same time, then you might have a hard time in the future keeping all those duplicates up to date. This may also mean spending a lot of money keeping all those documents fresh, as every change requires a read and a write for each document that holds a copy. In this case, holding only references is the better option.
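One rough way to reason about the cost side is to count operations. The sketch below is a back-of-the-envelope model under hypothetical simplifications (no client cache, every query returns at least one document); it counts reads and writes separately because Firestore prices them differently, and it is not a billing calculator.

```java
// Back-of-the-envelope counts of document reads and writes for both approaches.
// Hypothetical simplifications: no client cache, and every query returns at least one document.
class DuplicationCost {

    // References: each product view reads the product AND its provider;
    // each provider change writes only the provider document. Returns {reads, writes}.
    static long[] referenceApproach(long productViews, long providerChanges) {
        return new long[] {2 * productViews, providerChanges};
    }

    // Duplication: each product view reads a single document, but each provider
    // change reads and rewrites every product holding a copy, plus the provider
    // document itself. Returns {reads, writes}.
    static long[] duplicationApproach(long productViews, long providerChanges, long copiesPerProvider) {
        long syncOps = providerChanges * copiesPerProvider;
        return new long[] {productViews + syncOps, providerChanges + syncOps};
    }
}
```

For example, with 1,000 product views and 10 provider changes, references cost 2,000 reads and 10 writes, while duplicating each provider into 500 products costs 6,000 reads and 5,010 writes. With 1,000 views and a single change to a provider copied into only 5 products, duplication needs just 1,005 reads and 6 writes.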

With the reference approach, you write very little duplicated data (pretty much just the provider ID). This means that your code for writing this data is going to be quite simple and quite fast. But when reading the data, you need to load the data from both collections, which means an extra database call. This typically isn't a big performance issue for reasonable numbers of documents, but it definitely requires more code and more API calls.
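The extra call can be seen in a small in-memory sketch of the reference approach, where a product stores only a hypothetical `providerId` field. Each lookup stands in for one database call; with the Android SDK these would be two `DocumentReference.get()` calls, the second issued once the first result arrives.

```java
import java.util.Map;

// In-memory sketch of the reference approach: a product stores only "providerId",
// so displaying it takes a second lookup. Each lookup stands in for one
// database call; collection and field names are hypothetical.
class ReferenceJoin {

    private final Map<String, Map<String, Object>> products;
    private final Map<String, Map<String, Object>> providers;
    private int calls = 0;

    ReferenceJoin(Map<String, Map<String, Object>> products,
                  Map<String, Map<String, Object>> providers) {
        this.products = products;
        this.providers = providers;
    }

    private Map<String, Object> get(Map<String, Map<String, Object>> collection, String id) {
        calls++; // one simulated document read
        return collection.get(id);
    }

    // Loads a product, then the provider it references (two calls in total).
    String describe(String productId) {
        Map<String, Object> product = get(products, productId);
        Map<String, Object> provider = get(providers, (String) product.get("providerId"));
        return product.get("name") + " by " + provider.get("name");
    }

    int calls() {
        return calls;
    }
}
```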

If you need your queries to be very fast, you may prefer to duplicate more data so that the client only has to read one document per item queried, rather than multiple documents. You may also be able to rely on the local client cache to make this cheaper, depending on the data the client has to read.

With the duplication approach, you copy all of a provider's data into each product document. This means that the code to write this data is more complex, and you're definitely storing more data: one more provider object for each product document. You'll also need to figure out whether and how to keep the provider data up to date in each document. On the other hand, reading a product document now gives you all the information about its provider in a single read.

This is a common consideration in NoSQL databases: you'll often have to weigh write performance and disk storage against read performance and scalability.

Whether or not to duplicate some data depends highly on your data and its characteristics, so you will have to think it through on a case-by-case basis.

So in the end, remember that both are valid approaches, and neither of them is inherently better than the other. It all depends on your use-cases and on how comfortable you are with this technique of duplicating data. Data duplication is the key to faster reads, not just in Cloud Firestore or the Firebase Realtime Database but in general. Any time you add the same data to a different location, you're duplicating data in favor of faster read performance. Unfortunately, in return, you get more complex updates and higher storage/memory usage. Note also that while extra calls are not expensive in the Firebase Realtime Database, in Cloud Firestore they are. How much duplicated data versus how many extra database calls is optimal for you depends on your needs and on your willingness to let go of the "Single Point of Definition" mindset, which is very subjective.

After finishing a few Firebase projects, I find that my reading code gets drastically simpler if I duplicate data. But of course, the writing code gets more complex at the same time. It's the trade-off between these two, together with your needs, that determines the optimal solution for your app. To be even more precise, you can also measure what is happening in your app using the existing tools and decide accordingly. I know that is not a concrete recommendation, but that's software development: everything is about measuring things.

Remember also that some database structures are easier to protect with security rules than others. So try to find a schema that can be easily secured using Cloud Firestore Security Rules.

Please also take a look at my answer from this post, where I explained more about collections, maps and arrays in Firestore.

What's the best way to structure a Firestore database?

It's good to hear that the proposed structure answers your question, but regarding:

Do I get fewer reads & writes if you structure your database with more sub-collections/documents like the image above?

No, it doesn't really matter whether you query a collection or a sub-collection: you always pay for a number of reads equal to the number of documents returned.

Also which of the two options can reach usage limits faster?

The structure doesn't matter. What really matters is the number of requests you perform.

Firestore data structure for two use cases

There is no "perfect", "the best" or "the correct" solution for structuring a Firestore database. We usually structure the database according to the queries that we intend to perform.

Regarding storing all the places in a single collection vs. having one collection per state, please note that there is no difference in terms of speed or cost. You always pay for a number of reads equal to the number of documents that your query returns. However, if your app needs to display, for example, all places of all states, then having a collection for each state will require a separate query for each state.
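The claim that only the number of queries changes, not the number of reads, can be checked with a small in-memory comparison. Each place is a map with hypothetical `name` and `state` fields; with the SDK, the single-collection version would be one query on `rootRef.collection("places")`, optionally narrowed with `whereEqualTo("state", ...)`. The sketch assumes every query returns at least one document, since Firestore bills a minimum of one read per query.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory comparison for "show all places of all states". Each place is a map with
// hypothetical "name" and "state" fields. Both methods return {queries, documentReads}.
class PlacesLayout {

    // A single "places" collection: one query returns every place.
    static int[] singleCollection(List<Map<String, String>> places) {
        return new int[] {1, places.size()};
    }

    // One collection per state: one query per state, yet the same number of reads.
    static int[] collectionPerState(List<Map<String, String>> places) {
        Map<String, List<Map<String, String>>> byState =
                places.stream().collect(Collectors.groupingBy(p -> p.get("state")));
        int reads = byState.values().stream().mapToInt(List::size).sum();
        return new int[] {byState.size(), reads};
    }
}
```

With three places in one state and two in another, both layouts read 5 documents, but the per-state layout needs 2 queries instead of 1.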

Furthermore, regarding saving a list of places in a user's profile vs. storing only their IDs, it's a matter of measurement. You should measure how often the details of the places change. Remember that if a place changes, you need to update that data everywhere it exists. So if places don't change often, you can save the entire place object; otherwise, save only the ID.

How to structure Cloud Firestore data for an Instagram clone?

In the Cloud Firestore I have a collection of Users and I'm not sure how I should store the pictures.

We usually structure a Firestore database according to the queries that we want to perform. So if you have a clear picture of what the queries should be, then building the database schema might be very easy. Please also remember that there is no "perfect", "the best" or "the correct" solution for structuring a Cloud Firestore database. You build the structure according to the use-cases of your app.

Should I have a collection of posts at the same level as the collection of users or should each User document point to a sub-collection of posts?

Queries in Firestore are shallow, meaning that a query only returns documents from the collection it is run against. So it doesn't really matter whether you have a top-level collection or a nested sub-collection inside a document: a regular query always returns documents from a single collection (collection group queries, covered below, span every collection with the same ID).

I feel like having each User Document point to a Sub-Collection of Posts makes more sense in terms of the organization of the Database.

Yes, that's right, if you add a sub-collection called "posts" under each User document, your schema will look more organized.

But at the same time, I think it would be more difficult to get the Posts from the Database, without specifying which user I'm getting the sub-collection from.

It doesn't! It makes no difference whether you add it as a top-level collection or as a sub-collection; you'll be able to query both very easily.

Suppose you have a schema that looks like this:

Firestore-root
  |
  --- users (collection)
  |    |
  |    --- $uid (document)
  |         |
  |         --- //fields
  |
  --- posts (collection)
       |
       --- $postId (document)
            |
            --- uid: "$uid"

To get all existing posts, you need to use the following CollectionReference object:

FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
CollectionReference postsRef = rootRef.collection("posts");

And to get all the posts that correspond to a particular user, you need to use the following query:

Query query = postsRef.whereEqualTo("uid", uid);

However, if your database schema looks like this:

Firestore-root
  |
  --- users (collection)
       |
       --- $uid (document)
       |    |
       |    --- posts (collection)
       |         |
       |         --- $postId (document)
       |
       --- $uid (document)
            |
            --- posts (collection)
                 |
                 --- $postId (document)

To get all existing posts, you need to use a collection group query that looks like this:

FirebaseFirestore rootRef = FirebaseFirestore.getInstance();
Query postsRef = rootRef.collectionGroup("posts");

And to get all the posts that correspond to a particular user, you need to use the following CollectionReference:

CollectionReference postsRef = rootRef.collection("users").document(uid).collection("posts");
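To see why a collection group query finds posts without naming a user, it helps to look at document paths. This plain-Java sketch, using a hypothetical `PostPaths` helper, treats paths as strings: any document whose immediate parent collection is called `posts` belongs to the `posts` collection group (including a top-level `posts` collection), and for posts nested under users the owner's uid is the segment two levels above the post. With the Android SDK, the owner of such a result `doc` can be read as `doc.getReference().getParent().getParent().getId()`.

```java
// Sketch: a collection group query on "posts" matches documents at any path
// whose last collection is "posts", e.g. "users/{uid}/posts/{postId}".
class PostPaths {

    // True if the document path belongs to the collection group with the given ID.
    static boolean inCollectionGroup(String docPath, String collectionId) {
        String[] segments = docPath.split("/");
        // Document paths have an even number of segments; the collection ID is second to last.
        return segments.length % 2 == 0 && segments[segments.length - 2].equals(collectionId);
    }

    // Returns the uid of the user owning a post at "users/{uid}/posts/{postId}", or null.
    static String ownerUid(String docPath) {
        String[] segments = docPath.split("/");
        if (segments.length == 4 && segments[0].equals("users") && segments[2].equals("posts")) {
            return segments[1];
        }
        return null;
    }
}
```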

So it's up to you to decide which solution is better from this perspective.

Firebase Firestore database structure

is there an additional cost of querying this kind of massive collection?

The cost and performance of reading from Firestore are purely based on the amount of data (number of documents and their size) you retrieve, and not in any way on the number of documents in the collection.

But what is limited in Firestore is the number of writes you can do to data that is "close to each other". That intentionally vague definition means that it's typically better for write scalability to spread the data over separate subcollections, if the data naturally lends itself to that (such as in your case).

To get a great introduction to Firestore, and to data modeling trade-offs, watch Getting to know Cloud Firestore.

Firestore - what is the best data structure for my case (performance/price)?

There is no singular best structure here, it all depends on the use-cases of your app.

The performance of a read operation in Firestore purely depends on the amount of data retrieved, and not on the size of the database. So it makes no difference if you read 20 user documents from a collection of 100 documents in total, or if there are 100 million documents in there - the performance will be the same.

What does make a marginal difference is the number of API calls you need to make. So loading 20 user documents with 20 calls will be slower than loading them with 1 call. But if you use a single collection group query to load the documents from multiple collections with the same name, the performance is the same again, as you're loading 20 documents with a single API call.

The cost is also going to be the same, as you pay for the number of documents read and the bandwidth consumed by those documents, which is the same in these scenarios.

I highly recommend watching the Getting to know Cloud Firestore video series to learn more about data modeling considerations and pricing when using Firestore.

Changing data structure after app has been published - Firestore

So the approach I have used is document versioning. There is an explanation here.

You basically version your documents, so when your app reads them, it knows how to update those documents to get them to the desired version. So in your case, you would have no version and need to get to version 1, which means moving the sub-collection data up to the top-level collection and removing the sub-collection before working with the document.
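A minimal plain-Java sketch of this read-time upgrade is shown below, under a hypothetical schema: a document without a `version` field is version 0 and keeps its child items in a sub-collection, while version 1 keeps them in a top-level collection where each item carries a `parentId` field. The lists stand in for the two collections; this is an illustration of the versioning idea, not the approach from the linked explanation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of read-time document versioning. Hypothetical schema:
// - version 0 (no "version" field): child items live in a sub-collection;
// - version 1: child items live in a top-level collection, each tagged with "parentId".
class DocumentVersioning {

    // Upgrades one document to version 1 if needed and returns the upgraded copy.
    // The lists stand in for the old sub-collection and the new top-level collection.
    static Map<String, Object> upgrade(String docId, Map<String, Object> doc,
                                       List<Map<String, Object>> subCollection,
                                       List<Map<String, Object>> topLevel) {
        Map<String, Object> result = new HashMap<>(doc);
        int version = ((Number) result.getOrDefault("version", 0)).intValue();
        if (version < 1) {
            // v0 -> v1: copy every sub-collection document up, then remove the sub-collection.
            for (Map<String, Object> child : subCollection) {
                Map<String, Object> moved = new HashMap<>(child);
                moved.put("parentId", docId);
                topLevel.add(moved);
            }
            subCollection.clear();
            result.put("version", 1);
        }
        // Further steps (v1 -> v2, ...) would follow here, one version at a time.
        return result;
    }
}
```

Because an already upgraded document is left alone, the app can safely call this on every read. In a real app the caller would also write the result back and delete the old sub-collection documents, ideally in one batched write so the change is applied atomically.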

Yes, it is more work, but it allows an iterative approach to document changes. Sometimes, instead, a script is written to update all documents to the desired state and new code is deployed. That usually happens when someone wants it done yesterday, and with many documents it can have its own issues.
