At What Point Is It Worth Using a Database

At what point is it worth using a database?

A database is worthwhile when:

  1. Your application evolves to some
    form of data driven execution.
  2. You're spending time designing and
    developing external data storage
    structures.
  3. Sharing data between applications or
    organizations (including individual
    people)
  4. The data is no longer short and
    simple.
  5. Data Duplication

Evolution to Data Driven Execution
When the data is changing but the execution is not, this is a sign of a data driven program or parts of the program are data driven. A set of configuration options is a sign of a data driven function, but the whole application may not be data driven. In any case, a database can help manage the data. (The database library or application does not have to be huge like Oracle, but can be lean and mean like SQLite).

Design & Development of External Data Structures
Posting questions to Stack Overflow about serialization or converting trees and lists to use files is a good indication your program has graduated to using a database. Also, if you are spending any amount of time designing algorithms to store data in a file or designing the data in a file is a good time to research the usage of a database.

Sharing Data
Whether your application is sharing data with another application, another organization or another person, a database can assist. By using a database, data consistency is easier to achieve. One of the big issues in problem investigation is that teams are not using the same data. The customer may use one set of data; the validation team another and development using a different set of data. A database makes versioning the data easier and allows entities to use the same data.

Complex Data
Programs start out using small tables of hard coded data. This evolves into using dynamic data with maps, trees and lists. Sometimes the data expands from simple two columns to 8 or more. Database theory and databases can ease the complexity of organizing data. Let the database worry about managing the data and free up your application and your development time. After all, how the data is managed is not as important as to the quality of the data and it's accessibility.

Data Duplication
Often times, when data grows, there is an ever growing attraction for duplicate data. Databases and database theory can minimize the duplication of data. Databases can be configured to warn against duplications.

Moving to using a database has many factors to be considered. Some include but are not limited to: data complexity, data duplication (including parts of the data), project deadlines, development costs and licensing issues. If your program can run more efficiently with a database, then do so. A database may also save development time (and money). There are other tasks that you and your application can be performing than managing data. Leave data management up to the experts.

When/why should I start using a database?

I would recommend using a lightweight database like SQLite which reduces a lot of the overhead of using a database.

http://sqlite.org

Is it faster to access data from files or a database server?

I'll add to the it depends crowd.

This is the kind of question that has no generic answer but is heavily dependent on the situation at hand. I even recently moved some data from a SQL database to a flat file system because the overhead of the DB, combined with some DB connection reliability issues, made using flat files a better choice.

Some questions I would ask myself when making the choice include:

  1. How am I consuming the data? For example will I just be reading from the beginning to the end rows in the order entered? Or will I be searching for rows that match multiple criteria?

  2. How often will I be accessing the data during one program execution? Will I go once to get all books with Salinger as the author or will I go several times to get several different authors? Will I go more than once for several different criteria?

  3. How will I be adding data? Can I just append a row to the end and that's perfect for my retrieval or will it need to be resorted?

  4. How logical will the code look in six months? I emphasize this because I think this is too often forgotten in designing things (not just code, this hobby horse is actually from my days as a Navy mechanic cursing mechanical engineers). In six months when I have to maintain your code (or you do after working another project) which way of storing and retrieving data will make more sense. If going from flat files to a DB results in a 1% efficiency improvement but adds a week of figuring things out when you have to update the code have you really improved things.

Is there ever a time where using a database 1:1 relationship makes sense?

A 1:1 relationship typically indicates that you have partitioned a larger entity for some reason. Often it is because of performance reasons in the physical schema, but it can happen in the logic side as well if a large chunk of the data is expected to be "unknown" at the same time (in which case you have a 1:0 or 1:1, but no more).

As an example of a logical partition: you have data about an employee, but there is a larger set of data that needs to be collected, if and only if they select to have health coverage. I would keep the demographic data regarding health coverage in a different table to both give easier security partitioning and to avoid hauling that data around in queries unrelated to insurance.

An example of a physical partition would be the same data being hosted on multiple servers. I may keep the health coverage demographic data in another state (where the HR office is, for example) and the primary database may only link to it via a linked server... avoiding replicating sensitive data to other locations, yet making it available for (assuming here rare) queries that need it.

Physical partitioning can be useful whenever you have queries that need consistent subsets of a larger entity.

NoSQL worthwhile usage

I wonder what "concept of NoSQL" you mean, because it is an umbrella term for a wide field of different database technologies. The only thing they have in common is what sets them apart from each other: they are "not (only) SQL". They have widely different philosophies, use-cases and target groups.

Just to give you an overview, here are a few of the large factions of NoSQL databases.

There are document-based databases like MongoDB or CouchDB. Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very differently. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.

There are graph databases like Neo4j or GiraffeDB. Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.

Then you have simple key-value stores like MemcacheDB, Cassandra or Google's BigTable. They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.

And these are just a few facets of the new database world.

But there is still one sector where relational databases excel, and that's when it comes to following the ACID principle. Most NoSQL databases don't fully guarantee all four of these:

  • Atomic transactions (chains of commands which are processed together, n-order and all-or-none)
  • Consistent database schema with constraints and triggers which ensure that garbage data can not exist in the database.
  • Isolation of transactions - transactions which are guaranteed to be unaffected by others which happen at the same time.
  • Durability - safety from data-loss even in case of a sudden system crash*

(* to be fair, most of the databases listed above are indeed pretty durable, especially those which are easy to set up as redundant fail-over clusters.

When should I use a NoSQL database instead of a relational database? Is it okay to use both on the same site?

Relational databases enforces ACID. So, you will have schema based transaction oriented data stores. It's proven and suitable for 99% of the real world applications. You can practically do anything with relational databases.

But, there are limitations on speed and scaling when it comes to massive high availability data stores. For example, Google and Amazon have terabytes of data stored in big data centers. Querying and inserting is not performant in these scenarios because of the blocking/schema/transaction nature of the RDBMs. That's the reason they have implemented their own databases (actually, key-value stores) for massive performance gain and scalability.

NoSQL databases have been around for a long time - just the term is new. Some examples are graph, object, column, XML and document databases.

For your 2nd question: Is it okay to use both on the same site?

Why not? Both serves different purposes right?

Why do you create a View in a database?

A view provides several benefits.

1. Views can hide complexity

If you have a query that requires joining several tables, or has complex logic or calculations, you can code all that logic into a view, then select from the view just like you would a table.

2. Views can be used as a security mechanism

A view can select certain columns and/or rows from a table (or tables), and permissions set on the view instead of the underlying tables. This allows surfacing only the data that a user needs to see.

3. Views can simplify supporting legacy code

If you need to refactor a table that would break a lot of code, you can replace the table with a view of the same name. The view provides the exact same schema as the original table, while the actual schema has changed. This keeps the legacy code that references the table from breaking, allowing you to change the legacy code at your leisure.

These are just some of the many examples of how views can be useful.



Related Topics



Leave a reply



Submit