Database EAV Pros/Cons and Alternatives

This is not to be considered an exhaustive answer, but just a few points on the topic.

Since the question is also tagged with the [sql] tag, let me say that, in general, relational databases aren't particularly suitable for storing data using the EAV model. You can still design an EAV model in SQL, but you will have to sacrifice many of the advantages that a relational database would give you. Not only will you be unable to enforce referential integrity, use SQL data types for values, or enforce mandatory attributes, but even very basic queries can become difficult to write. In fact, to work around this limitation, several EAV solutions rely on data duplication instead of joining with related tables, which, as you can imagine, has plenty of drawbacks.
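To make the "basic queries become difficult" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The product_eav table, its columns, and the sample rows are made up for illustration; they are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical EAV table: every value is stored as text.
    CREATE TABLE product_eav (
        entity_id INTEGER NOT NULL,
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    );
    INSERT INTO product_eav VALUES
        (1, 'name',  'Widget'),
        (1, 'price', '9.99'),
        (1, 'color', 'red'),
        (2, 'name',  'Gadget'),
        (2, 'price', '24.50');
""")

# "Name and price of every product under 10" needs one self-join per
# attribute plus a cast, because the price is only a string here.
rows = conn.execute("""
    SELECT n.value AS name, CAST(p.value AS REAL) AS price
    FROM product_eav AS n
    JOIN product_eav AS p
      ON p.entity_id = n.entity_id AND p.attribute = 'price'
    WHERE n.attribute = 'name'
      AND CAST(p.value AS REAL) < 10
""").fetchall()
print(rows)
```

With a conventional product table the same report would be a single-table SELECT with a WHERE clause on a typed price column, and the database itself could reject a non-numeric price.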

If you really require a schemaless design, "allowing an unlimited number of attributes", your best bet is probably to use a NoSQL solution. Even though the weaknesses of EAV relative to relational databases also apply to NoSQL alternatives, you will be offered additional features that are difficult to achieve with conventional SQL databases. For example, NoSQL datastores can usually be scaled much more easily than relational databases, simply because they were designed to solve some sort of scalability problem, and they intentionally dropped features that make scaling difficult.

Many cloud computing platforms (such as those offered by Amazon, Google and Microsoft) feature datastores based on the EAV model, where an arbitrary number of attributes can be associated with a given entity. If you are considering deploying your application to the cloud, you may see this as both a business advantage and a technical one, because the strong competition between the big vendors is pushing value-to-cost ratios to very high levels, by continually adding features while lowering the financial and implementation costs.

Entity Attribute Value Database vs. strict Relational Model Ecommerce

There are a few general pros and cons I can think of; there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modelled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Con: good performance generally, unless you start needing to search by or report by the custom attributes

* I'm not sure if Option 3 would necessarily save any time in the design phase.
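A rough sketch of what Option 3 can look like, again with Python's sqlite3. The customer and customer_extra tables and the loyalty_tier attribute are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Core entity modelled "properly" with typed, constrained columns.
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE
    );
    -- Extension table holding only the user-defined attributes.
    CREATE TABLE customer_extra (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        attribute   TEXT    NOT NULL,
        value       TEXT,
        PRIMARY KEY (customer_id, attribute)
    );
    INSERT INTO customer VALUES (1, 'Ann', 'ann@example.com'),
                                (2, 'Bob', 'bob@example.com');
    INSERT INTO customer_extra VALUES (1, 'loyalty_tier', 'gold');
""")

# Reports on the modelled columns stay plain SQL.
core = conn.execute("SELECT name FROM customer ORDER BY name").fetchall()

# Including a custom attribute needs one outer join per attribute.
with_extra = conn.execute("""
    SELECT c.name, x.value AS loyalty_tier
    FROM customer AS c
    LEFT JOIN customer_extra AS x
      ON x.customer_id = c.id AND x.attribute = 'loyalty_tier'
    ORDER BY c.name
""").fetchall()
print(core, with_extra)
```

The core table keeps its constraints and simple queries; the extra join (and the untyped value column) only shows up once a custom attribute enters the report, which matches the trade-offs listed above.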

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.

Alternatives to the EAV model vs Hybrid Strategy vs simplifying and improving builds

After some thought, and considering the client's needs/requests, using an EAV model was the correct answer here.

After doing some more research I decided to use PostgreSQL and make full use of its hstore data type, which allows storing, searching, and indexing of key/value pairs in a single field.

Here is a paper benchmarking hstore against an EAV table, in which hstore came out way ahead:
http://wiki.hsr.ch/Datenbanken/files/Benchmark_of_KVP_vs.hstore-_doc.pdf

Another option we considered was having a task table that covered all the bases:

id, name, value_1, value_2... note_1, notes_2

Obviously the thought of that killed me inside a bit, so the alternative I was going to use was a task_type attribute table:

A task is prescribed by an administrator to a user and has a task_type; the task_type_attributes apply to all tasks of that type (i.e., they define that for an exercise task, we want to be able to store information about the intensity of the exercise, the time the exercise took, etc.).

Once the user brings up the task, they see the task_attributes as fields to fill out. They fill in these fields, and the attribute values they enter are then associated with the task_entry of the patient (which also states whether they completed it, skipped it, etc.).

task_attributes

  • id
  • task_type_id
  • attribute
  • attribute_value_type (for generating the desired fields on the app side, i.e., knowing to show a dropdown vs a text input)
  • min_value
  • max_value
  • required

task_entry_values

  • task_entry_id
  • task_type_attribute_id
  • value
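For what it's worth, here is a minimal sketch of how the app side could validate submitted values against the task_attributes rows. The field names follow the list above; the "number"/"text" type names, the error messages, and the sample exercise attributes are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskAttribute:
    id: int
    task_type_id: int
    attribute: str
    attribute_value_type: str          # assumed values: "number" or "text"
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    required: bool = False

def validate_entry(attributes, submitted):
    """Return a list of error messages for one task entry.

    `submitted` maps attribute id -> raw string typed in by the user.
    """
    errors = []
    for attr in attributes:
        raw = submitted.get(attr.id, "").strip()
        if not raw:
            if attr.required:
                errors.append(f"{attr.attribute}: required")
            continue
        if attr.attribute_value_type == "number":
            try:
                number = float(raw)
            except ValueError:
                errors.append(f"{attr.attribute}: not a number")
                continue
            if attr.min_value is not None and number < attr.min_value:
                errors.append(f"{attr.attribute}: below {attr.min_value}")
            if attr.max_value is not None and number > attr.max_value:
                errors.append(f"{attr.attribute}: above {attr.max_value}")
    return errors

# Hypothetical attributes for an "exercise" task type.
exercise = [
    TaskAttribute(1, 10, "intensity", "number", 1, 10, required=True),
    TaskAttribute(2, 10, "duration_minutes", "number", 0, None, required=True),
    TaskAttribute(3, 10, "notes", "text"),
]
errors = validate_entry(exercise, {1: "12", 3: "felt fine"})
print(errors)
```

This is the "complex code required to validate simple data types" cost in practice: checks that a typed column with a CHECK constraint would give for free have to live in application code.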

Hope this might be of use to someone. I'd also be interested in any and all criticism/feedback for this design.

Database Design: to EAV or not to EAV?

Typically, lots of empty cells are cheap and not worth normalizing away. The only drawback to #2 is if you have a very large number of rows (millions, where performance problems could arise), a very large number of columns (more than about 20, where it's just annoying to look at the data), or a number of unique constraints on the EAV table.

With that said, it is now 2011, and it makes sense to use a programming framework with a database abstraction layer so that you're not designing database relationships directly. Something like Django's Object Relational Mapper allows you to focus on the models themselves and lets best practices take care of themselves (95% of the time). This tutorial will help you get started. Django only applies to database modeling for web development; for non-web environments, other frameworks will be better.

Database design: EAV options?

Although minimalist as shown, the attribute table of Model2 introduces the concept of meta-data into the mix, with all the good that comes from it. There are other advantages to Model2, for example the performance gains associated with smaller row size (of the Value table), but I'd like to focus on the meta-data concept.

Even as-is, Model2's attribute table constitutes a repository of all valid attributes (with Model1 one would need to run an aggregate query of sorts to get such a list). Also, as-is, the repository is sufficient to introduce foreign key constraints that help maintain the integrity of the dataset (with Model1 one would need external forms of validation for the values stored in the attribute column).
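As an illustration of those two points, here is a small sketch with Python's sqlite3. The models from the question are not reproduced here, so the table and column names (attribute, entity_value) are assumptions about what a Model2-style layout might look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Repository of all valid attributes.
    CREATE TABLE attribute (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Value table referencing the attribute by id instead of by free text.
    CREATE TABLE entity_value (
        entity_id    INTEGER NOT NULL,
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value        TEXT,
        PRIMARY KEY (entity_id, attribute_id)
    );
    INSERT INTO attribute (id, name) VALUES (1, 'color'), (2, 'weight');
    INSERT INTO entity_value VALUES (100, 1, 'red');
""")

# The list of valid attributes is a plain lookup, no aggregate query needed.
valid = [name for (name,) in conn.execute("SELECT name FROM attribute ORDER BY id")]

# A value pointing at an unknown attribute is rejected by the database itself.
try:
    conn.execute("INSERT INTO entity_value VALUES (100, 99, 'oops')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(valid, rejected)
```

With a free-text attribute column, the same typo or unknown attribute would be stored silently and only show up later as a report that is missing rows.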

With a few simple additions, the attribute table can become a versatile repository which can be used for various purposes. For example, the table may include some of the following:

  • info such as the display-friendly name of each attribute
  • some flags indicating the type of field (numeric vs. string vs. date etc.), for differentiated handling / processing
  • the particular Value table where the underlying attribute is stored (the model only shows one table, but optimization/scaling sometimes prompts splitting the tables)
  • the fact that the attribute may be stored as its own column in the "Value" table (again a form of optimization, essentially getting the best of both worlds: the schema flexibility of the EAV model, but the performance of the traditional relational model for the attributes that are the most used and/or the most common to all entities)
  • the ability to rename attributes, without disturbing the main table. Changes at meta-data level only.
  • various application-oriented semantics. For example indicators that a particular attribute should be offered as one of the basic vs. advanced search fields.

In a nutshell, the attribute table becomes a resource which allows the application to be truly data-driven (or more precisely, metadata-driven). Indeed, you may also want an entity table, i.e. one where the metadata pertaining to the various entity types is gathered: which entity types exist, which attributes are allowed for which entity type, etc.
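A small sketch of what "metadata-driven" can mean on the application side. The attribute names, the display_name and data_type fields, and the three type flags are all hypothetical; in a real system they would be loaded from the attribute table rather than hard-coded.

```python
from datetime import date

# Hypothetical rows of the attribute table: name -> metadata.
ATTRIBUTES = {
    "weight_kg": {"display_name": "Weight (kg)",  "data_type": "numeric"},
    "color":     {"display_name": "Color",        "data_type": "string"},
    "released":  {"display_name": "Release date", "data_type": "date"},
}

# One converter per type flag; supporting a new type means adding one entry.
CONVERTERS = {
    "numeric": float,
    "string": str,
    "date": date.fromisoformat,
}

def decode(rows):
    """Turn raw (attribute, text value) rows into labelled, typed values."""
    result = {}
    for name, raw in rows:
        meta = ATTRIBUTES[name]
        result[meta["display_name"]] = CONVERTERS[meta["data_type"]](raw)
    return result

decoded = decode([("weight_kg", "1.5"), ("released", "2011-03-01")])
print(decoded)
```

The application code never mentions a specific attribute; renaming one for display, or adding a new one, is a change to the metadata only.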

Now... do pay heed to the comment from zerkms, below the question itself. For all its benefits, the EAV model also comes with its share of drawbacks and challenges: as hinted, the complexity of the queries comes to mind, as do performance issues. These concerns should not, however, disqualify EAV a priori: there are many use cases where EAV is a better approach.
Assuming EAV is the choice, then Model2, or even something slightly more sophisticated, is definitely superior to Model1.

Is this a good case to use EAV or no

There's no short good-or-bad answer to this question, because it depends on many things.

  • Do you have a lot of product types?
  • How do you think each of them will evolve (think about what will happen when you add new fields to products)?
  • Do you need to handle "variants" of the products?
  • Do you intend to add entirely new types of products?

Etc.
EAV is probably a good way to go if you answer "yes" to some or all of these questions.

Regarding C#, I have implemented an EAV data catalog with it in the past, using Entity Framework over SQL Server (so an RDBMS).
It worked well for me.

But if you need to handle a lot of products, performance can quickly become an issue. You could also look at a "NoSQL" solution; have you considered that?

Just keep in mind that your object model does not have to match your data model.
For example, you could perfectly well have a strongly typed object for each type of product if you need one.
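To illustrate the idea (in Python rather than C#, purely as a sketch), the storage can stay EAV while the code works with a typed object. The Book class, its fields, and the sample rows are invented for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Book:
    title: str
    price: Decimal
    pages: int

def book_from_eav(rows):
    """Build a typed Book from the (attribute, value) rows of one entity."""
    raw = dict(rows)
    # A missing attribute raises KeyError and a malformed value raises a
    # conversion error, so bad data surfaces at the mapping boundary.
    return Book(
        title=raw["title"],
        price=Decimal(raw["price"]),
        pages=int(raw["pages"]),
    )

book = book_from_eav([
    ("title", "Example Book"),
    ("price", "19.90"),
    ("pages", "320"),
])
print(book)
```

The rest of the application then deals with Book instances and never sees the attribute/value rows.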


