Entity Attribute Value Database Vs. Strict Relational Model Ecommerce

Entity Attribute Value Database vs. strict Relational Model Ecommerce

There's a few general pros and cons I can think of, there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even
    be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple
    reports
  • Con: complex reports can become almost
    impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather
    requirements and design
  • Con: new entities must be modelled and
    designed by a professional
  • Con: custom interface components for each
    entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to
    understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Con: good performance generally, unless you start need to search by or report by the custom attributes

* I'm not sure if Option 3 would necessarily save any time in the design phase.

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.

EAV - in an ecommerce case, is it really an anti-pattern?

It's useful to understand why EAV is so often considered an anti-pattern, and how well those criticisms apply to your case. It's also useful to understand why so many people are drawn to EAV, and what positive benefits they see in it.

The biggest problem with most EAV databases is that it is nearly impossible to write any kind of systematic and integrated extraction or reporting system, in order to turn the data into useful information.

The same sort of report that can be developed in less than an hour's work when given well managed data in a well designed relational database can take weeks to develop in an EAV database. The reason is that the data is fundamentally unanalyzed data that has been stored without regard for its inherent logical structure.

This turns out to be very closely related to the reason why some other people are so drawn to EAV. You can bypass data analysis completely. And logical database design is practically automatic, since all EAV databases have the same table structure. When you've built one, you've built them all.

This gives you a handle on a few things to look for in any case you might consider. Is the logical structure of the database truly dynamic and unknowable? Or are people just jumping to that conclusion, because the schedule allows no time for data analysis and database design?

How will the data be used? Are people expecting to do the kind of retrieval that is classically expected of data in a database? Or are they planning on treating each attempted retrieval as an exploration into uncharted territory?

In your particular case, I would ask how long it's going to be before somebody "important" asks for product data to be combined across companies, even though each of them maintains its own knowledge base of products and attributes? If that's really never going to happen, maybe you will be ok. If not, you'd better move on before management realizes how hard it is to use this data.

Should I use EAV model?

Great question, but of course, there is no "one true way". As per @BenV, Magento does use the EAV model. My experience with it has been overwhelmingly positive, however it does trip up other users. Some considerations:

1. Performance.
EAV requires complex, multi-table joins to populate your object with the relevant attributes. That does incur a performance hit. However, that can be mitigated through careful caching (at all levels through the stack, including query caching) and the selective use of denormalization. Magento does allow administrators to select a denormalized model for categories and products where the number of SKUs warrants it (generally in the thousands). That in turn requires Observers that trigger re-indexing (always good!) and updates to the "flat" denormalized tables when product data changes. That can also be scheduled or manually triggered with a prompt to the administrator.

2. 3rd Party User Complexity
If you ever plan to make this application available to other users, many will find EAV too complex and you'll end up dealing with a lot of bleating and uninformed abuse on the user forums (ref Magento!!).

3. Future extensibility and plugin architecture.
There is no doubt that the EAV model really comes into it's own when extensibility is a factor. It is very simple to add new attributes into the model while minimizing the risk of breaking existing ORM and controller code.

4. Changes in datatype
EAV does make it a little harder to alter attribute datatypes. If your initial design calls for a particular attribute datatype that changes in future (say int to varchar), it means that you will have to migrate all the records for that attribute to the corresponding table that matches the new datatype. Of course, purists would suggest that you get the design right first time, but reality does intrude sometimes!

5. Manual product imports
One thing that EAV makes almost impossible is importing products (or other entities) into the database using SQL and/or phpMyAdmin-style CSV/XML. You'll need to write an Importer module that accepts the structured data and passes it through the application's Model layer to persist it to the database. That does add to your complexity.

Database Design for ECommerce project (Should I use EAV Approach)

What you need is a combination of EAV for product features and nested sets for product categories.

While I certainly agree that EAV is almost always a bad choice, one application where EAV is the perfect choice is for handling product attributes in an online catalog.

Think about how websites show product attributes... The attributes of products are always shown as a vertical list with two columns: "Attribute" | "Value". Sometimes these lists show side-by-side comparisons of multiple products. EAV works perfectly for doing this kind of thing. The things that make EAV meaningless and inefficient for most applications are exactly what makes EAV meaningful and efficient for product attributes in an online catalog.

One of the reasons why everyone always says "EAV is EVIL!" is that the attributes in EAV are "meaningless" insofar as the column name (i.e. meaning of the attribute) is table-driven and is therefore not defined by the schema. The whole point of schemas is to give your model meaning so this point is well taken. However in the case of an online product catalog, the meaning of product attributes is really unimportant to the system, itself. The only reason your catalog system cares about product attributes is to dump them in a list or possibly in a product comparison matrix. Therefore EAV is doesn't happen to be evil in this particular case.

For product categories, you want a nested set model, as I described in the answer to this question. Nested sets give you very quick retrieval along with the ability to traverse multiple levels of an unbalanced hierarchy at the expense of some precalculation effort at edit time.

Entity attribute value model - Performance alternative?

First, sometimes this model makes querying data much easier. I asked a question couple of days ago here and some users suggested why I didn't change my model to a 1NF form to make querying data easier. Only when they realized I was stuck with this design, they provided some answers to the question. The point is I was lucky enough to have only 12 columns to be summed up; otherwise, if my table contained 300 columns, perhaps no user bothered themselves to write a query for that problem. :-)

Second, sometimes the implementation of this design is easier due to some limitations naturally imposed by databases. If your meta_key values contain some lengthy values larger than 30 characters, either you have to shorten the values and do the mapping somewhere or this would possibly be the only option you could have.

Finally, performance is very important; that's true. But, on the other hand, there are certain techniques you could apply to improve the performance; such as by creating proper indexes, partitioning tables, and so on.

In this case, the table sizes are very small. So, unless your queries are very complicated such as having heavy calculations and complicated joins and aggregations, and if the application is not sensitive to small time fractions, I guess you wouldn't suffer from performance if adopted this model.

At the end, if you are still too much concerned about the performance, I would suggest create both models, populate them with some random or real data, and analyze the plan costs to see what model better suits your needs.



Related Topics



Leave a reply



Submit