Should I Use EAV Model

Should I use EAV model?

Great question, but of course, there is no "one true way". As per @BenV, Magento does use the EAV model. My experience with it has been overwhelmingly positive; however, it does trip up other users. Some considerations:

1. Performance.
EAV requires complex, multi-table joins to populate your object with the relevant attributes. That does incur a performance hit. However, that can be mitigated through careful caching (at all levels through the stack, including query caching) and the selective use of denormalization. Magento does allow administrators to select a denormalized model for categories and products where the number of SKUs warrants it (generally in the thousands). That in turn requires Observers that trigger re-indexing (always good!) and updates to the "flat" denormalized tables when product data changes. That can also be scheduled or manually triggered with a prompt to the administrator.

2. 3rd Party User Complexity
If you ever plan to make this application available to other users, many will find EAV too complex and you'll end up dealing with a lot of bleating and uninformed abuse on the user forums (ref Magento!!).

3. Future extensibility and plugin architecture.
There is no doubt that the EAV model really comes into its own when extensibility is a factor. It is very simple to add new attributes to the model while minimizing the risk of breaking existing ORM and controller code.

4. Changes in datatype
EAV does make it a little harder to alter attribute datatypes. If your initial design calls for a particular attribute datatype that changes in future (say int to varchar), it means that you will have to migrate all the records for that attribute to the corresponding table that matches the new datatype. Of course, purists would suggest that you get the design right first time, but reality does intrude sometimes!

5. Manual product imports
One thing that EAV makes almost impossible is importing products (or other entities) into the database using SQL and/or phpMyAdmin-style CSV/XML. You'll need to write an Importer module that accepts the structured data and passes it through the application's Model layer to persist it to the database. That does add to your complexity.
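
To make points 1 and 4 concrete, here is a minimal sketch in Python with sqlite3. The table and column names (entity, attribute, value_int, value_varchar, entity_flat) are hypothetical and are not Magento's actual schema. It only illustrates the general shape: one value table per datatype, a join per table to populate an object, and a "flat" table rebuilt by a re-index step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, sku TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, code TEXT, backend_type TEXT);
-- One value table per datatype (an assumption of this sketch). Changing an
-- attribute's datatype means moving its rows from one table to another.
CREATE TABLE value_int (entity_id INTEGER, attribute_id INTEGER, value INTEGER);
CREATE TABLE value_varchar (entity_id INTEGER, attribute_id INTEGER, value TEXT);

INSERT INTO entity VALUES (1, 'HDD-001');
INSERT INTO attribute VALUES (1, 'name', 'varchar'), (2, 'rpm', 'int');
INSERT INTO value_varchar VALUES (1, 1, 'Fast disk');
INSERT INTO value_int VALUES (1, 2, 10000);
""")

def load_entity(entity_id):
    """Populate one entity by querying every typed value table."""
    row = conn.execute("SELECT sku FROM entity WHERE id = ?", (entity_id,)).fetchone()
    data = {"sku": row[0]}
    for table in ("value_int", "value_varchar"):
        rows = conn.execute(
            f"SELECT a.code, v.value FROM {table} v "
            "JOIN attribute a ON a.id = v.attribute_id WHERE v.entity_id = ?",
            (entity_id,),
        )
        data.update(dict(rows))
    return data

def rebuild_flat():
    """Re-index step: rewrite the denormalized table from the EAV tables."""
    conn.execute("DROP TABLE IF EXISTS entity_flat")
    conn.execute(
        "CREATE TABLE entity_flat "
        "(entity_id INTEGER PRIMARY KEY, sku TEXT, name TEXT, rpm INTEGER)"
    )
    for (entity_id,) in conn.execute("SELECT id FROM entity").fetchall():
        d = load_entity(entity_id)
        conn.execute(
            "INSERT INTO entity_flat VALUES (?, ?, ?, ?)",
            (entity_id, d["sku"], d.get("name"), d.get("rpm")),
        )

rebuild_flat()
product = load_entity(1)
flat_row = conn.execute(
    "SELECT sku, name, rpm FROM entity_flat WHERE entity_id = 1"
).fetchone()
```

Reads against entity_flat need no joins, at the cost of calling rebuild_flat (or an incremental equivalent) whenever product data changes.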

Should I use EAV database design model or a lot of tables

EAV is rarely a win. In your case I can see the appeal of EAV, given that different categories will have different attributes and this will be hard to manage otherwise. However, suppose someone wants to search for "all hard drives with more than 3 platters, using a SATA interface, spinning at 10k rpm". Your query in EAV will be painful. If you ever want to support a query like that, EAV is out.
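
As a rough illustration of why that query hurts, here is a sketch in Python with sqlite3 against a hypothetical single-table EAV layout (product_attribute with a text value column; the names and sample data are made up). Each condition needs its own self-join, and each numeric comparison needs a cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_attribute (product_id INTEGER, attribute TEXT, value TEXT);
INSERT INTO product VALUES (1, 'Drive A'), (2, 'Drive B'), (3, 'Drive C');
INSERT INTO product_attribute VALUES
  (1, 'platters', '4'), (1, 'interface', 'SATA'), (1, 'rpm', '10000'),
  (2, 'platters', '2'), (2, 'interface', 'SATA'), (2, 'rpm', '10000'),
  (3, 'platters', '5'), (3, 'interface', 'SAS'),  (3, 'rpm', '10000');
""")

# One self-join per condition, plus a cast for every numeric comparison,
# because all values share a single text column.
query = """
SELECT p.name
FROM product p
JOIN product_attribute pl ON pl.product_id = p.id AND pl.attribute = 'platters'
JOIN product_attribute i  ON i.product_id  = p.id AND i.attribute  = 'interface'
JOIN product_attribute r  ON r.product_id  = p.id AND r.attribute  = 'rpm'
WHERE CAST(pl.value AS INTEGER) > 3
  AND i.value = 'SATA'
  AND CAST(r.value AS INTEGER) = 10000
"""
matches = [name for (name,) in conn.execute(query)]
```

With a conventional hard_drives table the same search would be a single WHERE clause over three typed columns.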

There are other approaches, however. You could consider an XML field with extended data or, if you are on PostgreSQL 9.2 or later, a JSON field (XML is easier to search though). This would give you a significantly larger range of possible searches without the headaches of EAV. The tradeoff would be that schema enforcement would be harder.

Is this a good case to use EAV or no

There's no short good-or-bad answer to this question, because it depends on many things.

  • Do you have a lot of product types?
  • How do you think each of them will evolve (think about what will happen when you add new fields to products)?
  • Do you need to handle "variants" of the products?
  • Do you intend to add entirely new types of products?

Etc.
EAV is probably a good way to go if you answer "yes" to some or all of these questions.

Regarding C#, I have implemented an EAV data catalog with it in the past, using Entity Framework over SQL Server (so an RDBMS).
It worked nicely for me.

But if you need to handle a lot of products, performance can quickly become an issue. You could also look at a "NoSQL" solution; have you thought about that?

Just keep in mind that your model objects do not have to match your data model.
For example, you could perfectly well have a strongly typed object for each type of product if you need to.
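
A minimal sketch of that idea, written in Python rather than C# to stay consistent with the other examples on this page: generic (attribute, value) rows come out of storage, and a small mapping function turns them into a typed object. The HardDrive class, its fields, and the sample rows are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HardDrive:
    """Typed model object; the storage behind it can still be EAV rows."""
    sku: str
    interface: str
    rpm: int
    platters: int

def hard_drive_from_rows(sku, rows):
    """Build a typed object from generic (attribute, value) rows."""
    attrs = dict(rows)
    return HardDrive(
        sku=sku,
        interface=attrs["interface"],
        rpm=int(attrs["rpm"]),            # EAV values arrive as text here
        platters=int(attrs["platters"]),
    )

rows = [("interface", "SATA"), ("rpm", "10000"), ("platters", "4")]
drive = hard_drive_from_rows("HDD-001", rows)
```

The rest of the application then works with drive.rpm as an integer and never sees the attribute/value rows.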

Database Design for ECommerce project (Should I use EAV Approach)

What you need is a combination of EAV for product features and nested sets for product categories.

While I certainly agree that EAV is almost always a bad choice, one application where EAV is the perfect choice is for handling product attributes in an online catalog.

Think about how websites show product attributes... The attributes of products are always shown as a vertical list with two columns: "Attribute" | "Value". Sometimes these lists show side-by-side comparisons of multiple products. EAV works perfectly for doing this kind of thing. The things that make EAV meaningless and inefficient for most applications are exactly what makes EAV meaningful and efficient for product attributes in an online catalog.
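
To illustrate, here is a sketch in Python with sqlite3 over a hypothetical product_attribute table (names and data are made up). The two-column list is a plain SELECT with no pivoting, and the comparison matrix is a small regrouping of the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_attribute (product_id INTEGER, attribute TEXT, value TEXT);
INSERT INTO product_attribute VALUES
  (1, 'Color', 'Blue'), (1, 'Weight', '0.5 lbs'),
  (2, 'Color', 'Red'),  (2, 'Material', 'Steel');
""")

def attribute_list(product_id):
    """The two-column "Attribute | Value" list for one product."""
    return conn.execute(
        "SELECT attribute, value FROM product_attribute "
        "WHERE product_id = ? ORDER BY attribute",
        (product_id,),
    ).fetchall()

def comparison_matrix(product_ids):
    """One row per attribute, one column per product; None where absent."""
    marks = ",".join("?" for _ in product_ids)
    cells = {}
    for pid, attr, value in conn.execute(
        "SELECT product_id, attribute, value FROM product_attribute "
        f"WHERE product_id IN ({marks})",
        product_ids,
    ):
        cells.setdefault(attr, {})[pid] = value
    return [
        [attr] + [cells[attr].get(pid) for pid in product_ids]
        for attr in sorted(cells)
    ]

single = attribute_list(1)
matrix = comparison_matrix([1, 2])
```

Neither function needs to know what "Color" or "Weight" means, which is the point being made above.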

One of the reasons why everyone always says "EAV is EVIL!" is that the attributes in EAV are "meaningless" insofar as the column name (i.e. the meaning of the attribute) is table-driven and is therefore not defined by the schema. The whole point of schemas is to give your model meaning, so this point is well taken. However, in the case of an online product catalog, the meaning of product attributes is really unimportant to the system itself. The only reason your catalog system cares about product attributes is to dump them in a list or possibly in a product comparison matrix. Therefore EAV doesn't happen to be evil in this particular case.

For product categories, you want a nested set model, as I described in the answer to this question. Nested sets give you very quick retrieval along with the ability to traverse multiple levels of an unbalanced hierarchy at the expense of some precalculation effort at edit time.
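
For readers who have not met nested sets, here is a generic sketch in Python with sqlite3. It is not taken from the answer referenced above. The category table and its lft/rgt columns are the conventional naming for this pattern, and the sample tree is made up. Reads are single range queries; the precalculation cost shows up in add_child, which has to shift the bounds of other rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER);
-- Electronics(1,10) > Storage(2,7) > HDD(3,4), SSD(5,6); Electronics > Audio(8,9)
INSERT INTO category VALUES
  (1, 'Electronics', 1, 10), (2, 'Storage', 2, 7),
  (3, 'HDD', 3, 4), (4, 'SSD', 5, 6), (5, 'Audio', 8, 9);
""")

def descendants(category_id):
    """Every category below the given one, at any depth, in tree order."""
    q = """SELECT c.name FROM category c, category p
           WHERE p.id = ? AND c.lft > p.lft AND c.rgt < p.rgt ORDER BY c.lft"""
    return [n for (n,) in conn.execute(q, (category_id,))]

def ancestors(category_id):
    """Path from the root down to the parent of the given category."""
    q = """SELECT p.name FROM category c, category p
           WHERE c.id = ? AND p.lft < c.lft AND p.rgt > c.rgt ORDER BY p.lft"""
    return [n for (n,) in conn.execute(q, (category_id,))]

def add_child(parent_id, name):
    """Insert as last child: shift every bound at or after the parent's rgt."""
    (rgt,) = conn.execute(
        "SELECT rgt FROM category WHERE id = ?", (parent_id,)
    ).fetchone()
    conn.execute("UPDATE category SET rgt = rgt + 2 WHERE rgt >= ?", (rgt,))
    conn.execute("UPDATE category SET lft = lft + 2 WHERE lft > ?", (rgt,))
    conn.execute(
        "INSERT INTO category (name, lft, rgt) VALUES (?, ?, ?)",
        (name, rgt, rgt + 1),
    )

before = descendants(2)
path = ancestors(3)
add_child(2, "Tape")
after = descendants(1)
```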

EAV Model Scheme for Stock System or different approach?

I gather that these are your primary requirements:

  1. Flexible attributes

    • Your exact need here is unclear: it sounds like you either expect the attributes to change, or at least expect that all attributes will not always be applicable to all products (i.e. a sparse matrix)
  2. Products are also categorized, and the category will (at least partially) determine what attributes are applicable to a product
  3. The attributes themselves may have additional properties aside from their value, that must be provided by the user (i.e. a unit that goes with a weight)
  4. Input validation is a must, and checks things like:

    • All required attributes are present
    • Attributes which are not applicable are not present
    • Attributes have valid values
    • User-provided attribute properties have valid values
  5. You probably also want to make sure you can search/filter efficiently by attributes

These different requirements all result in different technical needs, and different technical solutions. Some are matters of database, and some will have to be solved in code regardless of database choice. Obviously you are aware of some of these issues, but I think it is worth really breaking it down:

Flexible Attributes

Having a list of flexible attributes (as you know) does not work well with RDBMS systems where your table schema has to be pre-defined. This includes pretty much all of the SQLs, and definitely MySQL. The issue is that changing the table schema is expensive and for large tables can take minutes or hours, making it practically impossible to add attributes if you have to add a column to a table to do it.

Even if your list of attributes rarely changes, a large table of attributes is very inefficient if most products don't have a value for most attributes (i.e. a sparse matrix).

In the long run, you just won't get anywhere if your attributes are stored as a column in tables. Even if you break it down per-category, you are still going to have large empty tables that you can't add columns to dynamically.

If you stick with an RDBMS, your only option is really an EAV system. Having considered, researched, and implemented EAV systems, I wouldn't worry too much about all the hype you hear about them on the internet. I know that there are lots of articles out there talking about the EAV "anti-pattern", and I'm the kind of person who takes proper use of software design patterns seriously, but EAV does have a perfectly valid time and place, and this is it. In the long run you will not be able to do this on an RDBMS without EAV.

You could certainly look at a NoSQL system that is designed for this specific kind of problem, but when the rest of your database is in a standard RDBMS, installing or switching to a NoSQL system just to store your attribute values is almost certainly overkill. You certainly aren't going to want to lose the ACID compliance that an RDBMS comes with, and many NoSQL systems don't guarantee it. There is a wave of NewSQL systems out there that are designed to get the best of both worlds, but if this is just one part of a larger application (which I'm sure is the case), it probably isn't worth investigating completely new technologies just to make this one feature happen.

You could also consider using something like JSON storage inside MySQL to store your attribute values. That is a viable option now that MySQL has better JSON support, but it only makes a small change to the big picture: you would still need all your other EAV tables to keep track of allowed attributes, categories, etc. It is only the attribute values that you would be able to place inside the JSON data, so the potential benefits of JSON storage are relatively small (and it has other issues that I will mention further down).

So in summary, I would say that as long as the rest of your application runs on a RDBMS, it is perfectly reasonable to use EAV to manage flexible attributes. If you were trying to build your entire system in an EAV inside of a RDBMS, then you would definitely be wasting your time and I'd tell you to go find a good NoSQL database that fits the problem you are trying to solve. The disadvantages of EAV do still apply though: you can't easily perform consistency checks within your RDBMS system, and will have to do that yourself in code.

Categorized products with category-specific attributes

You've pretty much got it here. This is relatively straightforward inside an EAV system. You will have your attributes table, you will have a category table, and then you will need a standard one-to-many or many-to-many relationship between the attributes and categories tables, which will determine which attributes are available to which category. You obviously also have a relationship between products and categories, so you know which products need which attributes.

Your option #3 is designed to fulfill this requirement, but having a table with each attribute as a column will scale very poorly as your system grows, and will definitely break if you ever need to dynamically add attributes. You don't want to be running ALTER TABLE statements on the fly, especially if you have more than a few thousand records.
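
The table layout described in this section could look roughly like the following sketch (Python with sqlite3). All table and column names are assumptions, including the required flag on the link table. It uses a many-to-many link between categories and attributes, and shows that making a new attribute available is an INSERT, not an ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Which attributes are available to which category.
CREATE TABLE category_attributes (
    category_id  INTEGER NOT NULL REFERENCES categories(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    required     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (category_id, attribute_id)
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
CREATE TABLE attribute_values (
    product_id   INTEGER NOT NULL REFERENCES products(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    value TEXT, meta_data TEXT,
    PRIMARY KEY (product_id, attribute_id)
);
INSERT INTO categories VALUES (1, 'Hard drives'), (2, 'Toys');
INSERT INTO attributes VALUES (1, 'rpm'), (2, 'weight'), (3, 'age_range');
INSERT INTO category_attributes VALUES (1, 1, 1), (1, 2, 0), (2, 2, 0), (2, 3, 1);
INSERT INTO products VALUES (10, 'Drive A', 1), (11, 'Blue truck', 2);
""")

def applicable_attributes(product_id):
    """Attribute names allowed for a product, resolved via its category."""
    q = """SELECT a.name FROM products p
           JOIN category_attributes ca ON ca.category_id = p.category_id
           JOIN attributes a ON a.id = ca.attribute_id
           WHERE p.id = ? ORDER BY a.name"""
    return [n for (n,) in conn.execute(q, (product_id,))]

drive_attrs = applicable_attributes(10)
toy_attrs = applicable_attributes(11)

# Adding an attribute at runtime touches rows only, never the schema.
new_id = conn.execute("INSERT INTO attributes (name) VALUES ('interface')").lastrowid
conn.execute("INSERT INTO category_attributes VALUES (1, ?, 0)", (new_id,))
drive_attrs_after = applicable_attributes(10)
```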

Managing attribute properties

It is one thing to store dynamic attributes and values. It is another problem entirely to store dynamic attributes, values, and associated meta data (e.g. store a weight as well as the unit the weight is in). This, however, is no longer a database problem, but rather a code problem. In terms of actually storing the information, your best bet is probably to store your meta data inside your attribute values table, and rely upon some code abstractions to handle the input validation as well as form building. That can get quite complicated quite fast, especially if done wrong, and talking through such a system would take another entire post.

However, I think you are on the right track: for a fancier attribute that requires both a value and meta data, you need to somehow assign a class that is responsible for input processing and form validation. For instance, for a simple text field you have a "text" class that reads the user's value out of the form and stores it in the proper "attribute_values" table, with no meta data stored. Then for your "weight" attribute you would have a "weight" class that stores the number given by the user (e.g. 0.5) but then also stores the unit the user specified with that number (e.g. 'lbs') and persists both to the "attribute_values" table (in pseudo-SQL): INSERT INTO attribute_values (value, meta_data, product_id, attribute_id) VALUES ('0.5', '{"unit":"lbs"}', X, X).

Ironically, JSON probably would be a good way to store this meta data, since the exact meta data kept will also vary by attribute type, and I doubt you would want another level of tables to handle that variation in your EAV tables.

Again, this is more of a code problem than storage problem. If you decided to do JSON tables the overall picture to meet this requirement wouldn't change: your "attribute type classes" would simply store the meta data in a different way. That would probably look something like: UPDATE products SET attributes='{"weight":0.5,"unit":"lbs"}' WHERE id=X
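
A minimal sketch of such "attribute type classes" in Python follows. The class names, the read method, the "_unit" form-field naming, and the list of accepted units are all assumptions made for illustration. Each class turns raw form input into a (value, meta_data) pair ready to be written to an attribute_values row.

```python
import json

class TextAttribute:
    """Plain value, no meta data."""
    def read(self, form, name):
        return form[name], None

class WeightAttribute:
    """Numeric value plus the unit the user picked, kept as JSON meta data."""
    UNITS = {"lbs", "kg", "oz", "g"}   # hypothetical list of accepted units

    def read(self, form, name):
        value = float(form[name])      # raises ValueError on non-numeric input
        unit = form[name + "_unit"]    # assumed naming for the unit field
        if value <= 0 or unit not in self.UNITS:
            raise ValueError(f"invalid weight: {value} {unit}")
        return str(value), json.dumps({"unit": unit})

ATTRIBUTE_TYPES = {"text": TextAttribute(), "weight": WeightAttribute()}

def to_row(product_id, attribute_id, type_name, form, field):
    """Build the (product_id, attribute_id, value, meta_data) tuple to persist."""
    value, meta = ATTRIBUTE_TYPES[type_name].read(form, field)
    return (product_id, attribute_id, value, meta)

form = {"title": "Blue truck", "weight": "0.5", "weight_unit": "lbs"}
text_row = to_row(1, 7, "text", form, "title")
weight_row = to_row(1, 8, "weight", form, "weight")
```

Switching to a JSON column on the products table would change only what to_row returns and where it is written. The per-type classes stay the same.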

Input Validation

This will have to be handled exclusively by code regardless of how you store your data, so this requirement doesn't matter much in terms of deciding your database structure. A class-based system as described above will also be able to handle input validation, if properly executed.
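
As a rough sketch of what that code has to check (required attributes present, inapplicable attributes absent, values valid), here is a small Python function. The rule format, a dict of attribute name to (required, validator), and the hard drive rules are hypothetical.

```python
def validate(category_rules, submitted):
    """Return a list of error strings; an empty list means the input is valid.

    category_rules: {attribute: (required, validator)}
    submitted:      {attribute: raw string value}
    """
    errors = []
    for name, (required, _) in category_rules.items():
        if required and name not in submitted:
            errors.append(f"missing required attribute: {name}")
    for name, value in submitted.items():
        if name not in category_rules:
            errors.append(f"attribute not applicable: {name}")
        elif not category_rules[name][1](value):
            errors.append(f"invalid value for {name}: {value!r}")
    return errors

def is_positive_int(v):
    return v.isdigit() and int(v) > 0

hard_drive_rules = {
    "rpm": (True, is_positive_int),
    "interface": (True, lambda v: v in {"SATA", "SAS", "NVMe"}),
    "platters": (False, is_positive_int),
}

ok = validate(hard_drive_rules, {"rpm": "10000", "interface": "SATA"})
bad = validate(hard_drive_rules, {"rpm": "fast", "age_range": "3-5"})
```

In a real system the rules would be loaded from the category/attribute tables instead of being hard-coded.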

Sort/Search/Filter

This doesn't matter if you are exclusively using your attributes for data storage/retrieval, but will you be searching on attributes at all? With a proper EAV system and good indexes, you can actually search/sort efficiently in an RDBMS (although it can start to get painful if you search by more than a handful of attributes at a time). I haven't looked in detail, but I doubt that JSON storage will scale as well when it comes to searching. MySQL can work with JSON now and search the columns directly, but a JSON column cannot be indexed directly: you have to index a generated column that extracts a specific value, so every attribute you want to search efficiently needs its own generated column and index. Anything else means a scan, which won't work with large databases. It would be worth digging into before committing to a MySQL/JSON storage setup, if you were going to do something like that.
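
One common way to filter an EAV table on several attributes at once is sketched below in Python with sqlite3. It assumes a hypothetical attribute_values table with at most one row per (product_id, attribute_id) and a composite index starting with (attribute_id, value). Whether the planner actually uses that index depends on the engine and the data, so treat this as a starting point to measure and not as a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute_values (
    product_id INTEGER NOT NULL, attribute_id INTEGER NOT NULL, value TEXT,
    PRIMARY KEY (product_id, attribute_id)
);
-- Intended to let each (attribute, value) filter be answered from the index.
CREATE INDEX idx_attr_value ON attribute_values (attribute_id, value, product_id);
INSERT INTO attribute_values VALUES
  (1, 1, 'blue'), (1, 2, '3-5'),
  (2, 1, 'blue'), (2, 2, '6-8'),
  (3, 1, 'red'),  (3, 2, '3-5');
""")

def search(filters):
    """Products matching every (attribute_id, value) pair; filters must be non-empty."""
    where = " OR ".join("(attribute_id = ? AND value = ?)" for _ in filters)
    params = [p for pair in filters for p in pair]
    # A product qualifies when it matched as many rows as there are filters;
    # this relies on the primary key allowing one row per product/attribute.
    q = (
        f"SELECT product_id FROM attribute_values WHERE {where} "
        "GROUP BY product_id HAVING COUNT(*) = ? ORDER BY product_id"
    )
    return [pid for (pid,) in conn.execute(q, params + [len(filters)])]

blue_only = search([(1, "blue")])
blue_for_3_to_5 = search([(1, "blue"), (2, "3-5")])
```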

Depending on your needs, this is also a good place to complement an RDBMS with a NoSQL system. Having managed large-ish (~1.5 million product) e-commerce systems before, I have found that MySQL tends to fall flat in the searching/sorting category, especially if you are doing any kind of text searching. In an e-commerce system a query like "Show me the results that best match the term 'blue truck' and have the attribute 'For ages 3-5'" is common, but doing something like that in MySQL is close to impossible, primarily because of the need for relevancy-based sorting and scoring.

We solved this problem by using Apache Solr (Elasticsearch is a similar solution) and it managed our searching/sorting/search term scoring very well. In this case it was a two-database solution. MySQL kept all the actual data and stored attributes in EAV tables, and any time something got updated we pushed a record of everything to Apache Solr for additional storage. When a query came in from a user we would query Apache Solr, which was an expert at text searching and could also handle the attribute filtering with no trouble, and then we would pull the full product record out of our MySQL database.

The system worked beautifully. We had 1.5 million products, thousands of custom attributes, and had no trouble running the whole thing off a single virtual server. Obviously there was a lot of code going on behind the scenes, but the point is that it definitely worked and wasn't difficult to maintain. We never had any issues with performance from either MySQL or Solr.
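
The data flow of that two-store setup can be sketched as follows. This is only an outline of the pattern in Python: sqlite3 stands in for MySQL, a plain dict stands in for the search engine, and the word-overlap scoring is a naive placeholder that bears no relation to how Solr ranks results. No Solr API is shown, and all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute_values (product_id INTEGER, attribute TEXT, value TEXT);
""")

search_index = {}  # stand-in for the external search engine: id -> document

def save_product(product_id, name, attributes):
    """Write to the system of record, then push a flattened copy to the index."""
    conn.execute("INSERT OR REPLACE INTO products VALUES (?, ?)", (product_id, name))
    conn.execute("DELETE FROM attribute_values WHERE product_id = ?", (product_id,))
    conn.executemany(
        "INSERT INTO attribute_values VALUES (?, ?, ?)",
        [(product_id, k, v) for k, v in attributes.items()],
    )
    search_index[product_id] = {"name": name, **attributes}

def search(term, **filters):
    """Ask the index for ranked ids, then load full records from the database."""
    words = term.lower().split()
    scored = []
    for pid, doc in search_index.items():
        if any(doc.get(k) != v for k, v in filters.items()):
            continue                      # attribute filter failed
        score = sum(w in doc["name"].lower().split() for w in words)
        if score:
            scored.append((score, pid))
    ids = [pid for _, pid in sorted(scored, key=lambda s: (-s[0], s[1]))]
    return [
        conn.execute("SELECT id, name FROM products WHERE id = ?", (i,)).fetchone()
        for i in ids
    ]

save_product(1, "Blue truck", {"ages": "3-5"})
save_product(2, "Red truck", {"ages": "3-5"})
save_product(3, "Blue truck deluxe", {"ages": "6-8"})
results = search("blue truck", ages="3-5")
```

The relational database remains the only source of truth. The index can be thrown away and rebuilt from it at any time.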

Should I use a EAV model or not in laravel?

There are two options that I can think of.

1) Create a JSON field called something like attributes in your category table, and store an array of key/value pairs in it. That will present some challenges when querying on attributes though. I know there are ways around it, but I've never needed them so I can't say how well they work.

2) Create a Category Attributes table in your DB that goes something like this:

cat_id - int
key - varchar
value - varchar
Unique composite index on [cat_id, key, value]

Then create a CategoryAttribute model in Laravel and define a hasMany relationship where Category has many CategoryAttributes.

Querying your categories would then go something like this:

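// 'CategoryAttributes' must match the name of the hasMany relationship method on the Category model.
$categories = Category::whereHas('CategoryAttributes', function ($query) {
    // Both conditions apply to the same attribute row: key "color" with value "blue".
    $query->where('key', '=', 'color')
          ->where('value', '=', 'blue');
})->get();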

Best beginner resources for understanding the EAV database model?

Here you go. An illustrative story: http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

Entity Attribute Value Database vs. strict Relational Model Ecommerce

There are a few general pros and cons I can think of, and there are situations where one is better than the other:

Option 1, EAV Model:

  • Pro: less time to design and develop a simple application
  • Pro: new entities easy to add (might even be added by users?)
  • Pro: "generic" interface components
  • Con: complex code required to validate simple data types
  • Con: much more complex SQL for simple reports
  • Con: complex reports can become almost impossible
  • Con: poor performance for large data sets

Option 2, Modelling each entity separately:

  • Con: more time required to gather requirements and design
  • Con: new entities must be modelled and designed by a professional
  • Con: custom interface components for each entity
  • Pro: data type constraints and validation simple to implement
  • Pro: SQL is easy to write, easy to understand and debug
  • Pro: even the most complex reports are relatively simple
  • Pro: best performance for large data sets

Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)

  • Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
  • Con: new entities must be modelled and designed by a professional
  • Pro: new attributes might be easily added later on
  • Con: complex code required to validate simple data types (for the custom attributes)
  • Con: custom interface components still required, but generic interface components may be possible for the custom attributes
  • Con: SQL becomes complex as soon as any custom attribute is included in a report
  • Pro/Con: good performance generally, unless you need to search or report by the custom attributes

* I'm not sure if Option 3 would necessarily save any time in the design phase.

Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.


