Most Efficient Method for Persisting Complex Types with Variable Schemas in Sql

Is there a way to persist a variable across a go?

The go command is used to split code into separate batches. If that is exactly what you want to do, then you should use it, but it means that the batches are actually separate, and you can't share variables between them.

In your case the solution is simple; you can just remove the go statements, they are not needed in that code.

Side note: You can't use a variable in a use statement, it has to be the name of a database.

Database schema which can support specialized properties

Anders, do not give up any integrity or hardness, eg type safety.

(Response coming).

@Anders. No, not at all, subtyping is fine (the question is which form you use and what are the dis/advantages). Do not give up any strength or Integrity or type safety or checks or DRI. The form you choose will demand additional Checks and maybe a bit of code (depends on your platform).

This subject is coming up frequently, but the seeker always has a narrow perspective; I keep making the same statements (a subset) from an unchanging set. The idea is to evaluate all the options. So I am writing a doc. Unfortunately it is taking longer. Maybe 4 pages. Not ready to post. But the diagrams are finished, I think you are on the ball, and you can use it right away.

Warning: Experienced Project Construction Engineers Only
Road not suitable for caravans or readers with a high Eek factor

Link to ▶Four Alternative Data Models◀ in Document Under Construction. Apologies for the mess on the floor; I will clean up soon.

▶Link to IDEF1X Notation◀ for anyone who is unfamiliar with the Standard for modelling Relational databases.

  1. They are all Relational, with full integrity.

  2. The 6NF options. Relational today (SQL) does not provide support for 6NF; it does not disallow it, it just does not provide the 5NF➔6NF structures. Therefore you need to build a small catalogue, what some people call "metadata". Really, it is just an extension of the standard SQL catalogue (sys tables). The level of control required is modelled in each option.

  3. Essentially EAV done properly, with full control and integrity (type safety, Declarative Referential Integrity, etc) rather than the mess it usually is.

You may be interested in these related question/answers (in particular, look at the Data Models):

Multiple Fixed vs Abstract Flexible

Database Schema-Related Problem

"Simple" Database Design Problem

Response to Comments

... That way, we can easily grab "Comment" rows associated with a given specialized type instance. Is this the way to do that, or will I regret that decision later? Is there any other pattern we're missing?

Not sure what you mean. Comments, Notes, Addresses, end up being used (columns resident in) in many tables, so the correct method is to Normalise them; provide One Table for Comment; that is referenced from any table that requires it. Here is a generic Comment table. It is used in Product (the supertype) because you stated any Product. It can just as easily be used in some of the Product subtypes, and not others; in which case the FK will be in said Product Subtypes.

Your Data Model

What is the purpose of the ProductType table in your Product 5NF/subtype example? Does it contain a row corresponding to each specialized Product, e.g., ProductCPU? I assume it indicates which specialization the base product is.

(Small critical mistake in the diagram, corrected.)

Yes, exactly.

In Standard Relational terms (not the uncontrolled messes passing off as databases), the ProductType is the Discriminator; it identifies which of the Product Subtypes apply to this Product. Tells you which Product Subtype table you need to join with. The pair together make a logical Product. Do not forget to produce the Views, one for each ProductType.

  • (Do evaluate how ProductType changes, exactly what role it plays, for each of the four Data Models.)

  • "Generalisation-specialisation" is all mumbo jumbo, OO terminology; without crossing the line and learning what Relational has been capable of for 30 years. If you learn a little about Relational, you will have the full power; otherwise you are limited to the very limited OO approach to everything (Ambler and Fowler have a lot to answer for). Please read this post, from 11 Dec 10 onwards. Relational databases model Entities, not objects; not classes.

For example, when adding a new product you'll want to provide, say, a dropdown selection of which product types it is possible to add. Based on this selection, it can be deduced which tables to put the data in. Correct? I'm sorry for talking about application code, but I just need to put it into perspective

Yes. And what page (with fields) to provide next, for the user to enter data.

No problem talking about the app code that will use the Rdb, they go together like husband and wife (not husband and slave).

  • For your OO classes, map the Class tree to the Rdb, once you have finished modelling the Rdb, independent of any app that will use it. Not the other way around. And not dependent on one app.

  • Forget about "persisting", it has many problems (Lost Updates; damaged data integrity; problematic debugging; massive contention; etc). All updates to the Rdb should be in Transactions, with ACID compliance, available for 30 years, but Fowler and Ambler have not read about it yet. Usually that means one stored proc pre xact.

The discriminant is a FK to a Type-table as we established earlier. It denotes which spec. sub type the base type adheres to. But what does the discriminant table contain in detail?

Is that not clear from the data model ? ProducType CHAR(1) or (2). Name Char(30).

Could be a display-friendly text stating the type for UI-purposes,

Yes, among other things, such as the control, contraint, etc, elimination of ambiguity when coding or reporting.

but does it also contain the exact table name which contains the specialized type?

No. That would be a little too physical to be placed in data. Disallowed on principle.

But it is not necessary.

Say I'm interested in the Product with ID = 1. It has a discriminant indicating that it is a ProductCPU. How would you go about retrieving this ProductCPU from your app code?

That will be easy if you take the provided model, and implement it (all the tables) as classes, correctly, etc. The example you request will not use Views (which are for lists, and more generic use). The pseudo-code would be:

  • given the ProductId (Subtype unknown, therefore your should not be sitting a a Subtype-specific window), load the Product supertype only
  • based on the Discriminator Product.ProductType, set indicators, etc, and load the applicable subtype, one of ProductCPU; ProductMemory; ProductDisk; ProductTape; etc.

  • I have seen (and do not agree with) OO methods that load all subtypes for the given ProductId at once: one subtype is valid; and the rest are invalid. The code still has to constrain itself to the valid class for the Product based on Product.ProductType.

Alternately, eg. where the context is, the user is sitting in a Subtype-specific window, eg. ProductCPU, with that class set up, and requests ProductId xxx. Then use the ProductCPU View. If it returns zero rows, it does not exist.

  • There may be a ProductDisk xxx, but not a ProductCPU xxx. How you handle that, whether you indicate there is a Product`xxx but it isn't a CPU, or not, that depends on the app requirements.

For lists, where the app fills in a grid, without regard to the ProductId, use the views (one each) to load each grid. That SQL is based on the join, and does not need to refer to ProductType.

A database within a database?

Not in MySQL, unless you define a structure for it, break up the data and store it in that structure. It would however be cumbersome. You can define a hierarchical structure in a single table by making a reference to a parent item in the same table. This way you could store even XML elements and their attributes in a single database, having only three tables for document, elements and attributes.
Querying these tables would be very hard, especially in MySQL. In Oracle you got CONNECT BY which allows recursive queries, but it is slow.

Maybe you should look into No-SQL databases like Mongo, which is better designed for these kind of tasks.

How to persist a graph data structure in a relational database?

The answer is unfortunately: Your consideration is completely right in every point. You have to store Nodes (Vertices) in one table, and Edges referencing a FromNode and a ToNode to convert a graph data structure to a relational data structure. And you are also right, that this ends up in a large number of lookups, because you are not able to partition it into subgraphs, that might be queried at once. You have to traverse from Node to Edge to Node to Edge to Node...and so on (Recursively, while SQL is working with Sets).

The point is...

Relational, Graph oriented, Object oriented, Document based are different types of data structures that meet different requirements. Thats what its all about and why so many different NoSQL Databases (most of them are simple document stores) came up, because it simply makes no sense to organize big data in a relational way.

Alternative 1 - Graph oriented database

But there are also graph oriented NoSQL databases, which make the graph data model a first class citizen like OrientDB which I am playing around with a little bit at the moment. The nice thing about it is, that although it persists data as a graph, it still can be used in a relational or even object oriented or document oriented way also (i.e. by querying with plain old SQL). Nevertheless Traversing the graph is the optimal way to get data out of it for sure.

Alternative 2 - working with graphs in memory

When it comes to fast routing, routing frameworks like Graphhopper build up the complete Graph (Billions of Nodes) inside memory. Because Graphhopper uses a MemoryMapped Implementation of its GraphStore, that even works on Android Devices with only some MB of Memory need. The complete graph is read from database into memor at startup, and routing is then done there, so you have no need to lookup the database.



Related Topics



Leave a reply



Submit