How to Design a Schema Where the Columns of a Table Are Not Fixed

I recommend using a combination of options two and three. Where possible, model tables for standard associations like addresses. This is the ideal approach...

But for constantly changing values that can't be summarized into logical groupings like that, use two tables in addition to the EMPLOYEES table:

  • EMPLOYEE_ATTRIBUTE_TYPE_CODES (two columns: employee_attribute_type_code and DESCRIPTION)
  • EMPLOYEE_ATTRIBUTES (three columns: employee_id, a foreign key to EMPLOYEES; employee_attribute_type_code, a foreign key to EMPLOYEE_ATTRIBUTE_TYPE_CODES; and VALUE)

In EMPLOYEE_ATTRIBUTES, make the primary key a composite of:

  • employee_id
  • employee_attribute_type_code

This prevents the same attribute from being assigned to the same employee more than once.
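
As a rough sketch of the three tables described above, here is one way to try the design out with Python's built-in sqlite3 module. The column types, the `name` column, the sample rows and the `try_insert` helper are all assumptions made for illustration; only the table names, the key columns and the composite primary key come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEES (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL              -- assumed column, for illustration
);
CREATE TABLE EMPLOYEE_ATTRIBUTE_TYPE_CODES (
    employee_attribute_type_code TEXT PRIMARY KEY,
    description                  TEXT NOT NULL
);
CREATE TABLE EMPLOYEE_ATTRIBUTES (
    employee_id                  INTEGER NOT NULL
        REFERENCES EMPLOYEES (employee_id),
    employee_attribute_type_code TEXT NOT NULL
        REFERENCES EMPLOYEE_ATTRIBUTE_TYPE_CODES (employee_attribute_type_code),
    value                        TEXT NOT NULL,
    -- composite key: one value per (employee, attribute type)
    PRIMARY KEY (employee_id, employee_attribute_type_code)
);
INSERT INTO EMPLOYEES VALUES (1, 'Ada');
INSERT INTO EMPLOYEE_ATTRIBUTE_TYPE_CODES VALUES ('SHOE', 'Shoe size');
INSERT INTO EMPLOYEE_ATTRIBUTES VALUES (1, 'SHOE', '42');
""")


def try_insert(employee_id, type_code, value):
    """Return True if the attribute row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO EMPLOYEE_ATTRIBUTES VALUES (?, ?, ?)",
                (employee_id, type_code, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(1, "SHOE", "43")    # same employee, same attribute type
unknown_type_ok = try_insert(1, "HAT", "L")   # type code missing from the lookup table
attribute_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE_ATTRIBUTES").fetchone()[0]
```

Both extra inserts are rejected: the first by the composite primary key, the second by the foreign key to the type-code table.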

SQL Design: need ability to add custom columns to a table using a fixed schema

Can you add two more tables: one holding the types of measurement, and the other mapping each type to the measurement value itself?

Basically, one table with {DataId, DataMeasurementTypeId, DataValue} and another with {DataMeasurementTypeId, DataMeasurementType}.

That should allow you to provide stored procedures that retrieve all data measurements in a table.

The better option might be to solve it with a (Name, Value) table and have the business object layer take care of constructing the right content. That would fit (and likely perform) better with Google's BigTable approach than with an RDBMS, though.
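
To make the "business object layer constructs the content" idea concrete, here is a minimal sketch using Python's sqlite3 module. The column names are taken from the answer; the table names `DataMeasurement` and `DataMeasurementType`, the sample rows and the `load_measurements` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DataMeasurementType (
    DataMeasurementTypeId INTEGER PRIMARY KEY,
    DataMeasurementType   TEXT NOT NULL UNIQUE
);
CREATE TABLE DataMeasurement (
    DataId                INTEGER NOT NULL,
    DataMeasurementTypeId INTEGER NOT NULL REFERENCES DataMeasurementType,
    DataValue             REAL NOT NULL,
    PRIMARY KEY (DataId, DataMeasurementTypeId)
);
INSERT INTO DataMeasurementType VALUES (1, 'temperature'), (2, 'humidity');
INSERT INTO DataMeasurement VALUES (10, 1, 21.5), (10, 2, 40.0), (11, 1, 19.0);
""")


def load_measurements(data_id):
    """Assemble one entity's measurements into a {type name: value} dict."""
    rows = conn.execute(
        """
        SELECT t.DataMeasurementType, m.DataValue
        FROM DataMeasurement AS m
        JOIN DataMeasurementType AS t
          ON t.DataMeasurementTypeId = m.DataMeasurementTypeId
        WHERE m.DataId = ?
        """,
        (data_id,),
    ).fetchall()
    return dict(rows)  # each row is a (name, value) pair


first = load_measurements(10)
second = load_measurements(11)
```

The database stores only narrow rows; the caller gets back an object whose set of "columns" varies per entity, so adding a new measurement type needs a new row in the type table rather than a schema change.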

How to design a database where the main entity table has 25+ columns but a single entity's columns gets 20% filled on average?

It depends:

  • How many entities (rows) are you planning to have?
  • What kind of queries will you run against that table?
  • Will there be a lot of new properties in the future?
  • How are you planning to use the properties?

You seem to be concerned about wasting space with the simple table. Try to calculate whether the space saved by the other approaches is really significant and worthwhile. Disk is (usually) cheap.

If you have a low number of rows, then the single table is probably better (it is easier to implement).

If you plan to create complex queries against the properties (e.g. WHERE property1 < 123), then the simple table is probably easier.

If you are planning to add a lot of new properties in the future, then the Property/EntityProperties approach could be useful.

I'd go with the simple one-table approach because you have a rather small number of rows (<1M), you are probably running your database on server machines and not some handheld/mobile thing (SQLServer), and your database schema is rather rigid.

Database Design: A proper table design for large number of column values

Option 2, with ID, TrialID, StatisticID, StatisticValue

With proper indexing, it will perform fairly well (you can use PIVOT to get the values out on columns fairly easily in SQL Server 2005).
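
For readers without SQL Server at hand, the same rows-to-columns result can be sketched with conditional aggregation, which is a portable substitute for PIVOT rather than the PIVOT operator itself. The example below runs on SQLite through Python's sqlite3 module; the table name `TrialStatistic`, the sample data and the output column names `Stat1` and `Stat2` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TrialStatistic (
    ID             INTEGER PRIMARY KEY,
    TrialID        INTEGER NOT NULL,
    StatisticID    INTEGER NOT NULL,
    StatisticValue REAL NOT NULL,
    UNIQUE (TrialID, StatisticID)   -- one value per trial and statistic; also an index
);
INSERT INTO TrialStatistic (TrialID, StatisticID, StatisticValue) VALUES
    (1, 1, 0.50), (1, 2, 12.0),
    (2, 1, 0.75);
""")

# One output row per trial, one output column per statistic.
# A statistic that was never recorded for a trial comes back as NULL (None).
pivoted = conn.execute("""
    SELECT TrialID,
           MAX(CASE WHEN StatisticID = 1 THEN StatisticValue END) AS Stat1,
           MAX(CASE WHEN StatisticID = 2 THEN StatisticValue END) AS Stat2
    FROM TrialStatistic
    GROUP BY TrialID
    ORDER BY TrialID
""").fetchall()
```

The list of output columns still has to be known when the query is written, which is the same limitation the static form of PIVOT has.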

When the statistics are different datatypes, the problem becomes more interesting, but in many cases, I just up-size the datatype (sometimes ints just end up in the money field). For other non-compatible types, the best design in my mind is really separate tables for each type, but I've also seen multiple columns or a free-form text column.

How to set all table columns to NOT NULL at once?

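You could query the metadata tables and build a dynamic statement (the query below only generates the ALTER TABLE text; you still have to run the result):

-- Postgres: generate one ALTER TABLE statement covering every nullable column of table t
SELECT format('ALTER TABLE %I '
              || STRING_AGG(format('ALTER COLUMN %I SET NOT NULL', COLUMN_NAME), CHR(13) || ','),
              MIN(TABLE_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE IS_NULLABLE = 'YES'
  AND TABLE_NAME = 't';  -- add a TABLE_SCHEMA filter if the name exists in several schemas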

db<>Fiddle demo

How do I get around this relational database design smell?

Noting the "Relational" database tag.

The whole design feels like a bit of a design smell

Yes. It smells for two reasons.

  1. You have ids as Identifiers in each table. That will confuse you, and make for code that is easy to screw up. For an Identifier:
  • name it for the thing that it Identifies

    eg. mediaType, placementCode (they are strings, which is correct)
  • where it is located as a Foreign Key, name it exactly the same, so that there is no confusion about what the column is, and what PK it references

However depending on the mediaType, a placement can contain different details


  2. What you are seeking, in logical terms, is an OR Gate.

    In Relational terms, it is a Subtype, here an Exclusive Subtype.

    That is, with complete integrity and constraints.

    mediaType is the Discriminator.

if I designed the schema this way I may end up with a lot of nullable columns.

Yes, you are correct. Nullable columns indicate that the modelling exercise, Normalisation, is incomplete. Using two Subtype tables is correct.

Relational Data Model

[Data model diagram: CraigTA]

Note • Notation

  • All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993

  • My IDEF1X Introduction is essential reading for beginners.

Note • Content

  • Exclusive Subtype

    • Each Placement is either a PlacementA xor a PlacementB

    • Refer to Subtype for full details on Subtype implementation.

  • Relational Key

    • They are strings, as you have given.

    • They are "made up from the data", as required by the Relational Model.

    • Such Keys are Logical, they ensure the rows are unique.

    • Further, they provide Relational Integrity (as distinct from Referential Integrity), which cannot be shown here, in this small data model.

    • Note that IDs that are manufactured by the system, which are NOT data and NOT seen by the user, are physical, pointing to Records (not logical rows). They provide record uniqueness but not row uniqueness. They cannot provide Relational Integrity.

    • The RM requires that rows (not records) are unique.

SQL

The drawback of this is how would I then find a placement by id as I'd have to query across all tables:

Upgraded as per above, that would be:

The drawback of this is how would I then find the relevant Placement columns by the PK Placement, as I'd have to query across all tables:

First, understand that SQL works perfectly for Relational databases, but it is, by its nature, a low-level language. Most of us in the real world use an IDE (I don't know anyone who does not), thus much of its cumbersomeness is eased, and many coding errors are eliminated.

Where we have to code SQL directly, yes, that is what you have to do. Get used to it. There are just two tables here.

Your code will not work: it assumes the columns have identical datatypes and are in the same order (which is required for the UNION). They are not.

  • Do not force them to be, just to make your UNION succeed. There may well be additional columns in one or the other Subtype, later on, and then your code will break, badly, everywhere that it is deployed.

  • For code that is implemented, never use asterisk in a SELECT (it is fine for development only). That guarantees failure when the database changes. Always use a column list, and request only the columns you need.

SELECT Placement,
       ColumnA1,
       ColumnA2,
       '' AS ColumnB1,   -- pad the PlacementB columns so both SELECTs line up
       '' AS ColumnB2,
       '' AS ColumnB3
FROM PlacementA
WHERE Placement = 'ABCD'
--
UNION
--
SELECT Placement,
       '',               -- pad the PlacementA columns
       '',
       ColumnB1,
       ColumnB2,
       ColumnB3
FROM PlacementB
WHERE Placement = 'ABCD'

View

The Relational Model, and SQL its data sublanguage, has the concept of a View. This is how one would use it. Each Basetype and Subtype combination is considered a single unit, a single row.


CREATE VIEW PlacementA_V
AS
SELECT BASE.Placement,   -- qualified, because Placement exists in both tables
       MediaType,
       ColumnCommon,
       ColumnA1,
       ColumnA2
FROM Placement BASE
JOIN PlacementA SUBA
  ON BASE.Placement = SUBA.Placement
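
To see the Basetype-Subtype view return a single combined row, here is a small sketch that runs on SQLite via Python's sqlite3 module. The column types, the MediaType values 'A' and 'B', and the sample data are assumptions; the table, view and column names follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Placement (                   -- Basetype
    Placement    TEXT PRIMARY KEY,
    MediaType    TEXT NOT NULL CHECK (MediaType IN ('A', 'B')),  -- Discriminator
    ColumnCommon TEXT NOT NULL
);
CREATE TABLE PlacementA (                  -- Subtype, shares the Basetype key
    Placement TEXT PRIMARY KEY REFERENCES Placement (Placement),
    ColumnA1  TEXT NOT NULL,
    ColumnA2  TEXT NOT NULL
);
CREATE VIEW PlacementA_V AS
SELECT BASE.Placement    AS Placement,
       BASE.MediaType    AS MediaType,
       BASE.ColumnCommon AS ColumnCommon,
       SUBA.ColumnA1     AS ColumnA1,
       SUBA.ColumnA2     AS ColumnA2
FROM Placement AS BASE
JOIN PlacementA AS SUBA
  ON BASE.Placement = SUBA.Placement;

INSERT INTO Placement  VALUES ('ABCD', 'A', 'common');
INSERT INTO PlacementA VALUES ('ABCD', 'a1', 'a2');
""")

# Look the placement up by its key through the view, with an explicit column list.
row = conn.execute(
    """
    SELECT Placement, MediaType, ColumnCommon, ColumnA1, ColumnA2
    FROM PlacementA_V
    WHERE Placement = ?
    """,
    ("ABCD",),
).fetchone()
```

A matching PlacementB_V view would be built the same way, so callers that already know the MediaType never need the UNION.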


Comments

In Postgres, is there a way I could setup a constraint where the placement can ONLY exist in either PlacementA OR PlacementB and not both?

  1. That is Exclusivity.
  • If you read the linked Subtype doc, I have given a full explanation and technical details for implementation in SQL, including all code (follow the links in each document). It consists of a CONSTRAINT that calls a FUNCTION:

ALTER TABLE ProductBook -- subtype
ADD CONSTRAINT ProductBook_Excl_ck
-- check an existential condition, which calls
-- function using PK & discriminator
CHECK ( dbo.ValidateExclusive_fn ( ProductId, 'B' ) = 1 )

  • We have had that capability in SQL for over 15 years in my experience.

  2. Postgres is not SQL compliant in many areas. None of the freeware is SQL compliant (their use of the term SQL is incorrect). They do not have a Server Architecture, most do not provide ACID Transactions, etc. Most are not true languages (as demanded by Codd's Twelve Rules). Therefore, no. Specifically, it cannot call a Function from DDL (again, because it is not a unified language, it is different bits here and there).

  3. As long as you understand and implement Standards, such as Open Architecture, to the degree possible in your particular database suite (it cannot be labelled a platform because it has no Server Architecture), that is the best you can do.

  4. The Open Architecture Standard demands:

  • no direct INSERT/UPDATE/DELETE to the tables

  • all your writes to the db are done via OLTP Transactions

    • which in SQL means:

      Stored Procedures with BEGIN TRAN ... COMMIT/ROLLBACK TRAN
    • but in Postgres means:

      Functions which are supposed to be "atomic"

      (quotes because it is nowhere near the Atomic that is implemented in SQL ACID Transactions [the A in ACID stands for Atomic] )

  5. Therefore, take the Exclusivity code in the Function I have given in SQL, and:
  • deploy it in every "atomic" Function that INSERT/DELETEs to the Basetype or Subtype tables in your pretend sql suite.

    (I do not allow UPDATE to a Key, refer CASCADE above.)

  • while we are here, it must be mentioned, such "atomic" Functions need to likewise have code to ensure that the Basetype-Subtype pair is INSERT/DELETEd as a pair or not at all.
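
The "pair or not at all" rule can be illustrated with a short sketch. This is not the Function from the linked Subtype document, and it is not Postgres code: it uses Python's sqlite3 module, where a single transaction wraps both INSERTs. The column types, the MediaType values, the `SUBTYPE_TABLE` mapping and the `add_placement` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Placement (
    Placement TEXT PRIMARY KEY,
    MediaType TEXT NOT NULL CHECK (MediaType IN ('A', 'B'))
);
CREATE TABLE PlacementA (
    Placement TEXT PRIMARY KEY REFERENCES Placement (Placement),
    ColumnA1  TEXT NOT NULL
);
CREATE TABLE PlacementB (
    Placement TEXT PRIMARY KEY REFERENCES Placement (Placement),
    ColumnB1  TEXT NOT NULL
);
""")

# Discriminator value -> (subtype table, its detail column); fixed, never user input.
SUBTYPE_TABLE = {"A": ("PlacementA", "ColumnA1"), "B": ("PlacementB", "ColumnB1")}


def add_placement(placement, media_type, detail):
    """Insert the Basetype-Subtype pair in one transaction; return True on success."""
    table, column = SUBTYPE_TABLE[media_type]
    try:
        with conn:  # commits if both INSERTs succeed, rolls back if either fails
            conn.execute(
                "INSERT INTO Placement (Placement, MediaType) VALUES (?, ?)",
                (placement, media_type),
            )
            conn.execute(
                f"INSERT INTO {table} (Placement, {column}) VALUES (?, ?)",
                (placement, detail),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = add_placement("ABCD", "A", "a1")      # new pair
second_ok = add_placement("ABCD", "B", "b1")     # same key under the other Subtype
broken_ok = add_placement("WXYZ", "B", None)     # Subtype INSERT violates NOT NULL
```

If every write goes through such a routine, a key can only ever reach the Subtype table that matches its MediaType, and a failed Subtype INSERT takes the Basetype row back out with it.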


