What Are Database Constraints

What are database constraints?

Constraints are part of a database schema definition.

A constraint is usually associated with a table and is declared inside a CREATE TABLE statement or added later with ALTER TABLE ... ADD CONSTRAINT. (The SQL standard also defines schema-wide constraints via CREATE ASSERTION, but few databases actually implement it.)

They define certain properties that data in a database must comply with. They can apply to a column, a whole table, more than one table or an entire schema. A reliable database system ensures that constraints hold at all times (except possibly inside a transaction, for so called deferred constraints).

Common kinds of constraints are:

  • not null - no value in the column may be NULL
  • unique - value(s) in specified column(s) must be unique for each row in a table
  • primary key - value(s) in specified column(s) must be unique for each row in a table and not be NULL; normally each table in a database should have a primary key - it is used to identify individual records
  • foreign key - value(s) in specified column(s) must reference an existing record in another table (via its primary key or some other unique constraint)
  • check - a specified expression must evaluate to true for the constraint to be satisfied
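The kinds of constraints above can be demonstrated in a few lines. Here is a minimal sketch using Python's built-in sqlite3 module; all table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        id   INTEGER PRIMARY KEY,    -- primary key: unique and not NULL
        name TEXT NOT NULL UNIQUE    -- not null + unique
    )
""")
conn.execute("""
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department(id),  -- foreign key
        salary  INTEGER CHECK (salary > 0)                   -- check constraint
    )
""")

conn.execute("INSERT INTO department VALUES (1, 'engineering')")
conn.execute("INSERT INTO employee VALUES (10, 1, 50000)")  # satisfies all constraints

# Violations are rejected by the database itself, not by application code:
try:
    conn.execute("INSERT INTO employee VALUES (11, 99, 50000)")  # no department 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    conn.execute("INSERT INTO employee VALUES (12, 1, -5)")      # fails the CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The point of the try/except blocks is that a reliable database system refuses the bad rows outright, so the constraints hold at all times.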

Constraints in Data Dictionary table creation

You can use domains for this purpose. Table field => Data Element => Domain.
In the domain you can define the possible values for that field. If your example with the weekdays is really what you need, then check out the domain WEEKDAY in transaction SE11.

Constraint database

I found a sophisticated solution that achieves more than I expected for checking data consistency. Apparently this is what is called test-driven data analysis (TDDA).

This implementation ties us to Python and Pandas, but fortunately not exclusively: it can also check data consistency in MySQL, PostgreSQL, and other database tables.

A plus I had not thought of is that it can infer rules from sample data, which is helpful when first setting up rules.
This is why the library provides both tdda.constraints.discover_df and tdda.constraints.verify_df.

As far as I have read, it does not offer a solution for checking a (weaker) consistency over the last n files — something we could call batch-file consistency, which only requires a rule to hold over some set of runs (the last n) rather than over all data.
It acts only on single files, so it needs higher-level wiring to reason about n files that arrive successively.
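That higher-level wiring could look roughly like this. This is a hypothetical sketch in plain Python: the bounds rule stands in for a real constraints file, and the directory, column, and function names are all illustrative:

```python
import csv
from pathlib import Path

def check_last_n(directory, column, n=3, lo=0.0, hi=100.0):
    """Verify a simple rule over only the last n CSV files in a directory.

    The rule here (a numeric column must stay within given bounds) is a
    stand-in for whatever per-file check you run; returns a list of
    (filename, line_number, value) failures, empty if the batch passes.
    """
    files = sorted(Path(directory).glob("*.csv"))[-n:]  # only the last n runs
    failures = []
    for path in files:
        with open(path, newline="") as f:
            # start=2 because line 1 is the header row
            for line_no, row in enumerate(csv.DictReader(f), start=2):
                value = float(row[column])
                if not (lo <= value <= hi):
                    failures.append((path.name, line_no, value))
    return failures
```

A scheduler or ETL step would call this after each new file lands, so the rule is enforced per batch rather than over all historical data.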

For more:
https://tdda.readthedocs.io/en/latest/constraints.html#module-tdda.constraints

assertCSVFilesCorrect checks a set of CSV files in a directory; the same is possible for Pandas DataFrames, etc.

From the official documentation:

The tdda.constraints library is used to discover constraints from a
(Pandas) DataFrame, write them out as JSON, and to verify that
datasets meet the constraints in the constraints file. It also
supports tables in a variety of relational databases. There is also a
command-line utility for discovering and verifying constraints, and
detecting failing records.

PS: I am still open to other solutions; let me know, as I imagine this is a use case for any ETL solution.

I have also opened a bounty to further enrich the responses.

Database constraints - keep or ignore?

I think constraints help you keep your data clean. Performance is sometimes improved by them; in other cases performance can suffer because of constraints, but the answer to that is not removing them. There is something called "denormalization" to help you deal with performance issues (provided your queries are already optimized): you can always create denormalized summary tables in such scenarios.
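A minimal sketch of that idea: keep the constraints on the normalized table and add a denormalized summary table for the expensive aggregate query. This uses SQLite for illustration, and all table and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized source table keeps its constraints.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount      INTEGER NOT NULL CHECK (amount > 0)
    );

    -- Denormalized summary table serving the hot aggregate query.
    CREATE TABLE order_totals (
        customer_id INTEGER PRIMARY KEY,
        total       INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 7, 100), (2, 7, 50), (3, 8, 30)])

# Refresh the summary; in production this might run on a schedule
# or be maintained by triggers.
conn.execute("DELETE FROM order_totals")
conn.execute("""
    INSERT INTO order_totals
    SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
""")
print(conn.execute("SELECT * FROM order_totals ORDER BY customer_id").fetchall())
# → [(7, 150), (8, 30)]
```

The constraints stay where the data is written; the summary table only trades freshness for read speed.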

Did the guys who told you to "forget what you learnt" also tell you that they have forgotten the traffic rules they learnt at the driving classes?

What advantages do constraints provide to a database?

"just" data integrity? You say that like it's a minor thing. In all applications, it's critical. So yes, it provides that, and it's a huge benefit.


