Cross Table Dependency/Constraint in SQL Database

Cross Table Dependency/Constraint in SQL Database

TL;DR

But the dependency is in a separate table.

You mean there is a dependency (in the everyday sense) on another table. We say there is a constraint on the two tables. (They depend on each other.) In addition to the FK (foreign key) constraint that every students classes value is a classes class value.

What is this dependency called?

We can reasonably categorize the constraint as "inter-table". It is that classes equals SELECT class, SUM(student) AS total FROM classes LEFT JOIN students USING (class) GROUP BY class.

And can we say this is violating 3NF?

The constraint doesn't involve violating a NF. Moreover normalization applies only to a single table and its FDs (functional dependencies).

(A straightforward design is to have base students, base classes1 that is the original classes without total, and VIEW classes AS SELECT class, SUM(student) AS total FROM classes1 LEFT JOIN students USING (class) GROUP BY class.)


If I had a column in classes that stored the total number of students a class has, this feels like it should violate 3NF.

Whether a table is in a given NF (normal form) has nothing to do with any other tables. (We say a database is in a given NF when all its tables are.) Whether your design is nevertheless bad is another matter.

Since a class has just one total number of students, there is a FD (functional dependency) of total on class in classes, ie class functionally determines total.

We say that a set of columns functionally determines another set in a table when each subrow for the first always appears with the same subrow for the second. Normalization to higher NFs replaces a table by projections of it that join back ot it, per the FDs & JDs (join dependencies) that hold in it. There is redundancy in a database when two tables say the same thing about the business/application situation; but not all redundancy is bad. Learn proper information modeling & database design.

It may or may not violate a NF to have your class student count as a column in classes. What FDs violate a NF depends on all the FDs present and the NF. (And it only make sense to talk about a particular FD in a particular table violating a particular NF if you are talking about a particular part of a particular definition of that NF.)

(If a DBMS-calculated/computed/generated column violates a NF that would hold without it then that is not a problem, because it is controlled by the DBMS. You can think of the table as view of the table without the column.)

But the dependency is in a separate table.

When a sequence of database states cannot hold all the values possible per the the columns of tables we say constraints hold or the database is constrained. FDs (functional dependencies), MVDs (multi-valued dependencies), JDs (join dependencies), INDs (inclusion dependencies), EQDs (equality dependencies) and other "dependencies" (which technically are expressions given a context) are each associated with certain constraints. CKs (candidate keys), PKs (primary keys), superkeys (SQL PK & UNIQUE NOT NULL), FKs (foreign keys) (which technically are all column sets) & other notions are also each associated with certain constraints. But arbitrary conditions can hold on a sequence of database states.

SQL has a distinct but related notion of a constraint characterized by a name and an expression/condition (constraint in the above sense), declared by appropriate syntax. A state is constrained by column typing, PK, UNIQUE, NOT NULL & CHECK constraints. ASSERTION gives an arbitrary condition on a state but it is not supported by most DBMSs. CASCADES supports some inter-state inter-table constraints. SQL TRIGGERs enforce arbitrary constraints. Indexes also enforce constraints in a DBMS-specific way.


Because in some sense it has all the problems of a 3NF violation.

Your edits improved your question. Using the wrong words or using words in the wrong way at best states something that is not what we mean. But when what we write doesn't make sense it suggests that our problem, whatever else it involves, involves not knowing what the words mean. Forcing ourselves to use words correctly allows others to know what we really mean. Eg here maybe "... in the join of tables ... there would be a 3NF-violating FD ...". Even by explicitly saying that we are unsure we can communicate some of our vague groping without saying something that we don't mean. Eg your "this feels like ...". But it also leads us to clearly organize what we are faced with. This helps not only the problem we are working on but improves our problem solving.

Finding Cross Dependency Tables in SQL

This simple select statement returns the circular direct foreign key references in a database:

IF OBJECT_ID('dbo.d') IS NOT NULL DROP TABLE dbo.d;
IF OBJECT_ID('dbo.c') IS NOT NULL DROP TABLE dbo.c;
IF OBJECT_ID('dbo.b') IS NOT NULL
BEGIN
ALTER TABLE dbo.a DROP CONSTRAINT [dbo.a(bid)->dbo.b(bid)];
DROP TABLE dbo.b;
END
IF OBJECT_ID('dbo.a') IS NOT NULL DROP TABLE dbo.a;
CREATE TABLE dbo.a(aid INT PRIMARY KEY CLUSTERED, bid INT);
CREATE TABLE dbo.b(aid INT CONSTRAINT [dbo.b(aid)->dbo.a(aid)] REFERENCES dbo.a(aid), bid INT PRIMARY KEY CLUSTERED);
ALTER TABLE dbo.a ADD CONSTRAINT [dbo.a(bid)->dbo.b(bid)] FOREIGN KEY(bid) REFERENCES dbo.b(bid);
CREATE TABLE dbo.c(cid INT PRIMARY KEY CLUSTERED);
CREATE TABLE dbo.d(did INT PRIMARY KEY CLUSTERED, cid INT CONSTRAINT [dbo.d(cid)->dbo.c(cid)] REFERENCES dbo.c(cid));

SELECT *
FROM sys.foreign_keys fk1
JOIN sys.foreign_keys fk2
ON fk1.parent_object_id = fk2.referenced_object_id
AND fk2.parent_object_id = fk1.referenced_object_id;

From here you can join to the DMVs sys.tables and sys.columns to get the additional information like table and column names.

Two things to be aware of:

  1. You should stop using the compatibility views. They are there to support old scripts that where written for SQL 2000 and should not be used in new development.

  2. You can have up to 16 columns in a foreign key. Your script supports only two. However, you are not even returning the column names so you should not join to sys.columns at all, if you don't need the column names.

If you just need the names of the tables you can get away without an additional join by using this select statement instead:

SELECT 
QUOTENAME(OBJECT_SCHEMA_NAME(fk1.parent_object_id))+'.'+QUOTENAME(OBJECT_NAME(fk1.parent_object_id))+
' <-> '+
QUOTENAME(OBJECT_SCHEMA_NAME(fk1.referenced_object_id))+'.'+QUOTENAME(OBJECT_NAME(fk1.referenced_object_id))
FROM sys.foreign_keys fk1
JOIN sys.foreign_keys fk2
ON fk1.parent_object_id = fk2.referenced_object_id
AND fk2.parent_object_id = fk1.referenced_object_id
AND fk1.parent_object_id < fk1.referenced_object_id;

I also added an additional condition to the WHERE clause of the query to include each pair only once.

Can I use a counter in a database Many-to-Many field to reduce lookups?

This is normal. It is ultimately caching: encoding of state redundantly to benefit some patterns of usage at the expense of others. Of course it's also a complexification.

Just because the RDBMS data structure is relations doesn't mean you can't rearrange how you are encoding state from some straightforward form. Eg denormalization.

(Sometimes redundant designs (including ones like yours) are called "denormalized" when they are not actually the result of denormalization and the redundancy is not the kind that denormalization causes or normalization removes. Cross Table Dependency/Constraint in SQL Database Indeed one could reasonably describe your case as involving normalization without preserving FDs (functional dependencies). Start with a table with a user's id & other columns, their ratings (a relation) & its counter. Then ratings functionally determines counter since counter = select count(*) from ratings. Decompose to user etc + counter, ie table User, and user + ratings, which ungroups to table Rating. )


Do you have a suggestion as to the best term to use when googling this

A frequent comment by me: Google many clear, concise & specific phrasings of your question/problem/goal/desiderata with various subsets of terms & tags as you may discover them with & without your specific names (of variables/databases/tables/columns/constraints/etc). Eg 'when can i store a (sum OR total) redundantly in a database'. Human phrasing, not just keywords, seems to help. Your best bet may be along the lines of optimizing SQL database designs for performance. There are entire books ('amazon isbn'), some online ('pdf'). (But maybe mostly re queries). Investigate techniques relevant to warehousing, since an OLTP database acts as an input buffer to an OLAP database, and using SQL with big data. (Eg snapshot scheduling.)

PS My calling this "caching" (so does tag caching) is (typical of me) rather abstract, to the point where there are serious-jokes that everything in CS is caching. (Googling... "There are only two hard problems in Computer Science: cache invalidation and naming things."--Phil Karlton.) (Welcome to both.)

SQL: Avoid circular dependencies

Without circular references between tables:

User 
------
userid NOT NULL
PRIMARY KEY (userid)

Picture
---------
pictureid NOT NULL
userid NOT NULL
PRIMARY KEY (pictureid)
UNIQUE KEY (userid, pictureid)
FOREIGN KEY (userid)
REFERENCES User(userid)

ProfilePicture
---------
userid NOT NULL
pictureid NOT NULL
PRIMARY KEY (userid)
FOREIGN KEY (userid, pictureid) --- if a user is allowed to use only a
REFERENCES Picture(userid, picture) --- picture of his own in his profile

FOREIGN KEY (pictureid) --- if a user is allowed to use any
REFERENCES Picture(picture) --- picture in his profile

The only difference with this design and your needs is that a user may not have a profile picture associated with him.


With circular references between tables:

User 
------
userid NOT NULL
profilepictureid NULL --- Note the NULL here
PRIMARY KEY (userid)
FOREIGN KEY (userid, profilepictureid) --- if a user is allowed to use only a
REFERENCES Picture(userid, pictureid) --- picture of his own in his profile

FOREIGN KEY (profilepictureid) --- if a user is allowed to use any
REFERENCES Picture(pictureid) --- picture in his profile

Picture
---------
pictureid NOT NULL
userid NOT NULL
PRIMARY KEY (pictureid)
UNIQUE KEY (userid, pictureid)
FOREIGN KEY (userid)
REFERENCES User(userid)

The profilepictureid can be set to NOT NULL but then you have to deal with the chicken-and-egg problem when you want to insert into the two tables. This can be solved - in some DBMS, like PostgreSQL and Oracle - using deferred constraints.

Constraints instead Triggers (Specific question)

You can use constraints to ensure that every ContractEmployees row has a corresponding Employees row, and likewise for SalariedExployees. I don't know of a way to use constraints to enforce the opposite: making sure that for every Employees row, there is a row either in ContractEmployees or SalariedEmployees.

Backing up a bit... there are three main ways to model OO inheritance in a relational DB. The terminology is from Martin Fowler's Patterns of Enterprise Application Architecture:

  1. Single table inheritance: everything is just in one big table, with lots of optional columns that apply only to certain subclasses. Easy to do but not very elegant.

  2. Concrete table inheritance: one table for each concrete type. So if all employees are either salaried or contract, you'd have two tables: SalariedEmployees and ContractEmployees. I don't like this approach either, since it makes it harder to query all employees regardless of type.

  3. Class table inheritance: one table for the base class and one per subclass. So three tables: Employees, SalariedEmployeees, and ContractEmployees.

Here is an example of class table inheritance with constraints (code for MS SQL Server):

CREATE TABLE Employees 
(
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
FirstName VARCHAR(100) NOT NULL DEFAULT '',
LastName VARCHAR(100) NOT NULL DEFAULT ''
);

CREATE TABLE SalariedEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
Salary DECIMAL(12,2) NOT NULL
);

CREATE TABLE ContractEmployees
(
ID INT NOT NULL PRIMARY KEY REFERENCES Employees(ID),
HourlyRate DECIMAL(12,2) NOT NULL
);

The "REFERENCES Employees(ID)" part on the two subclass tables defines a foreign key constraint. This ensures that there must be a row in Employees for every row in SalariedEmployees or ContractEmployees.

The ID column is what links everything together. In the subclass tables, the ID is both a primary key for that table, and a foreign key pointing at the base class table.

Can anyone help me in Deriving of functional dependencies and Normalizing tables?

So this is an Entity Relationship Model diagram, which back in the day we also used to call a Chen diagram. For some odd reason colleges and tutorials almost exclusively teach data modelling using this type of diagram model. What's odd is that virtually no one in industry or practical DB development ever uses them, it is considered an entirely unnecessary and onerous intermediate step in data modelling and database design, we just go straight to tables, columns and relations modelling (which we call "ER Models/Diagrams" but are technically IDEFX1 model diagrams, but again no one calls them that).

Consequently what everyone coming out of school or a tutorial calls an ER Diagram and what everyone in industry and practical use thinks is an ER diagram are completely different things. Also, as you can see, many data folks with DB experience don't even know what a Chen diagram is or how to read one and so they are completely unaware that they are (almost) complete specs for the design of a database. As such the question is not at all "too broad" or inspecific. I myself had to learn how to read these diagram 20 years ago because I was interviewing so many college graduates at the time and this was the only diagramming/data-modelling technique that they knew.

I will walk through the process of reading and interpreting this diagram and explain how to turn that into Functional Dependencies and an (almost complete) data design. This is essentially the same task as converting a Chen diagram to text form, and I will leave it in text form without making the IDEFX1 (table-relational) diagram (I don't really have a good tool for it now that I'm retired). Datatypes are not specified here (which is normal for a Chen diagram) and although technically a formal relational data design does NOT require datatypes, practically speaking you do need them to finish your design and implementation. Also, there appears to be several mistakes or omissions in your Diagram which I will call out as I get to them.

So let's start with the basics of how to read a Chen diagram:

Sample Image

Legend:

  • The (blue) rectangles become tables which are Codd "relations" (intra-table).
  • The ovals or "bubbles" are columns in the table they are attached to,
  • Ovals with underlined text represent the primary key for the table they are attached to,
  • The (blue) diamonds represent Chen relations (inter-table) that will usually become your Foreign Key constraints,
  • The lines from a diamond (relation) to a rectangle (table) also indicate cardinality. An arrow means a cardinality of one, while no arrow is means "many" (this actually varies a lot in different "styles" so I am guessing as to which style you are using for this).

So now we can go through the diagram, first looking at the tables and columns and derive the intra-table functional dependencies from them. Intra-table functional dependencies go from the primary key columns to all of the other columns (if there are alternate keys then these would be additional functional dependencies)

Tables Functional Dependencies:

  • Cashier: Cashier_Id --> CashierName
  • Restaurant: Name --> Address, ContactNo
  • Chef: Chef_Id --> ChefName, Salary
  • Bill: Bill_No --> Price, OrderDetail
  • Customer: Cust_Id --> Cust_Name, Cust_Phone, Cust_Address
  • Meal: Meal_No --> Meal_Title, Meal_Price, Description, Quantity
  • Order: Order_No --> #Num_Meals(??)

I have marked the #Num_Meals column as questionable because it appears to be an aggregated field which are non-relational/denormalized and thus inappropriate in the data design (they are added much later in the application design/implementation, and usually as dynamic, not static elements). But I will leave it in for clarity.

Now we are ready to look at the Chen relations to derive the inter-table functional dependencies:

Relations:

  • PaidTo: Bill --> Cashier
  • Pays: Bill --> Customer
  • Places: Order --> Customer
  • Contains: Meal --> Order (*)
  • Prepare: Order --> Chef (*)
  • Works: Cashier --> Restaurant (*)
  • Has: Chef --> Restaurant (*)

(*) -- A note about the last four relations (Contains, Prepare, Works and Has), none of them has a connection with an arrow on it which implies that they are supposed to represent a many-to-many relation which has a couple of problems. First, Many-To-Many is not a valid relationship in relational data design and in practice requires the artificial creation of an intermediate "junction" table to change it from A:many-to-many:B into two relations between three tables: A:many-to-one:J:one-to-many:B. The other problem is that it seems clear from the context (the meaning of the table names) that you can have many Meals for one Order, but never many Orders for one Meal and thus the diagram is clearly wrong on that one.

Therefore, to avoid getting into the complexity of Junction tables I have just assumed that the diagram is wrong for these relations and that there is supposed to be arrows from <Contains> to [Order] and from <Prepare> to [Chef], etc. If you know for certain that these are supposed to be Many-to-Many relationships then you will have to incorporate that.

Now all we need to do is to combine and reduce the two lists of functional dependencies so that there are no redundancies. You can change the related-tables FDs into table-columns FDs simply by replacing the table names with the corresponding column names of their primary keys. For instance, the PaidTo relation's Bill --> Cashier would become Bill_No --> Cashier_Id. Once you have changed all of the FDs into table-column FDs, then you want to remove any redundancies by combining any FDs that have the same "key" on the left-hand side. Thus, Bill_No --> Cashier_Id and Bill_No --> Price, OrderDetail would be combined into Bill_No --> Price, OrderDetail, Cashier_Id.

Final Functional Dependencies:

  • Cashier: Cashier_Id --> CashierName, Restaurant(Name)
  • Restaurant: Name --> Address, ContactNo
  • Chef: Chef_Id --> ChefName, Salary, Restaurant(Name)
  • Bill: Bill_No --> Price, OrderDetail, Cashier_Id, Cust_Id
  • Customer: Cust_Id --> Cust_Name, Cust_Phone, Cust_Address
  • Meal: Meal_No --> Meal_Title, Meal_Price, Description, Quantity, Order_No
  • Order: Order_No --> #Num_Meals(??), Cust_Id, Chef_Id

Warning, different people, teachers, instructors and tutorials may do this (reduction) differently and thus end up with a different final list. You will have to review the rules that your course instructor expects and apply them here.

Finally, we are are ready to do the Table and Relations implementation/design. There is not enough information/context for me to do this all the way to BCNF for certain, but I can get it pretty close based on what we have above:

TABLE Restaurant
( Name Primary Key,
Address,
ContactNo
)

TABLE Cashier
( Cashier_Id Primary Key,
CashierName ,
Restaurant_Name FOREIGN KEY References Restaurant(Name)
)

TABLE Chef
( Chef_Id Primary Key,
ChefName,
Salary,
Restaurant_Name FOREIGN KEY References Restaurant(Name)
)

TABLE Bill
( Bill_No Primary Key,
Price,
OrderDetail,
Cashier_Id FOREIGN KEY References Cashier(Cashier_Id)
Cust_Id FOREIGN KEY References Customer(Cust_Id)
)

TABLE Meal
(
Meal_No Primary Key,
Meal_Title,
Meal_Price,
Description,
Quantity,
Order_No FOREIGN KEY References Order(Order_No)
)

TABLE Order
(
Order_No Primary Key,
#Num_Meals (??),
Cust_Id FOREIGN KEY References Customer(Cust_Id),
Chef_Id FOREIGN KEY References Chef(Chef_Id)
)

Note that there are no datatypes below because none were provided in the diagram. None of this will actually compile without specifying datatypes for each column. Also, the word "Order" is a reserved keyword in almost all databases, so you may want to change the name of the Orders table to something like 'Order_" to avoid any problems (otherwise you will have to explicitly quote this name in all of the SQL that refers to it).

One last thing I will note: It seems like the Bill table should have a relation/reference to the Order table but it does not (the OrderDetail column does not do this according to your diagram).

Prevent denormalization

-- Category CAT exists.
--
category {CAT}
PK {CAT}
-- Category type TYP exists.
--
ctype {TYP}
PK {TYP}

For each category, that category may be of more than one category type. For each category type, more than one category may be of that category type.

-- Category CAT is of type TYP.
--
category_ctype {CAT, TYP}
PK {CAT, TYP}

FK1 {CAT} REFERENCES category {CAT}
FK2 {TYP} REFERENCES ctype {TYP}
-- Request REQ is of category CAT, category type TYP.
--
request {REQ, CAT, TYP}
PK {REQ}

FK {CAT, TYP} REFERENCES category_ctype {CAT, TYP}

Note:

All attributes (columns) NOT NULL

PK = Primary Key
FK = Foreign Key

What SQL query to access data from a table twice-removed, fully normalized

Select u.Name, MAX(created_at) TransactionTime FROM #Users u
INNER JOIN #ORDERS o
ON u.Id = o.user_id
INNER JOIN #Transactions t
ON o.transaction_id = t.id
GROUP BY u.Name
ORDER BY MAX(created_at) desc

I feel the schema which you created in fine



Related Topics



Leave a reply



Submit