Identifying Transitive Dependencies

Identifying functional dependencies (FDs)

From your question it seems that you do not have a clear understanding of basics.

Application relationships & situations

First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.

If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)

/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with title [ct]
from instructor with id [ii] and name [in] and office [io]
with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)

/* rows where
student with id [si] has name [sn] and address [sa] and major [sm]
and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)

/* rows where
course [ci] with title [ct]
is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io,scg)

Now by the definition of NATURAL JOIN,
the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg)
are the rows in SG NATURAL JOIN CI.

And since
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
when/iff
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg),
ie since
the rows where
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)
are the rows where
SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg),
we have T = SG NATURAL JOIN CI.

Together the application relationships and situations that can arise determine both the rules and constraints! They are just things that are true of every application situation or every database state (ie values of one or more base tables) (which are are a function of the criteria and the possible application situations.)

Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.

The only time a rule can tell you something that you don't know already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!

(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)

Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.

FDs & CKs

Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.

For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.

Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.

A partial dependency relies on either one of the attributes from the
Primary key.

Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").

Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.

A transitive dependency involves two or more non-key attributes in a
functional dependence where one of the non-key attributes is dependent
on a key attribute (from my PK).

Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.

"1NF" has no single meaning.

Not sure if this consistitues a transitive dependency

Transitive dependency occurs only if you have indirect relationship between more than 2 attributes that are not part of they key.

In your example, as you explained, the StaffID is part of your dependency, which is fine because it's the primary key.

Also you can look at this question that shows what is wrong with a transitive dependency. It could help put things into perspective.

In your table, if you delete staff member, you delete all the information (rightly so because you don't need it). If you leave phone number in a different table and, for instance, delete entry only in Staff, you're left with a wild phone number. But if your Staff table allowed multiple entries for the same person (but different departments) then the situation would be different.

Other sites that helped me in the past:

https://www.thoughtco.com/transitive-dependency-1019760
https://beginnersbook.com/2015/04/transitive-dependency-in-dbms/

Funnily they always follow the book example : )

Identifying the transitive dependency including a specific library - Gradle

You can use gradle dependencyInsight --dependency org.hibernate:hibernate-core to see details for given dependency.

How can I view transitive dependencies of a Maven pom.xml file?

On the CLI, use mvn dependency:tree
(Here are some additional Usage notes)

When running dependency:tree on multi-module maven project, use mvn compile dependency:tree instead1.

Otherwise, the POM Editor in M2Eclipse (Maven integration for Eclipse) is very good, and it includes a hierarchical dependency view.


1If you don't compile, you might get error Failed to execute goal on project baseproject: Could not resolve dependencies for project com.company:childproject:jar:1.0.0: Could not find artifact. This might happen because dependency:tree command doesn't build projects and doesn't resolve dependencies, and your projects are not installed in maven repository.

Verifying non-key column dependencies and transitive dependencies

To save on screen space I'll rename attributes:

  • Brand: BRD
  • Flavor: FLA
  • Animal: ANM
  • Breed size: SIZ
  • Age group: AGP
  • Ingredient: ING
  • Price: PRI

Method 1

Normalization, as in textbooks.

Step 1.1

Due to repeating groups in ingredients, the value in the table is not a relation, hence the table does not represent a relational variable, hence it is not in 1NF. The solution is to make sure that the ingredient column (ING) has exactly one ingredient per row-- as you have done. However, no new attributes are added (no new ID). Now we have (in 1NF):

R {BRD, FLA, ANM, SIZ, AGP, ING, PRI}

The whole heading is the key.

Step 1.2

From FDs:

  • {BRD, FLA} -> {ANM}
  • {BRD, FLA} -> {SIZ}
  • {BRD, FLA} -> {AGP}
  • {BRD, FLA} -> {PRI}

by applying union rule for FDs and Heath's theorem:

 R1 {BRD, FLA, ANM, SIZ, AGP, PRI}
KEY {BRD, FLA}

R2 {BRD, FLA, ING}
KEY {BRD, FLA, ING}

Done. That's it, if I got the FDs correctly.

Both are in BCNF, with confidence I would say that R1 is in 5NF, R2 in 6NF.

Method 2

Database design is predicate design.

Not a formal normalization method as found in most textbooks, but a design method which results in tables being in high NF (5NF, 6NF).

In the first step, the problem (business domain) is verbalized using simple predicates and associated constraints. The problem can be then reasoned about in natural language using logic.

A simple predicate can not be decomposed without losing information, its matching relvar is in 6NF.

In the second step, these simple predicates (and matching relvars) can be combined, making sure not to introduce redundancy and possibility of logical errors, namely contradiction. Once done, relvars (tables) are expected to be in 5NF.

Step 2.1

Describing the problem using simple predicates and matching 6NF relvars. Not verbalizing constraints (it gets long-winded), just stating them.

-- Brand BRD exists.
--
brand {BRD}
PK {BRD}
-- Flavor FLA exists.
--
flavor {FLA}
PK {FLA}
-- Animal type ANM exists.
--
animal {ANM}
PK {ANM}
-- Breed size SIZ exists.
--
bsize {SIZ}
PK {SIZ}
-- Age group AGP exists.
--
age {AGP}
PK {AGP}
-- Ingredient ING exists.
--
ingredient {ING}
PK {ING}
-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM.
--
food {BRD, FLA, ANM}
PK {BRD, FLA}

FK1 {BRD} REFERENCES brand {BRD}
FK2 {FLA} REFERENCES flavor {FLA}
FK3 {ANM} REFERENCES animal {ANM}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed size SIZ.
--
food_bsize {BRD, FLA, SIZ}
PK {BRD, FLA}

FK1 {BRD, FLA} REFERENCES
food {BRD, FLA}

FK2 {SIZ} REFERENCES bsize {SIZ}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed age group AGP.
--
food_age {BRD, FLA, AGP}
PK {BRD, FLA}

FK1 {BRD, FLA} REFERENCES
food {BRD, FLA}

FK2 {AGP} REFERENCES age {AGP}
-- Pet food with flavor FLA made by brand BRD
-- is priced at PRI Euros per unit.
--
price {BRD, FLA, PRI}
PK {BRD, FLA}

FK {BRD, FLA} REFERENCES
food {BRD, FLA}
-- Pet food with flavor FLA made by brand BRD
-- contains ingredient ING.
--
recipe {BRD, FLA, ING}
PK {BRD, FLA, ING}

FK1 {BRD, FLA} REFERENCES
food {BRD, FLA}

FK2 {ING} REFERENCES ingredient {ING}

Step 2.2

Just by looking at the keys, we can see that food, food_bsize, food_age, and price tables can be combined.

-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM, recommended for
-- breed size SIZ, breed age group AGP; priced
-- at PRI Euros per unit.
--
food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
PK {BRD, FLA}

FK1 {BRD} REFERENCES brand {BRD}
FK2 {FLA} REFERENCES flavor {FLA}
FK3 {ANM} REFERENCES animal {ANM}
FK4 {SIZ} REFERENCES bsize {SIZ}
FK5 {AGP} REFERENCES age {AGP}

If we decide not to keep the first six tables defining domains, then the final result is as in the first method:

food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
PK {BRD, FLA}

recipe {BRD, FLA, ING}
PK {BRD, FLA, ING}

FK {BRD, FLA} REFERENCES food_
{BRD, FLA}

However, in a real-world project, you will likely need brand, flavor, animal, bsize, and age tables to constrain domains. There is no rule in normalization stating that you should have them.

Also, not all attributes may be known at the sime time, so it is not likely that you'll combine all of food, food_bsize, food_age, and price tables into food_. This will depend on the business process and optionality of attributes.

Adding IDs

Adding surrogate keys (IDs) has nothing to do with normalization. You may need to add them for other reasons, take a look at this example.


Note:

All attributes (columns) NOT NULL

KEY = PK or AK

PK = Primary Key
AK = Alternate Key (Unique)
SK = Proper Superkey (Unique)
FK = Foreign Key

I always use the second method for design. Have never seen the first one used outside of a formal DB course of some kind; and people asking for help on SO.
For some reason, beyond my understanding, method 1 is taught in DB courses under topic "DB design" before method 2. Most schools do not teach method 2 at all. Go figure.



Related Topics



Leave a reply



Submit