Differencebetween Select and Project Operations

What is the difference between Select and Project Operations

Select Operation : This operation is used to select rows from a table (relation) that specifies a given logic, which is called as a predicate. The predicate is a user defined condition to select rows of user's choice.

Project Operation : If the user is interested in selecting the values of a few attributes, rather than selection all attributes of the Table (Relation), then one should go for PROJECT Operation.

See more : Relational Algebra and its operations

Query optimization: apply SELECT, PROJECT before JOIN

"Pushing selections" down is one of the basic optimization strategies which can reduce the amount of I/Os that need to be done .

For instance , a selection containing a sargable predicate, if pushed under the join , will effectively reduce the number of I/Os by reducing the number of tuples in the outer relation (Nested Loops Join requires |R|+|R|*|Q| I/Os) .

The main downside of pushing selections down is the situation in which the existing indices on the original relation cannot be used. The decision of whether to push or not is done in conjunction with the choice of the join method.

Similarly, you can "push" a projection down if it retains the attributes needed for the join.

Project and restrict in relational algebra

The two rules that can be applied to change the order of restrictions and projections maintaining the semantics of the expression are the following:

π_Y(σ_ΦX(E)) = σ_ΦX(π_Y(E)), if X ⊆ Y

otherwise, if the condition concerns attributes X ⊈ Y:

π_Y(σ_ΦX(E)) = π_Y(σ_ΦX(π_XY(E)))

where E is any relational expression producing a relation with a set of attributes that includes X and Y, π_X(E) is the projection of E over the set of attributes X and σ_ΦX(E) is the restriction over E with a condition ΦX over the set of attributes X.

These two rules are equivalence rules, so they can be applied in both directions. In general the optimizer tries to apply the restrictions before any other operation, if possible, and than to apply the projections before the joins.

Added

The first rule says that if you have a relation with attributes Z = Y ∪ W, performing a restriction over a subset of the attributes of Y, and then projecting the result on Y, is equivalent to perform first the projection, and then the restriction.

This equivalence can be proved in the following way.

Given E a relation with attributes Z = Y ∪ W, the definition of restriction is:

σ_ΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

that is, the set of all the tuples of E such that ΦX(t) is true.

The definition of projection is:

π_Y(E) = { t₁ | t ∈ E ∧ Y ⊆ Z ∧ t₁ = t[Y] }

that is the set of tuples obtained by considering, for each tuple t of E, a (sub)tuple containing only the attributes Y of t.

So,

π_Y(σ_ΦX(E)) = π_Y(E') =
{ t₁ | t ∈ E' ∧ Y ⊆ Z ∧ t₁ = t[Y] }

where E' = σ_ΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

Combining these two formulas, we get:

π_Y(σ_ΦX(E)) = { t₁ | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) ∧ Y ⊆ Z ∧ t₁ = t[Y] }

But since we know that X ⊆ Y, we can rewrite the formula as:

π_Y(σ_ΦX(E)) = { t₁ | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t) ∧ t₁ = t[Y] } [1]

Starting from the other term,

σ_ΦX(π_Y(E)) = σ_ΦX(E'') = { t | t ∈ E'' ∧ X ⊆ Z ∧ ΦX(t) }

where E'' = π_Y(E) = { t₁ | t ∈ E ∧ Y ⊆ Z ∧ t₁ = t[Y] }

Again, combining these two formulas and noting that X ⊆ Y, we get:

σ_ΦX(π_Y(E)) = { t₁ | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t₁) ∧ t₁ = t[Y] } [2]

[1] = [2] if we can show that ΦX(t) = ΦX(t[Y]), and this is true since both conditions are true or false at the same time, given that the condition is concerns only the attributes X, which are present both in t and in t[Y] (since X ⊆ Y).

The second rule says that, if you have a relation with attributes Z = X ∪ Y ∪ W, with X - Y ≠ ∅ performing a restriction over the attributes of X, and then projecting the result on Y, is equivalent to perform first a projection over the attributes X ∪ Y, then perform the restriction, and finally perform a new projection over the attributes X.

Also in this case a formal proof can be given, by reasoning in an analogous way to the above proof, but it is omitted here for brevity.

What are horizontal and vertical partitions in database and what is the difference?

Not a complete answer to the question but it answers what is asked in the question title. So the general meaning of horizontal and vertical database partitioning is:

Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.

See more details here.

Are selection and projection associative?

If think if the selection is on subset of columns that are used in projections then YES,

but if not, there can be a situation where you are making a selection on columns that not exists.