Differencebetween Select and Project Operations

What is the difference between Select and Project Operations

Select Operation : This operation is used to select rows from a table (relation) that specifies a given logic, which is called as a predicate. The predicate is a user defined condition to select rows of user's choice.

Project Operation : If the user is interested in selecting the values of a few attributes, rather than selection all attributes of the Table (Relation), then one should go for PROJECT Operation.

See more : Relational Algebra and its operations

Query optimization: apply SELECT, PROJECT before JOIN

"Pushing selections" down is one of the basic optimization strategies which can reduce the amount of I/Os that need to be done .

For instance , a selection containing a sargable predicate, if pushed under the join , will effectively reduce the number of I/Os by reducing the number of tuples in the outer relation (Nested Loops Join requires |R|+|R|*|Q| I/Os) .

The main downside of pushing selections down is the situation in which the existing indices on the original relation cannot be used. The decision of whether to push or not is done in conjunction with the choice of the join method.

Similarly, you can "push" a projection down if it retains the attributes needed for the join.

Project and restrict in relational algebra

The two rules that can be applied to change the order of restrictions and projections maintaining the semantics of the expression are the following:

πYΦX(E)) = σΦXY(E)), if X ⊆ Y

otherwise, if the condition concerns attributes X ⊈ Y:

πYΦX(E)) = πYΦXXY(E)))

where E is any relational expression producing a relation with a set of attributes that includes X and Y, πX(E) is the projection of E over the set of attributes X and σΦX(E) is the restriction over E with a condition ΦX over the set of attributes X.

These two rules are equivalence rules, so they can be applied in both directions. In general the optimizer tries to apply the restrictions before any other operation, if possible, and than to apply the projections before the joins.

Added

The first rule says that if you have a relation with attributes Z = Y ∪ W, performing a restriction over a subset of the attributes of Y, and then projecting the result on Y, is equivalent to perform first the projection, and then the restriction.

This equivalence can be proved in the following way.

Given E a relation with attributes Z = Y ∪ W, the definition of restriction is:

σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

that is, the set of all the tuples of E such that ΦX(t) is true.

The definition of projection is:

πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }

that is the set of tuples obtained by considering, for each tuple t of E, a (sub)tuple containing only the attributes Y of t.

So,

πYΦX(E)) = πY(E') =
{ t1 | t ∈ E' ∧ Y ⊆ Z ∧ t1 = t[Y] }

where E' = σΦX(E) = { t | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) }

Combining these two formulas, we get:

πYΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Z ∧ ΦX(t) ∧ Y ⊆ Z ∧ t1 = t[Y] }

But since we know that X ⊆ Y, we can rewrite the formula as:

πYΦX(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t) ∧ t1 = t[Y] }       [1]

Starting from the other term,

σΦXY(E)) = σΦX(E'') = { t | t ∈ E'' ∧ X ⊆ Z ∧ ΦX(t) }

where E'' = πY(E) = { t1 | t ∈ E ∧ Y ⊆ Z ∧ t1 = t[Y] }

Again, combining these two formulas and noting that X ⊆ Y, we get:

σΦXY(E)) = { t1 | t ∈ E ∧ X ⊆ Y ⊆ Z ∧ ΦX(t1) ∧ t1 = t[Y] }       [2]

[1] = [2] if we can show that ΦX(t) = ΦX(t[Y]), and this is true since both conditions are true or false at the same time, given that the condition is concerns only the attributes X, which are present both in t and in t[Y] (since X ⊆ Y).

The second rule says that, if you have a relation with attributes Z = X ∪ Y ∪ W, with X - Y ≠ ∅ performing a restriction over the attributes of X, and then projecting the result on Y, is equivalent to perform first a projection over the attributes X ∪ Y, then perform the restriction, and finally perform a new projection over the attributes X.

Also in this case a formal proof can be given, by reasoning in an analogous way to the above proof, but it is omitted here for brevity.

What are horizontal and vertical partitions in database and what is the difference?

Not a complete answer to the question but it answers what is asked in the question title. So the general meaning of horizontal and vertical database partitioning is:

Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.

Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.

See more details here.

Are selection and projection associative?

If think if the selection is on subset of columns that are used in projections then YES,

but if not, there can be a situation where you are making a selection on columns that not exists.



Related Topics



Leave a reply



Submit