How Does Table Alias Names Affect Performance

How does table alias names affect performance?

The alias doesn't affect performance in any practical or measurable way at all (italics added on edit). That is, it would add a barely (if it all) measurable delay to query compilation. Once compiled (and re-used), it has no effect.

An alias removes ambiguity when you have more than one table because you know which table it comes from. It also prevents future table changes from breaking a query. Say, you add an audit column to one table where it already exists in another table. Queries without aliases using both tables will break.

An alias is also mandatory in some cases e.g. schema bound views.

The SQL parsing engine (that reads all queries before executing them, and uses the information to cache the compiled queries in the future for faster execution) is the only thing that looks at the aliases, and uses it to help remove ambiguities in symbol lookups. The system would already produce symbols, just like any other compilable statement in any other language, when it's being parsed prior to execution-storage.

Does using Aliases with AS slow down Query Speed?

Not at all and the performance impact is negligible. Alias give you much better time readability in the query as it removed ambiguity.

SQL Table Aliases - Good or Bad?

Table aliases are a necessary evil when dealing with highly normalized schemas. For example, and I'm not the architect on this DB so bear with me, it can take 7 joins in order to get a clean and complete record back which includes a person's name, address, phone number and company affiliation.

Rather than the somewhat standard single character aliases, I tend to favor short word aliases so the above example's SQL ends up looking like:

select person.FirstName
,person.LastName
,addr.StreetAddress
,addr.City
,addr.State
,addr.Zip
,phone.PhoneNumber
,company.CompanyName
from tblPeople person
left outer join tblAffiliations affl on affl.personID = person.personID
left outer join tblCompany company on company.companyID = affl.companyID

... etc

whats is the importance of table alias in sql join query?

No, your assumption is not true.

There are at least four reasons to use table aliases:

  • If the alias is shorter than the name it allows you to type less.
  • If you self-join a table you are required to give an alias to one or both of the tables to allow you to distinguish between them.
  • Derived tables must have an alias in some database systems (not Oracle though, I think).
  • The alias can make it clearer what the role of the table is in that particularly query.

There is no significant performance penalty from using them.

Performance implications of allowing alias to be used in HAVING clause

Narrowly focused on just that particular query, and with sample data loaded below. This does address some other queries such as the count(distinct ...) mentioned by others.

The alias in the HAVING appears to either slightly outperform or quite a bit outperform its alternative (depending on the query).

This uses a pre-existing table with about 5 million rows in it created quickly via this answer of mine which takes 3 to 5 minutes.

Resulting structure:

CREATE TABLE `ratings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5046214 DEFAULT CHARSET=utf8;

But using INNODB instead. Creates the expected INNODB gap anomaly due to the range reservation inserts. Just saying, but makes no difference. 4.7 Million rows.

Modify the table to get near Tim's assumed schema.

rename table ratings to students; -- not exactly instanteous (a COPY)
alter table students add column camId int; -- get it near Tim's schema
-- don't add the `camId` index yet

The following will take a while. Run it again and again in chunks or else your connection may timeout. The timeout is due to 5 million rows without a LIMIT clause in the update statement. Note, we do have a LIMIT clause.

So we are doing it in half a million row iterations. Sets a column to a
random number between 1 and 20

update students set camId=floor(rand()*20+1) where camId is null limit 500000; -- well that took a while (no surprise)

Keep running the above until no camId is null.

I ran it like 10 times (the whole thing takes 7 to 10 minutes)

select camId,count(*) from students
group by camId order by 1 ;

1 235641
2 236060
3 236249
4 235736
5 236333
6 235540
7 235870
8 236815
9 235950
10 235594
11 236504
12 236483
13 235656
14 236264
15 236050
16 236176
17 236097
18 235239
19 235556
20 234779

select count(*) from students;
-- 4.7 Million rows

Create a useful index (after the inserts of course).

create index `ix_stu_cam` on students(camId); -- takes 45 seconds

ANALYZE TABLE students; -- update the stats: http://dev.mysql.com/doc/refman/5.7/en/analyze-table.html
-- the above is fine, takes 1 second

Create the campus table.

create table campus
( camID int auto_increment primary key,
camName varchar(100) not null
);
insert campus(camName) values
('one'),('2'),('3'),('4'),('5'),
('6'),('7'),('8'),('9'),('ten'),
('etc'),('etc'),('etc'),('etc'),('etc'),
('etc'),('etc'),('etc'),('etc'),('twenty');
-- ok 20 of them

Run the two queries:

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING COUNT(students.id) > 3
ORDER BY studentCount;
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

and

SELECT students.camID, campus.camName, COUNT(students.id) as studentCount 
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING studentCount > 3
ORDER BY studentCount;
-- run it many many times, back to back, 5.50 seconds, 20 rows of output

So the times are identical. Ran each a dozen times.

The EXPLAIN output is the same for both

+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+
| 1 | SIMPLE | campus | ALL | PRIMARY | NULL | NULL | NULL | 20 | Using temporary; Using filesort |
| 1 | SIMPLE | students | ref | ix_stu_cam | ix_stu_cam | 5 | bigtest.campus.camID | 123766 | Using index |
+----+-------------+----------+------+---------------+------------+---------+----------------------+--------+---------------------------------+

Using the AVG() function, I am getting about a 12% increase in performance with the alias in the having (with identical EXPLAIN output) from the following two queries.

SELECT students.camID, campus.camName, avg(students.id) as studentAvg 
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING avg(students.id) > 2200000
ORDER BY students.camID;
-- avg time 7.5

explain

SELECT students.camID, campus.camName, avg(students.id) as studentAvg
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID, campus.camName
HAVING studentAvg > 2200000
ORDER BY students.camID;
-- avg time 6.5

And lastly, the DISTINCT:

SELECT students.camID, count(distinct students.id) as studentDistinct 
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID
HAVING count(distinct students.id) > 1000000
ORDER BY students.camID; -- 10.6 10.84 12.1 11.49 10.1 9.97 10.27 11.53 9.84 9.98
-- 9.9

SELECT students.camID, count(distinct students.id) as studentDistinct
FROM students
JOIN campus
ON campus.camID = students.camID
GROUP BY students.camID
HAVING studentDistinct > 1000000
ORDER BY students.camID; -- 6.81 6.55 6.75 6.31 7.11 6.36 6.55
-- 6.45

The alias in the having consistently runs 35% faster with the same EXPLAIN output. Seen below. So the same Explain output has been shown twice to not result in the same performance, but as a general clue.

+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+
| 1 | SIMPLE | campus | index | PRIMARY | PRIMARY | 4 | NULL | 20 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | students | ref | ix_stu_cam | ix_stu_cam | 5 | bigtest.campus.camID | 123766 | Using index |
+----+-------------+----------+-------+---------------+------------+---------+----------------------+--------+----------------------------------------------+

The Optimizer Appears to favor the alias in the having at the moment, especially for the DISTINCT.

Select SQL View Slow with table alias

The execution plans are very different.

The slow one has a part that leaps out as being problematic. It estimates a single row will be input to a nested loops join and result in a single scan of ValuationItems. In practice it ends up performing more than 1,000 such scans.

Estimated

Sample Image

Actual

enter link description here

SQL Server 2014 introduced a new cardinality estimator. Your fast plan is using it. This is shown in the XML as CardinalityEstimationModelVersion="120" Your slow plan isn't (CardinalityEstimationModelVersion="70").

So it looks as though in this case the assumptions used by the new estimator give you a better plan.

The reason for the difference is probably as the fast one is running cross database (references [ProbeProduction].[dbo].[ValuationsTotal]) and presumably the database you are executing it from has compatility level of 2014 so automatically gets the new CardinalityEstimator.

The slow one is executing in the context of ProbeProduction itself and I assume the compatibility level of that database must be < 2014 - so you are defaulting to the legacy cardinality estimator.

You can use OPTION (QUERYTRACEON 2312) to get the slow query to use the new cardinality estimator (changing the database compatibility mode to globally alter the behaviour shouldn't be done without careful testing of existing queries as it can cause regressions as well as improvements).

Alternatively you could just try and tune the query working within the limits of the legacy CE. Perhaps adding join hints to encourage it to use something more akin to the faster plan.



Related Topics



Leave a reply



Submit