Sql: Using Select *

Nested select statement in SQL Server

You need to alias the subquery.

SELECT name FROM (SELECT name FROM agentinformation) a  

or to be more explicit

SELECT a.name FROM (SELECT name FROM agentinformation) a  

Using SELECT result in another SELECT

NewScores is an alias to Scores table - it looks like you can combine the queries as follows:

SELECT 
ROW_NUMBER() OVER( ORDER BY NETT) AS Rank,
Name,
FlagImg,
Nett,
Rounds
FROM (
SELECT
Members.FirstName + ' ' + Members.LastName AS Name,
CASE
WHEN MenuCountry.ImgURL IS NULL THEN
'~/images/flags/ismygolf.png'
ELSE
MenuCountry.ImgURL
END AS FlagImg,
AVG(CAST(NewScores.NetScore AS DECIMAL(18, 4))) AS Nett,
COUNT(Score.ScoreID) AS Rounds
FROM
Members
INNER JOIN
Score NewScores
ON Members.MemberID = NewScores.MemberID
LEFT OUTER JOIN MenuCountry
ON Members.Country = MenuCountry.ID
WHERE
Members.Status = 1
AND NewScores.InsertedDate >= DATEADD(mm, -3, GETDATE())
GROUP BY
Members.FirstName + ' ' + Members.LastName,
MenuCountry.ImgURL
) AS Dertbl
ORDER BY;

What is the reason not to use select *?

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).

Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

How to write a SQL DELETE statement with a SELECT statement in the WHERE clause?

You need to identify the primary key in TableA in order to delete the correct record. The primary key may be a single column or a combination of several columns that uniquely identifies a row in the table. If there is no primary key, then the ROWID pseudo column may be used as the primary key.

DELETE FROM tableA
WHERE ROWID IN
( SELECT q.ROWID
FROM tableA q
INNER JOIN tableB u on (u.qlabel = q.entityrole AND u.fieldnum = q.fieldnum)
WHERE (LENGTH(q.memotext) NOT IN (8,9,10) OR q.memotext NOT LIKE '%/%/%')
AND (u.FldFormat = 'Date'));

Why is SELECT * considered harmful?

There are really three major reasons:

  • Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.

  • Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.

  • Binding Problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions SQL Server) when underlying table structures change -- the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.

But it's not all bad for SELECT *. I use it liberally for these use cases:

  • Ad-hoc queries. When trying to debug something, especially off a narrow table I might not be familiar with, SELECT * is often my best friend. It helps me just see what's going on without having to do a boatload of research as to what the underlying column names are. This gets to be a bigger "plus" the longer the column names get.

  • When * means "a row". In the following use cases, SELECT * is just fine, and rumors that it's a performance killer are just urban legends which may have had some validity many years ago, but don't now:

    SELECT COUNT(*) FROM table;

    in this case, * means "count the rows". If you were to use a column name instead of * , it would count the rows where that column's value was not null. COUNT(*), to me, really drives home the concept that you're counting rows, and you avoid strange edge-cases caused by NULLs being eliminated from your aggregates.

    Same goes with this type of query:

    SELECT a.ID FROM TableA a
    WHERE EXISTS (
    SELECT *
    FROM TableB b
    WHERE b.ID = a.B_ID);

    in any database worth its salt, * just means "a row". It doesn't matter what you put in the subquery. Some people use b's ID in the SELECT list, or they'll use the number 1, but IMO those conventions are pretty much nonsensical. What you mean is "count the row", and that's what * signifies. Most query optimizers out there are smart enough to know this. (Though to be honest, I only know this to be true with SQL Server and Oracle.)

SELECT using 'CASE' in SQL

This is just the syntax of the case statement, it looks like this.

SELECT 
CASE
WHEN FRUIT = 'A' THEN 'APPLE'
WHEN FRUIT = 'B' THEN 'BANANA'
END AS FRUIT
FROM FRUIT_TABLE;

As a reminder remember; no assignment is performed the value becomes the column contents. (If you wanted to assign that to a variable you would put it before the CASE statement).



Related Topics



Leave a reply



Submit