What Is the Reason Not to Use Select *

What is the reason not to use select *?

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).

Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

Performance issue in using SELECT *?

If you need a subset of the columns, you are giving bad help to the optimizer (cannot choose for index, or cannot go only to index, ...)

Some database can choose to retrieve data from indexes only. That thing is very very helpfull and give an incredible speedup. Running SELECT * queries does not allow this trick.

Anyway, from the point of view of application is not a good practice.


Example on this:

  • You have a table T with 20 columns (C1, C2, ..., C19 C20).
  • You have an index on T for (C1,C2)
  • You make SELECT C1, C2 FROM T WHERE C1=123
  • The optimizer have all the information on index, does not need to go to the table Data

Instead if you SELECT * FROM T WHERE C1=123, the optimizer needs to get all the columns data, then the index on (C1,C2) cannot be used.

In joins for multiple tables is a lot helpful.

SQL query - Select * from view or Select col1, col2, ... colN from view

NEVER, EVER USE "SELECT *"!!!!

This is the cardinal rule of query design!

There are multiple reasons for this. One of which is, that if your table only has three fields on it and you use all three fields in the code that calls the query, there's a great possibility that you will be adding more fields to that table as the application grows, and if your select * query was only meant to return those 3 fields for the calling code, then you're pulling much more data from the database than you need.

Another reason is performance. In query design, don't think about reusability as much as this mantra:

TAKE ALL YOU CAN EAT, BUT EAT ALL YOU TAKE.

SELECT * - pros /cons

In general, the use of SELECT * is not a good idea.

Pros:

  • When you add/remove columns, you don't have to make changes where you did use SELECT *
  • It is shorter to write
  • Also see the answers to: Can select * usage ever be justified?

Cons:

  • You are returning more data than you need. Say you add a VARBINARY column that contains 200k per row. You only need this data in one place for a single record - using SELECT * you can end up returning 2MB per 10 rows that you don't need
  • Explicit about what data is used
  • Specifying columns means you get an error when a column is removed
  • The query processor has to do some more work - figuring out what columns exist on the table (thanks @vinodadhikary)
  • You can find where a column is used more easily
  • You get all columns in joins if you use SELECT *
  • You can't use ordinal referencing (though using ordinal references for columns is bad practice in itself)
  • Also see the answers to: What is the reason not to use select *?

How to SELECT * and rename a column?

Yes you can do the following:

SELECT  bar AS foobar, a.* from foo as a;

But in this case you will get bar twice: one with name foobar and other with bar as * fetched it..

Does the number of columns returned affect the speed of a query?

You better avoid SELECT *

  • It leads to confusion when you change the table layout.
  • It selects unneeded columns, and your data packets get larger.
  • The columns can get duplicate names, which is also not good for some applications
  • If all the columns you need are covered by an index, SELECT columns will only use this index, while SELECT * will need to visit the table records to get the values you don't need. Also bad for performance.

View - Return 0 if no rows found in a grouped by query

If there is not an entry in the view results, then this will always return NULL - That's SQL. If you change your SELECT that you use against the view, you can achieve what you want:

SELECT IFNULL(total, 0) FROM total_transactions WHERE account_id = 2060

Edit:

(SELECT IFNULL(total, 0) total FROM total_transactions WHERE account_id = 2060)
UNION
(SELECT 0 total)


Related Topics



Leave a reply



Submit