Performance Issue in Using Select *

Performance issue in using SELECT *?

If you need a subset of the columns, you are giving bad help to the optimizer (cannot choose for index, or cannot go only to index, ...)

Some database can choose to retrieve data from indexes only. That thing is very very helpfull and give an incredible speedup. Running SELECT * queries does not allow this trick.

Anyway, from the point of view of application is not a good practice.


Example on this:

  • You have a table T with 20 columns (C1, C2, ..., C19 C20).
  • You have an index on T for (C1,C2)
  • You make SELECT C1, C2 FROM T WHERE C1=123
  • The optimizer have all the information on index, does not need to go to the table Data

Instead if you SELECT * FROM T WHERE C1=123, the optimizer needs to get all the columns data, then the index on (C1,C2) cannot be used.

In joins for multiple tables is a lot helpful.

What are the potentially negative implications of using SELECT * in MySQL?

There is a list of issues with using select *.

1 - consistency of output

If your client expects certain fields and you change the layout of a table, select * will change its output if the underlying tables changes, breaking your client code.

2 - Network performance

If you only need a subset of all field in the output, select * will waste network bandwidth by sending unneeded data across the wire.

3a - Database performance

On simple tables select * may not slow performance down, but if you have blob fields (that you're not interested in) select * will fetch those as well, killing performance

3b - Memory usage on the database server

If MySQL needs to use a temporary table, select * will tax the servers, disk, memory and CPU extra by making MySQL store more data in memory.

3c - On InnoDB covering indexes cannot be used

On InnoDB if you select fields that are indexed, MySQL needs never to read the actual table data, it just reads the info straight from the index, select * kills this optimization.

4 - Clarity of code

select * gives the reader of a query no info on which fields will be retrieved from the server. select name, address, telephone from ... makes it instantly clear what data we are dealing with.

If your tables are in Zulu you can even do

select
igama as name
ikheli as address
....

Which is much more useful than the original names for the 99.7% of the people that don't speak Zulu.

5 - Don't hack code, craft it

Select * is just a quick hack to make stuff work.

But when you write code you're supposed to have an idea what you're doing, make that explicit, select what you need, give fields meaningful name using aliases if needed.

Tweak your code to select only what you need.

If I see a select * in a code review, I mark it down sight unseen because it's a code smell.

Hope this helps.

Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc

One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.

Here's a post I wrote about it: The real reason select queries are bad index coverage

It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.

What is the reason not to use select *?

The essence of the quote of not prematurely optimizing is to go for simple and straightforward code and then use a profiler to point out the hot spots, which you can then optimize to be efficient.

When you use select * you're make it impossible to profile, therefore you're not writing clear & straightforward code and you are going against the spirit of the quote. select * is an anti-pattern.


So selecting columns is not a premature optimization. A few things off the top of my head ....

  1. If you specify columns in a SQL statement, the SQL execution engine will error if that column is removed from the table and the query is executed.
  2. You can more easily scan code where that column is being used.
  3. You should always write queries to bring back the least amount of information.
  4. As others mention if you use ordinal column access you should never use select *
  5. If your SQL statement joins tables, select * gives you all columns from all tables in the join

The corollary is that using select * ...

  1. The columns used by the application is opaque
  2. DBA's and their query profilers are unable to help your application's poor performance
  3. The code is more brittle when changes occur
  4. Your database and network are suffering because they are bringing back too much data (I/O)
  5. Database engine optimizations are minimal as you're bringing back all data regardless (logical).

Writing correct SQL is just as easy as writing Select *. So the real lazy person writes proper SQL because they don't want to revisit the code and try to remember what they were doing when they did it. They don't want to explain to the DBA's about every bit of code. They don't want to explain to their clients why the application runs like a dog.

Performance benefit when SQL query is limited vs calling entire row?

It's not just the extra data aspect that you need to consider. Selecting all columns will negate the usefulness of covering indexes, since a bookmark lookup into the clustered index (or table) will be required.

SQL query - Select * from view or Select col1, col2, ... colN from view

NEVER, EVER USE "SELECT *"!!!!

This is the cardinal rule of query design!

There are multiple reasons for this. One of which is, that if your table only has three fields on it and you use all three fields in the code that calls the query, there's a great possibility that you will be adding more fields to that table as the application grows, and if your select * query was only meant to return those 3 fields for the calling code, then you're pulling much more data from the database than you need.

Another reason is performance. In query design, don't think about reusability as much as this mantra:

TAKE ALL YOU CAN EAT, BUT EAT ALL YOU TAKE.

Prepared statement doesn't fetches data

When using a MySQLi prepared statement, it's a little different than using a standard query. You will need to use mysqli_stmt::bind_result(), or store the result-set with mysqli_stmt::get_result() before you can use the data when you fetch the result. Note that you need the MySQL native driver mysqlnd before you can use get_result() - otherwise you'll need to bind each column manually with bind_result().

Here's an example how to use bind_result(). Note that you need to bind as many columns as there are in the query, and since you do SELECT *, you need to bind everything you have in the table - but, that method will fail if you later add a column to your table. It's therefor better to select just the columns you need, like SELECT id, test FROM...

$id = $_REQUEST['id'];

$stmt = $conn->prepare('SELECT test, id FROM test WHERE id = ?');
$stmt->bind_param('s', $id);

$stmt->execute();
$stmt->bind_result($test, $result_id);
$stmt->fetch());

/*
* If there was any matching rows, the variables defined in bind_param()
* now hold the values
* You can then use '$test' and '$result_id'
*/
echo $test;

$stmt->close();

If you however have the MySQL Native Driver installed, you can use get_result() and use it as a "regular" query. Then it doesn't matter all that much if you do SELECT * (although I don't recommend ever selecting everything - you should select the columns you need, and nothing more).

$id = $_REQUEST['id'];

$stmt = $conn->prepare('SELECT * FROM test WHERE id = ?');
$stmt->bind_param('s', $id);

$stmt->execute();
$result = $stmt->get_result(); // $result is now an object of MySQLi resource, and not MySQLi statement
// It can now be used as as the result-set of a regular query

$row = $result->fetch_assoc());
$test = $row['test'];
echo $test;

$stmt->close();
  • PHP.net on mysqli_stmt::get_result()
  • PHP.net on mysqli_stmt::bind_result()


Related Topics



Leave a reply



Submit