Why Is Using '*' to Build a View Bad

Why is using '*' to build a view bad?

I don't think there's much in software that is "just bad", but there's plenty of stuff that is misused in bad ways :-)

The example you give is a reason why * might not give you what you expect, and I think there are others. For example, if the underlying tables change, maybe columns are added or removed, a view that uses * will continue to be valid, but might break any applications that use it. If your view had named the columns explicitly then there was more chance that someone would spot the problem when making the schema change.

On the other hand, you might actually want your view to blithely
accept all changes to the underlying tables, in which case a * would
be just what you want.

Update: I don't know if the OP had a specific database vendor in mind, but it is now clear that my last remark does not hold true for all types. I am indebted to user12861 and Jonny Leeds for pointing this out, and sorry it's taken over 6 years for me to edit my answer.

Why is SELECT * considered harmful?

There are really three major reasons:

  • Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.

  • Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.

  • Binding Problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions SQL Server) when underlying table structures change -- the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.

But it's not all bad for SELECT *. I use it liberally for these use cases:

  • Ad-hoc queries. When trying to debug something, especially off a narrow table I might not be familiar with, SELECT * is often my best friend. It helps me just see what's going on without having to do a boatload of research as to what the underlying column names are. This gets to be a bigger "plus" the longer the column names get.

  • When * means "a row". In the following use cases, SELECT * is just fine, and rumors that it's a performance killer are just urban legends which may have had some validity many years ago, but don't now:

    SELECT COUNT(*) FROM table;

    in this case, * means "count the rows". If you were to use a column name instead of * , it would count the rows where that column's value was not null. COUNT(*), to me, really drives home the concept that you're counting rows, and you avoid strange edge-cases caused by NULLs being eliminated from your aggregates.

    Same goes with this type of query:

    SELECT a.ID FROM TableA a
    WHERE EXISTS (
    SELECT *
    FROM TableB b
    WHERE b.ID = a.B_ID);

    in any database worth its salt, * just means "a row". It doesn't matter what you put in the subquery. Some people use b's ID in the SELECT list, or they'll use the number 1, but IMO those conventions are pretty much nonsensical. What you mean is "count the row", and that's what * signifies. Most query optimizers out there are smart enough to know this. (Though to be honest, I only know this to be true with SQL Server and Oracle.)

Is it bad to call views inside a View in sql

No, it's fine. In many cases I personally consider it preferable to writing one view with a giant and difficult to understand definition. In my opinion, using multiple views allows you to:

  1. Encapsulate discrete logic in individual views.
  2. Re-use logic in the individual views without having to repeat the logic (eliminating update problems later).
  3. Name your logic so that it's easier for the next programmer to understand what you were trying to accomplish.

SQL Server: Invalid Column Name

Whenever this happens to me, I press Ctrl+Shift+R which refreshes intellisense, close the query window (save if necessary), then start a new session which usually works quite well.

SQL Server reports 'Invalid column name', but the column is present and the query works through management studio

I suspect that you have two tables with the same name. One is owned by the schema 'dbo' (dbo.PerfDiag), and the other is owned by the default schema of the account used to connect to SQL Server (something like userid.PerfDiag).

When you have an unqualified reference to a schema object (such as a table) — one not qualified by schema name — the object reference must be resolved. Name resolution occurs by searching in the following sequence for an object of the appropriate type (table) with the specified name. The name resolves to the first match:

  • Under the default schema of the user.
  • Under the schema 'dbo'.

The unqualified reference is bound to the first match in the above sequence.

As a general recommended practice, one should always qualify references to schema objects, for performance reasons:

  • An unqualified reference may invalidate a cached execution plan for the stored procedure or query, since the schema to which the reference was bound may change depending on the credentials executing the stored procedure or query. This results in recompilation of the query/stored procedure, a performance hit. Recompilations cause compile locks to be taken out, blocking others from accessing the needed resource(s).

  • Name resolution slows down query execution as two probes must be made to resolve to the likely version of the object (that owned by 'dbo'). This is the usual case. The only time a single probe will resolve the name is if the current user owns an object of the specified name and type.

[Edited to further note]

The other possibilities are (in no particular order):

  • You aren't connected to the database you think you are.
  • You aren't connected to the SQL Server instance you think you are.

Double check your connect strings and ensure that they explicitly specify the SQL Server instance name and the database name.

SQL Server SELECT where any column contains 'x'

First Method(Tested)
First get list of columns in string variable separated by commas and then you can search 'foo' using that variable by use of IN

Check stored procedure below which first gets columns and then searches for string:

DECLARE @TABLE_NAME VARCHAR(128)
DECLARE @SCHEMA_NAME VARCHAR(128)

-----------------------------------------------------------------------

-- Set up the name of the table here :
SET @TABLE_NAME = 'testing'
-- Set up the name of the schema here, or just leave set to 'dbo' :
SET @SCHEMA_NAME = 'dbo'

-----------------------------------------------------------------------

DECLARE @vvc_ColumnName VARCHAR(128)
DECLARE @vvc_ColumnList VARCHAR(MAX)

IF @SCHEMA_NAME =''
BEGIN
PRINT 'Error : No schema defined!'
RETURN
END

IF NOT EXISTS (SELECT * FROM sys.tables T JOIN sys.schemas S
ON T.schema_id=S.schema_id
WHERE T.Name=@TABLE_NAME AND S.name=@SCHEMA_NAME)
BEGIN
PRINT 'Error : The table '''+@TABLE_NAME+''' in schema '''+
@SCHEMA_NAME+''' does not exist in this database!'
RETURN
END

DECLARE TableCursor CURSOR FAST_FORWARD FOR
SELECT CASE WHEN PATINDEX('% %',C.name) > 0
THEN '['+ C.name +']'
ELSE C.name
END
FROM sys.columns C
JOIN sys.tables T
ON C.object_id = T.object_id
JOIN sys.schemas S
ON S.schema_id = T.schema_id
WHERE T.name = @TABLE_NAME
AND S.name = @SCHEMA_NAME
ORDER BY column_id


SET @vvc_ColumnList=''

OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @vvc_ColumnName

WHILE @@FETCH_STATUS=0
BEGIN
SET @vvc_ColumnList = @vvc_ColumnList + @vvc_ColumnName

-- get the details of the next column
FETCH NEXT FROM TableCursor INTO @vvc_ColumnName

-- add a comma if we are not at the end of the row
IF @@FETCH_STATUS=0
SET @vvc_ColumnList = @vvc_ColumnList + ','
END

CLOSE TableCursor
DEALLOCATE TableCursor

-- Now search for `foo`


SELECT *
FROM testing
WHERE 'foo' in (@vvc_ColumnList );

2nd Method
In sql server you can get object id of table then using that object id you can fetch columns. In that case it will be as below:

Step 1: First get Object Id of table

select * from sys.tables order by name    

Step 2: Now get columns of your table and search in it:

 select * from testing where 'foo' in (select name from sys.columns  where  object_id =1977058079)

Note: object_id is what you get fetch in first step for you relevant table

Incorrect syntax near ''

Such unexpected problems can appear when you copy the code from a web page or email and the text contains unprintable characters like individual CR or LF and non-breaking spaces.

SQL select join: is it possible to prefix all columns as 'prefix.*'?

I see two possible situations here. First, you want to know if there is a SQL standard for this, that you can use in general regardless of the database. No, there is not. Second, you want to know with regard to a specific dbms product. Then you need to identify it. But I imagine the most likely answer is that you'll get back something like "a.id, b.id" since that's how you'd need to identify the columns in your SQL expression. And the easiest way to find out what the default is, is just to submit such a query and see what you get back. If you want to specify what prefix comes before the dot, you can use "SELECT * FROM a AS my_alias", for instance.



Related Topics



Leave a reply



Submit