Difference Between Information_Schema VS Sys Tables in SQL Server

SQL Server: should I use information_schema tables over sys tables?

I would always try to use the Information_schema views over querying the sys schema directly.

The views are ISO compliant, so in theory you should be able to migrate queries across different RDBMSs with little effort.

However, there have been some cases where the information that I need is just not available in a view.
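As one illustration of such a gap (a minimal sketch, not taken from the original answer): INFORMATION_SCHEMA.COLUMNS has no column that says whether a column is an identity or computed column, whereas sys.columns exposes both as flags. The table name dbo.MyTable is a hypothetical placeholder.

```sql
-- Hypothetical table name; replace dbo.MyTable with a table in your database.
SELECT c.name AS column_name,
       c.is_identity,   -- 1 if the column is an IDENTITY column
       c.is_computed    -- 1 if the column is a computed column
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.MyTable')
ORDER BY c.column_id;
```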

I've provided some links with further information on the views and querying a SQL Server Catalog.

http://msdn.microsoft.com/en-us/library/ms186778.aspx

http://msdn.microsoft.com/en-us/library/ms189082.aspx

TSQL INFORMATION_SCHEMA.COLUMNS VS sys.columns VS COL_LENGTH('Table','ColumnName')

SQL Server stores metadata about your database and its contents. This metadata is typically accessed via the sys.* tables and the information_schema.* views.

The sys tables are somewhat specific to SQL Server, and tend to be fairly normalised. In contrast, the information_schema views (like many other reporting views you may create) provide data in a more readable format - they often join various sys tables together to get their results. See Difference between Information_schema vs sys tables in SQL Server for some more info.

COL_LENGTH() is a function that operates on the database, and doesn't need to 'read data' as such.

However, for all practical purposes, you will find no performance difference between these. If you only need the length of a specific column, use COL_LENGTH, as it will probably be marginally faster. Otherwise, feel free to use the information_schema views (or a custom set of sys tables joined together): they provide more readable information, and the number of reads needed to get the metadata is very small.

For example, I have a table I use for testing called 'test' with 5 columns (ID, and col2, col3, col4, col5). It has almost 2 million rows, but none of the data in that table actually needed to be read - just the metadata.

I ran the commands to get the column lengths/info from each. Each took 0.000s to complete (i.e., less than 1 millisecond). Here are the commands and results (leading result columns only) to demonstrate some of the differences.

SELECT col_length('dbo.test', 'col2') AS Col2_info
/*
Col2_info
100
*/

-- OBJECT_ID with a schema-qualified name avoids picking up a same-named object in another schema
SELECT * FROM sys.columns WHERE object_id = OBJECT_ID('dbo.test')
/*
object_id name column_id system_type_id user_type_id max_length precision scale collation_name is_nullable is_ansi_padded
2094630505 ID 1 56 56 4 10 0 NULL 0 0
2094630505 col2 2 167 167 100 0 0 Latin1_General_CI_AS 0 1
2094630505 col3 3 167 167 100 0 0 Latin1_General_CI_AS 1 1
2094630505 col4 4 167 167 100 0 0 Latin1_General_CI_AS 1 1
2094630505 col5 5 167 167 100 0 0 Latin1_General_CI_AS 1 1
*/

SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'test'
/*
TABLE_CATALOG TABLE_SCHEMA TABLE_NAME COLUMN_NAME ORDINAL_POSITION COLUMN_DEFAULT IS_NULLABLE DATA_TYPE CHARACTER_MAXIMUM_LENGTH
Testdb dbo test ID 1 NULL NO int NULL
Testdb dbo test col2 2 NULL NO varchar 100
Testdb dbo test col3 3 NULL YES varchar 100
Testdb dbo test col4 4 NULL YES varchar 100
Testdb dbo test col5 5 NULL YES varchar 100
*/

Note that, in the examples above, the sys.columns query was harder to construct, as it can only be tied to my test table through its object_id. It also returns data that is a lot less readable than the information_schema version.
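One detail worth keeping in mind when comparing these numbers: COL_LENGTH and sys.columns.max_length report the defined length in bytes, while CHARACTER_MAXIMUM_LENGTH in the information_schema view is in characters. COL_LENGTH also returns NULL when the table or column cannot be found. The sketch below assumes the dbo.test table described above; col9 is a hypothetical column name that is not part of that table.

```sql
-- Length in bytes: 100 for a varchar(100) column. An nvarchar(100) column
-- would report 200, and a (max) column reports -1.
SELECT COL_LENGTH('dbo.test', 'col2') AS col2_bytes;

-- NULL means "not found", which makes COL_LENGTH a common existence check.
-- col9 is a hypothetical column that does not exist in the test table.
IF COL_LENGTH('dbo.test', 'col9') IS NULL
    PRINT 'col9 does not exist';
```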

INFORMATION_SCHEMA vs sysobjects

The INFORMATION_SCHEMA is part of the SQL-92 standard, so it's not likely to change nearly as often as sysobjects.

The views provide an internal, system table-independent view of the SQL Server metadata. They work correctly even if significant changes have been made to the underlying system tables.

You are always much better off querying INFORMATION_SCHEMA, because it hides the implementation details of the objects in sysobjects.

System Catalog vs Information Schema

INFORMATION_SCHEMA is there for compatibility; it doesn't expose all the information about objects on the instance.

sys, however, fully exposes any relevant information, though you do need to write more SQL. INFORMATION_SCHEMA is "easier" to use for new users, as a single view such as INFORMATION_SCHEMA.COLUMNS contains the schema name, the table name, the column name and the data type. To get that with sys you would have to use sys.schemas, sys.tables, sys.columns and sys.types.

There used to be a note in SQL Server's documentation for the column TABLE_SCHEMA suggesting that it could be wrong. This was changed earlier this year after I questioned it on their GitHub. The note now states that the information may be incomplete, not incorrect. Again, this is because INFORMATION_SCHEMA doesn't expose all the information about the objects, which sys does.
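A minimal sketch of that four-way join is shown below. It only approximates INFORMATION_SCHEMA.COLUMNS: sys.tables covers user tables only (the information schema view also lists columns of views), and joining on user_type_id returns the alias type name where a column uses a user-defined type. The column aliases are my own choice.

```sql
-- Roughly what INFORMATION_SCHEMA.COLUMNS provides in one view,
-- rebuilt from the catalog views named above.
SELECT s.name  AS schema_name,
       t.name  AS table_name,
       c.name  AS column_name,
       ty.name AS data_type,
       c.max_length,          -- in bytes; -1 for (max) types
       c.is_nullable
FROM sys.tables  AS t
JOIN sys.schemas AS s  ON s.schema_id     = t.schema_id
JOIN sys.columns AS c  ON c.object_id     = t.object_id
JOIN sys.types   AS ty ON ty.user_type_id = c.user_type_id
ORDER BY s.name, t.name, c.column_id;
```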

Difference between information_schema.tables and pg_tables

The views in the INFORMATION_SCHEMA are defined by the SQL standard and display the information that the standard requires. So they can't display any Postgres-specific information that doesn't fit the rules of the SQL standard. Queries using them are therefore likely to work on other DBMS products that support INFORMATION_SCHEMA as well. Not all products implement it 100% correctly, though, and Postgres also has some areas where it deviates from the specification of the INFORMATION_SCHEMA. But the implementations are close enough that it's really easy to port and use such a query with a different database.

All system tables and views in the pg_catalog schema (including pg_tables) are completely Postgres specific. Queries using those will never run on other DBMS products. The INFORMATION_SCHEMA views use those system views and tables to collect and present the metadata as required by the SQL standard.
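To make the contrast concrete, here is a small sketch of the two approaches side by side. It assumes the tables of interest live in the default public schema; adjust the schema name for your database.

```sql
-- Portable: defined by the SQL standard, so it should run with little or no
-- change on other products that implement INFORMATION_SCHEMA.
-- Note that this view lists views as well as tables (see table_type).
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'public';

-- Postgres-only: pg_tables lists tables only and adds details such as the
-- owner and whether the table has indexes.
SELECT schemaname, tablename, tableowner, hasindexes
FROM pg_catalog.pg_tables
WHERE schemaname = 'public';
```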

SQL Server difference between catalog views, information schema views vs DMVs

Catalog views represent views over some hidden tables. They return data from the database itself (from disk).

DMVs represent views over internal functions. They return data from internal SQL structures (from memory). DMV names always start with sys.dm_.
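A short sketch of the difference, using one well-known view of each kind (the choice of sys.tables and sys.dm_exec_requests is mine, for illustration):

```sql
-- Catalog view: persisted metadata about objects in the current database.
SELECT name, create_date, modify_date
FROM sys.tables;

-- DMV: a snapshot of in-memory server state, here the requests that are
-- executing right now. Typically requires the VIEW SERVER STATE permission.
SELECT session_id, status, command, start_time
FROM sys.dm_exec_requests;
```

The first query returns the same rows until the schema changes. The second can return different rows each time it runs, because it reflects current activity on the instance.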

Joining sys.columns and sys.tables on database name

You generally do not want to query sys.columns or sys.tables (or any system tables) directly. You should be using the INFORMATION_SCHEMA views. These views are the ANSI standard way of querying system tables that could change from release to release. The INFORMATION_SCHEMA views will not change, at least not in a breaking way.

SELECT COLUMN_NAME, *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = '<TableName>' AND TABLE_SCHEMA = '<SchemaName>'

Of course, the WHERE clause is optional here and could be omitted to see all columns in all tables etc.


