What Is The Advantage of Using Varbinary Over Varchar Here

I believe the expectation is that the varbinary data will generally consume fewer bytes per component of the original string (5, versus 10 or 11 for the varchar version, I think), so it should be more efficient when there are very large numbers of components or comparisons.
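
To make the size argument concrete, here is a minimal Python sketch. It assumes, hypothetically, that the original string is a dot-separated list of non-negative integers such as "1.2.10", and compares two order-preserving encodings: each component as a 4-byte big-endian unsigned integer (a stand-in for the varbinary form) and each component zero-padded to 10 decimal digits (a stand-in for the varchar form). The function names are made up for illustration, and the byte counts differ from the 5 versus 10 or 11 quoted above, which come from an encoding not shown here.

```python
import struct

def binary_key(path: str) -> bytes:
    # Hypothetical varbinary-style key: each dot-separated component packed
    # as a 4-byte big-endian unsigned int, so byte order matches numeric order.
    return b"".join(struct.pack(">I", int(part)) for part in path.split("."))

def padded_text_key(path: str) -> str:
    # Hypothetical varchar-style key: each component zero-padded to 10 digits
    # (enough for any 32-bit unsigned value), so string order matches numeric order.
    return "".join(f"{int(part):010d}" for part in path.split("."))

paths = ["1.2.10", "1.2.9", "1.10", "1.2"]

# Both encodings sort the paths component by component, numerically.
by_binary = sorted(paths, key=binary_key)
by_text = sorted(paths, key=padded_text_key)
print(by_binary)  # ['1.2', '1.2.9', '1.2.10', '1.10']
print(by_text)    # ['1.2', '1.2.9', '1.2.10', '1.10']

# The binary key needs 4 bytes per component, the text key 10 characters.
print(len(binary_key("1.2.10")), len(padded_text_key("1.2.10")))  # 12 30
```

Big-endian order is what makes a plain byte-by-byte comparison of the binary keys agree with numeric order; a little-endian encoding would not sort correctly.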

But if you are looking to use either solution, I'd recommend implementing both (they're quite short) and profiling them against your real data and query patterns to see whether there are practical differences (I wouldn't expect any).

(Crafty steal): And as Martin points out, binary comparisons will be more efficient, since they bypass all of the code that exists to deal with collations. :-)

Why varbinary instead of varchar

MediaWiki changed from varchar to varbinary in early 2011:

War on varchar. Changed all occurrences of varchar(N) and varchar(N) binary to varbinary(N). varchars cause problems ("Invalid mix of collations" errors) on MySQL databases with certain configs, most notably the default MySQL config.

What is the benefit of having varbinary field in a separate 1-1 table?

There is no performance or operational advantage. Since SQL 2005 the LOB types are already stored for you by the engine in a separate allocation unit, a separate b-tree. If you study the Table and Index Organization of SQL Server you'll see that every partition has up to three allocation units: data, LOB and row-overflow:

Figure: Table Organization (source: s-msft.com)

A LOB field (varchar(max), nvarchar(max), varbinary(max), XML, CLR UDTs as well as the deprecated types text, ntext and image) has only a very small footprint in the data record itself, in the clustered index: a pointer into the LOB allocation unit; see Anatomy of a Record. (By default, small enough values of the max types may be kept in the row; the pointer applies to values stored off-row.)

By storing a LOB explicitly in a separate table you gain absolutely nothing. You just add unneeded complexity: updates that used to be atomic now have to be spread across two separate tables, complicating both the application and its transaction structure.
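
The transactional cost of the split can be seen even in SQLite, used here only as a convenient stand-in for SQL Server. In this sketch the table names `document` and `document_blob` and the `save_document` helper are hypothetical: once the LOB lives in its own table, a single logical save becomes two statements that must share one transaction, otherwise a failure halfway through leaves a title without its content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE document_blob (
        id INTEGER PRIMARY KEY REFERENCES document(id),
        content BLOB NOT NULL
    );
""")

def save_document(conn, doc_id, title, content):
    # One logical save now touches two tables, so both INSERTs
    # run inside one transaction: `with conn` commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("INSERT INTO document (id, title) VALUES (?, ?)",
                     (doc_id, title))
        conn.execute("INSERT INTO document_blob (id, content) VALUES (?, ?)",
                     (doc_id, content))

save_document(conn, 1, "Spec", b"\x00\x01")

try:
    # content=None violates NOT NULL on document_blob, so the
    # already-executed INSERT into document is rolled back too.
    save_document(conn, 2, "Broken", None)
except sqlite3.IntegrityError:
    pass

titles = [row[0] for row in conn.execute("SELECT title FROM document ORDER BY id")]
print(titles)  # ['Spec']
```

With a single table the same save would be one statement and atomic on its own, which is the author's point.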

If the LOB content is an entire file, then perhaps you should consider upgrading to SQL 2008 and using FILESTREAM.

Indexing on varbinary column vs varchar vs int - which is faster

It depends on the size of the column, but for two columns of the same size the varbinary will typically be faster. The other factor involved here is the collation used for the column. The default collation in SQL Server is not case sensitive, meaning that for comparison purposes SOME RANDOM KEY, some random key, and every permutation thereof are all the same value. The database must therefore do extra work when comparing and sorting those keys to know what goes where and which values match: it's no longer a straight byte-for-byte comparison.
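
The difference between the two kinds of comparison can be sketched in Python. `str.casefold()` is only a rough stand-in for a case-insensitive collation (real SQL Server collations also apply accent, width and locale rules), and the helper names are hypothetical; the point is that the case-insensitive check must transform both values before comparing, while the binary check compares the bytes as they are.

```python
keys = ["SOME RANDOM KEY", "some random key", "Some Random Key"]

def ci_equal(a: str, b: str) -> bool:
    # Rough approximation of a case-insensitive collation:
    # both values are case-folded before they are compared.
    return a.casefold() == b.casefold()

def binary_equal(a: str, b: str) -> bool:
    # Byte-for-byte comparison, as with varbinary: no folding at all.
    return a.encode("utf-8") == b.encode("utf-8")

ci_matches = sum(ci_equal(keys[0], k) for k in keys)
bin_matches = sum(binary_equal(keys[0], k) for k in keys)
print(ci_matches, bin_matches)  # 3 1

# Sorting by raw bytes puts upper-case letters (0x41-0x5A) before
# lower-case ones (0x61-0x7A), so all three spellings stay distinct.
byte_order = sorted(keys, key=lambda s: s.encode("utf-8"))
print(byte_order)  # ['SOME RANDOM KEY', 'Some Random Key', 'some random key']
```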

SQL Server indexing - varchar(100) vs varbinary(100)? [convert data]

It is better to create an index on varchar than on varbinary. Varbinary is suited to blobs, although you can store strings in it as well; such blobs complement your actual data. Your own research led to the same conclusion.

An email address can be entered by a user in a variety of formats, such as abc@xyz.com or Abc@Xyz.com. It is easier to store and retrieve such information in a varchar field. Joe Enos is absolutely right that binary comparisons are case-sensitive (they compare raw bytes), whereas varchar comparisons are case-insensitive, assuming that's how you have set up your database and column collation. With varbinary, you'll also have to be careful about padding.
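
A small lookup experiment shows the practical effect. This sketch uses SQLite as a stand-in: a TEXT column with `COLLATE NOCASE` plays the role of a case-insensitive varchar column, and a BLOB column plays the role of varbinary. The table and index names are hypothetical, and note that SQLite's NOCASE folds only ASCII letters, unlike a full SQL Server collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users_text (email TEXT COLLATE NOCASE)")
conn.execute("CREATE TABLE users_blob (email BLOB)")
conn.execute("CREATE INDEX ix_users_text_email ON users_text (email)")
conn.execute("CREATE INDEX ix_users_blob_email ON users_blob (email)")

# The address is stored exactly as the user typed it.
stored = "Abc@Xyz.com"
conn.execute("INSERT INTO users_text VALUES (?)", (stored,))
conn.execute("INSERT INTO users_blob VALUES (?)", (stored.encode("utf-8"),))

# Another user types the same address in lower case.
lookup = "abc@xyz.com"
text_hits = conn.execute(
    "SELECT COUNT(*) FROM users_text WHERE email = ?", (lookup,)
).fetchone()[0]
blob_hits = conn.execute(
    "SELECT COUNT(*) FROM users_blob WHERE email = ?", (lookup.encode("utf-8"),)
).fetchone()[0]
print(text_hits, blob_hits)  # 1 0
```

With the binary column, the application itself would have to normalize addresses (for example, lower-case them) consistently on every insert and every lookup.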

Varchar is alive and healthy. When you index varchar(100), try to use a non-clustered index. My general preference, in most situations, is to use a surrogate key as the clustered index.

Disadvantage of choosing large MAX value for varchar or varbinary

That depends on whether it is ever reasonable to store a large amount of data in the particular column.

If you declare a column that would never properly store much data (e.g. an employee first name as a VARCHAR(1000)), you end up with a variety of problems:

  1. Many, if not most, client APIs (e.g. ODBC drivers, JDBC drivers) allocate memory buffers on the client that are large enough to hold the maximum size of a particular column. So even though the database only has to store the actual data, you may substantially increase the amount of memory the client application uses.
  2. You lose the ability to drive data validation rules (or convey information about the data) from the table definition. If the database allows 1000-character first names, every application that interacts with the database will probably end up with its own rule for how long an employee name can be. Unless this is mitigated by putting a stored procedure layer between all applications and the tables, the result is generally that different applications enforce different rules.
  3. Murphy's Law says that if you allow 1000 characters, someone will eventually store 1000 characters in the column, or at least a value large enough to cause errors in one or more applications (e.g. no one checked whether every application's employee name field could display 1000 characters).
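
The client-memory point in item 1 is simple arithmetic. The sketch below assumes, hypothetically, a driver that fetches rows in batches and binds one buffer per row sized to the declared column maximum plus a terminator byte, with a single-byte character set; real drivers differ in binding mode and encoding, so the numbers are purely illustrative.

```python
def worst_case_buffer_bytes(declared_chars: int, bytes_per_char: int,
                            rows_per_fetch: int) -> int:
    # Hypothetical driver behaviour: one buffer per row in the fetch batch,
    # each sized for the declared maximum length plus a terminator byte.
    return rows_per_fetch * (declared_chars * bytes_per_char + 1)

# A first-name column declared as VARCHAR(50) vs VARCHAR(1000),
# fetched 10,000 rows at a time with 1 byte per character.
narrow = worst_case_buffer_bytes(50, 1, 10_000)
wide = worst_case_buffer_bytes(1000, 1, 10_000)
print(narrow, wide)  # 510000 10010000
```

Under these assumptions the oversized declaration costs roughly 20 times the client memory, even if every stored name is short.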

SQL Server varbinary(max) and varchar(max) data in a separate table

I would advise against separation. It complicates the design significantly for little or no benefit. As you probably know, SQL Server already stores LOBs in separate allocation units, as described in Table and Index Organization.

Your first concern (separate filegroup allocation for the LOB data) can be addressed explicitly, as Mikael has already pointed out, by specifying the desired filegroup in the CREATE TABLE statement (the TEXTIMAGE_ON clause).

Your second concern is no longer a concern with SQL Server 2012; see Online Index Operations for Indexes containing LOB columns. Even prior to SQL Server 2012 you could reorganize indexes with LOBs without problems (and REORGANIZE is online). Given that a full index rebuild is a very expensive operation (an online rebuild must be done at the table or index level; there is no option to rebuild a single partition online), are you sure you want to complicate the design to accommodate something that is, on the one hand, seldom required and, on the other hand, will be available when you upgrade to SQL 2012?