Is Varchar(Max) Always Preferable

Is varchar(MAX) always preferable?

There is a very good article on this subject by SO User @Remus Rusanu. Here is a snippit that I've stolen but I suggest you read the whole thing:

The code path that handles the MAX types (varchar, nvarchar and
varbinary) is different from the code path that handles their
equivalent non-max length types. The non-max types can internally be
represented as an ordinary pointer-and-length structure. But the max
types cannot be stored internally as a contiguous memory area, since
they can possibly grow up to 2Gb. So they have to be represented by a
streaming interface, similar to COM’s IStream. This carries over to
every operation that involves the max types, including simple
assignment and comparison, since these operations are more complicated
over a streaming interface. The biggest impact is visible in the code
that allocates and assign max-type variables (my first test), but the
impact is visible on every operation.

In the article he shows several examples that demonstrate that using varchar(n) typically improves performance.

You can find the entire article here.

Why not use varchar(max)?

My answer to this, isn't about the usage of Max, as much as it is about the reason for VARCHAR(max) vs TEXT.

In my book; first of all, Unless you can be absolutely certain that you'll never encode anything but english text and people won't refer to names of foreign locations, then you should use NVARCHAR or NTEXT.

Secondly, it's what the fields allow you to do.

TEXT is hard to update in comparison to VARCHAR, but you get the advantage of Full Text Indexing and lots of clever things.

On the other hand, VARCHAR(MAX) has some ambiguity, if the size of the cell is < 8000 chars, it will be treated as Row data. If it's greater, it will be treated as a LOB for storage purposes.
Because you can't know this without querying RBAR, this may have optimization strategies for places where you need to be sure about your data and how many reads it costs.

Otherwise, if your usage is relatively mundane and you don't expect to have problems with the size of data (IE you're using .Net and therefore don't have to be concerned about the size of your string/char* objects) then using VARCHAR(max) is fine.

sql varchar(max) vs varchar(fix)

MSDN

  • Use varchar when the sizes of the column data entries vary
    considerably.
  • Use varchar(max) when the sizes of the column data entries vary
    considerably, and the size might exceed 8,000 bytes.

When the the length is specified in declaring a VARCHAR variable or column, the maximum length allowed is 8000. If the length is greater than 8000, you have to use the MAX specifier as the length. If a length greater than 8000 is specified, the following error will be encountered (assuming that the length specified is 10000):

The size (10000) given to the type 'varchar' exceeds the maximum allowed for any data type (8000).

UPDATE :-
I found a link which I would like to share:-

Here

There is not much performance difference between Varchar[(n)] and Varchar(Max). Varchar[(n)] provides better performance results compared to Varchar(Max). If we know that data to be stored in the column or variable is less than or equal to 8000 characters, then using this Varchar[(n)] data type provides better performance compared to Varchar(Max).Example: When I ran the below script by changing the variable @FirstName type to Varchar(Max) then for 1 million assignments it is consistently taking double time than when we used data type as Varchar(50) for variable @FirstName.

DECLARE @FirstName VARCHAR(50), @COUNT INT=0, @StartTime DATETIME = GETDATE()
WHILE(@COUNT < 1000000)
BEGIN
SELECT @FirstName = 'Suraj', @COUNT = @COUNT +1
END
SELECT DATEDIFF(ms,@StartTime,GETDATE()) 'Time Taken in ms'
GO

Are there any disadvantages to always using nvarchar(MAX)?

Same question was asked on MSDN Forums:

  • Varchar(max) vs Varchar(255)

From the original post (much more information there):

When you store data to a VARCHAR(N) column, the values are physically stored in the same way. But when you store it to a VARCHAR(MAX) column, behind the screen the data is handled as a TEXT value. So there is some additional processing needed when dealing with a VARCHAR(MAX) value. (only if the size exceeds 8000)

VARCHAR(MAX) or NVARCHAR(MAX) is considered as a 'large value type'. Large value types are usually stored 'out of row'. It means that the data row will have a pointer to another location where the 'large value' is stored...

Best practices for SQL varchar column length

No DBMS I know of has any "optimization" that will make a VARCHAR with a 2^n length perform better than one with a max length that is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently than one with a higher maximum length. I don't know if this is still the case.

For almost all DBMS, the actual storage that is required is only determined by the number of characters you put into it, not the max length you define. So from a storage point of view (and most probably a performance one as well), it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

You should see the max length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction and a CHECK CONSTRAINT that limits the number of characters to whatever your business requires.

If that requirement changes, altering the check constraint is much faster than altering the table (because the table does not need to be re-written)

The same can be applied for Oracle and others - in Oracle it would be VARCHAR(4000) instead of text though.

I don't know if there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) as compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment)

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer) a change to the column definition did rewrite the whole table, see e.g. here. Since 9.2 this is no longer the case and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed only took 0.5 seconds.

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column. But I could not find any reference for that.

For MySQL the manual says "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL however you can not use the "workaround" to use a check constraint to limit the number of characters in a column.

For SQL Server I could not find a clear statement on this but the execution time to increase the size of a varchar column (again the 1.2 million rows table from above) indicates that no rewrite takes place.

Edit 2017-01-24

Seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand that shows that the declared length of a nvarchar or varchar columns makes a huge difference for the performance.

Good practices of SQL Server: nvarchar(max) performance

Best practice is to perform the appropriate data analysis BEFORE you design your table. Without context, one can assume that a description does not consist of pages and pages of text, so the "max" choice is probably not appropriate. As an additional consideration when choosing varchar(max), remember that you typically need to provide support for displaying such values in an application. If you do not intend to design a GUI to do so, then the choice is probably not appropriate.

And one more caveat - it is generally futile to attempt to future-proof your schema by choosing datatypes that exceed your foreseeable needs.

Using varchar(MAX) vs TEXT on SQL Server

The VARCHAR(MAX) type is a replacement for TEXT. The basic difference is that a TEXT type will always store the data in a blob whereas the VARCHAR(MAX) type will attempt to store the data directly in the row unless it exceeds the 8k limitation and at that point it stores it in a blob.

Using the LIKE statement is identical between the two datatypes. The additional functionality VARCHAR(MAX) gives you is that it is also can be used with = and GROUP BY as any other VARCHAR column can be. However, if you do have a lot of data you will have a huge performance issue using these methods.

In regard to if you should use LIKE to search, or if you should use Full Text Indexing and CONTAINS. This question is the same regardless of VARCHAR(MAX) or TEXT.

If you are searching large amounts of text and performance is key then you should use a Full Text Index.

LIKE is simpler to implement and is often suitable for small amounts of data, but it has extremely poor performance with large data due to its inability to use an index.



Related Topics



Leave a reply



Submit