Does Size of a Varchar Column Matter When Used in Queries

Does size of a VARCHAR column matter when used in queries

Yes, the length of varchar affects estimation of the query, memory that will be allocated for internal operation (for example for sorting) and as consequence resources of CPU. You can reproduce it with the following simple example.

1.Create two tables:

create table varLenTest1
(
a varchar(100)
)

create table varLenTest2
(
a varchar(8000)
)

2. Fill both of them with some data:

declare @i int
set @i = 20000

while (@i > 0)
begin
insert into varLenTest1 (a) values (cast(NEWID() as varchar(36)))
set @i = @i - 1
end

3. Execute the following queries with "include actual execution plan":

select a from varLenTest1 order by a OPTION (MAXDOP 1) ;
select a from varLenTest2 order by a OPTION (MAXDOP 1) ;

If you inspect execution plans of these queries, you can see that estimated IO cost and estimated CPU cost is very different:
Sample Image

Best practices for SQL varchar column length

No DBMS I know of has any "optimization" that will make a VARCHAR with a 2^n length perform better than one with a max length that is not a power of 2.

I think early SQL Server versions actually treated a VARCHAR with length 255 differently than one with a higher maximum length. I don't know if this is still the case.

For almost all DBMS, the actual storage that is required is only determined by the number of characters you put into it, not the max length you define. So from a storage point of view (and most probably a performance one as well), it does not make any difference whether you declare a column as VARCHAR(100) or VARCHAR(500).

You should see the max length provided for a VARCHAR column as a kind of constraint (or business rule) rather than a technical/physical thing.

For PostgreSQL the best setup is to use text without a length restriction and a CHECK CONSTRAINT that limits the number of characters to whatever your business requires.

If that requirement changes, altering the check constraint is much faster than altering the table (because the table does not need to be re-written)

The same can be applied for Oracle and others - in Oracle it would be VARCHAR(4000) instead of text though.

I don't know if there is a physical storage difference between VARCHAR(max) and e.g. VARCHAR(500) in SQL Server. But apparently there is a performance impact when using varchar(max) as compared to varchar(8000).

See this link (posted by Erwin Brandstetter as a comment)

Edit 2013-09-22

Regarding bigown's comment:

In Postgres versions before 9.2 (which was not available when I wrote the initial answer) a change to the column definition did rewrite the whole table, see e.g. here. Since 9.2 this is no longer the case and a quick test confirmed that increasing the column size for a table with 1.2 million rows indeed only took 0.5 seconds.

For Oracle this seems to be true as well, judging by the time it takes to alter a big table's varchar column. But I could not find any reference for that.

For MySQL the manual says "In most cases, ALTER TABLE makes a temporary copy of the original table". And my own tests confirm that: running an ALTER TABLE on a table with 1.2 million rows (the same as in my test with Postgres) to increase the size of a column took 1.5 minutes. In MySQL however you can not use the "workaround" to use a check constraint to limit the number of characters in a column.

For SQL Server I could not find a clear statement on this but the execution time to increase the size of a varchar column (again the 1.2 million rows table from above) indicates that no rewrite takes place.

Edit 2017-01-24

Seems I was (at least partially) wrong about SQL Server. See this answer from Aaron Bertrand that shows that the declared length of a nvarchar or varchar columns makes a huge difference for the performance.

Does VARCHAR size limit matter?

In general, for a VARCHAR field, the amount of data stored in each field determines its footprint on the disk rather than the maximum size (unlike a CHAR field which always has the same footprint).

There is an upper limit on the total data stored within all fields of an index of 900 bytes (900 byte index size limit in character length).

The larger you make the field, the more likely people will try to use for purposes other than what you intended - and the greater the screen real-estate required to show the value - so its good practice to try to pick the right size, rather than assuming that if you make it as large as possible it will save you having to revisit the design.

Importance of varchar length in MySQL table

No, in the sense that if the values you're storing in that column are always (say) less than 50 characters, declaring the column as varchar(50) or varchar(200) has the same performance.

What size to pick for a (n)varchar column?

I can only speak for Oracle. A VARCHAR2(50) and a VARCHAR2(255) take up exactly the same amount of space and perform identically, if you enter the value 'SMITH'.

However, the reason why it is generally not a good idea to go around declaring all your textual columns as VARCHAR2(4000) is that column length is, effectively, another constraint. And constraints are database implementation of business rules, so they are definitely something that should be defined on the database side of things.

As a for-example. You define a CHECK constraint on a column so that the values it can accept are only 'Y' and 'N'. That saves your application from having to deal with 'y' and 'n' or even '1' and '0'. The check constraint ensures your data conforms to expected standards. Your application code can then make valid assumptions about the nature of the data it has to deal with.

Column length definition is in the same boat. You declare something to be a VARCHAR2(10) because you don't want it accepting an entry of 'ABC123ZYX456' (for whatever reason!)

In Australia, I define STATE columns to be a varchar2(3) because I don't want people typing in 'New South Wales' or 'South Australia'. The column definition pretty much forces them to be entered as 'NSW' and 'SA'. In that sense, a VARCHAR2(3) is almost as much a check constraint as actually specifying a CHECK IN ('NSW','SA','VIC' etc) constraint.

In short, proper column lengths are a way of encoding business rules. They're another form of constraint. They bring all the advantages of constraints (and suffer from many of the same drawbacks). And they ensure, to a small extent, a degree of 'data cleanliness' that "proper" constraints help with, too.

I don't buy the argument, either, that it's best to stick these sorts of things in the client app because it's easier to change there. You have 20,000 people using an app, that's 20,000 updates. You have one database, that's one update. The 'easier to change the client app' argument, if true, would potentially mean the database just gets treated as a giant bit bucket with all the clever logic being handled in client code. It's a big discussion to have, but since all RDBMSes let you define constraints and so on in the database itself, it's pretty clear that there's at least a worthwhile case to be made that such fundamental logic belongs in the backend.

is there an advantage to varchar(500) over varchar(8000)?

From a processing standpoint, it will not make a difference to use varchar(8000) vs varchar(500). It's more of a "good practice" kind of thing to define a maximum length that a field should hold and make your varchar that length. It's something that can be used to assist with data validation. For instance, making a state abbreviation be 2 characters or a postal/zip code as 5 or 9 characters. This used to be a more important distinction for when your data interacted with other systems or user interfaces where field length was critical (e.g. a mainframe flat file dataset), but nowadays I think it's more habit than anything else.

Is there any downside to giving extra size/length to your database columns?

It takes more time and more disk transfers to load larger data items into memory. Defining large maximum sizes for columns increases the size of table rows. For many DBMS servers, table rows are the items transferred. So defining columns that are too fat does slow things down.

This effect is minimal for VARCHAR items. But VARCHAR is quite a bit slower than data types like integers. Eight byte integers take four times as much time to transfer as two byte integers. So, if a database is being designed for ultimate performance, limiting data columns to the range actually required will speed things up.The extent of this effect depends on whether the disk channel is a bottleneck or not.

Another possible bottleneck is the channel that links the server with the client, often a network channel. Bottlenecking in this channel can be reduced by queries that don't ask for data that will never be used, but there's a trade off here between asking for data only when you need it and making too many round trips.

There's also a trade off between designing for optimal performance and over designing in the anticipation of changing requirements.

Is varchar(MAX) always preferable?

There is a very good article on this subject by SO User @Remus Rusanu. Here is a snippit that I've stolen but I suggest you read the whole thing:

The code path that handles the MAX types (varchar, nvarchar and
varbinary) is different from the code path that handles their
equivalent non-max length types. The non-max types can internally be
represented as an ordinary pointer-and-length structure. But the max
types cannot be stored internally as a contiguous memory area, since
they can possibly grow up to 2Gb. So they have to be represented by a
streaming interface, similar to COM’s IStream. This carries over to
every operation that involves the max types, including simple
assignment and comparison, since these operations are more complicated
over a streaming interface. The biggest impact is visible in the code
that allocates and assign max-type variables (my first test), but the
impact is visible on every operation.

In the article he shows several examples that demonstrate that using varchar(n) typically improves performance.

You can find the entire article here.



Related Topics



Leave a reply



Submit