What are the use cases for selecting CHAR over VARCHAR in SQL?
The general rule is to pick CHAR if all rows will have close to the same length. Pick VARCHAR (or NVARCHAR) when the length varies significantly. CHAR may also be a bit faster because all the rows are of the same length.
It varies by DB implementation, but generally, VARCHAR (or NVARCHAR) uses one or two more bytes of storage (for length or termination) in addition to the actual data. So (assuming you are using a one-byte character set) storing the word "FooBar" takes:
- CHAR(6) = 6 bytes (no overhead)
- VARCHAR(100) = 8 bytes (2 bytes of overhead)
- CHAR(10) = 10 bytes (4 bytes of waste)
The bottom line is CHAR can be faster and more space-efficient for data of roughly the same length (values that differ by no more than about two characters).
Note: Microsoft SQL Server has 2 bytes of overhead for a VARCHAR. This may vary from DB to DB, but generally, there is at least 1 byte of overhead needed to indicate the length or the end of the string on a VARCHAR.
As was pointed out by Gaven in the comments: Things change when it comes to multi-byte character sets, and that is a case where VARCHAR becomes a much better choice.
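The arithmetic above can be sketched in a few lines of Python. This is only a rough model, not how any particular engine accounts for storage: the function names are made up for illustration, and the 2-byte overhead and one-byte character set are the assumptions stated in this answer.

```python
def char_bytes(declared_len: int, bytes_per_char: int = 1) -> int:
    """CHAR(n): the full declared length is always stored (space padded)."""
    return declared_len * bytes_per_char


def varchar_bytes(value: str, declared_len: int,
                  overhead: int = 2, bytes_per_char: int = 1) -> int:
    """VARCHAR(n): only the actual content plus a small length overhead.

    overhead=2 is an assumption (the SQL Server figure quoted above);
    other databases may use 1 byte.
    """
    if len(value) > declared_len:
        raise ValueError("value does not fit in the declared length")
    return len(value) * bytes_per_char + overhead


word = "FooBar"
print(char_bytes(6))             # 6  (no overhead)
print(varchar_bytes(word, 100))  # 8  (2 bytes of overhead)
print(char_bytes(10))            # 10 (4 bytes of padding)
```

Under this model the declared VARCHAR length never enters the result, only the length of the stored value does.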
A note about the declared length of the VARCHAR: Because it stores the length of the actual content, you don't waste space on the unused part of the declared length. So storing 6 characters in VARCHAR(6), VARCHAR(100), or VARCHAR(MAX) uses the same amount of storage. Read more about the differences when using VARCHAR(MAX). You declare a maximum size in VARCHAR to limit how much can be stored.
In the comments AlwaysLearning pointed out that the Microsoft Transact-SQL docs seem to say the opposite. I would suggest that is an error or at least the docs are unclear.
Any benefit of using CHAR over VARCHAR?
- VARCHAR
VARCHAR stores variable-length character strings. It can require less storage than fixed-length types because it uses only as much space as it needs.
VARCHAR also uses 1 or 2 extra bytes to record the value's length. For example, VARCHAR(10) will use up to 11 bytes of storage space. VARCHAR helps performance because it saves space. However, because the rows are variable length, they can grow when you update them, which can cause extra work. If a row grows and no longer fits in its original location, the behavior is storage engine-dependent...
- CHAR
CHAR is fixed-length: MySQL always allocates enough space for the specified number of characters. MySQL right-pads a stored CHAR value with spaces to the declared length and, by default, removes the trailing spaces when the value is retrieved. Values are padded with spaces as needed for comparisons.
CHAR is useful if you want to store very short strings, or if all the values are nearly the same length. For example, CHAR is a good choice for MD5 values for user passwords, which are always the same length.
CHAR is also better than VARCHAR for data that’s changed frequently, because a fixed-length row is not prone to fragmentation.
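To illustrate the fixed-length point about MD5 values, here is a small Python check using the standard hashlib module. The sample inputs are arbitrary; the only thing being shown is that the hex digest has the same length regardless of the input, which is why a CHAR(32) column would fit it exactly.

```python
import hashlib

# Arbitrary sample inputs of very different lengths.
samples = ["a", "password123", "a much longer passphrase with spaces"]

# An MD5 digest is 128 bits, so its hex form is always 32 characters.
digests = [hashlib.md5(s.encode("utf-8")).hexdigest() for s in samples]
lengths = {len(d) for d in digests}

print(lengths)  # {32}
```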
Why should I use char instead of varchar?
Prefer VARCHAR.
In the olden days of tight storage, it mattered for space. Nowadays, disk storage is cheap, but RAM and IO are still precious. VARCHAR is IO and cache friendly: it lets you pack the DB buffer cache more densely with data rather than with padding spaces, and for the same reason, space padding imposes an IO overhead.
The upside to CHAR() used to be reduced row chaining on frequently updated records. When you update a field and the value is larger than previously allocated, the record may chain. This is manageable, however; databases often support a "percent free" setting on your table storage attributes that tells the DB how much extra space to preallocate per row for growth.
VARCHAR is almost always preferable because space padding requires you to be aware of it and code differently. Different databases handle it differently. With VARCHAR you know your field holds only exactly what you store in it.
I haven't designed a schema in over a decade with CHAR.
What's the difference between VARCHAR and CHAR?
VARCHAR is variable-length.
CHAR is fixed length.
If your content is a fixed size, you'll get better performance with CHAR.
See the MySQL page on CHAR and VARCHAR Types for a detailed explanation (be sure to also read the comments).
CHAR vs. VARCHAR and the ramifications when joining
Trailing space is ignored in string comparisons in SQL Server. There is no need to RTRIM it yourself (which would make the condition unsargable).
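As a rough model of that comparison rule, the Python sketch below pads the shorter operand with spaces before comparing. It is a simulation only, not SQL Server code: `pad_space_equal` is a made-up helper, and collation rules (case, accents) are ignored.

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Compare two strings after padding the shorter one with trailing spaces."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


char_value = "NYC".ljust(10)   # what a CHAR(10) column would hold
varchar_value = "NYC"          # what a VARCHAR column would hold

print(pad_space_equal(char_value, varchar_value))  # True: trailing spaces ignored
print(pad_space_equal("NYC", "NYCX"))              # False: different content
print(pad_space_equal(" NYC", "NYC"))              # False: leading spaces still count
```

This is why a join between a CHAR column and a VARCHAR column can match without wrapping either side in RTRIM.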
Is there an advantage to varchar(500) over varchar(8000)?
From a processing standpoint, it will not make a difference to use varchar(8000) vs varchar(500). It's more of a "good practice" kind of thing to define a maximum length that a field should hold and make your varchar that length. It's something that can be used to assist with data validation. For instance, making a state abbreviation be 2 characters or a postal/zip code as 5 or 9 characters. This used to be a more important distinction for when your data interacted with other systems or user interfaces where field length was critical (e.g. a mainframe flat file dataset), but nowadays I think it's more habit than anything else.
What is the advantage of using varbinary over varchar here?
I believe the expectation is that the varbinary data will generally consume fewer bytes (5) than the varchar data (10 or 11, I think) per portion of the original string, so for very large numbers of components or comparisons it should be more efficient.
But if you are looking to use either solution, I'd recommend that you implement both (they're quite short) and try some profiling against your real data (and query patterns) to see if there are practical differences (I wouldn't expect so).
(Crafty Steal): And as Martin points out, the binary comparisons will be more efficient, since it won't involve all of the code that's there to deal with collations. :-)
Why is VARCHAR slower than CHAR on updating rows?
Rows are laid out with the fixed size columns first, at fixed offsets from the start of the row. Then (after some important bytes in the middle) the variable sized data is placed at the end. Because it's variable sized, the actual offset to the data cannot be computed for the whole table (like the fixed data) but has to be computed on a row-by-row basis.
And if a varchar(5) [1] is storing NYC and is then asked to store NYCX, it may find that there's not a spare byte at the end of NYC - it's being used for another column - so the row has to expand by moving everything after one byte further along to make space for the extra byte.
[1] I notice in one of your examples you failed to specify a length. Please drill into yourself that that's a bad habit.
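The shifting described above can be pictured with a toy row layout in Python. This is a deliberately simplified sketch: `build_row`, the 4-byte fixed column, and the second variable column are all hypothetical, and a real row format also carries header bytes, a null bitmap, and an offset array, which are omitted here.

```python
def build_row(fixed: bytes, variable: list[bytes]) -> tuple[bytes, list[int]]:
    """Lay out fixed-size data first, then variable-size values back to back.

    Returns the row bytes and the start offset of each variable-length value.
    """
    row = bytearray(fixed)
    offsets = []
    for value in variable:
        offsets.append(len(row))  # this value starts where the row currently ends
        row.extend(value)
    return bytes(row), offsets


fixed = b"\x00\x00\x00\x01"  # hypothetical 4-byte integer column

before, offsets_before = build_row(fixed, [b"NYC", b"Broadway"])
after, offsets_after = build_row(fixed, [b"NYCX", b"Broadway"])

print(offsets_before, len(before))  # [4, 7] 15
print(offsets_after, len(after))    # [4, 8] 16
```

Growing the first variable value by one byte moves the start of everything after it, while the fixed column stays at the same offset in every row.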