Any bad effect if I use the TEXT data type to store a number?
Storing numbers in a text column is a very bad idea. You lose a lot of advantages when you do that:
- You can't prevent storing invalid numbers (e.g. 'foo')
- Sorting will not work the way you want it to ('10' is "smaller" than '2')
- It confuses everybody looking at your data model.
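The sorting problem is easy to reproduce. The sketch below uses SQLite through Python's built-in `sqlite3` module purely for illustration (the question doesn't say which database is involved), and the table `t` and its columns are hypothetical:

```python
import sqlite3

# In-memory database; the table and column names are made up for this demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (as_text TEXT, as_int INTEGER)")

# Store the same three numbers once as text and once as integers.
for n in (2, 10, 1):
    conn.execute("INSERT INTO t (as_text, as_int) VALUES (?, ?)", (str(n), n))

# Text sorts character by character, so '10' lands before '2'.
text_order = [row[0] for row in conn.execute("SELECT as_text FROM t ORDER BY as_text")]
# Integers sort by numeric value.
int_order = [row[0] for row in conn.execute("SELECT as_int FROM t ORDER BY as_int")]

print(text_order)  # ['1', '10', '2']
print(int_order)   # [1, 2, 10]
```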
I want to store a date value, and I usually use TEXT
That is another very bad idea, mainly for the same reasons you shouldn't store a number in a text column. In addition to completely wrong dates ('foo'), you can't prevent "invalid" dates either (e.g. February 31st). And then there is the sorting thing, the comparison with > and <, and the date arithmetic...
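A minimal sketch of those problems, again using SQLite via Python's `sqlite3` as a stand-in and a hypothetical `events` table holding dates as MM/DD/YYYY text. SQLite has no native DATE type, so validation is done here in Python with `datetime.strptime`; in a database with a real DATE type (PostgreSQL, for example) the column itself rejects a value like February 31st.

```python
import sqlite3
from datetime import date, datetime

conn = sqlite3.connect(":memory:")
# Hypothetical table: dates kept as free-form text in US month/day/year style.
conn.execute("CREATE TABLE events (happened_on TEXT)")
for s in ("12/01/2020", "02/01/2021", "02/31/2021", "foo"):
    # A TEXT column happily stores all four, including the impossible ones.
    conn.execute("INSERT INTO events (happened_on) VALUES (?)", (s,))

# String comparison with >: December 2020 comes out "later" than February 2021.
wrong = conn.execute("SELECT '12/01/2020' > '02/01/2021'").fetchone()[0]
print(wrong)  # 1 (true)


def parse_us_date(s: str) -> date:
    """Parse MM/DD/YYYY; raises ValueError for 'foo' or February 31st."""
    return datetime.strptime(s, "%m/%d/%Y").date()


valid, rejected = [], []
for (s,) in conn.execute("SELECT happened_on FROM events ORDER BY rowid"):
    try:
        valid.append(parse_us_date(s))
    except ValueError:
        rejected.append(s)

print(rejected)  # ['02/31/2021', 'foo']
# Real date values compare chronologically and support arithmetic.
print(valid[0] < valid[1])         # True
print((valid[1] - valid[0]).days)  # 62
```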
Drawbacks of storing an integer as a string in a database
Unless you really need the features of an integer (that is, the ability to do arithmetic), it is probably better for you to store the product IDs as strings. You will never need to do anything like add two product IDs together, or compute the average of a group of product IDs, so there is no need for an actual numeric type.
It is unlikely that storing product IDs as strings will cause a measurable difference in performance. While there will be a slight increase in storage size, the size of a product ID string is likely to be much smaller than the data in the rest of your database row anyway.
Storing product IDs as strings today will save you much pain in the future if the data provider decides to start using alphabetic or symbol characters. There is no real downside.
Are there disadvantages to using a generic varchar(255) for all text-based fields?
In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255), which would always store 255 characters.
But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.
When your query implicitly generates a temporary table, for instance while sorting or doing a GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.
You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character, even for strings you store with single-byte content (e.g. ASCII or latin1 characters). And likewise, the utf8mb4 character set causes the string to pad out to four bytes per character in memory.
So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten single-byte characters plus one byte for the length), but it takes 765 bytes in memory, and thus in temp tables or sorted results.
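The arithmetic can be spelled out with a small Python sketch. It follows the simplified model in this answer (actual encoded length plus one length byte on disk; full padding to the declared width times the charset's maximum bytes per character in memory). Real storage engines add their own overhead, the function names are hypothetical, and the row and column counts at the end are an invented workload, not measurements.

```python
# Simplified model of the MySQL padding behaviour described above.

def bytes_on_disk(value: str, length_prefix: int = 1) -> int:
    """Actual encoded length of the value plus a length prefix."""
    return len(value.encode("utf-8")) + length_prefix


def bytes_in_memory(declared_chars: int, max_bytes_per_char: int) -> int:
    """Fixed-width in-memory copy, padded to the declared maximum."""
    return declared_chars * max_bytes_per_char


print(bytes_on_disk("No opinion"))  # 11
print(bytes_in_memory(255, 3))      # 765  (utf8)
print(bytes_in_memory(255, 4))      # 1020 (utf8mb4)

# Hypothetical temp table: 1 million rows, 5 such VARCHAR(255) utf8 columns.
rows, cols = 1_000_000, 5
print(rows * cols * bytes_in_memory(255, 3))  # 3825000000 bytes, about 3.8 GB
```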
I have helped MySQL users who unknowingly and frequently created 1.5GB temp tables that filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.
It's best to define the column based on the type of data that you intend to store. This has the benefit of enforcing application-related constraints, as other folks have mentioned, but it also has the physical benefit of avoiding the memory waste I described above.
It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL older than 5.0.
Best data type for storing strings in SQL Server?
nvarchar stores Unicode character data, which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space: 16 bits per character for nvarchar and 8 bits per character for varchar.
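The "twice as much space" trade-off can be approximated with Python's codecs. This sketch assumes a varchar column using code page 1252 (the code page behind the common SQL_Latin1_General_CP1 collations) and treats nvarchar as UTF-16; the helper names are hypothetical. Python raises an error for a character that code page 1252 can't represent, whereas SQL Server typically substitutes '?' when such text is put into a varchar.

```python
# Rough per-value byte counts; helper names are hypothetical.

def varchar_bytes(s: str, codepage: str = "cp1252") -> int:
    # Raises UnicodeEncodeError if a character has no slot in the code page.
    return len(s.encode(codepage))


def nvarchar_bytes(s: str) -> int:
    # UTF-16 little-endian without a BOM: 2 bytes per BMP character.
    return len(s.encode("utf-16-le"))


print(varchar_bytes("Smith"), nvarchar_bytes("Smith"))  # 5 10
print(nvarchar_bytes("田中"))                            # 4

# A Japanese name has no representation in code page 1252 at all.
try:
    varchar_bytes("田中")
    fits_in_varchar = True
except UnicodeEncodeError:
    fits_in_varchar = False
print(fits_in_varchar)  # False
```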
Is there a downside to choosing ntext as the data type for all text columns?
For a start, NTEXT is deprecated, so you should use NVARCHAR(MAX) instead.
You should always try to use the smallest datatype possible for a column. If you do need to support more than 4000 characters in a field, then you'll need to use NVARCHAR(MAX). If you don't need to support more than 4000 characters, then use NVARCHAR(n).
I believe NTEXT would always be stored out of row, incurring an overhead when querying. NVARCHAR(MAX) can be stored in row if possible. If it can't fit in row, then SQL Server will push it off row. See this MSDN article.
Edit:
For NVARCHAR, the maximum supported explicit size is 4000. After that, you need to use MAX which takes you up to 2^31-1 bytes.
For VARCHAR, the maximum supported explicit size is 8000 before you need to switch to MAX.
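The sizing rule from this answer ("use the smallest datatype possible", switching to MAX only past the explicit limits) can be captured in a small helper. `pick_column_type` is hypothetical and only encodes the limits quoted above; the final line is a rough character count for NVARCHAR(MAX) at 2 bytes per character.

```python
# Limits quoted above for SQL Server explicit (N)VARCHAR sizes.
NVARCHAR_MAX_N = 4000
VARCHAR_MAX_N = 8000
MAX_TYPE_BYTES = 2**31 - 1  # storage ceiling for (N)VARCHAR(MAX)


def pick_column_type(max_chars: int, unicode: bool = True) -> str:
    """Return the smallest fitting type name for a column of max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    base = "NVARCHAR" if unicode else "VARCHAR"
    limit = NVARCHAR_MAX_N if unicode else VARCHAR_MAX_N
    return f"{base}({max_chars})" if max_chars <= limit else f"{base}(MAX)"


print(pick_column_type(200))                  # NVARCHAR(200)
print(pick_column_type(5000))                 # NVARCHAR(MAX)
print(pick_column_type(5000, unicode=False))  # VARCHAR(5000)
# At 2 bytes per character, NVARCHAR(MAX) holds roughly this many characters:
print(MAX_TYPE_BYTES // 2)                    # 1073741823
```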
Difference between text and varchar (character varying)
There is no difference; under the hood it's all varlena (variable-length array).
Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/
A couple of highlights:
To sum it all up:
- char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit
- varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
- varchar – just like text
- text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name
The article does detailed testing to show that the performance of inserts and selects is similar for all 4 data types. It also takes a detailed look at alternative ways of constraining the length when needed. Function-based constraints or domains provide the advantage of an instant increase of the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.
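The idea of an unbounded text column with the length limit expressed as a separate constraint can be sketched with a CHECK constraint. SQLite (via Python's `sqlite3`) stands in for PostgreSQL here, and the `users` table is hypothetical; this shows only the constraint mechanism, not the ALTER-time behaviour the article measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unbounded TEXT column; the length limit lives in a CHECK constraint instead
# of a varchar(n) type modifier.
conn.execute(
    "CREATE TABLE users (name TEXT NOT NULL CHECK (length(name) <= 10))"
)

conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
try:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("a" * 11,))
    accepted_long_name = True
except sqlite3.IntegrityError:
    # The CHECK constraint rejects the 11-character value.
    accepted_long_name = False

print(accepted_long_name)  # False
```

In PostgreSQL the same `CHECK (length(name) <= 10)` clause works on a `text` column, or it can be attached to a domain (`CREATE DOMAIN ... AS text CHECK (length(VALUE) <= 10)`), which is the other option the article compares.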