Any Downsides of Using Data Type "Text" For Storing Strings

Any bad affect if I use TEXT data-type to store a number

Storing numbers in a text column is a very bad idea. You lose a lot of advantages when you do that:

  • you can't prevent storing invalid numbers (e.g. 'foo')
  • Sorting will not work the way you want to ('10' is "smaller" than '2')
  • it confuses everybody looking at your data model.

I want to store a date value I usually use TEXT

That is another very bad idea. Mainly because of the same reasons you shouldn't be storing a number in a text column. In addition to completely wrong dates ('foo') you can't prevent "invalid" dates either (e.g. February, 31st). And then there is the sorting thing, and the comparison with > and <, and the date arithmetic....

Drawbacks of storing an integer as a string in a database

Unless you really need the features of an integer (that is, the ability to do arithmetic), then it is probably better for you to store the product IDs as strings. You will never need to do anything like add two product IDs together, or compute the average of a group of product IDs, so there is no need for an actual numeric type.

It is unlikely that storing product IDs as strings will cause a measurable difference in performance. While there will be a slight increase in storage size, the size of a product ID string is likely to be much smaller than the data in the rest of your database row anyway.

Storing product IDs as strings today will save you much pain in the future if the data provider decides to start using alphabetic or symbol characters. There is no real downside.

Are there disadvantages to using a generic varchar(255) for all text-based fields?

In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.

But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.

When your query implicitly generates a temporary table, for instance while sorting or GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.

You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character even for strings you store with single-byte content (e.g. ascii or latin1 characters). And likewise utf8mb4 character set causes the string to pad out to four bytes per character in memory.

So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten lower-charset characters, plus one byte for length) but it takes 765 bytes in memory, and thus in temp tables or sorted results.

I have helped MySQL users who unknowingly created 1.5GB temp tables frequently and filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.

It's best to define the column based on the type of data that you intend to store. It has benefits to enforce application-related constraints, as other folks have mentioned. But it has the physical benefits to avoid the memory waste I described above.

It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL older than 5.0.

Best data type for storing strings in SQL Server?

nvarchar stores unicode character data which is required if you plan to store non-English names. If it's a web application, I highly recommend using nvarchar even if you don't plan on being international. The downside is that it consumes twice as much space, 16-bits per character for nvarchar and 8-bits per character for varchar.

Is there a downside by chosing ntext as datatype for all text columns?

NTEXT is being deprecated for a start, so you should use NVARCHAR(MAX) instead.

You should always try to use the smallest datatype possible for a column. If you do need to support more than 4000 characters in a field, then you'll need to use NVARCHAR(MAX). If you don't need to support more than 4000 characters, then use NVARCHAR(n).

I believe NTEXT would always be stored out of row, incurring an overhead when querying. NVARCHAR(MAX) can be stored in row if possible. If it can't fit in row, then SQL Server will push it off row. See this MSDN article.

Edit:

For NVARCHAR, the maximum supported explicit size is 4000. After that, you need to use MAX which takes you up to 2^31-1 bytes.

For VARCHAR, the maximum supported explicit size is 8000 before you need to switch to MAX.

Difference between text and varchar (character varying)

There is no difference, under the hood it's all varlena (variable length array).

Check this article from Depesz: http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

A couple of highlights:

To sum it all up:

  • char(n) – takes too much space when dealing with values shorter than n (pads them to n), and can lead to subtle errors because of adding trailing
    spaces, plus it is problematic to change the limit
  • varchar(n) – it's problematic to change the limit in live environment (requires exclusive lock while altering table)
  • varchar – just like text
  • text – for me a winner – over (n) data types because it lacks their problems, and over varchar – because it has distinct name

The article does detailed testing to show that the performance of inserts and selects for all 4 data types are similar. It also takes a detailed look at alternate ways on constraining the length when needed. Function based constraints or domains provide the advantage of instant increase of the length constraint, and on the basis that decreasing a string length constraint is rare, depesz concludes that one of them is usually the best choice for a length limit.



Related Topics



Leave a reply



Submit