What Is the Best Datatype for Storing URLs in a MySQL Database

Best database field type for a URL

  1. Lowest-common-denominator maximum URL length among popular web browsers: 2,083 characters (Internet Explorer)

  2. http://dev.mysql.com/doc/refman/5.0/en/char.html

    Values in VARCHAR columns are variable-length strings. The length can be specified as a value from 0 to 255 before MySQL 5.0.3, and 0 to 65,535 in 5.0.3 and later versions. The effective maximum length of a VARCHAR in MySQL 5.0.3 and later is subject to the maximum row size (65,535 bytes, which is shared among all columns) and the character set used.

  3. So: before MySQL 5.0.3, use TEXT; in MySQL 5.0.3 and later, use VARCHAR(2083).
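To see why VARCHAR(2083) fits comfortably under the 65,535-byte row limit quoted above, here is a rough worst-case estimate. It is a sketch under simplifying assumptions: every character may use the character set's maximum width (1 byte for latin1, 3 for utf8mb3, 4 for utf8mb4), a VARCHAR value carries a 1-byte length prefix when it needs at most 255 bytes and 2 bytes otherwise, and NULL flags and other per-row overhead are ignored. The function names are hypothetical.

```python
# Worst-case size of a VARCHAR(n) value in a MySQL row (rough estimate).
# Simplifying assumptions: each character may use the charset's maximum
# bytes per character; NULL flags and other per-row overhead are ignored.
ROW_SIZE_LIMIT = 65_535  # bytes, shared among all columns of a row

MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def varchar_max_bytes(length: int, charset: str) -> int:
    """Maximum bytes for a VARCHAR(length) value, including its length prefix."""
    data_bytes = length * MAX_BYTES_PER_CHAR[charset]
    # 1 length byte if values need at most 255 bytes, otherwise 2.
    prefix_bytes = 1 if data_bytes <= 255 else 2
    return data_bytes + prefix_bytes


def fits_in_row(length: int, charset: str, other_columns_bytes: int = 0) -> bool:
    """True if this column plus the other columns stays within the row size limit."""
    return varchar_max_bytes(length, charset) + other_columns_bytes <= ROW_SIZE_LIMIT


for cs in ("latin1", "utf8mb3", "utf8mb4"):
    print(cs, varchar_max_bytes(2083, cs), fits_in_row(2083, cs))
# latin1 2085 True
# utf8mb3 6251 True
# utf8mb4 8334 True
```

Even with 4-byte characters the URL column uses only about an eighth of the row budget, leaving plenty of room for other columns.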

What is the best datatype for storing URLs in a MySQL database?

If by "links" you mean links to web pages, I'm guessing you want to store URLs.

Since URLs are variable-length strings, the VARCHAR data type would seem the obvious choice.

What data type does a URL correspond to in MySQL?

Simply put, the data type should be VARCHAR.

URLs can contain any number of characters and vary widely in length. A CHAR column is fixed-length: it always holds exactly the number of characters set in the table definition, padding shorter values with spaces. A VARCHAR (variable character) column holds a variable number of characters, up to its declared maximum. Since not all URLs are the same length, you need that variability. You could argue for a TEXT column if you need to store really long URLs; however, for most use cases VARCHAR will suffice.
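A small illustration of what the fixed width of CHAR costs. This sketch assumes a single-byte character set such as latin1 (so one character is one byte), and it models only the padding versus length-prefix difference, not every storage-engine detail. The sample URLs and function names are hypothetical.

```python
# Storage comparison for CHAR(255) vs VARCHAR(255), assuming a single-byte
# character set such as latin1 (so characters == bytes).
WIDTH = 255

sample_urls = [  # hypothetical sample data
    "http://example.com/",
    "https://www.example.org/docs/index.html",
    "http://foo.example.com/a/really/long/path?with=lots",
]


def char_bytes(value: str, width: int = WIDTH) -> int:
    """CHAR(width) always occupies the full width (values are space-padded)."""
    if len(value) > width:
        raise ValueError(f"{len(value)} characters do not fit in CHAR({width})")
    return width


def varchar_bytes(value: str, width: int = WIDTH) -> int:
    """VARCHAR(width) stores only the actual characters plus a length prefix."""
    if len(value) > width:
        raise ValueError(f"{len(value)} characters do not fit in VARCHAR({width})")
    prefix = 1 if width <= 255 else 2  # single-byte charset: width chars == width bytes
    return len(value) + prefix


print("CHAR total:   ", sum(char_bytes(u) for u in sample_urls))     # 765
print("VARCHAR total:", sum(varchar_bytes(u) for u in sample_urls))  # 112
```

For typical short URLs, most of each CHAR(255) value is padding, which is the variability argument above in numbers.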

MySQL datatype for URLs

I would use a generic VARCHAR(255).

http://dev.mysql.com/doc/refman/5.5/en/char.html

How to store URLs in MySQL

According to the DNS spec, the maximum length of a domain name is:

    The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. The length of any one label is limited to between 1 and 63 octets. A full domain name is limited to 255 octets (including the separators).

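With the 3-byte utf8 character set, a 255-character column needs 255 * 3 = 765 bytes, which stays under InnoDB's classic 767-byte index key prefix limit (just barely :-) ).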

However, notice that each label can be at most 63 characters long.

So I would suggest splitting the URL into its component parts.
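The two limits from the quoted spec can be checked before a hostname is stored. This is a minimal sketch that checks only the length rules as quoted (labels of 1 to 63 octets, at most 255 octets for the full name including dots); it deliberately ignores which characters are allowed, and it assumes the hostname is already ASCII (internationalized names converted to punycode first). The function name is hypothetical.

```python
def is_valid_dns_length(hostname: str) -> bool:
    """Check only the DNS length limits quoted above.

    Each label must be 1-63 octets and the full name at most 255 octets,
    counting the dot separators. Character rules are not checked.
    Assumes an ASCII (or punycode) hostname; other input raises UnicodeEncodeError.
    """
    name = hostname.rstrip(".")  # ignore a trailing root dot, e.g. "example.com."
    encoded = name.encode("ascii")
    if len(encoded) > 255:
        return False
    return all(1 <= len(label) <= 63 for label in encoded.split(b"."))


print(is_valid_dns_length("foo.example.com"))   # True
print(is_valid_dns_length("a" * 64 + ".com"))   # False: label longer than 63
print(is_valid_dns_length("foo..example.com"))  # False: empty label
```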

Using http://foo.example.com/a/really/long/path?with=lots&of=query&parameters=that&goes=on&forever&and=ever as an example, this would probably be adequate:

  • protocol flag ["http" -> 0] (store "http" as 0, "https" as 1, etc.)
  • subdomain ["foo"] (255 - 63 = 192 characters; I could subtract 2 more because the shortest TLD is 2 characters)
  • domain ["example"] (63 characters)
  • tld ["com"] (4 characters handles the "info" TLD, but a TLD is a DNS label and can be up to 63 characters, e.g. "museum", so 63 is safer)
  • path ["a/really/long/path"] (as long as you want; store in a separate table)
  • query parameters ["with=lots&of=query&parameters=that&goes=on&forever&and=ever"] (store in a separate key/value table)
  • port number / authentication info, which is rarely used, can go in a separate keyed table if actually needed.
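Splitting a URL into the parts listed above can be sketched with Python's standard `urllib.parse`. The protocol codes and the "last label is the TLD, the one before it is the domain" rule are assumptions taken from the list; that naive rule gets multi-label public suffixes such as "co.uk" wrong, so a real implementation would need a public-suffix list. The function name `split_url` is hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical protocol encoding, following the list above.
PROTOCOL_FLAGS = {"http": 0, "https": 1}


def split_url(url: str) -> dict:
    """Break a URL into the components listed above (naive host split)."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")  # hostname is lower-cased by urlsplit
    # Naive rule: last label = TLD, the one before it = domain, the rest = subdomain.
    # This mis-handles multi-label public suffixes such as "co.uk".
    return {
        "protocol": PROTOCOL_FLAGS[parts.scheme],  # KeyError for schemes not listed
        "subdomain": ".".join(labels[:-2]),
        "domain": labels[-2] if len(labels) >= 2 else "",
        "tld": labels[-1],
        "path": parts.path.lstrip("/"),
        # keep_blank_values keeps keys such as "forever" that have no "=value".
        "query": parse_qsl(parts.query, keep_blank_values=True),
        "port": parts.port,  # None when the URL has no explicit port
    }


example = ("http://foo.example.com/a/really/long/path"
           "?with=lots&of=query&parameters=that&goes=on&forever&and=ever")
for key, value in split_url(example).items():
    print(key, value)
```

The `query` list of (name, value) pairs maps directly onto rows of the separate key/value table suggested above.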

This gives you some nice advantages:

  • The index is only on the parts of the URL that you need to search on (smaller index!)
  • Queries can be limited to individual URL parts (e.g. find every URL in the facebook domain)
  • Any URL that has too long a subdomain/domain is bogus
  • Easy to discard query parameters
  • Easy to do case-insensitive domain name/TLD searching
  • Discard the syntax sugar ("://" after the protocol, "." between subdomain/domain and domain/TLD, "/" between TLD and path, "?" before the query, "&" and "=" in the query)
  • Avoids the major sparse-table problem. Most URLs will not have query parameters or long paths. If these fields are in a separate table, your main table will not take the size hit. When running queries, more records will fit into memory, which means faster query performance.
  • (more advantages here).
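A runnable sketch of the split-table layout and the "every URL in the facebook domain" query. It uses Python's built-in `sqlite3` only so it runs without a server; the table and column names are hypothetical, and MySQL DDL would differ (for example VARCHAR lengths and engine options). Host parts are lower-cased on insert, which is one simple way to get the case-insensitive domain search mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (
    id        INTEGER PRIMARY KEY,
    protocol  INTEGER NOT NULL,   -- 0 = http, 1 = https
    subdomain TEXT    NOT NULL,
    domain    TEXT    NOT NULL,
    tld       TEXT    NOT NULL
);
-- Index only the parts you search on.
CREATE INDEX idx_urls_domain ON urls (domain, tld);
-- Sparse data lives in side tables, so the main table stays small.
CREATE TABLE url_paths (
    url_id INTEGER PRIMARY KEY REFERENCES urls(id),
    path   TEXT NOT NULL
);
CREATE TABLE url_query_params (
    url_id INTEGER NOT NULL REFERENCES urls(id),
    name   TEXT NOT NULL,
    value  TEXT NOT NULL
);
""")


def insert_url(protocol, subdomain, domain, tld, path="", params=()):
    """Store one URL; path and query rows are written only when present."""
    cur = conn.execute(
        "INSERT INTO urls (protocol, subdomain, domain, tld) VALUES (?, ?, ?, ?)",
        (protocol, subdomain.lower(), domain.lower(), tld.lower()),
    )
    url_id = cur.lastrowid
    if path:
        conn.execute("INSERT INTO url_paths VALUES (?, ?)", (url_id, path))
    conn.executemany(
        "INSERT INTO url_query_params VALUES (?, ?, ?)",
        [(url_id, name, value) for name, value in params],
    )
    return url_id


insert_url(1, "www", "Facebook", "com", "some/page")
insert_url(1, "m", "facebook", "com")
insert_url(0, "foo", "example", "com", "a/really/long/path", [("with", "lots")])

# "Find every URL in the facebook domain" touches only the indexed columns.
facebook_ids = [row[0] for row in conn.execute(
    "SELECT id FROM urls WHERE domain = ? AND tld = ? ORDER BY id",
    ("facebook", "com"),
)]
print(facebook_ids)  # [1, 2]
```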

What is the best column type for URL?

If you are prepared to always URL-encode your URLs before you store them (for example, 中.doc URL-encodes to %E4%B8%AD.doc), then you are safe sticking with VARCHAR. If you want the non-ASCII characters in your URLs to remain readable in the database, or you simply don't want to be caught out, I'd recommend NVARCHAR. (In MySQL, NVARCHAR is shorthand for a VARCHAR using the predefined national character set, utf8; a VARCHAR declared with CHARACTER SET utf8mb4 serves the same purpose.)
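The encoding example above can be reproduced with Python's standard `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of non-ASCII characters, and reversed with `unquote`. Note that the encoded form is longer (the single character 中 becomes nine ASCII characters), which is worth remembering when choosing the n in VARCHAR(n).

```python
from urllib.parse import quote, unquote

original = "中.doc"
encoded = quote(original)   # percent-encodes the UTF-8 bytes of non-ASCII characters
decoded = unquote(encoded)  # reverses the encoding, e.g. for display

print(encoded)                      # %E4%B8%AD.doc
print(len(original), len(encoded))  # 5 13
print(decoded == original)          # True
```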

Since IE (the most restrictive of the mainstream browsers) doesn't support URLs longer than 2,083 characters, you can (apart from any considerations you might have on indexing or row length) cover most useful scenarios with nvarchar(2083).

What's the proper column type to save urls in MySQL?

Use TEXT; at up to 65,535 bytes, it's enough for practically every URL.

Note that with long URLs, you won't be able to create an index that covers the whole URL. If you need a UNIQUE index, compute a hash of the URL, store it in a separate column and index that column instead.


