How Long Should SQL Email Fields Be

How long should SQL email fields be?

The theoretical limit is very long, but do you really need to worry about such long email addresses? If someone can't log in with a 100-character email address, do you really care? We actually prefer that they can't.

Some statistical data may shed light on the issue. We analyzed a database with over 10 million email addresses. These addresses are not confirmed, so some of them are invalid. Here are some interesting facts:

  1. The longest valid one is 89 characters.
  2. There are hundreds of longer ones, up to the limit of our column (255), but on visual inspection they are apparently fake.
  3. The peak of the length distribution is at 19.
  4. There is no long tail. Everything falls off sharply after 38.

We cleaned up the DB by throwing away anything longer than 40. The good news is that no one has complained; the bad news is that not many records got cleaned out.
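A length analysis like the one described above could be sketched roughly as follows. This is only an illustration: the function names and the sample addresses are made up, and the cutoff of 40 is the one mentioned in the answer.

```python
from collections import Counter

def length_profile(addresses):
    """Return (histogram, peak_length, longest_length) for the given addresses."""
    lengths = [len(a) for a in addresses]
    histogram = Counter(lengths)
    # The peak is the length that occurs most often.
    peak_length = histogram.most_common(1)[0][0]
    return histogram, peak_length, max(lengths)

def drop_longer_than(addresses, limit=40):
    """Keep only addresses whose length is at most `limit` characters."""
    return [a for a in addresses if len(a) <= limit]

# Hypothetical sample data, not taken from the database discussed above.
sample = ["ann@example.com", "bob@example.org", "x" * 60 + "@example.net"]
histogram, peak, longest = length_profile(sample)
kept = drop_longer_than(sample)
```

On real data you would look at the whole histogram before picking a cutoff, since the right limit depends on where your own distribution falls off.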

What is the optimal length for an email address in a database?

The maximum length of an email address is 254 characters.

Every email address is composed of two parts. The local part that comes before the '@' sign, and the domain part that follows it. In "user@example.com", the local part is "user", and the domain part is "example.com".

The local part must not exceed 64 characters and the domain part cannot be longer than 255 characters.

The combined length of the local part, the '@', and the domain part of an email address must not exceed 254 characters, as described in RFC 3696 Errata ID 1690.
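The three limits above can be checked with a few lines of code. This is a minimal sketch, assuming ASCII addresses (so characters and octets coincide); the function and constant names are made up for illustration, and it checks lengths only, not syntax.

```python
MAX_LOCAL = 64    # local part, before the '@'
MAX_DOMAIN = 255  # domain part, after the '@'
MAX_TOTAL = 254   # local part + '@' + domain part

def within_length_limits(address: str) -> bool:
    """Check only the length limits; this is not a syntax validator."""
    # Split on the last '@', since a quoted local part may itself contain '@'.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return (
        len(local) <= MAX_LOCAL
        and len(domain) <= MAX_DOMAIN
        and len(address) <= MAX_TOTAL
    )
```

Note that the total limit is the binding one in practice: with a 64-character local part, the domain can be at most 189 characters before the address exceeds 254.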

I got the original part of this information from here

Acceptable field type and size for email address?

According to RFC 5321, forward and reverse path can be up to 256 chars long, so the email address can be up to 254 characters long. You're safe with using 255 chars.

Sql Table data type for email address?

It's good to go with NVARCHAR(320): 64 characters for the local part + 1 for the @ + 255 for the domain name.

You can use the varchar(255) or nvarchar(255) data type.

List of standard lengths for database fields

W3C's recommendation:

If designing a form or database that will accept names from people
with a variety of backgrounds, you should ask yourself whether you
really need to have separate fields for given name and family name.

… Bear in mind that names in some cultures can be quite a lot longer
than your own. … Avoid limiting the field size for names in your
database. In particular, do not assume that a four-character
Japanese name in UTF-8 will fit in four bytes – you are likely to
actually need 12.

https://www.w3.org/International/questions/qa-personal-names
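The byte count in the W3C quote is easy to verify. The name below is just an example of a four-character Japanese name; each of its characters takes three bytes in UTF-8.

```python
name = "山田太郎"  # example four-character name (an assumption for illustration)

char_count = len(name)                  # number of characters
byte_count = len(name.encode("utf-8"))  # number of bytes once encoded as UTF-8
```

This matters whenever a column length is measured in bytes rather than characters, so check which unit your database uses for the type you pick.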

For database fields, VARCHAR(255) is a safe default choice, unless you can actually come up with a good reason to use something else. For typical web applications, performance won't be a problem. Don't prematurely optimize.

What is the maximum length of a valid email address?

An email address must not exceed 254 characters.

This was accepted by the IETF following a submitted erratum. A full diagnosis of any given address is available online. The original version of RFC 3696 described 320 as the maximum length, but John Klensin subsequently accepted that this was an incorrect value, since a Path is defined as

Path = "<" [ A-d-l ":" ] Mailbox ">"

So the Mailbox element (i.e., the email address) has angle brackets around it to form a Path, which means the address itself can be at most 254 characters if the Path is to stay within 256 characters.

RFC 5321 specifies the maximum length as follows:

The maximum total length of a reverse-path or forward-path is 256 characters.
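The arithmetic behind the 254 figure can be written out directly. This is only a sketch of the reasoning above; `as_path` is a made-up helper that ignores the optional source-route prefix in the Path grammar.

```python
MAX_PATH = 256  # RFC 5321 limit on a forward-path or reverse-path

def as_path(mailbox: str) -> str:
    """Wrap a mailbox in angle brackets, as in Path = "<" Mailbox ">"."""
    return "<" + mailbox + ">"

# The two angle brackets use up two of the 256 characters.
max_mailbox = MAX_PATH - len("<") - len(">")
```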

RFC 3696 was corrected here.

People should be aware of the errata against RFC 3696 in particular. Three of the canonical examples are in fact invalid addresses.

I've collated a couple hundred test addresses, which you can find at http://www.dominicsayers.com/isemail

Storing Email Body in SQL Server database?

Generally I don't recommend building your own bulk email sending, as there is a lot to be done to keep your email from being considered spam.

However, if you decide to do it yourself, you need to decide on the content of the emails: is it text only, or HTML that may contain embedded images, and so on?

You can use varchar(max) for the field type. Performance will not be a big issue, but do think about retention policies.

If you would like to save the email as a file, you can use FILESTREAM, which will give you better performance provided that you use the SqlFileStream APIs.

NVARCHAR(?) for Email addresses in SQL Server

I've always used 320 based on your latter calculation. It doesn't cost you anything to allow more*, unless people abuse it and stuff junk in there. It could cost you to allow less, as you'll have frustrated users if they have legitimately longer e-mail addresses, and then you'll have to go back and update schema, code, parameters etc. In the system I used to work with (an e-mail service provider), the longest e-mail address I came across naturally was about 120 characters - and it was clear they were just making a long e-mail address for grins.

* Not strictly true, since memory grant estimates are based on the assumption that varying-width columns are half-populated, so a wider column storing the same data can lead to vastly different performance characteristics for certain queries.

And I've debated whether NVARCHAR is necessary for e-mail address. I've yet to come across an e-mail address with Unicode characters - I know the standard supports them, but so many existing systems do not, it would be pretty frustrating if that was your e-mail address.

And while it's true that NVARCHAR costs double the space, with SQL Server 2008 R2 you can benefit from Unicode compression, which basically treats all non-Unicode characters in an NVARCHAR column as ASCII, so you get those extra bytes back. Of course compression is only available in Enterprise+...
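The "double the space" point can be illustrated by comparing encodings. This is a rough sketch: it assumes VARCHAR stores one byte per character for an ASCII address and NVARCHAR stores two (UTF-16), and it ignores per-row and per-column overhead as well as compression.

```python
address = "user@example.com"

varchar_bytes = len(address.encode("ascii"))       # one byte per character
nvarchar_bytes = len(address.encode("utf-16-le"))  # two bytes per character
```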

Another way to reduce space requirements is to use a central lookup table for all observed domain names, and store LocalPart and DomainID with the user, and store each unique domain name only once. Yes, this makes for more cumbersome programming, but if you have 80,000 hotmail.com addresses, the cost is 80,000 x 4 bytes instead of 80,000 x 11 bytes (or less with compression). If storage or I/O is your bottleneck, and not CPU, this is definitely an option worth investigating.
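The lookup-table idea might look something like this. It is a sketch only: it uses Python's built-in SQLite as a stand-in for SQL Server so that it runs anywhere, and the table names `Domains` and `Users`, the `DomainName` column, and the `add_address` helper are assumptions for illustration (only LocalPart and DomainID come from the description above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Domains (
        DomainID   INTEGER PRIMARY KEY,
        DomainName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Users (
        UserID    INTEGER PRIMARY KEY,
        LocalPart TEXT NOT NULL,
        DomainID  INTEGER NOT NULL REFERENCES Domains(DomainID)
    );
""")

def add_address(conn, address):
    """Store the local part with the user and the domain once in the lookup table."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    conn.execute("INSERT OR IGNORE INTO Domains (DomainName) VALUES (?)", (domain,))
    (domain_id,) = conn.execute(
        "SELECT DomainID FROM Domains WHERE DomainName = ?", (domain,)
    ).fetchone()
    conn.execute(
        "INSERT INTO Users (LocalPart, DomainID) VALUES (?, ?)", (local, domain_id)
    )

for address in ["ann@hotmail.com", "bob@hotmail.com", "cat@example.com"]:
    add_address(conn, address)

# Each distinct domain is stored once, however many users share it.
domain_count = conn.execute("SELECT COUNT(*) FROM Domains").fetchone()[0]

# Rebuilding the full address requires a join, which is the extra programming cost.
rebuilt = [row[0] for row in conn.execute(
    "SELECT u.LocalPart || '@' || d.DomainName "
    "FROM Users u JOIN Domains d ON d.DomainID = u.DomainID "
    "ORDER BY u.UserID"
)]
```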

I wrote about this here:

  • Storing E-mail addresses more efficiently in SQL Server

What SQL column data type should be used to store email addresses?

Email addresses are allowed to have non-Unicode characters:
Are email addresses allowed to contain non-alphanumeric characters?

As a best practice, I would create an nvarchar column for globalization/localization support in the future (because you never know). The extra space required for an nvarchar versus a varchar is negligible.

Capturing every possible character for email validation is not easy, but you could write a regex function to check for this, as most high-level languages are Unicode compliant.
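As a very rough sketch of the regex idea, the check below only tests the overall shape "something@something.something" and rejects whitespace. It is deliberately loose and is not a full validator of the address grammar; the pattern and the function name are made up for illustration. Python's `re` module works on Unicode strings, so non-ASCII addresses pass through.

```python
import re

# Loose shape check: no whitespace, exactly one '@', and a dot in the domain part.
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(address: str) -> bool:
    """Return True if the address has a plausible shape; no deliverability check."""
    return LOOSE_EMAIL.fullmatch(address) is not None
```

A check like this is best combined with the length limits and, ultimately, a confirmation email, since only delivery proves that an address is real.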


