Is This a Good Way to Model Address Information in a Relational Database

Is this a good way to model address information in a relational database?

I actually use this as one of my interview questions. The following is a good place to start:

Addresses
---------
AddressId (PK)
Street1
... (etc)

and

AddressTypes
------------
AddressTypeId
AddressTypeName

and

UserAddresses (substitute "Company", "Account", whatever for Users)
-------------
UserId
AddressTypeId
AddressId

This way, your addresses are totally unaware of how they are being used, and your entities (Users, Accounts) don't directly know anything about addresses either. It's all up to the linking tables you create (UserAddresses in this case, but you can do whatever fits your model).

One piece of somewhat contradictory advice for a potentially large database: go ahead and put a "primary" address directly on your entities (in the Users table in this case) along with a "HasMoreAddresses" field. It seems icky compared to just using the clean design above, but can simplify coding for typical use cases, and the denormalization can make a big difference for performance.

Best way to model Customer -- Address

I tend towards first approach for all the usual reasons of normalisation. This approach also makes it easier to perform data cleansing on mailing details.

If you are possibly going to allow multiple addresses (mail, residential, etc) or wish to be able to use effective dates, consider this approach


Customer (id, phys_address_id)
Cust_address_type (cust_id, mail_address_id, address_type, start_date, end_date)
Address (id, street, city, etc.)

How should international geographical addresses be stored in a relational database?

I will summarize my thoughts from my blog post - A lesson in address storage (on archive.org).

On my current project [I work for a logistics company] we're storing international addresses. I've done research on addresses all over the world in the design of this portion of the database. There's a lot of different formats. In the Western world we tend to use a fairly uniform format - a few differences but they're mostly:

  • Street Number - Numeric
  • House or Building Name - [VarChar - in the UK some houses/buildings are identified by name, not by number]
  • Street Number Suffix [VarChar, although in most cases, Char(1) would suffice]
    • A, B etc
  • Street Name [VarChar]
  • Street Type [VarChar or Int if you have a StreetTypes table]
    • So far, I've found 262 unique types in the English speaking world, there are likely more, and don't forget other languages i.e. Strasse, Rue etc.
  • Street Direction [VarChar(2)]
    • N, E, S, W, NE, SE, NW, SW
  • Address Type [VarChar or Int if you have an AddressTypes table]
    • PO Box
    • Apartment
    • Building
    • Floor
    • Office
    • Suite
    • etc...
  • Address Type Identifier [VarChar]
    • i.e. Box Number, Apartment Number, Floor Number remember apartment numbers and offices sometimes have alphanumeric info - like 1A
  • Local Municipality [VarChar or Int if you have a Municipalities table]
    • For instance, if your hamlet/village appears in the address before the town.
  • City/Town [VarChar or Int if you have a Cities table]
  • Governing District [VarChar or Int if you have a Districts table]
    • State (U.S.)
    • Province (Canada)
    • Federal District (Mexico)
    • County (U.K.)
    • etc...
  • Postal Area [VarChar]
    • Zip (U.S.)
    • Postal Code (Canada, Mexico)
    • Postcode (U.K.)
  • Country [VarChar or Int if you have a Countries table]

This appears to cover most countries but the ordering of the fields may be displayed differently. You can find a list of display formats at http://www.bitboost.com/ref/international-address-formats.html#Formats

For instance, in many countries, the postal code falls before the city name and the street number falls after the street name. In Canada, U.S. and the U.K. the street number precedes the street name and the postal code (or ZIP) comes after the city name.

In answer to your question about separation of the addresses into different countries, I wouldn't suggest it, it will just make life harder in other areas - for instance reporting. The format I've provided covers all the addresses in our logistics database which covers USA, Canada, Mexico and the UK without any problems. It also covers all of our European, Chinese, Japanese and Malaysian addresses. I can't speak for other countries but I haven't yet had to store an address from a country that these fields won't support.

I don't suggest going with the Address1, Address2, Address3 format suggested by others and seen in many databases because parsing address information out of an alphanumeric string isn't as simple as it might first seem - especially if data isn't entered correctly, due to misinformation, typo, misspelling etc. If you separate your fields you can use distance algorithms to check for likely meaning, use probability to check street name against postal code and street number or to check province and city against street name etc. Try doing any of that when you've got a string denoting your whole street address. It's not a trivial matter by any stretch of the imagination.

QA on an address database is a headache, period. The easiest way to simplify your life in this area is to make sure all the fields hold only a single piece of information that can be automatically verified as correct at entry time. Probability, distance algorithms and regular expressions can check for validity of entry and provide feedback to the user as to what their mistake was and suggest suitable corrections.

One caveat to be aware of is roads with names that are also street types - if you're covering Canada you need to be aware of "Avenue Road" in Toronto which will trip you up big time if you're using the Address1, 2, 3 format. This likely occurs in other places too, although I'm not aware of them - this single instance was enough for me to scream WTF?!

Is it best to store address information with the order table or in it's own table?

It sounds like what you want (ideally) is for the order to be linked to the address as it was at the time of the order.

There are two main ways to implement this - either the order includes all of the address information within its own structure, or you implement the address table as a temporal table, and ensure the order has appropriate datetime information to locate the correct address rows when queried.

Which one you implement depends on how you wish to handle updates - both models may have issues, depending on whether in-flight orders should/should not be updated.

Modeling Geographic Locations in an Relational Database

You might want to have a look at Freebase.com as a site that's had some open discussion about what a "location" means and what it means when a location is included in another. These sorts of questions can generate a lot of discussion.

For example, there is the obvious "geographic nesting", but there are less obvious logical nestings. For example, in a strictly geographic sense, Vatican City is nested within Italy. But it's not nested politically. Similarly, if your user is located in a research center that belongs to a university, but isn't located on the University's property, do you model that relationship or not?

SQL Database Design Best Practice (Addresses)

You're on the right track by breaking address out into its own table. I'd add a couple of additional suggestions.

  1. Consider taking the Address FK columns out of the Customers/Orders tables and creating junction tables instead. In other words, treat Customers/Addresses and Orders/Addresses as many-to-many relationships in your design now so you can easily support multiple addresses in the future. Yes, this means introducing more tables and joins, but the flexibility you gain is well worth the effort.

  2. Consider creating lookup tables for city, state and country entities. The city/state/country columns of the address table then consist of FKs pointing to these lookup tables. This allows you to guarantee consistent spellings across all addresses and gives you a place to store additional metadata (e.g., city population) if needed in the future.

Is there common street addresses database design for all addresses of the world?

It is possible to represent addresses from lots of different countries in a standard set of fields. The basic idea of a named access route (thoroughfare) which the named or numbered buildings are located on is fairly standard, except in China sometimes. Other near universal concepts include: naming the settlement (city/town/village), which can be generically referred to as a locality; naming the region and assigning an alphanumeric postcode. Note that postcodes, also known as zip codes, are purely numeric only in some countries. You will need lots of fields if you really want to be generic.

The Universal Postal Union (UPU) provides address data for lots of countries in a standard format. Note that the UPU format holds all addresses (down to the available field precision) for a whole country, it is therefore relational. If storing customer addresses, where only a small fraction of all possible addresses will be stored, its better to use a single table (or flat format) containing all fields and one address per row.

A reasonable format for storing addresses would be as follows:

  • Address Lines 1-4
  • Locality
  • Region
  • Postcode (or zipcode)
  • Country

Address lines 1-4 can hold components such as:

  • Building
  • Sub-Building
  • Premise number (house number)
  • Premise Range
  • Thoroughfare
  • Sub-Thoroughfare
  • Double-Dependent Locality
  • Sub-Locality

Frequently only 3 address lines are used, but this is often insufficient. It is of course possible to require more lines to represent all addresses in the official format, but commas can always be used as line separators, meaning the information can still be captured.

Usually analysis of the data would be performed by locality, region, postcode and country and these elements are fairly easy for users to understand when entering data. This is why these elements should be stored as separate fields. However, don't force users to supply postcode or region, they may not be used locally.

Locality can be unclear, particularly the distinction between map locality and postal-locality. The postal locality is the one deemed by a postal authority which may sometimes be a nearby large town. However, the postcode will usually resolve any problems or discrepancies there, to allow correct delivery even if the official post-locality is not used.



Related Topics



Leave a reply



Submit