How to Document Your Database Structure

How to document a database

In my experience, ER (or UML) diagrams aren't the most useful artifact - with a large number of tables, diagrams (especially reverse engineered ones) are often a big convoluted mess that nobody learns anything from.

For my money, some good human-readable documentation (perhaps supplemented with diagrams of smaller portions of the system) will give you the most mileage. This will include, for each table:

Descriptions of what the table means and how it's functionally used (in the UI, etc.)
Descriptions of what each attribute means, if it isn't obvious
Explanations of the relationships (foreign keys) from this table to others, and vice-versa
Explanations of additional constraints and / or triggers
Additional explanation of major views & procs that touch the table, if they're not well documented already

With all of the above, don't document for the sake of documenting - documentation that restates the obvious just gets in people's way. Instead, focus on the stuff that confused you at first, and spend a few minutes writing really clear, concise explanations. That'll help you think it through, and it'll massively help other developers who run into these tables for the first time.

As others have mentioned, there are a wide variety of tools to help you manage this, like Enterprise Architect, Red Gate SQL Doc, and the built-in tools from various vendors. But while tool support is helpful (and even critical, in bigger databases), doing the hard work of understanding and explaining the conceptual model of the database is the real win. From that perspective, you can even do it in a text file (though doing it in Wiki form would allow several people to collaborate on adding to that documentation incrementally - so, every time someone figures out something, they can add it to the growing body of documentation instantly).

How do you document your database structure?

MySQL allows comments on tables and rows. PostgreSQL does as well. From other answers, Oracle and MSSQL have comments too.

For me, a combination of UML diagram for a quick refresher on field names, types, and constraints, and an external document (TeX, but could be any format) with extended description of everything database-related - special values, field comments, access notes, whatever - works best.

Database design for a document sharing system

Imagine the case when instead of a record per permission, having as many records as many permissions, you would have values like:

id1,id2,...,idn

Now, let's see a few problems:

Number of privileges

With the change you intend to accomplish, you would have to count the number of commas (and add 1) for each permissions, which would cause you a LOT of headache. Currently you can just count the number of records and you can group the results as you want.

Adding a new permission to a document

Suppose you have to add an id to the permissions list of a document. In thi case you will have to search for the permission record of the document. If not found, then you will need to insert a record and ensure that your id is put there. If found, then you will have to update the same record. Currently it's just an insert if the permission does not exist, often you already have the information and do not even need to query.

Removing a permission from a document

To achieve this with the suggested schema you will need to find the permission record of the document and check whether it contains at least a comma. If so, then you will need to split the values and construct a character sequence which does not contain the id you want to remove and update the record and to delete the record if there was no comma. Currently you just have to remove a record.

Finding the documents a user has permission to

With your suggested schema you will need to do some pretty slow string operations, like

like '%,<theid>,%'

and ensure that all the strings start and end with comma for this query, but without actually modifying the data, so you will need to concatenate the actual value with a comma prefix and a comma suffix, making your code underperformant and highly difficult to read and maintain. Currently you can easily query for such values.

1NF

In short, your suggestion would worsen your design. You will need to ask yourself whether you really have a problem with your current schema and if so, what it is. If you have serious problems with performance, you will need to make sure you correctly identify the problem with performance and if this is the problem of performancee, then improve the schema, possibly by adding indexes to some columns. Your suggestion would deviate from 1NF and it is a bad idea.

Database design guidelines

Personally I don't like either approach; as they create separate tables for each doc type when the data in the tables is nearly typed the same. I would be more in favor of one table in these cases.

I would have a documents table with a type in it which indicates document type and is a forein key to DocumentType table which lists all document types.

Depending on how document types are chosen for project and department, I would have a DepartmentDocuments table which list the documents which each department COULD use. I would also have a ProjectDepartmentDocuments table which indicates the mandated and optional documents used for a particular project, along with their information specific to the department/project.

I also question the relationship from users to projects. It appears a project can only have one user. If that's the case your design is fine. However if a project can have many users, you should have a projectUsers table to handle many-to-many relationships.

Database Design for Document Library

Some suggestions:

Add LastUpdatedTime/LastUpdatedIP/LastUpdatedName to all tables
Consider a FolderFolders table
Break the Files table into a Documents table and a Files table. As the system evolves there is a good change that you may add the ability to store content that are not files.
Do not treat folders as a sub-class of file. These are two separate concepts and combining them makes evolving the system difficult.
Be careful if you implement the Unix link concept. Most Windows users are used to folder security securing files as well. In your system, if a file can be stored in multiple folders, the file could be secured in one folder, but unsecured in another.

Database Design for Revisions?

Do not put it all in one table with an IsCurrent discriminator attribute. This just causes problems down the line, requires surrogate keys and all sorts of other problems.
Design 2 does have problems with schema changes. If you change the Employees table you have to change the EmployeeHistories table and all the related sprocs that go with it. Potentially doubles you schema change effort.
Design 1 works well and if done properly does not cost much in terms of a performance hit. You could use an xml schema and even indexes to get over possible performance problems. Your comment about parsing the xml is valid but you could easily create a view using xquery - which you can include in queries and join to. Something like this...

CREATE VIEW EmployeeHistory
AS
, FirstName, , DepartmentId

SELECT EmployeeId, RevisionXML.value('(/employee/FirstName)[1]', 'varchar(50)') AS FirstName,

  RevisionXML.value('(/employee/LastName)[1]', 'varchar(100)') AS LastName,

  RevisionXML.value('(/employee/DepartmentId)[1]', 'integer') AS DepartmentId,

FROM EmployeeHistories

Standards for a database design document

I grappled with this same problem a couple years ago. I resorted to using Joe Celko's excellent book SQL Programming Style to help me put together our company's database standards document.

http://www.amazon.com/Celkos-Programming-Kaufmann-Management-Systems/dp/0120887975/ref=sr_1_7?s=books&ie=UTF8&qid=1289941772&sr=1-7

designing database to hold different metadata information

This is called the Observation Pattern.

Sample Image

Three objects, for the example

Book
Title = 'Gone with the Wind' 
Author = 'Margaret Mitchell'
ISBN   = '978-1416548898'

Cat
Name = 'Phoebe'
Color = 'Gray'
TailLength = 9 'inch'

Beer Bottle
Volume = 500 'ml'
Color = 'Green'

This is how tables may look like:

Entity
EntityID    Name            Description
   1        'Book'            'To read'
   2        'Cat'             'Fury cat' 
   3        'Beer Bottle'     'To ship beer in'

PropertyType
PropertyTypeID   Name        IsTrait         Description
   1            'Height'     'NO'       'For anything that has height' 
   2            'Width'      'NO'       'For anything that has width' 
   3            'Volume'     'NO'       'For things that can have volume'
   4            'Title'      'YES'      'Some stuff has title' 
   5            'Author'     'YES'      'Things can be authored' 
   6            'Color'      'YES'      'Color of things' 
   7            'ISBN'       'YES'      'Books would need this'
   8            'TailLength' 'NO'       'For stuff that has long tails'
   9            'Name'       'YES'      'Name of things'

Property
PropertyID   EntityID  PropertyTypeID      
    1           1              4     -- book, title
    2           1              5     -- book, author
    3           1              7     -- book, isbn
    4           2              9     -- cat, name
    5           2              6     -- cat, color
    6           2              8     -- cat, tail length
    7           3              3     -- beer bottle, volume
    8           3              6     -- beer bottle, color

Measurement
PropertyID     Unit       Value 
    6          'inch'       9          -- cat, tail length
    7          'ml'        500         -- beer bottle, volume

Trait
PropertyID         Value 
    1         'Gone with the Wind'     -- book, title
    2         'Margaret Mitchell'      -- book, author
    3         '978-1416548898'         -- book, isbn
    4         'Phoebe'                 -- cat, name
    5         'Gray'                   -- cat, color
    8         'Green'                  -- beer bottle, color

EDIT:

Jefferey raised a valid point (see comment), so I'll expand the answer.

The model allows for dynamic (on-fly) creation of any number of entites
with any type of properties without schema changes. Hovewer, this flexibility has a price -- storing and searching is slower and more complex than in a usual table design.

Time for an example, but first, to make things easier, I'll flatten the model into a view.

create view vModel as 
select 
      e.EntityId
    , x.Name  as PropertyName
    , m.Value as MeasurementValue
    , m.Unit
    , t.Value as TraitValue
from Entity           as e
join Property         as p on p.EntityID       = p.EntityID
join PropertyType     as x on x.PropertyTypeId = p.PropertyTypeId
left join Measurement as m on m.PropertyId     = p.PropertyId
left join Trait       as t on t.PropertyId     = p.PropertyId
;

To use Jefferey's example from the comment

with 
q_00 as ( -- all books
    select EntityID
    from vModel
    where PropertyName = 'object type'
      and TraitValue   = 'book' 
),
q_01 as ( -- all US books
    select EntityID
    from vModel as a
    join q_00   as b on b.EntityID = a.EntityID
    where PropertyName = 'publisher country'
      and TraitValue   = 'US' 
),
q_02 as ( -- all US books published in 2008
    select EntityID
    from vModel as a
    join q_01   as b on b.EntityID = a.EntityID
    where PropertyName     = 'year published'
      and MeasurementValue = 2008 
),
q_03 as ( -- all US books published in 2008 not discontinued
    select EntityID
    from vModel as a
    join q_02   as b on b.EntityID = a.EntityID
    where PropertyName = 'is discontinued'
      and TraitValue   = 'no' 
),
q_04 as ( -- all US books published in 2008 not discontinued that cost less than $50
    select EntityID
    from vModel as a
    join q_03   as b on b.EntityID = a.EntityID
    where PropertyName     = 'price'
      and MeasurementValue < 50 
      and MeasurementUnit  = 'USD'
)
select
      EntityID
    , max(case PropertyName when 'title' than TraitValue else null end) as Title
    , max(case PropertyName when 'ISBN'  than TraitValue else null end) as ISBN
from vModel as a
join q_04   as b on b.EntityID = a.EntityID
group by EntityID ;

This looks complicated to write, but on a closer inspection you may notice a pattern in CTEs.

Now suppose we have a standard fixed schema design where each object property has its own column.
The query would look something like:

select EntityID, Title, ISBN
from vModel
WHERE ObjectType       = 'book'
  and PublisherCountry = 'US'
  and YearPublished    = 2008
  and IsDiscontinued   = 'no'
  and Price            < 50
  and Currency         = 'USD'
;

Database: what design to have a document table referencing others tables

I have some tables which can have 0..* documents.

I am assuming that car, house, fruit, author ... are some tables having documents.
I will suggest having a parent table called Documentary for these element. This documentary table will have zero-or-one relation to all of these tables. A record in documentary table will show one and only one record in one of other tables.
Documentary table will have a type field showing its related record type.

Having FK of Documentary inside document table will solve the problem.

(In OOP terminology Documentary will be an abstract entity)

Sample Image