Is a one column table good design?
Yes, designing a table to be as efficient as possible is certainly good design. "Bad RDBMS design" usually comes down to inefficiency.
However, I have found that most cases of single column design could benefit from an additional column. For example, State Codes can typically have the Full State name spelled out in a second column. Or a blacklist can have notes associated. But, if your design really does not need that information, then it's perfectly ok to have the single column.
Is having a single column table in SQL Server considered a bad practice?
Almost every table that I create has the following columns:
- Primary key (generally a number, named after the table with Id after it)
- CreatedAt
- CreatedBy
- CreatedOn (the server where the row was created)
One use for a single column table is to effectively implement a check constraint where the code can dynamically validate values. I would typically implement this using a reference table with proper foreign key relationships and the above columns.
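As a rough sketch of the reference-table idea, here is a single-column table acting as a check constraint that can be changed at runtime. It uses Python's sqlite3 module purely for illustration (the question is about SQL Server), and the table names allowed_status and tickets are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical single-column reference table of allowed values.
conn.execute("CREATE TABLE allowed_status (status TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO allowed_status VALUES (?)", [("open",), ("closed",)])

# Hypothetical table whose status column is validated through the foreign key.
conn.execute(
    "CREATE TABLE tickets ("
    " ticket_id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL REFERENCES allowed_status (status))"
)
conn.commit()

def try_insert(status):
    """Return True if the row is accepted, False if the foreign key rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO tickets (status) VALUES (?)", (status,))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("open")
rejected = try_insert("pending")

# Adding a row to the reference table changes the rule without altering the schema.
with conn:
    conn.execute("INSERT INTO allowed_status VALUES ('pending')")
accepted_after = try_insert("pending")
```

Unlike a hard-coded CHECK constraint, the set of valid values here is just data, so it can be edited without a schema change.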
Another instance would be a number table, which just stores integer values.
In general, I would say that it isn't a good idea. There may be specific cases such as a number table where it is fine.
What's the better database design: more tables or more columns?
I have a few fairly simple rules of thumb I follow when designing databases, which I think can be used to help make decisions like this....
- Favor normalization. Denormalization is a form of optimization, with all the requisite tradeoffs, and as such it should be approached with a YAGNI attitude.
- Make sure that client code referencing the database is decoupled enough from the schema that reworking it doesn't necessitate a major redesign of the client(s).
- Don't be afraid to denormalize when it provides a clear benefit to performance or query complexity.
- Use views or downstream tables to implement denormalization rather than denormalizing the core of the schema, when data volume and usage scenarios allow for it.
The usual result of these rules is that the initial design will favor tables over columns, with a focus on eliminating redundancy. As the project progresses and denormalization points are identified, the overall structure will evolve toward a balance that compromises with limited redundancy and column proliferation in exchange for other valuable benefits.
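To illustrate the last rule, here is a minimal sketch of denormalizing through a view while the core tables stay normalized. It uses Python's sqlite3 module, and the customers, orders, and order_report names are assumptions made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized core tables (hypothetical names).
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);

    -- Denormalized read model exposed as a view; the core schema is unchanged.
    CREATE VIEW order_report AS
    SELECT o.order_id, c.name AS customer_name, o.total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# Clients read the flat shape without knowing about the join behind it.
rows = conn.execute(
    "SELECT order_id, customer_name, total FROM order_report ORDER BY order_id"
).fetchall()
```

If the view later proves too slow, it could be swapped for a downstream table with the same columns, and client queries against order_report would not need to change.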
MySQL: multiple tables or one table with many columns?
Any time information is one-to-one (each user has one name and password), then it's probably better to have it in one table, since it reduces the number of joins the database will need to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split it later if you need to.
If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).
You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:
Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.
When I should use one to one relationship?
1 to 0..1
The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.
A "1 to 0..1" can be represented in a single table with the "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting off the "0..1" portion into a separate table might provide some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so the "cut-off point" where this strategy becomes viable can vary considerably.
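A small sketch of the split-off variant, assuming hypothetical persons and employees tables and using Python's sqlite3 module only for demonstration. Making the foreign key column the primary key of the optional table is what limits it to at most one row per parent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical supertype table: every row exists here.
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Hypothetical optional part: at most one row per person, because the
    -- foreign key column is also the primary key.
    CREATE TABLE employees (
        person_id INTEGER PRIMARY KEY REFERENCES persons (person_id),
        salary INTEGER NOT NULL
    );
    INSERT INTO persons VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO employees VALUES (2, 5000);
""")

# A LEFT JOIN reassembles the single-table shape; missing parts come back as NULL.
rows = conn.execute("""
    SELECT p.name, e.salary
    FROM persons AS p
    LEFT JOIN employees AS e ON e.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
```

Only one of the three persons has a row in employees, so no storage is spent on the "0" cases.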
1 to 1
The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion of the rows gets cached, so more of them can fit into the cache, making the cache effectively "larger".
Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only whole rows. By splitting the row, you are allowing a lock to take place on only one of its halves.
Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.
Separate tables may allow more granular security.
These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.
See also: Why use a 1-to-1 relationship in database design?
Database Design: A proper table design for large number of column values
Option 2, with ID, TrialID, StatisticID, StatisticValue
With proper indexing, it will perform fairly well (you can use PIVOT to get the values out on columns fairly easily in SQL Server 2005).
When the statistics are different datatypes, the problem becomes more interesting, but in many cases, I just up-size the datatype (sometimes ints just end up in the money field). For other non-compatible types, the best design in my mind is really separate tables for each type, but I've also seen multiple columns or a free-form text column.
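Here is a rough sketch of Option 2 and of getting the values back out as columns. SQLite (used here through Python's sqlite3 module) has no PIVOT operator, so conditional aggregation stands in for it. The table name TrialStatistics and the statistic IDs 1 and 2 are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TrialStatistics (
        ID INTEGER PRIMARY KEY,
        TrialID INTEGER NOT NULL,
        StatisticID INTEGER NOT NULL,
        StatisticValue REAL NOT NULL,
        UNIQUE (TrialID, StatisticID)  -- also serves as the lookup index
    );
    INSERT INTO TrialStatistics (TrialID, StatisticID, StatisticValue) VALUES
        (1, 1, 0.5), (1, 2, 12.0),
        (2, 1, 0.7), (2, 2, 9.0);
""")

# One output column per StatisticID; 1 and 2 are made-up IDs for illustration.
rows = conn.execute("""
    SELECT TrialID,
           MAX(CASE WHEN StatisticID = 1 THEN StatisticValue END) AS stat_1,
           MAX(CASE WHEN StatisticID = 2 THEN StatisticValue END) AS stat_2
    FROM TrialStatistics
    GROUP BY TrialID
    ORDER BY TrialID
""").fetchall()
```

Each trial comes back as a single row with one column per statistic, which is the same shape PIVOT would produce in SQL Server.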
DB design for one-to-one single column table
Your option 1 is the best design choice. Create the two tables along these lines:
- jobs (job_id PK, title_id FK not null, start_date, end_date, ...)
- job_titles (title_id PK, title)
The PKs should have clustered indexes; jobs.title_id and job_titles.title should have nonclustered or secondary indexes; job_titles.title should have a unique constraint.
This relationship can be modeled as 1-to-1 or 1-to-many (one title, many jobs). To enforce 1-to-1 modeling, apply a unique constraint to jobs.title_id. However, you should not model this as a 1-to-1 relationship, because it's not. You even say so yourself: "The same job title will be used by multiple jobs in the DB" and "A single job only ever has one title." An entry in the jobs table represents a certain position held by a certain user during a certain period of time. Because this is a 1-to-many relationship, a separate table is the correct way to model the data.
Here's a simple example of why this is so. Your company only has one CEO, but what happens if the current one steps down and the board appoints a new one? You'll have two entries in jobs which both reference the same title, even though there's only one CEO "position" and the two users' job date ranges don't overlap. If you enforce a 1-to-1 relationship, modeling this data is impossible.
Why these particular indexes and constraints?
- The ID columns are PKs and clustered indexes for hopefully obvious reasons; you use these for joins
- jobs.title_id is an FK for hopefully obvious data integrity reasons
- jobs.title_id is not null because every job should have a title
- jobs.title_id needs an index in order to speed up joins
- job_titles.title has an index because you've indicated you'll be querying based on this column (though I wouldn't query in such a fashion, especially since you've said there will be many titles; see below)
- job_titles.title has a unique constraint because there's no reason to have duplicates of the same title. You can (and will) have multiple jobs with the same title, but you don't need two entries for "CEO" in job_titles. Enforcing this uniqueness will preserve data integrity useful for reporting purposes (e.g. plot the productivity of IT's web division based on how many "web developer" jobs are filled)
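The two tables and their constraints might look roughly like this. The sketch uses Python's sqlite3 module, which does not reproduce the clustered/nonclustered distinction, and the index name idx_jobs_title_id and the sample dates are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE job_titles (
        title_id INTEGER PRIMARY KEY,
        title TEXT NOT NULL UNIQUE          -- UNIQUE also creates an index
    );
    CREATE TABLE jobs (
        job_id INTEGER PRIMARY KEY,
        title_id INTEGER NOT NULL REFERENCES job_titles (title_id),
        start_date TEXT,
        end_date TEXT
    );
    CREATE INDEX idx_jobs_title_id ON jobs (title_id);  -- speeds up joins

    INSERT INTO job_titles (title_id, title) VALUES (1, 'CEO');
    -- Two successive CEOs: two jobs that reference the same title.
    INSERT INTO jobs (title_id, start_date, end_date) VALUES
        (1, '2010-01-01', '2015-06-30'),
        (1, '2015-07-01', NULL);
""")

ceo_jobs = conn.execute(
    "SELECT COUNT(*) FROM jobs AS j"
    " JOIN job_titles AS t ON t.title_id = j.title_id"
    " WHERE t.title = 'CEO'"
).fetchone()[0]

# A second 'CEO' row in job_titles violates the unique constraint.
try:
    conn.execute("INSERT INTO job_titles (title) VALUES ('CEO')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Both CEO jobs coexist because jobs.title_id carries no unique constraint, while the duplicate title is refused.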
Remarks:
Job title is going to be used as part of an auto-complete field so I'll be using a query to fetch results.
As I mentioned before, use key-value pairs here. Fetch a list of them into memory in your app, and query that list for your autocomplete values. Then send the ID off to the DB for your actual SQL query. The queries will perform better that way; even with indexes, searching integers is generally quicker than searching strings.
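A minimal sketch of that flow, assuming a small job_titles table and a hypothetical autocomplete helper (Python's sqlite3 module is used only to keep the example self-contained):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_titles (title_id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
    CREATE TABLE jobs (job_id INTEGER PRIMARY KEY, title_id INTEGER NOT NULL);
    INSERT INTO job_titles VALUES (1, 'Web Developer'), (2, 'Web Designer'), (3, 'CEO');
    INSERT INTO jobs (title_id) VALUES (1), (1), (3);
""")

# Load the key-value pairs once and keep them in application memory.
titles = dict(conn.execute("SELECT title_id, title FROM job_titles"))

def autocomplete(prefix):
    """Return (title_id, title) pairs whose title starts with prefix, ignoring case."""
    needle = prefix.casefold()
    return sorted(
        (title_id, title)
        for title_id, title in titles.items()
        if title.casefold().startswith(needle)
    )

suggestions = autocomplete("web")

# Only the integer ID of the chosen suggestion goes to the database.
chosen_id = suggestions[0][0]
job_count = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE title_id = ?", (chosen_id,)
).fetchone()[0]
```

The string matching happens entirely in the app; the database only ever sees an integer key.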
You've said that titles will be user created. Put some input sanitization and validation process in place, because you don't want redundant entries like "WEB DEVELOPER", "web developer", etc. Validation should occur at both the application and DB levels; the unique constraint is part (but not all) of this. Prodigitalson's remark about separate machine and display columns is related to this issue.
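One possible way to combine application-level normalization with the DB-level unique constraint is sketched below. The title_key machine column and the normalize and add_title helpers are hypothetical, and the normalization rule (collapse whitespace, fold case) is only one reasonable choice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_titles ("
    " title_id INTEGER PRIMARY KEY,"
    " title_key TEXT NOT NULL UNIQUE,"   # hypothetical machine column
    " title TEXT NOT NULL)"              # display column, whitespace-trimmed
)

def normalize(title):
    """Collapse runs of whitespace and fold case to build the comparison key."""
    return " ".join(title.split()).casefold()

def add_title(title):
    """Insert the title unless an equivalent one already exists; return its ID."""
    key = normalize(title)
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO job_titles (title_key, title) VALUES (?, ?)",
            (key, " ".join(title.split())),
        )
    return conn.execute(
        "SELECT title_id FROM job_titles WHERE title_key = ?", (key,)
    ).fetchone()[0]

ids = {add_title(t) for t in ["WEB DEVELOPER", "web developer", " web  developer "]}
row_count = conn.execute("SELECT COUNT(*) FROM job_titles").fetchone()[0]
```

All three spellings resolve to the same row, and the unique constraint on title_key still guards against any writer that bypasses the application code.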