Do You Put Your Database Static Data into Source Control? How?

Do you use source control for your database items?

A must-read: get your database under version control. Check out the series of posts by K. Scott Allen.

When it comes to version control, the database is often a second or even third-class citizen. From what I've seen, teams that would never think of writing code without version control in a million years (and rightly so) can somehow be completely oblivious to the need for version control around the critical databases their applications rely on. I don't know how you can call yourself a software engineer and maintain a straight face when your database isn't under exactly the same rigorous level of source control as the rest of your code. Don't let this happen to you. Get your database under version control.

How should you build your database from source control?

Here are some answers to your questions:

  1. Should both test and production environments be built from source control? YES
    • Should both be built using automation - or should production be built by copying objects from a stable, finalized test environment?
    • Use automation for both. Do NOT copy data between the environments.
    • How do you deal with potential differences between test and production environments in deployment scripts?
    • Use templates, so that you actually produce a different set of scripts for each environment (e.g. references to external systems, linked databases, etc.).
    • How do you test that the deployment scripts will work as effectively against production as they do in test?
    • Test them in a pre-production environment: test the deployment on an exact copy of the production environment (the database and, potentially, other systems).
  2. What types of objects should be version controlled?

    • Just code (procedures, packages, triggers, java, etc)?
    • Indexes?
    • Constraints?
    • Table Definitions?
    • Table Change Scripts? (eg. ALTER scripts)
    • Everything?
    • Everything, and:
      • Do not forget static data (lookup lists, etc.), so that you never need to copy ANY data between environments
      • Keep only the current version of the database scripts (version controlled, of course), and
      • Store ALTER scripts: one BIG script (or a directory of scripts named like 001_AlterXXX.sql, so that running them in natural sort order upgrades from version A to version B)
  3. Which types of objects shouldn't be version controlled?

    • Sequences?
    • Grants?
    • User Accounts?
    • See 2. If your users/roles (or technical user names) differ between environments, you can still script them using templates (see 1.)
  4. How should database objects be organized in your SCM repository?

    • How do you deal with one-time things like conversion scripts or ALTER scripts?
    • See 2.
    • How do you deal with retiring objects from the database?
    • Delete them from the DB and remove them from the source control trunk/tip.
    • Who should be responsible for promoting objects from development to test level?
    • Promotion should follow your dev/test/release schedule.
    • How do you coordinate changes from multiple developers?
    • Try NOT to create a separate database for each developer. You use source control, right? In that case, developers change the database and check in the scripts. To be completely safe, re-create the database from the scripts during the nightly build.
    • How do you deal with branching for database objects used by multiple systems?
    • A tough one: try to avoid it at all costs.
  5. What exceptions, if any, can reasonably be made to this process?

    • Security issues?
    • Do not store passwords for test/prod. You may allow it for dev, especially if you have automated daily/nightly DB rebuilds.
    • Data with de-identification concerns?
    • Scripts that can't be fully automated?
    • Document them and store them with the release info/ALTER script.
  6. How can you make the process resilient and enforceable?

    • To developer error?
    • Test with a daily build from scratch, and compare the results to the incremental upgrade (from version A to B using ALTER scripts). Compare both the resulting schema and the static data.
    • To unexpected environmental issues?
    • Use version control and backups.
    • Compare the PROD database schema to what you think it is, especially before deployment. A SuperDuperCool DBA may have fixed a bug that was never in your ticket system :)
    • For disaster recovery?
  7. How do you convince decision makers that the benefits of DB-SCM truly justify the cost?

    • Anecdotal evidence?
    • Industry research?
    • Industry best-practice recommendations?
    • Appeals to recognized authorities?
    • Cost/Benefit analysis?
    • If developers and DBAs agree, you do not need to convince anyone, I think (unless you need money to buy software like DBGhost for MSSQL).
  8. Who should "own" database objects in this model?

    • Developers?
    • DBAs?
    • Data Analysts?
    • More than one?
    • Usually DBAs approve the model (before check-in, or afterwards as part of code review). They definitely own performance-related objects. But in general the team owns it [and the employer, of course :)]
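The numbered ALTER-script convention from item 2 can be made safe to re-run by tracking applied scripts in a version table. A minimal T-SQL sketch, assuming a hypothetical dbo.SchemaVersion table (the table and script names are illustrative, not part of the original advice):

-- Hypothetical version-tracking table; created once per database.
IF OBJECT_ID('dbo.SchemaVersion') IS NULL
    CREATE TABLE dbo.SchemaVersion (
        ScriptName varchar(255) NOT NULL PRIMARY KEY,
        AppliedAt  datetime     NOT NULL DEFAULT GETDATE()
    );
GO

-- Guard at the top of each numbered script (e.g. 001_AlterXXX.sql):
IF NOT EXISTS (SELECT 1 FROM dbo.SchemaVersion WHERE ScriptName = '001_AlterXXX')
BEGIN
    -- ... the actual ALTER statements go here ...
    INSERT INTO dbo.SchemaVersion (ScriptName) VALUES ('001_AlterXXX');
END

With a guard like this in each upgrade script, running the whole directory in natural sort order is idempotent: already-applied scripts are simply skipped.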

How do you store static data in your SQL Server Database Project in VS 2012

You can use this approach:

  • Put your reference data into XML files, one per table
  • Add XML files with reference data to your database project
  • Use a Post-Deployment script to extract the data from XML and merge it into your tables

Here is a more detailed description of each step, illustrated with an example. Let's say that you need to initialize a table of countries that has this structure:

create table Country (
    CountryId uniqueidentifier NOT NULL,
    CountryCode varchar(2) NOT NULL,
    CountryName varchar(254) NOT NULL
)

Create a new folder called ReferenceData under your database project. It should be a sibling of the Schema Objects and Scripts folders.

Add a new XML file called Country.xml to the ReferenceData folder. Populate the file as follows:

<countries>
  <country CountryCode="CA" CountryName="Canada"/>
  <country CountryCode="MX" CountryName="Mexico"/>
  <country CountryCode="US" CountryName="United States of America"/>
</countries>

Find Script.PostDeployment.sql, and add the following code to it:

-- Load the reference data from the XML file (the :r directive inlines it).
DECLARE @h_Country int

DECLARE @xmlCountry xml = N'
:r ..\..\ReferenceData\Country.xml
'

EXEC sp_xml_preparedocument @h_Country OUTPUT, @xmlCountry

-- Merge the XML rows into the Country table: update existing rows,
-- insert missing ones.
MERGE Country AS target USING (
    SELECT c.CountryCode, c.CountryName
    FROM OPENXML(@h_Country, '/countries/country', 1)
    WITH (CountryCode varchar(2), CountryName varchar(254)) AS c) AS source (CountryCode, CountryName)
ON (source.CountryCode = target.CountryCode)
WHEN MATCHED THEN
    UPDATE SET CountryName = source.CountryName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CountryId, CountryCode, CountryName) VALUES (NEWID(), source.CountryCode, source.CountryName)
;

-- Release the memory held by the parsed XML document.
EXEC sp_xml_removedocument @h_Country

I tried this solution only in VS 2008, but it should be agnostic to your development environment.

RedGate SQL Source Control Data

I tried the solution involving Migration Scripts, but the order in which they were executed was still causing problems.

The solution I opted for in the end was to have a separate database linked to source control, with all tables that contain system data or mixed data having their records source-controlled.

I then develop in another database and push schema and data changes to the source controlled database to commit them. That way, the source controlled data tables never contain 'user' data.

I found the DLM Automation cmdlets lacking in features and instead opted for running SQL Compare and SQL Data Compare from the command line to perform our CI. This hasn't been without the occasional hiccup, but these are solved using the aforementioned tools and by manually pushing from SC into our CI database.

SQL Server Data Tools - How do I preserve data along with schema?

There are two ways to preserve static data and publish it with a database.

  1. Have a "reference" database with the static data populated. At the time of publishing a new instance, SQL Server Data Tools has a "Data Compare" tool which allows you to compare two live databases and creates a custom script to update one database with data from the other.

  2. Create scripts that contain insert statements, and then run these scripts at publish time. SQL Server Data Tools has two features to assist in this.

    a. Open the data table (right-click it in SQL Server Object Explorer and select "View Data"), then click the "Script" button at the top. It will create an insert script for all rows in the table. More on Comparing Data from MSDN

    b. Then take this generated script and add it to the database project as a "Post Deployment" script. When you create a publish script for the database, any post-deployment scripts in the project are automatically included in the master script. More on post deployment scripts from MSDN
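A post-deployment script runs on every publish, so it helps to make each generated statement re-runnable. A hedged sketch, reusing the Country table from the earlier example (the per-row guard is one option; a MERGE works too):

-- Post-deployment scripts run on every publish, so guard each insert
-- to avoid duplicate rows on redeployment.
IF NOT EXISTS (SELECT 1 FROM Country WHERE CountryCode = 'CA')
    INSERT INTO Country (CountryId, CountryCode, CountryName)
    VALUES (NEWID(), 'CA', 'Canada');

IF NOT EXISTS (SELECT 1 FROM Country WHERE CountryCode = 'MX')
    INSERT INTO Country (CountryId, CountryCode, CountryName)
    VALUES (NEWID(), 'MX', 'Mexico');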

How to version control SQL Server databases?

We've just started doing the following on some of our projects, and it seems to work quite well, for populating "static" tables.

Our scripts follow a pattern where a temp table is constructed, and is then populated with what we want the real table to resemble. We only put human readable values here (i.e. we don't include IDENTITY/GUID columns). The remainder of the script takes the temp table and performs appropriate INSERT/UPDATE/DELETE statements to make the real table resemble the temp table. When we have to change this "static" data, all we have to update is the population of the temp table. This means that DIFFing between versions works as expected, and rollback scripts are as simple as getting a previous version from source control.

The INSERT/UPDATE/DELETEs only have to be written once. In fact, our scripts are slightly more complicated and run two sets of validation before the actual DML statements. One set validates the temp table data (i.e. that we're not going to violate any constraints by attempting to make the database resemble the temp table). The other validates the temp table against the target database (i.e. that foreign keys are available).
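A minimal sketch of the pattern described above, reusing the Country table from the earlier example. The original approach uses separate INSERT/UPDATE/DELETE statements; a single MERGE achieves the same effect of making the real table resemble the temp table (this is an illustration, not the authors' actual script):

-- 1. Build a temp table holding the desired, human-readable contents
--    (no IDENTITY/GUID columns here).
CREATE TABLE #CountryDesired (
    CountryCode varchar(2)   NOT NULL PRIMARY KEY,
    CountryName varchar(254) NOT NULL
);

INSERT INTO #CountryDesired
VALUES ('CA', 'Canada'), ('MX', 'Mexico'), ('US', 'United States of America');

-- 2. Make the real table resemble the temp table.
MERGE Country AS target
USING #CountryDesired AS source
   ON target.CountryCode = source.CountryCode
WHEN MATCHED AND target.CountryName <> source.CountryName THEN
    UPDATE SET CountryName = source.CountryName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CountryId, CountryCode, CountryName)
    VALUES (NEWID(), source.CountryCode, source.CountryName)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;

DROP TABLE #CountryDesired;

Changing the static data then only means editing the population of the temp table, so diffs and rollbacks in source control stay trivial.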

How do you manage static data for microservices?

Making a standard for how to do this somewhat goes against the point of microservices, i.e. that you can adapt each microservice to the context it exists in.

That being said, Postgres, Mongo and MySQL all run scripts from /docker-entrypoint-initdb.d when initializing a fresh database instance. The scripts obviously have to fit the database, but it's a fairly standardized way of doing it.

They all describe how to do this on their image pages on Docker Hub.

You can either get your scripts into the container by building a custom image that contains them, or map them into the directory using a docker-compose volume mapping.
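For example, a docker-compose fragment along these lines mounts a local scripts directory into the init location (the service name, image tag and host path are illustrative):

services:
  db:
    image: postgres:16
    volumes:
      # Any *.sql / *.sh files in ./db/init run once, on the first
      # start of a fresh data volume.
      - ./db/init:/docker-entrypoint-initdb.d:ro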

There are some databases that don't have an easy way to initialize a new database. MSSQL comes to mind. In that case, you might have to handle it programmatically.

How do you track database changes in source control?

Use Visual Studio Database Edition to script out your database. It works like a charm, and you can use any source control system, of course best if it has VS plugins. This tool also has a number of other useful features. Check them out in this great blog post:

http://www.vitalygorn.com/blog/post/2008/01/Handling-Database-easily-with-Visual-Studio-2008.aspx

or check out MSDN for the official documentation


