How to Write a Query to Extract Individual Changes from Snapshots of Data

How can I write a query to extract individual changes from snapshots of data?

Here's a working sample that uses UNPIVOT. It's based on my answer to my question Better way to Partially UNPIVOT in Pairs in SQL

This has some nice features.

Adding additional fields is easy. Just add values to the SELECT and UNPIVOT clause. You don't have to add additional UNION clauses
The where clause WHERE curr.value <> prev.value never changes regardless of how many fields are added.
The performance is surprisingly fast.
Its portable to Current versions of Oracle if you need that

SQL

Declare @Snapshots as table(
Sequence int,
DateTaken      datetime,
[id] int,
field1 varchar(20),
field2 int)

INSERT INTO @Snapshots VALUES 

      (1,    '2011-01-01',      1,     'Red',          2),
      (2,    '2011-01-01',      2,     'Blue',        10),
      (3,    '2011-02-01',      1,     'Green',        2),
      (4,    '2011-03-01',      1,     'Green' ,       3),
      (5,    '2011-03-01',      2,     'Purple',       2),
      (6,    '2011-04-01',      1,     'Yellow',       2)

;WITH Snapshots (Sequence, DateTaken, ID, Field1, Field2, _Index)
AS
(
    SELECT Sequence, DateTaken, ID, Field1, Field2, ROW_NUMBER() OVER (ORDER BY ID, Sequence) _Index
    FROM @Snapshots
)
,  data as(
SELECT
     c._Index
    , c.DateTaken
    ,  c.ID
    , cast(c.Field1  as varchar(max)) Field1
    , cast(p.Field1  as varchar(max))Field1_Previous
    , cast(c.Field2   as varchar(max))Field2
    , cast(p.Field2  as varchar(max)) Field2_Previous 

FROM Snapshots c
JOIN Snapshots p ON p.ID = c.ID AND (p._Index + 1) = c._Index
)

, fieldsToRows 
     AS (SELECT DateTaken, 
                id,
                _Index,
                value,
                field

         FROM   data p UNPIVOT (value FOR field IN (field1, field1_previous, 
                                                        field2, field2_previous) ) 
                AS unpvt
        ) 
SELECT 
    curr.DateTaken,
    curr.ID,
    curr.field,
    prev.value previous,
    curr.value 'current'

FROM 
        fieldsToRows curr 
        INNER  JOIN  fieldsToRows prev
        ON curr.ID = prev.id
            AND curr._Index = prev._Index 
            AND curr.field + '_Previous' = prev.field
WHERE 
    curr.value <> prev.value

Output

DateTaken               ID          field     previous current
----------------------- ----------- --------- -------- -------
2011-02-01 00:00:00.000 1           Field1    Red      Green
2011-03-01 00:00:00.000 1           Field2    2        3
2011-04-01 00:00:00.000 1           Field1    Green    Yellow
2011-04-01 00:00:00.000 1           Field2    3        2
2011-03-01 00:00:00.000 2           Field1    Blue     Purple
2011-03-01 00:00:00.000 2           Field2    10       2

How to recreate old snapshot using field history table in Bigquery

I actually identified a way to rollback (virtually, not on actual table) all the updates made after given specific date. Following are the steps followed:

Create dummy tables:

WITH
Table_A AS
(
SELECT 1 As ID, '2020-6-28' as created_date, 10 as qty, 100 as value
Union ALL
SELECT 2 As ID, '2020-5-29' as created_date, 20 as qty, 200 as value),

Table_A_field_history AS
(
SELECT 'xyz' id,'2020-07-29'    created_date,'12345'    created_by,'qty'    field,'10'  new_value,'200' old_value,'1'   A_id
UNION ALL
SELECT 'abc' id,'2020-07-24'    created_date,'12345'    created_by,'qty'    field,'20'  new_value,'10' old_value,'2'    A_id
UNION ALL
SELECT 'xyz' id,'2020-07-29'    created_date,'12345'    created_by,'value'  field,'100' new_value,'2000' old_value,'1'  A_id
UNION ALL
SELECT 'abc' id,'2020-07-24'    created_date,'12345'    created_by,'value'  field,'200' new_value,'5000' old_value,'2'  A_id
UNION ALL

SELECT 'xyz' id,'2020-06-29'    created_date,'12345'    created_by,'qty'    field,'200' new_value,'' old_value,'1'  A_id
UNION ALL
SELECT 'abc' id,'2020-05-30'    created_date,'12345'    created_by,'qty'    field,'10'  new_value,'' old_value,'2'  A_id
UNION ALL
SELECT 'xyz' id,'2020-06-29'    created_date,'12345'    created_by,'value'  field,'2000'    new_value,'' old_value,'1'  A_id
UNION ALL
SELECT 'abc' id,'2020-05-30'    created_date,'12345'    created_by,'value'  field,'5000'    new_value,'' old_value,'2'  A_id

),

Step 1. Create date cte to filter data based on given date:

`date_spine 
AS

(
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 Day)) AS as_of_date 
),`

Step 2. Above created date cte can be used as a Spine for our query, cross join to map as_of_date with all the changes made in the history table.

    date_changes
AS
(
SELECT DISTINCT
date.as_of_date,
hist.A_id 
FROM Table_A_field_history hist CROSS JOIN date_spine date
),

Step 3. Now we have as_of_date mapped to all historical transactions, now we can get max of change date.

most_recent_changes AS (
  SELECT
    dc.as_of_date,
    dc.A_id ,
    MAX(fh.created_date) AS created_date,
  FROM date_changes dc
  LEFT JOIN Table_A_field_history AS fh
    ON dc.A_id   = fh.A_id  
  WHERE CAST(fh.created_date AS DATE) <= dc.as_of_date
  GROUP BY  dc.as_of_date,
    dc.A_id 
),

Step 4. Now mapping max change date with actual created_date and history table

past_changes AS (
  SELECT
    mr.as_of_date,
    mr.A_id,
    mr.created_date,
    a.id AS entry_id,
    a.created_by AS created_by_id,
    CASE WHEN a.field='qty' THEN a.new_value ELSE '' END AS qty,
    CASE WHEN a.field='value' THEN a.new_value ELSE '' END AS value,
  FROM most_recent_changes AS mr
  LEFT JOIN Table_A_field_history AS a
    ON mr.A_id  = a.A_id 
    AND mr.created_date = a.created_date
  WHERE a.id IS NOT NULL
)

Step 5. Now we can use as_of_date to get historical state of Table A.

Select *
From past_changes x
WHERE x.as_of_date = '2020-07-29'

Do we extract snapshot of data or updated data if data is being updated in the database table?

The data source scans over the table, you would get the record version at time of read.

The time of read could be anytime in that 5 hour window. You could reduce the impact by adding an order clause to your source component so that the latest data gets read and transferred last. This would ensure the data you expect to get the most updates during those 5 hours to catch the most upserts

If the upsert updates a record that has already been copied over in that 5 hour window then you will miss the update

Ultimately you would need to look at a incremental load design if you need to catch every single upsert

How do I create a moving snapshot of pass holder information in my database

I ended up creating a second table to hold the snapshot information in conjunction with the original table. The latter was modified to add a RecordID column to act as a primary key and a foreign key in the former. My additional (former) table looks like:

SubScriptionSnapshotData
    SubscriptionRecordID int
    Year smallint
    YearElement smallint
    ElementType varchar(15)
    Status varchar(15)

So each record in this table, points to a record from the original table and notes the moving snapshot information for it. Essentially I can record the year and month of the snapshot and the status of the subscription in the snapshot.

By using (YearElement, ElementType) instead of simply Month, I can create snapshots by month or week or whatever. For example, April is represented as (4, 'Month'). A snapfrom from week 32 could be represented as (32, 'Week'). Status is simply 'New', 'Expired' or 'Current'.

How to Write a Query to Extract Individual Changes from Snapshots of Data

How can I write a query to extract individual changes from snapshots of data?

How to recreate old snapshot using field history table in Bigquery

Do we extract snapshot of data or updated data if data is being updated in the database table?

How do I create a moving snapshot of pass holder information in my database

Related Topics

Leave a reply