How to Write a Query to Extract Individual Changes from Snapshots of Data

How can I write a query to extract individual changes from snapshots of data?

Here's a working sample that uses UNPIVOT. It's based on my answer to my question Better way to Partially UNPIVOT in Pairs in SQL

This has some nice features.

  1. Adding additional fields is easy. Just add values to the SELECT and UNPIVOT clause. You don't have to add additional UNION clauses

  2. The where clause WHERE curr.value <> prev.value never changes regardless of how many fields are added.

  3. The performance is surprisingly fast.

  4. Its portable to Current versions of Oracle if you need that


SQL

Declare @Snapshots as table(
Sequence int,
DateTaken datetime,
[id] int,
field1 varchar(20),
field2 int)

INSERT INTO @Snapshots VALUES

(1, '2011-01-01', 1, 'Red', 2),
(2, '2011-01-01', 2, 'Blue', 10),
(3, '2011-02-01', 1, 'Green', 2),
(4, '2011-03-01', 1, 'Green' , 3),
(5, '2011-03-01', 2, 'Purple', 2),
(6, '2011-04-01', 1, 'Yellow', 2)

;WITH Snapshots (Sequence, DateTaken, ID, Field1, Field2, _Index)
AS
(
SELECT Sequence, DateTaken, ID, Field1, Field2, ROW_NUMBER() OVER (ORDER BY ID, Sequence) _Index
FROM @Snapshots
)
, data as(
SELECT
c._Index
, c.DateTaken
, c.ID
, cast(c.Field1 as varchar(max)) Field1
, cast(p.Field1 as varchar(max))Field1_Previous
, cast(c.Field2 as varchar(max))Field2
, cast(p.Field2 as varchar(max)) Field2_Previous

FROM Snapshots c
JOIN Snapshots p ON p.ID = c.ID AND (p._Index + 1) = c._Index
)

, fieldsToRows
AS (SELECT DateTaken,
id,
_Index,
value,
field

FROM data p UNPIVOT (value FOR field IN (field1, field1_previous,
field2, field2_previous) )
AS unpvt
)
SELECT
curr.DateTaken,
curr.ID,
curr.field,
prev.value previous,
curr.value 'current'

FROM
fieldsToRows curr
INNER JOIN fieldsToRows prev
ON curr.ID = prev.id
AND curr._Index = prev._Index
AND curr.field + '_Previous' = prev.field
WHERE
curr.value <> prev.value

Output

DateTaken               ID          field     previous current
----------------------- ----------- --------- -------- -------
2011-02-01 00:00:00.000 1 Field1 Red Green
2011-03-01 00:00:00.000 1 Field2 2 3
2011-04-01 00:00:00.000 1 Field1 Green Yellow
2011-04-01 00:00:00.000 1 Field2 3 2
2011-03-01 00:00:00.000 2 Field1 Blue Purple
2011-03-01 00:00:00.000 2 Field2 10 2

How to recreate old snapshot using field history table in Bigquery

I actually identified a way to rollback (virtually, not on actual table) all the updates made after given specific date. Following are the steps followed:

Create dummy tables:

WITH
Table_A AS
(
SELECT 1 As ID, '2020-6-28' as created_date, 10 as qty, 100 as value
Union ALL
SELECT 2 As ID, '2020-5-29' as created_date, 20 as qty, 200 as value),

Table_A_field_history AS
(
SELECT 'xyz' id,'2020-07-29' created_date,'12345' created_by,'qty' field,'10' new_value,'200' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-07-24' created_date,'12345' created_by,'qty' field,'20' new_value,'10' old_value,'2' A_id
UNION ALL
SELECT 'xyz' id,'2020-07-29' created_date,'12345' created_by,'value' field,'100' new_value,'2000' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-07-24' created_date,'12345' created_by,'value' field,'200' new_value,'5000' old_value,'2' A_id
UNION ALL

SELECT 'xyz' id,'2020-06-29' created_date,'12345' created_by,'qty' field,'200' new_value,'' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-05-30' created_date,'12345' created_by,'qty' field,'10' new_value,'' old_value,'2' A_id
UNION ALL
SELECT 'xyz' id,'2020-06-29' created_date,'12345' created_by,'value' field,'2000' new_value,'' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-05-30' created_date,'12345' created_by,'value' field,'5000' new_value,'' old_value,'2' A_id

),

Step 1. Create date cte to filter data based on given date:

`date_spine 
AS

(
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 Day)) AS as_of_date
),`

Step 2. Above created date cte can be used as a Spine for our query, cross join to map as_of_date with all the changes made in the history table.

    date_changes
AS
(
SELECT DISTINCT
date.as_of_date,
hist.A_id
FROM Table_A_field_history hist CROSS JOIN date_spine date
),

Step 3. Now we have as_of_date mapped to all historical transactions, now we can get max of change date.

most_recent_changes AS (
SELECT
dc.as_of_date,
dc.A_id ,
MAX(fh.created_date) AS created_date,
FROM date_changes dc
LEFT JOIN Table_A_field_history AS fh
ON dc.A_id = fh.A_id
WHERE CAST(fh.created_date AS DATE) <= dc.as_of_date
GROUP BY dc.as_of_date,
dc.A_id
),

Step 4. Now mapping max change date with actual created_date and history table

past_changes AS (
SELECT
mr.as_of_date,
mr.A_id,
mr.created_date,
a.id AS entry_id,
a.created_by AS created_by_id,
CASE WHEN a.field='qty' THEN a.new_value ELSE '' END AS qty,
CASE WHEN a.field='value' THEN a.new_value ELSE '' END AS value,
FROM most_recent_changes AS mr
LEFT JOIN Table_A_field_history AS a
ON mr.A_id = a.A_id
AND mr.created_date = a.created_date
WHERE a.id IS NOT NULL
)

Step 5. Now we can use as_of_date to get historical state of Table A.

Select *
From past_changes x
WHERE x.as_of_date = '2020-07-29'

Do we extract snapshot of data or updated data if data is being updated in the database table?

The data source scans over the table, you would get the record version at time of read.

The time of read could be anytime in that 5 hour window. You could reduce the impact by adding an order clause to your source component so that the latest data gets read and transferred last. This would ensure the data you expect to get the most updates during those 5 hours to catch the most upserts

If the upsert updates a record that has already been copied over in that 5 hour window then you will miss the update

Ultimately you would need to look at a incremental load design if you need to catch every single upsert

How do I create a moving snapshot of pass holder information in my database

I ended up creating a second table to hold the snapshot information in conjunction with the original table. The latter was modified to add a RecordID column to act as a primary key and a foreign key in the former. My additional (former) table looks like:

SubScriptionSnapshotData
SubscriptionRecordID int
Year smallint
YearElement smallint
ElementType varchar(15)
Status varchar(15)

So each record in this table, points to a record from the original table and notes the moving snapshot information for it. Essentially I can record the year and month of the snapshot and the status of the subscription in the snapshot.

By using (YearElement, ElementType) instead of simply Month, I can create snapshots by month or week or whatever. For example, April is represented as (4, 'Month'). A snapfrom from week 32 could be represented as (32, 'Week'). Status is simply 'New', 'Expired' or 'Current'.



Related Topics



Leave a reply



Submit