How can I write a query to extract individual changes from snapshots of data?
Here's a working sample that uses UNPIVOT. It's based on my answer to my question Better way to Partially UNPIVOT in Pairs in SQL
This has some nice features.
Adding additional fields is easy. Just add values to the SELECT and UNPIVOT clause. You don't have to add additional UNION clauses
The where clause
WHERE curr.value <> prev.value
never changes regardless of how many fields are added.The performance is surprisingly fast.
Its portable to Current versions of Oracle if you need that
SQL
Declare @Snapshots as table(
Sequence int,
DateTaken datetime,
[id] int,
field1 varchar(20),
field2 int)
INSERT INTO @Snapshots VALUES
(1, '2011-01-01', 1, 'Red', 2),
(2, '2011-01-01', 2, 'Blue', 10),
(3, '2011-02-01', 1, 'Green', 2),
(4, '2011-03-01', 1, 'Green' , 3),
(5, '2011-03-01', 2, 'Purple', 2),
(6, '2011-04-01', 1, 'Yellow', 2)
;WITH Snapshots (Sequence, DateTaken, ID, Field1, Field2, _Index)
AS
(
SELECT Sequence, DateTaken, ID, Field1, Field2, ROW_NUMBER() OVER (ORDER BY ID, Sequence) _Index
FROM @Snapshots
)
, data as(
SELECT
c._Index
, c.DateTaken
, c.ID
, cast(c.Field1 as varchar(max)) Field1
, cast(p.Field1 as varchar(max))Field1_Previous
, cast(c.Field2 as varchar(max))Field2
, cast(p.Field2 as varchar(max)) Field2_Previous
FROM Snapshots c
JOIN Snapshots p ON p.ID = c.ID AND (p._Index + 1) = c._Index
)
, fieldsToRows
AS (SELECT DateTaken,
id,
_Index,
value,
field
FROM data p UNPIVOT (value FOR field IN (field1, field1_previous,
field2, field2_previous) )
AS unpvt
)
SELECT
curr.DateTaken,
curr.ID,
curr.field,
prev.value previous,
curr.value 'current'
FROM
fieldsToRows curr
INNER JOIN fieldsToRows prev
ON curr.ID = prev.id
AND curr._Index = prev._Index
AND curr.field + '_Previous' = prev.field
WHERE
curr.value <> prev.value
Output
DateTaken ID field previous current
----------------------- ----------- --------- -------- -------
2011-02-01 00:00:00.000 1 Field1 Red Green
2011-03-01 00:00:00.000 1 Field2 2 3
2011-04-01 00:00:00.000 1 Field1 Green Yellow
2011-04-01 00:00:00.000 1 Field2 3 2
2011-03-01 00:00:00.000 2 Field1 Blue Purple
2011-03-01 00:00:00.000 2 Field2 10 2
How to recreate old snapshot using field history table in Bigquery
I actually identified a way to rollback (virtually, not on actual table) all the updates made after given specific date. Following are the steps followed:
Create dummy tables:
WITH
Table_A AS
(
SELECT 1 As ID, '2020-6-28' as created_date, 10 as qty, 100 as value
Union ALL
SELECT 2 As ID, '2020-5-29' as created_date, 20 as qty, 200 as value),
Table_A_field_history AS
(
SELECT 'xyz' id,'2020-07-29' created_date,'12345' created_by,'qty' field,'10' new_value,'200' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-07-24' created_date,'12345' created_by,'qty' field,'20' new_value,'10' old_value,'2' A_id
UNION ALL
SELECT 'xyz' id,'2020-07-29' created_date,'12345' created_by,'value' field,'100' new_value,'2000' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-07-24' created_date,'12345' created_by,'value' field,'200' new_value,'5000' old_value,'2' A_id
UNION ALL
SELECT 'xyz' id,'2020-06-29' created_date,'12345' created_by,'qty' field,'200' new_value,'' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-05-30' created_date,'12345' created_by,'qty' field,'10' new_value,'' old_value,'2' A_id
UNION ALL
SELECT 'xyz' id,'2020-06-29' created_date,'12345' created_by,'value' field,'2000' new_value,'' old_value,'1' A_id
UNION ALL
SELECT 'abc' id,'2020-05-30' created_date,'12345' created_by,'value' field,'5000' new_value,'' old_value,'2' A_id
),
Step 1. Create date cte to filter data based on given date:
`date_spine
AS
(
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 Day)) AS as_of_date
),`
Step 2. Above created date cte can be used as a Spine for our query, cross join to map as_of_date
with all the changes made in the history table.
date_changes
AS
(
SELECT DISTINCT
date.as_of_date,
hist.A_id
FROM Table_A_field_history hist CROSS JOIN date_spine date
),
Step 3. Now we have as_of_date
mapped to all historical transactions, now we can get max of change date.
most_recent_changes AS (
SELECT
dc.as_of_date,
dc.A_id ,
MAX(fh.created_date) AS created_date,
FROM date_changes dc
LEFT JOIN Table_A_field_history AS fh
ON dc.A_id = fh.A_id
WHERE CAST(fh.created_date AS DATE) <= dc.as_of_date
GROUP BY dc.as_of_date,
dc.A_id
),
Step 4. Now mapping max change date with actual created_date
and history table
past_changes AS (
SELECT
mr.as_of_date,
mr.A_id,
mr.created_date,
a.id AS entry_id,
a.created_by AS created_by_id,
CASE WHEN a.field='qty' THEN a.new_value ELSE '' END AS qty,
CASE WHEN a.field='value' THEN a.new_value ELSE '' END AS value,
FROM most_recent_changes AS mr
LEFT JOIN Table_A_field_history AS a
ON mr.A_id = a.A_id
AND mr.created_date = a.created_date
WHERE a.id IS NOT NULL
)
Step 5. Now we can use as_of_date
to get historical state of Table A.
Select *
From past_changes x
WHERE x.as_of_date = '2020-07-29'
Do we extract snapshot of data or updated data if data is being updated in the database table?
The data source scans over the table, you would get the record version at time of read.
The time of read could be anytime in that 5 hour window. You could reduce the impact by adding an order clause to your source component so that the latest data gets read and transferred last. This would ensure the data you expect to get the most updates during those 5 hours to catch the most upserts
If the upsert updates a record that has already been copied over in that 5 hour window then you will miss the update
Ultimately you would need to look at a incremental load design if you need to catch every single upsert
How do I create a moving snapshot of pass holder information in my database
I ended up creating a second table to hold the snapshot information in conjunction with the original table. The latter was modified to add a RecordID column to act as a primary key and a foreign key in the former. My additional (former) table looks like:
SubScriptionSnapshotData
SubscriptionRecordID int
Year smallint
YearElement smallint
ElementType varchar(15)
Status varchar(15)
So each record in this table, points to a record from the original table and notes the moving snapshot information for it. Essentially I can record the year and month of the snapshot and the status of the subscription in the snapshot.
By using (YearElement, ElementType) instead of simply Month, I can create snapshots by month or week or whatever. For example, April is represented as (4, 'Month'). A snapfrom from week 32 could be represented as (32, 'Week'). Status is simply 'New', 'Expired' or 'Current'.
Related Topics
How to Return a New Identity Column Value from an SQLserver Select Statement
Using Output Clause to Insert Value Not in Inserted
How to Make a Stored Procedure Return a "Dataset" Using a Parameter I Pass
Running Powershell Scripts Through SQL
Local Collection Types Not Allowed in SQL Statements
Generic SQL That Both Access and Odbc/Oracle Can Understand
Calculate Difference Between Start_Time and End_Time in Seconds from Unix_Time Yyyy-Mm-Dd Hh:Mm:Ss
What Does a Caret (^) Do in a SQL Query
SQL Server Compact Edition Isnull(Sth, ' ') Returns a Boolean Value
Foreign Key for Either-Or Column
Transact-Sql: How to Tokenize a String
The Job Failed. the Job Was Invoked by User<User>. the Last Step to Run Was Step1
Version Number Sorting in SQL Server
Oracle as Keyword and Subqueries
Create a SQL View Based Converting Ranges into Rows