How to do forward fill as a PL/PGSQL function
Correct call
Seems like your displayed query is incorrect, and the test case is just too reduced to show it.
Assuming you want to "forward fill" partitioned by id
, you'll have to say so:
SELECT row_num, id
, str, gap_fill(str) OVER w AS strx
, val, gap_fill(val) OVER w AS valx
FROM example
WINDOW w AS (PARTITION BY id ORDER BY row_num); -- !
The WINDOW
clause is just a syntactical convenience to avoid spelling out the same window frame repeatedly. The important part is the added PARTITION
clause.
Simpler function
Much simpler, actually:
CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
RETURNS anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s); -- that's all!
END
$func$;
CREATE AGGREGATE gap_fill(anyelement) (
SFUNC = gap_fill_internal,
STYPE = anyelement
);
Slightly faster in a quick test.
Standard SQL
Without custom function:
SELECT row_num, id
, str, first_value(str) OVER (PARTITION BY id, ct_str ORDER BY row_num) AS strx
, val, first_value(val) OVER (PARTITION BY id, ct_val ORDER BY row_num) AS valx
FROM (
SELECT *, count(str) OVER w AS ct_str, count(val) OVER w AS ct_val
FROM example
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) sub;
The query becomes more complex with a subquery. Performance is similar. Slightly slower in a quick test.
More explanation in these related answers:
- Carry over long sequence of missing values with Postgres
- Retrieve last known value for each column of a row
- SQL group table by "leading rows" without pl/sql
db<>fiddle here - showing all with extended test case
Apply function to all columns in a Postgres table dynamically
What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.
That said, let's dive in. Buckle up!
CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
LANGUAGE plpgsql AS
$func$
DECLARE
_pk text := quote_ident(_row_num);
_sql text;
BEGIN
SELECT INTO _sql, nullable_columns
concat_ws(E'\n'
, 'UPDATE ' || _tbl || ' t'
, 'SET (' || string_agg( quote_ident(a.attname), ', ') || ')'
, ' = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
, 'FROM ('
, ' SELECT ' || _pk
, ' , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
, ' FROM ' || _tbl
, format(' WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
, ' ) u'
, format('WHERE t.%1$s = u.%1$s', _pk)
, 'AND (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
, ' (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
)
, count(*) -- AS _col_ct
FROM (
SELECT a.attname
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped
AND NOT a.attnotnull
ORDER BY a.attnum
) a;
IF nullable_columns = 0 THEN
RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
ELSIF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
-- RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql; -- execute
GET DIAGNOSTICS updated_rows = ROW_COUNT;
END
$func$;
Example call:
SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');
db<>fiddle here
The function is state of the art.
Generates and executes a query of the form:
UPDATE tbl t
SET (str, val, col1)
= (u.str, u.val, u.col1)
FROM (
SELECT row_num
, gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
, gap_fill(col1) OVER w AS col1
FROM tbl
WINDOW w AS (PARTITION BY id ORDER BY row_num)
) u
WHERE t.row_num = u.row_num
AND (t.str, t.val, t.col1) IS DISTINCT FROM
(u.str, u.val, u.col1)
Using pg_catalog.pg_attribute
instead of the information schema. See:
- "Information schema vs. system catalogs"
Note the final WHERE
clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:
- How do I (or can I) SELECT DISTINCT on multiple columns?
Moreover, only nullable columns (not defined NOT NULL
) will even be considered, to avoid unnecessary work.
Using ROW
syntax in UPDATE
to keep the code simple. See:
- SQL update fields of one table from fields of another one
The function returns two integer values: nullable_columns
and updated_rows
, reporting what the names suggest.
The function defends against SQL injection properly. See:
- Table name as a PostgreSQL function parameter
- SQL injection in Postgres functions vs prepared queries
About GET DIAGNOSTICS
:
- Calculate number of rows affected by batch query in PostgreSQL
The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:
CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass := pg_typeof(_tbl_type)::text::regclass;
_sql text;
BEGIN
SELECT INTO _sql
'SELECT ' || string_agg(CASE WHEN a.attnotnull
THEN format('%I', a.attname)
ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
, ', ' ORDER BY a.attnum)
|| E'\nFROM ' || _tbl
|| format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
FROM pg_attribute a
WHERE a.attrelid = _tbl
AND a.attnum > 0
AND NOT a.attisdropped;
IF _sql IS NULL THEN
RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
END IF;
RETURN QUERY EXECUTE _sql;
-- RAISE NOTICE '%', _sql; -- debug
END
$func$;
Call (note special syntax!):
SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');
db<>fiddle here
About returning a polymorphic row type:
- Refactor a PL/pgSQL function to return the output of various SELECT queries
PL/pgSQL function won't run correctly outside pgAdmin
It seems like you can replace the whole function with this plain, much cheaper UPDATE
query and a RETURNING
clause:
UPDATE table1
SET val = val + 1
WHERE status IS NULL
RETURNING id, name AS t1, data AS t2;
If there can be race conditions, consider:
- Postgres UPDATE … LIMIT 1
You can run this query as is, without function wrapper. If you want a function wrapper, prepend RETURN QUERY
to return results directly. Code example:
- PostgreSQL: ERROR: 42601: a column definition list is required for functions returning "record"
SQL group table by leading rows without pl/sql
This can be achieved by setting rows containing a
to a specific value and all the other rows to a different value. Then use a cumulative sum to get the desired number for the rows. The group number is set to the next number when a new value in the val column is encountered and all the proceeding rows with a
will have the same group number as the one before and this continues.
I assume that you would need a distinct number for each group and the number doesn't matter.
select id, val, sum(ex) over(order by id) cm_sum
from (select t.*
,case when val = 'a' then 0 else 1 end ex
from t) x
The result for the query above with the data in question, would be
id val cm_sum
--------------
1 a 0
2 a 0
3 a3 1
4 a 1
5 a 1
6 a6 2
7 a 2
8 a8 3
9 a 3
Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?
The following query structure will achieve fill-forward if using a PostgreSQL flavoured SQL dialect (e.g. Netezza PureData) for a datetime index (assuming past data). It will also work for multi-column index/keys.
Given the following parameters:
<key_cols>
- list of columns uniquely identifying each time-series sample (e.g.UNIT, TIME
)<impute_col>
- column in which values need to be imputed (e.g.VALUE
)<impute_over_range_col>
- the sequential range column for the time-series (e.g.TIME
)
and deriving:
<keys_no_range>
- key columns except for<impute_over_range_col>
SELECT DISTINCT T1.<key_cols>,
COALESCE(T1.<impute_col>, T2.<impute_col>) AS <impute_col>
FROM table T1
LEFT OUTER JOIN (SELECT T1.<key_cols>,
T1.<impute_col>,
LEAD(T1.<impute_over_range_col>,1)
OVER (PARTITION BY T1.<keys_no_range>
ORDER BY T1.<key_cols>)
AS NEXT_RANGE
FROM table T1
WHERE T1.<impute_col> IS NOT NULL
ORDER BY T1.<key_cols>
) T2
ON (T1.<impute_over_range_col> BETWEEN T2.<impute_over_range_col>
AND COALESCE(NEXT_RANGE, CURRENT_DATE))
AND T1.<keys_no_range>[0] = T2.<keys_no_range>[0]
AND T1.<keys_no_range>[1] = T2.<keys_no_range>[1]
-- ... for each col in <keys_no_range>
Concretely, for the example in the question:
SELECT DISTINCT T1.UNIT, T1.TIME,
COALESCE(T1.VALUE, T2.VALUE) AS VALUE
FROM table T1
LEFT OUTER JOIN (SELECT T1.UNIT, T1.TIME,
T1.VALUE,
LEAD(T1.TIME,1)
OVER (PARTITION BY T1.UNIT
ORDER BY T1.UNIT, T1.TIME)
AS NEXT_RANGE
FROM table T1
WHERE T1.VALUE IS NOT NULL
ORDER BY T1.UNIT, T1.TIME
) T2
ON (T1.TIME BETWEEN T2.TIME
AND COALESCE(NEXT_RANGE, CURRENT_DATE))
AND T1.UNIT = T2.UNIT
Here is an SQLFiddle of the above query: http://sqlfiddle.com/#!15/d589b/1
Retrieve last known value for each column of a row
This should work but keep in mind it is an uggly solution
select * from
(select dt from
(select rank() over (order by ctid desc) idx, dt
from sometable ) cx
where idx = 1) dtz,
(
select a from
(select rank() over (order by ctid desc) idx, a
from sometable where a is not null ) ax
where idx = 1) az,
(
select b from
(select rank() over (order by ctid desc) idx, b
from sometable where b is not null ) bx
where idx = 1) bz,
(
select c from
(select rank() over (order by ctid desc) idx, c
from sometable where c is not null ) cx
where idx = 1) cz
See it here at fiddle: http://sqlfiddle.com/#!15/d5940/40
The result will be
DT A B C
October, 16 2013 00:00:00+0000 abc died fred
Related Topics
Temporary Table Record Limit in SQL Server
SQL Server Split Comma Separated Values into Columns
MySQL Count() Multiple Columns
SQL Server Normalization Tactic: Varchar VS Int Identity
Split String Oracle into a Single Column and Insert into a Table
How to Calculate Between Different Group of Rows of the Same Table
How to Add Dynamic Column to an Existing Table
Fixing Holes/Gaps in Numbers Generated by Postgres Sequence
As400 SQL Query with Parameter
How to Calculate Moving Sum with Reset Based on Condition in Teradata SQL
How to Create Foreign Keys Across Databases
Does SQL Server Optimize Dateadd Calculation in Select Query
SQL Query: How to Create Subtotal Rows When There Is No Aggregate Function
Ssms: How to Import (Copy/Paste) Data from Excel
Import CSV File Error:Column Value Containing Column Delimiter