How to Do Forward Fill as a Pl/Pgsql Function

How to do forward fill as a PL/PGSQL function

Correct call

Seems like your displayed query is incorrect, and the test case is just too reduced to show it.

Assuming you want to "forward fill" partitioned by id, you'll have to say so:

SELECT row_num, id
     , str, gap_fill(str) OVER w AS strx
     , val, gap_fill(val) OVER w AS valx
FROM   example
WINDOW w AS (PARTITION BY id ORDER BY row_num);  -- !

The WINDOW clause is just a syntactical convenience to avoid spelling out the same window frame repeatedly. The important part is the added PARTITION clause.

Simpler function

Much simpler, actually:

CREATE OR REPLACE FUNCTION gap_fill_internal(s anyelement, v anyelement)
  RETURNS anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
RETURN COALESCE(v, s);  -- that's all!
END
$func$;

CREATE AGGREGATE gap_fill(anyelement) ( 
  SFUNC = gap_fill_internal, 
  STYPE = anyelement 
);

Slightly faster in a quick test.

Standard SQL

Without custom function:

SELECT row_num, id
     , str, first_value(str) OVER (PARTITION BY id, ct_str ORDER BY row_num) AS strx
     , val, first_value(val) OVER (PARTITION BY id, ct_val ORDER BY row_num) AS valx
FROM (     
   SELECT *, count(str) OVER w AS ct_str, count(val) OVER w AS ct_val
   FROM   example
   WINDOW w AS (PARTITION BY id ORDER BY row_num)
   ) sub;

The query becomes more complex with a subquery. Performance is similar. Slightly slower in a quick test.

More explanation in these related answers:

Carry over long sequence of missing values with Postgres
Retrieve last known value for each column of a row
SQL group table by "leading rows" without pl/sql

db<>fiddle here - showing all with extended test case

Apply function to all columns in a Postgres table dynamically

What you ask is not a trivial task. You should be comfortable with PL/pgSQL. I do not advise this kind of dynamic SQL queries for beginners, too powerful.

That said, let's dive in. Buckle up!

CREATE OR REPLACE FUNCTION f_gap_fill_update(_tbl regclass, _id text, _row_num text, OUT nullable_columns int, OUT updated_rows int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _pk  text  := quote_ident(_row_num);
   _sql text;
BEGIN   
   SELECT INTO _sql, nullable_columns
          concat_ws(E'\n'
          , 'UPDATE ' || _tbl || ' t'
          , 'SET   (' || string_agg(        quote_ident(a.attname), ', ') || ')'
          , '    = (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          , 'FROM  (' 
          , '   SELECT ' || _pk
          , '        , ' || string_agg(format('gap_fill(%1$I) OVER w AS %1$I', a.attname), ', ')
          , '   FROM   ' || _tbl
          , format('   WINDOW w AS (PARTITION BY %I ORDER BY %s)', _id, _pk)
          , '   ) u'
          , format('WHERE t.%1$s = u.%1$s', _pk)
          , 'AND  (' || string_agg('t.' || quote_ident(a.attname), ', ') || ') IS DISTINCT FROM'
          , '     (' || string_agg('u.' || quote_ident(a.attname), ', ') || ')'
          )
        , count(*) -- AS _col_ct
   FROM  (
      SELECT a.attname
      FROM   pg_attribute a
      WHERE  a.attrelid = _tbl
      AND    a.attnum > 0
      AND    NOT a.attisdropped
      AND    NOT a.attnotnull
      ORDER  BY a.attnum
      ) a;

   IF nullable_columns = 0 THEN
      RAISE EXCEPTION 'No nullable columns found in table >>%<<', _tbl;
   ELSIF _sql IS NULL THEN
      RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
   END IF;
   
   -- RAISE NOTICE '%', _sql;       -- debug
   EXECUTE _sql;              -- execute
   GET DIAGNOSTICS updated_rows = ROW_COUNT; 
END
$func$;

Example call:

SELECT * FROM f_gap_fill_update('example', 'id', 'row_num');

db<>fiddle here

The function is state of the art.
Generates and executes a query of the form:

UPDATE tbl t
SET   (str, val, col1)
    = (u.str, u.val, u.col1)
FROM  (
   SELECT row_num
        , gap_fill(str) OVER w AS str, gap_fill(val) OVER w AS val
        , gap_fill(col1) OVER w AS col1
   FROM   tbl
   WINDOW w AS (PARTITION BY id ORDER BY row_num)
   ) u
WHERE t.row_num = u.row_num
AND  (t.str, t.val, t.col1) IS DISTINCT FROM
     (u.str, u.val, u.col1)

Using pg_catalog.pg_attribute instead of the information schema. See:

"Information schema vs. system catalogs"

Note the final WHERE clause to prevent (possibly expensive) empty updates. Only rows that actually change will be written. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

Moreover, only nullable columns (not defined NOT NULL) will even be considered, to avoid unnecessary work.

Using ROW syntax in UPDATE to keep the code simple. See:

SQL update fields of one table from fields of another one

The function returns two integer values: nullable_columns and updated_rows, reporting what the names suggest.

The function defends against SQL injection properly. See:

Table name as a PostgreSQL function parameter
SQL injection in Postgres functions vs prepared queries

About GET DIAGNOSTICS:

Calculate number of rows affected by batch query in PostgreSQL

The above function updates, but does not return rows. Here is a basic demo how to return rows of varying type:

CREATE OR REPLACE FUNCTION f_gap_fill_select(_tbl_type anyelement, _id text, _row_num text)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
DECLARE
   _tbl regclass := pg_typeof(_tbl_type)::text::regclass;
   _sql text;
BEGIN   
   SELECT INTO _sql
          'SELECT ' || string_agg(CASE WHEN a.attnotnull
                                  THEN format('%I', a.attname)
                                  ELSE format('gap_fill(%1$I) OVER w AS %1$I', a.attname) END
                                , ', ' ORDER BY a.attnum)
        || E'\nFROM ' || _tbl
        || format(E'\nWINDOW w AS (PARTITION BY %I ORDER BY %I)', _id, _row_num)
   FROM   pg_attribute a
   WHERE  a.attrelid = _tbl
   AND    a.attnum > 0
   AND    NOT a.attisdropped;
   
   IF _sql IS NULL THEN
      RAISE EXCEPTION 'SQL string is NULL. Should not occur!';
   END IF;

   RETURN QUERY EXECUTE _sql;
   -- RAISE NOTICE '%', _sql;       -- debug
END
$func$;

Call (note special syntax!):

SELECT * FROM f_gap_fill_select(NULL::example, 'id', 'row_num');

db<>fiddle here

About returning a polymorphic row type:

Refactor a PL/pgSQL function to return the output of various SELECT queries

PL/pgSQL function won't run correctly outside pgAdmin

It seems like you can replace the whole function with this plain, much cheaper UPDATE query and a RETURNING clause:

UPDATE table1
SET    val = val + 1
WHERE  status IS NULL
RETURNING id, name AS t1, data AS t2;

If there can be race conditions, consider:

Postgres UPDATE … LIMIT 1

You can run this query as is, without function wrapper. If you want a function wrapper, prepend RETURN QUERY to return results directly. Code example:

PostgreSQL: ERROR: 42601: a column definition list is required for functions returning "record"

SQL group table by leading rows without pl/sql

This can be achieved by setting rows containing a to a specific value and all the other rows to a different value. Then use a cumulative sum to get the desired number for the rows. The group number is set to the next number when a new value in the val column is encountered and all the proceeding rows with a will have the same group number as the one before and this continues.

I assume that you would need a distinct number for each group and the number doesn't matter.

select id, val, sum(ex) over(order by id) cm_sum
from (select t.*
      ,case when val = 'a' then 0 else 1 end ex
      from t) x

The result for the query above with the data in question, would be

id  val cm_sum
--------------
1   a   0
2   a   0
3   a3  1
4   a   1
5   a   1
6   a6  2
7   a   2
8   a8  3
9   a   3

Impute via fill-forward/LOCF a column over a range of sequential rows in SQL?

The following query structure will achieve fill-forward if using a PostgreSQL flavoured SQL dialect (e.g. Netezza PureData) for a datetime index (assuming past data). It will also work for multi-column index/keys.

Given the following parameters:

<key_cols> - list of columns uniquely identifying each time-series sample (e.g. UNIT, TIME )
<impute_col> - column in which values need to be imputed (e.g. VALUE )
<impute_over_range_col> - the sequential range column for the time-series (e.g. TIME)

and deriving:

<keys_no_range> - key columns except for <impute_over_range_col>

SELECT DISTINCT T1.<key_cols>, 
                COALESCE(T1.<impute_col>, T2.<impute_col>) AS <impute_col>
FROM table T1
LEFT OUTER JOIN (SELECT T1.<key_cols>,
                     T1.<impute_col>,
                     LEAD(T1.<impute_over_range_col>,1) 
                         OVER (PARTITION BY T1.<keys_no_range> 
                               ORDER BY T1.<key_cols>)
                         AS NEXT_RANGE
                     FROM table T1
                     WHERE T1.<impute_col> IS NOT NULL
                     ORDER BY T1.<key_cols>
                ) T2
              ON (T1.<impute_over_range_col> BETWEEN T2.<impute_over_range_col> 
                                             AND COALESCE(NEXT_RANGE, CURRENT_DATE)) 
              AND T1.<keys_no_range>[0] = T2.<keys_no_range>[0]
              AND T1.<keys_no_range>[1] = T2.<keys_no_range>[1]
              -- ... for each col in <keys_no_range>

Concretely, for the example in the question:

SELECT DISTINCT T1.UNIT, T1.TIME, 
                COALESCE(T1.VALUE, T2.VALUE) AS VALUE
FROM table T1
LEFT OUTER JOIN (SELECT T1.UNIT, T1.TIME,
                     T1.VALUE,
                     LEAD(T1.TIME,1) 
                         OVER (PARTITION BY T1.UNIT 
                               ORDER BY T1.UNIT, T1.TIME)
                         AS NEXT_RANGE
                     FROM table T1
                     WHERE T1.VALUE IS NOT NULL
                     ORDER BY T1.UNIT, T1.TIME
                ) T2
              ON (T1.TIME BETWEEN T2.TIME
                           AND COALESCE(NEXT_RANGE, CURRENT_DATE)) 
              AND T1.UNIT = T2.UNIT

Here is an SQLFiddle of the above query: http://sqlfiddle.com/#!15/d589b/1

Retrieve last known value for each column of a row

This should work but keep in mind it is an uggly solution

select * from
(select dt from
(select rank() over (order by ctid desc) idx, dt
  from sometable ) cx
where idx = 1) dtz,
(
select a from
(select rank() over (order by ctid desc) idx, a
  from sometable where a is not null ) ax 
where idx = 1) az,
(
select b from
(select rank() over (order by ctid desc) idx, b
  from sometable where b is not null ) bx 
where idx = 1) bz,
(
select c from
(select rank() over (order by ctid desc) idx, c
  from sometable where c is not null ) cx
where idx = 1) cz

See it here at fiddle: http://sqlfiddle.com/#!15/d5940/40

The result will be

DT                                   A        B      C
October, 16 2013 00:00:00+0000      abc     died    fred

How to Do Forward Fill as a Pl/Pgsql Function