Normalize Array Subscripts So They Start With 1

There is a simpler method that is ugly, but I believe technically correct: extract the largest possible slice out of the array, as opposed to the exact slice with computed bounds.
It avoids the two function calls otherwise needed to compute those bounds.

Example:

select ('[5:7]={1,2,3}'::int[])[-2147483648:2147483647];

results in:


int4
---------
{1,2,3}
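
For comparison, a small sketch of both variants side by side. That the "two function calls" are array_lower() and array_upper() is my assumption, and the column aliases are made up for the demo:

```sql
-- Demo array with non-standard subscripts 5..7
SELECT a                                         AS original     -- [5:7]={1,2,3}
     , a[array_lower(a, 1):array_upper(a, 1)]    AS exact_slice  -- {1,2,3}
     , a[-2147483648:2147483647]                 AS max_slice    -- {1,2,3}
     , array_lower(a[-2147483648:2147483647], 1) AS new_lower    -- 1
FROM  (SELECT '[5:7]={1,2,3}'::int[] AS a) t;
```

Either slice comes back with subscripts starting at 1, because Postgres renumbers the result of a slice.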

Array 1 contained in array 2 and elements in the same order

General case

All elements of the second array are also in the first, in the same order, but gaps are allowed.

I suggest this polymorphic PL/pgSQL function:

CREATE OR REPLACE FUNCTION array_contains_array_in_order(arr1 ANYARRAY
, arr2 ANYARRAY
, elem ANYELEMENT = NULL)
RETURNS bool AS
$func$
DECLARE
pos int := 1;
BEGIN
FOREACH elem in ARRAY arr2
LOOP
pos := pos + array_position(arr1[pos:], elem); -- see below
IF pos IS NULL THEN
RETURN FALSE;
END IF;
END LOOP;

RETURN true; -- all elements found in order
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 3000;

As @a_horse commented, we can omit the upper bound in array subscripts to mean "unbounded" (arr1[pos:]). In older versions before 9.6 substitute with arr1[pos:2147483647] - 2147483647 = 2^31 - 1 being the theoretical max array index, the greatest signed int4 number.

This works for ...

  • any 1-dimensional array type, not just integer[].

  • arrays with NULL values, thanks to array_position() which works for NULL, too.

  • arrays with duplicate elements.

  • only for default array subscripts starting with 1. You can easily cover non-standard subscripts if needed. See:

    • Normalize array subscripts for 1-dimensional array so they start with 1
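
A few sample calls to illustrate the points above. This is only a sketch and assumes the function array_contains_array_in_order() from above has been created; the input values are made up:

```sql
-- in order, gaps allowed
SELECT array_contains_array_in_order('{1,2,3,4,5}'::int[], '{2,4}'::int[]);      -- true

-- same elements, wrong order
SELECT array_contains_array_in_order('{1,2,3,4,5}'::int[], '{4,2}'::int[]);      -- false

-- NULL elements and a non-integer array type
SELECT array_contains_array_in_order('{a,NULL,b}'::text[], '{NULL,b}'::text[]);  -- true

-- duplicate elements
SELECT array_contains_array_in_order('{1,2,2,3}'::int[], '{2,2}'::int[]);        -- true

-- non-standard subscripts: normalize the first array before the call
SELECT array_contains_array_in_order(
          ('[5:9]={1,2,3,4,5}'::int[])[-2147483648:2147483647]
        , '{4,2}'::int[]);                                                       -- false
```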

About the ANYELEMENT trick:

  • How do I get the type of an array's elements?

Performance

I ran a quick performance test comparing this function to the one @a_horse supplied. This one was around 5x faster.

If you use this as a filter for a big table, I strongly suggest you combine it with a (logically redundant) sargable filter like:

SELECT *
FROM tbl
WHERE arr @> '{2,160,134,58,149,111}'::int[]
AND array_contains_array_in_order(arr, '{2,160,134,58,149,111}')

This will use a GIN index on the array column like:

CREATE INDEX ON tbl USING gin (arr);

And only filter remaining (typically very few!) arrays that share all elements. Typically much faster.

Caveats with intarray module

Note: applies to integer[] exclusively, not smallint[] or bigint[] or any other array type!

Be careful if you have installed the intarray extension, which provides its own variant of the @> operator for int[]. Either you create an (additional) GIN index with its special operator class (which is a bit faster where applicable):

CREATE INDEX ON intarr USING gin (arr gin__int_ops);

Or, while you only have a GIN index with the default operator class, you must explicitly denote the standard operator to cooperate with the index:

WHERE  arr OPERATOR(pg_catalog.@>) '{2,160,134,58,149,111}'::int[] 

Details:

  • GIN index on smallint[] column not used or error "operator is not unique"

Simple case

As commented, your case is simpler:

The complete second array is included in the first (same order, no gaps!).

CREATE OR REPLACE FUNCTION array_contains_array_exactly(arr1 ANYARRAY, arr2 ANYARRAY)
RETURNS bool AS
$func$
DECLARE
len int := array_length(arr2, 1) - 1; -- length of arr2 - 1 to fix off-by-1
pos int; -- for current search position in arr1
BEGIN
/* -- OPTIONAL, if invalid input possible
CASE array_length(arr1, 1) > len -- array_length(arr2, 1) - 1
WHEN TRUE THEN -- valid arrays
-- do nothing, proceed
WHEN FALSE THEN -- arr1 shorter than arr2
RETURN FALSE; -- or raise exception?
ELSE -- at least one array empty or NULL
RETURN NULL;
END CASE;
*/

pos := array_position(arr1, arr2[1]); -- pos of arr2's 1st elem in arr1

WHILE pos IS NOT NULL
LOOP
IF arr1[pos:pos+len] = arr2 THEN -- array slice matches arr2 *exactly*
RETURN TRUE; -- arr2 is part of arr1
END IF;

pos := pos + array_position(arr1[(pos+1):], arr2[1]);
END LOOP;

RETURN FALSE;
END
$func$ LANGUAGE plpgsql IMMUTABLE COST 1000;

This is considerably faster than the general function above for longer arrays. All other considerations still apply.
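
Some sample calls to show the difference to the general case. Again just a sketch, assuming the function array_contains_array_exactly() from above has been created; the input values are made up:

```sql
-- contiguous match
SELECT array_contains_array_exactly('{1,2,3,4,5}'::int[], '{2,3,4}'::int[]);  -- true

-- elements are in order, but there is a gap
SELECT array_contains_array_exactly('{1,2,3,4,5}'::int[], '{2,4}'::int[]);    -- false

-- first candidate position fails, the search continues with the next one
SELECT array_contains_array_exactly('{1,2,1,2,3}'::int[], '{1,2,3}'::int[]);  -- true
```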

Multiply elements in an array according to their position

WITH t AS (SELECT ARRAY[10,20,30,40,50]::INT[] AS arr)   -- variable for demo
SELECT ARRAY(
SELECT unnest(array_fill(arr[idx], ARRAY[idx])) AS mult
FROM (SELECT arr, generate_subscripts(arr, 1) AS idx FROM t) sub
);

I would wrap the logic into a simple IMMUTABLE SQL function:

CREATE OR REPLACE FUNCTION f_expand_arr(_arr anyarray)
RETURNS anyarray AS
$func$
SELECT ARRAY(
SELECT unnest(array_fill(_arr[idx], ARRAY[idx]))
FROM (SELECT generate_subscripts(_arr, 1) AS idx) sub
)
$func$ LANGUAGE sql IMMUTABLE;

Works for arrays of any base type due to the polymorphic parameter type anyarray:

How to write a function that returns text or integer values?

The manual on generate_subscripts() and array_fill().
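
To see how the pieces fit together, here is a minimal sketch of the two building blocks on their own, followed by the combined expression. The demo values and aliases are made up:

```sql
-- array_fill() repeats a value N times
SELECT array_fill(20, ARRAY[2]);                 -- {20,20}

-- generate_subscripts() returns one row per subscript
SELECT generate_subscripts(ARRAY[10,20,30], 1);  -- 3 rows: 1, 2, 3

-- combined: repeat each element as often as its subscript says
SELECT ARRAY(
   SELECT unnest(array_fill(arr[idx], ARRAY[idx]))
   FROM   generate_subscripts(arr, 1) AS idx
   ) AS expanded                                 -- {10,20,20,30,30,30}
FROM  (SELECT ARRAY[10,20,30] AS arr) t;
```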

Note: This works with the actual array indexes, which can differ from the ordinal array position in Postgres. You may be interested in @Daniel's method to "normalize" the array index:

Normalize array subscripts for 1-dimensional array so they start with 1
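
A quick sketch of what that difference means in practice, using a made-up array with subscripts 2..4. The second column normalizes the array first with the maximum-slice trick from the linked answer:

```sql
WITH t AS (SELECT '[2:4]={10,20,30}'::int[] AS arr)
SELECT ARRAY(SELECT unnest(array_fill(arr[idx], ARRAY[idx]))
             FROM   generate_subscripts(arr, 1) AS idx)  AS by_subscript  -- {10,10,20,20,20,30,30,30,30}
     , ARRAY(SELECT unnest(array_fill(norm[idx], ARRAY[idx]))
             FROM   generate_subscripts(norm, 1) AS idx) AS by_position   -- {10,20,20,30,30,30}
FROM  (SELECT arr, arr[-2147483648:2147483647] AS norm FROM t) sub;
```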

Postgres 9.4 or later provides WITH ORDINALITY:

PostgreSQL unnest() with element number

Allowing for this even more elegant and reliable solution:

CREATE OR REPLACE FUNCTION f_expand_arr(_arr anyarray)
RETURNS anyarray AS
$func$
SELECT ARRAY(
SELECT unnest(array_fill(a, ARRAY[idx::int])) -- ordinality is bigint, array_fill() expects integer[]
FROM unnest(_arr) WITH ORDINALITY AS x (a, idx)
)
$func$ LANGUAGE sql IMMUTABLE;

One might still argue that proper order is not actually guaranteed. I claim it is ...

Parallel unnest() and sort order in PostgreSQL

Call:

SELECT f_expand_arr(ARRAY[10,20,30,40,10]::INT[]) AS a2;

Or for values from a table:

SELECT f_expand_arr(a) AS a2 FROM t;

SQL Fiddle.

Moving PostgreSQL bigint array unique value to another index

General assumptions:

  • Array elements are UNIQUE NOT NULL.
  • Arrays are 1-dimensional with standard subscripts (1..N). See:

    • Normalize array subscripts for 1-dimensional array so they start with 1

Simple solution

CREATE FUNCTION f_array_move_element_simple(_arr bigint[], _elem bigint, _pos int)
RETURNS bigint[] LANGUAGE sql IMMUTABLE AS
'SELECT a1[:_pos-1] || _elem || a1[_pos:] FROM array_remove(_arr, _elem) a1';

All fine & dandy, as long as:

  • The given element is actually contained in the array.
  • The given position is between 1 and the length of the array.
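
A short sketch of what happens inside and outside of those assumptions. It assumes the function f_array_move_element_simple() from above has been created; the input values are made up:

```sql
-- element present, valid position: 4 moves to position 2
SELECT f_array_move_element_simple('{1,2,3,4,5}', 4, 2);  -- {1,4,2,3,5}

-- element not in the array: it is silently inserted instead of moved
SELECT f_array_move_element_simple('{1,2,3}', 9, 2);      -- {1,9,2,3}

-- position beyond the array length: the element ends up last, not at position 7
SELECT f_array_move_element_simple('{1,2,3}', 3, 7);      -- {1,2,3}
```

No error is raised in the last two cases, which is why the input has to be trusted for this variant.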

Proper solution

CREATE FUNCTION f_array_move_element(_arr ANYARRAY, _elem ANYELEMENT, _pos int)
RETURNS ANYARRAY AS
$func$
BEGIN
IF _pos IS NULL OR _pos < 1 THEN
RAISE EXCEPTION 'Target position % not allowed. Must be a positive integer.', _pos;
ELSIF _pos > array_length(_arr, 1) THEN
RAISE EXCEPTION 'Target position % not allowed. Cannot be greater than length of array.', _pos;
END IF;

CASE array_position(_arr, _elem) = _pos -- already in position, return org
WHEN true THEN
RETURN _arr;
WHEN false THEN -- remove element
_arr := array_remove(_arr, _elem);
ELSE -- element not found
RAISE EXCEPTION 'Element % not contained in array %!', _elem, _arr;
END CASE;

RETURN _arr[:_pos-1] || _elem || _arr[_pos:];
END
$func$ LANGUAGE plpgsql IMMUTABLE;

Exceptions are raised if any of the additional assumptions for the simple func are violated.

The "proper" function uses polymorphic types and works for any data type, not just bigint - as long as array and element type match.
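
Sample calls for illustration, as a sketch that assumes the function f_array_move_element() from above has been created. The input values are made up:

```sql
-- works for text[] as well as for bigint[]
SELECT f_array_move_element('{a,b,c,d}'::text[], 'c'::text, 1);      -- {c,a,b,d}
SELECT f_array_move_element('{1,2,3,4,5}'::bigint[], 4::bigint, 2);  -- {1,4,2,3,5}

-- already in position: the array is returned unchanged
SELECT f_array_move_element('{1,2,3}'::int[], 2, 2);                 -- {1,2,3}

-- each of these raises an exception instead of returning a misleading result
SELECT f_array_move_element('{1,2,3}'::int[], 9, 2);  -- element not contained in array
SELECT f_array_move_element('{1,2,3}'::int[], 3, 7);  -- position greater than length of array
SELECT f_array_move_element('{1,2,3}'::int[], 3, 0);  -- position must be a positive integer
```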

db<>fiddle here

How to access array internal index with postgreSQL?

Postgres 9.4 or later

When operating with 1-dimensional arrays and standard index subscripts (as is almost always the case), use the new WITH ORDINALITY instead:

SELECT t.*
FROM unnest(ARRAY[1,20,3,5]) WITH ORDINALITY t(val, idx);

See:

  • PostgreSQL unnest() with element number

Just make sure you don't trip over non-standard subscripts. See:

  • Normalize array subscripts so they start with 1
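
To illustrate that caveat, a minimal sketch with a made-up array whose subscripts run from 4 to 7. WITH ORDINALITY numbers the rows, it does not report the actual subscripts:

```sql
SELECT t.*
FROM   unnest('[4:7]={1,20,3,5}'::int[]) WITH ORDINALITY t(val, idx);

-- val | idx
-- ----+----
--   1 |   1
--  20 |   2
--   3 |   3
--   5 |   4
```

So idx is the ordinal position (1..4), while the array subscripts are 4..7.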

Postgres 9.3 or earlier

(Original answer.)

Postgres does provide dedicated functions to generate array subscripts:

WITH   x(a) AS (VALUES ('{1,20,3,5}'::int[]))
SELECT generate_subscripts(a, 1) AS idx
, unnest(a) AS val
FROM x;

Effectively it does almost the same as @Frank's query, just without a subquery.

Plus, it also works for subscripts that do not start with 1.

Either solution works for 1-dimensional arrays only! (Can easily be expanded to multiple dimensions.)

Function:

CREATE OR REPLACE FUNCTION unnest_with_idx(anyarray) 
RETURNS TABLE(idx integer, val anyelement)
LANGUAGE sql IMMUTABLE AS
$func$
SELECT generate_subscripts($1, 1), unnest($1);
$func$;

Call:

SELECT * FROM unnest_with_idx('{1,20,3,5}'::int[]);

Also consider:

SELECT * FROM unnest_with_idx('[4:7]={1,20,3,5}'::int[]);

About custom array subscripts:

  • Normalize array subscripts so they start with 1

To get normalized subscripts starting with 1 for a 1-dimensional array:

SELECT generate_series(1, array_length($1,1)) ...

That's almost the query you had already, just with array_length() instead of array_upper() - which would fail with non-standard subscripts.
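
As a complete statement, that could look like the following sketch. The CTE and its sample array are made up for the demo:

```sql
WITH   x(a) AS (VALUES ('[4:7]={1,20,3,5}'::int[]))
SELECT generate_series(1, array_length(a, 1)) AS idx  -- 1, 2, 3, 4
     , unnest(a)                              AS val  -- 1, 20, 3, 5
FROM   x;
```

Both set-returning functions produce four rows here, so they line up one to one.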

Performance

I ran a quick test on an array of 1000 int with all queries presented here so far. They all perform about the same (~ 3.5 ms) - except for row_number() on a subquery (~ 7.5 ms) - as expected, because of the subquery.

Postgres array_position(array, element) sometimes 0-indexed?

Array subscripts

You stated:

Postgres method array_position(array, element), like other things in SQL, is 1-based.

But that's subtly incorrect. Postgres arrays are 1-based by default. But Postgres allows any range of integers as index. And the function array_position() isn't anything-based. It just returns the index as found.

SELECT array_position('[7:9]={4,5,6}'::int[], 5);  -- returns 8!

See:

  • Normalize array subscripts so they start with 1
  • Why does PostgreSQL allow querying for array[0] even though it uses 1-based arrays?

And pg_index.indkey is not an array to begin with. It's type int2vector, which is an internal type, not available for general use, and 0-based! It allows subscripts (similar to an array). A cast to int2[] preserves 0-based array subscripts (indices).
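
You can check this against your own catalog with a query like the following sketch. The column aliases are made up, and LIMIT 3 is only there to keep the output short:

```sql
SELECT indexrelid::regclass           AS index_name
     , indkey                                           -- int2vector, not int2[]
     , array_lower(indkey::int2[], 1) AS lower_bound    -- 0 after the cast, as described above
     , indkey[0]                      AS first_key_col  -- first index column sits at subscript 0
FROM   pg_catalog.pg_index
LIMIT  3;
```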

Proper query

Either way, your query doesn't seem right.

The INNER JOIN on pg_tablespace eliminates indexes stored in the default tablespace. The manual on pg_class.reltablespace:

If zero, the database's default tablespace is implied.

But there is no entry in pg_tablespace with oid = 0, so make that a LEFT JOIN.

There are many more caveats if you try to extract parts of the index definition by hand. What you have for ASC / DESC doesn't quite cut it. See:

  • How to get the Index column order(ASC, DESC, NULLS FIRST....) from Postgresql?

And you didn't even consider NULLS FIRST | LAST. Or a possible WHERE condition for partial indices, ...

I strongly suggest this simple, fast and reliable alternative using the built-in System Catalog Information Function pg_get_indexdef():

SELECT ci.relname AS index_name
, ix.indrelid::regclass::text AS table_name
, pg_get_indexdef (ix.indexrelid) AS idx_def
FROM pg_catalog.pg_index ix
JOIN pg_catalog.pg_class ci ON ci.oid = ix.indexrelid
JOIN pg_catalog.pg_namespace ns ON ns.oid = ci.relnamespace
WHERE ix.indisunique = false
AND ns.nspname = 'my_schema'
ORDER BY 2, 1;

The manual:

Reconstructs the creating command for an index. (This is a decompiled reconstruction, not the original text of the command.)

This gets all aspects right and keeps working across Postgres versions.

Your query

If you insist on decomposing the index definition, this query should basically work (as of Postgres 14):

SELECT ci.relname AS index_name
, ct.relname AS table_name
, pg_get_indexdef (ix.indexrelid, pos::int, false) AS idx_expression
, CASE WHEN ia.indopt & 1 = 1 THEN 'DESC' ELSE 'ASC' END AS direction
, CASE WHEN ia.indopt & 2 = 2 THEN 'NULLS FIRST' ELSE 'NULLS LAST' END AS direction_nulls
, pg_get_expr(ix.indpred, ix.indrelid) AS where_clause
, ia.pos AS column_position
, ix.indkey
, ix.indoption
FROM pg_catalog.pg_index ix
JOIN pg_catalog.pg_class ct ON ct.oid = ix.indrelid
JOIN pg_catalog.pg_class ci ON ci.oid = ix.indexrelid
JOIN pg_catalog.pg_namespace ns ON ns.oid = ci.relnamespace
LEFT JOIN pg_catalog.pg_tablespace ts ON ts.oid = ci.reltablespace
CROSS JOIN LATERAL unnest(ix.indkey, ix.indoption) WITH ORDINALITY AS ia(attnum, indopt, pos)
WHERE ix.indisunique = false
AND ns.nspname = 'my_schema'
ORDER BY ct.relname, ci.relname, ia.pos;

But the "proper query" is far more stable and reliable.

In particular I use unnest() with multiple arguments to unnest indkey and indoption in lockstep and with ordinal (1-based) position. See:

  • Unnest multiple arrays in parallel

About WITH ORDINALITY:

  • PostgreSQL unnest() with element number
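
Stripped of the catalog tables, the combination of both features looks like this minimal sketch with made-up arrays and aliases:

```sql
SELECT *
FROM   unnest('{10,20,30}'::int[], '{a,b,c}'::text[])
       WITH ORDINALITY AS t(num, txt, pos);

-- num | txt | pos
-- ----+-----+----
--  10 | a   |   1
--  20 | b   |   2
--  30 | c   |   3
```

The arrays are unnested in lockstep and pos counts from 1, no matter what subscripts the input uses.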

I use pg_get_indexdef() to reconstruct each index field. This also covers expressions, not just plain table columns.

I added direction_nulls indicating NULLS FIRST | LAST, see:

  • Sort by column ASC, but NULL values first?

And where_clause with a decompiled WHERE clause for partial indices (using pg_get_expr()).

Select every first element of array of integer arrays to array

Since PostgreSQL allows asking for a slice outside of the array bounds, and assuming there will never be more than 999 subarrays, we can use this monstrosity:

WITH data AS (
SELECT array[array[1,2,3], array[2,15,32], array[5,16,14]] as arr)
SELECT array_agg(arr)
FROM (SELECT unnest(arr[1:999][1]) as arr from data) data2;

You can of course make the constant 999 larger if needed; it is just an arbitrary large number I threw in there.

The reason this is so complicated is that arr[1:999][1] on its own still returns a two-dimensional array, just with only the first elements, in this case {{1}, {2}, {5}}. With unnest() we can turn it into a set, which can then be fed into array_agg() via a subselect.
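
To see that intermediate step in isolation, here is a small sketch on the same sample data. The aliases are made up:

```sql
-- slice plus index: still two-dimensional
SELECT (ARRAY[[1,2,3],[2,15,32],[5,16,14]])[1:999][1] AS sliced;    -- {{1},{2},{5}}

-- two plain indexes: a single element, not an array
SELECT (ARRAY[[1,2,3],[2,15,32],[5,16,14]])[2][1]     AS one_elem;  -- 2
```

As soon as one dimension is written as a slice, Postgres treats the whole subscript as a slice, which is why the result keeps both dimensions.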

It would be nice to write array_agg(unnest(arr[1:999][1])) directly, but an aggregate function does not accept a set-returning function as its argument, and I don't know of a way to convert it on the fly.
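
One possible shortcut, offered as a sketch rather than as part of the original approach: an ARRAY constructor over a correlated subquery collects the unnested set without array_agg() and without a derived table. The alias first_elems is made up:

```sql
WITH data AS (
   SELECT ARRAY[[1,2,3],[2,15,32],[5,16,14]] AS arr
   )
SELECT ARRAY(SELECT unnest(arr[1:999][1])) AS first_elems  -- {1,2,5}
FROM   data;
```

This returns one result array per input row, so it also works when data holds more than one row.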

You can also use the actual array length, but it might cause unnecessary computation:

SELECT unnest(arr[1:array_length(arr, 1)][1]) as arr from data

Note

If the arrays could be unnested by one level, you could just index the arrays and then use array_agg() to convert the results back into an array with much simpler syntax:

WITH data AS
(SELECT array[1,2,3] as arr
UNION ALL SELECT array[2,15,32] as arr
UNION ALL SELECT array[5,16,14] as arr)
SELECT array_agg(arr[1]) from data;

The CTE is there just for input data, the actual meat is the array_agg(arr[1]). This will of course work for any number of input arrays.


