Is there something like a zip() function in PostgreSQL that combines two arrays?
Postgres 9.5 or later
has array_agg(array expression)
:
array_agg
(anyarray
) →anyarray
Concatenates all the input arrays into an array of one higher
dimension. (The inputs must all have the same dimensionality, and
cannot be empty or null.)
This is a drop-in replacement for my custom aggregate function array_agg_mult()
demonstrated below. It's implemented in C and considerably faster. Use it.
Postgres 9.4
Use the ROWS FROM
construct or the updated unnest()
which takes multiple arrays to unnest in parallel. Each can have a different length. You get (per documentation):
[...] the number of result rows in this case is that of the largest function
result, with smaller results padded with null values to match.
Use this cleaner and simpler variant:
SELECT ARRAY[a,b] AS ab
FROM unnest('{a,b,c}'::text[]
, '{d,e,f}'::text[]) x(a,b);
Postgres 9.3 or older
Simple zip()
Consider the following demo for Postgres 9.3 or earlier:
SELECT ARRAY[a,b] AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x;
Result:
ab
-------
{a,d}
{b,e}
{c,f}
Note that both arrays must have the same number of elements to unnest in parallel, or you get a cross join instead.
You can wrap this into a function, if you want to:
CREATE OR REPLACE FUNCTION zip(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT ARRAY[a,b] FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip('{a,b,c}'::text[],'{d,e,f}'::text[]);
Same result.
zip() to multi-dimensional array:
Now, if you want to aggregate that new set of arrays into one 2-dimenstional array, it gets more complicated.
SELECT ARRAY (SELECT ...)
or:
SELECT array_agg(ARRAY[a,b]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
,unnest('{d,e,f}'::text[]) AS b
) x
or:
SELECT array_agg(ARRAY[ARRAY[a,b]]) AS ab
FROM ...
will all result in the same error message (tested with pg 9.1.5):
ERROR: could not find array type for data type text[]
But there is a way around this, as we worked out under this closely related question.
Create a custom aggregate function:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
And use it like this:
SELECT array_agg_mult(ARRAY[ARRAY[a,b]]) AS ab
FROM (
SELECT unnest('{a,b,c}'::text[]) AS a
, unnest('{d,e,f}'::text[]) AS b
) x
Result:
{{a,d},{b,e},{c,f}}
Note the additional ARRAY[]
layer! Without it and just:
SELECT array_agg_mult(ARRAY[a,b]) AS ab
FROM ...
You get:
{a,d,b,e,c,f}
Which may be useful for other purposes.
Roll another function:
CREATE OR REPLACE FUNCTION zip2(anyarray, anyarray)
RETURNS SETOF anyarray LANGUAGE SQL AS
$func$
SELECT array_agg_mult(ARRAY[ARRAY[a,b]])
FROM (SELECT unnest($1) AS a, unnest($2) AS b) x;
$func$;
Call:
SELECT zip2('{a,b,c}'::text[],'{d,e,f}'::text[]); -- or any other array type
Result:
{{a,d},{b,e},{c,f}}
Unnest multiple arrays in parallel
You will love this new feature of Postgres 9.4:
unnest(anyarray, anyarray [, ...])
unnest()
with the much anticipated (at least by me) capability to unnest multiple arrays in parallel cleanly. The manual:
expand multiple arrays (possibly of different types) to a set of rows. This is only allowed in the FROM clause;
It's a special implementation of the new ROWS FROM
feature.
Your function can now just be:
CREATE OR REPLACE FUNCTION multi_unnest(_some_id int
, _amounts numeric[]
, _invoices text[])
RETURNS TABLE (some_id int, amount numeric, invoice text) AS
$func$
SELECT _some_id, u.* FROM unnest(_amounts, _invoices) u;
$func$ LANGUAGE sql;
Call:
SELECT * FROM multi_unnest(123, '{100, 40.5, 76}'::numeric[]
, '{01-2222-05,01-3333-04,01-4444-08}'::text[]);
Of course, the simple form can be replaced with plain SQL (no additional function):
SELECT 123 AS some_id, *
FROM unnest('{100, 40.5, 76}'::numeric[]
, '{01-2222-05,01-3333-04,01-4444-08}'::text[]) AS u(amount, invoice);
In earlier versions (Postgres 9.3-), you can use the less elegant and less safe form:
SELECT 123 AS some_id
, unnest('{100, 40.5, 76}'::numeric[]) AS amount
, unnest('{01-2222-05,01-3333-04,01-4444-08}'::text[]) AS invoice;
Caveats of the old shorthand form: besides being non-standard to have set-returning function in the SELECT
list, the number of rows returned would be the lowest common multiple of each arrays number of elements (with surprising results for unequal numbers). Details in these related answers:
- Parallel unnest() and sort order in PostgreSQL
- Is there something like a zip() function in PostgreSQL that combines two arrays?
This behavior has finally been sanitized with Postgres 10. Multiple set-returning functions in the SELECT
list produce rows in "lock-step" now. See:
- What is the expected behaviour for multiple set-returning functions in SELECT clause?
Joining arrays within group by clause
UNION ALL
You could "unpivot" with UNION ALL
first:
SELECT name, array_agg(c) AS c_arr
FROM (
SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
UNION ALL
SELECT name, id, 2, col2 FROM tbl
ORDER BY name, id, rnk
) sub
GROUP BY 1;
Adapted to produce the order of values you later requested. The manual:
The aggregate functions
array_agg
,json_agg
,string_agg
, andxmlagg
,
as well as similar user-defined aggregate functions, produce
meaningfully different result values depending on the order of the
input values. This ordering is unspecified by default, but can be
controlled by writing anORDER BY
clause within the aggregate call, as
shown in Section 4.2.7. Alternatively, supplying the input values from
a sorted subquery will usually work.
Bold emphasis mine.
LATERAL
subquery with VALUES
expression
LATERAL
requires Postgres 9.3 or later.
SELECT t.name, array_agg(c) AS c_arr
FROM (SELECT * FROM tbl ORDER BY name, id) t
CROSS JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
GROUP BY 1;
Same result. Only needs a single pass over the table.
Custom aggregate function
Or you could create a custom aggregate function like discussed in these related answers:
- Selecting data into a Postgres array
- Is there something like a zip() function in PostgreSQL that combines two arrays?
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
, STYPE = anyarray
, INITCOND = '{}'
);
Then you can:
SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Or, typically faster, while not standard SQL:
SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
FROM (SELECT * FROM tbl ORDER BY name, id) t
GROUP BY 1;
The added ORDER BY id
(which can be appended to such aggregate functions) guarantees your desired result:
a | {1,2,3,4}
b | {5,6,7,8}
Or you might be interested in this alternative:
SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Which produces 2-dimensional arrays:
a | {{1,2},{3,4}}
b | {{5,6},{7,8}}
The last one can be replaced (and should be, as it's faster!) with the built-in array_agg()
in Postgres 9.5 or later - with its added capability of aggregating arrays:
SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
FROM tbl
GROUP BY 1
ORDER BY 1;
Same result. The manual:
input arrays concatenated into array of one higher dimension (inputs
must all have same dimensionality, and cannot be empty or null)
So not exactly the same as our custom aggregate function array_agg_mult()
;
Writing a PostgreSQL aggregate function where the input values are arrays
You can write a pretty simple CREATE AGGREGATE
statement for this; see this similar prior post.
It's not very efficient, so PL/R will be a much better option. Or writing an aggregate in C if you're brave - the PostgreSQL array API in C is pretty ugly.
Aggregating all values not in the same group
In Postgres 11 or later, use a window function with a custom frame and a frame_exclusion
:
SELECT *, array_combine(values) OVER (ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW) AS agg_values
FROM tbl;
If name
is not UNIQUE
, and since you asked:
all values where name not = John Smith
SELECT *, array_combine(values) OVER (ORDER BY name
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING
EXCLUDE GROUP) AS agg_values
FROM tbl;
db<>fiddle here
The first one (also) works with arbitrary order of rows, only excluding the current one. The second requires ORDER BY
to establish which rows are in the same group.
The manual:
The
frame_exclusion
option allows rows around the current row to be
excluded from the frame, even if they would be included according to
the frame start and frame end options.EXCLUDE CURRENT ROW
excludes the current row from the frame.EXCLUDE GROUP
excludes the
current row and its ordering peers from the frame.EXCLUDE TIES
excludes any peers of the current row from the frame, but not the
current row itself. [...]
Bold emphasis mine.
This uses the custom aggregate function array_combine(anyarray)
provided by a_horse.
Or here:
- Selecting data into a Postgres array
- Is there something like a zip() function in PostgreSQL that combines two arrays?
Pairwise array sum aggregate function?
General solutions for any number of arrays with any number of elements. Individual elements or the the whole array can be NULL, too:
Simpler in 9.4+ using WITH ORDINALITY
SELECT ARRAY (
SELECT sum(elem)
FROM tbl t
, unnest(t.arr) WITH ORDINALITY x(elem, rn)
GROUP BY rn
ORDER BY rn
);
See:
- PostgreSQL unnest() with element number
Postgres 9.3+
This makes use of an implicit LATERAL JOIN
SELECT ARRAY (
SELECT sum(arr[rn])
FROM tbl t
, generate_subscripts(t.arr, 1) AS rn
GROUP BY rn
ORDER BY rn
);
See:
- What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Postgres 9.1
SELECT ARRAY (
SELECT sum(arr[rn])
FROM (
SELECT arr, generate_subscripts(arr, 1) AS rn
FROM tbl t
) sub
GROUP BY rn
ORDER BY rn
);
The same works in later versions, but set-returning functions in the SELECT
list are not standard SQL and were frowned upon by some. Should be OK since Postgres 10, though. See:
- What is the expected behaviour for multiple set-returning functions in SELECT clause?
db<>fiddle here
Old sqlfiddle
Related:
- Is there something like a zip() function in PostgreSQL that combines two arrays?
SQL multiple UNNEST in single select list
This isn't about unnest as such, but about PostgreSQL's very weird handling of multiple set-returning functions in the SELECT
list. Set-returning functions in SELECT
aren't part of the ANSI SQL standard.
You will find behaviour much saner with LATERAL
queries, which should be preferred over using a a set-returning function in FROM
as much as possible:
select a, b FROM unnest(array[1,2]) a, LATERAL unnest(array[1,2,3]) b;
e.g.
regress=> select a, b FROM unnest(array[1,2]) a, LATERAL unnest(array[1,2,3]) b;
a | b
---+---
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
(6 rows)
The only time I still use multiple set-returning functions in SELECT
is when I want to pair up values from functions that both return the same number of rows. The need for that will go away in 9.4, with multi-argument unnest
and with support for WITH ORDINALITY
.
Aggregate single array of distinct elements from array column, excluding NULL
Plain array_agg()
does this with arrays:
Concatenates all the input arrays into an array of one higher
dimension. (The inputs must all have the same dimensionality, and
cannot be empty or null.)
Not what you need. See:
- Is there something like a zip() function in PostgreSQL that combines two arrays?
You need something like this: unnest()
, process and sort elements an feed the resulting set to an ARRAY constructor:
SELECT ARRAY(
SELECT DISTINCT elem::date
FROM (SELECT unnest(date_array) FROM example) AS e(elem)
WHERE elem IS NOT NULL
ORDER BY elem DESC
);
db<>fiddle here
To be clear: we could use array_agg()
(taking non-array input, different from your incorrect use) instead of the final ARRAY constructor. But the latter is faster (and simpler, too, IMO).
They happen to be timestamps of just dates (eg without time or timezone)
So cast to date
and trim the noise.
Should be the fastest way:
- A correlated subquery is a bit faster than a
LATERAL
one (and does the simple job). - An ARRAY constructor is a bit faster than the aggregate function
array_agg()
(and does the simple job). - Most importantly, sorting and applying
DISTINCT
in a subquery is typically faster than inlineORDER BY
andDISTINCT
in an aggregate function (and does the simple job).
See:
Unnest arrays of different dimensions
How to select 1d array from 2d array?
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Performance comparison:
db<>fiddle here
Related Topics
How to Restore a Dump File from MySQLdump
Getting List of Tables, and Fields in Each, in a Database
How to Select an Entire Row Which Has the Largest Id in the Table
How to Make a Recursive SQL Query
SQL Query: Delete All Records from the Table Except Latest N
How to Run a SQL Query on an Excel Table
Select Info from Table Where Row Has Max Date
Fastest Way to Remove Non-Numeric Characters from a Varchar in SQL Server
SQL Server - How to Lock a Table Until a Stored Procedure Finishes
How to Update Identity Column in SQL Server
Convert Timestamp to Date in MySQL Query