Postgres - Function to Return the Intersection of 2 Arrays

Intersection of multiple arrays in PostgreSQL

The closest thing to an array intersection that I can think of is this:

select array_agg(e)
from (
select unnest(a1)
intersect
select unnest(a2)
) as dt(e)

This assumes that a1 and a2 are single dimension arrays with the same type of elements. You could wrap that up in a function something like this:

create function array_intersect(a1 int[], a2 int[]) returns int[] as $$
declare
ret int[];
begin
-- The reason for the kludgy NULL handling comes later.
if a1 is null then
return a2;
elseif a2 is null then
return a1;
end if;
select array_agg(e) into ret
from (
select unnest(a1)
intersect
select unnest(a2)
) as dt(e);
return ret;
end;
$$ language plpgsql;

Then you could do things like this:

=> select array_intersect(ARRAY[2,4,6,8,10], ARRAY[1,2,3,4,5,6,7,8,9,10]);
array_intersect
-----------------
{6,2,4,10,8}
(1 row)

Note that this doesn't guarantee any particular order in the returned array but you can fix that if you care about it. Then you could create your own aggregate function:

-- Pre-9.1
create aggregate array_intersect_agg(
sfunc = array_intersect,
basetype = int[],
stype = int[],
initcond = NULL
);

-- 9.1+ (AFAIK, I don't have 9.1 handy at the moment
-- see the comments below.
create aggregate array_intersect_agg(int[]) (
sfunc = array_intersect,
stype = int[]
);

And now we see why array_intersect does funny and somewhat kludgey things with NULLs. We need an initial value for the aggregation that behaves like the universal set and we can use NULL for that (yes, this smells a bit off but I can't think of anything better off the top of my head).

Once all this is in place, you can do things like this:

> select * from stuff;
a
---------
{1,2,3}
{1,2,3}
{3,4,5}
(3 rows)

> select array_intersect_agg(a) from stuff;
array_intersect_agg
---------------------
{3}
(1 row)

Not exactly simple or efficient but maybe a reasonable starting point and better than nothing at all.

Useful references:

  • array_agg
  • create aggregate
  • create function
  • PL/pgSQL
  • unnest

Postgres WHERE two arrays have a non-empty intersection

You can use &&, the array overlap operator:

select *
from foo
where tags && ARRAY['apples', 'bananas', 'cherries']

From the documentation:

&&: overlap (have elements in common)

Finding the intersection between two integer arrays in postgres

The documentation you send is for an extension. To use it you have to run CREATE EXTENSION intarray on your database so that those commands work. This will load that extension to the database and from then on you will be able to use it on all queries on that database.

You can read more about extensions here and how to load them here

PostgreSQL / TypeORM: search array in array column - return only the highest arrays' intersection

Your approach #3 should be fine, especially if you have an index on CARDINALITY(edge_stats.tags). However,

DataTeam populated EdgeStats table with all permutations of tags observed per given edge

If you're using a pre-aggregation approach instead of running your queries on the raw data, I would recommend to also record the "tags observed per given edge", in the Edges table.

That way, you can

SELECT s.edge_id, s.tags, s.metric_amount
FROM "EdgeStats" s
JOIN "Edges" e ON s.edge_id = e.id
WHERE s.tags = array_intersect(e.observed_tags, $1)

using the array_intersect function from here.

Intersect on two array_agg columns in the same row

Postgresql has LATERAL.

Which can be used to do something with the content of fields on record level.

create table mytable (day varchar(30), person varchar(1));
INSERT INTO mytable (day, person)
values
('Monday', 'A'),
('Monday', 'B'),
('Tuesday', 'A'),
('Thursday', 'B');
SELECT *
FROM (
select day as d1,
array_agg(distinct person) as agg1
from mytable
group by day) AS AA
cross join
(select day as d2,
array_agg(distinct person) as agg2
from mytable
group by day
) AS BB
CROSS JOIN LATERAL
(
SELECT COUNT(*) AS MatchingPersons
FROM
(
SELECT unnest(agg1) person
INTERSECT
SELECT unnest(agg2)
) q
) lat

d1 | agg1 | d2 | agg2 | matchingpersons
:------- | :---- | :------- | :---- | --------------:
Monday | {A,B} | Monday | {A,B} | 2
Thursday | {B} | Monday | {A,B} | 1
Tuesday | {A} | Monday | {A,B} | 1
Monday | {A,B} | Thursday | {B} | 1
Thursday | {B} | Thursday | {B} | 1
Tuesday | {A} | Thursday | {B} | 0
Monday | {A,B} | Tuesday | {A} | 1
Thursday | {B} | Tuesday | {A} | 0
Tuesday | {A} | Tuesday | {A} | 1

db<>fiddle here

Postgres array intersection queries performance

There is no simple an efficient way to do this. You'll have to unnest both arrays and intersect the result.

My recommendation is to avoid data models where you store references to other data in arrays. Instead, use a “junction table” to model m:n relationships. That is the natural way to do it, and your queries will become simple and probably more efficient. Also, you can define foreign key constraints that way, so that you cannot end up with inconsistent data.

Union of two arrays in PostgreSQL without unnesting

If your problem is to unnest twice this will unnest only once

select array_agg(a order by a)
from (
select distinct unnest(array[1,2,3] || array[2,3,4,5]) as a
) s;


Related Topics



Leave a reply



Submit