How to implement a many-to-many relationship in PostgreSQL?
The SQL DDL (data definition language) statements could look like this:
CREATE TABLE product (
product_id serial PRIMARY KEY -- implicit primary key constraint
, product text NOT NULL
, price numeric NOT NULL DEFAULT 0
);
CREATE TABLE bill (
bill_id serial PRIMARY KEY
, bill text NOT NULL
, billdate date NOT NULL DEFAULT CURRENT_DATE
);
CREATE TABLE bill_product (
bill_id int REFERENCES bill (bill_id) ON UPDATE CASCADE ON DELETE CASCADE
, product_id int REFERENCES product (product_id) ON UPDATE CASCADE
, amount numeric NOT NULL DEFAULT 1
, CONSTRAINT bill_product_pkey PRIMARY KEY (bill_id, product_id) -- explicit pk
);
I made a few adjustments:
The n:m relationship is normally implemented by a separate table -
bill_product
in this case.I added
serial
columns as surrogate primary keys. In Postgres 10 or later consider anIDENTITY
column instead. See:- Safely rename tables using serial primary key columns
- Auto increment table column
- https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/
I highly recommend that, because the name of a product is hardly unique (not a good "natural key"). Also, enforcing uniqueness and referencing the column in foreign keys is typically cheaper with a 4-byte
integer
(or even an 8-bytebigint
) than with a string stored astext
orvarchar
.Don't use names of basic data types like
date
as identifiers. While this is possible, it is bad style and leads to confusing errors and error messages. Use legal, lower case, unquoted identifiers. Never use reserved words and avoid double-quoted mixed case identifiers if you can."name" is not a good name. I renamed the column of the table
product
to beproduct
(orproduct_name
or similar). That is a better naming convention. Otherwise, when you join a couple of tables in a query - which you do a lot in a relational database - you end up with multiple columns named "name" and have to use column aliases to sort out the mess. That's not helpful. Another widespread anti-pattern would be just "id" as column name.
I am not sure what the name of abill
would be.bill_id
will probably suffice in this case.price
is of data typenumeric
to store fractional numbers precisely as entered (arbitrary precision type instead of floating point type). If you deal with whole numbers exclusively, make thatinteger
. For example, you could save prices as Cents.The
amount
("Products"
in your question) goes into the linking tablebill_product
and is of typenumeric
as well. Again,integer
if you deal with whole numbers exclusively.You see the foreign keys in
bill_product
? I created both to cascade changes:ON UPDATE CASCADE
. If aproduct_id
orbill_id
should change, the change is cascaded to all depending entries inbill_product
and nothing breaks. Those are just references without significance of their own.
I also usedON DELETE CASCADE
forbill_id
: If a bill gets deleted, its details die with it.
Not so for products: You don't want to delete a product that's used in a bill. Postgres will throw an error if you attempt this. You would add another column toproduct
to mark obsolete rows ("soft-delete") instead.All columns in this basic example end up to be
NOT NULL
, soNULL
values are not allowed. (Yes, all columns - primary key columns are definedUNIQUE NOT NULL
automatically.) That's becauseNULL
values wouldn't make sense in any of the columns. It makes a beginner's life easier. But you won't get away so easily, you need to understandNULL
handling anyway. Additional columns might allowNULL
values, functions and joins can introduceNULL
values in queries etc.Read the chapter on
CREATE TABLE
in the manual.Primary keys are implemented with a unique index on the key columns, that makes queries with conditions on the PK column(s) fast. However, the sequence of key columns is relevant in multicolumn keys. Since the PK on
bill_product
is on(bill_id, product_id)
in my example, you may want to add another index on justproduct_id
or(product_id, bill_id)
if you have queries looking for a givenproduct_id
and nobill_id
. See:- PostgreSQL composite primary key
- Is a composite index also good for queries on the first field?
- Working of indexes in PostgreSQL
Read the chapter on indexes in the manual.
What is the best way to implement a many-to-many relationship in PostgreSQL?
You can get what you want just inserting the Product Name
and the Order name
letting Postgres handle duplication. You elevate the indexes on them to unique constraints
(no hashing needed) then let the ON CONFLICT clause handle duplication. That is what select, if not found insert
logic is attempting anyway. The only issue becomes Postgres not returning the ID on duplicates. To overcome that the insert for Product_Orders retrieves the ids from the appropriate names. This can be done in a single statement with a couple CTEs but then the returning clause is necessary and whatever is returned is null where the name already exists. So it just gets ignored anyway. However a SQL function/procedure can have multiple statements, so (imho) 3 statements are clearer. (see example here)
create or replace
procedure gen_product_order(
myproduct varchar
, myorder varchar
)
language sql
as $$
insert into products(product_name)
values (myproduct)
on conflict (product_name) do nothing;
insert into orders (order_name)
values (myorder)
on conflict (order_name) do nothing;
insert into product_orders (product_id, order_id) -- insert both fk ids into the product_order table
select product_id, order_id
from (select product_id
from products
where product_name = myproduct
) prd
, (select order_id
from orders
where order_name = myorder
) ord
on conflict (product_id, order_id) do nothing;
$$;
Note: I've used a procedure rather that a function returning void but if you Postgres version does not support a function would work.
What the best way to model a many to many relationship
A proper model that allows everything you need while enforcing referential integrity could look like this:
CREATE TABLE contact (
contact_id serial PRIMARY KEY
, name text
, phone text
, ...
);
CREATE TABLE product (
product_id serial PRIMARY KEY
, ...
);
CREATE TABLE product_role (
role_id int PRIMARY KEY
, role text UNIQUE
);
CREATE TABLE product_contact (
product_id int REFERENCES product
, contact_id int REFERENCES contact
, role_id int REFERENCES product_role
, PRIMARY KEY (product_id, contact_id, role_id)
);
If the same contact can never act in more than one role for the same product, don't include the role in the PK:
, PRIMARY KEY (product_id, contact_id)
This allows to simply add a row to product_role
to allow and additional type of contact.
If there is only a hand full of distinct roles, the data type "char"
might be convenient for role_id
.
- Any downsides of using data type "text" for storing strings?
Basics:
- How to implement a many-to-many relationship in PostgreSQL?
Modeling Many-to-Many Relationships in Postgresql
Since you're using Postgres, you want to normalize your tables. What normalization means, in a sentence, is that your columns are represented by the key, the whole key, and nothing but the key. Normalization can be a three-step process, but I'm just going to show the final result.
Generally, in a database, table names are singular. User, Collection, Item. Also, it's less confusing to name your ID fields with the table name. It makes join SQL easier to read.
So let's start with the User table.
User
----
User ID
User Name
User Email
...
So far, so good. All of the columns have to do with a user.
Next, let's look at Collection
Collection
----------
Collection ID
Collection Name
Owner ID
So far, so good. The Owner ID is the User ID of the owner of the collection.
Next, let's look at Item.
Item
----
Item ID
Item Name
Item Description
...
So far, so good. All of the columns have to do with an item.
Now, let's look at the relationship between Collection and Item you described in problem 1. A Collection can have one or more items. An Item can be in one or more collections.
When you have a many-to-many relationship, you create a junction table. So let's create a CollectionItem junction table.
CollectionItem
--------------
CollectionItem ID
Collection ID
Item ID
CollectionItem timestamp
CollectionItem contributor ID
Where CollectionItem ID is the primary clustering key. You also have a unique index on (Collection ID, Item ID), so you can pull the collection together. You can also have a unique index on (Item ID, Collection ID), so you can see which items are in multiple collections.
You also have a unique index on (Collection ID, CollectionItem contributor ID). This allows you to see which contributor (user) contributed the item to the collection. This user ID could be the owner ID from the collections row or a different contributor.
Looking at problem 2, we need a Vote table. The Vote table is another junction table connecting a collection, item, and voter (user).
Vote
----
Vote ID
Collection ID
Item ID
Voter ID
Vote timestamp
Where Vote ID is the primary clustering key, and you have a unique index on (Collection ID, Item ID, Voter ID). You may also have two other unique indexes, depending on whether or not you want to group the votes by voter or item.
Edited to add:
You also need some kind of Permission table.
Permission
----------
Permission ID
Owner ID
Contributor ID
Where Permission ID is the primary clustering key and you have a unique index on (Owner ID, Contributor ID). You can also have a Permission Type, which you haven't defined, so I can't add it to the Permission table. The Permission Type would also be a part of the unique index.
I hope this explanation has been helpful.
PostgreSQL many-to-many relationship with some particular constrains
Create a unique constraint:
Alter table UserReaction
add constraint one_reaction_per_customer_per_post
unique ( customer_id, post_id);
Postgresql: One-To-Many Relationship With Multiple Conditions On The Many Side
Does this solve your problem? I count all occurrences for your expected courses and the sum must be 2
.
demo:db<>fiddle
SELECT
s.id
FROM students s
JOIN courses c
ON c.student_id = s.id
GROUP BY s.id
HAVING SUM(
((c.course_name = 'math' AND c.credits = 5)
OR
(c.course_name = 'history' AND c.credits = 3))::int
) = 2
Alternative to the SUM(condition::int)
you could use the COUNT()
with a FILTER
clause:
demo:db<>fiddle
HAVING COUNT(*) FILTER (WHERE
(c.course_name = 'math' AND c.credits = 5)
OR
(c.course_name = 'history' AND c.credits = 3)
) = 2
Related Topics
"193: %1 Is Not a Valid Win32 Application" Bug With a New Rails Application
How to Convert Datetime to Varchar
Set Versus Select When Assigning Variables
How Important Is the Order of Columns in Indexes
What's Faster, Select Distinct or Group by in MySQL
MySQL - Conditional Foreign Key Constraints
What's the Difference Between Inner Join, Left Join, Right Join and Full Join
Get Top N Records For Each Group of Grouped Results
How to Concatenate Multiple MySQL Rows into One Field
Group by Column and Multiple Rows into One Row Multiple Columns
Storing Json in Database Vs. Having a New Column For Each Key