Partitioning Postgres table
For a many-to-many relationship you will need a mapping table, partitioning or not. I wouldn't use an artificial primary key for the mapping table, but the combination of `id_doctor` and `id_patient` (they are artificial anyway). The same holds for the `appointment` table.
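A minimal sketch of such a mapping table, assuming hypothetical `doctor` and `patient` tables keyed by `id_doctor` / `id_patient` (names taken from the question, not from a real schema):

```sql
-- Hypothetical mapping table: the composite key doubles as the primary key,
-- so no surrogate id is needed.
CREATE TABLE doctor_patient (
  id_doctor  int REFERENCES doctor  (id_doctor)  ON UPDATE CASCADE ON DELETE CASCADE
, id_patient int REFERENCES patient (id_patient) ON UPDATE CASCADE ON DELETE CASCADE
, PRIMARY KEY (id_doctor, id_patient)
);
```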
Since `id_doctor` is not part of the `patient` table (and shouldn't be), you cannot partition the `patient` table per doctor. Why would you want to do that? Partitioning is mostly useful for mass deletions (and to some extent for speeding up sequential scans). Is that your objective?
There is a wide-spread assumption that bigger tables should be partitioned just because they are big, but that is not the case. Index access to a partitioned table is, if anything, slightly slower than index access to a non-partitioned table. Do you have billions of patients?
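To illustrate the mass-deletion use case, a sketch with an invented `visit` table (declarative range partitioning, Postgres 10+):

```sql
-- Hypothetical example: dropping a whole partition is far cheaper than
-- DELETE + VACUUM on one big table.
CREATE TABLE visit (
  visit_date date NOT NULL
, id_patient int  NOT NULL
) PARTITION BY RANGE (visit_date);

CREATE TABLE visit_2023 PARTITION OF visit
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Retire a whole year of data instantly:
DROP TABLE visit_2023;
```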
postgres bad estimates from partitioned table
The key problem is the same in both queries: the huge underestimate of the join result. In the partitioned case, PostgreSQL materializes intermediate results, which are much bigger than expected and cause temporary files to be written. So increasing `work_mem` should speed up that case.
Join row counts are hard to estimate correctly, so it will be difficult to cure the problem at the root. You can fight the symptoms though:

- Create indexes on `"Z_PROD_STD"`, `"Z_ASTD"` and `"CT"` that include additional columns, so that you get a much faster index-only scan. For example, to speed up the index scan on `"CT"` that is repeated 2 million times, you could create an index like `CREATE INDEX ON "CT" ("DATE_SK", "YID");` and then `VACUUM "CT";`.
- Alternatively, set `enable_nestloop` to `off` for the duration of the query to reduce the impact of the bad estimate.
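The second workaround can be scoped to a single transaction with `SET LOCAL`, so other queries in the session keep the default plan (a sketch; the `SELECT` is a stand-in for the real query):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT / ROLLBACK
SELECT ...;                       -- run the affected query here
COMMIT;
```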
How to implement a many-to-many relationship in PostgreSQL?
The SQL DDL (data definition language) statements could look like this:
CREATE TABLE product (
product_id serial PRIMARY KEY -- implicit primary key constraint
, product text NOT NULL
, price numeric NOT NULL DEFAULT 0
);
CREATE TABLE bill (
bill_id serial PRIMARY KEY
, bill text NOT NULL
, billdate date NOT NULL DEFAULT CURRENT_DATE
);
CREATE TABLE bill_product (
bill_id int REFERENCES bill (bill_id) ON UPDATE CASCADE ON DELETE CASCADE
, product_id int REFERENCES product (product_id) ON UPDATE CASCADE
, amount numeric NOT NULL DEFAULT 1
, CONSTRAINT bill_product_pkey PRIMARY KEY (bill_id, product_id) -- explicit pk
);
I made a few adjustments:
- The n:m relationship is normally implemented by a separate table - `bill_product` in this case.
- I added `serial` columns as surrogate primary keys. In Postgres 10 or later consider an `IDENTITY` column instead. See:
  - Safely rename tables using serial primary key columns
  - Auto increment table column
  - https://www.2ndquadrant.com/en/blog/postgresql-10-identity-columns/

  I highly recommend that, because the name of a product is hardly unique (not a good "natural key"). Also, enforcing uniqueness and referencing the column in foreign keys is typically cheaper with a 4-byte `integer` (or even an 8-byte `bigint`) than with a string stored as `text` or `varchar`.
- Don't use names of basic data types like `date` as identifiers. While this is possible, it is bad style and leads to confusing errors and error messages. Use legal, lower case, unquoted identifiers. Never use reserved words and avoid double-quoted mixed-case identifiers if you can.
- "name" is not a good name. I renamed the column of the table `product` to `product` (or `product_name` or similar). That is a better naming convention. Otherwise, when you join a couple of tables in a query - which you do a lot in a relational database - you end up with multiple columns named "name" and have to use column aliases to sort out the mess. That's not helpful. Another widespread anti-pattern would be just "id" as column name. I am not sure what the name of a `bill` would be. `bill_id` will probably suffice in this case.
- `price` is of data type `numeric` to store fractional numbers precisely as entered (arbitrary precision type instead of floating point type). If you deal with whole numbers exclusively, make that `integer`. For example, you could save prices as cents.
- The `amount` (`"Products"` in your question) goes into the linking table `bill_product` and is of type `numeric` as well. Again, `integer` if you deal with whole numbers exclusively.
- You see the foreign keys in `bill_product`? I created both to cascade changes: `ON UPDATE CASCADE`. If a `product_id` or `bill_id` should change, the change is cascaded to all depending entries in `bill_product` and nothing breaks. Those are just references without significance of their own.

  I also used `ON DELETE CASCADE` for `bill_id`: if a bill gets deleted, its details die with it.

  Not so for products: you don't want to delete a product that's used in a bill. Postgres will throw an error if you attempt this. You would add another column to `product` to mark obsolete rows ("soft-delete") instead.
- All columns in this basic example end up `NOT NULL`, so `NULL` values are not allowed. (Yes, all columns - primary key columns are defined `UNIQUE NOT NULL` automatically.) That's because `NULL` values wouldn't make sense in any of the columns. It makes a beginner's life easier. But you won't get away so easily, you need to understand `NULL` handling anyway. Additional columns might allow `NULL` values, and functions and joins can introduce `NULL` values in queries, etc.
- Read the chapter on `CREATE TABLE` in the manual.
- Primary keys are implemented with a unique index on the key columns, which makes queries with conditions on the PK column(s) fast. However, the sequence of key columns is relevant in multicolumn keys. Since the PK on `bill_product` is on `(bill_id, product_id)` in my example, you may want to add another index on just `product_id` or `(product_id, bill_id)` if you have queries looking for a given `product_id` and no `bill_id`. See:
  - PostgreSQL composite primary key
  - Is a composite index also good for queries on the first field?
  - Working of indexes in PostgreSQL
- Read the chapter on indexes in the manual.
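A quick usage sketch against the schema above (the sample values are invented, and the hard-coded ids assume fresh `serial` sequences starting at 1):

```sql
INSERT INTO product (product, price) VALUES ('apple', 0.99), ('banana', 0.50);
INSERT INTO bill (bill) VALUES ('morning sale');

-- Link products to the bill through the mapping table:
INSERT INTO bill_product (bill_id, product_id, amount) VALUES (1, 1, 3), (1, 2, 6);

-- Typical report: join all three tables.
SELECT b.bill, p.product, bp.amount, bp.amount * p.price AS line_total
FROM   bill b
JOIN   bill_product bp USING (bill_id)
JOIN   product p USING (product_id);
```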
Partitioned table query still scanning all partitions
For non-trivial expressions, you have to repeat the condition more or less verbatim in queries to make the Postgres query planner understand it can rely on the `CHECK` constraint. Even if it seems redundant!
Per documentation:
With constraint exclusion enabled, the planner will examine the constraints of each partition and **try to prove that the partition need not be scanned** because it could not contain any rows meeting the query's `WHERE` clause. When the planner can prove this, it excludes the partition from the query plan.
Bold emphasis mine. The planner does not understand complex expressions.
Of course, this has to be met, too:
Ensure that the `constraint_exclusion` configuration parameter is not disabled in `postgresql.conf`. If it is, queries will not be optimized as desired.
Instead of
SELECT * FROM foo WHERE (id = 2);
Try:
SELECT * FROM foo WHERE id % 30 = 2 AND id = 2;
And:
The default (and recommended) setting of `constraint_exclusion` is actually neither `on` nor `off`, but an intermediate setting called `partition`, which causes the technique to be applied only to queries that are likely to be working on partitioned tables. The `on` setting causes the planner to examine `CHECK` constraints in all queries, even simple ones that are unlikely to benefit.
You can experiment with `constraint_exclusion = on` to see if the planner catches on without the redundant verbatim condition. But you have to weigh the cost and benefit of this setting.
The alternative would be simpler conditions for your partitions, as already outlined by @harmic.
And no, increasing the number for `STATISTICS` will not help in this case. Only the `CHECK` constraints and your `WHERE` conditions in the query matter.
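Putting it together, a sketch using inheritance-based partitioning with the `% 30` scheme from the example above (it assumes a parent table `foo (id int)`; the child table name is invented):

```sql
-- Child table whose CHECK constraint encodes the partition rule:
CREATE TABLE foo_2 (
  CHECK (id % 30 = 2)
) INHERITS (foo);

SET constraint_exclusion = on;  -- session-level experiment

-- The redundant "id % 30 = 2" matches the CHECK constraint verbatim,
-- so the planner can exclude all other partitions:
EXPLAIN SELECT * FROM foo WHERE id % 30 = 2 AND id = 2;
```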
PostgreSQL - Backup and Restore Database Tables with Partitions
If you used `zip` to compress the output, then you should use `unzip` to uncompress it, not `gunzip`; they use different formats/algorithms.
I'd suggest using `gzip` and `gunzip` only. For instance, if you generated a backup named `mybackup.sql`, you can gzip it with:
gzip mybackup.sql
It will generate a file named `mybackup.sql.gz`. Then, to restore, you can use:
gunzip -c mybackup.sql.gz | psql -U postgres
Also, I'd suggest avoiding pgAdmin for the dump. Not that it can't do it; it's just that you can't automate it easily. You can use `pg_dumpall` the same way:
pg_dumpall -U postgres -f mybackup.sql
You can also dump and compress without intermediate files using a pipe:
pg_dumpall -U postgres | gzip -c > mybackup.sql.gz
BTW, I'd really suggest avoiding `pg_dumpall` and using `pg_dump` with the custom format for each database, as with that you already get the result compressed and it's easier to use later. But `pg_dumpall` is OK for small databases.
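For the custom-format route, a sketch (the database name `mydb` is a placeholder; these commands need a running PostgreSQL server):

```shell
# Dump one database in custom format (-Fc): already compressed,
# and pg_restore can later pick individual tables from it.
pg_dump -U postgres -Fc -f mybackup.dump mydb

# Restore with pg_restore instead of psql:
pg_restore -U postgres -d mydb mybackup.dump
```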
Select first row in each GROUP BY group?
On databases that support CTE and windowing functions:
WITH summary AS (
SELECT p.id,
p.customer,
p.total,
ROW_NUMBER() OVER(PARTITION BY p.customer
ORDER BY p.total DESC) AS rank
FROM PURCHASES p)
SELECT *
FROM summary
WHERE rank = 1
Supported by any database:
But you need to add logic to break ties:
SELECT MIN(x.id), -- change to MAX if you want the highest
x.customer,
x.total
FROM PURCHASES x
JOIN (SELECT p.customer,
MAX(total) AS max_total
FROM PURCHASES p
GROUP BY p.customer) y ON y.customer = x.customer
AND y.max_total = x.total
GROUP BY x.customer, x.total
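In PostgreSQL specifically, `DISTINCT ON` is a concise alternative (assuming the same `id`, `customer`, `total` columns as above):

```sql
-- One row per customer: the first row in each ORDER BY group wins.
SELECT DISTINCT ON (customer)
       id, customer, total
FROM   purchases
ORDER  BY customer, total DESC, id;  -- trailing id breaks ties deterministically
```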