Why is SQLite taking forever on a simple count query?
I finally figured out the problem. Running a VACUUM command on the database solved the issue. I had run .dbinfo to confirm that the page_size multiplied by the number of pages added up to approximately the file size. That, combined with the fact that I hadn't deleted anything from the database (only inserted), led me to assume that I didn't need to vacuum (or de-fragment).
But it looks like the re-organization that vacuum does also makes a dramatic difference in the speed of the count queries (which now finish in milliseconds as I've seen reported elsewhere).
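For illustration, a minimal sketch with Python's sqlite3 module (the table and row count are made up) showing that VACUUM rewrites and re-organizes the database file without changing the data:

```python
import sqlite3

# Hypothetical reproduction: a table built by many inserts, then VACUUMed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO t (payload) VALUES (?)",
                 [("x" * 100,) for _ in range(10_000)])
conn.commit()

before = conn.execute("SELECT count(*) FROM t").fetchone()[0]

conn.execute("VACUUM")  # rewrites the whole database file compactly

after = conn.execute("SELECT count(*) FROM t").fetchone()[0]
assert before == after == 10_000  # VACUUM never changes the data
```

On a large on-disk database the re-organization can make the count's B-tree traversal touch far fewer scattered pages, which is where the speedup comes from.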
Count on join of big tables with conditions is slow
Your query, rewritten and 100% equivalent:
SELECT count(*)
FROM product_categories pc
JOIN customers c USING (organization_id)
JOIN total_sales ts ON ts.customer_id = c.id
JOIN performance_analyses pa ON pa.total_sales_id = ts.id
WHERE pc.organization_id = 3
AND c.active -- boolean can be used directly
AND c.visible
AND ts.product_category_id = pc.id
AND ts.period_id = 193
AND pa.size > 0;
Another answer advises moving all conditions into JOIN clauses and ordering the tables in the FROM list. That may apply to certain other RDBMSs with comparatively primitive query planners. While it doesn't hurt in Postgres either, it also has no effect on performance for your query - assuming the default server configuration. The manual:
Explicit inner join syntax (INNER JOIN, CROSS JOIN, or unadorned JOIN) is semantically the same as listing the input relations in FROM, so it does not constrain the join order.
Bold emphasis mine. There is more, read the manual.
The key setting is join_collapse_limit (default 8). The Postgres query planner will rearrange your 4 tables any way it expects to be fastest, no matter how you arranged your tables and whether you write conditions as WHERE or JOIN clauses. No difference whatsoever. (The same is not true for some other types of joins that cannot be rearranged freely.)
The important point is that these different join possibilities give
semantically equivalent results but might have hugely different
execution costs. Therefore, the planner will explore all of them to
try to find the most efficient query plan.
Related:
- Sample Query to show Cardinality estimation error in PostgreSQL
- A: Slow fulltext search due to wildly inaccurate row estimates
Finally, WHERE id IN (<subquery>) is not generally equivalent to a join. It does not multiply rows on the left side for duplicate matching values on the right side, and columns of the subquery are not visible to the rest of the query. A join can multiply rows with duplicate values, and its columns are visible.
Your simple subqueries dig up a single unique column in both cases, so there is no effective difference here - except that IN (<subquery>) is generally (at least a bit) slower and more verbose. Use joins.
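A tiny demonstration of the row-multiplication difference, sketched with Python's sqlite3 and hypothetical parent/child tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER);
    CREATE TABLE child  (parent_id INTEGER);
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (1), (1);   -- two matches for the same parent
""")

# A join multiplies rows on the left for duplicate matches on the right ...
join_count = conn.execute(
    "SELECT count(*) FROM parent p JOIN child c ON c.parent_id = p.id"
).fetchone()[0]

# ... while IN (<subquery>) only tests membership, so no multiplication.
in_count = conn.execute(
    "SELECT count(*) FROM parent WHERE id IN (SELECT parent_id FROM child)"
).fetchone()[0]

print(join_count, in_count)  # 2 1
```

When the subquery column is unique (as in the question), both forms return the same count and the difference is purely one of style and, usually, speed.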
Your query
Indexes
product_categories has 34 rows. Unless you plan on adding many more, indexes do not help performance for this table. A sequential scan will always be faster. Drop index_product_categories_on_organization_id.
customers has 6,970 rows. Indexes start to make sense. But your query uses 4,988 of them according to the EXPLAIN output. Only an index-only scan on an index much less wide than the table could help a bit. Assuming WHERE active AND visible are constant predicates, I suggest a partial multicolumn index:
CREATE INDEX index_customers_on_organization_id ON customers (organization_id, id)
WHERE active AND visible;
I appended id to allow index-only scans. The column is otherwise useless in the index for this query.
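The same partial-index idea can be illustrated in SQLite (which the other answers here use). This is a hypothetical sketch, with the boolean predicates written as `= 1` comparisons so SQLite's planner can match the index predicate to the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY, organization_id INTEGER,
    active INTEGER, visible INTEGER)""")

# Partial index: only rows satisfying the WHERE clause are indexed,
# keeping the index small (SQLite syntax; the Postgres DDL is analogous).
conn.execute("""CREATE INDEX index_customers_on_organization_id
    ON customers (organization_id, id)
    WHERE active = 1 AND visible = 1""")

plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT id FROM customers
    WHERE organization_id = 3 AND active = 1 AND visible = 1""").fetchall()
print(plan)  # should report a COVERING INDEX search on the partial index
```

Because the query only needs organization_id and id, both of which live in the index, the table itself is never read.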
total_sales has 7,104,441 rows. Indexes are very important. I suggest:
CREATE INDEX index_total_sales_on_product_category_customer_id
ON total_sales (period_id, product_category_id, customer_id, id);
Again, aiming for an index-only scan. This is the most important one.
You can delete the completely redundant index index_total_sales_on_product_category_id.
performance_analyses has 1,012,346 rows. Indexes are very important. I would suggest another partial index with the condition size > 0:
CREATE INDEX index_performance_analyses_on_status_id
ON performance_analyses (total_sales_id)
WHERE size > 0;  -- a table alias like "pa" is not valid in an index definition
However:
Rows Removed by Filter: 0
Seems like this condition serves no purpose? Are there any rows for which size > 0 is not true?
After creating these indexes, you need to ANALYZE the tables.
Tables statistics
Generally, I see many bad estimates. Postgres underestimates the number of rows returned at almost every step. The nested loops we see would work much better for fewer rows. Unless this is an unlikely coincidence, your table statistics are badly outdated. You need to visit your settings for autovacuum and probably also the per-table settings for your two big tables, performance_analyses and total_sales.
You already ran VACUUM and ANALYZE, which made the query slower, according to your comment. That doesn't make a lot of sense. I would run VACUUM FULL on these two tables once (if you can afford an exclusive lock). Else try pg_repack.
With all the fishy statistics and bad plans, I would consider running a complete vacuumdb -fz yourdb on your DB. That rewrites all tables and indexes in pristine condition, but it's no good to use on a regular basis. It's also expensive and will lock your DB for an extended period of time!
While you're at it, have a look at the cost settings of your DB as well.
Related:
- Keep PostgreSQL from sometimes choosing a bad query plan
- Postgres Slow Queries - Autovacuum frequency
Very slow queries in a large database even with indexes
SQLite might be choosing the wrong index. I suspect that level has only a small number of distinct values; such a column on its own is usually not a good choice for an index. Try making it a composite index:
CREATE INDEX users_level_index_login on users(level, login);
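A quick way to verify the fix, sketched here with a hypothetical users table, is EXPLAIN QUERY PLAN, which should report the composite index once both columns are constrained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (level INTEGER, login TEXT)")
conn.execute("CREATE INDEX users_level_index_login ON users(level, login)")

# With both filter columns in one composite index, SQLite can narrow the
# search with the selective login column instead of wading through the
# many rows that share the same low-cardinality level value.
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM users WHERE level = 3 AND login = 'alice'""").fetchall()
print(plan)
```

If the plan still shows a different index, dropping the old single-column level index usually settles it.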
Is count(*) constant time in SQLite, and if not what are alternatives?
SQLite has a special optimization for COUNT(*) without a WHERE clause, where it goes through the table's B-tree pages and counts entries without actually loading the records.
However, this still requires that all the table's data (except overflow pages for large records) is visited, so the runtime is still O(n).
SQLite does not store a separate record count in the database because that would make all changes slower.
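If you need a constant-time count anyway, one common workaround (not built into SQLite; sketched here with hypothetical table names) is to maintain the count yourself with triggers, accepting exactly the write-time cost SQLite avoids by design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT);

    -- Counter table kept in sync by triggers: every insert/delete on
    -- items pays a small extra update, but reading the count is O(1).
    CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER);
    INSERT INTO row_counts VALUES ('items', 0);

    CREATE TRIGGER items_ins AFTER INSERT ON items
        BEGIN UPDATE row_counts SET n = n + 1 WHERE tbl = 'items'; END;
    CREATE TRIGGER items_del AFTER DELETE ON items
        BEGIN UPDATE row_counts SET n = n - 1 WHERE tbl = 'items'; END;
""")

conn.executemany("INSERT INTO items (value) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE value = 'a'")
conn.commit()

# O(1) lookup instead of an O(n) COUNT(*)
n = conn.execute("SELECT n FROM row_counts WHERE tbl = 'items'").fetchone()[0]
assert n == conn.execute("SELECT count(*) FROM items").fetchone()[0] == 2
```

Note the triggers do not see rows removed by TRUNCATE-style deletes (`DELETE FROM items` fires per row in SQLite, so it is still covered, but bulk-loading tools that bypass SQL are not).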
Slow Sqlite Select behaviour when trying to select several rows
I think that I've found a solution! I've created an additional index:
create index if not exists ab_b_idx on t1(a, b);
After that, selecting both the a and b columns runs as fast as selecting only the a column.
I found the explanation of this behaviour in the documentation, under Covering indices:
If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.
Update: Search on 100 000 000 rows in table t1 took 550ms.
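The covering-index behaviour can be confirmed with EXPLAIN QUERY PLAN; a minimal sketch using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX ab_b_idx ON t1(a, b)")

# Both requested columns live in the index, so SQLite reports a
# COVERING INDEX search: the table rows themselves are never touched.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a, b FROM t1 WHERE a = 1"
).fetchall()
print(plan)
```

Selecting the unindexed column c as well would force a lookup back into the table for every matching row, which is the extra binary search the documentation describes.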