Is there any performance gain in indexing a boolean field?
Not really. You should think about it like a book. If there were only 3 kinds of words in a book and you index all of them, you would have the same number of index pages as normal pages.
There would be a performance gain if there are relatively few records of one value. For example, if you have 1000 records and 10 of them are TRUE, then it would be useful if you searching with isok = 1
As Michael Durrant mentioned, it also makes writes slower.
EDIT: Possible duplication: Indexing boolean fields
Here it explains that even if you have an index, if you have too many records it doesn't use the index anyways.
MySQL not using index when checking = 1 , but using it with = 0
Indexing boolean fields
No.
You index fields that are searched upon and have high selectivity/cardinality. A boolean field's cardinality is obliterated in nearly any table. If anything it will make your writes slower (by an oh so tiny amount).
Maybe you would make it the first field in the clustered index if every query took into account soft deletes?
How necessary is it to INDEX boolean fields
There's not enough information in the question to give a definitive answer.
The most general answer: there's no benefit to be gained by adding an index on just that one boolean column.
That's because with only two values, if those values are distributed evenly, the index isn't going to improve performance. The index would only be useful if you are always looking for rows that have the value that occurs most infrequently, like less than 5% or 10% of the rows. In that special case, it might improve performance.
I strongly suspect that's not the only predicate in your query. Likely, there are other predicates, on other columns. And including that boolean column as an extra column in another index that will be used by your query, that might actually be of some benefit.
There's some overhead to creating and maintaining an index (it isn't free), so that has to be weighed against the benefit of having an index. Adding an extra column to an existing index would be lot less expensive.
The most appropriate indexes are going to depend on the tables you are querying, and the actual SQL statements you are executing.
Is it safe to index boolean fields in Google Datastore/Firestore?
There is nothing specified in the documentation regarding the use of boolean values. The only problem that can occur is when you are using those monotonically increasing fields. Why? Because that approach doesn't scale and might create hotspots at some point in time. So regarding boolean values, yes, it's safe to create indexes. There is nothing you should worry about.
Is there a need for a boolean field index in elasticsearch
If you want to search against a field you have to index it. By default a boolean
field is indexed, and will take a small amount of space to do so. There will be a list of docs where "myfield": true
and "myfield": false
.
If you didn't want to maintain this index, then when you wanted to find docs where "myfield": true
you would have to through every doc to check the field.
If you don't want to search/filter with that field, by all means set "index": "no"
, just be warned you will need to re-index everything if you change your mind about this field in the future!
Have a look at the elasticsearch docs on mappings; the core types section, scroll down to the boolean type.
Adding an index on a boolean field
No, you can index a boolean field if you'll be filtering by it. That's a perfectly reasonable thing to do, although as with all indexes, PostgreSQL may choose to ignore it if it won't exclude enough of the table -- an index scan plus a ton of row fetches may be more expensive than a sequential scan -- which may or may not affect you depending on the values in that column.
You should also be aware that PostgreSQL lets you put conditions on indexes, which I often find useful with boolean fields. (See Partial Indexes for details.) If you'll be commonly filtering or sorting within that scope, you might be best served by something like CREATE INDEX ... ON table (some_field) WHERE boolean_field
.
Performance of query on indexed Boolean column vs Datetime column
Here is a MariaDB (10.0.19) benchmark with 10M rows (using the sequence plugin):
drop table if exists test;
CREATE TABLE `test` (
`id` MEDIUMINT UNSIGNED NOT NULL,
`is_active` TINYINT UNSIGNED NOT NULL,
`deleted_at` TIMESTAMP NULL,
PRIMARY KEY (`id`),
INDEX `is_active` (`is_active`),
INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
select seq id
, rand(1)<0.5 as is_active
, case when rand(1)<0.5
then null
else '2017-03-18' - interval floor(rand(2)*1000000) second
end as deleted_at
from seq_1_to_10000000;
To measure the time I use set profiling=1
and run show profile
after executing a query. From the profiling result I take the value of Sending data
since everything else is altogether less than one msec.
TINYINT index:
SELECT COUNT(*) FROM test WHERE is_active = 1;
Runtime: ~ 738 msec
TIMESTAMP index:
SELECT COUNT(*) FROM test WHERE deleted_at is null;
Runtime: ~ 748 msec
Index size:
select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats
where database_name = 'tmp'
and table_name = 'test'
and stat_name = 'size'
Result:
database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp | test | PRIMARY | 275513344
tmp | test | deleted_at | 170639360
tmp | test | is_active | 97107968
Note that while TIMESTAMP (4 Bytes) is 4 times as long as TYNYINT (1 Byte), the index size is not even twice as large. But the index size can be significant if it doesn't fit into memory. So when i change innodb_buffer_pool_size
from 1G
to 50M
i get the following numbers:
- TINYINT: ~ 960 msec
- TIMESTAMP: ~ 1500 msec
Update
To address the question more directly I did some changes to the data:
- Instead of TIMESTAMP I use DATETIME
- Since entries are usually rarely deleted I use
rand(1)<0.99
(1% deleted) instead ofrand(1)<0.5
(50% deleted) - Table size changed from 10M to 1M rows.
SELECT COUNT(*)
changed toSELECT *
Index size:
index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY | 25739264
deleted_at | 12075008
is_active | 11026432
Since 99% of deleted_at
values are NULL there is no significant difference in index size, though a non empty DATETIME requires 8 Bytes (MariaDB).
SELECT * FROM test WHERE is_active = 1; -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec
Dropping both indexes both queries execute in about 350 msec. And dropping the is_active
column the deleted_at is null
query executes in 280 msec.
Note that this is still not a realistic scenario. You will unlikely want to select 990K rows out of 1M and deliver it to the user. You will probably also have more columns (maybe including text) in the table. But it shows, that you probably don't need the is_active
column (if it doesn't add additional information), and that any index is in best case useless for selecting non deleted entries.
However an index can be usefull to select deleted rows:
SELECT * FROM test WHERE is_active = 0;
Executes in 10 msec with index and in 170 msec without index.
SELECT * FROM test WHERE deleted_at is not null;
Executes in 11 msec with index and in 167 msec without index.
Dropping the is_active
column it executes in 4 msec with index and in 150 msec without index.
So if this scenario somehow fits your data the conclusion would be: Drop the is_active
column and don't create an index on deleted_at
column if you are rarely selecting deleted entries. Or adjust the benchmark to your needs and make your own conclusion.
Azure Cognitive Search with boolean field
Boolean fields are not full-text searchable. You can use filters to match on them:
GET /indexes('myindex')/docs?$filter=MyBooleanField eq true&api-version=2020-06-30
Using the .NET SDK, you can set the Filter property on SearchOptions
to an OData filter expression, in this case "MyBooleanField eq true"
.
Related Topics
Oracle - Why Does the Leading Zero of a Number Disappear When Converting It To_Char
How to Compare Data Between Two Table in Different Databases Using SQL Server 2008
Are There Any Free Tools to Generate 'Insert Into' Scripts in Ms SQL Server
How to Pull a List of Id's from a SQL Table as a Comma-Separated Values String
How to Store a String Var Greater Than Varchar(Max)
Pl/SQL Performance Tuning for Like '%...%' Wildcard Queries
Difference Between N'String' VS U'String' Literals in Oracle
Hibernate SQL Transformation Fails for Enum Field Type
Gather_Plan_Statistics Does Does Not Generate Basic Plan Statistics
Sane/Fast Method to Pass Variable Parameter Lists to SQLserver2008 Stored Procedure
Finding Simultaneous Events in a Database Between Times
Find Which Rows Have Different Values for a Given Column in Teradata SQL
How to Schedule a Stored Procedure
How to Write Union in Doctrine 2.0