Indexing Boolean Fields

Is there any performance gain in indexing a boolean field?

Not really. You should think about it like a book. If there were only 3 kinds of words in a book and you index all of them, you would have the same number of index pages as normal pages.

There would be a performance gain if there are relatively few records of one value. For example, if you have 1000 records and 10 of them are TRUE, then it would be useful if you searching with isok = 1

As Michael Durrant mentioned, it also makes writes slower.

EDIT: Possible duplication: Indexing boolean fields

Here it explains that even if you have an index, if you have too many records it doesn't use the index anyways.
MySQL not using index when checking = 1 , but using it with = 0

Indexing boolean fields

No.

You index fields that are searched upon and have high selectivity/cardinality. A boolean field's cardinality is obliterated in nearly any table. If anything it will make your writes slower (by an oh so tiny amount).

Maybe you would make it the first field in the clustered index if every query took into account soft deletes?

How necessary is it to INDEX boolean fields

There's not enough information in the question to give a definitive answer.

The most general answer: there's no benefit to be gained by adding an index on just that one boolean column.

That's because with only two values, if those values are distributed evenly, the index isn't going to improve performance. The index would only be useful if you are always looking for rows that have the value that occurs most infrequently, like less than 5% or 10% of the rows. In that special case, it might improve performance.

I strongly suspect that's not the only predicate in your query. Likely, there are other predicates, on other columns. And including that boolean column as an extra column in another index that will be used by your query, that might actually be of some benefit.

There's some overhead to creating and maintaining an index (it isn't free), so that has to be weighed against the benefit of having an index. Adding an extra column to an existing index would be lot less expensive.


The most appropriate indexes are going to depend on the tables you are querying, and the actual SQL statements you are executing.

Is it safe to index boolean fields in Google Datastore/Firestore?

There is nothing specified in the documentation regarding the use of boolean values. The only problem that can occur is when you are using those monotonically increasing fields. Why? Because that approach doesn't scale and might create hotspots at some point in time. So regarding boolean values, yes, it's safe to create indexes. There is nothing you should worry about.

Is there a need for a boolean field index in elasticsearch

If you want to search against a field you have to index it. By default a boolean field is indexed, and will take a small amount of space to do so. There will be a list of docs where "myfield": true and "myfield": false.

If you didn't want to maintain this index, then when you wanted to find docs where "myfield": true you would have to through every doc to check the field.

If you don't want to search/filter with that field, by all means set "index": "no", just be warned you will need to re-index everything if you change your mind about this field in the future!

Have a look at the elasticsearch docs on mappings; the core types section, scroll down to the boolean type.

Adding an index on a boolean field

No, you can index a boolean field if you'll be filtering by it. That's a perfectly reasonable thing to do, although as with all indexes, PostgreSQL may choose to ignore it if it won't exclude enough of the table -- an index scan plus a ton of row fetches may be more expensive than a sequential scan -- which may or may not affect you depending on the values in that column.

You should also be aware that PostgreSQL lets you put conditions on indexes, which I often find useful with boolean fields. (See Partial Indexes for details.) If you'll be commonly filtering or sorting within that scope, you might be best served by something like CREATE INDEX ... ON table (some_field) WHERE boolean_field.

Performance of query on indexed Boolean column vs Datetime column

Here is a MariaDB (10.0.19) benchmark with 10M rows (using the sequence plugin):

drop table if exists test;
CREATE TABLE `test` (
`id` MEDIUMINT UNSIGNED NOT NULL,
`is_active` TINYINT UNSIGNED NOT NULL,
`deleted_at` TIMESTAMP NULL,
PRIMARY KEY (`id`),
INDEX `is_active` (`is_active`),
INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
select seq id
, rand(1)<0.5 as is_active
, case when rand(1)<0.5
then null
else '2017-03-18' - interval floor(rand(2)*1000000) second
end as deleted_at
from seq_1_to_10000000;

To measure the time I use set profiling=1 and run show profile after executing a query. From the profiling result I take the value of Sending data since everything else is altogether less than one msec.

TINYINT index:

SELECT COUNT(*) FROM test WHERE is_active = 1;

Runtime: ~ 738 msec

TIMESTAMP index:

SELECT COUNT(*) FROM test WHERE  deleted_at is null;

Runtime: ~ 748 msec

Index size:

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats
where database_name = 'tmp'
and table_name = 'test'
and stat_name = 'size'

Result:

database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp | test | PRIMARY | 275513344
tmp | test | deleted_at | 170639360
tmp | test | is_active | 97107968

Note that while TIMESTAMP (4 Bytes) is 4 times as long as TYNYINT (1 Byte), the index size is not even twice as large. But the index size can be significant if it doesn't fit into memory. So when i change innodb_buffer_pool_size from 1G to 50M i get the following numbers:

  • TINYINT: ~ 960 msec
  • TIMESTAMP: ~ 1500 msec

Update

To address the question more directly I did some changes to the data:

  • Instead of TIMESTAMP I use DATETIME
  • Since entries are usually rarely deleted I use rand(1)<0.99 (1% deleted) instead of rand(1)<0.5 (50% deleted)
  • Table size changed from 10M to 1M rows.
  • SELECT COUNT(*) changed to SELECT *

Index size:

index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY | 25739264
deleted_at | 12075008
is_active | 11026432

Since 99% of deleted_at values are NULL there is no significant difference in index size, though a non empty DATETIME requires 8 Bytes (MariaDB).

SELECT * FROM test WHERE is_active = 1;      -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec

Dropping both indexes both queries execute in about 350 msec. And dropping the is_active column the deleted_at is null query executes in 280 msec.

Note that this is still not a realistic scenario. You will unlikely want to select 990K rows out of 1M and deliver it to the user. You will probably also have more columns (maybe including text) in the table. But it shows, that you probably don't need the is_active column (if it doesn't add additional information), and that any index is in best case useless for selecting non deleted entries.

However an index can be usefull to select deleted rows:

SELECT * FROM test WHERE is_active = 0;

Executes in 10 msec with index and in 170 msec without index.

SELECT * FROM test WHERE deleted_at is not null;

Executes in 11 msec with index and in 167 msec without index.

Dropping the is_active column it executes in 4 msec with index and in 150 msec without index.

So if this scenario somehow fits your data the conclusion would be: Drop the is_active column and don't create an index on deleted_at column if you are rarely selecting deleted entries. Or adjust the benchmark to your needs and make your own conclusion.

Azure Cognitive Search with boolean field

Boolean fields are not full-text searchable. You can use filters to match on them:

GET /indexes('myindex')/docs?$filter=MyBooleanField eq true&api-version=2020-06-30

Using the .NET SDK, you can set the Filter property on SearchOptions to an OData filter expression, in this case "MyBooleanField eq true".



Related Topics



Leave a reply



Submit