Optimize Between Date Statement

Optimize BETWEEN date statement

The query executes in less than one second. The other 6+ seconds are spent on traffic between server and client.
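If network round trips really dominate, increasing the client's fetch size helps. A minimal sketch, assuming SQL*Plus as the client (the table and column names are placeholders for the original query):

--Larger arraysize means fewer network round trips per result set.
set timing on
set arraysize 500
set autotrace traceonly statistics

--Placeholder query; substitute the real one.
select *
from my_table
where evt_date between date '2019-01-01' and date '2019-01-02';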

Optimize Oracle Between Date Statement

Results

If your index is cached, a function-based index performs best. If your index is not cached, a compressed function-based index performs best.

Below are the relative times generated by my test code. Lower is better. You cannot compare the numbers between cached and non-cached; they are totally different tests.

                 In cache    Not in cache
Regular             120          139
FBI                 100          138
Compressed FBI      126          100

I'm not sure why the FBI performs better than the regular index. (Although it's probably related to what you said about equality predicates versus range. You can see that the regular index has an extra "FILTER" step in its explain plan.) The compressed FBI has some additional overhead to uncompress the blocks. This small amount of extra CPU time is relevant when everything is already in memory, and CPU waits are most important. But when nothing is cached, and IO is more important, the reduced space of the compressed FBI helps a lot.
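You can compare the plans yourself with DBMS_XPLAN; for example, for the regular-index query from the test code below:

explain plan for
select count(*)
from my_table1 t
where t.evt_end between trunc(sysdate) and trunc(sysdate) + 86399/86400;

select * from table(dbms_xplan.display);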

Assumptions

There seems to be a lot of confusion about this question. The way I read it, you only care about this one specific query, and you want to know whether a function-based index or a regular index will be faster.

I assume you do not care about other queries that may benefit from this index, the additional time spent maintaining the index, whether developers remember to use it, or whether the optimizer chooses the index. (If the optimizer doesn't choose the index, which I think is unlikely, you can add a hint.) Let me know if any of these assumptions are wrong.
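For reference, a hint like this forces the function-based index from the test code below (a sketch; adjust the names to your schema):

select /*+ index(t my_table2_idx) */ count(*)
from my_table2 t
where trunc(t.evt_end) = trunc(sysdate);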

Code

--Create tables. 1 = regular, 2 = FBI, 3 = Compressed FBI
create table my_table1(evt_end date, type number) nologging;
create table my_table2(evt_end date, type number) nologging;
create table my_table3(evt_end date, type number) nologging;

--Create 1K days, each with 100K values
begin
    for i in 1 .. 1000 loop
        insert /*+ append */ into my_table1
        select sysdate + i - 500 + (level * interval '1' second), 1
        from dual connect by level <= 100000;

        commit;
    end loop;
end;
/
insert /*+ append */ into my_table2 select * from my_table1;
insert /*+ append */ into my_table3 select * from my_table1;
commit; --direct-path inserts must be committed before the tables are queried

--Create indexes
create index my_table1_idx on my_table1(evt_end);
create index my_table2_idx on my_table2(trunc(evt_end));
create index my_table3_idx on my_table3(trunc(evt_end)) compress;

--Gather statistics
begin
    dbms_stats.gather_table_stats(user, 'MY_TABLE1');
    dbms_stats.gather_table_stats(user, 'MY_TABLE2');
    dbms_stats.gather_table_stats(user, 'MY_TABLE3');
end;
/

--Get the segment size.
--This shows the main advantage of a compressed FBI, the lower space.
select segment_name, bytes/1024/1024/1024 GB
from dba_segments
where segment_name like 'MY_TABLE__IDX'
order by segment_name;

SEGMENT_NAME       GB
MY_TABLE1_IDX      2.0595703125
MY_TABLE2_IDX      2.0478515625
MY_TABLE3_IDX      1.1923828125

--Test block.
--Uncomment different lines to generate 6 different test cases.
--Regular, Function-based, and Function-based compressed. Both cached and not-cached.
declare
    v_count number;
    v_start_time number;
    v_total_time number := 0;
begin
    --These two lines test the server when it's "cold", and nothing is cached.
    for i in 1 .. 10 loop
        execute immediate 'alter system flush buffer_cache';
    --Swap in this line (and comment out the two above) to test the server
    --when it's "hot", and everything is cached.
    --for i in 1 .. 1000 loop

        v_start_time := dbms_utility.get_time;

        SELECT COUNT(*)
        INTO V_COUNT
        --#1: Regular
        FROM MY_TABLE1 T
        WHERE T.EVT_END BETWEEN TRUNC(SYSDATE) AND TRUNC(SYSDATE) + 86399/86400;
        --#2: Function-based
        --FROM MY_TABLE2 T
        --WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);
        --#3: Compressed function-based
        --FROM MY_TABLE3 T
        --WHERE TRUNC(T.EVT_END) = TRUNC(SYSDATE);

        v_total_time := v_total_time + (dbms_utility.get_time - v_start_time);
    end loop;

    dbms_output.put_line('Seconds: '||v_total_time/100);
end;
/

Test Methodology

I ran each block at least 5 times, alternated between run types (in case something was running on my machine only part of the time), threw out the high and the low run times, and averaged them. The code above does not include all that logic, since it would take up 90% of this answer.

Other Things to Consider

There are still many other things to consider. My code assumes the data is inserted in a very index-friendly order. Things will be totally different if this is not true, as compression may not help at all.

Probably the best solution to this problem is to avoid it completely with partitioning. For reading the same amount of data, a full table scan is much faster than an index read because it uses multi-block IO. But partitioning has downsides, like the large amount of money required to license the option, and extra maintenance tasks: creating partitions ahead of time or using interval partitioning (which has some other weird issues), gathering statistics, deferred segment creation, etc.
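For illustration, a minimal sketch of daily interval partitioning, assuming you have licensed the Partitioning option (the table name is hypothetical):

--Each day's rows land in their own partition, so a one-day query
--can prune to a single partition and full-scan only that.
create table my_table_part(evt_end date, type number)
partition by range (evt_end)
interval (numtodsinterval(1, 'DAY'))
(partition p_initial values less than (date '2015-01-01'));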

Ultimately, you will need to test this yourself. But remember that testing even such a simple choice is difficult. You need realistic data, realistic tests, and a realistic environment. Realistic data is much harder than it sounds. With indexes, you cannot simply copy the data and build the indexes in one shot. create table my_table1 as select * from ... followed by create index ... will create a different index than one on a table that grew through a specific order of inserts and deletes.
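One way to see such differences is to validate each index and compare its structure; INDEX_STATS holds the results for the most recently validated index:

--Height, leaf block count, and space usage will differ between an
--index built in one shot and one grown through inserts and deletes.
analyze index my_table1_idx validate structure;
select height, br_blks, lf_blks, pct_used from index_stats;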

Optimize the query of weekday statistics between two dates

First of all, optimize your function:
1. Add pragma udf (for faster execution when called from SQL).
2. Add the deterministic clause (for caching).
3. Replace count(0) with count(*) (to allow the CBO to optimize the count).
4. Return int instead of number.

create or replace function get_workday_count(start_date in date, end_date in date)
return int deterministic is
    pragma udf;
    day_count int;
begin
    select count(*) into day_count
    from WORKDAYS w
    where w.ID >= TRUNC(start_date)
      and w.ID <= TRUNC(end_date)
      and status in (1, 3, 5);
    return day_count;
end;
/

Then you don't need to call the function at all when (end_date - start_date) is smaller than the required number of days. Moreover, ideally you would use a scalar subquery instead of the function, as in the second query below:

select count(*)
from ERP_SJ
where case
        when trunc(end_date) - trunc(start_date) > 5
        then GET_WORKDAY_COUNT(trunc(start_date), trunc(end_date))
        else 0
      end > 5;

Or using subquery:

select count(*)
from ERP_SJ e
where case
        when trunc(end_date) - trunc(start_date) > 5
        then (select count(*)
              from WORKDAYS w
              where w.ID >= TRUNC(e.start_date)
                and w.ID <= TRUNC(e.end_date)
                and w.status in (1, 3, 5))
        else 0
      end > 5;

Optimize query with date type field in MySQL

I would suggest writing the query as:

select sum(i.amount) as total
from incomes i
where i.date >= '2019-07-01' and
      i.date < '2019-08-01' and
      i.deleted_at is null;

The half-open date range (>= the first of the month, < the first of the next month) is index-friendly and stays correct even if the column ever holds a time component. This query can take advantage of an index on incomes(deleted_at, date, amount):

create index idx_incomes_deleted_at_date_amount on incomes(deleted_at, date, amount)
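You can confirm the index is being used with EXPLAIN; the key column of the output should show the new index:

explain
select sum(i.amount) as total
from incomes i
where i.date >= '2019-07-01' and
      i.date < '2019-08-01' and
      i.deleted_at is null;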

How to Performance tune a query that has Between statement for range of dates

First, I'd start by looking at why the DISTINCT is there. In my experience many developers tack on the DISTINCT because they know that they need unique results, but don't actually understand why they aren't already getting them.

Second, a clustered index on the column would be ideal for this specific query because it puts all of the rows right next to each other on disk and the server can just grab them all at once. The problem is, that might not be possible because you already have a clustered index that's good for other uses. In that case, try a non-clustered index on the date column and see what that does.
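Since the question's table isn't shown, here is a hedged sketch with hypothetical names:

-- Non-clustered index on the date column; INCLUDE makes it cover the
-- selected columns so the query never touches the base table.
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    INCLUDE (CustomerId);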

Keep in mind that indexing has wide-ranging effects, so using a single query to determine indexing isn't a good idea.

How to optimize this JOIN query that relies on date value?

Create two indexes:

CREATE INDEX dim_date_value ON dim_date( date_value );
CREATE INDEX tran_user_id ON transactions( `user_id` );

The latter is a must; the former is not as important, but it can also help.


