Design Option for 'Recurring Tasks'

Design option for 'recurring tasks'

I would create a task table to insert my tasks into.

taskTable
|taskID  |Freq   |runAt       |
-------------------------------
|1       |daily  |1           |
|2       |daily  |0           |
|3       |weekly |5           |
|4       |weekly |1           |
|5       |monthly|15          |
|6       |monthly|1           |
|7       |once   |2013-7-4    |

runAt for dailies is not ever considered so it doesn't matter what value is entered.

runAt for weekly items is the day of the week that the task is to run.

runAt for mothly is the day of the month that the task is to run (month end tasks I usually run on the first since is saves the hassle of dealing with which day the month ends on although you could use this to figure that out

lastDayOfMonth = datePart(d,dateadd(s,-1,dateadd(mm, datediff(m,0,getdate())+1,0)))

runAt for once is the actual day the task is to run.

Then I'd create a task to run daily to see what needed to be run.

select  taskID
from    taskTable
where   (freq = 'once' and runAt = convert(varchar(10),getDate(),21))
  or    freq = 'daily'
  or    (freq = 'weekly' and runAt = datePart(dw,getDate()))
  or    (freq = 'monthly' and runAt = datePart(d,getDate())

This query gives me all the taskID for any tasks that I need to run.

Not sure if this is what you were looking for but hopefully you'll find something useful in it.

Best way to develop/manage/design recurring tasks/calendar

Store it all in the database.

You want to have a "Task Template" table and a "Task" table where there is a one->many relationship.

When the user indicates they want a task to reoccur, create a "Task Template" record and then create as many "Tasks" as the user has indicated (don't allow a user to create tasks too far into the future). Each Task is linked to the Task Template via a Foreign Key. The idea is that SQL is going to be more efficient at managing these records than trying to do this all in code based on one template. This way, you will have more option when your sorting and filtering your data. After all, writing a SQL query is easier than writing, testing, and maintaining a PHP function that manipulates the data.

Some other tips I would give you is:

Try to get a lot of information in your "Task Template" record. Keep the number of tasks the Template covers, the date the last task ends, the time elapsed between the first task and the last, etc.. This "Meta Data" can help save you query time when you're looking to sort and filter tasks.
Put an index on the Date and FK field, this will help query time as well.
I just built two calendar apps at work that were pretty well received by the bosses. I used the "FullCalendar" JQuery plugin (http://arshaw.com/fullcalendar/). I used JQuery AJAX to handle most of my events, and it had built in support for Month, Day, and Week view.

Database design for recurring events with exceptions

Use iCalendar RRules and ExDates

If it's a recurring event, just store the start/end datetimes and RRules and ExDates for the event.

Use a Materialized View to pre-calculate upcoming actual events, say for the next 30 days or 365 days.

As you are using Postgres, you can use existing python, perl, or javascript RRule libraries (such as dateutil) inside pg function for calculating future events based on the rrules and exdates

UPDATE: check out pg_rrule extension: https://github.com/petropavel13/pg_rrule

Calendar Recurring/Repeating Events - Best Storage Method

Storing "Simple" Repeating Patterns

For my PHP/MySQL based calendar, I wanted to store repeating/recurring event information as efficiently as possibly. I didn't want to have a large number of rows, and I wanted to easily lookup all events that would take place on a specific date.

The method below is great at storing repeating information that occurs at regular intervals, such as every day, every n days, every week, every month every year, etc etc. This includes every Tuesday and Thursday type patterns as well, because they are stored separately as every week starting on a Tuesday and every week starting on a Thursday.

Assuming I have two tables, one called events like this:

ID    NAME
1     Sample Event
2     Another Event

And a table called events_meta like this:

ID    event_id      meta_key           meta_value
1     1             repeat_start       1299132000
2     1             repeat_interval_1  432000

With repeat_start being a date with no time as a unix timestamp, and repeat_interval an amount in seconds between intervals (432000 is 5 days).

repeat_interval_1 goes with repeat_start of the ID 1. So if I have an event that repeats every Tuesday and every Thursday, the repeat_interval would be 604800 (7 days), and there would be 2 repeat_starts and 2 repeat_intervals. The table would look like this:

ID    event_id      meta_key           meta_value
1     1             repeat_start       1298959200 -- This is for the Tuesday repeat
2     1             repeat_interval_1  604800
3     1             repeat_start       1299132000 -- This is for the Thursday repeat
4     1             repeat_interval_3  604800
5     2             repeat_start       1299132000
6     2             repeat_interval_5  1          -- Using 1 as a value gives us an event that only happens once

Then, if you have a calendar that loops through every day, grabbing the events for the day it's at, the query would look like this:

SELECT EV.*
FROM `events` EV
RIGHT JOIN `events_meta` EM1 ON EM1.`event_id` = EV.`id`
RIGHT JOIN `events_meta` EM2 ON EM2.`meta_key` = CONCAT( 'repeat_interval_', EM1.`id` )
WHERE EM1.meta_key = 'repeat_start'
    AND (
        ( CASE ( 1299132000 - EM1.`meta_value` )
            WHEN 0
              THEN 1
            ELSE ( 1299132000 - EM1.`meta_value` )
          END
        ) / EM2.`meta_value`
    ) = 1
LIMIT 0 , 30

Replacing {current_timestamp} with the unix timestamp for the current date (Minus the time, so the hour, minute and second values would be set to 0).

Hopefully this will help somebody else too!

Storing "Complex" Repeating Patterns

This method is better suited for storing complex patterns such as

Event A repeats every month on the 3rd of the month starting on March 3, 2011

Event A repeats Friday of the 2nd week of the month starting on March 11, 2011

I'd recommend combining this with the above system for the most flexibility. The tables for this should like like:

ID    NAME
1     Sample Event
2     Another Event

And a table called events_meta like this:

ID    event_id      meta_key           meta_value
1     1             repeat_start       1299132000 -- March 3rd, 2011
2     1             repeat_year_1      *
3     1             repeat_month_1     *
4     1             repeat_week_im_1   2
5     1             repeat_weekday_1   6

repeat_week_im represents the week of the current month, which could be between 1 and 5 potentially. repeat_weekday in the day of the week, 1-7.

Now assuming you are looping through the days/weeks to create a month view in your calendar, you could compose a query like this:

SELECT EV . *
FROM `events` AS EV
JOIN `events_meta` EM1 ON EM1.event_id = EV.id
AND EM1.meta_key = 'repeat_start'
LEFT JOIN `events_meta` EM2 ON EM2.meta_key = CONCAT( 'repeat_year_', EM1.id )
LEFT JOIN `events_meta` EM3 ON EM3.meta_key = CONCAT( 'repeat_month_', EM1.id )
LEFT JOIN `events_meta` EM4 ON EM4.meta_key = CONCAT( 'repeat_week_im_', EM1.id )
LEFT JOIN `events_meta` EM5 ON EM5.meta_key = CONCAT( 'repeat_weekday_', EM1.id )
WHERE (
  EM2.meta_value =2011
  OR EM2.meta_value = '*'
)
AND (
  EM3.meta_value =4
  OR EM3.meta_value = '*'
)
AND (
  EM4.meta_value =2
  OR EM4.meta_value = '*'
)
AND (
  EM5.meta_value =6
  OR EM5.meta_value = '*'
)
AND EM1.meta_value >= {current_timestamp}
LIMIT 0 , 30

This combined with the above method could be combined to cover most repeating/recurring event patterns. If I've missed anything please leave a comment.

cron-like recurring task scheduler design

There's 2 designs, basically.

One runs regularly and compares the current time to the scheduling spec (i.e. "Does this run now?"), and executes those that qualify.

The other technique takes the current scheduling spec and finds the NEXT time that the item should fire. Then, it compares the current time to all of those items who's "next time" is less than "current time", and fires those. Then, when an item is complete, it is rescheduled for the new "next time".

The first technique can not handle "missed" items, the second technique can only handle those items that were previously scheduled.

Specifically consider you you have a schedule that runs once every hour, at the top of the hour.

So, say, 1pm, 2pm, 3pm, 4pm.

At 1:30pm, the run task is down and not executing any processes. It does not start again until 3:20pm.

Using the first technique, the scheduler will have fired the 1pm task, but not fired the 2pm, and 3pm tasks, as it was not running when those times passed. The next job to run will be the 4pm job, at, well, 4pm.

Using the second technique, the scheduler will have fired the 1pm task, and scheduled the next task at 2pm. Since the system was down, the 2pm task did not run, nor did the 3pm task. But when the system restarted at 3:20, it saw that it "missed" the 2pm task, and fired it off at 3:20, and then scheduled it again for 4pm.

Each technique has it's ups and downs. With the first technique, you miss jobs. With the second technique you can still miss jobs, but it can "catch up" (to a point), but it may also run a job "at the wrong time" (maybe it's supposed to run at the top of the hour for a reason).

A benefit of the second technique is that if you reschedule at the END of the executing job, you don't have to worry about a cascading job problem.

Consider that you have a job that runs every minute. With the first technique, the job gets fired each minute. However, typically, if the job is not FINISHED within it's minute, then you can potentially have 2 jobs running (one late in the process, the other starting up). This can be a problem if the job is not designed to run more than once simultaneously. And it can exacerbate (if there's a real problem, after 10 minutes you have 10 jobs all fighting each other).

With the second technique, if you schedule at the end of the job, then if a job happens to run just over a minute, then you'll "skip" a minute" and start up the following minute rather than run on top of itself. So, you can have a job scheduled for every minute actually run at 1:01pm, 1:03pm, 1:05pm, etc.

Depending on your job design, either of these can be "good" or "bad". There's no right answer here.

Finally, implementing the first technique is really, quite trivial compared to implementing the second. The code to determine if a cron string (say) matches a given time is simple compared to deriving what time a cron string will be valid NEXT. I know, and I have a couple hundred lines of code to prove it. It's not pretty.

Recurring tasks using SSIS

To confirm what you already have: You have a TASK table with tasks in it and some sort of date flag for when it needs to run. Then your job runs every x number of minutes and checks if there are any tasks to be picked up/run. If so it runs them. Then i am guessing you have a flag in the table you set saying it is completed after you run it in your loop? If you have multiple tasks to run at a time it just runs them all one at a time?

Then what you want to do is add this to it: something I did something very similar. I created a lookup table for each task (in my process each task had a name and I used that name to reference the lookup table, I called it schedule lookup).

In that table I put when and how often the report would need to run (so 2 columns) time of day and frequency. One example is 7:00 and weekdays. So for this report it runs every weekday only (m-F) and at 7pm.

Then when my job runs it would run the task like you have above, but then have another step, that would flag that task as complete, but then insert a new task in the task table (the task details would be the same) but I would look at my schedule lookup table described above to figure out the next date/time when the job should run again and use that as my next run date/time in my task table.

Below is the SP I use in my process to get the next schedule day/time and to update the existing one to completed, and then to create the new one.

NOTE: My scheduling has some advanced options you may not need, I have LOTS of comments that should explain what/why I am doing everything. I am calling a few functions I created in places but I dont think you would need those and can figure out what to do instead of my functions, but if you have questions let me know:

This is what I used so it is using my table structures/etc, but you could convert it to yours easily enough.

--Purpose
----------------------------------------------------------------------------------
-- calculates the next time to run/schedule the job
----------------------------------------------------------------------------------
    -- NOTE:  TO USE you have to insert the first value manually in queue table

    ----- possibile scenerios
    -- if we want to schedule every x hours, or x days
    -- run every month only
    -- run weekdays only
    -- run on certain days of month only

    -- TO ADD MORE COMPLEX or different types of schedules:
    --  special - different times for different days of week
    --          ex - so have dayofweek:2:00,dayofweek:3:00  (and we parse out the day of week and number splitting out the strings)

    --  hourly - to do more then once a day??
    --          WHEN @ScheduleLookupType = 'hourly' THEN DATEADD(DAY, 1, @CurrentScheduleDate) -- FIX FIX FIX

--  EXEC dbo.JobsDynamicRescheduleFindNextTimeToScheduleJob @ReportName = 'TestReport1'

----------------------------------------------------------------------------------

ALTER PROCEDURE [dbo].[JobsDynamicRescheduleFindNextTimeToScheduleJob]
    @ReportName VARCHAR(50)
AS

SET NOCOUNT ON

BEGIN TRY
            -- left here for testing outside of SP
            -- this will be passed from SP
            --DECLARE @ReportName AS VARCHAR(50)    
            --SET @ReportName = 'TESTREport'

            -- this sets the first day of the week to Monday (we need it set to a value to do calcluations for weekdays, I set it to 1 for monday and 7 for sunday)
            --  this is due to server settings could have somethign else so forcing it here
            SET DATEFIRST 1

            DECLARE @CurrentScheduleDate AS DATE                    -- find the current date for the job that just ran
            DECLARE @CurrentScheduleDayNumberOfWeek AS SMALLINT     -- this pulls the number of the day of week 1=monday, 2=tuesdday
            DECLARE @ScheduleLookupType AS VARCHAR(20)              -- this is the type of schedule to do calculations on
            DECLARE @TimeOfDayToScheduleJob AS VARCHAR(20)          -- look this up, its the time to schedule the job
            DECLARE @SpecialScheduleValue AS VARCHAR(8000)          -- this is special value to lookup (only needed if non standard one)
            DECLARE @NewScheduleDateONLY AS DATETIME                -- to hold just the date of the schedule before combinng with time
            DECLARE @NewScheduleDateTime AS DATETIME                -- to hold the new schedule date and time, actual value to insert into queue

            -- pull the current schedule date/time from the queue table
            SELECT @CurrentScheduleDate = NextRunDateTime
            FROM dbo.GenericReportingQueue (NOLOCK)
            WHERE IsGenerated IS NULL
            AND ReportName = @ReportName

            -- to override for testing
            --SET @CurrentScheduleDate = '5/20/2016'

            SET @CurrentScheduleDayNumberOfWeek = DATEPART(WEEKDAY, @CurrentScheduleDate)

            -- pull these values from lookup table
            SELECT @ScheduleLookupType = dbo.fn_GetValueLookupTableValue(@ReportName, 'RescheduleJobDynamic_ScheduleLookupType'), 
            @TimeOfDayToScheduleJob = dbo.fn_GetValueLookupTableValue(@ReportName, 'RescheduleJobDynamic_TimeOfDayToScheduleJob'),
            @SpecialScheduleValue = dbo.fn_GetValueLookupTableValue(@ReportName, 'RescheduleJobDynamic_SpecialScheduleValue')

            /*
            -- reset for testing
            SET @ScheduleLookupType = 'specialdays' -- weekly, weekdays, monthly, specialdays
            SET @TimeOfDayToScheduleJob = '8:00'
            SET @SpecialScheduleValue = '5,6'
            */

            -- calculations to get the date to schedule the job next time based off logic
            SELECT @NewScheduleDateONLY = CASE  
                WHEN @ScheduleLookupType = 'daily' THEN DATEADD(DAY, 1, @CurrentScheduleDate)
                WHEN @ScheduleLookupType = 'weekly' THEN DATEADD(DAY, 7, @CurrentScheduleDate)
                WHEN @ScheduleLookupType = 'monthly' THEN DATEADD(MONTH, 1, @CurrentScheduleDate)
                WHEN @ScheduleLookupType = 'yearly' THEN DATEADD(YEAR, 1, @CurrentScheduleDate)

                -- only run on weekdays and skip weekends
                WHEN @ScheduleLookupType = 'weekdays' THEN 
                    CASE 
                        WHEN @CurrentScheduleDayNumberOfWeek IN (1, 2, 3, 4) THEN DATEADD(DAY, 1, @CurrentScheduleDate)
                        WHEN @CurrentScheduleDayNumberOfWeek = 5 THEN DATEADD(DAY, 3, @CurrentScheduleDate)
                    END -- end case for day of week
                -- only run on weekends and skip weekdays
                WHEN @ScheduleLookupType = 'weekends' THEN 
                    CASE
                        WHEN @CurrentScheduleDayNumberOfWeek = 6 THEN DATEADD(DAY, 1, @CurrentScheduleDate)
                        WHEN @CurrentScheduleDayNumberOfWeek = 7 THEN DATEADD(DAY, 6, @CurrentScheduleDate)
                    END -- end case for weekends only

                WHEN @ScheduleLookupType = 'specialdays' THEN
                    -- for this we need to determine the current day, and the next day we want to run on, then add that many days
                    --      if next day is not till the following week we just find the first day in the list
                    --      Take taht number and do dateadd to it
                    DATEADD(DAY, 
                            -- this does the select to determine what number to add based off current day and next day list
                            (SELECT ISNULL(
                                -- if this one I want to take today value and subtract from next value found
                                --      then add that number to todays date to give me the next schedule date
                                (SELECT TOP 1 StringValue - @CurrentScheduleDayNumberOfWeek
                                FROM dbo.fn_ParseText2Table(@SpecialScheduleValue, ',')
                                WHERE StringValue > @CurrentScheduleDayNumberOfWeek
                                ORDER BY StringValue)
                                ,
                                -- if none found above I need to go to the next weeks first value
                                --      I need to take 7 - todays number (to get the rest of the week) then add the next number for the next week to it
                                (SELECT TOP 1 (7 - @CurrentScheduleDayNumberOfWeek) + StringValue 
                                FROM dbo.fn_ParseText2Table(@SpecialScheduleValue, ',')                 
                                ORDER BY StringValue)
                                )-- end is null
                            ) -- end select 
                        , @CurrentScheduleDate) -- end dateadd for speical days             
            END -- outer case

            SET @NewScheduleDateTime = @NewScheduleDateONLY + ' ' + @TimeOfDayToScheduleJob

            -- for testing
            --SELECT @ScheduleLookupType AS ReportLookupType, @TimeOfDayToScheduleJob AS TimeOfDayToSchedule, @SpecialScheduleValue AS SpecialValuesForCalc, @NewScheduleDateTime AS NewDateTimeToRun, 
            --@CurrentScheduleDate AS CurrentDateSchedule, @CurrentScheduleDayNumberOfWeek AS CurrentNumberDayOfWeek, @NewScheduleDateONLY AS NewScheduleDateOnly

            -- &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
            --      now update and insert the new schedule date/time into the table

            -- update existing record
            UPDATE dbo.GenericReportingQueue
            SET IsGenerated = 1,
            DateReportRun = GETDATE(),
            LastUpdateDate = GETDATE()
            WHERE ISGenerated IS NULL
            AND ReportName = @ReportName

            -- insert new record with new date
            INSERT INTO dbo.GenericReportingQueue (
                ReportName, NextRunDateTime, CreatorID, CreateDate
            )
            SELECT @ReportName, @NewScheduleDateTime, 1, GETDATE()

END TRY

BEGIN CATCH

   RETURN

END CATCH

Database table design for scheduling tasks

This is the table structure i came up with;

Schedule
 - ScheduleName
 - ScheduleTypeId (Daily, Weekly, Monthly, Yearly, Specific)
 - StartDate
 - IntervalInDays
 - Frequency
 - FrequencyCounter

ScheduleDaily
 - ScheduleDailyId 
 - ScheduleId
 - TimeOfDay
 - StartDate
 - EndDate

ScheduleMonthly
 - ScheduleMonthlyId
 - ScheduleId
 - DayOfMonth
 - StartDate
 - EndDate

ScheduleSpecific
 - ScheduleSpecificId
 - ScheduleId
 - SpecificDate
 - StartDate

...

ScheduleJob
 - ScheduleJobId
 - ScheduleId
 - ScheduleTypeId
 - RunDate
 - ScheduleStatusId

Modelling indefinitely-recurring tasks in a schedule (calendar-like rails app)

I know this is an old question but I'm just starting to look into this for my own application and I found this paper by Martin Fowler illuminating: Recurring Events for Calendars

The main takeaway for me was using what he calls "temporal expressions" to figure out if a booking falls on a certain date range instead of trying to insert an infinite number of events (or in your case tasks) into the database.

Practically, for your use case, this might mean that you store the Task with a "temporal expression" property called schedule. The ice_cube recurrence gem has the ability to serialize itself into an active record property like so:

class Task < ActiveRecord::Base
  include IceCube
  serialize :schedule, Hash

  def schedule=(new_schedule)
    write_attribute(:schedule, new_schedule.to_hash)
  end

  def schedule
    Schedule.from_hash(read_attribute(:schedule))
  end
end

Ice cube seems really flexible and even allows you to specify exceptions to the recurrence rules. (Say you want to delete just one occurrence of the task, but not all of them.)

The problem is that you can't really query the database for a task that falls in a specific range of dates, because you've only stored the rule for making tasks, not the tasks themselves. For my case, I'm thinking about adding a property like "next_recurrence_date" which will be used to do some basic sorting/filtering. You could even use that to throw a task on a queue to have something done on the next recurring date. (Like check if that date has passed and then regenerate it. You could even store an "archived" version of the task once its next recurring date passes.)

This fixes your issue with "what if the task is updated" since tasks aren't ever persisted until they're in the past.

Anyway, I hope that is helpful to someone trying to think this through for their own app.

Design Option for 'Recurring Tasks'