To run a workflow periodically, add a
schedule: option to the top-level workflow definition.
    timezone: UTC

    schedule:
      daily>: 07:00:00

    +step1:
      sh>: tasks/shell_sample.sh

In the schedule: directive, you can choose one of the following options:
| Syntax | Description | Example |
| daily>: HH:MM:SS | Run this job every day at HH:MM:SS | daily>: 07:00:00 |
| hourly>: MM:SS | Run this job every hour at MM:SS | hourly>: 30:00 |
| weekly>: DDD,HH:MM:SS | Run this job every week on DDD at HH:MM:SS | weekly>: Sun,09:00:00 |
| monthly>: D,HH:MM:SS | Run this job every month on D at HH:MM:SS | monthly>: 1,09:00:00 |
| minutes_interval>: M | Run this job every M minutes | minutes_interval>: 30 |
| cron>: CRON | Use cron format for complex scheduling | cron>: 42 4 1 * * |
The digdag check command shows when the first scheduled session will start:
    $ ./digdag check
      ...

      Schedules (1 entries):
        daily_job:
          daily>: "07:00:00"
          first session time: 2016-02-10 16:00:00 -0800
          first runs at: 2016-02-10 23:00:00 -0800 (11h 16m 32s later)

When a field starts with *, it must be enclosed in quotes. This is a YAML limitation: an unquoted value that begins with * is not valid YAML.
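As an illustration of the quoting rule, a cron schedule whose first field is * could be written as in the sketch below. The every-30-minutes expression is only an example chosen here, and the task is the same sample script used earlier; neither is prescribed by Digdag.

```yaml
timezone: UTC

schedule:
  # Quoted because the value begins with "*", which YAML would otherwise
  # read as the start of an alias.
  cron>: "*/30 * * * *"

+step1:
  sh>: tasks/shell_sample.sh
```

A value such as 42 4 1 * * does not begin with *, so it can stay unquoted.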
The digdag scheduler command runs the schedules:
$ ./digdag scheduler
When you change the workflow definition, the scheduler reloads the
digdag.dig file automatically, so you don’t have to restart it.
You can use Client-mode commands to manage the schedules.
The scheduler command listens on
http://127.0.0.1:65432 by default. It accepts connections only from 127.0.0.1 (localhost). This is a security measure: the port is not opened to the public network. To change the listen address, use the
--bind ADDRESS option.
    timezone: UTC

    schedule:
      daily>: 07:00:00

    sla:
      # triggers this task at 02:00
      time: 02:00
      +notice:
        sh>: notice.sh

    +long_running_job:
      sh>: long_running_job.sh

Sometimes you have frequently running workflows (e.g. sessions every 30 or 60 minutes) that take longer than the interval between sessions. This variability in the duration of a workflow can occur for a number of reasons. For example, you may be seeing an increase in the amount of data you normally process.
Let’s say we have a workflow that runs hourly and normally takes only 30 minutes. But it’s the holiday season and there has been a huge increase in usage of your site. So much data is now being processed that the workflow takes 1 hour and 30 minutes. During this period, a second session starts for the following hour, which puts further strain on your available resources because both sessions run at the same time.
In this case it’s best to skip the next hour’s workflow session and instead use the subsequent session to process 2 hours of data. To do this, we’ve added the following:
- A skip_on_overtime: true | false schedule option that controls whether a scheduled session is skipped when another session is still running.
- A last_executed_session_time variable, available in scheduled workflow sessions, that contains the previously executed session time. It is usually the same as last_session_time, but it differs when skip_on_overtime: true is set or when the session is the first execution.
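The two additions above might be combined as in the following minimal sketch. It assumes that skip_on_overtime is placed under schedule: (as a schedule option) and that last_executed_session_time can be referenced with the usual ${...} variable syntax. The task name +report_range and its message are hypothetical and only show the time window a session would cover.

```yaml
timezone: UTC

schedule:
  hourly>: 00:00
  # Skip a scheduled session if the previous one is still running.
  skip_on_overtime: true

+report_range:
  # Hypothetical task: prints the window this session should process.
  echo>: "Processing data from ${last_executed_session_time} to ${session_time}"
```

With this setup, a session that starts after a skipped one would see a last_executed_session_time two hours earlier, so it can process both hours of data.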