Spiders often need to run crawling tasks periodically. For that, you need a schedule.
The concept of a schedule in Crawlab is similar to crontab in Linux. It is a long-running job that triggers spider tasks periodically.
If you would like a web crawler to run crawling tasks automatically every day, week, or month, you should probably set up a schedule. A schedule is the right way to automate such runs, especially for spiders that crawl incremental content.
# Create Schedule
- Navigate to the `Schedules` page and click the `New Schedule` button on the top left.
- Enter the basic info, such as `Name` and `Cron Expression`.
A newly created schedule is enabled by default. Once enabled, it triggers tasks at the times defined by its cron expression.
To check whether the schedule module works in Crawlab, create a new schedule with the cron expression `* * * * *`, which means "every minute", and verify that a task is triggered when the next minute starts.
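To know when to look for the triggered task, it helps to compute the next minute boundary. The following is a minimal sketch, assuming the scheduler fires at the start of each minute as standard cron does; the actual task may appear slightly later in Crawlab. The function name `next_minute_boundary` is hypothetical and not part of Crawlab.

```python
from datetime import datetime, timedelta


def next_minute_boundary(now: datetime) -> datetime:
    """Return the start of the minute that follows `now`.

    Assumption: a `* * * * *` schedule fires at second 0 of every minute.
    """
    # Truncate to the current minute, then move one minute forward.
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


now = datetime(2024, 1, 1, 12, 30, 45)
nxt = next_minute_boundary(now)
print(nxt)                          # 2024-01-01 12:31:00
print((nxt - now).total_seconds())  # 15.0
```

If no task shows up shortly after that time, the schedule is probably disabled or the schedule module is not working.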
# Enable/Disable Schedule
You can enable or disable a schedule by toggling the `Enabled` switch on the `Schedules` page or on the schedule detail page.
# Cron Expression
A cron expression is a simple, standard format for describing the periodicity of tasks. It is the same as the format used in Linux `crontab`:

    * * * * * Command_to_execute
    | | | | |
    | | | | Day of the Week ( 0 - 6 ) ( Sunday = 0 )
    | | | |
    | | | Month ( 1 - 12 )
    | | |
    | | Day of Month ( 1 - 31 )
    | |
    | Hour ( 0 - 23 )
    |
    Min ( 0 - 59 )

- The asterisk (`*`) operator specifies all possible values for a field, e.g. every hour or every day.
- The comma (`,`) operator specifies a list of values, for example: `1,3,4,7,8`.
- The dash (`-`) operator specifies a range of values, for example: `1-6`, which is equivalent to `1,2,3,4,5,6`.
- The slash (`/`) operator sets a step that skips a given number of values. For example, `*/3` in the hour field is equivalent to `0,3,6,9,12,15,18,21`: `*` selects every hour, and `/3` keeps only the first, fourth, seventh, and so on of those values.
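The four operators above can be sketched as a small function that expands a single cron field into the list of values it selects. This is an illustrative sketch only, not Crawlab's implementation; the name `expand_field` is hypothetical, and a step applied to a single value (such as `5/3`) is rejected here for simplicity.

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field into the sorted values it selects.

    `low` and `high` are the inclusive bounds of the field,
    e.g. 0 and 23 for the hour field.
    """
    values = set()
    for part in field.split(","):          # comma: list of values
        if "/" in part:                    # slash: step over a range
            base, step_text = part.split("/", 1)
            step = int(step_text)
            if step <= 0:
                raise ValueError(f"invalid step in {part!r}")
        else:
            base, step = part, 1

        if base == "*":                    # asterisk: every possible value
            start, end = low, high
        elif "-" in base:                  # dash: inclusive range
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:                              # a single value
            if "/" in part:
                raise ValueError(f"step needs '*' or a range: {part!r}")
            start = end = int(base)

        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside {low}-{high}")
        values.update(range(start, end + 1, step))
    return sorted(values)


print(expand_field("*/3", 0, 23))       # [0, 3, 6, 9, 12, 15, 18, 21]
print(expand_field("1-6", 0, 6))        # [1, 2, 3, 4, 5, 6]
print(expand_field("1,3,4,7,8", 1, 12)) # [1, 3, 4, 7, 8]
```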
Cron expressions in Crawlab use the same format as Linux `crontab`. That is to say, the smallest unit is the minute. This differs from some crontab-style scheduling frameworks whose smallest unit is the second.
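Because the smallest unit is the minute, an expression has exactly five fields. A quick pre-check, sketched below, can catch an expression copied from a seconds-based framework, which would carry an extra field. The names `FIELD_NAMES` and `split_cron` are hypothetical helpers for illustration and not part of Crawlab.

```python
from typing import Dict

# The five crontab fields, in the order they appear in an expression.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> Dict[str, str]:
    """Split a five-field cron expression into named fields.

    Raises ValueError when the field count is not five, for example
    when the expression includes a seconds field.
    """
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(fields)}: {expression!r}"
        )
    return dict(zip(FIELD_NAMES, fields))


# 02:00 every Monday (day of week 1, with Sunday = 0).
print(split_cron("0 2 * * 1"))
```

This only checks the number of fields; it does not validate the values inside each field.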
If you are not sure about your cron expression, you can validate it at https://crontab.guru.