---
sort: 3
title: Scheduler
weight: 3
menu:
  docs:
    parent: "vmanomaly-components"
    weight: 3
aliases:
  - /anomaly-detection/components/scheduler.html
---
Scheduler defines how often to run and make inferences, as well as what time range to use to train the model.
It is specified in the `schedulers` section of the VictoriaMetrics Anomaly Detection config.

> **Note: Starting from [v1.11.0](../CHANGELOG.md#v1110), the scheduler section in the config supports multiple schedulers via aliasing. <br>Also, `vmanomaly` expects the scheduler section to be named `schedulers`. Using the old (flat) format with the `scheduler` key is deprecated and will be removed in future versions.**

```yaml
schedulers:
  scheduler_periodic_1m:
    # class: "periodic"  # (or class: "scheduler.periodic.PeriodicScheduler" before v1.13.0, which added class alias support)
    infer_every: "1m"
    fit_every: "2m"
    fit_window: "3h"
  scheduler_periodic_5m:
    # class: "periodic"  # (or class: "scheduler.periodic.PeriodicScheduler" before v1.13.0, which added class alias support)
    infer_every: "5m"
    fit_every: "10m"
    fit_window: "3h"
...
```

Old-style configs (< [v1.11.0](../CHANGELOG.md#v1110))

```yaml
scheduler:
  # class: "periodic"  # (or class: "scheduler.periodic.PeriodicScheduler" before v1.13.0, which added class alias support)
  infer_every: "1m"
  fit_every: "2m"
  fit_window: "3h"
...
```
will be **implicitly** converted to
```yaml
schedulers:
  default_scheduler: # default scheduler alias added, for backward compatibility
    class: "scheduler.periodic.PeriodicScheduler"
    infer_every: "1m"
    fit_every: "2m"
    fit_window: "3h"
...
```
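
The implicit conversion above can be pictured as a plain dictionary transformation. The sketch below is a hypothetical helper (`upgrade_scheduler_section` is not part of vmanomaly). It mirrors the example: the flat `scheduler` section moves under `schedulers.default_scheduler`, and the periodic class is filled in. It also assumes that an explicit `class` in the old section is kept.

```python
from copy import deepcopy

# Class that the old flat format implies, per the converted example above.
DEFAULT_CLASS = "scheduler.periodic.PeriodicScheduler"


def upgrade_scheduler_section(config: dict) -> dict:
    """Hypothetical helper: move a flat `scheduler` section under
    `schedulers.default_scheduler`, as in the example above."""
    config = deepcopy(config)  # leave the caller's dict untouched
    if "scheduler" not in config:
        return config  # already new-style, nothing to convert
    section = {"class": DEFAULT_CLASS}
    section.update(config.pop("scheduler"))  # assumption: an explicit `class` wins
    config["schedulers"] = {"default_scheduler": section}
    return config


old_style = {"scheduler": {"infer_every": "1m", "fit_every": "2m", "fit_window": "3h"}}
new_style = upgrade_scheduler_section(old_style)
```
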

## Parameters

`class`: str, default=`"scheduler.periodic.PeriodicScheduler"`,
options={`"scheduler.periodic.PeriodicScheduler"`, `"scheduler.oneoff.OneoffScheduler"`, `"scheduler.backtesting.BacktestingScheduler"`}

- `"scheduler.periodic.PeriodicScheduler"`: periodically runs the models on new data. Useful for consecutive re-trainings to counter [data drift](https://www.datacamp.com/tutorial/understanding-data-drift-model-drift) and model degradation over time.
- `"scheduler.oneoff.OneoffScheduler"`: runs the process once and exits. Useful for testing.
- `"scheduler.backtesting.BacktestingScheduler"`: imitates consecutive backtesting runs of OneoffScheduler. Runs the process once and exits. Use it to get more granular control over testing on historical data.

> **Note**: starting from [v1.13.0](../CHANGELOG.md#v1130), class aliases are supported, so `"scheduler.periodic.PeriodicScheduler"` can be replaced with `"periodic"`, `"scheduler.oneoff.OneoffScheduler"` with `"oneoff"`, and `"scheduler.backtesting.BacktestingScheduler"` with `"backtesting"`.
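
The following is a small sketch of how the aliases in the note above map to full class paths. `CLASS_ALIASES` and `resolve_scheduler_class` are hypothetical names used only for illustration. vmanomaly resolves aliases internally, and this is not its code.

```python
# Alias -> full class path, as listed in the note above (aliases need v1.13.0+).
CLASS_ALIASES = {
    "periodic": "scheduler.periodic.PeriodicScheduler",
    "oneoff": "scheduler.oneoff.OneoffScheduler",
    "backtesting": "scheduler.backtesting.BacktestingScheduler",
}


def resolve_scheduler_class(name: str = "scheduler.periodic.PeriodicScheduler") -> str:
    """Hypothetical helper: map a short alias to its full class path.
    Full paths pass through unchanged; anything else is rejected."""
    if name in CLASS_ALIASES:
        return CLASS_ALIASES[name]
    if name in CLASS_ALIASES.values():
        return name
    raise ValueError(f"unknown scheduler class: {name!r}")
```
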

**Depending on the selected class, different parameters should be used.**

## Periodic scheduler

### Parameters

For the periodic scheduler, parameters are defined as time intervals, expressed in different units, e.g. days, hours, minutes, seconds.

Examples: `"50s"`, `"4m"`, `"3h"`, `"2d"`, `"1w"`.

<table class="params">
<thead>
<tr>
<th>Suffix</th>
<th>Time granularity</th>
</tr>
</thead>
<tbody>
<tr>
<td>s</td>
<td>seconds</td>
</tr>
<tr>
<td>m</td>
<td>minutes</td>
</tr>
<tr>
<td>h</td>
<td>hours</td>
</tr>
<tr>
<td>d</td>
<td>days</td>
</tr>
<tr>
<td>w</td>
<td>weeks</td>
</tr>
</tbody>
</table>

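
Below is an illustrative parser for the single-unit strings in the table above. It returns a `timedelta` and enforces the 1-second minimum that the parameter table gives for `fit_window` and `infer_every`. `parse_duration` is a hypothetical helper. It assumes exactly one integer followed by one suffix, and vmanomaly's own parser may accept more forms.

```python
import re
from datetime import timedelta

# Suffix -> seconds, per the table above.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_DURATION_RE = re.compile(r"^(\d+)([smhdw])$")


def parse_duration(value: str) -> timedelta:
    """Hypothetical parser for strings like "50s" or "2d" (one number, one suffix)."""
    match = _DURATION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    seconds = int(amount) * _UNIT_SECONDS[unit]
    if seconds < 1:
        raise ValueError("duration must be at least 1 second")
    return timedelta(seconds=seconds)
```
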
<table class="params">
<thead>
<tr>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>

`fit_window`
</td>
<td>str</td>
<td>

`"14d"`
</td>
<td>What time range to use for training the models. Must be at least 1 second.</td>
</tr>
<tr>
<td>

`infer_every`
</td>
<td>str</td>
<td>

`"1m"`
</td>
<td>How often a model writes its conclusions on newly added data. Must be at least 1 second.</td>
</tr>
<tr>
<td>

`fit_every`
</td>
<td>str, Optional</td>
<td>

`"1h"`
</td>
<td>

How often to completely retrain the models. If omitted, the value of `infer_every` is used, so the models are retrained on every inference run.
</td>
</tr>
</tbody>
</table>

### Periodic scheduler config example
```yaml
schedulers:
  periodic_scheduler_alias:
    class: "periodic"
    # (or class: "scheduler.periodic.PeriodicScheduler" before v1.13.0, which added class alias support)
    fit_window: "14d"
    infer_every: "1m"
    fit_every: "1h"
```
This part of the config means that `vmanomaly` will take the time window of the previous 14 days and use it to train a model. Every hour, the model will be retrained on the latest 14 days of data, which now include 1 more hour of new data. The window slides forward and keeps its 14-day length; it does not grow with each retrain. Every minute, `vmanomaly` will produce model inferences for newly added data points, using the model that is kept in memory at that time.
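
To make the sliding window concrete, here is a sketch that lists the training and inference windows for the config above over a short, arbitrary horizon. `periodic_windows` is a hypothetical helper, not part of vmanomaly. It assumes the first fit happens at `start` and that each inference covers only the `infer_every` interval that ended at its run time.

```python
from datetime import datetime, timedelta, timezone


def periodic_windows(start, horizon, fit_window, infer_every, fit_every=None):
    """Hypothetical sketch of the periodic schedule described above.

    Returns (fits, infers), two lists of (window_start, window_end) pairs:
    - fits: one per `fit_every`, each exactly `fit_window` long and ending at
      the fit time, so the window slides forward instead of growing;
    - infers: one per `infer_every`, each covering only the newly added data.
    `fit_every` falls back to `infer_every` when omitted.
    """
    fit_every = fit_every or infer_every
    end = start + horizon
    fits, infers = [], []
    t = start
    while t <= end:
        fits.append((t - fit_window, t))
        t += fit_every
    t = start + infer_every
    while t <= end:
        infers.append((t - infer_every, t))
        t += infer_every
    return fits, infers


# The config above over a 2-hour horizon: fit_window=14d, infer_every=1m, fit_every=1h.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
fits, infers = periodic_windows(
    start,
    horizon=timedelta(hours=2),
    fit_window=timedelta(days=14),
    infer_every=timedelta(minutes=1),
    fit_every=timedelta(hours=1),
)
# 3 fits (at start, +1h, +2h), each 14 days long; 120 one-minute inference windows.
```
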

## Oneoff scheduler

### Parameters

For the Oneoff scheduler, timeframes can be defined either in UNIX time (seconds) or as ISO 8601 strings.

Supported time zone offset formats for ISO strings are:

* Z (UTC)
* ±HH:MM
* ±HHMM
* ±HH

If the time zone is omitted, a timezone-naive datetime is used.
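
The sketch below shows how both timestamp forms can be turned into a `datetime`. It handles all four offset formats listed above and leaves the value timezone-naive when no offset is given. `to_datetime` is a hypothetical helper, not vmanomaly's parser. It rewrites offsets to `±HH:MM` because `datetime.fromisoformat` only accepts that form on Python versions before 3.11.

```python
import re
from datetime import datetime, timezone

# Offset suffixes listed above: Z, ±HH:MM, ±HHMM, ±HH.
_OFFSET_RE = re.compile(r"(Z|[+-]\d{2}(?::?\d{2})?)$")


def to_datetime(value):
    """Hypothetical helper: UNIX seconds or an ISO 8601 string -> datetime.

    Offsets are normalized to ±HH:MM before calling datetime.fromisoformat.
    Strings without an offset stay timezone-naive.
    """
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    date_part, _, time_part = value.partition("T")
    match = _OFFSET_RE.search(time_part)  # only look for an offset after the "T"
    if match is None:
        return datetime.fromisoformat(value)  # no offset: timezone-naive
    offset = match.group(1)
    if offset == "Z":
        offset = "+00:00"
    else:
        digits = offset[1:].replace(":", "")
        offset = f"{offset[0]}{digits[:2]}:{digits[2:] or '00'}"
    return datetime.fromisoformat(f"{date_part}T{time_part[:match.start()]}{offset}")
```
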
### Defining fitting timeframe

<table class="params">
<thead>
<tr>
<th>Format</th>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ISO 8601</td>
<td>

`fit_start_iso`
</td>
<td>str</td>
<td>

`"2022-04-01T00:00:00Z", "2022-04-01T00:00:00+01:00", "2022-04-01T00:00:00+0100", "2022-04-01T00:00:00+01"`
</td>
<td rowspan="2">Start datetime to use for training a model. ISO string or UNIX time in seconds.</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`fit_start_s`
</td>
<td>float</td>
<td>1648771200</td>
</tr>
<tr>
<td>ISO 8601</td>
<td>

`fit_end_iso`
</td>
<td>str</td>
<td>

`"2022-04-10T00:00:00Z", "2022-04-10T00:00:00+01:00", "2022-04-10T00:00:00+0100", "2022-04-10T00:00:00+01"`
</td>
<td rowspan="2">

End datetime to use for training a model. Must be greater than `fit_start_*`. ISO string or UNIX time in seconds.
</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`fit_end_s`
</td>
<td>float</td>
<td>1649548800</td>
</tr>
</tbody>
</table>

### Defining inference timeframe

<table class="params">
<thead>
<tr>
<th>Format</th>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ISO 8601</td>
<td>

`infer_start_iso`
</td>
<td>str</td>
<td>

`"2022-04-11T00:00:00Z", "2022-04-11T00:00:00+01:00", "2022-04-11T00:00:00+0100", "2022-04-11T00:00:00+01"`
</td>
<td rowspan="2">Start datetime to use for a model inference. ISO string or UNIX time in seconds.</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`infer_start_s`
</td>
<td>float</td>
<td>1649635200</td>
</tr>
<tr>
<td>ISO 8601</td>
<td>

`infer_end_iso`
</td>
<td>str</td>
<td>

`"2022-04-14T00:00:00Z", "2022-04-14T00:00:00+01:00", "2022-04-14T00:00:00+0100", "2022-04-14T00:00:00+01"`
</td>
<td rowspan="2">

End datetime to use for a model inference. Must be greater than `infer_start_*`. ISO string or UNIX time in seconds.
</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`infer_end_s`
</td>
<td>float</td>
<td>1649894400</td>
</tr>
</tbody>
</table>

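
The "must be greater than" rules in the two tables above can be checked before a run. This is a hypothetical sketch: `oneoff_durations` is not a vmanomaly function. It assumes all four bounds are given as `*_s` keys (UNIX seconds); `*_iso` values would need to be converted to seconds first. It only checks the two rules the tables state.

```python
def oneoff_durations(cfg: dict) -> tuple:
    """Hypothetical check of the 'must be greater than' rules above.

    Assumption: bounds are *_s keys in UNIX seconds.
    Returns (fit_seconds, infer_seconds).
    """
    fit_start, fit_end = float(cfg["fit_start_s"]), float(cfg["fit_end_s"])
    infer_start, infer_end = float(cfg["infer_start_s"]), float(cfg["infer_end_s"])
    if fit_end <= fit_start:
        raise ValueError("fit_end_* must be greater than fit_start_*")
    if infer_end <= infer_start:
        raise ValueError("infer_end_* must be greater than infer_start_*")
    return fit_end - fit_start, infer_end - infer_start


# Values from the UNIX time rows of the tables above.
cfg = {
    "fit_start_s": 1648771200,
    "fit_end_s": 1649548800,
    "infer_start_s": 1649635200,
    "infer_end_s": 1649894400,
}
fit_seconds, infer_seconds = oneoff_durations(cfg)  # 9 days of training, 3 days of inference
```
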
### ISO format scheduler config example
```yaml
schedulers:
  oneoff_scheduler_alias:
    class: "oneoff"
    # (or class: "scheduler.oneoff.OneoffScheduler" before v1.13.0, which added class alias support)
    fit_start_iso: "2022-04-01T00:00:00Z"
    fit_end_iso: "2022-04-10T00:00:00Z"
    infer_start_iso: "2022-04-11T00:00:00Z"
    infer_end_iso: "2022-04-14T00:00:00Z"
```
### UNIX time format scheduler config example
```yaml
schedulers:
  oneoff_scheduler_alias:
    class: "oneoff"
    # (or class: "scheduler.oneoff.OneoffScheduler" before v1.13.0, which added class alias support)
    fit_start_s: 1648771200
    fit_end_s: 1649548800
    infer_start_s: 1649635200
    infer_end_s: 1649894400
```

## Backtesting scheduler

### Parameters

As for the [Oneoff scheduler](#oneoff-scheduler), timeframes can be defined either in UNIX time (seconds) or as ISO 8601 strings.

Supported time zone offset formats for ISO strings are:

* Z (UTC)
* ±HH:MM
* ±HHMM
* ±HH

If the time zone is omitted, a timezone-naive datetime is used.

### Parallelization

<table class="params">
<thead>
<tr>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>

`n_jobs`
</td>
<td>int</td>
<td>

`1`
</td>
<td>

Allows *proportionally faster (yet more resource-intensive)* evaluation of a config on historical data. The default value is 1, which means *sequential* execution. Introduced in [v1.13.0](../CHANGELOG.md#v1130).
</td>
</tr>
</tbody>
</table>

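
Here is a minimal sketch of the idea behind `n_jobs`. Independent backtesting slices can be evaluated one after another (`n_jobs=1`) or several at a time. `run_slices` and its `evaluate` callable are hypothetical stand-ins for fitting and inferring one slice. How vmanomaly parallelizes internally is not described here. Threads keep the sketch short, but CPU-bound model fitting in Python would usually need processes (for example `ProcessPoolExecutor`).

```python
from concurrent.futures import ThreadPoolExecutor


def run_slices(slices, evaluate, n_jobs=1):
    """Hypothetical sketch: evaluate independent slices sequentially or concurrently.
    Assumption: n_jobs must be >= 1 (no special meaning for 0 or negative values)."""
    if n_jobs < 1:
        raise ValueError("this sketch expects n_jobs >= 1")
    if n_jobs == 1:
        return [evaluate(s) for s in slices]  # sequential, like the default
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(evaluate, slices))  # results come back in input order
```
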

### Defining overall timeframe

This timeframe is sliced into intervals `(fit_window, infer_window == fit_every)`. Slicing starts from the *latest available* time point, which is `to_*`, and goes back until no full `fit_window + infer_window` interval fits within the provided timeframe.

<table class="params">
<thead>
<tr>
<th>Format</th>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ISO 8601</td>
<td>

`from_iso`
</td>
<td>str</td>
<td>

`"2022-04-01T00:00:00Z", "2022-04-01T00:00:00+01:00", "2022-04-01T00:00:00+0100", "2022-04-01T00:00:00+01"`
</td>
<td rowspan="2">Start datetime to use for backtesting.</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`from_s`
</td>
<td>float</td>
<td>1648771200</td>
</tr>
<tr>
<td>ISO 8601</td>
<td>

`to_iso`
</td>
<td>str</td>
<td>

`"2022-04-10T00:00:00Z", "2022-04-10T00:00:00+01:00", "2022-04-10T00:00:00+0100", "2022-04-10T00:00:00+01"`
</td>
<td rowspan="2">

End datetime to use for backtesting. Must be greater than `from_*`.
</td>
</tr>
<tr>
<td>UNIX time</td>
<td>

`to_s`
</td>
<td>float</td>
<td>1649548800</td>
</tr>
</tbody>
</table>

### Defining training timeframe

The same *explicit* logic as in the [Periodic scheduler](#periodic-scheduler) applies.

<table class="params">
<thead>
<tr>
<th>Format</th>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ISO 8601</td>
<td rowspan="2">

`fit_window`
</td>
<td rowspan="2">str</td>
<td>

`"PT1M", "PT1H"`
</td>
<td rowspan="2">What time range to use for training the models. Must be at least 1 second.</td>
</tr>
<tr>
<td>Prometheus-compatible</td>
<td>

`"1m", "1h"`
</td>
</tr>
</tbody>
</table>

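
Since `fit_window` and `fit_every` accept both ISO 8601 durations and Prometheus-compatible strings, here is a sketch that normalizes either form to a `timedelta`. `parse_window` is a hypothetical helper, not vmanomaly's parser. It supports only weeks, days, hours, minutes and seconds, because years and months have no fixed length in seconds. In ISO 8601, hour, minute and second components must follow the `T` separator, so `P1H` is rejected.

```python
import re
from datetime import timedelta

# ISO 8601 subset: weeks, days, and the time part (hours, minutes, seconds).
_ISO_RE = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)
# Prometheus-compatible single-unit form, e.g. "1m", "14d".
_PROM_RE = re.compile(r"^(\d+)([smhdw])$")
_PROM_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_window(value: str) -> timedelta:
    """Hypothetical helper: accept either format from the table above."""
    iso = _ISO_RE.match(value)
    # Require at least one component and no dangling "T" (e.g. reject "P", "PT", "P1DT").
    if iso and any(iso.groupdict().values()) and not value.endswith("T"):
        return timedelta(**{k: int(v) for k, v in iso.groupdict().items() if v})
    prom = _PROM_RE.match(value)
    if prom:
        return timedelta(**{_PROM_UNITS[prom.group(2)]: int(prom.group(1))})
    raise ValueError(f"unsupported duration: {value!r}")
```
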
### Defining inference timeframe

In `BacktestingScheduler`, the inference window is *implicitly* defined as the period between 2 consecutive model fits, which are `fit_every` apart. The *latest* inference window starts at `to_s` - `fit_every` and ends at the *latest available* time point, which is `to_s`. The previous fit/infer periods are defined the same way, by shifting `fit_every` seconds backwards until reaching the last full fit period of `fit_window` size whose start is >= `from_s`.

<table class="params">
<thead>
<tr>
<th>Format</th>
<th>Parameter</th>
<th>Type</th>
<th>Example</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>ISO 8601</td>
<td rowspan="2">

`fit_every`
</td>
<td rowspan="2">str</td>
<td>

`"PT1M", "PT1H"`
</td>
<td rowspan="2">How long a previously trained model is used to infer on new data before the next retrain happens.</td>
</tr>
<tr>
<td>Prometheus-compatible</td>
<td>

`"1m", "1h"`
</td>
</tr>
</tbody>
</table>

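
The backward slicing described above can be sketched as follows. `backtesting_slices` is a hypothetical helper, not vmanomaly's code. Each returned tuple is `(fit_start, fit_end, infer_start, infer_end)`. As an assumption, each model is trained on the `fit_window` that ends exactly where its `fit_every`-long inference window starts.

```python
from datetime import datetime, timedelta, timezone


def backtesting_slices(from_dt, to_dt, fit_window, fit_every):
    """Hypothetical sketch of the backward slicing described above.

    Each slice is (fit_start, fit_end, infer_start, infer_end). Assumption: the
    fit window ends where its inference window starts, and inference lasts fit_every.
    """
    if fit_every <= timedelta(0):
        raise ValueError("fit_every must be positive")
    slices = []
    infer_end = to_dt  # the latest inference window ends at `to`
    while True:
        infer_start = infer_end - fit_every
        fit_start = infer_start - fit_window
        if fit_start < from_dt:  # no full fit_window left inside [from, to]
            break
        slices.append((fit_start, infer_start, infer_start, infer_end))
        infer_end = infer_start  # shift back by fit_every
    return slices[::-1]  # chronological order


utc = timezone.utc
slices = backtesting_slices(
    datetime(2021, 1, 1, tzinfo=utc),
    datetime(2021, 1, 3, tzinfo=utc),
    fit_window=timedelta(days=1),
    fit_every=timedelta(hours=12),
)
# Two slices: fit on Jan 1 -> infer Jan 2 00:00-12:00; fit Jan 1 12:00-Jan 2 12:00 -> infer until Jan 3.
```

In this sketch each slice depends only on its own boundaries, so the slices could be evaluated in any order.
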
### ISO format scheduler config example
```yaml
schedulers:
  backtesting_scheduler_alias:
    class: "backtesting"
    # (or class: "scheduler.backtesting.BacktestingScheduler" before v1.13.0, which added class alias support)
    from_iso: '2021-01-01T00:00:00Z'
    to_iso: '2021-01-14T00:00:00Z'
    fit_window: 'P14D'
    fit_every: 'PT1H'
    n_jobs: 1  # default = 1 (sequential); set it up to the number of CPUs for parallel execution
```
### UNIX time format scheduler config example
```yaml
schedulers:
  backtesting_scheduler_alias:
    class: "backtesting"
    # (or class: "scheduler.backtesting.BacktestingScheduler" before v1.13.0, which added class alias support)
    from_s: 1609459200  # 2021-01-01T00:00:00Z
    to_s: 1610582400  # 2021-01-14T00:00:00Z
    fit_window: '14d'
    fit_every: '1h'
    n_jobs: 1  # default = 1 (sequential); set it up to the number of CPUs for parallel execution
```