VictoriaMetrics

mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git synced 2024-12-05 01:01:09 +01:00

Author	SHA1	Message	Date
Zakhar Bessarab	6711eec109	docker-compose: move `TooManyLogs` into `vm-health` alerts set (#3199)	2022-10-05 19:23:36 +02:00
Roman Khavronenko	5714a68ac6	deployment/docker: move cluster compose env to master branch (#3130) * deployment/docker: move cluster compose env to master branch. The change is supposed to simplify maintaining the single-node and cluster docker-compose envs, alerts and dashboards. It is also supposed to reduce confusion for users looking for cluster-related alerts/configs. Signed-off-by: hagen1778 <roman@victoriametrics.com> * deployment/docker: move cluster compose env to master branch. Review updates. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-09-21 11:48:38 +03:00
Roman Khavronenko	27f1c65074	vmagent: expose metric `vmagent_remotewrite_queues` (#2871) The new metric `vmagent_remotewrite_queues` exports a static value: the number of configured remote write queues. This metric is useful for calculating the total saturation of each configured URL given its number of queues. See the corresponding changes to the vmagent alerts and dashboard. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-07-18 14:31:35 +03:00
Aliaksandr Valialkin	8a6fb5ef2b	deployment/docker/alerts.yml: backport `a42063909f`	2022-07-12 19:53:06 +03:00
Yurii Kravets	aeeaf877ac	Changed the level type in alerts.yml for TooManyLogs alert (#2760) alerts: filter out non-error log messages for `TooManyLogs`. Info and Warn log levels aren't always the result of a malfunction or faulty state, so we filter them out.	2022-06-20 16:44:47 +02:00
Roman Khavronenko	7cd371f08f	alerts: lower the threshold for TooHighSlowInsertsRate (#2210) Lowering the threshold from 50% to 5% makes the alert more effective at detecting an unhealthy system state. It also brings the alert in sync with its definition in the cluster branch. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-02-18 13:42:24 +02:00
Roman Khavronenko	e29b2b8444	Monitoring single (#2190) * dashboards: plot CPU limits for vmagent, vmalert and vm-single dashboards Signed-off-by: hagen1778 <roman@victoriametrics.com> * alerts: add `TooHighCPUUsage` alert for all VM components Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards: bump component version requirements Signed-off-by: hagen1778 <roman@victoriametrics.com>	2022-02-15 11:54:28 +02:00
Roman Khavronenko	bc79bdf68a	Dashboards vmagent updates (#1973) * dashboards/vmagent: shuffle panels for better visibility. The more important error/dropped panels were moved higher on the main row. The Network usage panel was moved to the Resource usage row. Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/vmagent: add Troubleshooting row to show top 5 instances/jobs by churn rate. The new panels show the top 5 jobs or targets that generate the most churn. They were placed into a new row, "Troubleshooting". Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/vmagent: add panels for showing persistent queue saturation. New panels were added to the Troubleshooting row to show persistent queue saturation. The corresponding alerts were added and linked to these panels as well. Signed-off-by: hagen1778 <roman@victoriametrics.com> * dashboards/vmagent: add alert "RejectedRemoteWriteDataBlocksAreDropped". The new alert is supposed to send a notification when vmagent starts dropping data blocks rejected by the configured remote write destination. Signed-off-by: hagen1778 <roman@victoriametrics.com>	2021-12-20 12:16:53 +02:00
Thomas Danielsson	77e19b3f87	Fix vmsingle dashboard link (#1894)	2021-12-02 14:43:30 +02:00
Aliaksandr Valialkin	ec40affb59	deployment/docker/alerts.yml: formatting fixes after `865a60f13e`	2021-10-19 08:53:03 +03:00
Yurii Kravets	865a60f13e	Update alerts.yml: added Series Limit day/hour alerts	2021-10-18 18:14:49 +03:00
Roman Khavronenko	0f4bcc00b2	Single dashboards upd (#1593) * dashboard: replace `null` datasources. A null datasource value may confuse Grafana and make it drop the panel query in some versions. * docker: bump grafana image version * dashboards: add URL variable selector to vmagent dashboard * dashboards: add new panel `Remote write connection saturation` to vmagent dashboard * alerts: add new alert for `Remote write connection saturation` panel of vmagent dashboard * dashboards: add "Logging rate" panel to vmagent dashboard	2021-09-01 11:46:22 +03:00
Roman Khavronenko	408ba43092	Alerts single update (#1510) * alerts: move `ProcessNearFDLimits` to `vm-health` group since it is relevant for all services * alerts: add new `TooHighMemoryUsage` alerting rule	2021-08-02 15:51:24 +03:00
Roman Khavronenko	2f54559c89	alerts: sync alert expression for `DiskRunsOutOfSpaceIn3Days` with dashboard (#1436)	2021-07-07 10:31:09 +03:00
Roman Khavronenko	5e9f3777bf	alerts: add new alert `LabelsLimitExceededOnIngestion` (#1359)	2021-06-09 12:15:36 +03:00
k1rk	668165f53d	rename serviceHealth group name to vm-health (#1360) The old name caused conflicts in the `victoria-metrics-k8s-stack` chart.	2021-06-08 23:34:38 +03:00
Roman Khavronenko	162681e60d	add new alerts (#1195) * alerts: backport `DiskRunsOutOfSpace` alert and some other tweaks from cluster branch * alerts: add `ServiceDown` alert to detect "dead" services	2021-04-08 18:24:25 +03:00
Roman Khavronenko	cfdb6762e6	deployment: add new alert `TooHighChurnRate24h` (#1154) Alert `TooHighChurnRate24h` is supposed to cover cases when the churn rate is low but still produces a number of new series several times higher than the total number of active series.	2021-03-29 12:38:03 +03:00
Roman Khavronenko	b457739f87	Single dashboard (#1126) * dashboard: update single node dashboard * add panel `Open FDs` for file descriptor metrics; * add panel `Disk writes/reads` to show the real read/write load on the storage layer; * add `process_resident_memory_bytes` metric to memory usage panel; * add stats panel to show available CPUs, memory and disk space; * remove flags panel since it didn't prove useful. * alerts: add alert for reaching FDs limit	2021-03-15 12:04:24 +02:00
Roman Khavronenko	14f0f90507	docker-compose: provide the example list of alerting rules for vm components (#1005) The list contains example alerting rules that can be executed via `vmalert` to track the health of VM components. The list is expected to be revised and calibrated for each system individually.	2021-01-11 13:03:15 +02:00
Artem Navoiev	4e391a5e39	[deployment] add vmalert + alertmanager to docker compose (#885)	2020-11-07 17:00:23 +02:00

21 Commits