mirror of https://github.com/VictoriaMetrics/VictoriaMetrics.git, synced 2024-11-23 12:31:07 +01:00
docs: fix typos and format in case study (#7374)
### Describe Your Changes

- made small typo fix in case studies

### Checklist

The following checks are **mandatory**:

- [x] My change adheres [VictoriaMetrics contributing guidelines](https://docs.victoriametrics.com/contributing/).
parent 45896fb477
commit 8dc4e2b5a5
@@ -3,11 +3,12 @@ weight: 21
title: Case studies and talks
menu:
docs:
parent: 'victoriametrics'
parent: "victoriametrics"
weight: 21
aliases:
- /CaseStudies.html
- /CaseStudies.html
---

Below please find public case studies and talks from VictoriaMetrics users. You can also join our [community Slack channel](https://slack.victoriametrics.com/)
where you can chat with VictoriaMetrics users to get additional references, reviews and case studies.

@@ -87,12 +88,12 @@ We ended up with the following configuration:
We learned that remote write protocol generated too much traffic and connections so after 8 months we started looking for alternatives.

Around the same time, VictoriaMetrics released [vmagent](https://docs.victoriametrics.com/vmagent/).
We tried to scrape all the metrics via a single instance of vmagent but it that didn't work because vmagent wasn't able to catch up with writes
We tried to scrape all the metrics via a single instance of vmagent but that didn't work because vmagent wasn't able to catch up with writes
into VictoriaMetrics. We tested different options and end up with the following scheme:

- We removed Prometheus from our setup.
- VictoriaMetrics [can scrape targets](https://docs.victoriametrics.com/single-server-victoriametrics/#how-to-scrape-prometheus-exporters-such-as-node-exporter) as well
so we removed vmagent. Now, VictoriaMetrics scrapes all the metrics from 110 jobs and 5531 targets.
so we removed vmagent. Now, VictoriaMetrics scrapes all the metrics from 110 jobs and 5531 targets.
- We use [Promxy](https://github.com/jacksontj/promxy) for alerting.

Such a scheme has generated the following benefits compared with Prometheus:
@@ -195,8 +196,8 @@ Once we found VictoriaMetrics it solved the following problems:
- very short startup time and any possible gaps in data can easily be filled in using Promxy
- we could continue using Telegraf as our metrics agent and ship identical metrics to both InfluxDB and VictoriaMetrics during the migration period (migration just about to start)
- compression im VM is really good. We can store more metrics and we can easily spin up new VictoriaMetrics instances
for new data and keep read-only nodes with older data if we need to extend our retention period further
than single virtual machine disks allow and we can aggregate all the data from VictoriaMetrics with Promxy
for new data and keep read-only nodes with older data if we need to extend our retention period further
than single virtual machine disks allow and we can aggregate all the data from VictoriaMetrics with Promxy

High availability is done the same way we did with InfluxDB by running parallel single nodes of VictoriaMetrics.

@@ -260,38 +261,38 @@ We started with a Prometheus server on EKS. That worked until it didn't. We then

### What VictoriaMetrics means for us

* Easy to use and maintain
* Cost effective
* The ability to handle billions of time series events at any point of time
* Multiple K8s clusters to monitor
* Consistent monitoring infra for each cluster across multiple Regions and clouds
* Secure communication and data storage
* Easy Retention
- Easy to use and maintain
- Cost effective
- The ability to handle billions of time series events at any point of time
- Multiple K8s clusters to monitor
- Consistent monitoring infra for each cluster across multiple Regions and clouds
- Secure communication and data storage
- Easy Retention

### Some of our initial challenges prior to moving to VictoriaMetrics

* Reducing cost by not using a managed solution of one of the clouds
* Support HA and recover fast
* No downtimes
* Having our main prometheus using too much Ram and restarts.
- Reducing cost by not using a managed solution of one of the clouds
- Support HA and recover fast
- No downtimes
- Having our main prometheus using too much Ram and restarts.

### Some of the reasons we chose VictoriaMetrics

* The API is compatible with Prometheus and all standard PromQL queries work well out of the box
* Handles storage well
* Available to use in Grafana easily
* Single and small executable
* Easy and fast backups
* Better benchmarks than all the competitors
* Open Source and maintained with good community
- The API is compatible with Prometheus and all standard PromQL queries work well out of the box
- Handles storage well
- Available to use in Grafana easily
- Single and small executable
- Easy and fast backups
- Better benchmarks than all the competitors
- Open Source and maintained with good community

### Some of the benefits we experienced since working with VictoriaMetrics

* We saved around $5K USD per month
* It’s seamless and doesn’t cause any override complications on the Infrastructure team
* It doesn’t use lots of storage
* It can serve us in the future in even bigger scales
* It has support with a great community.
- We saved around $5K USD per month
- It’s seamless and doesn’t cause any override complications on the Infrastructure team
- It doesn’t use lots of storage
- It can serve us in the future in even bigger scales
- It has support with a great community.

## Fly.io

@@ -412,12 +413,13 @@ See [this video](https://www.youtube.com/watch?v=OUyXPgVcdw4) and [these slides]
[NetEase Cloud Music](https://music.163.com/) is a Chinese freemium music streaming service developed and owned by [NetEase, Inc](https://en.wikipedia.org/wiki/NetEase). It is one of the biggest competitors in the Chinese music streaming business, primarily competing with [Tencent](https://en.wikipedia.org/wiki/Tencent)'s QQ Music.

The huge scale of services and the diverse monitoring requirements bring great challenges to timeseries database’s reliability, availability, and performance. With year’s evolution, we finally build a metrics system around VictoriaMetrics, aiming to solve following problems:
* Weak observability on application layer: in the past, internal monitoring of the product mainly focused on machine level. Although it also provided monitoring plugins for common frameworks, there was still room for improvement in both performance and visualization effects.
* Linking metrics to trace: metrics are the most intuitive way to discover problems, such as "getting 10 failed http requests in the past 30s", but sometimes traces are also needed to locate the root cause of the errors.
* Performance and cost: storage cost of the old metric system is relatively high, since prometheus as a standalone application cannot support large scale of data.
* aggregate queries: aggregate queries are often needed and could take several seconds or even tens of seconds, slowing down troubleshooting process seriously.
* Weak visualization capabilities: monitoring data are often used in YoY comparison and multi-instance comparison to help locate problems. Neither Prometheus UI nor Grafana supports this feature.


- Weak observability on application layer: in the past, internal monitoring of the product mainly focused on machine level. Although it also provided monitoring plugins for common frameworks, there was still room for improvement in both performance and visualization effects.
- Linking metrics to trace: metrics are the most intuitive way to discover problems, such as "getting 10 failed http requests in the past 30s", but sometimes traces are also needed to locate the root cause of the errors.
- Performance and cost: storage cost of the old metric system is relatively high, since prometheus as a standalone application cannot support large scale of data.
- aggregate queries: aggregate queries are often needed and could take several seconds or even tens of seconds, slowing down troubleshooting process seriously.
- Weak visualization capabilities: monitoring data are often used in YoY comparison and multi-instance comparison to help locate problems. Neither Prometheus UI nor Grafana supports this feature.

See [this article](https://juejin.cn/post/7322268449409744931) for details on how NetEase Cloud Music build a metrics system base on VictoriaMetrics and give solutions to above problems.

## Percona
@@ -451,7 +453,7 @@ See [the full article](https://engineering.razorpay.com/scaling-to-trillions-of-

[RELEX Solutions](https://www.relexsolutions.com/), a global software company from Finland, is the market-leading supply chain and retail planning platform.

VictoriaMetrics is used as the central metrics storage for timeseries about applications and machines hosted both in the public cloud and in the private cloud. Metrics are remote-written by Prometheus, the OpenTelemetry collector and sometimes directly by custom Prometheus exporters.
VictoriaMetrics is used as the central metrics storage for timeseries about applications and machines hosted both in the public cloud and in the private cloud. Metrics are remote-written by Prometheus, the OpenTelemetry collector and sometimes directly by custom Prometheus exporters.

Alerts are evaluated on vmalert when necessary, either because metrics from multiple sources are needed or because the source is Prometheus in agent mode (mostly for kubernetes clusters). Prometheus Alertmanager and Grafana+Promxy combine all sources together so that the end users of dashboards (and ideally the recipients of alert notifications) don't have to worry where to look for some information.

@@ -554,7 +556,7 @@ See [the full article](https://smarketshq.com/monitoring-kubernetes-clusters-41a

[Synthesio](https://www.synthesio.com/) is the leading social intelligence tool for social media monitoring and analytics.

> We fully migrated from [Metrictank](https://github.com/grafana/metrictank) to VictoriaMetrics
> We fully migrated from [Metrictank](https://github.com/grafana/metrictank) to VictoriaMetrics

Numbers:

@@ -611,15 +613,15 @@ Numbers:

Alex Ulstein, Head of Monitoring, Wix.com


## xiaohongshu

With a mission to “inspire lives”, [Xiaohongshu](https://www.xiaohongshu.com) is a lifestyle platform that inspires people to discover and connect with a range of diverse lifestyles from China.
In the past year, we used VictoriaMetrics to replace Prometheus. After migrating to VictoriaMetrics, we had saved more than ten thousand cpu cores, and our metrics system is more stable.
In the past year, we used VictoriaMetrics to replace Prometheus. After migrating to VictoriaMetrics, we had saved more than ten thousand cpu cores, and our metrics system is more stable.
Now more than thirty VictoriaMetrics storage clusters are running online, including all of our key business areas, such as recommendations, search, community, advertising, infrastructure, etc.
See [this article](https://mp.weixin.qq.com/s/uJ1t0B8WBBryzvbLWDfl5A) on how Xiaohongshu build metrics system base on VictoriaMetrics and the competing solutions.

Across our production VictoriaMetrics clusters, numbers as below:

- Cpu cores in all VictoriaMetrics clusters: almost 50000
- Data size on disk: 2400 TB
- Retention period: 1 month
@@ -629,7 +631,6 @@ Across our production VictoriaMetrics clusters, numbers as below:
- /api/v1/query_range: 2300 queries per second
- /api/v1/query: 260 queries per second


## Zerodha

[Zerodha](https://zerodha.com/) is India's largest stock broker. The monitoring team at Zerodha had the following requirements:
@@ -665,7 +666,6 @@ Numbers:
- The average query rate is ~3k per second (mostly alert queries).
- Query duration: median is ~40ms, 99th percentile is ~100ms.


## Zomato

### Who We Are
@@ -679,13 +679,13 @@ As we scaled, our existing observability stack (Prometheus and Thanos) began to
### Our Solution

To address these challenges, we decided to migrate to VictoriaMetrics. We were drawn to its reputation for high performance, low resource usage, and scalability. The migration process was carefully planned to ensure a smooth transition with minimal disruption. We focused on:
- **Data Optimization**: We reduced unnecessary metrics to minimize data ingestion and storage needs.
- **Performance Enhancements**: VictoriaMetrics’ efficient query processing allowed us to achieve significantly faster query response times.
- **Cost Efficiency**: The optimized storage format in VictoriaMetrics led to a noticeable reduction in our storage and operational costs.

- **Data Optimization**: We reduced unnecessary metrics to minimize data ingestion and storage needs.
- **Performance Enhancements**: VictoriaMetrics’ efficient query processing allowed us to achieve significantly faster query response times.
- **Cost Efficiency**: The optimized storage format in VictoriaMetrics led to a noticeable reduction in our storage and operational costs.

### The Results

Post-migration, we successfully scaled our monitoring infrastructure to handle billions of data points daily, all while experiencing faster query performance and 60% reduction in yearly infra cost. The improved observability has enhanced our ability to deliver reliable service, allowing us to troubleshoot issues more quickly and effectively.


Read more about the migration journey in our blog - https://blog.zomato.com/migrating-to-victoriametrics-a-complete-overhaul-for-enhanced-observability
Read more about the migration journey in our blog - https://blog.zomato.com/migrating-to-victoriametrics-a-complete-overhaul-for-enhanced-observability