diff --git a/VictoriaLogs/FAQ.md b/VictoriaLogs/FAQ.md new file mode 100644 index 0000000..2ef7fdc --- /dev/null +++ b/VictoriaLogs/FAQ.md @@ -0,0 +1,113 @@ +# VictoriaLogs FAQ + +## What is the difference between VictoriaLogs and Elasticsearch (OpenSearch)? + +Both Elasticsearch and VictoriaLogs allow ingesting structured and unstructured logs +and performing fast full-text search over the ingested logs. + +Elasticsearch and OpenSearch are designed as general-purpose databases for fast full-text search over large sets of documents. +They aren't optimized specifically for logs. This results in the following issues, which are resolved by VictoriaLogs: + +- High RAM usage +- High disk space usage +- Non-trivial index setup +- Inability to select more than 10K matching log lines in a single query + +VictoriaLogs is optimized specifically for logs. So it provides the following features useful for logs: + +- Easy to set up and operate. There is no need to tune configuration for optimal performance or to create any indexes for various log types. + Just run VictoriaLogs on the most suitable hardware - and it automatically provides the best performance. +- Up to 30x less RAM usage than Elasticsearch for the same workload. +- Up to 15x less disk space usage than Elasticsearch for the same amount of stored logs. +- Ability to work with hundreds of terabytes of logs on a single node. +- Very easy to use query language optimized for typical log analysis tasks - [LogsQL](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html). +- Fast full-text search over all the [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). +- Good integration with traditional command-line tools for log analysis. See [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#command-line). + + +## What is the difference between VictoriaLogs and Grafana Loki? + +Both Grafana Loki and VictoriaLogs are designed for log management and processing.
+Both systems support the [log stream](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) concept. + +VictoriaLogs and Grafana Loki have the following differences: + +- Grafana Loki doesn't support high-cardinality log fields (aka labels) such as `user_id`, `trace_id` or `ip`. + It starts consuming huge amounts of RAM and working very slowly when logs with high-cardinality fields are ingested into it. + See [these docs](https://grafana.com/docs/loki/latest/best-practices/) for details. + + VictoriaLogs supports high-cardinality [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). + It automatically indexes all the ingested log fields and allows performing fast full-text search over any fields. + +- Grafana Loki provides a very inconvenient query language - [LogQL](https://grafana.com/docs/loki/latest/logql/). + This query language is hard to use for typical log analysis tasks. + + VictoriaLogs provides an easy to use query language for typical log analysis tasks - [LogsQL](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html). + +- VictoriaLogs performs typical full-text queries up to 1000x faster than Grafana Loki. + +- VictoriaLogs needs less storage space than Grafana Loki for the same amount of logs. + +- VictoriaLogs is much easier to set up and operate than Grafana Loki. + + +## What is the difference between VictoriaLogs and ClickHouse? + +ClickHouse is an extremely fast and efficient analytical database. It can be used for log storage, analysis and processing. +VictoriaLogs is designed solely for logs. VictoriaLogs uses [similar design ideas as ClickHouse](#how-does-victorialogs-work) for achieving high performance. + +- ClickHouse is good for logs if you know the set of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) beforehand.
+ Then you can create a table with a column per log field and achieve the maximum possible query performance in ClickHouse. + + If the set of log fields isn't known beforehand, or if it can change at any time, then ClickHouse can still be used, + but its efficiency may suffer significantly, depending on how you design the database schema for log storage. + + ClickHouse efficiency highly depends on the database schema used. It must be optimized for the particular workload + for achieving high efficiency and query performance. + + VictoriaLogs works optimally with any log types out of the box - structured, unstructured and mixed. + It works optimally with any sets of [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model), + which can change in any way across different log sources. + +- ClickHouse provides an SQL dialect with additional analytical functionality. It allows performing arbitrarily complex analytical queries + over the stored logs. + + VictoriaLogs provides an easy to use query language with full-text search support specifically optimized for + log analysis - [LogsQL](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html). + LogsQL is usually much easier to use than SQL for typical log analysis tasks. + +- VictoriaLogs accepts logs from popular log shippers - see [these docs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/). + + ClickHouse needs intermediate applications for converting the ingested logs into `INSERT` SQL statements for the particular database schema. + This may increase the complexity of the system and, subsequently, increase its maintenance costs. + + +## How does VictoriaLogs work? + +VictoriaLogs accepts logs as [JSON entries](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model). +It then stores every field value into a distinct data block. I.e. values for the same field across multiple log entries +are stored in a single data block.
This allows reading data blocks only for the needed fields during querying. + +Data blocks are compressed before being stored on disk. This allows saving disk space and improving query performance +when it is limited by disk read IO bandwidth. + +Smaller data blocks are merged into bigger blocks in the background. Data blocks are limited in size. If the size of a data block exceeds the limit, +then it is split into multiple blocks of smaller sizes. + +Every data block is processed in an atomic manner during querying. For example, if the data block contains at least a single value, +which needs to be processed, then the whole data block is unpacked and read at once. Data blocks are processed in parallel +on all the available CPU cores during querying. This allows scaling query performance with the number of available CPU cores. + +This architecture is inspired by [ClickHouse architecture](https://clickhouse.com/docs/en/development/architecture). + +On top of this, VictoriaLogs employs additional optimizations for achieving high query performance: + +- It uses [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) for skipping blocks without the given + [word](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#word-filter) or [phrase](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#phrase-filter). +- It uses custom encoding and compression for fields with different data types. + For example, it encodes IP addresses as 4-byte tuples. Custom field encoding reduces data size on disk and improves query performance. +- It physically groups logs for the same [log stream](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) + close to each other. This improves compression ratio, which helps reduce disk space usage. This also improves query performance + by skipping blocks for unneeded streams when a [stream filter](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#stream-filter) is used.
+- It maintains a sparse index for [log timestamps](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field), + which improves query performance when a [time filter](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html#time-filter) is used. diff --git a/VictoriaLogs/QuickStart.md b/VictoriaLogs/QuickStart.md index 29f45dc..ca73593 100644 --- a/VictoriaLogs/QuickStart.md +++ b/VictoriaLogs/QuickStart.md @@ -134,5 +134,3 @@ Here are a Docker-compose demos, which start VictoriaLogs and push logs to it vi You can use [this Helm chart](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md) as a demo for running Fluentbit in Kubernetes with VictoriaLogs. - - diff --git a/VictoriaLogs/README.md b/VictoriaLogs/README.md index a740518..506861b 100644 --- a/VictoriaLogs/README.md +++ b/VictoriaLogs/README.md @@ -1,8 +1,9 @@ # VictoriaLogs -VictoriaLogs is log management and log analytics system from [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/). +VictoriaLogs is an [open source](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/app/victoria-logs), user-friendly database for logs +from [VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/). -It provides the following key features: +VictoriaLogs provides the following key features: - VictoriaLogs can accept logs from popular log collectors. See [these docs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/). - VictoriaLogs is much easier to setup and operate comparing to ElasticSearch and Grafana Loki. @@ -16,6 +17,8 @@ It provides the following key features: It runs smoothly on both Raspberry PI and a server with hundreds of CPU cores and terabytes of RAM. - VictoriaLogs can handle much bigger data volumes than ElasticSearch and Grafana Loki when running on comparable hardware. See [these docs](#benchmarks).
+- VictoriaLogs supports fast full-text search over high-cardinality [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) + such as `trace_id`, `user_id` and `ip`. - VictoriaLogs supports multitenancy - see [these docs](#multitenancy). - VictoriaLogs supports out of order logs' ingestion aka backfilling. - VictoriaLogs provides simple web UI for querying logs - see [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/#web-ui). @@ -24,7 +27,8 @@ VictoriaLogs is at Preview stage now. It is ready for evaluation in production a It isn't recommended migrating from existing logging solutions to VictoriaLogs Preview in general case yet. See the [Roadmap](https://docs.victoriametrics.com/VictoriaLogs/Roadmap.html) for details. -If you have questions about VictoriaLogs, then feel free asking them at [VictoriaMetrics community Slack chat](https://slack.victoriametrics.com/). +If you have questions about VictoriaLogs, then read [this FAQ](https://docs.victoriametrics.com/VictoriaLogs/FAQ.html). +Also feel free to ask any questions in the [VictoriaMetrics community Slack chat](https://slack.victoriametrics.com/). See [Quick start docs](https://docs.victoriametrics.com/VictoriaLogs/QuickStart.html) for start working with VictoriaLogs. diff --git a/keyConcepts.md b/keyConcepts.md index a3670d7..549d7d2 100644 --- a/keyConcepts.md +++ b/keyConcepts.md @@ -744,21 +744,22 @@ If you need to export raw samples from VictoriaMetrics, then take a look at [exp ### Query latency -By default, Victoria Metrics does not immediately return the recently written samples. Instead, it retrieves the last results written prior to the time specified by the `search.latencyOffset` flag, which has a default offset of 30 seconds. +By default, VictoriaMetrics does not immediately return the recently written samples.
Instead, it retrieves the last results +written prior to the time specified by the `-search.latencyOffset` command-line flag, which has a default offset of 30 seconds. This is true for both `query` and `query_range` and may give the impression that data is written to the VM with a 30-second delay. -But this flag avoids non-consistent results due to the fact that only part of the values are scraped in the last scrape interval. +This flag prevents inconsistent results that arise because only part of the values are scraped in the last scrape interval. -Here is an illustration of a potential problem when `search.latencyOffset` is set to zero: +Here is an illustration of a potential problem when `-search.latencyOffset` is set to zero: -When this flag is set, the VM will return the last metric value collected before the `search.latencyOffset` -duration throughout the `search.latencyOffset` duration: +When this flag is set, the VM will return the last metric value collected before the `-search.latencyOffset` +duration throughout the `-search.latencyOffset` duration: -It can be overridden on per-query basis via `latency_offset` arg. +It can be overridden on a per-query basis via the `latency_offset` query arg. ### MetricsQL