added more info and examples about data ingestion and collectors to VictoriaLogs docs (#4490)

This commit is contained in:
Alexander Marshalov 2023-06-21 16:58:43 +02:00 committed by GitHub
parent 4a7b17ed76
commit 892fa32743
7 changed files with 273 additions and 5 deletions

View File

@ -9,6 +9,7 @@ before you start working with VictoriaLogs.
The following options exist:
- [To run Docker image](#docker-image)
- [To run in Kubernetes with helm-charts](#helm-charts)
- [To build VictoriaLogs from source code](#building-from-source-code)
### Docker image
@ -21,6 +22,11 @@ docker run --rm -it -p 9428:9428 -v ./victoria-logs-data:/victoria-logs-data \
docker.io/victoriametrics/victoria-logs:heads-public-single-node-0-ga638f5e2b
```
### Helm charts
You can run VictoriaLogs in a Kubernetes environment
with [helm-charts](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md).
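For a quick start, something like the following commands should work; the repo URL and chart name are assumptions based on the VictoriaMetrics helm-charts project, so see the chart's README above for the authoritative instructions:

```sh
# Add the VictoriaMetrics helm repository (assumed standard location)
helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update

# Install the single-node VictoriaLogs chart under the release name "vlogs"
helm install vlogs vm/victoria-logs-single
```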
### Building from source code
Follow these steps to build VictoriaLogs from source code:
@ -50,6 +56,8 @@ It has no any external dependencies, so it may run in various environments witho
VictoriaLogs automatically adapts to the available CPU and RAM resources. It also automatically sets up and creates
the needed indexes during [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/).
## How to configure VictoriaLogs
It is possible to change the TCP port via the `-httpListenAddr` command-line flag. For example, the following command
starts VictoriaLogs, which accepts incoming requests at port `9200` (aka the Elasticsearch HTTP API port):
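A sketch of such a command, assuming the `victoria-logs` binary sits in the current directory:

```sh
# Listen for incoming HTTP requests on port 9200 instead of the default 9428
./victoria-logs -httpListenAddr=:9200
```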
@ -66,3 +74,26 @@ E.g. it uses the retention of 7 days. Read [these docs](https://docs.victoriamet
for the [ingested](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) logs.
It is recommended to set up monitoring of VictoriaLogs according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/#monitoring).
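VictoriaLogs exposes internal metrics in Prometheus text format at the `/metrics` endpoint, so a minimal scrape config sketch could look like the following (the target address assumes a local single-node setup):

```yaml
scrape_configs:
  - job_name: victorialogs
    # Scrape the /metrics endpoint of VictoriaLogs; replace the address with the real one
    static_configs:
      - targets: ["localhost:9428"]
```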
## How to send logs to VictoriaLogs
You can set up data ingestion for VictoriaLogs in the following ways:
- Configure one of the [supported log collectors](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#log-collectors-and-data-ingestion-formats) to send logs to VictoriaLogs.
- Configure your own log collector to send logs to VictoriaLogs via the [supported log ingestion protocols](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#http-apis).
Here are demos for running popular supported log collectors in Docker with VictoriaLogs:
- [**Filebeat (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/filebeat-docker)
- [**Fluentbit (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/fluentbit-docker)
- [**Logstash (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/logstash)
- [**Vector (docker)**](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/vector-docker)
You can also use the [helm chart](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/README.md)
as a demo for running Fluentbit in Kubernetes with VictoriaLogs:
- [Fluentbit (k8s)](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-logs-single/values.yaml)
## How to query logs in VictoriaLogs
See details in [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying).
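For example, a query can be issued over the HTTP querying API with `curl`; this is a sketch assuming VictoriaLogs runs on `localhost:9428`:

```sh
# Select logs containing the word "error" via the LogsQL query endpoint
curl http://localhost:9428/select/logsql/query -d 'query=error'
```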

View File

@ -17,8 +17,6 @@ The following functionality is planned in the future versions of VictoriaLogs:
- Support for [data ingestion](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/) from popular log collectors and formats:
- Promtail (aka Grafana Loki)
- Vector.dev
- Fluentbit
- Fluentd
- Syslog
- Add missing functionality to [LogsQL](https://docs.victoriametrics.com/VictoriaLogs/LogsQL.html):

View File

@ -1,5 +1,9 @@
# Filebeat setup
[Filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-overview.html) log collector supports
[Elasticsearch output](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html), which is compatible with
the VictoriaLogs [ingestion format](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api).
Specify the [`output.elasticsearch`](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html) section in the `filebeat.yml`
for sending the collected logs to [VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/):
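A minimal sketch of such a section; the `localhost:9428` address and the field names are assumptions for a local setup:

```yml
output.elasticsearch:
  hosts: [ "http://localhost:9428/insert/elasticsearch/" ]
  parameters:
    _msg_field: "message"
    _time_field: "@timestamp"
    _stream_fields: "host.name,log.file.path"
```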
@ -72,7 +76,7 @@ output.elasticsearch:
compression_level: 1
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `headers` in the `output.elasticsearch` section.
For example, the following `filebeat.yml` config instructs Filebeat to store the data in the `(AccountID=12, ProjectID=34)` tenant:
@ -88,6 +92,12 @@ output.elasticsearch:
_stream_fields: "host.name,log.file.path"
```
You can find more info about output parameters in [these docs](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/filebeat-docker) for
running Filebeat with docker-compose and sending the collected logs to VictoriaLogs.
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.

View File

@ -0,0 +1,70 @@
# Fluentbit setup
[Fluentbit](https://docs.fluentbit.io/manual) log collector supports [HTTP output](https://docs.fluentbit.io/manual/pipeline/outputs/http), which is compatible with
the VictoriaLogs [JSON stream API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#json-stream-api).
Specify the [`output`](https://docs.fluentbit.io/manual/pipeline/outputs/http) section with `Name http` in the `fluentbit.conf`
for sending the collected logs to VictoriaLogs:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
```
Substitute the address (`localhost`) and port (`9428`) inside the `Output` section with the real TCP address of VictoriaLogs.
The `_msg_field` parameter must contain the name of the field with the log message generated by Fluentbit. This is usually the `message` field.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) for details.
The `_time_field` parameter must contain the name of the field with the log timestamp generated by Fluentbit. This is usually the `@timestamp` field.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) for details.
It is recommended to specify a comma-separated list of field names, which uniquely identify every log stream collected by Fluentbit, in the `_stream_fields` parameter.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) for details.
If Fluentbit sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the `compress` option.
This usually allows reducing the used network bandwidth and costs by up to 5 times:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
compress gzip
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `header` options in the `[Output]` section.
For example, the following `fluentbit.conf` config instructs Fluentbit to store the data in the `(AccountID=12, ProjectID=34)` tenant:
```conf
[Output]
Name http
Match *
host localhost
port 9428
uri /insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
format json_lines
json_date_format iso8601
header AccountID 12
header ProjectID 34
```
You can find more info about output tuning in [these docs](https://docs.fluentbit.io/manual/pipeline/outputs/http).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/fluentbit-docker)
for running Fluentbit with docker-compose and collecting logs from Docker containers into VictoriaLogs.
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.

View File

@ -1,5 +1,10 @@
# Logstash setup
[Logstash](https://www.elastic.co/guide/en/logstash/8.8/introduction.html) log collector supports the
[Opensearch output plugin](https://github.com/opensearch-project/logstash-output-opensearch), which is compatible with the
[Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)
in VictoriaLogs.
Specify the [`output.elasticsearch`](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html) section in the `logstash.conf` file
for sending the collected logs to [VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/):
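A minimal sketch of such a section; the `localhost:9428` address and the field names are assumptions for a local setup:

```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9428/insert/elasticsearch/"]
    parameters => {
      "_msg_field" => "message"
      "_time_field" => "@timestamp"
      "_stream_fields" => "host.name,process.name"
    }
  }
}
```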
@ -74,7 +79,7 @@ output {
}
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via `custom_headers` in the `output.elasticsearch` section.
For example, the following `logstash.conf` config instructs Logstash to store the data in the `(AccountID=12, ProjectID=34)` tenant:
@ -95,6 +100,12 @@ output {
}
```
You can find more info about output tuning in [these docs](https://github.com/opensearch-project/logstash-output-opensearch/blob/main/README.md).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/logstash)
for running Logstash with docker-compose and sending the collected logs to VictoriaLogs
(via the [Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)).
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.

View File

@ -3,7 +3,11 @@
[VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/) can accept logs from the following log collectors:
- Filebeat. See [how to setup Filebeat for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html).
- Fluentbit. See [how to setup Fluentbit for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Fluentbit.html).
- Logstash. See [how to setup Logstash for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html).
- Vector. See [how to setup Vector for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Vector.html).
See also [Log collectors and data ingestion formats](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#log-collectors-and-data-ingestion-formats) supported by VictoriaLogs.
The ingested logs can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
@ -21,7 +25,8 @@ VictoriaLogs accepts optional [HTTP parameters](#http-parameters) at data ingest
### Elasticsearch bulk API
VictoriaLogs accepts logs in [Elasticsearch bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
/ [OpenSearch Bulk API](http://opensearch.org/docs/1.2/opensearch/rest-api/document-apis/bulk/) format
at `http://localhost:9428/insert/elasticsearch/_bulk` endpoint.
The following command pushes a single log line to Elasticsearch bulk API at VictoriaLogs:
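A sketch of such a command; the field names in the JSON payload are illustrative, and `localhost:9428` is assumed for a local setup:

```sh
# Push one log entry via the Elasticsearch bulk API endpoint of VictoriaLogs
echo '{"create":{}}
{"_msg":"cannot open file","_time":"0","host.name":"host123"}
' | curl -X POST -H 'Content-Type: application/json' --data-binary @- http://localhost:9428/insert/elasticsearch/_bulk
```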
@ -114,3 +119,14 @@ VictoriaLogs exposes various [metrics](https://docs.victoriametrics.com/Victoria
since the last VictoriaLogs restart. If this metric grows rapidly during extended periods of time, then this may lead
to [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
The newly created log streams can be inspected in logs by passing `-logNewStreams` command-line flag to VictoriaLogs.
## Log collectors and data ingestion formats
Here is the list of log collectors and the data ingestion formats supported by VictoriaLogs:
| Collector | Elasticsearch | JSON Stream |
|------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|---------------------------------------------------------------|
| [filebeat](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html) | [Yes](https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html) | No |
| [fluentbit](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Fluentbit.html) | No | [Yes](https://docs.fluentbit.io/manual/pipeline/outputs/http) |
| [logstash](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html) | [Yes](https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html) | No |
| [vector](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Vector.html) | [Yes](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/) | No |

View File

@ -0,0 +1,132 @@
# Vector setup
[Vector](http://vector.dev) log collector supports the
[Elasticsearch sink](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/), which is compatible with the
VictoriaLogs [Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api).
Specify a [`sinks.vlogs`](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/) section with `type = "elasticsearch"` in the `vector.toml`
for sending the collected logs to VictoriaLogs:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
```
Substitute the `localhost:9428` address inside the `endpoints` section with the real TCP address of VictoriaLogs.
Replace `your_input` with the name of the source, which collects the logs. See [these docs](https://vector.dev/docs/reference/configuration/sources/) for details.
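For instance, here is a sketch of a source definition that could be referenced from `inputs`; the `demo_logs` source is just an example, and any Vector source works:

```toml
# A sample source named "your_input"; the sink above refers to it via inputs = [ "your_input" ]
[sources.your_input]
type = "demo_logs"
format = "json"
```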
The `_msg_field` parameter must contain the name of the field with the log message generated by Vector. This is usually the `message` field.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) for details.
The `_time_field` parameter must contain the name of the field with the log timestamp generated by Vector. This is usually the `@timestamp` field.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) for details.
It is recommended to specify a comma-separated list of field names, which uniquely identify every log stream collected by Vector, in the `_stream_fields` parameter.
See [these docs](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) for details.
If some [log fields](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) aren't needed,
then VictoriaLogs can be instructed to ignore them during data ingestion: just pass the `ignore_fields` parameter with a comma-separated list of fields to ignore.
For example, the following config instructs VictoriaLogs to ignore the `log.offset` and `event.original` fields in the ingested logs:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
ignore_fields = "log.offset,event.original"
```
More details about `_msg_field`, `_time_field`, `_stream_fields` and `ignore_fields` are
available [here](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#http-parameters).
If Vector ingests logs into VictoriaLogs at a high rate, then it may be needed to tune the `batch.max_events` option.
For example, the following config is optimized for a higher than usual ingestion rate:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
[sinks.vlogs.batch]
max_events = 1000
```
If Vector sends logs to VictoriaLogs in another datacenter, then it may be useful to enable data compression via the `compression` option.
This usually allows reducing the used network bandwidth and costs by up to 5 times:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
compression = "gzip"
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
```
By default, the ingested logs are stored in the `(AccountID=0, ProjectID=0)` [tenant](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#multitenancy).
If you need to store logs in another tenant, then specify the needed tenant via the `[sinks.vlogs.request.headers]` section.
For example, the following `vector.toml` config instructs Vector to store the data in the `(AccountID=12, ProjectID=34)` tenant:
```toml
[sinks.vlogs]
inputs = [ "your_input" ]
type = "elasticsearch"
endpoints = [ "http://localhost:9428/insert/elasticsearch/" ]
mode = "bulk"
api_version = "v8"
healthcheck.enabled = false
[sinks.vlogs.query]
_msg_field = "message"
_time_field = "timestamp"
_stream_fields = "host,container_name"
[sinks.vlogs.request.headers]
AccountID = "12"
ProjectID = "34"
```
You can find more info about output tuning in [these docs](https://vector.dev/docs/reference/configuration/sinks/elasticsearch/).
[Here is a demo](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker/victorialogs/vector-docker)
for running Vector with docker-compose and collecting logs from Docker containers
into VictoriaLogs (via the [Elasticsearch bulk API](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#elasticsearch-bulk-api)).
The ingested log entries can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/#troubleshooting) docs.