VictoriaMetrics/docs/VictoriaLogs/data-ingestion/README.md
Alexander Marshalov bf081d157e
jsonline support for data ingestion in vlinsert (#4487)
added json lines / json stream format for ingestion to vlinsert
2023-06-21 15:31:28 +02:00

117 lines
6.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Data ingestion
[VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/) can accept logs from the following log collectors:
- Filebeat. See [how to setup Filebeat for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html).
- Logstash. See [how to setup Logstash for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html).
The ingested logs can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).
See also [data ingestion troubleshooting](#troubleshooting) docs.
## HTTP APIs
VictoriaLogs supports the following data ingestion HTTP APIs:
- Elasticsearch bulk API. See [these docs](#elasticsearch-bulk-api).
- JSON stream API aka [ndjson](http://ndjson.org/). See [these docs](#json-stream-api).
VictoriaLogs accepts optional [HTTP parameters](#http-parameters) at data ingestion HTTP APIs.
### Elasticsearch bulk API
VictoriaLogs accepts logs in [Elasticsearch bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
format at `http://localhost:9428/insert/elasticsearch/_bulk` endpoint.
The following command pushes a single log line to Elasticsearch bulk API at VictoriaLogs:
```bash
echo '{"create":{}}
{"_msg":"cannot open file","_time":"2023-06-21T04:24:24Z","host.name":"host123"}
' | curl -X POST -H 'Content-Type: application/json' --data-binary @- http://localhost:9428/insert/elasticsearch/_bulk
```
The following command verifies that the data has been successfully pushed to VictoriaLogs by [querying](https://docs.victoriametrics.com/VictoriaLogs/querying/) it:
```bash
curl http://localhost:9428/select/logsql/query -d 'query=host.name:host123'
```
The command should return the following response:
```bash
{"_msg":"cannot open file","_stream":"{}","_time":"2023-06-21T04:24:24Z","host.name":"host123"}
```
### JSON stream API
VictoriaLogs supports HTTP API on `/insert/jsonline` endpoint for data ingestion where
body contains a JSON object in each line (separated by `\n`).
Here is an example:
```http request
POST http://localhost:9428/insert/jsonline/?_stream_fields=stream&_msg_field=log&_time_field=date
Content-Type: application/jsonl
{ "log": { "level": "info", "message": "hello world" }, "date": "20230620T15:31:23Z", "stream": "stream1" }
{ "log": { "level": "error", "message": "oh no!" }, "date": "20230620T15:32:10Z", "stream": "stream1" }
{ "log": { "level": "info", "message": "hello world" }, "date": "20230620T15:35:11Z", "stream": "stream2" }
```
### HTTP parameters
VictoriaLogs accepts the following parameters at [data ingestion HTTP APIs](#http-apis):
- `_msg_field` - it must contain the name of the [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
with the [log message](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) generated by the log shipper.
This is usually the `message` field for Filebeat and Logstash.
If the `_msg_field` parameter isn't set, then VictoriaLogs reads the log message from the `_msg` field.
- `_time_field` - it must contain the name of the [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
with the [log timestamp](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) generated by the log shipper.
This is usually the `@timestamp` field for Filebeat and Logstash.
If the `_time_field` parameter isn't set, then VictoriaLogs reads the timestamp from the `_time` field.
If this field doesn't exist, then the current timestamp is used.
- `_stream_fields` - it should contain comma-separated list of [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) names,
which uniquely identify every [log stream](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) collected the log shipper.
If the `_stream_fields` parameter isn't set, then all the ingested logs are written to default log stream - `{}`.
- `ignore_fields` - this parameter may contain the list of [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) names,
which must be ignored during data ingestion.
- `debug` - if this parameter is set to `1`, then the ingested logs aren't stored in VictoriaLogs. Instead,
the ingested data is logged by VictoriaLogs, so it can be investigated later.
See also [HTTP headers](#http-headers).
### HTTP headers
VictoriaLogs accepts optional `AccountID` and `ProjectID` headers at [data ingestion HTTP APIs](#http-apis).
These headers may contain the needed tenant to ingest data to. See [multitenancy docs](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy) for details.
## Troubleshooting
VictoriaLogs provides the following command-line flags, which can help debugging data ingestion issues:
- `-logNewStreams` - if this flag is passed to VictoriaLogs, then it logs all the newly
registered [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields).
This may help debugging [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
- `-logIngestedRows` - if this flag is passed to VictoriaLogs, then it logs all the ingested
[log entries](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
See also `debug` [parameter](#http-parameters).
VictoriaLogs exposes various [metrics](https://docs.victoriametrics.com/VictoriaLogs/#monitoring), which may help debugging data ingestion issues:
- `vl_rows_ingested_total` - the number of ingested [log entries](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
since the last VictoriaLogs restart. If this number icreases over time, then logs are successfully ingested into VictoriaLogs.
The ingested logs can be inspected in the following ways:
- By passing `debug=1` parameter to every request to [data ingestion APIs](#http-apis). The ingested rows aren't stored in VictoriaLogs
in this case. Instead, they are logged, so they can be investigated later.
The `vl_rows_dropped_total` [metric](https://docs.victoriametrics.com/VictoriaLogs/#monitoring) is incremented for each logged row.
- By passing `-logIngestedRows` command-line flag to VictoriaLogs. In this case it logs all the ingested data, so it can be investigated later.
- `vl_streams_created_total` - the number of created [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields)
since the last VictoriaLogs restart. If this metric grows rapidly during extended periods of time, then this may lead
to [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
The newly created log streams can be inspected in logs by passing `-logNewStreams` command-line flag to VictoriaLogs.