# Data ingestion

[VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/) can accept logs from the following log collectors:

- Filebeat. See [how to set up Filebeat for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Filebeat.html).
- Logstash. See [how to set up Logstash for sending logs to VictoriaLogs](https://docs.victoriametrics.com/VictoriaLogs/data-ingestion/Logstash.html).

The ingested logs can be queried according to [these docs](https://docs.victoriametrics.com/VictoriaLogs/querying/).

See also [data ingestion troubleshooting](#troubleshooting) docs.

## HTTP APIs

VictoriaLogs supports the following data ingestion HTTP APIs:

- Elasticsearch bulk API. See [these docs](#elasticsearch-bulk-api).
- JSON stream API, aka [ndjson](http://ndjson.org/). See [these docs](#json-stream-api).

VictoriaLogs accepts optional [HTTP parameters](#http-parameters) for the data ingestion HTTP APIs.

### Elasticsearch bulk API

VictoriaLogs accepts logs in [Elasticsearch bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html)
format at the `http://localhost:9428/insert/elasticsearch/_bulk` endpoint.

The following command pushes a single log line to the Elasticsearch bulk API of VictoriaLogs:

```bash
echo '{"create":{}}
{"_msg":"cannot open file","_time":"2023-06-21T04:24:24Z","host.name":"host123"}
' | curl -X POST -H 'Content-Type: application/json' --data-binary @- http://localhost:9428/insert/elasticsearch/_bulk
```

The following command verifies that the data has been successfully pushed to VictoriaLogs by [querying](https://docs.victoriametrics.com/VictoriaLogs/querying/) it:

```bash
curl http://localhost:9428/select/logsql/query -d 'query=host.name:host123'
```

The command should return the following response:

```bash
{"_msg":"cannot open file","_stream":"{}","_time":"2023-06-21T04:24:24Z","host.name":"host123"}
```

### JSON stream API

VictoriaLogs accepts logs at the `/insert/jsonline` HTTP endpoint, where the request body
contains one JSON object per line (lines are separated by `\n`).

Here is an example:

```http request
POST http://localhost:9428/insert/jsonline/?_stream_fields=stream&_msg_field=log.message&_time_field=date
Content-Type: application/jsonl

{ "log": { "level": "info", "message": "hello world" }, "date": "2023-06-20T15:31:23Z", "stream": "stream1" }
{ "log": { "level": "error", "message": "oh no!" }, "date": "2023-06-20T15:32:10Z", "stream": "stream1" }
{ "log": { "level": "info", "message": "hello world" }, "date": "2023-06-20T15:35:11Z", "stream": "stream2" }
```

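The same kind of request can be sent with `curl`, in the style of the Elasticsearch bulk API example. The following is a minimal sketch, not an official client: the helper names `jsonline_body` and `send_jsonline` are made up for illustration, the payload is a simplified flat variant of the one above (so `_msg_field=message` is used), and VictoriaLogs is assumed to listen at `localhost:9428`.

```bash
# Hypothetical helpers; these names are not part of VictoriaLogs.

# Print a JSON lines body: one JSON object per line, separated by "\n".
jsonline_body() {
  printf '%s\n' \
    '{"message":"hello world","date":"2023-06-20T15:31:23Z","stream":"stream1"}' \
    '{"message":"oh no!","date":"2023-06-20T15:32:10Z","stream":"stream1"}'
}

# Send the body read from stdin to the JSON stream API.
# Assumes VictoriaLogs listens at localhost:9428.
send_jsonline() {
  curl -X POST -H 'Content-Type: application/jsonl' --data-binary @- \
    'http://localhost:9428/insert/jsonline?_stream_fields=stream&_msg_field=message&_time_field=date'
}
```

Running `jsonline_body | send_jsonline` pushes both entries. `--data-binary` is used instead of `-d` because `-d` strips newlines, which would merge the JSON objects into a single line.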
### HTTP parameters

VictoriaLogs accepts the following parameters at [data ingestion HTTP APIs](#http-apis):

- `_msg_field` - it must contain the name of the [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
  with the [log message](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#message-field) generated by the log shipper.
  This is usually the `message` field for Filebeat and Logstash.
  If the `_msg_field` parameter isn't set, then VictoriaLogs reads the log message from the `_msg` field.

- `_time_field` - it must contain the name of the [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
  with the [log timestamp](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#time-field) generated by the log shipper.
  This is usually the `@timestamp` field for Filebeat and Logstash.
  If the `_time_field` parameter isn't set, then VictoriaLogs reads the timestamp from the `_time` field.
  If this field doesn't exist, then the current timestamp is used.

- `_stream_fields` - it should contain a comma-separated list of [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) names,
  which uniquely identify every [log stream](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields) collected by the log shipper.
  If the `_stream_fields` parameter isn't set, then all the ingested logs are written to the default log stream - `{}`.

- `ignore_fields` - this parameter may contain the list of [log field](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model) names,
  which must be ignored during data ingestion.

- `debug` - if this parameter is set to `1`, then the ingested logs aren't stored in VictoriaLogs. Instead,
  the ingested data is logged by VictoriaLogs, so it can be investigated later.

See also [HTTP headers](#http-headers).

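As an illustration of how these parameters combine, the sketch below builds an ingestion URL for the Elasticsearch bulk endpoint with Filebeat/Logstash-style field names and `debug=1`. The helper `build_ingest_url` and the stream fields `host.name` and `log.file.path` are assumptions chosen for the example; the helper does no URL-encoding, so values with special characters would need to be encoded first.

```bash
# Hypothetical helper: append key=value pairs to a base URL as query args.
# No URL-encoding is performed.
build_ingest_url() {
  local url="$1"; shift
  local sep='?'
  local kv
  for kv in "$@"; do
    url="${url}${sep}${kv}"
    sep='&'
  done
  printf '%s\n' "$url"
}

# debug=1 makes VictoriaLogs log the ingested rows instead of storing them.
URL="$(build_ingest_url 'http://localhost:9428/insert/elasticsearch/_bulk' \
  '_msg_field=message' \
  '_time_field=@timestamp' \
  '_stream_fields=host.name,log.file.path' \
  'debug=1')"
echo "$URL"
# prints http://localhost:9428/insert/elasticsearch/_bulk?_msg_field=message&_time_field=@timestamp&_stream_fields=host.name,log.file.path&debug=1
```

The resulting URL can then be passed to `curl -X POST -H 'Content-Type: application/json' --data-binary @- "$URL"` with a bulk body on stdin. Quoting `"$URL"` matters, since an unquoted `&` would be interpreted by the shell.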
### HTTP headers

VictoriaLogs accepts optional `AccountID` and `ProjectID` headers at [data ingestion HTTP APIs](#http-apis).
These headers specify the tenant into which the data is ingested. See [multitenancy docs](https://docs.victoriametrics.com/VictoriaLogs/#multitenancy) for details.

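A minimal sketch of passing these headers with `curl` follows. The function name `send_bulk_to_tenant` is hypothetical, the tenant IDs in the usage line are arbitrary example values, and VictoriaLogs is assumed to listen at `localhost:9428`.

```bash
# Hypothetical helper: send an Elasticsearch bulk body from stdin
# to the tenant identified by account ID ($1) and project ID ($2).
send_bulk_to_tenant() {
  curl -X POST \
    -H 'Content-Type: application/json' \
    -H "AccountID: $1" \
    -H "ProjectID: $2" \
    --data-binary @- \
    'http://localhost:9428/insert/elasticsearch/_bulk'
}
```

For example, `printf '%s\n' '{"create":{}}' '{"_msg":"cannot open file"}' | send_bulk_to_tenant 12 34` would send one log entry with `AccountID: 12` and `ProjectID: 34`.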
## Troubleshooting

VictoriaLogs provides the following command-line flags, which can help with debugging data ingestion issues:

- `-logNewStreams` - if this flag is passed to VictoriaLogs, then it logs all the newly
  registered [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields).
  This may help with debugging [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
- `-logIngestedRows` - if this flag is passed to VictoriaLogs, then it logs all the ingested
  [log entries](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model).
  See also `debug` [parameter](#http-parameters).

VictoriaLogs exposes various [metrics](https://docs.victoriametrics.com/VictoriaLogs/#monitoring), which may help with debugging data ingestion issues:

- `vl_rows_ingested_total` - the number of ingested [log entries](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#data-model)
  since the last VictoriaLogs restart. If this number increases over time, then logs are successfully ingested into VictoriaLogs.
  The ingested logs can be inspected in the following ways:
    - By passing the `debug=1` parameter to every request to [data ingestion APIs](#http-apis). The ingested rows aren't stored in VictoriaLogs
      in this case. Instead, they are logged, so they can be investigated later.
      The `vl_rows_dropped_total` [metric](https://docs.victoriametrics.com/VictoriaLogs/#monitoring) is incremented for each logged row.
    - By passing the `-logIngestedRows` command-line flag to VictoriaLogs. In this case it logs all the ingested data, so it can be investigated later.
- `vl_streams_created_total` - the number of created [log streams](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#stream-fields)
  since the last VictoriaLogs restart. If this metric grows rapidly over extended periods of time, then this may lead
  to [high cardinality issues](https://docs.victoriametrics.com/VictoriaLogs/keyConcepts.html#high-cardinality).
  The newly created log streams can be inspected in logs by passing the `-logNewStreams` command-line flag to VictoriaLogs.
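
The metrics above can be checked from the command line. The following sketch assumes that VictoriaLogs exposes its metrics in Prometheus text format at `http://localhost:9428/metrics`, that a metric may appear several times with different labels, and that sample lines carry no trailing timestamp. The helper names `fetch_metrics` and `metric_sum` are made up for illustration.

```bash
# Hypothetical helper: fetch metrics in Prometheus text exposition format.
# Assumes VictoriaLogs listens at localhost:9428.
fetch_metrics() {
  curl -s http://localhost:9428/metrics
}

# Hypothetical helper: sum all samples of the metric named $1 read from stdin.
# Matches both "name value" and "name{labels} value" lines;
# the sample value is taken from the last field of the line.
metric_sum() {
  awk -v name="$1" '
    $1 == name || index($1, name "{") == 1 { sum += $NF }
    END { printf "%d\n", sum }
  '
}
```

Running `fetch_metrics | metric_sum vl_rows_ingested_total` twice, a few seconds apart, shows whether the counter increases, which indicates that logs are being ingested. The same check with `vl_streams_created_total` helps spot rapid growth in the number of log streams.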