An introduction to the Elastic data stream naming scheme

With Elastic 7.9, the Elastic Agent and Fleet were released, along with a new way to structure indices and data streams in Elasticsearch for time series data. In this blog post, we'll give an overview of the Elastic data stream naming scheme and how it works. This is the first in a series of blog posts about the naming scheme.

Elastic data stream naming scheme

The Elastic data stream naming scheme is made for time series data. It splits datasets into different data streams using the following naming convention:

type: Generic type describing the data
dataset: Describes the data ingested and its structure
namespace: User-configurable arbitrary grouping

These three parts are joined with a “-”, which results in data stream names like logs-nginx.access-production. The “-” character is not allowed within any of the three parts. This means all data streams are named in the following way:

{type}-{dataset}-{namespace}

Both dataset and namespace have a default value: dataset=generic and namespace=default. In the case of Elastic Agent, if a user just starts to ingest a log file without setting either, the data ends up in logs-generic-default.

To get all the benefits of the Elastic data stream naming scheme, each document must contain the following three fields:

data_stream.type
data_stream.dataset
data_stream.namespace
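As a rough illustration of the convention and the three fields, the sketch below builds a data stream name from its parts and attaches the matching data_stream fields to a document. It is plain Python with no Elasticsearch client involved. The helper names (data_stream_name, tag_document) are hypothetical and not part of any Elastic library, and the nested data_stream object is an assumption about how the dotted field names are laid out in the JSON document.

```python
def data_stream_name(type_: str, dataset: str = "generic", namespace: str = "default") -> str:
    """Join the three parts into {type}-{dataset}-{namespace}.

    The defaults mirror the ones described above (dataset=generic,
    namespace=default). A "-" inside any part is rejected, because it
    would make the name ambiguous.
    """
    parts = {"type": type_, "dataset": dataset, "namespace": namespace}
    for label, value in parts.items():
        if not value:
            raise ValueError(f"{label} must not be empty")
        if "-" in value:
            raise ValueError(f"{label} must not contain '-': {value!r}")
    return "-".join(parts.values())


def tag_document(doc: dict, type_: str, dataset: str = "generic", namespace: str = "default"):
    """Return (target data stream name, copy of doc with data_stream fields).

    Hypothetical helper: it only shows which fields a shipper would add,
    it does not send anything to Elasticsearch.
    """
    name = data_stream_name(type_, dataset, namespace)  # validates the parts
    tagged = {
        **doc,
        "data_stream": {"type": type_, "dataset": dataset, "namespace": namespace},
    }
    return name, tagged


name, event = tag_document({"message": "GET /index.html 200"}, "logs", "nginx.access", "production")
print(name)                      # logs-nginx.access-production
print(event["data_stream"])
print(data_stream_name("logs"))  # logs-generic-default
```

Rejecting “-” inside a part up front keeps the name splittable into exactly three pieces later on, which is what the rest of the scheme relies on.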

More details about these fields can be found in the Elastic Common Schema (ECS). The above fields are mapped as constant_keyword fields, so a query that filters on them can skip shards that cannot match, which makes such queries efficient.

Benefits of the Elastic data stream naming scheme

The Elastic data stream naming scheme has a few benefits over previous indexing strategies used by Beats and Logstash. Instead of very few large indices, many smaller but denser data streams are used. A short summary of the benefits:

Reduced number of fields per index: As the data is split up per dataset across multiple data streams, each data stream contains a minimal set of fields. This leads to better space efficiency and faster queries.
More granular control of the data: Having the data split up by dataset and namespace allows granular control over rollover, retention, and security permissions.
Flexibility: Users can use the namespace to divide and organize data in any way they want.
Better curated experiences: Due to the common structure of the Elastic data stream naming scheme, it is possible to build a better curated experience on top of the data streams.
Fewer ingest permissions needed: Before, the setup of templates and ingest pipelines was performed by the Elastic Agent. As this now happens in a centralized way, the ingestion tool only needs permissions to append data.
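To make the granular control point a bit more concrete: because every name has the same three-part shape, a single wildcard pattern can select all data streams of one dataset or of one namespace. The sketch below only imitates that selection locally with Python's fnmatch module. It is not how Elasticsearch evaluates patterns, and the stream names, the patterns, and the select helper are made-up examples.

```python
from fnmatch import fnmatchcase

# Hypothetical data stream names following {type}-{dataset}-{namespace}.
streams = [
    "logs-nginx.access-production",
    "logs-nginx.error-production",
    "logs-nginx.access-staging",
    "metrics-system.cpu-production",
    "logs-generic-default",
]


def select(pattern: str, names: list) -> list:
    """Return the names matched by a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# All logs in the production namespace, regardless of dataset.
print(select("logs-*-production", streams))

# One dataset across every namespace.
print(select("logs-nginx.access-*", streams))

# Every metrics data stream.
print(select("metrics-*-*", streams))
```

A retention rule or a read permission scoped to a pattern like the first one would cover production logs only, without touching staging or metrics.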

Usage of the Elastic data stream naming scheme

The Elastic data stream naming scheme is supported from Elastic Stack version 7.9 and newer, as it requires support for data streams, the new Elasticsearch component templates, and constant keywords. Index templates for logs-*-* and metrics-*-* ship with Elasticsearch >=7.9. All data shipped with the Elastic Agent uses the Elastic data stream naming scheme. To use it with any other data shipper, follow the naming structure and add the data_stream fields to each document.

Summary

This is a short summary of the Elastic data stream naming scheme. In follow-up blog posts, we'll dive into the technical details of how it works behind the scenes, how it is used by the Elastic Agent, and how you can use it for your own benefit. For additional insight, watch the deep dive into the new Elastic indexing strategy on the Elastic Community YouTube channel.

https://www.elastic.co/blog/an-introduction-to-the-elastic-data-stream-naming-scheme

Created Dec 23, 2020, 4:20:31 PM

