Runtime fields: Schema on read for Elastic

In 7.11, we’re excited to announce support for schema on read in the Elastic Stack. We now offer the best of both worlds on a single platform: the performance and scale of the existing schema on write mechanism that our users love and depend on, coupled with a new level of flexibility for defining and executing queries with schema on read. We call our implementation of schema on read “runtime fields.”

Runtime fields enable you to create and query fields that are evaluated only at query time. Instead of indexing all the fields in your data as it’s ingested, you can pick and choose which fields are indexed and which ones are calculated at runtime as you execute your queries. Runtime fields support new use cases, and you won’t have to reindex any of your data. Adapt to changing log file formats, query new fields that were not indexed at ingest time, fix errors in index mappings, or iterate on the preparation and perfection of indices over time.

Schema on write or schema on read: why not both?

Schema on write remains the default way that Elasticsearch handles incoming data. All fields in a document are indexed as it’s ingested. This is what makes searches in Elastic so fast, regardless of the volume of data returned or the number of queries executed. It’s also a big part of what our users love about Elastic.

Schema on write works really well if you know your data before it is ingested, because the schema can then be fully defined in the index mapping. It also requires sticking to that defined schema when queries are run against the index. In the real world, however, data often changes; new data sources may arrive, for example. An added layer of flexibility to dynamically extract or query new fields after the data has been indexed adds tremendous value, even if it comes at some cost to query performance. That’s where schema on read comes in.
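To make the contrast concrete, here is a toy Python model of the two approaches; it is an illustration only, not how Elasticsearch is implemented. Schema on write builds an inverted index for chosen fields once, at ingest time. Schema on read keeps only the raw source and evaluates a field function against every document each time a query runs. The sample documents and the names `ingest_schema_on_write`, `search_indexed`, `search_runtime`, and `day_of_week` are hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Callable, Dict, List

# Hypothetical raw log documents, stored as-is (the "_source").
RAW_DOCS: List[str] = [
    '{"@timestamp": "2021-02-08T10:00:00+00:00", "status": 200, "message": "GET /"}',
    '{"@timestamp": "2021-02-09T11:30:00+00:00", "status": 404, "message": "GET /x"}',
    '{"@timestamp": "2021-02-15T09:15:00+00:00", "status": 200, "message": "GET /y"}',
]

# field name -> value -> list of matching document ids
Index = Dict[str, Dict[Any, List[int]]]


def ingest_schema_on_write(raw_docs: List[str], fields: List[str]) -> Index:
    """Schema on write: parse each document once and index the chosen fields."""
    index: Index = {field: {} for field in fields}
    for doc_id, raw in enumerate(raw_docs):
        doc = json.loads(raw)
        for field in fields:
            if field in doc:
                index[field].setdefault(doc[field], []).append(doc_id)
    return index


def search_indexed(index: Index, field: str, value: Any) -> List[int]:
    """Look up the prebuilt index: no per-document work at query time."""
    return index[field].get(value, [])


def search_runtime(raw_docs: List[str], extract: Callable[[dict], Any], value: Any) -> List[int]:
    """Schema on read: evaluate `extract` on every stored source at query time."""
    return [
        doc_id
        for doc_id, raw in enumerate(raw_docs)
        if extract(json.loads(raw)) == value
    ]


def day_of_week(doc: dict) -> str:
    """A 'runtime field' defined after ingest, derived from @timestamp."""
    return datetime.fromisoformat(doc["@timestamp"]).strftime("%A")


index = ingest_schema_on_write(RAW_DOCS, ["status"])
print(search_indexed(index, "status", 404))             # [1]
print(search_runtime(RAW_DOCS, day_of_week, "Monday"))  # [0, 2]
```

The indexed lookup does no per-document work when the query runs, while the runtime search parses and evaluates every document on every query. In exchange, `day_of_week` could be defined long after the documents were stored, without re-ingesting anything.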
With schema on read, data can be ingested in raw form without indexing, except for certain necessary fields such as the timestamp or response code. Other fields can be created on the fly when queries are run against the data. You don’t need intimate knowledge of your data ahead of time, nor do you have to predict all the possible ways that the data may eventually be queried. You can change the data structure at any time, even after the documents have been indexed.

Here’s what’s unique about how we’ve done it. We’ve built runtime fields on the same Elasticsearch platform: the same architecture, the same tools, and the same interfaces you are already using. There are no new datastores, languages, or components, and there’s no additional procedural overhead. Schema on read and schema on write work together and complement each other seamlessly, so you can decide which fields to calculate when a query requires them and which fields to index when the document is ingested into Elasticsearch. By offering the best of both worlds on a single stack, we make it easy for you to find the combination of schema on write and schema on read that works best for your use cases.

More ways to get value out of your data

Define fields at runtime and use them right away. Some of the most useful scenarios are outlined below.

Add new fields to indexed documents

Adding fields to documents is a common requirement when a new need arises that did not exist when the documents were ingested and indexed. It could be an entirely new field, or a field that needs to be calculated (or extracted and transformed) from other fields. For example, the date in the message field may need to be transformed to show the day of the week, or the IP address may need to be extracted from the client.address field so that it can be used in a map visualization.
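As a hedged sketch, the day-of-week case might look like the Python below, which only builds and prints the JSON body for a `PUT my-index/_mapping` request; it does not contact a cluster. It assumes the date is already mapped as a `date` field named `@timestamp`. The index name `my-index` and the helper `runtime_field_mapping` are hypothetical, and the Painless script follows the pattern used in the Elasticsearch runtime fields documentation.

```python
import json
from typing import Optional

# Painless: read the indexed @timestamp value and emit the full day name.
# Assumes @timestamp is mapped as a date field (hypothetical index layout).
DAY_OF_WEEK_SCRIPT = (
    "emit(doc['@timestamp'].value.dayOfWeekEnum"
    ".getDisplayName(TextStyle.FULL, Locale.ROOT))"
)


def runtime_field_mapping(name: str, field_type: str, script: Optional[str] = None) -> dict:
    """Build a mapping-update body that adds a single runtime field (hypothetical helper)."""
    definition: dict = {"type": field_type}
    if script is not None:
        definition["script"] = {"source": script}
    return {"runtime": {name: definition}}


body = runtime_field_mapping("day_of_week", "keyword", DAY_OF_WEEK_SCRIPT)
print("PUT my-index/_mapping")
print(json.dumps(body, indent=2))
```

Because the field sits under `runtime` rather than `properties`, the mapping update leaves existing documents untouched; the value is computed whenever a query uses the field.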
Either way, it’s significantly faster to add a runtime field than to update the index mapping and reindex the documents.

Use ephemeral fields

Analysts can define ephemeral fields that exist only within the context of a query or visualization. This lets them work in their own space without affecting the document index and without overriding the work of others on the same data. Empowering users to adjust the data for their own needs lowers costs and raises productivity: they no longer have to ask admin users to change the index and then wait for those requests to be approved and completed.

Fix errors in the index

Indexed fields can be shadowed with runtime fields to immediately fix errors that were made when the documents were ingested. This makes the indexed documents usable right away and removes the need for lengthy QA cycles, reducing costs and allowing documents to be loaded quickly.

Iterate during data preparation

As new users set up their data sources and begin ingesting data, they may not have a complete picture of how to define or perfect their index mappings. With runtime fields, they don’t have to index every field ahead of time; runtime fields can be added gradually as needs arise. Over time, users can observe the sizes of their indices, as well as the frequency and performance of different kinds of queries. If, during peak hours, the most frequently run queries fall short of the required performance, some of the runtime fields used in those queries can be converted to indexed fields. This workflow makes introducing new data sources much faster, reduces the size of indices, increases ingestion speed, and lowers costs.

7.11: Runtime fields in beta

In 7.11, runtime fields are available in beta. You can define a runtime field in an index mapping alongside the indexed fields.
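A mapping that mixes the two might be sketched as follows, again as a Python dict printed as the body of a hypothetical `PUT my-index` request. Only `@timestamp` and `status_code` (hypothetical field names) are indexed, and `"dynamic": false` keeps any other incoming fields in `_source` without indexing them. The `day_of_week` runtime field has a Painless script; `message` has none, so Elasticsearch reads it from the `_source` field of the same name at query time.

```python
import json

create_index_body = {
    "mappings": {
        # Fields not listed under "properties" stay in _source but are not indexed.
        "dynamic": False,
        # Schema on write: indexed at ingest time.
        "properties": {
            "@timestamp": {"type": "date"},
            "status_code": {"type": "long"},
        },
        # Schema on read: evaluated at query time.
        "runtime": {
            "day_of_week": {
                "type": "keyword",
                "script": {
                    "source": "emit(doc['@timestamp'].value.dayOfWeekEnum"
                    ".getDisplayName(TextStyle.FULL, Locale.ROOT))"
                },
            },
            # No script: the value comes from the _source field with the same name.
            "message": {"type": "keyword"},
        },
    }
}

print("PUT my-index")
print(json.dumps(create_index_body, indent=2))
```

Promoting `message` to an indexed field later would mean moving it from `runtime` to `properties` and reindexing, which is the trade-off described under “Iterate during data preparation.”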
A runtime field definition requires nothing more than a data type and a Painless script that specifies how to evaluate the field at query time. Regular expressions can be used in the Painless script. You can also define a runtime field without a script; in that case, Elasticsearch uses the value of the field with the same name in _source. You can then query runtime fields through the Elasticsearch API, using the Query DSL exactly as you would for indexed fields. You can also define a runtime field on the fly in the query itself, so that it exists only within the confines of that query.

In Kibana 7.11, runtime fields are exposed through the Kibana Query Language (KQL) bar. You can run queries or filters in KQL on runtime fields as you would on indexed fields.

What’s next for runtime fields

While 7.11 provides the fundamentals to get you going with runtime fields, we’re building a number of features to provide a comprehensive experience. Going forward, runtime fields will be fully exposed in Kibana for search and exploration in both Discover and Lens. Runtime fields defined in index mappings will be visible alongside indexed fields when you create index patterns, making it easy to build visualizations from any combination of the two. Kibana will also expose a UI editor for defining and testing runtime fields directly in your query interface, and this editor will be available across Kibana apps. This will make it easy for analysts to create runtime fields on the fly by themselves, scoped to the queries and visualizations they are running at any point in time.

Beyond that, we’ll add the ability to override indexed fields with runtime fields and to convert runtime fields into indexed fields. This will greatly help use cases that focus on data preparation and perfecting indices over time. A Grok UI editor will be added, as will the ability to enrich your data at query time from an external index.
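Returning to the 7.11 capabilities listed under “7.11: Runtime fields in beta,” a search request can define a runtime field on the fly and query it with the Query DSL. The sketch below builds such a body in Python for the client.address example. It assumes `client.address` is mapped as a `keyword` field holding values like `203.0.113.7:443` (a hypothetical format), defines an ephemeral `client_ip` field of type `ip` using a Painless regular expression, filters on it with a `term` query, and returns it through the `fields` option. The Painless script is an untested sketch, and regex support depends on the cluster’s `script.painless.regex.enabled` setting. The hypothetical helper `preview_extraction` applies the same pattern locally with Python’s `re` module, which checks the pattern only, not the Painless script.

```python
import json
import re
from typing import Optional

# Hypothetical pattern: a leading dotted-quad IPv4 address in client.address.
IP_PATTERN = r"^(\d{1,3}(?:\.\d{1,3}){3})"

# Painless sketch: skip documents without a value, otherwise emit the first match.
CLIENT_IP_SCRIPT = (
    "if (doc['client.address'].size() > 0) {"
    " String addr = doc['client.address'].value;"
    " Matcher m = /" + IP_PATTERN + "/.matcher(addr);"
    " if (m.find()) { emit(m.group(1)); }"
    " }"
)

search_body = {
    # Exists only for this request; the index mapping is not changed.
    "runtime_mappings": {
        "client_ip": {"type": "ip", "script": {"source": CLIENT_IP_SCRIPT}},
    },
    # Query the runtime field as if it were indexed.
    "query": {"term": {"client_ip": "203.0.113.7"}},
    # Return the computed value alongside each hit.
    "fields": ["client_ip"],
}


def preview_extraction(address: str) -> Optional[str]:
    """Apply IP_PATTERN locally with Python's re to sanity-check the regex."""
    match = re.search(IP_PATTERN, address)
    return match.group(1) if match else None


print("GET my-index/_search")
print(json.dumps(search_body, indent=2))
print(preview_extraction("203.0.113.7:443"))  # 203.0.113.7
```

Because `client_ip` lives only in `runtime_mappings`, it disappears with the request; placing the same definition under `runtime` in the index mapping would make it available to every query against the index.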
Balancing performance and flexibility, this simple yet powerful implementation of schema on read opens up new ways of indexing and searching data while complementing the existing schema on write mechanism that Elastic is known for.

Get started today

To get started with runtime fields, spin up a cluster on the Elasticsearch Service or install the latest version of the Elastic Stack. Already have Elasticsearch running? Just upgrade your clusters to 7.11 and give it a try. To learn more about how it works, read the detailed technical blog post on runtime fields or the runtime fields documentation.

https://www.elastic.co/blog/introducing-elasticsearch-runtime-fields

Created Feb 10, 2021, 7:20:34 PM

