Supporting Cross Node Interactive Queries in Kafka Streams

By Syed Wahaj

Introduction

Kafka Streams is a powerful stream processing library in Apache Kafka that allows you to build scalable and fault-tolerant stream processing applications. One of the key features of Kafka Streams is its support for interactive queries, which enables you to perform real-time lookups on the state maintained by your stream processing application. By default, interactive queries are limited to a single node, but you can extend this functionality to support cross-node interactive queries. In this article, we will explore how to achieve cross-node interactive queries in Kafka Streams.

Prerequisites

Before proceeding, ensure that you have the following:

Apache Kafka and Kafka Streams set up and running.
A basic understanding of Kafka Streams concepts, such as streams, state stores, and interactive queries.

Enabling Cross-Node Interactive Queries

By default, Kafka Streams enables local state stores on each processing node to support interactive queries. However, to achieve cross-node interactive queries, you need to configure your Kafka Streams application to use a shared, distributed state store that can be accessed by all processing nodes.

Here are the steps to enable cross-node interactive queries:

Step 1: Configure the State Store

In your Kafka Streams application, configure a state store with the appropriate characteristics to enable distribution across multiple nodes. You can use the Stores class to create the state store configuration.

StateStoreSupplier<KeyValueStore<String, String>> stateStoreSupplier =
    Stores.keyValueStoreBuilder(
        Stores.persistentKeyValueStore("myStateStore"),
        Serdes.String(),
        Serdes.String()
    );

streamsBuilder.addStateStore(stateStoreSupplier);

Step 2: Enable Queryable State

To make the state store queryable, use the Materialized class to define the queryable state store.

Materialized<String, String, KeyValueStore<Bytes, byte[]>> materialized =
    Materialized.<String, String>as(stateStoreSupplier)
        .withKeySerde(Serdes.String())
        .withValueSerde(Serdes.String());

KStream<String, String> processedStream = inputStream
    .filter(...)
    .mapValues(...)
    .groupByKey()
    .aggregate(..., materialized);

Step 3: Start the Queryable Store Server

To access the queryable state across multiple nodes, start a Queryable Store Server alongside your Kafka Streams application.

QueryableStoreProvider queryableStoreProvider =
    new KafkaStreamsQueryableStoreProvider(streams, queryableStoreName);

KafkaStreamsQueryServer queryServer =
    new KafkaStreamsQueryServer(queryableStoreProvider, queryPort);

queryServer.start();

Step 4: Query the State Store

With the Queryable Store Server running, you can query the state store from any Kafka Streams instance by connecting to the query server’s REST API.

RestTemplate restTemplate = new RestTemplate();
String queryUrl = "http://queryServerHost:queryServerPort/stores/myStateStore/key/myKey";

String result = restTemplate.getForObject(queryUrl, String.class);

Handling State Store Updates

When working with cross-node interactive queries in Kafka Streams, it’s important to consider how state store updates are handled. As multiple nodes may be involved, updates to the state store need to be propagated and synchronized across the cluster. Kafka Streams provides mechanisms to handle state store updates effectively.

State Restoration

To ensure consistency across nodes, Kafka Streams uses state restoration. When a Kafka Streams application starts, it restores the state store from the changelog topic, which contains all the updates made to the state store. By configuring the retention period and topic configuration appropriately, you can control how long state changes are retained and ensure that all nodes have access to the necessary data.

Rebalancing and State Redistribution

When scaling up or down, or when there is a failure or addition of Kafka Streams instances, Kafka Streams performs rebalancing. During rebalancing, partitions are redistributed among the available instances. Kafka Streams automatically handles state redistribution, ensuring that each instance has access to the required state partitions. This allows for seamless queryability across nodes, even as the cluster topology changes.

Handling Query Forwarding

When a query is made to a specific node, and that node does not host the queried state partition, Kafka Streams handles the forwarding of the query to the appropriate node. The forwarding process transparently redirects the query to the correct node, allowing the requester to receive the desired result without being aware of the underlying state distribution.

Conclusion

Supporting cross-node interactive queries in Kafka Streams opens up new possibilities for real-time data retrieval and lookup across distributed state stores. By configuring a shared, distributed state store, enabling queryable state, starting a Queryable Store Server, and handling state updates and query forwarding, you can achieve a robust and scalable system for cross-node interactive queries. Understanding the mechanisms and considerations involved in handling state updates and rebalancing ensures consistent and reliable queryability across multiple Kafka Streams instances. Utilize these techniques to enhance the functionality and flexibility of your stream processing applications.