Apache Hive is open source data warehouse software built on top of Hadoop. It gives you an SQL-like interface to a wide variety of databases, filesystems, and other systems.
One of the Hive storage handlers is a Kafka storage handler, which lets you create a Hive “external table” based on a Kafka topic.
And once you’ve created a Hive table based on a Kafka topic, you can run SQL queries based on attributes of the messages on that topic.
I was having a play with Hive this evening, as a way of running SQL queries against messages on my Kafka topics. In this post, I’ll share a few queries that I tried.