Posts Tagged ‘apachehive’

SQL queries on Kafka topics using Apache Hive

Tuesday, August 6th, 2019

Apache Hive is open source data warehouse software built on top of Hadoop. It gives you an SQL-like interface to a wide variety of databases, filesystems, and other systems.

One of the Hive storage handlers is a Kafka storage handler, which lets you create a Hive “external table” based on a Kafka topic.

And once you’ve created a Hive table based on a Kafka topic, you can run SQL queries based on attributes of the messages on that topic.

I was having a play with Hive this evening, as a way of running SQL queries against messages on my Kafka topics. In this post, I’ll share a few queries that I tried.


(more…)