Posts Tagged ‘apachekafka’

Using TensorFlow with IBM Event Streams
(Kafka + Machine Learning = Awesome)

Thursday, October 31st, 2019

In this post, I want to explain how to get started creating machine learning applications using the data you have on Kafka topics.

I’ve written a sample app, with examples of how you can use Kafka topics as:

  • a source of training data for creating machine learning models
  • a source of test data for evaluating machine learning models
  • an ongoing stream of events to make predictions about using machine learning models

I’ll use this post to explain how it works, and how you can use it as the basis of writing your first ML pipeline using the data on your own Kafka topics.
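
To make that first bullet concrete, here is a minimal sketch (not the sample app itself) of using a Kafka topic as a source of training data: it pulls labelled records from a topic with kafka-python and wraps them in a tf.data.Dataset ready for training. The topic name, broker address and the JSON field names ("features", "label") are illustrative assumptions.

    import json

    import tensorflow as tf
    from kafka import KafkaConsumer

    # Read labelled training records from a Kafka topic
    consumer = KafkaConsumer(
        "TRAINING.DATA",                     # hypothetical topic name
        bootstrap_servers="localhost:9092",  # assumed broker address
        auto_offset_reset="earliest",        # start from the beginning of the topic
        consumer_timeout_ms=5000,            # stop once we reach the end of the topic
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    features, labels = [], []
    for message in consumer:
        record = message.value
        features.append(record["features"])  # e.g. a list of numeric values
        labels.append(record["label"])

    # Wrap the records in a tf.data.Dataset that can be passed to model.fit()
    dataset = (
        tf.data.Dataset.from_tensor_slices((features, labels))
        .shuffle(buffer_size=1024)
        .batch(32)
    )

The same pattern works for test data, and a long-running consumer (without the timeout) gives you the ongoing stream of events to make predictions about.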

(more…)

Using Avro schemas from Python apps with IBM Event Streams

Thursday, October 17th, 2019

I’ve written before about how to write an Avro schema for developers using Kafka. The examples I used there were all in Java, but someone asked me yesterday if I could share some Python equivalents.

The principles are described in the Event Streams documentation, but in short, your Kafka producers use Apache Avro to serialize the message data that you send, and identify the schema that you’ve used in the Kafka message header. In your Kafka consumers, you look at the headers of the messages that you receive to know which schema to retrieve, and use that to deserialize message data.
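
As a rough illustration of the producer half of that, here is a minimal Python sketch using fastavro for the Avro serialization and kafka-python to send the message. The schema, topic name and the header key used to identify the schema are illustrative assumptions – the Event Streams schema registry defines its own header names.

    import io

    import fastavro
    from kafka import KafkaProducer

    # An example Avro schema, written as a Python dict
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "Door",
        "namespace": "com.example",
        "fields": [
            {"name": "door", "type": "string"},
            {"name": "open", "type": "boolean"},
        ],
    })

    def serialize(record):
        # Avro-encode a single record using the schema
        buf = io.BytesIO()
        fastavro.schemaless_writer(buf, schema, record)
        return buf.getvalue()

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # assumed address
    producer.send(
        "DOOR.EVENTS",                                    # hypothetical topic
        value=serialize({"door": "front", "open": True}),
        # identify the schema in a message header so consumers know how to deserialize
        headers=[("schema-id", b"door-schema-1")],        # illustrative header key/value
    )
    producer.flush()

On the consumer side you do the reverse: look at the message header to know which schema to retrieve, then use fastavro.schemaless_reader to deserialize the message value.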

(more…)

SQL queries on Kafka topics using Apache Hive

Tuesday, August 6th, 2019

Apache Hive is open source data warehouse software built on top of Hadoop. It gives you an SQL-like interface to a wide variety of databases, filesystems, and other systems.

One of the Hive storage handlers is a Kafka storage handler, which lets you create a Hive “external table” based on a Kafka topic.

And once you’ve created a Hive table for a Kafka topic, you can run SQL queries against attributes of the messages on that topic.

I was having a play with Hive this evening, as a way of running SQL queries against messages on my Kafka topics. In this post, I’ll share a few queries that I tried.
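
To give a flavour of what is involved, here is a minimal sketch (kept in Python by running the statements through PyHive, although beeline works just as well): it creates a Hive external table backed by a Kafka topic using the Kafka storage handler, then queries it. The table columns, topic name, broker address and message fields are illustrative assumptions, and the messages are assumed to be JSON (the storage handler’s default serde).

    from pyhive import hive

    # Connect to HiveServer2 (assumed to be running locally on the default port)
    conn = hive.Connection(host="localhost", port=10000, username="hive")
    cursor = conn.cursor()

    # Create an "external table" backed by a Kafka topic, using the Kafka storage handler
    cursor.execute("""
        CREATE EXTERNAL TABLE door_events (
            door STRING,
            is_open BOOLEAN
        )
        STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
        TBLPROPERTIES (
            "kafka.topic" = "DOOR.EVENTS",
            "kafka.bootstrap.servers" = "localhost:9092"
        )
    """)

    # Run an SQL query based on attributes of the messages on the topic
    cursor.execute("SELECT door, COUNT(*) AS events FROM door_events GROUP BY door")
    for row in cursor.fetchall():
        print(row)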


(more…)

How to write your first Avro schema

Saturday, July 20th, 2019

Any time there is more than one developer using a Kafka topic, they will need a way to agree on the shape of the data that will go into messages. The most common way to document the schema of messages in Kafka is to use the Apache Avro serialization system.

This post is a beginner’s guide to writing your first Avro schema, and a few tips for how to use it in your Kafka apps.
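
As a taster, here is a minimal example of what a first Avro schema can look like, written as a Python dict so it can be checked with fastavro. The record name and fields are just illustrative.

    import fastavro
    from fastavro.validation import validate

    # A simple Avro schema describing the shape of messages on a topic
    schema = fastavro.parse_schema({
        "type": "record",
        "name": "SensorReading",
        "namespace": "com.example",
        "fields": [
            {"name": "sensor", "type": "string"},
            {"name": "temperature", "type": "double"},
            {"name": "recorded", "type": "long"},  # epoch timestamp in milliseconds
        ],
    })

    # Check that a sample message matches the schema before producing it
    validate(
        {"sensor": "greenhouse", "temperature": 18.5, "recorded": 1563600000000},
        schema,
    )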

(more…)

An introduction to serverless and OpenWhisk for Kafka users

Saturday, July 13th, 2019

I gave a talk at Kafka Summit London this year about Apache OpenWhisk. It was aimed at Kafka users who want to know what the serverless hype is all about.

I covered:

  • a simple introduction to what serverless is for
  • an introduction to some of the serverless platforms available
  • a quick crash course in how to get started with Apache OpenWhisk

I also had a quick tangent looking into how Apache OpenWhisk itself uses Kafka internally, because I thought that was interesting!

My slides are on SlideShare if you’d like to see a higher-res version of any of them.

If this convinces you to give OpenWhisk a try, I have a post on how to get started with OpenWhisk that has all the commands you need to copy/paste to get yourself a working OpenWhisk environment connected to a Kafka source of events.

(more…)

Getting started with OpenWhisk and Kafka

Saturday, July 6th, 2019

Apache OpenWhisk (and serverless platforms in general) is a great way to host and manage code that you want to run in response to events.
Apache Kafka topics are a great source of events.

In this post, I’ll run through a super simple beginner’s guide to writing code for OpenWhisk that processes events on your Kafka topics.
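
To give a flavour of what that code looks like, here is a minimal sketch of an OpenWhisk Python action that could be wired up to a Kafka trigger. It assumes the Kafka feed delivers a batch of messages in params["messages"] (as the openwhisk-package-kafka provider does); the fields inside each message value are illustrative.

    import json

    # An OpenWhisk action: a main() function that takes a dict of parameters
    # and returns a dict as its result
    def main(params):
        results = []
        for message in params.get("messages", []):
            event = json.loads(message["value"])    # message payload, assumed to be JSON
            # do something useful with the event - here we just pick out a few attributes
            results.append({
                "topic": message["topic"],
                "offset": message["offset"],
                "door": event.get("door"),           # hypothetical field in the message
            })
        return {"processed": results}

An action like this gets deployed with the wsk CLI and connected to a trigger created from the Kafka feed.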

(more…)

Using Node-RED with IBM Event Streams

Friday, June 28th, 2019


IBM Event Streams is a distributed, real-time streaming data platform from IBM, built on Apache Kafka.

Node-RED is a visual flow-based development tool, with nodes that you drag and drop onto a canvas and wire together. It’s useful for loads of tasks, such as quick and flexible prototyping.

In this post, I’ll show how Event Streams and Node-RED work well together. You can use Node-RED to quickly and easily create flows that consume messages from Kafka topics, or that process events from different sources and produce the output to Kafka topics.

(more…)

Using kafkacat and kaf with IBM Event Streams

Sunday, June 9th, 2019

IBM Event Streams is IBM’s Kafka offering. Naturally it comes with its own UI and CLI tools, but one of the great things about Apache Kafka is that it’s not just a single thing from a single company – rather, it’s an active and diverse ecosystem, which means you’ve got a variety of tools to choose from.

I thought I’d try a couple of open source CLI tools, and share how to connect them and what they can do.

First up, kafkacat.

(more…)