Migrating your Apache Kafka cluster using MirrorMaker 2

You have a Kafka cluster that you have been using for a while. Your cluster has many topics, and the topics have many messages.

Now you’ve decided to move and start using a new, different Kafka cluster somewhere else.

How can you take your topics with you?

Huge thanks to Andrew Borley for co-writing this with me. Useful insights in here probably came from him, the mistakes from me.

Terminology

For the purposes of this post, we’ll refer to the two Kafka clusters as:

  • “origin” – your existing Kafka cluster that you are migrating from
  • “target” – your new Kafka cluster that you are migrating to

The instructions here use the Strimzi Operator as a convenient way to configure and run MirrorMaker 2, but neither the “origin” nor the “target” Kafka cluster needs to be managed by the Strimzi Operator for this to work.

We’ll be running MirrorMaker 2 in a namespace called “migration”.

Overview

This post is a brief introduction to how you can set up and run MirrorMaker 2 for data migration using the Strimzi Operator.

MirrorMaker 2 can be run in a variety of ways, for a variety of use cases. For example, it can run as a continuous background mirroring process for disaster recovery. This post covers a different requirement, a one-off data migration, so the steps here describe how to do that.

Prerequisites

Before you start, you need to obtain credentials and TLS certificates for both of your Kafka clusters:

Credentials from the “origin” cluster

If your “origin” cluster requires authentication, you will need to create credentials for MirrorMaker 2 to use.

If your “origin” cluster is managed by Strimzi, you could do this by creating a KafkaUser resource like the following.

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaUser
metadata:
  name: mm2-credentials
  labels:
    strimzi.io/cluster: origin
  namespace: origin-ns
spec:
  authentication:
    type: scram-sha-512
  authorization:
    acls:
      - host: '*'
        operation: Read
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Describe
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: DescribeConfigs
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Create
        resource:
          type: cluster
      - host: '*'
        operation: Read
        resource:
          type: cluster
      - host: '*'
        operation: Describe
        resource:
          type: cluster
      - host: '*'
        operation: Write
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Describe
        resource:
          name: '*'
          patternType: literal
          type: group
      - host: '*'
        operation: Read
        resource:
          name: '*'
          patternType: literal
          type: group
    type: simple

This will create a new Secret containing your credentials (named after the KafkaUser, so mm2-credentials here) in the namespace where your “origin” Kafka cluster is running. Copy the password from it into a Secret in the namespace on the Kubernetes cluster where you want to run MirrorMaker 2.

apiVersion: v1
kind: Secret
metadata:
  name: origin-cluster-credentials
  namespace: migration
data:
  password: YzZPSHZNUHhlWTZm

These credentials will allow MirrorMaker 2 to find and consume from all of the topics on the “origin” cluster. If your cluster is not managed by Strimzi, you will need to create equivalent credentials using whatever mechanism your type of Kafka cluster provides. However you create your credentials, you should store them in a Secret similar to the one above.
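Values under data in a Kubernetes Secret are base64-encoded, which is easy to get wrong when copying a password by hand. The following is a minimal sketch in Python of building a Secret like the one above from a plain-text password. The function name build_password_secret and the example password are made up for illustration, and reading the password out of the original cluster is left out.

```python
import base64
import json


def build_password_secret(name: str, namespace: str, password: str) -> dict:
    """Return a Kubernetes Secret manifest holding a base64-encoded password."""
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "data": {"password": encoded},
    }


if __name__ == "__main__":
    # "example-password" is a placeholder, not a real credential
    secret = build_password_secret(
        "origin-cluster-credentials", "migration", "example-password"
    )
    # JSON is also valid YAML, so this output can be saved to a file
    # and applied like any other manifest
    print(json.dumps(secret, indent=2))
```

The same helper would work for the “target” cluster credentials by changing the Secret name.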

TLS certificate from the “origin” cluster

If your “origin” cluster requires TLS, you may need to obtain a CA cert for MirrorMaker 2 to use.

If your “origin” cluster is managed by Strimzi, you will be able to find this in a Secret called something like origin-cluster-ca-cert in the namespace where your Kafka cluster is running.

Copy this into a secret in a namespace on the Kubernetes cluster where you want to run MirrorMaker 2.

apiVersion: v1
kind: Secret
metadata:
  name: origin-cluster-ca-cert
  namespace: migration
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURPekNDQWlPZ0F3SUJBZ0lVYldZKy9TaFVYM1NvRFdwWHpQa3ZUK2F6dDlJd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0xURVRNQkVHQTFVRUNnd0thVzh1YzNSeWFXMTZhVEVXTUJRR0ExVUVBd3dOWTJ4MWMzUmxjaTFqWVNCMgpNREFlRncweU1UQXpNRFF4TmpVMk16bGFGdzB5TWpBek1EUXhOalUyTXpsYU1DMHhFekFSQmdOVkJBb01DbWx2CkxuTjBjbWx0ZW1reEZqQVVCZ05WQkFNTURXTnNkWE4wWlhJdFkyRWdkakF3Z2dFaU1BMEdDU3FHU0liM0RRRUIKQVFVQUE0SUJEd0F3Z2dFS0FvSUJBUUNwd1VnVEdnTkZkTTN5MUsrWnhVMlV5R21FN1J4bW5YWDBYaEVrNFVzeApmZVZCc01aS0NSWXlHeXN2QXVSc3Z4UUdJRDhvcnB5WDVZLzRMQzlScTZqTHd1LzhUNHBJOFk0VUxWR3VqVHprCi9OK3RGd2tYTUhuWkpGVFExN0tRM2lTVEFqcjg3TFVQSjdpd0dveTJiSEtCeXg1eGVSYWdpdHVJZTBUWGlwOTgKRzNCRVpkelBHeFhjalk2bFdlM0h2eXNiUHlrWnl4c3djSXZha0N0MThpakNCNXpiUXFCTGV2ZlpIa3NlWTJNMApPcjBPa0FFaUFtVkVML3dMK2JyYVd6YWw4UEJPd0VENXVKZ1FnYmZqQk1lYzJXNi9CMllscnBJWG1kYTdMb0VDClVnZGZKcTBnMUVZaFEyNEJBVjlyODFocFdQaHA5L0hwZE5FMXlGdWplZnhsQWdNQkFBR2pVekJSTUIwR0ExVWQKRGdRV0JCUUNrQnVKRENabndCUmttdlRVdFJUVCtWYXAvakFmQmdOVkhTTUVHREFXZ0JRQ2tCdUpEQ1pud0JSawptdlRVdFJUVCtWYXAvakFQQmdOVkhSTUJBZjhFQlRBREFRSC9NQTBHQ1NxR1NJYjNEUUVCQ3dVQUE0SUJBUUFhClZtdlhZakM2aVNIQ2tldHB6dlFINzNBc1R4NUd5V2tmYmJjcHMvNWV1MWljUGhkWWl2TUttTjZMRUhzVjN0c1gKQXpEblRSOU9wOUxhSkRKZHhWYTVZcGx5V0o5bUtTcTRoaGdWZ0J6aW0yc3h6UW1DNFVNM3dUYUFJTXJJRU82cApUangyU0EydVF2Q3hWYnFVU2hLK3VkT3dxZG1wUUFMSDF5emtvTlNuQ0JzZ05jSE9WMzVTRHZBS2d0cVdnZjlYCjFrcWNacmRDMzhndkFKQXBhcmI5QW1sbit0a0dvNWZJV3FoN0dQb0J1TjJBWnpkUG5WbXE1RytqckxPdWlYUHYKSHdUUGdnTkx0aDdxTndtQkZpRHliSXlZMUQ1WHNLWjdLWWNSdi9uWlhyLy9NRlRWZVgxVmxoaGZnK3I2SGRXQQphVUVoTHBZNHJkUUxVQVJEdklIMwotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==

Credentials from the “target” cluster

If your “target” cluster requires authentication, you will need to create credentials for MirrorMaker 2 to use.

If your “target” cluster is managed by Strimzi, you could do this by creating a KafkaUser resource like the following.

Note: This is not the same as before, as the credentials for the “target” Kafka cluster need to allow creating topics.

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaUser
metadata:
  name: mm2-credentials
  labels:
    strimzi.io/cluster: target
  namespace: target-ns
spec:
  authentication:
    type: scram-sha-512
  authorization:
    acls:
      - host: '*'
        operation: Read
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Write
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Describe
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: DescribeConfigs
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: AlterConfigs
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Alter
        resource:
          name: '*'
          patternType: literal
          type: topic
      - host: '*'
        operation: Create
        resource:
          type: cluster
      - host: '*'
        operation: Alter
        resource:
          type: cluster
      - host: '*'
        operation: DescribeConfigs
        resource:
          type: cluster
      - host: '*'
        operation: Read
        resource:
          name: '*'
          patternType: literal
          type: group
      - host: '*'
        operation: Describe
        resource:
          name: '*'
          patternType: literal
          type: group
    type: simple

This will create a new Secret containing your credentials (named after the KafkaUser, so mm2-credentials here) in the namespace where your “target” Kafka cluster is running. Copy the password from it into a Secret in the namespace on the Kubernetes cluster where you want to run MirrorMaker 2.

apiVersion: v1
kind: Secret
metadata:
  name: target-cluster-credentials
  namespace: migration
data:
  password: TDQzaEoxVVlSMUds

These credentials will allow MirrorMaker 2 to create and produce to topics on the “target” cluster. If your cluster is not managed by Strimzi, you will need to create equivalent credentials using whatever mechanism your type of Kafka cluster provides. However you create your credentials, you should store them in a Secret similar to the one above.

TLS certificate from the “target” cluster

If your “target” cluster requires TLS, you may need to obtain a CA cert for MirrorMaker 2 to use.

If your “target” cluster is managed by Strimzi, you will be able to find this in a Secret called something like target-cluster-ca-cert in the namespace where your Kafka cluster is running.

Copy this into a secret in a namespace on the Kubernetes cluster where you want to run MirrorMaker 2.

apiVersion: v1
kind: Secret
metadata:
  name: target-cluster-ca-cert
  namespace: migration
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURMVENDQWhXZ0F3SUJBZ0lKQVAzUVdiZGs4UGFPTUEwR0NTcUdTSWIzRFFFQkN3VUFNQzB4RXpBUkJnTlYKQkFvTUNtbHZMbk4wY21sdGVta3hGakFVQmdOVkJBTU1EV05zZFhOMFpYSXRZMkVnZGpBd0hoY05NakV3TXpJMApNVEV4TkRFMldoY05Nakl3TXpJME1URXhOREUyV2pBdE1STXdFUVlEVlFRS0RBcHBieTV6ZEhKcGJYcHBNUll3CkZBWURWUVFEREExamJIVnpkR1Z5TFdOaElIWXdNSUlCSWpBTkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUIKQ2dLQ0FRRUF3T2NQbVl1d0dzbkcrRmFkR0syYmF4azJQVFFVL282SDU3Z0xobU90a2tLc1hISVVXVmxpMnFJeQpwTCtGZ3FrV215RG9Ia1JicmlHUzZObm9QMnN1RlEvMThNUnBkU0ltSHZvQ1E3Y1plcVpqSDdlaFBnWVA0YlQ5Cmp3WUtxRVU2NXFmTFc3eHZnanp3bWl3T0RmbGpKSDF6TTN4aCthRjFFbEpBWlBzQ2YzTVY4TkVOamw2ZGpHdEEKcWFmZkczYk5tZ1FzcmZBVGhWcU5ibDMrOFpSOUJUcThuazBDN2JCUEJhb3Ewa3BuTXNHdHdTcWhPaGVLcTdreApkTXhQNFV3N2wzY3ZsSHZKellEZXhSRk1JMUViZkZIQXJmcGxjcCt5ZjJRR3JRU1BpVzVUdHgxNXhLWTJYRlNmCnRNSWxjNVRUeEdZUndtaWtGNnU2cEtxN3RYTW9KUUlEQVFBQm8xQXdUakFkQmdOVkhRNEVGZ1FVYjZ0RUdVcnkKbkRuR2VZUXJEODl3WUR4Wjg0MHdId1lEVlIwakJCZ3dGb0FVYjZ0RUdVcnluRG5HZVlRckQ4OXdZRHhaODQwdwpEQVlEVlIwVEJBVXdBd0VCL3pBTkJna3Foa2lHOXcwQkFRc0ZBQU9DQVFFQU1FTVhwem9PUWlIdVc4Z3hJd3dXClY1b1Z3YlFOa3V6c1FYeVVRbmFQRGhhQ1pTRGo0TG1TRUdQa3NlRW5JeStoQXBPY1RCSGRNY203MTJIZUlNRmIKME44cGc3VW1XeHRPM0h3YXlnRzUwTEJCVHlDVkZOd2tMWDcrdU1zMFA5d3QzdTNSNzNGVVAyOUZRS0NNMExGeQpLcFJFek0zY004dGlSdnhRc2Q0QVdQQmZYZW1nQTZEcFM5Q2tmTzBrcU9vWWZmMWpqNUp5Z2JZM09BL3huNStXCll5RVZIYkcxaThMUFI0VkpKREVyMWh3MmNmQnN6d291elMrUUtzVWsxVzM3YkZhWjIxWHZiVzRmOVdKYUFzRUIKdEQ2Y3VVL3l3a3lRTTZWZHB4ZlFxSEdGQnFoUzh0VS9qTTVMTURLRXgvZ0xzWTQ1b1VlRHYvZkhyOHZ5T0loMQp5UT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K

Start MirrorMaker 2

Create the following KafkaMirrorMaker2 resource so that the Strimzi Operator can configure MirrorMaker 2 to start the data migration.

Comments in the definition below explain where you may want to customize the values for your own use.

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaMirrorMaker2
metadata:
  name: data-migration
  namespace: migration
spec:
  # How many instances of MirrorMaker 2 do you want to run in parallel?
  #
  # If you have a large "origin" cluster with a lot of data to migrate
  #  then you can increase this value.
  replicas: 1


  clusters:

  # connection details for your "origin" cluster
  - alias: "origin-cluster"
    # change this to provide the bootstrap servers address for your "origin" cluster
    bootstrapServers: "origin-kafka-bootstrap-origin-ns.apps.my-origin-cluster-name.cp.fyre.ibm.com:443"
    # remove this section if your "origin" cluster does not require authentication
    authentication:
      # update this to match the authentication method used by your "origin" cluster
      type: scram-sha-512
      username: mm2-credentials
      passwordSecret:
        # name of the secret where you stored the
        #  credentials for your "origin" cluster
        secretName: origin-cluster-credentials
        # name of the key in the secret that contains
        #  the password
        password: password
    # remove this section if your "origin" cluster does not require a custom TLS CA certificate
    tls:
      trustedCertificates:
        # name of the secret where you stored the
        #  TLS certificate for your "origin" cluster
        - secretName: origin-cluster-ca-cert
          # name of the key in the secret that contains
          #  the certificate
          certificate: ca.crt

  # connection details for your "target" cluster
  - alias: "target-cluster"
    # change this to provide the bootstrap servers address for your "target" cluster
    bootstrapServers: "target-kafka-bootstrap-target-ns.apps.my-target-cluster-name.cp.fyre.ibm.com:443"
    # remove this section if your "target" cluster does not require authentication
    authentication:
      # update this to match the authentication method used by your "target" cluster
      type: scram-sha-512
      username: mm2-credentials
      passwordSecret:
        # name of the secret where you stored the
        #  credentials for your "target" cluster
        secretName: target-cluster-credentials
        # name of the key in the secret that contains
        #  the password
        password: password
    # remove this section if your "target" cluster does not require a custom TLS CA certificate
    tls:
      trustedCertificates:
        # name of the secret where you stored the
        #  TLS certificate for your "target" cluster
        - secretName: target-cluster-ca-cert
          # name of the key in the secret that contains
          #  the certificate
          certificate: ca.crt
    config:
      # These topics will be created on the "target" Kafka
      #  cluster for MirrorMaker 2 to store its state.
      # Make sure that these names don't match the names of any
      #  of your existing topics.
      # We will delete these topics once MirrorMaker 2 has finished.
      offset.storage.topic: migration-connect-cluster-offsets
      config.storage.topic: migration-connect-cluster-configs
      status.storage.topic: migration-connect-cluster-status

  connectCluster: "target-cluster"

  mirrors:
    - sourceCluster: "origin-cluster"
      targetCluster: "target-cluster"
      sourceConnector:
        config:
          # the replication factor that will be used for
          #  all topics created on the "target" Kafka cluster
          replication.factor: 1

          # don't try to copy permissions across from the "origin"
          #  cluster to the "target" cluster
          sync.topic.acls.enabled: "false"
          # create topics on the "target" cluster with names that
          #  match the names of the topics on the "origin" cluster
          replication.policy.class: "io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy"
          replication.policy.separator: ""
          # syncing offsets
          offset-syncs.topic.replication.factor: 1

      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 1
          refresh.groups.interval.seconds: 600
          # migrates the consumer group offsets
          emit.checkpoints.enabled: true
          sync.group.offsets.enabled: true
          sync.group.offsets.interval.seconds: 60
          emit.checkpoints.interval.seconds: 60
          # ensures that consumer group offsets on the "target" cluster
          #  are correctly mapped to consumer groups on the "origin" cluster
          replication.policy.class: "io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy"
          replication.policy.separator: ""

      # Which topics should be migrated from the
      #  "origin" cluster to the "target" cluster ?
      # If you don't want to migrate all of your topics, modify this pattern to
      #  match only the topics you want. 
      topicsPattern: ".*"

      # Which consumer groups should be migrated from the
      #  "origin" cluster to the "target" cluster ?
      # If you don't want to migrate all of your groups, modify this pattern to
      #  match only the groups you want. 
      groupsPattern: ".*"
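Before starting the migration, it can help to preview which topic names a candidate topicsPattern would select. The following is a rough sketch in Python, assuming the pattern has to match the whole topic name and uses only regular expression syntax that behaves the same in Java and Python. The topic names are made up, and any internal topics that MirrorMaker 2 skips by default are not modelled here.

```python
import re
from typing import List


def select_topics(pattern: str, topics: List[str]) -> List[str]:
    """Return the topics whose full name matches the regular expression."""
    compiled = re.compile(pattern)
    return [topic for topic in topics if compiled.fullmatch(topic)]


if __name__ == "__main__":
    # hypothetical topic names, for illustration only
    topics = ["orders", "orders.retry", "payments", "audit-log"]
    print(select_topics(r"orders.*", topics))
    print(select_topics(r".*", topics))
```

The same check can be applied to consumer group names when choosing a value for groupsPattern.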

Stop MirrorMaker 2 and clean up

Once MirrorMaker 2 has finished migrating all of your topics, you can delete it and the resources that it created while running.

  • Delete the data-migration KafkaMirrorMaker2 resource
  • Delete the three topics that MirrorMaker 2 creates to store its state (in the config above, their names start with migration-connect-cluster-)

Don’t delete the origin-cluster.checkpoints.internal topic until you’ve verified your consumer group offsets in the final step.

Update your topic replication factors

MirrorMaker 2 does not carry over each topic’s replication factor from the “origin” cluster. Every topic that it creates on the “target” cluster gets the replication factor you specify in the MirrorMaker 2 config (replication.factor, set to 1 in the example above).

You should update the replication factor on the topics on your “target” cluster to match your requirements before continuing.
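One way to raise the replication factor of an existing topic is a partition reassignment that lists more replicas for each partition. The sketch below, in Python, builds the JSON document that Kafka’s partition reassignment tool reads. The topic name, partition count and broker IDs are placeholders, and the round-robin placement is a simplification rather than a recommendation.

```python
import json
from typing import List


def build_reassignment(topic: str, num_partitions: int,
                       broker_ids: List[int], replication_factor: int) -> dict:
    """Build a reassignment plan that gives every partition of a topic
    `replication_factor` replicas, spread round-robin across the brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(num_partitions):
        # start at a different broker for each partition to spread the leaders
        replicas = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
        partitions.append(
            {"topic": topic, "partition": partition, "replicas": replicas}
        )
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # placeholder values: a 3-partition topic on brokers 0, 1 and 2
    plan = build_reassignment("orders", 3, [0, 1, 2], 3)
    print(json.dumps(plan, indent=2))
```

Saving the output to a file lets you pass it to kafka-reassign-partitions with the --reassignment-json-file and --execute options.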

Resume your client applications

Kafka client applications that were using the “origin” Kafka cluster can now start using the “target” Kafka cluster.

MirrorMaker 2’s MirrorCheckpointConnector automatically stores offset checkpoints for consumer groups on the “origin” Kafka cluster. Each checkpoint maps the last committed offset for a consumer group in the “origin” cluster to the equivalent offset in the “target” cluster.

These checkpoints are stored in the origin-cluster.checkpoints.internal topic on the “target” cluster.
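To make the mapping concrete, here is a simplified model in Python of a checkpoint and of how a set of checkpoints determines where a consumer group resumes. The class and field names are illustrative, based only on the description above. This is not MirrorMaker 2’s actual record format or code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Checkpoint:
    group: str
    topic: str
    partition: int
    origin_offset: int  # last committed offset on the "origin" cluster
    target_offset: int  # equivalent offset on the "target" cluster


def resume_offsets(checkpoints: Iterable[Checkpoint],
                   group: str) -> Dict[Tuple[str, int], int]:
    """Return the "target" offset to resume from for each (topic, partition)
    of one consumer group. Later checkpoints override earlier ones."""
    offsets: Dict[Tuple[str, int], int] = {}
    for checkpoint in checkpoints:
        if checkpoint.group == group:
            key = (checkpoint.topic, checkpoint.partition)
            offsets[key] = checkpoint.target_offset
    return offsets


if __name__ == "__main__":
    # made-up values: the offsets differ between clusters, for example when
    # older messages had already expired on "origin" before the migration
    history = [
        Checkpoint("billing", "orders", 0, origin_offset=1400, target_offset=400),
        Checkpoint("billing", "orders", 0, origin_offset=1500, target_offset=500),
        Checkpoint("reports", "orders", 0, origin_offset=1200, target_offset=200),
    ]
    print(resume_offsets(history, "billing"))
```

The point of the model is that a consumer cannot simply reuse its “origin” offset on the “target” cluster, because the same message may sit at a different offset there.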

Since Kafka 2.7.0, MirrorMaker 2 has been able to write the translated consumer group offsets to the “target” cluster (this is what the sync.group.offsets.enabled option in the config above turns on), so Kafka consumers that start consuming from the same topic on the “target” cluster will be able to resume receiving messages from the last offset they committed on the “origin” cluster.

Once you have confirmed that your Kafka consumers are able to resume from their previously stored offsets, and no longer need the offset checkpoints that MirrorMaker 2 has stored, you can delete the origin-cluster.checkpoints.internal topic.
