Jakub’s Projects
PostgreSQL with Change Data Capture Support [Kafka + Debezium]

PostgreSQL with Change Data Capture Support [Kafka + Debezium]

This project sets up a lightweight environment for real-time Change Data Capture from PostgreSQL into Kafka. Using Docker Compose, it runs PostgreSQL with Debezium, Kafka, Zookeeper, and Kafka Connect, so any row-level change in the database is streamed as an event into Kafka topics. The stack is useful for testing data pipelines and stream processing tools like Spark: you can register Debezium connectors, watch topics update in real time, and experiment with CDC patterns locally without complex infrastructure.


Created: 2025-09-25 Updated: 2025-09-25

Installation & Usage

PostgreSQL-Kafka-CDC-Compose

Docker PostgreSQL Zookeeper Kafka Debezium Connect Schema Registry

General Information

Purpose of this stack is to create PostgreSQL database that will be able to stream changes in real time.

This can be utilised in spark, by defining readStream that will read changes that are happening in the PostgreSQL database

Technologies Used

1. PostgreSQL (Debezium-enabled)

  • Image: debezium/postgres:17
  • Purpose: Acts as the source database that user can utilise to do anything that standard PostgreSQL DB can handle
  • How it works:
    • Database uses logical replication and Write-Ahead-Log (WAL) to expose changes that happenned in the database
    • Debezium reads changes from the WAL and emits them as Kafka messages

2. Zookeper

  • Image: confluentinc/cp-zookeeper:5.5.3
  • Purpose: Provides coordination services for the Kafka cluster
  • How it works:
    • Keeps track of Kafka broker metadata, topic configurations, and distributed locks
    • Required for older Kafka versions for leader election and configuration management

3. Kafka (message broker)

  • Image: confluentinc/cp-enterprise-kafka:5.5.3
  • Purpose: Acts as the central event streaming platform
  • How it works:
    • Recieves CDC events from Debezium via Kafka Connect
    • Organizes data into topics (one topic per database table by default)

      KAFKA_ADVERTISED_LISTENERS must be updated to the current host ip to allow external clients to connect

4. Debezium (Kafka Connect)

  • Image: debezium/connect:1.4
  • Purpose: Runs Kafka Connect with Debezium's CDC connectors preinstalled
  • How it works:
    • Kafka Connect is a framework for moving data in and out of Kafka
    • Debezium's PostgreSQL connector reads WAL logs and produces events to Kafka topics

How to set up:

To initialise the stack go to the repository with .yaml file and use:

docker-compose up -d

In PostgreSQL's container bash you have to set REPLICA IDENTITY FULL on the table that you want to monitor:

ALTER TABLE <schema>.<table_name> REPLICA IDENTITY FULL

Register your conector in order to begin tracking changes in specific table Provided debezium.json can be used as template, and next sent to the debezium by curl:

curl -X POST http://<host_ip>:8083/connectors \
  -H "Content-Type: application/json" \
  --data @debezium.json

Use the command attached below to consume topic within terminal (to see in real time new records that appear in the queue). This command also can serve a purpose of creating topic in Kafka (It will create new topic if debezium did not produce any new rows yet and topic has not been initialised)

docker run --rm -it --network <docker_network_name> \
  confluentinc/cp-kafkacat kafkacat \
  -b kafka:9092 \
  -C \
  -t <postgres_container_name>.<schema>.<table_name> \
  -o beginning