Real-Time Data Pipelines with Apache Kafka

Traditionally, most data integrations are implemented as batch pipelines that periodically sync entire data sets between data systems, such as databases, search indexes, or APIs. These pipelines put heavy load on all systems involved, which is why they are run infrequently. As a consequence, the data systems are out of sync most of the time.

This talk introduces real-time data pipelines, which stream data changes from data sources to data sinks in real time and transform them on the way. Use cases range from keeping Apache Solr search indexes up to date, to streaming application data from operational database systems to dashboards, to integrating multiple APIs in real time.

Apache Kafka is an industry-leading distributed event streaming platform. It provides components and building blocks for implementing real-time data pipelines. After introducing the most important concepts behind Apache Kafka, we show how the data pipeline platform DataCater makes the power of Kafka accessible to developers without requiring them to handle its complexity.

05.08.2022 13:30 - 14:15 Room "Sitegeist" (T20+T21)
