Overview

What is Apache Kafka?

Apache Kafka is a distributed event streaming platform used for building real-time data pipelines and streaming applications.
It was originally developed at LinkedIn and later open-sourced, now maintained by the Apache Software Foundation.

Kafka is designed for high-throughput, low-latency, fault-tolerant, and scalable message streaming.


Key Concepts

  • Producer → Sends data (messages/events) into Kafka.
  • Consumer → Reads data from Kafka.
  • Topic → A category or feed name to which records are published.
  • Partition → Each topic is split into partitions for parallelism and scalability.
  • Broker → A Kafka server that stores data and serves client requests.
  • Cluster → A group of Kafka brokers working together.
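The concepts above can be sketched as a minimal in-memory model. This is illustrative Python, not the real Kafka client API: the `Topic` class, its methods, and the hash-based partitioner are invented for this sketch, but they mirror how Kafka maps keyed records to partitions and addresses them by offset.

```python
class Topic:
    """A topic is a set of partitions; each partition is an append-only log."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Records with the same key land in the same partition,
        # which is how Kafka preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        # Consumers read by (partition, offset); reading does not
        # delete the record, so multiple consumers can replay the log.
        return self.partitions[partition][offset]


topic = Topic("payments")
p, off = topic.produce("user-42", {"amount": 10})
assert topic.consume(p, off) == {"amount": 10}
```

Note that the consumer, not the broker, tracks its offset: this is what lets independent consumers read the same topic at their own pace.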

Core Features

  • High throughput & scalability – Can handle millions of messages per second.
  • Durability – Messages are written to disk and replicated across brokers.
  • Fault tolerance – Survives broker failures without data loss, provided topics are replicated.
  • Integration – Works with many data sources (databases, microservices, cloud services).
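Durability and fault tolerance both come from replication: each record is written to a leader partition and copied to follower brokers, so losing one broker loses no data. A hedged sketch of that idea (illustrative only; `replicate` and the list-of-lists broker model are invented for this example, not broker internals):

```python
def replicate(record, brokers, replication_factor=3):
    """Write a record to the leader plus (replication_factor - 1) followers.

    Kafka acknowledges a write as durable once the in-sync replicas
    have it; here each broker is modeled as a plain list.
    """
    replicas = brokers[:replication_factor]
    for b in replicas:
        b.append(record)
    return len(replicas)  # number of copies that acknowledged the write


brokers = [[], [], [], []]
acks = replicate("event-1", brokers)
# The record now exists on 3 of the 4 brokers, so any single
# broker can fail without losing it.
```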

Common Use Cases

  1. Real-time analytics (e.g., monitoring financial transactions).
  2. Event-driven architectures (microservices communication).
  3. Log aggregation (centralized logs from many systems).
  4. Data pipelines (stream data from source → Kafka → data warehouse).
  5. IoT streaming (sensor data ingestion).