The Right Number of Partitions for a Kafka Topic

Track: Cloud Infrastructure

Abstract

Every technology has one key concept that people struggle to understand. With databases, it is which join clause to use when fetching data from multiple tables. With containers, it is which storage type to pick given some persistence requirements. With Apache Kafka, the winner is how many partitions to set for a topic. Why is this important, you may ask? Well, sizing Kafka partitions wrongly affects many aspects of the system, such as storage, parallelism, and durability. Worse, it may also affect how much load Kafka can handle. The decision about how many partitions to set for a topic is often left to Ops teams, because we see it as only an infrastructure matter. In reality, it is an architectural design decision that affects even the amount of code you write. This session will peel back the concept of partitions and explain it from the perspective of the Kafka cluster and its clients. It will explain the formula people should use to decide how many partitions to set for a topic, and how to spot a poor decision when they see one.

Ricardo Ferreira

Ricardo is a Senior Developer Advocate at AWS, working in the developer relations team for North America. With 20+ years of experience, he may have learned a thing or two about distributed systems, fast data analytics, software architecture, databases, and observability. Before AWS, he worked for software vendors like Elastic, Confluent, and Oracle. Ricardo is well known for his remarkable ability to explain complex topics. He cunningly breaks them down into bite-sized pieces until anyone can understand them. When not working, he loves barbecuing in his backyard with his family and friends, where he finally gets the chance to talk about anything unrelated to computers. He currently lives in North Carolina, USA, with his wife and son. Follow Ricardo on Twitter: @riferrei