Break me if you can: A practical guide to building fault-tolerant systems

Track: Architecture
Abstract

You built your system, deployed it, and rolled it out to production, but that is only the beginning. The life of your system has just started: it will grow, evolve, and wake you up in the middle of the night. Usually, this is the point at which you start thinking about fault tolerance and error handling.

In this talk you will learn practical recipes (code and design patterns) for building fault-tolerant, scalable systems with open source tools. You will also see how product decisions shape fault-tolerant software and why a proper communication culture matters.

Alex Borysov

Alex Borysov is a senior software engineer at Netflix. He is a clean coder and a test-driven developer with solid experience in building and running world-scale software systems. During his career, Alex has developed and run machine learning infrastructure for payment fraud detection at Google, large-scale backends at Nest, a microservice architecture for world-leading social casino games, and core infrastructure services for a unicorn startup in Silicon Valley with more than 300 million users.