Netflix engineer gives talk about resilience in Netflix streaming service

Event Date: November 11, 2016
School or Program: College of Engineering
Dr. Lorin Hochstein of Netflix gave an industry seminar on how resilience is built into the Netflix streaming service and how testing verifies that its resilience guarantees are being met.

A Platform for Automating Chaos Experiments

Abstract

The Netflix video streaming system is composed of many interacting services which are frequently updated. Failures in individual services are not uncommon, and in some cases a misbehaving service that was not thought to be critical can result in a system-wide outage. The Chaos team at Netflix is responsible for identifying these kinds of system vulnerabilities through fault injection.

In this talk, we describe the Chaos Automation Platform, a system we recently developed for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.

Bio

Lorin Hochstein is a Sr. Software Engineer on the Chaos team at Netflix, where he works on ensuring that Netflix remains available. He was previously Sr. Software Engineer at SendGrid Labs, Lead Architect for Cloud Services at Nimbis Services, Computer Scientist at the University of Southern California's Information Sciences Institute, and Assistant Professor in the Department of Computer Science and Engineering at the University of Nebraska–Lincoln.

Lorin has a B.Eng. in Computer Engineering from McGill University, an M.S. in Electrical Engineering from Boston University, and a PhD in Computer Science from the University of Maryland.