Netflix engineer gives talk about resilience in Netflix streaming service
Event Date: | November 11, 2016 |
---|---|
School or Program: | College of Engineering |
A Platform for Automating Chaos Experiments
Abstract
The Netflix video streaming system is composed of many interacting services which are frequently updated. Failures in individual services are not uncommon, and in some cases a misbehaving service that was not thought to be critical can result in a system-wide outage. The Chaos team at Netflix is responsible for identifying these kinds of system vulnerabilities through fault injection.
In this talk, we describe the Chaos Automation Platform, a system we recently developed for running failure injection experiments on the production system to verify that failures in non-critical services do not result in system outages.
Bio
Lorin Hochstein is a Sr. Software Engineer on the Chaos team at Netflix, where he works on ensuring that Netflix remains available. He was previously Sr. Software Engineer at SendGrid Labs, Lead Architect for Cloud Services at Nimbis Services, Computer Scientist at the University of Southern California's Information Sciences Institute, and Assistant Professor in the Department of Computer Science and Engineering at the University of Nebraska–Lincoln.
Lorin has a B.Eng. in Computer Engineering from McGill University, an M.S. in Electrical Engineering from Boston University, and a PhD in Computer Science from the University of Maryland.