Netflix engineer to talk about "Chaos Engineering"
Event Date: | November 29, 2017 |
---|---|
Hosted By: | Purdue College of Engineering |
Time: | 10.30 am |
Location: | EE 226 |
Contact Name: | Saurabh Bagchi |
Open To: | All |
School or Program: | Electrical and Computer Engineering |
Netflix engineer Dr. Lorin Hochstein will give a talk on reliability engineering in Netflix services as part of the graduate class "Fault tolerant computer system design". Any member of the Purdue community is welcome to attend in person or over Webex.
Title: Antics, Drift & Chaos
Date, Time, and Location: November 29, 2017, 10.30-11.20 am, EE 226
Webex URL: https://purdue.webex.com/purdue/j.php?MTID=mb3e1eafb17adc4cf845d5ebc2c8f03c2
Meeting number (access code): 315 772 901
Abstract:
Despite what you might think, successful large-scale cloud systems are not designed by an architect. Instead, they grow organically, failing in complex ways. However, by applying Chaos Engineering, you can prevent failures by detecting weaknesses before they do real harm.
Large systems evolve from successful, smaller ones, an observation predicted by the branch of study known as systems theory. Systems theory also predicts that our systems will inevitably behave, and fail, in unforeseen ways. This talk will draw from the ideas of two very different systems theorists to demonstrate that neither quality architecture nor thorough testing can prevent our software from eventually exhibiting pathological behavior. The first is the safety researcher Sidney Dekker, who proposed a theory of "drift into failure" that describes how seemingly reliable safety-critical systems can still lead to accidents. The second is the late pediatrician John Gall, who coined the "Generalized Uncertainty Principle" about how all types of complex systems behave unexpectedly.
Even though failure is inevitable, there is still hope. Chaos Engineering is an approach that can be used to identify system vulnerabilities before they lead to outages. This talk will cover how to design and run Chaos Engineering experiments, drawing examples from our experiences at Netflix.
Bio:
Lorin Hochstein is a Sr. Software Engineer in the Chaos Team at Netflix, where he works on ensuring that Netflix remains available. He was previously Sr. Software Engineer at SendGrid Labs, Lead Architect for Cloud Services at Nimbis Services, Computer Scientist at the University of Southern California's Information Sciences Institute, and Assistant Professor in the Department of Computer Science and Engineering at the University of Nebraska–Lincoln.
Lorin has a B.Eng. in Computer Engineering from McGill University, an M.S. in Electrical Engineering from Boston University, and a PhD in Computer Science from the University of Maryland.