Failure Detection and Prediction through Metrics
Summary
  1. Orion: A tool for problem localization based on correlation analysis of multiple application metrics. The user collects measurements of multiple metrics per code region of an application. Orion then presents: (1) the abnormal metrics, and (2) for a given abnormal metric, the abnormal code regions.

For the git repo of this codebase, please access:

https://github.com/ilagunap/Orion

  • ARIMA models: here is the code to train ARIMA(p,d,q) models. This is a Perl script that calls the R tool several times (using multiple values of p, d, and q) and selects the best model using AIC. Please follow the README file for directions on how to run it.
  • Online detection code: here is the C++ code that implements Augury's online algorithms. At startup, it loads the ARIMA models and a hyper-sphere from files (along with other files and environment variables). Measurement vectors are passed in from a separate Linux process via pipes. Follow the README file in /testing to compile and run it.
  • Matlab code: we prototype our algorithms in Matlab before implementing them in C++. Here is the script that we use to run Augury's algorithms. This script does not include ARIMA forecasting.
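
The AIC-based selection described for the ARIMA training script can be sketched as follows. This is a minimal illustration in Python, not the project's Perl/R code: the `fit` callback is a hypothetical stand-in for whatever actually estimates an ARIMA(p,d,q) model (for example, a call out to R), and it is assumed to return the model's log-likelihood and its number of estimated parameters, or to raise `ValueError` when a model cannot be fitted. The standard definition AIC = 2k - 2 ln(L) is used, and the order with the lowest AIC wins.

```python
import itertools
import math
from typing import Callable, Iterable, Optional, Tuple

Order = Tuple[int, int, int]
# Hypothetical callback: fit(order) -> (log_likelihood, num_params).
# It is assumed to raise ValueError when the model cannot be estimated.
FitFn = Callable[[Order], Tuple[float, int]]


def aic(log_likelihood: float, num_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * num_params - 2.0 * log_likelihood


def select_best_order(
    fit: FitFn,
    p_values: Iterable[int],
    d_values: Iterable[int],
    q_values: Iterable[int],
) -> Tuple[Optional[Order], float]:
    """Try every (p, d, q) combination and keep the one with the lowest AIC."""
    best_order: Optional[Order] = None
    best_aic = math.inf
    for order in itertools.product(p_values, d_values, q_values):
        try:
            log_likelihood, num_params = fit(order)
        except ValueError:
            continue  # skip orders the fitter could not estimate
        score = aic(log_likelihood, num_params)
        if score < best_aic:
            best_order, best_aic = order, score
    return best_order, best_aic
```

A caller would pass its own fitting routine as `fit` together with the ranges of p, d, and q to search. If no candidate can be fitted, the function returns `(None, inf)`.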
  1. Data Sets
  • ITaP's Stations-Stat data:
    • List of metrics that we analyze in Orion [pdf]
    • Data set for the ITaP failure data that we analyze in our SRDS '13 paper. [xlsx]
      Please look at the README sheet of the Excel file to see what each column means.

 

Here we provide the data from the campus application (used to check the availability of workstations) that is used for evaluation in the paper. The tar.gz file contains the raw measurements observed over two months, from June 7 to August 19, 2010. We also provide the metric names, the times of service restarts, and the alerts received from Nagios (the monitoring system).

  • RUBiS fault-injection:

    • The data collected as a result of the faults injected in RUBiS are here. (to be uploaded soon)

  • Android OS bugs:

    • Failure data of the Android emulator bugs presented in the paper are here. (to be uploaded soon)

Contacts

For questions about building and running the code, or about understanding the data, please contact:

Ignacio Laguna <ilaguna@purdue.edu>

Nawanol Theera-Ampornpunt <ntheeraa@purdue.edu>

 

465 Northwestern Avenue, West Lafayette, IN 47907   |  dcsl@ecn.purdue.edu   |  +1 765 494 3510


Last Update: October 31, 2013 15:37 by GMHoward