2022
- NeurIPS
“Root Cause Analysis of Failures in Microservices through Causal Discovery,”
Azam Ikram; Sarthak Chakraborty, Subrata Mitra, Shiv Saini (Adobe Research); Saurabh Bagchi, and Murat Kocaoglu. At the 36th Conference on Neural Information Processing Systems (NeurIPS), pp. 31158-31170, November-December 2022. (Acceptance rate: 2,665/10,411 = 25.6%)
- CIKM
“AutoForecast: Automatic Time-Series Forecasting Model Selection,”
Mustafa Abdallah (Purdue); Ryan Rossi, Kanak Mahadik, Sungchul Kim, Handong Zhao, Haoliang Wang (Adobe Research); Saurabh Bagchi (Purdue). At the 31st ACM International Conference on Information and Knowledge Management (CIKM), pp. 1-10, October 2022. (Acceptance rate: 274/1175 = 23.3%) [ Dataset ]
- OSDI
- Sigmetrics
2021
- Usenix ATC
“SONIC: Application-aware Data Passing for Chained Serverless Applications,” Fault Tolerance for Distributed Applications
2020
- ISM
“Closing-the-Loop: A Data-Driven Framework for Effective Video Summarization,” Fault Tolerance for Distributed Applications
“OptimusCloud: Heterogeneous Configuration Optimization for Distributed Databases in the Cloud,” Fault Tolerance for Distributed Applications
- DSN
“The Mystery of the Failing Jobs: Insights from Operational Data from Two University-Wide Computing Systems,” Fault Tolerance for Distributed Applications
Rakesh Kumar, Saurabh Jha (University of Illinois at Urbana-Champaign), Ashraf Mahgoub, Rajesh Kalyanam, Stephen L Harrell, Xiaohui Carol Song, Zbigniew Kalbarczyk (University of Illinois at Urbana-Champaign), William T Kramer (University of Illinois at Urbana-Champaign), Ravishankar K. Iyer (University of Illinois at Urbana-Champaign), and Saurabh Bagchi. At the 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 158–171, June-July 2020. (Acceptance rate: 48/291 = 16.5%) [ Presentation ] [ Video ]
- OJCS
“Vision Paper: Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime Measures,” Fault Tolerance for Distributed Applications
Saurabh Bagchi, Vaneet Aggarwal, Somali Chaterji, Fred Douglis, Aly El Gamal, Jiawei Han, Brian J. Henz, Hank Hoffmann, Suman Jana, Milind Kulkarni, Felix Xiaozhu Lin, Karen Marais, Prateek Mittal, Shaoshuai Mou, Xiaokang Qiu, and Gesualdo Scutari. In IEEE Open Journal of the Computer Society (OJCS), pp. 1-15, 2020, doi: 10.1109/OJCS.2020.3006807.
2019
- CoNLL
“SIMVECS: Similarity-based Vectors for Utterance Representation in Conversational AI Systems,” Fault Tolerance for Distributed Applications
Ashraf Mahgoub, Youssef Shahin (Microsoft), Riham Mansour (Microsoft), and Saurabh Bagchi. At the SIGNLL Conference on Computational Natural Language Learning (CoNLL), pp. 1-10, Nov 3-4, 2019, Hong Kong. (Acceptance rate: 97/428 = 22.7%)
- Usenix ATC
“SOPHIA: Online Reconfiguration of Clustered NoSQL Databases for Time-Varying Workloads,” Fault Tolerance for Distributed Applications
Ashraf Mahgoub, Paul Wood, Alexander Medoff, Subrata Mitra (Adobe Research), Folker Meyer (Argonne National Lab), Somali Chaterji, and Saurabh Bagchi. At the 2019 USENIX Annual Technical Conference (Usenix ATC), pp. 223-240, Jul 10-12, 2019, Renton, WA. (Acceptance rate: 71/356 = 19.9%) [ Presentation ] [ Lightning talk ] [ YouTube video ]
- ICS
“AMPT-GA: Automatic Mixed Precision Floating Point Tuning for GPU Applications,” Fault Tolerance for Distributed Applications
Pradeep Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna (Lawrence Livermore National Lab), and Saurabh Bagchi. At the 33rd ACM International Conference on Supercomputing (ICS), pp. 160-170, Jun 26-28, 2019, Phoenix, AZ. (Acceptance rate: 45/193 = 23.3%) [ Presentation ] [ Slide show ]
- ISC
“GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications,” Fault Tolerance for Distributed Applications
Ignacio Laguna, Paul C. Wood, Ranvijay Singh, and Saurabh Bagchi. Accepted to appear at the International Supercomputing Conference (ISC), pp. 227-246, Jun 17-19, Frankfurt, Germany. (Acceptance rate: 17/72 = 23.6%) [ Hans Meuer Award winner (best paper) ] [ Presentation ]
- CACM
“Dependability in Edge Computing,” Fault Tolerance for Distributed Applications
Paul Wood, Heng Zhang, Muhammad-Bilal Siddiqui, Saurabh Bagchi. To appear in Communications of the ACM (CACM) as Contributed Article, pp. 1-16.
- “Smoothing the path to computing: pondering uses for big data,” Fault Tolerance for Distributed Applications
M Hall, R Ladner, D Levitt, MAP Quiñones, S Bagchi. Communications of the ACM 62 (3), 8-9.
- “FRESCO: Open Source Data Repository for Computational Usage and Failures,” Fault Tolerance for Distributed Applications
S Bagchi, R Kumar, R Kalyanam, S Harrell, CA Ellis, C Song. Repository documentation found here.
2018
- ICST
“XSTRESSOR: Automatic Generation of Large-Scale Test Inputs by Inferring Path Conditions,” Fault Tolerance for Distributed Applications
Charitha Saumya, Jinkyu Koo, Milind Kulkarni, and Saurabh Bagchi. Accepted to appear at the 12th IEEE International Conference on Software Testing, Verification, and Validation (ICST), pp. 1-11, Apr 22-27, 2019, Xi’an, China. (Acceptance rate: 31/110 = 28.2%) [ Distinguished Paper Award (one of 3) ]
- ICST
“PySE: Automatic Worst-Case Test Generation by Reinforcement Learning,” Fault Tolerance for Distributed Applications
Jinkyu Koo, Charitha Saumya, Milind Kulkarni, and Saurabh Bagchi. Accepted to appear at the 12th IEEE International Conference on Software Testing, Verification, and Validation (ICST), pp. 1-11, Apr 22-27, 2019, Xi’an, China. (Acceptance rate: 31/110 = 28.2%)
- Middleware
“Pythia: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads,” Fault Tolerance for Distributed Applications
Ran Xu (Purdue University); Subrata Mitra (Adobe Research); Jason Rahman (Facebook); Peter Bai (Purdue University); Bowen Zhou (LinkedIn); Greg Bronevetsky (Google); Saurabh Bagchi (Purdue University). At the 19th ACM/IFIP International Middleware Conference, pp. 146-160, December 10-14, 2018, Rennes, France. (Acceptance rate: 22/95 = 23.2%) [ Presentation ]
- USENIX ATC
“VideoChef: Efficient Approximation for Streaming Video Processing Pipelines,” Fault Tolerance for Distributed Applications
Ran Xu, Jinkyu Koo, Rakesh Kumar, Peter Bai; Subrata Mitra (Adobe Research); Sasa Misailovic (University of Illinois Urbana-Champaign); Saurabh Bagchi. At the 2018 USENIX Annual Technical Conference (USENIX ATC), pp. 43-56, July 11-13, 2018, Boston, MA. (Acceptance rate: 76/378 = 20.1%) [ Presentation ] [ Audio ]
2017
- ScalA
“Snowpack: Efficient Parameter Choice for GPU Kernels via Static Analysis and Statistical Prediction,” Fault Tolerance for Distributed Applications
Ranvijay Singh, Paul Wood, Ravi Gupta (Intel), Saurabh Bagchi, Ignacio Laguna (LLNL). At the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), co-located with the IEEE/ACM Supercomputing conference, pp. 1-8, November 13, 2017, Denver, Colorado. [ Presentation ]
- Middleware
“Rafiki: A Middleware for Parameter Tuning of NoSQL Datastores for Dynamic Metagenomics Workloads,” Fault Tolerance for Distributed Applications
Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh, Subrata Mitra (Adobe Research), Wolfgang Gerlach (Argonne National Laboratory), Travis Harrison (Argonne National Laboratory), Folker Meyer (Argonne National Laboratory), Ananth Grama, Saurabh Bagchi, and Somali Chaterji. At the ACM/IFIP/USENIX Middleware Conference, pp. 28-40, Dec 11-15, 2017, Las Vegas, Nevada. (Acceptance rate: 20/85 = 23.5%) [ Presentation ] [ Poster ]
- Briefings in Bioinformatics
“Federation in Genomics Pipelines: Techniques and Challenges,” Fault Tolerance for Distributed Applications
Somali Chaterji, Jinkyu Koo, Ninghui Li, Folker Meyer, Ananth Grama, and Saurabh Bagchi. In Oxford Briefings in Bioinformatics, pp. 1-11, Published: 29 August 2017. [ Abstract ]
- Briefings in Bioinformatics
“MG-RAST Version 4—Lessons learned from a decade of low-budget ultra-high throughput metagenome analysis,” Fault Tolerance for Distributed Applications
Folker Meyer, Saurabh Bagchi, Somali Chaterji, Wolfgang Gerlach, Ananth Grama, Travis Harrison, Tobias Paczian, Will Trimble, Andreas Wilke. In Oxford Briefings in Bioinformatics, bbx105, pp. 1-12, September 2017. [ Abstract ]
- ACM BCB
“Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers,” Fault Tolerance for Distributed Applications
Kanak Mahadik, Christopher Wright, Milind Kulkarni, Saurabh Bagchi, Somali Chaterji. In Proceedings of the 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), pp. 425-431, Aug 20-23, 2017, Boston, MA. [ Presentation ]
- FTXS
“Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters,” Fault Tolerance for Distributed Applications
Ayush Patwari, Ignacio Laguna, Martin Schulz, and Saurabh Bagchi. At the 7th Fault Tolerance for HPC at eXtreme Scales (FTXS) Workshop (co-located with HPDC), pp. 1-6, Jun 26, 2017, Washington DC. [ Presentation ]
2016
- CGO
“Phase-Aware Optimization in Approximate Computing,” Fault Tolerance for Distributed Applications
Subrata Mitra, Manish Gupta, Sasa Misailovic (U of Illinois at Urbana-Champaign), Saurabh Bagchi. At the 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 1-12, Feb 4-8, 2017, Austin, TX. (Acceptance rate: 26/114 = 22.8%) [ Presentation ]
- “A Study of Failures in Community Clusters: The Case of Conte,” Fault Tolerance for Distributed Applications
Subrata Mitra, Suhas Raveesh Javagal, Amiya K. Maji (ITaP), Todd Gamblin (LLNL), Adam Moody (LLNL), Stephen Harrell (ITaP), and Saurabh Bagchi. At the 7th IEEE International Workshop on Program Debugging, co-located with ISSRE, pp. 1-8, Oct 23-27, 2016, Ottawa, Canada. [ Presentation ]
- SRDS
“Sirius: Probabilistic data assertions for detecting silent data corruptions in parallel programs,” Fault Tolerance for Distributed Applications
Tara Thomas, Anmol Bhattad, Subrata Mitra, and Saurabh Bagchi. At the IEEE 35th Symposium on Reliable Distributed Systems (SRDS), pp. 1-10, September 26-29, 2016, Budapest, Hungary. (Acceptance rate: 27/83 = 32.5%) [ Presentation ]
- ICS
“SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications,” Fault Tolerance for Distributed Applications
Kanak Mahadik, Christopher Wright, Jinyi Zhang, Milind Kulkarni, Saurabh Bagchi, and Somali Chaterji. At the International Conference on Supercomputing (ICS), pp. 1-13, June 1-3, 2016, Istanbul, Turkey (Acceptance rate: 43/178 = 24.2%).
- EuroSys
“Partial-Parallel-Repair (PPR): A Distributed Technique for Repairing Erasure Coded Storage,” Fault Tolerance for Distributed Applications
Subrata Mitra, Rajesh Krishna Panta (AT&T Labs), Moo-Ryong Ra (AT&T Labs), Saurabh Bagchi. At the European Conference on Computer Systems (EuroSys), pp. 1-14, April 18-21, 2016, London, UK (Acceptance rate: 38/180 = 21.1%). [ Presentation ]
2015
- PACT
“Dealing with the Unknown: Resilience to Prediction Errors,” Fault Tolerance for Distributed Applications
Subrata Mitra, Greg Bronevetsky, Suhas Javagal and Saurabh Bagchi. At the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 1-10, October 18-21, 2015, San Francisco, CA. (Acceptance rate: 38/179 = 21.2%) [ Presentation ]
- BCB
“An Ensemble SVM Model for the Accurate Prediction of Non-Canonical MicroRNA Targets,” Fault Tolerance for Distributed Applications
Asish Ghoshal, Ananth Grama, Saurabh Bagchi and Somali Chaterji. At the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (BCB), pp. 403-412, September 9-12, 2015, Atlanta, GA. (Acceptance rate: 48/141 = 34%) (Winner of the best paper award)
2014
- Middleware
“Mitigating Interference in Cloud Services by Middleware Reconfiguration,” Fault Tolerance for Distributed Applications
Amiya Maji, Subrata Mitra, Bowen Zhou, Saurabh Bagchi and Akshat Verma (IBM Research). At the 15th ACM/IFIP/USENIX Middleware conference, pp. 1-12, Nov 16-21, 2014. (Acceptance rate: 27/144 = 18.8%) [ Presentation ]
- Supercomputing
“Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization,” Fault Tolerance for Distributed Applications
Kanak Mahadik, Somali Chaterji, Bowen Zhou, Milind Kulkarni, and Saurabh Bagchi. At the International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing), pp. 1-11, Nov 16-21, 2014. (Acceptance rate: 82/394 = 20.8%) [ Presentation ] [ Abstract ]
- ICAC
“Is Your Web Server Suffering from Undue Stress due to Duplicate Requests?,” Fault Tolerance for Distributed Applications
Fahad A. Arshad, Amiya K. Maji, Sidharth Mudgal, and Saurabh Bagchi. As a short paper at the 11th International Conference on Autonomic Computing (ICAC), pp. 105-111, June 18-20, 2014, Philadelphia, PA. (Acceptance rate: 12 (full papers) + 10 (short papers)/53 = 41.5%) [ Presentation ] [ Abstract ]
- TPDS
“Diagnosis of Performance Faults in Large Scale MPI Applications via Probabilistic Progress-Dependence Inference,” Fault Tolerance for Distributed Applications
Ignacio Laguna (LLNL), Dong Ahn (LLNL), Bronis de Supinski (LLNL), Saurabh Bagchi, and Todd Gamblin (LLNL). Accepted to appear in IEEE Transactions on Parallel and Distributed Systems (TPDS), pp. 1-15, notification of acceptance: March 2014. [ Presentation ] [ Abstract ]
- PLDI
“Accurate Application Progress Analysis for Large-Scale Parallel Debugging,” Fault Tolerance for Distributed Applications
Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, and Todd Gamblin. At the ACM International Symposium on Programming Language Design and Implementation (PLDI), pp. 193-203, Edinburgh, UK, June 9-11, 2014. (Acceptance rate: 52/287 = 18.1%) [ Abstract ] [ Presentation ]
2013
- ISSRE
“Characterizing Configuration Problems in Java EE Application Servers: An Empirical Study with GlassFish and JBoss,” Fault Tolerance for Distributed Applications
Fahad A. Arshad, Rebecca J. Krause, and Saurabh Bagchi. At the 24th IEEE International Symposium on Software Reliability Engineering (ISSRE), pp. 1-10, Pasadena, CA, November 4-7, 2013. (Acceptance rate: 46/131 = 35.1%) [ Abstract ] [ Presentation ]
- SRDS
“Automatic Problem Localization in Distributed Applications via Multi-dimensional Metric Profiling,” Fault Tolerance for Distributed Applications
Ignacio Laguna, Subrata Mitra, Fahad A. Arshad, Nawanol Theera-Ampornpunt, Zongyang Zhu, Saurabh Bagchi, Samuel P. Midkiff, Mike Kistler (IBM Research), and Ahmed Gheith (IBM Research). At the 32nd International Symposium on Reliable Distributed Systems (SRDS), pp. 121-132, Braga, Portugal, September 30-October 3, 2013. (Acceptance rate: 22/67 = 32.8%) [ Presentation ] [ Abstract ]
- HPDC
“WuKong: Automatically Detecting and Localizing Bugs that Manifest at Large System Scales,” Fault Tolerance for Distributed Applications
Bowen Zhou, Jonathan Too, Milind Kulkarni, and Saurabh Bagchi. At the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), pp. 131-142, New York City, NY, June 17-21, 2013. (Acceptance rate: 20/131 = 15.3%) [ Presentation ] [ Abstract ]
2012
- HotDep
“ABHRANTA: Locating Bugs that Manifest at Large System Scales,” Fault Tolerance for Distributed Applications
Bowen Zhou, Milind Kulkarni, and Saurabh Bagchi. At the 8th Workshop on Hot Topics in System Dependability (HotDep) (co-located with OSDI ’12), pp. 1-6, Hollywood, CA, October 7, 2012. (Acceptance rate: 10/24 = 41.7%) [ Presentation ] [ Abstract ]
- Supercomputing
“mcrEngine: A Scalable Checkpointing System using Data-Aware Aggregation and Compression,” Fault Tolerance for Distributed Applications
Tanzima Zerin Islam, Kathryn Mohror, Saurabh Bagchi, Adam Moody, Bronis R. de Supinski, and Rudolf Eigenmann. At the IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (Supercomputing), pp. 1-10, Salt Lake City, Utah, November 10-16, 2012. (Acceptance rate: 100/472 = 21.2%) (One of 8 finalists for the best student paper) [ Presentation ] [ Abstract ]
- PACT
“Probabilistic Diagnosis of Performance Faults in Large Scale Parallel Applications,” Fault Tolerance for Distributed Applications
Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, and Todd Gamblin. At the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 1-10, September 19-23, 2012, Minneapolis, MN. (Acceptance rate: 39/207 = 18.8%) [ Presentation ] [ Abstract ]
- DSN
“Automatic Fault Characterization via Abnormality-Enhanced Classification,” Fault Tolerance for Distributed Applications
Greg Bronevetsky (LLNL), Ignacio Laguna, Saurabh Bagchi and Bronis R. de Supinski (LLNL). In the 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 1-12, Boston, MA, June 25-28, 2012 (Acceptance rate: 51/236 = 21.6%) [ Presentation ] [ Abstract ]
- DSN
“A Study of Soft Error Consequences in Hard Disk Drives,” Fault Tolerance for Distributed Applications
Timothy Tsai (Hitachi GST), Nawanol Theera-Ampornpunt and Saurabh Bagchi. In the 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (Practical Experience Report), pp. 1-8, Boston, MA, June 25-28, 2012 (Acceptance rate: 51/236 = 21.6%) [ Presentation ] [ Abstract ]
2011
- “The NEEShub Cyberinfrastructure for Earthquake Engineering,” Fault Tolerance for Distributed Applications
Thomas J. Hacker, Rudi Eigenmann, Saurabh Bagchi, Ayhan Irfanoglu, Santiago Pujol, Ann Catlin, Ellen Rathje. IEEE Computing in Science and Engineering, vol. 13, issue 4, pp. 67-78, July-August 2011.
- Supercomputing
“Large Scale Debugging of Parallel Tasks with AutomaDeD,” Fault Tolerance for Distributed Applications
Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, and Barry Rountree. At the Supercomputing Conference, 12 pages, Seattle, WA, Nov 12-18, 2011. (Acceptance rate: 74/352 = 21.0%) [ Presentation ] [ Abstract ]
- HPDC
“Vrisha: Using Scaling Properties of Parallel Programs for Bug Detection and Localization,” Fault Tolerance for Distributed Applications
Bowen Zhou, Milind Kulkarni, and Saurabh Bagchi. At the 20th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 12 pages, San Jose, California, June 8-11, 2011. (Acceptance rate: 22/170 = 12.9%) [ Presentation ] [ Abstract ]
2010
- ISSRE
“Characterizing Failures in Mobile OSes: A Case Study with Android and Symbian”: Fault Tolerance for Distributed Applications
Amiya Kumar Maji, Kangli Hao, Salmin Sultana, and Saurabh Bagchi. At the 21st annual International Symposium on Software Reliability Engineering (ISSRE 2010), 10 pages, Nov 1-4, 2010, San Jose, California. (Acceptance rate: 40/130 = 30.8%) [ Abstract ]
- DSN
“AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks”: Fault Tolerance for Distributed Applications
Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, and Martin Schulz. In the 40th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 10 pages, June 28-July 1, 2010, Chicago, IL. (Acceptance rate (DCCS track): 40/174 = 23%) [ Presentation ] [ Abstract ]
2009
- Middleware
“How To Keep Your Head Above Water While Detecting Errors”: Fault Tolerance for Distributed Applications
Ignacio Laguna, Fahad A. Arshad, David M. Grothe, and Saurabh Bagchi. In: ACM/IFIP/USENIX 10th International Middleware Conference, November 30-December 4, 2009, Urbana-Champaign, Illinois. (Acceptance rate: 21/110 = 19.1%) [ Presentation ] [ abstract ]
- Supercomputing
“FALCON: A System for Reliable Checkpoint Recovery in Shared Grid Environments”: Fault Tolerance for Distributed Applications
Tanzima Zerin, Saurabh Bagchi, and Rudolf Eigenmann. In: the ACM/IEEE Supercomputing Conference, November 14-20, 2009, Portland, Oregon. (Acceptance rate: 59/261 = 22.6%) (Nominated as one of 4 best student papers) [ Presentation ] [ abstract ]
2008
2007
- SRDS
“Stateful Detection in High Throughput Distributed Systems”: Fault Tolerance for Distributed Applications
Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad, and Saurabh Bagchi. In: 26th IEEE International Symposium on Reliable Distributed Systems (SRDS-2007), pp. 275-287, Beijing, China, October 10-12, 2007. (Acceptance rate: 29/185 ~ 15.7%) [ Presentation ] [ abstract ]
- SRDS
“Distributed Diagnosis of Failures in a Three Tier E-Commerce System”: Fault Tolerance for Distributed Applications
Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad, and Saurabh Bagchi. In: 26th IEEE International Symposium on Reliable Distributed Systems (SRDS-2007), pp. 185-198, Beijing, China, October 10-12, 2007. (Acceptance rate: 29/185 ~ 15.7%) [ Presentation ] [ abstract ]
- HPDC
“Failure-Aware Checkpointing in Fine-Grained Cycle Sharing Systems”: Fault Tolerance for Distributed Applications
Xiaojuan Ren, Rudolf Eigenmann, and Saurabh Bagchi. In: 16th IEEE International Symposium on High Performance Distributed Computing (HPDC-16), Monterey Bay, California, June 27-29, 2007. (Acceptance rate: 20%). [ Presentation ] [ abstract ]
- TDSC
“Automated Rule-Based Diagnosis through a Distributed Monitor System”: Fault Tolerance for Distributed Applications
Gunjan Khanna, Mike Yu Cheng, Padma Varadharajan, Saurabh Bagchi, Miguel P. Correia, and Paulo J. Verissimo. In: IEEE Transactions on Dependable and Secure Computing (TDSC), notification of acceptance: May 2007. [ abstract ]
- JOGC
“Prediction of Resource Availability in Fine-Grained Cycle Sharing Systems and Empirical Evaluation,” Fault Tolerance for Distributed Applications
Xiaojuan Ren, Seyong Lee, Rudolf Eigenmann, and Saurabh Bagchi. In Springer’s Journal of Grid Computing (JOGC), vol. 5, no. 2, pp. 173-195, 2007. [ abstract ]
2006
- ICCD
“Pesticide: Using SMT Processors to Improve Performance of Pointer Bug Detection,” Fault Tolerance for Distributed Applications
Jin-Yi Wang, Yen-Shiang Shue, T N Vijaykumar, and Saurabh Bagchi. 24th International Conference of Computer Design (ICCD), Oct 1-4, 2006, San Jose, California, USA.
- DSN
“Providing Automated Detection of Problems in Virtualized Servers using Monitor framework,” Fault Tolerance for Distributed Applications
Gunjan Khanna, Saurabh Bagchi, Kirk Beaty, Andrzej Kochut, and Gautam Kar. Workshop on Applied Software Reliability (WASR) at the International Conference on Dependable Systems and Networks (DSN), June 25-28, 2006, Philadelphia, Pennsylvania, USA. [ Presentation ]
- HPDC
“Resource Failure Prediction in Fine-Grained Cycle Sharing Systems,” Fault Tolerance for Distributed Applications
Xiaojuan Ren, Seyong Lee, Rudolf Eigenmann, and Saurabh Bagchi. 15th IEEE International Symposium on High Performance Distributed Computing (HPDC-15), 19-23 June 2006, Paris, France. (Acceptance rate: 24/157 ~ 15%). [ Presentation ]
- TDSC
“Automated Online Monitoring of Distributed Applications through External Monitors,” Fault Tolerance for Distributed Applications
Gunjan Khanna, Padma Varadharajan, and Saurabh Bagchi. IEEE Transactions on Dependable and Secure Computing (TDSC), vol. 3, no. 2, pp. 115-129, Apr-Jun, 2006.
2005
- “Probabilistic Diagnosis through Non-Intrusive Monitoring in Distributed Applications,” Fault Tolerance for Distributed Applications
Gunjan Khanna, Yu Cheng, Saurabh Bagchi, Miguel Correia, and Paulo Verissimo. Purdue ECE Technical Report 05-19, December 2005.
- SRDS
“LRRM: A Randomized Reliable Multicast Protocol for Optimizing Recovery Latency and Buffer Utilization,” Fault Tolerance for Distributed Applications
Nipoon Malhotra, Shrish Ranjan, and Saurabh Bagchi. 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005), October 26-28, 2005, Orlando, Florida, USA. (Acceptance rate: 20/67 ~ 29.9%) [ Camera ready ]
- “Automated Monitor Based Diagnosis in Distributed Systems,” Fault Tolerance for Distributed Applications
Gunjan Khanna, Padma Varadharajan, Mike Cheng, and Saurabh Bagchi, Purdue ECE Technical Report 05-13, August 2005.
2004
- SRDS
“Self Checking Network Protocols: A Monitor Based Approach,” Fault Tolerance for Distributed Applications
Gunjan Khanna, Padma Varadharajan, and Saurabh Bagchi. 23rd International Symposium on Reliable Distributed Systems (SRDS 2004), October 2004. (Acceptance rate: 27/117 ~ 23.1%) [ Camera Ready ] [ Presentation ]
- PRDC
“Failure Handling in a Reliable Multicast Protocol for Improving Buffer Utilization and Accommodating Heterogeneous Receivers,” Fault Tolerance for Distributed Applications
Gunjan Khanna, John Rogers, and Saurabh Bagchi. In Proceedings of the 10th IEEE Pacific Rim Dependable Computing Conference (PRDC’ 04), March 2004. (Acceptance rate: 34/102 ~ 33.3%) [ Camera ready ]
2003
- “Self-Checking Network Protocols: A Monitor Based Approach,” Fault Tolerance for Distributed Applications
Gunjan Khanna, MS Thesis. December 2003.
- “Light-Weight Randomized Reliable Multicasting Protocol,” Fault Tolerance for Distributed Applications
Nipoon Malhotra, Shrish Ranjan, and Saurabh Bagchi. Appeared in Fast Abstracts, DSN2003.
Copyright notice: Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional
purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work
in other works must be obtained from the appropriate publisher (IEEE, ACM, Elsevier, etc.)