Software & Datasets

Here is a listing of our open source software and open datasets. We believe strongly in the importance of open sharing of software and datasets. In particular, we believe in the freedom arising from copyleft licenses. All the material here is licensed under the GNU General Public License (GPL), unless otherwise specified. If you use a dataset or software package, please cite the paper listed beneath it.

  1. ACES: Our USENIX Security 2018 paper on security in bare-metal embedded systems. It creates compartments out of off-the-shelf embedded software. The distribution targets the ARM Cortex-M4 class of devices.

    “ACES: Automatic Compartments for Embedded Systems,” Abraham A. Clements, Naif Saleh Almakhdhub, Saurabh Bagchi, and Mathias Payer. At the 27th USENIX Security Symposium (USENIX Sec), pp. 65-82, August 15-17, 2018, Baltimore, MD. (Acceptance rate: 100/524 = 19.1%)

  2. TATHYA: This is the dataset that accompanies our CIKM 2017 paper on automated fact checking. This dataset contains manually annotated statements from the US Presidential debates for the 2016 election. [ README ] [ Full thesis ]

    “A Multi-Classifier System for Detecting Check-Worthy Statements in Political Debates,” Ayush Patwari, Dan Goldwasser, and Saurabh Bagchi. At the 26th ACM International Conference on Information and Knowledge Management (CIKM) (Short paper), pp. 2259-2262, Nov 6-10, 2017, Singapore. (Acceptance rate: 119/398 = 29.9% (short papers))

  3. Rafiki: Our Middleware 2017 paper showing how one can find the optimized parameter settings for a NoSQL database (Cassandra in our case) when the workload characteristics change.

    “Rafiki: A Middleware for Parameter Tuning of NoSQL Datastores for Dynamic Metagenomics Workloads,” Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh, Subrata Mitra (Adobe Research), Wolfgang Gerlach (Argonne National Laboratory), Travis Harrison (Argonne National Laboratory), Folker Meyer (Argonne National Laboratory), Ananth Grama, Saurabh Bagchi, and Somali Chaterji. At the ACM/IFIP/USENIX Middleware Conference, pp. 28-40, Dec 11-15, 2017, Las Vegas, Nevada. (Acceptance rate: 20/85 = 23.5%)

  4. FRESCO: This is the open source data repository of system usage and failure information for Purdue’s centralized computing clusters. It contains anonymized data from more than 3 million jobs in 2015-2017.

    “A Study of Failures in Community Clusters: The Case of Conte,” Subrata Mitra, Suhas Raveesh Javagal, Amiya K. Maji (ITaP), Todd Gamblin (LLNL), Adam Moody (LLNL), Stephen Harrell (ITaP), and Saurabh Bagchi. At the 7th IEEE International Workshop on Program Debugging, co-located with ISSRE, pp. 1-8, Oct 23-27, 2016, Ottawa, Canada.

  5. EPOXY: Our Security and Privacy 2017 paper on security in bare-metal embedded systems. It executes code at the unprivileged level for the most part, except for a small number of critical regions. The distribution targets the ARM Cortex-M4 class of devices.

    “Protecting Bare-metal Embedded Systems with Privilege Overlays,” Abraham A. Clements, Naif Saleh Almakhdhub, Khaled Saab (Georgia Tech), Prashast Srivastava, Jinkyu Koo, Saurabh Bagchi, and Mathias Payer. In Proceedings of the IEEE International Symposium on Security and Privacy (Oakland/S&P), pp. 289-303, May 22-24, 2017, San Jose, California. (Acceptance rate: 60/450 = 13.3%)

  6. ScalaDBG: Our BCB 2017 paper on how to do genomic assembly in a distributed manner. It builds de Bruijn graphs for different k values in parallel and then merges them. The distribution is built on top of the IDBA assembly algorithm.

    “Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers,” Kanak Mahadik, Christopher Wright, Milind Kulkarni, Saurabh Bagchi, and Somali Chaterji. In Proceedings of the 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), pp. 425-431, Aug 20-23, 2017, Boston, MA.

Last modified: December 26, 2018
