Software & Datasets

Here is a listing of our open source software and open datasets. We believe strongly in the importance of open sharing of software and datasets. In particular, we believe in the freedom arising from copyleft licenses. All the material here is licensed under the GNU General Public License (GPL), unless otherwise specified. If you use a dataset or software package from this list, please cite the paper listed beneath it.

  1. Bluetooth proximity tracing: Our CPSIoTSec 2020 paper on Bluetooth proximity data. We conducted a measurement study and collected traces of Bluetooth advertisements from 49 students on the Purdue campus over a period of two weeks in February-March 2019. Participants were asked to install an Android app written by us, which periodically collected and uploaded traces of Bluetooth advertisements and device locations.

    “Privacy in the Mobile World: An Analysis of Bluetooth Scan Traces,” Heng Zhang, Amiya K. Maji, and Saurabh Bagchi. At the 2020 Joint Workshop on CPS&IoT Security and Privacy (CPSIoTSec), co-located with ACM Conference on Computer and Communications Security (CCS), pp. 1-5, November 9, 2020.

  2. Qui-Gon Jinn: Our DSN 2018 paper on the reliability of Wear OS. This software fuzzes Wear OS apps, leading to app crashes, app hangs, and even reboots of the smartwatch.

    “How Reliable is my Wearable: A Fuzz Testing-based Study,” Edgardo Barsallo Yi, Amiya K. Maji, and Saurabh Bagchi. At the 48th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 410-417, June 25-28, 2018, Luxembourg City, Luxembourg. (Acceptance rate: 62/221 = 28.1%)

  3. ACES: Our USENIX Security 2018 paper on security in bare-metal embedded systems. It creates compartments out of off-the-shelf embedded software. The distribution targets the ARM Cortex-M4 class of devices.

    “ACES: Automatic Compartments for Embedded Systems,” Abraham A. Clements, Naif Saleh Almakhdhub, Saurabh Bagchi, and Mathias Payer. At the 27th USENIX Security Symposium (USENIX Sec), pp. 65-82, August 15-17, 2018, Baltimore, MD. (Acceptance rate: 100/524 = 19.1%)

  4. TATHYA: This is the dataset that accompanies our CIKM 2017 paper on automated fact checking. This dataset contains manually annotated statements from the US Presidential debates for the 2016 election. [ README ] [ Full thesis ]

    “A Multi-Classifier System for Detecting Check-Worthy Statements in Political Debates,” Ayush Patwari, Dan Goldwasser, and Saurabh Bagchi. At the 26th ACM International Conference on Information and Knowledge Management (CIKM) (Short paper), pp. 2259-2262, Nov 6-10, 2017, Singapore. (Acceptance rate for short papers: 119/398 = 29.9%)

  5. Rafiki: Our Middleware 2017 paper, which shows how to find optimized parameter settings for a NoSQL database (Cassandra in our case) when the workload characteristics change.

    “Rafiki: A Middleware for Parameter Tuning of NoSQL Datastores for Dynamic Metagenomics Workloads,” Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh, Subrata Mitra (Adobe Research), Wolfgang Gerlach (Argonne National Laboratory), Travis Harrison (Argonne National Laboratory), Folker Meyer (Argonne National Laboratory), Ananth Grama, Saurabh Bagchi, and Somali Chaterji. At the ACM/IFIP/USENIX Middleware Conference, pp. 28-40, Dec 11-15, 2017, Las Vegas, Nevada. (Acceptance rate: 20/85 = 23.5%)

  6. FRESCO: This is the open source data repository of system usage and failure information for Purdue’s centralized computing clusters. It contains anonymized data from more than 3 million jobs run in 2015-2017.

    “A Study of Failures in Community Clusters: The Case of Conte,” Subrata Mitra, Suhas Raveesh Javagal, Amiya K. Maji (ITaP), Todd Gamblin (LLNL), Adam Moody (LLNL), Stephen Harrell (ITaP), and Saurabh Bagchi. At the 7th IEEE International Workshop on Program Debugging, co-located with ISSRE, pp. 1-8, Oct 23-27, 2016, Ottawa, Canada.

  7. EPOXY: Our Security and Privacy 2017 paper on security in bare-metal embedded systems. It executes code at the unprivileged level for the most part, except for a small number of critical regions. The distribution targets the ARM Cortex-M4 class of devices.

    “Protecting Bare-metal Embedded Systems with Privilege Overlays,” Abraham A. Clements, Naif Saleh Almakhdhub, Khaled Saab (Georgia Tech), Prashast Srivastava, Jinkyu Koo, Saurabh Bagchi, and Mathias Payer. In Proceedings of the IEEE International Symposium on Security and Privacy (Oakland/S&P), pp. 289-303, May 22-24, 2017, San Jose, California. (Acceptance rate: 60/450 = 13.3%)

  8. ScalaDBG: Our BCB 2017 paper on how to do genomic assembly in a distributed manner. It builds de Bruijn graphs for different k values in parallel and then merges them. The distribution is built on top of the IDBA assembly algorithm.

    “Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers,” Kanak Mahadik, Christopher Wright, Milind Kulkarni, Saurabh Bagchi, and Somali Chaterji. In Proceedings of the 8th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB), pp. 425-431, Aug 20-23, 2017, Boston, MA.

Last modified: October 12, 2020
