Publications

(2024). ThreadFuser: A SIMT Analysis Framework for MIMD Programs. In MICRO 2024.

PDF

(2024). Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration. In MICRO 2024.

PDF

(2024). Concurrency-Aware Register Stacks for Efficient GPU Function Calls. In MICRO 2024.

PDF

(2024). CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs. In IISWC 2024.

PDF

(2023). RETROSPECTIVE: Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. In ISCA@50 25-year Retrospective 1996-2020.

PDF Cite Code Project Press

(2023). Mitigating GPU Core Partitioning Performance Effects. In HPCA 2023.

PDF DOI

(2022). SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices. In MICRO 2022.

PDF DOI

(2022). A SIMT Analyzer for Multi-Threaded CPU Applications. In ISPASS 2022.

PDF DOI

(2021). Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads. In MICRO 2021.

PDF DOI

(2021). AccelWattch: A Power Modeling Framework for Modern GPUs. In MICRO 2021.

PDF DOI

(2021). Judging a Type by its Pointer: Optimizing GPU Virtual Functions. In ASPLOS 2021.

PDF DOI

(2021). Characterizing Massively Parallel Polymorphism. In ISPASS 2021. Best Paper Nominee.

PDF DOI

(2021). Deadline-Aware Offloading for High-Throughput Accelerators. In HPCA 2021.

PDF DOI

(2020). Locality-Centric Data and Threadblock Management for Massive GPUs. In MICRO 2020.

PDF DOI

(2020). Deterministic Atomic Buffering. In MICRO 2020.

PDF DOI

(2020). Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling. In ISCA 2020.

PDF DOI

(2020). Dimensionality-Aware Redundant SIMT Instruction Elimination. In ASPLOS 2020.

PDF DOI

(2019). Pagoda: A GPU Runtime System for Narrow Tasks. In TOPC 2021. Invited Paper.

PDF DOI

(2019). Analyzing Machine Learning Workloads Using a Detailed GPU Simulator. In ISPASS 2019.

PDF DOI

(2019). A Detailed Model for Contemporary GPU Memory Systems. In ISPASS 2019.

PDF DOI

(2018). General-Purpose Graphics Processor Architectures. In Synthesis Lectures on Computer Architecture.

DOI

(2018). A Quantitative Evaluation of Contemporary GPU Simulation Methodology. In SIGMETRICS 2018.

PDF DOI

(2018). Characterizing the Runtime Effects of Object-Oriented Workloads on GPUs. In ISPASS 2018.

PDF DOI

(2018). Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level. In HPCA 2018.

PDF DOI

(2017). Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks. In PPoPP 2017. Best Paper Nominee.

PDF DOI

(2016). POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism. In PACT 2016.

PDF DOI

(2015). A Variable Warp Size Architecture. In ISCA 2015.

PDF DOI

(2014). Learning Your Limit: Managing Massively Multithreaded Caches Through Scheduling. In CACM Research Highlight.

PDF DOI

(2013). Divergence-Aware Warp Scheduling. In MICRO 2013.

PDF DOI

(2013). Cache-Conscious Thread Scheduling for Massively Multithreaded Processors. In TOP-PICKS 2013.

PDF DOI

(2012). Cache-Conscious Wavefront Scheduling. In MICRO 2012. Best Paper Nominee. Top Picks. CACM Research Highlight.

PDF DOI

(2012). Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems. In ISPASS 2012.

PDF DOI