Offloading computations to multiple GPUs is not an easy task: it requires manually decomposing data, distributing computations, and handling communication. GPU libraries have made it easy to offload computations to multiple GPUs by hiding this complexity inside library calls. Such encapsulation, however, prevents the reuse of data between successive kernel invocations, resulting in redundant communication.

In this work, we introduce SemCache++, a semantics-aware GPU cache that automatically manages communication between the CPU and multiple GPUs and optimizes it by eliminating redundant transfers through caching. SemCache++ is used to build the first multi-GPU drop-in replacement library that (a) uses virtual memory to automatically manage and optimize multi-GPU communication and (b) requires no program rewriting or annotations. Our caching technique is efficient: it uses a two-level caching directory to track matrices and submatrices. Experimental results show that our system eliminates redundant communication and delivers significant performance improvements over multi-GPU libraries such as CUBLASXT.
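To make the idea concrete, the sketch below shows one plausible shape for a two-level caching directory: level one keys on a matrix, level two on submatrix tiles, each tracking which devices hold a valid copy so a repeated transfer can be skipped. This is a minimal illustration under our own assumptions (class and function names here are hypothetical, not SemCache++'s actual data structures or API).

```python
# Hypothetical sketch of a two-level caching directory:
# level 1 maps a matrix id to level 2, which maps a submatrix tile
# to the set of devices holding a valid copy of that tile.
class CacheDirectory:
    def __init__(self):
        self.dir = {}  # matrix_id -> {tile -> set(device_ids)}

    def needs_transfer(self, matrix_id, tile, device):
        # A transfer is needed only if this device lacks a valid copy.
        tiles = self.dir.get(matrix_id, {})
        return device not in tiles.get(tile, set())

    def record_transfer(self, matrix_id, tile, device):
        self.dir.setdefault(matrix_id, {}).setdefault(tile, set()).add(device)

    def invalidate(self, matrix_id):
        # The CPU wrote the matrix, so all GPU copies become stale.
        self.dir.pop(matrix_id, None)


transfers = 0

def to_gpu(cache, matrix_id, tile, device):
    # Stand-in for a guarded host-to-device copy (e.g., a cudaMemcpy).
    global transfers
    if cache.needs_transfer(matrix_id, tile, device):
        transfers += 1
        cache.record_transfer(matrix_id, tile, device)


cache = CacheDirectory()
to_gpu(cache, "A", (0, 1024), 0)  # first use on GPU 0: transfer
to_gpu(cache, "A", (0, 1024), 0)  # cached on GPU 0: skipped
to_gpu(cache, "A", (0, 1024), 1)  # GPU 1 has no copy: transfer
print(transfers)  # 2
```

The second call is elided because the directory records that GPU 0 already holds tile (0, 1024) of A; an encapsulated library without such a directory would re-transfer it on every call.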