/home/tgrogers/github-purdue/aalp/gpgpu-sim_simulations/util/correlation/../../benchmarks/bin/9.1/release/matrixMul
	Starting (CUDA and CUBLAS tests)...


Using Matrix Sizes: A(640 x 960), B(640 x 640), C(640 x 960)

Running Kernels...

> CUBLAS         5889.4058 GFlop/s, Time = 0.00013 s, Size = 786432000 Ops

> CUDA matrixMul 1367.3116 GFlop/s, Time = 0.00058 s, Size = 786432000 Ops, NumDevsUsed = 1, Workgroup = 1024
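The throughput figures above follow from the reported op count and time. The sketch below is a quick sanity check, not the benchmark's own code. It assumes the conventional 2*M*N*K FLOP count for a dense matrix product (one multiply and one add per inner-product term). It also assumes the dimensions from the "Using Matrix Sizes" line: a shared inner dimension of 640 and a 640 x 960 output. The helper names gemm_ops and gflops are hypothetical. Because the logged times are rounded to five decimal places, recomputing GFlop/s from them only approximates the printed values.

```python
def gemm_ops(m: int, n: int, k: int) -> int:
    """Hypothetical helper: FLOP count of a dense GEMM (one multiply + one add per term)."""
    return 2 * m * n * k


def gflops(ops: int, seconds: float) -> float:
    """Hypothetical helper: throughput in GFlop/s from an op count and elapsed seconds."""
    return ops / seconds / 1e9


# Dimensions assumed from the "Using Matrix Sizes" line: output 960 x 640, inner dim 640.
ops = gemm_ops(960, 640, 640)
print(ops)  # 786432000, matching "Size = 786432000 Ops"

# Recomputed from the rounded times; these land near, not exactly on, the logged GFlop/s.
print(round(gflops(ops, 0.00013), 1))
print(round(gflops(ops, 0.00058), 1))

# Working backwards from the logged GFlop/s gives the unrounded kernel times.
print(ops / (5889.4058 * 1e9))
print(ops / (1367.3116 * 1e9))
```

Working backwards from the printed GFlop/s values is the more precise check. It shows that the CUBLAS run took roughly 0.000134 s and the CUDA matrixMul kernel roughly 0.000575 s before rounding.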

Comparing GPU results with Host computation...

CUBLAS compares OK

CUDA matrixMul compares OK

