Using Device 0: Tesla P100-SXM2-16GB

Reducing array of type int

65536 elements
256 threads (max)
64 blocks

Reduction, Throughput = 0.0001 GB/s, Time = 2.57959 s, Size = 65536 Elements, NumDevsUsed = 1, Workgroup = 256

GPU result = 8374433
CPU result = 8374433

