To get accurate benchmarks on different hardware, and to support the hypothesis that BrowerWalk is no more efficient on GPUs than on CPUs, a CUDA implementation must be created. To facilitate this, I have written a C++ implementation with notes on where each block of memory should be allocated and where each piece of code should run (CPU vs GPU). See https://github.com/browep/brower-walk/tree/master/parallel for the necessary source files and https://github.com/browep/brower-walk/blob/master/parallel/brower_walk.cpp#L82 specifically for those notes.
Where to get started:
The function `walk_wrapper` has certain parts that need to run on the GPU: allocating the large scratchpad at https://github.com/browep/brower-walk/blob/master/parallel/brower_walk.cpp#L101 and operating on that scratchpad (filling it with pseudo-random data and accessing random indexes) on lines 104-129. There is some SHA work before (hashing the header to get the pseudo-random seed) and after (hashing the picked values) that can be done on the CPU to simplify the implementation, since running it on the GPU would require a GPU implementation of SHA256. The comments in the included source file show where the GPU work starts and stops.
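To make the CPU/GPU split concrete, here is a minimal C++ sketch of the shape of the scratchpad phase. It is not the code from `brower_walk.cpp`: the function name `fill_and_walk`, the `xorshift64` generator, and the rule for choosing the next index are all hypothetical stand-ins chosen for illustration. The point is only that the whole body of this function (allocate, fill, walk) is the part that would move to the device, while the seed comes in from the host and the picked values go back to it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in PRNG (Marsaglia xorshift64). The real generator is
// whatever brower_walk.cpp uses; this one only keeps the sketch deterministic.
static inline uint64_t xorshift64(uint64_t &state) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

// Illustrative only: everything inside this function corresponds to the
// region that would run on the GPU in a CUDA port.
std::vector<uint64_t> fill_and_walk(uint64_t seed, size_t pad_words, size_t steps) {
    // In a CUDA port this buffer would live in device memory.
    std::vector<uint64_t> pad(pad_words);

    // Fill the scratchpad with pseudo-random data derived from the seed.
    uint64_t state = seed ? seed : 1;  // xorshift state must be non-zero
    for (size_t i = 0; i < pad_words; ++i) {
        pad[i] = xorshift64(state);
    }

    // Walk: each value read decides the next index (assumed rule), so the
    // accesses are data-dependent and effectively random.
    std::vector<uint64_t> picked;
    picked.reserve(steps);
    size_t idx = static_cast<size_t>(state % pad_words);
    for (size_t i = 0; i < steps; ++i) {
        uint64_t value = pad[idx];
        picked.push_back(value);
        idx = static_cast<size_t>(value % pad_words);
    }

    // In a CUDA port these values would be copied back to the host,
    // where the final SHA256 is computed.
    return picked;
}
```

In this sketch the seed would be derived on the CPU from the SHA256 of the header, and the returned values would be hashed on the CPU afterwards, which matches the split described above.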
To make sure your code produces the same output as the C++ implementation, run it on the hardcoded input and compare the resulting SHA256. With https://github.com/browep/brower-walk/blob/master/parallel/data.h as input, the reference implementation returns:
final hash: 874f62a25be15cd6017785a8c51cf7bfbf90a207db8db9d0fe20fc7d4ab4262c
path creation time: 0.602000
walk time: 0.203000
total time: 0.805000
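The hash comparison can be automated with a small check like the one below. This is a sketch under assumptions: the names `kExpectedHash`, `to_hex`, and `matches_reference` are hypothetical and not part of the repository, and it assumes the final SHA256 is available as 32 raw bytes. Only the hash needs to match; the timings will differ by machine.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Final hash printed by the reference implementation for data.h.
const std::string kExpectedHash =
    "874f62a25be15cd6017785a8c51cf7bfbf90a207db8db9d0fe20fc7d4ab4262c";

// Convert a 32-byte digest to lowercase hex.
std::string to_hex(const std::array<uint8_t, 32> &digest) {
    static const char *kDigits = "0123456789abcdef";
    std::string out;
    out.reserve(digest.size() * 2);
    for (uint8_t byte : digest) {
        out.push_back(kDigits[byte >> 4]);    // high nibble
        out.push_back(kDigits[byte & 0x0f]);  // low nibble
    }
    return out;
}

// True when the digest matches the reference output.
bool matches_reference(const std::array<uint8_t, 32> &digest) {
    return to_hex(digest) == kExpectedHash;
}
```

A CUDA port could call `matches_reference` on its final digest and exit with a non-zero status on mismatch, which makes the check easy to script.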
Deliverables:
* source files for the CUDA implementation
* program requirements (compiler version, CUDA version, etc.) in a README
* instructions on how to compile and run the program in a README