CUDA Synchronization and Timing

Synchronizing a block

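__syncthreads()
acts as a barrier inside a kernel: each thread waits until every thread in its block has reached the call, and shared and global memory writes made before the barrier are visible to the whole block afterwards.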
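As a minimal sketch of the block-level barrier, the kernel below reverses an array through shared memory. The names reverseInBlock and runReverse and the size N are hypothetical, chosen only for illustration; the example assumes a single block of N threads.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 64;  // illustrative size: one block of N threads

// Each thread stages one element in shared memory, then reads the element
// written by a different thread. The barrier makes sure all writes to
// tile[] have finished before any thread reads from it.
__global__ void reverseInBlock(int *data) {
    __shared__ int tile[N];
    int t = threadIdx.x;
    tile[t] = data[t];
    __syncthreads();            // wait for every thread in the block
    data[t] = tile[N - 1 - t];
}

// Hypothetical helper: runs the kernel and checks that the array is reversed.
bool runReverse() {
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, N>>>(dev);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < N; ++i)
        if (host[i] != N - 1 - i) return false;
    return true;
}

int main() {
    printf("%s\n", runReverse() ? "reversed correctly" : "mismatch");
    return 0;
}
```

Without the barrier, a thread could read tile[N - 1 - t] before the thread that owns that slot has written it. Note that the barrier only covers threads of the same block; it does not synchronize across blocks.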

Synchronizing kernels

cudaDeviceSynchronize()
waits until all preceding commands in all streams of all host threads have completed.

Synchronizing a stream

cudaStreamSynchronize()
takes a stream as a parameter and waits until all preceding commands in the given stream have completed. It can be used to synchronize the host with a specific stream, allowing other streams to continue executing on the device.
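The following sketch shows the host waiting on one stream while a second stream is left to run. The kernel fill, the helper runTwoStreams, and the buffer sizes are assumptions made for this example, not part of the original notes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Writes the same value into every element of out.
__global__ void fill(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: launches work on two streams, waits on the first only.
bool runTwoStreams() {
    const int n = 1 << 16;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int *devA = nullptr, *devB = nullptr, *hostA = nullptr;
    cudaMalloc(&devA, n * sizeof(int));
    cudaMalloc(&devB, n * sizeof(int));
    cudaMallocHost(&hostA, n * sizeof(int));  // pinned host memory for the async copy

    fill<<<blocks, threads, 0, s1>>>(devA, n, 1);
    fill<<<blocks, threads, 0, s2>>>(devB, n, 2);
    cudaMemcpyAsync(hostA, devA, n * sizeof(int), cudaMemcpyDeviceToHost, s1);

    // The host blocks until everything queued on s1 is done.
    // Work queued on s2 is not waited for here.
    cudaStreamSynchronize(s1);
    bool ok = (hostA[0] == 1 && hostA[n - 1] == 1);

    cudaStreamSynchronize(s2);  // drain s2 before releasing its buffer
    cudaFreeHost(hostA);
    cudaFree(devA);
    cudaFree(devB);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    return ok;
}

int main() {
    printf("%s\n", runTwoStreams() ? "stream s1 result ready" : "unexpected result");
    return 0;
}
```

Reading hostA is safe after the first cudaStreamSynchronize() because the kernel and the copy were both queued on s1 and run in issue order.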

Although CUDA kernel launches are asynchronous with respect to the host, all GPU tasks placed in the same stream (the default behaviour) execute sequentially, in the order they were issued.
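If you call printf inside a kernel, remember that the launch is asynchronous and the device-side output is buffered until a synchronization point. Call cudaDeviceSynchronize() after the launch; otherwise the program may exit without printing anything.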
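A minimal sketch of the printf case follows. The kernel name hello and the helper launchAndSync are hypothetical, and the launch configuration of one block with four threads is arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread prints its own index from the device.
__global__ void hello() {
    printf("hello from thread %d\n", threadIdx.x);
}

// Hypothetical helper: launches the kernel, then blocks the host until the
// device is idle, which also flushes the device-side printf buffer.
cudaError_t launchAndSync() {
    hello<<<1, 4>>>();               // returns to the host immediately
    return cudaDeviceSynchronize();  // wait for the kernel and flush output
}

int main() {
    cudaError_t err = launchAndSync();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Removing the cudaDeviceSynchronize() call lets main return right after the launch, so the buffered output may never be shown.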
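CUDA provides two ways to synchronize the host with kernels:

  • Call cudaDeviceSynchronize() for explicit synchronization (it replaces the deprecated cudaThreadSynchronize()). The host blocks and waits until every kernel that has already been submitted has finished.
  • Use cudaMemcpy() for a blocking data transfer, which acts as an implicit synchronization point. The copy does not start until kernels previously launched on the same stream have finished, and a device-to-host copy returns only once the data has arrived on the host.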
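The second option can be sketched as follows, assuming everything runs on the default stream. The kernel addOne and the helper runImplicitSync are hypothetical names used only for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Adds one to every element of data.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: relies on the blocking cudaMemcpy() instead of an
// explicit cudaDeviceSynchronize() to wait for the kernel.
bool runImplicitSync() {
    const int n = 1024;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    addOne<<<(n + 255) / 256, 256>>>(dev, n);  // asynchronous launch

    // No explicit synchronization: this copy is queued behind the kernel on
    // the default stream and returns only after the data is on the host.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    for (int i = 0; i < n; ++i)
        if (host[i] != i + 1) return false;
    return true;
}

int main() {
    printf("%s\n", runImplicitSync() ? "results are complete" : "results are stale");
    return 0;
}
```

This shortcut does not carry over to cudaMemcpyAsync() or to work issued on other non-blocking streams, where an explicit cudaStreamSynchronize() or cudaDeviceSynchronize() is still needed before the host reads the results.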

 
 
