Align your accesses on the GPU global memory

You should simply align your accesses to GPU global memory. An address is aligned when it is a multiple of the size of the object
you are reading or writing, e.g. if you want to read or write an integer, the address should be a multiple of 4. Reading
or writing a char is always aligned.
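
The rule above can be sketched as a small check. This is only an illustration: isAligned and the HOST_DEVICE macro are hypothetical names, not part of the CUDA API, and the macro is there so the same code compiles as plain C++ or inside a .cu file.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: HOST_DEVICE is a local convenience macro, not a CUDA keyword.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: true if ptr is a multiple of accessSize bytes,
// i.e. an object of that size can be read or written there.
HOST_DEVICE inline bool isAligned(const void* ptr, std::size_t accessSize)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % accessSize == 0;
}
```

For example, isAligned(p, sizeof(int)) tells you whether p is safe to access as an int, while isAligned(p, sizeof(char)) is true for every address.
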
Suppose you have a big space allocated with cudaMalloc called dummySpace. This pseudo-kernel code will probably end in
CUDA_EXCEPTION_6 Warp Misaligned Address once the stored integer is accessed as an int:

__global__ void func (char* stringInput, int stringSize, int* integerInput, char* dummySpace) //input: a string and an integer, output: a big space with that string and integer in it
{
    //dummySpace is created by cudaMalloc, so it is aligned to at least 256 bytes
    int counter = 0;
    for(int i=0;i<stringSize;i++)
        dummySpace[counter++] = stringInput[i]; //==> this copies several chars; sizeof(char) is one, so they are always aligned
    for(int i=0;i<(int)sizeof(int);i++)
        dummySpace[counter++] = ((char*)integerInput)[i]; //==> this is going to be a problem: the first loop has advanced counter by stringSize, which is unknown, so the integer can land on an address that is not a multiple of 4, and accessing it as an int (e.g. *(int*)(dummySpace+stringSize)) is then misaligned
}

The fixed one:

__global__ void func (char* stringInput, int stringSize, int* integerInput, char* dummySpace)
{
    int counter = 0;
    for(int i=0;i<stringSize;i++)
        dummySpace[counter++] = stringInput[i];
    int sub = counter % 4; //or 8 or 16..
    if(sub != 0)
        counter += (4-sub); //pad up to the next multiple of 4; nothing to add if counter is already aligned
    for(int i=0;i<(int)sizeof(int);i++)
        dummySpace[counter++] = ((char*)integerInput)[i]; //==> everything is ok as you are saving an integer in an aligned address
}
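
The padding step can also be written as a reusable round-up, which covers the "or 8 or 16" case as well. This is a minimal sketch: alignUp and HOST_DEVICE are hypothetical names, and the bit trick assumes the alignment is a power of two.

```cpp
#include <cstddef>

// Assumption: HOST_DEVICE is a local convenience macro, not a CUDA keyword.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: round offset up to the next multiple of alignment.
// Assumes alignment is a power of two (4, 8, 16, ...).
// An offset that is already aligned is returned unchanged.
HOST_DEVICE inline std::size_t alignUp(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

Inside the kernel, the padding would then read counter = alignUp(counter, sizeof(int)); an offset of 5 becomes 8, and an offset of 8 stays 8.
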

 
