In this code I'm generating a 1D array of floats on the GPU using CUDA. The numbers are between 0 and 1. For my purposes I need them to be between -1 and 1, so I wrote a simple kernel that multiplies each element by 2 and then subtracts 1 from it. However, something is going wrong. When I write my original array to a .bmp I get this: https://i.sstatic.net/tpdhn.png (a typical noise pattern). But when I modify the array with my kernel, I get a blank black picture: https://i.sstatic.net/tisIp.jpg . The program runs, but in the debug output I get this:

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfacc..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: cudaError_enum at memory location 0x003cfb08..

First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe: Microsoft C++ exception: [rethrow] at memory location 0x00000000..

I would be thankful for any help, or even a small hint, in this matter. Thanks!

#include <device_functions.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include "stdafx.h"
#include "EasyBMP.h"
#include <curand.h> //curand.lib must be added in project properties > linker > input
#include "device_launch_parameters.h"

float *heightMap_cpu;
float *randomArray_gpu;
int randCount = 0;
int rozmer = 513;

void createRandoms(int size){
    curandGenerator_t generator;
    cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
    curandCreateGenerator(&generator,CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(generator,(int)time(NULL));
    curandGenerateUniform(generator,randomArray_gpu,size*size);
}

__global__ void polarizeRandoms(int size, float *randomArray_gpu){
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if(index<size*size){
        randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f;
    }
}

//helper function for getting address in 1D using 2D coords
int ad(int x,int y){
    return x*rozmer+y;
}

void printBmp(){
    BMP AnImage;
    AnImage.SetSize(rozmer,rozmer);
    AnImage.SetBitDepth(24);
    int i,j;
    for(i=0;i<=rozmer-1;i++){
        for(j=0;j<=rozmer-1;j++){
            AnImage(i,j)->Red = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Green = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Blue = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Alpha = 0;
        }
    }
    AnImage.WriteToFile("HeightMap.bmp");
}

int main(){
    createRandoms(rozmer);
    polarizeRandoms<<<((rozmer*rozmer)/1024)+1,1024>>>(rozmer,randomArray_gpu);
    heightMap_cpu = (float*)malloc((rozmer*rozmer)*sizeof(float));
    cudaMemcpy(heightMap_cpu,randomArray_gpu,rozmer*rozmer*sizeof(float),cudaMemcpyDeviceToHost);
    printBmp();

    //cleanup
    cudaFree(randomArray_gpu);
    free(heightMap_cpu);
    return 0;
}
  • You might have some issues with float/integer conversion, try replacing the computation in your kernel with: randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f; Commented Sep 10, 2013 at 20:28
  • Thanks, but that didn't solve it :/ Commented Sep 10, 2013 at 20:38
  • Because int is promoted to float before * and +, see Implicit type conversion rules in C++ operators. Commented Sep 10, 2013 at 20:38
  • Thanks Jack, but that's not it. Even when I leave the line written like this: randomArray_gpu[index] = randomArray_gpu[index]; which should just leave the array as it was, the result is the same blank black picture (all zeroes). Commented Sep 10, 2013 at 21:26
  • @TomášTomusJavorský: Please do not "fix" code in your questions. The whole point of Stack Overflow is to leave a question and its answer for the next person who comes along. By "fixing" code, you are effectively destroying the question and making the answers you have received invalid. I have rolled back your edits and corrected the title as you wanted. Please don't edit the code again. Commented Sep 11, 2013 at 11:43

1 Answer

This is wrong:

cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));

We don't use cudaMalloc with __device__ variables. If you do proper CUDA error checking, I'm pretty sure that line will return an error.

If you really want to use a __device__ pointer this way, you need to create a separate normal pointer, cudaMalloc that, then copy the pointer value to the device pointer using cudaMemcpyToSymbol:

float *my_dev_pointer;
cudaMalloc((void**)&my_dev_pointer, size*size*sizeof(float));
cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *));

Whenever you are having trouble with your CUDA programs, you should do proper CUDA error checking. It will likely focus your attention on what is wrong.
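As a rough sketch of what such error checking might look like: the CUDA_CHECK macro and the polarize / polarizeOnGpu names below are illustrative and do not come from the original post. Every runtime call is wrapped, and the kernel launch is followed by cudaGetLastError() and cudaDeviceSynchronize(), because a launch does not return a status by itself.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): print a readable
// message and exit whenever a CUDA runtime call returns an error code.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Maps each element from [0, 1] to [-1, 1].
__global__ void polarize(int n, float *data) {
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if (index < n) {
        data[index] = data[index] * 2.0f - 1.0f;
    }
}

// Copies n floats to the GPU, runs the kernel, and copies the result back,
// checking the status of every step along the way.
void polarizeOnGpu(float *host, int n) {
    float *dev = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    CUDA_CHECK(cudaMalloc((void**)&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // round up
    polarize<<<blocks, threads>>>(n, dev);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors during execution

    CUDA_CHECK(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

Calling polarizeOnGpu(values, count) on a host array either transforms it in place or stops at the first failing call with the file, line, and error string, which is usually enough to locate the offending statement.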

And, yes, kernels can access __device__ variables without the variable being passed explicitly as a parameter to the kernel.
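A small sketch of that pattern, under the assumption that the pointer is declared at file scope with __device__. The names d_data, polarizeGlobal, and polarizeViaSymbol are hypothetical and used only for illustration. The kernel takes no pointer argument; it reads the global device pointer, which the host sets with cudaMemcpyToSymbol.

```cuda
#include <cuda_runtime.h>

// Device-side global pointer (hypothetical name). Host code must not pass
// its address to cudaMalloc; it is set through cudaMemcpyToSymbol instead.
__device__ float *d_data;

// The kernel reads d_data directly; no pointer parameter is needed.
__global__ void polarizeGlobal(int n) {
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if (index < n) {
        d_data[index] = d_data[index] * 2.0f - 1.0f;
    }
}

// Allocates a buffer through an ordinary host-side pointer, publishes that
// pointer value to d_data, runs the kernel, and copies the result back.
// Returns the first error encountered, or cudaSuccess.
cudaError_t polarizeViaSymbol(float *host, int n) {
    float *buffer = NULL;
    size_t bytes = (size_t)n * sizeof(float);

    cudaError_t err = cudaMalloc((void**)&buffer, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpyToSymbol(d_data, &buffer, sizeof(float *));
    if (err == cudaSuccess) {
        err = cudaMemcpy(buffer, host, bytes, cudaMemcpyHostToDevice);
    }
    if (err == cudaSuccess) {
        int threads = 256;
        polarizeGlobal<<<(n + threads - 1) / threads, threads>>>(n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This copy waits for the kernel on the default stream to finish.
        err = cudaMemcpy(host, buffer, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(buffer);
    return err;
}
```

The host keeps its own copy of the pointer value in buffer, so the later cudaMemcpy and cudaFree calls never touch the __device__ symbol directly.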

The CUDA programming guide covers the proper usage of __device__ variables and the API functions that should be used to access them from the host.

2 Comments

Thanks again, fixed now. And yes, this was also in the debug output.
Definitely right. It was too late yesterday to have enough neurons working to realize that there was a __device__ keyword in front of the pointer declaration :-)