CUDA __launch_bounds__() IntelliSense error in Visual Studio 2012

I'm using VS2012 with the CUDA 5.5 SDK.
As soon as I add __launch_bounds__() to a kernel, VS IntelliSense goes mad and reports lots of errors: "incomplete type is not allowed".
I searched the headers for the definition and found this in "host_defines.h":
#define __launch_bounds__(...) \
__annotate__(launch_bounds(__VA_ARGS__))
The project itself compiles fine; only IntelliSense is wrong.
Edit:
Example of a CUDA kernel:
__global__ void kernel(int* result, int* input){} //fine
__global__ void __launch_bounds__(256, 8) kernel(int* result, int* input){} //intellisense error

I've found a solution:
#ifdef __CUDACC__
#define L(x,y) __launch_bounds__(x,y)
#else
#define L(x,y)
#endif
__global__ void L(256, 8) kernel(int* result, int* input){}
This compiles fine without IntelliSense problems.
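The same guard can be tried in a host-only translation unit, which is roughly the situation an editor's parser is in. This is only a sketch under assumptions: KERNEL, LAUNCH_BOUNDS and runOnHost are hypothetical names, and KERNEL stands in for __global__ so the file also compiles without the CUDA headers.

```cpp
#include <cassert>

// KERNEL and LAUNCH_BOUNDS are hypothetical names for this sketch.
// nvcc defines __CUDACC__, so only nvcc sees the real CUDA keywords;
// any other parser (host compiler, editor) sees empty macros.
#ifdef __CUDACC__
#define KERNEL __global__
#define LAUNCH_BOUNDS(maxThreads, minBlocks) __launch_bounds__(maxThreads, minBlocks)
#else
#define KERNEL
#define LAUNCH_BOUNDS(maxThreads, minBlocks)
#endif

// Outside nvcc this parses as an ordinary function definition.
KERNEL void LAUNCH_BOUNDS(256, 8) kernel(int* result, const int* input) {
    assert(result != nullptr && input != nullptr);
    *result = *input + 1;
}

#ifndef __CUDACC__
// Host-only helper (hypothetical): calls the function directly. This is only
// valid when the file is NOT compiled by nvcc; a real launch needs <<<...>>>
// and device memory.
int runOnHost(int value) {
    int out = 0;
    kernel(&out, &value);
    return out;
}
#endif
```

With a plain host compiler, runOnHost(41) simply runs the function body; under nvcc the helper is compiled out and the real __launch_bounds__ attribute is applied.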

How to prevent some values from being optimized out in Linux kernel debugging?

This is code from Linux (5.4.21).
When I use a virtual machine and connect gdb to the Linux process, I can set breakpoints and step through the code. For example, I set a breakpoint on the function arm_smmu_device_probe. When I step with the 'next' command, some values, for example 'smmu' or 'dev' below, are shown as optimized out. How can I keep them from being optimized out so that I can see them in gdb?
static int arm_smmu_device_probe(struct platform_device *pdev)
{
    int irq, ret;
    struct resource *res;
    resource_size_t ioaddr;
    struct arm_smmu_device *smmu;
    struct device *dev = &pdev->dev;
    bool bypass;
    smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
    if (!smmu) {
        dev_err(dev, "failed to allocate arm_smmu_device\n");
        return -ENOMEM;
    }
    smmu->dev = dev;
    if (dev->of_node) {
        ret = arm_smmu_device_dt_probe(pdev, smmu);
    } else {
        ret = arm_smmu_device_acpi_probe(pdev, smmu);
        if (ret == -ENODEV)
            return ret;
    }
I tried changing -O2 to -Og in the top-level Makefile, but then the kernel build fails.
Recently I found out how to do this (from flolo's answer to "Is there a way to tell GCC not to optimise a particular piece of code?").
If you want a function aaa(...) not to be optimized, you can do it like this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
aaa ( ... )
{
function body
}
#pragma GCC pop_options
In some cases, adding this #pragma causes a discrepancy between the declarations in an #include'd header file and the function source. In that case (not often), you also need to put the #pragma around the corresponding #include statement. If linux/bbb.h causes this kind of problem, do this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
#include <linux/bbb.h>
#pragma GCC pop_options
This works reliably, and I'm enjoying(?) debugging and analysis this way.
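Outside the kernel tree, the same pragma pattern can be tried on a small user-space file. The following is a minimal sketch, not kernel code: sumTo, doubleIt and NO_OPT are hypothetical names, and the attribute form is a GCC-specific alternative that is guarded so other compilers simply skip it.

```cpp
#include <cassert>

// Headers are included BEFORE the pragma so their declarations are not affected.

#pragma GCC push_options
#pragma GCC optimize ("O0")
// With GCC this function is compiled without optimization regardless of the
// command-line level, so locals such as `total` and `i` are typically
// inspectable in gdb instead of being reported as optimized out.
int sumTo(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
#pragma GCC pop_options

// Per-function alternative using GCC's optimize attribute.
// NO_OPT is a hypothetical macro name; it expands to nothing on non-GCC compilers.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

NO_OPT int doubleIt(int x) {
    int doubled = x * 2;
    return doubled;
}
```

Both forms are meant as debugging aids; the pragma is convenient for a group of functions, the attribute for a single one.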

CUDA - Generating a random array on the GPU and modifying it with a kernel

In this code I'm generating a 1D array of floats on the GPU using CUDA. The numbers are between 0 and 1. For my purposes I need them to be between -1 and 1, so I wrote a simple kernel that multiplies each element by 2 and then subtracts 1. However, something is going wrong here. When I write my original array to a .bmp I get this: http://i.imgur.com/IS5dvSq.png (a typical noise pattern). But when I try to modify the array with my kernel, I get a blank black picture: http://imgur.com/cwTVPTG . The program runs, but in the debugger I get this:
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: cudaError_enum at memory location
0x003cfacc..
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: cudaError_enum at memory location
0x003cfb08..
First-chance exception at 0x75f0c41f in Midpoint_CUDA_Alpha.exe:
Microsoft C++ exception: [rethrow] at memory location 0x00000000..
I would be thankful for any help or even a small hint in this matter. Thanks!
(edited)
#include <device_functions.h>
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include "stdafx.h"
#include "EasyBMP.h"
#include <curand.h> //curand.lib must be added in project properties > linker > input
#include "device_launch_parameters.h"
float *heightMap_cpu;
float *randomArray_gpu;
int randCount = 0;
int rozmer = 513;
void createRandoms(int size){
    curandGenerator_t generator;
    cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
    curandCreateGenerator(&generator,CURAND_RNG_PSEUDO_XORWOW);
    curandSetPseudoRandomGeneratorSeed(generator,(int)time(NULL));
    curandGenerateUniform(generator,randomArray_gpu,size*size);
}
__global__ void polarizeRandoms(int size, float *randomArray_gpu){
    int index = threadIdx.x + blockDim.x * blockIdx.x;
    if(index<size*size){
        randomArray_gpu[index] = randomArray_gpu[index]*2.0f - 1.0f;
    }
}
//helper function for getting address in 1D using 2D coords
int ad(int x,int y){
    return x*rozmer+y;
}
void printBmp(){
    BMP AnImage;
    AnImage.SetSize(rozmer,rozmer);
    AnImage.SetBitDepth(24);
    int i,j;
    for(i=0;i<=rozmer-1;i++){
        for(j=0;j<=rozmer-1;j++){
            AnImage(i,j)->Red = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Green = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Blue = (int)((heightMap_cpu[ad(i,j)]*127)+128);
            AnImage(i,j)->Alpha = 0;
        }
    }
    AnImage.WriteToFile("HeightMap.bmp");
}
int main(){
    createRandoms(rozmer);
    polarizeRandoms<<<((rozmer*rozmer)/1024)+1,1024>>>(rozmer,randomArray_gpu);
    heightMap_cpu = (float*)malloc((rozmer*rozmer)*sizeof(float));
    cudaMemcpy(heightMap_cpu,randomArray_gpu,rozmer*rozmer*sizeof(float),cudaMemcpyDeviceToHost);
    printBmp();
    //cleanup
    cudaFree(randomArray_gpu);
    free(heightMap_cpu);
    return 0;
}
This is wrong:
cudaMalloc((void**)&randomArray_gpu, size*size*sizeof(float));
We don't use cudaMalloc with __device__ variables. If you do proper CUDA error checking, I'm pretty sure that line will report an error.
If you really want to use a __device__ pointer this way, you need to create a separate normal pointer, cudaMalloc that, then copy the pointer value to the device pointer using cudaMemcpyToSymbol:
float *my_dev_pointer;
cudaMalloc((void**)&my_dev_pointer, size*size*sizeof(float));
cudaMemcpyToSymbol(randomArray_gpu, &my_dev_pointer, sizeof(float *));
Whenever you are having trouble with a CUDA program, you should do proper CUDA error checking. It will likely focus your attention on what is wrong.
And, yes, kernels can access __device__ variables without the variable being passed explicitly as a parameter to the kernel.
The programming guide covers the proper usage of __device__ variables and the API functions that should be used to access them from the host.
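Separately from the allocation issue, the arithmetic in the question's kernel, launch configuration and bitmap conversion can be checked on the host. The sketch below uses hypothetical helper names (blocksFor, polarize, toGray) that mirror the expressions in the question; it runs no GPU code.

```cpp
#include <cassert>

// All names here are hypothetical helpers for checking the arithmetic on the host.

// Number of blocks needed to cover `count` elements with `threadsPerBlock`
// threads each, rounding up.
int blocksFor(int count, int threadsPerBlock) {
    assert(count > 0 && threadsPerBlock > 0);
    return (count + threadsPerBlock - 1) / threadsPerBlock;
}

// Same mapping as the kernel body in the question: [0, 1] -> [-1, 1].
float polarize(float u) {
    return u * 2.0f - 1.0f;
}

// Same mapping as printBmp in the question: [-1, 1] -> [1, 255].
int toGray(float v) {
    return static_cast<int>(v * 127 + 128);
}
```

For rozmer = 513 there are 263169 elements, so 258 blocks of 1024 threads are needed. The question's expression (n/1024)+1 gives the same count here; it only launches one extra block when n is an exact multiple of 1024, which the bounds check in the kernel makes harmless.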

Compiling/Linking CUDA and CPP Source Files

I am working through a sample program that uses both C++ source code as well as CUDA. This is the essential content from my four source files.
matrixmul.cu (main CUDA source code):
#include <stdlib.h>
#include <cutil.h>
#include "assist.h"
#include "matrixmul.h"
int main (int argc, char ** argv)
{
...
computeGold(reference, hostM, hostN, Mh, Mw, Nw); //reference to .cpp file
...
}
matrixmul_gold.cpp (C++ source code, single function, no main method):
void computeGold(float * P, const float * M, const float * N, int Mh, int Mw, int Nw)
{
...
}
matrixmul.h (header for matrixmul_gold.cpp file)
#ifndef matrixmul_h
#define matrixmul_h
extern "C"
void computeGold(float * P, const float * M, const float * N, int Mh, int Mw, int Nw);
#endif
assist.h (helper functions)
I am trying to compile and link these files so that they, well, work. So far I can get matrixmul_gold.cpp compiled using:
g++ -c matrixmul_gold.cpp
And I can compile the CUDA source code without errors using:
nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc -L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib matrixmul.cu -c -lcutil_x86_64
But I just end up with two .o files. I've tried a lot of different ways to link the two .o files, but so far it's a no-go. What's the proper approach?
UPDATE: As requested, here is the output of:
nm matrixmul_gold.o matrixmul.o | grep computeGold
nm: 'matrixmul.o': No such file
0000000000000000 T _Z11computeGoldPfPKfS1_iii
I think the 'matrixmul.o' missing error is because I am not actually getting a successful compile when running the suggested compile command:
nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc -L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib -o matrixmul matrixmul.cu matrixmul_gold.o -lcutil_x86_64
UPDATE 2: I was missing an extern "C" from the beginning of matrixmul_gold.cpp. I added that and the suggested compilation command works great. Thank you!
Conventionally, you link the application with whichever compiler you use to compile the code containing main. In this case main is in the .cu file, so use nvcc to do the linking. Something like this:
$ g++ -c matrixmul_gold.cpp
$ nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc \
-L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib \
-o matrixmul matrixmul.cu matrixmul_gold.o -lcutil_x86_64
This will link an executable binary called matrixmul from matrixmul.cu, matrixmul_gold.o and the cutil library (implicitly nvcc will link the CUDA runtime library and CUDA driver library as well).
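The extern "C" mismatch mentioned in UPDATE 2 can be illustrated in a single file. This is a sketch with a hypothetical function (scaleGold) rather than the elided computeGold body: when the extern "C" declaration is visible before the definition, the definition picks up C linkage and the linker sees the unmangled name.

```cpp
#include <cassert>

// --- what would live in the shared header (hypothetical) ---
extern "C" void scaleGold(float* out, const float* in, int n, float factor);

// --- what would live in the .cpp file ---
// Because the declaration above is visible here, this definition also has
// C linkage and its symbol is the plain name `scaleGold`. If the .cpp file
// neither includes the header nor repeats extern "C", g++ emits a mangled
// C++ symbol instead, and a caller that expects the C name fails to link.
void scaleGold(float* out, const float* in, int n, float factor) {
    assert(out != nullptr && in != nullptr && n >= 0);
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```

Running nm on the resulting object file, as in the question, is a quick way to see which of the two names was actually emitted.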

Cannot find lib file when I've specified the dll

I'm new to Visual C++ and rusty with C++.
I created a DLL project following the Visual C++ directions. Now I want to test my DLL to make sure it's working. I created an empty project and put in tester.cpp. I added the DLL to the project references and to the path. Then I try to run it.
Before I used anything from my DLL (a plain "Hello world!"), it worked. Now that I reference the DLL, it fails. The message is:
1>LINK : fatal error LNK1104: cannot open file 'C:\Users\thom\Documents\cworkspace\barnaby\Debug\barnaby.lib'
What I don't understand is that the reference points to the DLL, which exists at the path above. Here's my code:
#include <iostream>
#include <string>
#include <vector>
#include "barnaby.h"
int main(int argc, char *argv[]){
    std::vector<std::string> tzNames = Barnaby::TimeZoneFunctions::getTimezoneList();
    for(std::vector<std::string>::iterator iter = tzNames.begin(); iter != tzNames.end(); iter++){
        std::cout << *iter << std::endl;
    }
}
ideas?
OK, so I found the answer from http://binglongx.wordpress.com/2009/01/26/visual-c-does-not-generate-lib-file-for-a-dll-project/ which told me to include the following code in my header for the DLL:
#ifdef BARNABY_EXPORTS
#define BARNABY_API __declspec(dllexport)
#else
#define BARNABY_API __declspec(dllimport)
#endif
Then precede each function you export with the macro:
BARNABY_API int add(){
    return 0; // placeholder body
}
All of this would have been prevented either by clicking the Export Symbols box in the new project DLL Wizard or by voting yes for lobotomies for application programmers.
Thanks for the help.
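Put together, the header and source of such a DLL might look like the sketch below. Several things here are assumptions for illustration: the two-argument add signature is hypothetical, BARNABY_EXPORTS is defined in the source only so the sketch builds as one file (normally the DLL project's build settings define it), and the non-Windows branch exists only so the file compiles with other toolchains.

```cpp
#include <cassert>

// Normally set by the DLL project's build settings, not in source.
#define BARNABY_EXPORTS

// --- barnaby.h (sketch) ---
// The DLL project sees dllexport, so the linker emits an import library (.lib);
// projects that consume the DLL see dllimport.
#if defined(_WIN32)
#  ifdef BARNABY_EXPORTS
#    define BARNABY_API __declspec(dllexport)
#  else
#    define BARNABY_API __declspec(dllimport)
#  endif
#else
#  define BARNABY_API   // assumption: no decoration needed outside Windows
#endif

BARNABY_API int add(int a, int b);   // hypothetical exported function

// --- barnaby.cpp (sketch) ---
int add(int a, int b) {
    return a + b;
}
```

If a DLL project exports no symbols at all, the linker has nothing to put into an import library, which is consistent with the missing barnaby.lib in the error above.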

ICU library in Android NDK

I am trying to create a JNI wrapper for a C library that depends on the ICU libraries (libicuuc.so and libicui18n.so).
I tried building ICU4C in my NDK (both standard and CrystaX versions, on a Mac OS X machine) and kept running into linking issues like this:
/Users/kyip/KyVmShared/KyAndroid/myproject/obj/local/armeabi/objs/icuuc/udata.o: In function `openCommonData':
/Users/kyip/KyVmShared/KyAndroid/myproject/jni/icu4c/common/udata.c:836: undefined reference to `icudt42_dat'
/Users/kyip/KyVmShared/KyAndroid/myproject/obj/local/armeabi/objs/icuuc/ustr_wcs.o: In function `_strFromWCS':
/Users/kyip/KyVmShared/KyAndroid/myproject/jni/icu4c/common/ustr_wcs.c:365: undefined reference to `wcstombs'
/Users/kyip/KyVmShared/KyAndroid/myproject/jni/icu4c/common/ustr_wcs.c:415: undefined reference to `wcstombs'
/Users/kyip/KyVmShared/KyAndroid/myproject/jni/icu4c/common/ustr_wcs.c:314: undefined reference to `wcstombs'
/Users/kyip/KyVmShared/KyAndroid/myproject/obj/local/armeabi/objs/icuuc/ustr_wcs.o: In function `_strToWCS':
/Users/kyip/KyVmShared/KyAndroid/myproject/jni/icu4c/common/ustr_wcs.c:164: undefined reference to `mbstowcs'
collect2: ld returned 1 exit status
I also tried the suggestion given in "unicode support in android ndk", but no luck. I got stuck at:
arm-eabi-g++ -I/ky/crystax/android-ndk-r4-crystax/build/platforms/android-8/arch-arm/usr/include/ -O3 -fno-short-wchar -DU_USING_ICU_NAMESPACE=0 -DU_GNUC_UTF16_STRING=0 -fno-short-enums -nostdlib -fPIC -DU_COMMON_IMPLEMENTATION -D_REENTRANT -I../common -I../../icu/source/common -I../../icu/source/i18n "-DDEFAULT_ICU_PLUGINS=\"/usr/local/lib/icu\" " -DU_COMMON_IMPLEMENTATION -DHAVE_CONFIG_H -I/ky/crystax/android-ndk-r4-crystax/build/platforms/android-8/arch-arm/usr/include/ -O3 -fno-short-wchar -DU_USING_ICU_NAMESPACE=0 -DU_GNUC_UTF16_STRING=0 -fno-short-enums -nostdlib -fPIC -DU_COMMON_IMPLEMENTATION -std=c++0x -fvisibility=hidden -c -o errorcode.ao ../../icu/source/common/errorcode.cpp
In file included from ../../icu/source/common/unicode/ptypes.h:23,
from ../../icu/source/common/unicode/umachine.h:52,
from ../../icu/source/common/unicode/utypes.h:36,
from ../../icu/source/common/errorcode.cpp:17:
/ky/crystax/android-ndk-r4-crystax/build/platforms/android-8/arch-arm/usr/include/sys/types.h:122: error: 'uint64_t' does not name a type
make[1]: *** [errorcode.ao] Error 1
make: *** [all-recursive] Error 2
Any help would be appreciated.
It seems that two files are involved in this issue. icu/source/common/unicode/ptypes.h, which includes sys/types.h, contains:
#if ! U_HAVE_UINT64_T
typedef unsigned long long uint64_t;
/* else we may not have a 64-bit type */
#endif
Including sys/types.h from Android pulls in (near line 122/124):
#ifdef __BSD_VISIBLE
typedef unsigned char u_char;
typedef unsigned short u_short;
typedef unsigned int u_int;
typedef unsigned long u_long;
typedef uint32_t u_int32_t;
typedef uint16_t u_int16_t;
typedef uint8_t u_int8_t;
typedef uint64_t u_int64_t;
#endif
It seems that uint64_t has not been declared at the point where u_int64_t is defined from it. Indeed, sys/types.h includes stdint.h, which has the following:
#if !defined __STRICT_ANSI__ || __STDC_VERSION__ >= 199901L
# define __STDC_INT64__
#endif
typedef __int8_t int8_t;
typedef __uint8_t uint8_t;
typedef __int16_t int16_t;
typedef __uint16_t uint16_t;
typedef __int32_t int32_t;
typedef __uint32_t uint32_t;
#if defined(__STDC_INT64__)
typedef __int64_t int64_t;
typedef __uint64_t uint64_t;
#endif
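Most likely __STRICT_ANSI__ is defined here (GCC defines it in strict modes such as the -std=c++0x used in the command above), so __STDC_INT64__ never gets defined. This looks like a bug in the Android sys/types.h: if __STDC_INT64__ is not defined, stdint.h does not define uint64_t, so sys/types.h cannot define u_int64_t. Perhaps the real solution is to modify sys/types.h so that it has: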
#ifdef __BSD_VISIBLE
typedef unsigned char u_char;
typedef unsigned short u_short;
typedef unsigned int u_int;
typedef unsigned long u_long;
typedef uint32_t u_int32_t;
typedef uint16_t u_int16_t;
typedef uint8_t u_int8_t;
#if defined(__STDC_INT64__)
typedef uint64_t u_int64_t;
#endif
#endif
If you fix this, the next error will be in cstring.h:109
icu/source/common/cstring.h:109: error: 'int64_t' has not been declared
If you instead #define __STDC_INT64__ in common/unicode/ptypes.h, it will go substantially farther, but will end at
icu/source/common/ustrenum.cpp:118: error: must #include <typeinfo> before using typeid
There is more information here: http://groups.google.com/group/android-ndk/browse_thread/thread/2ec9dc289d815ba3?pli=1 but no real solution.
I also had this issue:
undefined reference to `mbstowcs'
You should build and link against a higher Android API level.
Note:
I tried to link against the libraries from android-ndk/platforms/android-4... I had thought that 4 was the Android version, but it is the Android API level. API level 4 corresponds to Android 1.6, which is very old, and its libc really has no mbstowcs function.
Here is how I solved the problem. It is dirty, but it works; the library compiled:
1. file: /icu4c/common/cwchar.h
comment out the #if U_HAVE_WCHAR_H and the respective #endif so the <wchar.h> is always included.
replace the previous uprv_wcstombs definition with:
#define uprv_wcstombs(mbstr, wcstr, count) U_STANDARD_CPP_NAMESPACE wcs2mbs(mbstr, wcstr, count)
replace the previous uprv_mbstowcs definition with:
#define uprv_mbstowcs(wcstr, mbstr, count) U_STANDARD_CPP_NAMESPACE mbs2wcs(wcstr, mbstr, count)
2. file: /icu4c/common/ustr_wcs.cpp
somewhere at the top, under the already existing includes add the line:
#include "../wcsmbs.h"
3. create new file "icu4c/wcsmbs.h"
size_t mbs2wcs(wchar_t * __restrict pwcs, const char * __restrict s, size_t n)
{
    mbstate_t mbs;
    const char *sp;
    memset(&mbs, 0, sizeof(mbs));
    sp=s;
    return (mbsrtowcs(pwcs,&sp,n,&mbs));
}
size_t wcs2mbs(char * __restrict s, const wchar_t * __restrict pwcs, size_t n)
{
    mbstate_t mbs;
    const wchar_t *pwcsp;
    memset(&mbs,0,sizeof(mbs));
    pwcsp = pwcs;
    return (wcsrtombs(s,&pwcsp,n,&mbs));
}
Hope it helps.
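The wrappers above rely on the restartable conversion functions mbsrtowcs and wcsrtombs from the C standard library. A minimal desktop sketch of the same idea follows; toWide and toNarrow are hypothetical names, and the example assumes plain ASCII input in the default "C" locale.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical helper: converts a multibyte string to wide characters with the
// restartable mbsrtowcs, starting from a zeroed conversion state.
// Returns the number of wide characters written (excluding the terminator),
// or (size_t)-1 on an invalid sequence.
std::size_t toWide(wchar_t* dst, const char* src, std::size_t dstLen) {
    assert(dst != nullptr && src != nullptr);
    std::mbstate_t state;
    std::memset(&state, 0, sizeof(state));
    const char* cursor = src;   // mbsrtowcs updates this pointer as it converts
    return std::mbsrtowcs(dst, &cursor, dstLen, &state);
}

// The reverse direction, using wcsrtombs.
// Returns the number of bytes written (excluding the terminator).
std::size_t toNarrow(char* dst, const wchar_t* src, std::size_t dstLen) {
    assert(dst != nullptr && src != nullptr);
    std::mbstate_t state;
    std::memset(&state, 0, sizeof(state));
    const wchar_t* cursor = src;
    return std::wcsrtombs(dst, &cursor, dstLen, &state);
}
```

Because the conversion state is passed in explicitly, these functions do not depend on the hidden global state used by mbstowcs and wcstombs, which is what makes them usable as drop-in replacements in the workaround above.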
There has been an effort to provide NDK wrappers for the ICU libraries that are part of the system: https://android-review.googlesource.com/c/153001/.
