How to include several source files in Nsight Eclipse? - linux

I have a project with many source files (examples: main.cu, a.cu, b.cu, c.cu, d.cu). Each with functions and kernel calls (global and device).
In a header (cpu.h) all the structures and definitions to be used in the host side.
Another header (gpu.h) all the structures and definitions to be used in the device side.
If I call kernel functions from main.cu, declared in a.cu. How do I #include those kernel functions declared in a.cu to use in main.cu, without doing the not recommended #include "a.cu"?
Do I create a header a.h with forward declaration of the kernel function in a.cu? Example:
extern void functionA(type);
What about the CUDA kernel functions? Should I create a header file for each source file that is used from another source file?
Where can I find some simple CUDA examples with multiple source files?
I mention Nsight Eclipse because I am having lots of trouble with it and multiple sources.
I am using CUDA 5.5 Toolkits in a Ubuntu Linux and Mac OS environments.
My main development environment is with 4 Tesla C1060 cards in the Ubuntu Linux machine.

Separating kernels. In a project, create two files (I refactored the default Runtime Project template and created device.cu and host.cu)
device.cu:
__device__ unsigned int bitreverse(unsigned int number) {
number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
return number;
}
__global__ void bitreverse(void *data) {
unsigned int *idata = (unsigned int*) data;
idata[threadIdx.x] = bitreverse(idata[threadIdx.x]);
}
host.cu:
extern __global__ void bitreverse(void *data);
...
bitreverse<<<1, WORK_SIZE, WORK_SIZE * sizeof(int)>>>(d);
Separate Compilation
Right-click project, go to properties.
Build/Settings.
Setup build for SM 2.0 or newer.
Select "Separate compilation" radio.
device.cu:
__device__ unsigned int bitreverse(unsigned int number) {
number = ((0xf0f0f0f0 & number) >> 4) | ((0x0f0f0f0f & number) << 4);
number = ((0xcccccccc & number) >> 2) | ((0x33333333 & number) << 2);
number = ((0xaaaaaaaa & number) >> 1) | ((0x55555555 & number) << 1);
return number;
}
host.cu:
extern __device__ unsigned int bitreverse(unsigned int number);
__global__ void bitreverse(void *data) {
unsigned int *idata = (unsigned int*) data;
idata[threadIdx.x] = bitreverse(idata[threadIdx.x]);
}
...
bitreverse<<<1, WORK_SIZE, WORK_SIZE * sizeof(int)>>>(d);
Isolate CUDA code One common pattern is to have CUDA code isolated in .cu files that have a host function wrapping kernel invocation. This way you can link object file produced from such .cu file to host code written in .cpp or .c files. Keep in mind that exported host code function should be qualified with extern "C" to be usable from .c files.
extern declarations can be put in .h file. Note that .h file with CUDA C syntax (__global__ is CUDA C-specific) cannot be included in .cpp or .c.
Adding files to projects
Usually I just copy files to project folder, right-click the project and do "Refresh". Nsight will index them and include in the build.
Excluding files from the build
If you absolutely need to, you can copy device code to headers and include the headers (convention is to have .cuh extension for such header files, though .h works just the same). You can include .cu - the problem is that Nsight considers such files a source files and tries to compile them. You may exclude .cu file from the build by checking "Exclude resource from the build" checkbox in the top of any property page in the Build subtree in build properties.
CUDA multi-file samples
Pretty much any non-trivial sample is broken up into multiple files. Just create an Nsight project from, e.g., "Particles" sample.

Related

Is there a way to make this crossplatform?

Im trying to get rid of a platform dependent code.
This code gets a string formated by a mask ("%Y-%m-%d %H:%M:%S") and convert it to std::tm struct and a std::time seconds.
I want to get rid of the need of preprocessor dependant blocks, as (imho) there must be a standard way of doing it in both platforms in the same way.
strptime does not exists in windows, std::get_time does not exists(?) on linux (g++ c++11).
Is there any way to make this code crossplatform (windows/linux) without using a factory pattern or preprocessor dependant blocks?
KDATETIME& KDATETIME::operator= (const char* k)
{
std::string smask;
smask = (std::string)this->mask;
std::string sk;
sk=k;
tm.tm_isdst=-1;
#ifdef WINDOWS <<---- THIS BLOCK
std::istringstream ss(sk);
ss >> std::get_time(&tm, smask.c_str());
#else
strptime(sk.c_str(), smask.c_str(), &tm);
#endif <<------
this->time = mktime(&tm); // t is now your desired time_t
[...]
}

c++ ~ shared object -> get host application offsets

Im writing a shared library for a FreeBSD application.
This library gets loaded by LD_PRELOAD.
This application has multiple compile-versions, so some function offsets might change and my library wont work there.
Now i want to read the offsets at loading the library.
The offsets are changing, so i think my only way is to read the offsets of specific function names.
The offsets are simply the offsets of functions or labels.
Now the problem - how to do it?
Example
In the first version, i call the main version like that:
int(*main)(int argc, char *argv[])=(int(*)(int,char*[]))0x081F3XXX;
but in the second, the offset has changed:
int(*main)(int argc, char *argv[])=(int(*)(int,char*[]))0x08233XXX;
Programmers (me) are lazy and don't want to compile their libs for every version.. I want to create a lib, that is for every version!
I simply need the offsets of the functions via function name, the rest is no problem..
Thats how i call the library:
LD_PRELOAD="/path/to/library.so" ./executable
or
env LD_PRELOAD="/path/to/library.so" ./executable
Edit with test code
Here my testcode regarding to the comments:
Main.cpp:
#include <stdio.h>
void test() {
printf("Test done.\n");
}
int main(int argc, char * argv[]) {
printf("Program started\n");
test();
}
lib.cpp
#include <stdio.h>
#include <dlfcn.h>
void __attribute__ ((constructor)) my_load(void);
void my_load(void) {
printf("Library loaded\n");
printf("test - offset: 0x%x\n",dlsym(NULL,"test"));
}
test.sh
g++ main.cpp -o program
g++ -shared lib.cpp -o lib.so
env LD_PRELOAD="lib.so" ./program
-> Result:
Library loaded
test - offset: 0x0
Program started
Test done.
Does not seem as would it work :s
Edit 15:45
printf("test - offset: 0x%x\n",dlsym(dlopen("/home/test/test_proc/program",RTLD_GLOBAL),"test"));
This also does not work.. Maybe dlsym is the wrong way?
I reproduced your program on Mac OS X using Clang, and found a solution. First, the boring parts:
To make it compile cleanly I had to change your %x format specifier to %p for the pointer.
Then, on Mac OS X I had to pass RTLD_MAIN_ONLY as the first argument to dlsym(). I guess this is platform-dependent; on Linux it does seem to be NULL as you have.
Now, the meat of the fix!
You're searching with dlsym() for a symbol called test. But there is no such symbol in your application. Why? Because you're using C++, and C++ does "name mangling." You could use any number of tools to figure out the mangled name and try to load that with dlsym(), but it could change with different compilers. So instead, just inhibit name mangling by enclosing your test() function in extern "C":
extern "C" {
void test() {
printf("Test done.\n");
}
}
This fixed it for me:
$ DYLD_INSERT_LIBRARIES=lib.so ./program
Library loaded
test - offset: 0x1027d1eb0
Program started
Test done.

Arduino sketch compiled from Windows or Linux performs differently

I have a very strange problem with a sketch which performs differently if compiled and uploaded to Arduino from Windows XP Home sp3 or Elementary OS Luna (a distro of Ubuntu Linux).
This sketch, between other things, reads a byte from a serial connection (bluetooth) and write it back to serial monitor.
This is what I get if I compile the sketch from WinXP: I sent over BT connection strings from "1" to "7" one time each. The ASCII code of these strings are reduced of 48 to transform string in byte. The result is correct, also functions in pointer array are correctly called.
and here is what I get from Linux. I sent 4 times each string from "1" to "7" to see that result has nothing to do with what I need to get and also is not consistent with the same input data: for example when I send string "2" I get 104 106 106 104..... and same byte 106 is written with different Strings coming from BT.
Also the functions are not called so it means that is not a Serial.print issue.
I'm sure it is a compiling issue because once the sketch is uploaded in Arduino it performs in the same way (correct or not) if I use serial monitor in WinXP or Linux.
Here's the sketch
#include "Arduino.h"
#include <SoftwareSerial.h>
#include <Streaming.h>
#define nrOfCommands 10
typedef void (* CmdFuncPtr) (); // this is a typedef to command functions
//the following declares an arry of 10 function pointers of type DigitFuncPtr
CmdFuncPtr setOfCmds[nrOfCommands] = {
noOp,
leftWindowDown,
leftWindowUp,
bootOpen,
cabinLightOn,
cabinLightOff,
lockOn,
lockOff,
canStart,
canStop
};
#define cmdLeftWindowDown 1
#define cmdLeftWindowUp 2
#define cmdBootOpen 3
#define cmdCabinLightOn 4
#define cmdCabinLightOff 5
#define cmdLockOn 6
#define cmdLockOff 7
#define cmdCanStart 8
#define cmdCanStop 9
#define buttonPin 4 // the number of the pushbutton pin
#define bluetoothTx 2
#define bluetoothRx 3
int buttonState = 0; // variable for reading the pushbutton status
int androidSwitch=0;
byte incomingByte; // incoming data
byte msg[12];
byte msgLen=0;
byte msgIdMsb=0;
byte msgIdLsb=0;
//const byte cmdLeftWindowDown;
SoftwareSerial bluetooth(bluetoothTx,bluetoothRx);
void setup()
{
//Setup usb serial connection to computer
Serial.begin(115200);
//Setup Bluetooth serial connection to android
bluetooth.begin(115200);
//bluetooth.print("$$$");
randomSeed(analogRead(10));
delay(100);
//bluetooth.println("U,9600,E");
//bluetooth.begin(9600);
//time=0;
}
void loop() {
msgIdLsb=random(1,255);
msgIdMsb=random(0,5);
msg[0]=msgIdMsb;
msg[1]=msgIdLsb;
msgLen=random(9);
msg[2]=msgLen;
for (int x=3;x<msgLen+3;x++) {
msg[x]=random(255);
}
for (int x=3+msgLen;x<11;x++) {
msg[x]=0;
}
msg[11]='\n';
// read the state of the pushbutton value:
buttonState = digitalRead(buttonPin);
if ((buttonState == HIGH)||(androidSwitch==HIGH)) {
for (int x=0;x<12;x++) {
Serial<<msg[x]<<" ";
bluetooth.write(uint8_t(msg[x]));
}
Serial<<endl;
}
//Read from bluetooth and write to usb serial
if(bluetooth.available())
{
incomingByte = bluetooth.read()-48;
Serial<<incomingByte<<endl;
if (incomingByte<nrOfCommands)
setOfCmds[incomingByte]();
}
delay(10);
}
void noOp(void)
{
Serial<<"noOp"<<endl;
};
void leftWindowDown(void)
{
Serial<<"leftWindowDown"<<endl;
};
void leftWindowUp(void)
{
Serial<<"leftWindowUp"<<endl;
};
void bootOpen(void)
{
Serial<<"bootOpen"<<endl;
};
void cabinLightOn(void)
{
Serial<<"cabinLightOn"<<endl;
};
void cabinLightOff(void)
{
Serial<<"cabinLightOff"<<endl;
};
void lockOn(void)
{
Serial<<"lockOn"<<endl;
};
void lockOff(void)
{
Serial<<"lockOff"<<endl;
};
void canStart(void)
{
androidSwitch=HIGH;
};
void canStop(void)
{
androidSwitch=LOW;
};
Any help would be very helpful.
Thanks in advance.
I suppose you are using the arduino ide; if not, some of the following might not apply.
First, find out the location of the build directory the ide is using when it compiles and links the code. [One way to find out is to temporarily turn on Verbose output during compilation. (Click File, Preferences, "Show verbose output during compilation".) Click the Verify button to compile the code, and look at the path following the -o option in the first line of output.] For example, on a Linux system the build directory path might be something like /tmp/build3877126492387157498.tmp. In that directory, look for the .cpp file created during compilation.
After you find the .cpp files for your sketch on both systems, copy them onto one system so you can compare them and check for differences. If they are different, one or the other ide might be corrupt or an incorrect include might be occurring.
If the .cpp files differ, compare the compile flags, the header files, etc. I think the flags and AVR header files should be the same on both systems, with the possible exception that MSW files might have carriage return characters after the newline characters. Also check the gcc versions. [I don't have an MSW system to try, but I'm supposing that gcc is used on both systems for AVR cross-compiling. Please correct me if I'm wrong.]
If the .cpp files match, then test the generated binary files to find out where they differ. (For example, if the sketch file is Blink21x.ino, binary files might be Blink21x.cpp.elf or Blink21x.cpp.hex.) If you have a .elf file on both systems [I don't know if the MSW system will generate .elf] use avr-objdump on the Linux system to produce a disassembled version of code:
avr-objdump -d Blink21x.cpp.elf > Blink21x.cpp.lst
Then use diff to locate differences between the two disassembly files. Enough information is available in the .lst file to identify your source line if the difference is due to how your source was compiled, as opposed to a difference in libraries. (In the latter case, enough information is given in the .lst file to identify which library routines differ.)
If you don't have an .elf file on the MSW system, you might try comparing the .hex files. From the location of the difference you can find the relevant line in the Linux-system .elf-disassembly file, and from that can identify a line of your code or a library routine.

Compiling/Linking CUDA and CPP Source Files

I am working through a sample program that uses both C++ source code as well as CUDA. This is the essential content from my four source files.
matrixmul.cu (main CUDA source code):
#include <stdlib.h>
#include <cutil.h>
#include "assist.h"
#include "matrixmul.h"
int main (int argc, char ** argv)
{
...
computeGold(reference, hostM, hostN, Mh, Mw, Nw); //reference to .cpp file
...
}
matrixmul_gold.cpp (C++ source code, single function, no main method):
void computeGold(float * P, const float * M, const float * N, int Mh, int Mw, int Nw)
{
...
}
matrixmul.h (header for matrixmul_gold.cpp file)
#ifndef matrixmul_h
#define matrixmul_h
extern "C"
void computeGold(float * P, const float * M, const float * N, int Mh, int Mw, int Nw);
#endif
assist.h (helper functions)
I am trying to compile and link these files so that they, well, work. So far I can get matrixmul_gold.cpp compiled using:
g++ -c matrixmul_gold.cpp
And I can compile the CUDA source code with out errors using:
nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc -L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib matrixmul.cu -c -lcutil_x86_64
But I just end up with two .O files. I've tried a lot of different ways to link the two .O files but so far it's a no-go. What's the proper approach?
UPDATE: As requested, here is the output of:
nm matrixmul_gold.o matrixmul.o | grep computeGold
nm: 'matrixmul.o': No such file
0000000000000000 T _Z11computeGoldPfPKfS1_iii
I think the 'matrixmul.o' missing error is because I am not actually getting a successful compile when running the suggested compile command:
nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc -L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib -o matrixmul matrixmul.cu matrixmul_gold.o -lcutil_x86_64
UPDATE 2: I was missing an extern "C" from the beginning of matrixmul_gold.cpp. I added that and the suggested compilation command works great. Thank you!
Conventionally you would use whichever compiler you are using to compile the code containing the main subroutine to link the application. In this case you have the main in the .cu, so use nvcc to do the linking. Something like this:
$ g++ -c matrixmul_gold.cpp
$ nvcc -I/home/sbu/NVIDIA_GPU_Computing_SDK/C/common/inc \
-L/home/sbu/NVIDIA_GPU_Computing_SDK/C/lib \
-o matrixmul matrixmul.cu matrixmul_gold.o -lcutil_x86_64
This will link an executable binary called matrimul from matrixmul.cu, matrixmul_gold.o and the cutil library (implicitly nvcc will link the CUDA runtime library and CUDA driver library as well).

SFML tutorial 1: thread problems

Hi so i am using msVS++2010 and have been attempting to set up SFML all day....
I downloaded 1.6 from the site, then rebuilt it in VS2010, but sad to find that this did not result in a sfml-system-d.lib file, which is what i am used to using, and only produced new system-s and system-s-d libs.
I then closely watched this Video to find that he ran his test code by adding the external lib of sfml-system-s-d and so i added the sfml-system-d.dll next the .exe and got the following exact same code the video showed to work:
#include <iostream>
#include <SFML/System.hpp>
int main(int argc, char **argv)
{
sf::Clock clock;
sf::Sleep(0.1f);
while(clock.GetElapsedTime() < 5.0f)
{
std::cout << clock.GetElapsedTime() << std::endl;
sf::Sleep(0.5f);
}
}
obviously clock and sleep are working, but when i add the simple line of code 'sf::Thread thread();' an error box pops up saying "unable to start program," "configuration is incorrect," "Review the manifest file for possible errors," "renstalling my fix it."
Also: when trying to run the first program of the tutorials regarding threads:
#include <SFML/System.hpp>
#include <iostream>
void ThreadFunction(void* UserData)
{
// Print something...
for (int i = 0; i < 10; ++i)
std::cout << "I'm the thread number 1" << std::endl;
}
int main()
{
// Create a thread with our function
sf::Thread Thread(&ThreadFunction);
// Start it !
Thread.Launch();
// Print something...
for (int i = 0; i < 10; ++i)
std::cout << "I'm the main thread" << std::endl;
return EXIT_SUCCESS;
}
I get 8 unresovled external symbols like this one:
1>sfml-system-s-d.lib(Thread.obj) : error LNK2001: unresolved external symbol "__declspec(dllimport) public: int __thiscall std::ios_base::width(int)" (__imp_?width#ios_base#std##QAEHH#Z)
fatal error LNK1120: 8 unresolved externals
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Lastly this is how my project is set up:
include directory to out of the box, freshly downloaded SFML 1.6/include
lib directory to the VS2010 rebuilt SFML (debug/release DLL setting, and static).
extra dependency on sfml-system-s-d.lib file.
out of frusteration i placed every dll file next to the .exe
It sounds like you might not be linking to the CRT when building SFML. (ios_width is iostream, which requires the CRT library.)
You need to rebuild SFML, except this time do the following:
0. copy this list of libs
kernel32.lib
user32.lib
gdi32.lib
winspool.lib
comdlg32.lib
advapi32.lib
shell32.lib
ole32.lib
oleaut32.lib
uuid.lib
odbc32.lib
odbccp32.lib
go into each
individual Project's Properties ->
Configuration -> Linker -> Input.
or if it doesn't have 'Linker' go
into Properties -> Configuration ->
Librarian.
Set "Ignore Default Libraries" to
"no" and it will probably work
If you wanna be 100% safe, click on additional dependencies, expand it, and click "edit." now just paste in the libs above
If your in the 'librarian' tab, set
Link Library Dependencies to YES
repeat steps 1-4 each time you
change the build setting of Debug
DLL, Debug static, etc.
When I recompiled SFML (granted, I have a static compile because 1.6 is the last of the 1.x line, and 2.0 isn't compatible ;)) I had to add those references. It will ignore (and 'warn' about ignoring) anything it doesn't need, but they are the defaults ;)
Unfortunately you'll need to update everything in the SFML solution, as, if I recall correctly, they are all missing the default libraries.

Resources