What's the significance of (void) (&_max1 == &_max2); in the following definition of max found in Linux/tools/lib/lockdep/uinclude/linux/kernel.h?
#define max(x, y) ({ \
typeof(x) _max1 = (x); \
typeof(y) _max2 = (y); \
(void) (&_max1 == &_max2); \
_max1 > _max2 ? _max1 : _max2; })
It helps the compiler detect invalid uses of max(), i.e. with x and y of different, non-comparable types: comparing pointers to distinct types makes the compiler emit a diagnostic. As Sukminder points out, the == check is only used at compile time; it doesn't end up in the resulting binary.
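A minimal sketch of the check in action, assuming GCC or Clang (the macro depends on GNU C statement expressions and typeof). The helper names larger_int and check_max are hypothetical, and __typeof__ is used in place of typeof only so that the sketch also compiles in strict -std modes:

```c
#include <assert.h>

/* Same shape as the macro in the question. */
#define max(x, y) ({                    \
    __typeof__(x) _max1 = (x);          \
    __typeof__(y) _max2 = (y);          \
    (void) (&_max1 == &_max2);          \
    _max1 > _max2 ? _max1 : _max2; })

/* Both arguments are int, so &_max1 == &_max2 compares two int*
   values and the compiler has nothing to complain about. */
int larger_int(int a, int b)
{
    return max(a, b);
}

/* Mixed types such as max(3, 5L) would compare an int* with a long*,
   which GCC reports as
   "comparison of distinct pointer types lacks a cast". */

void check_max(void)
{
    assert(larger_int(3, 7) == 7);
    assert(max(2.5, 1.5) == 2.5);   /* double vs double is fine too */
}
```

Because the comparison result is cast to void and has no side effects, the optimizer is free to drop it, which is why nothing from that line needs to appear in the binary.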
As the title says, I am calculating the exponential of an array of complex numbers in a RawKernel provided by CuPy, but I don't know how to include or invoke the function "cexpf" or "cexp" correctly. The error message always says that "cexpf" is undefined. Does anybody know how to invoke the function correctly? Thank you for any answers.
import cupy as cp
import time
add_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
#include <cupy/complex/cexpf.h>
extern "C" __global__
void test(double* x, double* y, complex<float>* z){
int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
complex<float> value = complex<float>(x[tId_x],y[tId_y]);
z[tId_x*blockDim.y*gridDim.y+tId_y] = cexpf(value);
}''',"test")
x = cp.random.rand(1,8,4096,dtype = cp.float32)
#x = cp.arange(0,4096,dtype = cp.uint32)
y = cp.random.rand(1,8,4096,dtype = cp.float32)
#y = cp.arange(4096,8192,dtype = cp.uint32)
z = cp.zeros((4096,4096), dtype = cp.complex64)
t1 = time.time()
add_kernel((128,128),(32,32),(x,y,z))
print(time.time()-t1)
print(z)
Looking at the headers it seems like you are supposed to just call exp and you don't need to include cupy/complex/cexpf.h yourself, as it is already included implicitly via cupy/complex.cuh.
add_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
extern "C" __global__
void test(double* x, double* y, complex<float>* z){
int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
complex<float> value = complex<float>(x[tId_x],y[tId_y]);
z[tId_x*blockDim.y*gridDim.y+tId_y] = exp(value);
}''',"test")
Generally, CuPy's custom kernel C++ complex number API is taken from Thrust, so you can consult the Thrust documentation. Just skip the thrust:: namespace.
Thrust's API in turn tries to implement the C++ std::complex API for the most part, so the C++ standard library documentation might also be helpful when the Thrust documentation does not go deep enough. Just be careful, because Thrust might not give all the same guarantees, in order to avoid performance problems on the GPU.
In C++, when using vector types from the STL, some compilers such as GCC offer macros like _GLIBCXX_ASSERTIONS that add bounds checks for vectors in debug builds, so that accessing an element at a position past the end of the vector results in an error.
RcppArmadillo has a similar debug macro that can be turned on for bounds checking.
Does Rcpp have something similar for classes like Rcpp::NumericVector or Rcpp::IntegerVector?
Yes, it does, through the bounds-checked at() accessor:
> Rcpp::cppFunction("IntegerVector foo(IntegerVector v) { v.at(11) = 42L; \
return v; }")
> foo( 1:10 )
Error in foo(1:10) : Index out of bounds: [index=11; extent=10].
>
You can also consider RcppArmadillo, where Armadillo has the checks on by default, with the ability to disable them:
> Rcpp::cppFunction("arma::vec foo(arma::vec v) { v(11) = 42L; \
return v; }", depends="RcppArmadillo")
> foo( 1:10 )
Error in foo(1:10) : Mat::operator(): index out of bounds
>
Is the init_task thread per-CPU, or is there just one?
If I iterate starting from init_task, will I reach all of the threads on all of the CPUs?
For example, using the following macro, which is defined in sched.h:
#define do_each_thread(g, t) \
for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
Will this iterate over all threads?
Thanks!
I wrote some code meant to perform very fast calculations. I treat a certain number as known at compile time (I can safely assume it is <15 and >2). It is not a pretty way to implement something, but it allows loop unrolling and makes the code much faster.
(The code was written this way for practical considerations. I know it's not a good way to write code, but in this case, that's the way it has to be.)
The problem is that I need to change the number in the #define and recompile for every value.
(Using Visual C++ 2010)
I figured macros might be the way to go, but I couldn't find out how to do such a thing. My lame attempt failed:
#define myCustomFunc(number) void myF_number() \
{ printf("%d",number); \
}
my goal is that something like this:
create_myfunc(2);
create_myfunc(3);
create_myfunc(4);
will expand to:
void myFunc_2(...)
{ ... #pragma unroll
for (int i<0; i<2;i++)
...
}
void myFunc_3(...)
{ ... #pragma unroll
for (int i<0; i<3;i++)
...
}
etc.
and to be able to call these functions from some function by their name, including the constant in it
if (x==2)
myFunc_2();
But from what I understand, one of the problems with doing such a thing is that such code doesn't expand if it's not inside a function, just gives an error.
You can use a macro parameter in the function name.
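#define create_myfunc(number) \
void myFunc_##number(...) \
{ \
    /* A #pragma directive cannot appear inside a macro body; use _Pragma("unroll") or, on MSVC, __pragma(unroll) here. */ \
    for (int i = 0; i < number; i++) \
    { \
        /* loop body */ \
    } \
}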
Note that I changed i < 0 to i = 0. Was this intended?
You could store all your function pointers in an array and then call the appropriate function using the array index.
Also, if all the optimized functions allow such a hack, you could write your own function in assembler that covers all your cases.
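The function-pointer table suggested above can be sketched roughly as follows. This is an illustration in C99 (it uses designated array initializers, which standard C++ does not accept), and the names CREATE_MYFUNC, dispatch, call_myfunc and check_dispatch, as well as the summing loop body, are hypothetical stand-ins for the real computation:

```c
#include <assert.h>
#include <stddef.h>

/* Generates one function per constant. After expansion the loop bound
   is a literal, so the compiler is free to unroll the loop. */
#define CREATE_MYFUNC(number)                 \
    int myFunc_##number(void)                 \
    {                                         \
        int total = 0;                        \
        for (int i = 0; i < number; i++)      \
            total += i;                       \
        return total;                         \
    }

CREATE_MYFUNC(3)
CREATE_MYFUNC(4)
CREATE_MYFUNC(5)

typedef int (*myfunc_ptr)(void);

/* The array index is the constant itself; unused slots stay NULL. */
static const myfunc_ptr dispatch[] = {
    [3] = myFunc_3,
    [4] = myFunc_4,
    [5] = myFunc_5,
};

/* Returns -1 when no specialised version exists for x. */
int call_myfunc(size_t x)
{
    size_t count = sizeof dispatch / sizeof dispatch[0];
    if (x >= count || dispatch[x] == NULL)
        return -1;
    return dispatch[x]();
}

void check_dispatch(void)
{
    assert(call_myfunc(4) == 6);    /* 0 + 1 + 2 + 3 */
    assert(call_myfunc(2) == -1);   /* not generated */
}
```

The lookup replaces a chain of if (x==2) ... else if (x==3) ... tests with a single indexed call, at the cost of one indirect call per invocation.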
I am trying to generate a comprehensive callgraph (complete with low level calls to Linux, runtime, the lot).
I have statically compiled my source files with "-fdump-rtl-expand" and created RTL files, which I passed to a Perl script called Egypt (which I believe uses Graphviz/Dot) to generate a PDF file of the callgraph. This works perfectly, no problems at all.
However, calls made into some libraries are shown as <built-in>. Is there a way for the callgraph not to print these as <built-in> and instead show the real calls made into the libraries?
Please let me know if the question is unclear.
http://i.imgur.com/sp58v.jpg
Basically, I am trying to keep the callgraph from generating <built-in> nodes.
Is there a way to do that?
-------- CODE ---------
#include <cilk/cilk.h>
#include <stdio.h>
#include <stdlib.h>
unsigned long int t0, t5;
unsigned int NOSPAWN_THRESHOLD = 32;
int fib_nospawn(int n)
{
if (n < 2)
return n;
else
{
int x = fib_nospawn(n-1);
int y = fib_nospawn(n-2);
return x + y;
}
}
// spawning fibonacci function
int fib(long int n)
{
long int x, y;
if (n < 2)
return n;
else if (n <= NOSPAWN_THRESHOLD)
{
x = fib_nospawn(n-1);
y = fib_nospawn(n-2);
return x + y;
}
else
{
x = cilk_spawn fib(n-1);
y = cilk_spawn fib(n-2);
cilk_sync;
return x + y;
}
}
int main(int argc, char *argv[])
{
int n;
long int result;
long int exec_time;
n = atoi(argv[1]);
NOSPAWN_THRESHOLD = atoi(argv[2]);
result = fib(n);
printf("%ld\n", result);
return 0;
}
I compiled the Cilk Library from source.
I might have found a partial solution to the problem:
You need to pass the following option to egypt:
--include-external
This produced a slightly more comprehensive callgraph, although the <built-in> node is still visible:
http://i.imgur.com/GWPJO.jpg?1
Can anyone suggest how I can get more depth in the callgraph?
You can use the GCC VCG Plugin, a GCC plugin that can be loaded when debugging GCC to show internal structures graphically:
gcc -fplugin=/path/to/vcg_plugin.so -fplugin-arg-vcg_plugin-cgraph foo.c
Call-graph is place to store data needed for inter-procedural optimization. All datastructures are divided into three components: local_info that is produced while analyzing the function, global_info that is result of global walking of the call-graph on the end of compilation and rtl_info used by RTL back-end to propagate data from already compiled functions to their callers.