Multiple copies of same function in C - visual-c++

I wrote some code meant to perform very fast calculations. It assumes a certain number is known at compile time (I can safely assume it is <15 and >2). Not a pretty way to implement something, but it allows loop unrolling and makes the code much faster.
(The code was written this way for practical considerations. I know it's not a good way to write code, but in this case, that's the way it has to be.)
The problem is that I need to change the number in a #define and recompile again and again for every value.
(Using Visual C++ 2010.)
I figured macros might be the way to go, but I couldn't find out how to do such a thing. My lame attempt failed:
#define myCustomFunc(number) void myF_number() \
{ printf("%d",number); \
}
My goal is that something like this:
create_myfunc(2);
create_myfunc(3);
create_myfunc(4);
will expand to:
void myFunc_2(...)
{ ... #pragma unroll
for (int i<0; i<2;i++)
...
}
void myFunc_3(...)
{ ... #pragma unroll
for (int i<0; i<3;i++)
...
}
etc.
and to be able to call these functions from another function by name, with the constant embedded in it:
if (x==2)
myFunc_2();
But from what I understand, one of the problems is that such a macro doesn't expand if it's used outside a function; it just gives an error.

You can use a macro parameter in the function name.
#define create_myfunc(number) \
void myFunc_##number() \
{ \
    /* A #pragma directive cannot appear inside a macro; with Visual C++ */ \
    /* use the __pragma() keyword for whatever unroll hint your compiler supports. */ \
    __pragma(unroll) \
    for (int i = 0; i < number; i++) \
    { \
        /* loop body */ \
    } \
}
Note that I changed i < 0 to i = 0. Was this intended?

You could store all your function pointers in an array and then call the appropriate function by array index.
Also, if all the optimized functions allow such a hack, you could write your own function in assembler that covers all your cases.
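For illustration, here is a minimal sketch (the names and the printf body are made up for the example, not from the question) of generating the functions with the macro and dispatching through a function-pointer array indexed by the constant:
#include <stdio.h>

#define create_myfunc(number) \
static void myFunc_##number(void) \
{ \
    int i; \
    for (i = 0; i < number; i++) \
        printf("%d\n", i); \
}

create_myfunc(2)
create_myfunc(3)
create_myfunc(4)

/* indices 0 and 1 are unused; x is known to be in [2, 14] */
static void (*const myFuncs[])(void) = {
    NULL, NULL, myFunc_2, myFunc_3, myFunc_4 /* , ... up to myFunc_14 */
};

int main(void)
{
    int x = 3;
    myFuncs[x]();   /* calls myFunc_3 */
    return 0;
}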


Code analysis C26408 — Replacing the m_pszHelpFilePath variable in InitInstance

In my application's InitInstance function, I have the following code to rewrite the location of the CHM Help Documentation:
CString strHelp = GetProgramPath();
strHelp += _T("MeetSchedAssist.CHM");
free((void*)m_pszHelpFilePath);
m_pszHelpFilePath = _tcsdup(strHelp);
It is all functional but it gives me a code analysis warning:
C26408 Avoid malloc() and free(), prefer the nothrow version of new with delete (r.10).
When you look at the official documentation for m_pszHelpFilePath it does state:
If you assign a value to m_pszHelpFilePath, it must be dynamically allocated on the heap. The CWinApp destructor calls free() with this pointer. You may want to use the _tcsdup() run-time library function to do the allocating. Also, free the memory associated with the current pointer before assigning a new value.
Is it possible to rewrite this code to avoid the code analysis warning, or must I add a __pragma?
You could (should?) use a smart pointer to wrap your reallocated m_pszHelpFilePath buffer. Although this is not entirely trivial, it can be accomplished without too much trouble.
First, declare an appropriate std::unique_ptr member in your derived application class:
class MyApp : public CWinApp // Presumably
{
    // Add this member (requires #include <memory>)...
public:
    std::unique_ptr<TCHAR[]> spHelpPath;
    // ...
};
Then, you will need to modify the code that constructs and assigns the help path as follows (I've changed your C-style cast to an arguably better C++ cast):
// First three (almost) lines as before ...
CString strHelp = GetProgramPath();
strHelp += _T("MeetSchedAssist.CHM");
free(const_cast<TCHAR *>(m_pszHelpFilePath));
// Next, allocate the shared pointer data and copy the string...
size_t strSize = static_cast<size_t>(strHelp.GetLength() + 1);
spHelpPath = std::make_unique<TCHAR[]>(strSize);
_tcscpy_s(spHelpPath.get(), strSize, strHelp.GetString()); // Use the "_s" 'safe' version!
// Now, we can use the embedded raw pointer for m_pszHelpFilePath ...
m_pszHelpFilePath = spHelpPath.get();
So far, so good. The data allocated in the smart pointer will be automatically freed when your application object is destroyed, and the code analysis warnings should disappear. However, there is one last modification we need to make, to prevent the MFC framework from attempting to free our assigned m_pszHelpFilePath pointer. This can be done by setting it to nullptr in the MyApp class override of ExitInstance:
int MyApp::ExitInstance()
{
    // <your other exit-time code>
    m_pszHelpFilePath = nullptr;
    return CWinApp::ExitInstance(); // Call base class
}
However, this may seem like much ado about nothing and, as others have said, you may be justified in simply suppressing the warning.
Technically, you can take advantage of the fact that new / delete map to the usual malloc / free by default in Visual C++, and just go ahead and replace them. Portability won't suffer much, as MFC is not portable anyway. You can of course use unique_ptr<TCHAR[]> instead of direct new / delete, like this:
CString strHelp = GetProgramPath();
strHelp += _T("MeetSchedAssist.CHM");
std::unique_ptr<TCHAR[]> str_old(const_cast<TCHAR*>(m_pszHelpFilePath));
auto str_new = std::make_unique<TCHAR[]>(strHelp.GetLength() + 1);
_tcscpy_s(str_new.get(), strHelp.GetLength() + 1, strHelp.GetString());
m_pszHelpFilePath = str_new.release();
str_old.reset();
For robustness in case the global new operator has been replaced, and by the principle of least surprise, you may prefer to keep free / _tcsdup.
If you replace several of those CWinApp strings, I suggest writing a helper function for them, so that there is a single place with free / _tcsdup and suppressed warnings.
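A minimal sketch of such a helper (the function name is made up for illustration, and it assumes that suppressing the warning at this one site with #pragma warning(suppress : 26408) is acceptable):
void ReplaceCWinAppString(LPCTSTR& member, const CString& value)
{
    // The CWinApp contract requires a CRT-heap buffer here, because
    // ~CWinApp() will call free() on the pointer.
#pragma warning(suppress : 26408)
    free(const_cast<TCHAR*>(member));
#pragma warning(suppress : 26408)
    member = _tcsdup(value.GetString());
}

// Usage:
// ReplaceCWinAppString(m_pszHelpFilePath, strHelp);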

finer-grained control than with LD_PRELOAD?

I have a dynamically linked ELF executable on Linux, and I want to swap a function in a library it is linked against. With LD_PRELOAD I can, of course, supply a small library with a replacement for the function that I compile myself. However, what if in the replacement I want to call the original library function? For example, the function may be srand(), and I want to hijack it with my own seed choice but otherwise let srand() do whatever it normally does.
If I were linking to make said executable, I would use the wrap option of the linker but here I only have the compiled binary.
One trivial solution I see is to cut and paste the source code for the original library function into the replacement - but I want to handle the more general case when the source is unavailable. Or, I could hex edit the needed extra code into the binary but that is specific to the binary and also time consuming. Is something more elegant possible than either of these? Such as some magic with the loader?
(Apologies if I'm not using the terminology precisely.)
Here's an example of wrapping malloc:
// Build as a shared library and load it with LD_PRELOAD; the process will then
// call this instead of malloc(3), and each call is reported to stderr.
// _GNU_SOURCE must be defined before <dlfcn.h> so that RTLD_NEXT is declared.
// (get_seconds() and send_signal() are the author's own helpers, not shown here.)
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <signal.h>

void *malloc(size_t size)
{
// on first call, get a function pointer for malloc(3)
static void *(*real_malloc)(size_t) = NULL;
static int malloc_signal = 0;
if(!real_malloc)
{
// real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
*(void **) (&real_malloc) = dlsym(RTLD_NEXT, "malloc");
}
assert(real_malloc);
if (malloc_signal == 0)
{
char *string = getenv("MW_MALLOC_SIGNAL");
if (string != NULL)
{
malloc_signal = 1;
}
}
// call malloc(3)
void *retval = real_malloc(size);
fprintf(stderr, "MW! %f malloc size %zu, address %p\n", get_seconds(), size, retval);
if (malloc_signal == 1)
{
send_signal(SIGUSR1);
}
return retval;
}
The canonical answer is to use dlsym(RTLD_NEXT, ...).
From the man page:
RTLD_NEXT
Find the next occurrence of the desired symbol in the search
order after the current object. This allows one to provide a
wrapper around a function in another shared object, so that,
for example, the definition of a function in a preloaded
shared object (see LD_PRELOAD in ld.so(8)) can find and invoke
the "real" function provided in another shared object (or for
that matter, the "next" definition of the function in cases
where there are multiple layers of preloading).
See also this article.
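For the srand() case from the question, a minimal sketch along the same lines (the file and library names are made up for the example, and the fixed seed is arbitrary):
// srandhook.c
// build: gcc -shared -fPIC srandhook.c -o libsrandhook.so -ldl
// run:   LD_PRELOAD=./libsrandhook.so ./your_program
#define _GNU_SOURCE
#include <dlfcn.h>

void srand(unsigned int seed)
{
    // on first call, find the real srand(3) further down the search order
    static void (*real_srand)(unsigned int) = 0;
    if (!real_srand)
        *(void **) (&real_srand) = dlsym(RTLD_NEXT, "srand");
    (void)seed;             // ignore the caller's seed
    real_srand(12345u);     // substitute our own choice; otherwise srand() behaves as usual
}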
Just for completeness, regarding editing the function name in the binary - I checked and it works but not without potential hiccups. E.g., in the example I mentioned, one can find the offset of "srand" (e.g., via strings -t x exefile | grep srand) and hex edit the string to "sran0". But names of symbols may be overlapping (to save space), so if the code also calls rand(), then there is only one "srand" string in the binary for both. After the change the unresolved references will then be to sran0 and ran0. Not a showstopper, of course, but something to keep in mind. The dlsym() solution is certainly more flexible.

Parallelism; recursive or iterative?

I've been asked a very interesting question about threads and how to implement them, specifically recursively or iteratively. This is in the context of sorting algorithms like quicksort.
When you have an array of elements that needs sorting, would you rather implement a tree structure of threads (so, recursive) that keeps spawning new threads until the sorting-size threshold is reached, or would you rather divide the array into even chunks from the very beginning and spawn threads for them?
Example recursive pseudocode:
void sort(int array[], int start, int end){
if(array.size > THRESHOLD){
//partition logic with calls to sort()
}
//sorting logic
}
Example iterative pseudocode:
void sort(int array[], int start, int end){
if(array.size > THRESHOLD){
int numberOfChunks = array.size / 1000;
for(int i = 0; i < numberOfChunks; i++){
//spawn thread for every chunk with calls to sort(), technically also recursion but only once and can be rewritten easily
}
}
//sorting logic
}
Assume the calls to sort are separate threads. I didn't want to clutter the examples with boilerplate. Try and look past the multitude of errors.
What I've been taught in college for quicksort is to use the recursive method. But (and this is my opinion and nothing else) I think recursion has a tendency to make code unreadable and complex. Sure, it looks fancy and works well, but it's harder to read.
What is the recommended way of doing things here?
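For concreteness, here is a compilable sketch of the recursive variant described above (C++ with std::thread; the names are illustrative and the partition step is replaced with std::nth_element to keep the example short):
#include <algorithm>
#include <thread>
#include <vector>

const int THRESHOLD = 1000;

// sorts a[lo, hi)
void psort(std::vector<int>& a, int lo, int hi)
{
    if (hi - lo <= THRESHOLD) {
        std::sort(a.begin() + lo, a.begin() + hi);   // small chunk: sort in this thread
        return;
    }
    int mid = lo + (hi - lo) / 2;
    // partition: everything before mid is <= everything from mid onwards
    std::nth_element(a.begin() + lo, a.begin() + mid, a.begin() + hi);
    std::thread left(psort, std::ref(a), lo, mid);   // spawn a thread for one half
    psort(a, mid, hi);                               // recurse on the other half in this thread
    left.join();
}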

Debugging in Threading Building Blocks

I would like to program with tasks in Threading Building Blocks. But how does one do the debugging in practice?
In general the print method is a solid technique for debugging programs.
In my experience with MPI parallelization, the right way to do logging is for each process to print its debugging information to its own file (say "debug_irank", with irank the rank in MPI_COMM_WORLD) so that the logical errors can be found.
How can something similar be achieved with TBB? It is not clear how to access the thread number in the thread pool, as this is obviously something internal to TBB.
Alternatively, one could add an additional index specifying the rank when a task is generated, but this makes the code rather complicated, since the whole program has to take care of it.
First, get the program working with 1 thread. To do this, construct a task_scheduler_init as the first thing in main, like this:
#include "tbb/tbb.h"
int main() {
tbb::task_scheduler_init init(1);
...
}
Be sure to compile with the macro TBB_USE_DEBUG set to 1 so that TBB's checking will be enabled.
If the single-threaded version works, but the multi-threaded version does not, consider using Intel Inspector to spot race conditions. Be sure to compile with TBB_USE_THREADING_TOOLS so that Inspector gets enough information.
Otherwise, I usually first start by adding assertions, because the machine can check assertions much faster than I can read log messages. If I am really puzzled about why an assertion is failing, I use printfs and task ids (not thread ids). Easiest way to create a task id is to allocate one by post-incrementing a tbb::atomic<size_t> and storing the result in the task.
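A minimal sketch of that task-id idea (the type and variable names are just for illustration):
#include "tbb/atomic.h"
#include <cstddef>

tbb::atomic<size_t> next_task_id;            // zero-initialized at namespace scope

struct my_task_data {
    size_t id;
    my_task_data() : id(next_task_id++) {}   // post-increment hands each task a unique id
};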
If I'm having a really bad day and the printfs are changing program behavior so that the error does not show up, I use "delayed printfs". Stuff the printf arguments in a circular buffer, and run printf on the records later after the failure is detected. Typically for the buffer, I use an array of structs containing the format string and a few word-size values, and make the array size a power of two. Then an atomic increment and mask suffices to allocate slots. E.g., something like this:
#include "tbb/atomic.h"

const size_t bufSize = 1024;   // must be a power of two

struct record {
    const char* format;
    void *arg0, *arg1;
};

tbb::atomic<size_t> head;      // zero-initialized at namespace scope
record buf[bufSize];

void recf(const char* fmt, void* a, void* b) {
    record* r = &buf[head++ & (bufSize - 1)];
    r->format = fmt;
    r->arg0 = a;
    r->arg1 = b;
}

void recf(const char* fmt, int a, int b) {
    record* r = &buf[head++ & (bufSize - 1)];
    r->format = fmt;
    r->arg0 = (void*)a;
    r->arg1 = (void*)b;
}
The two recf routines record the format and the values. The casting is somewhat abusive, but on most architectures you can print the record correctly in practice with printf(r->format, r->arg0, r->arg1) even if the 2nd overload of recf created the record.
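A minimal sketch of replaying the buffer once a failure has been detected (the function name is illustrative):
#include <stdio.h>

void dump_records() {
    size_t end = head;                                   // snapshot of the slot counter
    size_t begin = end > bufSize ? end - bufSize : 0;    // only the last bufSize records survive
    for (size_t i = begin; i != end; ++i) {
        const record& r = buf[i & (bufSize - 1)];
        if (r.format)
            fprintf(stderr, r.format, r.arg0, r.arg1);   // replay in allocation order
    }
}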

Strange behavior of printk in linux kernel module

I am writing code for a Linux kernel module and experiencing some strange behavior in it.
Here is my code:
int data = 0;
void threadfn1()
{
int j;
for( j = 0; j < 10; j++ )
printk(KERN_INFO "I AM THREAD 1 %d\n",j);
data++;
}
void threadfn2()
{
int j;
for( j = 0; j < 10; j++ )
printk(KERN_INFO "I AM THREAD 2 %d\n",j);
data++;
}
static int __init abc_init(void)
{
struct task_struct *t1 = kthread_run(threadfn1, NULL, "thread1");
struct task_struct *t2 = kthread_run(threadfn2, NULL, "thread2");
while( 1 )
{
printk("debug\n"); // runs ok
if( data >= 2 )
{
kthread_stop(t1);
kthread_stop(t2);
break;
}
}
printk(KERN_INFO "HELLO WORLD\n");
}
Basically I was trying to wait for threads to finish and then print something after that.
The above code does achieve that goal, but only WITH the printk("debug\n"); line present. As soon as I comment out printk("debug\n"); to run the code without the debug output and load the module through the insmod command, the module hangs, and it seems like it gets lost in recursion. I don't know why printk affects my code in such a big way.
Any help would be appreciated.
Regards.
You're not synchronizing access to the data variable. What happens is that the compiler will generate an infinite loop. Here is why:
while( 1 )
{
if( data >= 2 )
{
kthread_stop(t1);
kthread_stop(t2);
break;
}
}
The compiler can detect that the value of data never changes within the while loop. Therefore it can completely move the check out of the loop and you'll end up with a simple
while (1) {}
If you insert printk, the compiler has to assume that the global variable data may change (after all, the compiler has no idea what printk does in detail), so your code will start to work again (in an undefined-behavior kind of way).
How to fix this:
Use proper thread synchronization primitives. If you wrap the access to data in a code section protected by a mutex, the code will work. You could also replace the variable data with a counted semaphore instead.
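A minimal sketch of the mutex variant (standard kernel mutex API; only the relevant fragments of the module are shown):
#include <linux/mutex.h>

static DEFINE_MUTEX(data_lock);
static int data = 0;

/* in each thread function, instead of a bare data++: */
mutex_lock(&data_lock);
data++;
mutex_unlock(&data_lock);

/* in abc_init, declare: int finished; then in the wait loop: */
mutex_lock(&data_lock);
finished = (data >= 2);
mutex_unlock(&data_lock);
if (finished) {
    kthread_stop(t1);
    kthread_stop(t2);
    break;
}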
Edit:
This link explains how locking in the linux-kernel works:
http://www.linuxgrill.com/anonymous/fire/netfilter/kernel-hacking-HOWTO-5.html
With the call to printk() removed the compiler is optimising the loop into while (1);. When you add the call to printk() the compiler is not sure that data isn't changed and so checks the value each time through the loop.
You can insert a barrier into the loop, which forces the compiler to reevaluate data on each iteration, e.g.:
while (1) {
if (data >= 2) {
kthread_stop(t1);
kthread_stop(t2);
break;
}
barrier();
}
Maybe data should be declared volatile? It could be that the compiler is not going to memory to get data in the loop.
Nils Pipenbrinck's answer is spot on. I'll just add some pointers.
Rusty's Unreliable Guide to Kernel Locking (every kernel hacker should read this one).
Goodbye semaphores?, The mutex API (lwn.net articles on the new mutex API introduced in early 2006, before that the Linux kernel used semaphores as mutexes).
Also, since your shared data is a simple counter, you can just use the atomic API (basically, declare your counter as atomic_t and access it using atomic_* functions).
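A minimal sketch of the atomic-counter variant (fragments only; the header is <linux/atomic.h> on newer kernels, <asm/atomic.h> on older ones):
#include <linux/atomic.h>

static atomic_t data = ATOMIC_INIT(0);

/* in each thread function, instead of data++: */
atomic_inc(&data);

/* in abc_init's wait loop: */
if (atomic_read(&data) >= 2) {
    kthread_stop(t1);
    kthread_stop(t2);
    break;
}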
Volatile might not always be a "bad idea". One needs to separate the case when volatile is needed from the case when a mutual exclusion mechanism is needed; it is suboptimal when one mechanism is used, or misused, in place of the other. In the above case I would suggest that, for an optimal solution, both mechanisms are needed: a mutex to provide mutual exclusion, and volatile to indicate to the compiler that the data must be read fresh from memory each time. Otherwise, at some optimization levels (-O2, -O3), compilers might inadvertently leave out the needed code.
