GLIB: g_atomic_int_get becomes NO-OP? - linux

In a larger piece of code, I noticed that the g_atomic_* functions in glib were not doing what I expected, so I wrote this simple example:
#include <stdlib.h>
#include "glib.h"
#include "pthread.h"
#include "stdio.h"
void *set_foo(void *ptr) {
g_atomic_int_set(((int*)ptr), 42);
return NULL;
}
int main(void) {
int foo = 0;
pthread_t other;
if (pthread_create(&other, NULL, set_foo, &foo)== 0) {
pthread_join(other, NULL);
printf("Got %d\n", g_atomic_int_get(&foo));
} else {
printf("Thread did not run\n");
exit(1);
}
}
When I compile this with GCC's '-E' option (stop after pre-processing), I notice that the call to g_atomic_int_get(&foo) has become:
(*(&foo))
and g_atomic_int_set(((int*)ptr), 42) has become:
((void) (*(((int*)ptr)) = (42)))
Clearly I was expecting some atomic compare and swap operations, not just simple (thread-unsafe) assignments. What am I doing wrong?
For reference my compile command looks like this:
gcc -m64 -E -o foo.E `pkg-config --cflags glib-2.0` -O0 -g foo.c

The architecture you are on does not require a memory barrier for atomic integer set/get operations, so the transformation is valid.
Here's where it's defined: http://git.gnome.org/browse/glib/tree/glib/gatomic.h#n60
This is a good thing, because otherwise you'd need to lock a global mutex for every atomic operation.

Related

Self-modifying code does not work under advanced optimization

I tried to write a self-modifying code. (Refer to the link https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/) The self-modifying code works when there is no optimization (-o0)
gcc -O0 smc.c -o smc
Calling foo...
i: 1
Calling foo...
i: 42
while when the optimization level increases (-O1-O2-O3..) Self-modifying code no longer works.
gcc -O3 smc.c -o smc
Calling foo...
i: 1
Calling foo...
i: 1
Is it possible to make self-modifying code work with -O3 level optimization, and what should I do?
The program is as follows:
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
void foo(void);
int change_page_permissions_of_address(void *addr);
int main(void) {
void *foo_addr = (void*)foo;
// Change the permissions of the page that contains foo() to read, write, and execute
// This assumes that foo() is fully contained by a single page
if(change_page_permissions_of_address(foo_addr) == -1) {
fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
return 1;
}
// Call the unmodified foo()
puts("Calling foo...");
foo();
// Change the immediate value in the addl instruction in foo() to 42
unsigned char *instruction = (unsigned char*)foo_addr + 22;
// Notice that 22 here is the offset that I compiled. Different compilations and machine offsets may vary.
*instruction = 0x2A;
// Call the modified foo()
puts("Calling foo...");
foo();
return 0;
}
void foo(void) {
int i=0;
i++;
printf("i: %d\n", i);
}
int change_page_permissions_of_address(void *addr) {
// Move the pointer to the page boundary
int page_size = getpagesize();
addr -= (unsigned long)addr % page_size;
if(mprotect(addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1)
{
return -1;
}
return 0;
}

which openmp schedule am I running?

How to check in runtime the openmp schedule?
I compile my code with a parallel loop and runtime scheduele
#pragma omp parallel for schedule(runtime) collapse(2)
for(j=1;j>-2;j-=2){
for(i=0;i<n;i++){
//nested loop code here
}
}
and I specify the environment variable OMP_SCHEDULE=dynamic,50.
How can I check in runtime that my program actually took the OMP_SCHEDULEvariable ?
I am using openmp 3.1 with gcc 4.7.3
I downloaded http://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
Then went to the section "C/C++ Stub Routines" and found this
void omp_get_schedule(omp_sched_t *kind, int *chunk_size)
{
*kind = omp_sched_static;
*chunk_size = 0;
}
Then made this test
/*
typedef enum omp_sched_t {
omp_sched_static = 1,
omp_sched_dynamic = 2,
omp_sched_guided = 3,
omp_sched_auto = 4
} omp_sched_t;
*/
#include <omp.h>
#include <stdio.h>
int main(void) {
omp_sched_t kind;
int chunk;
omp_get_schedule(&kind, &chunk);
printf("%d %d\n", kind, chunk);
}
and compiled
gcc -fopenmp -O3 foo.c
and then
export OMP_SCHEDULE=static,50
./a.out
1 50
export OMP_SCHEDULE=dynamic,100
2 100
Note that omp_get_schedule only reports the runtime scheduling definition OMP_SCHEDULE. If you change the scheduling with e.g.
#pragma omp parallel for schedule(static,1)
and define OMP_SCHEDULE=dynamic,100 then omp_get_schedule still reports dynamic scheduling and chunk size 100.

Issue with dlopen and weak symbols

I have the following sequence
executable (main) ---- (dlopen)---> libapp.so ---(dynamically linked)--> libfoo.so
libfoo.so in turn dynamically links to libfoo_strong.so. libfoo.so invokes a function from
libfoo_strong.so, but also has a weak definition (within foo.c which is compiled into libfoo.so).
Now, libapp.so invokes a function from libfoo.so (say invoke_foo_func_ptr() and this function >invokes a function pointer which stores the symbol that is defined as weak. My expectation is that >invokes_foo_func_ptr invokes the strong symbol, but it always goes to the weak symbol. Pls see the >code below for details.
PS: Dont ask me to explain the reason particular sequence of execution, but I am open to >workarounds.
foo_strong.c --> gcc -g -fPIC -shared -rdynamic foo_strong.o -o libfoo_strong.so
foo.c: --> gcc -g -fPIC -shared -rdynamic -L/users/ardesiga/cprogs/ld_r foo.o -o libfoo.so
app.c: --> gcc -g -fPIC -shared -rdynamic -L/users/ardesiga/cprogs/ld_r -lfoo -lfoo_strong app.o -o > libapp.so
/* foo_strong.c */
int
foo_weak_func (char *msg)
{
printf("[%s:%s] Reached strong, with msg: %s\n", __FILE__, __FUNCTION__, msg);
}
/* foo.c */
#include <stdio.h>
#include <stdlib.h>
#include "foo_ext.h"
#include "foo_weak.h"
int __attribute__ ((weak)) foo_weak_func (char *msg)
{
printf("[%s:%s], Reached weak, with msg: %s\n", __FILE__, __FUNCTION__, msg);
}
typedef int (*func_ptr_t) (char *msg);
func_ptr_t foo_func_ptr = foo_weak_func;
void
invoke_foo_func_ptr (char *msg)
{
printf("Inside %s\n", __FUNCTION__);
if (foo_func_ptr) {
(*foo_func_ptr)(msg);
} else {
printf("foo_func_ptr is NULL\n");
}
}
/* app.c */
#include "foo.h"
int
app_init_func (char *msg)
{
printf("Inside %s:%s\n", __FILE__, __FUNCTION__);
invoke_foo_func_ptr(msg);
}
/* main.c */
int main (int argc, char *argv[])
{
void *dl_handle;
char *lib_name;
app_init_func_t app_init_func;
if (!(argc > 1)) {
printf("Library is not supplied, loading libapp.so\n");
lib_name = strdup("libapp.so");
} else {
lib_name = strdup(argv[2]);
}
printf("Loading library: %s\n", lib_name);
dl_handle = dlopen(lib_name, RTLD_LAZY);
if (!dl_handle) {
printf("Failed to dlopen on %s, error: %s\n", lib_name, dlerror());
exit(1);
}
app_init_func = dlsym(dl_handle, "app_init_func");
if (app_init_func) {
(*app_init_func)("Called via dlsym");
} else {
printf("dlsym did not file app_init_func");
}
return (0);
}
My expectation is that invokes_foo_func_ptr invokes the strong symbol, but it always goes to the weak symbol.
Your expectation is incorrect and everything is working as designed.
Weak symbols lose to strong symbols when you link a single ELF binary. If you were to link a normal (strong) function foo and a weak foo into libfoo.so, the the strong definition would have won.
When you have multiple ELF images, some with strong foo, and some with weak foo, the first ELF image to define foo (regardless of whether weak or strong) wins. The loader will simply not look for any additional ELF images in its search scope once it finds the first image that does provide a definition for foo.
Dont ask me to explain the reason particular sequence of execution
That is quite an obnoxious thing to say.
I have a guess as to what your reason may be, and a solution for it, but you'll have to provide your reason first.

C++ 11 threading error

I have a C++11 program that gives me this error:
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Code:
const int popSize=100;
void initializePop(mt3dSet mt3dPop[], int index1, int index2, std::string ssmName, std::string s1, std::string s2, std::string s3, std::string s4, std::string mt3dnam, std::string obf1, int iNSP, int iNRM, int iNCM, int iNLY, int iOPT, int iNPS, int iNWL, int iNRO, int ssmPosition, int obsPosition ){
if((index1 >= index2)||index1<0||index2>popSize){
std::cout<<"\nInitializing population...\nIndex not valid..\nQuitting...\n";
exit(1);
}
for(int i=index1; i<index2; i++){
mt3dPop[i].setSSM(ssmName, iNSP, iNRM, iNCM, iNLY);
mt3dPop[i].setNam(toString(s1,s3,i));
mt3dPop[i].setObsName(toString(s1,s4,i));
mt3dPop[i].setSsmName(toString(s1,s2,i));
mt3dPop[i].getSSM().generateFl(toString(s1,s2,i),iOPT,iNPS);
mt3dPop[i].generateNam(mt3dnam, ssmPosition, obsPosition);
mt3dPop[i].setFitness(obf1, iNWL, iNRO);
}}
void runPackage(ifstream& inFile){
//all variables/function parameters for function call are read from inFile
unsigned int numThreads = std::thread::hardware_concurrency();// =4 in my computer
std::vector<std::thread> vt(numThreads-1);//three threads
for(int j=0; j<numThreads-1; j++){
vt[j]= std::thread(initializePop,mt3dPop,j*popSize/numThreads, (j+1)*popSize/numThreads, ssmName, s1,s2, s3, s4, mt3dnam,obf1,iNSP, iNRM, iNCM, iNLY, iOPT, iNPS, iNWL, iNRO, ssmPosition, obsPosition ); //0-24 in thread 1, 25-49 in thread 2, 50-74 in thread 3
}
//remaining 75 to 99 in main thread
initializePop(mt3dPop,(numThreads-1)*popSize/numThreads, popSize, ssmName, s1,s2, s3, s4, mt3dnam,obf1,iNSP, iNRM, iNCM, iNLY, iOPT, iNPS, iNWL, iNRO, ssmPosition, obsPosition);
for(int j=0; j<numThreads-1; j++){
vt[j].join();
}}
What does the error mean and how do I fix it?
You need to link correctly, and compile with -std=c++11 - see this example.
I'm guessing you had the same problem as me! (I compiled with -pthread and -std=c++11 rather than linking with those two. (But you will need to compile with std=c++11 as well as linking with it.))
Probably you want to do something like this:
g++ -c <input_files> -std=c++11
then
g++ -o a.out <input_files> -std=c++11 -pthread
... at least I think that's right. (Someone to confirm?)
How to reproduce these errors:
#include <iostream>
#include <stdlib.h>
#include <string>
using namespace std;
void task1(std::string msg){
cout << "task1 says: " << msg;
}
int main() {
std::thread t1(task1, "hello");
t1.join();
return 0;
}
Compile and run:
el#defiant ~/foo4/39_threading $ g++ -o s s.cpp
s.cpp: In function ‘int main()’:
s.cpp:9:3: error: ‘thread’ is not a member of ‘std’
s.cpp:9:15: error: expected ‘;’ before ‘t1’
You forgot to #include <thread>, include it and try again:
#include <iostream>
#include <stdlib.h>
#include <string>
#include <thread>
using namespace std;
void task1(std::string msg){
cout << "task1 says: " << msg;
}
int main() {
std::thread t1(task1, "hello");
t1.join();
return 0;
}
Compile and run:
el#defiant ~/foo4/39_threading $ g++ -o s s.cpp -std=c++11
el#defiant ~/foo4/39_threading $ ./s
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Aborted (core dumped)
More errors, as you defined above, because you didn't specify -pthread in the compile:
el#defiant ~/foo4/39_threading $ g++ -o s s.cpp -pthread -std=c++11
el#defiant ~/foo4/39_threading $ ./s
task1 says: hello
Now it works.

Segmentation fault when calling backtrace() on Linux x86

I am attempting to do the following - write a wrapper for the pthreads library that will log some information whenever each of its APIs it called.
One piece of info I would like to record is the stack trace.
Below is the minimal snippet from the original code that can be compiled and run AS IS.
Initializations (file libmutex.c):
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <dlfcn.h>
static int (*real_mutex_lock)(pthread_mutex_t *) __attribute__((__may_alias__));
static void *pthread_libhandle;
#ifdef _BIT64
#define PTHREAD_PATH "/lib64/libpthread.so.0"
#else
#define PTHREAD_PATH "/lib/libpthread.so.0"
#endif
static inline void load_real_function(char* function_name, void** real_func) {
char* msg;
*(void**) (real_func) = dlsym(pthread_libhandle, function_name);
msg = dlerror();
if (msg != NULL)
printf("init: real_%s load error %s\n", function_name, msg);
}
void __attribute__((constructor)) my_init(void) {
printf("init: trying to dlopen '%s'\n", PTHREAD_PATH);
pthread_libhandle = dlopen(PTHREAD_PATH, RTLD_LAZY);
if (pthread_libhandle == NULL) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
load_real_function("pthread_mutex_lock", (void**) &real_mutex_lock);
}
The wrapper and the call to backtrace.
I have chopped as much as possible from the methods, so yes, I know that I never call the original pthread_mutex_lock for example.
void my_backtrace(void) {
#define SIZE 100
void *buffer[SIZE];
int nptrs;
nptrs = backtrace(buffer, SIZE);
printf("backtrace() returned %d addresses\n", nptrs);
}
int pthread_mutex_lock(pthread_mutex_t *mutex) {
printf("In pthread_mutex_lock\n"); fflush(stdout);
my_backtrace();
return 0;
}
To test this I use this binary (file tst_mutex.c):
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
int main (int argc, char *argv[]) {
pthread_mutex_t x;
printf("Before mutex\n"); fflush(stdout);
pthread_mutex_lock(&x);
printf("after mutex\n");fflush(stdout);
return 0;
}
Here is the way all this is compiled:
rm -f *.o *.so tst_mutex
cc -Wall -D_BIT64 -c -m64 -fPIC libmutex.c
cc -m64 -o libmutex.so -shared -fPIC -ldl -lpthread libmutex.o
cc -Wall -m64 tst_mutex.c -o tst_mutex
and run
LD_PRELOAD=$(pwd)/libmutex.so ./tst_mutex
This crashes with segmentation fault on Linux x86.
On Linux PPC everything works flawlessly.
I have tried a few versions of GCC compilers, GLIBC libraries and Linux distros - all fail.
The output is
init: trying to dlopen '/lib64/libpthread.so.0'
Before mutex
In pthread_mutex_lock
In pthread_mutex_lock
In pthread_mutex_lock
...
...
./run.sh: line 1: 25023 Segmentation fault LD_PRELOAD=$(pwd)/libmutex.so ./tst_mutex
suggesting that there is a recursion here.
I have looked at the source code for backtrace() - there is no call in it to locking mechanism. All it does is a simple walk over the stack frame linked list.
I have also, checked the library code with objdump, but that hasn't revealed anything out of the ordinary.
What is happening here?
Any solution/workaround?
Oh, and maybe the most important thing. This only happens with the pthread_mutex_lock function!!
Printing the stack from any other overridden pthread_* function works just fine ...
It is a stack overflow, caused by an endless recursion (as remarked by #Chris Dodd).
The backtrace() function runs different system calls being called from programs compiled with pthread library and without. Even if no pthread functions are called explicitly by the program.
Here is a simple program that uses the backtrace() function and does not use any pthread function.
#include <stdio.h>
#include <stdlib.h>
#include <execinfo.h>
int main(void)
{
void* buffer[100];
int num_ret_addr;
num_ret_addr=backtrace(buffer, 100);
printf("returned number of addr %d\n", num_ret_addr);
return 0;
}
Lets compile it without linking to the pthread and inspect the program system calls with the strace utility. No mutex related system call appears in the output.
$ gcc -o backtrace_no_thread backtrace.c
$ strace -o backtrace_no_thread.out backtrace_no_thread
No lets compile the same code linking it to the pthread library, run the strace and look at its output.
$ gcc -o backtrace_with_thread backtrace.c -lpthread
$ strace -o backtrace_with_thread.out backtrace_with_thread
This time the output contains mutex related system calls (their names may depend on the platform). Here is a fragment of the strace output file obtained on an X86 Linux machine.
futex(0x3240553f80, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x324480d350, FUTEX_WAKE_PRIVATE, 2147483647) = 0

Resources