Handling __sync_add_and_fetch not being defined - multithreading

In my open source software project, I call the gcc atomic builtins __sync_add_and_fetch and __sync_sub_and_fetch to implement atomic increments and decrements on certain variables. I periodically get an email from someone trying to compile my code who gets the following linker error:
refcountobject.cpp:(.text+0xb5): undefined reference to `__sync_sub_and_fetch_4'
refcountobject.cpp:(.text+0x115): undefined reference to `__sync_add_and_fetch_4'
After some digging, I narrowed the root cause down to the fact that their older version of gcc (4.1) defaults to a target architecture of i386. The 80386 lacks the instructions gcc needs to inline an atomic add (xadd and cmpxchg arrived with the 486), so gcc implicitly emits a call to an undefined __sync_add_and_fetch_4 function in its place. A great description of how this works is here.
The easy workaround, as discussed here, is to tell them to modify the Makefile to append -march=pentium as one of the compiler flags. And all is good.
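For reference, here is a minimal sketch of the kind of code that hits this. The function and variable names are hypothetical. When the file is built for a default i386 target (for example `-m32 -march=i386` on a multilib toolchain), gcc should emit calls to `__sync_add_and_fetch_4` / `__sync_sub_and_fetch_4` instead of inline code. With `-march=pentium`, or any i486-or-later target, the same source should compile to lock-prefixed instructions.

```c
/* Hypothetical reference-count helpers built on the gcc __sync builtins. */
static volatile int g_refcount = 1;

/* Atomically increments the count and returns the new value. */
int refcount_addref(void)
{
    return __sync_add_and_fetch(&g_refcount, 1);
}

/* Atomically decrements the count and returns the new value. */
int refcount_release(void)
{
    return __sync_sub_and_fetch(&g_refcount, 1);
}
```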
So what's the long term fix so users don't have to manually fix the Makefile?
I am considering a few ideas:
I don't want to hardcode -march=pentium as a compiler flag in the Makefile. I'm guessing that will break on anything that isn't Intel-based. But I could certainly add it if the Makefile had a rule to detect that the default target was i386. I'm thinking about having the Makefile call a script that runs gcc -dumpmachine and parses out the first component of the target triplet. If that component is i386, it would add the compiler flag. I'm assuming no one will actually be building for 80386 machines.
The other alternative is to supply my own implementation of __sync_add_and_fetch_4 for the linker to fall back on. It could even be compiled conditionally, based on whether the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_N macros (e.g. __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4) are defined. I prototyped an implementation with a global pthread_mutex. It likely doesn't perform well, but it works and resolves the issue nicely. A better idea might be to write the inline assembly myself, using "lock xadd" for the implementation when compiling for x86.
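The mutex-based fallback described above could look roughly like the sketch below. The function names are hypothetical stand-ins. In a real fallback they would be named __sync_add_and_fetch_4 and __sync_sub_and_fetch_4 (declared extern "C" in a C++ file) and guarded by `#ifndef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4`. They are renamed here so the sketch also builds with compilers that inline the builtins.

```c
#include <pthread.h>

/* One global lock serializes every fallback operation: simple but slow. */
static pthread_mutex_t g_sync_fallback_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for __sync_add_and_fetch_4: adds inc, returns the new value. */
unsigned int fallback_add_and_fetch_4(volatile void *ptr, unsigned int inc)
{
    volatile unsigned int *p = (volatile unsigned int *)ptr;
    unsigned int result;

    pthread_mutex_lock(&g_sync_fallback_lock);
    result = *p + inc;
    *p = result;
    pthread_mutex_unlock(&g_sync_fallback_lock);
    return result;
}

/* Hypothetical stand-in for __sync_sub_and_fetch_4: subtracts dec, returns the new value. */
unsigned int fallback_sub_and_fetch_4(volatile void *ptr, unsigned int dec)
{
    volatile unsigned int *p = (volatile unsigned int *)ptr;
    unsigned int result;

    pthread_mutex_lock(&g_sync_fallback_lock);
    result = *p - dec;
    *p = result;
    pthread_mutex_unlock(&g_sync_fallback_lock);
    return result;
}
```

Because everything goes through one mutex, these operations are only atomic with respect to each other. Plain reads and writes of the same variable elsewhere bypass the lock.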

This is my other working solution. It might have its place in certain situations, but I opted for the Makefile+script solution instead.
This solution provides local definitions of __sync_add_and_fetch_4, __sync_fetch_and_add_4, __sync_sub_and_fetch_4, and __sync_fetch_and_sub_4 in a separate source file. They are only referenced if the compiler couldn't generate the operations inline. Some assembly is required, but Wikipedia, of all places, had a reasonable implementation that I could reference. (I also disassembled what the compiler normally generates to check that everything else was correct.)
#if defined(__i386) || defined(i386) || defined(__i386__)
// Atomically adds inc to *pVal and returns the value *pVal held before the add.
// lock xadd needs a 486 or later; real 80386 hardware is assumed not to matter.
static unsigned int xadd_4(volatile void* pVal, unsigned int inc)
{
    volatile unsigned int* pValInt = (volatile unsigned int*)pVal;
    unsigned int result = inc;
    asm volatile(
        "lock; xaddl %0, %1"
        : "+r" (result), "+m" (*pValInt)  // xadd writes both the register and memory
        :
        : "memory");
    return result;  // previous value of *pVal
}
extern "C" unsigned int __sync_add_and_fetch_4(volatile void* pVal, unsigned int inc)
{
    return xadd_4(pVal, inc) + inc;
}
extern "C" unsigned int __sync_sub_and_fetch_4(volatile void* pVal, unsigned int inc)
{
    return xadd_4(pVal, -inc) - inc;  // unsigned negation wraps, giving a subtraction
}
extern "C" unsigned int __sync_fetch_and_add_4(volatile void* pVal, unsigned int inc)
{
    return xadd_4(pVal, inc);
}
extern "C" unsigned int __sync_fetch_and_sub_4(volatile void* pVal, unsigned int inc)
{
    return xadd_4(pVal, -inc);
}
#endif

With no replies, I struck out on my own to solve it.
There are two possible solutions; this is one of them.
First, add the following script, getfixupflags.sh, to the same directory as the Makefile. This script will detect if the compiler is likely targeting i386, and if so will echo out "-march=pentium" as output.
#!/bin/bash
# Usage: getfixupflags.sh <compiler>
# Prints -march=pentium if the compiler's default target triplet starts with i386.
_cxx=$1
_fixupflags=
_regex_i386='^i386'
if [[ -z $_cxx ]]; then echo "usage: $0 <compiler>" >&2; exit 1; fi
_target=$($_cxx -dumpmachine)
if [[ $_target =~ $_regex_i386 ]]; then
    _fixupflags="$_fixupflags -march=pentium"
fi
if [[ -n $_fixupflags ]]; then echo $_fixupflags; fi
Next, make the Makefile use this script by adding the following line (the script uses bash features, and it is not on the PATH, so it is invoked explicitly):
FIXUP_FLAGS := $(shell bash ./getfixupflags.sh $(CXX))
Then modify the compile rules in the Makefile to pass FIXUP_FLAGS when compiling code. For example (the recipe line must start with a tab):
%.o: %.cpp
	$(COMPILE.cpp) $(FIXUP_FLAGS) $(OUTPUT_OPTION) $<

Related

On Linux, why does this library loaded with LD_PRELOAD catch only some openat() calls?

I am trying to intercept openat() calls with the following library, comm.c. It is a very standard minimal example, nothing special about it. I compile it with:
>gcc -shared -Wall -fPIC -Wl,-init,init comm.c -o comm.so
I am pasting this standard minimal example only to show that I thought I knew what I was doing.
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
typedef int (*openat_type)(int, const char *, int, ...);
static openat_type g_orig_openat;
/* Runs at load time because of -Wl,-init,init */
void init() {
    g_orig_openat = (openat_type)dlsym(RTLD_NEXT, "openat");
}
int openat(int dirfd, const char* pathname, int flags, ...) {
    int fd;
    if (flags & O_CREAT) {
        /* The mode argument is only present when O_CREAT is set. */
        va_list ap;
        va_start(ap, flags);
        mode_t mode = va_arg(ap, mode_t);
        va_end(ap);
        fd = g_orig_openat(dirfd, pathname, flags, mode);
    }
    else
        fd = g_orig_openat(dirfd, pathname, flags);
    printf("openat dirfd %d pathname %s\n", dirfd, pathname);
    return fd;
}
I am running a tar command, again a minimal example, untarring an archive containing a single file foobar, to a pre-existing subdirectory dir:
>strace -f tar xf foobar.tar -C dir 2>&1 | grep openat
openat(AT_FDCWD, "dir", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
openat(4, "foobar", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0600) = -1 EEXIST (File exists)
openat(4, "foobar", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_NONBLOCK|O_CLOEXEC, 0600) = 5
However,
>LD_PRELOAD=./comm.so tar xf foobar.tar -C dir
openat dirfd 4 pathname foobar
openat dirfd 4 pathname foobar
OK, I know how to handle this (I have done it before): the usual reason for this kind of discrepancy is that the openat() system call shown by strace is not made by the same-named user function openat(). To find out what that other user function is, one gets the sources, rebuilds them, and finds out.
So, I got the sources for my tar:
>$(which tar) --version
tar (GNU tar) 1.26
I got the tar 1.26 sources and rebuilt them myself, and, lo and behold, if I use the binary tar that I built, rather than the above installed one, then comm.so does catch all 3 openat calls!
So that means there is no "other user function".
Please help, what is possibly going on here??
No, this question is not answered by that previous one. The previous answer simply said that the library call may be named differently from the underlying system call. That is NOT the case here, because I recompiled the same code myself and there are no other library calls in there.
According to the discussion mentioned, openat is probably being called through a different symbol or function. What a tool such as strace shows is the raw system call, which may be wrapped by a user function or by glibc. If you want to intercept it with LD_PRELOAD, you need to find those wrappers instead of openat. In my experience, you can try intercepting open64 or open, which can end up as the openat you observe in strace.
The link shows one example of wrapping openat from open64.
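Following that suggestion, a sketch of a preload library that also hooks open and open64 might look like this. It is an assumption that these are the wrappers tar uses; the logging format and helper names are illustrative. It forwards to the next definition found with dlsym(RTLD_NEXT, ...), passes the mode argument only when O_CREAT (or O_TMPFILE, where defined) is set, and preserves errno across the logging call.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

typedef int (*open_fn)(const char *, int, ...);

/* open/open64 only receive a mode argument when a file may be created. */
static int needs_mode(int flags)
{
#ifdef O_TMPFILE
    if ((flags & O_TMPFILE) == O_TMPFILE)
        return 1;
#endif
    return (flags & O_CREAT) != 0;
}

/* Lazily resolves the next definition of `name` (normally libc's), calls it,
   then logs the call. The lazy lookup is unsynchronized; this is a sketch. */
static int forward_open(open_fn *cache, const char *name,
                        const char *pathname, int flags, mode_t mode)
{
    if (*cache == NULL)
        *cache = (open_fn)dlsym(RTLD_NEXT, name);
    if (*cache == NULL) {
        errno = ENOSYS;
        return -1;
    }
    int fd = needs_mode(flags) ? (*cache)(pathname, flags, mode)
                               : (*cache)(pathname, flags);
    int saved_errno = errno;            /* keep errno from the real call */
    fprintf(stderr, "%s pathname %s\n", name, pathname);
    errno = saved_errno;
    return fd;
}

int open(const char *pathname, int flags, ...)
{
    static open_fn real_open;
    mode_t mode = 0;
    if (needs_mode(flags)) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return forward_open(&real_open, "open", pathname, flags, mode);
}

int open64(const char *pathname, int flags, ...)
{
    static open_fn real_open64;
    mode_t mode = 0;
    if (needs_mode(flags)) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return forward_open(&real_open64, "open64", pathname, flags, mode);
}
```

It would be built like comm.so (e.g. `gcc -shared -Wall -fPIC hookopen.c -o hookopen.so -ldl`, file names hypothetical) and combined with the openat hook if needed.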

ftrace: system crash when changing current_tracer from function_graph via echo

I have been playing with ftrace recently to monitor some behavior characteristics of my system. I've been switching tracing on and off with a small script. After running the script, my system would crash and reboot itself. Initially, I believed that there might be an error in the script itself, but I have since determined that the crash and reboot are the result of echoing some tracer to /sys/kernel/debug/tracing/current_tracer when current_tracer is set to function_graph.
That is, the following sequence of commands will produce the crash/reboot:
echo "function_graph" > /sys/kernel/debug/tracing/current_tracer
echo "function" > /sys/kernel/debug/tracing/current_tracer
During the reboot after the crash caused by the above echo statements, I see a lot of output that reads:
clearing orphaned inode <inode>
I tried to reproduce this problem in a C program by changing the current_tracer value from function_graph to something else:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
/* Opens current_tracer for writing; exits if that fails (e.g. not root). */
int openCurrentTracer()
{
    int fd = open("/sys/kernel/debug/tracing/current_tracer", O_WRONLY);
    if(fd < 0) {
        perror("open current_tracer");
        exit(1);
    }
    return fd;
}
/* Writes the tracer name; returns 1 on success, 0 on a failed or short write. */
int writeTracer(int fd, const char* tracer)
{
    size_t len = strlen(tracer);
    if(write(fd, tracer, len) != (ssize_t)len) {
        printf("Failure writing %s\n", tracer);
        return 0;
    }
    return 1;
}
int main(int argc, char* argv[])
{
    const char* blockTracer = "blk";
    const char* graphTracer = "function_graph";
    int fd = openCurrentTracer();
    if(!writeTracer(fd, blockTracer))
        return 1;
    close(fd);
    fd = openCurrentTracer();
    if(!writeTracer(fd, graphTracer))
        return 1;
    close(fd);
    printf("Preparing to fail!\n");
    fd = openCurrentTracer();
    if(!writeTracer(fd, blockTracer))
        return 1;
    close(fd);
    return 0;
}
Oddly enough, the C program does not crash my system.
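One difference between the two reproductions may be worth ruling out: the echo sequence switches from function_graph to function, and echo appends a newline, while the C program switches blk, then function_graph, then blk, without newlines. Below is a sketch of a helper (the name write_tracer is hypothetical) that mirrors a single `echo ... > file`. It opens with O_WRONLY|O_TRUNC as the shell's `>` does, writes, and closes, so the exact echo sequence can be replayed from C.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Path from the question; the helper takes the path as a parameter. */
#define CURRENT_TRACER "/sys/kernel/debug/tracing/current_tracer"

/* Mirrors one `echo tracer > path`: open (truncating), write, close.
   Returns 0 on success and -1 on failure. */
static int write_tracer(const char *path, const char *tracer)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0) {
        perror(path);
        return -1;
    }
    size_t len = strlen(tracer);
    ssize_t n = write(fd, tracer, len);
    if (n < 0 || (size_t)n != len) {
        fprintf(stderr, "Failure writing %s\n", tracer);
        close(fd);
        return -1;
    }
    return close(fd);
}
```

On an affected kernel, running `write_tracer(CURRENT_TRACER, "function_graph\n");` followed by `write_tracer(CURRENT_TRACER, "function\n");` as root would be expected to behave like the two echo commands, so it should only be tried on a test machine.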
I originally encountered this problem while using Ubuntu (Unity environment) 16.04 LTS and confirmed it to be an issue on the 4.4.0 and 4.5.5 kernels. I have also tested this issue on a machine running Ubuntu (Mate environment) 15.10, on the 4.2.0 and 4.5.5 kernels, but was unable to reproduce the issue. This has only confused me further.
Can anyone give me insight on what is happening? Specifically, why would I be able to write() but not echo to /sys/kernel/debug/tracing/current_tracer?
Update
As vielmetti pointed out, others have had a similar issue (as seen here).
The ftrace_disable_ftrace_graph_caller() modifies jmp instruction at
ftrace_graph_call assuming it's a 5 bytes near jmp (e9 <offset>).
However it's a short jmp consisting of 2 bytes only (eb <offset>). And
ftrace_stub() is located just below the ftrace_graph_caller so
modification above breaks the instruction resulting in kernel oops on
the ftrace_stub() with the invalid opcode.
The patch (shown below) solved the echo issue, but I still do not understand why echo was breaking previously when write() was not.
diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
index ed48a9f465f8..e13a695c3084 100644
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -182,7 +182,8 @@ GLOBAL(ftrace_graph_call)
jmp ftrace_stub
#endif
-GLOBAL(ftrace_stub)
+/* This is weak to keep gas from relaxing the jumps */
+WEAK(ftrace_stub)
retq
END(ftrace_caller)
via https://lkml.org/lkml/2016/5/16/493
Looks like you are not the only person to notice this behavior. I see
https://lkml.org/lkml/2016/5/13/327
as a report of the problem, and
https://lkml.org/lkml/2016/5/16/493
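as a patch to the kernel that addresses it. Reading through the whole thread, the issue appears to be an assembler optimization rather than a bug in your script: gas relaxes the jmp to ftrace_stub into a 2-byte short jump, and the patch makes ftrace_stub weak to keep gas from doing that.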

ARM inline asm: exit system call with value read from memory

Problem
I want to execute the exit system call in ARM using inline assembly on a Linux Android device, and I want the exit value to be read from a location in memory.
Example
Without giving this extra argument, a macro for the call looks like:
#define ASM_EXIT() __asm__("mov %r0, #1\n\t" \
"mov %r7, #1\n\t" \
"swi #0")
This works well.
To accept an argument, I adjust it to:
#define ASM_EXIT(var) __asm__("mov %r0, %0\n\t" \
"mov %r7, #1\n\t" \
"swi #0" \
: \
: "r"(var))
and I call it using:
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
ASM_EXIT(GET_STATUS());
Error
invalid 'asm': operand number out of range
I can't explain why I get this error, as I use one input variable in the above snippet (%0/var). Also, I have tried with a regular variable, and still got the same error.
Extended-asm syntax requires writing %% to get a single % in the asm output. e.g. for x86:
asm("inc %eax") // bad: undeclared clobber
asm("inc %%eax" ::: "eax"); // safe but still useless :P
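In extended asm, %r7 is parsed as an operand reference (operand 7 with the r modifier), not as the register r7, which is why you get "operand number out of range". As commenters have pointed out, just omit the % signs, because you don't need them for ARM, even with GNU as.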
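To make the %-versus-%% distinction concrete, here is a small x86 sketch (the function name is hypothetical). %0 and %1 are operand references, while %%eax is emitted as the literal register %eax, which is therefore declared as a clobber. On non-x86 hosts the sketch falls back to plain C so it still compiles.

```c
#if defined(__x86_64__) || defined(__i386__)
/* %0/%1 refer to operands; %%eax is a literal register, so it is clobbered. */
static int add_one_via_eax(int in)
{
    int out;
    __asm__("movl %1, %%eax\n\t"
            "incl %%eax\n\t"
            "movl %%eax, %0"
            : "=r" (out)
            : "r" (in)
            : "eax", "cc");
    return out;
}
#else
/* Non-x86 stand-in with the same result. */
static int add_one_via_eax(int in)
{
    return in + 1;
}
#endif
```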
Unfortunately, there doesn't seem to be a way to request input operands in specific registers on ARM, the way you can for x86. (e.g. "a" constraint means eax specifically).
You can use register int var asm ("r7") to force a var to use a specific register, and then use an "r" constraint and assume it will be in that register. I'm not sure this is always safe, or a good idea, but it appears to work even after inlining. @Jeremy comments that this technique was recommended by the GCC team.
I did get some efficient code generated, which avoids wasting an instruction on a reg-reg move:
See it on the Godbolt Compiler Explorer:
__attribute__((noreturn)) static inline void ASM_EXIT(int status)
{
register int status_r0 asm ("r0") = status;
register int callno_r7 asm ("r7") = 1;
asm volatile("swi #0\n"
:
: "r" (status_r0), "r" (callno_r7)
: "memory" // any side-effects on shared memory need to be done before this, not delayed until after
);
// __builtin_unreachable(); // optionally let GCC know the inline asm doesn't "return"
}
#define GET_STATUS() (*(int*)(some_address)) //gets an integer from an address
void foo(void) { ASM_EXIT(12); }
push {r7} # # gcc is still saving r7 before use, even though it sees the "noreturn" and doesn't generate a return
movs r0, #12 # stat_r0,
movs r7, #1 # callno,
swi #0
# yes, it literally ends here, after the inlined noreturn
void bar(int status) { ASM_EXIT(status); }
push {r7} #
movs r7, #1 # callno,
swi #0 # doesn't touch r0: already there as bar()'s first arg.
Since you always want the value read from memory, you could use an "m" constraint and include a ldr in your inline asm. Then you wouldn't need the register int var asm("r0") trick to avoid a wasted mov for that operand.
The mov r7, #1 might not always be needed either, which is why I used the register asm() syntax for it, too. If gcc wants a 1 constant in a register somewhere else in a function, it can do it in r7 so it's already there for the ASM_EXIT.
Any time the first or last instructions of a GNU C inline asm statement are mov instructions, there's probably a way to remove them with better constraints.
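Following the "m"-constraint suggestion, a sketch might look like this. The function name is hypothetical, and it assumes ARM EABI Linux (such as Android) with exit as syscall 1, as in the question. The ldr loads the status straight into r0 inside the asm, so r0 is declared as a clobber instead of being an input, and r7 still uses the register-variable trick. On non-ARM hosts it falls back to _exit() so the sketch can be built and exercised elsewhere.

```c
#include <unistd.h>

#if defined(__arm__)
/* Loads the exit status from memory inside the asm, then issues exit(2). */
__attribute__((noreturn)) static inline void asm_exit_from_mem(const int *status_ptr)
{
    register int callno_r7 asm("r7") = 1;   /* syscall number for exit */
    asm volatile("ldr r0, %0\n\t"
                 "swi #0"
                 :
                 : "m" (*status_ptr), "r" (callno_r7)
                 : "r0", "memory");
    __builtin_unreachable();                /* the exit syscall does not return */
}
#else
/* Non-ARM fallback with the same contract (assumption, for testing only). */
__attribute__((noreturn)) static inline void asm_exit_from_mem(const int *status_ptr)
{
    _exit(*status_ptr);
}
#endif
```

With the question's macro, the call would become something like `asm_exit_from_mem((const int *)(some_address));`.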

gcc, static library, external assembly function becomes undefined symbol

I have a problem with g++ building an application that links to a static library, where the library contains some global functions written in separate asm files assembled with yasm. So in the library, I have
#ifdef __cplusplus
extern "C" {
#endif
extern void __attribute__((cdecl)) interp1( char *pSrc );
extern void __attribute__((cdecl)) interp2( char *pSrc );
#ifdef __cplusplus
}
#endif
which I reference elsewhere inside the library. Then, there is the implementation in an asm-file, like this:
section .data
; (some data)
section .text
; (some text)
global _interp1
_interp1:
; (code ...)
ret
global _interp2
_interp2:
; (code ...)
ret
Assembling and archiving the library works fine. I run
yasm -f elf32 -O2 -o interp.o interp.asm
and then
ar -rc libInterp.a objs1.o [...] objsN.o interp.o
ranlib libInterp.a
Now finally, to link the library to the main application, I do
g++ -O4 -ffast-math -DNDEBUG -fomit-frame-pointer -DARCH_X86 -fPIC -o ../bin/interp this.o that.o -lboost_thread -lpthread ./libInterp.a
and I get the errors
undefined reference to `interp1'
undefined reference to `interp2'
What am I doing wrong here? Any help is appreciated.
Depending on the target object format, gcc may or may not prepend a leading underscore to external symbols. On ELF targets such as your elf32 build it doesn't, so the C code references interp1 while the assembly defines _interp1.
The simple fix is probably to remove the underscores from the names in your assembly file.
A couple of alternatives you might consider are macros like the following for your symbols in the assembly file:
from http://svn.xiph.org/trunk/oggdsf/src/lib/codecs/webm/libvpx/src/vpx_ports/x86_abi_support.asm
; sym()
; Return the proper symbol name for the target ABI.
;
; Certain ABIs, notably MS COFF and Darwin MACH-O, require that symbols
; with C linkage be prefixed with an underscore.
;
%ifidn __OUTPUT_FORMAT__,elf32
%define sym(x) x
%elifidn __OUTPUT_FORMAT__,elf64
%define sym(x) x
%elifidn __OUTPUT_FORMAT__,x64
%define sym(x) x
%else
%define sym(x) _ %+ x
%endif
from http://www.dcs.warwick.ac.uk/~peter/otherstuff.html
%macro public_c_symbol 1
GLOBAL %1,_%1
%1:
_%1:
%endmacro
public_c_symbol my_external_proc
; ...
RET

Problems on injecting into printf using LD_PRELOAD method

I was hacking glibc's printf() in one of my projects and ran into a problem. Could you please give me some clues? One of my concerns is why the same approach works perfectly for malloc/free!
As attached, “PrintfHank.c” contains my own version of printf(), which will be preloaded before the standard library, and “main.c” just outputs a sentence using printf(). After editing the two files, I issued the following commands:
compile main.c
gcc -Wall -o main main.c
create my own library
gcc -Wall -fPIC -shared -o PrintfHank.so PrintfHank.c -ldl
test the new library
LD_PRELOAD="$mypath/PrintfHank.so" $mypath/main
But I received “hello world” instead of “within my own printf” in the console. When hacking malloc/free functions, it’s okay.
I am logged in to my system as “root” and am running kernel 2.6.23.1-42.fc8 (i686). Any comments will be highly appreciated!!
main.c
#include <stdio.h>
int main(void)
{
printf("hello world\n");
return 0;
}
PrintfHank.c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdio.h>
#include <dlfcn.h>
static int (*orig_printf)(const char *format, ...) = NULL;
int printf(const char *format, ...)
{
if (orig_printf == NULL)
{
orig_printf = (int (*)(const char *format, ...))dlsym(RTLD_NEXT, "printf");
}
// TODO: print desired message from caller.
return orig_printf("within my own printf\n");
}
This question is ancient, however:
In your main.c, you've got a newline at the end and aren't using any of the formatting capability of printf.
If I look at the output of LD_DEBUG=all LD_PRELOAD=./printhack.so hello 2>&1 (I've renamed your files somewhat), then near the bottom I can see
17246: transferring control: ./hello
17246:
17246: symbol=puts; lookup in file=./hello [0]
17246: symbol=puts; lookup in file=./printhack.so [0]
17246: symbol=puts; lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
17246: binding file ./hello [0] to /lib/x86_64-linux-gnu/libc.so.6 [0]: normal symbol `puts' [GLIBC_2.2.5]
and no actual mention of printf. puts is basically printf without the formatting and with an automatic line break at the end, so this is evidently the result of gcc being "helpful" and replacing the printf with a puts. (Compiling main.c with -fno-builtin-printf should prevent that substitution.)
To make your example work, I removed the \n from the printf, which gives me output like:
17114: transferring control: ./hello
17114:
17114: symbol=printf; lookup in file=./hello [0]
17114: symbol=printf; lookup in file=./printhack.so [0]
17114: binding file ./hello [0] to ./printhack.so [0]: normal symbol `printf' [GLIBC_2.2.5]
Now I can see that printhack.so is indeed being dragged in with its custom printf.
Alternatively, you can define a custom puts function as well:
static int (*orig_puts)(const char *str) = NULL;
int puts(const char *str)
{
if (orig_puts == NULL)
{
orig_puts = (int (*)(const char *str))dlsym(RTLD_NEXT, "puts");
}
// TODO: print desired message from caller.
return orig_puts("within my own puts");
}
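About the TODO in both hooks: a function pointer obtained with dlsym cannot forward printf's variable arguments, so a hook that wants to print the caller's message usually hands them to vprintf instead. A minimal sketch, assuming a fixed prefix (the prefix text is illustrative). Since it never calls the original printf, it needs no dlsym at all.

```c
#include <stdarg.h>
#include <stdio.h>

static const char k_prefix[] = "within my own printf: ";

/* Prints a prefix, then the caller's formatted message via vprintf.
   Returns the total number of characters written, or a negative value on error. */
int printf(const char *format, ...)
{
    va_list ap;
    int n;

    if (fputs(k_prefix, stdout) == EOF)
        return -1;
    va_start(ap, format);
    n = vprintf(format, ap);
    va_end(ap);
    return n < 0 ? n : n + (int)(sizeof k_prefix - 1);
}
```

Note that main.c's printf("hello world\n") would still be compiled into a call to puts, so this hook only sees calls that remain printf calls.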
Check
1) the preprocessor output; printf can be changed to something else:
gcc -E main.c
2) the LD_DEBUG info about the printf symbol and preloading:
LD_DEBUG=help LD_PRELOAD="$mypath/PrintfHank.so" $mypath/main
LD_DEBUG=all LD_PRELOAD="$mypath/PrintfHank.so" $mypath/main
Change
return orig_printf("within my own printf\n");
to
return (*orig_printf)("within my own printf\n");
