Why can the value of the symbol returned by dlsym() be null? - linux

On Linux, per the dlsym(3) man page:
*Since the value of the symbol could actually be NULL
(so that a NULL return from dlsym() need not indicate an error),*
Why is this? When can a symbol (for a function, specifically) actually be NULL? I am reviewing code and found a piece that calls dlerror() first to clear any pending error, then dlsym(), then dlerror() again to check for errors. But it does not check whether the resulting function pointer is NULL before calling it:
dlerror();
a_func_name = ...dlsym(...);
if (dlerror()) goto end;
a_func_name(...); // Never checked if a_func_name == NULL;
I am just a reviewer, so I don't have the option to simply add the check, and perhaps the author knows NULL can never be returned. My job is to challenge that, but I don't know what could make dlsym() return a valid NULL, so I can't yet check whether such a condition could be met in this code's context. I have not found the right thing to read with Google. A pointer to good documentation would be enough, though an explicit explanation would be great.

I know of one particular case where the symbol value returned by dlsym() can be NULL, which is when using GNU indirection functions (IFUNCs). However, there are presumably other cases, since the text in the dlsym(3) manual page pre-dates the invention of IFUNCs.
Here's an example using IFUNCs. First, a file that will be used to create a shared library:
$ cat foo.c
/* foo.c */
#include <stdio.h>
/* This is a 'GNU indirect function' (IFUNC) that will be called by
dlsym() to resolve the symbol "foo" to an address. Typically, such
a function would return the address of an actual function, but it
can also just return NULL. For some background on IFUNCs, see
https://willnewton.name/uncategorized/using-gnu-indirect-functions/ */
asm (".type foo, #gnu_indirect_function");
void *
foo(void)
{
fprintf(stderr, "foo called\n");
return NULL;
}
Now the main program, which will look up the symbol foo in the shared library:
$ cat main.c
/* main.c */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
void *handle;
void (*funcp)(void);
handle = dlopen("./libfoo.so", RTLD_LAZY); /* must match the name built below */
if (handle == NULL) {
fprintf(stderr, "dlopen: %s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any outstanding error */
funcp = dlsym(handle, "foo");
printf("Results after dlsym(): funcp = %p; dlerror = %s\n",
(void *) funcp, dlerror());
exit(EXIT_SUCCESS);
}
Now build and run to see a case where dlsym() returns NULL, while dlerror() also returns NULL:
$ cc -Wall -fPIC -shared -o libfoo.so foo.c
$ cc -Wall -o main main.c libfoo.so -ldl
$ LD_LIBRARY_PATH=. ./main
foo called
Results after dlsym(): funcp = (nil); dlerror = (null)

Well, if dlsym() returns without an error, then the pointer is simply the symbol's value, and NULL is about as invalid as any other unexpected pointer from the shared object, such as the address of the wrong function or of data.

It can't be if the library/PIE is a product of normal C compilation, as C won't ever put a global object at the NULL address, but you can get a symbol to resolve to NULL using special linker tricks:
null.c:
#include <stdio.h>
extern char null_addressed_char;
int main(void)
{
printf("&null_addressed_char=%p\n", &null_addressed_char);
}
Compile, link, and run:
$ clang null.c -Xlinker --defsym -Xlinker null_addressed_char=0 && ./a.out
&null_addressed_char=(nil)
If you don't allow any such weirdness, you can treat NULL returns from dlsym as errors.
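For a review, the pattern from the question can be tightened so that both a non-NULL dlerror() and a NULL symbol value count as failure. This is only a minimal sketch, assuming the caller never legitimately expects a NULL-valued symbol; lookup_func is a hypothetical helper name, not part of the dl API:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up `name` in `handle`. Returns NULL both when
 * dlsym() fails and when the symbol's value really is NULL; in either case
 * *errmsg is set to a short description. On success *errmsg is NULL. */
static void *lookup_func(void *handle, const char *name, const char **errmsg)
{
    void *sym;
    const char *err;

    dlerror();                  /* clear any stale error state */
    sym = dlsym(handle, name);
    err = dlerror();            /* non-NULL only if the dlsym() call failed */

    if (err != NULL) {
        *errmsg = err;
        return NULL;
    }
    if (sym == NULL) {
        /* e.g. an IFUNC resolver returned NULL, or a linker-defined symbol at 0 */
        *errmsg = "symbol resolved to NULL";
        return NULL;
    }
    *errmsg = NULL;
    return sym;
}
```

The caller converts the result to the expected function pointer type and calls it only when the result is non-NULL. On older glibc versions the program still has to be linked with -ldl.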

dlerror() returns the last error, not the status of the last call. So if nothing else the code you show may potentially get a valid result from dlsym() and fool itself into thinking there was an error (because there was still one in the queue). The purpose behind dlerror is to provide human-readable error messages. If you aren't printing the result, you are using it wrong.

Related

how to prevent some values from being optimized out in linux kernel debugging?

This is code from Linux (5.4.21).
When I run the kernel in a virtual machine and attach gdb to it, I can set breakpoints and step through the code. For example, I set a breakpoint on the function arm_smmu_device_probe. When I step with the 'next' command, some values, for example 'smmu' or 'dev' below, are shown as optimized out. How can I keep them from being optimized out so that I can see them in gdb?
static int arm_smmu_device_probe(struct platform_device *pdev)
{
int irq, ret;
struct resource *res;
resource_size_t ioaddr;
struct arm_smmu_device *smmu;
struct device *dev = &pdev->dev;
bool bypass;
smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
if (!smmu) {
dev_err(dev, "failed to allocate arm_smmu_device\n");
return -ENOMEM;
}
smmu->dev = dev;
if (dev->of_node) {
ret = arm_smmu_device_dt_probe(pdev, smmu);
} else {
ret = arm_smmu_device_acpi_probe(pdev, smmu);
if (ret == -ENODEV)
return ret;
}
I tried changing -O2 to -Og in the top-level Makefile, but then the kernel build fails.
Recently I found out how to do this (from flolo's answer to "Is there a way to tell GCC not to optimise a particular piece of code?").
If you want a function aaa(...) not to be optimized, you can do it like this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
aaa ( ... )
{
function body
}
#pragma GCC pop_options
In some cases, adding these #pragma lines causes a mismatch between a declaration in an #include'd header file and the function definition in the source. In that case (not often), you need to put the same #pragma lines around the corresponding #include statement. If linux/bbb.h causes this kind of problem, do this:
#pragma GCC push_options
#pragma GCC optimize ("O0")
#include <linux/bbb.h>
#pragma GCC pop_options
This works reliably, and I'm enjoying(?) debugging and analysis this way.
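GCC also accepts a per-function attribute, which avoids bracketing code with push/pop pragmas. A minimal sketch, assuming GCC (Clang does not implement this attribute and offers optnone instead); sum_to is an illustrative function, not kernel code:

```c
/* Compiled at -O0 regardless of the command-line optimization level,
 * so locals such as `total` and `i` stay visible in gdb (GCC only). */
__attribute__((optimize("O0")))
int sum_to(int n)
{
    int total = 0;

    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

The GCC manual describes the optimize attribute as intended for debugging only, not for production code, which matches the use case here.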

c++ ~ shared object -> get host application offsets

I'm writing a shared library for a FreeBSD application.
This library gets loaded via LD_PRELOAD.
The application exists in multiple compiled versions, so some function offsets may change, and then my library won't work.
Now I want to read the offsets when the library is loaded.
Because the offsets change, I think my only option is to look them up by function name.
The offsets are simply the addresses of functions or labels.
Now the problem: how do I do it?
Example
In the first version, I call the main function like this:
int(*main)(int argc, char *argv[])=(int(*)(int,char*[]))0x081F3XXX;
but in the second, the offset has changed:
int(*main)(int argc, char *argv[])=(int(*)(int,char*[]))0x08233XXX;
Programmers (me) are lazy and don't want to compile their libs for every version. I want to create one lib that works for every version!
I simply need the offsets of the functions by function name; the rest is no problem.
This is how I load the library:
LD_PRELOAD="/path/to/library.so" ./executable
or
env LD_PRELOAD="/path/to/library.so" ./executable
Edit with test code
Here is my test code in response to the comments:
Main.cpp:
#include <stdio.h>
void test() {
printf("Test done.\n");
}
int main(int argc, char * argv[]) {
printf("Program started\n");
test();
}
lib.cpp
#include <stdio.h>
#include <dlfcn.h>
void __attribute__ ((constructor)) my_load(void);
void my_load(void) {
printf("Library loaded\n");
printf("test - offset: 0x%x\n",dlsym(NULL,"test"));
}
test.sh
g++ main.cpp -o program
g++ -shared lib.cpp -o lib.so
env LD_PRELOAD="lib.so" ./program
-> Result:
Library loaded
test - offset: 0x0
Program started
Test done.
That does not seem to work :s
Edit 15:45
printf("test - offset: 0x%x\n",dlsym(dlopen("/home/test/test_proc/program",RTLD_GLOBAL),"test"));
This also does not work.. Maybe dlsym is the wrong way?
I reproduced your program on Mac OS X using Clang, and found a solution. First, the boring parts:
To make it compile cleanly I had to change your %x format specifier to %p for the pointer.
Then, on Mac OS X I had to pass RTLD_MAIN_ONLY as the first argument to dlsym(). I guess this is platform-dependent; on Linux it does seem to be NULL as you have.
Now, the meat of the fix!
You're searching with dlsym() for a symbol called test. But there is no such symbol in your application. Why? Because you're using C++, and C++ does "name mangling." You could use any number of tools to figure out the mangled name and try to load that with dlsym(), but it could change with different compilers. So instead, just inhibit name mangling by enclosing your test() function in extern "C":
extern "C" {
void test() {
printf("Test done.\n");
}
}
This fixed it for me:
$ DYLD_INSERT_LIBRARIES=lib.so ./program
Library loaded
test - offset: 0x1027d1eb0
Program started
Test done.

context sharing in FreeGLUT under Linux with xorg

I am trying to use OpenGL with a shared context (to share textures between windows) via the FreeGLUT library. It works fine and I can share textures, but it fails at the end of the program or when a window is closed with the mouse.
I have created code which reproduces the problem: (http://pastie.org/9437038)
// file: main.c
// compile: gcc -o test main.c -lglut
// compile: gcc -o test -DTIME_LIMIT main.c -lglut
#include "GL/freeglut.h"
#include <unistd.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
int winA, winB, winC;
int n;
glutInit(&argc, argv);
glutSetOption(GLUT_ACTION_ON_WINDOW_CLOSE , GLUT_ACTION_CONTINUE_EXECUTION);
//glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);
glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
winA = glutCreateWindow("Test A");
glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);
winB = glutCreateWindow("Test B");
winC = glutCreateWindow("Test C");
printf("loop\n");
#ifdef TIME_LIMIT
for (n=0;n<50;n++)
{
glutMainLoopEvent();
usleep(5000);
}
#else //TIMELIMIT
glutMainLoop();
#endif // TIME_LIMIT
printf("Destroy winC\n");
glutDestroyWindow(winC);
printf("Destroy winB\n");
glutDestroyWindow(winB);
printf("Destroy winA\n");
glutDestroyWindow(winA);
printf("Normal end\n");
return 0;
}
Output:
loop
X Error of failed request: GLXBadContext
Major opcode of failed request: 153 (GLX)
Minor opcode of failed request: 4 (X_GLXDestroyContext)
Serial number of failed request: 113
Current serial number in output stream: 114
Segmentation fault
output with TIME_LIMIT:
loop
Destroy winC
Destroy winB
Destroy winA
Segmentation fault
Without calling glutSetOption(GLUT_RENDERING_CONTEXT, GLUT_USE_CURRENT_CONTEXT);, it works well.
Does anybody have an idea what I am doing wrong?
The option GLUT_USE_CURRENT_CONTEXT does not create shared contexts. It just means that the same GL context is used for all windows. You only have one GL context, and it is destroyed when you first destroy a window which uses it, so the other destruction calls fail. None of the GLUT implementations I'm aware of actually supports GL context sharing.
GLUT_USE_CURRENT_CONTEXT is more of a hack (it is not part of the GLUT specification anyway) and is not really well implemented. It could use reference counting so that the context is not destroyed before the last window using it is destroyed, but that is simply not the case.
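The reference counting mentioned above could look roughly like the following. This is a purely illustrative sketch, not FreeGLUT code: struct shared_ctx, ctx_acquire and ctx_release are hypothetical names, and the actual GL context destruction is reduced to setting a flag.

```c
#include <stddef.h>

/* Hypothetical stand-in for one GL context used by several windows. */
struct shared_ctx {
    int refs;       /* number of windows currently using the context */
    int destroyed;  /* set once the underlying context has been released */
};

/* Called when a window starts using the context. */
static void ctx_acquire(struct shared_ctx *ctx)
{
    ctx->refs++;
}

/* Called when a window is destroyed. Returns 1 if this call released the
 * context (it was the last user), 0 otherwise. */
static int ctx_release(struct shared_ctx *ctx)
{
    if (ctx->refs > 0 && --ctx->refs == 0) {
        ctx->destroyed = 1;  /* a real implementation would destroy the GL context here */
        return 1;
    }
    return 0;
}
```

With three windows sharing one context, only the third ctx_release() call would tear the context down, so destroying the windows in any order would no longer hit an already destroyed context.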

shm_open() fails with EINVAL when creating shared memory in subdirectory of /dev/shm

I have a GNU/Linux application which uses a number of shared memory objects. It could, potentially, be run a number of times on the same system. To keep things tidy, I first create a directory in /dev/shm for each set of shared memory objects.
The problem is that on newer GNU/Linux distributions, I no longer seem to be able to create these in a sub-directory of /dev/shm.
The following is a minimal C program which illustrates what I'm talking about:
/*****************************************************************************
* shm_minimal.c
*
* Test shm_open()
*
* Expect to create shared memory file in:
* /dev/shm/
* └── my_dir
*    └── shm_name
*
* NOTE: Only visible on filesystem during execution. I try to be nice, and
* clean up after myself.
*
* Compile with:
* $ gcc shm_minimal.c -o shm_minimal -lrt
*
******************************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
int main(int argc, const char* argv[]) {
int shm_fd = -1;
char* shm_dir = "/dev/shm/my_dir";
char* shm_file = "/my_dir/shm_name"; /* does NOT work */
//char* shm_file = "/my_dir_shm_name"; /* works */
// Create directory in /dev/shm
mkdir(shm_dir, 0777);
// make shared memory segment
shm_fd = shm_open(shm_file, O_RDWR | O_CREAT, 0600);
if (-1 == shm_fd) {
switch (errno) {
case EINVAL:
/* Confirmed on:
* kernel v3.14, GNU libc v2.19 (ArchLinux)
* kernel v3.13, GNU libc v2.19 (Ubuntu 14.04 Beta 2)
*/
perror("FAIL - EINVAL");
return 1;
default:
printf("Some other problem not being tested\n");
return 2;
}
} else {
/* Confirmed on:
* kernel v3.8, GNU libc v2.17 (Mint 15)
* kernel v3.2, GNU libc v2.15 (Xubuntu 12.04 LTS)
* kernel v3.1, GNU libc v2.13 (Debian 6.0)
* kernel v2.6.32, GNU libc v2.12 (RHEL 6.4)
*/
printf("Success !!!\n");
}
// clean up
close(shm_fd);
shm_unlink(shm_file);
rmdir(shm_dir);
return 0;
}
/* vi: set ts=2 sw=2 ai expandtab:
*/
When I run this program on a fairly new distribution, the call to shm_open() returns -1, and errno is set to EINVAL. However, when I run on something a little older, it creates the shared memory object in /dev/shm/my_dir as expected.
For the larger application, the solution is simple. I can use a common prefix instead of a directory.
If you could help enlighten me to this apparent change in behavior it would be very helpful. I suspect someone else out there might be trying to do something similar.
So it turns out the issue stems from how GNU libc validates the shared memory name. Specifically, the shared memory object MUST now be at the root of the shmfs mount point.
This was changed in glibc git commit b20de2c3d9 as the result of bug BZ #16274.
Specifically, the change is the line:
if (name[0] == '\0' || namelen > NAME_MAX || strchr (name, '/') != NULL)
Which now disallows '/' from anywhere in the filename (not counting leading '/')
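Given that check, the "common prefix instead of a directory" approach from the question amounts to flattening the name before calling shm_open(). A minimal sketch in C, assuming underscores are an acceptable replacement; flatten_shm_name is a hypothetical helper, not a glibc function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out`, keeping any leading '/'
 * and replacing every later '/' with '_', so the result passes the
 * "no slash after the leading one" check. Returns 0 on success, or -1
 * if `out` (of size `outlen`) is too small. */
static int flatten_shm_name(const char *name, char *out, size_t outlen)
{
    size_t len = strlen(name);
    size_t i = 0;

    if (len + 1 > outlen)
        return -1;

    /* leading slashes are allowed, copy them unchanged */
    while (name[i] == '/') {
        out[i] = '/';
        i++;
    }
    /* everything after that must not contain a slash */
    for (; i < len; i++)
        out[i] = (name[i] == '/') ? '_' : name[i];
    out[len] = '\0';
    return 0;
}
```

For the program in the question, "/my_dir/shm_name" becomes "/my_dir_shm_name", which is the variant its own comment marks as working. The same flattened name has to be passed to shm_unlink().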
If you have a third-party tool that was broken by this shm_open change, a brilliant coworker found a workaround: preload a library that overrides the shm_open call and swaps slashes for underscores. It does the same for shm_unlink as well, so the application can properly free shared memory when needed.
deslash_shm.cc :
#include <dlfcn.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <algorithm>
#include <string>
// function used in place of the standard shm_open() function
extern "C" int shm_open(const char *name, int oflag, mode_t mode)
{
// keep a function pointer to the real shm_open() function
static int (*real_open)(const char *, int, mode_t) = NULL;
// the first time in, ask the dynamic linker to find the real shm_open() function
if (!real_open) real_open = (int (*)(const char *, int, mode_t)) dlsym(RTLD_NEXT,"shm_open");
// take the name we were given and replace all slashes with underscores instead
std::string n = name;
std::replace(n.begin(), n.end(), '/', '_');
// call the real open function with the patched path name
return real_open(n.c_str(), oflag, mode);
}
// function used in place of the standard shm_unlink() function
extern "C" int shm_unlink(const char *name)
{
// keep a function pointer to the real shm_unlink() function
static int (*real_unlink)(const char *) = NULL;
// the first time in, ask the dynamic linker to find the real shm_unlink() function
if (!real_unlink) real_unlink = (int (*)(const char *)) dlsym(RTLD_NEXT, "shm_unlink");
// take the name we were given and replace all slashes with underscores instead
std::string n = name;
std::replace(n.begin(), n.end(), '/', '_');
// call the real unlink function with the patched path name
return real_unlink(n.c_str());
}
To compile this file:
c++ -fPIC -shared -o deslash_shm.so deslash_shm.cc -ldl
And preload it before starting a process that tries to use non-standard slash characters in shm_open:
in bash:
export LD_PRELOAD=/path/to/deslash_shm.so
in tcsh:
setenv LD_PRELOAD /path/to/deslash_shm.so

can a program read its own elf section?

I would like to use ld's --build-id option in order to add build information to my binary. However, I'm not sure how to make this information available inside the program. Assume I want to write a program that writes a backtrace every time an exception occurs, and a script that parses this information. The script reads the symbol table of the program and searches for the addresses printed in the backtrace (I'm forced to use such a script because the program is statically linked and backtrace_symbols is not working). In order for the script to work correctly I need to match build version of the program with the build version of the program which created the backtrace. How can I print the build version of the program (located in the .note.gnu.build-id elf section) from the program itself?
How can I print the build version of the program (located in the .note.gnu.build-id elf section) from the program itself?
You need to read the ElfW(Ehdr) (at the beginning of the file) to find program headers in your binary (.e_phoff and .e_phnum will tell you where program headers are, and how many of them to read).
You then read program headers, until you find PT_NOTE segment of your program. That segment will tell you offset to the beginning of all the notes in your binary.
You then need to read the ElfW(Nhdr) and skip the rest of the note (total size of the note is sizeof(Nhdr) + .n_namesz + .n_descsz, properly aligned), until you find a note with .n_type == NT_GNU_BUILD_ID.
Once you find NT_GNU_BUILD_ID note, skip past its .n_namesz, and read the .n_descsz bytes to read the actual build-id.
You can verify that you are reading the right data by comparing what you read with the output of readelf -n a.out.
P.S.
If you are going to go through the trouble to decode build-id as above, and if your executable is not stripped, it may be better for you to just decode and print symbol names instead (i.e. to replicate what backtrace_symbols does) -- it's actually easier to do than decoding ELF notes, because the symbol table contains fixed-sized entries.
Basically, this is the code I've written based on answer given to my question. In order to compile the code I had to make some changes and I hope it will work for as many types of platforms as possible. However, it was tested only on one build machine. One of the assumptions I used was that the program was built on the machine which runs it so no point in checking endianness compatibility between the program and the machine.
user#:~/$ uname -s -r -m -o
Linux 3.2.0-45-generic x86_64 GNU/Linux
user#:~/$ g++ test.cpp -o test
user#:~/$ readelf -n test | grep Build
Build ID: dc5c4682e0282e2bd8bc2d3b61cfe35826aa34fc
user#:~/$ ./test
Build ID: dc5c4682e0282e2bd8bc2d3b61cfe35826aa34fc
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#if __x86_64__
# define ElfW(type) Elf64_##type
#else
# define ElfW(type) Elf32_##type
#endif
/*
detecting build id of a program from its note section
http://stackoverflow.com/questions/17637745/can-a-program-read-its-own-elf-section
http://www.scs.stanford.edu/histar/src/pkg/uclibc/utils/readelf.c
http://www.sco.com/developers/gabi/2000-07-17/ch5.pheader.html#note_section
*/
int main (int argc, char* argv[])
{
char *thefilename = argv[0];
FILE *thefile;
struct stat statbuf;
ElfW(Ehdr) *ehdr = 0;
ElfW(Phdr) *phdr = 0;
ElfW(Nhdr) *nhdr = 0;
if (!(thefile = fopen(thefilename, "r"))) {
perror(thefilename);
exit(EXIT_FAILURE);
}
if (fstat(fileno(thefile), &statbuf) < 0) {
perror(thefilename);
exit(EXIT_FAILURE);
}
ehdr = (ElfW(Ehdr) *)mmap(0, statbuf.st_size,
PROT_READ|PROT_WRITE, MAP_PRIVATE, fileno(thefile), 0);
phdr = (ElfW(Phdr) *)(ehdr->e_phoff + (size_t)ehdr);
while (phdr->p_type != PT_NOTE)
{
++phdr;
}
nhdr = (ElfW(Nhdr) *)(phdr->p_offset + (size_t)ehdr);
while (nhdr->n_type != NT_GNU_BUILD_ID)
{
/* the name and descriptor of each note are padded to a 4-byte boundary */
nhdr = (ElfW(Nhdr) *)((size_t)nhdr + sizeof(ElfW(Nhdr)) + ((nhdr->n_namesz + 3) & ~(size_t)3) + ((nhdr->n_descsz + 3) & ~(size_t)3));
}
unsigned char * build_id = (unsigned char *)malloc(nhdr->n_descsz);
memcpy(build_id, (void *)((size_t)nhdr + sizeof(ElfW(Nhdr)) + ((nhdr->n_namesz + 3) & ~(size_t)3)), nhdr->n_descsz);
printf(" Build ID: ");
for (int i = 0 ; i < nhdr->n_descsz ; ++i)
{
printf("%02x",build_id[i]);
}
free(build_id);
printf("\n");
return 0;
}
Yes, a program can read its own .note.gnu.build-id. The important piece is the dl_iterate_phdr function.
I've used this technique in Mesa (the OpenGL/Vulkan implementation) to read its own build-id for use with the on-disk shader cache.
I've extracted those bits into a separate project[1] for easy use by others.
[1] https://github.com/mattst88/build-id
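For readers who want to see the shape of the dl_iterate_phdr approach without following the link, here is a minimal sketch. It is not the code from that project; struct build_id, build_id_cb and get_own_build_id are hypothetical names, and it assumes glibc on Linux, 4-byte note padding, and a binary linked with --build-id.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <string.h>

/* ELF note names and descriptors are padded to 4-byte boundaries. */
#define NOTE_ALIGN(x) (((size_t)(x) + 3u) & ~(size_t)3u)

struct build_id {
    const unsigned char *data;  /* points into the loaded image, do not free */
    size_t len;                 /* 0 if no build ID note was found */
};

/* Callback for dl_iterate_phdr(): scan the PT_NOTE segments of one object. */
static int build_id_cb(struct dl_phdr_info *info, size_t size, void *arg)
{
    struct build_id *out = arg;
    (void)size;

    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_NOTE)
            continue;

        const unsigned char *p = (const unsigned char *)(info->dlpi_addr + ph->p_vaddr);
        const unsigned char *end = p + ph->p_memsz;

        while ((size_t)(end - p) >= sizeof(ElfW(Nhdr))) {
            const ElfW(Nhdr) *nh = (const ElfW(Nhdr) *)p;
            const unsigned char *name = p + sizeof(*nh);
            size_t remaining = (size_t)(end - name);
            size_t name_sz = NOTE_ALIGN(nh->n_namesz);

            /* stop on a truncated or malformed note */
            if (name_sz > remaining || nh->n_descsz > remaining - name_sz)
                break;

            const unsigned char *desc = name + name_sz;
            if (nh->n_type == NT_GNU_BUILD_ID && nh->n_namesz == 4 &&
                memcmp(name, "GNU", 4) == 0) {
                out->data = desc;
                out->len = nh->n_descsz;
                return 1;
            }

            size_t step = NOTE_ALIGN(nh->n_descsz);
            if (step > (size_t)(end - desc))
                break;
            p = desc + step;
        }
    }
    return 1;  /* non-zero stops iteration: the first object visited is the main program */
}

/* Returns 0 and fills *out if the running program has a build ID, else -1. */
static int get_own_build_id(struct build_id *out)
{
    out->data = NULL;
    out->len = 0;
    dl_iterate_phdr(build_id_cb, out);
    return out->len > 0 ? 0 : -1;
}
```

A caller would print out->data as out->len hex bytes and compare the result with readelf -n, as suggested earlier. Because the notes are read from the already mapped image, no file I/O on argv[0] is needed.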
