I dlopen and dlclose two different shared objects a couple of times, and then the process blocks in dlopen.
It hangs in dlopen, prints nothing, CPU usage drops to 0%, and I cannot quit with Ctrl+C.
LOG_TRACE("attaching...");
handle = dlopen(plugin_path.c_str(), RTLD_LAZY);
LOG_DEBUG("dlopen called"); // never printed once the hang occurs, after dlopen had succeeded a couple of times
Then I attached gdb to the process:
(gdb) bt
#0 0x0000002a960dbe60 in tcmalloc::ThreadCache::InitTSD () at src/thread_cache.cc:321
#1 0x0000002a960d51bf in TCMallocGuard (this=Variable "this" is not available.) at src/tcmalloc.cc:908
#2 0x0000002a960d5e00 in global constructors keyed to _ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE () at src/tcmalloc.cc:935
#3 0x0000002a960fafc6 in __do_global_ctors_aux () at ./src/base/spinlock.h:54
#4 0x0000002a96010f13 in _init () from ../plugins/libmonitor.so
#5 0x0000002a00000000 in ?? ()
#6 0x000000302ad0acaf in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#7 0x000000302aff725c in dl_open_worker () from /lib64/tls/libc.so.6
#8 0x000000302ad0aa60 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9 0x000000302aff79fa in _dl_open () from /lib64/tls/libc.so.6
#10 0x000000302b201054 in dlopen_doit () from /lib64/libdl.so.2
#11 0x000000302ad0aa60 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x000000302b201552 in _dlerror_run () from /lib64/libdl.so.2
#13 0x000000302b201092 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#14 0x000000000041b559 in uap::meta::MetaManageServiceHandler::plugin_action (this=0xb26000, _return=@0x7fbffff500, plugin_name=@0x7fbffff4e0, plugin_path=@0x7fbffff570, t=Variable "t" is not available.)
at /usr/lib/gcc/x86_64-redhat-linux/3.4.5/../../../../include/c++/3.4.5/bits/basic_string.h:1456
#15 0x000000000041b0bc in uap::meta::MetaManageServiceHandler::plugin_action (this=0xb26000, _return=@0x7fbffff500, plugin_name=@0x7fbffff4e0, plugin_path=@0x7fbffff570, t=uap::meta::PluginActionType::RELOAD)
at server/service_handler.cpp:173
#16 0x0000000000417641 in uap::meta::test_Service_Handler_suite_test_case_manage_service_plugin_action_Test::TestBody (this=0xb16080) at test_load.cpp:73
#17 0x00000000004446c6 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0xb16080, method={__pfn = 0x21, __delta = 0}, location=0x537f30 "the test body")
at ../../../../com/btest/gtest/src/gtest.cc:2744
#18 0x000000000042dd1c in testing::Test::Run (this=0xb16080) at ../../../../com/btest/gtest/src/gtest.cc:2766
#19 0x000000000042e8b4 in testing::TestInfo::Run (this=0xb17160) at ../../../../com/btest/gtest/src/gtest.cc:2958
#20 0x000000000042f415 in testing::TestCase::Run (this=0xb23000, runtype=0) at ../../../../com/btest/gtest/src/gtest.cc:3160
#21 0x0000000000436352 in testing::internal::UnitTestImpl::RunAllTests (this=0xb22000) at ../../../../com/btest/gtest/src/gtest.cc:5938
#22 0x0000000000434299 in testing::UnitTest::Run (this=0x6f4220, run_type=0) at ../../../../com/btest/gtest/src/gtest.cc:5449
#23 0x0000000000434268 in testing::UnitTest::Run (this=0x6f4220) at ../../../../com/btest/gtest/src/gtest.cc:5387
#24 0x0000000000455404 in main (argc=1, argv=0x7fbffff8c8) at ../../../../com/btest/gtest/src/gtest_main.cc:38
Actually, I have redefined these four functions:
void __attribute__((constructor)) dlinit()
{
}
void __attribute__((destructor)) dldeinit()
{
}
void _init()
{
}
void _fini()
{
}
I think I have found the root cause. According to the gdb backtrace, the hang comes from tcmalloc. I read the related tcmalloc code, which takes a couple of locks. When I compile and link the shared object without tcmalloc, the hang no longer occurs, so this looks like a tcmalloc bug that shows up when it is used inside a shared object.
You should compile both your application and your plugin with gcc -Wall -g and use the debugger gdb (don't forget to compile the plugin sources also with -fPIC and to link its object files with -shared).
As you probably know, dlopen-ing a shared object will run the function having a constructor function attribute (and also, as dlopen(3) says, the obsolete _init function). Also, constructors of C++ static data have the constructor attribute.
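To see this load-time behaviour in isolation, here is a minimal sketch. All names in it (g_ctor_calls, on_load, Tracker, load_time_calls) are hypothetical and do not come from the question. It assumes GCC or Clang, since __attribute__((constructor)) is a compiler extension. Both hooks run before main() in an executable; in a plugin they would run inside dlopen(), which is why a constructor that blocks makes dlopen itself appear to hang.

```cpp
#include <cstdio>

namespace {

// Hypothetical counter. It is constant-initialized to zero, so it is valid
// before any load-time code runs.
int g_ctor_calls = 0;

// GCC/Clang extension: runs when the object containing it is loaded
// (before main() in an executable, inside dlopen() in a shared object).
__attribute__((constructor)) void on_load()
{
    ++g_ctor_calls;
}

// A static object with a non-trivial constructor is also run at load time.
struct Tracker {
    Tracker() { ++g_ctor_calls; }
};
Tracker g_tracker;

}  // namespace

// Reports how many load-time hooks have run so far.
int load_time_calls()
{
    return g_ctor_calls;
}
```

Calling load_time_calls() from main() should already report both hooks, without either of them having been called explicitly. The relative order of the two hooks is not something this sketch relies on.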
I guess that one of these constructors is blocked somehow (perhaps on input). You could also strace your program.
There might be some other reasons for such blocking, e.g. dlopen-ing an NFS mounted file from an unresponsive NFS server, etc...
See also rtld-audit(7), ld.so(8) and LD_DEBUG environment variable (try to set it to all). Also, run ldd on both the plugin and the program.
BTW, in your code the lack of terminating newline \n in your printf format strings is suspicious (and bad taste), and you should print dlerror() when dlopen fails. At least add a call to fflush(NULL); after your code. Try to code instead:
handle = dlopen(plugin_path.c_str(), RTLD_LAZY);
if(!handle) {
printf("dlopening %s failed %s\n", plugin_path.c_str(), dlerror());
} else {
printf("dlopen %s success\n", plugin_path.c_str());
}
fflush(NULL);
You may also have corrupted your heap (elsewhere in your program) to the point that dlopen (or your plugin) cannot work anymore. Use valgrind to hunt memory corruption bugs!
Related
I'm learning how the JVM works. Now I'm trying to figure out at what point rt.jar is loaded into the VM, and where in the code I can see it.
It depends on what you actually mean by 'rt.jar is loaded into the VM'. HotSpot does not load the entire rt.jar in memory. Instead, it looks for a corresponding JAR entry lazily whenever the bootstrap class loader tries to load a class. Sometimes the JVM does not even need to access jar to load a system class, e.g. when using a CDS archive. Also note, there is no longer rt.jar since JDK 9 - there are modular images instead.
A simple way to find when and where the JVM first opens rt.jar is to run Java under a debugger and set a breakpoint at ZIP_Open.
Breakpoint 1, 0x00007ffff5632880 in ZIP_Open () from /usr/java/jdk8u275/jre/lib/amd64/libzip.so
(gdb) bt
#0 0x00007ffff5632880 in ZIP_Open () from /usr/java/jdk8u275/jre/lib/amd64/libzip.so
#1 0x00007ffff67d65cb in ClassLoader::create_class_path_entry(char const*, stat const*, bool, bool, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#2 0x00007ffff67d6ba1 in LazyClassPathEntry::open_stream(char const*, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#3 0x00007ffff67d8b99 in ClassLoader::load_classfile(Symbol*, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#4 0x00007ffff6e32e9f in SystemDictionary::load_instance_class(Symbol*, Handle, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#5 0x00007ffff6e3397e in SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#6 0x00007ffff6e34f93 in SystemDictionary::initialize_wk_klasses_until(SystemDictionary::WKID, SystemDictionary::WKID&, Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#7 0x00007ffff6e35165 in SystemDictionary::initialize_preloaded_classes(Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#8 0x00007ffff6e355a8 in SystemDictionary::initialize(Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#9 0x00007ffff6e84928 in Universe::genesis(Thread*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#10 0x00007ffff6e8596c in universe2_init() () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#11 0x00007ffff69d5248 in init_globals() () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#12 0x00007ffff6e6a38d in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#13 0x00007ffff6aae50f in JNI_CreateJavaVM () from /usr/java/jdk8u275/jre/lib/amd64/server/libjvm.so
#14 0x00007ffff79aefa0 in JavaMain () from /usr/java/jdk8u275/bin/../lib/amd64/jli/libjli.so
#15 0x00007ffff7bc6e65 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ffff74d388d in clone () from /lib64/libc.so.6
Here we see the exact stack trace, where the JVM first opens rt.jar. This happens during the JVM bootstrap, when initializing the system dictionary, to preload a system class.
Now it's easy to find these functions in the source code.
classLoader.cpp is a good place to start from.
I have a static library. When I try to create an object of a class from that static library in servicemain, the service crashes on startup.
When I disable the calls to the static library classes, the service works fine.
I'm using the Poco library for the service handler. The crash dump shows only the Poco library's call stack and not a single trace of our static library, so I cannot find the root cause. The code works fine on Ubuntu 16 and 14.
Below is the stack trace.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007f799927f801 in __GI_abort () at abort.c:79
#2 0x00007f799e7cc755 in Poco::SignalHandler::handleSignal(int) ()
from /lib/libPocoFoundation.so.60
#3 <signal handler called>
#4 0x00007f799bb09b40 in std::string::clear() () from /lib/libstdc++.so.6
#5 0x0000563554610794 in Poco::Path::clear (this=0x7fffabfdd3e0) at src/Path.cpp:597
#6 0x00007f799e7bc5e1 in Poco::Path::parseUnix(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/libPocoFoundation.so.60
#7 0x00007f799e7bc889 in Poco::Path::assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/libPocoFoundation.so.60
#8 0x00007f799e7bc916 in Poco::Path::Path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /lib/libPocoFoundation.so.60
#9 0x00007f799e483e56 in Poco::Util::Application::getApplicationPath(Poco::Path&) const ()
from /lib/libPocoUtil.so.60
#10 0x00007f799e48565c in Poco::Util::Application::init() () from /lib/libPocoUtil.so.60
#11 0x00007f799e49708c in Poco::Util::ServerApplication::run(int, char**) ()
from /lib/libPocoUtil.so.60
#12 0x0000563554107544 in main (argc=2, argv=0x7fffabfdd958) at ../../../src/servicemain/main.cpp:22
(gdb)
Please suggest.
not a single trace of our static library
Your stack trace is suspicious: why does (almost) every Poco frame come from the shared libraries /lib/libPocoFoundation.so.60 and /lib/libPocoUtil.so.60, while frame #5 comes from src/Path.cpp:597 compiled into the main executable?
I suspect that your static library for some reason provides a definition of Poco::Path::clear(), and that definition:
Is incompatible with the one in libPocoFoundation.so.60 (where the neighbouring Poco::Path frames come from), and
Wins over the definition provided in that shared library.
You'll need to figure out how src/Path.cpp ended up in your binary, and removing that will fix your crash.
Update:
My static library references Boost's Path.hpp. Could that be the reason?
There could be any number of reasons. You need to investigate why src/Path.cpp is linked into the main executable (is it your code, or is it part of Poco?), and why it defines the Poco::Path::clear() method.
For a start, edit your question and show code around line 597 in src/Path.cpp.
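One possible way to check at run time which loaded object a given function address belongs to is dladdr(). This is only a sketch and not something the answer above proposes. The helper names (providing_object, probe_target, object_of_probe_target) are hypothetical, and the code assumes glibc or another libc that provides dladdr(). On older glibc versions it may need to be linked with -ldl.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info (g++ defines _GNU_SOURCE by default)
#include <string>

// Hypothetical helper: returns the path of the loaded object (executable or
// shared library) whose mapping contains addr, or "" if none is found.
std::string providing_object(const void* addr)
{
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr)
        return std::string();
    return info.dli_fname;
}

// Example target: a function defined in this translation unit, so the
// reported object is whatever binary this file is linked into.
void probe_target() {}

std::string object_of_probe_target()
{
    // Converting a function pointer to an object pointer is conditionally
    // supported in C++; GCC and Clang accept it on POSIX systems.
    return providing_object(reinterpret_cast<const void*>(&probe_target));
}
```

If the path reported for a function you expect to live in a shared library turns out to be the main executable, that is consistent with a second definition having been linked into the binary.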
The program crashes before main. Here's the gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3732abb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /opt/example/libstdc++.so.6
Missing separate debuginfos, use: debuginfo-install LynxService-0.10-10.x86_64 LynxService-0.6-6.x86_64 glibc-2.17-55.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.11.3-49.el7.x86_64 libcom_err-1.42.9-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 pcre-8.32-12.el7.x86_64 xz-libs-5.1.2-8alpha.el7.x86_64 zlib-1.2.7-13.el7.x86_64
(gdb) bt
#0 0x00007ffff3732abb in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) () from /opt/example/libstdc++.so.6
#1 0x00007ffff4214f15 in ?? () from /opt/example/libNetwork.so
#2 0x00007ffff4215ef4 in ?? () from /opt/example/libNetwork.so
#3 0x00007ffff7deb503 in ?? () from /opt/example/ld-linux-x86-64.so.2
#4 0x00007ffff7ddd45a in ?? () from /opt/example/ld-linux-x86-64.so.2
#5 0x0000000000000001 in ?? ()
#6 0x00007fffffffe139 in ?? ()
#7 0x0000000000000000 in ?? ()
Across multiple gdb runs, the addresses of the stack frames (e.g. 0x00007ffff4215ef4) remain the same.
The output of objdump -TC libNetwork.so does not contain the text 0x00007ffff4215ef4.
The program links against many dependencies, and it is very hard to remove the libraries one by one and retest.
Since the stack frame addresses are always the same, it looks like this should be debuggable, but I do not know how.
I am trying to make our program runnable on some old Linux versions. One common import that prevents it is __longjmp_chk, added in glibc 2.11 but missing in older ones. One "solution" is to use -D_FORTIFY_SOURCE=0 but this turns off other fortify functions (__printf_chk etc) which are present in the target libc. Is there a way to make __longjmp_chk a "weak import" which would use the function from libc.so.6 if present, and fall back to local stub if not?
Is there a way to make __longjmp_chk a "weak import" which would use
the function from libc.so.6 if present, and fall back to local stub
if not?
I'd say yes, using dlsym() to check for __longjmp_chk and acting accordingly:
/* Build with: cc file.c -ldl */
#define _GNU_SOURCE
#include <setjmp.h>
#include <stdio.h>
#include <dlfcn.h>

void __longjmp_chk(jmp_buf env, int val)
{
    /* Look up the next definition of the symbol after this one, e.g. libc's. */
    void (*p)(jmp_buf, int) = dlsym(RTLD_NEXT, "__longjmp_chk");
    if (p) {
        printf("use the function from libc\n");
        p(env, val);
    } else {
        printf("falling back to local stub\n");
        /* local stub - whatever that may be */
    }
}

int main(void)
{   /* try it */
    jmp_buf env;
    while (!setjmp(env)) __longjmp_chk(env, 1);
    return 0;
}
I am trying to make our program runnable on some old Linux versions.
There are only a few ways to make this work, and most of them are enumerated here.
Is there a way to make __longjmp_chk a "weak import".
No.
I have a third-party library for which I wrote bindings and which I built for archiving using Xcode. I use it in my C# Xamarin app. However, I hit a native crash that I have no way of debugging through Xamarin Studio. I tried attaching gdb to the process, but I get the following warnings:
warning: Could not find object file "/var/folders/mf/w59_1t797k3cfrp7hdmncvt40000gn/T/tmp42fc77da.tmp/libCouchCocoa.a(CouchEmbeddedServer.o)" - no debug information available for "CouchEmbeddedServer.m".
warning: Could not find object file "/var/folders/mf/w59_1t797k3cfrp7hdmncvt40000gn/T/tmp42fc77da.tmp/libCouchCocoa.a(CouchTouchDBDatabase.o)" - no debug information available for "CouchTouchDBDatabase.m".
[...]
Then, when the SIGSEGV occurs, I run the bt command and get no information about what happened in the library. I presume this is related to the warnings.
(gdb) continue
Continuing.
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000008
[Switching to process 98604 thread 0x28403]
0x0438509b in objc_msgSend ()
(gdb) bt
#0 0x0438509b in objc_msgSend ()
#1 0x112924f0 in ?? ()
#2 0x1714fdb0 in ?? ()
#3 0x17555a9c in ?? ()
#4 0x175557f6 in ?? ()
#5 0x17555200 in ?? ()
#6 0x17554c48 in ?? ()
#7 0x17554b4c in ?? ()
#8 0x17554af0 in ?? ()
#9 0x17554aac in ?? ()
#10 0x1718fb1c in ?? ()
#11 0x1718f6dc in ?? ()
#12 0x1718f5d8 in ?? ()
#13 0x0b6c0c8e in ?? ()
#14 0x000a3172 in mono_jit_runtime_invoke (method=0xca60dac, obj=0x10ec7490, params=0xb0974eec, exc=0xb0974ef4) at mini.c:5804
#15 0x0020840e in mono_runtime_invoke (method=0xca60dac, obj=0x10ec7490, params=0xb0974eec, exc=0xb0974ef4) at object.c:2790
#16 0x0020857c in mono_runtime_delegate_invoke (delegate=0x10ec7490, params=0xb0974eec, exc=0xb0974ef4) at object.c:3462
#17 0x002629b4 in mono_async_invoke [inlined] () at :626
#18 0x002629b4 in async_invoke_thread (data=0xc71f870) at threadpool.c:1443
#19 0x00268756 in start_wrapper_internal [inlined] () at :784
#20 0x00268756 in start_wrapper (data=0x1128e680) at threads.c:832
#21 0x0029a69a in thread_start_routine (args=0xfa46204) at wthreads.c:287
#22 0x00245540 in gc_start_thread (arg=0x112922a0) at sgen-gc.c:6280
#23 0x98a89ed9 in _pthread_start ()
#24 0x98a8d6de in thread_start ()
(gdb)
How should I build my third-party libraries so that gdb can find their debug information?
EDIT: Using p mono_pmip, I managed to resolve the method name, but is there a way to have the debug symbols available so that I do not have to do this?
You might have more luck if you use gdb on device.
This can be done using fruitstrap (note that fruitstrap is not officially supported by Xamarin - all I can say is that I've been able to use it myself occasionally).
The reason it's harder in the simulator is because we use a JIT there - this means that the mapping between memory addresses and function names / line numbers is only present in-memory, which gdb doesn't understand. When building for device we AOT everything into ARM assembly and we create proper debug information that gdb understands.