ncurses-rs segfaults when using get_wch() - rust

I'm writing a shell history app that uses the ncurses-rs crate, and I ran into an issue with deleting entries from history: it simply wasn't doable without native readline bindings. My program reads from a pipe (like: history | hstr-rs), which makes it impossible to manipulate history files by hand. Hence, I introduced another [unmaintained] crate called rl-sys and made a small change to it (wrapped a single statement in an unsafe block) because it wouldn't compile otherwise. With that change the app compiles, but it now segfaults on start. I reached for gdb to see what's happening, and this is what I was able to find out:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f47821 in doupdate_sp () from /lib/x86_64-linux-gnu/libncursesw.so.6
(gdb) bt
#0 0x00007ffff7f47821 in doupdate_sp () from /lib/x86_64-linux-gnu/libncursesw.so.6
#1 0x00007ffff7f37db8 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.6
#2 0x00007ffff7f307da in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.6
#3 0x00007ffff7f49c64 in wget_wch () from /lib/x86_64-linux-gnu/libncursesw.so.6
#4 0x000055555561a604 in ncurses::get_wch ()
#5 0x00005555555935f6 in hstr_rs::main ()
#6 0x0000555555586c73 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#7 0x0000555555586c8c in std::rt::lang_start::{{closure}} ()
#8 0x0000555555631ae7 in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once () at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259
#9 std::panicking::try::do_call () at library/std/src/panicking.rs:381
#10 std::panicking::try () at library/std/src/panicking.rs:345
#11 std::panic::catch_unwind () at library/std/src/panic.rs:382
#12 std::rt::lang_start_internal () at library/std/src/rt.rs:51
#13 0x0000555555595cf2 in main ()
(gdb)
Clearly, it segfaults on get_wch(). Just to make sure that's the cause of the issue, I tried using ordinary getch() and the segfault was gone. However, I need the wide version of the function.
I've talked to some people who advised me to ditch the ncurses-rs crate in favor of some other crate because ncurses-rs is "insanely unsafe", despite its functions not being marked unsafe. Easier said than done. Before ditching it and rewriting the app against a safer alternative, I want to understand what's going on and, if possible, fix it, because a rewrite would require a significant time investment that I cannot afford at the moment.
I'm still inexperienced, but the Valgrind log I obtained does not look good to me.
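For reference, the call sequence that the underlying C library (libncursesw, which ncurses-rs wraps) expects before get_wch() looks roughly like this. This is a minimal C sketch, not my actual code, and it assumes the usual Debian libncursesw headers:

#define _XOPEN_SOURCE_EXTENDED 1  /* expose the wide-character curses API */
#include <locale.h>
#include <wchar.h>
#include <ncursesw/curses.h>      /* assumes libncursesw-dev is installed */

int main(void)
{
    setlocale(LC_ALL, "");        /* required for wide-character input */
    initscr();                    /* sets up the SCREEN that get_wch() dereferences */
    keypad(stdscr, TRUE);
    noecho();

    wint_t ch;
    int rc = get_wch(&ch);        /* only safe after initscr() has run */
    (void)rc;

    endwin();                     /* restore the terminal on exit */
    return 0;
}

I also wonder whether the pipe matters here: with history | hstr-rs, stdin is not a terminal, and as far as I understand initscr() takes its input from stdin unless the program reopens /dev/tty (e.g. via newterm()).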

Related

How to access data written in a socket without reading it

I'm working on an embedded system running Linux.
A client thread writes some data to a socket, but what the server thread reads on the other side isn't the same as what was written, and this causes the thread (and the parent process) to crash.
I'm new to networking and Linux.
I have dumped every piece of data that is being written; it all looks fine.
The function trace in gdb shows the following information.
(gdb)
#0 0x00007f62be8e8670 in getenv () from /lib/libc.so.6
#1 0x00007f62be92057a in __libc_message () from /lib/libc.so.6
#2 0x00007f62be99f927 in __fortify_fail () from /lib/libc.so.6
#3 0x00007f62be99f8f0 in __stack_chk_fail () from /lib/libc.so.6
#4 0x0000000000406471 in reading (sockFd=15) at __line_number_in_the_program__
#5 0x793bcf318b18bb01 in ?? ()
#6 0x117d0300942ff567 in ?? ()
#7 0x0000000100000000 in ?? ()
..
..
..
It goes on until #785 with some [random] addresses.
reading() is the function that processes the read data in the server thread.
I suspect there is something going wrong inside the socket.
Is there any way to see the data that is in the sockets' (client/server) buffers without reading it?
Or any other way to debug it further with gdb?
There are already some checks to handle the read data properly, but those are not helping either.
You have a stack buffer overflow problem: the __stack_chk_fail frame in your backtrace means the GCC stack protector detected that a buffer on reading()'s stack was overrun. If you have never heard of the stack protector, now is the time to look it up. While Wireshark is the obvious tool for looking at data in flight, the network is not the locus of your problem: your server should be proof against any and all malicious data read from the network. That is basic good server design and implementation practice.
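To make that concrete, here is a hypothetical sketch (invented code, not your server) of how a reading()-style function trips the stack protector:

/* A fixed-size stack buffer that trusts a length read from the network.
   Built with -fstack-protector (many toolchains enable it by default),
   an oversized or corrupted length kills the process via
   __stack_chk_fail -> __fortify_fail, just like the backtrace above. */
#include <stdint.h>
#include <unistd.h>

void reading(int sockFd)
{
    char buf[64];
    uint32_t len;

    /* read an untrusted length field sent by the peer */
    if (read(sockFd, &len, sizeof len) != (ssize_t)sizeof len)
        return;

    /* BUG: len is never checked against sizeof buf, so a large value
       overruns buf and smashes the stack canary */
    read(sockFd, buf, len);
}

The fix is to validate len (and every other field) before using it, e.g. reject anything larger than sizeof buf.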

Backtrace for exited thread.

I noticed that one thread's backtrace looks like:
Thread 8 (Thread 0x7f385f185700 (LWP 12861)):
#0 0x00007f38655a3cf4 in __mcount_internal (frompc=4287758, selfpc=4287663) at mcount.c:72
#1 0x00007f38655a4ac4 in mcount () at ../sysdeps/x86_64/_mcount.S:47
#2 0x0000000000000005 in ?? ()
#3 0x00007f382c02ece0 in ?? ()
#4 0x000000000000002d in ?? ()
#5 0x000000000000ffff in ?? ()
#6 0x0000000000000005 in ?? ()
#7 0x0000000000000005 in ?? ()
#8 0x0000000000000000 in ?? ()
It seems to be an exited thread, but I am not sure.
I would like to know how to interpret it. In particular, I don't understand what LWP means, and what Thread 0x7f385f185700 is (what is that address?).
Moreover, I noticed that the profiler indicates that __mcount_internal takes a relatively large amount of time. What is it, and why is it time-consuming? In particular, what are the frompc and selfpc counters?
My kernel is Linux 4.4.0 and the threads are POSIX-compatible (C++11 implementation).
LWP = Light Weight Process, i.e. a thread. Linux threads each have their own thread ID, drawn from the same sequence as PID numbers, even though they're not separate processes. If you look in /proc/PID/task for a multi-threaded process, you will see an entry for each thread ID.
0x7f385f185700 is the pthread ID, as returned by pthread_self(3). In glibc, a pthread_t value is essentially a pointer to the thread's internal descriptor.
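If it helps to see the two IDs side by side, here is a small C sketch (Linux/glibc assumed; the kernel thread ID is what gdb labels LWP):

/* Prints the kernel TID (gdb's "LWP nnnn") and the pthread ID
   (gdb's "Thread 0x...") for a newly created thread. */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static void *show_ids(void *arg)
{
    (void)arg;
    printf("LWP (gettid)   = %ld\n", (long)syscall(SYS_gettid));
    printf("pthread_self() = %#lx\n", (unsigned long)pthread_self());
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, show_ids, NULL);
    pthread_join(t, NULL);
    return 0;
}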
This thread is stopped at RIP = 0x00007f38655a3cf4, the address in frame #0.
frompc and selfpc are function arguments to the __mcount_internal() glibc function.
Your backtrace can show names and args for them because you have debug symbols installed for glibc. You just get ?? for the parent functions because you don't have debug info installed for the program or library containing them. (Compile your own program with -g, and install packages like qtbase5-dbg or libglib2.0-0-dbg to get debug symbols for libraries packaged by your distro.)
mcount is related to profiling (i.e. code compiled with gcc -pg). That would explain why it takes program-counter values as args.
Why do applications compiled by GCC always contain the _mcount symbol?
That thread has not exited. You wouldn't see as many details if it had. (And probably wouldn't see it at all.)

How do I view the crash reason in a core dump?

I'm trying to analyze the core dump of one of my applications, but I'm not able to find the reason for the crash.
When I run gdb <binary> <corefile>, I see the following output:
Program terminated with signal SIGKILL, Killed.
#0 0xfedcdf74 in _so_accept () from /usr/lib/libc.so.1
(gdb)
But I am pretty sure that no one executed kill -9 <pid>. With info threads, I can see all the threads launched by the application, but nothing looks special about any of them.
Running bt full or maint info sol-threads doesn't turn up anything that leads to the bug either; I just see the stack trace for each thread, without any information about the cause.
Finally, I found the thread that causes the kill signal:
#0 0xfedcebd4 in _lwp_kill () from /usr/lib/libc.so.1
#1 0xfed67bb8 in raise () from /usr/lib/libc.so.1
#2 0xfed429f8 in abort () from /usr/lib/libc.so.1
#3 0xff0684a8 in __cxxabiv1::__terminate(void (*)()) () from /usr/local/lib/libstdc++.so.5
#4 0xff0684f4 in std::terminate() () from /usr/local/lib/libstdc++.so.5
#5 0xff068dd8 in __cxa_pure_virtual () from /usr/local/lib/libstdc++.so.5
#6 0x00017f40 in A::method (this=0x2538d8) at A.cc:351
Class A inherits from an abstract class, and at line 351 a virtual function declared in the abstract class and defined in A is called. I don't understand why, if the A object exists, the call to the virtual base function crashes.
That SIGKILL could be caused by your app exceeding some resource limit. Try to get the system log and see if there are any resource limit exceeded messages.
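If you'd rather check from inside the process than dig through the system log, getrlimit(2) will show the current limits. A rough sketch (which limits actually matter depends on what the process exhausts):

/* Print a few resource limits for the current process; exceeding some
   of them (CPU time, address space) can end with the process being
   terminated. Extend the list as needed for your platform. */
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) == 0)
        printf("%-13s soft=%llu hard=%llu\n", name,
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
}

int main(void)
{
    show("RLIMIT_AS",     RLIMIT_AS);      /* address space      */
    show("RLIMIT_DATA",   RLIMIT_DATA);    /* data segment       */
    show("RLIMIT_NOFILE", RLIMIT_NOFILE);  /* open files         */
    show("RLIMIT_CPU",    RLIMIT_CPU);     /* CPU time (seconds) */
    return 0;
}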
References
Solaris Administration Guide: Resource Controls

How to know who started a thread

I am trying to debug a crash in gdb; the core was dumped on this thread. There are 40+ other threads going at the same time. How do I figure out where this thread 42 was started from?
Also, why is the last line (frame #0) not showing up?
Thread 42 (Thread 0x2aaba65ce940 (LWP 15854)):
#0 0x0000003a95605b03 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#1 0x0000003a9560684b in start_thread () from /lib64/libpthread.so.0
#2 0x0000003a946d526d in clone () from /lib64/libc.so.6
#3 0x0000000000000000 in ?? ()
I am using gdb version 7.7
How do I figure out where this thread 42 is started from?
You can't: neither GDB nor the OS keeps track of "who started this thread". (It is also often quite useless to know where a particular thread was created.)
What you could do is either put instrumentation into your own calls to pthread_create and log "thread X created thread Y" (a sketch follows), or use catch syscall clone and print creation stack traces in GDB, then match them to the crashed thread later (match its LWP to the return value of clone earlier).
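For the instrumentation route, here is a hypothetical helper (the names are mine) that wraps pthread_create and logs the creator's and the new thread's LWPs, i.e. the numbers gdb prints next to each thread:

/* Drop-in replacement for pthread_create(): call log_pthread_create()
   (or hide it behind a macro) and you get a "creator LWP -> created LWP"
   trail to match against the LWP of the crashing thread in the core. */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

struct start_args {
    void *(*fn)(void *);
    void *arg;
    long  creator_lwp;   /* LWP of the thread that called create */
};

static void *logging_trampoline(void *p)
{
    struct start_args a = *(struct start_args *)p;
    free(p);
    fprintf(stderr, "LWP %ld created LWP %ld\n",
            a.creator_lwp, (long)syscall(SYS_gettid));
    return a.fn(a.arg);
}

int log_pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start)(void *), void *arg)
{
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return EAGAIN;
    a->fn = start;
    a->arg = arg;
    a->creator_lwp = (long)syscall(SYS_gettid);
    int rc = pthread_create(thread, attr, logging_trampoline, a);
    if (rc != 0)
        free(a);
    return rc;
}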
Also, why the last line (frame #0) is not showing up?
You mean frame #3. It doesn't exist: clone is where the thread is born (comes into existence).
P.S. Installing libc debug symbols so you can see where inside __nptl_deallocate_tsd the thread crashed is more likely to provide clues than knowing thread creation details.

Core Data + NSOperationQueue

I have a subclass of NSOperation which has a managed object as one of its properties.
I need to add multiple operations to an NSOperationQueue and observe when they finish.
For each NSOperation instance I created a new managed object context, as the Apple documentation states: "Create a separate managed object context for each thread and share a single persistent store coordinator."
Once the first operation finishes, I get the following crash log:
#0 0x34970c98 in objc_msgSend ()
#1 0x3608704e in -[_PFArray dealloc] ()
#2 0x36084b80 in -[_PFArray release] ()
#3 0x3179b1a0 in CFRelease ()
#4 0x3179deba in _CFAutoreleasePoolPop ()
#5 0x30d7bbb4 in NSPopAutoreleasePool ()
#6 0x30d91e1c in -[__NSOperationInternal start] ()
#7 0x30d91a7e in -[NSOperation start] ()
#8 0x30df7eca in ____startOperations_block_invoke_2 ()
#9 0x33a248e6 in _dispatch_call_block_and_release ()
#10 0x33a1a532 in _dispatch_worker_thread2 ()
#11 0x368bf590 in _pthread_wqthread ()
#12 0x368bfbc4 in start_wqthread ()
From the logs it seems some object is getting over-released. How can I find out which object is over-released?
The app is run with NSZombieEnabled, but only the above info is received.
Does NSOperation maintain its own autorelease pool?
Here are your clues:
#7 0x30d91a7e in -[NSOperation start] ()
It is something you are releasing in your operation.
#5 0x30d7bbb4 in NSPopAutoreleasePool ()
It is an object that is autoreleased. That does not necessarily mean you have written the autorelease method call. Objects that are created by convenience methods like [NSString stringWithFormat:...] are autoreleased before they are returned. So look for either a place in your operation code where you are calling autorelease or a place where you are creating an object without the alloc-init pattern.
Yes, an NSOperation maintains its own autorelease pool. You shouldn't have to worry about that. If you only release objects when you are done with them, and only autorelease objects when you will be done with them by the end of the method scope (or calling method scope if returning them) you should be fine.
#2 0x36084b80 in -[_PFArray release] ()
It is an object stored in an array (not the array itself) that is being overreleased.
An error of that kind means you are either releasing or autoreleasing an object when you shouldn't, or you are not retaining an object when you should. It could be either an incorrect release or an incorrect autorelease, even though the error occurs with the autorelease pool. The autorelease could be correct and the release could be incorrect. Either way the error will happen when the autorelease pool is drained because that happens later.
