I have a subclass of NSOperation that has a managed object as one of its properties. I need to add multiple operations to an NSOperationQueue and observe when they finish.
For each NSOperation instance I created a new managed object context, since the Apple documentation states: "Create a separate managed object context for each thread and share a single persistent store coordinator."
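The setup is roughly like the sketch below (simplified; the class and property names are placeholders rather than my real code, and the initializer and dealloc that retain/release the ivars are omitted):
#import <CoreData/CoreData.h>

@interface MyOperation : NSOperation {
    NSPersistentStoreCoordinator *coordinator; // shared coordinator, set at init
    NSManagedObjectID *objectID;               // which object this operation works on
}
@end

@implementation MyOperation

- (void)main {
    // Build a fresh context on the shared persistent store coordinator;
    // managed objects belonging to another thread's context are never touched here.
    NSManagedObjectContext *context = [[NSManagedObjectContext alloc] init];
    [context setPersistentStoreCoordinator:coordinator];

    // Re-fetch the object in this context by its ID.
    NSManagedObject *object = [context objectWithID:objectID];

    // ... work with object, save, etc. ...

    [context release]; // manual reference counting, as elsewhere in this question
}

@end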
Once the first operation finishes, I get the following crash log:
#0 0x34970c98 in objc_msgSend ()
#1 0x3608704e in -[_PFArray dealloc] ()
#2 0x36084b80 in -[_PFArray release] ()
#3 0x3179b1a0 in CFRelease ()
#4 0x3179deba in _CFAutoreleasePoolPop ()
#5 0x30d7bbb4 in NSPopAutoreleasePool ()
#6 0x30d91e1c in -[__NSOperationInternal start] ()
#7 0x30d91a7e in -[NSOperation start] ()
#8 0x30df7eca in ____startOperations_block_invoke_2 ()
#9 0x33a248e6 in _dispatch_call_block_and_release ()
#10 0x33a1a532 in _dispatch_worker_thread2 ()
#11 0x368bf590 in _pthread_wqthread ()
#12 0x368bfbc4 in start_wqthread ()
From the logs it seems that some object is being over-released. How can I find out which object is over-released?
The app was run with NSZombieEnabled, but only the information above was produced.
Does an NSOperation maintain its own autorelease pool?
Here are your clues:
#7 0x30d91a7e in -[NSOperation start] ()
It is something you are releasing in your operation.
#5 0x30d7bbb4 in NSPopAutoreleasePool ()
It is an object that is autoreleased. That does not necessarily mean you have written the autorelease method call. Objects that are created by convenience methods like [NSString stringWithFormat:...] are autoreleased before they are returned. So look for either a place in your operation code where you are calling autorelease or a place where you are creating an object without the alloc-init pattern.
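For example, under manual reference counting (build without ARC) the difference looks like this; it is a small illustrative fragment, not code from your operation:
#import <Foundation/Foundation.h>

void example(void) {
    // Returned autoreleased: you do not own it and must not release it.
    NSString *temp = [NSString stringWithFormat:@"operation %d", 1];

    // Created with alloc/init: you own it and must balance it with a release.
    NSString *owned = [[NSString alloc] initWithFormat:@"operation %d", 1];
    [owned release];    // correct: balances the alloc/init above

    // [temp release]; // WRONG: would over-release temp; the crash would not
    //                 // happen here, but later, when the autorelease pool pops.
    (void)temp;
}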
Yes, an NSOperation maintains its own autorelease pool. You shouldn't have to worry about that. If you only release objects when you are done with them, and only autorelease objects when you will be done with them by the end of the method scope (or calling method scope if returning them) you should be fine.
#2 0x36084b80 in -[_PFArray release] ()
It is an object stored in an array (not the array itself) that is being overreleased.
An error of that kind means you are either releasing or autoreleasing an object when you shouldn't, or you are not retaining an object when you should. It could be either an incorrect release or an incorrect autorelease, even though the error occurs with the autorelease pool. The autorelease could be correct and the release could be incorrect. Either way the error will happen when the autorelease pool is drained because that happens later.
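Here is a contrived, self-contained demonstration of that pattern (manual reference counting, so build with -fno-objc-arc); it is not your code, just the same mistake in miniature:
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // An autoreleased container, like the fetch-result array in your trace.
        NSMutableArray *results = [NSMutableArray arrayWithCapacity:1];

        NSObject *item = [[[NSObject alloc] init] autorelease];
        [results addObject:item];   // the array retains item

        [item release];             // WRONG: releases ownership we never had
        // Nothing crashes here...
    }
    // ...the crash comes when the pool pops: between the pending autoreleases
    // and the array's dealloc, item receives one release too many, and that
    // extra release hits a freed object, just like the trace above.
    return 0;
}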
I'm writing a shell history app that uses the ncurses-rs crate, and I came across an issue with deleting entries from history: simply put, it was not doable without native readline bindings. My program reads from a pipe (like history | hstr-rs), which makes it impossible to manipulate history files by hand. Hence, I introduced another [unmaintained] crate called rl-sys and made a small change to it (wrapped a single statement in an unsafe block) because it wouldn't compile otherwise. With this change the app compiles, but it now segfaults on start. I reached for gdb to see what's happening, and this is what I was able to find out:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f47821 in doupdate_sp () from /lib/x86_64-linux-gnu/libncursesw.so.6
(gdb) bt
#0 0x00007ffff7f47821 in doupdate_sp () from /lib/x86_64-linux-gnu/libncursesw.so.6
#1 0x00007ffff7f37db8 in wrefresh () from /lib/x86_64-linux-gnu/libncursesw.so.6
#2 0x00007ffff7f307da in ?? () from /lib/x86_64-linux-gnu/libncursesw.so.6
#3 0x00007ffff7f49c64 in wget_wch () from /lib/x86_64-linux-gnu/libncursesw.so.6
#4 0x000055555561a604 in ncurses::get_wch ()
#5 0x00005555555935f6 in hstr_rs::main ()
#6 0x0000555555586c73 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#7 0x0000555555586c8c in std::rt::lang_start::{{closure}} ()
#8 0x0000555555631ae7 in core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once () at /rustc/7eac88abb2e57e752f3302f02be5f3ce3d7adfb4/library/core/src/ops/function.rs:259
#9 std::panicking::try::do_call () at library/std/src/panicking.rs:381
#10 std::panicking::try () at library/std/src/panicking.rs:345
#11 std::panic::catch_unwind () at library/std/src/panic.rs:382
#12 std::rt::lang_start_internal () at library/std/src/rt.rs:51
#13 0x0000555555595cf2 in main ()
(gdb)
Clearly, it segfaults on get_wch(). Just to make sure that's the cause of the issue, I tried using ordinary getch() and the segfault was gone. However, I need the wide version of the function.
I've talked to some people who advised me to ditch the ncurses-rs crate in favor of some other crate because ncurses-rs is "insanely unsafe", despite its functions not being marked unsafe. Easier said than done. Before ditching it and rewriting the app to use a safer alternative, I want to understand what's going on and, if possible, fix the issue, because a rewrite would require a significant time investment that I cannot afford at the moment.
I'm still inexperienced but a Valgrind log I obtained does not look good to me.
I noticed that one thread's backtrace looks like:
Thread 8 (Thread 0x7f385f185700 (LWP 12861)):
#0 0x00007f38655a3cf4 in __mcount_internal (frompc=4287758, selfpc=4287663) at mcount.c:72
#1 0x00007f38655a4ac4 in mcount () at ../sysdeps/x86_64/_mcount.S:47
#2 0x0000000000000005 in ?? ()
#3 0x00007f382c02ece0 in ?? ()
#4 0x000000000000002d in ?? ()
#5 0x000000000000ffff in ?? ()
#6 0x0000000000000005 in ?? ()
#7 0x0000000000000005 in ?? ()
#8 0x0000000000000000 in ?? ()
It seems to be an exited thread, but I am not sure.
I would like to know how to interpret this backtrace. In particular, I don't understand what LWP means, or what Thread 0x7f385f185700 is (what is that address?).
Moreover, I noticed that the profiler indicates that __mcount_internal takes a relatively large amount of time. What is it, and why is it time-consuming? In particular, what are the frompc and selfpc counters?
My kernel is Linux 4.4.0 and the threads are POSIX-compatible (a C++11 implementation).
LWP = Light Weight Process, and means thread. Linux threads each have their own thread ID, drawn from the same sequence as PID numbers, even though they're not separate processes. If you look in /proc/PID/task for a multi-threaded process, you will see an entry for each thread ID.
0x7f385f185700 is the pthread ID, as returned by pthread_self(3). In glibc this value is effectively a pointer to the thread's internal descriptor (the storage behind its pthread_t).
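A tiny Linux-only C program (not from your application; build with -pthread) makes the two IDs visible side by side:
#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

static void *worker(void *arg) {
    (void)arg;
    // The kernel thread ID: the number gdb prints as "LWP nnnn".
    printf("LWP (kernel tid): %ld\n", (long)syscall(SYS_gettid));
    // The pthread_t value: the hex number gdb prints as "Thread 0x...".
    printf("pthread_self():   %#lx\n", (unsigned long)pthread_self());
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    // Every LWP also appears as a directory under /proc/<pid>/task/.
    printf("ls /proc/%ld/task/ shows one entry per thread\n", (long)getpid());
    return 0;
}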
This thread is stopped at RIP = 0x00007f38655a3cf4, the address in frame #0.
frompc and selfpc are function arguments to the __mcount_internal() glibc function: frompc is the program-counter value of the call site in the caller, and selfpc is the address of the function that was just entered.
Your backtrace can show names and arguments for that frame because you have debug symbols installed for glibc. You just get ?? for the parent frames because you don't have debug info installed for the program or library containing them. (Compile your own program with -g, and install packages like qtbase5-dbg or libglib2.0-0-dbg to get debug symbols for libraries packaged by your distro.)
mcount is related to profiling: it is the hook the compiler inserts when you build with -pg (gprof instrumentation). That would explain why it takes program-counter values as args.
Why do applications compiled by GCC always contain the _mcount symbol?
That thread has not exited. You wouldn't see as many details if it had. (And probably wouldn't see it at all.)
I'm trying to analyze the core dump of one of my applications, but I'm not able to find the reason for the crash.
When I run gdb <binary> <corefile> I see the following output:
Program terminated with signal SIGKILL, Killed.
#0 0xfedcdf74 in _so_accept () from /usr/lib/libc.so.1
(gdb)
But I am pretty sure that no one has executed kill -9 <pid>. With info thread, I can see all the threads launched by the application, but I can see nothing special about any thread.
By running bt full or maint info sol-threads I don't find anything that leads to the bug. I just see the stack trace for each thread without any information about the bug.
Finally, I've found a thread that causes the kill signal:
#0 0xfedcebd4 in _lwp_kill () from /usr/lib/libc.so.1
#1 0xfed67bb8 in raise () from /usr/lib/libc.so.1
#2 0xfed429f8 in abort () from /usr/lib/libc.so.1
#3 0xff0684a8 in __cxxabiv1::__terminate(void (*)()) () from /usr/local/lib/libstdc++.so.5
#4 0xff0684f4 in std::terminate() () from /usr/local/lib/libstdc++.so.5
#5 0xff068dd8 in __cxa_pure_virtual () from /usr/local/lib/libstdc++.so.5
#6 0x00017f40 in A::method (this=0x2538d8) at A.cc:351
Class A inherits from an abstract class, and at line 351 a virtual function declared in the abstract class and defined in A is called. I don't understand why, if the A object exists, the call to the virtual base function crashes.
That SIGKILL could be caused by your app exceeding some resource limit. Try to get the system log and see if there are any resource limit exceeded messages.
References
Solaris Administration Guide: Resource Controls
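If you just want a quick look at the classic POSIX limits from inside the process, a small C sketch follows (the Solaris resource controls covered in the guide above are a superset of these and are normally inspected with prctl(1) instead):
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int which) {
    struct rlimit rl;
    if (getrlimit(which, &rl) == 0)
        printf("%-14s soft=%lld hard=%lld\n", name,
               (long long)rl.rlim_cur, (long long)rl.rlim_max);
}

int main(void) {
    show("RLIMIT_NOFILE", RLIMIT_NOFILE); // open file descriptors
    show("RLIMIT_DATA",   RLIMIT_DATA);   // data segment / heap size
    show("RLIMIT_CPU",    RLIMIT_CPU);    // CPU time in seconds
    return 0;
}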
I am trying to debug a crash in gdb; the core was dumped in this thread. There are 40+ other threads running at the same time. How do I figure out where this thread 42 is started from?
Also, why is the last line (frame #0) not showing up?
Thread 42 (Thread 0x2aaba65ce940 (LWP 15854)):
#0 0x0000003a95605b03 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#1 0x0000003a9560684b in start_thread () from /lib64/libpthread.so.0
#2 0x0000003a946d526d in clone () from /lib64/libc.so.6
#3 0x0000000000000000 in ?? ()
I am using gdb version 7.7
How do I figure out where this thread 42 is started from?
You can't: neither GDB nor the OS keeps track of "who started this thread". (It is also often quite useless to know where a particular thread was created.)
What you could do is either put instrumentation into your own calls to pthread_create and log "thread X created thread Y", or use catch syscall clone and print creation stack traces in GDB, then match them later to the crashed thread (match its LWP to the return value of clone earlier).
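A rough sketch of the logging idea (Linux-only, build with -pthread; the wrapper name create_logged is made up for illustration):
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

struct start_args {
    void *(*fn)(void *);
    void *arg;
};

static void *trampoline(void *p) {
    struct start_args a = *(struct start_args *)p;
    free(p);
    // First thing the new thread does: log its kernel TID, the LWP gdb prints.
    fprintf(stderr, "LWP %ld: thread started\n", (long)syscall(SYS_gettid));
    return a.fn(a.arg);
}

// Call this instead of pthread_create everywhere in your own code.
int create_logged(pthread_t *t, const pthread_attr_t *attr,
                  void *(*fn)(void *), void *arg) {
    struct start_args *a = malloc(sizeof *a);
    if (a == NULL)
        return ENOMEM;
    a->fn = fn;
    a->arg = arg;
    fprintf(stderr, "LWP %ld: creating a new thread\n",
            (long)syscall(SYS_gettid));
    return pthread_create(t, attr, trampoline, a);
}
Then match the logged LWP numbers against the LWP in the crashed thread's header line to see which creation event it corresponds to.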
Also, why is the last line (frame #0) not showing up?
You mean frame #3. It doesn't exist -- clone is where the thread is born (comes into existence).
P.S. Installing libc debug symbols so you can see where inside __nptl_deallocate_tsd the thread crashed is more likely to provide clues than knowing thread creation details.
I've got a repeating NSTimer that I invalidate manually some time before I dealloc the object that owns the callback invoked by the timer. I have verified that once I invalidate the timer, the callback is no longer called. When the object is deallocated, an instant later I get an EXC_BAD_ACCESS. The crash is correlated with whether I invalidate the timer or not, i.e. if I don't invalidate it, there is no crash.
Does anyone know why this might be happening? I don't see why the timer would try to access the callback on the dealloc'ed object, which is what seems to be happening. I'm at a loss as to how to debug this further. The call stack just says:
#0 0x02f05c93 in objc_msgSend
#1 0x00000001 in ??
#2 0x02d255fa in CFRunLoopRunSpecific
#3 0x02d248a8 in CFRunLoopRunInMode
#4 0x037de89d in GSEventRunModal
#5 0x037de962 in GSEventRun
#6 0x00863372 in UIApplicationMain
#7 0x00002e40 in main at main.m:13
UPDATE: I have determined it is not the timer but a leak involving my parent object's dealloc (the non-invalidated timer was preventing dealloc from being called). It would still be useful to hear advice on how to debug things when I hit a wall with the call stack, if that is even possible, so I'll leave this question up.
When you invalidate an NSTimer, the receiver sends a release message to the objects passed as the target and userInfo parameters of timerWithTimeInterval:target:selector:userInfo:repeats:. Is it possible you are still referring to one of these objects, or have some incorrect memory management around one of them?
One way to debug this is to make use of NSZombie.
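For reference, here is a minimal sketch (manual reference counting; the class and method names are made up, not your code) of the usual ownership pattern for a repeating timer:
@interface Poller : NSObject {
    NSTimer *timer; // not retained here: a scheduled timer is owned by the run loop
}
- (void)start;
- (void)stop;
@end

@implementation Poller

- (void)start {
    timer = [NSTimer scheduledTimerWithTimeInterval:1.0
                                              target:self   // retained by the timer
                                            selector:@selector(tick:)
                                            userInfo:nil
                                             repeats:YES];
}

- (void)tick:(NSTimer *)t {
    // periodic work
}

- (void)stop {
    [timer invalidate]; // releases the timer's retain on self (and userInfo)
    timer = nil;        // never message an invalidated timer again
}

- (void)dealloc {
    // By the time dealloc runs, the timer must already have been invalidated;
    // a live repeating timer retains its target and keeps self alive, which is
    // exactly the situation described in the question's update.
    [super dealloc];
}

@end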