Can't get the reason for a deadlock - multithreading
I have a mixed application (boost, ACE, cli, C#).
I've loaded a dump into WinDbg; here is what I have found so far:
1) There are 25 threads:
0:024> ~
# 0 Id: 22d8.2e1c Suspend: 0 Teb: 7eb6f000 Unfrozen
1 Id: 22d8.2d90 Suspend: 0 Teb: 7eb6c000 Unfrozen
2 Id: 22d8.2ab0 Suspend: 0 Teb: 7eb65000 Unfrozen
3 Id: 22d8.17c4 Suspend: 0 Teb: 7ea36000 Unfrozen
4 Id: 22d8.1254 Suspend: 0 Teb: 7ea33000 Unfrozen
5 Id: 22d8.227c Suspend: 0 Teb: 7e9cf000 Unfrozen
6 Id: 22d8.f94 Suspend: 0 Teb: 7e9cc000 Unfrozen
7 Id: 22d8.23e8 Suspend: 0 Teb: 7e9c0000 Unfrozen
8 Id: 22d8.19d8 Suspend: 0 Teb: 7e9c6000 Unfrozen
9 Id: 22d8.19fc Suspend: 0 Teb: 7e9bd000 Unfrozen
10 Id: 22d8.1ec8 Suspend: 0 Teb: 7e9ba000 Unfrozen
11 Id: 22d8.149c Suspend: 0 Teb: 7e9b7000 Unfrozen
12 Id: 22d8.1dec Suspend: 0 Teb: 7e9b4000 Unfrozen
13 Id: 22d8.2e50 Suspend: 0 Teb: 7e9b1000 Unfrozen
14 Id: 22d8.19d0 Suspend: 0 Teb: 7e9ae000 Unfrozen
15 Id: 22d8.2f80 Suspend: 0 Teb: 7e9ab000 Unfrozen
16 Id: 22d8.1218 Suspend: 0 Teb: 7e9a8000 Unfrozen
17 Id: 22d8.2874 Suspend: 0 Teb: 7e9a5000 Unfrozen
18 Id: 22d8.1f7c Suspend: 0 Teb: 7e9a2000 Unfrozen
19 Id: 22d8.292c Suspend: 0 Teb: 7e99f000 Unfrozen
20 Id: 22d8.2c6c Suspend: 0 Teb: 7e99c000 Unfrozen
21 Id: 22d8.27ec Suspend: 0 Teb: 7e9c9000 Unfrozen
22 Id: 22d8.ab0 Suspend: 0 Teb: 7ea39000 Unfrozen
23 Id: 22d8.1d54 Suspend: 0 Teb: 7ea3f000 Unfrozen
24 Id: 22d8.2ee8 Suspend: 0 Teb: 7ea3c000 Unfrozen
2) 22 of them are managed threads:
0:024> !threads
ThreadCount: 22
UnstartedThread: 1
BackgroundThread: 20
PendingThread: 1
DeadThread: 0
Hosted Runtime: no
Lock
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
0 1 2e1c 015a8d58 26020 Cooperative 00000000:00000000 015a2f18 1 STA (GC)
2 2 2ab0 015b6b90 2b220 Preemptive 00000000:00000000 015a2f18 0 MTA (Finalizer)
3 3 17c4 016501e0 102a220 Preemptive 00000000:00000000 015a2f18 0 MTA (Threadpool Worker)
5 4 227c 0425f080 1020220 Preemptive 00000000:00000000 015a2f18 0 Ukn (Threadpool Worker)
7 7 23e8 07628058 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
8 8 19d8 07659a08 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
9 9 19fc 07667418 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
10 10 1ec8 07668d18 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
11 11 149c 0766abc8 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
12 12 1dec 076724e0 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
13 13 2e50 07672a28 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
14 14 19d0 0767a448 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
15 15 2f80 0767a990 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
16 16 1218 0767ba88 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
17 17 2874 076811c0 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
18 18 1f7c 076826e0 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
19 19 292c 07683c00 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
20 20 2c6c 07682198 20220 Preemptive 00000000:00000000 015a2f18 0 Ukn
21 31 27ec 098dfac8 8039220 Preemptive 00000000:00000000 015a2f18 0 Ukn (Threadpool Completion Port)
22 5 ab0 098e0fe8 8029220 Preemptive 00000000:00000000 015a2f18 0 MTA (Threadpool Completion Port)
23 35 1d54 098e0558 1039220 Preemptive 00000000:00000000 015a2f18 0 Ukn (Threadpool Worker)
24 38 2ee8 098df038 1600 Preemptive 00000000:00000000 015a2f18 0 Ukn
3) I dumped the call stacks of all threads (with ~*kvn) and searched for NtWaitForSingleObject:
00 00fae970 74f02cc7 000001c8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 049ff670 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 07adf63c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0817d4f0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0858ed00 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0868ed48 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 087cebf0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0890ed00 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 08a4ecd0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 08c4f600 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 08d8f9a8 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 08ecfab8 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0900f9c8 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0914f928 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0928f890 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 093cf480 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0950f630 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0474ebf0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 07d9ed38 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 096cf77c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
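The frames above can be grouped by the handle each thread waits on. This is a minimal sketch, assuming the first argument column of an x86 `kv` frame for NtWaitForSingleObject is the handle; only a few of the lines are pasted in, and the names `FRAMES` and `count_wait_handles` are made up for the illustration.

```python
import re
from collections import Counter

# A few frame-00 lines copied from the ~*kvn output above (not all of them).
FRAMES = """\
00 00fae970 74f02cc7 000001c8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 049ff670 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 07adf63c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 0817d4f0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
00 096cf77c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
"""

# Columns of a kv frame: frame number, ChildEBP, RetAddr, then three arguments.
FRAME_RE = re.compile(
    r"^[0-9a-f]+ [0-9a-f]{8} [0-9a-f]{8} "
    r"(?P<arg0>[0-9a-f]{8}) [0-9a-f]{8} [0-9a-f]{8} "
    r"ntdll!NtWaitForSingleObject"
)

def count_wait_handles(text):
    """Count how many waiting frames reference each handle value."""
    counts = Counter()
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            counts[match.group("arg0")] += 1
    return counts

handle_counts = count_wait_handles(FRAMES)
for handle, n in handle_counts.most_common():
    print(handle, n)
```

Feeding the complete ~*kvn output into `count_wait_handles` instead of the sample gives the full picture of which handle blocks how many threads.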
The threads are blocked on three handles: 000001c8, 000001bc and 00000980.
As far as I understand, the GC thread (which is in cooperative mode) waits until all other managed threads have suspended and signalled that the GC can start its work.
The threads that are waiting on handle 000001bc are already waiting for the GC to complete.
All of them have an identical top of the call stack:
00 049ff670 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
01 049ff6e4 7220a0ab 000001bc ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x99 (FPO: [SEH])
02 049ff714 7220a0f2 00000000 e1a41a23 00000000 clr!CLREventWaitHelper2+0x33 (FPO: [Non-Fpo])
03 049ff764 7220a077 00000000 e1a41adb 0159d398 clr!CLREventWaitHelper+0x2a (FPO: [Non-Fpo])
04 049ff79c 7221499e ffffffff 00000000 00000000 clr!CLREventBase::WaitEx+0x152 (FPO: [Non-Fpo])
05 049ff7b0 72211c25 00000000 e1a41a87 016501e0 clr!WKS::GCHeap::WaitUntilGCComplete+0x34 (FPO: [1,0,0])
06 049ff800 722c53c0 098deaf0 722c9fe6 e1a41527 clr!Thread::RareDisablePreemptiveGC+0x231 (FPO: [0,13,0])
07 049ff808 722c9fe6 e1a41527 00000001 00000000 clr!GCCoopHackNoThread::GCCoopHackNoThread+0x2e (FPO: [0,0,4])
There is a third handle, 00000980, which is blocking threads 6 and 24.
Here are the call stacks of these two threads:
6 Id: 22d8.f94 Suspend: 0 Teb: 7e9cc000 Unfrozen
# ChildEBP RetAddr Args to Child
00 07adf63c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
01 07adf6b0 7769dc8b 0429da58 0429da58 00000002 ntdll!RtlpWaitOnCriticalSection+0xd0 (FPO: [Non-Fpo])
02 07adf6dc 7769dcb5 07adf71c 67691348 67737850 ntdll!RtlpEnterCriticalSectionContended+0xa0 (FPO: [Non-Fpo])
03 07adf6e4 67691348 67737850 f76f5d88 0429da58 ntdll!RtlEnterCriticalSection+0x42 (FPO: [Non-Fpo])
04 07adf71c 6768e470 07adf73c 07adf740 07adf74c ucrtbase!__crt_seh_guarded_call<void>::operator()<<lambda_5b71d36f03204c0beab531769a5b5694>,<lambda_be2b3da3f62db62e9dad5dc70221a656> &,<lambda_8f9ce462984622f9bf76b59e2aaaf805> >+0x48 (FPO: [SEH])
05 07adf754 776ecc7a 0429da58 015b85c8 00000102 ucrtbase!destroy_fls+0xe0 (FPO: [Non-Fpo])
06 07adf77c 776a2dde 015b85c8 e7200ce7 00000000 ntdll!RtlProcessFlsData+0xf8 (FPO: [Non-Fpo])
07 07adf818 776a29dc ffffffff 00000102 74b10000 ntdll!LdrShutdownThread+0x32 (FPO: [SEH])
08 07adf8ec 74f17194 00000000 00000102 74b10000 ntdll!RtlExitUserThread+0x4c (FPO: [Non-Fpo])
09 07adf900 74b1c1c8 74b10000 00000000 74b1c0d0 KERNELBASE!FreeLibraryAndExitThread+0x34 (FPO: [Non-Fpo])
0a 07adf93c 76f57c04 01602f40 76f57be0 e7636a34 mswsock!SockAsyncThread+0x11e (FPO: [Non-Fpo])
0b 07adf950 776dad2f 01602f40 e7200d67 00000000 kernel32!BaseThreadInitThunk+0x24 (FPO: [Non-Fpo])
0c 07adf998 776dacfa ffffffff 776c00b0 00000000 ntdll!__RtlUserThreadStart+0x2f (FPO: [SEH])
0d 07adf9a8 00000000 74b1c0d0 01602f40 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
24 Id: 22d8.2ee8 Suspend: 0 Teb: 7ea3c000 Unfrozen
# ChildEBP RetAddr Args to Child
00 096cf77c 7769de07 00000980 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
01 096cf7f0 7769dc8b 67737340 075af238 00000002 ntdll!RtlpWaitOnCriticalSection+0xd0 (FPO: [Non-Fpo])
02 096cf81c 7769dcb5 096cf85c 67691598 67737850 ntdll!RtlpEnterCriticalSectionContended+0xa0 (FPO: [Non-Fpo])
03 096cf824 67691598 67737850 f9ae52c8 67737340 ntdll!RtlEnterCriticalSection+0x42 (FPO: [Non-Fpo])
04 096cf85c 6768ed7b 096cf890 096cf87c 096cf884 ucrtbase!__crt_seh_guarded_call<void>::operator()<<lambda_3518db117f0e7cdb002338c5d3c47b6c>,<lambda_b2ea41f6bbb362cd97d94c6828d90b61> &,<lambda_abdedf541bb04549bc734292b4a045d4> >+0x48 (FPO: [SEH])
05 096cf8ac 6768e4f6 00000000 096cf8d8 776c96de ucrtbase!DllMainDispatch+0x17b (FPO: [Non-Fpo])
06 096cf8b8 776c96de 67660000 00000002 00000000 ucrtbase!__acrt_DllMain+0x16 (FPO: [Non-Fpo])
07 096cf8d8 776c9658 6768e4e0 67660000 00000002 ntdll!LdrxCallInitRoutine+0x16
08 096cf928 776e5b33 00000002 00000000 e9e10d3b ntdll!LdrpCallInitRoutine+0x43 (FPO: [SEH])
09 096cf9c4 776daa54 e9e10eeb 00000000 00000000 ntdll!LdrpInitializeThread+0x106 (FPO: [SEH])
0a 096cfa14 776da9d0 00000000 00000000 096cfa30 ntdll!_LdrpInitialize+0x6e (FPO: [Non-Fpo])
0b 096cfa1c 00000000 096cfa30 77680000 00000000 ntdll!LdrInitializeThunk+0x10 (FPO: [Non-Fpo])
The tops of both stacks are identical, and both threads are waiting to enter the same critical section:
0:024> dt ntdll!_RTL_CRITICAL_SECTION 67737850
+0x000 DebugInfo : 0x07589a98 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : 0n-10
+0x008 RecursionCount : 0n1
+0x00c OwningThread : 0x000023e8 Void
+0x010 LockSemaphore : 0x00000980 Void
+0x014 SpinCount : 0xfa0
Here I can see that the owner of this critical section (whose LockSemaphore is handle 0x980) is thread 7 (OS ID 23e8).
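The LockCount field of the critical section can be decoded as well. The sketch below assumes the bit layout Microsoft documents for Windows Server 2003 SP1 and later (bit 0 is the lock status, bit 1 the "waiter woken" status, the remaining bits the ones-complement of the waiter count); the function name `decode_lock_count` is only illustrative.

```python
def decode_lock_count(lock_count):
    """Decode RTL_CRITICAL_SECTION.LockCount (Windows Server 2003 SP1 and later layout)."""
    raw = lock_count & 0xFFFFFFFF        # view the signed value as 32 raw bits
    locked = (raw & 0x1) == 0            # bit 0 clear -> the critical section is locked
    waiter_woken = (raw & 0x2) == 0      # bit 1 clear -> a waiter has been woken
    waiters = (~raw & 0xFFFFFFFF) >> 2   # remaining bits: ones-complement of the waiter count
    return {"locked": locked, "waiter_woken": waiter_woken, "waiters": waiters}

state = decode_lock_count(-10)  # LockCount : 0n-10 from the dt output
print(state)
```

Under that assumption the value -10 decodes to a locked section with two waiters, which is consistent with threads 6 and 24 both being blocked on handle 0x980.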
And this is where I'm stuck:
thread 0 has triggered a GC and is waiting for all managed threads to suspend
thread 7 is waiting until the GC is finished
thread 6 seems to be an unmanaged thread, so we can ignore it
thread 24 is waiting until thread 7 releases handle 0x980
thread 0 will not continue until thread 24 starts waiting for the GC (WaitUntilGCComplete)
thread 7 will not release handle 0x980 until thread 0 is done
Deadlock.
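The reasoning in the list above can be written down as a wait-for graph and checked for a cycle. This is only a sketch of my hypothesis, not something WinDbg produced: the edges are the assumptions from the list, and `WAITS_FOR` and `find_cycle` are names invented for the example.

```python
# Wait-for edges taken from the list above: key waits for each thread in the value.
WAITS_FOR = {
    0: [24],   # GC thread waits for thread 24 to reach WaitUntilGCComplete (assumption)
    24: [7],   # thread 24 waits for the critical section owned by thread 7
    6: [7],    # thread 6 waits for the same critical section
    7: [0],    # thread 7 waits for the GC triggered by thread 0 to finish
}

def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:                       # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for start in graph:
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

cycle = find_cycle(WAITS_FOR)
print(" -> ".join(f"thread {t}" for t in cycle))
```

If any one of the assumed edges turns out to be wrong (for example, if thread 0 is not really waiting on thread 24), the cycle disappears, so each edge is worth verifying against the dump.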
I don't see any frame in the call stack of thread 7 where this critical section could have been entered, and I have no idea how to find the reason for the deadlock.
Do you have any advice?
Update, as requested in comment:
Here is the output of !analyze -v -hang
0:007> !analyze -v -hang
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
DUMP_CLASS: 2
DUMP_QUALIFIER: 400
CONTEXT: (.cxr;r)
eax=00000001 ebx=07628058 ecx=00000000 edx=00000000 esi=00000000 edi=000001bc
eip=776bc33c esp=0817d4f4 ebp=0817d564 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll!NtWaitForSingleObject+0xc:
776bc33c c20c00 ret 0Ch
FAULTING_IP:
+0
00000000 ?? ???
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0
FAULTING_THREAD: 000023e8
PROCESS_NAME: DataAcquisition.exe
ERROR_CODE: (NTSTATUS) 0xcfffffff - <Unable to get error code text>
EXCEPTION_CODE: (NTSTATUS) 0xcfffffff - <Unable to get error code text>
EXCEPTION_CODE_STR: cfffffff
BUGCHECK_STR: APPLICATION_HANG
WATSON_BKT_EVENT: AppHang
WATSON_BKT_MODULE: unknown
WATSON_BKT_MODVER: 0.0.0.0
WATSON_BKT_MODOFFSET: 0
WATSON_BKT_MODSTAMP: bbbbbbb4
WATSON_BKT_PROCSTAMP: 5a7c8dca
WATSON_BKT_PROCVER: 2.0.10.6
PROCESS_VER_PRODUCT: DataAcquisition
BUILD_VERSION_STRING: 6.3.9600.17415 (winblue_r4.141028-1500)
MODLIST_WITH_TSCHKSUM_HASH: 5885d70d8e46f0eabba57b93b12e832f6e89fef5
MODLIST_SHA1_HASH: fee227812939995a1d0ec519e5d34e45f0278989
NTGLOBALFLAG: 0
PROCESS_BAM_CURRENT_THROTTLED: 0
PROCESS_BAM_PREVIOUS_THROTTLED: 0
APPLICATION_VERIFIER_FLAGS: 0
PRODUCT_TYPE: 3
SUITE_MASK: 272
DUMP_FLAGS: 8000c07
DUMP_TYPE: 3
MISSING_CLR_SYMBOL: 0
ANALYSIS_SESSION_HOST: D116761
ANALYSIS_SESSION_TIME: 03-01-2018 18:23:18.0286
ANALYSIS_VERSION: 10.0.15063.400 x86fre
MANAGED_CODE: 1
MANAGED_ENGINE_MODULE: clr
MANAGED_ANALYSIS_PROVIDER: SOS
MANAGED_THREAD_ID: 2e1c
DERIVED_WAIT_CHAIN:
Dl Eid Cid WaitType
-- --- ------- --------------------------
7 22d8.23e8 Pseudo Thread Handle
WAIT_CHAIN_COMMAND: ~7s;k;;
THREAD_ATTRIBUTES:
BLOCKING_THREAD: 000023e8
DEFAULT_BUCKET_ID: APPLICATION_HANG_BusyHang
PRIMARY_PROBLEM_CLASS: BusyHang
THREAD_SHA1_HASH_MOD_FUNC: 9ced3fd11653de6459589f9d85e80a71659b8f04
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 35554e4c1cf9537ccfa6541f1738ec2b92873870
LAST_CONTROL_TRANSFER: from 74f02cc7 to 776bc33c
STACK_TEXT:
0817d4f0 74f02cc7 000001bc 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0817d564 7220a0ab 000001bc ffffffff 00000000 KERNELBASE!WaitForSingleObjectEx+0x99
0817d594 7220a0f2 00000000 ed2c38a3 00000000 clr!CLREventWaitHelper2+0x33
0817d5e4 7220a077 00000000 ed2c3b5b 0159d398 clr!CLREventWaitHelper+0x2a
0817d61c 7221499e ffffffff 00000000 00000000 clr!CLREventBase::WaitEx+0x152
0817d630 72211c25 00000000 ed2c3b07 07628058 clr!WKS::GCHeap::WaitUntilGCComplete+0x34
0817d680 72388444 ed2c3a43 00000000 07628058 clr!Thread::RareDisablePreemptiveGC+0x231
0817d704 0829e382 3259f8be 721ffa00 0817d748 clr!JIT_RareDisableHelper+0x24
WARNING: Frame IP not in any known module. Following frames may be wrong.
0817d738 0829e1fe 0829e231 3259f8be 721ffa00 0x829e382
0817d7a4 0829de54 00000014 0817da54 0817d830 0x829e1fe
0817d830 09518e12 0817de98 01e9d821 00000030 0x829de54
0817db2c 095180bc 0817de98 04230d90 0817f350 0x9518e12
0817e044 0437bf99 0817ead0 095488f0 07628058 0x95180bc
0817e070 09513481 0758e698 0817f350 0817ead0 0x437bf99
0817ee94 09545fa9 0817f2c0 00000000 0817f5e4 0x9513481
0817f638 081fb947 0817f82c 097b5d38 3259f8be 0x9545fa9
0817f6b8 0437b535 0817f82c 081f68e0 07628058 0x81fb947
0817f6e4 081fa751 04259828 097b5d38 0817f82c 0x437b535
0817f78c 0437b1fc 0817f814 0817f80c 081f6870 0x81fa751
0817f7bc 0802e1c4 04259828 0817f82c 0817f80c 0x437b1fc
0817f888 0437a01d 0802c4f0 07628058 0817fb9c 0x802e1c4
0817f8b0 0802d897 04259828 07628058 04264ec0 0x437a01d
0817fb8c 04379f56 0802c4d8 07628058 0817fc14 0x802d897
0817fbb4 0802cf80 04259828 3259f8be 721ffa00 0x4379f56
0817fbe8 0802cf1b 00000000 0817fbfc 0817fc2c 0x802cf80
0817fc2c 734ebb5e 04230de0 ecec8d35 04264ec0 0x802cf1b
0817fc58 676e62e4 04230de0 f8d55600 676e6290 boost_thread_vc140_mt_1_60!boost::detail::win32::handle_manager::swap+0x7e
0817fc94 76f57c04 04264ec0 76f57be0 e8d96fcc ucrtbase!thread_start<unsigned int (__stdcall*)(void *)>+0x54
0817fca8 776dad2f 04264ec0 e89a080f 00000000 kernel32!BaseThreadInitThunk+0x24
0817fcf0 776dacfa ffffffff 776c00b0 00000000 ntdll!__RtlUserThreadStart+0x2f
0817fd00 00000000 676e6290 04264ec0 00000000 ntdll!_RtlUserThreadStart+0x1b
THREAD_SHA1_HASH_MOD: 9c5af7c316b52dce74c542a777786aa1292e9273
FOLLOWUP_IP:
clr!Thread::RareDisablePreemptiveGC+231
72211c25 83a388000000ef and dword ptr [ebx+88h],0FFFFFFEFh
FAULT_INSTR_CODE: 88a383
SYMBOL_STACK_INDEX: 6
SYMBOL_NAME: clr!Thread::RareDisablePreemptiveGC+231
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: clr
IMAGE_NAME: clr.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 59cf5105
STACK_COMMAND: ~7s ; kb
BUCKET_ID: APPLICATION_HANG_clr!Thread::RareDisablePreemptiveGC+231
FAILURE_EXCEPTION_CODE: cfffffff
FAILURE_IMAGE_NAME: clr.dll
BUCKET_ID_IMAGE_STR: clr.dll
FAILURE_MODULE_NAME: clr
BUCKET_ID_MODULE_STR: clr
FAILURE_FUNCTION_NAME: Thread::RareDisablePreemptiveGC
BUCKET_ID_FUNCTION_STR: Thread::RareDisablePreemptiveGC
BUCKET_ID_OFFSET: 231
BUCKET_ID_MODPRIVATE: 1
BUCKET_ID_MODTIMEDATESTAMP: 59cf5105
BUCKET_ID_MODCHECKSUM: 6e5783
BUCKET_ID_MODVER_STR: 4.7.2117.0
BUCKET_ID_PREFIX_STR: APPLICATION_HANG_
FAILURE_PROBLEM_CLASS: BusyHang
FAILURE_SYMBOL_NAME: clr.dll!Thread::RareDisablePreemptiveGC
FAILURE_BUCKET_ID: APPLICATION_HANG_BusyHang_cfffffff_clr.dll!Thread::RareDisablePreemptiveGC
TARGET_TIME: 2018-02-28T03:14:53.000Z
OSBUILD: 9600
OSSERVICEPACK: 17415
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
OSPLATFORM_TYPE: x86
OSNAME: Windows 8.1
OSEDITION: Windows 8.1 Server TerminalServer SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: 2014-10-29 02:58:22
BUILDDATESTAMP_STR: 141028-1500
BUILDLAB_STR: winblue_r4
BUILDOSVER_STR: 6.3.9600.17415
ANALYSIS_SESSION_ELAPSED_TIME: 5a7d
ANALYSIS_SOURCE: UM
FAILURE_ID_HASH_STRING: um:application_hang_busyhang_cfffffff_clr.dll!thread::raredisablepreemptivegc
FAILURE_ID_HASH: {6a1ff91c-492b-fc00-85f1-6b17a6a44ff6}
Followup: MachineOwner
---------
Update 2:
I've noticed that all threads start with ntdll!_RtlUserThreadStart, except thread 24, which starts with ntdll!LdrInitializeThunk. Could this give a hint as to how thread 24 was started? What is the difference between these entry points?
Related
Executing boot ROM functions in Linux drivers
I'm trying to execute some functions from the boot ROM on an NXP IMX6UL in a Linux device driver. I figured a device driver is the only place where I can manage this. Currently, I map the boot ROM using devm_ioremap_resource() and I can read the ROM table in the device fine; it shows the values as expected. The problem comes when I try to execute a function from there: I get a paging request error and a crash. I get the following crash message:
Unable to handle kernel paging request at virtual address bf968f88
pgd = 8e5fa23c
[bf968f88] *pgd=b839e811, *pte=00008653, *ppte=00008453
Internal error: Oops: 8000000f [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 299 Comm: sh Not tainted 4.19.35-00007-ga99feb79b139-dirty #639
Hardware name: Freescale i.MX6 UltraLite (Device Tree)
PC is at 0xbf968f88
LR is at hab_rvt_entry+0x98/0xb4
pc : [<bf968f88>] lr : [<804ee430>] psr: 600f0033
sp : b9f85ea8 ip : 00000000 fp : 00000000
r10: b9e55e90 r9 : b9f85f78 r8 : b9a96800
r7 : 00000002 r6 : bf960000 r5 : 00008f89 r4 : bf968f89
r3 : fde952f0 r2 : fde952f0 r1 : 00000001 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 10c53c7d Table: b9ea8059 DAC: 00000051
Process sh (pid: 299, stack limit = 0x00d86b0c)
Stack: (0xb9f85ea8 to 0xb9f86000)
5ea0: 00000002 b9e55e80 00000000 00000000 b9a96800 8027b3ec
5ec0: 00000000 00000000 81004048 8027b304 002478d0 b9f85f78 00000000 002478d0
5ee0: 00000002 802032c0 00002ee7 00000000 81004048 fde952f0 81004048 7e87c490
5f00: 00235a30 80208600 000007ff 00008180 00000001 00001000 00000000 00000000
5f20: 00000000 00000000 00002ee7 00000000 00000000 fde952f0 b98f1164 00000002
5f40: b9cbb840 002478d0 b9f85f78 00000000 002478d0 8020357c 5dca454a 00000000
5f60: 81004048 b9cbb840 00000000 00000000 b9cbb840 80203794 00000000 00000000
5f80: 00000000 fde952f0 00000002 002478d0 76ec0d98 00000004 80101204 b9f84000
5fa0: 00000004 80101000 00000002 002478d0 00000001 002478d0 00000002 00000000
5fc0: 00000002 002478d0 76ec0d98 00000004 002478d0 00000002 00000000 00000000
5fe0: 00000064 7e87c9d0 76de9ce0 76e42a74 600e0010 00000001 00000000 00000000
[<804ee430>] (hab_rvt_entry) from [<8027b3ec>] (kernfs_fop_write+0xe8/0x1c8)
[<8027b3ec>] (kernfs_fop_write) from [<802032c0>] (__vfs_write+0x2c/0x160)
[<802032c0>] (__vfs_write) from [<8020357c>] (vfs_write+0xa4/0x17c)
[<8020357c>] (vfs_write) from [<80203794>] (ksys_write+0x4c/0xac)
[<80203794>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xb9f85fa8 to 0xb9f85ff0)
5fa0: 00000002 002478d0 00000001 002478d0 00000002 00000000
5fc0: 00000002 002478d0 76ec0d98 00000004 002478d0 00000002 00000000 00000000
5fe0: 00000064 7e87c9d0 76de9ce0 76e42a74
Code: ffc4 f7fd f833 e7fe (b5f0) b087
For reference, and to make sense of these error messages a bit: BF960000 is the address the base of my boot ROM is mapped to, and the function I'm trying to execute is physically at 8F89, virtually at BF968F89. Is there any way to execute functions like this that exist in the boot ROM?
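The address arithmetic described above can be checked in a few lines. This is a sketch that assumes the usual ARM interworking convention, where bit 0 of a branch target selects the Thumb instruction set and is not part of the fetch address; the constant and function names are made up for the illustration, and the code says nothing about whether the mapping is executable.

```python
ROM_VIRT_BASE = 0xBF960000  # virtual base of the mapped boot ROM (matches r6 in the oops)
ROM_ENTRY = 0x00008F89      # entry value from the ROM table (matches r5 in the oops)

def split_interworking_target(target):
    """Split an ARM interworking branch target into (fetch address, is_thumb)."""
    is_thumb = bool(target & 1)    # bit 0 set selects the Thumb instruction set
    return target & ~1, is_thumb   # the instruction itself is halfword-aligned

virt_target = ROM_VIRT_BASE + ROM_ENTRY
fetch_addr, is_thumb = split_interworking_target(virt_target)
print(hex(virt_target), hex(fetch_addr), is_thumb)
```

The result lines up with the dump: the branch target is bf968f89 (r4), the faulting PC is bf968f88, and the Flags line reports ISA Thumb.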
Sony Spresense "audio_manager" assertion failed
I am working on a Spresense project, but I have a problem with audio playback. The SD card has three files called "1.mp3", "2.mp3" and "3.mp3", but when I play one of them, Serial prints an error and the board halts. Other audio examples worked well, also with the files actually used in my project (renamed).
void Play(int id){
  // Open file placed on SD card
  if(id == 1){
    Serial.println("Required file 1");
    myFile = theSD.open("1.mp3");
  }
  if(id == 2){
    Serial.println("Required file 2");
    myFile = theSD.open("2.mp3");
  }
  if(id == 3){
    Serial.println("Required file 3");
    myFile = theSD.open("3.mp3");
  }
  // Verify file open
  if (!myFile) {
    Serial.println("File open error");
  }
  // Send first frames to be decoded
  err_t err = theAudio->writeFrames(AudioClass::Player0, myFile);
  if ((err != AUDIOLIB_ECODE_OK) && (err != AUDIOLIB_ECODE_FILEEND)) {
    Serial.println("File Read Error!");
    myFile.close();
  }
  theAudio->startPlayer(AudioClass::Player0);
}
Required file 1
Attention: module[1] attention id[2]/code[1] (dma_controller/audio_dma_drv.cpp L886)
Attention!
up_assert: Assertion failed at file:manager/audio_manager.cpp line: 586 task: init
up_dumpstate: sp: 0d08464c
up_dumpstate: IRQ stack:
up_dumpstate: base: 0d07b900
up_dumpstate: size: 00000800
up_dumpstate: used: 000000f8
up_dumpstate: User stack:
up_dumpstate: base: 0d084898
up_dumpstate: size: 00001fec
up_dumpstate: used: 00000518
up_stackdump: 0d084640: 0d03dc80 00000000 00000000 0d03dc80 00000000 0d0168ed 000fd080 000fda14
up_stackdump: 0d084660: 000fd040 0d01d715 0d0846e0 0d015af5 0d0846c0 000fd040 0001e000 00008000
up_stackdump: 0d084680: 0d03fdf8 0d03fdf8 00000000 0d000000 0d03dc80 0d0017a5 00004000 00004000
up_stackdump: 0d0846a0: 0d03fdf8 0d03fdc8 00000000 0d012071 00000000 0d03fdf8 00004000 0d001f93
up_stackdump: 0d0846c0: 022300f8 0d045500 00000001 0d001fdd 0d0455f8 00000000 0d03fdc8 0d03fc00
up_stackdump: 0d0846e0: 0d03fc00 0d000000 0d03dc80 0d03fc00 0d03fc00 0d000597 0d03fc00 0d00065f
up_stackdump: 0d084700: 060107bc 00170000 0000020a 00010000 00000000 00000000 0f0dfdc2 3dbc48c1
up_stackdump: 0d084720: fd1b2fa3 bdbc5713 00000000 c0320000 00000000 00000000 bf800000 bf800000
up_stackdump: 0d084740: bf800000 bf800000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084760: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084780: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d0847a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d0847c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d0847e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084800: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084820: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084840: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 0d084860: 00000000 00000000 00000000 0d03dc90 0d03dc90 0d0032cd 0d01c23b 00000101
up_stackdump: 0d084880: 00000000 00000000 00000000 0d005cdb 00000000 00000000 deadbeef 0d0848a4
up_taskdump: Idle Task: PID=0 Stack Used=0 of 0
up_taskdump: hpwork: PID=1 Stack Used=584 of 2028
up_taskdump: lpwork: PID=2 Stack Used=352 of 2028
up_taskdump: lpwork: PID=3 Stack Used=352 of 2028
up_taskdump: lpwork: PID=4 Stack Used=352 of 2028
up_taskdump: init: PID=5 Stack Used=1304 of 8172
up_taskdump: cxd56_pm_task: PID=6 Stack Used=320 of 996
up_taskdump: <pthread>: PID=7 Stack Used=704 of 1020
up_taskdump: AMNG: PID=8 Stack Used=616 of 2028
up_taskdump: PLY_OBJ: PID=9 Stack Used=1088 of 3052
up_taskdump: SUB_PLY_OBJ: PID=10 Stack Used=324 of 3044
up_taskdump: OMIX_OBJ: PID=11 Stack Used=520 of 3044
up_taskdump: RENDER_CMP_DEV0: PID=12 Stack Used=696 of 2020
up_taskdump: RENDER_CMP_DEV1: PID=13 Stack Used=312 of 2020
up_taskdump: REC_OBJ: PID=14 Stack Used=352 of 2028
up_taskdump: CAPTURE_CMP_DEV0: PID=15 Stack Used=312 of 2012
up_taskdump: <pthread>: PID=16 Stack Used=344 of 2044
To analyze a stack dump, the Spresense full SDK provides a tool that takes two files as arguments: your saved log file and the system map file. With those you should be able to get the stack trace.
If you built your software with the Arduino IDE, you can find out where your map file is located from the Arduino IDE log window. Go to File -> Preferences -> Settings, select "compilation" under "Show verbose output during", and when you build your sketch you should be able to see where your build folder is. Normally this folder is located in /tmp and looks something like this:
/tmp/arduino_build_724727/
Fetch the full SDK from GitHub:
git clone --recursive git@github.com:sonydevworld/spresense.git
Change directory to the SDK:
$ cd spresense/sdk
spresense/sdk$ ./tools/callstack.py -h
Usage: python ./tools/callstack.py <System.map> <stackdump.log>
Now just specify the location of your files:
./tools/callstack.py /tmp/arduino_build_724727/output.map stackoverflow.log
For Spresense-specific questions and technical support please see: https://forum.developer.sony.com/
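To give an idea of what such a lookup does in principle, here is a rough sketch of resolving stack words against a sorted symbol table. It is not how callstack.py is implemented, and the symbol names and start addresses are invented for the example; only the two looked-up words come from the stack dump in the question.

```python
import bisect

# Hypothetical symbol table: (start address, name). Real values come from the map file.
SYMBOLS = sorted([
    (0x0D001000, "hypothetical_func_a"),
    (0x0D001F00, "hypothetical_func_b"),
    (0x0D012000, "hypothetical_func_c"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def resolve(addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr, or None."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return None   # address lies below the first known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:#x}"

# Two words taken from the up_stackdump rows in the question.
for word in (0x0D001F93, 0x0D012071):
    print(f"{word:08x} -> {resolve(word)}")
```

A real tool also has to decide which stack words are return addresses at all, which is why using the SDK script with the actual map file is the better route.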
Kernel debugging from /dev/kmsg
I am having a problem with a (customized) driver (smsc95xx) which runs on my embedded systems, and I need to understand where exactly the issue comes from. For example, this is a kernel error message from /dev/kmsg reporting the issue:
1,737,1433656890,-;Unable to handle kernel NULL pointer dereference at virtual address 000001a0
1,738,1433665618,-;pgd = daafc000
1,739,1433668609,-;[000001a0] *pgd=9d5dd831, *pte=00000000, *ppte=00000000
0,740,1433675720,-;Internal error: Oops: 17 [#2] SMP ARM
4,741,1433680664,-;Modules linked in: ctr ccm ecb hci_uart rfcomm bnep bluetooth arc4 usb_trimble(O) wl18xx wlcore mac80211 cfg80211 rfkill wlcore_sdio twl4030_madc industrialio ftdi_sio smsc95xx(O) usbserial(O) ipv6
4,742,1433700378,-;CPU: 0 PID: 17418 Comm: sh Tainted: G D O 3.18.18-custom #20
4,743,1433708343,-;task: de30cd40 ti: da9b8000 task.ti: da9b8000
4,744,1433714050,-;PC is at __pm_runtime_resume+0x1c/0x64
4,745,1433719085,-;LR is at usb_autopm_get_interface+0x18/0x5c
4,746,1433724578,-;pc : [<c03cb590>] lr : [<c04677d4>] psr: 20000013\x0asp : da9b9ea8 ip : da9b9f14 fp : 00000000
4,747,1433736633,-;r10: daa22a4c r9 : 00000024 r8 : 00000004
4,748,1433742126,-;r7 : 000000a0 r6 : 00000004 r5 : 00000000 r4 : 00000020
4,749,1433748992,-;r3 : 000001a0 r2 : 00000040 r1 : 00000004 r0 : 00000020
4,750,1433755859,-;Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
4,751,1433763366,-;Control: 10c5387d Table: 9aafc019 DAC: 00000015
0,752,1433769378,-;Process sh (pid: 17418, stack limit = 0xda9b8240)
0,753,1433775421,-;Stack: (0xda9b9ea8 to 0xda9ba000)
0,754,1433779998,-;9ea0: 00000000 00000000 00000000 00000020 000000a0 c04677d4
0,755,1433788604,-;9ec0: dd31f680 00000000 00000040 c04574c8 c01ae218 c0085f58 00000001 00000000
0,756,1433797210,-;9ee0: 00000000 00000024 c04574a0 dd31f680 c0457510 de687a00 da9b9f88 bf0d44e4
0,757,1433805816,-;9f00: 00000024 da9b9f14 00000004 de687a00 da9b9f88 01110000 00000000 bf0d7990
0,758,1433814422,-;9f20: bf0d7cbc 00000000 00000000 bf0d4554 00000002 00000002 daa22a40 c01ae24c
0,759,1433823028,-;9f40: 00000000 00000000 dd3721c0 00000002 000eb408 da9b9f88 c000e824 da9b8000
0,760,1433831634,-;9f60: 00000000 c0145fd8 de30cd40 c08f20d4 dd3721c0 dd3721c0 00000002 000eb408
0,761,1433840240,-;9f80: c000e824 c01464e0 00000000 00000000 00000000 00000002 000eb408 b6ee1d60
0,762,1433848815,-;9fa0: 00000004 c000e660 00000002 000eb408 00000001 000eb408 00000002 00000000
0,763,1433857421,-;9fc0: 00000002 000eb408 b6ee1d60 00000004 00000000 000e515c 00000001 00000000
0,764,1433865997,-;9fe0: 00000000 beaef904 b6e1946c b6e7139c 60000010 00000001 00000000 00000000
4,765,1433874603,-;[<c03cb590>] (__pm_runtime_resume) from [<c04677d4>] (usb_autopm_get_interface+0x18/0x5c)
4,766,1433884307,-;[<c04677d4>] (usb_autopm_get_interface) from [<c04574c8>] (usbnet_write_cmd+0x28/0x70)
4,767,1433893737,-;[<c04574c8>] (usbnet_write_cmd) from [<bf0d44e4>] (__smsc95xx_write_reg+0x50/0x8c [smsc95xx])
4,768,1433903839,-;[<bf0d44e4>] (__smsc95xx_write_reg [smsc95xx]) from [<bf0d4554>] (smsc95xx_store+0x34/0x218 [smsc95xx])
4,769,1433914794,-;[<bf0d4554>] (smsc95xx_store [smsc95xx]) from [<c01ae24c>] (kernfs_fop_write+0xc0/0x184)
4,770,1433924438,-;[<c01ae24c>] (kernfs_fop_write) from [<c0145fd8>] (vfs_write+0xa0/0x1ac)
4,771,1433932586,-;[<c0145fd8>] (vfs_write) from [<c01464e0>] (SyS_write+0x44/0x9c)
4,772,1433940002,-;[<c01464e0>] (SyS_write) from [<c000e660>] (ret_fast_syscall+0x0/0x50)
0,773,1433947967,-;Code: e1a04000 0a000006 e2803d06 f5d3f000 (e1932f9f)
4,774,1433954650,-;---[ end trace bdd277dec40e1d5c ]---
I suppose the most important part is the last few lines:
4,765,1433874603,-;[<c03cb590>] (__pm_runtime_resume) from [<c04677d4>] (usb_autopm_get_interface+0x18/0x5c)
4,766,1433884307,-;[<c04677d4>] (usb_autopm_get_interface) from [<c04574c8>] (usbnet_write_cmd+0x28/0x70)
4,767,1433893737,-;[<c04574c8>] (usbnet_write_cmd) from [<bf0d44e4>] (__smsc95xx_write_reg+0x50/0x8c [smsc95xx])
4,768,1433903839,-;[<bf0d44e4>] (__smsc95xx_write_reg [smsc95xx]) from [<bf0d4554>] (smsc95xx_store+0x34/0x218 [smsc95xx])
4,769,1433914794,-;[<bf0d4554>] (smsc95xx_store [smsc95xx]) from [<c01ae24c>] (kernfs_fop_write+0xc0/0x184)
4,770,1433924438,-;[<c01ae24c>] (kernfs_fop_write) from [<c0145fd8>] (vfs_write+0xa0/0x1ac)
4,771,1433932586,-;[<c0145fd8>] (vfs_write) from [<c01464e0>] (SyS_write+0x44/0x9c)
4,772,1433940002,-;[<c01464e0>] (SyS_write) from [<c000e660>] (ret_fast_syscall+0x0/0x50)
but maybe there is a better way than checking /dev/kmsg to understand this output?
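One way to make the raw records easier to read is to split them into fields. The sketch below assumes the record layout described in the kernel's /dev/kmsg ABI documentation (a priority prefix combining facility and level, a sequence number, a timestamp in microseconds and a flags field, then `;` and the message); `KmsgRecord` and `parse_kmsg` are names chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class KmsgRecord:
    facility: int
    level: int
    seq: int
    timestamp_us: int
    flags: str
    message: str

def parse_kmsg(line):
    """Parse one /dev/kmsg record of the form 'prio,seq,timestamp,flags;message'."""
    header, _, message = line.partition(";")
    # Newer kernels may append extra header fields, so only the first four are used.
    prio, seq, timestamp, flags = header.split(",")[:4]
    prio = int(prio)
    return KmsgRecord(prio >> 3, prio & 7, int(seq), int(timestamp), flags, message)

rec = parse_kmsg(
    "4,765,1433874603,-;[<c03cb590>] (__pm_runtime_resume) from "
    "[<c04677d4>] (usb_autopm_get_interface+0x18/0x5c)"
)
print(rec.level, rec.seq, rec.timestamp_us / 1e6, rec.message)
```

With the fields separated, the backtrace lines (level 4 here) can be filtered out and read in sequence order without the numeric prefixes getting in the way.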
Problem solved. The driver had been modified to create its files in the /sys/class/dirname/files directory (where dirname and files are names set in the driver's code). The problem was that the driver did not delete the directory it had previously created, so unplugging and replugging the device and then writing to the files caused the kernel error shown above, because that amounts to writing to a memory area which is no longer referenced. The solution is to delete /sys/class/dirname and recreate it every time the device is unplugged.
How to understand the ARM registers dumped by kernel panic?
After a Linux kernel oops on an ARM platform, the registers are dumped to the console. But I got confused when analyzing these registers. For example:
Unable to handle kernel paging request at virtual address 0b56e8b8
pgd = c0004000
[0b56e8b8] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
......
pc : [<bf65e7c0>] lr : [<bf65ec14>] psr: 20000113
sp : c07059f0 ip : 00008d4c fp : c0705a3c
r10: 00000003 r9 : e8bcd800 r8 : e88b006c
r7 : 0000e203 r6 : c0705a44 r5 : e88b0000 r4 : 0b56e8b8
r3 : 00000000 r2 : 00000b56 r1 : e4592e10 r0 : e889570c
Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5787d Table: 69fec06a DAC: 00000015
SP: 0xc0705970:
5970 e8e70000 e45de100 00000181 00000180 c070599c bf65e7c0 20000113 ffffffff
5990 c07059dc e88b006c c0705a3c c07059a8 c000e318 c0008360 e889570c e4592e10
59b0 00000b56 00000000 0b56e8b8 e88b0000 c0705a44 0000e203 e88b006c e8bcd800
59d0 00000003 c0705a3c 00008d4c c07059f0 bf65ec14 bf65e7c0 20000113 ffffffff
59f0 e8b80000 e2030b56 00000000 e889570c 00000003 e88b006c c007eccc c007ebb4
5a10 00000000 eacc0480 e88b0000 00002098 e9c80480 e8c08000 00000000 e8bcdc80
5a30 c0705a5c c0705a40 bf65ec14 bf65e6c0 bf5e51c4 00000000 e88b0000 00000000
5a50 c0705a74 c0705a60 bf65ecfc bf65ebe4 e4554500 e4554500 c0705a84 c0705a78
R5: 0xe88aff80:
ff80 bf10f0b0 e8aca4c0 e88aff8c e88b1680 00000000 bf05b70c e87c3580 00000000
ffa0 bf095024 e87c3580 00000000 bf095024 e87c3580 00000000 bf095024 00000001
ffc0 00000004 ebd83000 00000793 e8cc2500 00000002 00000004 00000043 ffffffff
ffe0 40320354 be9ee8d8 00030444 40320380 20000010 00000000 70cfe821 70cfec21
0000 bf81e1f8 e88b0018 e88b000c e88e9a00 00000000 bf095024 00000000 fffffffe
0020 00000000 00000000 fffffffe 00000000 00000000 fffffffe 00000000 00000000
0040 00000001 e91dd000 00001073 0010051b 00080000 f1e4d900 00000001 00000002
0060 000000c8 6df9eca0 00008044 e8895700 00000040 00000026 00000003 0b56e8b8
R8: 0xe88affec:
ffec 40320380 20000010 00000000 70cfe821 70cfec21 bf81e1f8 e88b0018 e88b000c
000c e88e9a00 00000000 bf095024 00000000 fffffffe 00000000 00000000 fffffffe
002c 00000000 00000000 fffffffe 00000000 00000000 00000001 e91dd000 00001073
004c 0010051b 00060000 f1e4d900 00000001 00000002 000000c8 6df9eca0 00008044
006c e8895700 00000040 00000026 00000003 0b56e8b8 e4604000 0000026c 000000da
008c 00000000 21d7ff6e 000078a9 bf05add4 e88b0000 e88b0000 ebd02600 f1015a05
00ac 00000001 000000a6 000000c4 00000000 e88b0000 1e1e1e1e 1e1e1e1e 1e1e1e1e
00cc 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e 1e1e1e1e
Questions:
What does the 0xc0705970 stand for in SP: 0xc0705970:? Is it a code address or a data address? Where can I find it?
Why is sp : c07059f0 not at the beginning or end of the SP dump? How is the stack organized in this dump?
What does the first column of each register dump mean? If the values are relative addresses, why are they not continuous?
Is 0b56e8b8 a pointer to a page? How is it accessed in R5 and R8?
How registers are used by an OS is defined by the ABI (Application Binary Interface). Still, a quick, informal and simplified explanation of the dump is possible. I'm not an expert on Linux on ARM, but some names are quite intuitive:
sp is the Stack Pointer: a pointer to the memory area called the stack.
fp is the Frame Pointer: a pointer a routine uses to access its local variables.
lr is the Link Register: it holds the return address of a call.
nzCv are the flags. A flag shown in uppercase is set; a flag shown in lowercase is clear.
n = last result was Negative
z = last result was Zero
C = last result needed/produced a Carry bit
v = last result oVerflowed
"IRQs on" means that hardware interrupts are enabled. "FIQs on" means that fast interrupts (FIQs), which are hardware interrupts handled with a faster context switch, are enabled as well. Mode is the CPU mode; SVC_32 indicates that the code was privileged.
The values that follow (Control, Table, DAC) are control structures of the CPU set by the kernel.
The dump does you a favor by treating the sp, r5 and r8 register values as pointers and showing the memory around those addresses. The block below SP: 0xc0705970:, for example, is a dump of the memory starting at the data address 0xc0705970. That start address is 0x80 bytes below the register value (c07059f0 - 0x80 = c0705970), so memory on both sides of the pointer is shown, which is why sp itself falls in the middle of the block rather than at its beginning or end. The R5 and R8 blocks start 0x80 bytes below r5 and r8 in the same way.
Each row is formatted as follows: the first column is the address of the row. Only the last four hex digits are shown, since the full address is obvious (there is no ambiguity; the addresses start from 0xc0705970). The rows are contiguous: each one starts 0x20 bytes after the previous one, and the four-digit column simply wraps around, for example from ffe0 to 0000, when the address crosses a 0x10000 boundary. The following eight columns are 32-bit values dumped from memory, so each row shows 32 bytes of memory.
For example, by looking at

R5: 0xe88aff80:
ff80 bf10f0b0 e8aca4c0 e88aff8c e88b1680 00000000 bf05b70c e87c3580 00000000
ffa0 bf095024 e87c3580 00000000 bf095024 e87c3580 00000000 bf095024 00000001
ffc0 00000004 ebd83000 00000793 e8cc2500 00000002 00000004 00000043 ffffffff
ffe0 40320354 be9ee8d8 00030444 40320380 20000010 00000000 70cfe821 70cfec21
0000 bf81e1f8 e88b0018 e88b000c e88e9a00 00000000 bf095024 00000000 fffffffe
0020 00000000 00000000 fffffffe 00000000 00000000 fffffffe 00000000 00000000
0040 00000001 e91dd000 00001073 0010051b 00080000 f1e4d900 00000001 00000002
0060 000000c8 6df9eca0 00008044 e8895700 00000040 00000026 00000003 0b56e8b8

you can tell that the 32-bit value at 0xe88aff80 (the start of the block) was 0xbf10f0b0, that the 32-bit value r5 itself pointed to, at 0xe88b0000, was 0xbf81e1f8, and that the 32-bit value at 0xe88b0028 was 0xfffffffe. The value 0b56e8b8, which is both the content of r4 and the faulting virtual address reported at the top of the oops, appears here as plain data in the last word of the block, at 0xe88b007c. All this information is useful to the developer of the code that panicked.
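The address arithmetic and the flag notation described above can be sketched in C. This is only an illustration: the helper names are made up for this example, and the 0x80-byte offset is an assumption inferred from this particular dump (0xc07059f0 - 0xc0705970), not a documented constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption inferred from this dump: each block starts 0x80 bytes
   below the register value and has 8 rows of eight 32-bit words. */
#define DUMP_BYTES_BEFORE 0x80u
#define DUMP_ROW_BYTES    0x20u
#define DUMP_ROWS         8u
#define DUMP_COLS         8u

/* First address of the memory block printed for a register value. */
uint32_t dump_start(uint32_t reg)
{
    return reg - DUMP_BYTES_BEFORE;
}

/* Full address of the word at (row, col) in the block, both 0-based. */
uint32_t dump_word_address(uint32_t reg, unsigned row, unsigned col)
{
    assert(row < DUMP_ROWS && col < DUMP_COLS);
    return dump_start(reg) + row * DUMP_ROW_BYTES + col * 4u;
}

/* The four hex digits shown in the first column of a row. */
uint32_t dump_row_label(uint32_t reg, unsigned row)
{
    return dump_word_address(reg, row, 0) & 0xffffu;
}

/* Render PSR bits 31..28 (N, Z, C, V) in the oops notation:
   uppercase when the flag is set, lowercase when it is clear. */
void psr_flags(uint32_t psr, char out[5])
{
    out[0] = (psr & (1u << 31)) ? 'N' : 'n';
    out[1] = (psr & (1u << 30)) ? 'Z' : 'z';
    out[2] = (psr & (1u << 29)) ? 'C' : 'c';
    out[3] = (psr & (1u << 28)) ? 'V' : 'v';
    out[4] = '\0';
}
```

With r5 = 0xe88b0000, dump_word_address(0xe88b0000, 5, 2) yields 0xe88b0028, the word holding 0xfffffffe in the R5 block, and psr_flags(0x20000113, buf) fills buf with "nzCv", matching the Flags line of the oops.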
mmap and then munmap, sometimes unmapped region can be accessed
I hit this problem on VMware 11.0 with linux-2.6.34 and gcc 4.9.2; I have not tested on real hardware. The following code runs successfully and the messages are printed without a SIGSEGV. But if I uncomment the printf before munmap, a SIGSEGV is caught. The maps before and after munmap() are shown after the code.

static void check_mmap(void){
    int fd, i;
    char *p = NULL;
    if ((fd = shm_open("xxxxxxxxxxxx", O_RDWR|O_CREAT|O_TRUNC, 0666)) == -1) {
        printf("open shm file failed.\n");
        return;
    }
    if (ftruncate(fd, 4096) == -1)
        goto out;
    p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (MAP_FAILED == p)
        goto out;
    //printf("Mapped at %p\n", p);
    getchar(); // <----- chance to print maps before munmap
    if (munmap(p, 4096) != 0)
        printf("munmap error: %s\n", strerror(errno));
    printf("Corrupting mmap memory.\n");
    for(i = 0; i < 4095; i ++)
        p[i] = 0;
    printf("Done\n");
    getchar(); // <----- chance to print maps after munmap
out:
    close(fd);
    if (p)
        munmap(p, 4096);
}

Maps before munmap; the shm file xxxxxxxxxxxx is mapped at 7f3f2683a000-7f3f2683b000:

00400000-00401000 r-xp 00000000 00:14 121 /mnt/hgfs/vm_shared/asan/asan1
00600000-00601000 rw-p 00000000 00:14 121 /mnt/hgfs/vm_shared/asan/asan1
7f3f25ea6000-7f3f25ebd000 r-xp 00000000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f25ebd000-7f3f260bc000 ---p 00017000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260bc000-7f3f260bd000 r--p 00016000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260bd000-7f3f260be000 rw-p 00017000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260be000-7f3f260c2000 rw-p 00000000 00:00 0
7f3f260c2000-7f3f2620e000 r-xp 00000000 08:02 298091 /lib64/libc-2.11.1.so
7f3f2620e000-7f3f2640d000 ---p 0014c000 08:02 298091 /lib64/libc-2.11.1.so
7f3f2640d000-7f3f26411000 r--p 0014b000 08:02 298091 /lib64/libc-2.11.1.so
7f3f26411000-7f3f26412000 rw-p 0014f000 08:02 298091 /lib64/libc-2.11.1.so
7f3f26412000-7f3f26417000 rw-p 00000000 00:00 0
7f3f26417000-7f3f2641e000 r-xp 00000000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2641e000-7f3f2661d000 ---p 00007000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661d000-7f3f2661e000 r--p 00006000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661e000-7f3f2661f000 rw-p 00007000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661f000-7f3f2663d000 r-xp 00000000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2682b000-7f3f2682e000 rw-p 00000000 00:00 0
7f3f26839000-7f3f2683a000 rw-p 00000000 00:00 0
7f3f2683a000-7f3f2683b000 rw-p 00000000 00:11 16078 /dev/shm/xxxxxxxxxxxx
7f3f2683b000-7f3f2683c000 rw-p 00000000 00:00 0
7f3f2683c000-7f3f2683d000 r--p 0001d000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2683d000-7f3f2683e000 rw-p 0001e000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2683e000-7f3f2683f000 rw-p 00000000 00:00 0
7fffd9ce3000-7fffd9d04000 rw-p 00000000 00:00 0 [stack]
7fffd9dff000-7fffd9e00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Maps after munmap; the shm file was successfully unmapped:

00400000-00401000 r-xp 00000000 00:14 121 /mnt/hgfs/vm_shared/asan/asan1
00600000-00601000 rw-p 00000000 00:14 121 /mnt/hgfs/vm_shared/asan/asan1
7f3f25ea6000-7f3f25ebd000 r-xp 00000000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f25ebd000-7f3f260bc000 ---p 00017000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260bc000-7f3f260bd000 r--p 00016000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260bd000-7f3f260be000 rw-p 00017000 08:02 347266 /lib64/libpthread-2.11.3.so
7f3f260be000-7f3f260c2000 rw-p 00000000 00:00 0
7f3f260c2000-7f3f2620e000 r-xp 00000000 08:02 298091 /lib64/libc-2.11.1.so
7f3f2620e000-7f3f2640d000 ---p 0014c000 08:02 298091 /lib64/libc-2.11.1.so
7f3f2640d000-7f3f26411000 r--p 0014b000 08:02 298091 /lib64/libc-2.11.1.so
7f3f26411000-7f3f26412000 rw-p 0014f000 08:02 298091 /lib64/libc-2.11.1.so
7f3f26412000-7f3f26417000 rw-p 00000000 00:00 0
7f3f26417000-7f3f2641e000 r-xp 00000000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2641e000-7f3f2661d000 ---p 00007000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661d000-7f3f2661e000 r--p 00006000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661e000-7f3f2661f000 rw-p 00007000 08:02 335978 /lib64/librt-2.11.1.so
7f3f2661f000-7f3f2663d000 r-xp 00000000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2682b000-7f3f2682e000 rw-p 00000000 00:00 0
7f3f26839000-7f3f2683b000 rw-p 00000000 00:00 0
7f3f2683b000-7f3f2683c000 rw-p 00000000 00:00 0
7f3f2683c000-7f3f2683d000 r--p 0001d000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2683d000-7f3f2683e000 rw-p 0001e000 08:02 260202 /lib64/ld-2.11.1.so
7f3f2683e000-7f3f2683f000 rw-p 00000000 00:00 0
7fffd9ce3000-7fffd9d04000 rw-p 00000000 00:00 0 [stack]
7fffd9dff000-7fffd9e00000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

The following is the objdump output:

0000000000400890 <main>:
400890: 55 push %rbp
400891: 53 push %rbx
400892: ba b6 01 00 00 mov $0x1b6,%edx
400897: be 42 02 00 00 mov $0x242,%esi
40089c: bf 9c 0a 40 00 mov $0x400a9c,%edi
4008a1: 48 83 ec 08 sub $0x8,%rsp
4008a5: e8 fe fd ff ff callq 4006a8 <shm_open@plt>
4008aa: 83 f8 ff cmp $0xffffffffffffffff,%eax
4008ad: 89 c3 mov %eax,%ebx
4008af: 0f 84 c0 00 00 00 je 400975 <main+0xe5>
4008b5: be 00 10 00 00 mov $0x1000,%esi
4008ba: 89 c7 mov %eax,%edi
4008bc: e8 37 fe ff ff callq 4006f8 <ftruncate@plt>
4008c1: 83 f8 ff cmp $0xffffffffffffffff,%eax
4008c4: 0f 84 9b 00 00 00 je 400965 <main+0xd5>
4008ca: 45 31 c9 xor %r9d,%r9d
4008cd: 31 ff xor %edi,%edi
4008cf: 41 89 d8 mov %ebx,%r8d
4008d2: b9 02 00 00 00 mov $0x2,%ecx
4008d7: ba 03 00 00 00 mov $0x3,%edx
4008dc: be 00 10 00 00 mov $0x1000,%esi
4008e1: e8 22 fe ff ff callq 400708 <mmap@plt>
4008e6: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
4008ea: 48 89 c5 mov %rax,%rbp
4008ed: 0f 84 8e 00 00 00 je 400981 <main+0xf1>
4008f3: 48 8b 3d 0e 05 20 00 mov 0x20050e(%rip),%rdi # 600e08 <__TMC_END__>
4008fa: e8 b9 fd ff ff callq 4006b8 <_IO_getc@plt>
4008ff: be 00 10 00 00 mov $0x1000,%esi
400904: 48 89 ef mov %rbp,%rdi
400907: e8 dc fd ff ff callq 4006e8 <munmap@plt>
40090c: 85 c0 test %eax,%eax
40090e: 75 7a jne 40098a <main+0xfa>
400910: bf d1 0a 40 00 mov $0x400ad1,%edi
400915: e8 6e fd ff ff callq 400688 <puts@plt>
40091a: 48 8d 8d ff 0f 00 00 lea 0xfff(%rbp),%rcx
400921: 48 89 ea mov %rbp,%rdx
400924: 0f 1f 40 00 nopl 0x0(%rax)
400928: c6 02 00 movb $0x0,(%rdx)
40092b: 48 83 c2 01 add $0x1,%rdx
40092f: 48 39 ca cmp %rcx,%rdx
400932: 75 f4 jne 400928 <main+0x98>
400934: bf e9 0a 40 00 mov $0x400ae9,%edi
400939: e8 4a fd ff ff callq 400688 <puts@plt>
40093e: 48 8b 3d c3 04 20 00 mov 0x2004c3(%rip),%rdi # 600e08 <__TMC_END__>
400945: e8 6e fd ff ff callq 4006b8 <_IO_getc@plt>
You're invoking "undefined behavior." Anything could happen. You can't then complain that in one case you like the result more than in another case when both have undefined behavior. Just stop running invalid code.
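"The following code runs successfully and the messages are printed without a SIGSEGV. But if I uncomment the printf before munmap, a SIGSEGV is caught."
This prima facie startling behavior has a simple explanation. A segmentation fault is indeed to be expected when accessing unmapped memory on Linux. It's just that without the printf before munmap(), the printf("Corrupting mmap memory.\n") after munmap() is the program's first use of stdout. On that first use, the GNU C library allocates a stream buffer by means of mmap(), which re-maps exactly the memory page that was unmapped just before, so p[i] accesses the newly mapped memory without a fault. With the earlier printf uncommented, the stdout buffer already exists before munmap(), nothing takes over the unmapped page, and the writes fault as expected.
The maps taken after munmap() are consistent with this: the anonymous region 7f3f26839000-7f3f2683b000 now covers the page 7f3f2683a000-7f3f2683b000 where the shm file had been mapped.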
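The address reuse described above can be observed directly with a minimal sketch like the following, assuming Linux with MAP_ANONYMOUS available. The function name remap_reuses_address is hypothetical, and the kernel is free to place the second mapping elsewhere, so the result is an observation and not a guarantee.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map len bytes, unmap them, map len bytes again, and report whether the
   second mapping landed on the address that was just released.
   Returns 1 if the address was reused, 0 if not, -1 on any error. */
int remap_reuses_address(size_t len)
{
    void *first = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED)
        return -1;

    /* Remember the address as an integer before the mapping goes away. */
    uintptr_t first_addr = (uintptr_t)first;
    if (munmap(first, len) != 0)
        return -1;

    /* Stands in for the buffer allocation the C library performs on the
       first use of stdout. */
    void *second = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (second == MAP_FAILED)
        return -1;

    int reused = ((uintptr_t)second == first_addr);
    munmap(second, len);
    return reused;
}
```

Calling remap_reuses_address(4096) and getting 1 means that a stale pointer into the first mapping would silently refer to the second one, which is the situation in the question.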