OutOfMemory and Memory Fragmentation in SharePoint 2007 32-bit

For some weeks I have been struggling with an OutOfMemory issue on our SharePoint 2007 WFEs (a published intranet with many customizations; SP2 on Windows Server 2003 32-bit). After capturing a crash memory dump, I found that we have a memory fragmentation issue. For the dump analysis I used two tools: DebugDiag and WinDbg with sos.dll.
Result: 96,74% Free Memory Fragmentation
DebugDiag (Memory Pressure Analyzers)
Virtual Memory Summary
Size of largest free VM block 23,63 MBytes
Free memory fragmentation 96,74%
Free Memory 725,52 MBytes (35,43% of Total Memory)
Reserved Memory 406,88 MBytes (19,87% of Total Memory)
Committed Memory 915,54 MBytes (44,71% of Total Memory)
Total Memory 2,00 GBytes
Largest free block at 0x00000000`4b0b0000
DebugDiag (SharePoint Analyzers)
Undisposed SPRequest objects: 9
Disposed SPRequest objects: 187
Undisposed SPWeb objects: 185
Disposed SPWeb objects: 34
Undisposed SPSite objects: 8
Disposed SPSite objects: 22
undisposed special purpose (AllowCleanupWhenThreadEnds = false) SPRequest object found at: 0x02320dd0.
undisposed special purpose (AllowCleanupWhenThreadEnds = false) SPRequest object found at: 0x4e8296f8.
undisposed special purpose (AllowCleanupWhenThreadEnds = false) SPRequest object found at: 0x4e869a20.
undisposed SPWeb object 0x02701168 references a disposed or invalid SPRequest object: 0x0270137c
undisposed SPWeb object 0x027013cc references a disposed or invalid SPRequest object: 0x027015e0
undisposed SPWeb object 0x02720824 references a disposed or invalid SPRequest object: 0x02720a20
undisposed SPWeb object 0x02d2aa74 references a disposed or invalid SPRequest object: 0x02d2ac70
...
Undisposed SPRequest Objects per managed Thread:
Thread ID: 6714, Undisposed SPRequest: 4
Thread ID: 4c68, Undisposed SPRequest: 3
Thread ID: 5e8c, Undisposed SPRequest: 1
Thread ID: 6180, Undisposed SPRequest: 1
So now I would like to understand what causes the memory fragmentation, and I hope you can help me. These are the steps I took to get the information.
Windbg with sos.dll
!address -summary
-------------------- Usage SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Pct(Busy) Usage
2c748000 ( 728352) : 34.73% 53.79% : RegionUsageIsVAD
2d584000 ( 742928) : 35.43% 00.00% : RegionUsageFree
f987000 ( 255516) : 12.18% 18.87% : RegionUsageImage
10fc000 ( 17392) : 00.83% 01.28% : RegionUsageStack
44000 ( 272) : 00.01% 00.02% : RegionUsageTeb
1585a000 ( 352616) : 16.81% 26.04% : RegionUsageHeap
0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap
1000 ( 4) : 00.00% 00.00% : RegionUsagePeb
1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs
1000 ( 4) : 00.00% 00.00% : RegionUsageEnvironmentBlock
Tot: 7fff0000 (2097088 KB) Busy: 52a6c000 (1354160 KB)
-------------------- Type SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Usage
2d584000 ( 742928) : 35.43% : <free>
154e3000 ( 349068) : 16.65% : MEM_IMAGE
127c000 ( 18928) : 00.90% : MEM_MAPPED
3c30d000 ( 986164) : 47.03% : MEM_PRIVATE
-------------------- State SUMMARY --------------------------
TotSize ( KB) Pct(Tots) Usage
3938b000 ( 937516) : 44.71% : MEM_COMMIT
2d584000 ( 742928) : 35.43% : MEM_FREE
196e1000 ( 416644) : 19.87% : MEM_RESERVE
Largest free region: Base 4b0b0000 - Size 017a0000 (24192 KB)
It seems there is enough free memory overall (742928 KB), but the biggest free chunk is only 24192 KB. Again: free memory fragmentation!
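DebugDiag's 96.74% figure is consistent with a simple ratio: the share of free address space that lies outside the largest free block. DebugDiag's exact formula isn't stated in the post, so treat that as an assumption; the Python sketch below (function name is hypothetical) reproduces the value from the WinDbg numbers, converting the hex byte counts from !address -summary to KB.

```python
def free_fragmentation_pct(largest_free: float, total_free: float) -> float:
    """Percentage of free address space lying outside the largest free block."""
    if total_free <= 0:
        raise ValueError("total_free must be positive")
    return 100.0 * (1.0 - largest_free / total_free)

# Sizes copied from the !address -summary output above (hex byte counts).
total_free = int("2d584000", 16)    # MEM_FREE / RegionUsageFree
largest_free = int("017a0000", 16)  # "Largest free region: ... Size 017a0000"

print(total_free // 1024, "KB free,", largest_free // 1024, "KB in the largest block")
print(f"{free_fragmentation_pct(largest_free, total_free):.2f}% fragmented")
```

The same function applied to DebugDiag's figures (23.63 MB largest block, 725.52 MB free) also gives 96.74%. In practical terms, any single reservation larger than the largest free block cannot be satisfied, no matter how much free memory there is in total.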
!threads
ThreadCount: 38
UnstartedThread: 0
BackgroundThread: 37
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
PreEmptive GC Alloc Lock
ID OSID ThreadOBJ State GC Context Domain Count APT Exception
14 1 5358 0010f718 1808220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
18 2 61a4 001118d0 b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Finalizer)
19 3 6060 0012a3f8 80a220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Completion Port)
20 4 64c8 0012df90 1220 Enabled 00000000:00000000 000dd1b8 0 Ukn
12 5 57f4 00147e80 880a220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Completion Port)
23 7 6714 0eb89f08 180b220 Enabled 00000000:00000000 0012e6d0 1 MTA (Threadpool Worker)
24 8 66b8 0eb91970 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
25 b 6320 0eb942f0 180b220 Disabled 00000000:00000000 0012e6d0 0 MTA (Threadpool Worker)
26 d 2004 0eb97120 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
27 e 5bb0 0eb9a438 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
28 f 61a8 0eb9dee8 380b220 Enabled 00000000:00000000 0012e6d0 1 MTA (Threadpool Worker)
29 14 3b88 0ebba688 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
30 15 5d74 0ebc4840 380b220 Enabled 00000000:00000000 0012e6d0 1 MTA (Threadpool Worker)
31 16 422c 0ebc91b0 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
32 18 6544 125242c8 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
33 1a 4c68 12534bc8 180b220 Disabled 4e875ac4:4e875d30 0012e6d0 1 MTA (Threadpool Worker)
34 1b 66d4 12539c80 180b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Worker)
35 1c 5e8c 12542e58 180b220 Enabled 00000000:00000000 0012e6d0 1 MTA (Threadpool Worker)
36 1d 62f0 1254be90 180b220 Enabled 4e875d84:4e877d30 0012e6d0 2 MTA (Threadpool Worker) System.OutOfMemoryException (4e875d3c)
39 1e 6558 0ec16d28 80a220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Completion Port)
40 1f 6180 0ec14b70 200b020 Enabled 00000000:00000000 0012e6d0 0 MTA
43 20 592c 0ebd7a00 220 Enabled 00000000:00000000 000dd1b8 0 MTA
45 21 624c 1261a060 220 Enabled 00000000:00000000 000dd1b8 0 MTA
8 22 5c78 125499f8 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
6 23 3c68 126b6e90 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
7 24 6458 36414400 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
44 25 5e60 36675440 220 Enabled 00000000:00000000 000dd1b8 0 MTA
5 26 55d8 364214a0 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
57 27 6534 36622948 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
58 28 59bc 0016f810 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
56 29 3ee0 250fa6d8 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
60 2a 63fc 252da068 200b220 Enabled 00000000:00000000 0012e6d0 0 MTA
59 2b 5fdc 24fc0be8 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
61 2c 4154 25052008 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
62 2d 60fc 250093a8 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
42 2e 1a38 1c99b5d0 220 Enabled 00000000:00000000 000dd1b8 0 MTA
63 13 59e0 0ebb5d48 220 Enabled 00000000:00000000 000dd1b8 0 Ukn
64 19 6420 0ebac6a0 880b220 Enabled 00000000:00000000 000dd1b8 0 MTA (Threadpool Completion Port)
!eeheap -gc
Number of GC Heaps: 2
------------------------------
Heap 0 (000e2418)
generation 0 starts at 0x432d0c54
generation 1 starts at 0x432c0038
generation 2 starts at 0x02060038
ephemeral segment allocation context: none
segment begin allocated size
02060000 02060038 03d0234c 0x01ca2314(30024468)
432c0000 432c0038 441e82f4 0x00f282bc(15893180)
Large object heap starts at 0x0a060038
segment begin allocated size
0a060000 0a060038 0bd57530 0x01cf74f8(30373112)
5b920000 5b920038 5c9bf338 0x0109f300(17429248)
Heap Size 0x5960dc8(93720008)
------------------------------
Heap 1 (00110750)
generation 0 starts at 0x4e869d30
generation 1 starts at 0x4e820038
generation 2 starts at 0x06060038
ephemeral segment allocation context: none
segment begin allocated size
06060000 06060038 07a0af98 0x019aaf60(26914656)
4e820000 4e820038 4e8eaf38 0x000caf00(831232)
Large object heap starts at 0x0c060038
segment begin allocated size
0c060000 0c060038 0c9ec998 0x0098c960(10013024)
6e020000 6e020038 6f90f0a8 0x018ef070(26144880)
Heap Size 0x3cf1830(63903792)
------------------------------
GC Heap Size 0x96525f8(157623800)
!dumpheap (heap1 extract)
0a060038 000e1a98 16 Free
0a060048 793042f4 4096
0a061048 000e1a98 16 Free
0a061058 793042f4 528
0a061268 000e1a98 16 Free
0a061278 793042f4 4096
0a062278 000e1a98 16 Free
0a062288 793042f4 5112
0a063680 000e1a98 16 Free
0a063690 793042f4 4096
0a064690 000e1a98 16 Free
0a0646a0 793042f4 4096
0a0656a0 000e1a98 16 Free
0a0656b0 793042f4 5112
0a066aa8 000e1a98 16 Free
0a066ab8 793042f4 4096
0a067ab8 000e1a98 16 Free
0a067ac8 793042f4 4096
0a068ac8 000e1a98 16 Free
0a068ad8 793042f4 4096
0a069ad8 793042f4 528
0a069ce8 000e1a98 16 Free
0a069cf8 793042f4 528
0a069f08 793042f4 528
0a06a118 000e1a98 16 Free
0a06a128 793042f4 528
0a06a338 000e1a98 260096 Free
0a0a9b38 793042f4 4096
0a0aab38 000e1a98 16 Free
0a0aab48 793042f4 5784
0a0ac1e0 000e1a98 16 Free
0a0ac1f0 793042f4 4096
0a0ad1f0 000e1a98 16 Free
0a0ad200 793042f4 528
0a0ad410 000e1a98 16 Free
0a0ad420 793042f4 4096
0a0ae420 000e1a98 16 Free
0a0ae430 793042f4 528
0a0ae640 000e1a98 16 Free
0a0ae650 793042f4 4096
0a0af650 000e1a98 16 Free
0a0af660 793042f4 528
0a0af870 000e1a98 16 Free
0a0af880 793042f4 528
0a0afa90 000e1a98 131120 Free
0a0cfac0 793042f4 528
0a0cfcd0 000e1a98 16 Free
0a0cfce0 793042f4 4096
0a0d0ce0 000e1a98 16 Free
0a0d0cf0 793042f4 528
0a0d0f00 000e1a98 16 Free
0a0d0f10 793042f4 528
0a0d1120 000e1a98 16 Free
0a0d1130 793042f4 528
0a0d1340 000e1a98 16 Free
0a0d1350 793042f4 4096
0a0d2350 000e1a98 16 Free
0a0d2360 793042f4 5784
0a0d39f8 000e1a98 16 Free
0a0d3a08 793042f4 4096
0a0d4a08 000e1a98 348200 Free
0a129a30 793042f4 528
0a129c40 000e1a98 16 Free
0a129c50 793042f4 528
0a129e60 000e1a98 361224 Free
0a182168 793042f4 528
0a182378 000e1a98 16 Free
0a182388 793042f4 7016
0a183ef0 000e1a98 16 Free
0a183f00 793042f4 7016
...
63859d80 14762 413336 System.Xml.XmlElement
6385a090 12103 435708 System.Xml.XmlName
79332b54 21020 504480 System.Collections.ArrayList
6385798c 32932 658640 System.Xml.NameTable+Entry
6385c76c 35215 704300 System.Xml.XmlAttribute
79331754 505 706416 System.Char[]
7932dd5c 12751 714056 System.Reflection.RuntimePropertyInfo
6385a284 36665 733300 System.Xml.XmlText
79332cc0 5530 791644 System.Int32[]
7932fde0 22824 1278144 System.Reflection.RuntimeMethodInfo
79333274 6758 1733808 System.Collections.Hashtable+bucket[]
793042f4 54360 5051132 System.Object[]
79333594 4772 29304312 System.Byte[]
79330b24 225539 33121896 System.String
000e1a98 239 72089072 Free
Total 711343 objects
Fragmented blocks larger than 0.5 MB:
Addr      Size    Followed by
4331b1d8  14.8MB  441e8270 System.Threading.Overlapped
I looked at some of the addresses between the "Free" blocks, but unfortunately I can't find any information about the source that caused the issue.
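The !dumpheap extract alternates small System.Object[] blocks (mostly 528 and 4096 bytes) with Free blocks, and the !gcroot output that follows shows some of those arrays rooted by pinned handles; the GC does not move pinned objects, so free space between them cannot be coalesced. To quantify that pattern over a longer listing, here is a minimal Python sketch (function name is hypothetical) that totals Free versus used bytes, reports the largest Free block, and counts rows that don't start where the previous row ended. It assumes the 32-bit "address MT size [Free]" row format shown above; the sample rows are copied from the extract.

```python
import re

# One !dumpheap row: address, method table, size in bytes, optional "Free" marker.
ROW = re.compile(r"^([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(\d+)(?:\s+(Free))?$")

def summarize_dumpheap(lines):
    """Total Free vs. used bytes in a !dumpheap listing and count address gaps."""
    free_bytes = used_bytes = largest_free = gaps = 0
    prev_end = None
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue  # skips "...", statistics rows and anything else
        addr, size = int(m.group(1), 16), int(m.group(3))
        if prev_end is not None and addr != prev_end:
            gaps += 1  # this row does not start where the previous one ended
        prev_end = addr + size
        if m.group(4):
            free_bytes += size
            largest_free = max(largest_free, size)
        else:
            used_bytes += size
    return {"free_bytes": free_bytes, "used_bytes": used_bytes,
            "largest_free": largest_free, "gaps": gaps}

SAMPLE = """\
0a069f08 793042f4 528
0a06a118 000e1a98 16 Free
0a06a128 793042f4 528
0a06a338 000e1a98 260096 Free
0a0a9b38 793042f4 4096
"""
print(summarize_dumpheap(SAMPLE.splitlines()))
```

On this sample, a roughly 254 KB hole sits between two small arrays, which is the kind of gap that stays unusable while those arrays remain pinned.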
!do 0a182388
Name: System.Object[]
MethodTable: 793042f4
EEClass: 790eda64
Size: 7012(0x1b64) bytes
Array: Rank 1, Number of elements 1749, Type CLASS
Element Type: System.Object
Fields:
None
!gcroot 0a182388
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 14 OSTHread 5358
Scan Thread 18 OSTHread 61a4
Scan Thread 19 OSTHread 6060
Scan Thread 20 OSTHread 64c8
Scan Thread 12 OSTHread 57f4
Scan Thread 23 OSTHread 6714
Scan Thread 24 OSTHread 66b8
Scan Thread 25 OSTHread 6320
Scan Thread 26 OSTHread 2004
Scan Thread 27 OSTHread 5bb0
Scan Thread 28 OSTHread 61a8
Scan Thread 29 OSTHread 3b88
Scan Thread 30 OSTHread 5d74
Scan Thread 31 OSTHread 422c
Scan Thread 32 OSTHread 6544
Scan Thread 33 OSTHread 4c68
Scan Thread 34 OSTHread 66d4
Scan Thread 35 OSTHread 5e8c
Scan Thread 36 OSTHread 62f0
Scan Thread 39 OSTHread 6558
Scan Thread 40 OSTHread 6180
Scan Thread 43 OSTHread 592c
Scan Thread 45 OSTHread 624c
Scan Thread 8 OSTHread 5c78
Scan Thread 6 OSTHread 3c68
Scan Thread 7 OSTHread 6458
Scan Thread 44 OSTHread 5e60
Scan Thread 5 OSTHread 55d8
Scan Thread 57 OSTHread 6534
Scan Thread 58 OSTHread 59bc
Scan Thread 56 OSTHread 3ee0
Scan Thread 60 OSTHread 63fc
Scan Thread 59 OSTHread 5fdc
Scan Thread 61 OSTHread 4154
Scan Thread 62 OSTHread 60fc
Scan Thread 42 OSTHread 1a38
Scan Thread 63 OSTHread 59e0
Scan Thread 64 OSTHread 6420
DOMAIN(0012E6D0):HANDLE(Pinned):e4613d4:Root:0a182388(System.Object[])
!gcroot 0a129a30
...
Scan Thread 64 OSTHread 6420
DOMAIN(000DD1B8):HANDLE(Pinned):1fc11b8:Root:0a129a30(System.Object[])
!gcroot 0a061278
...
DOMAIN(000DD1B8):HANDLE(Pinned):1fb13f0:Root:0a061278(System.Object[])
!gchandles
GC Handle Statistics:
Strong Handles: 1007
Pinned Handles: 474
Async Pinned Handles: 6
Ref Count Handles: 5
Weak Long Handles: 681
Weak Short Handles: 56
Other Handles: 0
...
661485ec 68 2176 System.Web.NativeFileChangeNotification
66153774 93 2976 System.Web.Hosting.ISAPIAsyncCompletionCallback
793310f8 68 3808 System.Threading.Thread
793141f0 162 6480 System.Reflection.Emit.DynamicResolver
79332070 279 6696 System.Reflection.Assembly
7932f19c 228 10944 System.Reflection.Module
793327e8 328 11808 System.Security.PermissionSet
7932f25c 386 29336 System.RuntimeType+RuntimeTypeCache
793042f4 185 294456 System.Object[]
The DebugDiag SharePoint analysis reported some undisposed SPWeb objects, so I tried to find the cause there:
Report: "undisposed SPWeb object 0x02701168 references a disposed or invalid SPRequest object: 0x0270137c"
!do 0x02701168
Name: Microsoft.SharePoint.SPWeb
MethodTable: 1325ed80
EEClass: 1669cd80
Size: 508(0x1fc) bytes
(C:\WINDOWS\assembly\GAC_MSIL\Microsoft.SharePoint\12.0.0.0__71e9bce111e9429c\Microsoft.SharePoint.dll)
!gcroot 0x02701168
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 14 OSTHread 5358
Scan Thread 18 OSTHread 61a4
Scan Thread 19 OSTHread 6060
Scan Thread 20 OSTHread 64c8
Scan Thread 12 OSTHread 57f4
Scan Thread 23 OSTHread 6714
ESP:efbe92c:Root:07a2cbc8(System.Collections.Hashtable+bucket[])->
023f1044(Microsoft.SharePoint.Publishing.CacheManager)->
023f3df4(Microsoft.SharePoint.Publishing.CachedObjectFactory)->
023f3e7c(Microsoft.SharePoint.Publishing.WssObjectCache)->
023f3f30(System.Collections.Hashtable)->
03b0f448(System.Collections.Hashtable+bucket[])->
0733b918(Microsoft.SharePoint.Publishing.ThreadSafeCache`2+CacheEntry`2[[System.String, mscorlib],[Microsoft.SharePoint.Publishing.CachedObjectWrapper, Microsoft.SharePoint.Publishing],[System.String, mscorlib],[Microsoft.SharePoint.Publishing.CachedObjectWrapper, Microsoft.SharePoint.Publishing]])->
0733b868(Microsoft.SharePoint.Publishing.CachedObjectWrapper)->
035f25b4(Microsoft.SharePoint.Publishing.CachedPage)->
035f2718(System.Collections.Generic.Dictionary`2[[Microsoft.SharePoint.Publishing.Navigation.PortalSiteMapProvider, Microsoft.SharePoint.Publishing],[Microsoft.SharePoint.Publishing.Navigation.PortalSiteMapNode, Microsoft.SharePoint.Publishing]])->
0733ba90(System.Collections.Generic.Dictionary`2+Entry[[Microsoft.SharePoint.Publishing.Navigation.PortalSiteMapProvider, Microsoft.SharePoint.Publishing],[Microsoft.SharePoint.Publishing.Navigation.PortalSiteMapNode, Microsoft.SharePoint.Publishing]][])->
0733b92c(Microsoft.SharePoint.Publishing.Navigation.PortalListItemSiteMapNode)->
069b2f88(Microsoft.SharePoint.Publishing.Navigation.PortalWebSiteMapNode)->
069bfee4(System.Collections.Generic.Dictionary`2[[System.Guid, mscorlib],[Microsoft.SharePoint.Publishing.Navigation.ProxySiteMapNode, Microsoft.SharePoint.Publishing]])->
069d8fe8(System.Collections.Generic.Dictionary`2+Entry[[System.Guid, mscorlib],[Microsoft.SharePoint.Publishing.Navigation.ProxySiteMapNode, Microsoft.SharePoint.Publishing]][])->
069e0934(Microsoft.SharePoint.Publishing.Navigation.ProxySiteMapNode)->
069e04a8(Microsoft.SharePoint.Navigation.SPNavigationNode)->
069bfe28(Microsoft.SharePoint.Navigation.SPNavigation)->
069bf7dc(Microsoft.SharePoint.SPWeb)->
02700dcc(Microsoft.SharePoint.SPSite)->
02701364(System.Collections.Generic.List`1[[Microsoft.SharePoint.SPWeb, Microsoft.SharePoint]])->
The Microsoft SharePoint Dispose Checker Tool didn't find any issues either.
So now I don't know how to proceed to find the (custom) component that causes the memory fragmentation. I hope someone can give me some hints, tool suggestions, or a checklist of components that may cause fragmentation (antivirus, caching, etc.). The problem occurs only in the production environment, and the only thing we can do for now is run iisreset, sometimes five times a day…
Thank you in advance and best regards,
Anton

Your crash logs might contain the faulting object, but more likely the assembly that's executing changes with every crash and seems random. It might just be the innocent bystander that was left holding the bag when all the memory was gone.
First - can't you configure the app pool to automatically recycle when a certain memory threshold is reached? This might help alleviate your need to constantly monitor and be ready for an IISRESET. Otherwise you might want to schedule regular recycles to keep the memory tidy for the time being.
Next, try to identify when the crashes began and check your deployment logs to see what was installed. (You DO keep logs of software package installs, right?)
Are the custom components developed in-house? You can punt some of the work initially by having the developers run the SharePoint Dispose Checker Tool against all of their deployed projects (is this what you were referring to at the end of your question?). Undisposed SPWeb and SPSite objects seem to be the biggest cause of this fragmentation.
Another avenue to explore is this MSDN question I ran into while looking for something else. It appears the Navigation bar on a publishing page was to blame. There is a hotfix for that issue but you have to request it directly from Microsoft.
I've been developing for SharePoint for a long time but it's always been someone else's job to find these problems! These tidbits are what I've gleaned over time and hopefully something will be useful.

Related

NTLM authentication fails with .NET 6 (LDAP error 53), succeeds with .NET 4.7.2

In the client we are using the HttpClient with the UseDefaultCredentials option to authenticate against a Node.js server running express-ntlm. Authentication is done using NTLM, and express-ntlm communicates with an Active Directory server over LDAPS.
The client is compiled against .NET Standard 2.0. If .NET 4.7.2 is used as the runtime, everything works fine. However, if the same assembly is executed on .NET 6.0.4, authentication fails because the Active Directory server returns error code 53 (unwilling to perform).
The authentication fails at the last step of the NTLM flow. It may be relevant that the first 58 bytes of the Authorization header sent by the client are equal for .NET 4.7.2 and .NET 6 except for bytes 51 and 53, so something seems to be done differently by .NET 6 in comparison to .NET 4.7.2. Additionally, the issue only occurs if the communication to the Active Directory is done over LDAPS. It works fine in case LDAP is used.
First 58 bytes of the Authorization header (decimal):
Byte          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24
.NET 4.7.2   78  84  76  77  83  83  80   0   3   0   0   0  24   0  24   0 130   0   0   0  24   0  24   0
.NET 6       78  84  76  77  83  83  80   0   3   0   0   0  24   0  24   0 130   0   0   0  24   0  24   0

Byte         25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48
.NET 4.7.2  154   0   0   0  20   0  20   0  88   0   0   0   6   0 108   0   0   0  16   0  16   0 114   0
.NET 6      154   0   0   0  20   0  20   0  88   0   0   0   6   0 108   0   0   0  16   0  16   0 114   0

Byte         49  50  51  52  53  54  55  56  57  58
.NET 4.7.2    0   0   0   0   0   0 178   0   0   0
.NET 6        0   0  16   0  16   0 178   0   0   0
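The claim that only bytes 51 and 53 differ can be checked mechanically. This Python sketch holds the decimal values transcribed from the table (so it inherits any transcription errors) and lists the 1-based positions where the two runtimes differ; the helper name is hypothetical.

```python
# Decimal byte values transcribed from the table above (bytes 1-58).
COMMON = [78, 84, 76, 77, 83, 83, 80, 0, 3, 0, 0, 0, 24, 0, 24, 0, 130, 0, 0, 0, 24, 0, 24, 0,
          154, 0, 0, 0, 20, 0, 20, 0, 88, 0, 0, 0, 6, 0, 108, 0, 0, 0, 16, 0, 16, 0, 114, 0]
NET472 = COMMON + [0, 0, 0, 0, 0, 0, 178, 0, 0, 0]
NET6 = COMMON + [0, 0, 16, 0, 16, 0, 178, 0, 0, 0]

def diff_positions(a, b):
    """Return (1-based position, value in a, value in b) wherever a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

print(bytes(NET472[:8]))             # the NTLMSSP signature
print(diff_positions(NET472, NET6))  # positions 51 and 53: 0 vs. 16
```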
So my questions are:
Has anything changed in the NTLM implementation between .NET 4.7.2 and .NET 6?
What is the significance of bytes 51 and 53 in the Authorization header?
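For reading the header, Microsoft's MS-NLMP specification lays out the AUTHENTICATE_MESSAGE as an 8-byte "NTLMSSP\0" signature, a 4-byte message type (3), six 8-byte security-buffer fields (LmChallengeResponse, NtChallengeResponse, DomainName, UserName, Workstation, EncryptedRandomSessionKey), each a little-endian length, max length and offset, and then the 4-byte NegotiateFlags. Below is a minimal Python sketch that decodes those fields from a captured "NTLM <base64>" header value; the function names are hypothetical and the test message is synthetic. Comparing the EncryptedRandomSessionKey entry and NegotiateFlags (for example NTLMSSP_NEGOTIATE_KEY_EXCH, 0x40000000) between the two runtimes would show whether they negotiate differently. Decoding the hand-copied table this way gives a UserName field whose length and max length disagree, which suggests a value may have been dropped when copying, so decoding the raw header is more reliable.

```python
import base64
import struct

# Security-buffer fields of an NTLM AUTHENTICATE_MESSAGE, in wire order (MS-NLMP 2.2.1.3).
FIELDS = ("LmChallengeResponse", "NtChallengeResponse", "DomainName",
          "UserName", "Workstation", "EncryptedRandomSessionKey")
NTLMSSP_NEGOTIATE_KEY_EXCH = 0x40000000

def decode_authenticate(msg: bytes) -> dict:
    """Return {field: (len, max_len, offset)} plus NegotiateFlags for a type-3 message."""
    if msg[:8] != b"NTLMSSP\x00":
        raise ValueError("missing NTLMSSP signature")
    msg_type = struct.unpack_from("<I", msg, 8)[0]
    if msg_type != 3:
        raise ValueError(f"expected message type 3, got {msg_type}")
    out = {}
    for i, name in enumerate(FIELDS):
        # Each field: Len (uint16), MaxLen (uint16), BufferOffset (uint32), little-endian.
        out[name] = struct.unpack_from("<HHI", msg, 12 + 8 * i)
    out["NegotiateFlags"] = struct.unpack_from("<I", msg, 60)[0]
    return out

def decode_authorization_header(value: str) -> dict:
    """Decode an 'NTLM <base64>' Authorization header value."""
    scheme, _, token = value.partition(" ")
    if scheme != "NTLM":
        raise ValueError(f"unexpected scheme {scheme!r}")
    return decode_authenticate(base64.b64decode(token))
```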

Understand output of vmstat memory utilization

I have a Solaris box and I'm trying to find out whether it's running out of memory or whether it's stable.
Below is the output of vmstat.
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr vc vc vc vc in sy cs us sy id
1 0 0 11426696 4603520 613 1477 449 6 6 0 0 78 22 28 29 8970 37714 22961 43 6 51
4 0 0 4975280 0 1747 3487 805 0 0 0 0 233 41 33 44 9558 53713 15845 74 8 18
4 0 0 4936944 0 933 1837 0 0 0 0 0 56 28 12 39 9317 46898 14648 82 7 11
5 0 0 4943080 0 1056 2806 805 0 0 0 0 103 21 18 18 9286 46900 14866 78 8 14
5 0 0 4942264 0 1088 2173 804 6 6 0 0 109 8 40 31 9927 56484 16495 84 8 8
3 0 0 4942520 0 308 1018 1756 3 3 0 0 166 87 29 44 10638 64146 21413 83 9 8
0 0 0 4942512 0 156 326 1740 0 0 0 0 370 12 33 52 11554 40375 21897 75 9 16
2 0 0 4947384 0 294 560 845 0 0 0 0 121 18 23 20 9445 52382 17016 77 6 17
I can see that the free column shows 0; however, the sr column also shows 0.
The output from the top command doesn't show how much free memory is available. Swap shows 0.0%.
load averages: 11.4, 9.12, 9.24;
9021 processes: 9018 sleeping, 1 running, 2 on cpu
CPU states: 0.0% idle, 71.4% user, 28.6% kernel, 0.0% iowait, 0.0% swap
Memory: 24G phys mem, 16G total swap, 13G free swap
Am I running out of RAM?
Please suggest how to interpret this data. Do I need to increase my physical memory?
I'd appreciate some insights.
From the Solaris 11.4 vmstat man page, there's one important thing to note:
Without options, vmstat displays a one-line summary of the virtual memory activity since the system was booted.
That also applies to the first line of output from Solaris vmstat: it's a summary of all activity since the system was booted.
A good description of the output fields is found in the EXAMPLES section of the Solaris man vmstat page:
Examples
Example 1 Using vmstat
The following command displays a summary of what the system is doing
every five seconds.
example% vmstat 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s2 s3 in sy cs us sy id
0 0 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82
0 0 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62
0 0 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64
0 0 1 10132 5292 0 0 9 0 0 0 0 0 21 0 0 165 105 130 1 21 78
1 1 1 10132 5496 0 0 5 0 0 0 0 0 23 0 0 183 92 134 1 20 79
1 0 1 10132 5564 0 0 25 0 0 0 0 0 18 0 0 131 231 116 4 34 62
1 0 1 10124 5412 0 0 37 0 0 0 0 0 22 0 0 166 179 118 1 33 67
1 0 1 10124 5236 0 0 24 0 0 0 0 0 14 0 0 109 243 113 4 56 39
example%
The fields of vmstat's display are
kthr
Report the number of kernel threads in each of the three following
states:
r
the number of kernel threads in run queue
b
the number of blocked kernel threads that are waiting for
resources I/O, paging, and so forth
w
the number of swapped out lightweight processes (LWPs) that
are waiting for processing resources to finish.
memory
Report on usage of virtual and real memory.
swap
available swap space (Kbytes)
free
size of the free list (Kbytes)
page
Report information about page faults and paging activity. The
information on each of the following activities is given in units per
second.
re
page reclaims — but see the –S option for how this field is modified.
mf
minor faults — but see the –S option for how this field is modified.
pi
kilobytes paged in
po
kilobytes paged out
fr
kilobytes freed
de
anticipated short-term memory shortfall (Kbytes)
sr
pages scanned by clock algorithm
When executed in a zone and if the pools facility is active, all of
the above (except for ‘de’) only report activity on the processors in
the processor set of the zone's pool.
disk
Report the number of disk operations per second. There are slots for
up to four disks, labeled with a single letter and number. The letter
indicates the type of disk (s = SCSI, i = IPI, and so forth); the
number is the logical unit number.
faults
Report the trap/interrupt rates (per second).
in
interrupts
sy
system calls
cs
CPU context switches
When executed in a zone and if the pools facility is active, all of
the above only report activity on the processors in the processor set
of the zone's pool.
cpu
Give a breakdown of percentage usage of CPU time. On MP systems, this
is an average across all processors.
us
user time
sy
system time
id
idle time
When executed in a zone and if the pools facility is active, all of
the above only report activity on the processors in the processor set
of the zone's pool.
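This article may help: https://www.howtogeek.com/424334/how-to-use-the-vmstat-command-on-linux/. It explains those abbreviations, though note that it describes the Linux vmstat, whose columns differ from the Solaris output above: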
Memory
swpd: the amount of virtual memory used; in other words, how much memory has been swapped out.
free: the amount of idle (currently unused) memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
Swap
si: Amount of virtual memory swapped in from swap space.
so: Amount of virtual memory swapped out to swap space.
IO
bi: Blocks received from a block device per second. This counts all block-device reads, including (but not limited to) pages swapped back into RAM.
bo: Blocks sent to a block device per second. This counts all block-device writes, including (but not limited to) pages swapped out of RAM into swap space.
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second. A context switch is when the kernel switches a CPU from running one process or thread to another.
"0" is not a valid free memory value.
By design, Solaris always makes sure a minimal amount of free memory is available. The fact that the sr column is also zero suggests there is no memory shortage. In any case, with such an extreme RAM shortage you wouldn't have been able to run vmstat or top in the first place.
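As a rough check along the lines of this answer, the Python sketch below parses the vmstat output from the question, skips the first (since-boot) row, and reports the lowest free value and the highest sr (pages scanned) across the interval rows. Treating sustained nonzero sr as the sign of memory pressure is a common rule of thumb rather than something the man page states; the function name is hypothetical.

```python
VMSTAT = """\
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr vc vc vc vc in sy cs us sy id
1 0 0 11426696 4603520 613 1477 449 6 6 0 0 78 22 28 29 8970 37714 22961 43 6 51
4 0 0 4975280 0 1747 3487 805 0 0 0 0 233 41 33 44 9558 53713 15845 74 8 18
4 0 0 4936944 0 933 1837 0 0 0 0 0 56 28 12 39 9317 46898 14648 82 7 11
5 0 0 4943080 0 1056 2806 805 0 0 0 0 103 21 18 18 9286 46900 14866 78 8 14
5 0 0 4942264 0 1088 2173 804 6 6 0 0 109 8 40 31 9927 56484 16495 84 8 8
3 0 0 4942520 0 308 1018 1756 3 3 0 0 166 87 29 44 10638 64146 21413 83 9 8
0 0 0 4942512 0 156 326 1740 0 0 0 0 370 12 33 52 11554 40375 21897 75 9 16
2 0 0 4947384 0 294 560 845 0 0 0 0 121 18 23 20 9445 52382 17016 77 6 17
"""

def summarize_vmstat(text: str) -> dict:
    lines = text.strip().splitlines()
    header = lines[1].split()  # column names; "sy" and "vc" repeat, "free"/"sr" do not
    rows = [[int(v) for v in line.split()] for line in lines[2:]]
    i_free, i_sr = header.index("free"), header.index("sr")
    interval = rows[1:]        # the first row is the since-boot summary
    return {
        "since_boot_free_kb": rows[0][i_free],
        "min_free_kb": min(r[i_free] for r in interval),
        "max_sr": max(r[i_sr] for r in interval),
        "rows_scanning": sum(1 for r in interval if r[i_sr] > 0),
    }

print(summarize_vmstat(VMSTAT))
```

For the question's data, free is 0 in every interval row while sr never rises above 0, which is the combination discussed above.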
You should investigate further to understand why free memory is reported as zero. mdb's ::memstat command would be a good start:
# echo "::memstat" | mdb -k

Large number of dead threads in .Net memory dump

During the analysis of a memory dump of a .NET 4.5 WCF w3wp process, I encountered many threads identified as dead. !threads shows 68 of 107 threads as dead, which seems quite high. I was wondering whether these threads could hold a large amount of memory, since the process eventually grows to 20 GB+ and never seems to go down.
How can I inspect such threads and see the objects/memory held by these? Is it normal to have so many?
0:000> !threads
ThreadCount: 107
UnstartedThread: 0
BackgroundThread: 35
PendingThread: 0
DeadThread: 68
Hosted Runtime: no
ID OSID ThreadOBJ State GC Mode GC Alloc Context Domain Count Apt Exception
7 1 16fc 0000009d253a36e0 28220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
14 2 a64 000000a1702d7560 2b220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Finalizer)
XXXX 3 0 000000a1702f9390 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 4 0 000000a1702fa270 8038820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
16 6 21c8 000000a17031f310 102a220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
17 7 2af4 000000a170327ef0 21220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
19 9 1b50 000000a1703cccd0 1020220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
21 10 85c 000000a170416570 202b020 Preemptive 000000A0945502B8:000000A094550FD0 000000a1703360c0 0 MTA
25 11 13cc 000000a1711823f0 202b020 Preemptive 000000A094554D60:000000A094554FD0 000000a1703360c0 0 MTA
26 12 2044 000000a1711921d0 3029220 Preemptive 0000000000000000:0000000000000000 000000a1703360c0 0 MTA (Threadpool Worker)
XXXX 16 0 000000a17128a690 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 17 0 000000a1712bd610 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 18 0 000000a1712c5e30 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 19 0 000000a1712c4e90 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
2 20 8a4 000000a1712c6600 20220 Preemptive 0000009E8B81C238:0000009E8B81DFD0 0000009d25385d70 0 Ukn
18 21 28f8 000000a1712c3720 20220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
22 22 bfc 000000a1712c3ef0 20220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
20 23 257c 000000a1712c5660 20220 Preemptive 000000A09457AC30:000000A09457AFD0 0000009d25385d70 0 Ukn
23 24 13e0 000000a1712c6dd0 20220 Preemptive 0000009F87F0B5C8:0000009F87F0CFD0 0000009d25385d70 0 Ukn
XXXX 26 0 000000a1713d8fb0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
28 27 2aac 000000a1713dbe90 a029220 Preemptive 0000000000000000:0000000000000000 000000a1703360c0 0 MTA (Threadpool Completion Port)
XXXX 29 0 000000a1713dc660 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
29 30 284c 000000a1713d9f50 202b220 Preemptive 0000000000000000:0000000000000000 000000a1703360c0 0 MTA
XXXX 31 0 000000a1713da720 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 32 0 000000a1713db6c0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 33 0 000000a174347600 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 34 0 000000a174344720 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 35 0 000000a174345e90 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 36 0 000000a174346660 39820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
XXXX 37 0 000000a174346e30 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 38 0 000000a1743456c0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 39 0 000000a1741b9d10 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 40 0 000000a1741bc420 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 41 0 000000a1741bcbf0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 42 0 000000a1741ba4e0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 43 0 000000a1741be360 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
3 44 1e94 000000a1741bd3c0 20220 Preemptive 0000009F87E511F8:0000009F87E52FD0 0000009d25385d70 0 Ukn
XXXX 45 0 000000a1741bdb90 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
35 46 12dc 000000a1741bacb0 20220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA
XXXX 47 0 000000a1741beb30 30820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
XXXX 48 0 000000a1741bf300 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 49 0 000000a171171f40 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
36 50 2bb4 000000a171173e80 202b020 Preemptive 0000000000000000:0000000000000000 000000a1703360c0 0 MTA
37 51 9e4 000000a171177530 202b020 Preemptive 000000A0945528D0:000000A094552FD0 000000a1703360c0 0 MTA
39 53 6d0 000000a171174e20 21220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
40 54 f34 000000a171172ee0 21220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
41 55 f74 000000a1711755f0 21220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
42 56 2198 000000a171174650 21220 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn
XXXX 57 0 000000a171175dc0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 60 0 000000a171176590 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 62 0 000000a171177d00 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 64 0 000000a171178ca0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 65 0 000000a1741bfad0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 70 0 000000a174344ef0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 71 0 000000a1713d9780 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 69 0 000000a171171770 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 68 0 000000a1711736b0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 67 0 000000a171172710 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 66 0 000000a171176d60 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 59 0 000000a1711784d0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 58 0 000000a1741bbc50 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 63 0 000000a1741c1240 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 61 0 000000a1741c02a0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 28 0 000000a1741c0a70 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 25 0 000000a1712c46c0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 15 0 000000a1713daef0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 14 0 000000a174347dd0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 13 0 000000a16744b400 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 52 0 000000a167448520 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 8 0 000000a16744bbd0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 72 0 000000a16744ac30 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 73 0 000000a16744a460 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 74 0 000000a171268f50 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 75 0 000000a1712658a0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 76 0 000000a171269720 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 77 0 000000a171266070 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 78 0 000000a1712677e0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 79 0 000000a171269ef0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 80 0 000000a171266840 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 81 0 000000a17126a6c0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 82 0 000000a171267010 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 83 0 000000a17126ae90 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
XXXX 5 0 000000a171268780 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Completion Port)
43 84 dcc 000000a17126b660 8029220 Preemptive 0000009D9D1B3B88:0000009D9D1B3FD0 0000009d25385d70 0 MTA (Threadpool Completion Port)
XXXX 85 0 000000a171267fb0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 86 0 000000a17126be30 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
46 87 1e54 000000a17126c600 1029220 Preemptive 000000A094575068:000000A094576FD0 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 88 0 000000a17126cdd0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
45 89 1db8 000000a16744c3a0 1029220 Preemptive 000000A094577250:000000A094578FD0 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 90 0 000000a167448cf0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
XXXX 91 0 000000a16744cb70 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 92 0 000000a1674494c0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 93 0 000000a16744d340 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)
50 94 15a4 000000a16744db10 1029220 Preemptive 000000A09456AF80:000000A09456AFD0 0000009d25385d70 0 MTA (Threadpool Worker)
47 95 29c8 000000a167449c90 1029220 Preemptive 000000A094573D08:000000A094574FD0 0000009d25385d70 0 MTA (Threadpool Worker)
48 96 28c4 000000a16744e2e0 1029220 Preemptive 000000A094548ED8:000000A094548FD0 0000009d25385d70 0 MTA (Threadpool Worker)
49 97 69c 000000a16744eab0 1029220 Preemptive 0000009D9D1863F0:0000009D9D187FD0 0000009d25385d70 0 MTA (Threadpool Worker)
XXXX 98 0 000000a16744fa50 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)
51 99 2bac 000000a16744f280 8029220 Preemptive 0000009F87F32660:0000009F87F32FD0 0000009d25385d70 0 MTA (Threadpool Completion Port)
52 101 c40 000000a174599040 1029220 Preemptive 0000009D9D178538:0000009D9D179FD0 0000009d25385d70 0 MTA (Threadpool Worker)
54 102 1e5c 000000a174598870 1029220 Preemptive 0000009F87F51578:0000009F87F52FD0 0000009d25385d70 0 MTA (Threadpool Worker)
56 103 2b68 000000a174596930 1029220 Preemptive 0000009D9D188E70:0000009D9D189FD0 0000009d25385d70 0 MTA (Threadpool Worker)
55 104 2924 000000a174595990 1029220 Preemptive 0000009D9D18C290:0000009D9D18DFD0 0000009d25385d70 0 MTA (Threadpool Worker)
53 105 2f0 000000a174599810 1029220 Preemptive 0000009E8B89EFD0:0000009E8B89FFD0 0000009d25385d70 0 MTA (Threadpool Worker)
57 106 f5c 000000a174596160 1029220 Preemptive 0000009E8B894828:0000009E8B895FD0 0000009d25385d70 0 MTA (Threadpool Worker)
58 107 20c 000000a174599fe0 1029220 Preemptive 0000009F87F53258:0000009F87F54FD0 0000009d25385d70 0 MTA (Threadpool Worker)
60 100 1f60 000000a17459a7b0 8029220 Preemptive 0000009F87F7B1A8:0000009F87F7CFD0 0000009d25385d70 0 MTA (Threadpool Completion Port)
I was wondering whether these threads could hold a large amount of memory.
Remember the following rule: a process provides memory, a thread consumes CPU time. Conversely, a process does not run and a thread does not hold memory. If someone says "my process still runs", that is a simplification of "my process has at least one thread that still runs".
A dead thread (marked with XXXX in the first column of !threads) means that a .NET Thread object still exists in memory, but the "real" thread (the kernel object maintained by the operating system) is gone.
The following is an MCVE for that situation:
using System;
using System.Collections.Generic;
using System.Threading;

namespace DeadThreadExample
{
    class Program
    {
        // The static list keeps every Thread object reachable, so none of them can be garbage collected.
        static List<Thread> AllThreadsIEverStarted = new List<Thread>();

        static void Main()
        {
            for (int i = 0; i < 1000; i++)
            {
                Thread t = new Thread(DoNothing);
                t.Start();
                AllThreadsIEverStarted.Add(t);
                // Wait until the OS thread has exited; only the managed Thread object remains.
                t.Join();
            }
            Console.WriteLine("There should be 1000 dead threads now. Debug it with WinDbg and SOS !threads");
            // Keep the process alive so that a debugger can be attached.
            Console.ReadLine();
        }

        private static void DoNothing()
        {
            // Just nothing
        }
    }
}
The debugging session is:
0:006> !threads
PDB symbol for clr.dll not loaded
ThreadCount: 1002
UnstartedThread: 0
BackgroundThread: 1
PendingThread: 0
DeadThread: 1000
Hosted Runtime: no
[...]
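The XXXX marker can also be tallied outside the debugger. The sketch below is a hypothetical helper (the names ThreadListing and CountDeadByKind are made up for illustration) that assumes the !threads rows have been saved as plain text in the column layout shown in the question. It counts rows whose first column is XXXX and groups them by the trailing annotation, such as (Threadpool Worker).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ThreadListing
{
    // Counts dead threads (rows whose first column is "XXXX") in saved SOS !threads output,
    // grouped by the "(...)" annotation at the end of the row; rows without one count as "(none)".
    // Assumption: one thread per line, whitespace-separated columns, as in the question.
    public static Dictionary<string, int> CountDeadByKind(string listing)
    {
        var counts = new Dictionary<string, int>();
        foreach (var rawLine in listing.Split('\n'))
        {
            var line = rawLine.Trim();
            // Live threads show a debugger thread number here instead of XXXX.
            if (!line.StartsWith("XXXX ")) continue;
            var annotation = Regex.Match(line, @"\(([^)]*)\)$");
            var kind = annotation.Success ? annotation.Groups[1].Value : "(none)";
            counts.TryGetValue(kind, out var n);   // n is 0 if the kind is new
            counts[kind] = n + 1;
        }
        return counts;
    }
}

static class Program
{
    static void Main()
    {
        // Four rows copied from the question's listing (one live, three dead).
        string sample =
            "XXXX 41 0 000000a1741bcbf0 8039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Completion Port)\n" +
            "3 44 1e94 000000a1741bd3c0 20220 Preemptive 0000009F87E511F8:0000009F87E52FD0 0000009d25385d70 0 Ukn\n" +
            "XXXX 47 0 000000a1741beb30 30820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn\n" +
            "XXXX 57 0 000000a171175dc0 1039820 Preemptive 0000000000000000:0000000000000000 0000009d25385d70 0 Ukn (Threadpool Worker)\n";
        foreach (var pair in ThreadListing.CountDeadByKind(sample))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```

Grouping by the annotation makes it easy to see whether the dead entries are mostly thread-pool threads, which come and go as the pool grows and shrinks.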
Could they hold a large amount of memory?
0:006> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
[...]
53dde9b0 1000 20000 System.Threading.ThreadHelper
53d66bf0 1000 44000 System.Threading.ExecutionContext
53d62e10 1001 52052 System.Threading.Thread
53dad5cc 2000 64000 System.Threading.ThreadStart
So yes, there is a "memory leak", if you consider the static collection a leak. Arguably it is not one, because you may need that information at some point. Once the collection is cleared, the Thread objects are no longer referenced and it is no longer a leak.
1000 dead threads amount to a ~180 kB "memory leak". I wouldn't call that a "large amount". Even if you pass an object as an argument (using ParameterizedThreadStart), it seems that the field m_ThreadStartArg of the Thread object is not set, so I can hardly see how a larger amount of memory would be leaked.
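The ~180 kB figure is the sum of the TotalSize column of the four thread-related rows in the !dumpheap -stat output above. A small sketch of that arithmetic (HeapCost and SumTotalSize are illustrative names; the parser assumes the whitespace-separated MT / Count / TotalSize / Class Name layout shown above):

```csharp
using System;
using System.Linq;

static class HeapCost
{
    // Sums the TotalSize column (third whitespace-separated field) of !dumpheap -stat rows.
    public static long SumTotalSize(string[] statRows) =>
        statRows
            .Select(row => row.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .Sum(fields => long.Parse(fields[2]));
}

static class Program
{
    static void Main()
    {
        // The four thread-related rows from the debugging session above.
        string[] rows =
        {
            "53dde9b0 1000 20000 System.Threading.ThreadHelper",
            "53d66bf0 1000 44000 System.Threading.ExecutionContext",
            "53d62e10 1001 52052 System.Threading.Thread",
            "53dad5cc 2000 64000 System.Threading.ThreadStart",
        };
        long total = HeapCost.SumTotalSize(rows);
        // 20000 + 44000 + 52052 + 64000 = 180052 bytes, i.e. roughly 180 bytes per dead thread.
        Console.WriteLine($"{total} bytes in total, about {total / 1000} bytes per dead thread");
    }
}
```

At roughly 180 bytes per dead thread, even tens of thousands of them would stay in the single-digit megabyte range.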
If you don't like that situation, use a memory profiler (or the SOS command !gcroot) to find out which GC roots still hold a reference to those threads.
Is it normal to have so many?
Maybe you were just unlucky. They might all be gone with the next garbage collection.
How can I inspect such threads and see the objects/memory held by them?
Use !dumpheap -stat -type, then !dumpheap -mt and then !do:
0:006> !dumpheap -stat -type Thread
Statistics:
MT Count TotalSize Class Name
[...]
53d62e10 1001 52052 System.Threading.Thread
0:006> !dumpheap -mt 53d62e10
Address MT Size
02ec247c 53d62e10 52
02ec2504 53d62e10 52
[...]
Statistics:
MT Count TotalSize Class Name
53d62e10 1001 52052 System.Threading.Thread
Total 1001 objects
0:006> !do 02ec247c
Name: System.Threading.Thread
MethodTable: 53d62e10
EEClass: 53e679a4
Size: 52(0x34) bytes
File: C:\WINDOWS\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT Field Offset Type VT Attr Value Name
53d6cd68 400192d 4 ....Contexts.Context 0 instance 00000000 m_Context
53d66bf0 400192e 8 ....ExecutionContext 0 instance 00000000 m_ExecutionContext
53d624e4 400192f c System.String 0 instance 00000000 m_Name
53d63c70 4001930 10 System.Delegate 0 instance 00000000 m_Delegate
53d65074 4001931 14 ...ation.CultureInfo 0 instance 00000000 m_CurrentCulture
53d65074 4001932 18 ...ation.CultureInfo 0 instance 00000000 m_CurrentUICulture
53d62734 4001933 1c System.Object 0 instance 00000000 m_ThreadStartArg
53d67b18 4001934 20 System.IntPtr 1 instance 11519f8 DONT_USE_InternalThread
53d642a8 4001935 24 System.Int32 1 instance 2 m_Priority
53d642a8 4001936 28 System.Int32 1 instance 3 m_ManagedThreadId
53d6878c 4001937 2c System.Boolean 1 instance 0 m_ExecutionContextBelongsToOuterScope
[ ... static ... ]

Crash in MetalContext with Xamarin

We have a crash in MetalContext, with over 3000 occurrences in a single day, on iOS 10 devices only. We are unable to track down the root cause, and Google reveals nobody else with the same issue. Does anyone have a clue where we can begin to look?
CRASH_INFO_ENTRY_0
Assertion failed: (_mcimpl->device == [_mcimpl->queue device]), function MetalContext, file /BuildRoot/Library/Caches/com.apple.xbs/Sources/QuartzCore/QuartzCore-449.40.9/LayerKit/ogl/ogl-metal.mm, line 1005.
tid_403
0 libsystem_kernel.dylib 0x18d19e8e8 __ulock_wait + 8
1 libdispatch.dylib 0x18d06c0d8 _dispatch_ulock_wait + 48
2 libdispatch.dylib 0x18d06c200 _dispatch_thread_event_wait_slow + 36
3 libdispatch.dylib 0x18d069df8 _dispatch_barrier_sync_f_slow + 236
4 QuartzCore 0x1913e7090 CABackingStoreGetFrontTexture(CABackingStore*) + 92
5 QuartzCore 0x1913e7118 CABackingStorePrepareFrontTexture + 64
6 QuartzCore 0x1914db0a4 CA::Layer::prepare_commit(CA::Transaction*) + 320
7 QuartzCore 0x1914577f8 CA::Context::commit_transaction(CA::Transaction*) + 264
8 QuartzCore 0x19147ec58 CA::Transaction::commit() + 512
9 QuartzCore 0x19147f678 CA::Transaction::observer_callback(__CFRunLoopObserver*, unsigned long, void*) + 120
10 CoreFoundation 0x18e17b7dc __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 32
11 CoreFoundation 0x18e17940c __CFRunLoopDoObservers + 372
12 CoreFoundation 0x18e17989c __CFRunLoopRun + 1024
13 CoreFoundation 0x18e0a8048 CFRunLoopRunSpecific + 444
14 GraphicsServices 0x18fb2e198 GSEventRunModal + 180
15 UIKit 0x1940942fc -[UIApplication _run] + 684
16 UIKit 0x19408f034 UIApplicationMain + 208
17 NDC2010 0x100a321d4 wrapper_managed_to_native_UIKit_UIApplication_UIApplicationMain_int_string___intptr_intptr (<unknown>:1)
18 NDC2010 0x1009ae138 Xamarin_iOS_UIKit_UIApplication_Main_string___string_string (UIApplication.cs:63)
19 NDC2010 0x100526288 NDC2010_NDC2010_NDC2010Application_Main_string__ + 28620
20 NDC2010 0x100921ba4 wrapper_runtime_invoke_object_runtime_invoke_dynamic_intptr_intptr_intptr_intptr + 4204776
21 Mono 0x1019aa4e8 mono_jit_runtime_invoke + 1772
22 Mono 0x101a1ad64 do_runtime_invoke + 112
23 Mono 0x101a1d348 mono_runtime_exec_main + 832
24 Mono 0x101a1cf64 mono_runtime_run_main + 764
25 Mono 0x10198eb04 mono_jit_exec + 236
26 NDC2010 0x10051ed3c xamarin_main (monotouch-main.m:487)
27 NDC2010 0x101291970 main (main.arm64.m:133)
28 libdispatch.dylib 0x18d08c5b8 (Missing)
My app kept crashing with the same log. At first, what fixed it for me was a quick reset of the device; I traced the crash to webView usage. I can't reset the device every time, so I tested further and found that the problem was that I was using Xcode 7.3 to build for iOS 10. After updating to Xcode 8 and rebuilding the app, I no longer get this crash. Hope this helps.

In a Linux (CentOS) multiprocessor setting, how do I assign CPU cores to NUMA nodes?

I am working on a quad Opteron 6272 system with CentOS installed on it. I suspect there is something wrong with the NUMA configuration.
When I run numactl --hardware I get:
available: 5 nodes (0,2-4,6)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 32765 MB
node 0 free: 31145 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 16384 MB
node 2 free: 15501 MB
node 3 cpus: 24 25 26 27 28 29 30 31 40 41 42 43 44 45 46 47
node 3 size: 16384 MB
node 3 free: 14913 MB
node 4 cpus: 32 33 34 35 36 37 38 39
node 4 size: 32768 MB
node 4 free: 31551 MB
node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 6 size: 32752 MB
node 6 free: 31575 MB
node distances:
node 0 2 3 4 6
0: 10 16 22 16 16
2: 16 10 16 16 16
3: 22 16 10 22 22
4: 16 16 22 10 16
6: 16 16 22 16 10
There are 4 CPU chips, so having 5 NUMA nodes makes no sense to me.
Can anyone please tell me where CPU cores are assigned to NUMA nodes?
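For reference, the kernel publishes the CPU-to-node assignment it chose at boot under /sys/devices/system/node/ (one nodeN directory per node, each with a cpulist file), and numactl --hardware reports the same assignment in its "node N cpus:" lines. The sketch below (NumaMap and CpuToNode are illustrative names) turns those lines, as posted above, into a CPU-to-node table so the layout is easier to see; it assumes exactly that "node N cpus: ..." format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NumaMap
{
    // Parses the "node N cpus: ..." lines of `numactl --hardware` output into a CPU -> node map.
    // Other lines (size, free, distances) are ignored.
    public static SortedDictionary<int, int> CpuToNode(string numactlOutput)
    {
        var map = new SortedDictionary<int, int>();
        foreach (var rawLine in numactlOutput.Split('\n'))
        {
            var line = rawLine.Trim();
            if (!line.StartsWith("node ") || !line.Contains(" cpus:")) continue;
            string[] halves = line.Split(':');
            int node = int.Parse(halves[0].Split(' ')[1]);   // "node 3 cpus" -> 3
            foreach (var cpu in halves[1].Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
                map[int.Parse(cpu)] = node;
        }
        return map;
    }
}

static class Program
{
    static void Main()
    {
        // The "cpus" lines from the numactl --hardware output in the question.
        string output =
            "node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15\n" +
            "node 2 cpus: 16 17 18 19 20 21 22 23\n" +
            "node 3 cpus: 24 25 26 27 28 29 30 31 40 41 42 43 44 45 46 47\n" +
            "node 4 cpus: 32 33 34 35 36 37 38 39\n" +
            "node 6 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63\n";
        var map = NumaMap.CpuToNode(output);
        // Print how many CPUs ended up on each node.
        foreach (var group in map.GroupBy(kv => kv.Value).OrderBy(g => g.Key))
            Console.WriteLine($"node {group.Key}: {group.Count()} CPUs");
    }
}
```

For the posted output, all 64 logical CPUs are assigned, but unevenly: nodes 0, 3 and 6 hold 16 CPUs each, nodes 2 and 4 hold 8 each, and node 3's CPUs come from two separate ranges (24-31 and 40-47).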
Do you have any kernel boot options defined for the memory layout?
Can you also post the dmesg output from boot, where the NUMA nodes are listed with their memory ranges?
It would also be nice to know the kernel version and the libnuma version.
