windbg !gcroot <address> returns no roots - memory-leaks

What does it mean if !gcroot returns an empty thread list?
0:000> !gcroot 0000000010817c50
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 2 OSTHread 15a4
Scan Thread 10 OSTHread 1db4
Scan Thread 11 OSTHread 147c
Scan Thread 12 OSTHread 15d4
Scan Thread 14 OSTHread 9dc
Scan Thread 15 OSTHread 12a4
Scan Thread 21 OSTHread 18c4
Scan Thread 23 OSTHread 1260
Scan Thread 24 OSTHread 16c8
Scan Thread 25 OSTHread bd4
Scan Thread 26 OSTHread de8
I get a LOT of entries from !dumpheap -type System.String, but for most of them !gcroot returns nothing, as in the example above.

There are no roots for these objects, so they are eligible for collection and the GC will reclaim them when it next collects their generation.

If you have a lot of strings with no roots, many of them may be fairly big (over 85,000 bytes), which places them on the Large Object Heap (LOH). The LOH is only collected during generation 2 collections, so these strings may not be collected as frequently as you would like. See this topic for more detail:
WinDbg not telling me where my string is rooted

Related

Why do threads not reduce runtime past a certain point

Basically, I have a table with the number of threads in a program and the runtime of the program (in seconds):
Threads Runtime (s)
1 12.06
2 6.03
3 4.02
4 4.556
5 4.154
10 3.216
15 2.68
20 3.082
25 2.814
50 3.35
I understand why adding threads/concurrency drastically reduces the runtime to start with. However, I am confused about why, between 10 and 50 threads, the runtime stays relatively stable and does not keep decreasing, despite the number of threads increasing by a large amount.
Why is this?
Threads represent subdivisions of their parent Process.
In your situation, I predict that most of your threads simply have nothing to do!
The "one-thread" case gives your worst-case runtime. As expected, "two threads" divides it in half, and "three threads" into thirds. "Four threads and beyond" makes it clear that there is little additional benefit: the runtime hovers between roughly 2.7 and 4.6 seconds, because the extra threads have nothing to do.
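To make the flattening visible, the table can be turned into speedup and per-thread efficiency figures. The following is a minimal sketch; the runtimes are copied from the question, while the function name `speedup_table` and the "efficiency" column are illustrative choices, not something from the original posts.

```python
# Runtimes in seconds, copied from the table in the question.
runtimes = {
    1: 12.06, 2: 6.03, 3: 4.02, 4: 4.556, 5: 4.154,
    10: 3.216, 15: 2.68, 20: 3.082, 25: 2.814, 50: 3.35,
}


def speedup_table(runtimes):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    baseline = runtimes[1]
    rows = []
    for threads in sorted(runtimes):
        speedup = baseline / runtimes[threads]   # how many times faster than 1 thread
        efficiency = speedup / threads           # speedup obtained per thread used
        rows.append((threads, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for threads, speedup, efficiency in speedup_table(runtimes):
        print(f"{threads:>3} threads: speedup {speedup:5.2f}x, "
              f"efficiency {efficiency:6.1%}")
```

With these numbers the speedup is 2x and 3x for two and three threads, and it never exceeds 4.5x afterwards, so the efficiency per thread keeps falling as more threads are added.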

Run 4 threads at the same time; when one thread completes, execute a new thread?

I have 30 tasks.
I want to run 4 threads at the same time to do the first 4 tasks.
When any thread completes, I want to start the next task, so that there are always 4 threads running at the same time.
After 28 tasks have completed (7 rounds of 4), only 2 tasks (2 threads) remain.
How can I solve this? I use the threading namespace.
Thank you
You have not mentioned any particular language here, but if you are using Java, this is a classic use case for ThreadPoolExecutor.
If you are using some other language, you can write your own simplified ThreadPoolExecutor. Basically:
A thread-safe queue of tasks to be executed
4 threads reading from the queue and executing tasks
Termination logic for your threads (a thread may terminate when it finds that the queue is empty, or wait for some time and then try again)
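The three pieces listed above can be sketched as follows. This is a minimal illustration in Python rather than the asker's (unstated) language; `run_task`, `worker`, and `run_all` are hypothetical names, and `run_task` is a placeholder for the real work. Each worker terminates as soon as it finds the queue empty, which is safe here only because all 30 tasks are queued before the workers start.

```python
import queue
import threading

NUM_WORKERS = 4
NUM_TASKS = 30


def run_task(task_id):
    """Hypothetical placeholder for the real work of one task."""
    return task_id * task_id


def worker(tasks, results, lock):
    """Pull tasks until the queue is empty, then terminate."""
    while True:
        try:
            task_id = tasks.get_nowait()
        except queue.Empty:
            return  # termination logic: nothing left to do
        value = run_task(task_id)
        with lock:  # protect the shared results dictionary
            results[task_id] = value


def run_all(num_tasks=NUM_TASKS, num_workers=NUM_WORKERS):
    tasks = queue.Queue()  # thread-safe queue of pending tasks
    for task_id in range(num_tasks):
        tasks.put(task_id)

    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has drained the queue
    return results


if __name__ == "__main__":
    print(len(run_all()), "tasks completed")
```

Because each worker takes the next task as soon as it finishes the previous one, 4 tasks run concurrently until fewer than 4 remain, and the last 2 tasks simply occupy 2 of the workers. In Python specifically, the standard library's `concurrent.futures.ThreadPoolExecutor(max_workers=4)` provides the same behaviour without hand-written workers.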

Do kernel threads get scheduled by the scheduler?

How do kernel threads get executed on the CPU?
Do these kernel threads get scheduled by the scheduler, like normal user-space processes?
Or do they get woken up when certain events happen?
root 2 0 0 Nov30 ? 00:00:00 [kthreadd]
root 3 2 0 Nov30 ? 00:00:03 [ksoftirqd/0]
The answer to both questions is yes: kernel threads get scheduled just like user threads, and they are normally blocked, waiting for certain events (different events for each kernel thread).
The answer is yes.
The only major difference between kernel threads and user-space processes is that task->mm is NULL for kernel threads.
Hence they do not have a distinct address space of their own. The rest is pretty much the same for kernel threads and user-space processes.

How does more than one thread execute on a processor core

I want to know how a multi-threaded program with more threads than cores executes on a processor. For example, my program has 12 threads and I am running it on an Intel Core i5 machine with four cores. Will each core run 3 threads? I am confused because I have seen programs with 30 threads running on a 4-core machine.
Thanks
Each core can execute one thread at a time. So if there are 30 threads and 4 cores, at any instant 26 threads are waiting to be context-switched in. For example, threads 1-4 run for 200 ms, then threads 5-8 run for 200 ms, and so on.
A processor core is capable of executing one thread at a time, so on a quad-core machine 4 threads execute simultaneously. Not all user-space threads execute simultaneously, and kernel threads also need to run, to schedule the next thread or to do other kernel tasks.
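The time-slicing described above can be sketched as a toy round-robin simulation. This is only an illustration under strong assumptions: every thread is always runnable, all slices are equal, and there are no priorities, blocking, or preemption, none of which holds for a real scheduler. The function name `round_robin` is hypothetical.

```python
from collections import deque


def round_robin(num_threads, num_cores, num_slices):
    """Return, for each time slice, the thread ids that occupy the cores."""
    ready = deque(range(1, num_threads + 1))  # threads waiting for a core
    schedule = []
    for _ in range(num_slices):
        # Give each core the next waiting thread for this slice.
        running = [ready.popleft() for _ in range(min(num_cores, len(ready)))]
        schedule.append(running)
        # After the slice, the threads go to the back of the ready queue.
        ready.extend(running)
    return schedule


if __name__ == "__main__":
    for slice_no, running in enumerate(round_robin(30, 4, 8), start=1):
        print(f"slice {slice_no}: threads {running}")
```

With 30 threads and 4 cores, the first slice runs threads 1-4, the second runs 5-8, and the eighth slice wraps around to threads 29, 30, 1 and 2.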

POSIX Threads on a Multiprocessor System

I have written software that uses POSIX threads so that I can use shared memory within the process. My question is: if I have a machine running Ubuntu with 4 processors, each with 16 cores, is it more efficient to run 4 processes with 16 threads each, or 1 process with 64 threads? Each processor has a dedicated 32 GB of RAM.
My main worry is that there will be a lot of memory copying happening behind the scenes with 1 process.
In summary:
On a machine with 4 processors (16 cores each):
1 process with 64 threads, or 4 processes with 16 threads each?
If the process requires more than 32 GB of RAM (the amount dedicated to one processor), does the answer differ?
Thanks for your help
Depends on what your application does.
A thread in a single-threaded process runs faster than a thread in a multi-threaded process, since the latter requires synchronization between threads in library functions such as malloc() and fprintf(). Also, more threads in a multi-threaded process are likely to cause more lock contention, slowing each other down. If threads don't need to communicate and don't share data, they don't need to be in the same process.
In your case, you may get better parallelism with 4 processes of 16 threads each than with 1 process of 64 threads.

Resources