How to extend GHC's Thread State Object - haskell

I'd like to add two extra fields of type StgWord32 to the thread state object (TSO). Based on the information I found on the GHC-Wiki and from looking at the source code, I have extended the struct in /includes/rts/storage/TSO.h and changed the program that creates different offsets (creating DerivedConstants.h). The compiler, the rts, and a simple application re-compile, but at the end of the execution (in hs_exit_) the garbage collector complains:
internal error: scavenge_stack: weird activation record found on stack: 45
I guess it has to to with cmm and/or the STG implementation details (the offsets are generated since the structs are not visible at cmm level, correct me if I'm wrong). Is the order of fields significant? Have I missed a file that should be changed?
I use a debug build of the compiler and RTS and a rather dated ghc 6.12.3 on a 64bit architecture. Any hints to relevant documentation and comments
on the difference between ghc 6 and 7 regarding TSO handling are welcome, too.

The error that you are getting comes from: ghc/rts/sm/Scav.c. Specifically at line 1917:
default:
barf("scavenge_stack: weird activation record found on stack: %d", (int)(info->i.type));
It looks like you need to also modify ClosureTypes.h, which you can find in ghc/includes/rts/storage. This file seems to contain the different kinds of headers that can appear in a heap object. I've also run into some strange bootstrapping errors, where if I try to rebuild using the stage-1 compiler, I get the error you mentioned, but if I do a clean build, then it compiles just fine.

A workaround that turned out good enough for me was to introduce a separate data structure for each Capability that would hold the additional information for each lightweight thread. I have used a HashTable (see rts/Hash.h and .c) mapping from thread id to the custom info struct. The entries were added when the threads were created from sparks (in schduleActiveteSpark).
Timing the creation, insertion, lookup and destruction of the entries and the table showed negligible overhead for small programs. The main overhead results from the actual usage of the information and should ideally be kept outside of the innermost scheduler loop. For the THREADED_RTS build one needs to ensure that other Capabilities don't access tables that are not their own (or use a mutex if such access is required, which is potential source of additional overhead).

Related

Why does v8 report duplicate module strings in heap in my jest tests?

In the process of upgrading node (16.1.x => 16.5.0), I observed that I'm getting OOM issues from jest. In troubleshooting, I'm periodically taking heap snapshots. I'm regularly seeing entries in "string" for module source (same shallow/retained size). In this example screenshot, you can see that the exact same module (React) is listed 2x. Sometimes, the module string is listed even 4x for any given source module.
Upon expansion, it says "system / Map", which suggests to me I think? that theres some v8 wide reference to this module string? That makes sense--maybe. node has a require cache, jest has a module cache, v8 and node i'd assume... share module references? The strings and compiled code buckets do increase regularly, but I expect them to get GC'd. In fact, I can see that many do--expansion of the items show the refs belonging to GC Roots. But I suspect something is holding on to these module references, and I fear it's not at the user level, but at the tooling level. This is somewhat evidenced by observation that only the node.js upgrade induces the OOM failure mode.
Why would my jest test have multiple instances of the same module (i am using --runInBand, so I don't expect multiple workers)
What tips would you offer to diagnose further?
I do show multiple VM Contexts, which I think makes sense--I suppose jest is running some test suites in some sort of isolation.
I do not have a reproduction--I am looking for discussion, best-know-methods, diagnostic ideas.
I can offer some thoughts:
"system / Map" does not mean "some v8 wide reference". "Map" is the internal name for "hidden class", which you may have heard of. The details don't even matter here; TL;DR: some internal thing, totally normal, not a sign of a problem.
Having several copies of the same string on the heap is also quite normal, because strings don't get deduplicated by default. So if you run some string-producing operation twice (such as: reading an external file), you'll get two copies of the string. I have no idea what jest does under the hood, but it's totally conceivable that running tests in parallel in mostly-isolated environments has a side effect of creating duplicate strings. That may be inefficient in a sense, but as long as they get GC'ed after a while, it's not really a problem.
If the specific hypothesis implied above (there are several tests in each file, and jest creates an in-memory copy of the entire file for each executing test) holds, then a possible mitigation might be to split your test files into smaller chunks (1.8MB is quite a lot for a single file). I don't have much confidence in this, but maybe it'd be easy for you to try it and see.
More generally: in the screenshot, there are 36MB of memory used by strings. That's far from being an OOM reason.
It might be insightful to measure the memory consumption of both Node versions. If, for example, it used to consume 4GB and now crashes when it reaches 2GB, that would indicate that the limit has changed. If it used to consume 2GB and now crashes when it reaches 4GB, that would imply that something major has changed. If it used to consume 1.98GB and now crashes when it reaches 2.0GB, then chances are something tiny has changed and you just happened to get lucky with the old version.
Until contradicting evidence turns up, I would operate under the assumption that the resource consumption is normal and simply must be accommodated. You could try giving Node more memory, or reducing the number of parallel test executions.
This seems like a known issue of Jest at Node JS v16.11.0+ and has already been reported to GitHub.

Getting a list or count of all running threads

How can I do one of the following?
get a list of ThreadIDs all running forked threads (preferably with labels) from application code?
get a simple (maybe approximate; e.g. from the last major GC), count of running threads
get either of the above from gdb or similar
Things that don't work for me:
requiring a profiling build
maintaining some data structure to try to track this information myself
Things I thought might be promising:
write a custom eventlog hook to try to get this info https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#hooks-to-change-rts-behaviour
create a hacky RTS binding somehow
GHC 9.6 introduced listThreads :: IO [ThreadId]
From GHC User’s Guide:
GHC now provides a set of operations for introspecting on the threads of a program, GHC.Conc.listThreads, as well as operations for querying a thread’s label (GHC.Conc.threadLabel) and status (GHC.Conc.threadStatus).

there is __cond_lock(x,c) define in compiler.h file , but no __cond_unlock(x,c) define?

In complier.h, there is a macro define as below:
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
But here I have a question, that is, where there is a __cond_lock definition, but does not define the corresponding __cond_unlock, then the variable on the release, how to keep consistent between __cond_lock and __cond_unlock?
And I checked the definition of function spin_trylock (), and it is used __cond_lock, but which also used a _spin_trylock function.in _spin_trylock function, after a few calls, it will use to __acquire function in this case, the equivalent of an operation, it carried out two calculations would lead Sparse detection warning message appears, after I wrote the code for an experiment to test my judgment, is indeed a warning message will appear, if I wrote it twice unlock instruction, there is no alarm information, but this is inconsistent as program running.
Protecting critical sections using locking is up to the programmer. That means, if you hold a lock to protect a critical reason, you've must have to release the lock when you're finished.
There are various types of locking primitives inside Linux kernel like. spinlock(), spinlock_irq(), spin_trylock(). They have their own purposes. Now, spin_trylock() using __cond_lock inside of it, it's because to make sure, whether that particular lock is available for locking or it's been already taken. Take a look at few examples of how spin_trylock or __cond_lock is being used. For ex. at kernel/sched/fair.c::rebalance_domain (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/fair.c?id=d8dfad3876e4386666b759da3c833d62fb8b2267#n5574) see how the balancing is used, it's been using spin_trylock() to hold the lock and while releasing doing it conditionally. Another example could be found at kernel/posix-timers.c, lock_timer() macro. If you closely look at the uses of lock_timer() you'll find how __cond_lock is being used inside kernel and hopefully your confusion will disappear.
In other words, __cond_lock is used to hold a lock conditionally and not being used directly. It's possible to check a particular lock before releasing the lock and this what has been done so far.

How can I delete and deallocate OVM objects in SystemVerilog?

I would like to delete an ovm object (and its children) so that I can recreate it with different configs. Is there a way to do this in OVM?
Currently, when I try to create the object a second time with new, I get the following VCS runtime error:
[CLDEXT] Cannot set 'ap' as a child of 'instance', which already has a child by that name.
I realize that I can simply use a different name to "re-create" the instance, but then I'll still have the old instance sitting around and soaking up memory.
OVM is just a SystemVerilog library. That means that all the rules of SystemVerilog apply to OVM. So, yes, you can use new() with OVM. Sometimes it's preferable to use the factory, and sometimes it's preferable to use new() (that's a topic for a different discussion).
SystemVerilog does not have a delete operator or a destructor like C++. Instead, when you are done with an object you just remove all references to it and the garbage collector will clean up the memory. Here's a quote from the SystemVerilog reference manual (IEEE 1800-2009) section 8.7:
SystemVerilog does not require the complex memory allocation and deallocation of C++. Construction of an object is straightforward; and garbage collection, as in Java, is implicit and automatic. There can be no memory leaks or other subtle behaviors, which are so often the bane of C++ programmers.
It's not entirely true that you cannot have a memory leak. You can forget to remove all references to an object and the garbage collector will not know to pick it up. However, you do not have to worry about memory with the same detail as you do in C++.
The particular error you received with id CLDEXT is from ovm_component class. From the message it appears that you attempted to create two components with the same name and the same parent. Components in OVM are typically static. That is, you create and elaborate them once, usually at time 0, and don't delete or add components after that. Because of this model there are no methods in ovm_component to remove child components. So there really isn't a good way to replace a component once it has been instantiated. By the way, this only applies to components. Other types of objects can be re-allocated.
If you feel that you need to replace a component with a different one after time 0 you should re-think the architecture of your testbench. There are probably betters ways to accomplish what you are trying to do without replacing components.
I have only UVM experience but I think OVM is similar. I would have liked to reply to #Victor Lyuboslavsky's comment but I can't add comments.
The issue is with the name 'ap' which evidently has already been used for a child of 'instance'. Use this code instead.
static int instNum = 0;
instance_ap = my_ovm_extended_class::type_id::create
($sformatf ("ap%0d", instNum), this);
The first time an object is created & the handle assigned to 'instance_ap', the object would have the name 'instance.ap0'. The next time the code executes an object called 'instance.ap1', and so on.
As mentioned by other posters this ought to be done only for non-component objects, and components should be static and must be created during/before the build phase & connected to each other during/before the connect phase.
Try assigning null to the object before calling new again.
Unless I see someone else answer this question, I'd say there is no easy way to deallocate objects in OVM framework.
OVM testbenches are static and created when the testbench is created.
When the environment class is instantiated, it will call new(create), build, connect, end_of_elaboration, start_of_simulation, run and check on all components.
By the end of the environment build phase all components must be created.
By the end of the environment connect phase all components must have their TLM ports connected.
Because of these requirements, you can not change components (or port connections) except for during the phase.
As part of the static nature of the testbench environment, every component must have a unique get_full_name() response. This is because string lookups are used to identify components in the hierarchy.
Assigning an object to null should deallocate memory. If there is no other handle pointing to that memory location, then it should get reclaimed.

Specifying runtime behavior of a program

Are there any languages/extensions that allow the programmer to define general runtime behavior of a program during specific code segments?
Some garbage-collected languages let you modify the behavior of the GC at runtime. Like in lua, the collectgarbage function lets you do this. So, for example, you can stop the GC when you want to be sure that CPU resources aren't used in garbage collection for a critical section of code (after which you start the GC again).
I'm looking for a general way to specify intended behavior of the program without resorting to specifying specific GC tweaks. I'm interested even in an on-paper sort of specification method (ie something a programmer would code toward, but not program syntax that would actually implement that behavior). The point would be that this could be used to specify critical sections of code that shouldn't be interrupted (latency dependent activity) or other intended attributes of certain codepaths (maximum time between an output and an input or two outputs, average running time, etc).
For example, this syntax might describe that maximum time latencyDependentStuff should take is 5 milliseconds:
requireMaxTime(5) {
latencyDependentStuff();
}
Has anyone seen anything like this anywhere before?

Resources