Maxima. Thread local storage exhausted - multithreading

I am writing mathematical modules for analysis problems. All files are compiled to .fasl.
The files are gradually growing in size, and new ones keep being added. Today I ran into a problem when loading a module with load("foo.mac") (~0.4 s) that loads 100+ files, plus another module of 200+ files; they only declare functions and variables, without precomputing anything.
Error:
fatal error encountered in SBCL pid <pid>:
Thread local storage exhausted.
%PRIMITIVE HALT called; the party is over.
Welcome to LDB..
CPU and RAM usage are stable at the moment this happens.
Passing maxima -X '--dynamic-space-size 2048' does not help, and neither does 4096 (the default is 1024). Why does it not work?
SBCL + Windows works without errors; SBCL 1.4.5.debian + Linux (server) throws this error. However, if I reduce the size of the files a little, the module loads.
I recompiled the files and checked all the .UNLISP files. I changed the order of the loaded files, but the error always occurs when loading the last ones in the list. The tests run without errors. Is there a way to increase the amount of "local storage" through SBCL or Maxima? Which direction should I look in? Any ideas?
Update:
I significantly reduced the load by removing duplicated matchdeclare(..) code. The error is no longer observed.

From https://sourceforge.net/p/maxima/mailman/message/36659152/
maxima uses quite a few special variables which sometimes makes
sbcl run out of thread-local storage when running the testsuite.
They proposed to add an environment variable that allows to change
the thread-local storage size but added a command-line option
instead => if supported by sbcl we now generate an image with
a bigger default thread-local storage whose size can be
overridden by users passing the --tls-limit option.
The NEWS file in SBCL's source code also indicates that the default value is 4096:
changes in sbcl-1.5.2 relative to sbcl-1.5.1:
* enhancement: RISC-V support with the generational garbage collector.
* enhancement: command-line option "--tls-limit" can be used to alter the
maximum number of thread-local symbols from its default of 4096.
* enhancement: better muffling of redefinition and lambda-list warnings
* platform support:
** OS X: use Grand Central Dispatch semaphores, rather than Mach semaphores
** Windows: remove non-functional definition of make-listener-thread
* new feature: decimal reader syntax for rationals, using the R exponent
marker and/or *READ-DEFAULT-FLOAT-FORMAT* of RATIONAL.
* optimization: various Unicode tables have been packed more efficiently

Related

Python3 pathlib's Path.glob() generator keeps increasing memory usage when performed on large file structure

I used pathlib's Path(<path>).glob() function to walk through file directories and grab their files' name and extension parameters. My Python script is meant to run on a large file system, so I tested it on the root directory of my Linux machine. After leaving it running for a few hours, I noticed that my machine's memory usage had increased by over a GB.
After using memray and memory_profiler, I found that whenever I looped through directory items using the generator the memory usage kept climbing.
Here's the problematic code (path is the path to the root directory):
from pathlib import Path

dir_items = Path(path).glob("**/*")
for item in dir_items:
    pass
Since I was using a generator, my expectation was that my memory requirements would remain constant throughout. I think I might have some fundamental misunderstanding. Can anyone explain where I've gone wrong?
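If it helps to narrow things down, here is a minimal sketch of an equivalent walk built on os.scandir instead of pathlib, which you could run side by side with your glob loop to check whether the growth is tied to pathlib's Path objects or happens either way. The function name iter_name_ext is only illustrative, and the error handling simply skips unreadable directories.

import os

def iter_name_ext(root):
    # Iterative depth-first walk using os.scandir; yields (name, extension)
    # pairs without building pathlib.Path objects.
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        yield os.path.splitext(entry.name)
        except (PermissionError, FileNotFoundError):
            continue  # directory is unreadable or vanished during the walk

for _ in iter_name_ext("/"):
    pass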

Is there a way to see how much memory a Python module takes?

In Python 3, is there a simple way to see how much memory is used when loading a module (not while running its contents, such as functions or methods, which may load data and so on)?
# Memory used before, in bytes
import mymodule
# Memory used after, in bytes
# Delta memory = memory used after - memory used before
(E.g. these 3 comment lines of extra code to insert would be what I call "simple").
Using the Spyder IDE, for example, I can see in the "File explorer" tab on the top right the size of the file containing my module (i.e. its size on disk), but I don't think that is the amount of memory taken once Python has actually loaded its contents, with the many imports I need in there.
And in the "Memory and Swap History" part of the System Monitor (Ubuntu 18.04) I can see a little bump while my module is actually being loaded in Python (it may get bigger as the module grows, of course), which is probably the amount I'm looking for.
My uses would mainly be inside the Spyder IDE, in a Jupyter notebook, or directly in a Python console.
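For what it's worth, here is a rough sketch of the "3 extra lines" idea, assuming the third-party psutil package is available (pip install psutil). It measures the change in the process's resident set size (RSS) around the import, so the delta also includes whatever the module's own imports and any C extensions allocate, which is roughly what the System Monitor bump corresponds to. mymodule is the placeholder name from the question.

import gc
import importlib

import psutil  # third-party: pip install psutil

def import_rss_delta(module_name):
    # Return the change in this process's RSS, in bytes, caused by importing
    # the named module. Includes memory allocated by the module's own imports
    # and by C extensions, so it approximates the "System Monitor" bump.
    proc = psutil.Process()
    gc.collect()
    before = proc.memory_info().rss   # memory used before, in bytes
    importlib.import_module(module_name)
    gc.collect()
    after = proc.memory_info().rss    # memory used after, in bytes
    return after - before

print(import_rss_delta("mymodule"))  # delta memory, in bytes

Run it in a fresh interpreter or console so the module is not already cached in sys.modules; otherwise the import is a no-op and the delta will be close to zero.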

Gulp/Node: error while loading shared libraries: cannot allocate memory in static TLS block

Trying to run gulp and getting this output
$ gulp
node: error while loading shared libraries: cannot allocate memory in static TLS block
From what I have found, this seems to relate to gcc or g++, not sure how it pertains to node or gulp. Either way I can't seem to run gulp anymore. Should also mention, this just popped up today. It was running fine yesterday.
EDIT: seems like it's for all node commands. Just tried running npm -v to get the version number and it has the same output. Same with node -v
Running CentOS 6.9
The GNU toolchain supports various kinds of TLS, and one of them (the initial-exec model) involves what is essentially a fixed offset from the thread control block. At program startup, the dynamic linker computes all the offsets and makes sure that all threads have sufficient space for all the required thread local variables.
However, with dlopen, this does not work in general because it is not possible to move the thread control block around to make room for more thread-local variables. The current glibc dynamic linker has a heuristic which reserves some space for future dlopen calls, but if you load a number of shared objects, each with their own thread-local variables, this is not enough.
The usual workaround is to use the LD_DEBUG=files environment variable (or strace) to find relevant shared objects loaded with dlopen (unfortunately, the error message you quoted does not provide this information). After that, you can use the LD_PRELOAD environment variable to tell the dynamic linker to load them early. (It is sufficient to do this for the shared object which is dlopened, its dependencies are processed automatically.) This has the side effect that the computation at program startup takes into account their TLS needs, and when the dlopen call happens later at run time, no additional TLS variables have to be allocated. However, this approach does not work for all shared objects because it affects symbol lookup and the order in which ELF constructors run.
In the general case, it may be necessary to switch some shared objects to the global-dynamic TLS model (which requires recompiling them), or use a glibc build with an increased TLS reserve. Unfortunately, the reserve cannot currently be set at run time.

Reducing memory usage in an extended Mathematica session

I'm doing some rather long computations, which can easily span a few days. In the course of these computations, sometimes Mathematica will run out of memory. To this end, I've ended up resorting to something along the lines of:
ParallelEvaluate[$KernelID]; (* Force the kernels to launch *)
kernels = Kernels[];
Do[
  If[Mod[iteration, n] == 0,
    CloseKernels[kernels];
    LaunchKernels[kernels];
    ClearSystemCache[]];
  (* Complicated stuff here *)
  Export[...], (* If a computation ends early I don't want to lose past results *)
  {iteration, min, max}]
This is great and all, but over time the main kernel accumulates memory. Currently, my main kernel is eating up roughly 1.4 GB of RAM. Is there any way I can force Mathematica to clear out the memory it's using? I've tried littering Share and Clear throughout the many Modules I'm using in my code, but the memory still seems to build up over time.
I've also tried to make sure I have nothing big and complicated running outside of a Module, so that nothing stays in scope too long. But even with this I still have memory issues.
Is there anything I can do about this? I'm always going to have a large amount of memory being used, since most of my calculations involve several large and dense matrices (usually 1200 x 1200, but it can be more), so I'm wary about using MemoryConstrained.
Update:
The problem was exactly what Alexey Popkov stated in his answer. If you use Module, memory will leak slowly over time. It happened to be exacerbated in this case because I had multiple Module[..] statements. The "main" Module was within a ParallelTable where 8 kernels were running at once. Tack on the (relatively) large number of iterations, and this was a breeding ground for lots of memory leaks due to the bug with Module.
Since you are using Module extensively, I think you may be interested in knowing this bug with non-deleting temporary Module variables.
Example (non-deleting unlinked temporary variables with their definitions):
In[1]:= $HistoryLength=0;
a[b_]:=Module[{c,d},d:=9;d/;b===1];
Length@Names[$Context<>"*"]
Out[3]= 6
In[4]:= lst=Table[a[1],{1000}];
Length@Names[$Context<>"*"]
Out[5]= 1007
In[6]:= lst=.
Length@Names[$Context<>"*"]
Out[7]= 1007
In[8]:= Definition@d$999
Out[8]= Attributes[d$999]={Temporary}
d$999:=9
Note that in the above code I set $HistoryLength = 0; to stress this buggy behavior of Module. If you do not do this, temporary variables can still be linked from history variables (In and Out) and will not be removed with their definitions for this reason in a broader set of cases (it is not a bug but a feature, as Leonid mentioned).
UPDATE: Just for the record. There is another old bug with non-deleting unreferenced Module variables after Part assignments to them in v.5.2 which is not completely fixed even in version 7.0.1:
In[1]:= $HistoryLength=0;$Version
Module[{L=Array[0&,10^7]},L[[#]]++&/@Range[100];];
Names["L$*"]
ByteCount@Symbol@#&/@Names["L$*"]
Out[1]= 7.0 for Microsoft Windows (32-bit) (February 18, 2009)
Out[3]= {L$111}
Out[4]= {40000084}
Have you tried evaluating $HistoryLength=0; in all subkernels as well as in the master kernel? History tracking is the most common source of running out of memory.
Have you tried not using the slow and memory-consuming Export and using the fast and efficient Put instead?
It is not clear from your post where you evaluate ClearSystemCache[] - in the master kernel or in subkernels? It looks like you evaluate it in the master kernel only. Try to evaluate it in all subkernels too before each iteration.

Debugging memory leaks in Windows Explorer extensions

Greetings all,
I'm the developer of a rather large C# Windows Explorer extension. As you can imagine, there is a lot of P/Invoke involved, and unfortunately, I've confirmed that it's leaking unmanaged memory somewhere. However, I'm coming up empty as to how to find the leak. I tried following this helpful guide, which says to use WinDBG. But, when I try to use !heap, it won't let me because I don't have the .PDB files for explorer.exe (and the public symbol files aren't sufficient, apparently).
Help?
I've used UMDH many times with very good results. The guide you mentioned describing WinDbg uses the same method as UMDH, based on the ability of the debug heap to record stack traces for all allocations. The only difference is that UMDH automates it -- you simply run umdh from the command line and it creates a snapshot of all current allocations. Normally you repeat the snapshot two or more times, then calculate the 'delta' between two snapshots (also using umdh.exe). The 'delta' file lists all new allocations that happened between your snapshots, sorted by allocation size.
UMDH also needs symbols. You will need at least symbols for ntdll.dll (the heap implementation lives there). Public symbols from http://msdl.microsoft.com/download/symbols will work fine.
Make sure you are using the correct bitness of umdh.exe. Explorer.exe is 64-bit on a 64-bit OS, so if your OS is 64-bit you need the 64-bit umdh.exe -- i.e. download the appropriate bitness of the Windows Debugging Tools.
