Is there a way to see how much memory a Python module takes? - python-3.x

In Python 3, is there a simple way to see how much memory is used when loading a module (not while running its content, such as functions or methods, which may load data and so on)?
# Memory used before, in bytes
import mymodule
# Memory used after, in bytes
# Delta memory = memory used after - memory used before
(E.g. these three comment lines of extra code to insert are what I would call "simple".)
In the Spyder IDE, for example, the "File explorer" tab at the top right shows the size of the file containing my module (i.e. its size on disk), but that is not the amount of memory taken up once Python has actually loaded its contents, along with the many imports I need in there.
And in the "Memory and Swap History" part of the System Monitor (Ubuntu 18.04), I can see a small bump while my module is loading in Python (it may get bigger as the module grows, of course), which is probably the amount I'm looking for.
I would mainly use this inside the Spyder IDE, in a Jupyter notebook, or directly in a Python console.
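For reference, a minimal sketch of such a measurement, using the third-party psutil package (mymodule is the hypothetical module from the snippet above). Note that RSS is only an approximation: already-imported dependencies are cached, and the allocator may hold on to freed memory.

import os
import psutil  # third-party: pip install psutil

process = psutil.Process(os.getpid())

rss_before = process.memory_info().rss   # memory used before, in bytes
import mymodule                          # hypothetical module to measure
rss_after = process.memory_info().rss    # memory used after, in bytes

# Delta memory = memory used after - memory used before
print(f"importing mymodule cost roughly {rss_after - rss_before:,} bytes of RSS")

A pure-stdlib alternative is to wrap the import between tracemalloc.start() and tracemalloc.get_traced_memory(), which counts only Python-level allocations.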

Related

Python3 pathlib's Path.glob() generator keeps increasing memory usage when performed on large file structure

I used pathlib's Path(<path>).glob() function to walk through file directories and grab their files' name and extension parameters. My Python script is meant to run on a large file system, so I tested it on the root directory of my Linux machine. When it was left running for a few hours, I noticed that my machine's memory usage increased by over a GB.
After using memray and memory_profiler, I found that whenever I looped through directory items using the generator, the memory usage kept climbing.
Here's the problematic code (path is the path to the root directory):
from pathlib import Path

dir_items = Path(path).glob("**/*")
for item in dir_items:
    pass
Since I was using a generator, my expectation was that my memory requirements would remain constant throughout. I think I might have some fundamental misunderstanding. Can anyone explain where I've gone wrong?
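For anyone wanting to reproduce the observation without memray or memory_profiler, a minimal sketch using the stdlib tracemalloc module might look like this ("/" stands in for the poster's root path):

import tracemalloc
from pathlib import Path

tracemalloc.start()
for item in Path("/").glob("**/*"):  # "/" stands in for the root path
    pass
current, peak = tracemalloc.get_traced_memory()  # bytes allocated by Python code
print(f"current={current:,} B, peak={peak:,} B")
tracemalloc.stop()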

How do I read a large .conll file with python?

My attempt to read a very large file with pyconll and conllu keeps running into memory errors. The file is 27 GB in size, and even using iterators to read it does not help. I'm using Python 3.7.
Both pyconll and conllu have iterative versions that use less memory at any given moment. If you call pyconll.load_from_file, it is going to try to read and parse the entire file into memory, and your machine likely has much less than 27 GB of RAM. Instead, use pyconll.iter_from_file. This reads the sentences one by one and uses minimal memory, and you can extract what's needed from those sentences piecemeal.
If you need to do some larger processing that requires having all information at once, it's a bit outside the scope of either of these libraries to support that type of scenario.
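As a sketch of the iterative approach (the filename corpus.conllu is a placeholder):

import pyconll

token_count = 0
# iter_from_file yields one parsed sentence at a time instead of
# reading the entire 27 GB file into memory.
for sentence in pyconll.iter_from_file("corpus.conllu"):
    token_count += len(sentence)  # extract what's needed, sentence by sentence

print(token_count)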

Maxima. Thread local storage exhausted

I am writing mathematical modules for analysis problems. All files are compiled to .fasl.
The sizes of these files are gradually increasing, and new ones keep being added. Today I ran into a problem when loading one module with load("foo.mac") (~0.4 s, loading 100+ files) together with another module of 200+ files; they declare functions and variables without precomputing anything.
Error:
Thread local storage exhausted.
fatal error encountered in SBCL pid: %PRIMITIVE HALT called; the party is over. Welcome to LDB..
CPU and RAM indicators are stable at the moment it happens.
Running maxima -X '--dynamic-space-size 2048' doesn't help, and neither does 4096 (the default is 1024). Why doesn't it work?
With SBCL on Windows it works without errors. With SBCL 1.4.5.debian on Linux (a server) this error is thrown. However, if I reduce the size of the files a little, the module loads.
I recompiled the files and checked all the .UNLISP output. I changed the order of the loaded files, but the error always occurs when loading the last ones in the list. The tests run without errors. Is there a way to increase the amount of "local storage" through SBCL or Maxima? In which direction should I look? Any ideas?
Update:
I significantly reduced the load by removing duplicated matchdeclare(..) code. Now no error is observed.
From https://sourceforge.net/p/maxima/mailman/message/36659152/
maxima uses quite a few special variables which sometimes makes
sbcl run out of thread-local storage when running the testsuite.
They proposed to add an environment variable that allows to change
the thread-local storage size but added a command-line option
instead => if supported by sbcl we now generate an image with
a bigger default thread-local storage whose size can be
overridden by users passing the --tls-limit option.
The NEWS file in SBCL's source code also indicates that the default value is 4096:
changes in sbcl-1.5.2 relative to sbcl-1.5.1:
* enhancement: RISC-V support with the generational garbage collector.
* enhancement: command-line option "--tls-limit" can be used to alter the
maximum number of thread-local symbols from its default of 4096.
* enhancement: better muffling of redefinition and lambda-list warnings
* platform support:
** OS X: use Grand Central Dispatch semaphores, rather than Mach semaphores
** Windows: remove non-functional definition of make-listener-thread
* new feature: decimal reader syntax for rationals, using the R exponent
marker and/or *READ-DEFAULT-FLOAT-FORMAT* of RATIONAL.
* optimization: various Unicode tables have been packed more efficiently
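So, assuming Maxima was built against sbcl-1.5.2 or newer, the limit can presumably be raised the same way the dynamic space size was passed above (32768 is an arbitrary example value):
maxima -X '--tls-limit 32768'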

Memory Leak examples written in 4D

What are some examples of developer-created memory leaks written in the 4D programming language?
By a developer-created memory leak, I am referring to a memory leak caused by bad programming that could have been avoided with better programming.
32 bit
When run in a 32-bit application, the code should eventually crash once it attempts to allocate more than 2^32 bytes (4 GB) of memory. On the Mac OS X platform, the bottom of the crash report, beneath the VM Region Summary, should show a memory value around 3.7 GB:
TOTAL               3.7G
64 bit
When run in a 64-bit application, the code will keep increasing the amount of memory allocated and will not plateau; in that situation, the OS will eventually complain that it has run out of memory.
Overview
There are many ways a developer can create their own memory leaks. Most of what you want to avoid is listed here:
use CLEAR VARIABLE when done using a variable
use CLEAR SET when done using a set
use CLEAR NAMED SELECTION when done using a named selection
use CLEAR LIST when done using a list
re-size your BLOBs to 0 with SET BLOB SIZE when done using the BLOB or use CLEAR VARIABLE
re-size your arrays to 0 when done using the array or use CLEAR VARIABLE
don't forget to close any open XML trees such as XML, DOM, SVG, etc (DOM CLOSE XML, SVG_CLEAR)
if using ODBC always remember to free the connection using ODBC_SQLFreeConnect
make sure to cleanup any offscreen areas used
Examples
Here are two specific examples of developer created memory leaks:
Forgetting to close XML
Bad code:
Repeat
    $xmlRef:=DOM Create XML Ref("root")
Until (<>crashed_or_quit)
The code snippet above will leak memory because each call to DOM Create XML Ref creates a new reference to a memory location, and the developer of this code has neglected to include a call to free that memory. Running this loop in a 32-bit host application will eventually cause a crash.
Fixed code:
This code can be easily fixed by calling DOM CLOSE XML when finished with the XML reference.
Repeat
    $xmlRef:=DOM Create XML Ref("root")
    DOM CLOSE XML($xmlRef)
Until (<>crashed_or_quit)
Forgetting to clear a list
Bad code:
Repeat
    $listRef:=New list
Until (<>crashed_or_quit)
The code snippet above will leak memory because each time New list is called, a reference to a new location in memory is returned. The developer is supposed to clear the memory at the referenced location by using the CLEAR LIST($listRef) command. As a bonus, if the list has any sublists attached, the sublists can be cleared by passing the * parameter, as in CLEAR LIST($listRef;*).
Fixed code:
This can be easily fixed by calling CLEAR LIST($listRef;*) as seen in the following fixed example:
Repeat
    $listRef:=New list
    CLEAR LIST($listRef;*)
Until (<>crashed_or_quit)

Debugging memory leaks in Windows Explorer extensions

Greetings all,
I'm the developer of a rather large C# Windows Explorer extension. As you can imagine, there is a lot of P/Invoke involved, and unfortunately, I've confirmed that it's leaking unmanaged memory somewhere. However, I'm coming up empty as to how to find the leak. I tried following this helpful guide, which says to use WinDbg. But when I try to use !heap, it won't let me because I don't have the .PDB files for explorer.exe (and the public symbol files aren't sufficient, apparently).
Help?
I've used UMDH many times with very good results. The guide you mentioned, describing WinDbg, uses the same method as UMDH, based on the debug heap's ability to record stack traces for all allocations. The only difference is that UMDH automates it: you simply run umdh from the command line and it creates a snapshot of all current allocations. Normally you repeat the snapshot two or more times, then you compute the 'delta' between two snapshots (also using umdh.exe). The 'delta' file lists all new allocations that happened between your snapshots, sorted by allocation size.
UMDH also needs symbols. You will need at least the symbols for ntdll.dll (the heap implementation lives there). The public symbols from http://msdl.microsoft.com/download/symbols will work fine.
Make sure you are using the correct bitness of umdh.exe. Explorer.exe is 64-bit on a 64-bit OS, so if your OS is 64-bit you need the 64-bit umdh.exe; i.e., download the appropriate bitness of the Windows debugging tools.
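As a sketch of that workflow (the <PID> placeholder and log file names are illustrative; gflags and umdh ship with the Debugging Tools for Windows, and umdh needs _NT_SYMBOL_PATH set to resolve the stacks):
gflags /i explorer.exe +ust
umdh -p:<PID> -f:before.log
umdh -p:<PID> -f:after.log
umdh before.log after.log -f:delta.log
The first command enables the user-mode stack trace database for explorer.exe (it takes effect after Explorer restarts); the two snapshots bracket the period in which you reproduce the leak, and the last command writes the sorted 'delta' of new allocations to delta.log.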
