How to monitor process creation and statistics using kernel module - linux

I wrote a kernel module to monitor CPU and memory time series. In addition, I would like to log all process creations (with their metadata such as pid, cmdline, ...) and all process exits together with their statistics, such as total I/O and CPU usage.
The main question is: can I create some kind of listener for process creation and exit? On exit in particular, I would also need the metadata of the process. How can this be done?

What you're describing sounds eerily like the Linux process accounting system, which already exists in the kernel. If it isn't an exact fit, your best bet will be to consider extending it, rather than building something entirely new.
Another existing system to look at will be the process events connector, which can be used to notify userspace processes when other processes are created and exit.
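If a netlink-based listener such as the process events connector is more than you need, a much cruder user-space approximation is to poll /proc and diff the set of pids between snapshots. The sketch below assumes a Linux /proc layout; the helper names (list_pids, read_cmdline, diff_pids) are made up for illustration. Polling misses processes that start and exit between two snapshots, and the cmdline of an exited process can no longer be read, so it has to be cached while the process is alive.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the set of pids currently visible as numeric directories in /proc."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def read_cmdline(pid, proc_root="/proc"):
    """Return the command line of a pid, or '' if it is gone or unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
            raw = f.read()
    except OSError:
        return ""
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def diff_pids(before, after):
    """Return (created, exited) pids between two snapshots, both sorted."""
    return sorted(after - before), sorted(before - after)


if __name__ == "__main__" and os.path.isdir("/proc"):
    import time

    known = {pid: read_cmdline(pid) for pid in list_pids()}
    time.sleep(1.0)
    created, exited = diff_pids(set(known), list_pids())
    for pid in created:
        print("created", pid, read_cmdline(pid))
    for pid in exited:
        # The cmdline was cached while the process was still alive.
        print("exited", pid, known[pid])
```

For exit statistics (total I/O, CPU time) the kernel-side accounting mentioned above remains the more reliable source, since /proc entries disappear as soon as the process is reaped.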

I know you are asking about monitoring processes from a Linux kernel module, but I think it is worth mentioning the Python module psutil, even though it is a user-space solution.
It is a very complete tool for monitoring the resources that processes are using: memory, disk, CPU.
Some examples from the documentation:
Getting CPU usage for some process
>>> import psutil
>>> p = psutil.Process()
>>> # blocking
>>> p.cpu_percent(interval=1)
2.0
>>> # non-blocking (percentage since last call)
>>> p.cpu_percent(interval=None)
2.9
Getting memory info
>>> import psutil
>>> p = psutil.Process()
>>> p.memory_info()
pmem(rss=15491072, vms=84025344, shared=5206016, text=2555904, lib=0, data=9891840, dirty=0)
And the very interesting open_files
>>> import psutil
>>> f = open('file.ext', 'w')
>>> p = psutil.Process()
>>> p.open_files()
[popenfile(path='/home/giampaolo/svn/psutil/file.ext', fd=3, position=0, mode='w', flags=32769)]
The process creation time
>>> import psutil, datetime
>>> p = psutil.Process()
>>> p.create_time()
1307289803.47
>>> datetime.datetime.fromtimestamp(p.create_time()).strftime("%Y-%m-%d %H:%M:%S")
'2011-03-05 18:03:52'
Of course, you can query info for any process running on your target system: just pass the pid to psutil.Process, like this: psutil.Process(pid)

Related

Performance for recurring doopl calls

I'm using doopl.factory to solve multiple linear programs in a loop. I noticed that performance degrades as I loop through the instances. memory_profiler shows that memory increases after each call, which eventually leads to very poor performance. It seems that doopl.factory.create_opl_model() and opl.run() somehow hold on to memory that is not released by opl.end(). Is my analysis correct?
I set up a simple example to demonstrate the issue.
import doopl.factory, os, psutil
from memory_profiler import profile

@profile
def main():
    dat = 'data.dat'
    mod = 'model.mod'
    # rss is in bytes; divide by 1e9 to report GB
    print('memory before doopl: ' + str(psutil.Process(os.getpid()).memory_info().rss / 1e9) + ' GB')
    with doopl.factory.create_opl_model(model=mod, data=dat) as opl:
        try:
            opl.mute()
            opl.run()
            opl.end()  # EDIT: this is just to explicitly demonstrate with memory_profiler that opl.end() does not free all memory.
        except:
            'error'
    print('memory after doopl: ' + str(psutil.Process(os.getpid()).memory_info().rss / 1e9) + ' GB')

if __name__ == "__main__":
    main()
The data.dat file is empty and the model.mod file is as follows:
range X = 1..5;
dvar int+ y[X];
minimize sum(x in X) y[x];
subject to {
  forall (x in X) {
    y[x] <= 2;
  };
};
Is there some way to fully clear memory after solving a model with doopl?
How can your code even work?
opl.end() sets some internals to None. If you do:
with create_opl_model(model=mod) as opl:
    opl.run()
    opl.end()
opl.end() is actually called twice: once explicitly and once more when the context manager exits, which results in an exception.
Please do not call opl.end() if you are using the model as a context manager.
The exception is a very old version of doopl (more than 2 years old). If that is what you have, please upgrade.
As for opl.end() itself, I can tell you that the C++ objects are correctly freed. I'm not aware of any memory leak issues here (but a memory leak in OPL must be demonstrated using C++, not a garbage-collected language).
As far as I know, memory_profiler is based on process size as reported by psutil. There is no guarantee that the process size decreases when you release some memory (Python might have released the memory, but the memory allocator might not have returned it to the system).
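One way to separate "Python still holds the objects" from "the allocator has not handed pages back to the OS" is to look at Python-level allocations with the standard-library tracemalloc module instead of process size. This is only a sketch: the function name is made up, and tracemalloc sees only allocations made through Python's allocators, so it says nothing about memory held by the C++ side of OPL.

```python
import tracemalloc


def traced_alloc_and_release(n):
    """Allocate n small strings, drop them, and report traced byte counts."""
    tracemalloc.start()
    data = [str(i) for i in range(n)]
    held, _ = tracemalloc.get_traced_memory()    # bytes traced while data is alive
    del data                                     # last reference gone, objects are freed
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return held, after, peak


if __name__ == "__main__":
    held, after, peak = traced_alloc_and_release(1_000_000)
    print("held:", held, "after del:", after, "peak:", peak)
```

If the traced figure drops back after the del while the RSS reported by psutil stays high, the memory was released by Python and is merely still mapped into the process.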

Run two functions in parallel in Python

Currently I have a program which scans for all available BLE devices and stores their addresses in a list, bt_addrs. This list is then passed to a function named data(), which does the actual work. To process all the elements in the list I use ProcessPoolExecutor.
One issue I am facing is that the scan happens only once and then stops. I want the scanning function to run continuously, so that if a new device comes into the vicinity it gets added to the list, and the data function works on the updated list in parallel.
I have only intermediate knowledge of Python, and multiprocessing is an advanced topic for me, so any help would improve my understanding and help me get the desired result for the whole program.
If you have any other approach, please tell.
I am running this code on Raspberry Pi 4- quad core Arm v8 processor, python version 3.7.3
Below is the code I currently have
import sys
import time
from bluepy import btle
from bluepy.btle import Scanner
import concurrent.futures
import signal

bt_addrs = []

def data(mac_adrs3):
    while True:
        pass  # main work happens here

def main():
    scanner = Scanner()  # scanner object
    devices = scanner.scan(30.0)  # scans for 30 sec
    available_devices = []
    for dev in devices:
        available_devices.append(dev.addr)  # all the MAC addresses are stored
    bt_addrs = available_devices
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(data, bt_addrs)  # data() runs on all the elements in list bt_addrs

if __name__ == "__main__":
    main()
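One possible direction, offered as a rough sketch rather than a tested BLE solution: keep scanning in a loop and submit a worker only for addresses that have not been seen before, instead of mapping over a fixed list once. The names watch_devices, scan_once and handle_device are hypothetical. On the Pi, scan_once could wrap Scanner().scan(...) and return the dev.addr values, and handle_device would be your data(). The sketch uses threads for brevity (assuming the per-device work is mostly waiting on I/O) and a bounded number of rounds so that it terminates; workers that loop forever would each occupy a pool slot permanently.

```python
import concurrent.futures


def watch_devices(scan_once, handle_device, rounds, max_workers=4):
    """Scan repeatedly and start handle_device once per newly seen address.

    scan_once: callable returning an iterable of addresses (hypothetical hook).
    handle_device: callable taking one address (hypothetical hook).
    """
    seen = set()
    futures = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        for _ in range(rounds):
            for addr in scan_once():
                if addr not in seen:
                    # Only new devices get a worker; known ones keep running.
                    seen.add(addr)
                    futures[addr] = executor.submit(handle_device, addr)
    # Leaving the with-block waits for all submitted workers to finish.
    return {addr: fut.result() for addr, fut in futures.items()}


if __name__ == "__main__":
    # Fake scan results standing in for real BLE scans.
    fake_scans = iter([["aa:01"], ["aa:01", "bb:02"], ["bb:02", "cc:03"]])
    results = watch_devices(lambda: next(fake_scans), str.upper, rounds=3)
    print(results)
```

The same structure works with ProcessPoolExecutor as long as handle_device is a picklable top-level function.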

Python keeps running for 10+mins (after last statement in program) when there is huge (33GB) data structure in memory (nothing in swap)

I need to parse a huge gz file (about 10GB compressed, about 100GB uncompressed). The code creates a data structure ('data_struct') in memory. I am running on a machine with an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with 16 CPUs and plenty of RAM (i.e. 200+ GB), running CentOS-6.9. I have implemented this using a class in Python 3.6.3 (CPython), as shown below:
import subprocess

class my_class():
    def __init__(self):
        # -c writes the decompressed data to stdout so it can be read via the pipe
        cmd = 'gunzip -c huge-file.gz'
        self.process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        self.data_struct = dict()

    def populate_struct(self):
        for line in self.process.stdout:
            pass  # <populate the self.data_struct dictionary>

    def __del__(self):
        self.process.wait()
        #del self.data_struct # presence/absence of this statement decreases/increases runtime respectively
#================End of my_class===================

def main():
    my_object = my_class()
    my_object.populate_struct()
    print('~~~~ Finished populate_struct() ~~~~')  # last statement in my program.
    ## Python keeps running at 100% past the previous statement for 10+mins

if __name__ == '__main__':
    main()
#================End of Main=======================
The resident memory consumption of my data_struct in memory (RAM only, no swap) is about ~33GB. I did $ top to find the PID of Python process and traced the Python process using $ strace -p <PID> -o <out_file> (to see what Python is doing). While it is executing populate_struct(), I can see in the out_file of strace that Python is using calls like mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0684160000 to create data_struct. While Python was running past the last print() statement, I found that Python was issuing only munmap() operations as shown below :
munmap(0x2b3c75375000, 41947136) = 0
munmap(0x2b3c73374000, 33558528) = 0
munmap(0x2b4015d2a000, 262144) = 0
munmap(0x2b4015cea000, 262144) = 0
munmap(0x2b4015caa000, 262144) = 0
munmap(0x2b4015c6a000, 262144) = 0
munmap(0x2b4015c2a000, 262144) = 0
munmap(0x2b4015bea000, 262144) = 0
munmap(0x2b4015baa000, 262144) = 0
...
...
Python keeps running for anywhere between 10 and 12 mins after the last print() statement. One observation is that if I have the del self.data_struct statement in the __del__() method, it takes only 2 mins. I have repeated these experiments multiple times, and the runtime consistently decreases/increases with the presence/absence of del self.data_struct in __del__().
My questions:
My understanding is that Python is doing cleanup work by using munmap(), but unlike Python, other languages like Perl immediately release memory and exit the program. Am I doing it right by implementing it as shown above? Is there a way to tell Python to avoid this munmap()?
Why does it take 10+ mins to clean up if there is no del self.data_struct statement in __del__(), and only 2 mins if there is one?
Is there a way to speed up the cleanup work, i.e. munmap()?
Is there a way to exit the program immediately without the cleanup work?
Other thoughts/suggestions about tackling this problem are appreciated.
Please try a more recent version of Python (at least 3.8)? This shows several signs of being a mild(!) form of a worst-case quadratic-time algorithm in CPython's object deallocator, which was rewritten here (and note that the issue linked to here in turn contains a link to an older StackOverflow post with more details):
https://bugs.python.org/issue37029
Some glosses
If my guess is right, the amount of memory isn't particularly important - it's instead the sheer number of distinct Python objects being managed by CPython's "small object allocator" (obmalloc.c), combined with "bad luck" in the order in which their memory is released.
When that code was first written, RAM wasn't big enough to hold millions of Python objects, so nobody noticed that one particular part of deallocation logic could take time quadratic in the number of allocated "arenas" (details aren't really helpful, but "arenas" are the granularity at which system mmap() and munmap() calls are made - 256 KiB chunks).
It's not those mapping calls that are consuming mounds of time, and any decent implementation of any language using OS memory mapping facilities will eventually call munmap() enough times to release the OS resources consumed by its mmap() calls.
So that's a red herring. munmap() is being called many times simply because you allocated many objects, which required many mmap() calls.
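To get a feel for the scale, here is a back-of-the-envelope count of arenas. It is only an upper bound under a loud assumption: that all of the resident memory went through the small object allocator, which it does not (large blocks such as big dict tables are mapped separately, as the two larger munmap() calls in the trace suggest). The function name is made up for illustration.

```python
ARENA_BYTES = 256 * 1024  # arena size quoted above; matches the 262144-byte munmap() calls


def arenas_needed(total_bytes, arena_bytes=ARENA_BYTES):
    """Upper-bound count of arenas if every byte lived in a small-object arena."""
    return -(-total_bytes // arena_bytes)  # ceiling division


if __name__ == "__main__":
    # Reading the reported ~33GB as 33 GiB for the sake of the estimate.
    print(arenas_needed(33 * 1024**3))
```

With on the order of a hundred thousand arenas, a deallocation step that is quadratic in the arena count becomes very noticeable, while the same number of munmap() calls on its own is cheap.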
There isn't any crisp or easy way to explain exactly when the problem shows up. See "bad luck" above ;-) The relevant code was rewritten for CPython 3.8 to be worst-case linear time instead, which gave a factor of ~250 speedup for the specific program that triggered the issue report (see the link already given).
As a comment noted, you can exit your program immediately at any time by invoking os._exit(), but the leading underscore is meant to scare you off: "immediately" means "immediately". No cleanups of any kind are performed. For example, the __del__ method in your class? Skipped. __del__ is run as a side effect of deallocation, but if you actually "immediately release memory and exit the program" then no destructors of any kind are run, nor any handlers registered with the atexit module, etc etc. It's as drastic as a program dying, e.g., with a segfault.
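A small demonstration of how drastic that is, run in a child interpreter so it does not kill the session you try it in. The child registers an atexit handler and then calls os._exit(); the handler never runs. Note the explicit flush: os._exit() does not flush stdio buffers either, so without it even the last print could be lost when stdout is a pipe. The helper name run_child is made up for this sketch.

```python
import subprocess
import sys

CHILD_SOURCE = """
import atexit, os, sys
atexit.register(lambda: print("atexit handler ran"))
print("last statement")
sys.stdout.flush()   # os._exit() will not flush buffered output for us
os._exit(0)          # exits right here: no atexit handlers, no __del__, no cleanup
"""


def run_child(source):
    """Run source in a fresh interpreter and return (exit code, captured stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    code, out = run_child(CHILD_SOURCE)
    print(code, repr(out))
```

In the program from the question, calling os._exit(0) after the final print (and after flushing and waiting on the gunzip process yourself) would skip the long deallocation phase entirely.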

Memory error in pycharm using scipy's welch function

I want to compute Welch's periodogram using scipy.signal in PyCharm. My signal is a 5-min audio file with Fs = 48 kHz, so I guess it's a very big signal. The line was:
f, p = signal.welch(audio, Fs, nperseg=512)
I am getting a memory error. I was wondering if that's a PyCharm configuration thing, or if the signal is just too big. My RAM is 8 GB.
Sometimes it works with some audio files, but the idea is to process several of them, and after one or two the error is raised.
I've tested your setup and welch does not seem to be the problem. For further analysis the entire script you are running would be necessary.
import numpy as np
from scipy.signal import welch
fs = 48000
signal_length = 5 * 60 * fs
audio_signal = np.random.rand(signal_length)
f, Pxx = welch(audio_signal, fs=fs, nperseg=512)
On my computer (Windows 10, 64 bit) the call to welch consumes 600 MB of peak memory, which gets recycled directly afterwards, in addition to ~600MB allocated for the initial array and Python itself. The call to welch itself does not lead to any significant permanent memory increase.
You can do the following:
Upgrade to the newest version of scipy, as there have been problems with welch previously.
Check that your PC has enough free memory and close memory-hungry applications (e.g. Chrome).
Convert your array to a smaller datatype, e.g. from float64 to float32 or float16.
Make sure to free variables that are no longer needed. Especially if you load several signals and store the results in different arrays, memory use can accumulate quite quickly. Only keep what you need, delete variables via del variable_name, and check that no references remain elsewhere in the program. E.g. if you don't need the audio variable, either delete it explicitly after welch(...) or overwrite it with the next audio data.
Run the garbage collector with gc.collect(). However, this will probably not solve your problem, as garbage is managed automatically in Python anyway.
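To make the datatype and del suggestions concrete, here is a small sketch using a zero-filled stand-in for the audio (the variable names are illustrative, and a real file loader would replace np.zeros). It only shows the array sizes involved for 5 minutes at 48 kHz; it does not call welch.

```python
import numpy as np

fs = 48000
n_samples = 5 * 60 * fs          # 14,400,000 samples

audio64 = np.zeros(n_samples, dtype=np.float64)
print(audio64.nbytes / 1e6)      # size in MB as float64 (8 bytes per sample)

audio32 = audio64.astype(np.float32)   # makes a converted copy
del audio64                      # drop the float64 copy once it is no longer needed
print(audio32.nbytes / 1e6)      # size in MB as float32 (4 bytes per sample)
```

When looping over several files, reusing one variable name for the current signal and storing only the resulting spectra keeps at most one raw signal alive at a time.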

Python3 multiprocessing: Memory Allocation Error

I know that this question has been asked a lot of times, but the answers are not applicable.
This is answer one of a parallelized loop using multiprocessing on StackoverFlow:
import multiprocessing as mp
def processInput(i):
return i * i
if __name__ == '__main__':
inputs = range(1000000)
pool = mp.Pool(processes=4)
results = pool.map(processInput, inputs)
print(results)
This code works fine. But if I increase the range to 1000000000, my 16GB of RAM fill up completely and I get [Errno 12] Cannot allocate memory. It seems as if the map function starts as many processes as possible. How do I limit the number of parallel processes?
The pool.map function starts 4 processes, as you instructed it to (with processes=4 you tell the pool how many processes it can use to perform your logic).
There is, however, a different issue underlying this implementation.
The pool.map function returns a list of objects, in this case numbers.
Python numbers do not behave like ints in ANSI C: they carry per-object overhead and never overflow (a 32-bit C int wraps around to -2^31 once it passes 2^31-1).
Also, Python lists are not arrays and incur overhead of their own.
To be more specific, on python 3.6, running the following code will reveal some overhead:
>>> import sys
>>> t = [1,2,3,4]
>>> sys.getsizeof(t)
96
>>> t = [x for x in range(1000)]
>>> sys.getsizeof(t)
9024
So this means 24 bytes per number on small lists and ~9 bytes per number on large lists.
So for a list of size 10^9 we get about 8.5GB.
EDIT: 1. As tfb mentioned, this is not even the size of the underlying number objects, just the pointers and the list overhead, meaning there is much more memory overhead that I did not account for in the original answer.
The default Python installation on Windows is 32-bit (you can get a 64-bit installation, but you need to look in the section listing all available downloads on the Python website), so I assumed you are using the 32-bit installation.
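To account for the int objects themselves as well, here is a rough estimate that adds the size of one int object to the size of one list slot. It is a sketch with a made-up function name: it assumes every element is a distinct int of typical size (CPython caches small ints, and very large ints take more space), and it ignores allocator rounding, so the real figure will differ somewhat.

```python
import struct
import sys


def estimate_int_list_bytes(n, sample_value=10**6):
    """Rough size of a list holding n distinct ints: one pointer plus one int object each."""
    pointer_bytes = struct.calcsize("P")          # size of one list slot on this build
    int_bytes = sys.getsizeof(sample_value)       # size of a typical small-ish int object
    return n * (pointer_bytes + int_bytes)


if __name__ == "__main__":
    total = estimate_int_list_bytes(10**9)
    print(round(total / 1024**3, 1), "GiB")
```

On a typical 64-bit CPython this lands well above 16GB for 10^9 results, before counting anything the worker processes or the inter-process queues hold.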
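In Python 2, range(1000000000) creates a list of 10^9 ints. That is around 8GB for the list slots alone (8 bytes per pointer on a 64-bit system), before counting the int objects. In Python 3, range is already lazy (it is the equivalent of Python 2's xrange), so the input itself is cheap, but pool.map still builds a result list of 10^9 ints. A really, really smart implementation might be able to do this on a 16GB machine, but it's basically a lost cause.
In Python 2 you could try using xrange for the input, which might or might not help; in Python 3 there is nothing to switch to, since range already behaves that way. Either way, the result list remains the problem.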
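If the results can be consumed one at a time (summed, written to a file, filtered), one way around the giant result list is to stream them with Pool.imap and a chunksize instead of collecting them with map. This is a sketch: sum_streamed is a made-up helper, and summing stands in for whatever you actually do with each result. Memory stays roughly bounded only as long as the consumer keeps up with the workers.

```python
import multiprocessing as mp


def process_input(i):
    return i * i


def sum_streamed(pool, n, chunksize=10_000):
    """Consume results lazily via imap so no n-element result list is built."""
    total = 0
    for value in pool.imap(process_input, range(n), chunksize=chunksize):
        total += value
    return total


if __name__ == "__main__":
    # processes=4 still caps the number of worker processes, as in the question.
    with mp.Pool(processes=4) as pool:
        print(sum_streamed(pool, 1_000_000))
```

The chunksize matters: with the default of 1, every single number becomes its own inter-process message, which is very slow for a function as cheap as i * i.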
