QWebView's memory (cache) management - pyqt

Here is the code that downloads the same page 10 times:
# Imports were omitted in the original; these assume PyQt5 with the QtWebKit modules.
import threading
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebKit import QWebSettings
from PyQt5.QtWebKitWidgets import QWebView
app = QApplication([])
event = threading.Event()
def load(url):
    def _load_finished(ok):
        event.set()
    web_view = QWebView()
    web_view.loadFinished.connect(_load_finished)
    event.clear()
    web_view.setUrl(QUrl(url))
    # Pump the Qt event loop until loadFinished fires.
    while not event.wait(.05):
        app.processEvents()
    web_view.loadFinished.disconnect(_load_finished)
    return web_view.page().mainFrame().documentElement()
QWebSettings.setMaximumPagesInCache(0)
QWebSettings.setObjectCacheCapacities(0, 0, 0)
if __name__ == '__main__':
    for i in range(10):
        load('http://www.huffingtonpost.com/')
        QWebSettings.clearMemoryCaches()
        QWebSettings.clearIconDatabase()
        print(i)
    app.exec_()
And here is Process Explorer's snapshot after the 7th download:
By the 10th download, memory usage reaches 270 MB.
Is this normal? How do I fix it?
Oddly enough, depending on the address, consumption may fluctuate but stay below a certain threshold (here it's 90 MB):

I stumbled onto this answer. Quoting a comment in the Qt sources:
Dead resources in the cache are kept in non-purgeable memory. When we prune dead resources, instead of freeing them, we mark their memory as purgeable and keep the resources until the kernel reclaims the purgeable memory. By leaving the in-cache dead resources in dirty resident memory, we decrease the likelihood of the kernel claiming that memory and forcing us to refetch the resource (for example when a user presses back).
This sort of settles it... and relieves my restless soul.
Following bms20's advice, I run the QtWebKit code in a separate process (using subprocess.Popen) and cache web resources on disk (PyQt5.QtNetwork.QNetworkDiskCache) to save traffic:
def ExecuteCode(code):
    import os
    os.environ['PYTHONIOENCODING'] = 'utf-8'  # Optional
    from subprocess import Popen, PIPE, STDOUT
    proc = Popen('python.exe', stdin=PIPE)
    out, err = proc.communicate(code.encode())
Part of the contents of code:
cache = QNetworkDiskCache()
cache.setCacheDirectory('cache')
web_view = QWebView()
web_view.page().networkAccessManager().setCache(cache)
# Do stuff with web_page
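The separate-process approach can be sketched a little more portably. This is a minimal sketch, not the answerer's exact code: run_in_child is a hypothetical helper name, sys.executable replaces the hard-coded 'python.exe', and the child's stdout is captured so results can be passed back. Whatever memory the child allocates goes back to the operating system when it exits.

```python
import subprocess
import sys


def run_in_child(code: str, timeout: float = 60.0) -> str:
    """Run `code` in a fresh Python interpreter and return its stdout.

    Hypothetical helper: everything the child allocates is released
    by the OS as soon as the child process exits.
    """
    result = subprocess.run(
        [sys.executable, "-"],  # "-" makes Python read the script from stdin
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,             # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder payload; a real one would build the QApplication and
    # QWebView, load the page, and print whatever the parent needs.
    print(run_in_child("print('result from the child process')"))
```

Passing results back through stdout keeps the parent free of any Qt objects, so its memory footprint stays flat however many pages are loaded.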

Related

Need to detect any usb block device with specific partition label

I'm surprised this is proving to be difficult to find.
I need to detect when a USB block device with a specific partition label is added (plugged in) using python3.
Is there a way to use pyudev to provide a list of USB block devices? How can I specify a filter with subsystem="block" AND subsystem="usb"? They seem to be mutually exclusive filters.
When a USB device having a partition named "XYZ" is plugged in, I need to run a script to mount it and run a program that uses the data on that partition.
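One way to combine the two conditions, sketched under assumptions: keep the monitor filtered on the block subsystem and check udev properties in the callback instead of adding a second subsystem filter. ID_BUS is the property udev normally sets to usb for USB-attached storage (worth confirming with udevadm info on the target system). The names is_target_partition and TARGET_LABEL are hypothetical.

```python
from typing import Mapping

TARGET_LABEL = "XYZ"  # partition label from the question; adjust as needed


def is_target_partition(props: Mapping[str, str], label: str = TARGET_LABEL) -> bool:
    """Return True for a USB-attached partition carrying the wanted label.

    `props` is any mapping of udev properties, for example the
    `properties` attribute of a pyudev device.
    """
    return (
        props.get("DEVTYPE") == "partition"
        and props.get("ID_BUS") == "usb"        # assumption: set by udev for USB storage
        and props.get("ID_FS_LABEL") == label
    )


if __name__ == "__main__":
    # Hand-written sample properties, not captured from real hardware.
    usb_stick = {"DEVTYPE": "partition", "ID_BUS": "usb", "ID_FS_LABEL": "XYZ"}
    sata_disk = {"DEVTYPE": "partition", "ID_BUS": "ata", "ID_FS_LABEL": "XYZ"}
    print(is_target_partition(usb_stick))
    print(is_target_partition(sata_disk))
```

Inside a pyudev callback this would be called as is_target_partition(device.properties), with the action checked separately.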
I have tried too many variations to count, from various udev rules, systemd units, many scripts and combinations thereof, but have not had any success until I used the following code. It worked but caused 100% CPU load. When I added sleep time in the while loop at the end it no longer worked at all, and even prevented PCmanFM automounting as well.
The issue was in the usbEvent.py process. I could run it from the command line and it worked just fine. The first thing it does is use Popen to call "grep devName /proc/mounts" to wait for the automounter to mount the partition. The Popen is called in a loop, and adding a time.sleep eliminated the CPU burden, though that was surprising since the mount point appears within a few seconds.
There seems to be some interplay between the code below, which systemd runs, and the usbEvent.py process it spawns that I don't fully understand. They are separate processes, so I would think they should be quite independent of each other.
The usbEvent.py handler works, but it takes much longer to recognize the mount and continue. While it's running it consumes around 5% of the CPU, and only 0.3% when it finishes. The reason it doesn't end when the timeout is over must be p.communicate, but if p.poll does NOT return None the process should be complete and communicate should not block... but it does! Why?
The platform is a Raspberry Pi 4 with 8 GB RAM and the January 2021 Raspberry Pi OS release.
#!/usr/bin/env python3
import os
import time
import subprocess as sp
import pyudev
# This code is run on boot via systemd to detect when
# my custom USB storage device (USB stick, SSD etc)
# is inserted or removed. It spawns a new process to
# handle the event.
context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by('block', device_type="partition")
def log_event(action, device):
    devName = device.get('DEVNAME')
    devLabel = device.get('ID_FS_LABEL')
    if devLabel == "MY_CUSTOM_USB":
        sp.Popen(["/home/user/bin/customUSB/usbEvent.py",
                  action, devName, devLabel],
                 stdin=sp.DEVNULL, stdout=sp.DEVNULL, stderr=sp.DEVNULL)
observer = pyudev.MonitorObserver(monitor, log_event)
observer.start()
while True:
    # pass
    time.sleep(0.1)
Here is the portion of the usbEvent.py handler that I changed to get it working:
# Waits for mount point of "dev" to appear and returns it. Communicate
def getMountPoint(dev):
    out = ""
    interval = 0.1
    timeout = 5 / interval
    while timeout > 0:
        p = sp.Popen(["grep", dev, "/proc/mounts"],
                     text=True, stdout=sp.PIPE, stderr=sp.PIPE)
        retCode = p.poll()
        if retCode is None:
            time.sleep(interval)
        else:
            out, err = p.communicate()  # This should not block but does!
            if retCode == 0 and len(out) > 0:
                out = out.split()[1]
                break
            else:
                lg.info(f"exit code: {retCode} Error: {err}")
                exit(1)
    if timeout == 0:
        p.terminate()
    return out
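For comparison, here is a sketch of the same wait done without spawning grep at all. It is an alternative, not the asker's handler: it reads /proc/mounts directly and uses a monotonic deadline so the timeout is always honoured. The names find_mount_point and wait_for_mount are hypothetical. Unlike grep, it matches the device field exactly rather than as a substring.

```python
import time
from typing import Optional


def find_mount_point(dev: str, mounts_path: str = "/proc/mounts") -> Optional[str]:
    """Return the mount point of `dev`, or None if it is not mounted.

    Each line of /proc/mounts starts with "<device> <mount point> ...".
    Spaces inside a mount point appear escaped as \\040 and are not decoded here.
    """
    with open(mounts_path) as mounts:
        for line in mounts:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == dev:
                return fields[1]
    return None


def wait_for_mount(dev: str, timeout: float = 5.0, interval: float = 0.1,
                   mounts_path: str = "/proc/mounts") -> Optional[str]:
    """Poll until `dev` is mounted; return the mount point or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        mount_point = find_mount_point(dev, mounts_path)
        if mount_point is not None:
            return mount_point
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)  # sleep between polls so the loop does not spin
```

A handler would call wait_for_mount(devName) and treat a None result as "the automounter never mounted it".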

Performance for recurring doopl calls

I'm using doopl.factory to solve multiple linear programs in a loop. I noticed decreasing performance while looping through instances. memory_profiler shows that memory usage increases after each call, which eventually leads to very poor performance. It seems that doopl.factory.create_opl_model() and opl.run() somehow hold on to memory that is not released by opl.end(). Is my analysis correct?
memory_profiler analysis screenshot
I set up a simple example to demonstrate the issue.
import doopl.factory, os, psutil
from memory_profiler import profile
@profile
def main():
    dat = 'data.dat'
    mod = 'model.mod'
    # rss is reported in bytes; divide by 1e9 to get GB
    print('memory before doopl: ' + str(psutil.Process(os.getpid()).memory_info().rss / 1e9) + ' GB')
    with doopl.factory.create_opl_model(model=mod, data=dat) as opl:
        try:
            opl.mute()
            opl.run()
            opl.end()  # EDIT: this is just to explicitly demonstrate with memory_profiler that opl.end() does not free all memory.
        except:
            'error'
    print('memory after doopl: ' + str(psutil.Process(os.getpid()).memory_info().rss / 1e9) + ' GB')
if __name__ == "__main__":
    main()
The data.dat file is empty and the model.mod file is as follows:
range X = 1..5;
dvar int+ y[X];
minimize sum(x in X) y[x];
subject to {
  forall (x in X) {
    y[x] <= 2;
  };
};
Is there some way to fully clear memory after solving a model with doopl?
How can your code even work?
opl.end() sets some internals to None. If you do:
with create_opl_model(model=mod) as opl:
    opl.run()
    opl.end()
opl.end() is actually called twice: once explicitly and once when the context manager exits, resulting in an exception.
Please do not call opl.end() if you are using the model as a context manager.
That is, unless you have a very old version of doopl (more than 2 years old). If so, please upgrade.
Now, in opl.end(), I can tell you that the C++ objects are correctly freed. I'm not aware of any memory leak issues here (but a memory leak in OPL must be demonstrated using C++, not a garbage-collected language).
As far as I know, memory_profiler is based on process size using psutil. There is no guarantee that when you release some memory, the process size decreases (Python might have released the memory, but the memory allocators might not have returned it to the system).
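The distinction between memory released by Python and memory returned to the system can be explored with the standard library's tracemalloc, which counts only allocations made through Python's allocators. It sees neither OPL's C++ objects nor the process size, so this is an illustrative sketch rather than a diagnosis of doopl. The function name measure_alloc_and_free is hypothetical.

```python
import gc
import tracemalloc


def measure_alloc_and_free(n_items: int = 200_000):
    """Return (bytes held while the data is alive, bytes left after freeing it).

    Both numbers are relative to the traced total at the start and cover
    Python-level allocations only.
    """
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()

    data = [str(i) * 10 for i in range(n_items)]  # many small Python objects
    held, _ = tracemalloc.get_traced_memory()

    del data
    gc.collect()
    after, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return held - baseline, after - baseline


if __name__ == "__main__":
    held, after = measure_alloc_and_free()
    print(f"held while alive: {held} bytes, still traced after del: {after} bytes")
```

If the traced total drops back while the RSS reported by psutil stays high, the memory was released by Python but kept by the allocator, which is the situation the answer describes.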

Python keeps running for 10+mins (after last statement in program) when there is huge (33GB) data structure in memory (nothing in swap)

I need to parse a huge gz file (about 10 GB compressed, about 100 GB uncompressed). The code creates a data structure ('data_struct') in memory. I am running on a machine with an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with 16 CPUs and plenty of RAM (i.e. 200+ GB), running CentOS 6.9. I have implemented this using a class in Python 3.6.3 (CPython) as shown below:
import subprocess
class my_class():
    def __init__(self):
        cmd = 'gunzip -c huge-file.gz'  # -c writes the decompressed data to stdout
        self.process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        self.data_struct = dict()
    def populate_struct(self):
        for line in self.process.stdout:
            pass  # <populate the self.data_struct dictionary>
    def __del__(self):
        self.process.wait()
        #del self.data_struct # presence/absence of this statement decreases/increases runtime respectively
#================End of my_class===================
def main():
    my_object = my_class()
    my_object.populate_struct()
    print(f'~~~~ Finished populate_struct() ~~~~') # last statement in my program.
    ## Python keeps running at 100% past the previous statement for 10+mins
if __name__ == '__main__':
    main()
#================End of Main=======================
The resident memory consumption of my data_struct in memory (RAM only, no swap) is about ~33GB. I did $ top to find the PID of Python process and traced the Python process using $ strace -p <PID> -o <out_file> (to see what Python is doing). While it is executing populate_struct(), I can see in the out_file of strace that Python is using calls like mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0684160000 to create data_struct. While Python was running past the last print() statement, I found that Python was issuing only munmap() operations as shown below :
munmap(0x2b3c75375000, 41947136) = 0
munmap(0x2b3c73374000, 33558528) = 0
munmap(0x2b4015d2a000, 262144) = 0
munmap(0x2b4015cea000, 262144) = 0
munmap(0x2b4015caa000, 262144) = 0
munmap(0x2b4015c6a000, 262144) = 0
munmap(0x2b4015c2a000, 262144) = 0
munmap(0x2b4015bea000, 262144) = 0
munmap(0x2b4015baa000, 262144) = 0
...
...
Python keeps running for anywhere between 10 and 12 minutes after the last print() statement. One observation is that if I have a del self.data_struct statement in the __del__() method, it takes only 2 minutes. I have repeated these experiments multiple times, and the runtime consistently decreases or increases with the presence or absence of del self.data_struct in __del__().
My questions:
My understanding is that Python is doing cleanup work using munmap(), but unlike Python, other languages like Perl immediately release memory and exit the program. Am I doing it right by implementing it as shown above? Is there a way to tell Python to avoid this munmap()?
Why does it take 10+ minutes to clean up if there is no del self.data_struct statement in __del__(), but only 2 minutes if there is one?
Is there a way to speed up the cleanup work, i.e. munmap()?
Is there a way to exit the program immediately without the cleanup work?
Other thoughts/suggestions about tackling this problem are appreciated.
Please try a more recent version of Python (at least 3.8)? This shows several signs of being a mild(!) form of a worst-case quadratic-time algorithm in CPython's object deallocator, which was rewritten here (and note that the issue linked to here in turn contains a link to an older StackOverflow post with more details):
https://bugs.python.org/issue37029
Some glosses
If my guess is right, the amount of memory isn't particularly important - it's instead the sheer number of distinct Python objects being managed by CPython's "small object allocator" (obmalloc.c), combined with "bad luck" in the order in which their memory is released.
When that code was first written, RAM wasn't big enough to hold millions of Python objects, so nobody noticed that one particular part of deallocation logic could take time quadratic in the number of allocated "arenas" (details aren't really helpful, but "arenas" are the granularity at which system mmap() and munmap() calls are made - 256 KiB chunks).
It's not those mapping calls that are consuming mounds of time, and any decent implementation of any language using OS memory mapping facilities will eventually call munmap() enough times to release the OS resources consumed by its mmap() calls.
So that's a red herring. munmap() is being called many times simply because you allocated many objects, which required many mmap() calls.
There isn't any crisp or easy way to explain exactly when the problem shows up. See "bad luck" above ;-) The relevant code was rewritten for CPython 3.8 to be worst-case linear time instead, which gave a factor of ~250 speedup for the specific program that triggered the issue report (see the link already given).
As a comment noted, you can exit your program immediately at any time by invoking os._exit(), but the leading underscore is meant to scare you off: "immediately" means "immediately". No cleanups of any kind are performed. For example, the __del__ method in your class? Skipped. __del__ is run as a side effect of deallocation, but if you actually "immediately release memory and exit the program" then no destructors of any kind are run, nor any handlers registered with the atexit module, etc etc. It's as drastic as a program dying, e.g., with a segfault.
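The difference between a normal exit and os._exit() can be observed with a small experiment. This is a sketch with hypothetical names (CHILD_TEMPLATE, run_child): it starts a child interpreter that registers an atexit handler and then ends either way.

```python
import subprocess
import sys

# Child script: registers a cleanup handler, prints, then exits as requested.
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("atexit handler ran"))
print("last statement", flush=True)  # flush, because os._exit() drops buffered output
{exit_call}
"""


def run_child(exit_call: str) -> str:
    """Run the child script with the given exit statement and return its stdout."""
    code = CHILD_TEMPLATE.format(exit_call=exit_call)
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print("sys.exit(0):", run_child("sys.exit(0)").split("\n"))
    print("os._exit(0):", run_child("os._exit(0)").split("\n"))
```

With sys.exit(0) the handler's line appears in the output; with os._exit(0) it does not, and any output that was not flushed beforehand would be lost as well.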

How to monitor process creation and statistics using kernel module

I wrote a kernel module to monitor CPU and memory time series. In addition, I would like to log all process creations (with their metadata, such as pid, cmdline, ...) and also exits, with their statistics such as total I/O and CPU usage.
The main question is: can I create a kind of listener for process creation and exit? Especially on exit, I would also need the metadata for the process. How can this be done?
What you're describing sounds eerily like the Linux process accounting system, which already exists in the kernel. If it isn't an exact fit, your best bet will be to consider extending it, rather than building something entirely new.
Another existing system to look at will be the process events connector, which can be used to notify userspace processes when other processes are created and exit.
I know you are asking about monitoring processes from a Linux kernel module, but I think it is worth mentioning the Python module psutil, even though it is a user-space solution.
It is a very complete tool for monitoring the resources that processes are using: memory, disk, and CPU.
Some examples from the documentation:
Getting CPU usage for some process
>>> import psutil
>>> p = psutil.Process()
>>> # blocking
>>> p.cpu_percent(interval=1)
2.0
>>> # non-blocking (percentage since last call)
>>> p.cpu_percent(interval=None)
2.9
Getting memory info
>>> import psutil
>>> p = psutil.Process()
>>> p.memory_info()
pmem(rss=15491072, vms=84025344, shared=5206016, text=2555904, lib=0, data=9891840, dirty=0)
And the very interesting open_files
>>> import psutil
>>> f = open('file.ext', 'w')
>>> p = psutil.Process()
>>> p.open_files()
[popenfile(path='/home/giampaolo/svn/psutil/file.ext', fd=3, position=0, mode='w', flags=32769)]
The process creation time
>>> import psutil, datetime
>>> p = psutil.Process()
>>> p.create_time()
1307289803.47
>>> datetime.datetime.fromtimestamp(p.create_time()).strftime("%Y-%m-%d %H:%M:%S")
'2011-03-05 18:03:52'
Of course, you can query info for any process running on your target system; just provide the pid to psutil.Process like this: psutil.Process(pid)
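A user-space approximation of a creation/exit listener is to poll the set of PIDs and diff consecutive snapshots. The sketch below makes several assumptions: a Linux-style /proc directory, and hypothetical helper names list_pids, diff_pids and watch. Polling misses processes that start and exit between two snapshots, and the metadata of an exited process is already gone by the time it is noticed, so process accounting or the process events connector remain the complete solutions.

```python
import os
import time


def list_pids(proc_root="/proc"):
    """Return the set of PIDs visible as numeric entries under proc_root."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def diff_pids(previous, current):
    """Return (created, exited) PID sets between two snapshots."""
    return current - previous, previous - current


def watch(interval=0.5, rounds=2, proc_root="/proc"):
    """Print process creations and exits seen between successive polls."""
    previous = list_pids(proc_root)
    for _ in range(rounds):
        time.sleep(interval)
        current = list_pids(proc_root)
        created, exited = diff_pids(previous, current)
        for pid in sorted(created):
            print(f"created: {pid}")
        for pid in sorted(exited):
            print(f"exited: {pid}")
        previous = current


if __name__ == "__main__" and os.path.isdir("/proc"):
    watch()
```

With psutil installed, psutil.pids() could replace list_pids, and psutil.Process(pid) could be queried for each newly created PID while it is still alive.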

Tensorflow shared queue in PS server with reader in workers

I am running a distributed Tensorflow program with large input files (150 MB per example).
I wish to have a shared input queue for file names in the PS in order for the workers to work on different examples.
I want the CPU of each worker to then read the shared input queue and generate data for the GPU to process.
The code below is only run by the workers:
with tf.Graph().as_default():
    with tf.device('/job:ps/replica:0/task:0'):
        file_queue = tf.train.string_input_producer(file_paths, shared_name='train_queue')
    with tf.device('/cpu:0'):
        input_tensors = model.input_fn(file_queue, ...)
    # sets variables to PS and ops default to GPU
    with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
        output_tensors = model.model_fn(input_tensors, ...)
However, the Reader() for file_queue (which is inside model.input_fn()) is being placed on the PS instead of on the workers' CPUs, as I tried to specify using tf.device().
This causes 150 MB messages to be sent between the PS and the workers, which slows down training (I only noticed this because Google Cloud ML Engine raises a warning when large messages are being sent).
Why is the Reader() not being placed on the workers' CPU?
Is it mandatory for a queue and its reader to be on the same device?
Here is a link to my previous question, which might provide more context.
Here is the code for input_fn():
def input_fn(file_queue, ...):
    reader = tf.TFRecordReader()
    _, example = reader.read(file_queue)
    image, ground_truth = my_decoder(example)
    image, ground_truth = tf.train.shuffle_batch([image, ground_truth], ...)
    return image, ground_truth
The problem is tf.TFRecordReader() is being placed in the PS. All the other ops (decoder and batch) are correctly placed in the workers' CPUs.
