I'm surprised this is proving to be difficult to find.
I need to detect when a USB block device with a specific partition label is added (plugged in) using python3.
Is there a way to use pyudev to provide a list of USB block devices? How can I specify a filter with subsystem="block" AND subsystem="usb"? They seem to be mutually exclusive filters.
When a USB device having a partition named "XYZ" is plugged in, I need to run a script to mount it and run a program that uses the data on that partition.
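Conceptually, something like this untested sketch is what I'm after: walk block partitions and check whether each one has a USB device among its parents (the helper name is just for illustration).
import pyudev

def usb_partitions_with_label(label):
    """Yield device nodes of USB-backed block partitions whose filesystem label matches."""
    context = pyudev.Context()
    for device in context.list_devices(subsystem='block', DEVTYPE='partition'):
        # A partition on a USB stick/SSD has a 'usb' subsystem device among its ancestors.
        if device.find_parent('usb') is None:
            continue
        if device.get('ID_FS_LABEL') == label:
            yield device.device_node

for node in usb_partitions_with_label("XYZ"):
    print(node)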
I have tried too many variations to count, from various udev rules, systemd units, many scripts and combinations thereof, but had no success until I used the following code. It worked but caused 100% CPU load. When I added sleep time in the while loop at the end it no longer worked at all, and it even prevented PCManFM from automounting.
The issue was in the usbEvent.py process. I could run it from the command line and it worked just fine. The first thing it does is use Popen to call "grep devName /proc/mounts" to wait for the automounter to mount the partition. The Popen is called in a loop, and adding some time.sleep eliminated the CPU burden, though that was surprising since the mount point appears within a few seconds.
There seems to be some interplay between the code below systemd runs and the usbEvent.py process it spawns that I don't fully understand. They are separate processes so I would think they should be quite independent of each other.
The usbEvent.py handler works, but it takes much longer to recognize the mount and continue. While it's running it consumes around 5% of the CPU, dropping to about 0.3% when it finishes. Why it doesn't end when the timeout is over must be due to p.communicate, but if p.poll does NOT return None the process should be complete and should not block... but it does! Why?
The platform is a Raspberry Pi 4 with 8GB RAM and the January 2021 Raspberry Pi OS release.
#!/usr/bin/env python3
import os
import time
import subprocess as sp
import pyudev

# This code is run on boot via systemd to detect when
# my custom USB storage device (USB stick, SSD etc.)
# is inserted or removed. It spawns a new process to
# handle the event.

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by('block', device_type="partition")

def log_event(action, device):
    devName = device.get('DEVNAME')
    devLabel = device.get('ID_FS_LABEL')
    if devLabel == "MY_CUSTOM_USB":
        sp.Popen(["/home/user/bin/customUSB/usbEvent.py",
                  action, devName, devLabel],
                 stdin=sp.DEVNULL, stdout=sp.DEVNULL, stderr=sp.DEVNULL)

observer = pyudev.MonitorObserver(monitor, log_event)
observer.start()

while True:
    # pass
    time.sleep(0.1)
Here is the portion of the usbEvent.py handler that I changed to get it working:
# Waits for the mount point of "dev" to appear and returns it.
def getMountPoint(dev):
    out = ""
    interval = 0.1
    timeout = 5 / interval
    while timeout > 0:
        p = sp.Popen(["grep", dev, "/proc/mounts"],
                     text=True, stdout=sp.PIPE, stderr=sp.PIPE)
        retCode = p.poll()
        if retCode is None:
            timeout -= 1            # count down toward the 5 second limit
            time.sleep(interval)
        else:
            out, err = p.communicate()  # This should not block but does!
            if retCode == 0 and len(out) > 0:
                out = out.split()[1]
                break
            else:
                lg.info(f"exit code: {retCode} Error: {err}")
                exit(1)
    if timeout == 0:
        p.terminate()
    return out
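For what it's worth, I also considered reading /proc/mounts directly instead of spawning grep each time; an untested sketch of that idea:
import time

def get_mount_point(dev, timeout=5.0, interval=0.1):
    """Poll /proc/mounts until `dev` shows up, returning its mount point or ""."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                if fields and fields[0] == dev:
                    return fields[1]      # second field is the mount point
        time.sleep(interval)
    return ""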
Related
I have a user with two GPU's; the first one is AMD which can't run CUDA, and the second one is a cuda-capable NVIDIA GPU. I am using the code model.half().to("cuda:0"). I'm not sure if the invocation successfully used the GPU, nor am I able to test it because I don't have any spare computer with more than 1 GPU lying around.
In this case, does "cuda:0" mean the first device which can run CUDA, so it would've worked even if their first device was AMD? Or would I need to say "cuda:1" instead? How would I detect which number is the first CUDA-capable device?
The nvidia-ml-py3 package (the Python bindings behind nvidia-smi) can help to track GPU memory while running your code.
To install it, run pip install nvidia-ml-py3. Take a look at this code snippet:
import nvidia_smi

cuda_idx = 0                      # edit: device index that you want to track
to_cuda = f'cuda:{cuda_idx}'      # 'cuda:0' in this case

nvidia_smi.nvmlInit()
handle = nvidia_smi.nvmlDeviceGetHandleByIndex(cuda_idx)

def B2G(num):
    # bytes -> GB, rounded to 2 decimals
    return round(num / (1024 ** 3), 2)

def print_memory(name, handle, pre_used):
    info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
    used = info.used
    print(f'{name}: {B2G(used)}')
    print(f'This step used: {B2G(used - pre_used)}')
    print('------------')
    return used

# start
mem = print_memory('Start', handle, 0)
model = ...  # init your model
model.to(to_cuda)
mem = print_memory('Init model', handle, mem)
The example above uses nvidia-ml-py3 to track how much memory each part of the model needs and prints it in GB.
Edited: To check the list of GPUs:
import torch

def check_gpu():
    for i in range(torch.cuda.device_count()):
        device_name = f'cuda:{i}'
        print(f'{i} device name: {torch.cuda.get_device_name(torch.device(device_name))}')
I tested it and, as I suspected, model.half().to("cuda:0") will put your model on the first available CUDA-capable GPU, i.e. the NVIDIA GPU in your case. The AMD GPU isn't visible as a CUDA device, so it is safe to assume cuda:0 is a CUDA-enabled GPU; the AMD GPU won't be seen by your program.
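If you want to double-check where the model actually ended up after .to(...), a quick check (assuming model is a standard nn.Module) is to look at the device of its parameters:
# Prints the device of the model's first parameter, e.g. "cuda:0"
print(next(model.parameters()).device)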
Have a good day.
There are plenty of methods of torch.cuda to query and monitor GPU devices.
For example, you can check the type of each device:
torch.cuda.get_device_name(torch.device('cuda:0'))
# or
torch.cuda.get_device_name(torch.device('cuda:1'))
In my case, the output of get_device_name returns:
'Quadro RTX 6000'
If you want a more programmatic way to explore the properties of your devices, you can use torch.cuda.get_device_properties.
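For example, a quick sketch (the printed fields are standard attributes of the properties object returned by torch.cuda.get_device_properties):
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    # name, total memory in MiB, and compute capability of each device
    print(i, props.name, props.total_memory // (1024 ** 2), "MiB",
          f"compute capability {props.major}.{props.minor}")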
Once you are working with a device (or believe you are), you can use torch.cuda's memory management functions to monitor GPU memory usage.
For instance, you can get a very detailed account of the current state of your device's memory using:
torch.cuda.memory_stats(torch.device('cuda:0'))
# or
torch.cuda.memory_stats(torch.device('cuda:1'))
If you want nvidia-smi-like stats on utilization, you can use torch.cuda.utilization.
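A small sketch of that (torch.cuda.utilization is available in relatively recent PyTorch versions and requires pynvml to be installed):
import torch

# Percent of the last sample period during which kernels were executing on device 0
print(torch.cuda.utilization(torch.device('cuda:0')))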
Currently I have a program which scans all the available BLE devices and stores them in a list, bt_addrs. This list is then passed to a function named data(), which performs the work. To go over all the elements in the list I have used ProcessPoolExecutor.
One issue I am facing is that the scanning happens only once and then stops. I want the scanning to run continuously, so that if any new device comes into the vicinity it gets added to the list, the list gets updated, and the data function works on that updated list in parallel.
I have intermediate knowledge of Python, and multiprocessing is an advanced topic for me, so any help would increase my knowledge and help me get the desired results from the whole program.
If you have any other approach, please tell.
I am running this code on a Raspberry Pi 4 (quad-core Arm v8 processor), Python version 3.7.3.
Below is the code I currently have
import sys
import time
from bluepy import btle
from bluepy.btle import Scanner
import concurrent.futures
import signal

bt_addrs = []

def data(mac_adrs3):
    while True:
        ...  # main work happens here

def main():
    scanner = Scanner()              # scanner object
    devices = scanner.scan(30.0)     # scans for 30 sec
    available_devices = []
    for dev in devices:
        available_devices.append(dev.addr)  # all the MAC addresses are stored
    bt_addrs = available_devices
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(data, bt_addrs)  # data() runs on all the elements in list bt_addrs

if __name__ == "__main__":
    main()
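One idea I had but have not tested is to keep rescanning in a loop and submit a worker for each newly seen address. A sketch of that idea (the scan length and the way the executor is driven are just guesses on my part):
import concurrent.futures
from bluepy.btle import Scanner

def data(mac_addr):
    while True:
        pass  # main work happens here

def main():
    seen = set()
    scanner = Scanner()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        while True:                          # rescan forever
            for dev in scanner.scan(10.0):   # shorter scan, repeated
                if dev.addr not in seen:     # only start work for new devices
                    seen.add(dev.addr)
                    executor.submit(data, dev.addr)

if __name__ == "__main__":
    main()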
I need to parse a huge .gz file (about ~10GB compressed, ~100GB uncompressed). The code creates a data structure ('data_struct') in memory. I am running on a machine with an Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with 16 CPUs and plenty of RAM (i.e. 200+ GB), running CentOS-6.9. I have implemented this using a class in Python 3.6.3 (CPython) as shown below:
import subprocess

class my_class():
    def __init__(self):
        cmd = 'gunzip -c huge-file.gz'
        self.process = subprocess.Popen(cmd, stdout=subprocess.PIPE, shell=True)
        self.data_struct = dict()

    def populate_struct(self):
        for line in self.process.stdout:
            ...  # <populate the self.data_struct dictionary>

    def __del__(self):
        self.process.wait()
        # del self.data_struct  # presence/absence of this statement decreases/increases runtime respectively
# ================ End of my_class ===================

def main():
    my_object = my_class()
    my_object.populate_struct()
    print(f'~~~~ Finished populate_struct() ~~~~')  # last statement in my program.
    ## Python keeps running at 100% past the previous statement for 10+ mins

if __name__ == '__main__':
    main()
# ================ End of Main =======================
The resident memory consumption of my data_struct (RAM only, no swap) is about ~33GB. I ran $ top to find the PID of the Python process and traced it using $ strace -p <PID> -o <out_file> (to see what Python is doing). While it is executing populate_struct(), I can see in the strace output that Python is using calls like mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0684160000 to create data_struct. While Python was running past the last print() statement, I found that it was issuing only munmap() operations, as shown below:
munmap(0x2b3c75375000, 41947136) = 0
munmap(0x2b3c73374000, 33558528) = 0
munmap(0x2b4015d2a000, 262144) = 0
munmap(0x2b4015cea000, 262144) = 0
munmap(0x2b4015caa000, 262144) = 0
munmap(0x2b4015c6a000, 262144) = 0
munmap(0x2b4015c2a000, 262144) = 0
munmap(0x2b4015bea000, 262144) = 0
munmap(0x2b4015baa000, 262144) = 0
...
...
Python keeps running anywhere between 10 and 12 minutes after the last print() statement. One observation: if I have a del self.data_struct statement in __del__(), then it takes only 2 minutes. I have done these experiments multiple times, and the runtime decreases/increases with the presence/absence of del self.data_struct in __del__().
My questions:
My understanding is that Python is doing cleanup work using munmap(), but unlike Python, other languages like Perl release memory and exit the program immediately. Am I doing it right by implementing it as shown above? Is there a way to tell Python to avoid this munmap()?
Why does it take 10+ minutes to clean up when there is no del self.data_struct statement in __del__(), and only 2 minutes when there is?
Is there a way to speed up the cleanup work, i.e. the munmap() calls?
Is there a way to exit the program immediately, without the cleanup work?
Other thoughts/suggestions about tackling this problem are appreciated.
Could you please try a more recent version of Python (at least 3.8)? This shows several signs of being a mild(!) form of a worst-case quadratic-time algorithm in CPython's object deallocator, which was rewritten here (and note that the issue linked to here in turn contains a link to an older StackOverflow post with more details):
https://bugs.python.org/issue37029
Some glosses
If my guess is right, the amount of memory isn't particularly important - it's instead the sheer number of distinct Python objects being managed by CPython's "small object allocator" (obmalloc.c), combined with "bad luck" in the order in which their memory is released.
When that code was first written, RAM wasn't big enough to hold millions of Python objects, so nobody noticed that one particular part of deallocation logic could take time quadratic in the number of allocated "arenas" (details aren't really helpful, but "arenas" are the granularity at which system mmap() and munmap() calls are made - 256 KiB chunks).
It's not those mapping calls that are consuming mounds of time, and any decent implementation of any language using OS memory mapping facilities will eventually call munmap() enough times to release the OS resources consumed by its mmap() calls.
So that's a red herring. munmap() is being called many times simply because you allocated many objects, which required many mmap() calls.
There isn't any crisp or easy way to explain exactly when the problem shows up. See "bad luck" above ;-) The relevant code was rewritten for CPython 3.8 to be worst-case linear time instead, which gave a factor of ~250 speedup for the specific program that triggered the issue report (see the link already given).
As a comment noted, you can exit your program immediately at any time by invoking os._exit(), but the leading underscore is meant to scare you off: "immediately" means "immediately". No cleanups of any kind are performed. For example, the __del__ method in your class? Skipped. __del__ is run as a side effect of deallocation, but if you actually "immediately release memory and exit the program" then no destructors of any kind are run, nor any handlers registered with the atexit module, etc etc. It's as drastic as a program dying, e.g., with a segfault.
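For illustration, a minimal sketch of what that looks like at the end of your main() (this skips __del__, atexit handlers, and every other cleanup, so only do it when you truly don't care):
import os

def main():
    my_object = my_class()
    my_object.populate_struct()
    print('~~~~ Finished populate_struct() ~~~~')
    os._exit(0)   # terminate immediately; the interpreter's object teardown never runs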
Here is the code that downloads the same page 10 times:
# Imports were omitted in the original; something like the following
# (PyQt4-era QtWebKit bindings) is assumed:
import threading
from PyQt4.QtCore import QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView, QWebSettings

app = QApplication([])
event = threading.Event()

def load(url):
    def _load_finished(ok):
        event.set()
    web_view = QWebView()
    web_view.loadFinished.connect(_load_finished)
    event.clear()
    web_view.setUrl(QUrl(url))
    while not event.wait(.05): app.processEvents()
    web_view.loadFinished.disconnect(_load_finished)
    return web_view.page().mainFrame().documentElement()

QWebSettings.setMaximumPagesInCache(0)
QWebSettings.setObjectCacheCapacities(0, 0, 0)

if __name__ == '__main__':
    for i in range(10):
        load('http://www.huffingtonpost.com/')
        QWebSettings.clearMemoryCaches()
        QWebSettings.clearIconDatabase()
        print(i)
    app.exec_()
And here is Process Explorer's snapshot after 7th download:
At the 10th download, memory reaches 270MB.
Is this normal? How do I fix it?
Oddly enough, depending on the address, consumption may fluctuate, but it stays below a certain threshold (here it's 90MB):
I have stumbled onto this answer. Quoting a comment in the Qt sources:
Dead resources in the cache are kept in non-purgeable memory. When we prune dead resources, instead of freeing them, we mark their memory as purgeable and keep the resources until the kernel reclaims the purgeable memory. By leaving the in-cache dead resources in dirty resident memory, we decrease the likelihood of the kernel claiming that memory and forcing us to refetch the resource (for example when a user presses back).
This sort of settles it... and relieves my restless soul.
Following bms20's advice, I run the QtWebKit code in a separate process (using subprocess.Popen) and cache web resources on disk (PyQt5.QtNetwork.QNetworkDiskCache) to save traffic:
def ExecuteCode(code):
    import os
    os.environ['PYTHONIOENCODING'] = 'utf-8'  # optional
    from subprocess import Popen, PIPE, STDOUT
    proc = Popen('python.exe', stdin=PIPE)
    out, err = proc.communicate(code.encode())
Part of the code that gets passed in:
cache = QNetworkDiskCache()
cache.setCacheDirectory('cache')
web_view = QWebView()
web_view.page().networkAccessManager().setCache(cache)
# Do stuff with web_page
I wrote a little program which records voice from the microphone, sends it over the network and plays it there. I'm using PyAudio for this task. It works almost fine, but on both computers I get errors from ALSA that an underrun occurred. I googled a lot about it and now I know what an underrun is, but I still don't know how to fix the problem. Most of the time the sound is just fine, but it sounds a little strange when underruns occur. Is there anything I should take care of in my code? It feels like I'm making a simple error and missing it.
My system: Python 3.3, OS: Linux Mint Debian Edition UP7, PyAudio v0.2.7.
Have you considered syncing sound?
You didn't provide the code, so my guess is that you need a timer in a separate thread that executes, every CHUNK_SIZE/RATE seconds, code that looks like this:
silence = b'\x00' * self.chunk * self.channels * 2
out_stream = ...  # the output stream opened in pyaudio

def play(data):
    # If data has not arrived, play silence instead.
    # Yes, we sacrifice a sound frame for output buffer consistency.
    if not data:
        data = silence
    out_stream.write(data)
Assuming this code executes regularly, we will always supply some audio data to the output audio stream.
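For the scheduling itself, a minimal sketch (CHUNK_SIZE, RATE and the incoming buffer are placeholders for whatever your program uses) could be a plain daemon thread:
import threading
import time

incoming = []          # filled by whatever receives audio from the network

def playback_loop():
    interval = CHUNK_SIZE / RATE           # duration of one chunk in seconds
    while True:
        data = incoming.pop(0) if incoming else b''
        play(data)                         # play() from the snippet above
        time.sleep(interval)

threading.Thread(target=playback_loop, daemon=True).start()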
It's possible to prevent the underruns by filling in silence when needed.
It looks like this:
#...
data = s.recv(CHUNK * WIDTH)            # Receive data from peer
stream.write(data)                      # Play sound
free = stream.get_write_available()     # How much space is left in the buffer?
if free > CHUNK:                        # Is there a lot of space in the buffer?
    tofill = free - CHUNK
    stream.write(SILENCE * tofill)      # Fill it with silence (SILENCE is one frame of zero bytes)
#...
The solution for me was to buffer the first 10 packets/frames of the recorded sound. Look at the snippet below:
BUFFER = 10

while len(queue) < BUFFER:   # wait until the first frames have arrived
    continue

while running:
    recorded_frame = queue.pop(0)
    audio.write(recorded_frame)
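For completeness, a hypothetical sketch of the receiving side that feeds this queue (the socket setup and frame size are assumptions, not from my actual code):
# Hypothetical feeder: receive fixed-size frames from the socket and append
# them to the queue that the playback loop above drains.
def receive_loop(sock, queue, frame_bytes):
    while True:
        frame = sock.recv(frame_bytes)
        if not frame:        # peer closed the connection
            break
        queue.append(frame)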