"All executable devices in the domain are busy" - redhawksdr

When I try to run a very simple waveform from the command line, I get the error shown below. The waveform is incredibly simple, and there is no way the GPP is fully utilized: the load average is 0.04 on a 2-core machine without the waveform running.
The error is intermittent: sometimes the launch works, but usually it does not.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/redhawk/core/lib/python/ossie/utils/redhawk/core.py", line 1793, in createApplication
app = app_factory.create(name, initConfiguration, [])
File "/usr/local/redhawk/core/lib/python/ossie/cf/cf_idl.py", line 2026, in create
return _omnipy.invoke(self, "create", _0_CF.ApplicationFactory._d_create, args)
ossie.cf.CF.CreateApplicationError: CF.ApplicationFactory.CreateApplicationError(errorNumber=CF_ENOSPC, msg="Unable to launch component 'add_constant'. All executable devices (i.e.: GPP) in the Domain are busy")

The default GPP settings assume a multi-core machine. On machines with limited CPU resources, you can change the following settings so that the idle calculations are ignored:
reserved_capacity_per_component=0
thresholds:
cpu_idle = 0
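These can be changed at runtime from the REDHAWK Python sandbox. A minimal sketch, assuming the domain is named REDHAWK_DEV and the GPP is the first device of the first device manager (adjust both to your setup):

from ossie.utils import redhawk

# Attach to the running domain (the domain name here is an assumption).
dom = redhawk.attach('REDHAWK_DEV')

# Assumes the GPP is the first device of the first device manager.
gpp = dom.devMgrs[0].devs[0]

# Disable the per-component CPU reservation and the idle threshold check.
gpp.reserved_capacity_per_component = 0.0
gpp.thresholds.cpu_idle = 0.0

To make the change persistent across restarts, set the same values as property overrides in the node definition instead.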

Related

Error using MultiWorkerMirroredStrategy to train object detection research model ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8

I'm trying to train the research model ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8 using MultiWorkerMirroredStrategy (by setting --num_workers=2 in the invocation of model_main_tf2.py). I'm training across two workers (0 and 1), each with a single GPU. However, when I attempt this I get the following error, always on worker 1:
Traceback (most recent call last):
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 553, in __next__
return self.get_next()
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 610, in get_next
return self._get_next_no_partial_batch_handling(name)
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 642, in _get_next_no_partial_batch_handling
replicas.extend(self._iterators[i].get_next_as_list(new_name))
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 1594, in get_next_as_list
return self._format_data_list_with_options(self._iterator.get_next())
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\multi_device_iterator_ops.py", line 580, in get_next
result.append(self._device_iterators[i].get_next())
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 889, in get_next
return self._next_internal()
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\data\ops\iterator_ops.py", line 819, in _next_internal
ret = gen_dataset_ops.iterator_get_next(
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\ops\gen_dataset_ops.py", line 2922, in iterator_get_next
_ops.raise_from_not_ok_status(e, name)
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\framework\ops.py", line 7186, in raise_from_not_ok_status
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence [Op:IteratorGetNext]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\JS\Desktop\Tensorflow\models\research\object_detection\model_main_tf2.py", line 114, in <module>
tf.compat.v1.app.run()
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\platform\app.py", line 36, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\absl\app.py", line 312, in run
_run_main(main, args)
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\absl\app.py", line 258, in _run_main
sys.exit(main(argv))
File "C:\Users\JS\Desktop\Tensorflow\models\research\object_detection\model_main_tf2.py", line 105, in main
model_lib_v2.train_loop(
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 605, in train_loop
load_fine_tune_checkpoint(
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 401, in load_fine_tune_checkpoint
_ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\object_detection\model_lib_v2.py", line 161, in _ensure_model_is_built
features, labels = iter(input_dataset).next()
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 549, in next
return self.__next__()
File "C:\Users\JS\.conda\envs\tensor2\lib\site-packages\tensorflow\python\distribute\input_lib.py", line 555, in __next__
raise StopIteration
StopIteration
Worker 0 eventually fails after detecting that worker 1 has gone down.
This error happens regardless of the physical machines on which the two workers run. In other words, I see it whether I run both workers on a single machine (using localhost) or on different machines on the same network.
Based on the trace in the error messages, the error appears to occur whenever the training loop attempts to iterate over the training data generated by strategy.experimental_distribute_datasets_from_function. Note that if I change the strategy to MirroredStrategy it runs fine on a single machine (no other changes made). I'm not sure whether I'm doing something wrong or there is a bug in the object detection API.
My setup on both machines is identical (I basically followed the setup instructions on the object detection website):
Windows 10
Tensorflow 2.8.0
Cuda Toolkit 11.2
cudnn 8.1
Has anyone ever seen this error before? If so, is there a way around it?
Ok, I think I understand the issue. In the object detection library there is a file called dataset_builder.py that builds the training dataset from the TFRecord stored in the file specified in pipeline.config (in the input_path item of the tf_record_input_reader). The function that actually reads the TFRecord file is _read_dataset_internal. This function treats the input_path of the pipeline config as a LIST of files and then applies a sharding function (passed as an argument) to divide the files between the replicas doing the training (one replica per worker). Since my input_path specified only a single TFRecord file, it was assigned to the first replica, and the other replicas were given empty file lists. Thus only the first replica actually had an input dataset to work with, hence the crash.
The solution was to split the training data across two files (two TFRecords) and then set the input_path in pipeline.config to a list of paths rather than a single path. Once I did this, the model appears to have trained successfully (at least it didn't crash).
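A quick way to do the split is to round-robin the serialized examples from the original file into one shard per worker. A minimal sketch, with hypothetical file names:

import tensorflow as tf

# Hypothetical paths; adjust to your dataset.
source = 'train.record'
shard_paths = ['train-00000-of-00002.record', 'train-00001-of-00002.record']

# Round-robin the serialized examples across the shards.
writers = [tf.io.TFRecordWriter(path) for path in shard_paths]
for i, record in enumerate(tf.data.TFRecordDataset(source)):
    writers[i % len(writers)].write(record.numpy())
for writer in writers:
    writer.close()

Then list both shards by repeating the input_path line inside the tf_record_input_reader block of pipeline.config (a glob pattern such as train-*.record may also work, depending on your version).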
I'm not sure if this is a bug in the object detection code or not. I assumed that if I only had one training record (visible to both workers), both workers would use it and just batch the data accordingly. I'm just not sure if the assumption itself is wrong, or if the assumption is correct and the code is wrong.
Anyway, I hope this helps anyone who might be wrestling with the same issue.

LIBUSB_ERROR_BUSY [-6] Sending a basic packet with Python 3.9 using LibUSB1

I am developing some software in Python 3.9, and I am at the point where I have a device connected to my USB port and would like to send a basic packet to test the interface before I proceed. I am using this example to try to get my interface to work. I am not bothered about speed or byte count; I would just like to see any response on the interface (though on reflection I'm wondering if USB speed could be the issue):
import usb1
import usb.util
import os
import sys
import libusb
import usb.core
from usb import util
import math
dev = usb.core.find(idVendor=0x11ac, idProduct=0x317d)
with usb1.USBContext() as context:
    handle = context.openByVendorIDAndProductID(
        0x11ac,
        0x317d,
    )
    handle.claimInterface(0)
    handle.setInterface(0)
    data = bytearray(b"\xf0\x0f" * int(math.ceil(0xb5db91 / 4.0)))
    handle.controlWrite(0x40, 0xb0, 0xb5A6, 0xdb91, b"")
    handle.bulkWrite(2, data, timeout=5000)
https://github.com/vpelletier/python-libusb1/issues/21
I have looked in various forums for several days and cannot seem to find an answer that works. Here is the trace below. It's worth noting that from time to time this .py file runs without error but does nothing, and I see no traffic traveling to the USB interface.
Can someone please help me configure a working example of how to send a packet to the interface? I have tried various things like detaching the kernel, setting configuration, etc.
For four days I struggled with libusb-0.1 and libusb-1.0; after discovering python-libusb1, I changed my wrapper and have had a lot more success.
I also see a lot of examples in forums like this one, and I always get the same response. Also, I'm curious as to where it is indicated that 0x40 is the (OUT) endpoint.
Traceback (most recent call last):
File "/home/jbgilbert/Desktop/Packets/Backend_replace.py", line 16, in <module>
handle.claimInterface(0)
File "/usr/lib/python3/dist-packages/usb1/__init__.py", line 1213, in claimInterface
libusb1.libusb_claim_interface(self.__handle, interface),
File "/usr/lib/python3/dist-packages/usb1/__init__.py", line 133, in mayRaiseUSBError
__raiseUSBError(value)
File "/usr/lib/python3/dist-packages/usb1/__init__.py", line 125, in raiseUSBError
raise __STATUS_TO_EXCEPTION_DICT.get(value, __USBError)(value)
usb1.USBErrorBusy: LIBUSB_ERROR_BUSY [-6]
My device is a laptop; using lsmod reveals all drivers linked to that particular endpoint. In this case, because of the presence of a webcam, I was unable to write to an available endpoint. Disabling the driver was to no avail, and I had to move to a machine with fewer onboard accessories, which proved more successful.
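For what it's worth, LIBUSB_ERROR_BUSY on claimInterface usually means something else (often a kernel driver) already owns the interface. A minimal sketch of detaching it first with python-libusb1, reusing the VID/PID from the question:

import usb1

with usb1.USBContext() as context:
    handle = context.openByVendorIDAndProductID(0x11ac, 0x317d)
    # Ask libusb to detach any kernel driver on claim and re-attach it on release.
    handle.setAutoDetachKernelDriver(True)
    # Alternatively, detach explicitly if a driver is bound to interface 0.
    if handle.kernelDriverActive(0):
        handle.detachKernelDriver(0)
    handle.claimInterface(0)
    try:
        handle.bulkWrite(2, b"\xf0\x0f", timeout=5000)
    finally:
        handle.releaseInterface(0)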

play mp3 without default player

I'm trying to play an mp3 without using the default player, so I wanted to try pyglet, but nothing works:
import pyglet
music = pyglet.resource.media('D:/folder/folder/audio.mp3')
music.play()
pyglet.app.run()
I've tried it this way
music = pyglet.resource.media('D:\folder\folder\audio.mp3')
and like this:
music = pyglet.resource.media('D:\\folder\\folder\\audio.mp3')
but I get this error:
Traceback (most recent call last):
File "C:\Users\User\AppData\Roaming\Python\Python35\site-packages\pyglet\resource.py", line 624, in media
location = self._index[name]
KeyError: 'D:\folder\folder\audio.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PC/PyCharm_project/0_TEMP.py", line 3, in <module>
music = pyglet.resource.media('D:\folder\folder\audio.mp3')
File "C:\Users\User\AppData\Roaming\Python\Python35\site-packages\pyglet\resource.py", line 634, in media
raise ResourceNotFoundException(name)
pyglet.resource.ResourceNotFoundException: Resource "D:\folder\folder\audio.mp3" was not found on the path. Ensure that the filename has the correct captialisation.
It's because of the way you load the resource.
Try this code for instance:
import pyglet
music = pyglet.media.load(r'D:\folder\folder\audio.mp3')
music.play()
pyglet.app.run()
The way this works is that it loads a file and places it in a pyglet.resource.media container. Due to namespaces and other things, the code you wrote is only allowed to load resources from the working directory. So instead, you use pyglet.media.load, which is able to load the resource you need into the current namespace. (Note: I might be misusing the word "namespace" here; for lack of a better term without looking at the source code of pyglet, this is the best description I could come up with.)
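If you do want to stick with pyglet.resource, you can also extend its search path instead. A minimal sketch, assuming the file really is in D:\folder\folder (note that resource paths use forward slashes):

import pyglet

# Add the directory to pyglet's resource search path, then re-index it.
pyglet.resource.path = ['D:/folder/folder']
pyglet.resource.reindex()

music = pyglet.resource.media('audio.mp3')
music.play()
pyglet.app.run()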
You could experiment by placing the .mp3 in the script folder and running your code again with a relative path:
import pyglet
music = pyglet.resource.media('audio.mp3')
music.play()
pyglet.app.run()
But I'd strongly suggest you use pyglet.media.load() and have a look at the documentation.

shelve archives not openable on other computer

I have saved some objects with shelve. In another file I was able to restore these objects. But when I copy the archive to another computer, shelve gives me a _gdbm.error: File read error. The packages which hold the classes of the stored objects are directly accessible on both computers (but they are stored in different locations and added via PYTHONPATH). Both machines run Ubuntu 13.10; one is 32-bit and the other 64-bit.
Shouldn't these archives be machine independent?
On the 64-bit machine I get
>>> import shelve
>>> shelve.open('arch.db')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.3/shelve.py", line 232, in open
return DbfilenameShelf(filename, flag, protocol, writeback)
File "/usr/lib/python3.3/shelve.py", line 216, in __init__
Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
File "/usr/lib/python3.3/dbm/__init__.py", line 94, in open
return mod.open(file, flag, mode)
_gdbm.error: File read error
and on the 32-bit machine it works.
When I create an archive on the 64-bit machine, it can be opened on the 32-bit machine, but the interactive python prompt crashes:
>>> import shelve
>>> s = shelve.open('arch.db')
>>> for i in s.items(): print(i)
...
gdbm fatal: lseek error
I don't even get a traceback.
It is really annoying; I intended to work on both computers, but at the moment I am bound to my slow 32-bit eeepc, because I've already saved a lot into the archive.
The problem is that shelve uses gdbm (provided as dbm.gnu) as the default backend to store the serialized objects. The files created with gdbm depend on the architecture of the system and are therefore only usable on either 32-bit or 64-bit systems.
There are some tools (gdbmexport or gdbm_dump) which allow you to convert gdbm files; however, the workflow becomes error-prone if you want to access the files from both systems on a regular basis.
Fortunately, python provides different backends: dbm.gnu, dbm.ndbm and dbm.dumb. The latter two are platform independent.
import shelve
import dbm.ndbm

# Make ndbm the default backend for newly created shelve/dbm files.
dbm._defaultmod = dbm.ndbm

db = shelve.open('somename')
A database created with the above code can be used on both 64-bit and 32-bit systems.
The default backend has to be set only when you create the file; before opening an existing file, dbm checks its type and uses the matching backend.
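You can check which backend an existing file uses with dbm.whichdb; a quick check, assuming the 'somename' database created above:

import dbm

# Returns the backend name, e.g. 'dbm.ndbm' here (or 'dbm.gnu' for an old gdbm archive).
print(dbm.whichdb('somename'))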
Please note that the above code changes the default dbm backend for the whole python process. Another component might break if it relies on gdbm being the default.

Linux bluetooth serial can only be opened once

I have a bluetooth serial port and am trying to connect to it. I do:
sudo rfcomm bind 13 00:0A:3A:26:4A:86
and that seems to succeed. I then try to access it:
>>> f=file('/dev/rfcomm13','rw')
>>> f.close()
>>> f=file('/dev/rfcomm13','rw')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 16] Device or resource busy: '/dev/rfcomm13'
This seems to be consistent -- it can be opened once, but ever afterwards it is "busy" until I unbind and rebind it. Doing this in python shows the error most clearly, but it seems to happen everywhere.
It seems that the close syscall isn't cleaning up some key resource. From a quick skim of the driver source, it's probably a dlci channel or something similar, but I'm pretty vague on what those are.
Is there any way to open the connection multiple times per bind?
Thanks
