logging multithreading deadlock in python

I run 10 processes with 10 threads each, and they write constantly to 10 log files (one per process) using logging.info() and logging.debug() for 30 seconds.
At some point, usually after about 10 seconds, a deadlock occurs and all of the processes stop processing.
gdb python [pid] with py-bt and info threads shows where it is stuck:
Id Target Id Frame
* 1 Thread 0x7ff50f020740 (LWP 1622) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
2 Thread 0x7ff509636700 (LWP 1624) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=8, events=0x7ff5096351d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x7ff508e35700 (LWP 1625) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=12, events=0x7ff508e341d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 Thread 0x7ff503fff700 (LWP 1667) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
...[threads 5-6 like 4]...
7 Thread 0x7ff5027fc700 (LWP 1690) "python" 0x00007ff50eb46187 in __GI___libc_write (fd=2, buf=0x7ff50967bc24, nbytes=85) at ../sysdeps/unix/sysv/linux/write.c:27
...[threads 8-13 like 4]...
Stack of thread 7:
Traceback (most recent call first):
File "/usr/lib/python2.7/logging/__init__.py", line 889, in emit
stream.write(fs % msg)
...[skipped useless lines]...
And this is the code (I guess it is from the logging package's __init__.py):
884 #the codecs module, but fail when writing to a
885 #terminal even when the codepage is set to cp1251.
886 #An extra encoding step seems to be needed.
887 stream.write((ufs % msg).encode(stream.encoding))
888 else:
>889 stream.write(fs % msg)
890 except UnicodeError:
891 stream.write(fs % msg.encode("UTF-8"))
892 self.flush()
893 except (KeyboardInterrupt, SystemExit):
894 raise
The stacks of the remaining threads are similar; they are waiting for the GIL:
Traceback (most recent call first):
Waiting for the GIL
File "/usr/lib/python2.7/threading.py", line 174, in acquire
rc = self.__block.acquire(blocking)
File "/usr/lib/python2.7/logging/__init__.py", line 715, in acquire
self.lock.acquire()
...[skipped useless lines]...
It is written that the logging package is thread-safe without additional locks. So why might the logging package deadlock? Does it open too many file descriptors or hit some other limit?
This is how I initialize it (in case it matters):
def get_logger(log_level, file_name='', log_name=''):
    if len(log_name) != 0:
        logger = logging.getLogger(log_name)
    else:
        logger = logging.getLogger()
    logger.setLevel(logger_state[log_level])
    formatter = logging.Formatter('%(asctime)s [%(levelname)s][%(name)s:%(funcName)s():%(lineno)s] - %(message)s')
    # file handler
    if len(file_name) != 0:
        fh = logging.FileHandler(file_name)
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(formatter)
        logger.addHandler(fh)
    # console handler
    console_out = logging.StreamHandler()
    console_out.setLevel(logging.DEBUG)
    console_out.setFormatter(formatter)
    logger.addHandler(console_out)
    return logger

The problem was that I was writing output both to the console and to a file, but all of those processes were started with their output redirected to a pipe that was never read.
p = Popen(proc_params,
          stdout=PIPE,
          stderr=STDOUT,
          close_fds=ON_POSIX,
          bufsize=1
          )
So it seems the pipe has a limited buffer size, and once that buffer fills up, the writing process blocks, which looks like a deadlock.
Here is the explanation: https://docs.python.org/2/library/subprocess.html
Note
Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
That note is written for functions that I don't use, but it seems to apply equally to a plain Popen call if you never read from the pipes.
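One way to avoid the blocked write is to make sure something always drains the pipe. The following is a minimal sketch, not the original setup: the child command and the helper name run_and_drain are hypothetical, and it is written for Python 3 even though the question uses Python 2.7.

```python
import subprocess
import sys

# Hypothetical child: writes about 200 KB to stdout, which is more than
# a typical pipe buffer holds.
CHILD_CODE = "import sys; sys.stdout.write('x' * 200000)"


def run_and_drain(code):
    """Start a child with stdout piped and read everything it writes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # communicate() keeps reading until the child closes its end of the pipe,
    # so the child never blocks on a full pipe buffer.
    output, _ = proc.communicate()
    return proc.returncode, output


returncode, output = run_and_drain(CHILD_CODE)
print(returncode, len(output))
```

If the parent does not need the output at all, another option in Python 3 is to pass stdout=subprocess.DEVNULL instead of PIPE, so there is no buffer to fill.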

Related

mpi4py irecv causes segmentation fault

I'm running the following code, which sends an array from rank 0 to rank 1, using the command mpirun -n 2 python -u test_irecv.py > output 2>&1.
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
asyncr = 1
size_arr = 10000
if comm.Get_rank()==0:
    arrs = np.zeros(size_arr)
    if asyncr: comm.isend(arrs, dest=1).wait()
    else: comm.send(arrs, dest=1)
else:
    if asyncr: arrv = comm.irecv(source=0).wait()
    else: arrv = comm.recv(source=0)
print('Done!', comm.Get_rank())
Running in synchronous mode with asyncr = 0 gives the expected output
Done! 0
Done! 1
However, running in asynchronous mode with asyncr = 1 gives the errors shown below.
I need to know why it runs fine in synchronous mode but not in asynchronous mode.
Output with asyncr = 1:
Done! 0
[nia1477:420871:0:420871] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x138)
==== backtrace ====
0 0x0000000000010e90 __funlockfile() ???:0
1 0x00000000000643d1 ompi_errhandler_request_invoke() ???:0
2 0x000000000008a8b5 __pyx_f_6mpi4py_3MPI_PyMPI_wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:49819
3 0x000000000008a8b5 __pyx_f_6mpi4py_3MPI_PyMPI_wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:49819
4 0x000000000008a8b5 __pyx_pf_6mpi4py_3MPI_7Request_34wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:83838
5 0x000000000008a8b5 __pyx_pw_6mpi4py_3MPI_7Request_35wait() /tmp/eb-A2FAdY/pip-req-build-dvnprmat/src/mpi4py.MPI.c:83813
6 0x00000000000966a3 _PyMethodDef_RawFastCallKeywords() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Objects/call.c:690
7 0x000000000009eeb9 _PyMethodDescr_FastCallKeywords() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Objects/descrobject.c:288
8 0x000000000006e611 call_function() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:4563
9 0x000000000006e611 _PyEval_EvalFrameDefault() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3103
10 0x0000000000177644 _PyEval_EvalCodeWithName() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3923
11 0x000000000017774e PyEval_EvalCodeEx() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:3952
12 0x000000000017777b PyEval_EvalCode() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/ceval.c:524
13 0x00000000001aab72 run_mod() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:1035
14 0x00000000001aab72 PyRun_FileExFlags() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:988
15 0x00000000001aace6 PyRun_SimpleFileExFlags() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Python/pythonrun.c:430
16 0x00000000001cad47 pymain_run_file() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:425
17 0x00000000001cad47 pymain_run_filename() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:1520
18 0x00000000001cad47 pymain_run_python() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2520
19 0x00000000001cad47 pymain_main() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2662
20 0x00000000001cb1ca _Py_UnixMain() /dev/shm/mboisson/avx2/Python/3.7.0/dummy-dummy/Python-3.7.0/Modules/main.c:2697
21 0x00000000000202e0 __libc_start_main() ???:0
22 0x00000000004006ba _start() /tmp/nix-build-glibc-2.24.drv-0/glibc-2.24/csu/../sysdeps/x86_64/start.S:120
===================
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 420871 on node nia1477 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
The versions are as follows:
Python: 3.7.0
mpi4py: 3.0.0
mpiexec --version gives mpiexec (OpenRTE) 3.1.2
mpicc -v gives icc version 18.0.3 (gcc version 7.3.0 compatibility)
Running with asyncr = 1 on another system with MPICH gave the following output.
Done! 0
Traceback (most recent call last):
File "test_irecv.py", line 14, in <module>
if asyncr: arrv = comm.irecv(source=0).wait()
File "mpi4py/MPI/Request.pyx", line 235, in mpi4py.MPI.Request.wait
File "mpi4py/MPI/msgpickle.pxi", line 411, in mpi4py.MPI.PyMPI_wait
mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[23830,1],1]
Exit code: 1
--------------------------------------------------------------------------
[master:01977] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[master:01977] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Apparently this is a known problem in mpi4py, as described in https://bitbucket.org/mpi4py/mpi4py/issues/65/mpi_err_truncate-message-truncated-when. Lisandro Dalcin says:
The implementation of irecv() for large messages requires users to pass a buffer-like object large enough to receive the pickled stream. This is not documented (as most of mpi4py), and even non-obvious and unpythonic...
The fix is to pass a large enough pre-allocated bytearray to irecv. A working example is as follows.
from mpi4py import MPI
import numpy as np
comm = MPI.COMM_WORLD
size_arr = 10000
if comm.Get_rank()==0:
    arrs = np.zeros(size_arr)
    comm.isend(arrs, dest=1).wait()
else:
    arrv = comm.irecv(bytearray(1<<20), source=0).wait()
print('Done!', comm.Get_rank())
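To judge whether a given bytearray is large enough, it may help to look at the size of the pickled message rather than the raw data. The sketch below is only an illustration and does not use MPI: it pickles a standard-library array of 10000 doubles as a stand-in (an assumption) for np.zeros(10000) and compares the result with the 1 MiB buffer from the example.

```python
import pickle
from array import array

size_arr = 10000

# Stand-in payload (assumption): 10000 doubles, the same raw size
# as np.zeros(10000), built without numpy.
payload = array('d', [0.0] * size_arr)

# The lowercase isend/irecv methods transfer a pickled byte stream,
# so the receive buffer has to hold the pickled size, not just the raw data.
pickled_size = len(pickle.dumps(payload))
buffer_size = 1 << 20  # 1 MiB, as in the working example

print(pickled_size, buffer_size, pickled_size <= buffer_size)
```

The exact pickled size of a numpy array will differ somewhat from this stand-in, so in real code it is safer to measure the actual object being sent and leave some headroom.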

Spurious out-of-memory error when allocating shared memory with multiprocessing

I'm trying to allocate a set of image buffers in shared memory using multiprocessing.RawArray. It works fine for smaller numbers of images. However, when I get to a certain number of buffers, I get an OSError indicating that I've run out of memory.
Obvious question, am I actually out of memory? By my count, the buffers I'm trying to allocate should be about 1 GB of memory, and according to the Windows Task Manager, I have about 20 GB free. I don't see how I could actually be out of memory!
Am I hitting some kind of artificial memory consumption limit that I can increase? If not, why is this happening, and how can I get around this?
I'm using Windows 10, Python 3.7, 64 bit architecture, 32 GB RAM total.
Here's a minimal reproducible example:
import multiprocessing as mp
import ctypes
imageDataType = ctypes.c_uint8
imageDataSize = 1024*1280*3 # 3,932,160 bytes
maxBufferSize = 300
buffers = []
for k in range(maxBufferSize):
    print("Creating buffer #", k)
    buffers.append(mp.RawArray(imageDataType, imageDataSize))
Output:
Creating buffer # 0
Creating buffer # 1
Creating buffer # 2
Creating buffer # 3
Creating buffer # 4
Creating buffer # 5
...etc...
Creating buffer # 278
Creating buffer # 279
Creating buffer # 280
Traceback (most recent call last):
File ".\Cruft\memoryErrorTest.py", line 10, in <module>
buffers.append(mp.RawArray(imageDataType, imageDataSize))
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\context.py", line 129, in RawArray
return RawArray(typecode_or_type, size_or_initializer)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\sharedctypes.py", line 61, in RawArray
obj = _new_value(type_)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\sharedctypes.py", line 41, in _new_value
wrapper = heap.BufferWrapper(size)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\heap.py", line 263, in __init__
block = BufferWrapper._heap.malloc(size)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\heap.py", line 242, in malloc
(arena, start, stop) = self._malloc(size)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\heap.py", line 134, in _malloc
arena = Arena(length)
File "C:\Users\Brian Kardon\AppData\Local\Programs\Python\Python37-32\lib\multiprocessing\heap.py", line 38, in __init__
buf = mmap.mmap(-1, size, tagname=name)
OSError: [WinError 8] Not enough memory resources are available to process this command
Ok, the folks over at Python bug tracker figured this out for me. For posterity:
I was using 32-bit Python, which is limited to a memory address space of 4 GB, much less than my total available system memory. Apparently enough of that space was taken up by other stuff that the interpreter couldn't find a large enough contiguous block for all my RawArrays.
The error does not occur when using 64-bit Python, so that seems to be the easiest solution.
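A quick way to check which kind of interpreter is running is to look at its pointer size. The sketch below does that and also adds up the memory the reproduction script asks for; the variable names are my own. The Python37-32 directory in the traceback paths is likely another hint that a 32-bit build was in use.

```python
import struct
import sys

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
pointer_bits = struct.calcsize("P") * 8
is_64bit = sys.maxsize > 2**32

# Total size requested by the reproduction script: 300 buffers of 1024*1280*3 bytes.
image_data_size = 1024 * 1280 * 3
max_buffer_count = 300
total_bytes = image_data_size * max_buffer_count

print(pointer_bits, is_64bit, total_bytes, round(total_bytes / 2**30, 2))
```

The total is a little over 1 GiB, which is small next to 32 GB of RAM but a large fraction of what a 32-bit process can address.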

os.read on inotify file descriptor: reading 32 bytes works but 31 raises an exception

I'm writing a program that should respond to file changes using inotify. The below skeleton program works as I expect...
# test.py
import asyncio
import ctypes
import os
IN_CLOSE_WRITE = 0x00000008
async def main(loop):
    libc = ctypes.cdll.LoadLibrary('libc.so.6')
    fd = libc.inotify_init()
    os.mkdir('directory-to-watch')
    wd = libc.inotify_add_watch(fd, 'directory-to-watch'.encode('utf-8'), IN_CLOSE_WRITE)
    loop.add_reader(fd, handle, fd)
    with open(f'directory-to-watch/file', 'wb') as file:
        pass
def handle(fd):
    event_bytes = os.read(fd, 32)
    print(event_bytes)
loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
... in that it outputs...
b'\x01\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00file\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
However, if I change it to attempt to read 31 bytes...
event_bytes = os.read(fd, 31)
... then it raises an exception...
Traceback (most recent call last):
File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/t.py", line 19, in handle
event_bytes = os.read(fd, 31)
OSError: [Errno 22] Invalid argument
... and similarly for all numbers smaller than 31 that I have tried, including 1 byte.
Why is this? I would have thought it should be able to attempt to read any number of bytes, and just return whatever is in the buffer, up to the length given by the second argument of os.read.
I'm running this in Alpine Linux 3.10 in a Docker container on macOS, with a very basic Dockerfile:
FROM alpine:3.10
RUN apk add --no-cache python3
COPY test.py /
and running it by
docker build . -t test && docker run -it --rm test python3 /test.py
It's because it's written to only allow reads that can return information about the next event. From http://man7.org/linux/man-pages/man7/inotify.7.html
The behavior when the buffer given to read(2) is too small to return
information about the next event depends on the kernel version: in
kernels before 2.6.21, read(2) returns 0; since kernel 2.6.21,
read(2) fails with the error EINVAL.
and from https://github.com/torvalds/linux/blob/f1a3b43cc1f50c6ee5ba582f2025db3dea891208/include/uapi/asm-generic/errno-base.h#L26
#define EINVAL 22 /* Invalid argument */
which presumably maps to the Python OSError with Errno 22.
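The 32 bytes in the question are exactly one event: a 16-byte header followed by a name field padded to 16 bytes, which is why 32 works and 31 does not. The sketch below decodes that event with the struct module, following the struct inotify_event layout from the man page; the helper name parse_event is my own, and little-endian byte order is assumed.

```python
import struct

# Layout of struct inotify_event: int wd; uint32_t mask, cookie, len;
# followed by `len` bytes of name. Little-endian is assumed (e.g. x86-64).
HEADER = struct.Struct("<iIII")


def parse_event(buf):
    """Decode the first inotify event in buf into (wd, mask, cookie, name, total_size)."""
    wd, mask, cookie, name_len = HEADER.unpack_from(buf, 0)
    raw_name = buf[HEADER.size:HEADER.size + name_len]
    name = raw_name.rstrip(b"\x00")  # the kernel pads the name with null bytes
    return wd, mask, cookie, name, HEADER.size + name_len


# The 32 bytes printed in the question: a 16-byte header plus a 16-byte padded name.
event_bytes = struct.pack("<iIII", 1, 0x00000008, 0, 16) + b"file" + b"\x00" * 12

wd, mask, cookie, name, total_size = parse_event(event_bytes)
print(wd, hex(mask), cookie, name, total_size)
```

Since the name length varies per event, the same man page suggests a buffer of sizeof(struct inotify_event) + NAME_MAX + 1 bytes as sufficient to read at least one event; in practice a few kilobytes passed to os.read is a simple choice.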

How to profile a vim plugin written in python

Vim offers the :profile command, which is really handy. But it is limited to Vim script -- when it comes to plugins implemented in Python, it isn't that helpful.
Currently I'm trying to understand what is causing a large delay in Denite. As it doesn't happen in vanilla Vim, but only under some specific conditions that I'm not sure how to reproduce, I couldn't find which setting or plugin is interfering.
So I turned to profiling, and this is what I got from :profile:
FUNCTION denite#vim#_start()
Defined: ~/.vim/bundle/denite.nvim/autoload/denite/vim.vim line 33
Called 1 time
Total time: 5.343388
Self time: 4.571928
count total (s) self (s)
1 0.000006 python3 << EOF
def _temporary_scope():
nvim = denite.rplugin.Neovim(vim)
try:
buffer_name = nvim.eval('a:context')['buffer_name']
if nvim.eval('a:context')['buffer_name'] not in denite__uis:
denite__uis[buffer_name] = denite.ui.default.Default(nvim)
denite__uis[buffer_name].start(
denite.rplugin.reform_bytes(nvim.eval('a:sources')),
denite.rplugin.reform_bytes(nvim.eval('a:context')),
)
except Exception as e:
import traceback
for line in traceback.format_exc().splitlines():
denite.util.error(nvim, line)
denite.util.error(nvim, 'Please execute :messages command.')
_temporary_scope()
if _temporary_scope in dir():
del _temporary_scope
EOF
1 0.000017 return []
(...)
FUNCTIONS SORTED ON TOTAL TIME
count total (s) self (s) function
1 5.446612 0.010563 denite#helper#call_denite()
1 5.396337 0.000189 denite#start()
1 5.396148 0.000195 <SNR>237_start()
1 5.343388 4.571928 denite#vim#_start()
(...)
I tried to use the python profiler directly by wrapping the main line:
import cProfile
cProfile.run(_temporary_scope(), '/path/to/log/file')
, but no luck -- just a bunch of errors from cProfile. Perhaps it is because of the way Python is started from Vim, as it is hinted here that it only works on the main thread.
I guess there should be an easier way of doing this.
The python profiler does work by enclosing the whole code,
cProfile.run("""
(...)
""", '/path/to/log/file')
, but it is not that helpful. Maybe it is all that is possible.
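One likely reason the first attempt failed is that cProfile.run() expects a string of code, while cProfile.run(_temporary_scope(), ...) calls the function first and passes its return value. An alternative that avoids quoting the code is the cProfile.Profile class, which can profile a callable directly. The sketch below shows the idea outside of Vim; slow_function is a hypothetical stand-in for the code being profiled, and the source does not confirm how this behaves inside Vim's embedded interpreter.

```python
import cProfile
import io
import pstats


def slow_function(n):
    """Hypothetical stand-in for the code being profiled."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
# runcall() takes the callable itself plus its arguments, not a string of source code.
result = profiler.runcall(slow_function, 100000)

# Render the collected statistics to a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

To keep the results for later inspection, profiler.dump_stats('/path/to/log/file') writes them to a file that pstats.Stats can load again.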

Is OpenCV running two instances of SIFT detectAndCompute concurrently?

I can get SIFT keypoints and descriptors from two separate, large images (~2 GB) when I run sift.detectAndCompute from the command line. I run it on one image, wait a very long time, and eventually get the keypoints and descriptors. Then I repeat for the second image, and again it takes a long time, but I do eventually get my keypoints and descriptors. Here are the two lines I run from the IPython console in Spyder, on my machine with 32 GB of RAM (MAX_MATCHES = 50000 in the code below):
sift = cv2.xfeatures2d.SIFT_create(MAX_MATCHES)
keypoints, descriptors = sift.detectAndCompute(imgGray, None)
This takes 10 minutes to finish, but it does finish. Next, I run this:
keypoints2, descriptors2 = sift.detectAndCompute(refimgGray, None)
When done, keypoints and keypoints2 DO contain 50000 keypoint objects.
However, if I run my script, which calls a function that uses sift.detectAndCompute and returns keypoints and descriptors, the process takes a long time, uses 100% of my memory and ~95% of my disk BW and then fails with this traceback:
runfile('C:/AV GIS/python scripts/img_align_w_geo_w_mask_refactor_ret_1.py', wdir='C:/AV GIS/python scripts')
Reading reference image : C:\Users\kellett\Downloads\3074_transparent_mosaic_group1.tif
xfrm for image = (584505.1165100001, 0.027370000000000002, 0.0, 4559649.608440001, 0.0, -0.027370000000000002)
Reading image to align : C:\Users\kellett\Downloads\3071_transparent_mosaic_group1.tif
xfrm for image = (584499.92168, 0.02791, 0.0, 4559648.80372, 0.0, -0.02791)
Traceback (most recent call last):
File "<ipython-input-75-571660ddab7f>", line 1, in <module>
runfile('C:/AV GIS/python scripts/img_align_w_geo_w_mask_refactor_ret_1.py', wdir='C:/AV GIS/python scripts')
File "C:\Users\kellett\AppData\Local\Continuum\anaconda3\envs\testgdal\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
execfile(filename, namespace)
File "C:\Users\kellett\AppData\Local\Continuum\anaconda3\envs\testgdal\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/AV GIS/python scripts/img_align_w_geo_w_mask_refactor_ret_1.py", line 445, in <module>
matches = find_matches(refKP, refDesc, imgKP, imgDesc)
File "C:/AV GIS/python scripts/img_align_w_geo_w_mask_refactor_ret_1.py", line 301, in find_matches
matches = matcher.match(dsc1, dsc2)
error: C:\ci\opencv_1512688052760\work\modules\core\src\stat.cpp:4024: error: (-215) (type == 0 && dtype == 4) || dtype == 5 in function cv::batchDistance
The function is simply called once for each image thusly:
print("Reading image to align : ", imFilename);
img, imgGray, imgEdgmask, imgXfrm, imgGeoInfo = read_ortho4align(imFilename)
refKP, refDesc = extractKeypoints(refimgGray, refEdgmask)
imgKP, imgDesc = extractKeypoints(imgGray, imgEdgmask)
HERE IS MY QUESTION (sorry for shouting): Do you think Python tries to run the two lines above concurrently in some way? If so, how can I force it to run serially? If not, do you have any idea why the two keypoint detections would work individually, but not when they come one after another in a script?
One more clue - I put in a statement to see if the script proceeds to the second detectAndCompute statement before it fails, and it does. (I just put a print statement in between the two.)
My error was coming later in my script where I was finding matches.
I have no reason to believe the two SIFT keypoint finding processes are occurring at the same time.
I downsampled the images I was searching for SIFT keypoints and was able to iterate my troubleshooting more quickly and found my error.
I will look at my error more closely next time before asking a question.
