FileWriter Multithreading

I have an executor service which processes results and writes them to a file. In short, the file is written to by multiple child threads. The issue is that when the first child thread finishes, the file writer is closed, and when the second child thread writes to the file it throws the error below:
16:56:06,026 [pool-1-thread-3 ] ExpectedPaymentReportGenerator - Error processing query java.io.IOException: Stream closed
at sun.nio.cs.StreamEncoder.ensureOpen(StreamEncoder.java:45)
How can I make sure the FileWriter stays open until all the child threads have finished executing?
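For reference, here is a minimal sketch of one common way to arrange this (the file name, task count, and shared-writer pattern are illustrative, not the original code): the coordinating thread owns the writer, the worker tasks only write through it, and close() happens only after ExecutorService.shutdown() and awaitTermination() confirm that every task has finished.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReportWriterDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The coordinating thread owns the writer; child tasks only write through it.
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("report.txt"))) {
            for (int i = 0; i < 10; i++) {
                final int taskId = i;
                pool.submit(() -> {
                    String line = "result of task " + taskId;
                    synchronized (writer) { // serialize access to the shared writer
                        try {
                            writer.write(line);
                            writer.newLine();
                        } catch (IOException e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();                          // stop accepting new tasks
            pool.awaitTermination(1, TimeUnit.HOURS); // wait for all child tasks to finish
        } // the writer is closed here, only after every task has completed
    }
}

An alternative with the same effect is to have each task return its output (for example via Callable/Future) and let the submitting thread do all of the writing before closing the file.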

Related

Creating NSAttributedString on background queue: "[NSCell init] must be used from main thread only"

I have a Cocoa app based on NSDocument that presents text documents to the user. Document contents are read on a background queue, which causes a problem:
I use NSAttributedString with images, i.e. it can contain NSTextAttachment and NSTextAttachmentCell. When I try to initialize an attachment for an image and I have the main thread checker activated in Xcode, I get the following error:
// On background queue:
let attachment = NSTextAttachment()
attachment.attachmentCell = NSTextAttachmentCell(imageCell: image)
// "[NSCell init] must be used from main thread only"
My first attempt was to wrap that code in DispatchQueue.main.sync {}, but this caused a deadlock with NSDocument once in a while when autosaving took place or when the user saved the document.
Autosaving would block the main queue, my code would run in the background trying to read the document, but this ended up in a deadlock, because I could not call out to the main queue to create the text attachment.
My question:
Is it possible for me to ignore the main thread checker in Xcode and instantiate NSTextAttachmentCell on a background queue anyway?
All I'm doing on the background queue is initializing the attributed string with its attachments. Further modifications are made on the main queue.
Sequence of events
Thread 2 (bg queue): Need to update abc.txt for some reason. Get read/write access to abc.txt via NSFileCoordinator
Thread 2 (bg) now in NSFileCoordinator block
Thread 1 (MAIN): User-initiated NSDocument save, NSDocument requests write access to abc.txt via NSFileCoordinator
Thread 1 (MAIN) now blocked, waiting for file coordinator lock of Thread 2
Thread 2 (bg queue): Moving along in file coordinator block..., trying to initialize NSAttributedString, oh, it contains an attachment, can't initialize NSTextAttachmentCell on background queue, let me hand this off to the MAIN queue real' quick... ⚡️ DEADLOCK ⚡️
Thread 2 (bg) is now waiting for Thread 1 (MAIN), which is waiting in front of its file coordinator access block for Thread 2 (bg) to finish with its file coordinator block.
You should not ignore the main thread warnings. If you use main.async rather than main.sync, there should never be an issue with adding the attachment. This reference also helps define which classes behave well on different threads. Generally speaking, any AppKit view-type class should only be used on the main thread.

Python subprocess hangs as Popen when piping output

I've been through dozens of the "Python subprocess hangs" articles here and think I've addressed all of the issues presented in the various articles in the code below.
My code intermittently hangs at the Popen command. I am running 4 threads using multiprocessing.dummy.apply_async, each of those threads starts a subprocess and then reads the output line by line and prints a modified version of it to stdout.
def my_subproc():
    exec_command = ['stdbuf', '-i0', '-o0', '-e0',
                    sys.executable, '-u',
                    os.path.dirname(os.path.realpath(__file__)) + '/myscript.py']
    proc = subprocess.Popen(exec_command, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1)
    print "DEBUG1", device
    for line in iter(proc.stdout.readline, b''):
        with print_lock:
            for l in textwrap.wrap(line.rstrip(), LINE_WRAP_DEFAULT):
The code above is run from apply_async:
pool = multiprocessing.dummy.Pool(4)
for i in range(0, 4):
    pool.apply_async(my_subproc)
Intermittently the subprocess will hang at subprocess.Popen and the "DEBUG1" statement is not printed. Sometimes all threads will work; sometimes as few as 1 of the 4 will work.
I'm not aware that this exhibits any of the known deadlock situations for Popen. Am I wrong?
There is an insidious bug in subprocess.Popen() caused by I/O buffering of stdout (and possibly stderr). There is a limit of around 65536 characters in the child process's I/O buffer. If the child process writes enough output, it will "hang", waiting for the buffer to be flushed: a deadlock situation. The authors of subprocess.py seem to believe this is a problem caused by the child, even though a subprocess.flush would be welcome. Anders Pearson's post,
https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/, has a simple solution, but you have to pay attention. As he says, "tempfile.TemporaryFile() is your friend." In my case I am running an application in a loop to batch process a bunch of files; the code for the solution is:
with tempfile.TemporaryFile() as fout:
    sp.run(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
           timeout=cpto, check=True, stdout=fout, stderr=fout)
The fix above still deadlocks after processing about 20 files. An improvement, but not good enough, since I need to process hundreds of files in a batch. I came up with the "crowbar" approach below.
# Run GMAT for each file in the batch.
# Arguments:
#   -m:  Start GMAT with a minimized interface.
#   -ns: Start GMAT without the splash screen showing.
#   -x:  Exit GMAT after running the specified script.
#   -r:  Automatically run the specified script after loading.
# Note: the pipe buffer is limited (around 65536 bytes on most systems). If the child
# fills it, it hangs with a write pending until the buffer is read.
# https://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/
proc = sp.Popen(['gmat', '-m', '-ns', '-x', '-r', str(gmat_args)],
                stdout=sp.PIPE, stderr=sp.STDOUT)
try:
    # Time out after cpto seconds if the process does not complete.
    (outs, errors) = proc.communicate(timeout=cpto)
except sp.TimeoutExpired as e:
    logging.error('GMAT timed out in child process. Time allowed was %s secs, continuing', str(cpto))
    logging.info("Process %s being terminated.", str(proc.pid))
    # The child process is not killed by the system on timeout, so kill it explicitly.
    proc.kill()
    # A final communicate() flushes the stdout buffer.
    (outs, errors) = proc.communicate()
The basic idea is to kill the process and flush the buffer on each timeout. I moved the TimeoutExpired exception handler into the batch-processing loop so that after killing the process, it continues with the next file. This is harmless if the timeout value is sufficient to allow gmat to complete (albeit more slowly). I find that the code will process from 3 to 20 files before it times out.
This really seems like a bug in subprocess.
This appears to be a bad interaction with multiprocessing.dummy. When I use multiprocessing (not the .dummy threading interface) I'm unable to reproduce the error.

About asynchronous methods and threads

What actually happens behind the scenes with asynchronous functions?
Does it open a new thread and let the OS start and run it?
If so, can it cause deadlocks or other thread problems?
Here's an example of an async method:
var fs = require('fs')
var file = process.argv[2]
fs.readFile(file, function (err, contents) {
    var lines = contents.toString().split('\n').length - 1
    console.log(lines)
})
fs.readFile(file, callback) is a non-blocking call. This means:
Node's main thread stores the callback in the event table and associates it with an event that will be emitted when the file-reading process is done.
At the same time, Node has several internal threads (a thread pool), and the main thread assigns the file-reading task to one of them.
After this assignment, control returns to the main thread, which continues with its other tasks while the file is read in the background by another thread (not the main thread).
Whenever the file-reading process is complete, the event associated with the callback is emitted along with the data from the file, and the callback is pushed into the task queue, from which the event loop tries to push each task onto the main thread (stack).
When the main thread (stack) becomes available and there is no task queued ahead of the callback's task, the callback is pushed onto the main thread's stack by the event loop.
Please read about the event loop for more info.
So the thread responsible for reading the file doesn't cause a deadlock with other threads.
It simply emits an error or a success event, which is later handled by the callback.

Thread Pool: how to spawn a child task from a running task?

A simple thread pool with a global shared queue of tasks (functors).
Each worker (thread) will pick up one task from the queue and execute it. It won't execute the next task until this one is finished.
Let's imagine a big task that needs to spawn child tasks to produce some data, and then continue with its evaluation (for example, sorting a big array before saving it to disk).
pseudo code of the task code:
do some stuff
generate a list of child tasks
threadpool.spawn (child tasks)
wait until they were executed
continue my task
The problem is that the worker will deadlock, because the task is waiting for the child tasks, and the thread pool is waiting for the parent task to end before running the child ones.
One idea is to run the child task inside the spawn code:
threadpool.spawn pseudo code:
threadpool.push (tasks)
while (not all incoming tasks were executed)
t = threadpool.pop()
t.run()
return (and continue executing parent task)
But how can I know that all the tasks were executed, in an efficient way?
Another idea is to split the parent task.. something like this:
task pseudo code:
l = generate a list of child tasks
threadpool.push ( l , high priority )
t = create a task to work with generated data
threadpool.push (t , lo priority )
But I found this quite intrusive...
Any opinions?
P.S. Merry Christmas!
P.S. 2: Edited some bad names.
You can have a mechanism for the child tasks to signal back to the main worker whenever they are done, so that it can proceed. In Java, Callable tasks submitted to an ExecutorService thread pool report their results back as Future objects. Another approach would be to maintain a separate completion signal, something similar to a CountDownLatch, which serves as a common countdown mechanism to be updated every time a thread completes.
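For illustration, here is a minimal sketch of the CountDownLatch idea (the class name, pool size, and the squaring work are mine, not from the question): each child task counts the latch down when it finishes, and the parent blocks on await() before continuing with the generated data.

import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChildTaskLatchDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        int childCount = 8;
        int[] results = new int[childCount];
        CountDownLatch done = new CountDownLatch(childCount);

        // Parent work: generate the child tasks, each producing one piece of the data.
        for (int i = 0; i < childCount; i++) {
            final int slot = i;
            pool.execute(() -> {
                results[slot] = slot * slot; // stand-in for the real child work
                done.countDown();            // signal that this child is finished
            });
        }

        done.await(); // the parent proceeds only after every child has counted down
        // ... continue the parent task with the generated data (e.g. sort it) ...
        Arrays.sort(results);
        System.out.println(Arrays.toString(results));
        pool.shutdown();
    }
}

Note that in this sketch the waiting parent is the main thread, not a pool worker; if the parent itself runs on a pool worker and blocks on the latch (or on Futures), it still occupies a worker slot, which is exactly the starvation the question describes, so the pool needs enough spare threads for the children to make progress.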

Prevent fork() from copying sockets

I have the following situation (pseudocode):
function f:
    pid = fork()
    if pid == 0:
        exec to another long-running executable (no communication needed to that process)
    else:
        return "something"
f is exposed over an XmlRpc++ server. When the function is called over XML-RPC, the parent process prints "done closing socket" after the function has returned "something". But the XML-RPC client hangs as long as the child process is still running. When I kill the child process, the XML-RPC client correctly finishes the RPC call.
It seems to me that I'm having a problem with fork() copying socket descriptors to the child process (the parent called closesocket, but the child still owns a reference -> the connection stays established). How can I circumvent this?
EDIT: I have already read about FD_CLOEXEC, but can't I force all descriptors to be closed on exec?
No, you can't force all file descriptors to be closed on exec. You will need to loop over all unwanted file descriptors in the child after the fork() and close them. Unfortunately, there isn't an easy, portable, way to do that - the usual approach is to use getrlimit() to get the current value of RLIMIT_NOFILE and loop from 3 to that number, trying close() on each candidate.
If you are happy to be Linux-only, you can read the /proc/self/fd/ directory to determine the open file descriptors and close them (except 0, 1 and 2 - which should either be left alone or reopened to /dev/null).
