I have a test suite harness that runs test scripts (actually, the classes defined in them). As it iterates through the tests, it manipulates the Python logger so that the log messages are written to different files, one per test (class). This works fine for tests run sequentially, where I can control the handlers on the root logger so that all log messages (from whatever libraries the test classes may use) end up in the proper test log file.
But what I am really trying to figure out is how to run such tests in parallel (via threading or multiprocessing) such that each thread will have its own log file to place all such messages.
I believe that I still need to manipulate the root logger, because that is the only place both tests and the libraries they use will converge on to do all logging to a common place.
I was thinking that I could add a handler for each thread which would contain a log filter to only log from a particular thread, and that would get me close (haven't tried this yet, but seems possible in theory). And this would possibly be the full solution (if indeed such would work) except for one thing. I cannot tell test writers to not use threads themselves, in their tests. So if they did so, again, this solution would fail. I'm fine with test-internal threads all logging to the one file, but these new threads would fail to log to the file their parent thread is logging to. The filter doesn't know anything about them.
And I could be mistaken, but it seems that threading.Thread objects cannot determine their own parent thread? This precludes a better log handler filter that accepts messages generated in a thread or any of its child/descendant threads. (?)
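A minimal sketch of the filter idea above, assuming it is acceptable to wrap threading.Thread.__init__ so that each thread remembers which thread created it. The names parent_thread, ThreadTreeFilter and add_test_log are made up for illustration and are not part of the standard library. It relies on the fact that handlers and their filters run synchronously in the thread that emits the record.

```python
import logging
import threading

_original_init = threading.Thread.__init__


def _init_recording_parent(self, *args, **kwargs):
    _original_init(self, *args, **kwargs)
    # Thread.__init__ runs in the creating thread, so this is the parent.
    # "parent_thread" is a hypothetical attribute added by this sketch.
    self.parent_thread = threading.current_thread()


threading.Thread.__init__ = _init_recording_parent


class ThreadTreeFilter(logging.Filter):
    """Accept records emitted by `owner` or by any thread descended from it."""

    def __init__(self, owner):
        super().__init__()
        self.owner = owner

    def filter(self, record):
        thread = threading.current_thread()
        while thread is not None:
            if thread is self.owner:
                return True
            # Threads created before the patch (e.g. the main thread)
            # have no recorded parent, which ends the walk.
            thread = getattr(thread, "parent_thread", None)
        return False


def add_test_log(owner, path):
    """Attach a per-test file handler to the root logger."""
    handler = logging.FileHandler(path)
    handler.addFilter(ThreadTreeFilter(owner))
    logging.getLogger().addHandler(handler)
    return handler
```

The harness would create the Thread object for a test, call add_test_log(thread, path), and only then start the thread. Note that each thread keeps a reference to its parent, so long chains of short-lived threads stay in memory until the last descendant is released. This does not help with multiprocessing, where each process has its own root logger.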
Any suggestions about how to approach this would be great.
Thanks,
Bruce
Related
This is quite complex to explain, and it involves a lot of code which I cannot paste here.
But I have a test application which executes test cases through designated plugins.
When the app is executed, it creates a separate process (via multiprocessing) called the writer. This module handles all updates to a web page which holds all running information about the test cases and their state.
For this writer I also create an interface (WrIf). This interface holds a Queue to the writer thread together with a weakref.proxy().
When the test app starts executing its test cases, it creates a new multiprocessing process from which it calls the specific plugin. That means that the WrIf is "serialized" to this new process.
Every call to the WrIf checks whether the main writer thread is still running, and this is where I hit a problem: I get the following assertion when I try to call is_alive().
assert self._parent_pid == os.getpid(), 'can only test a child process'
AssertionError: can only test a child process
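For context, multiprocessing's Process.is_alive() contains exactly this assertion: it may only be called from the process that started the Process object, so it fails once that object has been serialized into a sibling process. Below is a sketch of one possible workaround, assuming the writer can be changed to maintain a shared multiprocessing.Event while it runs. The names writer_main and WriterInterface are hypothetical and do not come from the application described above.

```python
import multiprocessing


def writer_main(updates, alive):
    """Body of the writer process: consume updates until a None sentinel."""
    alive.set()
    try:
        while True:
            item = updates.get()
            if item is None:
                break
            # ... update the status web page with `item` here ...
    finally:
        # Cleared on normal exit and on exceptions, but not on a hard kill.
        alive.clear()


class WriterInterface:
    """Hypothetical stand-in for WrIf that never touches the Process object."""

    def __init__(self, updates, alive):
        self._updates = updates
        self._alive = alive

    def writer_alive(self):
        # Safe from any process that received the Event at Process creation.
        return self._alive.is_set()

    def send(self, item):
        if not self.writer_alive():
            raise RuntimeError("writer process is not running")
        self._updates.put(item)
```

The parent would create updates = multiprocessing.Queue() and alive = multiprocessing.Event(), start the writer with Process(target=writer_main, args=(updates, alive)), and pass a WriterInterface built from the same two objects in the args of each plugin process. The flag is only set once the writer has actually started, and it stays set if the writer is killed with a signal it cannot handle, so treat it as a hint rather than a guarantee.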
I can expand on this, but I think it will get muddy fast because the app is rather large and somewhat complex.
Regards
I am building a simple application to download a set of XML files and parse them into a database using the async module (https://npmjs.org/package/node-async) for flow control. The overall flow is as follows:
1. Download list of datasets from API (single Request call)
2. Download metadata for each dataset to get link to XML file (async.each)
3. Download XML for each dataset (async.parallel)
4. Parse XML for each dataset into JSON objects (async.parallel)
5. Save each JSON object to a database (async.each)
In effect, for each dataset there is a parent process (2) which sets off a series of asynchronous child processes (3, 4, 5). The challenge I am facing is that so many parent processes fire before all of the children of any particular parent are complete that the child processes seem to get queued up in the event loop. It then takes a long time for all of the child processes of a particular parent to resolve and allow garbage collection to clean everything up. As a result, even though the program doesn't appear to have any memory leaks, memory usage is still too high, and the program ultimately crashes.
One solution which worked was to make some of the child processes synchronous so that they can be grouped together in the event loop. However, I have also seen an alternative solution discussed here: https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes parent processes into a queue and only allows a certain number to be running at once. My question then is does anyone know of a more robust module for handling this type of queueing, or any other viable alternative for handling this kind of flow control. I have been searching but so far no luck.
Thanks.
I decided to post this as an answer:
Don't launch all of the processes at once. Let the callback of one request launch the next one. The overall work is still asynchronous, but each request gets run in series. You can then pool up a certain number of the connections to be running simultaneously to maximize I/O throughput. Look at async.eachLimit and replace each of your async.each calls with it.
Your async.parallel calls may be causing issues as well.
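To illustrate the idea of capping how many items are in flight, here is a rough Python analogue using asyncio. It is not the Node async library's API; the function name each_limit is borrowed only as a label, and the worker is assumed to be a coroutine function.

```python
import asyncio


async def each_limit(items, limit, worker):
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        # Items beyond the limit wait here until a slot frees up.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(item) for item in items))
```

With this shape, the download, parse and save steps for one dataset can live inside a single worker, so a dataset is finished and its memory released before too many others have started.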
I'm using the each() method of the async lib and experiencing some very odd (and inconsistent) errors that appear to be file handle errors when I attempt to log to a file from within the child processes.
The array that I'm handing to this method frequently has hundreds of items and I'm curious if Node is having trouble running out of available file handles as it tries to log to file from within all these simultaneous processes. The problem goes away when I comment out my log calls, so it's definitely related to this somehow, but I'm having a tough time tracking down why.
All the logging is trying to go into a single file... I'm entirely unclear on how that works given that each write (presumably) blocks, which makes me wonder how all these simultaneous processes are able to run independently if they're all sitting around waiting on the file to become available to write to.
Assuming that this IS the source of my troubles, what's the right way to log from something like async.each(), which runs N processes at once?
I think you should have some adjustable limit to how many concurrent/outstanding write calls you are going to do. No, none of them will block, but I think async.eachLimit or async.queue will give you the flexibility to set the limit low and be sure things behave and then gradually increase it to find out what resource constraints you eventually bump up against.
In my nlog configuration, I've set
<targets async="true">
with the understanding that all logging now happens asynchronously to my application workflow. (and I have noticed a performance improvement, especially on the Email target).
This has me thinking about log sequence though. I understand that with async, one has no guarantee of the order in which the OS will execute the async work. So if, in my web app, multiple requests come in to the same method, each logging their occurrence to NLog, does this really mean that the sequence in which the events appear in my log target will not necessarily be the sequence in which the log method was called by the various requests?
If so, is this just a consequence of async that one has to live with? Or is there something I can do to have my logs reflect the correct sequence?
Unfortunately this is something you have to live with. If it is important to maintain the sequence you'll have to run it synchronously.
But if it is possible for you to manually maintain a sequence number in the log message, it could be a solution.
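The sequence-number idea can be sketched as follows. This is a Python illustration of the technique rather than NLog configuration, and SequenceFilter and the seq attribute are names made up for the example. The number is assigned when the log call is made, before any queueing, so the original call order can be reconstructed later by sorting on it.

```python
import itertools
import logging
import threading


class SequenceFilter(logging.Filter):
    """Stamp each record with a process-wide, monotonically increasing number."""

    def __init__(self):
        super().__init__()
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def filter(self, record):
        with self._lock:
            # "seq" is a hypothetical attribute; use it as %(seq)d in a format.
            record.seq = next(self._counter)
        return True


def make_sequenced_logger(name):
    """Return a logger whose records all carry a `seq` attribute."""
    logger = logging.getLogger(name)
    logger.addFilter(SequenceFilter())
    return logger
```

A filter attached to a logger only sees records logged directly through that logger, so this sketch assumes the application logs through the logger returned by make_sequenced_logger.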
I know this is old and I'm just ramping up on NLog, but if you only see a performance increase for the email target, you may want to enable async for just the email target?
Activating <targets async="true"> does not make NLog reorder the LogEvent sequence. It just activates an internal queue, which handles bursts better and enables batch writing.
If a single thread writes 1000 LogEvents, they will NOT become out-of-order because of the async handling.
If 10 threads each write 1000 LogEvents, their logging will mix together. But the LogEvents of an individual thread will be in the CORRECT order.
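The same property can be demonstrated with Python's standard QueueHandler and QueueListener, which also put a single FIFO queue between the callers and the writer. This is an analogue under that assumption, not NLog's implementation, and run_demo and ListHandler are names invented for the example.

```python
import logging
import logging.handlers
import queue
import threading


class ListHandler(logging.Handler):
    """Collect (thread name, message) pairs in arrival order."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.threadName, record.getMessage()))


def run_demo(thread_count=4, events_per_thread=100):
    log_queue = queue.Queue()
    sink = ListHandler()
    listener = logging.handlers.QueueListener(log_queue, sink)
    logger = logging.getLogger("async_order_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    queue_handler = logging.handlers.QueueHandler(log_queue)
    logger.addHandler(queue_handler)
    listener.start()

    def worker():
        for i in range(events_per_thread):
            logger.info("%d", i)

    threads = [threading.Thread(target=worker, name=f"w{n}")
               for n in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    listener.stop()  # drains the queue before returning
    logger.removeHandler(queue_handler)
    return sink.messages
```

The returned list interleaves the threads in whatever order they reached the queue, but filtering it by thread name gives each thread's messages in the order that thread logged them.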
But be aware that <targets async="true"> uses overflowAction=Discard by default. See also: https://github.com/nlog/NLog/wiki/AsyncWrapper-target#async-attribute-will-discard-by-default
For more details about performance, see also: https://github.com/NLog/NLog/wiki/performance
Forget for a second the question of why on earth would you do such a thing - if, for whatever reason, two FileAppenders are configured with the same file - will this setup work?
Log4j's FileAppender does not allow two JVMs to write to the same file. If you try, you'll get a corrupt log file. However, logback, log4j's successor, allows two appenders in prudent mode, even in different JVMs, to write to the same file.
It doesn't directly answer your question, but log4*net*'s FileAppender has a LockingModel attribute that you can set to only lock when the file is actually in use. So if you had two FileAppenders working in the same thread with MinimalLock set, it would probably work perfectly fine. On different threads, you might hit deadlock once in a while.
The FileAppender supports pluggable file locking models via the LockingModel property. The default behavior, implemented by FileAppender.ExclusiveLock is to obtain an exclusive write lock on the file until this appender is closed. The alternative model, FileAppender.MinimalLock, only holds a write lock while the appender is writing a logging event.
A cursory web search didn't turn up any useful results about implementing MinimalLock in log4j.
From Log4j FAQ a3.3
How do I get multiple process to log to the same file?
You may have each process log to a SocketAppender. The receiving SocketServer (or SimpleSocketServer) can receive all the events and send them to a single log file.
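The pattern the FAQ describes (every process sends its events over a socket to one receiver, which owns the file) can be sketched with Python's standard library, where logging.handlers.SocketHandler plays the role of the sender. This is an illustration of the pattern, not log4j's SocketServer; LogRecordStreamHandler and start_receiver are names chosen for the example. SocketHandler sends each record as a 4-byte big-endian length followed by a pickled dict.

```python
import logging
import logging.handlers
import pickle
import socketserver
import struct
import threading


class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Read length-prefixed pickled records from one sender connection."""

    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break  # sender closed the connection
            (length,) = struct.unpack(">L", header)
            payload = self.rfile.read(length)
            # Unpickling is only safe if every sender is trusted.
            record = logging.makeLogRecord(pickle.loads(payload))
            self.server.sink.handle(record)


def start_receiver(sink, host="127.0.0.1", port=0):
    """Serve in a background thread; `sink` is the one handler owning the file."""
    server = socketserver.ThreadingTCPServer((host, port), LogRecordStreamHandler)
    server.daemon_threads = True
    server.sink = sink
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The receiver would be started once with something like start_receiver(logging.FileHandler("combined.log")), where the file name is a placeholder, and each logging process would add logging.handlers.SocketHandler(host, port) to its logger instead of a file handler.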
As to what that actually means I will be investigating myself.
I also found the following workaround on another SO question:
Code + Example