Stop tensorflow training process programmatically - multithreading

I'm writing a platform for running trainings on a server, and I need a way to halt the training process via an API. That means when I receive a request to a REST controller, the main thread needs to stop a training run that can take several days.
I see that TensorFlow has the Coordinator class and the EarlyStopping callback, but I don't see anything that can stop the training on demand.
Something like: model.stop()

Yes, you can, but it works a bit differently.
You can't tell the model to stop, but the model can ask you whether it should stop. That is done using a callback.
Here is a simple thing that you could do:
Implement your own callback (documentation here). Your callback could check for, let's say, a file in a folder, for example "../.../stop_training.txt".
Add your callback to your model with the event you want to use.
Create an API endpoint such as "https://..../stop-training" which simply creates that stop_training.txt file.
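A minimal sketch of such a callback, assuming tf.keras and a sentinel file named stop_training.txt in the working directory (the class name StopFileCallback is just an illustration):

import os
import tensorflow as tf

class StopFileCallback(tf.keras.callbacks.Callback):
    """Stops training as soon as a sentinel file shows up on disk."""
    def __init__(self, stop_file="stop_training.txt"):
        super().__init__()
        self.stop_file = stop_file

    def on_batch_end(self, batch, logs=None):
        if os.path.exists(self.stop_file):
            # Keras checks this flag and ends fit() gracefully; depending on the
            # version this takes effect after the current batch or the current epoch.
            self.model.stop_training = True

# model.fit(x_train, y_train, epochs=100, callbacks=[StopFileCallback()])

The REST endpoint behind /stop-training then only has to create stop_training.txt; the training loop notices the file and exits on its own.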

Related

In a flask application how can you introduce asynchronicity?

In my Flask application there are some REST endpoints which take too much time to respond. When invoked, each mostly carries out some CRUD operations in the database, some of which could be made asynchronous. I have no issue if the response is sent to the client while the database inserts keep going in the background. I wanted to use asyncio, but I heard that Flask does not support asyncio. In that case I am left with just the choice of threading. Any suggestions? I do not have the option of dumping Flask. I do not want to use Celery, as it would be too big a change.
When using threading, at some places it works, at others it does not. It looks like it is not finding the application context.
RuntimeError: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
In the thread, my first line of code is the following, and this is where it fails:
user: AuthUser = g.logged_in_user
Edit:
In the view function I have to do multiple things. Some of them are chained, so they cannot be made asynchronous: they need to happen in order, because the result returned from one database call is used in invoking the next call, and the final result is used in composing the JSON output the method returns. There is only one database call which is independent of the others, and being an insertion of about 1K records, it is the biggest contributor to slowing down the API response.
If I place this heavy method at the end of the view function, it says that the psycopg2 connection has already closed, though I never closed the connection explicitly; maybe it is only because the view function has already returned the JSON payload.
If I place the heavy method at the beginning of the view function it works.
Edit 2:
From the view function I pass the application context and the user. Instead of passing the database connection to the thread, I connect to the PostgreSQL database from inside the thread.
threading.Thread(target=set_responder_contacts,
                 args=(user, template_id,),
                 kwargs={'app': current_app._get_current_object()}).start()
Code snippet from the function the thread invokes:
def set_responder_contacts(user: AuthUser, template_id: int, app=None):
    with app.app_context():
        try:
            db = dbConn()
            # database insertion code
asyncio wouldn't directly help here.
If you trust your server process to remain up for the duration of the background function, then sure, just spin up a thread and do the background work there:
def heavy_work(some_id):
    pass

@app.post(...)
def view(...):
    some_id = create_thing(...)
    threading.Thread(target=heavy_work, args=(some_id,)).start()
    return "Okay (though processing in the background)"
There are caveats:
As I alluded to earlier, if the WSGI server process is killed for some reason (for instance, a memory or request count limit is exceeded, or it outright crashes), the background operation will be taken with it.
If the heavy operation is heavily CPU-bound, it may affect the performance of other requests being served by the same server process.
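If the background work also needs the application context (the RuntimeError quoted in the question), one option is to capture the real app object in the view and push a context inside the thread, much like the asker's Edit 2. A rough sketch, where do_inserts and the /things route are hypothetical stand-ins:

from threading import Thread
from flask import Flask, current_app

app = Flask(__name__)

def do_inserts(some_id):
    # Hypothetical stand-in for the ~1K-row database insertion.
    pass

def heavy_work(app, some_id):
    # Push an application context so code that relies on Flask (db handles,
    # extensions, ...) also works inside this worker thread.
    with app.app_context():
        do_inserts(some_id)

@app.post("/things")
def view():
    some_id = 42  # stand-in for the id produced by the synchronous part of the view
    # current_app is only a proxy; hand the real application object to the thread.
    Thread(target=heavy_work,
           args=(current_app._get_current_object(), some_id)).start()
    return "Okay (though processing in the background)"

Note that this only restores the application context; anything tied to the request (such as g.logged_in_user) still has to be read in the view and passed to the thread as a plain argument, as the asker does.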

Keras callback execution order?

I'm trying to understand the Keras callback execution order. Suppose we pass multiple callbacks to model.fit(), each of which has an on_epoch_end method. When we reach the end of an epoch, in which order will the callbacks be executed? Does the main process spawn multiple child processes and assign one to each callback?
It'd be nice if the documentation were more detailed.
They should be called in the order you've added them.
If you look at the implementation of the CallbackList class, which manages your callbacks, you will see that it iterates over them in order of appearance.
For example here in on_epoch_end.
Also, this is how the class is used in the training loop, and it does not seem that a separate process is spawned.
They will be executed in the order they are specified in your callbacks list inside model.fit(). A child process is not created by the parent to perform the execution.
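A quick way to convince yourself of the ordering is to register two trivial callbacks that just print their names. A small sketch with tf.keras and throwaway data (the OrderProbe name and the toy model are purely illustrative):

import numpy as np
import tensorflow as tf

class OrderProbe(tf.keras.callbacks.Callback):
    """Prints its own name at the end of every epoch so the call order is visible."""
    def __init__(self, name):
        super().__init__()
        self.name = name

    def on_epoch_end(self, epoch, logs=None):
        print(f"epoch {epoch} finished -> {self.name}")

# Tiny throwaway model and data; only the callback ordering matters here.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="sgd", loss="mse")
x, y = np.random.rand(32, 4), np.random.rand(32, 1)

# "first" always prints before "second": CallbackList iterates in list order.
model.fit(x, y, epochs=2, verbose=0,
          callbacks=[OrderProbe("first"), OrderProbe("second")])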

Tensorflow tf.estimator.Estimator with QueueRunner

I'm rewriting my code to use tf.estimator.Estimator as an encapsulating object for my models.
The problem is: I don't see how a typical input pipeline fits into the picture.
My input pipeline uses queues, which are coordinated by tf.train.Coordinator.
To satisfy the tf.estimator.Estimator requirements, I create the whole "input graph" in the input_fn function that is passed to the estimator when calling:
Estimator.train(...)
It looks like this
def input_fn(f):
    # ...create input graph...
    qr = tf.train.QueueRunner(queue, [operations...])
    tf.train.add_queue_runner(qr)
The problem is: in such a scenario, how can I start and stop the queue runners, respectively at the start and end of Estimator.train(...)?
Starting
I figured out that for starting the queues I can pass an init_fn that does it to the Scaffold object passed to the Estimator.
However, I do not know how to join the threads and close them gracefully.
Is there a reference architecture for a proper threaded input pipeline when using tf.estimator?
Is the Estimator class even ready to work with queues?
Estimator uses tf.train.MonitoredTrainingSession, which handles starting and joining the queue-runner threads. You can check a couple of example input_fns, such as tf.estimator.inputs.* and tf.contrib.learn.io.read*.
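For reference, a sketch of the route the answer points at, using the TF 1.x helper tf.estimator.inputs.numpy_input_fn (the arrays and the LinearClassifier are placeholders). The helper registers its own QueueRunner, and the MonitoredTrainingSession created inside Estimator.train() starts those threads on entry and stops/joins them when training finishes, so no explicit Coordinator handling is needed:

import numpy as np
import tensorflow as tf

# Hypothetical in-memory dataset standing in for the real input pipeline.
x_train = np.random.rand(1000, 4).astype(np.float32)
y_train = np.random.randint(0, 2, size=1000)

# Builds the enqueue ops and registers a QueueRunner internally (TF 1.x API).
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": x_train},
    y=y_train,
    batch_size=32,
    num_epochs=None,
    shuffle=True)

feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]
estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns)

# MonitoredTrainingSession (used under the hood) starts the queue-runner
# threads before the training loop and joins them when training ends or fails.
estimator.train(input_fn=train_input_fn, steps=100)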

Testing background processes in nodejs (using tape)

This is a general question about testing, but I will frame it in the context of Node.js. I'm not as concerned with a particular technology, but it may matter.
In my application, I have several modules that are called upon to do work when my web server receives a request. In the case of some of these modules, I close the request before I call upon them.
What is a good way to test that these modules are doing what they are supposed to do?
The advice here for RSpec is to mock out the work these modules are doing and just ensure that the appropriate methods are being called. This makes sense to me, but in Node.js, since my modules are not global, I don't think I can mock out functions without changing my program architecture so that every instance receives instances of the objects it needs [1].
[1] This is a well known programming paradigm, but I cannot remember its name right now.
The other option I see is to use setTimeout and take my best guess at when these modules are done with their work.
Neither of these seems ideal.
Am I missing something? Are background processes not tested?
Since you are speaking of integration tests of these background components, a few strategies come to mind.
Take all the asynchronicity out of their operation for test mode. I'm imagining you have some sort of queueing process (that could be a faulty assumption), you toss work into the queue, and then your modules pick up that work and do their task. You could rework your test harness such that the test harness stands in as the queuing mechanism and you effectively get direct control over when the modules execute.
Refactor your modules to take some sort of next callback function. They would end up functioning a bit like Express's middleware layer or how async's each function works, but into each module you'd pass some callback that you call when that module's task is complete. Once all of the modules have reported in, then you can check the state of the program.
Exactly what you already suggested-- wait some amount of time, and if it still isn't done, consider that a failure. Mocha sort of does that, in that if a given test is over a definable threshold, then it's a failure. I don't like this way though, because if you add more tests, they all have to wait the same amount of time.

QAbstractItemModel Lazy Loading locks application

I have implemented canFetchMore, hasChildren and fetchMore in order to allow my model to be lazily loaded. It's very simple and based on Qt's simple tree model example: http://doc.qt.io/archives/qt-4.7/itemviews-simpletreemodel.html
My problem is that in my application fetching children is not a very quick operation, it involves a few seconds of delay on the server side while it figures out who the children actually are.
I'm unsure how to deal with that. I can't have my application locking up for several seconds every time someone expands a node. I don't know how to go about making this happen in the background. If I was to create a sub-process or thread to actually do the work of retrieving the children and updating the client side data structure, how would I go about telling the model that this had successfully completed (and for the node to finally expand).
Also, is there a way to show that the node is currently in the process of loading the data in the background?
Apologies if these are stupid questions; GUI programming is still a bit of a mystery to me and I've never used Qt before.
For the record, I'm using Python, but if answers are given in C++ I can understand them.
Thanks
If I was to create a sub-process or thread to actually do the work of retrieving the children and updating the client side data structure, how would I go about telling the model that this had successfully completed (and for the node to finally expand).
You can use signals and slots. In the thread where you retrieve the data, you emit a custom signal like someDataAvailable(YourdataType), and then in the GUI you handle this signal with a slot, something like handleDataReadySignal(YourdataType). The signal passes along the object that you give it when emitting. You would then update the GUI and the list in the handleDataReadySignal slot. Of course, you need to connect the slot to the signal, preferably in the constructor of the window/dialog to which the list is attached.
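A rough PyQt5 sketch of that pattern, where fetch_children_from_server stands in for the slow server call and ChildFetcher is just an illustrative name. The worker thread emits the signal; the connected slot runs on the GUI thread, where it can safely call beginInsertRows/endInsertRows:

from PyQt5.QtCore import QThread, pyqtSignal

def fetch_children_from_server(node_id):
    # Hypothetical stand-in for the slow (several seconds) server round trip.
    return []

class ChildFetcher(QThread):
    """Runs the slow server call off the GUI thread and reports back via a signal."""
    dataReady = pyqtSignal(object, list)  # (node identifier, fetched children)

    def __init__(self, node_id, parent=None):
        super().__init__(parent)
        self.node_id = node_id

    def run(self):
        children = fetch_children_from_server(self.node_id)
        self.dataReady.emit(self.node_id, children)

# Somewhere in the model/view code (connect before starting, e.g. in the
# window's constructor, as suggested above):
#
#   self.fetcher = ChildFetcher(node_id)
#   self.fetcher.dataReady.connect(self.handle_data_ready)  # slot runs on GUI thread
#   self.fetcher.start()
#
# def handle_data_ready(self, node_id, children):
#     parent_index = self.index_for_node(node_id)          # hypothetical lookup
#     self.beginInsertRows(parent_index, 0, len(children) - 1)
#     # ...store children in the client-side data structure...
#     self.endInsertRows()

While the fetch is in flight, the model can mark the node as loading (for instance by showing a placeholder child row or changing its DisplayRole text) and clear that marker in the same slot.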
