Multithreaded model building in Rails 4 with JRuby

I'm trying to speed up building a very large number of models (300+) at once by multithreading the work, so that the records for this table can be saved to the database faster in my Rails 4 app.
I tried to move as many object references as possible outside of the threads, using memo variables and the like, but I'm not sure what else to try.
My code is below. I tried to keep the multithreaded part as small as possible, but I keep running into circular dependency errors and/or not all of the records are created. Any help is appreciated.
Example 1:
def create
  @checklist = Checklist.new(checklist_params)
  respond_to do |format|
    if @checklist.save
      tasks = Task.where(:active => true)
      checklist_date_as_time = Time.parse(@checklist.date.to_s).at_beginning_of_day
      checklist_id = @checklist.id.to_i
      threads = []
      ActiveRecord::Base.transaction do
        tasks.each do |task|
          time = task.start_time
          begin
            threads << Thread.new do
              complete_time = checklist_date_as_time + time.hour.hours + time.min.minutes
              task.responses.build(task_start_time: complete_time, must_complete_by: complete_time + task.time_window, checklist_id: checklist_id, task_id: task.id)
            end
          end while (time += task.frequency.minutes) < task.end_time
          threads.map(&:join)
          task.save
        end
      end
      format.html { redirect_to @checklist, notice: 'Checklist was successfully created.' }
      format.json { render :show, status: :created, location: @checklist }
    else
      format.html { render :new }
      format.json { render json: @checklist.errors, status: :unprocessable_entity }
    end
  end
end

ActiveRecord is not "thread-safe": the behaviour and correctness of a single record instance shared between threads is not defined or guaranteed by the framework.
The easiest answer to your question is to perform the whole tasks = ...; ActiveRecord::Base.transaction do ... work in one background thread (frameworks such as DelayedJob might help), so that the "heavy" computation is not part of the response cycle.
Also be aware that using multiple threads might cause you to use multiple connections, essentially draining the AR connection pool. It also means that (depending on what happens during task.responses.build) ActiveRecord::Base.transaction { ... } might not do what you intend, because multiple connection objects are involved.

Related

Main thread doesn't wait background thread to finish in Swift

Here's my problem: I'm doing background work where I parse some JSON and write some objects into my Realm, and on the main thread I try to update the UI (reloading the table view, which is backed by an array of objects). But when I reload the UI, my tableView doesn't update, as if my Realm hadn't been updated. I have to reload my view to see the updates. Here's my code:
if (Realm().objects(Objects).filter("...").count > 0)
{
    var results = Realm().objects(Objects) // I get the existing objects but it's empty
    tableView.reloadData()
}
request(.GET, url).responseJSON() {
    (request, response, data, error) in
    let priority = DISPATCH_QUEUE_PRIORITY_DEFAULT
    dispatch_async(dispatch_get_global_queue(priority, 0)) {
        // Parsing my JSON
        Realm().write {
            Realm().add(object)
        }
        dispatch_sync(dispatch_get_main_queue()) {
            // Updating the UI
            if (Realm().objects(Objects).filter("...").count > 0)
            {
                results = Realm().objects(Objects) // I get the existing objects but it's empty
                tableView.reloadData()
            }
        }
    }
}
I must be doing something wrong with my threads, but I can't find what. Does anyone know what's wrong?
Thank you for your answer!
A workflow like this makes more sense to me for your case:
let priority = DISPATCH_QUEUE_PRIORITY_DEFAULT
dispatch_async(dispatch_get_global_queue(priority, 0)) {
    // Parsing my JSON
    Realm().write {
        Realm().add(object)
        dispatch_sync(dispatch_get_main_queue()) {
            // Updating the UI
            if (Realm().objects(Objects).filter("...").count > 0)
            {
                results = Realm().objects(Objects) // I get the existing objects but it's empty
                tableView.reloadData()
            }
        }
    }
}
NOTE: your original workflow has a timing problem: the UI might be updated before the write block has executed, which is why your UI looks stale. The idea above synchronises the two tasks so that they run in the intended order.
You are getting some new objects and storing them into "results".
How is tableView.reloadData() supposed to access that variable? You must change something that your tableView delegate will access.
PS. Every dispatch_sync() is a potential deadlock. You are using one that is absolutely pointless. Avoid dispatch_sync unless you have a very, very good reason to use it.

(Leadtools) Rastercodecs.load is thread safe?

I want to convert a PDF to images. I am using Leadtools and to increase the speed, I am using multi-threading in the following way.
string multiPagePDF = @"Manual.pdf";
string destFileName = @"output\Manual";
Task.Factory.StartNew(() =>
{
    using (RasterCodecs codecs = new RasterCodecs())
    {
        CodecsImageInfo info = codecs.GetInformation(multiPagePDF, true);
        ParallelOptions po = new ParallelOptions();
        po.MaxDegreeOfParallelism = 5;
        // The single codecs instance is shared by every parallel iteration
        Parallel.For(1, info.TotalPages + 1, po, i =>
        {
            RasterImage image = codecs.Load(multiPagePDF, i);
            codecs.Save(image, destFileName + i + ".png", RasterImageFormat.Png, 0);
        });
    }
});
Is this a thread-safe manner? Will it result in unexpected output?
I tried this a few times and there were instances when a specific page appeared twice in the output images.
Solution
According to Leadtools online chat support (which is very helpful, by the way), Rastercodecs.load is NOT thread safe and the above code would produce unexpected output (in my case, Page 1 occurred twice in the output set of images). The solution is to define the codecs variable inside the Parallel.For body so that each iteration uses its own RasterCodecs instance.
Amyn,
As you found out, the correct way to use the RasterCodecs object in this case is this:
Task.Factory.StartNew(() =>
{
    using (RasterCodecs codecs = new RasterCodecs())
    {
        CodecsImageInfo info = codecs.GetInformation(multiPagePDF, true);
        ParallelOptions po = new ParallelOptions();
        po.MaxDegreeOfParallelism = 5;
        Parallel.For(1, info.TotalPages + 1, po, i =>
        {
            using (RasterCodecs codecs2 = new RasterCodecs())
            {
                RasterImage image = codecs2.Load(multiPagePDF, i);
                codecs2.Save(image, destFileName + i + ".png", RasterImageFormat.Png, 0);
            }
        });
    }
});
This gives you the same speed benefits when running on a multi-core processor without causing any conflicts between concurrent threads.
The LEADTOOLS RasterCodecs.Load() and RasterCodecs.Save() methods are thread-safe. The reason for creating multiple instances of the RasterCodecs class is that the class internally uses structures that hold many different loading and saving options for files. Using these structures (where these options are changed) across multiple threads can cause unpredictable results. One such property in the loading options structure is the page number. For this reason, using separate instances of this class is recommended.

Is this possible using Reactive Framework?

I have a list of objects in my C# 4.0 app. Suppose this list contains 100 objects of a Student class. Is there any way in the Reactive Framework to process them in parallel, 10 objects at a time?
Each student object runs a method that is somewhat time-consuming, taking about 10 to 15 seconds. So the first time through, I want to take the first 10 student objects from the list, wait for all 10 to finish their work, then take the next 10, and so on until every item in the list has been processed.
I have a List<Student> with a count of 100.
First, take 10 items from the list and call each object's long-running method in parallel.
Receive each object's return value and update the UI [subscription part].
The next round starts only once the first 10 have completed and released all their memory.
Repeat the same process for all the items in the list.
How do I catch the errors in each process?
How do I release each student object's resources and other resources from memory?
What is the best way to do all of this in the Reactive Framework?
This version will always have 10 students running at a time. As a student finishes, another will start. And as each student finishes, you can handle any error it had and then clean it up (this will happen before the next student starts running).
students
    .ToObservable()
    .Select(student => Observable.Defer(() => Observable.Start(() =>
    {
        // do the work for this student, then return an anonymous object holding the student plus any error
        try
        {
            student.DoWork();
            return new { Student = student, Error = (Exception)null };
        }
        catch (Exception e)
        {
            return new { Student = student, Error = e };
        }
    })))
    .Merge(10) // let 10 students be executing in parallel at all times
    .Subscribe(studentResult =>
    {
        if (studentResult.Error != null)
        {
            // handle error
        }
        studentResult.Student.Dispose(); // if your Student is IDisposable and you need to free it up.
    });
This is not exactly what you asked since it does not finish the first batch of 10 before starting the next batch. This always keeps 10 running in parallel. If you really want batches of 10 I'll adjust the code for that.
My attempt....
var students = new List<Student>();
{....}
var cancel = students
    .ToObservable(Scheduler.Default)
    .Window(10)
    .Merge(1)
    .Subscribe(tenStudents =>
    {
        tenStudents.ObserveOn(Scheduler.Default)
            .Do(x => DoSomeWork(x))
            .ObserveOnDispatcher()
            .Do(student => UpdateUI(student))
            .Subscribe();
    });
This sounds to me very much like a problem for the TPL. You have a known set of data at rest. You want to partition some heavy processing to run in parallel, and you want to be able to batch-process the load.
I don't see anywhere in your problem a source that is async, a source that is data in motion, or a consumer that needs to be reactive. This is my rationale for suggesting that you use the TPL instead.
On a separate note, why the magic number of 10 to be processed in parallel? Is this a business requirement, or potentially an attempt to optimize performance? Normally it is best practice to let the thread pool work out what is best for the client CPU based on the number of cores and the current load. I imagine this becomes ever more important with the large variations in devices and their CPU structures (single core, multi core, many core, low-power/disabled cores, etc.).
Here is one way you could do it in LinqPad (but note the lack of Rx):
void Main()
{
    var source = new List<Item>();
    for (int i = 0; i < 100; i++) { source.Add(new Item(i)); }
    // Put into batches of ten, but only then pass on the item, not the temporary tuple construct.
    var batches = source.Select((item, idx) => new { item, idx })
                        .GroupBy(tuple => tuple.idx / 10, tuple => tuple.item);
    // Process one batch at a time (serially), but process the items of the batch in parallel (concurrently).
    foreach (var batch in batches)
    {
        "Processing batch...".Dump();
        var results = batch.AsParallel().Select(item => item.Process());
        foreach (var result in results)
        {
            result.Dump();
        }
        "Processed batch.".Dump();
    }
}
public class Item
{
    private static readonly Random _rnd = new Random();
    private readonly int _id;
    public Item(int id)
    {
        _id = id;
    }
    public int Id { get { return _id; } }
    public double Process()
    {
        var threadId = Thread.CurrentThread.ManagedThreadId;
        string.Format("Processing on thread:{0}", threadId).Dump(Id);
        var loopCount = _rnd.Next(10000, 1000000);
        Thread.SpinWait(loopCount);
        return _rnd.NextDouble();
    }
    public override string ToString()
    {
        return string.Format("Item:{0}", _id);
    }
}
I would be interested to find out if you do have a data-in-motion problem or a reactive consumer problem, but have just "dumbed down" the question to make it easier to explain.

Best way to deal with document locking in XPages?

What is the best way to deal with document locking in XPages? Currently we use the standard soft locking and it seems to work fairly well in the Notes client.
In XPages I considered using the "Allow Document Locking" feature, but I am worried that people would close the browser without using a close or save button, and then the lock would never be cleared.
Is there a way to clear the locks when the user has closed their session? I don't see any such event.
Or is there an easier way to have document locking?
I realize I can clear the locks using an agent, but when should it run? I would think sometime at night, when I can be fairly certain the lock is no longer really active.
Here is code I'm using:
/* DOCUMENT LOCKING */
/*
use the global object "documentLocking" with:
.lock(doc) -> locks a document
.unlock(doc) -> unlocks a document
.isLocked(doc) -> returns true/false
.lockedBy(doc) -> returns name of lock holder
.lockedDT(doc) -> returns datetime stamp of lock
*/
function ynDocumentLocking() {
/*
a lock is an entry in the application scope
with key = "$ynlock_"+UNID
containing an array with
(0) = username of lock holder
(1) = timestamp of lock
*/
var lockMaxAge = 60 * 120; // in seconds, default 120 min
this.getUNID = function(v) {
if (!v) return null;
if (typeof v == "NotesXspDocument") return v.getDocument().getUniversalID();
if (typeof v == "string") return v;
return v.getUniversalID();
}
/* puts a lock into application scope */
this.lock = function(doc:NotesDocument) {
var a = new Array(1);
a[0] = @UserName();
a[1] = @Now();
applicationScope.put("$ynlock_"+this.getUNID(doc), a);
// print("SET LOCK "+"$ynlock_"+doc.getUniversalID()+" / "+a[0]+" / "+a[1]);
}
/* removes a lock from the application scope */
this.unlock = function(doc:NotesDocument) {
applicationScope.put("$ynlock_"+this.getUNID(doc), null);
//print("REMOVED LOCK for "+"$ynlock_"+doc.getUniversalID());
}
this.isLocked = function(doc:NotesDocument) {
try {
//print("ISLOCKED for "+"$ynlock_"+doc.getUniversalID());
// check how old the lock is
var v = applicationScope.get("$ynlock_"+this.getUNID(doc));
if (!v) {
//print("no lock found -> return false");
return false;
}
// if lock holder is the current user, treat as not locked
if (v[0] == @UserName()) {
//print("lock holder = user -> not locked");
return false;
}
var dLock:NotesDateTime = session.createDateTime(v[1]);
var dNow:NotesDateTime = session.createDateTime(@Now());
// diff is in seconds
//print("time diff="+dNow.timeDifference(dLock)+" dLock="+v[1]+" now="+@Now());
// if diff > lockMaxAge seconds, treat the lock as expired (not locked)
if (dNow.timeDifference(dLock) > lockMaxAge) {
// print("LOCK is older than maxAge "+lockMaxAge+" -> returning false");
return false;
}
//print("return true");
return true;
// TODO: check how old the lock is
} catch (e) {
print("ynDocumentLocking.isLocked: "+e);
}
}
this.lockedBy = function(doc:NotesDocument) {
try {
var v = applicationScope.get("$ynlock_"+this.getUNID(doc));
if (!v) return "";
//print("ISLOCKEDBY "+"$ynlock_"+doc.getUniversalID()+" = "+v[0]);
return v[0];
} catch (e) {
print("ynDocumentLocking.lockedBy: "+e);
}
}
this.lockedDT = function(doc:NotesDocument) {
try {
var v = applicationScope.get("$ynlock_"+this.getUNID(doc));
if (!v) return "";
return v[1];
} catch (e) {
print("ynDocumentLocking.lockedDT: "+e);
}
}
}
var documentLocking = new ynDocumentLocking();
You could take a page from the way WebDAV works. There, a servlet manages a "lock list" of locked documents. The locks automatically expire after 10 minutes. Locks can be renewed or terminated through calls. So when you edit a document you would request a lock, then kick off a CSJS timer that calls the relocking function every 8 minutes (so you have some margin for error), and the postSave calls the unlock (unless you stay in edit mode).
If a user closes the browser, the document is automatically unlocked after 10 minutes. Since you are free in how you implement the locking function, you can capture user/location and use that information in the "lock failed" display (you could even push that further and let the original author know about it, or offer a "retry" option).
It isn't simple to implement, but once implemented it is simple to use.
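As a rough illustration of the lease idea above, here is a minimal Java sketch of an in-memory lock list whose entries expire unless they are renewed. The class name ExpiringLockRegistry, its method names, and the choice to pass the current time in as a parameter are all assumptions made for this example; it is not the WebDAV servlet itself, and how it would be exposed to the XPage (for instance as a managed bean) is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lock list: each lock is a lease that expires unless it is renewed. */
class ExpiringLockRegistry {
    /** Lease length: ten minutes, as in the scheme described above. */
    static final long LEASE_MILLIS = 10 * 60 * 1000L;

    /** One entry in the lock list: who holds the lock and when the lease runs out. */
    static final class Lease {
        final String owner;
        final long expiresAt;

        Lease(String owner, long expiresAt) {
            this.owner = owner;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();

    /**
     * Acquires the lock, or renews it if the caller already holds it.
     * Returns true if the caller holds the lock afterwards, false if
     * another user still holds a live lease.
     */
    boolean lockOrRenew(String unid, String user, long nowMillis) {
        Lease result = leases.compute(unid, (key, current) -> {
            boolean free = current == null || current.expiresAt <= nowMillis;
            if (free || current.owner.equals(user)) {
                return new Lease(user, nowMillis + LEASE_MILLIS);
            }
            return current; // someone else still holds a live lease
        });
        return result.owner.equals(user);
    }

    /** Releases the lock, but only if the caller is the one holding it. */
    void unlock(String unid, String user) {
        leases.computeIfPresent(unid, (key, current) ->
                current.owner.equals(user) ? null : current);
    }

    /** Name of the current lock holder, or null if the document is free. */
    String lockedBy(String unid, long nowMillis) {
        Lease lease = leases.get(unid);
        return (lease == null || lease.expiresAt <= nowMillis) ? null : lease.owner;
    }
}
```

With this shape, the 8-minute client-side timer would simply call lockOrRenew again with the same user, and a browser that disappears stops renewing, so the lease lapses on its own. Passing nowMillis explicitly keeps the expiry logic easy to exercise without waiting ten minutes.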
ApplicationScope may be a good place to capture "locked" documents. After all, for applicationScope to expire, all users' sessions have to have expired, so anyone with the page open will not be able to save anyway.
Maybe capture UNID, user and time when someone edits a doc. Clear the value when the document is saved. Bear in mind that the user might close the browser etc. I've been discussing this approach internally and if we end up building this I would look to add it to OpenNTF. But we're unlikely to get onto it within the next month.
I prefer to use a solution similar to Mr. Withers' answer. The main issue is how to deal with the unwanted and dreaded back button. It is easy to lock a document when it is opened, but there are many ways to close the XPage, and the user is not limited to the navigation you provide: as he stated, they can also close the browser completely, use the back button, etc. So the best way I can think of is to create a few Java objects to use in the application and session scopes.
The first step is to create a "LockedDocument" class. As we know, documents are not serializable, and we do not want to save the document itself in this object; we want to save the UNID and the time it was saved. We do it this way so that we can clear the object after a given time (like thirty minutes to an hour). This class should also implement the Comparable interface so that the collection can be sorted by this time, with the oldest documents first and the newest documents last.
Next we create another class that holds a list or a map of these LockedDocuments. This class must also have a thread (implement Runnable) that checks all documents every five minutes or so (I have not tested this yet, but it should work). Any document that was locked more than thirty to sixty minutes ago (a predefined limit) will be unlocked (deleted from the list). It is important that the list is sorted as described above and that the loop is broken as soon as a document younger than the lock time is reached, to avoid unnecessary processing.
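The two classes described so far might be sketched as follows. This is only an untested-in-Domino illustration under several assumptions: the field names, the 30-minute MAX_AGE_MILLIS limit, the use of a PriorityQueue to keep the oldest lock at the front, and the sweep(nowMillis) helper are all invented for the example. Only the names LockedDocument and the idea of a Runnable singleton come from the description above.

```java
import java.io.Serializable;
import java.util.PriorityQueue;

/** Serializable stand-in for a locked document: its UNID plus the lock time. */
class LockedDocument implements Comparable<LockedDocument>, Serializable {
    private static final long serialVersionUID = 1L;
    final String unid;
    final long lockedAtMillis;

    LockedDocument(String unid, long lockedAtMillis) {
        this.unid = unid;
        this.lockedAtMillis = lockedAtMillis;
    }

    /** Orders oldest lock first; the UNID only breaks ties. */
    @Override
    public int compareTo(LockedDocument other) {
        int byTime = Long.compare(lockedAtMillis, other.lockedAtMillis);
        return byTime != 0 ? byTime : unid.compareTo(other.unid);
    }
}

/** Singleton holding all locks; run() is meant to be scheduled every few minutes. */
final class LockedDocumentList implements Runnable {
    /** Assumed limit: locks older than thirty minutes are dropped. */
    static final long MAX_AGE_MILLIS = 30 * 60 * 1000L;

    private static final LockedDocumentList INSTANCE = new LockedDocumentList();
    private final PriorityQueue<LockedDocument> locks = new PriorityQueue<>();

    private LockedDocumentList() { }

    static LockedDocumentList getInstance() {
        return INSTANCE;
    }

    /** Adds a lock; returns false if the document is already locked. */
    synchronized boolean lock(String unid, long nowMillis) {
        if (isLocked(unid)) {
            return false;
        }
        locks.add(new LockedDocument(unid, nowMillis));
        return true;
    }

    synchronized boolean isLocked(String unid) {
        for (LockedDocument doc : locks) {
            if (doc.unid.equals(unid)) {
                return true;
            }
        }
        return false;
    }

    synchronized void unlock(String unid) {
        locks.removeIf(doc -> doc.unid.equals(unid));
    }

    /**
     * Removes expired locks starting with the oldest and stops at the first
     * lock that is still young enough. Returns how many locks were dropped.
     */
    synchronized int sweep(long nowMillis) {
        int removed = 0;
        while (!locks.isEmpty()
                && nowMillis - locks.peek().lockedAtMillis >= MAX_AGE_MILLIS) {
            locks.poll();
            removed++;
        }
        return removed;
    }

    @Override
    public void run() {
        sweep(System.currentTimeMillis());
    }
}
```

One way to drive the periodic check would be a ScheduledExecutorService, for example Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(LockedDocumentList.getInstance(), 5, 5, TimeUnit.MINUTES). Because the queue always exposes the oldest lock first, the sweep never looks past the first lock that is still valid, which is the early exit the answer asks for.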
The next step would be to include the user-specific list in the sessionScope. This list holds the LockedDocuments that the current user has. It is set when the user changes the document's status to editable, and is checked before the document is set to editable to prevent one document from being opened in multiple tabs by the same user. The lock is checked once again in onquerysave(). Once a main page is opened, the lock is automatically released. The onquerysave() must also check that the document's UNID is in the sessionScope list, or that the document is new, before allowing a save.
Quick recap:
Any UNID saved in the applicationScope LockedDocumentList would not be editable by anyone unless it exists in their own sessionScope list.
It is possible to warn a user that their lockedTime is approaching and reset the timer.
The class containing the list of locked documents must be a singleton.
There are probably ways to improve this answer, and I am sure I am missing something. It is just a thought.
There might be a better way to handle this, but it is the best I have found.
You can remove the Domino lock in window.onunload event:
window.onunload = function(){
dojo.xhrGet(...
}
No need to reinvent the wheel.

What is the best way to implement a timer that will be stopped and restarted in a Java Swing Application?

OK, I'm working on the final dilemma for my project. The project is an IPv4 endpoint updater for TunnelBroker's IPv6 tunnel. I have everything working except for the timer. It works, but if the user disables the "automatic update" and re-enables it, the application crashes. I need the timer to be on a thread outside of the EDT (in such a way that it can be destroyed and recreated when the user unchecks/checks the automatic update feature or changes the amount of time between updates).
What I'm pasting here is the code for the checkbox that handles automatic updates, including the timer task. Hopefully this will be enough to get an answer on how to do this (I'm thinking it either needs to be a worker or use multi-threading, even though only one timer will be active).
private void jCheckBox1ItemStateChanged(java.awt.event.ItemEvent evt) {
// TODO add your handling code here:
// if selected, then run timer for auto update
// set time textbox to setEditable(true) and get the time from it.
// else cancel timer. Try doing this on different
// class to prevent errors from happening on reselect.
int updateAutoTime = 0;
if (jCheckBox1.isSelected())
{
updateAutoTime = Integer.parseInt(jTextField4.getText())*60*1000;
if (updateAutoTime < 3600000)
{
updateAutoTime = 3600000;
jTextField4.setText(String.valueOf(updateAutoTime/60/1000));
}
updateTimer.scheduleAtFixedRate(new TimerTask() {
public void run()
{
// Task here ...
if (jRadioButton1.isSelected())
{
newIPAddress = GetIP.getIPAddress();
}
else
{
newIPAddress = jTextField3.getText();
}
strUsername = jTextField1.getText();
jPasswordField1.selectAll();
strPassword = jPasswordField1.getSelectedText().toString();
strTunnelID = jTextField2.getText();
strIPAddress = newIPAddress;
if (!newIPAddress.equals(oldIPAddress))
{
//fire the tunnelbroker updater class
updateIP.setIPAddress(strUsername, strPassword, strTunnelID, strIPAddress);
oldIPAddress = newIPAddress;
jLabel8.setText(newIPAddress);
serverStatus = updateIP.getStatus().toString();
jLabel6.setText(serverStatus);
}
else
{
serverStatus = "No IP Update was needed.";
jLabel6.setText(serverStatus);
}
}
}, 0, updateAutoTime);
}
else
{
updateTimer.cancel();
System.out.println("Timer cancelled");
System.out.println("Purged " + updateTimer.purge() + " tasks.");
}
}
As I mentioned, this works once. But if the user deselects the checkbox, it won't work again. And the user can't change the value in jTextField4 after they select the checkbox.
So, what I'm looking for is this:
How to make it so that the user can select and deselect the checkbox as often as they want (even multiple times in a row).
How to make it so that the user can change the value in jTextField4 and have it automatically cancel the current timer and start a new one with the new value (I haven't done anything with jTextField4 at all yet, so I'll have to create an event to cover it later).
Thanks, and have a great day:)
Patrick.
Perhaps this task would be better suited to a javax.swing.Timer. See Timer.restart() for details.
Note that Timer is relatively inaccurate over long time periods. One way to account for that is to have it repeat frequently but perform its assigned task only once a certain time has been reached or passed.
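To make the javax.swing.Timer suggestion concrete, here is a minimal sketch of a wrapper that can be enabled, disabled and re-enabled any number of times. The class name AutoUpdateTimer and its methods are hypothetical; the one-hour floor mirrors the 3600000 ms minimum in the question's code. Unlike java.util.Timer, which cannot schedule new tasks after cancel(), a Swing Timer can be stopped and started again, and its listener runs on the EDT.

```java
import javax.swing.Timer;

/** Hypothetical wrapper around a Swing Timer that can be stopped and restarted freely. */
class AutoUpdateTimer {
    /** One-hour minimum, matching the floor used in the question's code. */
    static final int MIN_INTERVAL_MS = 60 * 60 * 1000;

    private final Timer timer;

    AutoUpdateTimer(Runnable task) {
        // The listener is invoked on the EDT, so it may touch Swing components.
        timer = new Timer(MIN_INTERVAL_MS, event -> task.run());
        timer.setInitialDelay(0); // first run right away, then once per interval
    }

    /** Starts the timer, or restarts it with a new interval if it is already running. */
    void enable(int minutes) {
        long requested = (long) minutes * 60_000L;
        int delay = (int) Math.min(Integer.MAX_VALUE, Math.max(requested, MIN_INTERVAL_MS));
        timer.setDelay(delay);
        timer.restart(); // cancels any pending firing and starts over
    }

    /** Stops the timer; enable() can be called again later. */
    void disable() {
        timer.stop();
    }

    boolean isRunning() {
        return timer.isRunning();
    }

    int getIntervalMs() {
        return timer.getDelay();
    }
}
```

In the checkbox handler this would reduce to calling enable(Integer.parseInt(jTextField4.getText())) when the box is selected and disable() otherwise, with the same enable call reused when the text field changes. The timer itself would be a class-level field so that both handlers can reach it.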
Would I be able to wrap everything in the "task" portion of the call to Swing Timer, or do I have to create another class that handles the task?
You might want to wrap the grunt work in a SwingWorker to ensure the EDT is not blocked.
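A minimal sketch of the SwingWorker idea follows. The class name StatusWorker and its two constructor parameters are assumptions for illustration: slowCall stands in for the blocking work in the question (the IP lookup and the tunnel update), and showStatus stands in for something like jLabel6::setText.

```java
import java.util.concurrent.ExecutionException;
import java.util.function.Consumer;
import java.util.function.Supplier;
import javax.swing.SwingWorker;

/** Hypothetical worker: runs a slow call off the EDT, then reports its result on the EDT. */
class StatusWorker extends SwingWorker<String, Void> {
    private final Supplier<String> slowCall;   // stand-in for the network update
    private final Consumer<String> showStatus; // stand-in for e.g. jLabel6::setText

    StatusWorker(Supplier<String> slowCall, Consumer<String> showStatus) {
        this.slowCall = slowCall;
        this.showStatus = showStatus;
    }

    @Override
    protected String doInBackground() {
        // Runs on a background worker thread, so the EDT stays responsive.
        return slowCall.get();
    }

    @Override
    protected void done() {
        // Runs on the EDT once doInBackground has finished.
        try {
            showStatus.accept(get());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            showStatus.accept("Update failed: " + e.getCause());
        }
    }
}
```

The timer's action listener would then only create a worker and call execute() on it. A SwingWorker is designed to be executed once, so a fresh instance is needed for every tick. Any values read from text fields should be captured on the EDT before the worker is created, rather than read inside doInBackground.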
..I'm assuming that I would have to create the timer as a class-level declaration .. correct?
Yes, that is what I was thinking.

Resources