How should I handle mutable state in Clojure in this case? - multithreading

I'm fairly new to Clojure and Lisps in general, so excuse me in advance if this question sounds a bit silly. I recently wrote a turn-based video game in Java in an MVC fashion, with one thread in charge of the graphics loop (updating the geometric state of the graphic entities on screen at a fixed rate) and a second thread in charge of handling the logic state of the game. The logic thread acted pretty much like a daemon: it slept in the background until asked to do something (process user input), then delivered a "change log" to the graphics thread so that it could render the changes made to the logic state. So the game world was not shared at all: only the logic thread had access to it, and the graphics thread's only duty was to apply changes to the rendering, after an initialization phase and whenever a new change log arrived from the logic thread.
I read that Clojure has vars, refs and atoms. In such a scenario, which one of these identities should I use to store my game world state in the logic thread in Clojure?

In your case, no concurrent access is involved, so any of the options would work.
A ref is overkill for this problem, and a var is not typically used in such a scenario (var rebinding usually serves configurable parameters rather than business logic like yours). So an atom should be fine.
EDIT: (elaboration on var)
A var, when declared as ^:dynamic (and usually *earmuffed*), is thread-local. It can have a root binding, and each thread can rebind it to a new thread-local value. Typical vars you'll see in the Clojure code base are *warn-on-reflection*, *unchecked-math*, etc.; they mostly tune the behavior of existing code.
Since you have only one thread that works on the "board" data, it's OK to use a var anyway, but it would look a bit weird to me. I get a bit uneasy when I see a var changed that often. :)

Related

Use Cocoa bindings and threads

I have a few labels bound to a few variables that are modified on other threads via GCD.
Now, I've read that Cocoa bindings are not thread-safe, but my app runs fine: the UI updates when the values of the variables are updated on a background thread.
Would the correct way be to do the calculations on the background thread and, whenever I need to change a variable's value, to do it via
DispatchQueue.main.sync {
    self.variable = newValue
}
?
If Cocoa bindings are not thread-safe, why have I never encountered a crash from a "read" of a bound UI element while the value was being written by a background thread?
What is the preferred way to have a value bound to a UI element (via Cocoa bindings) and also modify it from async threads?
Thanks!
Yes, if you modify an object that is observed by Cocoa bindings, you should do so only on the main thread, and using GCD to dispatch the modification to the main thread is a perfectly good way to do that.
And yes, your app probably works fine most of the time, but that is likely luck rather than correctness. The problem is that Cocoa bindings are based on Key-Value Observing, and KVO notifications are posted on the thread that causes the mutation.
It's also a complexity problem. As long as your app is relatively simple and fast, there's much less chance of two threads running afoul of one another. But imagine your app getting more complex and computationally intensive, and a problem cropping up; by that point you might have hundreds of places where you modify bound properties from multiple threads. It will save you grief in the long run to just follow the rules: use the main thread for updating bound-to objects, and try to keep bound properties to immutable, value-semantic types.

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems to me like it ought to be easy to make these functions execute concurrently, each in its own process. But looking at the documentation for the multiprocessing module makes me reconsider: there doesn't seem to be any simple way to share large data structures between processes. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple processes to access the same list/dict/etc. for both reading and writing at the same time? Could I just launch multiple instances of my star generator, give them access to the dict that holds all the stars, and have new objects appear to simply pop into existence in the dict from the perspective of other processes (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself)?
If not, is there any practical way to allow multiple processes to read the same data structure at the same time, but feed their resulting data back to a main thread to be rolled into that same data structure safely? (A sketch of this pattern follows these questions.)
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, whether for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always have to explicitly send data from one process to another, as I would between processes communicating over a TCP stream? I know there are objects that abstract that sort of thing away, but I'm asking whether it can be done away with entirely, so that the object each process is looking at is actually the same block of memory.
How flexible are the objects that the module provides to abstract away the communication between processes? Can I use them as drop-in replacements for the data structures in my existing code without noticing any difference? Would doing so cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer-science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that lets me transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
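To make the second question concrete, here is a minimal sketch of the read-in-workers, merge-in-the-main-process pattern using multiprocessing queues (names like generate_star are hypothetical):

import multiprocessing as mp

def generate_star(star_id):
    # Hypothetical stand-in for the expensive generation work.
    return star_id, {"name": "Star %d" % star_id, "planets": star_id % 5}

def worker(jobs, results):
    # Workers read ids from one queue and feed generated data back through
    # another; only the main process ever mutates the shared dict.
    for star_id in iter(jobs.get, None):         # None is the shutdown sentinel
        results.put(generate_star(star_id))

if __name__ == "__main__":
    jobs, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(jobs, results)) for _ in range(4)]
    for p in procs:
        p.start()
    for star_id in range(20):
        jobs.put(star_id)
    stars = {}
    for _ in range(20):                          # roll results into one dict safely
        sid, data = results.get()
        stars[sid] = data
    for p in procs:
        jobs.put(None)                           # one sentinel per worker
    for p in procs:
        p.join()
    print(len(stars), "stars generated")

Only the main process ever touches stars, so no locking is needed; the queues do all the cross-process work.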
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one... (A rough sketch of this loop appears after this example.)
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
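A rough sketch of that fire-and-forget scheduler loop, in Python, under the assumption that tasks are plain module-level functions (all names here are hypothetical):

import heapq
import itertools
import multiprocessing as mp
import time

class Scheduler:
    def __init__(self):
        self._queue = []
        self._tiebreak = itertools.count()   # keeps the heap from comparing functions

    def add(self, due_at, func, *args):
        heapq.heappush(self._queue, (due_at, next(self._tiebreak), func, args))

    def run_pending(self):
        # Pop everything that is due, spawn it, and think no more about it.
        while self._queue and self._queue[0][0] <= time.time():
            _, _, func, args = heapq.heappop(self._queue)
            mp.Process(target=func, args=args).start()

def generate_star(star_id):                  # hypothetical task body
    print("generating star", star_id)

if __name__ == "__main__":
    sched = Scheduler()
    sched.add(time.time(), generate_star, 1)
    sched.add(time.time() + 0.05, generate_star, 2)
    time.sleep(0.1)
    sched.run_pending()                      # both tasks are now due

One caveat: a child process receives a pickled copy of its arguments, so data a task stores on its copy of a star object will not appear on the parent's star object; that is exactly the problem the queue pattern above and the proxy pattern in the answer below address.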
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated machinery of sharing your objects across processes using a proxy.
The basic idea is that you create a manager that manages all the objects that should be shared across processes. This manager then creates its own process, where it waits for other processes to instruct it to change the object. But enough said; it looks like this:
import multiprocessing as m

def yourfunction(stars):      # placeholder worker; every access goes through the proxy
    stars["example"] = "some value"

manager = m.Manager()
starsdict = manager.dict()
process = m.Process(target=yourfunction, args=(starsdict,))
process.start()               # start(), not run(): run() would execute in this process
process.join()
The object stored in starsdict is not the real dict; instead it forwards every change and request you make to its manager. This is called a "proxy", and it has almost exactly the same API as the object it mimics. These proxies are picklable, so you can pass them as arguments to functions in new processes (as shown above) or send them through queues.
You can read more about this in the documentation.
I don't know exactly how proxies react if two processes are accessing them simultaneously. Since they're made for parallelism, I guess they should be safe, even though I've heard they're not. It would be best if you test this yourself or look for it in the documentation; one way to test it is sketched below.
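For what it's worth, here is a small way to try that out yourself, a sketch under the assumption that each process writes its own keys:

import multiprocessing as m

def add_stars(stars, offset):
    # Hypothetical worker: each process writes its own keys through the proxy.
    for i in range(100):
        stars[offset + i] = "star %d" % (offset + i)

if __name__ == "__main__":
    manager = m.Manager()
    starsdict = manager.dict()
    procs = [m.Process(target=add_stars, args=(starsdict, n * 100)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Each individual operation is serialized through the manager process, so
    # all 400 writes should survive; compound read-modify-write sequences
    # would still need an explicit lock.
    print(len(starsdict))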

Is calling a lua function(as a callback) from another thread safe enough?

I'm using Visual C++ to try to bind Lua functions as callbacks for socket events (which happen on another thread). I initialize the Lua state on one thread, and the socket lives on another thread, so every time the socket sends or receives a message, it calls the Lua function, and the Lua function decides what to do according to the 'tag' within the message.
So my questions are:
Since I pass the same Lua state to the Lua functions, is that safe? Doesn't it need some kind of protection? The Lua functions are called from another thread, so I guess they might be called simultaneously.
If it is not safe, what's the solution for this case?
It is not safe to call back asynchronously into a Lua state.
There are many approaches to dealing with this. The most popular involve some kind of polling; a sketch of the shape of that pattern follows the links below.
A recent generic synchronization library is DarkSideSync
A popular Lua binding to libev is lua-ev
This SO answer recommends Lua Lanes with LuaSocket.
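Sketched in Python for brevity (the queue plays the role a synchronization library would, and dispatch_to_callback is a hypothetical stand-in for the actual call into the Lua state), the polling pattern looks like this:

import queue
import threading
import time

events = queue.Queue()   # thread-safe handoff between the socket thread and the owner

def socket_thread():
    # Stand-in for the socket loop: it never touches the Lua state directly,
    # it only enqueues messages for the owning thread to pick up.
    for n in range(3):
        time.sleep(0.1)
        events.put({"tag": "recv", "payload": "message %d" % n})

def dispatch_to_callback(event):
    # Hypothetical stand-in for the actual lua_call into the single state.
    print("callback for", event["tag"], "->", event["payload"])

# The one thread that owns the state polls the queue, so it is the only
# thread that ever calls into the callbacks.
threading.Thread(target=socket_thread, daemon=True).start()
for _ in range(3):
    dispatch_to_callback(events.get())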
It is not safe to call functions within one Lua state simultaneously from multiple threads.
I was dealing with the same problem, since in my application all the basics such as communication are handled by C++ and all the business logic is implemented in Lua. What I do is create a pool of Lua states that are created and initialised incrementally (whenever there aren't enough states, create one and initialise it with the common functions / objects). It works like this:
Once a connection thread needs to call a Lua function, it checks out an instance of a Lua state and initialises the connection-specific globals (I call this a thread / connection context) in a separate (proxy) global table; this prevents polluting the original globals while still being indexed by the original global table.
It calls the Lua function.
It checks the Lua state back into the pool, where it is restored to the "ready" state (the proxy global table is disposed of).
I think this approach would be well suited to your case as well. The pool checks each state (on an interval) for when it was last checked out; when the time difference is big enough, it destroys the state to free resources, adjusting the number of active states to the current server load. The state handed out is the most recently used among the available states. (A bare-bones sketch of the check-out / check-in mechanics appears at the end of this answer.)
There are some things you need to consider when implementing such a pool:
Each state needs to be populated with the same variables and global functions, which increases memory consumption.
Implementing an upper limit for state count in the pool
Ensuring all the globals in each state are in a consistent state, if they happen to change (here I would recommend prepopulating only static globals, while populating dynamic ones when checking out a state)
Dynamic loading of functions. In my case there are many thousands of functions / procedures that can be called from Lua. Keeping them all constantly loaded in every state would be a huge waste, so instead I keep them byte-code compiled on the C++ side and load them when needed. In my case that turns out not to hurt performance much, but your mileage may vary. One thing to keep in mind is to load each function only once: say you invoke a script that needs to call another dynamically loaded function in a loop; you should load that function into a local once, before the loop. Doing it otherwise would be a huge performance hit.
Of course this is just one idea, but one that turned out to be best suited for me.
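As promised, a bare-bones sketch of the check-out / check-in mechanics, written in Python for brevity (dict merely stands in for creating and initialising a real Lua state):

import threading

class StatePool:
    # new_state is a factory producing a fresh, fully initialised state.
    def __init__(self, new_state, limit=8):
        self._lock = threading.Lock()
        self._idle = []
        self._new_state = new_state
        self._count = 0
        self._limit = limit

    def check_out(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()        # most recently used comes back first
            if self._count < self._limit:
                self._count += 1
                return self._new_state()
        raise RuntimeError("pool exhausted")   # a real pool might block instead

    def check_in(self, state):
        # This is where the real pool disposes of the proxy global table,
        # restoring the state to "ready".
        with self._lock:
            self._idle.append(state)

pool = StatePool(dict)    # check out, call the function, check back in:
state = pool.check_out()
pool.check_in(state)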
It's not safe, as the others have mentioned.
It depends on your use case.
The simplest solution is a global lock using the lua_lock and lua_unlock macros. That keeps a single Lua state, guarded by a single mutex. For a low number of callbacks it might suffice, but for higher traffic it probably won't, due to the overhead incurred.
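Sketched in Python (a plain mutex standing in for what lua_lock / lua_unlock do on the C side), this option boils down to:

import threading

lua_state_lock = threading.Lock()   # one mutex guarding the single shared state

def call_callback(func, *args):
    # Every thread funnels its callbacks through the same lock, so the state
    # only ever sees one caller at a time; throughput suffers accordingly.
    with lua_state_lock:
        return func(*args)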
Once you need better performance, the Lua state pool mentioned by W.B. is a nice way to handle this. The trickiest part here, I find, is synchronizing the global data across the multiple states.
DarkSideSync, mentioned by Doug, is useful in cases where the main application loop resides on the Lua side; I wrote it specifically for that purpose. In your case this doesn't seem to fit. Having said that, depending on your needs, you might consider restructuring your application so the main loop does reside on the Lua side. If you only handle sockets, you can then use LuaSocket and no synchronization is required at all. But obviously that depends on what else the application does.

What is the Cocoa-way of observing progress of a background task?

Imagine the following situation: you have a background task (the term "task" here means an arbitrary computational unit, not an NSTask!) implemented with any of the modern technologies such as Grand Central Dispatch or operation queues. Some controller object on the main thread wants to monitor the progress of this background task and report it to the user.
Task progress can have following characteristics:
Be indeterminate or determinate
Because the controller object must know when to switch the NSProgressIndicator to the appropriate style. We can use the convention that progress is treated as indeterminate until the actual progress value rises above zero.
Progress value itself
A simple float value
Localized description of a current phase
NSString, because communication with user is good
What design suits these requirements at best while being the most Cocoa-ish?
There can be variants.
Delegation
Before firing up the task set your controller object as delegate.
@protocol MyBackgroundTaskDelegate
@required
- (void) progress: (float) value; // 0.0…1.0
@optional
- (void) workingOn: (NSString*) msg; // @"Doing this, doing that…"
@end
Actually, I have successfully used this template many times, but it feels a little too verbose.
Block callback
Very similar to delegation, but keeps code in one place.
// Starting our background task...
[MyTask startComputationWithProgressHandler: ^(float progress, NSString* msg)
{
// Switching to the main thread because all UI stuff should go there...
dispatch_async(dispatch_get_main_queue(), ^()
{
self.progressIndicator.progress = progress;
self.informationalMessage = msg;
});
}];
KVO or polling of a progress properties
In this case background task object must have two properties similar to these:
@property(readonly, atomic) float progress;
@property(readonly, atomic) NSString* message;
And the client (our controller object) should register itself as an observer of these properties. The major flaw I see in this solution is that KVO notifications always arrive on the thread that caused the change. While you can force your observer (callback) method to run on a particular GCD queue, that may not always be appropriate.
NSNotificationCenter
Background task sends notifications and client listens to them.
Is there any other patterns applicable to this situation? What solution can be treated as a most modern and Cocoa-ish?
When it comes to "What is the Cocoa way of observing progress of a background task?", I would say delegation and NSNotificationCenter, because blocks and KVO were introduced later and hence didn't exist in Cocoa's early years. In fact, optional protocol methods were not present in earlier Objective-C versions either; everything was required by default.
From that you can see that blocks are a simpler way of implementing ad-hoc delegates: the receiver of the block declares what parameters are passed to it, and you are free to do whatever you want with them in your block. And KVO is a less boilerplate-heavy way of implementing NSNotification, with a more standardized approach based on properties; it is useful for wiring up UI created in Interface Builder, and it simplifies the "what on earth do I have to do to know when this value changes" question, which with NSNotification requires a lot of documentation and long constants.
But I still think there is a place for each of these techniques: blocks are nice for mini ad-hoc protocols but would be a serious bother for a medium-sized or bidirectional interface, and KVO doesn't help with watching global variables, values outside of a class/object, or things you don't want to make part of your public interface.
So my definitive answer is:
1 to 1 simple communication: blocks
1 to 1 complex communication: delegates/protocols
1 to many simple communication: KVO (where possible)
1 to many complex communication: NSNotifications
As always, pick the best tool for each problem, and consider that I'm guilty of having implemented all of the above in none of the suggested ways!
For the type of task you describe, I feel that NSNotificationCenter is the best option for a generic pattern. The reason is that you can't know, generally, how many external observers there are. The notification system already supports an arbitrary number of observers for an event, whereas the other non-polling options (delegation and blocks) are more typically one-to-one unless you do extra work to support multiple registrations.
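To make the one-to-many point concrete, here is a toy sketch in Python (not Cocoa's actual API) of why a notification center scales to any number of observers while a single delegate slot does not:

class NotificationCenter:
    # Any number of observers can register for a named event.
    def __init__(self):
        self._observers = {}

    def add_observer(self, name, callback):
        self._observers.setdefault(name, []).append(callback)

    def post(self, name, payload=None):
        for callback in self._observers.get(name, []):
            callback(payload)

center = NotificationCenter()
center.add_observer("task.progress", lambda p: print("progress bar:", p))
center.add_observer("task.progress", lambda p: print("log window:", p))
center.post("task.progress", 0.42)   # both observers fire; a delegate slot holds one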
As you pointed out yourself, polling is a bad idea if you can avoid it.
In my experience, delegation or block callbacks are the best design choices. Choosing one over the other is mostly dictated by which is more convenient to write and maintain in the particular situation. Both are asynchronous. Block callbacks usually reduce the need for additional instance variables, since blocks capture the variables in their scope (a toy illustration follows). Of course, for both it's necessary to be aware of which thread the callback is executed or the delegate method is called on.
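A toy sketch, with Python closures standing in for Objective-C blocks, of how a callback can capture surrounding state that would otherwise need an extra instance variable:

def start_computation(progress_handler):
    # Stand-in for the background task; it reports progress via the callback.
    for step in range(1, 6):
        progress_handler(step / 5.0, "step %d of 5" % step)

status = {}   # captured by the closure below; no extra instance variable needed

start_computation(lambda fraction, msg:
                  status.update(text="%3.0f%% - %s" % (fraction * 100, msg)))
print(status["text"])   # -> "100% - step 5 of 5"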
I'd go with KVO, because you basically get it for free when using @properties.
BUT
I would not recommend using plain KVO, because everything arrives in -observeValueForKeyPath:..., and once you observe multiple key paths it becomes annoying to maintain: you end up with one mega-function full of if (keyPath == ...) branches.
I recommend MAKVONotificationCenter by Mike Ash for this. It also saves you from many a crash when you forget to remove an observer you no longer need.

Does WinRT still have the same old UI threading restrictions?

In WinForms, pretty much all your UI is thread-specific. You have to use [STAThread] so that the common dialogs will work, and you can't (safely) access a UI element from any thread other than the one that created it. From what I've heard, that's because that's just how Windows works -- window handles are thread-specific.
In WPF, these same restrictions were kept, because ultimately it's still building on top of the same Windows API, still window handles (though mostly just for top-level windows), etc. In fact, WPF even made things more restrictive, because you can't even access things like bitmaps across threads.
Now along comes WinRT, a whole new way of accessing Windows -- a fresh, clean slate. Are we still stuck with the same old threading restrictions (specifically: only being able to manipulate a UI control from the thread that created it), or have they opened this up?
I would expect it to be the same model, but much easier to use, at least from C# and VB, with the new async handling that lets you write a synchronous-looking method which simply uses await when it needs a long-running task to complete before proceeding.
Given the emphasis on making asynchronous code easier to write, it would be surprising for MS to forsake the efficiency of requiring single-threaded access to the UI at the same time.
The threading model is identical. There is still a notion of single-threaded and multi-threaded apartments (STA/MTA), and it must be initialized by a call to RoInitialize, which behaves very much like CoInitialize in name, arguments, and error returns. The user-interface thread is single-threaded, confirmed at 36:00 in this video.
The HTML/CSS UI model is inherently single-threaded (until the recent advent of web workers, JS didn't support threads). XAML is also single-threaded (because it's really hard for developers to write code against a multithreaded GUI).
The underlying threading model does have some key differences. When your application starts, an ASTA (Application STA) is created to run your UI code as I showed in the talk. This ASTA does not allow reentrancy - you will not receive unrelated calls while making an outgoing call. This is a significant difference from STAs.
You are allowed to create async work items; see the Windows.System.Threading namespace. These work-item threads are automatically initialized to MTA. As Larry mentioned, web workers are the equivalent JS concept.
Your UI components are thread affined. See the Windows.UI.Core.CoreDispatcher class for information on how to execute code on the UI thread. You can check out the threading sample for some example code to update the UI from an async operation.
Things are different in pretty important ways.
While it's true that the underlying threading model is the same, your question is really about how logical concurrency works with the UI, and in that respect what developers see in Windows 8 will be new.
As you mention, most dialogs previously blocked. For Metro apps, many UI components do not block at all. Remember the talk of WinRT being asynchronous? It applies to UI components too.
For example, this .NET 4 code will not accidentally format your hard drive, because the UI call blocks on Show (C# example):
bool formatHardDrive = true;
if (MessageBox.Show("Format your hard drive?", "Confirm",
        MessageBoxButtons.YesNo) == DialogResult.No)
    formatHardDrive = false;
if (formatHardDrive)
    Format();
With Windows 8 Metro, many UI components, like Windows.UI.Popups.MessageDialog, are asynchronous by default, so the Show call immediately (logically) falls through to the next line of code before the user's input has been retrieved.
Of course there is an elegant solution to this, based on the await/promise design patterns (JavaScript example):
var md = new Windows.UI.Popups.MessageDialog("Hello World!");
md.showAsync().then(function (command) {
    console.log("pressed: " + command.label);
});
The point is that while the threading model doesn't change, when most people mention UI and threading they are thinking about logical concurrency and how it affects the programming model.
Overall I think the asynchronous paradigm shift is a positive thing. It requires a bit of a shift in perspective, but it's consistent with the way other platforms are evolving on both the client and server sides.
