Love2D, Threads execution and input - multithreading

Please, i got a problem with Love2d thread functions, and i didnt find any examples or explanations i would understand anywhere.
first:
in main file i got:
thread1 = love.thread.newThread("ambient.lua")
thread2 = love.thread.newThread("ambient.lua")
thread1:start()
thread2:start()
ambient.lua contains:
random1 = love.math.random(1, 10)
gen1 = love.audio.newSource("audio/ambient/gen/a/gen_a_".."random1"..".mp3",
"static")
gen1:setVolume(1.0)
gen1:setLooping(false)
gen1:play()
works fine, problem is that when i ask var = Thread1:isRunning( ) in same step or with delay, when audio is playing and try to print it, it throws error (is supposedly null). when the audio finishes, i see that memory is cleared. also if i link thread1:start() to mouse click and then start it multiple times in short time, memory usage goes up like crazy, then after time similar to sample length it starts to decrease. question is, am i creating multiple threads? in that case did they even terminate properly after samples ended? or is the thread lifetime just 1-step long and i am only creating multiple audio sources playing with the same thread? is problem in the check itself?
next issue is that i need to use thread1:start() with values:
thread1:start(volume, sampleID)
and i have no clue how to addres them in the thread itself. guides and examples says "vararg" reference. i didnt see any decent explanation or any example containing "..." usage in variable input into threads. i am in need of example how to write it. even if this audio fiddle is not a great example, i will surely need it for AI. No need for complicated input, just simple x,y,size,target_x,target_y values.

and i have no clue how to addres them in the thread itself. guides and
examples says "vararg" reference. i didnt see any decent explanation
or any example containing "..." usage in variable input into threads
You didn't read manuals enough. Every loaded Lua chunk (section 2.4.1 of Lua 5.1 manuals) is an anonymous function with variable number of arguments.When you call love.thread.newThread("ambient.lua"), Love2D will create new chunk, so basic Lua rules applies to this case.
In your example, volume and sampleID parameters from within thread would be retrieved like this:
local volume, sampleID = ...
gen1 = love.audio.newSource(get_stream_by_id(sampleID), "static")
gen1:setVolume(volume)
gen1:setLooping(false)
gen1:play()

Related

Delphi [volatile] and InterlockedCompareExchange not reliable?

I wrote a simple lock-free node stack (Delp[hi XE4, Win7-64, 32-bit app) where I can have multiple 'stacks' and pop/push nodes between them from various threads simultaneously.
It works 99.999% of the time but eventually glitch under a stress test using all CPU cores.
Stripped-down, it comes down to this (not real/compiled code):
Nodes are :
type POBNode = ^TOBNode;
[volatile]TOBNode = record
[volatile]Next : POBNode;
Data : Int64;
end;
Simplified stack :
type TOBStack = class
private
[volatile]Head:POBNode;
function Pop:POBNode;
procedure Push(NewNode:POBNode);
end;
procedure TOBStack.Push(NewNode:POBNode);
var zTmp : POBNode;
begin;
repeat
zTmp:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
NewNode.Next:=zTmp;
if InterlockedCompareExchangePointer(Head,NewNode,zTmp)=zTmp
then break (*success*)
else continue;
until false;
end;
function TOBStack.Pop:POBNode;
begin;
repeat
Result:=InterlockedCompareExchangePointer(Pointer(Head),nil,nil);(memory fenced-read*)
if Result=nil
the exit;
NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
then break (*Success*)
else continue;(*Fail, try again*)
until False;
end;
I have tried many variations on this but cannot get it to be stable.
If I create a few threads that each have a stack and they all push/pop to/from a global stack, it eventually glitch, but not quickly. Only when I stress it for minutes on end from multiple threads, in tight loops.
I cannot have hidden bugs in my code, so I need more advice than to ask a specific question to get this running 100% error-free, 24/7.
Does the code above look fine for a lock-free thread-safe stack?
What else can I look at? This is impossible to debug normally as the errors occur at various places, telling me there is a pointer or RAM corruption happening somewhere. I also get duplicate entries, meaning that a node that was popped of one stack, then later returned to that same stack, is still on top of the old stack... Impossible to happen according to my algorithm? This lead me to believe it's possible to violate Delphi/Windows InterlockedCompareExchange Methods or there is some hidden knowledge I have yet to be revealed. :) (I also tried TInterlocked)
I have made a complete test case that can be copied from ftp.true.co.za.
In there I run 8 threads doing 400 000 push/pops each and it usually crashes (safely due to checks/raised exceptions) after a few cycles of these tests, sometimes many many test cycles complete before one suddenly crash.
Any advice would be appreciated.
Regards
Anton
E
At first I was skeptical of this being an ABA problem as indicated by gabr. It seemed to me that: if one thread looks at the current Head, and another thread pushes then pops; you're happy to still operate on the same Head in the same way.
However, consider this from your Pop method:
NewHead:=Result.Next;
if InterlockedCompareExchangePointer(Pointer(Head),NewHead,Result)=Result
If the thread is swapped out after the first line.
A value for NewHead is stored in a local variable.
Then another thread successfully pops the node this thread was targetting.
It also manages to push the same node back, but with a different value for Next before the first thread resumes.
The second line will pass the comparand check allowing head to receive the NewHead value from the local variable.
However, the current value for NewHead is incorrect, thereby corrupting your stack.
There's a subtle variation on this problem not even covered by your test app. This problem isn't checked in your test app because you aren't destroying any nodes until the end of your test.
Look at current head
Another thread pops some nodes.
The nodes are destroyed, and new nodes created and pushed.
By the time your "looking thread" is active again, it could be looking at an entirely different node that is coincidentally at the same address.
If you're popping, you might assign a garbage pointer to Head.
Apart form the above...
Looking at your test app there's also some really dodgy code. E.g.
You generate a "random number": J:=GetTickCount and 7;(*Get a 'random' number 0..7*).
Do you realise how fast computers are?
Do you realise that GetTickCount will generate reams of duplicates in a tight loop?
I.e. the numbers you generate will be nothing like random.
And when comments don't agree with code, my spidey-sense kicks in.
You're allocating memory of a hard-coded size: GetMem(zTmp,12);(*Allocate new node*).
Why aren't you using SizeOf?
You're using a multi-platform compiler.... The size of that structure can vary.
There is absolutely zero reason to hard-code the size of the structure.
Right now, given these two examples, I wouldn't be entirely confident that there isn't also an error in your test code.

ocaml sdl chunk garbage collection

So here's the deal: at first I thought the following was some really sexy code for playing and then freeing a wav file when it is finished, without freezing the machine with a delay command (assuming the program is and will be doing stuff for the duration of the wav file, i.e., not exiting, when the function is called):
let rec play_wav file play =
Sdlmixer.open_audio ~freq:44100 ();
let loaded_file = Sdlmixer.loadWAV file in
if play = false then
Sdlmixer.free_chunk loaded_file
else
(
Sdlmixer.play_channel ~loops:1 loaded_file;
play_wav file false
)
;;
I should also say there are probably better ways of accomplishing the same task, and it might only work because of machine particular features, etc., but now I have just an academic curiosity about whether:
(1) the file is being loaded twice, and freed only once, thus making decidedly unpretty code;
or, contrariwise,
(2) whether the wav file loaded twice by Sdlmixer.loadWAV is not assigned two separate memory addresses, mallocs, etc., or in separation logic h = (h1 * emp) is a post-condition ;) In other words, if once loaded, loading it again is operationally ineffectual, and a single free will free the chunk, no matter how many times it was loaded.
and lastly, whether
(3) the Sdlmixer.free_chunk is even necessary, since the similar free_surface C function for the OCaml-sdl libraries is not implemented.
Running valgrind on all of the below does not seem to indicate memory leaks:
(a) a program containing the play_wav function,
(b) with a function that fails to free the chunk,
(c) with a sequential load-play-wait-free_chunk code block,
(d) with a function that loads the same wav file 1000 times.
(Actually, and technically, in every case it states "definitely lost: 337 bytes in 4 blocks", not sure what that's about, but regardless valgrind reports the same memory results for all four cases.)
I imagine in the case of (b) OCaml's garbage collector takes care of this when the program terminates, so its hard to say if its still loaded and taking up memory after that particular routine finishes, and hence needs freeing, since when the function finishes the program terminates, so it probably is a good idea to use the free chunk function in larger programs.
Anyway, was just wondering what people's thoughts and opinions on this might be.
From perusing the source code of the library, it appears to me that chunk values aren't automatically garbage collected, so you have to free them explicitly, or setup your own handler to do that (see Gc.finalize).
It is strange to me that valgrind doesn't report any significant problem.
Your code is indeed loading the sample twice if the parameter play is true on the initial invocation.
I suppose you need the play parameter outside the fact that you are using it to stop and release the sample in the function.
Perhaps something like:
let play_wav = function
true -> (fun file ->
Sdlmixer.open_audio ~freq:44100 ();
let loaded_file = Sdlmixer.loadWAV file in
Sdlmixer.play_channel ~loops:1 loaded_file;
Sdlmixer.delay 1000;
Sdlmixer.free_chunk loaded_file;
Sdlmixer.close_audio ())
| _ -> (fun _ -> ())
could fit your need. I swapped the two parameters of play_wav, so as to make it more obvious to the compiler that it doesn't have anything to do when play is false. If you pass false explicitly, the function should be optimized away (I believe).
I added the close_audio call that was missing, and a delay to give the mixer some time to play the sample.
Now, if you need to play the same sample many times, it might be more interesting to cache it to avoid reloading it later on.

Multithreading (pthreads)

I'm working on a project where I need to make a program run on multiple threads. However, I'm running into a bit of an issue.
In my program, I have an accessory function called 'func_call'.
If I use this in my code:
func_call((void*) &my_pixels);
The program runs fine.
However, if I try to create a thread, and then run the function on that, the program runs into a segmentation fault.
pthread_t thread;
pthread_create (&thread, NULL, (void*)&func_call, (void*) &my_pixels);
I've included pthread.h in my program. Any ideas what might be wrong?
You are not handling data in a thread safe manner:
the thread copies data from the thread argument, which is a pointer to the main thread's my_pixels variable; the main thread may exit, making my_pixles invalid.
the thread uses scene, main thread calls free_scene() on it, which I imagine makes it invalid
the thread calls printf(), the main thread closes stdout (kind of unusual itself)
the thread updates the picture array, the main thread accesses picture to output data from it
It looks like you should just wait for the thread to finish its work after creating it - call pthread_join() to do that.
For a single thread, that would seem to be pointless (you've just turned a multi-threaded program into a single threaded program). But on the basis of code that's commented out, it looks like you're planning to start up several threads that work on chunks of the data. So, when you get to the point of trying that again, make sure you join all the threads you start. As long as the threads don't modify the same data, it'll work. Note that you'll need to use separate my_pixels instances for each thread (make an array of them, just like you did with pthreads), or some threads will likely get parameters that are intended for a different thread.
Without knowing what func_call does, it is difficult to give you an answer. Nevertheless, here are few possibilities
Does func_call use some sort of a global state - check if that is initialized properly from within the thread. The order of execution of threads is not always the same for every execution
Not knowing your operating system (AIX /Linux/Solaris etc) it is difficult to answer this, but please check your compilation options
Please provide the signal trapped and atleast a few lines of the stack-trace - for all the threads. One thing you can check for yourself is to print the threads' stack-track (using threads/thread or pthread and thread current <x> based on the debugger) and and if there is a common data that is being accessed. It is most likely that the segfault occurred when two threads were trying to read off the other's (uncommitted) change
Hope that helps.
Edit:
After checking your code, I think the problem is the global picture array. You seem to be modifying that in the thread function without any guards. You loop using px and py and all the threads will have the same px and py and will try to write into the picture array at the same time. Please try to modify your code to prevent multiple threads from stepping on each other's data modifications.
Is func_call a function, or a function pointer? If it's a function pointer, there is your problem: you took the address of a function pointer and then cast it.
People are guessing because you've provided only a fraction of the program, which mentions names like func_call with no declaration in scope.
Your compiler must be giving you diagnostics about this program, because you're passing a (void *) expression to a function pointer parameter.
Define your thread function in a way that is compatible with pthread_create, and then just call it without any casts.

What multithreading package for Lua "just works" as shipped?

Coding in Lua, I have a triply nested loop that goes through 6000 iterations. All 6000 iterations are independent and can easily be parallelized. What threads package for Lua compiles out of the box and gets decent parallel speedups on four or more cores?
Here's what I know so far:
luaproc comes from the core Lua team, but the software bundle on luaforge is old, and the mailing list has reports of it segfaulting. Also, it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread.
Lua Lanes makes interesting claims but seems to be a heavyweight, complex solution. Many messages on the mailing list report trouble getting Lua Lanes to build or work for them. I myself have had trouble getting the underlying "Lua rocks" distribution mechanism to work for me.
LuaThread requires explicit locking and requires that communication between threads be mediated by global variables that are protected by locks. I could imagine worse, but I'd be happier with a higher level of abstraction.
Concurrent Lua provides an attractive message-passing model similar to Erlang, but it says that processes do not share memory. It is not clear whether spawn actually works with any Lua function or whether there are restrictions.
Russ Cox proposed an occasional threading model that works only for C threads. Not useful for me.
I will upvote all answers that report on actual experience with these or any other multithreading package, or any answer that provides new information.
For reference, here is the loop I would like to parallelize:
for tid, tests in pairs(tests) do
local results = { }
matrix[tid] = results
for i, test in pairs(tests) do
if test.valid then
results[i] = { }
local results = results[i]
for sid, bin in pairs(binaries) do
local outcome, witness = run_test(test, bin)
results[sid] = { outcome = outcome, witness = witness }
end
end
end
end
The run_test function is passed in as an argument, so a package can be useful to me only if it can run arbitrary functions in parallel. My goal is enough parallelism to get 100% CPU utilization on 6 to 8 cores.
Norman wrote concerning luaproc:
"it's not obvious to me how to use the scalar message-passing model to get results ultimately into a parent thread"
I had the same problem with a use case I was dealing with. I liked lua proc due to its simple and light implementation, but my use case had C code that was calling lua, which was triggering a co-routine that needed to send/receive messages to interact with other luaproc threads.
To achieve my desired functionality I had to add features to luaproc to allow sending and receiving messages from the parent thread or any other thread not running from the luaproc scheduler. Additionally, my changes allow using luaproc send/receive from coroutines created from luaproc.newproc() created lua states.
I added an additional luaproc.addproc() function to the api which is to be called from any lua state running from a context not controlled by the luaproc scheduler in order to set itself up with luaproc for sending/receiving messages.
I am considering posting the source as a new github project or contacting the developers and seeing if they would like to pull my additions. Suggestions as to how I should make it available to others are welcome.
Check the threads library in torch family. It implements a thread pool model: a few true threads (pthread in linux and windows thread in win32) are created first. Each thread has a lua_State object and a blocking job queue that admits jobs added from the main thread.
Lua objects are copied over from main thread to the job thread. However C objects such as Torch tensors or tds data structures can be passed to job threads via pointers -- this is how limited shared memory is achieved.
This is a perfect example of MapReduce
You can use LuaRings to accomplish your parallelization needs.
Concurrent Lua might seem like the way to go, but as I note in my updates below, it doesn't run things in parallel. The approach I tried was to spawn several processes that execute pickled closures received through the message queue.
Update
Concurrent Lua seems to handle first-class functions and closures without a hitch. See the following example program.
require 'concurrent'
local NUM_WORKERS = 4 -- number of worker threads to use
local NUM_WORKITEMS = 100 -- number of work items for processing
-- calls the received function in the local thread context
function worker(pid)
while true do
-- request new work
concurrent.send(pid, { pid = concurrent.self() })
local msg = concurrent.receive()
-- exit when instructed
if msg.exit then return end
-- otherwise, run the provided function
msg.work()
end
end
-- creates workers, produces all the work and performs shutdown
function tasker()
local pid = concurrent.self()
-- create the worker threads
for i = 1, NUM_WORKERS do concurrent.spawn(worker, pid) end
-- provide work to threads as requests are received
for i = 1, NUM_WORKITEMS do
local msg = concurrent.receive()
-- send the work as a closure
concurrent.send(msg.pid, { work = function() print(i) end, pid = pid })
end
-- shutdown the threads as they complete
for i = 1, NUM_WORKERS do
local msg = concurrent.receive()
concurrent.send(msg.pid, { exit = true })
end
end
-- create the task process
local pid = concurrent.spawn(tasker)
-- run the event loop until all threads terminate
concurrent.loop()
Update 2
Scratch all of that stuff above. Something didn't look right when I was testing this. It turns out that Concurrent Lua isn't concurrent at all. The "processes" are implemented with coroutines and all run cooperatively in the same thread context. That's what we get for not reading carefully!
So, at least I eliminated one of the options I guess. :(
I realize that this is not a works-out-of-the-box solution, but, maybe go old-school and play with forks? (Assuming you're on a POSIX system.)
What I would have done:
Right before your loop, put all tests in a queue, accessible between processes. (A file, a Redis LIST or anything else you like most.)
Also before the loop, spawn several forks with lua-posix (same as the number of cores or even more depending on the nature of tests). In parent fork wait until all children will quit.
In each fork in a loop, get a test from the queue, execute it, put results somewhere. (To a file, to a Redis LIST, anywhere else you like.) If there are no more tests in queue, quit.
In the parent fetch and process all test results as you do now.
This assumes that test parameters and results are serializable. But even if they are not, I think that it should be rather easy to cheat around that.
I've now built a parallel application using luaproc. Here are some misconceptions that kept me from adopting it sooner, and how to work around them.
Once the parallel threads are launched, as far as I can tell there is no way for them to communicate back to the parent. This property was the big block for me. Eventually I realized the way forward: when it's done forking threads, the parent stops and waits. The job that would have been done by the parent should instead be done by a child thread, which should be dedicated to that job. Not a great model, but it works.
Communication between parent and children is very limited. The parent can communicate only scalar values: strings, Booleans, and numbers. If the parent wants to communicate more complex values, like tables and functions, it must code them as strings. Such coding can take place inline in the program, or (especially) functions can be parked into the filesystem and loaded into the child using require.
The children inherit nothing of the parent's environment. In particular, they don't inherit package.path or package.cpath. I had to work around this by the way I wrote the code for the children.
The most convenient way to communicate from parent to child is to define the child as a function, and to have the child capture parental information in its free variables, known in Lua parlances as "upvalues." These free variables may not be global variables, and they must be scalars. Still, it's a decent model. Here's an example:
local function spawner(N, workers)
return function()
local luaproc = require 'luaproc'
for i = 1, N do
luaproc.send('source', i)
end
for i = 1, workers do
luaproc.send('source', nil)
end
end
end
This code is used as, e.g.,
assert(luaproc.newproc(spawner(randoms, workers)))
This call is how values randoms and workers are communicated from parent to child.
The assertion is essential here, as if you forget the rules and accidentally capture a table or a local function, luaproc.newproc will fail.
Once I understood these properties, luaproc did indeed work "out of the box", when downloaded from askyrme on github.
ETA: There is an annoying limitation: in some circumstances, calling fread() in one thread can prevent other threads from being scheduled. In particular, if I run the sequence
local file = io.popen(command, 'r')
local result = file:read '*a'
file:close()
return result
the read operation blocks all other threads. I don't know why this is---I assume it is some nonsense going on within glibc. The workaround I used was to call directly to read(2), which required a little glue code, but this works properly with io.popen and file:close().
There's one other limitation worth noting:
Unlike Tony Hoare's original conception of communicating sequential processing, and unlike most mature, serious implementations of synchronous message passing, luaproc does not allow a receiver to block on multiple channels simultaneously. This limitation is serious, and it rules out many of the design patterns that synchronous message-passing is good at, but it's still find for many simple models of parallelism, especially the "parbegin" sort that I needed to solve for my original problem.

Interview Question on .NET Threading

Could you describe two methods of synchronizing multi-threaded write access performed
on a class member?
Please could any one help me what is this meant to do and what is the right answer.
When you change data in C#, something that looks like a single operation may be compiled into several instructions. Take the following class:
public class Number {
private int a = 0;
public void Add(int b) {
a += b;
}
}
When you build it, you get the following IL code:
IL_0000: nop
IL_0001: ldarg.0
IL_0002: dup
// Pushes the value of the private variable 'a' onto the stack
IL_0003: ldfld int32 Simple.Number::a
// Pushes the value of the argument 'b' onto the stack
IL_0008: ldarg.1
// Adds the top two values of the stack together
IL_0009: add
// Sets 'a' to the value on top of the stack
IL_000a: stfld int32 Simple.Number::a
IL_000f: ret
Now, say you have a Number object and two threads call its Add method like this:
number.Add(2); // Thread 1
number.Add(3); // Thread 2
If you want the result to be 5 (0 + 2 + 3), there's a problem. You don't know when these threads will execute their instructions. Both threads could execute IL_0003 (pushing zero onto the stack) before either executes IL_000a (actually changing the member variable) and you get this:
a = 0 + 2; // Thread 1
a = 0 + 3; // Thread 2
The last thread to finish 'wins' and at the end of the process, a is 2 or 3 instead of 5.
So you have to make sure that one complete set of instructions finishes before the other set. To do that, you can:
1) Lock access to the class member while it's being written, using one of the many .NET synchronization primitives (like lock, Mutex, ReaderWriterLockSlim, etc.) so that only one thread can work on it at a time.
2) Push write operations into a queue and process that queue with a single thread. As Thorarin points out, you still have to synchronize access to the queue if it isn't thread-safe, but it's worth it for complex write operations.
There are other techniques. Some (like Interlocked) are limited to particular data types, and there are even more (like the ones discussed in Non-blocking synchronization and Part 4 of Joseph Albahari's Threading in C#), though they are more complex: approach them with caution.
In multithreaded applications, there are many situations where simultaneous access to the same data can cause problems. In such cases synchronization is required to guarantee that only one thread has access at any one time.
I imagine they mean using the lock-statement (or SyncLock in VB.NET) vs. using a Monitor.
You might want to read this page for examples and an understanding of the concept. However, if you have no experience with multithreaded application design, it will likely become quickly apparent, should your new employer put you to the test. It's a fairly complicated subject, with many possible pitfalls such as deadlock.
There is a decent MSDN page on the subject as well.
There may be other options, depending on the type of member variable and how it is to be changed. Incrementing an integer for example can be done with the Interlocked.Increment method.
As an excercise and demonstration of the problem, try writing an application that starts 5 simultaneous threads, incrementing a shared counter a million times per thread. The intended end result of the counter would be 5 million, but that is (probably) not what you will end up with :)
Edit: made a quick implementation myself (download). Sample output:
Unsynchronized counter demo:
expected counter = 5000000
actual counter = 4901600
Time taken (ms) = 67
Synchronized counter demo:
expected counter = 5000000
actual counter = 5000000
Time taken (ms) = 287
There are a couple of ways, several of which are mentioned previously.
ReaderWriterLockSlim is my preferred method. This gives you a database type of locking, and allows for upgrading (although the syntax for that is incorrect in the MSDN last time I looked and is very non-obvious)
lock statements. You treat a read like a write and just prevent access to the variable
Interlocked operations. This performs an operations on a value type in an atomic step. This can be used for lock free threading (really wouldn't recommend this)
Mutexes and Semaphores (haven't used these)
Monitor statements (this is essentially how the lock keyword works)
While I don't mean to denigrate other answers, I would not trust anything that does not use one of these techniques. My apologies if I have forgotten any.

Resources