OCaml SDL chunk garbage collection

So here's the deal: at first I thought the following was some really sexy code for playing and then freeing a wav file when it finishes, without freezing the machine with a delay call (assuming the program is and will be doing other work for the duration of the wav file, i.e., not exiting, when the function is called):
let rec play_wav file play =
  Sdlmixer.open_audio ~freq:44100 ();
  let loaded_file = Sdlmixer.loadWAV file in
  if play = false then
    Sdlmixer.free_chunk loaded_file
  else (
    Sdlmixer.play_channel ~loops:1 loaded_file;
    play_wav file false
  )
;;
I should also say there are probably better ways of accomplishing the same task, and it might only work because of particular features of my machine, etc., but now I have just an academic curiosity about whether:
(1) the file is being loaded twice, and freed only once, thus making decidedly unpretty code;
or, contrariwise,
(2) whether the wav file loaded twice by Sdlmixer.loadWAV is not assigned two separate memory addresses, mallocs, etc., or in separation logic h = (h1 * emp) is a post-condition ;) In other words, if once loaded, loading it again is operationally ineffectual, and a single free will free the chunk, no matter how many times it was loaded.
and lastly, whether
(3) the Sdlmixer.free_chunk is even necessary, since the similar free_surface C function for the OCaml-sdl libraries is not implemented.
Running valgrind on all of the below does not seem to indicate memory leaks:
(a) a program containing the play_wav function,
(b) with a function that fails to free the chunk,
(c) with a sequential load-play-wait-free_chunk code block,
(d) with a function that loads the same wav file 1000 times.
(Actually, and technically, in every case it states "definitely lost: 337 bytes in 4 blocks", not sure what that's about, but regardless valgrind reports the same memory results for all four cases.)
I imagine in the case of (b) OCaml's garbage collector takes care of this when the program terminates, so it's hard to say whether the chunk is still loaded and taking up memory after that particular routine finishes, and hence needs freeing, since the program terminates as soon as the function finishes. So it is probably a good idea to use the free_chunk function in larger programs.
Anyway, was just wondering what people's thoughts and opinions on this might be.

From perusing the source code of the library, it appears to me that chunk values aren't automatically garbage collected, so you have to free them explicitly, or set up your own handler to do that (see Gc.finalise).
It is strange to me that valgrind doesn't report any significant problem.
Your code is indeed loading the sample twice if the parameter play is true on the initial invocation.
I suppose you need the play parameter for some reason beyond using it to stop and release the sample within the function.
Perhaps something like:
let play_wav = function
  | true ->
      (fun file ->
        Sdlmixer.open_audio ~freq:44100 ();
        let loaded_file = Sdlmixer.loadWAV file in
        Sdlmixer.play_channel ~loops:1 loaded_file;
        Sdltimer.delay 1000;  (* give the mixer time to play the sample *)
        Sdlmixer.free_chunk loaded_file;
        Sdlmixer.close_audio ())
  | _ -> (fun _ -> ())
could fit your need. I swapped the two parameters of play_wav, so as to make it more obvious to the compiler that it doesn't have anything to do when play is false. If you pass false explicitly, the function should be optimized away (I believe).
I added the close_audio call that was missing, and a delay (via Sdltimer.delay, since Sdlmixer itself has no delay function) to give the mixer some time to play the sample.
Now, if you need to play the same sample many times, it might be more interesting to cache it to avoid reloading it later on.
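For illustration, here is a rough sketch of such a cache against the underlying SDL_mixer C API (the helper names get_chunk and free_all_chunks are ours, not part of any library; the same shape carries over to the OCaml bindings with a Hashtbl):

/* Chunk cache: each file is loaded at most once and freed exactly once. */
#include <string.h>
#include <SDL_mixer.h>

#define CACHE_SIZE 32

static struct { const char *file; Mix_Chunk *chunk; } cache[CACHE_SIZE];
static int cache_len = 0;

/* Assumes `file` is a string literal (or otherwise outlives the cache). */
Mix_Chunk *get_chunk(const char *file) {
    for (int i = 0; i < cache_len; i++)
        if (strcmp(cache[i].file, file) == 0)
            return cache[i].chunk;            /* already loaded: reuse it */
    Mix_Chunk *chunk = Mix_LoadWAV(file);     /* load once, remember it */
    if (chunk != NULL && cache_len < CACHE_SIZE) {
        cache[cache_len].file = file;
        cache[cache_len].chunk = chunk;
        cache_len++;
    }
    return chunk;
}

void free_all_chunks(void) {
    for (int i = 0; i < cache_len; i++)
        Mix_FreeChunk(cache[i].chunk);        /* exactly one free per load */
    cache_len = 0;
}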

Related

F# / MailBoxProcessor is unresponsive to PostAndReply under nearly 100% load

I have a MailBoxProcessor, which does the following things:
1. Main loop (type AsyncRunner: https://github.com/kkkmail/ClmFSharp/blob/master/Clm/ContGen/AsyncRun.fs#L257 – the line number may change as I keep updating the code). It generates some "models", compiles each of them into a model-specific folder, and spawns them as external processes; each model then uses WCF to "inform" the AsyncRunner about its progress by calling updateProgress. A model may take several days to run. Once any of the models completes, the runner generates / spawns more. It is designed to run at 100% processor load (but with priority ProcessPriorityClass.BelowNormal), though I can specify a smaller number of logical cores to use (some number between 1 and Environment.ProcessorCount). Currently I "async" almost everything that runs inside the MailBoxProcessor by using … |> Async.Start, to ensure that I "never ever" block the main loop.
2. I can "ask" the runner (using WCF) about its state by calling member this.getState () = messageLoop.PostAndReply GetState.
3. Or I can send some commands to it (again using WCF), e.g. member this.start(), member this.stop(), …
Here is where it gets interesting. Everything works! However, if I run a "monitor", which asks for the state by effectively calling PostAndReply (exposed as this.getState ()) in an infinite loop, then after a while it sort of hangs up. I mean that it does eventually return, but with some unpredictably large delays (like a few minutes). At the same time, I can issue commands and they return quickly while getState still has not returned.
Is it possible to make it responsive at nearly 100% load? Thanks a lot!
I would suggest not async-ing anything (other than your spawning of processes) in your main program, since your code creates additional processes anyway. Your main loop is waiting on each async'd block to return before it can continue and process the GetState() message.

Love2D, Threads execution and input

Please help: I have a problem with Love2D thread functions, and I didn't find any examples or explanations I could understand anywhere.
First, in my main file I have:
thread1 = love.thread.newThread("ambient.lua")
thread2 = love.thread.newThread("ambient.lua")
thread1:start()
thread2:start()
ambient.lua contains:
random1 = love.math.random(1, 10)
gen1 = love.audio.newSource("audio/ambient/gen/a/gen_a_" .. random1 .. ".mp3", "static")
gen1:setVolume(1.0)
gen1:setLooping(false)
gen1:play()
This works fine. The problem is that when I ask var = thread1:isRunning() in the same step or with a delay, while the audio is playing, and try to print it, it throws an error (it is supposedly nil). When the audio finishes, I see that the memory is cleared. Also, if I link thread1:start() to a mouse click and then trigger it multiple times in a short time, memory usage goes up like crazy, and then, after a time similar to the sample length, it starts to decrease. The question is: am I creating multiple threads? In that case, did they even terminate properly after the samples ended? Or is the thread lifetime just one step long, and am I only creating multiple audio sources playing with the same thread? Or is the problem in the check itself?
The next issue is that I need to call thread1:start() with values:
thread1:start(volume, sampleID)
and I have no clue how to address them in the thread itself. Guides and examples say "vararg" reference, but I didn't see any decent explanation or any example of "..." usage for passing variables into threads. I need an example of how to write it. Even if this audio fiddle is not a great example, I will surely need it for AI. No need for complicated input, just simple x, y, size, target_x, target_y values.
and I have no clue how to address them in the thread itself. Guides and examples say "vararg" reference. I didn't see any decent explanation or any example of "..." usage for passing variables into threads.
You didn't read the manuals closely enough. Every loaded Lua chunk (section 2.4.1 of the Lua 5.1 manual) is an anonymous function with a variable number of arguments. When you call love.thread.newThread("ambient.lua"), Love2D will create a new chunk, so the basic Lua rules apply to this case.
In your example, the volume and sampleID parameters would be retrieved from within the thread like this:
local volume, sampleID = ...  -- the values passed to thread1:start(volume, sampleID)
-- get_stream_by_id is a placeholder for your own ID-to-path lookup
gen1 = love.audio.newSource(get_stream_by_id(sampleID), "static")
gen1:setVolume(volume)
gen1:setLooping(false)
gen1:play()

When does the garbage collector run when calling Haskell exports from C?

When exporting a Haskell function to be called from C, when does Haskell's garbage collector get to run? If C owns main, then there is no way to predict the next call into Haskell. This question is especially pertinent when running single-threaded Haskell or without parallel GC.
When you initialize the GHC runtime, you can pass RTS flags to it via argc and argv like so:
RtsConfig conf = defaultRtsConfig;
conf.rts_opts_enabled = RtsOptsAll;
hs_init_ghc(&argc, &argv, conf);
This lets you set options to, for example, fix a smaller maximum heap size or use a compaction algorithm on the nursery to further reduce allocation. Further, note that there is an idle GC whose interval can be set (or disabled), and if you link the threaded runtime, it should run whether or not you ever yield back into Haskell.
Edit: I haven't actually performed experimentation to verify the following, but if we look at the source of hs_init_ghc, we see that it initializes signal handlers, which should include the timer handlers that respond to SIGVTALRM, and indeed it also starts the timer, which (on POSIX) calls timer_create, which should raise those signals at regular intervals. In turn, this should periodically "wake up" the RTS whether or not anything is happening, which in turn should mean that it will run the idle GC whether or not the system yields back to Haskell from C. But again, I have only read the code and commentary, not tested this myself.
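For instance, here is a minimal, untested sketch of that initialization with an idle-GC interval passed as an RTS flag (the -I flag and hs_perform_gc are standard GHC RTS facilities; the arrangement is ours):

/* Initialize the GHC RTS from C with a 0.3s idle-GC interval, and force
 * a collection explicitly at a moment of our choosing. */
#include "HsFFI.h"
#include "Rts.h"

int main(void) {
    /* RTS flags are consumed from argv between +RTS and -RTS. */
    char *argv[] = { "prog", "+RTS", "-I0.3", "-RTS", NULL };
    int argc = 4;
    char **pargv = argv;

    RtsConfig conf = defaultRtsConfig;
    conf.rts_opts_enabled = RtsOptsAll;
    hs_init_ghc(&argc, &pargv, conf);

    /* ... call exported Haskell functions here ... */

    hs_perform_gc();   /* explicit major collection from the C side */
    hs_exit();
    return 0;
}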

If one thread writes to a location and another thread is reading, can the second thread see the new value then the old?

Start with x = 0. Note there are no memory barriers in any of the code below.
volatile int x = 0;
Thread 1:
    while (x == 0) {}
    print "Saw non-zero"
    while (x != 0) {}
    print "Saw zero again!"
Thread 2:
    x = 1;
Is it ever possible to see the second message, "Saw zero again!", on any (real) CPU? What about on x86_64?
Similarly, in this code:
volatile int x = 0;
Thread 1:
    while (x == 0) {}
    x = 2;
Thread 2:
    x = 1;
Is the final value of x guaranteed to be 2? Or could the CPU caches update main memory in some arbitrary order, so that although x = 1 gets into one CPU's cache where thread 1 can see it, thread 1 then gets moved to a different CPU where it writes x = 2 to that CPU's cache, and x = 2 gets written back to main memory before x = 1?
Yes, it's entirely possible. The compiler could, for example, have just written x to memory but still have the value in a register. One while loop could check memory while the other checks the register.
It doesn't happen due to CPU caches because cache coherency hardware logic makes the caches invisible on all CPUs you are likely to actually use.
Theoretically, the write race you talk about could happen due to posted write buffering and read prefetching. Miraculous tricks were used to make this impossible on x86 CPUs to avoid breaking legacy code. But you shouldn't expect future processors to do this.
Leaving aside for a second tricks done by the compiler (even ones allowed by language standards), I believe you're asking how the micro-architecture could behave in such a scenario. Keep in mind that the code would most likely expand into a busy-wait loop of cmp [x] + jz or something similar, which hides a load inside it. This means that [x] is likely to live in the cache of the core running thread 1.
At some point, thread 2 would come along and perform the store. If it resides on a different core, the line would first be invalidated completely from the first core. If these are two threads running on the same physical core, the store would immediately affect all chronologically younger loads.
Now, the most likely thing to happen on a modern out-of-order machine is that all the loads in the pipeline at this point would be different iterations of the same first loop (since any branch predictor facing so many repetitive "taken" resolutions is likely to assume the branch will continue being taken, until proven wrong). So what would happen is that the first load to encounter the new value modified by the other thread will cause the matching branch to simply flush the entire pipe of all younger operations, without the second loop ever having a chance to execute.
However, it's possible that for some reason you did get to the second loop (let's say the predictor issued a not-taken prediction just at the right moment, when the loop condition check saw the new value). In this case, the question boils down to this scenario:
Time -->
----------------------------------------------------------------
thread 1:
    cmp [x],0       execute
    je ...          execute (not taken)
    ...
    cmp [x],0       execute
    jne ...         execute (not taken)
  Can_We_Get_Here:
    ...

thread 2:
    store [x],1     execute
In other words, given that most modern CPUs may execute instructions out of order, can a younger load be evaluated before an older one to the same address, allowing the store (from another thread) to change the value so that it is observed inconsistently by the loads?
My guess is that the above timeline is quite possible given the nature of out-of-order execution engines today, as they simply arbitrate and perform whatever operation is ready. However, on most x86 implementations there are safeguards to protect against such a scenario, since the memory ordering rules strictly say:
8.2.3.2 Neither Loads Nor Stores Are Reordered with Like Operations
Such mechanisms may detect this scenario and flush the machine to prevent the stale/wrong values becoming visible. So the answer is: no, it should not be possible, unless of course the software or the compiler changes the nature of the code to prevent the hardware from noticing the relation. Then again, memory ordering rules are sometimes flaky, and I'm not sure all x86 manufacturers adhere to the exact same wording, but this is a pretty fundamental example of consistency, so I'd be very surprised if one of them missed it.
The answer seems to be, "this is exactly the job of CPU cache coherency." x86 processors implement the MESI protocol, which guarantees that the second thread can't see the new value and then the old.
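For completeness, the C11 memory model gives the same guarantee at the language level, independent of the hardware. Here is a minimal sketch of the question's first scenario using C11 atomics (the <threads.h> scaffolding is our addition and assumes a libc that ships C11 threads); once the reader has observed the new value, a later load of x can never return the old one:

/* With seq_cst atomics, reads of x by one thread can never move backwards
 * through x's modification order, so "Saw zero again!" is unreachable
 * once "Saw non-zero" has printed. */
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

atomic_int x = 0;

int reader(void *arg) {
    (void)arg;
    while (atomic_load(&x) == 0) { /* spin until the write is seen */ }
    puts("Saw non-zero");
    if (atomic_load(&x) == 0)      /* the second loop's condition */
        puts("Saw zero again!");   /* never reached */
    return 0;
}

int writer(void *arg) {
    (void)arg;
    atomic_store(&x, 1);
    return 0;
}

int main(void) {
    thrd_t r, w;
    thrd_create(&r, reader, NULL);
    thrd_create(&w, writer, NULL);
    thrd_join(r, NULL);
    thrd_join(w, NULL);
    return 0;
}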

Is it ok to have multiple threads writing the same values to the same variables?

I understand race conditions, and how, with multiple threads accessing the same variable, updates made by one can be ignored and overwritten by others. But what if each thread is writing the same value (not different values) to the same variable; can even this cause problems? Could this code:
GlobalVar.property = 11;
(assuming that property will never be assigned anything other than 11), cause problems if multiple threads execute it at the same time?
The problem comes when you read that state back, and do something about it. Writing is a red herring: it is true that as long as this is a single word, most environments guarantee the write will be atomic, but that doesn't mean that a larger piece of code that includes this fragment is thread-safe. First, presumably your global variable contained a different value to begin with; otherwise, if you know it's always the same, why is it a variable? Second, presumably you eventually read this value back again?
The issue is that presumably, you are writing to this bit of shared state for a reason - to signal that something has occurred? This is where it falls down: when you have no locking constructs, there is no implied order of memory accesses at all. It's hard to point to what's wrong here because your example doesn't actually contain the use of the variable, so here's a trivialish example in neutral C-like syntax:
int x = 0, y = 0;

// thread A does:
x = 1;
y = 2;
if (y == 2)
    print(x);

// thread B does, at the same time:
if (y == 2)
    print(x);
Thread A will always print 1, but it's completely valid for thread B to print 0. The order of operations in thread A is only required to be observable from code executing in thread A - thread B is allowed to see any combination of the state. The writes to x and y may not actually happen in order.
This can happen even on single-processor systems, where most people do not expect this kind of reordering - your compiler may reorder it for you. On SMP even if the compiler doesn't reorder things, the memory writes may be reordered between the caches of the separate processors.
If that doesn't seem to answer it for you, include more detail of your example in the question. Without the use of the variable it's impossible to definitively say whether such a usage is safe or not.
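For contrast, here is how the same example looks with the ordering made explicit through C11 atomics; the release store to y guarantees that a thread observing y == 2 also observes x == 1 (the thread scaffolding is ours):

/* Publish x via a release store to y; an acquire load that sees y == 2
 * is then guaranteed to also see x == 1. */
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

int x = 0;            /* plain data, published through y */
atomic_int y = 0;

int thread_a(void *arg) {
    (void)arg;
    x = 1;                                               /* ordinary write */
    atomic_store_explicit(&y, 2, memory_order_release);  /* publish x */
    return 0;
}

int thread_b(void *arg) {
    (void)arg;
    if (atomic_load_explicit(&y, memory_order_acquire) == 2)
        printf("%d\n", x);  /* now guaranteed to print 1 */
    return 0;
}

int main(void) {
    thrd_t a, b;
    thrd_create(&a, thread_a, NULL);
    thrd_create(&b, thread_b, NULL);
    thrd_join(a, NULL);
    thrd_join(b, NULL);
    return 0;
}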
It depends on the work actually done by that statement. There can still be some cases where Something Bad happens - for example, if a C++ class has overloaded the = operator, and does anything nontrivial within that statement.
I have accidentally written code that did something like this with POD types (builtin primitive types), and it worked fine -- however, it's definitely not good practice, and I'm not confident that it's dependable.
Why not just lock the memory around this variable when you use it? In fact, if you somehow "know" this is the only write statement that can occur at some point in your code, why not just use the value 11 directly, instead of writing it to a shared variable?
(edit: I guess it's better to use a constant name instead of the magic number 11 directly in the code, btw.)
If you're using this to figure out when at least one thread has reached this statement, you could use a semaphore that starts at 1, and is decremented by the first thread that hits it.
I would expect the result to be undetermined, as in it would vary from compiler to compiler, language to language, OS to OS, etc. So no, it is not safe.
Why would you want to do this, though? Adding a line to obtain a mutex lock is only one or two lines of code (in most languages), and it would remove any possibility of a problem. If this is going to be too expensive, then you need to find an alternate way of solving the problem.
In general, this is not considered a safe thing to do unless your system provides for atomic operations (operations that are guaranteed to be executed in a single cycle).
The reason is that while the "C" statement looks simple, often there are a number of underlying assembly operations taking place.
Depending on your OS, there are a few things you could do (a POSIX-threads sketch of the first option follows this list):
- Take a mutual-exclusion semaphore (mutex) to protect access.
- In some OSes, you can temporarily disable preemption, which guarantees your thread will not be swapped out.
- Some OSes provide a writer or reader semaphore, which is more performant than a plain old mutex.
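Here is that first option sketched with POSIX threads (the names are ours):

/* Every access to the shared variable, reads included, goes through the
 * same mutex, so no unsynchronized access can be observed. */
#include <pthread.h>

static int shared_value = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void set_value(int v) {
    pthread_mutex_lock(&lock);
    shared_value = v;        /* the write the question asks about */
    pthread_mutex_unlock(&lock);
}

int get_value(void) {
    pthread_mutex_lock(&lock);
    int v = shared_value;    /* readers take the lock too */
    pthread_mutex_unlock(&lock);
    return v;
}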
Here's my take on the question.
You have two or more threads running that write to a variable, like a status flag or something, where you only want to know if one or more of them succeeded. Then in another part of the code (after the threads complete) you want to check and see if at least one thread set that status... for example:
bool flag = false
threadContainer tc
threadInputs inputs

check(input)
{
    ...do stuff to input
    if (success)
        flag = true
}

start multiple threads
foreach (i in inputs)
    t = startthread(check, i)
    tc.add(t)  // Keep track of all the threads started

foreach (t in tc)
    t.join()   // Wait until each thread is done

if (flag)
    print "One of the threads was successful"
else
    print "None of the threads were successful"
I believe the above code would be OK, assuming you're fine with not knowing which thread set the status to true, and you can wait for all the multi-threaded stuff to finish before reading that flag. I could be wrong though.
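In C11 terms, by the way, the concurrent plain writes to flag are formally a data race even though every writer stores the same value, so a strict rendering would make the flag atomic; the join then guarantees the final read happens after all the stores. A sketch, with all names ours:

/* Flag pattern: relaxed atomic stores avoid the formal data race, and
 * thrd_join orders the final read after every store. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <threads.h>

static atomic_bool flag = false;

int check(void *input) {
    int success = (*(int *)input % 2 == 0);  /* stand-in for real work */
    if (success)
        atomic_store_explicit(&flag, true, memory_order_relaxed);
    return 0;
}

int main(void) {
    int inputs[] = { 1, 3, 4, 7 };
    thrd_t tc[4];
    for (int i = 0; i < 4; i++)      /* start multiple threads */
        thrd_create(&tc[i], check, &inputs[i]);
    for (int i = 0; i < 4; i++)      /* wait until each thread is done */
        thrd_join(tc[i], NULL);
    /* join synchronizes-with thread exit, so this read is race-free */
    if (atomic_load(&flag))
        puts("One of the threads was successful");
    else
        puts("None of the threads were successful");
    return 0;
}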
If the operation is atomic, you should be able to get by just fine. But I wouldn't do that in practice. It is better just to acquire a lock on the object and write the value.
Assuming that property will never be assigned anything other than 11, I don't see a reason for the assignment in the first place. Just make it a constant then.
Assignment only makes sense when you intend to change the value, unless the act of assignment itself has other side effects, like volatile writes have memory-visibility side effects in Java. And if you change state shared between multiple threads, then you need to synchronize or otherwise "handle" the problem of concurrency.
When you assign a value, without proper synchronization, to some state shared between multiple threads, then there are no guarantees for when the other threads will see that change. And no visibility guarantees means that it is possible the other threads will never see the assignment.
Compilers, JITs, CPU caches: they're all trying to make your code run as fast as possible, and if you don't make any explicit requirements for memory visibility, then they will take advantage of that. If not on your machine, then on somebody else's.
