How many functions can I declare in the Phaser game update loop?

Is there any limit on how many functions I can declare in the Phaser game update loop, and does performance decrease if there are a lot of functions in the update loop?

Declaring and Calling Functions
There's a difference between declaring a function
function foo(n) {
    return n + 1;
}
and calling a function:
var bar = foo(3);
If you really mean declare, you can indeed declare functions within update, since JavaScript supports nesting and closures:
function update() {
    function updateSomeThings() {
        // ...
    }
    function updateSomeOtherThings() {
        // ...
    }
}
This has negligible performance impact, since this snippet doesn't actually call any of these functions. If, however, you later called them in update:
updateSomeThings();
updateSomeOtherThings();
then yes, there is a cost.
Note: You don't have to declare functions within update itself to call them! You can call functions declared elsewhere, as long as they're in scope. It's worth looking at a JavaScript guide if this is too confusing.
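A minimal sketch of the difference, in plain JavaScript so it runs outside Phaser. The `state` object and the `updateEnemies`/`updateBullets` helpers are hypothetical stand-ins for your own game objects; in a Phaser 3 scene, `update` would be the scene's `update(time, delta)` method.

```javascript
// Hypothetical game state standing in for real sprites and physics bodies.
const state = { frame: 0, enemiesMoved: 0, bulletsMoved: 0 };

// Declared once, at module scope; update() only has to *call* them.
function updateEnemies(s) {
  s.enemiesMoved += 1; // real code would move enemy sprites here
}

function updateBullets(s) {
  s.bulletsMoved += 1; // real code would move bullets here
}

function update() {
  // Declaring a nested function runs none of its body. (A new function
  // object is still created each time update() runs, which is cheap.)
  function notCalled() {
    throw new Error('never runs, because nothing calls it');
  }

  // The calls are where the per-frame cost comes from.
  updateEnemies(state);
  updateBullets(state);
  state.frame += 1;
}

// Simulate three frames.
for (let i = 0; i < 3; i++) update();
console.log(state); // { frame: 3, enemiesMoved: 3, bulletsMoved: 3 }
```

The number of functions is not what matters; what matters is how much work the called ones do every frame.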
The Cost of Function Calls
Every function you call takes time to execute. The time it takes depends on how complex the function is (how much work it does), and it may call other functions which also take time to execute. This may be obvious, but the function's total execution time is the sum of the execution time of all the code within that function, including the time taken by any functions it calls (and that they call, and so on).
Frame Rate
Phaser by default will aim to run at 60 frames per second, which is pretty standard for games. This means it will try to update and draw your game 60 times every second. Phaser does other things apart from calling your update function each time, not least of which is drawing your game, but it also has other housekeeping to do. Depending on the game, the bulk of your frame time may end up being taken up by either updates or drawing.
Your update certainly needs to take less than 1/60th of a second (about 16.7 milliseconds), and even that assumes your game takes Phaser almost no time to draw.
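To see how much of that budget your update uses, you can time it directly. A minimal sketch, assuming a runtime with the standard `performance.now()` clock (browsers and recent Node.js both provide it); `timeCall` and `busyWork` are hypothetical names. The measured time includes everything the timed function calls, which is the "sum total" described above.

```javascript
const TARGET_FPS = 60; // Phaser's default target frame rate
const FRAME_BUDGET_MS = 1000 / TARGET_FPS; // ≈ 16.67 ms, shared by update AND drawing

// Measure how long one call of `fn` takes, including everything it calls.
function timeCall(fn) {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

// Hypothetical expensive helper: busy-waits for roughly `ms` milliseconds.
function busyWork(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // deliberately wasting time
  }
}

// An update that does almost nothing itself but calls an expensive helper.
function update() {
  busyWork(5);
}

const elapsed = timeCall(update);
const share = (100 * elapsed) / FRAME_BUDGET_MS;
console.log(`update took ${elapsed.toFixed(2)} ms (${share.toFixed(0)}% of the frame budget)`);
```

Here roughly a third of the frame disappears into a single helper call, leaving correspondingly less time for drawing.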
Some things you do in Phaser are slower than others. Some developers have been doing this long enough to estimate what is "too slow" to work, but many 2D games will be just fine without taking too much care over optimization (making things run more efficiently in terms of memory used or time taken).
Good and Bad Ideas
Some bad ideas: 50,000 sprites onscreen will often take far too long to draw, even if you never update them (though some machines are very powerful, especially when Phaser is set to use WebGL). 10,000 sprites bouncing and colliding with each other will often take far too long to update because of collision detection, even though some machines may be able to draw them just fine.
The best advice is to do everything you have to, but nothing you don't. Try to keep your design as simple as possible when getting started. Add complexity via interesting game mechanics, rather than by computationally expensive logic.
If all else fails, sometimes you can split work across multiple updates, or there may be some things you can do every other update or every n updates (which works best if there's different work you can do on the other updates, so you don't just have some updates slower than others).
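Both ideas can be sketched without Phaser. `FrameScheduler` is a hypothetical helper (not a Phaser API) that runs each job every n frames with an offset, so jobs with the same period land on different frames; `chunkedJob` is a hypothetical generator that spreads one large job over several updates.

```javascript
// Run jobs every `period` frames; different offsets put jobs on different frames.
class FrameScheduler {
  constructor() {
    this.frame = 0;
    this.jobs = [];
  }

  every(period, offset, fn) {
    this.jobs.push({ period, offset, fn });
  }

  // Call this once from your update().
  update() {
    for (const job of this.jobs) {
      if (this.frame % job.period === job.offset) job.fn(this.frame);
    }
    this.frame += 1;
  }
}

// Split one big job across updates: each next() processes at most `size` items
// and yields how many items have been processed so far.
function* chunkedJob(items, size, fn) {
  for (let i = 0; i < items.length; i += size) {
    items.slice(i, i + size).forEach(fn);
    yield Math.min(i + size, items.length);
  }
}

// Usage: AI on even frames, pathfinding on odd frames, so no frame does both.
const log = [];
const scheduler = new FrameScheduler();
scheduler.every(2, 0, (f) => log.push(`ai@${f}`));
scheduler.every(2, 1, (f) => log.push(`path@${f}`));
for (let i = 0; i < 4; i++) scheduler.update();
console.log(log); // [ 'ai@0', 'path@1', 'ai@2', 'path@3' ]

// A 10-item job processed 4 items per update needs three updates of real work.
let total = 0;
const job = chunkedJob([...Array(10).keys()], 4, (x) => { total += x; });
while (!job.next().done) {
  // in a game, each next() call would happen in a separate update()
}
console.log(total); // 45
```

Offsetting jobs is what keeps every frame's cost roughly even, instead of making every other frame slow.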

Related

Executing a function periodically on accurate and precise intervals

I want to implement an accurate and precise countdown timer for my application. I started with the most simple implementation, which was not accurate at all.
loop {
    // Code which can take up to 10 ms to finish
    ...
    let interval = std::time::Duration::from_millis(1000);
    std::thread::sleep(interval);
}
As the code before the sleep call can take some time to finish, I cannot run the next iteration at the intended interval. Even worse, if the countdown timer is run for 2 minutes, the 10 milliseconds from each iteration add up to 1.2 seconds. So, this version is not very accurate.
I can account for this delay by measuring how much time this code takes to execute.
loop {
    let start = std::time::Instant::now();
    // Code which can take up to 10 ms to finish
    ...
    let interval = std::time::Duration::from_millis(1000);
    // saturating_sub avoids a panic if the work ever takes longer than the interval
    std::thread::sleep(interval.saturating_sub(start.elapsed()));
}
Even though this seems to be precise to the millisecond, I wanted to know if there is a way to implement this that is even more accurate and precise, and/or how it is usually done in software.
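The effect of different scheduling choices can be checked with a small simulation in which no real time passes. This JavaScript sketch (the `simulate` function and its parameters are illustrative assumptions) models the two loops above plus a third option, sleeping until an absolute deadline `n * interval` measured from the start. `overshootMs` models the fact that OS sleeps may wake up late; Rust's documentation for `std::thread::sleep` notes that the thread may sleep longer than requested, but never less.

```javascript
// Simulated clock in milliseconds; nothing actually sleeps.
function simulate(strategy, ticks, { intervalMs = 1000, workMs = 10, overshootMs = 0 } = {}) {
  let now = 0;
  for (let n = 1; n <= ticks; n++) {
    const tickStart = now;
    now += workMs; // the loop body ("code which can take up to 10 ms")
    let sleepMs;
    if (strategy === 'fixed') {
      sleepMs = intervalMs; // first loop: always sleep the full interval
    } else if (strategy === 'compensated') {
      sleepMs = Math.max(0, intervalMs - (now - tickStart)); // second loop
    } else {
      sleepMs = Math.max(0, n * intervalMs - now); // sleep until the absolute deadline
    }
    now += sleepMs + overshootMs; // the OS wakes us up a little late
  }
  return now - ticks * intervalMs; // drift: how late the last tick ended
}

const twoMinutes = 120; // 120 one-second ticks
console.log(simulate('fixed', twoMinutes)); // 1200
console.log(simulate('compensated', twoMinutes, { overshootMs: 1 })); // 120
console.log(simulate('deadline', twoMinutes, { overshootMs: 1 })); // 1
```

The first loop reproduces the 1.2 seconds computed above. The compensated loop still drifts by one overshoot per tick, while deadline scheduling keeps the error to a single overshoot. In Rust, the deadline version amounts to sleeping for `next_deadline.saturating_duration_since(Instant::now())` rather than for `interval - start.elapsed()`.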
For precise timing, you basically have to busy-wait: while time.elapsed() < interval {}. This is also called "spinning" (you might have heard of a "spin lock"). Of course, this is far more CPU-intensive than using the OS-provided sleep functionality (which often puts the CPU into a low-power mode).
To improve upon that slightly, instead of doing absolutely nothing in the loop body, you could:
Call thread::yield_now().
Call std::hint::spin_loop().
Unfortunately, I can't really tell you what timing guarantees these two functions give you. But from the documentation it seems like spin_loop will result in more precise timing.
Also, you very likely want to combine the "spin waiting" with std::thread::sleep so that you sleep the majority of the time with the latter method. That saves a lot of power/CPU-resources. And hey, there is even a crate for exactly that: spin_sleep. You should probably just use that.
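The same sleep-then-spin idea, sketched in JavaScript for consistency with the other examples. `setTimeout` plays the role of the coarse OS sleep and a `performance.now()` loop is the spin; `sleepUntil`, `runEvery` and the 2 ms spin window are assumptions to tune. Deadlines are absolute, so lateness never accumulates. Spinning on a browser's main thread blocks rendering and input, so this only makes sense in a worker or a server process.

```javascript
// Coarse wait: cheap but imprecise (timers can fire late).
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Wait until `deadline` (a performance.now() timestamp): sleep for most of the
// time, then busy-wait ("spin") through the last `spinMs` milliseconds.
async function sleepUntil(deadline, spinMs = 2) {
  const coarseMs = deadline - performance.now() - spinMs;
  if (coarseMs > 0) await sleep(coarseMs);
  while (performance.now() < deadline) {
    // spin: burns CPU, but returns very close to the deadline
  }
}

// Call tick(n, lateMs) at start + n * intervalMs for n = 0 .. count - 1.
async function runEvery(intervalMs, count, tick) {
  const start = performance.now();
  for (let n = 0; n < count; n++) {
    const deadline = start + n * intervalMs;
    await sleepUntil(deadline);
    tick(n, performance.now() - deadline);
  }
}

// Usage: report how late each of five 100 ms ticks fired.
runEvery(100, 5, (n, lateMs) => console.log(`tick ${n}: ${lateMs.toFixed(3)} ms late`));
```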
Finally, just in case you are not aware: for several use cases of these "timings", there are other functions you can use. For example, if you want to render a frame every 60th of a second, you want to use some API that synchronizes your loop with the refresh rate/v-blanking of the monitor directly, instead of manually sleeping.

Lockless game engine with complete separation of update and render

I apologize up front for this long post, but as you can probably see I have been thinking about this for quite some time, and I feel I need some input from other people before my head explodes :-)
I have been experimenting for some time now with various ways of building a game engine which satisfies all the following criteria:
Complete separation of object updating and object rendering
Full determinism
Updating and rendering at individual speeds
No blocking on shared resources
Complete separation of object updating and object rendering
Separation of object updating and object rendering seems to be vital to ensure optimal usage of resources while sending data to the graphics API and swapping buffers.
Even if you want full parallelism across multiple CPU cores, it seems that this separation must still be managed.
Full determinism
Many game types, and especially multiplayer versions, must ensure full determinism. Otherwise players will experience different states of the same game effectively breaking the game logic. Determinism is required for game replays as well. And it is useful for other purposes where it is important that each run of a simulation produces the same result every time given the same starting conditions and inputs.
Updating and rendering at individual speeds
This is really a prerequisite for full determinism, as you cannot have the simulation depend on rendering speeds (i.e. the various monitor refresh rates, graphics adapter speed, etc.). Under optimal conditions the update speed should be set at a certain fixed interval (e.g. 25 updates per second, maybe less depending on the update type), and the rendering speed should be whatever the client's monitor refresh rate / graphics adapter allows.
This implies that a rendering speed higher than the update speed should be allowed. And while that sounds like a waste, there are known tricks to ensure that the added rendering cycles are not wasted (interpolation / extrapolation), which means that faster monitors / adapters would be rewarded with a more visually pleasing experience, as they should be.
Rendering speeds lower than update speed must also be allowed though, even if this does in fact result in wasted updating cycles - at least the added updating cycles are not all presented to the user. This is however necessary to ensure a smooth multiplayer experience even if the rendering in one of the clients slows to a sudden crawl for one reason or another.
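A common way to decouple the two rates is a fixed-timestep accumulator. A minimal single-threaded sketch; `FixedStepLoop` is a hypothetical name. The simulation only ever advances by the constant `stepMs` (40 ms here, i.e. the 25 updates per second mentioned above), which is what keeps it deterministic. The renderer calls `advance` with however long its own frame took and receives an interpolation factor `alpha`. A slow frame simply triggers several updates, so a slow renderer never holds the simulation back.

```javascript
class FixedStepLoop {
  constructor(stepMs, update) {
    this.stepMs = stepMs;   // constant simulation step: the only dt update() ever sees
    this.update = update;
    this.accumulator = 0;   // wall-clock time not yet simulated
    this.tick = 0;
  }

  // Call once per rendered frame with the real time since the previous frame.
  // Returns alpha in [0, 1): how far the renderer is between the last two states.
  advance(frameMs) {
    this.accumulator += frameMs;
    while (this.accumulator >= this.stepMs) {
      this.update(this.tick, this.stepMs);
      this.tick += 1;
      this.accumulator -= this.stepMs;
    }
    return this.accumulator / this.stepMs;
  }
}

// Usage: a 60 Hz renderer (≈16 ms frames) followed by one slow 100 ms frame.
const ticks = [];
const loop = new FixedStepLoop(40, (tick) => ticks.push(tick));
console.log(loop.advance(16));  // 0.4
console.log(loop.advance(16));  // 0.8
console.log(loop.advance(16));  // 0.2 (one update ran)
console.log(loop.advance(100)); // 0.7 (two updates ran)
console.log(ticks);             // [ 0, 1, 2 ]
```

In practice you would usually also cap the number of steps per frame, so that a very long stall cannot freeze the game while it catches up.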
No blocking on shared resources
If the other criteria mentioned above are to be implemented, it must also follow that we cannot allow rendering to wait for updating or vice versa. Of course it is painfully obvious that when 2 different threads share access to resources and one thread is updating some of these resources, it is impossible to guarantee that blocking will never take place. It is, however, possible to keep this blocking to an absolute minimum, for example when switching pointer references between a queue of updated objects and a queue of previously rendered objects.
So...
My question to all you skilled people in here is: Am I asking for too much?
I have been reading about ideas of these various topics on many sites. But always it seems that one part or the other is left out from the suggestions I've seen. And maybe the reason is that you cannot have it all without compromise.
I started this seemingly common quest a long time ago when I was putting my thoughts about it in this thread:
Thoughts about rendering loop strategies
Back then my first naive assumption was that it shouldn't matter if updating and reading happened simultaneously, since the variation in object state was so small that you wouldn't notice if one object was occasionally a step ahead of another.
Now I am somewhat wiser, but still confused at times.
The most promising and detailed description of a method that would allow for all my wishes to come through was this:
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/
A three-state model that ensures the renderer can always choose a new queue for rendering without any wait (except perhaps a microsecond while switching pointer references). At the same time, the updater can always gain access to the 2 queues required for building the next state tree (1 queue for creating/updating the next state, and 1 queue for reading the previous one, which can be done even while the renderer reads it as well).
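One common lock-free way to implement a three-buffer handoff (not necessarily the exact design in the linked post) is for the updater and the renderer to each own one buffer index and to swap it atomically with a shared "middle" slot. The sketch below shows only that index protocol, run single-threaded; in a real engine the slot would be shared between workers and the buffers themselves would need to be shared memory too. `TripleBuffer`, `back`, `publish` and `latest` are hypothetical names.

```javascript
const INDEX_MASK = 0b011; // which of the three buffers
const FRESH = 0b100;      // set while the middle slot holds a state the renderer hasn't taken

class TripleBuffer {
  constructor(makeState) {
    this.buffers = [makeState(), makeState(), makeState()];
    // The only shared word: the middle buffer's index plus the FRESH flag.
    this.middle = new Int32Array(new SharedArrayBuffer(4));
    this.middle[0] = 1;
    this.writeIdx = 0; // owned by the updater only
    this.readIdx = 2;  // owned by the renderer only
  }

  // Updater: the buffer to build the next state into.
  back() {
    return this.buffers[this.writeIdx];
  }

  // Updater: hand over the finished state and take the middle buffer in exchange.
  publish() {
    const old = Atomics.exchange(this.middle, 0, this.writeIdx | FRESH);
    this.writeIdx = old & INDEX_MASK;
  }

  // Renderer: newest published state; swaps only when something new arrived.
  latest() {
    if (Atomics.load(this.middle, 0) & FRESH) {
      const old = Atomics.exchange(this.middle, 0, this.readIdx);
      this.readIdx = old & INDEX_MASK;
    }
    return this.buffers[this.readIdx];
  }
}

// Usage: the updater publishes three states before the renderer looks.
const tb = new TripleBuffer(() => ({ tick: 0 }));
for (let t = 1; t <= 3; t++) {
  tb.back().tick = t;
  tb.publish();
}
console.log(tb.latest().tick); // 3: the renderer skips straight to the newest state
```

Neither side ever waits; each swap is a single atomic exchange. The buffer the updater gets back holds an older state, so the updater has to copy or rebuild into it, which is part of why extra copies creep into designs like the one described here.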
I recently found time to make a sample implementation of this, and it works very well, but for two issues.
One is a minor issue of having to deal with multiple references to all involved objects.
The other is more serious (unless I'm just being too needy): extrapolation, as opposed to interpolation, is used to maintain a visually pleasing representation of the states given a fast screen refresh rate. While both methods do the job of showing states that deviate from the solidly calculated object states, extrapolation seems to me to produce much more visible artifacts when the predictions fail to represent reality. My position seems to be supported by this:
http://gafferongames.com/networked-physics/snapshots-and-interpolation/
And it is not possible to implement interpolation in the three-state design as far as I can tell, since it requires the renderer to have read-access to 2 queues at all times to calculate the intermediate state between two known states.
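For reference, the interpolation step itself is small once the renderer can see both the previous and the current simulation state: it blends them by a factor `alpha` in [0, 1], for example how far wall-clock time has progressed into the next update interval. A minimal sketch, assuming each state is a `Map` from entity id to `{ x, y }`; `interpolateStates` is a hypothetical name.

```javascript
function lerp(a, b, t) {
  return a + (b - a) * t;
}

// prev/curr: Map of entity id -> { x, y } for two consecutive simulation steps.
// alpha: 0 draws the previous state, 1 draws the current one.
function interpolateStates(prev, curr, alpha) {
  const out = new Map();
  for (const [id, c] of curr) {
    const p = prev.get(id);
    // An entity spawned this step has no previous state, so draw it as-is.
    out.set(id, p ? { x: lerp(p.x, c.x, alpha), y: lerp(p.y, c.y, alpha) } : { ...c });
  }
  return out;
}

const prev = new Map([['ship', { x: 0, y: 0 }]]);
const curr = new Map([['ship', { x: 10, y: 20 }], ['bullet', { x: 5, y: 5 }]]);
const frame = interpolateStates(prev, curr, 0.25);
console.log(frame.get('ship')); // { x: 2.5, y: 5 }
```

The price of interpolation is that what is drawn lags up to one update behind the newest state; extrapolation avoids that lag at the cost of the mispredictions described above.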
So I was toying with extending the three-state model suggested on the slapware blog to use interpolation instead of extrapolation, while also trying to simplify the multi-reference structure. While it seems to me to be possible, I am wondering if the price is too high. In order to meet all my goals I would need to have:
2 queues (or states) exclusively held by the renderer (they could be used by another thread for read-only purposes, but never updated or switched during rendering)
1 queue (or state) with the newest updated state ready to switch over to the renderer, when it is done rendering the current scene
1 queue (or state) with the next frame being built/updated by the updater
1 queue (or state) containing a copy of the frame last built/updated. This is the same state as last sent to the renderer, so this queue/state should be accessible by both the updater for reading the previous state and the renderer for rendering the state.
So that would mean that I should keep at all times 4 copies of render states to be able to keep this design running smoothly, locklessly, deterministically.
I fear that I'm overthinking this. So if any of you have advice to bring me back down to earth, suggestions for what can be improved, critique of the design, or perhaps references to good resources explaining how these goals can be achieved, or why this is or isn't a good idea, please hit me with them :-)

Limiting work in progress of parallel operations of a streamed resource

I've found myself recently using the SemaphoreSlim class to limit the work in progress of a parallelisable operation on a (large) streamed resource:
// The below code is an example of the structure of the code, there are some
// omissions around handling of tasks that do not run to completion that should be in production code
SemaphoreSlim semaphore = new SemaphoreSlim(Environment.ProcessorCount * someMagicNumber);
foreach (var result in StreamResults())
{
    semaphore.Wait();
    var task = DoWorkAsync(result).ContinueWith(t => semaphore.Release());
    ...
}
This is to avoid bringing too many results into memory and the program being unable to cope (generally evidenced via an OutOfMemoryException). Though the code works and is reasonably performant, it still feels ungainly. Notably the someMagicNumber multiplier, which although tuned via profiling, may not be as optimal as it could be and isn't resilient to changes to the implementation of DoWorkAsync.
In the same way that thread pooling can overcome the obstacle of scheduling many things for execution, I would like something that can overcome the obstacle of scheduling many things to be loaded into memory based on the resources that are available.
Since it is deterministically impossible to decide whether an OutOfMemoryException will occur, I appreciate that what I'm looking for may only be achievable via statistical means or even not at all, but I hope that I'm missing something.
Here I'd say that you're probably overthinking this problem. The consequences of overshooting are rather high (the program crashes). The consequences of being too low are that the program might be slowed down. As long as you still have some buffer beyond a minimum value, further increases to the buffer will generally have little to no effect, unless the processing time of the task in the pipe is extraordinarily volatile.
If your buffer is constantly filling up, it generally means that the task before it in the pipe executes quite a bit quicker than the task that follows it, so even a fairly small buffer is likely to ensure that the following task always has some work. The buffer size needed to get 90% of the benefits of a buffer is usually going to be quite small (a few dozen items maybe), whereas the size needed to get an OOM error is something like 6+ orders of magnitude higher. As long as you're somewhere in between those two numbers (and that's a pretty big range to land in), you'll be just fine.
Just run your static tests, pick a static number, maybe add a few percent extra for "just in case" and you should be good. At most, I'd move some of the magic numbers to a config file so that they can be altered without a recompile in the event that the input data or the machine specs change radically.
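The same structure, a fixed limit on work in progress with items pulled from the stream only when a slot frees up, can be sketched in JavaScript. `processWithLimit` and `streamResults` are hypothetical names, and `limit` is the static number the answer suggests picking from tests (and possibly reading from a config file). Like the original, this sketch does not handle failures beyond letting `Promise.all` reject.

```javascript
async function processWithLimit(items, limit, worker) {
  const iterator = items[Symbol.iterator]();
  // Each slot pulls the next item only after finishing its previous one,
  // so at most `limit` items are loaded and in flight at once.
  async function slot() {
    for (let next = iterator.next(); !next.done; next = iterator.next()) {
      await worker(next.value);
    }
  }
  await Promise.all(Array.from({ length: limit }, () => slot()));
}

// A lazily generated "stream" of 20 results.
function* streamResults() {
  for (let i = 0; i < 20; i++) yield i;
}

let inFlight = 0;
let maxInFlight = 0;
let processed = 0;
const done = processWithLimit(streamResults(), 3, async () => {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated async work
  inFlight -= 1;
  processed += 1;
});
done.then(() => console.log({ processed, maxInFlight })); // { processed: 20, maxInFlight: 3 }
```

Memory use is bounded by the limit rather than by the size of the stream, and changing the limit does not require touching the worker.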

What are the benefits of coroutines?

I've been learning some Lua for game development. I had heard about coroutines in other languages but only really came across them in Lua. I just don't really understand how useful they are. I've heard a lot of talk about how they can be a way to do multi-threaded things, but aren't they run in order? So what benefit would they have over normal functions that also run in order? I'm just not getting how different they are from functions except that they can pause and let another run for a second. It seems to me that the use cases wouldn't be that big.
Anyone care to shed some light as to why someone would benefit from them?
Especially insight from a game programming perspective would be nice^^
OK, think in terms of game development.
Let's say you're doing a cutscene or perhaps a tutorial. Either way, what you have is an ordered sequence of commands sent to some number of entities. An entity moves to a location, talks to a guy, then walks elsewhere. And so forth. Some commands cannot start until others have finished.
Now look back at how your game works. Every frame, it must process AI, collision tests, animation, rendering, and sound, among possibly other things. You can only think every frame. So how do you put this kind of code in, where you have to wait for some action to complete before doing the next one?
If you built a system in C++, what you would have is something that ran before the AI. It would have a sequence of commands to process. Some of those commands would be instantaneous, like "tell entity X to go here" or "spawn entity Y here." Others would have to wait, such as "tell entity Z to go here and don't process any more commands until it has gone here." The command processor would have to be called every frame, and it would have to understand complex conditions like "entity is at location" and so forth.
In Lua, it would look like this:
local entityX = game:GetEntity("entityX");
entityX:GoToLocation(locX);
local entityY = game:SpawnEntity("entityY", locY);
local entityZ = game:GetEntity("entityZ");
entityZ:GoToLocation(locZ);
-- Yield back to the engine once per frame until entityZ arrives.
repeat
    coroutine.yield();
until entityZ:isAtLocation(locZ);
return;
On the C++ side, you would resume this script once per frame until it is done. Once it returns, you know that the cutscene is over, so you can return control to the user.
Look at how simple that Lua logic is. It does exactly what it says it does. It's clear, obvious, and therefore very difficult to get wrong.
The power of coroutines is in being able to partially accomplish some task, wait for a condition to become true, then move on to the next task.
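JavaScript generators behave much like Lua coroutines here (`yield` suspends, `next()` resumes), so the same cutscene pattern can be sketched in JavaScript. `Entity`, `cutscene` and `runCutscene` are hypothetical names, and each entity moves one unit per frame.

```javascript
// Hypothetical entity that moves one unit per frame toward its target.
class Entity {
  constructor(x = 0) {
    this.x = x;
    this.target = x;
  }
  goTo(x) { this.target = x; }                          // instantaneous command
  step() { this.x += Math.sign(this.target - this.x); } // per-frame movement
  isAt(x) { return this.x === x; }
}

// The cutscene reads top to bottom; each `yield` means "continue next frame".
function* cutscene(z) {
  z.goTo(3);
  while (!z.isAt(3)) yield; // wait, across frames, until z arrives
  z.goTo(0);
  while (!z.isAt(0)) yield;
}

// Engine side: each frame, update the world, then resume the script once.
// Returns the number of frames it took for the script to finish.
function runCutscene(entities, script) {
  let frames = 0;
  let done = false;
  while (!done) {
    entities.forEach((e) => e.step());
    done = script.next().done;
    frames += 1;
  }
  return frames;
}

const z = new Entity(0);
console.log(runCutscene([z], cutscene(z)), z.x); // 7 0
```

The script holds no explicit state machine; the position of the suspended generator is the state.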
Coroutines in a game:
Easy to use, easy to screw up when used in many places.
Just be careful not to use them in too many places.
Don't make your entire AI code dependent on coroutines.
Coroutines are good for making a quick fix when a state is introduced which did not exist before.
This is exactly what Java's sleep() and wait() do.
Both functions are the best ways to make it impossible to debug your game.
If I were you I would completely avoid any code which has to use a wait() function like a coroutine does.
The OpenGL API is something you should take note of. It never uses a wait() function but instead uses a clean state machine which knows exactly what state each object is in.
If you use coroutines you end up with so many stateless pieces of code that they will most surely be overwhelming to debug.
Coroutines are good when you are making an application like a text editor, a banking application, a server, a database, etc. (not a game).
They are bad when you are making a game, where anything can happen at any point in time and you need to have states.
So, in my view coroutines are a bad way of programming and an excuse to write small stateless code.
But that's just me.
It's more like a religion. Some people believe in coroutines, some don't. The usecase, the implementation and the environment all together will result into a benefit or not.
Don't trust benchmarks which try to prove that coroutines on a multicore CPU are faster than a loop in a single thread: it would be a shame if they were slower!
If this later runs on hardware where all cores are always under load, it will turn out to be slower. Oops...
So there is no benefit per se.
Sometimes it's convenient to use. But if you end up with tons of coroutines yielding and states that went out of scope you'll curse coroutines. But at least it isn't the coroutines framework, it's still you.
We use them on a project I am working on. The main benefit for us is that sometimes with asynchronous code, there are points where it is important that certain parts are run in order because of some dependencies. If you use coroutines, you can force one process to wait for another process to complete. They aren't the only way to do this, but they can be a lot simpler than some other methods.
"I'm just not getting how different they are from functions except that they can pause and let another run for a second."
That's a pretty important property. I worked on a game engine which used them for timing. For example, we had an engine that ran at 10 ticks a second, and you could WaitTicks(x) to wait x number of ticks, and in the user layer, you could run WaitFrames(x) to wait x frames.
Even professional native concurrency libraries use the same kind of yielding behaviour.
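A helper like `WaitTicks(x)` can be a nested coroutine that yields x times. A sketch with JavaScript generators, where `waitTicks`, `blink` and `runUntilDone` are hypothetical names; `yield*` passes the helper's yields straight through to whoever resumes the outer script.

```javascript
// Suspend the calling script for n ticks.
function* waitTicks(n) {
  for (let i = 0; i < n; i++) yield;
}

function* blink(log) {
  log.push('on');
  yield* waitTicks(3); // suspend this script for 3 ticks
  log.push('off');
}

// Engine side: resume the script once per tick; returns the tick it finished on.
function runUntilDone(script, maxTicks) {
  for (let tick = 0; tick < maxTicks; tick++) {
    if (script.next().done) return tick;
  }
  return -1; // still running after maxTicks
}

const log = [];
console.log(runUntilDone(blink(log), 10), log); // 3 [ 'on', 'off' ]
```

Driving the same script from a per-frame loop instead of a per-tick loop turns the helper into a `WaitFrames` equivalent.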
Lots of good examples for game developers. I'll give another from the application extension space. Consider the scenario where the application has an engine that can run a user's routines in Lua while doing the core functionality in C. If the user needs to wait for the engine to get to a specific state (e.g. waiting for data to be received), you either have to:
multi-thread the C program to run Lua in a separate thread and add in locking and synchronization methods,
abend (abnormally end) the Lua routine and retry from the beginning with a state passed to the function so it can skip what was already done, lest you rerun some code that should only be run once, or
yield the Lua routine and resume it once the state has been reached in C
The third option is the easiest for me to implement, avoiding the need to handle multi-threading on multiple platforms. It also allows the user's code to run unmodified, appearing as if the function they called took a long time.
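The third option can be sketched with JavaScript generators standing in for Lua coroutines; as with Lua's `coroutine.resume(co, ...)`, the value passed when resuming becomes the result of the `yield`. `userRoutine` and `host` are hypothetical names, and the event list stands in for the C engine's own loop.

```javascript
// User routine: straight-line code; `yield` hands control back to the host and
// evaluates to whatever value the host resumes it with.
function* userRoutine(out) {
  const packet = yield { waitFor: 'data' }; // reads like a slow, blocking call
  out.push(`got ${packet}`);
}

// Host (the engine): runs its own event loop and resumes the routine only once
// the state the routine asked for has been reached.
function host(routine, events) {
  let step = routine.next(); // run the routine up to its first yield
  for (const event of events) {
    if (step.done) break;
    if (event.type === step.value.waitFor) {
      step = routine.next(event.payload); // resume, passing the data in
    }
  }
  return step.done;
}

const out = [];
const finished = host(userRoutine(out), [{ type: 'tick' }, { type: 'data', payload: 42 }]);
console.log(finished, out); // true [ 'got 42' ]
```

The user's code contains no threads or locks; only the host decides when it continues.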

Progress bar and multiple threads, decoupling GUI and logic - which design pattern would be the best?

I'm looking for a design pattern that would fit my application design.
My application processes large amounts of data and produces some graphs.
Data processing (fetching from files, CPU-intensive calculations) and graph operations (drawing, updating) are done in separate threads.
The graph can be scrolled; in this case new portions of data need to be processed.
Because there can be several series on a graph, multiple threads can be spawned (two threads per series, one for dataset update and one for graph update).
I don't want to create multiple progress bars. Instead, I'd like to have a single progress bar that informs about global progress. At the moment I can think of MVC and Observer/Observable, but it's a little bit blurry :) Maybe somebody could point me in the right direction, thanks.
I once spent the best part of a week trying to make a smooth, non-hiccupy progress bar over a very complex algorithm.
The algorithm had 6 different steps. Each step had timing characteristics that were seriously dependent on A) the underlying data being processed, not just the "amount" of data but also the "type" of data and B) 2 of the steps scaled extremely well with increasing number of cpus, 2 steps ran in 2 threads and 2 steps were effectively single-threaded.
The mix of data effectively had a much larger impact on execution time of each step than number of cores.
The solution that finally cracked it was really quite simple. I made 6 functions that analyzed the data set and tried to predict the actual run-time of each analysis step. The heuristic in each function analyzed both the data sets under analysis and the number of cpus. Based on run-time data from my own 4 core machine, each function basically returned the number of milliseconds it was expected to take, on my machine.
f1(..) + f2(..) + f3(..) + f4(..) + f5(..) + f6(..) = total runtime in milliseconds
Now given this information, you can effectively know what percentage of the total execution time each step is supposed to take. Now if you say step1 is supposed to take 40% of the execution time, you basically need to find out how to emit 40 1% events from that algorithm. Say the for-loop is processing 100,000 items, you could probably do:
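// Emit one 1% progress event every (numItems / percentageOfTotalForThisStep) items;
// clamp to at least 1 so a small step can't cause a modulo by zero.
int itemsPerEvent = numItems / percentageOfTotalForThisStep;
if (itemsPerEvent < 1) itemsPerEvent = 1;
for (int i = 0; i < numItems; i++) {
    if (i % itemsPerEvent == 0) emitProgressEvent();
    // .. do the actual processing ..
}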
This algorithm gave us a silky smooth progress bar that performed flawlessly. Your implementation technology can have different forms of scaling and features available in the progress bar, but the basic way of thinking about the problem is the same.
And yes, it did not really matter that the heuristic reference numbers were worked out on my machine - the only real problem is if you want to change the numbers when running on a different machine. But you still know the ratio (which is the only really important thing here), so you can see how your local hardware runs differently from the one I had.
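The bookkeeping behind this can be sketched as follows: each step's share of the bar is its predicted runtime divided by the total from the f1..f6 sum above, and overall progress is the share of all finished steps plus the completed fraction of the current one. `makeProgress` is a hypothetical helper; only the ratios between the predictions matter, so their units can be "milliseconds on my machine".

```javascript
// predictedMs: predicted runtime of each step, e.g. the results of f1(..) .. f6(..).
function makeProgress(predictedMs) {
  const total = predictedMs.reduce((a, b) => a + b, 0);
  const starts = []; // where each step begins, as a fraction of the whole run
  let acc = 0;
  for (const ms of predictedMs) {
    starts.push(acc / total);
    acc += ms;
  }
  // step: index of the running step; fraction: 0..1 of that step completed.
  return (step, fraction) => starts[step] + (predictedMs[step] / total) * fraction;
}

// Three steps predicted at 400, 300 and 300 ms: step 0 owns 40% of the bar.
const progress = makeProgress([400, 300, 300]);
console.log(progress(0, 0.5)); // 0.2
console.log(progress(1, 0));   // 0.4
```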
Now the average SO reader may wonder why on earth someone would spend a week making a smooth progress bar. The feature was requested by the head salesman, and I believe he used it in sales meetings to get contracts. Money talks ;)
In situations with threads or asynchronous processes/tasks like this, I find it helpful to have an abstract type or object in the main thread that represents (and ideally encapsulates) each process. So, for each worker thread, there will presumably be an object (let's call it Operation) in the main thread to manage that worker, and obviously there will be some kind of list-like data structure to hold these Operations.
Where applicable, each Operation provides the start/stop methods for its worker, and in some cases - such as yours - numeric properties representing the progress and expected total time or work of that particular Operation's task. The units don't necessarily need to be time-based, if you know you'll be performing 6,230 calculations, you can just think of these properties as calculation counts. Furthermore, each task will need to have some way of updating its owning Operation of its current progress in whatever mechanism is appropriate (callbacks, closures, event dispatching, or whatever mechanism your programming language/threading framework provides).
So while your actual work is being performed off in separate threads, a corresponding Operation object in the "main" thread is continually being updated/notified of its worker's progress. The progress bar can update itself accordingly, mapping the total of the Operations' "expected" times to its total, and the total of the Operations' "progress" times to its current progress, in whatever way makes sense for your progress bar framework.
Obviously there's a ton of other considerations/work that needs to be done in actually implementing this, but I hope this gives you the gist of it.
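A minimal sketch of that arrangement, which is also the Observer/Observable approach the question mentions: each `Operation` is the main-thread handle for one worker, and a `ProgressTracker` sums expected and completed work and notifies subscribers such as the single progress bar. All names are hypothetical, and `report` would be called from whatever callback or message mechanism your threading framework provides.

```javascript
// Main-thread handle for one worker's task.
class Operation {
  constructor(name, expectedWork, onChange) {
    this.name = name;
    this.expected = expectedWork; // any unit: items, calculations, estimated ms...
    this.done = 0;
    this.onChange = onChange;
  }

  // Call from the worker's progress callback or message handler.
  report(doneWork) {
    this.done = Math.min(doneWork, this.expected);
    this.onChange();
  }
}

// Observable that aggregates all Operations for a single progress bar.
class ProgressTracker {
  constructor() {
    this.operations = [];
    this.observers = [];
  }

  subscribe(observer) {
    this.observers.push(observer);
  }

  add(name, expectedWork) {
    const op = new Operation(name, expectedWork, () => this.notify());
    this.operations.push(op);
    this.notify();
    return op;
  }

  fraction() {
    const expected = this.operations.reduce((sum, op) => sum + op.expected, 0);
    const done = this.operations.reduce((sum, op) => sum + op.done, 0);
    return expected === 0 ? 0 : done / expected;
  }

  notify() {
    const f = this.fraction();
    this.observers.forEach((observer) => observer(f));
  }
}

// Usage: one series with a dataset thread and a graph thread.
const tracker = new ProgressTracker();
tracker.subscribe((f) => console.log(`progress: ${(f * 100).toFixed(1)}%`));
const dataset = tracker.add('dataset', 100); // progress: 0.0%
const graph = tracker.add('graph', 300);     // progress: 0.0%
dataset.report(100);                         // progress: 25.0%
graph.report(150);                           // progress: 62.5%
```

Because scrolling spawns new work, adding an Operation mid-run lowers the overall fraction; you may want the bar to clamp so it never visibly moves backwards.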
Multiple progress bars aren't such a bad idea, mind you. Or maybe a complex progress bar that shows several threads running (like download manager programs sometimes have). As long as the UI is intuitive, your users will appreciate the extra data.
When I try to answer such design questions I first try to look at similar or analogous problems in other applications, and how they're solved. So I would suggest you do some research by considering other applications that display complex progress (like the download manager example) and try to adapt an existing solution to your application.
Sorry I can't offer more specific design, this is just general advice. :)
Stick with Observer/Observable for this kind of thing. Some object observes the various series processing threads and reports status by updating the summary bar.
