How does Erlang sleep (at night?) - multithreading

I want to run a small clean up process every few hours on an Erlang server.
I know of the timer module. A tutorial I saw used chained timer:sleep commands to wait for an event that would occur several days later, which I found strange. I understand that Erlang processes are unique compared to those in other languages, but the idea of a process/thread sleeping for days, weeks, or even months at a time seemed odd.
So I set out to find out the details of what sleeping actually does. The closest I found was a blog post mentioning that sleep is implemented with a receive timeout, but that still left the question:
What do these sleep/sleep-like functions actually do?
Is my process taking up resources as it sleeps? Would having thousands of sleeping processes use as many resources as, say, thousands of processes servicing a recursive call that did nothing? Is there any performance penalty from repeatedly sleeping within processes, or from sleeping for long periods of time? Is the VM constantly expending resources to check whether the conditions to end a process's sleep have been met?
And as a side note, I'd appreciate it if someone could comment on whether there is a better way than sleeping to pause for hours or days at a time.

That is the karma of any Erlang process: it waits or dies :o)
When a process is spawned, it executes until the last line of its function and then dies (the value of the last evaluation is discarded).
To keep a process alive, there is no alternative to looping recursively in a never-ending succession of calls.
Of course, several conditions make it stop or sleep:
End of the loop: the process received a message telling it to stop recursing.
A receive block: the process waits until a message matching one of the clauses in the receive block arrives in its message queue.
The VM scheduler suspends it temporarily to give other processes access to the CPU.
In the last two cases, execution resumes under the responsibility of the VM scheduler.
While waiting, a process uses no CPU time, but it keeps exactly the same memory layout it had when it started waiting. Erlang/OTP offers a way to reduce this memory footprint to a minimum through the hibernate option (see the documentation of gen_server or gen_fsm, though in my opinion it is for advanced usage only).
A simple way to create a "signal" that fires a process at regular (or almost regular) intervals is indeed to use a receive block with a timeout (the timeout value is limited to 16#FFFFFFFF ms, roughly 49.7 days), for example:
-module(util).
-export([on_tick_sec/4, on_tick_mn/4, on_tick_hr/4]).
%% Convenience wrappers: Period is expressed in seconds, minutes or hours.
on_tick_sec(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,1000,Period,0).
on_tick_mn(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,60000,Period,0).
on_tick_hr(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,60000,Period*60,0).
%% Period time bases have elapsed: run the action and reset the counter.
on_tick(Module,Function,Arglist,TimeBase,Period,Period) ->
    apply(Module,Function,Arglist),
    on_tick(Module,Function,Arglist,TimeBase,Period,0);
%% Otherwise wait one time base (in ms), or stop when told to.
on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase) ->
    receive
        stop -> stopped
    after TimeBase ->
        on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase+1)
    end.
and usage:
1> Pid = spawn(util,on_tick_sec,[io,format,["hello~n"],5]).
<0.40.0>
hello
hello
hello
hello
2> Pid ! stop.
stop
3>
[edit]
The timer module is a standard gen_server running in a separate process. Most of the functions in the timer module are public interfaces that execute a hidden gen_server:call or gen_server:cast to the timer server. This is a common way to hide the internals of a server and allow further evolution without impact on existing applications.
Internally, the server uses an ets table to store all the actions it has to perform along with each timer reference, and it uses its own function to be woken up when needed (in the end, the VM must take care of this?).
So you can hibernate a process without any effect on the timer server's behavior. The hibernation mechanism is:
tricky: see the documentation of erlang:hibernate/3. You will see that you have to "rebuild" the context yourself, since everything is removed from the process context and only a tuple {Module, Function, Arguments} is stored by the system to restart your process when needed.
costly: it takes some time in garbage collection and process restart.
That is why I said it is really an advanced feature that needs a good reason to be used.

There is also erlang:hibernate/3, which puts a process into "deep sleep", minimizing its memory usage.

Related

How can I monitor stalled tasks?

I am running a Rust app with Tokio in prod. In the last version I had a bug, and some requests caused my code to go into an infinite loop.
While the task that entered the loop was stuck, all the other tasks continued to work well and process requests. That went on until the number of stalled tasks was high enough to make my program unresponsive.
My problem is that it took our monitoring systems a long time to identify that something had gone wrong. For example, the task that answers Kubernetes' health checks kept working, so I wasn't able to tell that I had stalled tasks in my system.
So my question is whether there is a way to identify and alert on such cases.
If I could find a way to define a timeout on a task, so that a task that does not return to the scheduler within X seconds/milliseconds is marked as stalled, that would be a good enough solution for me.
Using tracing might be an option here: following issue 2655, every tokio task should have a span. Alongside tracing-futures, this means you should get a tracing event every time a task is entered or suspended (see this example). By adding the relevant data (e.g. task id / request id / ...), you should then be able to feed this information to an analysis tool in order to know:
that a task is blocked (was resumed then never suspended again)
if you add your own spans, that a "userland" span was never exited / closed, which might mean it's stuck in a non-blocking loop (which is also an issue though somewhat less so)
I think that's about the extent of it: as noted by issue 2510, tokio doesn't yet use the tracing information it generates and so provides no "built-in" introspection facilities.

How to call a function every n milliseconds in "real world" time exactly?

If I understand correctly, setInterval(() => console.log('hello world'), 1000) will place the function to some queue of tasks to run. But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
In a single complex program, is it possible to also make calls to some function every n millisecond exactly in real world time with node.js ?
If I understand correctly, setInterval(() => console.log('hello world'), 1000) will place the function to some queue of tasks to run. But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
That is correct. It won't run exactly at the desired time if node.js happens to be busy doing something else when the timer is ready to run. node.js will wait until it finishes its other task before running the timer callback. You can think of node.js as if it has a one-track mind (can only do one thing at a time) and timers don't ever interrupt existing tasks that are running.
In a single complex program, is it possible to also make calls to some function every n millisecond exactly in real world time with node.js ?
No, it is not possible to do that in node.js. node.js runs your JavaScript single-threaded; it is event-driven and non-preemptive. All of this means that you cannot rely on code running at a precise real-world time.
What happens under the covers in node.js is that you set a timer for a specific time in the future. That timer is registered with the node.js event loop so that each time it gets through the event loop, it will check if there are any pending timers. But it only gets through the event loop when other code that was running before the timer was ready to fire finishes running. Here's the sequence of events:
Run some code
Set timer for some time in the future (say time X)
Run some more code
Nothing to do for a while
Run some more code (while this code is running, time X passes - the time for your timer to run)
Previous block of code finishes running and control returns back to the node.js event loop at time X + n (some time after the timer X was supposed to fire).
Event loop checks to see if there are any pending timers. It finds a timer and calls its callback at time X + n.
So, the only way that your timer gets called at approximately time X is if node.js has nothing else to do at exactly time X. If your program is ever doing anything else, you can't guarantee that it will be free at exactly time X to run the timer exactly when you want it to run. node.js is NOT a real-time system in any way. Single-threaded and non-preemptive mean that a timer may have to wait for node.js to finish some other things before it gets to run, and thus there is no guarantee that the timer will run exactly on time. Instead, it will run no earlier than time X, the next time the interpreter is free to return to the event loop (done running whatever else might have been running at the time). This could be close to time X or it could be a significant time after time X.
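The sequence described above can be reproduced with a small experiment. This is only a minimal sketch, not part of the original answer: measureTimerDelay is a hypothetical helper name, and the 50 ms and 200 ms values are arbitrary. It sets a timer, then keeps the event loop busy with a synchronous loop, and reports how long the callback actually took to fire.

```javascript
// Hypothetical helper: resolves with the number of milliseconds that really
// passed between setting a timer and its callback running.
function measureTimerDelay(timerMs, busyMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();

    // Ask for the callback after timerMs ("time X" in the answer).
    setTimeout(() => {
      resolve(Date.now() - scheduledAt);
    }, timerMs);

    // Simulate "some more code" that is still running when time X passes.
    // Nothing else, including the timer above, can run during this loop.
    const busyUntil = Date.now() + busyMs;
    while (Date.now() < busyUntil) {
      // busy-wait on purpose
    }
  });
}

measureTimerDelay(50, 200).then((elapsedMs) => {
  console.log('timer requested after 50 ms, fired after ' + elapsedMs + ' ms');
});
```

The reported value should be at least 200 ms, because the callback cannot run before the busy loop returns control to the event loop.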
If you really need something to run precisely at a specific time, then you likely need a pre-emptive system (not node.js) that is much more real-time than node.js is.
You could create a "work-around" in node.js by firing up another node.js process (you could use the child_process module) and starting a program in that other process that has nothing else to do except serve your timer and execute the code associated with that timer. Then, at least your timer won't be held up by some other JavaScript task that might be running and will get to run pretty close to the desired time. Keep in mind that even this work-around still isn't a true real-time system, but it might serve some purposes.
Otherwise, you probably want to write this in a more real-time system language that has pre-emptive timers (probably even with thread priorities).
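If exact timing is out of reach but long-run drift is the real concern, one common mitigation is to re-arm a setTimeout against the wall clock instead of relying on setInterval. This is a different technique from the ones proposed in the answers here, and only a sketch: nextDelay and setAlignedInterval are hypothetical names, and individual ticks can still be late for all the reasons given above.

```javascript
// Milliseconds from nowMs until the next point on the grid
// startMs + intervalMs, startMs + 2 * intervalMs, ...
function nextDelay(startMs, intervalMs, nowMs) {
  const ticksElapsed = Math.floor((nowMs - startMs) / intervalMs);
  const nextTickAt = startMs + (ticksElapsed + 1) * intervalMs;
  return nextTickAt - nowMs;
}

// Calls fn on that grid and returns a function that cancels the schedule.
// Grid points missed while the event loop was blocked are skipped, not replayed.
function setAlignedInterval(fn, intervalMs) {
  const startMs = Date.now();
  let handle = null;
  let cancelled = false;

  function schedule() {
    handle = setTimeout(() => {
      fn();
      if (!cancelled) {
        schedule(); // re-arm relative to the clock, not to the late callback
      }
    }, nextDelay(startMs, intervalMs, Date.now()));
  }

  schedule();
  return function cancel() {
    cancelled = true;
    clearTimeout(handle);
  };
}
```

A late tick shortens the following delay, so lateness does not accumulate from one tick to the next. A production version would probably also guard against a callback that fires a millisecond early and is then followed by a very short delay.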
But if there are other tasks in-front of it, it won't run exactly at 1000 millisecond or every time.
Your question is actually operating system specific, assuming the computer is running some (usual) operating system (like Windows, Android, Linux, MacOSX, etc...). I recommend reading Operating Systems: Three Easy Pieces to learn more.
In practice, your computer has many other processes managed by its operating system. Some of them might be running. Your computer might be in a situation where it is loaded enough by other processes to the point of not being able to run your tasks or threads exactly every second. Read about thrashing.
You might want to use some genuine real-time operating system. But then, node.js probably won't run on it.
How to call a function every n milliseconds in “real world” time exactly?
You cannot do that reliably, because your node.js process (it is actually single-threaded at the system-thread level; see pthreads(7) and jfriend00's answer) might not get enough resources from your OS. If other processes are loading your computer too much, node.js will be starved and won't be able to progress as you want; also be aware of possible priority inversions.
On Linux, see also sched(7), chrt(1), renice(1).
I suggest creating a cron job that runs every n seconds. If your program is complex and may take more time, then you can go with async.
npm install cron
var CronJob = require('cron').CronJob;
new CronJob('* * * * * *', function() {
  console.log('You will see this message every second');
  callYourFunc();
}, null, true, 'America/Los_Angeles');
For more read this link
Perhaps you could spawn a worker thread and block it while it’s waiting to do the work, in the way suggested by CertainPerformance in the comments. It may not be the most elegant way to do it but at least you can put the blocking logic aside so that it doesn’t affect the rest of the application.
Check out the example in the docs if you’re unfamiliar with the cluster module: https://nodejs.org/docs/latest-v10.x/api/cluster.html
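A rough sketch of the worker idea, under some assumptions: it uses the worker_threads module rather than cluster (my reading of "worker thread" in the suggestion above), runTickWorker is a hypothetical name, and the worker only reports timestamps instead of doing real work. The worker blocks itself with Atomics.wait between ticks, so the blocking never touches the main event loop.

```javascript
const { Worker } = require('worker_threads');

// Code run inside the worker. Blocking here stalls only the worker thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const sleepCell = new Int32Array(new SharedArrayBuffer(4));
  for (let i = 0; i < workerData.ticks; i++) {
    // sleepCell[0] stays 0 and nobody notifies it, so this call simply
    // blocks for intervalMs and then times out.
    Atomics.wait(sleepCell, 0, 0, workerData.intervalMs);
    parentPort.postMessage(Date.now());
  }
`;

// Hypothetical helper: resolves with the timestamps taken inside the worker.
function runTickWorker(intervalMs, ticks) {
  return new Promise((resolve, reject) => {
    const timestamps = [];
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { intervalMs, ticks },
    });
    worker.on('message', (tickAt) => {
      timestamps.push(tickAt);
      if (timestamps.length === ticks) {
        resolve(timestamps);
      }
    });
    worker.on('error', reject);
  });
}

runTickWorker(100, 3).then((timestamps) => {
  console.log('worker ticked at', timestamps);
});
```

The timestamps are taken inside the worker, so they reflect when the worker woke up. The 'message' handler still runs on the main thread and is delayed like any other callback if that thread is busy, which is why the timed work itself would have to live in the worker.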

Can process move from Ready Queue to Job Queue?

I'm working on a program that simulates scheduling of processes from creation to completion. I need to know whether a process can move back from the ready queue to the job queue (in any case, perhaps as an exception).
I'm not sure what you mean by "job queue". A process is either:
running (in that case there is no need to do anything)
sleeping, which means that the process is waiting for an input or an output. You can't force it to "wake up". It will wake up when the input or output operation it wants to perform becomes possible.
stopped, which means that the process is currently suspended. Four different signals can cause this:
SIGTSTP (most of the time triggered by CTRL + Z). The process can be resumed with the fg command.
SIGSTOP, meaning it has been forcibly stopped. There is not much you can do about that one.
SIGTTIN and SIGTTOU, but I don't know enough about those two.
So you can look into the fg command, which might help you.
NB : sorry for bad english.

nodejs prioritise function execution

I've edited a library (ddp-client) to make use of a heartbeat timer, which sends out a ping every X seconds. However, I'm also doing some work with the bluetooth hardware, which I believe is responsible for pings sometimes not returning in time (because the bluetooth seems to block the event loop temporarily). Is there a way to prioritise a certain function on the event loop, so it will always be executed before others? I don't think setImmediate would be suitable here, since I don't know exactly when the response message from the server would arrive.
The implementation of the timer is roughly as follows:
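var pingOutstanding = false;
setInterval(function () {
  if (pingOutstanding) {
    // Did not resolve in time
    closeConnection();
  } else {
    pingOutstanding = true; // presumably reset when the server's reply arrives
    sendPing();
  }
}, X * 1000); // every X seconds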
This works perfectly fine if I run it without the bluetooth module. When I enable the bluetooth module, pings sometimes do not get resolved because the time taken to scan for bluetooth is sometimes longer than the interval of the timer, leading to a disconnect, while it's actually still connected.
Is there a way to prioritise a certain function on the event loop, so it will always be executed before others?
No. node.js does not have a way for one piece of code to pre-empt another and always have priority. Any code that "hogs" the CPU a bit or otherwise blocks the event loop a bit should either be fixed to not do that or be moved into its own child process, which you can communicate with via any one of the many interprocess communication schemes.
Or, alternatively, if the ping timer is really, really important to run on time, then maybe it should be in its own child process where it can always just run as scheduled with no chance of something else interrupting it.
Implementing precise timers like this is one thing that node.js is just not good at. Because it runs all your Javascript in a single thread, keeping a server instantly responsive or keeping timers running precisely on time requires that nobody ever blocks the event loop or hogs the CPU for longer than your timing threshold. The usual work-around is to move things into their own child process where they get their own priority with the CPU.
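As a rough illustration of the child-process work-around, here is a sketch in which the timer lives in a separate node.js process that has nothing else to do. The names (runHeartbeatChild, HEARTBEAT_MS, HEARTBEAT_COUNT) are hypothetical, and the child only prints timestamps; in the heartbeat case from the question, the child would also have to own the connection and send the pings itself.

```javascript
const { spawn } = require('child_process');

// Source of the child process: it does nothing except run the timer.
const heartbeatSource = `
  const intervalMs = Number(process.env.HEARTBEAT_MS);
  let remaining = Number(process.env.HEARTBEAT_COUNT);
  const timer = setInterval(() => {
    console.log(Date.now()); // stand-in for sending a ping
    if (--remaining === 0) clearInterval(timer);
  }, intervalMs);
`;

// Hypothetical helper: resolves with the timestamps printed by the child.
function runHeartbeatChild(intervalMs, count) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', heartbeatSource], {
      env: Object.assign({}, process.env, {
        HEARTBEAT_MS: String(intervalMs),
        HEARTBEAT_COUNT: String(count),
      }),
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    let output = '';
    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { output += chunk; });
    child.on('error', reject);
    // 'close' fires once the child has exited and its stdout is drained.
    child.on('close', () => resolve(output.trim().split('\n').map(Number)));
  });
}

runHeartbeatChild(100, 3).then((ticks) => {
  console.log('child ticked at', ticks);
});
```

Whatever the parent does on its own event loop, such as a slow Bluetooth scan, cannot delay the child's timer, although both processes still compete for CPU time at the operating-system level.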

Single-Threaded Windows Service Delaying OnStop

I have a Windows Service (C# 4.0) that picks messages off of a private message queue and for each message sends one or more emails (typically 4 or 5 at most) based on message content.
Message volume is low, so I have avoided complexity and left the service single-threaded, but the emails are important, so I need to ensure that on an SCM Stop Command any in-process messages/emails are processed/sent before the Stop completes.
In OnStop I am checking a static "inProcess" flag representing status, and if it is set I am calling ServiceBase.RequestAdditionalTime(120000).
There are 2 problems:
The Stop Command completes immediately with some emails unsent, despite the request for 2 minutes.
Even if it worked I am only guessing at how long I should wait.
What is the best way to handle this in a single-threaded service?
Thanks for your help!
Greg
To fully answer, we'd need to see the structure of your message processing loop. But one thing I'm thinking is that the ServiceBase.RequestAdditionalTime() method is used to keep the SCM from complaining if a stop command (or pause, continue, start) takes too long, it doesn't mean your service will wait two minutes before stopping.
Thus, the only thing it truly does is keep the SCM from erroring out on a stop request, if you have a slow stop process.
See MSDN here: RequestAdditionalTime() method
What I'm wondering is if you get called in OnStop() and you set some complete flag, and the processing loop immediately exits when it sees this flag?
If you could post your code it would help me refine this answer, but from the question I wonder if you are expecting the call to wait for 2 minutes to let it process more, but you are setting something to tell the processing loop to stop. If this is not the case I can refine the answer further.
As for how long you should wait, that depends on how critical the emails are and how many are likely to be in the queue, and if they are persisted anywhere so that restarting the service would pick up where they left off.

Resources