Transient gen_server processes and updating pids - multithreading

I'm currently learning Erlang at a reasonable clip but have a question about gen_server with supervisors. If a gen_server process crashes and is consequently restarted by a supervisor, it receives a new pid. Now, what if I want other processes to refer to that process by pid? What are some good idiomatic ways to 'update' the pid in those processes?
As an exercise with some practical application, I'm writing a lock server where a client can request a lock with an arbitrary key. Ideally, I would like to have a separate process handle the locking and releasing of each particular lock, the idea being that I can use the timeout argument in gen_server to terminate the process if no one has requested it after some amount of time N, so that only currently relevant locks stay in memory. Now, I have a directory process which maps the lock name to the lock process. When the lock process terminates, it deletes the lock from the directory.
My concern is how to handle the case where a client requests a lock while the lock process is in the middle of terminating. It hasn't shut down yet, so checking that the pid is alive won't work, and the lock process hasn't yet reached the clause that deletes it from the directory.
Is there a better way to handle this?
EDIT
There are two gen_servers currently: the 'directory' which maintains an ETS table from LockName -> Lock Process, and the 'lock servers' which are added dynamically to the supervision tree using start_child. Ideally I would like each lock server to handle talking with the clients directly, but am worried about the scenario of a request to acquire/release getting issued with call or cast when the process is in the middle of crashing (and thus won't respond to the message).
Starting them with a {local, Name} or {global, Name} registration won't work, as far as I can tell, since there can be any number of them.

The trick is to name the process rather than refer to it by its pid. You generally have 3 viable options:
1. Use registered names. This is what andreypopp suggests: you refer to the server by its registered name. Locally registered names have to be atoms, which may limit you somewhat. Globally registered names do not have this limitation; you can register any term.
2. The supervisor knows the pid. Ask it. You will have to pass the supervisor's pid to the process.
3. Alternatively, use the gproc application (exists on http://github.com). It lets you create a generic process registry. You could build that yourself on ETS, but it is better to steal good code than to implement it yourself.
A raw pid is also usable if all the processes are part of the same supervision tree and are restarted together (for example under a one_for_all strategy), so that the death of one of them means the death of the others. In that case, the recycling of pids doesn't matter.
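As a rough sketch of the first option, assuming one lock process per key: each process can be registered globally under an arbitrary term, so clients never see a pid. The module name lock_proc, the {lock, Key} naming scheme and the reply values are all hypothetical choices for illustration, not something from the question.

```erlang
-module(lock_proc).
-behaviour(gen_server).

-export([start_link/1, acquire/1, release/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Hypothetical naming scheme: one globally registered process per key.
name(Key) -> {global, {lock, Key}}.

start_link(Key) ->
    gen_server:start_link(name(Key), ?MODULE, Key, []).

%% Clients address the process by name, never by pid.
acquire(Key) -> gen_server:call(name(Key), acquire).
release(Key) -> gen_server:call(name(Key), release).

init(Key) -> {ok, #{key => Key, owner => none}}.

%% The lock is free: the caller becomes the owner.
handle_call(acquire, {Pid, _Tag}, State = #{owner := none}) ->
    {reply, ok, State#{owner := Pid}};
handle_call(acquire, _From, State) ->
    {reply, {error, locked}, State};
%% Only the current owner may release the lock.
handle_call(release, {Pid, _Tag}, State = #{owner := Pid}) ->
    {reply, ok, State#{owner := none}};
handle_call(release, _From, State) ->
    {reply, {error, not_owner}, State}.

handle_cast(_Msg, State) -> {noreply, State}.
```

If the process is restarted by its supervisor, it registers under the same name again, so callers need no update. A call made while no process is registered under that name exits with noproc, which the caller can catch and retry.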

Don't refer to a gen_server process by pid.
You should provide an API for your gen_server via the gen_server:call/2 or gen_server:call/3 functions. They accept a ServerRef as the first argument, which can be Name | {Name,Node} | {global,GlobalName} | pid(). So your API would look like:
lock(Key) ->
    gen_server:call(?MODULE, {lock, Key}).
release(Key) ->
    gen_server:call(?MODULE, {release, Key}).
Note that this API is defined in the same module as your gen_server, and I assume you are starting your server with something like:
gen_server:start_link({local, ?MODULE}, ?MODULE, [], [])
So your API functions can look up the server not by pid but by server name, which is ?MODULE.
For more information, please see gen_server documentation.

You can completely avoid the use of your "lock_server" process by using the erlang:monitor/demonitor API.
When a client requests a lock, you issue the lock and call erlang:monitor on the client. This returns a monitor reference, which you can then store along with your lock. The beauty of this is that your directory server WILL be notified when the client dies. You could implement the timeout in the client.
Here is a snippet from code I wrote recently:
https://github.com/xslogic/phoebus/blob/master/src/table_manager.erl
Basically, the table_manager is a process that issues a lock on a particular table resource to a client. If the client dies, the table is returned to the pool.
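A minimal sketch of that idea, not taken from the linked table_manager code: a single directory gen_server hands out locks, monitors each owner, and frees the lock when the 'DOWN' message arrives. The module name lock_dir and the reply values are assumptions made for the example.

```erlang
-module(lock_dir).
-behaviour(gen_server).

-export([start_link/0, lock/1, release/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).
lock(Key) -> gen_server:call(?MODULE, {lock, Key}).
release(Key) -> gen_server:call(?MODULE, {release, Key}).

%% State: a map from Key to {OwnerPid, MonitorRef}.
init([]) -> {ok, #{}}.

handle_call({lock, Key}, {Pid, _Tag}, Locks) ->
    case maps:is_key(Key, Locks) of
        true ->
            {reply, {error, locked}, Locks};
        false ->
            %% Watch the client so we hear about it if it dies.
            Ref = erlang:monitor(process, Pid),
            {reply, ok, Locks#{Key => {Pid, Ref}}}
    end;
handle_call({release, Key}, {Pid, _Tag}, Locks) ->
    case maps:find(Key, Locks) of
        {ok, {Pid, Ref}} ->
            %% Stop monitoring and discard any 'DOWN' already queued.
            erlang:demonitor(Ref, [flush]),
            {reply, ok, maps:remove(Key, Locks)};
        _ ->
            {reply, {error, not_owner}, Locks}
    end.

handle_cast(_Msg, Locks) -> {noreply, Locks}.

%% The owner died: drop the lock stored under this monitor reference.
handle_info({'DOWN', Ref, process, _Pid, _Reason}, Locks) ->
    {noreply, maps:filter(fun(_K, {_P, R}) -> R =/= Ref end, Locks)};
handle_info(_Other, Locks) ->
    {noreply, Locks}.
```

Because the directory is the only process that touches the lock table, there is no window in which a lock process is half-terminated: a request is either handled before the 'DOWN' message or after it.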

Related

Share data between node child processes

I have a small node script that is run in several node child processes, and I do not have access to the parent process.
The goal of the script is pretty simple, to return one element at random from an array. However, the element returned cannot be used by any of the other child processes.
The only solutions I can think of involve using redis or a database, and because this is such a tiny script, I would like to avoid that.
Here is an example of what I would like my code to look like:
var accounts = [acc1, acc2, acc3]
function() {
    var usedAccounts = sharedStore.get('usedAccounts')
    var unusedAccounts = filter(accounts, usedAccounts)
    var account = getRandomAccount(unusedAccounts)
    usedAccounts.push(account)
    sharedStore.set('usedAccounts', usedAccounts)
    return account
}
So far, the solutions I've thought of don't work because the sibling processes initially all get an empty list assigned to usedAccounts.
There are two problems you need to solve:
How to share data between multiple node processes without using the parent process to marshal data between them.
How to ensure that data is consistent across all the shared processes.
How to share data between multiple node processes.
Given your constraint of not wanting to use an external service (like Redis or another database service), and given that nodejs doesn't have an easy way to use something like shared memory, a possible solution is to use a file shared between all the processes. Each process can read and write to the shared file, and use that to get its userAccount data.
The file could be JSON formatted and look something like this:
[
    {
        "accountName":"bob",
        "accountUsed":false
    },
    {
        "accountName":"alice",
        "accountUsed":true
    }
]
This would just be an array of userAccount objects, each of which also has a flag that indicates whether the account is already in use.
Your app would:
GetAccountData():
    Open the file
    Read the file into memory
    Iterate over the array
    Find the first userAccount that is available
    Set the accountUsed flag to true
    Write the updated array back to the file
    Close the file.
Having multiple processes reading and writing to a single resource is a well-understood concurrency problem called the Readers-Writers Problem.
How to ensure that data is consistent across all the shared processes.
To ensure data is consistent, you need to ensure that only one process can run the algorithm from above from start to finish at a time.
Operating systems may provide exclusive locking of a file, but nodejs has no native support for that.
A common mechanism is to use a lockfile, and use its existence to guard access to the datafile above. If a process can't acquire the lock, it should wait a period of time, then attempt to acquire the lock again.
To acquire the lock:
    Check if the lockfile exists.
    If the lockfile exists
        Set a timer (setInterval) to acquire the lock
    If the lockfile doesn't exist
        Create the lockfile
        If the lockfile creation fails (because it exists--race condition with another process)
            Set a timer (setInterval) to acquire the lock
        If the lockfile creation succeeds
            Do GetAccountData();
            Remove lockfile
This solution should work, but it's not without kludge. Using a synchronizing primitive like a lock can cause your application to deadlock. Also, using a timer to periodically try to acquire the lock is wasteful, and it can cause a race condition if the result of the lock creation is not checked properly.
If your app crashes before it removes the lockfile, then you may create a deadlock situation. To guard against that, you might want to put a final unhandled exception handler to remove the lockfile if it was created by the process.
You will also need to make sure you only hold the lock long enough to do your serial work. Holding the lock for longer will cause performance issues and increase the likelihood of a deadlock.
I would rather let each process have its own flat file that it can write to. Each process would be able to read all the files written by all the processes, concurrently or otherwise, thus obviating the need for a lockfile.
You will have to figure out the logic by which each process writes only its own file, while reading all these files together yields the source of truth.

How does Erlang sleep (at night?)

I want to run a small clean up process every few hours on an Erlang server.
I know of the timer module. I saw an example in a tutorial that used chained timer:sleep commands to wait for an event that would occur multiple days later, which I found strange. I understand that Erlang processes are unique compared to those in other languages, but the idea of a process/thread sleeping for days, weeks, and even months at a time seemed odd.
So I set out to find out the details of what sleeping actually does. The closest I found was a blog post mentioning that sleep is implemented with a receive timeout, but that still left the question:
What do these sleep/sleep-like functions actually do?
Is my process taking up resources as it sleeps? Would having thousands of sleeping processes use as many resources as, say, thousands of processes servicing a recursive call that did nothing? Is there any performance penalty from repeatedly sleeping within processes, or sleeping for long periods of time? Is the VM constantly expending resources to see if the conditions to end a process's sleep are met?
And as a side note, I'd appreciate it if someone could comment on whether there is a better way than sleeping to pause for hours or days at a time.
That is the karma of any Erlang process: it waits or dies :o)
When a process is spawned, it starts executing, runs until its last line, and dies, returning the last evaluation.
To keep a process alive, there is no other solution than to loop recursively in a never-ending succession of calls.
Of course, there are several conditions that make it stop or sleep:
End of the loop: the process received a message that tells it to stop recursing.
A receive block: the process waits until a message matching one entry in the receive block is posted to its message queue.
The VM scheduler stops it temporarily to give other processes access to the CPU.
In the last two cases, execution restarts under the responsibility of the VM scheduler.
While waiting, it uses no CPU bandwidth but keeps the exact same memory layout it had when it started waiting. Erlang/OTP offers some means to reduce this memory footprint to the minimum using the hibernate option (see the documentation of gen_server or gen_fsm, but in my mind it is for advanced usage only).
A simple way to create a "signal" that fires a process at regular (or almost regular) intervals is to use a receive block with a timeout (the timeout must be an integer between 0 and 4294967295 ms, roughly 49.7 days, or the atom infinity), for example:
on_tick_sec(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,1000,Period,0).
on_tick_mn(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,60000,Period,0).
on_tick_hr(Module,Function,Arglist,Period) ->
    on_tick(Module,Function,Arglist,60000,Period*60,0).
on_tick(Module,Function,Arglist,TimeBase,Period,Period) ->
    apply(Module,Function,Arglist),
    on_tick(Module,Function,Arglist,TimeBase,Period,0);
on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase) ->
    receive
        stop -> stopped
    after TimeBase ->
        on_tick(Module,Function,Arglist,TimeBase,Period,CountTimeBase+1)
    end.
and usage:
1> Pid = spawn(util,on_tick_sec,[io,format,["hello~n"],5]).
<0.40.0>
hello
hello
hello
hello
2> Pid ! stop.
stop
3>
[edit]
The timer module is a standard gen_server running in a separate process. Most of the functions in the timer module are public interfaces that perform a hidden gen_server:call or gen_server:cast to the timer server. This is a common way to hide the internals of a server and allow it to evolve without impact on existing applications.
Internally, the server uses an ETS table to store all the actions it has to perform along with each timer reference, and it uses its own mechanism to be woken up when needed (in the end, the VM must take care of this).
So you can hibernate a process without any effect on the timer server's behavior. The hibernation mechanism:
is tricky: see the documentation of hibernate/3. You will see that you have to "rebuild" the context yourself, since everything is removed from the process context and only a {Module, Function, Arguments} tuple is stored by the system to restart your process when needed.
costs some time in garbage collection and process restart.
That is why I said that it is really an advanced feature that needs a good reason to be used.
There is also erlang:hibernate/3 that puts a process in "deep sleep", minimizing memory usage for it.
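For the original use case (a small clean-up every few hours), one possible sketch avoids sleeping altogether: a gen_server asks the runtime to send it a message later with erlang:send_after/3 and hibernates in between. The module name cleaner, the cleanup message, the runs counter and the empty do_cleanup/0 placeholder are all hypothetical.

```erlang
-module(cleaner).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% IntervalMs: time between clean-ups, e.g. timer:hours(3).
start_link(IntervalMs) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, IntervalMs, []).

init(IntervalMs) ->
    schedule(IntervalMs),
    {ok, #{interval => IntervalMs, runs => 0}}.

%% Lets callers ask how many clean-ups have run so far.
handle_call(runs, _From, State = #{runs := N}) ->
    {reply, N, State}.

handle_cast(_Msg, State) -> {noreply, State}.

%% The timer fired: do the work, re-arm the timer, then hibernate.
handle_info(cleanup, State = #{interval := IntervalMs, runs := N}) ->
    do_cleanup(),
    schedule(IntervalMs),
    {noreply, State#{runs := N + 1}, hibernate};
handle_info(_Other, State) ->
    {noreply, State}.

%% Ask the runtime to deliver 'cleanup' to this process after IntervalMs.
schedule(IntervalMs) ->
    erlang:send_after(IntervalMs, self(), cleanup).

%% Placeholder for the real clean-up work.
do_cleanup() -> ok.
```

Between ticks the process is simply waiting in its receive loop, so it costs no CPU, and the hibernate return value is optional; it only trades a garbage collection on each wake-up for a smaller memory footprint.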

cross-process locking in linux

I am looking to make an application in Linux where only one instance of the application can run at a time. I want to make it robust, such that if an instance of the app crashes, it won't block all the other instances indefinitely. I would really appreciate some example code on how to do this (as there's lots of discussion on this topic on the web, but I couldn't find anything which worked when I tried it).
You can use file locking facilities that Linux provides. You haven't specified the language, however you might find this capability pretty much everywhere in some form or another.
Here is a simple idea of how to do that in a C program. When the program starts, you take an exclusive non-blocking lock on the whole file using the fcntl system call. When another instance of the application is started, it will get an error when trying to lock the file, which means the application is already running.
Here is a small example of how to take the full-file lock using fcntl (this function provides facilities for byte-range locks, but when the start offset and the length are both 0, the whole file is locked).
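/* Needs <fcntl.h>, <string.h> and <unistd.h>; fd is an already opened file descriptor. */
struct flock lock_struct;
memset(&lock_struct, 0, sizeof(lock_struct)); /* l_start = 0 and l_len = 0: the whole file */
lock_struct.l_type = F_WRLCK;    /* exclusive (write) lock */
lock_struct.l_whence = SEEK_SET; /* l_start is relative to the start of the file */
lock_struct.l_pid = getpid();
ret = fcntl(fd, F_SETLK, &lock_struct); /* non-blocking: returns -1 if another process holds the lock */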
Please note that you need to open a file first in order to put a lock on it. This means you need to have a file around to use for locking. It might be useful to put it somewhere where it won't cause any distraction or confusion for other applications.
When the process terminates, all locks that it has taken will be released, so nothing will be blocked.
This is just one of the ideas. I'm pretty sure there are other ways around.
The conventional UNIX way of doing this is with PID files.
Before a process starts, it checks whether a pre-determined file, usually /var/run/<process_name>.pid, exists. If it is found, it's an indication that a process is already running, and this process quits.
If the file does not exist, this is the first process to run. It creates the file /var/run/<process_name>.pid and writes its PID into it. The process unlinks the file on exit.
Update:
To handle cases where a daemon has crashed and left behind the pid file, additional checks can be made during startup if a pid file was found:
    Do a ps and ensure that a process with that PID doesn't exist
    If it exists, ensure that it's a different process
        from the said ps output
        from /proc/$PID/stat

"How many links do I have?", asks an Erlang process

A process in Erlang will call either link/1 or spawn_link to create a link with another process. In a recent application I am working on, I got curious about whether it's possible for a process to know, at a given instant, the number of other processes it is linked to. Is this possible? Is there a BIF?
Also, when a linked process dies, I guess that if it were possible to know the number of linked processes, this number would be decremented automatically by the run-time system. Such a mechanism would be ideal for dealing with parent-child relationships in concurrent Erlang programs, even simple ones that do not involve supervisors.
So, is it possible for an Erlang process to know out of the box, perhaps via a BIF, the number of processes linked to it, such that whenever a linked process dies, this value is decremented automatically under the hood :)?
To expand on this question a little bit, consider a gen_server which will handle thousands of messages via handle_info. In this part, its job is to dispatch child processes to handle each task as soon as it comes in. The aim of this is to make sure the server loop returns immediately to take up the next request. Now, the child process handles the task asynchronously and sends the reply back to the requestor before it dies. Please refer to this question and its answer before you continue. Now, what if, for every child process spawned off by the gen_server, a link is created, and I would like to use this link as a counter? I know, I know, everyone is going to be like "why not use the gen_server State to carry, say, a counter, and then increment or decrement it accordingly?" :) Somewhere in the gen_server, I have:
handle_info({Sender,Task},State)->
    spawn_link(?MODULE,child,[Sender,Task]),
    %% At this point, the number of links to the gen_server is incremented
    %% by the run-time system
    {noreply,State};
handle_info( _ ,State) -> {noreply,State}.
The child goes on to do this:
child(Sender,Task)->
    Result = (catch execute_task(Task)),
    Sender ! Result,
    ok. %% At this point the child process exits,
        %% and i expect the link value to be decremented
Then finally, the gen_server has an exposed call like this:
get_no_of_links()-> gen_server:call(?MODULE,links).
handle_call(links, _ ,State)->
    %% BIF to get number of instantaneous links expected here
    Links = erlang:get_links(), %% This is fake, do not do it at home :)
    {reply,Links,State};
handle_call(_ , _ ,State)-> {reply,ok,State}.
Now, someone may ask themselves: really, why would anyone want to do this?
Usually, it's possible to keep an integer in the gen_server State and maintain it ourselves, or at least make the gen_server's handle_info match messages of the form {'EXIT',ChildPid,_Reason} so that the server can act accordingly. My thinking is that if it were possible to know the number of links, I would use this to know (at a given moment in time) how many child processes are still busy working, which in turn may actually assist in anticipating server load.
From manual for process_info:
{links, Pids}:
Pids is a list of pids, with processes to which the process
has a link
3> process_info(self(), links).
{links,[<0.26.0>]}
4> spawn_link(fun() -> timer:sleep(100000) end).
<0.38.0>
5> process_info(self(), links).
{links,[<0.26.0>,<0.38.0>]}
I guess it could be used to count the number of linked processes.
Your process should run process_flag(trap_exit, true) and listen for messages of the form {'EXIT', Pid, Reason}, which will arrive whenever a linked process exits. If you don't trap exits, the default behaviour is for your process to exit as well when the other side of the link exits abnormally (with a reason other than normal).
As for noticing when links are added, you can use case process_info(self(), links) of {links, L} -> length(L) end or length(element(2, process_info(self(), links))), but you have to re-run this regularly, as there is no way for your process to be notified whenever a link is added.
A process following OTP guidelines never needs to know how many processes are linked to it.
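To see the two pieces above working together, here is a small sketch that counts links with process_info/2 and uses trap_exit to wait for the child's exit. The module and function names (link_count, count_links/0, demo/0) are made up for the example. Note that the links list can also contain ports, not only pids.

```erlang
-module(link_count).
-export([demo/0, count_links/0]).

%% Number of processes (and ports) the calling process is linked to.
count_links() ->
    {links, Linked} = process_info(self(), links),
    length(Linked).

%% Returns {LinksAddedBySpawn, LinksLeftAfterExit} relative to the
%% number of links the caller had when demo/0 was called.
%% Side effect: turns on trap_exit in the calling process.
demo() ->
    process_flag(trap_exit, true),
    Before = count_links(),
    Child = spawn_link(fun() -> receive stop -> ok end end),
    During = count_links(),
    Child ! stop,
    %% Wait for the exit signal; the link is gone once it has arrived.
    receive {'EXIT', Child, _Reason} -> ok end,
    After = count_links(),
    {During - Before, After - Before}.
```

Calling link_count:demo() from a shell should return {1, 0}: the link appears as soon as spawn_link returns and has disappeared by the time the 'EXIT' message is received. In a gen_server, the same count could be taken inside a handle_call clause, or a plain counter could be adjusted in the {'EXIT', Pid, Reason} clause of handle_info.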

attach preempt_notifier to user process in linux

I need to identify whether a user process was ever preempted. I understand that we have hooks in preempt.h and sched.c which allow us to define preempt_notifiers, which can in turn call sched_in and sched_out functions whenever a process is rescheduled or preempted.
But I still can't find out how can I attach a notifier to a particular process or pid in user space and then somehow log if this particular process was ever pre-empted. I'm assuming I have to write a module to do so, but how would I go about attaching a pid to a particular notifier?
The notifier is inherently per-process. When you register it, you are registering it for the current process. See the code in preempt_notifier_register(): it attaches the notifier to current->preempt_notifiers.
The pseudo-file /proc/<pid>/status contains a line nonvoluntary_ctxt_switches: which seems to be the information that you're after.

Resources