Generate background process with CLI to control it - Linux

To give context, I'm building an IoT project that requires me to monitor some sensor inputs. These include Temperature, Fluid Flow and Momentary Button switching. The program has to monitor, report and control other output functions based on those inputs but is also managed by a web-based front-end. What I have been trying to do is have a program that runs in the background but can be controlled via shell commands.
My goal is to be able to do the following on a command line (bash):
pi@localhost> monitor start
sensors are now being monitored!
pi@localhost> monitor status
Temp: 43C
Flow: 12L/min
My current solution has been to create two separate programs: one that sits in the background, and a light-weight CLI. The background process listens on a bidirectional Unix domain socket (a socket file), which the CLI uses to send it commands; it then sends responses back through that socket for the CLI to process/display. This has given me many headaches but seemed the better option compared to using network sockets or mapped memory. I just have occasional problems with access to the socket file when my program is terminated improperly, which then requires me to "clean" the directory by manually deleting the stale socket file.
I'm also hoping to have the program ensure there is only ever one instance of the monitor program running at any given time. I currently achieve this by writing my PID to a file, which I check for when my program is starting; if the file exists, I self-terminate with an error. I really don't like this approach as it just feels too hacky to me.
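To illustrate, the daemon side of my current setup looks roughly like the sketch below (pseudo-Python; the paths and the trivial protocol are placeholders, not my real code):

    # Simplified sketch of the current daemon side (placeholder paths).
    import os
    import socket
    import sys

    PID_PATH = "/tmp/monitor.pid"      # placeholder
    SOCK_PATH = "/tmp/monitor.sock"    # placeholder

    # single-instance check: bail out if a pid file already exists
    if os.path.exists(PID_PATH):
        sys.exit("monitor: already running?")
    with open(PID_PATH, "w") as f:
        f.write(str(os.getpid()))

    # listen on a Unix domain socket for commands from the CLI
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK_PATH)             # fails with "Address already in use" after a crash
    server.listen(1)

    while True:
        conn, _ = server.accept()
        command = conn.recv(1024).decode().strip()
        if command == "status":
            conn.sendall(b"Temp: 43C\nFlow: 12L/min\n")
        conn.close()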
So my question: is there a better way to build a background process that can be easily controlled via the command line, or is my current solution likely the best available?
Thanks in advance for any suggestions!

Related

Make a process run in the background in Linux

I am developing a Linux application using Python 3. The application synchronizes the user's files with the cloud; the files live in a specific folder. I want a process or daemon to run in the background and, whenever there is a change in that folder, start the synchronization process.
I have written the synchronization modules in Python 3, but I don't know how to run a background process that automatically detects changes in that folder. This process should always run in the background and should be started automatically after boot.
You have actually asked two distinct questions. Both have simple answers and plenty of good resources online, so I'm assuming you simply did not know what to look for.
Running a process in the background is called "daemonization". Search for "writing a daemon in Python"; this is a standard technique on all POSIX-based systems.
Monitoring a directory for changes is done through an API set called inotify. This is Linux-specific, as each OS has its own solution.
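For reference, the classic daemonization recipe is a double fork; here is a minimal sketch in Python (no pid file or signal handling, and note that the python-daemon package wraps these same steps for you):

    # Minimal double-fork daemonization sketch (POSIX only).
    import os
    import sys

    def daemonize():
        if os.fork() > 0:          # first fork: let the parent return to the shell
            sys.exit(0)
        os.setsid()                # become session leader, detach from the terminal
        if os.fork() > 0:          # second fork: can never reacquire a controlling tty
            sys.exit(0)
        os.chdir("/")
        os.umask(0)
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):       # point stdin/stdout/stderr at /dev/null
            os.dup2(devnull, fd)

    if __name__ == "__main__":
        daemonize()
        # ... run the folder-watching / synchronization loop here ...

To have it start automatically after boot, register the resulting script with your init system (a systemd unit or an init script).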

NodeJS - Process Watcher

I was looking for a module that can start a process, in my case a binary file on a Unix system. I need to be able to start, stop, restart and get the status of the process and, if possible, detect in real time when the process state changes (starting up or shutting down). I found some solutions, but they were not suitable; for example, PM2 has a programmatic API, but I could not launch the binary file with it, and I believe it only works for applications written in Node.js. If someone knows a module, or a simple and fast way to achieve this, I would be thankful.

What's the easiest way to send messages to my Linux daemon app?

What's the easiest way to send messages to my Linux daemon app? Is it possible for myapp foo bar to invoke a callback instead of starting a new process? What's the standard way to communicate with a daemon? Is my most reasonable choice to stick the PID in /var/run and create a named pipe or a socket?
What is the standard way for apps that run on the command line to communicate with their daemon process? I'm assuming it's impossible to ask Linux to call a callback when I type myapp foo bar?
What is the standard way for apps that run on the command line to communicate with their daemon process?
There are a number of ways to do this:
Dropboxes and Signals
A variety of locations are used to store "pid files" containing a daemon's process ID number: /var/run/<app>/<app>.pid, /var/run/<app>.pid (thanks @Adam Katz for the edit), /run/<app>/<app>.pid, /run/<app>.pid (see Askubuntu: Why has /var/run been migrated to /run?).
When the pid of the daemon process is known, a command line program (which is running as a separate process) could communicate with the daemon in these ways:
Writing something into a prearranged place. This could be an ordinary file, a database table, or any convenient spot readable by the server.
Sending a signal to the daemon. The common way to do this locally is with the kill system call, int kill(pid_t pid, int sig);.
Old School Example: The server multiplexing daemon xinetd would reread its configuration file after receiving SIGHUP.
For watching files, the write-then-signal approach has largely been superseded by the inotify API, through which a process can subscribe to file system events. Signals remain useful when you don't want the daemon to act on every file change, since not every change may leave the file in a valid state, as when modifying a system config file.
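As a concrete illustration of the signal approach, here is a sketch in Python rather than C (the configuration-reload step is only a placeholder):

    # Sketch: a daemon that rereads its config on SIGHUP and stops cleanly on SIGTERM.
    import signal
    import time

    running = True
    reload_requested = False

    def on_sighup(signum, frame):
        global reload_requested
        reload_requested = True        # defer the real work to the main loop

    def on_sigterm(signum, frame):
        global running
        running = False

    signal.signal(signal.SIGHUP, on_sighup)
    signal.signal(signal.SIGTERM, on_sigterm)

    while running:
        if reload_requested:
            reload_requested = False
            print("rereading configuration...")   # placeholder for the real reload
        time.sleep(1)                              # a real daemon would do useful work here

The control side is then just kill -HUP <pid>, with the pid taken from one of the pid files described above.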
FIFO or Pipe
A FIFO or pipe is simply a special file that blocks processes reading it until some other process has written to it. You can make a named pipe/FIFO in the file system with mkfifo. The only tricky thing about this is that pipes should generally be used unbuffered, e.g. opened with open() as opposed to fopen(). Scripting languages sometimes provide a facility for pipe-friendly reading/writing: Perl has a line-buffered mode, set with $|=1, that is useful for pipes.
More commonly, you see anonymous pipes all the time on the command line with the | symbol separating commands which are executed as separate processes.
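A minimal sketch of a FIFO-based command channel, in Python (the path is a placeholder):

    # Daemon side: read newline-delimited commands from a named pipe.
    import os

    FIFO_PATH = "/tmp/mydaemon.fifo"   # placeholder path

    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)

    while True:
        # open() blocks here until some other process opens the FIFO for writing
        with open(FIFO_PATH, "r") as fifo:
            for line in fifo:
                command = line.strip()
                print("received command:", command)
                if command == "quit":
                    raise SystemExit

A client then only needs to write a line into the pipe, for example with echo status > /tmp/mydaemon.fifo from the shell.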
Sockets
What about something newer, like MySQL? The MySQL database system consists of a command-line client, mysql, and a server, mysqld, and may also serve other clients either on the local machine or over the internet.
MySQL communicates using a socket. The server listens on a socket for connections and forks new processes, handing a socket descriptor to the child for processing. When the child is done processing, it can exit.
There are UNIX sockets and Internet sockets, with different namespaces. One guide to programming sockets in C on Linux would be the sockets chapter of The GNU C Library manual.
No-wait I/O is an alternative to forking off processes. In C this is done with the select() system call, which lets a process wait for an event on one or more files, including sockets, or for a timeout. The GNU C Library docs include a no-wait I/O socket server example.
NodeJS is a server-side runtime for the JavaScript language, written as a single-threaded server using no-wait I/O, and it shows that these techniques are still relevant in modern systems.
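Here is a sketch of that no-wait style in Python, whose selectors module wraps select() and friends; the socket path and the trivial protocol are placeholders:

    # Single-process, no-wait I/O server on a Unix domain socket.
    import os
    import selectors
    import socket

    SOCK_PATH = "/tmp/ctl.sock"        # placeholder path
    sel = selectors.DefaultSelector()

    def accept(server):
        conn, _ = server.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle)

    def handle(conn):
        data = conn.recv(1024)
        if data:
            conn.sendall(b"ack: " + data)   # echo a trivial response
        else:                               # empty read means the client hung up
            sel.unregister(conn)
            conn.close()

    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)               # clear a stale socket from a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK_PATH)
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select():        # blocks until any registered fd is ready
            key.data(key.fileobj)          # dispatch to accept() or handle()

A client can be as simple as socat - UNIX-CONNECT:/tmp/ctl.sock, or a few lines of socket code in any language.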
"Callbacks"
I'm assuming it's impossible to ask Linux to call a callback when I type in myapp foo bar?
Maybe. But it might be too much work to be worth the trouble.
When you type myapp foo bar into "Linux", that's not really Linux; you are typing that into your command shell, which is a program running in its own process, separate from everything else.
So, unless the functionality you want is built into the command shell, there is normally no way for that command shell to send messages to your other program without starting a new process.
The default command shell for many (but not all) Linux systems is /bin/bash. To communicate with a daemon that listens to sockets from bash we would need a socket opening procedure in bash. And, sure enough, it exists!
One can listen to a socket in bash. This is the basis for a daemon:
From: Simple Socket Server in Bash? answer by dtmilano:
Q: I have a simple debugging solution which writes things to 127.0.0.1:4444 and I'd like to be able to simply bind up a port from bash and print everything that comes across. Is there an easy way to do this?
A:
$ exec 3<>/dev/tcp/127.0.0.1/4444
$ cat <&3
One can also open a socket from bash to a remote process, i.e. communicate with a daemon:
From: TCP connection, bash only
we learn that exec 5<>"/dev/tcp/${SERVER}/${PORT}" redirects a TCP link to file descriptor 5 for both input and output.
Not all bash builds are compiled with TCP support; it is apparently Linux-distribution dependent, at least according to a comment on this answer by William Pursell.
There are other shells besides bash. Many shells were developed back in the *nix days: the Korn shell (ksh), the C shell (csh), the Bourne shell (sh), the Almquist shell (ash). Wikipedia keeps a list of shells. These shells each have their own advantages and disadvantages, and they are not entirely compatible with each other's syntax!
Fast forward about 30 years, and there aren't so many in common use now.
But an important feature exists here: each user can choose his own login shell. See the chsh command.
So where I am going with this: if bash doesn't support the communication you need to do, you could set up a command shell where special messages can be sent without opening a new process. This might save you a few milliseconds and usually isn't worth it, but nothing is stopping you. You might even set up an IPython command shell, as suggested in https://stackoverflow.com/a/209670/103081, and Python can import most anything you need to do socket communications with your specialized daemon.
There are too many sub-questions in your post for them all to be answered in a single compact response. Also, some questions are matters of taste - I mean "what's the best way?" - and that kind of question is usually frowned upon at SO. Despite this, I'll try to provide at least basic references you can continue with.
What's the easiest way to send messages to my Linux daemon app?
There is no single easiest way unless you are already using some framework that provides one. Daemons are too different: some require only a simple set of fixed messages, some require complex reconfiguration and on-the-fly modification of internal data. The main set of approaches is:
Unix signals are really the simplest way to provide a protected (with minor simplification, only the same user or the superuser can send them), idempotent (repeating a signal does not change its meaning), non-parameterized message; such a message set is very limited (~50 freely usable ones in Linux). They are delivered asynchronously but can be converted into internal message-queue entries in the target application. Their main usage is for simple messages such as "stop gracefully", "reread your configs" or "turn on debug messages".
Unix domain sockets are the basic IPC transport for any interaction (except the fastest ones, where shared memory is better). They allow datagrams (like UDP), byte streams (like TCP) and message streams (like SCTP or SPX). A listening socket is represented in the file system by a path known to clients; Unix file access rights regulate access to it, or you can check peer credentials explicitly. Virtually any protocol can be built over such sockets, but only clients on the same running OS can connect to them; such sockets aren't even automatically visible from the host OS in a virtualized setup.
Non-pipe-style IPC: shared memory and semaphores can be used for high-load interactions, but they require strong usage discipline. There are two versions of these interfaces (SysV and POSIX).
Internet domain sockets provide the same transports as Unix domain sockets and allow access from any remote client (not necessarily on the same OS), but they don't provide access checks automatically, so client authentication is needed. Transport protection (SSL/TLS and so on) is also often needed.
Various frameworks are built on top of socket interfaces and implement some of the needed functionality, such as registration, endpoint publishing, server discovery, and so on. For local interactions, this now starts with D-Bus, which is message-oriented and allows named servers and typed messages either on the system-wide bus or on a per-user bus.
Various network-wide message transports have been developed in recent years (AMQP, ZeroMQ, Apache Kafka, etc.). They are designed for use in distributed networks, but often without any authentication; a small sketch of this style follows below.
The main issue with a self-made daemon is that you must attach any of these interfaces explicitly and provide a framework to listen for, decode and process incoming messages. If your application already has to be message-driven, you can put it into an environment which is message-driven by design (Erlang, J2EE...), unless you are scared of rewriting it in another language :)
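As one concrete illustration of the message-transport style from the list above, here is a minimal ZeroMQ request/reply sketch in Python (pyzmq); the endpoint and the command names are invented for the example, and, as noted, this carries no authentication by itself.

    # Daemon side: a ZeroMQ REP socket answering simple control commands.
    import zmq

    context = zmq.Context()
    server = context.socket(zmq.REP)
    server.bind("ipc:///tmp/mydaemon.ctl")   # placeholder endpoint (ipc:// = Unix socket)

    while True:
        command = server.recv_string()
        if command == "status":
            server.send_string("running")
        elif command == "stop":
            server.send_string("stopping")
            break
        else:
            server.send_string("unknown command")

The control tool is the mirror image:

    # Control-tool side: send one command and print the reply.
    import zmq

    context = zmq.Context()
    client = context.socket(zmq.REQ)
    client.connect("ipc:///tmp/mydaemon.ctl")
    client.send_string("status")
    print(client.recv_string())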
Is it possible that myapp foo bar can invoke a callback instead of starting a new process?
This is more a question of style than of principle. You can give the control tool the same name as the long-running one; there are such examples in practice: SysV init (telinit is only a convenience name), firefox (without -no-remote it tries to connect to an already running instance). But generally, separate names are preferred:
apachectl starts/stops/controls the daemon, but the daemon itself is named httpd;
rndc for named, gdc for gated, etc.
If the daemon binary is a very fat one, the cost of starting it each time just to pass a small control message is too high and should be avoided. But if you really want to do it this way, determine two things:
Do you want it to start automatically unless a running instance is found?
How will you distinguish a daemon-start request from a control request?
If you have strong answers to both questions, you needn't worry.
What's the standard way to communicate with a daemon?
There is no single standard way. There are a few standard ways, see above; one should choose among them based on their pros and cons.
Is my most reasonable choice to stick the PID in /var/run and create a named pipe or a socket?
Again, see above, but be aware that, unless the daemon has root rights, /run (/var/run) isn't writable. A non-root daemon should use an entirely different directory (e.g. ~/run/ in its dedicated user's home directory) or be provided with a writable subdirectory.
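For example, a non-root daemon could pick a writable runtime directory roughly like this (a sketch; the myapp name is a placeholder, and XDG_RUNTIME_DIR is only set on systems that provide it):

    # Pick a writable location for the pid file / control socket.
    import os

    def runtime_dir(app="myapp"):
        if os.geteuid() == 0:
            return os.path.join("/run", app)
        # systemd-based systems export XDG_RUNTIME_DIR (usually /run/user/<uid>)
        base = os.environ.get("XDG_RUNTIME_DIR") or os.path.expanduser("~/run")
        return os.path.join(base, app)

    path = runtime_dir()
    os.makedirs(path, exist_ok=True)
    print("pid file and socket would go in", path)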
I'm assuming it's impossible to ask Linux to call a callback when I type in myapp foo bar?
You again ask for a "callback", and I don't understand exactly what you mean by it. If your daemon is written in an imperative language outside an event-driven environment, any message reception from other processes must be implemented explicitly. For Unix signals, you write signal-handling functions and configure the process to run them on an incoming signal. For sockets, you must create a socket, give it a known address, put it into listening mode and provide code to accept incoming connections, gather and decode messages, etc. Only in an already event-driven environment can you rely on standard methods.
For Erlang, such remote control is very often implemented by connecting to the daemon node and making an RPC request using gen_server:call() or an analog. The called Erlang process handles this in generic code without any need to implement separate cases for whether the requester is another process on the same node or on a separate one.
There are some frameworks which have already implemented some of this in procedural environments; e.g. D-Bus client libraries can be written in such a way that they:
create a separate thread that listens on the D-Bus socket;
accept incoming messages and call the callbacks you specify;
but this has its own disadvantages:
a callback is run in its own thread, so you must provide proper synchronization to avoid data corruption;
with a huge message rate, flow control is at best too coarse, or even impossible, and the daemon could be flooded before it realises the bus connection should be throttled or stopped.
To summarize: your initial question was like "How can I reach my friend's home in an efficient way?" with a minimum of details. You can simply walk 20 meters if your friend lives next door, or you can ride a bike, or you may need to call a taxi, or you may need to rent a spaceship to reach the far side of the Moon :) You should repeatedly narrow your requirements until a single variant remains.

NodeJS: how to run three servers acting as one single application?

My application is built from three distinct servers: each one serves a different purpose, and they must stay separate (at least in order to use more than one core). As an example (this is not the real thing) you could think of this setup as one server managing user authentication, another serving as the game engine, and another as a pubsub server. Logically the "application" is only one, and clients connect to one server or another depending on their specific need.
Now I'm trying to figure out the best way to run a setup like this in a production environment.
The simplest way could be to have a bash script that runs each server in the background, one after the other. One problem with this approach is that if I need to restart the "application", I would have to have saved each server's pid and kill each one.
Another way would be to use a Node process that runs each server as its own child (using child_process.spawn). Node spawning nodes - is that stupid for some reason? This way I'd have a single process to kill when I need to stop/restart the whole application.
What do you think?
If you're on Linux or another *nix OS, you might try writing an init script that starts/stops/restarts your application; here's an example.
Use dedicated tools for process monitoring. Monit, for example, can monitor your processes by their pid and restart them whenever they die, and you can manually restart each process with the monit command or via its web GUI.
So in your example you would create 3 independent processes and tell monit to monitor each of them.
I ended up creating a wrapper/supervisor script in Node that uses child_process.spawn to execute all three processes.
it pipes each process's stdout/stderr to its own stdout/stderr
it intercepts errors from each process, logs them, then exits (as if it were its own fault)
it then forks and daemonizes itself
I can stop the whole thing using the start/stop paradigm.
Now that I have a robust daemon, I can create a unix script to start/stop it on boot/shutdown as usual (as @Levi says).
See also my other (related) Q: NodeJS: will this code run multi-core or not?

Process text files ftp'ed into a set of directories on a hosted server

The situation is as follows:
A series of remote workstations collect field data and upload it via FTP to a server. The data is sent as a CSV file which is stored in a unique directory for each workstation on the FTP server.
Each workstation sends a new update every 10 minutes, causing the previous data to be overwritten. We would like to somehow concatenate or store this data automatically. The workstations' processing power is limited and cannot be extended, as they are embedded systems.
One suggestion offered was to run a cron job on the FTP server; however, the terms of service only allow cron jobs at 30-minute intervals, as it's shared hosting. Given the number of workstations uploading and the 10-minute interval between uploads, the 30-minute limit between cron runs looks like it might be a problem.
Is there any other approach that might be suggested? The available server-side scripting languages are Perl, PHP and Python.
Upgrading to a dedicated server might be necessary, but I'd still like to get input on how to solve this problem in the most elegant manner.
Most modern Linux distributions support inotify, which lets your process know when the contents of a directory have changed, so you don't even need to poll.
Edit: With regard to the comment below from Mark Baker :
"Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files."
That will happen with the inotify watch you set on the directory level - the way to make sure you then don't pick up the partial file is to set a further inotify watch on the new file and look for the IN_CLOSE event so that you know the file has been written to completely.
Once your process has seen this, you can delete the inotify watch on this new file, and process it at your leisure.
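A rough sketch of that with pyinotify (the directory path and the processing step are placeholders); in practice, watching the directory for IN_CLOSE_WRITE is often enough to avoid picking up partially uploaded files:

    # Watch an upload directory and react only when a file has been fully written.
    import pyinotify

    WATCH_DIR = "/path/to/uploads"     # placeholder

    class Handler(pyinotify.ProcessEvent):
        def process_IN_CLOSE_WRITE(self, event):
            # fires when a file that was open for writing is closed
            print("finished file:", event.pathname)
            # process_csv(event.pathname) would go here (placeholder)

    wm = pyinotify.WatchManager()
    wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE)
    notifier = pyinotify.Notifier(wm, Handler())
    notifier.loop()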
You might consider a persistent daemon that keeps polling the target directories:
grab_lockfile() or exit();
while (1) {
    if (new_files()) {
        process_new_files();
    }
    sleep(60);
}
Then your cron job can just try to start the daemon every 30 minutes. If the daemon can't grab the lockfile, it just dies, so there's no worry about multiple daemons running.
Another approach to consider would be to submit the files via HTTP POST and then process them via a CGI. This way, you guarantee that they've been dealt with properly at the time of submission.
The 30-minute limitation is pretty silly, really. Starting processes in Linux is not an expensive operation, so if all you're doing is checking for new files there's no good reason not to do it more often than that. We have cron jobs that run every minute and they don't have any noticeable effect on performance. However, I realise it's not your rule, and if you're going to stick with that hosting provider you don't have a choice.
You'll need a long-running daemon of some kind. The easy way is to just poll regularly, and that's probably what I'd do. Inotify, which notifies you as soon as a file is created, is a better option.
You can use inotify from Perl with Linux::Inotify, or from Python with pyinotify.
Be careful though, as you'll be notified as soon as the file is created, not when it's closed. So you'll need some way to make sure you don't pick up partial files.
With polling it's less likely you'll see partial files, but it will happen eventually and will be a nasty hard-to-reproduce bug when it does happen, so better to deal with the problem now.
If you're looking to stay with your existing FTP server setup, then I'd advise using something like inotify or a daemonized process to watch the upload directories. If you're OK with moving to a different FTP server, you might take a look at pyftpdlib, which is a Python FTP server library.
I've been part of the dev team for pyftpdlib for a while, and one of the more common requests was for a way to "process" files once they've finished uploading. Because of that we created an on_file_received() callback method that's triggered on completion of an upload (see issue #79 on our issue tracker for details).
If you're comfortable in Python, then it might work out well for you to run pyftpdlib as your FTP server and run your processing code from the callback method. Note that pyftpdlib is asynchronous and not multi-threaded, so your callback method can't be blocking. If you need to run long-running tasks, I would recommend using a separate Python process or thread for the actual processing work.
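A minimal sketch of that setup (the user name, password, directory and port are placeholders; on_file_received() is the documented pyftpdlib hook):

    # Run pyftpdlib and process each CSV as soon as its upload completes.
    from pyftpdlib.authorizers import DummyAuthorizer
    from pyftpdlib.handlers import FTPHandler
    from pyftpdlib.servers import FTPServer

    class ProcessingHandler(FTPHandler):
        def on_file_received(self, file):
            # called when an upload finishes; keep this quick, the server is async
            print("received:", file)
            # hand the path off to a worker process/thread for the real work

    authorizer = DummyAuthorizer()
    authorizer.add_user("station", "secret", "/srv/ftp", perm="elradfmw")  # placeholders

    handler = ProcessingHandler
    handler.authorizer = authorizer
    server = FTPServer(("0.0.0.0", 2121), handler)
    server.serve_forever()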
