Strange descriptor closing in some linux programs

Strange descriptor closing in some linux programs - linux

While stracing some linux daemons (eg. sendmail) I noticed that some of them will call close() on a number of descriptors (usually ranging from 3 to 255) right at the beginning. Is this being done on purpose or is this some sort of a side effect of doing something else?

It is usually done as part of making a process a daemon.
All file descriptors are closed so that the long-running daemon does not unnecessarily hold any resources. For example, if a daemon were to inherit an open file and the daemon did not close it then the file could not be deleted (the storage for it would remain allocated until close) and the filesystem that the file is on could not be unmounted.
Daemonizing a process will also take a number of other actions, but those actions are beyond the scope of this question.

Related

Run a background process and free up terminal won't work with & [duplicate]

I am writing a Linux daemon . I found two ways to do it.
Daemonize your process by calling fork() and setting sid.
Running your program with &.
Which is the right way to do it?

From http://www.steve.org.uk/Reference/Unix/faq_2.html#SEC16
Here are the steps to become a daemon:
fork() so the parent can exit, this returns control to the command line or shell invoking your program. This step is required so that the new process is guaranteed not to be a process group leader. The next step, setsid(), fails if you're a process group leader.
setsid() to become a process group and session group leader. Since a controlling terminal is associated with a session, and this new session has not yet acquired a controlling terminal our process now has no controlling terminal, which is a Good Thing for daemons.
fork() again so the parent, (the session group leader), can exit. This means that we, as a non-session group leader, can never regain a controlling terminal.
chdir("/") to ensure that our process doesn't keep any directory in use. Failure to do this could make it so that an administrator couldn't unmount a filesystem, because it was our current directory. [Equivalently, we could change to any directory containing files important to the daemon's operation.]
umask(0) so that we have complete control over the permissions of anything we write. We don't know what umask we may have inherited. [This step is optional]
close() fds 0, 1, and 2. This releases the standard in, out, and error we inherited from our parent process. We have no way of knowing where these fds might have been redirected to. Note that many daemons use sysconf() to determine the limit _SC_OPEN_MAX. _SC_OPEN_MAX tells you the maximun open files/process. Then in a loop, the daemon can close all possible file descriptors. You have to decide if you need to do this or not. If you think that there might be file-descriptors open you should close them, since there's a limit on number of concurrent file descriptors.
Establish new open descriptors for stdin, stdout and stderr. Even if you don't plan to use them, it is still a good idea to have them open. The precise handling of these is a matter of taste; if you have a logfile, for example, you might wish to open it as stdout or stderr, and open '/dev/null' as stdin; alternatively, you could open '/dev/console' as stderr and/or stdout, and '/dev/null' as stdin, or any other combination that makes sense for your particular daemon.
Better yet, just call the daemon() function if it's available.

I suggest not writing your program as a daemon at all. Make it run in the foreground with the file descriptors, current directory, process group, etc as given to it.
If you want to then run this program as a daemon, use start-stop-daemon(8), init(8), runsv (from runit), upstart, systemd, or whatever to launch your process as a daemon. That is, let your user decide how to run your program and don't enforce that it must run as a daemon.

Just use daemon(3) (from unistd.h).
The daemon() function is for programs
wishing to detach themselves from the
controlling terminal and run in the
background as system daemons. ...

The first. The second is not daemonizing, but running on the background. Daemonized programs should be on its own session and process group, and should not have a controlling terminal.

Actually to make a daemon you have to double fork.
Running the program with a & makes the shell run the program in the background, which does not make it a daemon. Daemons have init (pid 1) as a parent, that's why the double fork is needed.
So the nice way to do things, if your program is a daemon, would be to take care of this issue yourself (there are more methods, see here too). You could also use the start-stop-daemon program.

What language are you using? Some languages have helper methods that make daemonizing easier. For example, Ruby has the daemons package.

Synchronizing file processing threads across servers in linux

I need to build a linux service/daemon which processes files. This daemon will most likely be multi-threaded and most likely be running on more than one node. What would be the best way to synchronize the threads of all daemons such that no two threads are processing the same file?
A couple ideas came to mind but wondering whether there is a better approach as I'm new to linux.
Create a directory structure such that only one daemon processes a directory. The daemon itself should be able to easily synchronize the threads within it such that no two threads are processing the same file.
Determine some mechanism using open() and maybe file attributes such that once a process can successfully open a file exclusively when the file is in some state, maybe some file attribute not set yet, the state is changed, by changing some file attribute, and that daemon can process the file knowing that no other daemon will process it.
Come up with a naming convention such that the names are somewhat equally distributed across some numerical name. Each daemon could then be configured to process some modulo number.
Example: file name = 987654321
We have a daemon running on two nodes. The configuration for each daemon would indicate the number of daemons and which modulo the daemon should process. Therefore one daemon would process modulo value 0 and the other would process modulo value 1.
987654321 % 2 = 1 so it would be processed by the daemon processing modulo 1.
I guess we could have a single daemon which divvies out the work to the processing daemons. The processing daemons could communicate with this single daemon which I'll call the "work manager" via some IPC mechanism.
Thanks,
Nick

If you're going to implement logic in Python, you can use python's queue class.
It designed for exchanging data between multiple threads.
You can put your files and/or directories in queue, then each thread will access the queue and get that file. In that way, threads will never hold the same object.

close()->open() in Linux, how safe is it?

A process running on Linux writes some data to a file on the file system, and then invokes close(). Immediately afterwards, another process invokes open() and reads from the file.
Is the second process always 100% guaranteed to see an updated file?
What happens when using a network filesystem, and the two processes run on the same host?
What happens when the two processes are on different hosts?

close() does not guarantee that the data is written to disk, you have to use fsync() for this. See Does Linux guarantee the contents of a file is flushed to disc after close()? for a similar question.

process that don't have file descriptor

when I do
ls proc/[pid]/fd
sometimes I don't get output, It seems that there are no file descriptor in that file.
What does that mean when a process doesn't have file descriptor?

The process in question is more likely a deamon — daemon processed will intentionally close standard file descriptors in order to avoid hanging onto their resources. (They will also chdir to the root directory, invoke an extra fork() and perform a number of more obscure operations for the same reason.)

multi-threaded file locking in linux

I'm working on a multithreaded application where multiple threads may want exclusive access to the same file. I'm looking for a way of serializing these operations. I was planning to use flock, lockf, or fcntl locking. However it appears that with these methods an attempt to lock a file by a second thread when a first thread already owns the lock will be granted, because the two threads are in the same process. This is according to the manpages for flock and fnctl (and I guess in linux lockf is implemented with fnctl). Also supported by this other question. So, are there other ways of locking a file in linux which works at a thread-level instead of a process-level?
Some alternatives that I came up with which I do not like are:
1) Use a lockfile (xxx.lock) opened with O_CREAT | O_EXCL flags. This call will succeed only in one thread if there is contention. The problem with this is that then other threads have to spin on the call until they achieve the lock, meaning that I have to _yield() or sleep() which makes me think this is not a great option.
2) Keep a mutex'ed list of all open files. When a thread wants to open/close a file it has to lock the list first. When opening a file, it searches the list to see if it's open. This sounds particularly inefficient because it requires a significant amount of work even if the file is not owned yet.
Are there other ways of doing this?
Edit:
I just discovered this text in my system's manpages which isn't in the online man pages:
If a process uses open(2) (or similar) to obtain more than one descriptor for the same file, these descriptors are treated independently by flock(). An attempt to lock the file using one of these file descriptors may be denied by a lock that the calling process has already placed via another descriptor.
I'm not happy about the words "may be denied", I'd prefer "will be denied" but I guess it's time to test that.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string