Do I need to close a file before calling syncfs() - linux

On my embedded system, I want to make sure that the data is safely written when I close a file - if the system reports that the data was saved, the user should be able to remove power immediately.
I know that the proper way to do this is fsync(), fclose(), and fsync() on the directory (cf. this blog entry). However, it's a bit tricky to get a file descriptor for the directory in my case (I'd have to go through /proc/self/fd to recover the filename and derive the directory from there). It would be much simpler for me to just do syncfs() on the entire filesystem - I know that this is the only file open on that filesystem anyway.
Now my question is:
Is it sufficient to do syncfs()?
Do I need to fclose() the FILE * first (for the directory entry to be up-to-date)? Or is fflush() sufficient?
If it needs to be closed, is it useful to dup() the file descriptor before closing so I can use it directly for syncfs()?

First of all, don't mix standard library <stdio.h> calls (like fprintf(3) or fopen(3)) with system calls (like open(2) or close(2) or sync(2)). The former are library routines that buffer data inside your process, where the kernel cannot see it; the latter are operating system interfaces that make the kernel responsible for the data from that point on. You can tell them apart easily: stdio functions operate on FILE * streams, while system calls operate on plain int file descriptors.
So if you use a system call to ensure your data is properly synced to disk, it is absolutely necessary to fflush(3) your process's buffered data first, before you make the sync(2) or fsync(2) call.
No sync(2) is guaranteed to happen at fclose(3) or close(2) time, nor in the atexit() callbacks your process runs before exit().
The operating system delays writes for performance reasons, and close(2) is not an event that triggers a flush. Just think that many processes can be reading and writing the same file at the same time, and flushing the filesystem on every close(2) would be painful to achieve. The operating system flushes its buffers at regular intervals, on umount(2), on system shutdown, and on explicit sync(2) and fsync(2) calls.
If you need to keep the FILE *fd stream open, just do an fflush(fd) on that stream to ensure the operating system has received all the data you fwrite(3)d or fprintf(3)ed so far.
So finally, if you are using <stdio.h> functions, first do an fflush() for every FILE * stream you have written to, or call fflush(NULL); to tell stdio to flush all streams in one call. Then do the sync(2) or fsync(2) call to ensure all your data is physically on disk. There is no need to close anything.
FILE *fd;
...
fflush(fd);                /* push stdio's in-process buffers to the kernel */
fsync(fileno(fd));         /* ask the kernel to push its buffers to the disk */
/* here you know that everything written so far, whether with write(2) or
 * fwrite(3), is synced to disk */
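If you also want the new directory entry itself to be durable, as the approach quoted in the question does, a minimal sketch looks like the following; the file's directory path ("/data") is a placeholder and most error handling is omitted. syncfs() (Linux-specific, declared with _GNU_SOURCE) is an alternative that flushes the whole filesystem the file lives on, which covers the directory entry as well.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int save_file(FILE *fp)
{
    fflush(fp);                 /* stdio buffers -> kernel */
    fsync(fileno(fp));          /* kernel buffers -> disk (data and inode) */

    /* make the directory entry durable too; "/data" is a placeholder */
    int dirfd = open("/data", O_RDONLY | O_DIRECTORY);
    if (dirfd >= 0) {
        fsync(dirfd);
        close(dirfd);
    }
    /* alternatively: syncfs(fileno(fp)); flushes the entire filesystem */
    return 0;
}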
By the way, your approach of going through /proc/self/fd (or the equivalent /dev/fd/<number>) to recover a descriptor you had previously is faulty for two reasons:
Once you close your descriptor, /dev/fd/<number> no longer refers to the file you want; normally it doesn't even exist any more. Just try this:
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main()
{
    int fd;
    char fn[] = "/dev/fd/1";

    close(1); /* close standard output */
    fd = open(fn, O_RDONLY); /* try to reopen it from /dev/fd */
    if (fd < 0) {
        fprintf(stderr,
                "%s: %s (errno=%d)\n",
                fn,
                strerror(errno),
                errno);
        exit(EXIT_FAILURE);
    }
    exit(EXIT_SUCCESS);
} /* main */
You cannot find the directory an open file belongs to from the file descriptor alone. For a file with multiple hard links, there can be thousands of directory entries pointing to it, and nothing in the inode (or in the open file structure) records the path that was used to open the file. A common way to use temporary files is to create them and immediately unlink(2) them, so nobody can open them again: as long as you keep the file open you have access to it, but no path points to it any more.
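As a small sketch of that temporary-file pattern (the path is made up; real code would normally use mkstemp() rather than a fixed name):
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/tmp/scratch", O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* unlink immediately: no path refers to the file any more, but the
     * open descriptor keeps the inode and its data alive */
    unlink("/tmp/scratch");

    write(fd, "private\n", 8);
    /* ... keep using fd; the storage is released once it is closed */
    close(fd);
    return 0;
}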

Enable the "sync" flag for your filesystem (in /etc/fstab); the default is "async" (disabled). When this flag is enabled, all changes to that filesystem are immediately flushed to disk. This makes your entire filesystem slow, but depending on your embedded system's requirements, this can be a great option to consider.
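For example, a hypothetical /etc/fstab entry for a data partition with the sync option enabled might look like this (device, mount point and filesystem type are placeholders):
/dev/mmcblk0p2   /data   ext4   defaults,sync   0   2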

Related

Is it required to use O_TRUNC and O_APPEND together?

I was going through the book The Linux Programming Interface. On page 73 in Chapter 4,
it is written:
fd = open("w.log", O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, S_IRUSR | S_IWUSR);
I read that the O_TRUNC flag is used to truncate the file length to zero, which destroys any existing data in the file.
O_APPEND flag is used to append data to the end of the file.
The kernel records a file offset, sometimes also called the read-write offset or pointer. This is the location in the file at which the next read() or write() will commence.
I am confused that if the file is truncated and the kernel does the subsequent writing at the end of the file, why is the append flag needed to explicitly tell it to append at the end of the file?
Without the append flag (if the file is truncated), the kernel writes at the end of the file for the subsequent write() function call.
O_APPEND flag is used to append data to the end of the file.
That's true, but incomplete enough to be potentially misleading. And I suspect that you are in fact confused in that regard.
The kernel records a file offset, sometimes also called the read-write offset or pointer. This is the location in the file at which the next read() or write() will commence.
That's also incomplete. There is a file offset associated with at least each seekable file. That is the position where the next read() will commence. It is where the next write() will commence if the file is not open in append mode, but in append mode every write happens at the end of the file, as if it were repositioned with lseek(fd, 0, SEEK_END) before each one. In that case, then, the current file offset might not be the position where the next write() will commence.
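As a small sketch of that behavior (the file name is arbitrary): even after seeking back to the start of the file, a write through an O_APPEND descriptor still lands at the end.
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("appendme.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    write(fd, "first\n", 6);

    lseek(fd, 0, SEEK_SET);      /* the file offset really does move... */
    write(fd, "second\n", 7);    /* ...but O_APPEND forces this write to the end */

    close(fd);
    return 0;
}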
I am confused that if the file is truncated and the kernel does the subsequent writing at the end of the file, why is the append flag needed to explicitly tell it to append at the end of the file?
It is not needed to cause the first write (by any process) after truncation to occur at the end of the file because immediately after the file has been truncated there isn't any other position.
Without the append flag (if the file is truncated), the kernel writes at the end of the file for the subsequent write() function call.
It is not needed for subsequent writes either, as long as the file is not repositioned or externally modified. Otherwise, the location of the next write depends on whether the file is open in append mode or not.
In practice, it is not necessarily the case that every combination of flags is useful, but the combination of O_TRUNC and O_APPEND has observably different effect than does either flag without the other, and the combination is useful in certain situations.
O_APPEND rarely makes sense with O_TRUNC. I think no combination of the C fopen modes will produce that combination (on POSIX systems, where this is relevant).
O_APPEND ensures that every write is done at the end of the file, automatically, regardless of the write position. In particular, this means that if multiple processes are writing to the file, they do not stomp over each other's writes.
Note that POSIX does not require the atomic behavior of O_APPEND. It requires that an automatic seek takes place to the (current) end of the file before the write, but it doesn't require that position to still be the end of the file when the write occurs. Even on implementations which feature atomic O_APPEND, it might not work for all file systems. The Linux man page on open cautions that O_APPEND doesn't work atomically on NFS.
Now, if every process uses O_TRUNC when opening the file, it will be clobbering everything that every other process wrote. That conflicts with the idea that the processes shouldn't be clobbering each other's writes, for which O_APPEND was specified.
O_APPEND is not required for appending to a file by a single process which is understood to be the only writer. It is possible to just seek to the end and then start writing new data. Sometimes O_APPEND is used in the exclusive case anyway simply because it's a programming shortcut. We don't have to bother making an extra call to position to the end of the file. Compare:
FILE *f = fopen("file.txt", "a");
// check f and start writing
versus:
FILE *f = fopen("file.txt", "r+");
// check f
fseek(f, 0, SEEK_END); // go to the end, also check this for errors
// start writing
We could imagine a group of processes appending to a file with O_APPEND, where the first one also passes O_TRUNC to truncate it first. But this is awkward to program: it's not easy for a process to tell whether it is the first one to open the file.
If such a situation is required on, say, boot-up, where the old file from before the boot is irrelevant for some reason, just have a boot-time action (script or whatever) remove the old file before these multiple processes are started. Each one then uses O_CREAT to create the file if necessary (in case it is the first process) but without O_TRUNC (in case they are not the first process), and with O_APPEND to do the atomic (if available) appending thing.
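A sketch of what each such process might do (the log path and message are placeholders): O_CREAT in case the file is missing, O_APPEND so concurrent writers don't clobber each other, and no O_TRUNC.
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/var/log/example.log",
                  O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "worker started\n";
    write(fd, msg, sizeof msg - 1);   /* appended at the current end of file */

    close(fd);
    return 0;
}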
The two flags are entirely independent. The file is simply opened with O_APPEND because it's a log file.
The author wants concurrent messages to concatenate instead of overwrite each other (e.g. if the program forks), and if an admin or log rotation tool truncates the file then new messages should start being written at line #1 instead of at line #1000000 where the last log entry was written. This would not happen without O_APPEND.

If a file has been deleted, but a process still holds a file handle, can another process access it? [duplicate]

On Linux, suppose a process opens a file for writing, something deletes the file (perhaps misconfigured log rotation), but the process keeps running, keeps the file handle open, and keeps writing to it. My understanding is that in this case, the storage used by the file still exists on disk, until the process terminates.
Suppose I want to read that file. Is there any way for another process to open the file pointed to by that file handle, or to otherwise get access to the data written to it?
Yes it can. Via /proc/$pid/fd/$fd.
Example:
#include <unistd.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int fd;
    if (0 > (fd = open("DELETED", O_CREAT|O_RDWR|O_CLOEXEC, 0600)))
        return perror("open"), 1;

    static char const msg[] = "got me\n";
    (void)write(fd, msg, sizeof(msg) - 1);   /* -1: don't write the trailing NUL */

    if (0 > unlink("DELETED"))
        return perror("unlink"), 1;

    char buf[128];
    sprintf(buf, "cat /proc/%ld/fd/%d", (long)getpid(), fd);
    system(buf);
}
(Here I'm accessing it from a(n indirect) child process, but this is not a requirement. It works from unrelated processes as well.)
The /proc/$pid/fd/$fd items appear as symlinks in the filesystem.
They usually point to the name the file was opened as, but when the file is deleted, the original link target gets " (deleted)" appended to it, as in
lrwx------ 1 petr petr 64 Aug 19 12:45 /proc/32027/fd/3 -> '/home/petr/DELETED (deleted)'
yet even though that target no longer exists, the proc symlink still works (through some dark kernel magic, presumably).
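You can also observe that " (deleted)" suffix programmatically with readlink(); here is a small sketch, where the pid and fd arguments are whatever process and descriptor you happen to be inspecting:
#include <stdio.h>
#include <unistd.h>
#include <limits.h>

int main(int argc, char **argv)
{
    if (argc != 3) { fprintf(stderr, "usage: %s <pid> <fd>\n", argv[0]); return 1; }

    char link[PATH_MAX], target[PATH_MAX];
    snprintf(link, sizeof link, "/proc/%s/fd/%s", argv[1], argv[2]);

    ssize_t n = readlink(link, target, sizeof target - 1);
    if (n < 0) { perror("readlink"); return 1; }
    target[n] = '\0';

    /* for a deleted file this prints something like
     * /proc/32027/fd/3 -> /home/petr/DELETED (deleted) */
    printf("%s -> %s\n", link, target);
    return 0;
}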
Suppose I want to read that file. Is there any way for another process to open the file pointed to by that file handle, or to otherwise get access to the data written to it?
As long as any process has the file open, its i-node and any other data will remain on disk. It is at least possible in principle to find that i-node and that data, and read them directly from the disk, though that's not exactly the same thing as opening the file. It may even be possible to do that after the last process closes the file -- this is how undeletion utilities work, and these do exist for Linux filesystems.

FUSE's write sequence guarantees

Should write() implementations assume random access, or can some assumptions be made, e.g. that writes will only ever be performed sequentially and at increasing offsets?
You'll get extra points for a link to the part of a POSIX or SUS specification that describes the VFS interface.
Random, for certain. There's a reason why the read and write interfaces take both size and offset. You'll notice that there isn't a seek field in the fuse_operations struct; when a user program calls seek/lseek on a FUSE file, the offset in the kernel file descriptor is updated, but the FUSE fs isn't notified at all. Later reads and writes just start coming to you with a different offset, and you should be able to handle that. If something about your implementation makes it impossible, you should probably return -EIO on the writes you can't satisfy.
Unless there is something unusual about your FUSE filesystem that would prevent an existing file from being opened for write, your implementation of the write operation must support writes to any offset — an application can write to any location in a file by lseek()-ing around in the file while it's open, e.g.
fd = open("file", O_WRONLY);
lseek(fd, 100, SEEK_SET);
write(fd, ...);
lseek(fd, 0, SEEK_SET);
write(fd, ...);
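For what it's worth, a minimal sketch of a write handler in the high-level libfuse API (built against libfuse 3, e.g. via pkg-config fuse3): the function name and the idea of stashing a backing file descriptor in fi->fh in the open handler are assumptions for illustration, not part of any particular filesystem.
#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <unistd.h>
#include <errno.h>

/* the kernel hands us the offset explicitly, so writes can arrive at any
 * position; here the backing store is just a real fd saved in fi->fh */
static int myfs_write(const char *path, const char *buf, size_t size,
                      off_t offset, struct fuse_file_info *fi)
{
    (void)path;
    ssize_t n = pwrite((int)fi->fh, buf, size, offset);
    if (n < 0)
        return -errno;          /* e.g. -EIO if the write can't be satisfied */
    return (int)n;
}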

How do dev files work?

How do the Linux folks make /dev files? You can write to them and the data immediately disappears.
I can imagine some program which constantly reads some dev file:
FILE *fp;
char buffer[255] = { 0 };
int result;

fp = fopen(fileName, "r");
if (!fp) {
    printf("Open file error");
    return;
}
while (1)
{
    result = fscanf(fp, "%254c", buffer);
    printf("%s", buffer);
    memset(buffer, 0, 255);
    fflush(stdout);
    sleep(1);
}
fclose(fp);
But how do they delete the content in there? Closing the file and opening it again in "w" mode is not how they do it, because you can do e.g. cat > /dev/tty
What are files? Files are names in a directory structure which denote objects. When you open a file like /home/joe/foo.txt, the operating system creates an object in memory representing that file (or finds an existing one, if the file is already open), binds a descriptor to it which is returned and then operations on that file descriptor (like read and write) are directed, through the object, into file system code which manipulates the file's representation on disk.
Device entries are also names in the directory structure. When you open some /dev/foo, the operating system creates an in-memory object representing the device, or finds an existing one (in which case there may be an error if the device does not support multiple opens!). If successful, it binds a new file descriptor to the device object and returns that descriptor to your program. The object is configured in such a way that operations like read and write on the descriptor are directed to call into the specific device driver for device foo, and correspond to doing some kind of I/O with that device.
Such entries in /dev/ are not files; a better name for them is "device nodes" (a justification for which is the name of the mknod command). Only when programmers and sysadmins are speaking very loosely do they call them "device files".
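You can see the difference from a regular file by stat()-ing such a node; a quick sketch (/dev/tty is used only because it is always present):
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <stdio.h>

int main(void)
{
    struct stat st;
    if (stat("/dev/tty", &st) < 0) { perror("stat"); return 1; }

    /* a device node carries no data of its own; its mode marks it as a
     * character or block device and its major/minor numbers identify the driver */
    if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))
        printf("device node, major %u minor %u\n",
               major(st.st_rdev), minor(st.st_rdev));
    else
        printf("not a device node\n");
    return 0;
}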
When you do cat > /dev/tty, there isn't anything which is "erasing" data "on the other end". Well, not exactly. Basically, cat is calling write on a descriptor, and this results in a chain of function calls which ends up somewhere in the kernel's tty subsystem. The data is handed off to a tty driver which will send the data into a serial port, or socket, or into a console device which paints characters on the screen or whatever. Virtual terminals like xterm use a pair of devices: a master and slave pseudo-tty. If a tty is connected to a pseudo-tty device, then cat > /dev/tty writes go through a kind of "trombone": they bubble up on the master side of the pseudo-tty, where in fact there is a while (1) loop in some user-space C program receiving the bytes, like from a pipe. That program is xterm (or whatever); it removes the data and draws the characters in its window, scrolls the window, etc.
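That master/slave "trombone" can be seen with a few lines of code; this is only a sketch of the mechanism, not how xterm is actually structured:
#define _XOPEN_SOURCE 600
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pty setup"); return 1;
    }

    int slave = open(ptsname(master), O_RDWR | O_NOCTTY);
    if (slave < 0) { perror("open slave"); return 1; }

    /* whatever is written to the slave "bubbles up" on the master side,
     * where a terminal emulator would read it and paint the characters */
    write(slave, "hello\n", 6);

    char buf[64];
    ssize_t n = read(master, buf, sizeof buf);
    if (n > 0)
        write(STDOUT_FILENO, buf, (size_t)n);

    close(slave);
    close(master);
    return 0;
}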
Unix is designed so that devices (tty, printer, etc) are accessed like everything else (as a file) so the files in /dev are special pseudo files that represent the device within the file-system.
You don't want to delete the contents of such a device file, and honestly it could be dangerous for your system if you write to them willy-nilly without understanding exactly what you are doing.
Device files are not normal files, if "normal file" refers to an arbitrary sequence of bytes, often stored on a medium. But not all files are normal files.
More broadly, files are an abstraction referring to a system service and/or resource, a service being something you can send information to for some purpose (e.g., for a normal file, write data to storage) and a resource being something you request data from for some purpose (e.g., for a normal file, read data from storage). C defines a standard for interfacing with such a service/resource.
Device files fit within this definition, but they do not necessarily match my more specific "normal file" examples of reading and writing to and from storage. You can directly create dev files, but the only meaningful reason to do so is within the context of a kernel module. More often you may refer to them (e.g., with udev), keeping in mind they are actually created by the kernel and represent an interface with the kernel. Beyond that, the functioning of the interface differs from dev file to dev file.
I've also found quite a nice explanation:
http://lwn.net/images/pdf/LDD3/ch18.pdf

In general, on uClinux, is ioctl faster than writing to /sys filesystem?

I have an embedded system I'm working with, and it currently uses the sysfs to control certain features.
However, there is a function that we would like to speed up, if possible.
I discovered that this subsystem also supports an ioctl interface, but before rewriting the code, I decided to search for which is the faster interface (on uClinux) in general: sysfs or ioctl.
Does anybody understand both implementations well enough to give me a rough idea of the difference in overhead for each? I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls". Or "they are roughly the same because sysfs has a very simple interface".
Update 10/24/2013:
The specific case I'm currently doing is as follows:
int fd = open("/sys/power/state", O_WRONLY);
write(fd, "standby", 7);
close(fd);
In kernel/power/main.c, the code that handles this write looks like:
static ssize_t state_store(struct kobject *kobj, struct kobj_attribute *attr,
                           const char *buf, size_t n)
{
#ifdef CONFIG_SUSPEND
    suspend_state_t state = PM_SUSPEND_STANDBY;
    const char * const *s;
#endif
    char *p;
    int len;
    int error = -EINVAL;

    p = memchr(buf, '\n', n);
    len = p ? p - buf : n;

    /* First, check if we are requested to hibernate */
    if (len == 7 && !strncmp(buf, "standby", len)) {
        error = enter_standby();
        goto Exit;
((( snip )))
Can this be sped up by moving to a custom ioctl() where the code to handle the ioctl call looks something like:
case SNAPSHOT_STANDBY:
    if (!data->frozen) {
        error = -EPERM;
        break;
    }
    error = enter_standby();
    break;
(so the ioctl() calls the same low-level function that the sysfs function did).
If by sysfs you mean the sysfs() library call, notice this in man 2 sysfs:
NOTES
    This System-V derived system call is obsolete; don't use it. On systems with /proc, the same information can be obtained via /proc/filesystems; use that interface instead.
I can't recall noticing stuff that had an ioctl() and a sysfs interface, but probably they exist. I'd use the proc or sys handle anyway, since that tends to be less cryptic and more flexible.
If by sysfs you mean accessing files in /sys, that's the preferred method.
I'm looking for generic info, such as "ioctl is faster because you've removed the file layer from the function calls".
Accessing procfs or sysfs files does not entail an I/O bottleneck because they are not real files -- they are kernel interfaces. So no, accessing this stuff through "the file layer" does not affect performance. This is a not uncommon misconception in Linux systems programming, I think. Programmers can be squeamish about system calls that aren't, well, system calls, and paranoid that opening a file will somehow be slower. Of course, file I/O in the ABI is just system calls anyway. What makes a normal (disk) file read slow is not the calls to open, read, write and so on; it's the hardware bottleneck.
I always use the low-level descriptor-based functions (open(), read()) instead of high-level streams when doing this, because at some point some experience led me to believe they were more reliable for this specifically (reading from /proc). I can't say whether that's definitively true.
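For instance, reading a proc or sysfs attribute with the descriptor-based calls looks roughly like this (/proc/loadavg is used purely as an always-available example):
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fd = open("/proc/loadavg", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf - 1);
    if (n < 0) { perror("read"); close(fd); return 1; }
    buf[n] = '\0';

    printf("%s", buf);
    close(fd);
    return 0;
}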
So, the question was interesting; I built a couple of modules, one for ioctl and one for sysfs, the ioctl one implementing only a 4-byte copy_from_user and nothing more, and the sysfs one having nothing in its write interface.
Then a couple of userspace tests with up to 1 million iterations; here are the results:
time ./sysfs /sys/kernel/kobject_example/bar
real 0m0.427s
user 0m0.056s
sys 0m0.368s
time ./ioctl /run/temp
real 0m0.236s
user 0m0.060s
sys 0m0.172s
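For reference, the sysfs side of such a test is roughly the following loop (the ioctl variant is the same except it calls ioctl() instead of writing); this is a reconstruction for illustration, not the exact program used above, and the attribute path is taken from the timings:
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <sysfs-attribute>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* hit the attribute's store() hook one million times */
    for (int i = 0; i < 1000000; i++) {
        /* always write at offset 0, as a fresh "echo value > file" would */
        if (pwrite(fd, "1234", 4, 0) < 0) { perror("write"); return 1; }
    }

    close(fd);
    return 0;
}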
edit
I agree with @goldilocks's answer: hardware is the real bottleneck, and in a Linux environment with a well-written driver, choosing ioctl or sysfs doesn't make a big difference. But if you are using uClinux, on your hardware even a few CPU cycles can make a difference.
The test I've done is for Linux, not uClinux, and it was never meant to be an absolute reference for profiling the two interfaces. My point is that you can write a book about how fast one or the other is, but only testing will tell you; it took me a few minutes to set the thing up.
