What does lseek() mean for a directory file descriptor? - linux

According to strace, lseek(fd, 0, SEEK_END) = 9223372036854775807 when fd refers to a directory. Why is this syscall succeeding at all? What does lseek() mean for a dir fd?

On my test system, if you use opendir(), and readdir() through all the entries in the directory, telldir() then returns the same value:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>
int main(int argc, char *argv[]) {
int fd = open(".", O_RDONLY);
if (fd < 0) {
perror("open");
return 1;
}
off_t o = lseek(fd, 0, SEEK_END);
if (o == (off_t)-1) {
perror("lseek");
return 1;
}
printf("Via lseek: %ld\n", (long)o);
close(fd);
DIR *d = opendir(".");
if (!d) {
perror("opendir");
return 1;
}
while (readdir(d)) {
}
printf("via telldir: %ld\n", telldir(d));
closedir(d);
return 0;
}
outputs
Via lseek: 9223372036854775807
via telldir: 9223372036854775807
Quoting from the telldir(3) man page:
In early filesystems, the value returned by telldir() was a simple file offset within a directory. Modern filesystems use tree or hash structures, rather than flat tables, to represent directories. On such filesystems, the value returned by telldir() (and used internally by readdir(3)) is a "cookie" that is used by the implementation to derive a position within a directory. Application programs should treat this strictly as an opaque value, making no assumptions about its contents.
It's a magic number that indicates that the index into the directory's contents is at the end. Don't count on the number always being the same, or being portable. It's a black box. Stick with the dirent API for traversing directory contents unless you really know what you're doing. (Under the hood on Linux with glibc, opendir(3) calls openat(2) on the directory, readdir(3) fetches information about its contents with getdents(2), and seekdir(3) calls lseek(2), but those are just implementation details.)

Related

How to monitor changes to pseudo-filesystem on Linux?

As neither dnotify nor inotify is able to monitor changes to pseudo-filesystem content, is there an automated way to discover file/directory creation/deletion inside (for example) the /sys/block directory?
Of course, I can scan the directory periodically on my own, but I hope there is a smarter way.
I decided to use a somewhat naive workaround and monitor the /dev directory (which is supported by inotify) instead of /sys/block. Fortunately, each /sys/block entry has its counterpart inside /dev (but not vice versa), so I just check whether an entry that appeared in /dev is also present inside /sys/block.
Not very elegant, but sufficient for me.
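#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>
#include <assert.h>
#include <linux/limits.h>
#include <sys/stat.h>
int main(void)
{
    int fd = inotify_init();
    assert(fd >= 0);
    int wd = inotify_add_watch(fd, "/dev", IN_CREATE);
    assert(wd >= 0);
    for (;;) {
        /* Room for one event with the longest possible name, aligned for struct inotify_event. */
        char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
            __attribute__((aligned(__alignof__(struct inotify_event))));
        ssize_t res = read(fd, buf, sizeof(buf));
        assert(res > 0);
        /* A single read() may return several events, so walk all of them. */
        for (char *p = buf; p < buf + res; ) {
            struct inotify_event *event = (struct inotify_event *) p;
            if (event->len > 0 && (event->mask & IN_CREATE) && !(event->mask & IN_ISDIR)) {
                /* PATH_MAX, not NAME_MAX: the prefix and suffix are added to the name. */
                char path[PATH_MAX];
                snprintf(path, sizeof(path), "/sys/block/%s/stat", event->name);
                struct stat statbuf;
                if (stat(path, &statbuf) == 0)
                    printf("new entry appeared: %s\n", event->name);
            }
            p += sizeof(struct inotify_event) + event->len;
        }
    }
}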

What is the cause of the hard limit on the directory nesting depth returned by getcwd on macOS and how can it be circumvented?

On Linux and macOS, directories can be nested to seemingly arbitrary depth, as demonstrated by the following C program. However, on macOS but not on Linux, there seems to be a hard limit on the nesting level that getcwd can handle, specifically a nesting level of 256. When that limit is reached, getcwd fails with ENOENT, a rather strange error code. Where does this limit come from? Is there a way around it?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
void fail(char *msg) { perror(msg); exit(1); }
void create_nested_dirs(int n) {
int i;
char name[10];
char cwd[10000];
if (chdir("/tmp") < 0) fail("chdir(\"/tmp\")");
for (i=2; i<=n; i++) {
sprintf(name, "%09d", i);
printf("%s\n",name);
if (mkdir(name, 0777) < 0 && errno != EEXIST) fail("mkdir");
if (chdir(name) < 0) fail("chdir(name)");
if (getcwd(cwd, sizeof(cwd)) == NULL) fail("getcwd");
printf("cwd = \"%s\" strlen(cwd)=%zu\n", cwd, strlen(cwd));
}
}
int main() {
long ret = pathconf("/", _PC_PATH_MAX);
printf("PATH_MAX is %ld\n", ret);
create_nested_dirs(300);
return 0;
}
Update
The above program was updated to print the value returned by pathconf("/", _PC_PATH_MAX) and to print the length of the path returned by getcwd.
On my machine running macOS Mojave 10.14, the PATH_MAX is 1024 and the longest string correctly returned by getcwd is 2542 characters long. Then a 2552 character long directory of nesting depth 256 is created by mkdir and then after a successful chdir to that directory a getcwd fails with ENOENT.
If the sprintf(name, "%09d", i); is changed to sprintf(name, "%03d", i); the paths are considerably shorter but the getcwd still fails when the directory nesting depth reaches 256.
So the limiting factor here is the nesting depth, not PATH_MAX.
My understanding of the source code here is that the meat of the work is done by the call fcntl(fd, F_GETPATH, b) so the problem may be in fcntl.

Monitor directory recursively for file additions/modifications/deletions

I need to watch a directory with several subdirectories, each of which has files which I need to monitor for file additions, modifications and deletions.
I found some example code, and had to modify it slightly to get it working, but it doesn't exactly do what I need. It can find a file rename, or delete within a directory (but not a subdirectory), but doesn't respond to file modifications.
The way that I can find using a Google search is to monitor each file individually; however, I have several hundreds of thousands of files to monitor, and holding a file descriptor to each is probably unwise.
Is there a way under FreeBSD to do what I need to do? Or will I have to find an alternative solution?
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) {
    int f, kq, nev;
    struct kevent change;
    struct kevent event;
    kq = kqueue();
    if (kq == -1) {
        perror("kqueue");
        return EXIT_FAILURE;
    }
    f = open("/tmp/foo", O_RDONLY);
    if (f == -1) {
        perror("open");
        return EXIT_FAILURE;
    }
    /* EV_ONESHOT removes the event once it fires; passing &change to
       every kevent() call below registers it again. */
    EV_SET(&change, f, EVFILT_VNODE,
           EV_ADD | EV_ENABLE | EV_ONESHOT,
           NOTE_DELETE | NOTE_EXTEND | NOTE_WRITE | NOTE_ATTRIB,
           0, 0);
    for (;;) {
        nev = kevent(kq, &change, 1, &event, 1, NULL);
        if (nev == -1) {
            perror("kevent");
            break;
        }
        else if (nev > 0) {
            if (event.fflags & NOTE_DELETE) {
                printf("File deleted\n");
                break;
            }
            if (event.fflags & NOTE_EXTEND ||
                event.fflags & NOTE_WRITE)
                printf("File modified\n");
            if (event.fflags & NOTE_ATTRIB)
                printf("File attributes modified\n");
        }
    }
    close(kq);
    close(f);
    return EXIT_SUCCESS;
}
As you rightly guessed, kqueue is not scalable because you have to hold a handle to the file / directory in question, even if in O_RDONLY mode. On Linux, one would use inotify for this purpose (http://linux.die.net/man/7/inotify), but I believe there is no FreeBSD port of this kernel feature!
If you have the time and resources, what you could do is look at the code for audit on BSD (http://www.freebsd.org/cgi/man.cgi?query=audit&sektion=4) and try to code up a version of inotify for BSD! O_O

Ordering file location on linux partition

I have a process which processes a lot of files (~96,000 files, ~12 TB data). Several runs of the process have left the files scattered about the drive. Each iteration of the process uses several files. This leads to a lot of whipsawing around the disk collecting the files.
Ideally, I would like the process to write the files it uses in order, so that the next run will read them in order (file sizes change). Is there a way to hint at a physical ordering/grouping, short of writing to the raw partition?
Any other suggestions would be helpful.
Thanks
There are two system calls you might look up: fadvise64 and fallocate, which tell the kernel how you intend to read or write a given file.
Another tip is the "Orlov block allocator" (Wikipedia, LWN), which affects the way the kernel will allocate new directories and file entries.
In the end I decided not to worry about writing the files in any particular ordering. Instead, before I started a run, I would figure out where the first block of each file was located, and then sort the file processing order by first block location. Not perfect, but it did make a big difference in processing times.
Here's the C code I used to get the first block of each file in a supplied file list. I adapted it from example code I found online (I can't seem to find the original source).
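#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <linux/fs.h>
//
// Get the first block for each file passed to stdin,
// write filename & first block for each file to stdout
// (FIBMAP normally requires root privileges)
//
int main(int argc, char **argv) {
    int fd;
    int block;
    char fname[512];
    while (fgets(fname, sizeof(fname), stdin) != NULL) {
        // Strip the trailing newline, if there is one
        fname[strcspn(fname, "\n")] = '\0';
        // open() returns -1 on failure, which assert() would treat as true,
        // so check the result explicitly
        fd = open(fname, O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "open %s failed: %s\n", fname, strerror(errno));
            continue;
        }
        // In: logical block 0 of the file; out: its physical block number
        block = 0;
        if (ioctl(fd, FIBMAP, &block)) {
            fprintf(stderr, "FIBMAP ioctl failed - errno: %s\n", strerror(errno));
        }
        printf("%010d, %s\n", block, fname);
        close(fd);
    }
    return 0;
}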

How can I get the source code for the linux utility tail?

This command is really useful, but where can I get the source code to see what is going on inside?
Thanks.
The tail utility is part of GNU coreutils on Linux.
Source tarball: ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.4.tar.gz
Source file: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c (original http link)
I've always found FreeBSD to have far clearer source code than the gnu utilities. So here's tail.c in the FreeBSD project:
http://svnweb.freebsd.org/csrg/usr.bin/tail/tail.c?view=markup
Poke around the uclinux site. Since they distributed the software, they are required to make the source available one way or another.
Or, you could read man fseek and guess at how it might be done.
NB-- See William's comments below, there are cases when you can't use seek.
You might find it an interesting exercise to write your own. The vast majority of the Unix command-line tools are a page or so of fairly straightforward C code.
To just look at the code, the GNU CoreUtils sources are easily found on gnu.org or your favorite Linux mirror site.
/* This example implements the -n option of the tail command. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <getopt.h>
#define BUFF_SIZE 4096
FILE *openFile(const char *filePath)
{
FILE *file;
file= fopen(filePath, "r");
if(file == NULL)
{
fprintf(stderr,"Error opening file: %s\n",filePath);
exit(errno);
}
return(file);
}
void printLine(FILE *file, off_t startline)
{
int fd;
fd= fileno(file);
int nread;
char buffer[BUFF_SIZE];
lseek(fd,(startline + 1),SEEK_SET);
while((nread= read(fd,buffer,BUFF_SIZE)) > 0)
{
write(STDOUT_FILENO, buffer, nread);
}
}
void walkFile(FILE *file, long nlines)
{
off_t fposition;
fseek(file,0,SEEK_END);
fposition= ftell(file);
off_t index= fposition;
off_t end= fposition;
long countlines= 0;
int cbyte; /* fgetc() returns an int so that EOF can be told apart from a byte */
for(; index >= 0; index --)
{
cbyte= fgetc(file);
if (cbyte == '\n' && (end - index) > 1)
{
countlines ++;
if(countlines == nlines)
{
break;
}
}
fposition--;
fseek(file,fposition,SEEK_SET);
}
printLine(file, fposition);
fclose(file);
}
int main(int argc, char *argv[])
{
FILE *file;
if(argc != 3)
{
fprintf(stderr,"Usage: %s nlines file\n",argv[0]);
return 1;
}
file= openFile(argv[2]);
walkFile(file, atol(argv[1]));
return 0;
}
/* Note: keep in mind that I did not write code to parse input options and arguments, nor code to check that the line-count argument is really a number. */

Resources