Monitor directory recursively for file additions/modifications/deletions - FreeBSD

I need to watch a directory with several subdirectories, each of which has files which I need to monitor for file additions, modifications and deletions.
I found some example code and had to modify it slightly to get it working, but it doesn't do exactly what I need. It detects a file rename or deletion within a directory (but not within a subdirectory), and it doesn't respond to file modifications.
The only approach I could find by searching is to monitor each file individually; however, I have several hundred thousand files to monitor, and holding a file descriptor to each is probably unwise.
Is there a way under FreeBSD to do what I need to do? Or will I have to find an alternative solution?
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    int f, kq, nev;
    struct kevent change;
    struct kevent event;

    kq = kqueue();
    if (kq == -1) {
        perror("kqueue");
        return EXIT_FAILURE;
    }
    f = open("/tmp/foo", O_RDONLY);
    if (f == -1) {
        perror("open");
        return EXIT_FAILURE;
    }
    /* EV_ONESHOT removes the event once it fires; the loop below
       re-registers it on every kevent() call. */
    EV_SET(&change, f, EVFILT_VNODE,
           EV_ADD | EV_ENABLE | EV_ONESHOT,
           NOTE_DELETE | NOTE_EXTEND | NOTE_WRITE | NOTE_ATTRIB,
           0, 0);
    for (;;) {
        nev = kevent(kq, &change, 1, &event, 1, NULL);
        if (nev == -1)
            perror("kevent");
        else if (nev > 0) {
            if (event.fflags & NOTE_DELETE) {
                printf("File deleted\n");
                break;
            }
            if (event.fflags & NOTE_EXTEND ||
                event.fflags & NOTE_WRITE)
                printf("File modified\n");
            if (event.fflags & NOTE_ATTRIB)
                printf("File attributes modified\n");
        }
    }
    close(kq);
    close(f);
    return EXIT_SUCCESS;
}

As you rightly guessed, kqueue does not scale well here because you have to hold an open descriptor for every file or directory you watch, even if it is only opened in O_RDONLY mode. On Linux, one would use inotify for this purpose (http://linux.die.net/man/7/inotify), but I believe there is no FreeBSD port of this kernel feature!
If you have the time and resources, what you could do is look at the code for audit on BSD (http://www.freebsd.org/cgi/man.cgi?query=audit&sektion=4) and try to code up a version of inotify for BSD! O_O

Related

How to monitor changes to pseudo-filesystem on Linux?

As neither dnotify nor inotify is able to monitor changes to pseudo-filesystem content, is there an automated way to discover file/directory creation/deletion inside (for example) the /sys/block directory?
Of course, I can scan the directory periodically on my own but hope that there is a smarter way.
I decided to use a somewhat naive workaround and monitor /dev directory (which is supported by inotify) instead of /sys/block. Fortunately, each /sys/block entry has its counterpart inside /dev (but not vice versa) so I just check whether an entry that appeared in /dev is also present inside /sys/block.
Not very elegant but sufficient for me.
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>
#include <assert.h>
#include <linux/limits.h>
#include <sys/stat.h>

int main(void)
{
    int fd = inotify_init();
    assert(fd >= 0);
    int wd = inotify_add_watch(fd, "/dev", IN_CREATE);
    assert(wd >= 0);
    for (;;) {
        /* A single read() may return several events, so walk the whole buffer. */
        char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
        ssize_t res = read(fd, buf, sizeof(buf));
        assert(res > 0);
        for (char *p = buf; p < buf + res; ) {
            struct inotify_event *event = (struct inotify_event *) p;
            if (event->len > 0 && (event->mask & IN_CREATE) && !(event->mask & IN_ISDIR)) {
                /* Large enough for "/sys/block/" + name + "/stat". */
                char dev_name[NAME_MAX + sizeof("/sys/block//stat")];
                snprintf(dev_name, sizeof(dev_name), "/sys/block/%s/stat", event->name);
                struct stat statbuf;
                if (0 == stat(dev_name, &statbuf))
                    printf("new entry appeared: %s\n", event->name);
            }
            p += sizeof(struct inotify_event) + event->len;
        }
    }
}

What does lseek() mean for a directory file descriptor?

According to strace, lseek(fd, 0, SEEK_END) = 9223372036854775807 when fd refers to a directory. Why is this syscall succeeding at all? What does lseek() mean for a dir fd?
On my test system, if you use opendir(), and readdir() through all the entries in the directory, telldir() then returns the same value:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <dirent.h>

int main(int argc, char *argv[]) {
    int fd = open(".", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    off_t o = lseek(fd, 0, SEEK_END);
    if (o == (off_t)-1) {
        perror("lseek");
        return 1;
    }
    printf("Via lseek: %ld\n", (long)o);
    close(fd);

    DIR *d = opendir(".");
    if (!d) {
        perror("opendir");
        return 1;
    }
    while (readdir(d)) {
    }
    printf("via telldir: %ld\n", telldir(d));
    closedir(d);
    return 0;
}
outputs
Via lseek: 9223372036854775807
via telldir: 9223372036854775807
Quoting from the telldir(3) man page:
In early filesystems, the value returned by telldir() was a simple file offset within a directory. Modern filesystems use tree or hash structures, rather than flat tables, to represent directories. On such filesystems, the value returned by telldir() (and used internally by readdir(3)) is a "cookie" that is used by the implementation to derive a position within a directory. Application programs should treat this strictly as an opaque value, making no assumptions about its contents.
It's a magic number that indicates that the index into the directory's contents is at the end. Don't count on the number always being the same, or being portable. It's a black box. Stick with the dirent API for traversing directory contents unless you really know exactly what you're doing. (Under the hood on Linux + glibc, opendir(3) calls openat(2) on the directory, readdir(3) fetches information about its contents with getdents(2), and seekdir(3) calls lseek(2), but those are just implementation details.)

Bus error opening and mmap'ing a file

I want to create a file and map it into memory. I think my code should work, but when I run it I get a "bus error". I searched Google but I'm not sure how to fix the problem. Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/mman.h>
#include <string.h>

int main(void)
{
    int file_fd, page_size;
    char buffer[10] = "perfect";
    char *map;

    file_fd = open("/tmp/test.txt", O_RDWR | O_CREAT | O_TRUNC, (mode_t)0600);
    if (file_fd == -1)
    {
        perror("open");
        return 2;
    }
    page_size = getpagesize();
    map = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, file_fd, page_size);
    if (map == MAP_FAILED)
    {
        perror("mmap");
        return 3;
    }
    strcpy(map, buffer);
    munmap(map, page_size);
    close(file_fd);
    return 0;
}
You are creating a new zero-sized file, and you can't extend a file's size with mmap. You'll get a bus error when you try to write outside the contents of the file.
Use e.g. fallocate() on the file descriptor to allocate room in the file.
Note that you're also passing page_size as the offset to mmap, which doesn't seem to make much sense in your example: you would first have to extend the file to page_size + strlen(buffer) + 1 bytes if you wanted to write buffer at that location. More likely you want to start at the beginning of the file, so pass 0 as the last argument to mmap.
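Putting both points together, a corrected version might look like the sketch below. It is written as a hypothetical helper, `write_via_mmap`, rather than a full program, and it uses ftruncate() in place of the fallocate() mentioned above, since either one gives the file a real size before it is mapped.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates `path`, sizes it to one page, and copies
 * `text` (including its terminating NUL) to the start of the file through
 * a shared mapping. Returns 0 on success, -1 on failure. */
int write_via_mmap(const char *path, const char *text)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len = strlen(text) + 1;
    if (page_size <= 0 || len > (size_t)page_size)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* Extend the empty file so that the mapped page is backed by file data. */
    if (ftruncate(fd, page_size) == -1) {
        close(fd);
        return -1;
    }

    /* Offset 0: map the beginning of the file, not the second page. */
    char *map = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, text, len);

    int rc = munmap(map, (size_t)page_size);
    close(fd);
    return rc;
}
```

Calling `write_via_mmap("/tmp/test.txt", "perfect")` leaves a one-page file whose first bytes are the string; the rest of the page reads back as zeros.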

Ordering file location on linux partition

I have a process which processes a lot of files (~96,000 files, ~12 TB of data). Several runs of the process have left the files scattered about the drive. Each iteration of the process uses several files, which leads to a lot of whipsawing around the disk to collect them.
Ideally, I would like the process to write the files it uses in order, so that the next run will read them in order (file sizes change). Is there a way to hint at a physical ordering/grouping, short of writing to the raw partition?
Any other suggestions would be helpful.
Thanks
Two system calls are worth looking up: fadvise64 and fallocate, which tell the kernel how you intend to read or write a given file.
Another pointer is the "Orlov block allocator" (Wikipedia, LWN), which affects the way the kernel allocates new directories and file entries.
In the end I decided not to worry about writing the files in any particular ordering. Instead, before I started a run, I would figure out where the first block of each file was located, and then sort the file processing order by first block location. Not perfect, but it did make a big difference in processing times.
Here's the C code I used to get the first block of each file in a supplied list. I adapted it from example code I found online (I can't seem to find the original source).
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <linux/fs.h>

//
// Get the first block for each file passed to stdin,
// write filename & first block for each file to stdout.
// FIBMAP normally requires root privileges (CAP_SYS_RAWIO).
//
int main(int argc, char **argv) {
    int fd;
    int block;
    char fname[512];

    while (fgets(fname, sizeof(fname), stdin) != NULL) {
        fname[strcspn(fname, "\n")] = '\0';   // strip the trailing newline, if any
        fd = open(fname, O_RDONLY);
        if (fd < 0) {                         // open() returns -1 on failure; 0 is a valid descriptor
            fprintf(stderr, "open %s failed: %s\n", fname, strerror(errno));
            continue;
        }
        block = 0;                            // in: logical block 0, out: physical block number
        if (ioctl(fd, FIBMAP, &block)) {
            printf("FIBMAP ioctl failed - errno: %s\n", strerror(errno));
        }
        printf("%010d, %s\n", block, fname);
        close(fd);
    }
    return 0;
}
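The second half of the approach, ordering the files by their first block before processing, could be sketched as follows. The `struct file_block` type and the function names are assumptions made for this example; the answer does not show how the sort was actually done.

```c
#include <stdlib.h>

/* Hypothetical record: one file and the physical number of its first
 * block, as reported by the FIBMAP program above. */
struct file_block {
    long block;
    const char *name;
};

/* qsort() comparator: ascending by first block. Comparisons are used
 * instead of subtraction to avoid overflow. */
static int by_first_block(const void *a, const void *b)
{
    const struct file_block *fa = a;
    const struct file_block *fb = b;
    if (fa->block < fb->block)
        return -1;
    if (fa->block > fb->block)
        return 1;
    return 0;
}

/* Sorts `files` in place so that processing them in array order visits
 * the disk in ascending first-block order. */
void sort_by_first_block(struct file_block *files, size_t count)
{
    qsort(files, count, sizeof(files[0]), by_first_block);
}
```

Only the first block of each file is considered, so a heavily fragmented file can still cause seeks, which matches the "not perfect" caveat above.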

How can I get the source code for the linux utility tail?

This command is really useful, but where can I get the source code to see what is going on inside?
Thanks.
The tail utility is part of GNU coreutils on Linux.
Source tarball: ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.4.tar.gz
Source file: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c
I've always found FreeBSD to have far clearer source code than the GNU utilities. So here's tail.c in the FreeBSD project:
http://svnweb.freebsd.org/csrg/usr.bin/tail/tail.c?view=markup
Poke around the uclinux site. Since they distributed the software, they are required to make the source available one way or another.
Or, you could read man fseek and guess at how it might be done.
NB-- See William's comments below, there are cases when you can't use seek.
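For the cases where seeking is not possible (a pipe, for instance), one common alternative is to read forward and keep only the most recent lines in a ring buffer. The sketch below shows that idea; `tail_stream` is a made-up name, and this is not how any particular tail implementation is written.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copies the last `n` lines of `in` to `out` without
 * seeking, by keeping the most recent `n` lines in a ring buffer.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int tail_stream(FILE *in, FILE *out, size_t n)
{
    if (n == 0)
        return 0;

    char **ring = calloc(n, sizeof(*ring));
    if (ring == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    size_t seen = 0;

    while (getline(&line, &cap, in) != -1) {
        size_t slot = seen % n;
        free(ring[slot]);     /* drop the line that falls out of the window */
        ring[slot] = line;    /* hand the buffer over to the ring */
        line = NULL;          /* getline() allocates a fresh buffer next time */
        cap = 0;
        seen++;
    }
    free(line);               /* getline() may allocate even when it fails */

    /* Print the surviving lines, oldest first. */
    size_t start = seen > n ? seen - n : 0;
    for (size_t i = start; i < seen; i++)
        fputs(ring[i % n], out);

    for (size_t i = 0; i < n; i++)
        free(ring[i]);
    free(ring);
    return 0;
}
```

`tail_stream(stdin, stdout, 10)` would behave roughly like `tail -n 10` on piped input. Lines containing embedded NUL bytes are truncated by fputs(), which is acceptable for a sketch but not for a real tool.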
You might find it an interesting exercise to write your own. The vast majority of the Unix command-line tools are a page or so of fairly straightforward C code.
To just look at the code, the GNU CoreUtils sources are easily found on gnu.org or your favorite Linux mirror site.
/* This example implements the -n option of the tail command. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <getopt.h>

#define BUFF_SIZE 4096

FILE *openFile(const char *filePath)
{
    FILE *file;
    file = fopen(filePath, "r");
    if (file == NULL)
    {
        fprintf(stderr, "Error opening file: %s\n", filePath);
        exit(errno);
    }
    return file;
}

/* Copy everything after offset startline to stdout. */
void printLine(FILE *file, off_t startline)
{
    int fd;
    fd = fileno(file);
    int nread;
    char buffer[BUFF_SIZE];
    lseek(fd, (startline + 1), SEEK_SET);
    while ((nread = read(fd, buffer, BUFF_SIZE)) > 0)
    {
        write(STDOUT_FILENO, buffer, nread);
    }
}

void walkFile(FILE *file, long nlines)
{
    off_t fposition;
    /* fseeko()/ftello() take and return off_t, unlike fseek()/ftell(). */
    fseeko(file, 0, SEEK_END);
    fposition = ftello(file);
    off_t index = fposition;
    off_t end = fposition;
    long countlines = 0;
    int cbyte;   /* int, so that EOF is distinguishable from a data byte */

    /* Walk backwards one byte at a time, counting newlines. */
    for (; index >= 0; index--)
    {
        cbyte = fgetc(file);
        /* Ignore a newline that is the very last byte of the file. */
        if (cbyte == '\n' && (end - index) > 1)
        {
            countlines++;
            if (countlines == nlines)
            {
                break;
            }
        }
        fposition--;
        fseeko(file, fposition, SEEK_SET);
    }
    printLine(file, fposition);
    fclose(file);
}

int main(int argc, char *argv[])
{
    FILE *file;
    if (argc != 3)
    {
        fprintf(stderr, "Usage: %s <lines> <file>\n", argv[0]);
        return EXIT_FAILURE;
    }
    file = openFile(argv[2]);
    walkFile(file, atol(argv[1]));
    return 0;
}
/* Note: apart from the argument-count check, there is no code to parse options, nor to check that the line-count argument is really a number. */

Resources