Linux Implement open file descriptors C

Linux Implement open file descriptors C - linux

1) Is it any alternative to looping through /proc in order to get the total number of open file descriptors?
I used the following dirs:
/proc/PID/fd/*
/proc/PID/maps
/proc/PID/cwd
/proc/PID/root
/proc/PID/exe
2) The number is different from lsof | wc -l and cat /proc/sys/fs/file-nr
3) Loaded dynamically linked libraries and current working directories can be counted as open file descriptors?
Implementation all open file descriptors in C for Linux

How you count this depends on what information you are interested in.
Looking through /proc/PID/fd/* will give you the number of open file descriptors. However, one caveat is that two processes may actually share a file descriptor, if you fork then the child process inherits the file descriptor from its parent, and this method will then count it twice, once for each process.
/proc/PID/maps will show you the memory map of the process, which can include the loaded executable itself and dynamically linked libraries, but also includes things that don't correspond to files like the heap, the stack, the vdso section which is a virtual shared object exported by the kernel, and so on.
lsof will list a variety of ways that files can be in use, which includes more than just file descriptors; it also includes the executable and shared libraries, but does not include the memory regions that don't correspond to files that show up in /proc/PID/maps like the stack, heap, vdso section, etc.
/proc/sys/fs/file-nr will report the number of open kernel file handles. A kernel file handle is different than a file descriptor; there can be more than one file descriptor open that point to the same file handle, for instance, by calling dup or dup2.
These differences explain why you're getting different numbers from these different ways of counting. The question is, what purpose are you using this count for? That will help answer which way of counting you should actually use.

1) no, but it seems you are confused as to what constitutes an open file descriptor, as suggested by your second question
2) see http://codingtragedy.blogspot.com/2015/04/nofile-ulimit-n-rlimitnofile-most.html - while it explains handling of a resource limits which may seem irrelevant, it also explains the difference between a file descriptor and a 'struct file' which you most likely want, and it even covers your lsof usage.
3) Again, it is unclear what is your actual question. current working directory is not a file descriptor and is only represented with an inode. A process may or may not keep fd for a linked library around, but mapping itself occupies a 'struct file'.

Related

Is there a simple way to fork a file descriptor?

I've just read a handful of man pages: dup, dup2, fcntl, pread/pwrite, mmap, etc.
Currently I am using mmap, but it's not the nicest thing in the world because I have to manage file offset and buffer length myself and basically reimplement read/write in userspace.
From what I gathered:
dup, dup2, fcntl just create aliases for the fds, so their offsets and flags are shared - reading from one advances the offset of the others.
pread/pwrite can be buggy and give inconsistent results.
mmap is buggy on linux when given some uncommon flags, but I don't need them.
Am I missing something or is mmap really the way to go?
(Note that re-open()ing a file is dangerous on POSIX - unlike Windows, POSIX provides no guarantees on the path not being moved/deleted while the file is open. On POSIX, you can open a path, move the file, and still read from it. You can even delete the file sometimes. I also couldn't find anything that can open an inode.)
I'd like answers for at least the most common POSIX variants, if there's no one answer for them all.

On Linux, opening /proc/self/fd/$NUM will work regardless of whether the file still has the same name it had the first time you opened it, and will generate a new open file description (i.e. a new fd with independent offset and flags).
I don't know of any POSIXly portable way of doing this.
(I also don't know what you mean about pread/pwrite being buggy...)

Level 2 I/O in Linux using readdir() possible?

I am trying to traverse a directory structure and open every file in that structure. To traverse, I am using opendir() and readdir(). Since I already have the entity, it seems stupid to build a path and open the file -- that presumably forces Linux to find the directory and file I just traversed.
Level 2 I/O (open, creat, read, write) require a path. Is there any call to either open a filename inside a directory, or open a file given an inode?

You probably should use nftw(3) to recursively traverse a file tree.
Otherwise, in a portable way, construct your directory + filename path using e.g.
snprintf(pathbuf, sizeof(pathbuf), "%s/%s", dirname, filename);
(or perhaps using asprintf(3) but don't forget to later free the result)
And to answer your question about opening a file in a directory, you could use the Linux or POSIX2008 specific openat(2). But I believe that you should really use nftw or construct your path like suggested above. Read also about O_PATH and O_TMPFILE in open(2).
BTW, the kernel has to access several times the directory (actually, the metadata is cached by file system kernel code), just because another process could have written inside it while you are traversing it.
Don't even think of opening a file thru its inode number: this will violate several file system abstractions! (but might be hardly possible by insane and disgusting tricks, e.g. debugfs - and this could probably harm very strongly your filesystem!!).
Remember that files are generally inodes, and can have zero (a process did open then unlink(2) a file while keeping the open file descriptor), one (this is the usual case), or several (e.g. /foo/bar1 and /gee/bar2 could be hard-linked using link(2) ....) file names.
Some file systems (e.g. FAT ...) don't have real inodes. The kernel fakes something in that case.

Is file object local to every process or System wide?

As a Linux device driver developer i was in the idea that file object is local structure to every process and its address is available in the fd table for the corresponding fd. But when i came across section 5.6 in Linux Programming interface by Michale Kerrisk which states that
Two different file descriptors that refer to the same open file description share
a file offset value. Therefore, if the file offset is changed via one file descriptor
(as a consequence of calls to read(), write(), or lseek()), this change is visible
through the other file descriptor. This applies both when the two file descrip
tors belong to the same process and when they belong to different processes.
I am befuddled...Kindly some one help me improve my understanding.

Each process does have its own file descriptor table, and each time a file is open()ed yields a separate file description. So there is sanity there!
The exception is when a file descriptor is duplicated, either within a process (via dup()) or across processes (by one process fork()ing a copy with all the same FDs, or by passing a file descriptor through a UNIX domain socket). When this happens, the two descriptors end up sharing some properties with each other, including the offset.
This is not necessarily a bad thing. It means, for instance, that two processes that are both writing to a shared file descriptor will not end up overwriting each other's output. It can sometimes have unexpected results, though. But it's not usually something that you'd end up with without knowing about it.

Can inode and crtime be used as a unique file identifier?

I have a file indexing database on Linux. Currently I use file path as an identifier.
But if a file is moved/renamed, its path is changed and I cannot match my DB record to the new file and have to delete/recreate the record. Even worse, if a directory is moved/renamed, then I have to delete/recreate records for all files and nested directories.
I would like to use inode number as a unique file identifier, but inode number can be reused if file is deleted and another file created.
So, I wonder whether I can use a pair of {inode,crtime} as a unique file identifier.
I hope to use i_crtime on ext4 and creation_time on NTFS.
In my limited testing (with ext4) inode and crtime do, indeed, remain unchanged when renaming or moving files or directories within the same file system.
So, the question is whether there are cases when inode or crtime of a file may change.
For example, can fsck or defragmentation or partition resizing change inode or crtime or a file?
Interesting that
http://msdn.microsoft.com/en-us/library/aa363788%28VS.85%29.aspx says:
"In the NTFS file system, a file keeps the same file ID until it is deleted."
but also:
"In some cases, the file ID for a file can change over time."
So, what are those cases they mentioned?
Note that I studied similar questions:
How to determine the uniqueness of a file in linux?
Executing 'mv A B': Will the 'inode' be changed?
Best approach to detecting a move or rename to a file in Linux?
but they do not answer my question.

{device_nr,inode_nr} are a unique identifier for an inode within a system
moving a file to a different directory does not change its inode_nr
the linux inotify interface enables you to monitor changes to inodes (either files or directories)
Extra notes:
moving files across filesystems is handled differently. (it is infact copy+delete)
networked filesystems (or a mounted NTFS) can not always guarantee the stability of inodenumbers
Microsoft is not a unix vendor, its documentation does not cover Unix or its filesystems, and should be ignored (except for NTFS's internals)
Extra text: the old Unix adagium "everything is a file" should in fact be: "everything is an inode". The inode carries all the metainformation about a file (or directory, or a special file) except the name. The filename is in fact only a directory entry that happens to link to the particular inode. Moving a file implies: creating a new link to the same inode, end deleting the old directory entry that linked to it.
The inode metatata can be obtained by the stat() and fstat() ,and lstat() system calls.

The allocation and management of i-nodes in Unix is dependent upon the filesystem. So, for each filesystem, the answer may vary.
For the Ext3 filesystem (the most popular), i-nodes are reused, and thus cannot be used as a unique file identifier, nor is does reuse occur according to any predictable pattern.
In Ext3, i-nodes are tracked in a bit vector, each bit representing a single i-node number. When an i-node is freed, it's bit is set to zero. When a new i-node is needed, the bit vector is searched for the first zero-bit and the i-node number (which may have been previously allocated to another file) is reused.
This may lead to the naive conclusion that the lowest numbered available i-node will be the one reused. However, the Ext3 file system is complex and highly optimised, so no assumptions should be made about when and how i-node numbers can be reused, even though they clearly will.
From the source code for ialloc.c, where i-nodes are allocated:
There are two policies for allocating an inode. If the new inode is a
directory, then a forward search is made for a block group with both
free space and a low directory-to-inode ratio; if that fails, then of
he groups with above-average free space, that group with the fewest
directories already is chosen. For other inodes, search forward from
the parent directory's block group to find a free inode.
The source code that manages this for Ext3 is called ialloc and the definitive version is here: https://github.com/torvalds/linux/blob/master/fs/ext3/ialloc.c

I guess the dB application would need to consider the case where the file is subject to restoration from backup, which would preserve the file crtime, but not the inode number.

Code segment sharing between two processes

Suppose we run two processes back to back say :-
$ grep abc abc.txt ==> pid 100
$ grep def def.txt ==> pid 101
I read in the book "Beginning Linux programming" chapter# 11 that the code section of the processes would be shared, as it is read only. Is it so? I think if grep is compiled as shared library only then the code section would be shared.
One more question, in case of shared libraries how does the OS knows that the library has already been loaded or not? Suppose if 2 processes are simultaneously calling a shared library function then how does the virtual address of two processes be converted to physical address pointing the same location in RAM?

The OS doesn't load files into memory anymore. Instead, files are memory mapped. This means an inode and an offset of a file on disk will be connected to a page in memory. This makes it pretty simple to find out if some part of a file has already been loaded. Also, you can keep only part of a file in RAM (after setup, you don't need the setup code anymore, so you can "forget" about it and reuse those pages for something more useful).

The libraries and executables are not loaded, but mapped into memory with mmap(2). Basically, when you mmap() something with MAP_SHARED flag, others who map the same file will get the same memory pages.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string