I'm working on a virtual file system that isn't disk-based, kind of like /proc. Now I want to create a symlink within it to a target on an ext3 file system. I haven't found any standard documentation on how to achieve this. My guess so far is that I have to write a function to plug in as the symlink entry of struct inode_operations, but frankly I'm at a loss even with the function parameters.
If it matters, I started off with this tutorial on LWN: http://lwn.net/Articles/13325/
EDIT: I'm working with libfs, not FUSE at the moment
Presumably you're using FUSE; if you're not, do :)
All you have to do is implement the getattr function to tell the kernel that the object is a symlink, then implement the readlink function to return the path that the link should point to; the kernel will do the rest.
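Here is a hedged sketch of that recipe against the libfuse 2.x high-level API; the filesystem layout (a single /mylink entry) and the target path are made up for illustration, not taken from the question:

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <sys/stat.h>
#include <string.h>
#include <errno.h>

static const char *target = "/mnt/ext3/some/file"; /* hypothetical target */

static int myfs_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, "/mylink") == 0) {
        st->st_mode = S_IFLNK | 0777;   /* tell the kernel it's a symlink */
        st->st_nlink = 1;
        st->st_size = strlen(target);
        return 0;
    }
    return -ENOENT;
}

static int myfs_readlink(const char *path, char *buf, size_t len)
{
    if (strcmp(path, "/mylink") != 0)
        return -ENOENT;
    strncpy(buf, target, len - 1);      /* FUSE expects a NUL-terminated path */
    buf[len - 1] = '\0';
    return 0;
}

static struct fuse_operations myfs_ops = {
    .getattr  = myfs_getattr,
    .readlink = myfs_readlink,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &myfs_ops, NULL);
}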
I was able to accomplish it finally. Here is what I did (some details may differ depending on what the filesystem wants to achieve):
1. Create the inode of the symlink with the S_IFLNK mode and put the target path in the i_private field (a sketch of this step follows the code below).
2. Implement follow_link, because generic_readlink requires it to be present:
static void *sample_follow_link(struct dentry *dentry, struct nameidata *nd)
{
    nd->depth = 0;
    /* hand the VFS the target path that was stashed in i_private */
    nd_set_link(nd, (char *)dentry->d_inode->i_private);
    return NULL;
}

static struct inode_operations sample_inode_ops = {
    .readlink    = generic_readlink,
    .follow_link = sample_follow_link,
};

.....

// in the function for the dentry and inode creation
inode->i_op = &sample_inode_ops;
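For step 1, a hedged sketch of what the inode creation can look like; sample_make_symlink is an illustrative helper name, and the calls follow the old (pre-4.2, nameidata-era) kernel API used above:

static struct inode *sample_make_symlink(struct super_block *sb,
                                         const char *target)
{
    struct inode *inode = new_inode(sb);
    if (!inode)
        return NULL;

    inode->i_mode = S_IFLNK | 0777;             /* mark the inode as a symlink */
    inode->i_op = &sample_inode_ops;            /* readlink/follow_link above */
    inode->i_private = kstrdup(target, GFP_KERNEL); /* target path for follow_link */
    return inode;
}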
I would suggest taking a look at the linux/fs/ext2/ source code: the files symlink.c, inode.c, namei.c, and probably a few others. You will get some idea of what needs to be done. Contrary to expectation, the code of the individual filesystems is actually quite short and easy to read.
But maybe instead of creating a new virtual filesystem, you should ask yourself another question: wouldn't a FUSE user-level filesystem be enough in my case? FUSE has slightly better documentation for creating virtual filesystems and a few more examples.
Related
I need to understand which files consume the IOPS of my hard disk. Just using strace will not solve my problem: I want to know which files are really written to disk, not just to the page cache. I tried to use SystemTap, but I cannot figure out how to find which files (filenames or inodes) consume my IOPS. Are there any tools that will solve my problem?
Yes, you can definitely use SystemTap for tracing that. When an upper layer (usually the VFS subsystem) wants to issue an I/O operation, it calls the submit_bio and generic_make_request functions. Note that these don't necessarily correspond to a single physical I/O operation; for example, writes to adjacent sectors can be merged by the I/O scheduler.
The trick is how to determine the file path name in generic_make_request. It is quite simple for reads, as that function is called in the same context as the read() call. Writes, however, are usually asynchronous: write() simply updates the page cache entry and marks it dirty, while submit_bio gets called later by one of the writeback kernel threads, which have no information about the original calling process.
Writes can still be attributed by looking at the page references in the bio structure: each page has a mapping field pointing to a struct address_space. The struct file corresponding to an open file contains f_mapping, which points to the same address_space instance, and the file also references the dentry carrying the file's name (a full path can be assembled with task_dentry_path).
So we need two probes: one to capture attempts to read or write a file, saving the path and address_space into an associative array, and a second to capture generic_make_request calls (this is done with the ioblock.request probe).
Here is an example script which counts IOPS:
// maps struct address_space to path name
global paths;
// IOPS per file
global iops;

// Capture attempts to read and write by VFS
probe kernel.function("vfs_read"),
      kernel.function("vfs_write") {
    mapping = $file->f_mapping;

    // Assemble full path name for the running task (task_current())
    // from the open file "$file" of type "struct file"
    path = task_dentry_path(task_current(), $file->f_path->dentry,
                            $file->f_path->mnt);

    paths[mapping] = path;
}

// Attach to generic_make_request()
probe ioblock.request {
    // Each BIO request may carry more than one page to write
    for (i = 0; i < $bio->bi_vcnt; i++) {
        page = $bio->bi_io_vec[i]->bv_page;
        mapping = @cast(page, "struct page")->mapping;

        iops[paths[mapping], rw] <<< 1;
    }
}

// Once per second, drain the iops statistics
probe timer.s(1) {
    println(ctime());
    foreach ([path+, rw] in iops) {
        printf("%3d %s %s\n", @count(iops[path, rw]),
               bio_rw_str(rw), path);
    }
    delete iops;
}
This example script works for XFS, but would need updates to support AIO and volume managers (including btrfs). Plus I'm not sure how it will handle metadata reads and writes, but it is a good start ;)
If you want to know more on SystemTap you can check out my book: http://myaut.github.io/dtrace-stap-book/kernel/async.html
Maybe iotop gives you a hint about which processes are doing I/O, and in consequence an idea about the related files:
iotop --only
The --only option shows only the processes or threads actually doing I/O, instead of showing all of them.
Is it possible to acquire the struct fuse_file_info* fi in the function truncate()? Why is it not there in the first place?
int truncate(const char* path, off_t size)
I'm storing my file descriptor in the file handle, fh, of the fuse_file_info structure. The function open() appears to be called beforehand, so that structure is created for the file. The description of fh is: "File handle. May be filled in by filesystem in open(). Available in all other file operations".
(As a last resort I'm thinking of having a structure with this information saved in a hash map, using the file handle to store its key. This would allow me to look the structure up, using the path, in order to find the respective file descriptor.)
Note: I'm actually using jnr-fuse but since it mimics libfuse I'm not asking specifically for it; what works for one should (sort of) work for the other.
Why is it not there in the first place?
Because of the implementation of truncate in the Linux kernel. You can see the signature here.
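That said, if your binding exposes the libfuse 2.x operations, there is a separate ftruncate operation that does receive the fuse_file_info (and in libfuse 3, truncate itself gained an fi parameter, which may be NULL). A hedged sketch, assuming your open() stored a real descriptor in fi->fh:

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <unistd.h>
#include <errno.h>

static int myfs_ftruncate(const char *path, off_t size,
                          struct fuse_file_info *fi)
{
    (void)path; /* fi->fh already identifies the open file */
    if (ftruncate((int)fi->fh, size) == -1)
        return -errno;
    return 0;
}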
I'm looking for a way to find out the memory addresses of TLS segments for the current thread on linux, amd64. Bonus point for a solution that works on OSX.
I've looked into various language runtimes and GCs (like Boehm), but so far I couldn't get through the multiple layers of abstraction they use to support all kinds of systems. Any help appreciated.
Did you have a look at the solution Martin and I came up with in druntime?
What we do there boils down to scanning the segments in the corresponding dl_phdr_info (obtained by looking for the right one using dl_iterate_phdr) for the segment of type PT_TLS, and storing its module id and size.
You can then get the start of the address range for the current thread by calling __tls_get_addr with offset 0 and that module id (on some archs there is an extra offset), and the end by simply adding the size you determined. If you do not need to support shared libraries, you can also simply use fs/gs on x86 (this might be required if you want to link a static executable).
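To make the ELF part concrete, here is a hedged sketch of the segment scan described above; find_tls_segment is an illustrative name, and this version just prints each loaded object's PT_TLS sizes rather than storing the module id:

#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

static int find_tls_segment(struct dl_phdr_info *info, size_t size, void *data)
{
    for (int i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_TLS) {
            printf("%s: TLS template %zu bytes, total TLS size %zu bytes\n",
                   info->dlpi_name, (size_t)info->dlpi_phdr[i].p_filesz,
                   (size_t)info->dlpi_phdr[i].p_memsz);
        }
    }
    return 0; /* returning 0 keeps iterating over all loaded objects */
}

int main(void)
{
    dl_iterate_phdr(find_tls_segment, NULL);
    return 0;
}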
This works for Linux and FreeBSD (and probably other ELF platforms), but not OS X. There, the best I could come up with so far is this:
#include <assert.h>
#include <mach-o/dyld_priv.h> /* private header providing dyld_enumerate_tlv_storage */

void _d_dyld_getTLSRange(void *arbitraryTLSSymbol, void **start, size_t *size) {
    dyld_enumerate_tlv_storage(
        ^(enum dyld_tlv_states state, const dyld_tlv_info *info) {
            assert(state == dyld_tlv_state_allocated);
            if (info->tlv_addr <= arbitraryTLSSymbol &&
                arbitraryTLSSymbol < (info->tlv_addr + info->tlv_size)
            ) {
                // Found the range we are looking for.
                *start = info->tlv_addr;
                *size = info->tlv_size;
            }
        }
    );
}
The naive implementation currently used in LDC's druntime does not quite handle shared libraries, though, and dyld_enumerate_tlv_storage is from dyld_priv.h, which might or might not be a problem for App Store publishing.
On Linux, the thread-specific segment is set up via the arch_prctl(ARCH_SET_FS, <addr>) call. You can find out what it was set to in the current thread via arch_prctl(ARCH_GET_FS, ...).
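A hedged sketch of the latter; glibc provides no wrapper for arch_prctl, so it goes through syscall(2):

#include <asm/prctl.h>      /* ARCH_GET_FS */
#include <sys/syscall.h>    /* SYS_arch_prctl */
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    unsigned long fs_base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &fs_base) == 0)
        printf("FS base (TLS block) of this thread: %#lx\n", fs_base);
    return 0;
}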
Bonus point for a solution that works on OSX.
OS X is a completely different OS and uses a completely different mechanism for its TLS support.
I would like to be able to read the routing table from kernel space.
In user space, this information is readable in /proc/net/route, but I don't know how to read the same information from kernel space.
I don't want to modify it, only read it.
Any ideas?
To fetch the routing table, you would need to send a message of type RTM_GETROUTE to the kernel using a socket of the AF_NETLINK family — this is the rtnetlink(7) interface.
For convenience, rather than composing the messages over a socket yourself, you can use the libnetlink(3) library and call rtnl_wilddump_request(struct rtnl_handle *rth, int family, int type) with type set to RTM_GETROUTE.
For an even simpler cross-platform abstraction, you could use the libdnet library, which has a function int route_get(route_t *r, struct route_entry *entry).
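To illustrate the first option, here is a hedged user-space sketch that dumps the IPv4 routing table by sending RTM_GETROUTE over a raw AF_NETLINK socket; error handling and parsing of the RTA_* attribute payload are trimmed for brevity:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    struct {
        struct nlmsghdr nlh;
        struct rtmsg    rtm;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg));
    req.nlh.nlmsg_type  = RTM_GETROUTE;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; /* dump the whole table */
    req.rtm.rtm_family  = AF_INET;

    send(fd, &req, req.nlh.nlmsg_len, 0);

    char buf[8192];
    ssize_t len;
    while ((len = recv(fd, buf, sizeof(buf), 0)) > 0) {
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE)
                goto done;
            struct rtmsg *rtm = NLMSG_DATA(nlh);
            printf("route entry: table %u, dst prefix length /%u\n",
                   rtm->rtm_table, rtm->rtm_dst_len);
        }
    }
done:
    close(fd);
    return 0;
}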
You can find out where in the kernel source tree this file is created in the /proc pseudo filesystem by searching for the "route" keyword or the create_proc_* helpers, then look at how such files are created in your kernel's source.
I suspect it's located somewhere in net/ipv4/route.c.
I am looking for a fast way to find the number of files in a directory on Linux.
Any solution that takes linear time in the number of files in the directory is NOT acceptable (e.g. "ls | wc -l" and similar things) because it would take a prohibitively long amount of time (there are tens or maybe hundreds of millions of files in the directory).
I'm sure the number of files in the directory must be stored as a simple number somewhere in the filesystem structure (inode perhaps?), as part of the data structure used to store the directory entries - how can I get to this number?
Edit: The filesystem is ext3. If there is no portable way of doing this, I am willing to do something specific to ext3.
Why should the data structure contain the number? A tree doesn't need to know its size in O(1) unless that's a requirement (and providing it could require more locking and possibly create a performance bottleneck).
By tree I don't mean including subdirectory contents, but the files at -maxdepth 1, supposing they are not really stored as a list.
Edit: ext2 stored them as a linked list; modern ext3 implements hashed B-trees.
Having said that, /bin/ls does a lot more than counting, and actually scans all the inodes. Write your own C program or script using opendir() and readdir().
from here:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main()
{
    int count = 0;
    DIR *d;   /* opendir() returns a DIR *, not a struct DIR * */

    if ((d = opendir(".")) != NULL)
    {
        /* counts every entry, including "." and ".." */
        for (; readdir(d) != NULL; count++);
        closedir(d);
    }
    printf("%d\n", count);
    return 0;
}
You can use inotify to track and record file create and unlink events in the monitored directory. That spreads out over time the work required to maintain a file count and lets you retrieve the current count instantaneously.
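A hedged sketch of that approach; /big/dir is a placeholder, and the initial count still has to be seeded once with a full (slow) scan:

#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    int fd = inotify_init();
    inotify_add_watch(fd, "/big/dir",
                      IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO);

    long count = 0;   /* seed this once with a full directory scan */
    char buf[4096];
    ssize_t len;
    while ((len = read(fd, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->mask & (IN_CREATE | IN_MOVED_TO))
                count++;
            if (ev->mask & (IN_DELETE | IN_MOVED_FROM))
                count--;
            p += sizeof(struct inotify_event) + ev->len;
        }
        printf("current file count: %ld\n", count);
    }
    return 0;
}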
The inode for the directory does not store the number of files in it, since usually the file count is not needed separately from the list of names in the directory. The directory inode's link count does indirectly give the number of sub-directories (st_nlink is number of sub-dirs plus two).
I think you have no choice except read through the whole list of files in the directory. find might or might not be faster than ls.
This is an example of why large directories are a problem, even when the directory is implemented using a B-tree.
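As an aside, the st_nlink trick mentioned above does give you one O(1) number, just not the one you asked for; a minimal sketch:

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    /* st_nlink of a directory = number of subdirectories + 2
       ("." plus the entry in the parent); plain files are not counted */
    if (stat(".", &st) == 0)
        printf("subdirectories: %lu\n", (unsigned long)(st.st_nlink - 2));
    return 0;
}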
There's no portable way to do this. The low-level file primitives, i.e. readdir, work as if the directory contents were a linear list. Clearly, that's an abstraction, and some filesystems might store a count, but accessing it is inherently filesystem-specific.
If you are willing to jump through hoops, you could put each directory on a different filesystem, enable quotas, and get the info with the repquota command.