Embedded Linux: readdir() sometimes failing with EFAULT

I've had some readdir() issues occur in an embedded app, so I added this self-contained test at a convenient place in the app code:
FILE *f;
DIR *d;
f = fopen ("/mnt/mydir/myfile", "r");
printf ("fopen %p\r\n", f);
if (f) fclose(f);
d = opendir ("/mnt/mydir");
printf ("opendir ret %p\r\n", d);
if (d)
{
    struct dirent *entry;
    do
    {
        errno = 0;
        entry = readdir (d);
        printf ("readdir ret %p %s, errno %d %s\r\n", entry, entry ? entry->d_name : "", errno, strerror(errno));
    } while (entry);
    closedir (d);
}
/mnt/mydir is an NFS mount (although I'm not sure if that's relevant). The fopen() call to open a file in that dir always succeeds, and the opendir() on the dir also always succeeds. However, sometimes (in fact most of the time) the readdir() fails with errno=EFAULT.
I don't believe anywhere else in the app is doing anything with that dir. The test is exactly as written, all variables are local stack scope.
If I run it as a standalone program, it always succeeds.
Can anyone offer any suggestions as to what could cause EFAULT here? I'm pretty sure my DIR pointer variable is not being corrupted, although the DIR structure itself could be I guess. I haven't seen any evidence elsewhere of heap corruption.

From man 2 readdir page:
EFAULT Argument points outside the calling process's address space.
This means that your structure is corrupted

I think I found the problem. The uClibc implementation of opendir/readdir does a stat() on the directory, then later does a stack alloca() of size statbuf.st_blksize. My NFS directory was mounted with rsize=512KB, causing readdir() to try to allocate 512KB on the stack to hold the dents. My embedded setup does not have that much room between stacks, so at some point the buffer ran into whatever lay below it in memory, causing EFAULT.
If I change my NFS mount options to rsize=4096, it works fine.
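One way to check whether a given mount might be affected is to print the st_blksize that stat() reports for the directory, since that is the size the uClibc readdir() described above allocates on the stack. A minimal sketch; the helper name dir_blksize is made up for illustration and is not part of uClibc:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns st_blksize for path, or -1 if stat() fails. */
long dir_blksize(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) == -1) {
        fprintf(stderr, "stat %s: %s\n", path, strerror(errno));
        return -1;
    }
    return (long)sb.st_blksize;
}
```

Calling dir_blksize("/mnt/mydir") before and after changing the mount options should show whether the reported block size follows rsize on your system. I would expect it to, given the behaviour above, but I have only reasoned about that, not measured it.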


Understanding file descriptor duplication in bash

I'm having a hard time understanding something about redirections in bash.
I'll start with what I know:
Each process has file descriptors opened which it can write to/read from. These file descriptors may represent files on disk, terminals, devices, etc.
When we start a terminal with bash, we have the file descriptors stdin (0), stdout (1) and stderr (2) open, pointing to the terminal. Whenever we run a command (a new process), that process inherits the file descriptors of its parent (bash), so by default it will print stdout and stderr messages to the terminal, and read from the terminal as well.
When we redirect, for example:
$ ls 1>filelist
We're actually changing file descriptor 1 of the ls process to point to the filelist file instead of the terminal. So when ls calls write(1, ...), the output goes to the file.
So to sum it up, a redirection basically changes which file a given file descriptor refers to, and therefore where the program's reads or writes on that descriptor end up.
Now, let's say I have the following C program:
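#include <stdio.h>
#include <fcntl.h>
#include <unistd.h> /* write() */
int main(void)
{
    int fd = 0;
    /* O_CREAT requires a third argument giving the permission bits */
    fd = open("info.log", O_CREAT | O_RDWR, 0644);
    printf("%d", fd);
    write(fd, "INFO::", 6);
    return 0;
}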
This program opens a file info.log, which is referred to by a file descriptor (usually 3).
Indeed, if I now compile this program and run it:
$ ./app
3
It creates the file info.log which contains the "INFO::" text in it.
But here's what I don't get: according to the logic described above, if I now redirect FD 3 to another file:
$ ./app 3> another_file
The text should be written to this other file, but for some reason, it doesn't.
Can someone explain?
Hint: when you run ./app 3> another_file, it'll print "4" instead of "3".
More detailed explanation: when you run ./app 3> another_file in the shell, a series of things happens:
1. The shell fork()s a subprocess that'll run ./app. The subprocess is basically a clone of its parent process, so it'll still be running the shell program.
2. In that subprocess, the shell opens "another_file" on file descriptor #3 for writing.
3. Then it uses one of the exec() family of calls to execute the ./app binary (with "another_file" still open on FD#3).
4. The program runs its open("info.log", ...) call, which creates "info.log" and opens it on the next available file descriptor. Since FD#3 is already in use, that's FD#4.
5. The program writes "INFO::" to FD#4, which is "info.log".
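The shell's part of the sequence above (fork, open the target on FD#3, exec) can be sketched in C roughly as follows. This is an illustration, not bash's actual implementation, and run_with_redirect is a made-up name:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with target_fd redirected to path,
 * roughly what a shell does for "cmd N> path". Returns the child's exit
 * status, or -1 on failure.
 */
int run_with_redirect(char *const argv[], int target_fd, const char *path)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: at this point still a copy of the parent program. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd == -1)
            _exit(127);
        if (fd != target_fd) {
            /* Make target_fd refer to the file, then drop the spare FD. */
            if (dup2(fd, target_fd) == -1)
                _exit(127);
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The new program starts with target_fd already open, so any open() it does itself lands on a different, higher descriptor.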
Since open() uses a new FD, it's not really affected by any active redirects. And actually, if the program did open something on FD#3, that'd replace the connection to "another_file" with whatever it had opened instead, essentially overriding the redirect.
If the program wanted to use the redirect, it'd have to write to FD#3 without first opening anything on it. This is what's normally done with FD#1 and 2 (standard output and error), and that's why redirecting those works.
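For completeness, here is a sketch of a program-side helper that writes to a descriptor it inherited rather than opening a file of its own. Both function names are made up for illustration, and checking with fcntl(F_GETFD) is just one way to tell whether the parent actually supplied the descriptor:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if fd is an open descriptor in this process, 0 otherwise. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Hypothetical helper: write msg to an inherited descriptor, such as the
 * FD 3 set up by "./app 3> another_file". Returns the number of bytes
 * written, or -1 if the descriptor is not open or the write fails.
 */
ssize_t write_to_inherited_fd(int fd, const char *msg)
{
    if (!fd_is_open(fd)) {
        errno = EBADF;
        return -1;
    }
    return write(fd, msg, strlen(msg));
}
```

A program that calls write_to_inherited_fd(3, "INFO::") and is started as ./app 3> another_file should put the text in another_file. Started without the redirect, FD 3 is normally not open and the helper returns -1.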

NASM x86_64: File opening (SYS_OPEN) error list?

I'm coding a Linux x64 assembly program that reads a file, and I want to handle errors like File Not Found or permission errors.
Where can I find a list of SYS_OPEN error codes?
Approaches to find codes (kinda fun)
My code to open a file:
SYS_OPEN equ 2
O_RDONLY equ 0
section .data
    filename db "file.txt", 0
section .text
    global _start
_start:
    mov rax, SYS_OPEN
    mov rdi, filename
    mov rsi, O_RDONLY
    mov rdx, 0644o
    syscall
    [...]
When the file is successfully opened, the RAX register holds the file descriptor (a non-negative integer); if the call fails, RAX holds an error code (a negative integer). I managed to raise a permission error by removing all permissions for all users:
chmod 0000 file.txt
This causes an error with code -13. By deleting the file, I managed to get error -2. Where can I find a list of SYS_OPEN error codes?
PS: Maybe my googling skills are rusty
Linux system call return values from -4095 to -1 are -errno codes. (The highest error number that Linux has actually defined is currently about 133, EHWPOISON, but that's the official range.)
strace ./myprog can decode them for you so you don't need to actually write error checking in your toy programs when playing around with system calls.
For example:
$ strace touch /tmp/xyjklj/bar
... (dynamic linker / process startup stuff)
openat(AT_FDCWD, "/tmp/xyjklj/bar", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOENT (No such file or directory)
utimensat(AT_FDCWD, "/tmp/xyjklj/bar", NULL, 0) = -1 ENOENT (No such file or directory)
... (more system calls as touch(1) finds a locale-specific set of error messages and prints
(The -1 is what the libc wrapper function actually returns; the errno code is what strace decoded from the asm syscall return value, which the glibc wrapper will store in errno. When using raw system calls in asm, you don't have to waste instructions doing that. But strace will still say "-1", not the numeric error code)
Documentation of most ways SYS_open can fail
Each system call man page documents which error codes that particular system call can fail with, and in which cases that can happen. (Those lists aren't fully exhaustive, for example not covering weird things a specific filesystem like NFS could return, like EMULTIHOP (see comments).)
For your case, see the ERRORS section of the open(2) man page. For example, there are several entries for ENOENT, covering all the cases which can lead to that return value.
ENOENT - O_CREAT is not set and the named file does not exist.
ENOENT - A directory component in pathname does not exist or is a
dangling symbolic link.
ENOENT - pathname refers to a nonexistent directory, O_TMPFILE and
one of O_WRONLY or O_RDWR were specified in flags, but
this kernel version does not provide the O_TMPFILE
functionality.
(Spoiler alert, 2 is ENOENT, so -2 is -ENOENT.)
There are of course lots of other fun ways that pathname and file access stuff (and open(2) in particular) can error, including:
EACCES (-13) - The requested access to the file is not allowed, or search
permission is denied for one of the directories in the
path prefix of pathname, or the file did not exist yet and
write access to the parent directory is not allowed. (See
also path_resolution(7).)
EFAULT - pathname points outside your accessible address space.
ENAMETOOLONG -
pathname was too long.
EBUSY - O_EXCL was specified in flags and pathname refers to a
block device that is in use by the system (e.g., it is
mounted).
[this would require root, otherwise you'd get EACCES]
ETXTBSY - pathname refers to an executable image which is currently
being executed and write access was requested.
EWOULDBLOCK -
The O_NONBLOCK flag was specified, and an incompatible
lease was held on the file (see fcntl(2)).
ENODEV - pathname refers to a device special file and no
corresponding device exists. (This is a Linux kernel bug;
in this situation ENXIO must be returned.)
ELOOP - Too many symbolic links were encountered in resolving
pathname.
EISDIR - pathname refers to a directory and the access requested
involved writing (that is, O_WRONLY or O_RDWR is set).
ENOTDIR -
A component used as a directory in pathname is not, in
fact, a directory, or O_DIRECTORY was specified and
pathname was not a directory.
EPERM - The O_NOATIME flag was specified, but the effective user
ID of the caller did not match the owner of the file and
the caller was not privileged.
As well as various limits like number of open files (ENFILE, EMFILE), or ENOSPC disk space full. The above is not a complete list; I just took one of the ways to get each of many (but not all) of the error codes.
As per funnydman's answer, you can look up the number -> symbolic meaning of error values in man pages. Or look in /usr/include/asm-generic/errno-base.h (The full path may differ on some systems, and you'd only include this file indirectly, via #include <errno.h>)
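If you'd rather do the number -> meaning lookup in code, the -errno convention can be sketched in C like this. The function names are made up for illustration; the only library call is strerror():

```c
#include <errno.h>
#include <string.h>

/* Raw Linux syscall return values in [-4095, -1] are negated errno codes. */
int is_raw_syscall_error(long ret)
{
    return ret >= -4095 && ret <= -1;
}

/* Returns the errno code for a raw return value, or 0 if it is not an error. */
int raw_ret_to_errno(long ret)
{
    return is_raw_syscall_error(ret) ? (int)-ret : 0;
}

/* Human-readable text for a raw return value, e.g. the EACCES message for -13. */
const char *describe_raw_ret(long ret)
{
    return is_raw_syscall_error(ret) ? strerror(raw_ret_to_errno(ret)) : "success";
}
```

In asm the same check is a single unsigned compare of RAX against -4095: anything at or above it is an error.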
You can interpret these as negated errno values. Here is part of the table (to list all of the codes, use errno -l); also take a look at the docs.
number  hex   symbol  description
2       0x02  ENOENT  No such file or directory
13      0x0d  EACCES  Permission denied
The reasoning behind this convention is described here: https://stackoverflow.com/a/6008711/9926721

shmget() returns ENOENT with IPC_CREAT

I'm using shmget() to allocate a shared memory segment that I then use with pthread_mutex_init() to create a mutex shared between processes. Generally, this works as expected. However, occasionally shmget() will return ENOENT. Reading the man page, this should only occur if the shmflg doesn't include IPC_CREAT, but I am including that. Here's a snippet of my code:
shmid_ = shmget( MYLOCK_KEY_ID, sizeof(pthread_mutex_t), IPC_CREAT | IPC_EXCL | 0666 );
if ( errno == ENOENT ) {
// This should never occur since IPC_CREAT was specified
std::cerr
<< "shmget() returned ENOENT (it thinks IPC_CREAT wasn't specified).\n"
<< "This seems to be a bug in shmget()?" << std::endl;
exit(1);
}
I'm totally lost as to what could be going on. I've tried this on several systems (Linux kernels 2.6.32 and 3.3.5) but both exhibit the same behavior. Currently, when I obtain this failure mode, I just repeat the process and it usually works. But that seems kind of kludgey and I don't know if this is a bug in shmget() or if I'm just doing something wrong.
Any ideas?
Your if statement is not checking the returned value - the man pages say to check shmid_ for -1 and then check errno.
RETURN VALUE
A valid segment identifier, shmid, is returned on success, -1 on error.
What you are doing is just checking errno - it could be ENOENT after some other call to some other function that failed.
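A minimal sketch of the check the man page describes, in plain C rather than the C++ of the question. The helper name create_segment and its err_out parameter are made up for illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper: create a new segment and test the return value
 * first. *err_out receives errno only when shmget() actually failed,
 * and 0 otherwise.
 */
int create_segment(key_t key, size_t size, int *err_out)
{
    int id = shmget(key, size, IPC_CREAT | IPC_EXCL | 0666);

    if (id == -1) {
        *err_out = errno; /* errno is meaningful only after a failure */
        return -1;
    }
    *err_out = 0;
    return id;
}
```

With IPC_EXCL set, the failure to expect when the key already exists is EEXIST, so a caller may want to handle that case separately rather than treating every error as fatal.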

mount failed, errno is 20?

I'm a newbie at Linux programming. Why does the following code fail? Its output is "failed 20".
But in a terminal, the command sudo mount /dev/sdb /home/abc/work/tmp works.
void main()
{
int rtn;
rtn=mount("/dev/sdb","/home/abc/work/tmp","vfat",MS_BIND,"");
if (rtn==-1)
printf("failed %d.\n",errno);
else
printf("OK!\n");
}
You can't bind-mount a device, only a directory. Try providing a useful value for mountflags.
Error 20 is ENOTDIR (http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/Errors/unix_system_errors.html).
I think with MS_BIND, you would need the first argument to be an actual directory somewhere, not a device. See also the man page for mount.
What you are trying to do would be equivalent to sudo mount --bind /dev/sdb /home/abc/work/tmp, which will give you an error too.
You should print out not just the errno value, but also the corresponding error message:
printf("failed %d: %s\n", errno, strerror(errno));
This should reveal the reason for the problem. ("Not a directory", so /home/abc/work/tmp does not seem to be a directory.)
(There are various other problems with your code, such as missing #include statements, and writing error messages to stdout and not stderr, but those are irrelevant to your problem at hand. You can fix them later.)
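Putting the suggestions above together, a sketch of a non-bind mount with error reporting might look like this. It assumes the device really holds a vfat filesystem and that the caller has the privileges to mount; mount_vfat is a made-up helper name:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: mount a vfat filesystem from dev onto dir with no
 * special flags (0 instead of MS_BIND). Returns 0 on success, or the errno
 * value on failure after printing the message to stderr.
 */
int mount_vfat(const char *dev, const char *dir)
{
    if (mount(dev, dir, "vfat", 0, "") == -1) {
        int err = errno; /* save it before any other call can change it */
        fprintf(stderr, "mount %s on %s failed: %d (%s)\n",
                dev, dir, err, strerror(err));
        return err;
    }
    return 0;
}
```

Called as mount_vfat("/dev/sdb", "/home/abc/work/tmp"), this mirrors the sudo mount command from the question, except that the filesystem type is given explicitly instead of being detected.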

Relinking an anonymous (unlinked but open) file

In Unix, it's possible to create a handle to an anonymous file by, e.g., creating and opening it with creat() and then removing the directory link with unlink() - leaving you with a file with an inode and storage but no possible way to re-open it. Such files are often used as temp files (and typically this is what tmpfile() returns to you).
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
When poking through the relevant system call functions I expected to find a version of link() called flink() (compare with chmod()/fchmod()) but, at least on Linux this doesn't exist.
Bonus points for telling me how to create the anonymous file without briefly exposing a filename in the disk's directory structure.
A patch for a proposed Linux flink() system call was submitted several years ago, but when Linus stated "there is no way in HELL we can do this securely without major other incursions", that pretty much ended the debate on whether to add this.
Update: As of Linux 3.11, it is now possible to create a file with no directory entry using open() with the new O_TMPFILE flag, and link it into the filesystem once it is fully formed using linkat() on /proc/self/fd/fd with the AT_SYMLINK_FOLLOW flag.
The following example is provided on the open() manual page:
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
/* File I/O on 'fd'... */
snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file", AT_SYMLINK_FOLLOW);
Note that linkat() will not allow open files to be re-attached after the last link is removed with unlink().
My question: is there any way to re-attach a file like this back into the directory structure? If you could do this it means that you could e.g. implement file writes so that the file appears atomically and fully formed. This appeals to my compulsive neatness. ;)
If this is your only goal, you can achieve this in a much simpler and more widely used manner. If you are outputting to a.dat:
Open a.dat.part for write.
Write your data.
Rename a.dat.part to a.dat.
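The three steps above can be sketched in C as follows. The helper name write_file_atomically and the fixed 4096-byte path buffer are choices made for this illustration, and the fsync() before the rename is an extra precaution that the steps above don't mention:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to "<path>.part", then rename it to
 * path so readers never see a partially written file. Returns 0 or -1.
 */
int write_file_atomically(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.part", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w == -1) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        len -= (size_t)w;
    }

    /* Flush the data before the rename makes the file visible. */
    int ok = (fsync(fd) == 0);
    if (close(fd) == -1)
        ok = 0;
    if (!ok || rename(tmp, path) == -1) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The temporary name sits in the same directory as the final one, so rename() stays within one filesystem, which is what makes the replacement atomic.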
I can understand wanting to be neat, but unlinking a file and relinking it just to be "neat" is kind of silly.
This question on serverfault seems to indicate that this kind of re-linking is unsafe and not supported.
Thanks to @mark4o for posting about linkat(2); see his answer for details.
I wanted to give it a try to see what actually happens when trying to link an anonymous file back into the filesystem it is stored on (often /tmp, e.g. for video data that Firefox is playing).
As of Linux 3.16, there still appears to be no way to undelete a deleted file that's still held open. Neither AT_SYMLINK_FOLLOW nor AT_EMPTY_PATH for linkat(2) do the trick for deleted files that used to have a name, even as root.
The only alternative is tail -c +1 -f /proc/19044/fd/1 > data.recov, which makes a separate copy, and you have to kill it manually when it's done.
Here's the perl wrapper I cooked up for testing. Use strace -eopen,linkat linkat.pl - </proc/.../fd/123 newname to verify that your system still can't undelete open files. (Same applies even with sudo). Obviously you should read code you find on the Internet before running it, or use a sandboxed account.
#!/usr/bin/perl -w
# 2015 Peter Cordes <peter@cordes.ca>
# public domain. If it breaks, you get to keep both pieces. Share and enjoy
# Linux-only linkat(2) wrapper (opens "." to get a directory FD for relative paths)
if ($#ARGV != 1) {
print "wrong number of args. Usage:\n";
print "linkat old new \t# will use AT_SYMLINK_FOLLOW\n";
print "linkat - <old new\t# to use the AT_EMPTY_PATH flag (requires root, and still doesn't re-link arbitrary files)\n";
exit(1);
}
# use POSIX qw(linkat AT_EMPTY_PATH AT_SYMLINK_FOLLOW); #nope, not even POSIX linkat is there
require 'syscall.ph';
use Errno;
# /usr/include/linux/fcntl.h
# #define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
# #define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
# #define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
unless (defined &AT_SYMLINK_NOFOLLOW) { sub AT_SYMLINK_NOFOLLOW() { 0x0100 } }
unless (defined &AT_SYMLINK_FOLLOW ) { sub AT_SYMLINK_FOLLOW () { 0x0400 } }
unless (defined &AT_EMPTY_PATH ) { sub AT_EMPTY_PATH () { 0x1000 } }
sub my_linkat ($$$$$) {
# tmp copies: perl doesn't know that the string args won't be modified.
my ($oldp, $newp, $flags) = ($_[1], $_[3], $_[4]);
return !syscall(&SYS_linkat, fileno($_[0]), $oldp, fileno($_[2]), $newp, $flags);
}
sub linkat_dotpaths ($$$) {
open(DOTFD, ".") or die "open . $!";
my $ret = my_linkat(DOTFD, $_[0], DOTFD, $_[1], $_[2]);
close DOTFD;
return $ret;
}
sub link_stdin ($) {
my ($newp, ) = @_;
open(DOTFD, ".") or die "open . $!";
my $ret = my_linkat(0, "", DOTFD, $newp, &AT_EMPTY_PATH);
close DOTFD;
return $ret;
}
sub linkat_follow_dotpaths ($$) {
return linkat_dotpaths($_[0], $_[1], &AT_SYMLINK_FOLLOW);
}
## main
my $oldp = $ARGV[0];
my $newp = $ARGV[1];
# link($oldp, $newp) or die "$!";
# my_linkat(fileno(DIRFD), $oldp, fileno(DIRFD), $newp, AT_SYMLINK_FOLLOW) or die "$!";
if ($oldp eq '-') {
print "linking stdin to '$newp'. You will get ENOENT without root (or CAP_DAC_READ_SEARCH). Even then doesn't work when links=0\n";
$ret = link_stdin( $newp );
} else {
$ret = linkat_follow_dotpaths($oldp, $newp);
}
# either way, you still can't re-link deleted files (tested Linux 3.16 and 4.2).
# print STDERR
die "error: linkat: $!.\n" . ($!{ENOENT} ? "ENOENT is the error you get when trying to re-link a deleted file\n" : '') unless $ret;
# if you want to see exactly what happened, run
# strace -eopen,linkat linkat.pl
Clearly, this is possible -- fsck does it, for example. However, fsck does it with major localized file system mojo and will clearly not be portable, nor executable as an unprivileged user. It's similar to the debugfs comment above.
Writing that flink(2) call would be an interesting exercise. As ijw points out, it would offer some advantages over current practice of temporary file renaming (rename, note, is guaranteed atomic).
Kind of late to the game but I just found http://computer-forensics.sans.org/blog/2009/01/27/recovering-open-but-unlinked-file-data which may answer the question. I haven't tested it, though, so YMMV. It looks sound.
