I have a program which opens a static file with sys_open() and expects to receive a file descriptor equal to zero (= stdin). I am able to write to the file, remove it, or modify it, so I tried replacing the static file with a symbolic link to /dev/stdin. That does open stdin, but the call returns the lowest available fd (not necessarily zero). How can I make the syscall return zero, without hooking the syscall or modifying the program itself? Is that even possible?
(It's part of a challenge, not a real case scenario)
Thank you as always
POSIX guarantees that the lowest available FD will be returned. Therefore you can just invoke the program with stdin closed:
./myprogram 0>&-
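To see why this works, here is a minimal user-space sketch (the file name is just a placeholder): once fd 0 is closed, the next open() hands back 0 because of the lowest-available-descriptor rule.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    close(0);                                 /* free up fd 0 (stdin) */
    int fd = open("somefile.txt", O_RDONLY);  /* placeholder file name */
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("open() returned fd %d\n", fd);    /* prints 0 */
    return 0;
}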
I was going through the book The Linux Programming Interface. On page 73 in Chapter 4,
it is written:
fd = open("w.log", O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, S_IRUSR | S_IWUSR);
I read that the O_TRUNC flag is used to truncate the file length to zero, destroying any existing data in the file.
O_APPEND flag is used to append data to the end of the file.
The kernel records a file offset, sometimes also called the read-write offset or pointer. This is the location in the file at which the next read() or write() will commence.
I am confused: if the file is truncated and the kernel does subsequent writes at the end of the file anyway, why is the append flag needed to explicitly request appending at the end of the file?
Without the append flag (if the file is truncated), the kernel writes at the end of the file on each subsequent write() call.
O_APPEND flag is used to append data to the end of the file.
That's true, but incomplete enough to be potentially misleading. And I suspect that you are in fact confused in that regard.
The kernel records a file offset, sometimes also called the read-write offset or pointer. This is the location in the file at which the next read() or write() will commence.
That's also incomplete. There is a file offset associated with at least each seekable file. That is the position where the next read() will commence. It is where the next write() will commence if the file is not open in append mode, but in append mode every write happens at the end of the file, as if it were repositioned with lseek(fd, 0, SEEK_END) before each one. In that case, then, the current file offset might not be the position where the next write() will commence.
I am confused: if the file is truncated and the kernel does subsequent writes at the end of the file anyway, why is the append flag needed to explicitly request appending at the end of the file?
It is not needed to cause the first write (by any process) after truncation to occur at the end of the file because immediately after the file has been truncated there isn't any other position.
Without the append flag (if the file is truncated), the kernel writes at the end of the file on each subsequent write() call.
It is not needed for subsequent writes either, as long as the file is not repositioned or externally modified. Otherwise, the location of the next write depends on whether the file is open in append mode or not.
In practice, it is not necessarily the case that every combination of flags is useful, but the combination of O_TRUNC and O_APPEND has observably different effect than does either flag without the other, and the combination is useful in certain situations.
O_APPEND rarely makes sense with O_TRUNC. I think no combination of the C fopen modes will produce that combination (on POSIX systems, where this is relevant).
O_APPEND ensures that every write is done at the end of the file, automatically, regardless of the write position. In particular, this means that if multiple processes are writing to the file, they do not stomp over each other's writes.
Note that POSIX does not require the atomic behavior of O_APPEND. It requires that an automatic seek take place to the (current) end of the file before the write, but it doesn't require that position to still be the end of the file when the write occurs. Even on implementations which feature atomic O_APPEND, it might not work for all file systems. The Linux man page for open(2) cautions that O_APPEND doesn't work atomically on NFS.
Now, if every process uses O_TRUNC when opening the file, it will be clobbering everything that every other process wrote. That conflicts with the idea that the processes shouldn't be clobbering each other's writes, for which O_APPEND was specified.
O_APPEND is not required for appending to a file by a single process which is understood to be the only writer. It is possible to just seek to the end and then start writing new data. Sometimes O_APPEND is used in the exclusive case anyway simply because it's a programming shortcut. We don't have to bother making an extra call to position to the end of the file. Compare:
FILE *f = fopen("file.txt", "a");
// check f and start writing
versus:
FILE *f = fopen("file.txt", "r+");
// check f
fseek(f, 0, SEEK_END); // go to the end, also check this for errors
// start writing
We can think about the idea that we have a group of processes using O_APPEND to a file, such that the first one also performs O_TRUNC to truncate it first. But it seems awkward to program this; it's not easy for a process to tell whether it is the first one to be opening the file.
If such a situation is required on, say, boot-up, where the old file from before the boot is irrelevant for some reason, just have a boot-time action (script or whatever) remove the old file before these multiple processes are started. Each one then uses O_CREAT to create the file if necessary (in case it is the first process) but without O_TRUNC (in case they are not the first process), and with O_APPEND to do the atomic (if available) appending thing.
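As a rough sketch of what each cooperating process might do (the path and permission bits here are purely illustrative):
#include <fcntl.h>

/* Create the log if it doesn't exist yet, never truncate it,
 * and append every write (atomically, where the FS supports that). */
int open_log(void)
{
    return open("/var/log/myapp.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
}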
The two are entirely independent. The file is simply opened with O_APPEND because it's a log file.
The author wants concurrent messages to concatenate instead of overwrite each other (e.g. if the program forks), and if an admin or log rotation tool truncates the file then new messages should start being written at line #1 instead of at line #1000000 where the last log entry was written. This would not happen without O_APPEND.
I'm a beginner in assembly (using nasm). I'm learning assembly through a college course.
I'm trying to understand the behavior of the sys_read linux system call when it's invoked. Specifically, sys_read stops when it reads a new line or line feed. According to what I've been taught, this is true. This online tutorial article also affirms the fact/claim.
When sys_read detects a linefeed, control returns to the program and the user's input is located at the memory address you passed in ECX.
I checked the Linux programmer's manual for the read call (via "man 2 read"). It doesn't mention this behavior at all; shouldn't it, if the behavior really belongs to the syscall?
read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.
On files that support seeking, the read operation commences at the file offset, and the file offset is incremented by the number of bytes read. If the file offset is at or past the end of file, no bytes are read, and read() returns zero.
If count is zero, read() may detect the errors described below. In the absence of any errors, or if read() does not check for errors, a read() with a count of 0 returns zero and has no other effects.
If count is greater than SSIZE_MAX, the result is unspecified.
So my question really is, why does the behavior happen? Is it a specification in the linux kernel that this should happen or is it a consequence of something else?
It's because you're reading from a POSIX tty in canonical mode (where backspace works before you press return to "submit" the line; that's all handled by the kernel's tty driver). Look up POSIX tty semantics / stty / ioctl. If you ran ./a.out < input.txt, you wouldn't see this behaviour.
Note that read() on a TTY will return without a newline if you hit control-d (the EOF tty control-sequence).
Assuming that read() reads whole lines is ok for a toy program, but don't start assuming that in anything that needs to be robust, even if you've checked that you're reading from a TTY. I forget what happens if the user pastes multiple lines of text into a terminal emulator. Quite probably they all end up in a single read() buffer.
See also my answer on a question about small read()s leaving unread data on the terminal: if you type more characters on one line than the read() buffer size, you'll need at least one more read system call to clear out the input.
As you noted, the read(2) libc function is just a thin wrapper around sys_read. The answer to this question really has nothing to do with assembly language, and is the same for systems programming in C (or any other language).
Further reading:
stty(1) man page: where you can change which control character does what.
The TTY demystified: some history, and some diagrams showing how xterm, the kernel, and the process reading from the tty all interact. And stuff about session management, and signals.
https://en.wikipedia.org/wiki/POSIX_terminal_interface#Canonical_mode_processing and related parts of that article.
This is not an attribute of the read() system call, but rather a property of termios, the terminal driver. In the default configuration, termios buffers incoming characters (i.e. what you type) until you press Enter, after which the entire line is sent to the program reading from the terminal. This is for convenience so you can edit the line before sending it off.
As Peter Cordes already said, this behaviour is not present when reading from other kinds of files (like regular files) and can be turned off by configuring termios.
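For example, here is a minimal sketch of turning canonical mode off with termios, so that read() returns as soon as a byte arrives instead of waiting for Enter (error checking and restoring the old settings are omitted):
#include <termios.h>
#include <unistd.h>

int main(void)
{
    struct termios t;
    char c;

    tcgetattr(STDIN_FILENO, &t);          /* fetch current tty settings */
    t.c_lflag &= ~(ICANON | ECHO);        /* no line buffering, no echo */
    t.c_cc[VMIN]  = 1;                    /* read() returns after 1 byte */
    t.c_cc[VTIME] = 0;                    /* no inter-byte timeout */
    tcsetattr(STDIN_FILENO, TCSANOW, &t);

    read(STDIN_FILENO, &c, 1);            /* returns on the first keypress */
    return 0;
}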
What the tutorial says is garbage, please disregard it.
When a program begins, does it get file descriptors 0, 1 and 2 for stdin, stdout and stderr by default? And will API calls such as open(...) or socket(...) not return 0, 1 or 2, since these values are already taken? Is there any case in which open(...) or socket(...) would return 0, 1 or 2 while those descriptors are not related to stdin, stdout and stderr?
At the file descriptor level, stdin is defined to be file descriptor 0, stdout is defined to be file descriptor 1, and stderr is defined to be file descriptor 2.
Even if your program - or the shell - changes what file descriptor 0 refers to (e.g. redirects it with dup2(2)), it always stays stdin (since by definition STDIN_FILENO is 0).
So of course stdin could be a pipe or a socket or a file (not a terminal). You could test with isatty(3) if it is a tty, and/or use fstat(2) to get status information on it.
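For instance, a small sketch of such a check:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    if (isatty(STDIN_FILENO))
        printf("stdin is a terminal: %s\n", ttyname(STDIN_FILENO));
    else
        printf("stdin is a pipe, regular file, or socket\n");
    return 0;
}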
Syscalls like open(2) or pipe(2) or socket(2) may give e.g. STDIN_FILENO (i.e. 0) if that file descriptor is free (e.g. because it has been close(2)-d before). But when that occurs, it is still stdin by definition.
Of course, in stdio(3), the FILE stream stdin is a bit more complex. Your program could fclose(3), freopen(3), fdopen(3) ...
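For example (the file name is just an illustration), the stream can be re-pointed without fd 0 changing at all:
#include <stdio.h>

int main(void)
{
    /* FD 0 is unchanged, but the stdin stream now reads from the file. */
    if (freopen("input.txt", "r", stdin) == NULL)
        perror("freopen");
    /* scanf(), getchar(), etc. now read from input.txt */
    return 0;
}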
Probably the kernel sets stdin, stdout, and stderr file descriptors to the console when magically starting /sbin/init as the first process.
A couple of answers already existed, but I didn't find them informative enough to explain the complete story.
Since I went ahead and researched more, I am adding my findings.
Whenever a process starts, an entry for it is added under the /proc/<pid> directory; this is where all of the data related to the process is kept. Also, on process start the kernel sets up 3 file descriptors for the process for communication with the 3 standard data streams, referred to as stdin, stdout and stderr.
The Linux kernel always assigns the lowest available integer value when creating an FD, so these data streams are mapped to the numbers 0, 1 and 2.
Since these are nothing but references to streams, we can close them: simply call close(<fd>), in our case close(1), to close that file descriptor.
On doing ls -l /proc/<pid>/fd/ afterwards, we see only 2 FDs listed there: 0 and 2.
If we now make an open() call, the kernel creates a new FD for the newly opened file, and since it uses the lowest-integer-first rule, it picks the value 1.
So the new FD now points to the file we opened (with the open() syscall), and any data transfer that happens from here on goes not to the data stream that was earlier linked there, but to the new file, as sketched below.
So yes, we can map FD 0, 1 or 2 to any file, not necessarily stdin, stdout or stderr.
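A minimal sketch of the findings above (the output file name is arbitrary):
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    close(1);                     /* give up stdout's descriptor */
    int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    write(fd, "hello\n", 6);      /* fd is 1: "stdout" writes now land in out.txt */
    return 0;
}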
When a program begins, does it get file descriptors 0, 1 and 2 for stdin, stdout and stderr by default?
If you launch your program in an interactive shell normally, yes.
As EJP comments:
Inheriting a socket as FD 0 also happens if the program is started by inetd, or anything else that behaves the same way.
Will APIs such as open(...) or socket(...) not return 0, 1 or 2, since these values are already taken?
Yes.
Is there any case where open(...) or socket(...) would return 0, 1 or 2?
Yes, if you do something like
close(0);
close(1);
close(2);
open(...); /* this open will return 0 if success */
How do the Linux people make /dev files? You can write to them, and immediately the data is gone.
I can imagine some program which constantly reads some dev file:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    FILE *fp;
    char buffer[255];
    int result;

    fp = fopen(argv[1], "r");
    if (!fp) {
        printf("Open file error");
        return 1;
    }
    while (1)
    {
        memset(buffer, 0, sizeof buffer);   /* %c adds no '\0', so clear first */
        result = fscanf(fp, "%254c", buffer);
        if (result == EOF)
            break;
        printf("%s", buffer);
        fflush(stdout);
        sleep(1);
    }
    fclose(fp);
    return 0;
}
But how would one delete the content in there? Closing the file and opening it again in "w" mode is not how they did it, because you can do e.g. cat > /dev/tty.
What are files? Files are names in a directory structure which denote objects. When you open a file like /home/joe/foo.txt, the operating system creates an object in memory representing that file (or finds an existing one, if the file is already open), binds a descriptor to it which is returned and then operations on that file descriptor (like read and write) are directed, through the object, into file system code which manipulates the file's representation on disk.
Device entries are also names in the directory structure. When you open some /dev/foo, the operating system creates an in-memory object representing the device, or finds an existing one (in which case there may be an error if the device does not support multiple opens!). If successful, it binds a new file descriptor to the device object and returns that descriptor to your program. The object is configured in such a way that operations like read and write on the descriptor are directed to call into the specific device driver for device foo, and correspond to doing some kind of I/O with that device.
Such entries in /dev/ are not files; a better name for them is "device nodes" (a justification for which is the name of the mknod command). Only when programmers and sysadmins are speaking very loosely do they call them "device files".
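For illustration, a device node can also be created by hand with mknod(2) (the path here is arbitrary; major 1, minor 3 happens to be Linux's null device, and creating device nodes requires root):
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

int main(void)
{
    /* This node behaves exactly like /dev/null, because it names the
     * same driver (char major 1, minor 3) inside the kernel. */
    if (mknod("/tmp/mynull", S_IFCHR | 0666, makedev(1, 3)) == -1) {
        perror("mknod");
        return 1;
    }
    return 0;
}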
When you do cat > /dev/tty, there isn't anything which is "erasing" data "on the other end". Well, not exactly. Basically, cat is calling write on a descriptor, and this results in a chain of function calls which ends up somewhere in the kernel's tty subsystem. The data is handed off to a tty driver which will send the data into a serial port, or socket, or into a console device which paints characters on the screen, or whatever.
Virtual terminals like xterm use a pair of devices: a master and slave pseudo-tty. If a tty is connected to a pseudo-tty device, then the data written by cat > /dev/tty goes through a kind of "trombone": it bubbles up on the master side of the pseudo-tty, where in fact there is a while (1) loop in some user-space C program receiving the bytes, like from a pipe. That program is xterm (or whatever); it removes the data and draws the characters in its window, scrolls the window, etc.
Unix is designed so that devices (tty, printer, etc.) are accessed like everything else (as a file), so the files in /dev are special pseudo-files that represent the device within the file system.
You don't want to delete the contents of such a device file, and honestly it could be dangerous for your system if you write to them willy-nilly without understanding exactly what you are doing.
Device files are not normal files, if "normal file" refers to an arbitrary sequence of bytes, often stored on a medium. But not all files are normal files.
More broadly, files are an abstraction referring to a system service and/or resource, a service being something you can send information to for some purpose (e.g., for a normal file, write data to storage) and a resource being something you request data from for some purpose (e.g., for a normal file, read data from storage). C defines a standard for interfacing with such a service/resource.
Device files fit within this definition, but they do not necessarily match my more specific "normal file" examples of reading and writing to and from storage. You can directly create dev files, but the only meaningful reason to do so is within the context of a kernel module. More often you may refer to them (e.g., with udev), keeping in mind they are actually created by the kernel and represent an interface with the kernel. Beyond that, the functioning of the interface differs from dev file to dev file.
I've also found quite a nice explanation:
http://lwn.net/images/pdf/LDD3/ch18.pdf
I'm implementing a system call in a 2.6.22 kernel. In my system call I obtain a file descriptor like this:
fd = sys_open(filename, O_WRONLY|O_CREAT, 0544);
However, I get a negative number (-13) for fd when filename points to a read-only file. The problem is that I need to write to filename, even if it's read-only or owned by another user.
So my question is this: how can I write to a read-only file from the kernel?
And yes, I've read the Linux Journal article that says writing to a file from the kernel is a bad idea.
I still need to do it.
The negative number isn't a file descriptor, it's an error code. Specifically it will be the negative version of one of the errno.h error numbers.
In this case, as you have -13, you are looking at error 13, which is EACCES, meaning that you don't have permission to write to the file.
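As a sketch (this only adds checking around the call the question already makes; negative returns are -errno values, not descriptors):
fd = sys_open(filename, O_WRONLY | O_CREAT, 0544);
if (fd < 0) {
    /* fd holds -errno here, not a file descriptor */
    if (fd == -EACCES)
        printk(KERN_ERR "no permission to write to %s\n", filename);
    return fd;
}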