how to find what processes have written to a file on Linux - linux

Is there a way to find out which process wrote to a give file earlier. I am having a problem where multiple processes seem to be writing to a file. I know one of the processes but not sure who else is writing to the file. I am on linux/ubuntu. Is there a way a log is mantained by the OS on what processes have written to a specified file

Create a small monitoring process which will log periodically who is currently accessing the file.
You can write a small script using fuser. Is here a quick example (to be improved)
#!/bin/bash
log=~/file-access.log
while true
do
fuser your_file >> $log
sleep 0.2s
done
But you will have to be lucky that the process writing to this file takes enough time to have the chance to detect it with fuser.

No, there is nothing by default to keep track of which processes wrote to a file after the fact.
If you can repro at will, inotify or similar can help you monitor who is writing to the file as it happens.

Related

How do I ensure a complete reading of a log after the process that writes it ends on Linux?

This is under Ubuntu 20.04.
There's a script that appends to a file via shell redirection.
I want to read that file after the script's process has ended and all data has been written.
I'm using pgrep to check when the script ends (I have carefully checked that this check works).
I have noted that the file may not be fully written even if the process ended.
Because of what I have read, this can happen because of buffering. A side question would be: can this actually happen or am I misunderstanding something?
I'm thinking on using lsof/inotifywait/a loop with fuser to await the file closing. Is this the right wait to manage this situations?
What I don't really understand is: if the process that opened the file exited, who will show as the file "opener" on lsof/inotifywait/fuser output?
If you're worried about the file not having been written to disk due to buffering and it's in a process where you don't have the file descriptor, you can force the system to write them to disk with the sync <file> command or sync function in unistd.h.

How to use the attach the same console as output for a process and input for another process?

I am trying to use suckless ii irc client. I can listen to a channel by tail -f out file. However is it also possible for me to input into the same console by starting an echo or cat command?
If I background the process, it actually displays the output in this console but that doesn't seem to be right way? Logically, I think I need to get the fd of the console (but how to do that) and then force the tail output to that fd and probably background it. And then use the present bash to start a cat > in.
Is it actually fine to do this or is that I am creating a lot of processes overhead for a simple task? In other words piping a lot of stuff is nice but it creates a lot of overhead which ideally has to be in a single process if you are going to repeat that task it a lot?
However is it also possible for me to input into the same console by starting an echo or cat command?
Simply NO! cat writes the current content. cat has no idea that the content will grow later. echo writes variables and results from the given command line. echo itself is not made for writing the content of files.
If I background the process, it actually displays the output in this console but that doesn't seem to be right way?
If you do not redirect the output, the output goes to the console. That is the way it is designed :-)
Logically, I think I need to get the fd of the console (but how to do that) and then force the tail output to that fd and probably background it.
As I understand that is the opposite direction. If you want to write to the stdin from the process, you simply can use a pipe for that. The ( useless ) example show that cat writes to the pipe and the next command will read from the pipe. You can extend to any other pipe read/write scenario. See link given below.
Example:
cat main.cpp | cat /dev/stdin
cat main.cpp | tail -f
The last one will not exit, because it waits that the pipe gets more content which never happens.
Is it actually fine to do this or is that I am creating a lot of processes overhead for a simple task? In other words piping a lot of stuff is nice but it creates a lot of overhead which ideally has to be in a single process if you are going to repeat that task it a lot?
I have no idea how time critical your job is, but I believe that the overhead is quite low. Doing the same things in a self written prog must not be faster. If all is done in a single process and no access to the file system is required, it will be much faster. But if you also use system calls, e.g. file system access, it will not be much faster I believe. You always have to pay for the work you get.
For IO redirection please read:
http://www.tldp.org/LDP/abs/html/io-redirection.html
If your scenario is more complex, you can think of named pipes instead of IO redirection. For that you can have a look at:
http://www.linuxjournal.com/content/using-named-pipes-fifos-bash

How to create a virtual command-backed file in Linux?

What is the most straightforward way to create a "virtual" file in Linux, that would allow the read operation on it, always returning the output of some particular command (run everytime the file is being read from)? So, every read operation would cause an execution of a command, catching its output and passing it as a "content" of the file.
There is no way to create such so called "virtual file". On the other hand, you would be
able to achieve this behaviour by implementing simple synthetic filesystem in userspace via FUSE. Moreover you don't have to use c, there
are bindings even for scripting languages such as python.
Edit: And chances are that something like this already exists: see for example scriptfs.
This is a great answer I copied below.
Basically, named pipes let you do this in scripting, and Fuse let's you do it easily in Python.
You may be looking for a named pipe.
mkfifo f
{
echo 'V cebqhpr bhgchg.'
sleep 2
echo 'Urer vf zber bhgchg.'
} >f
rot13 < f
Writing to the pipe doesn't start the listening program. If you want to process input in a loop, you need to keep a listening program running.
while true; do rot13 <f >decoded-output-$(date +%s.%N); done
Note that all data written to the pipe is merged, even if there are multiple processes writing. If multiple processes are reading, only one gets the data. So a pipe may not be suitable for concurrent situations.
A named socket can handle concurrent connections, but this is beyond the capabilities for basic shell scripts.
At the most complex end of the scale are custom filesystems, which lets you design and mount a filesystem where each open, write, etc., triggers a function in a program. The minimum investment is tens of lines of nontrivial coding, for example in Python. If you only want to execute commands when reading files, you can use scriptfs or fuseflt.
No one mentioned this but if you can choose the path to the file you can use the standard input /dev/stdin.
Everytime the cat program runs, it ends up reading the output of the program writing to the pipe which is simply echo my input here:
for i in 1 2 3; do
echo my input | cat /dev/stdin
done
outputs:
my input
my input
my input
I'm afraid this is not easily possible. When a process reads from a file, it uses system calls like open, fstat, read. You would need to intercept these calls and output something different from what they would return. This would require writing some sort of kernel module, and even then it may turn out to be impossible.
However, if you simply need to trigger something whenever a certain file is accessed, you could play with inotifywait:
#!/bin/bash
while inotifywait -qq -e access /path/to/file; do
echo "$(date +%s)" >> /tmp/access.txt
done
Run this as a background process, and you will get an entry in /tmp/access.txt each time your file is being read.

Shell script process is getting killed automatically

I am facing problem with shell script i have ascript which will be running in infinite loop so say its havin PID X.The process is running for 4-5 hours but automatically the process getting killed.This is happening only for some long time running system and i am observing some times after 2 hours also its getting killed.
I am not able to find the reason the why its going down why its getting killed.No one is using the system other than me.And i am running the process as a root user.
Can any one explain or suspect the reason who is killing the process.
Below is the sample script
#!/bin/bash
until ./test.tcl; do
echo "Server 'test.tcl' crashed with exit code $?. Respawing.." >&2
done
In test.tcl script i am running it for infinite loop and the script is used to trap signal and do some special operation.But we find that test.tcl is also going down.
So is there any way from where i capture who and how it gets killed.
Enable core dump in your system, it is the most commonly used method for app crash analysis. I know it is a bit painful to gdb core file, but more or less you can find something out of it.
Here is a reference link for you.(http://www.cyberciti.biz/tips/linux-core-dumps.html).
Another way to do this is tracing you script by "strace -p PID-X", note that it will slow down your system, espeically several hours in your case, but it can be the last resort.
Hope above helpful to you.
Better to check all the signals generated and caught by OS at that time by specific script might be one of signal is causing to kill your process.

Logging to a non blocking named pipe?

I have a question, and I could'nt find help anywhere on stackoverflow or the web.
I have a program (celery distributed task queue) and I have multiple instances (workers) each having a logfile (celery_worker1.log, celery_worker2.log).
The important errors are stored to a database, but I like to tail these logs from time to time when running new operations to make sure everything is ok (the loglevel is lower).
My problem: these logs are taking a lot of disk space.
What I would like to do: be able to "watch" the logs (tail -f) only when I need it, without them taking a lot of space.
My ideas until now:
outputing logs to stdout, not to a file: not possible here since I have many workers outputing to different files, but I want to tail them all at once (tail -f celery_worker*.log)
using logrotate: it is an "OK" solution for me. I don't want this to be a daily task but would rather not put a minute crontab for this, and more, the server is not mine so that would mean some work on the admin-sys side
using named pipes: it looked good at first sight but I didn't know that named pipes (linux FIFO) where blocking. Hence, when I don't tail -f ALL of the pipes at the same time, or when I just quit my tail, the writing operations from the logger are blocked.
Is there a way to have a non-blocking named pipe, which would just throw to stdout when tailed, and throw to /dev/null when not?
Or are there technical difficulties to such a type of pipe? If there are, what are they?
Thank you for your answers!
Have each worker log to stdout, but connect each stdout to a utility that automatically spools and rotates logs based on size or time. multilog and svlogd are examples of such. For those programs, you'd merely tail the "current" log file.
You're right that logrotate is not quite the right solution for the problem you have.
Named pipes won't work as you want. At best, your writers could fill up their pipes and then discard subsequent logs, which is the inverse of the behavior you want.
You could try shared memory device man:shm_overview or perhaps a number of them. You need to organise them as circular buffers so they'd store last N kb of your log and whenever you read them with reader it will output everything to your console. This approach is adopted by busybox's syslog/logread suit (see logread.c).

Resources