How to Track All Output Files From Executable, If Possible? - linux

I have been assigned to a project with no documentation and lots of unmanaged code.
There are explicitly declared paths throughout the project (in fact, they are environment variables that are set to different values in different places) that point to output files. I've changed these to redirect the output to directories in my workspace, yet the files are not produced there, nor can I find them anywhere else in my workspace. I believe they're being created somewhere else in the filesystem. As mentioned, the environment variables are assigned in many different places by scripts. I thought I had found all the relevant scripts, but apparently I am missing something.
Is there a utility I can use to track all file output from a particular executable (print out all file names read/written)?
I am working under Fedora and the project is written primarily in Fortran.

strace prints details for every syscall; you can simply filter the output for calls to open() (and openat(), which current glibc versions use to implement open()).
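A minimal sketch of that filtering, assuming the trace was saved to a file first. The function name written_paths and the log name trace.log are made up for illustration; the strace options shown in the comment (-f, -e trace=file, -o) are standard ones, but check your strace version.

```shell
# Assumed capture step (run it yourself; requires strace):
#   strace -f -e trace=file -o trace.log ./your_program
# -f follows child processes, which matters when scripts launch the executable.

# Print the unique paths that were successfully opened for writing.
written_paths() {
    # $1: path to the saved strace log (hypothetical name, e.g. trace.log)
    grep -E '(open|openat|creat)\(' "$1" \
        | grep -E '(O_WRONLY|O_RDWR|creat\()' \
        | grep -E '\) = [0-9]+' \
        | sed -E 's/^[^"]*"([^"]*)".*$/\1/' \
        | sort -u
}
```

The third grep keeps only calls that returned a file descriptor, so failed attempts (for example "= -1 ENOENT") are dropped. Paths containing escaped quotes are not handled by this sketch.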

One option is lsof, which lists the files a running process currently has open, e.g.
lsof -p <PID>
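If lsof is not installed, roughly the same snapshot can be read from /proc on Linux. This is only a sketch: list_open_files is a hypothetical helper name, and like lsof it shows what is open at the moment you look, not files that were opened and closed earlier.

```shell
# Print "FD<TAB>target" for every descriptor the process currently has open.
list_open_files() {
    # $1: process ID
    for fd in /proc/"$1"/fd/*; do
        # Each entry under /proc/PID/fd is a symlink to the open file.
        target=$(readlink "$fd") || continue
        printf '%s\t%s\n' "${fd##*/}" "$target"
    done
}
```

For example, list_open_files "$(pidof your_program)" could be run repeatedly while the program is working, assuming your_program is the name of the running executable.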

Related

Linux ~/.bashrc export most recent directory

I have several environment variables in my ~/.bashrc that point to different directories. I am running a program that creates a new folder every time that it runs and puts a time stamp in the directory name. For example, baseline_2015_11_10_15_40_31-model-stride_1-type_1. Is there a way of making a variable that can link to the last created directory?
cd $CURRENT_DIR
Your mileage may vary a lot depending on what exactly you need to accomplish. However, in almost all cases I would advise against doing something as weird and unreliable as what's described below, and would instead revise your architecture to avoid hunting for directories.
Method 1
If your program creates a subdirectory inside the current directory, you know that nothing else happens in that directory, and you want the subdirectory with the most recent timestamp (ls -t sorts by modification time), then you can do something like:
your_complex_program_that_creates_dir
TARGET_DIR=$(ls -t1 --group-directories-first | head -n1)
cd "$TARGET_DIR"
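A slightly more defensive variant of Method 1, as a sketch: it lists only directories (via the */ glob), so a plain file that happens to be newer cannot be picked by mistake. newest_subdir is a hypothetical helper name, and directory names containing newlines are not handled.

```shell
# Print the name of the most recently modified subdirectory of $1.
newest_subdir() {
    # Run in a subshell so the caller's working directory is unchanged.
    # "*/" matches directories only; -d lists them rather than their contents.
    ( cd "$1" && ls -td -- */ 2>/dev/null | head -n1 | sed 's:/$::' )
}
```

It could then be used as CURRENT_DIR=$(newest_subdir /path/to/parent), where /path/to/parent stands for wherever the program writes its output.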
Method 2
If a lot of stuff happens on the system, then you'll end up monitoring what your program does with the filesystem and reacting when it creates a directory. There are two ways to do that, strace and inotify, and both are relatively complex. Here's the way to do that with strace:
strace -o some_temp_file.strace your_complex_program_that_creates_dir
TARGET_DIR=$(sed -ne '/^mkdir(/ { s/^mkdir("\(.*\)", .*).*$/\1/; p }' some_temp_file.strace)
cd "$TARGET_DIR"
This snippet runs your_complex_program_that_creates_dir under control of strace, which essentially logs every system call your program makes into a file. Afterwards, this file is analyzed to seek a line like
mkdir("target_dir", 0777) = 0
and extract value of "target_dir" into a variable. Note that:
if your program creates more than one directory (even temporary ones that it deletes afterwards), there's really no way to determine which of them to grab
running a program under strace is much slower than normal due to the huge overhead of logging all the syscalls
it's very non-portable: facilities like strace exist on most modern operating systems, but implementations vary a lot
A solution with inotify works in a similar way but uses a different mechanism: the kernel reports filesystem events for the directories you watch, and you react to them (for example, by remembering each newly created directory). Unlike strace, inotify watches paths rather than a process, so it also reports changes made by other programs.
However, I repeat, I'd strongly advise against using any of these solutions beyond research interest.
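For completeness, here is a rough sketch of the inotify variant using inotifywait from the inotify-tools package (an assumption: it has to be installed separately). filter_created_dirs is a hypothetical helper that picks directory-creation events out of the event stream.

```shell
# Read "EVENTS|PATH" lines on stdin and print the path of each new directory.
filter_created_dirs() {
    while IFS='|' read -r events path; do
        case "$events" in
            # inotifywait reports a created directory as "CREATE,ISDIR"
            *CREATE*ISDIR*) printf '%s\n' "$path" ;;
        esac
    done
}

# Assumed usage (requires inotify-tools; /path/to/parent is a placeholder):
#   inotifywait -m -q -e create --format '%e|%w%f' /path/to/parent \
#       | filter_created_dirs
```

The watcher has to be started before the program runs, and the same caveat applies as with strace: if several directories are created, you still have to decide which one you want.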

How can I know which files were modified by a specific process in linux machines?

I need to get a list of all modified files on my Linux machines (AIX, Solaris, Red Hat, CentOS, HP-UX) in a specific time range (similar to procmon or forfiles on Windows).
I tried to use the find command, but since it cannot search by PID, I got too many results.
I wanted to narrow down the results by looking for files that were modified by a specific process. I used the lsof command for a specific PID, but I got a list of files that were accessed, which wasn't helpful, because I could not tell whether the process had changed them.
I tried the strace command for a specific PID, but the output was too hard to work with (too much irrelevant information, and I need it for a 24-hour time range).
I have reached a dead end. Any ideas?
(In short: I want to get a list of all files modified by a specific process in a specific time range.)
Linux does not maintain a log or record, of any kind, of which files were modified by which process.
The only logged information is each file's last modification timestamp. And even that can be arbitrarily adjusted by any process, which has appropriate privileges, to be ten years in the future, for example.
The short answer is that the information you're looking for does not exist.
The closest thing I know of for your use case is SELinux. This will only work if SELinux is enabled on your operating system.
SELinux can log a range of information, including uid, gid, and PID (exactly what you need), for different operations. These events are written to the audit log.
For more details look at:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security_Guide/sec-Understanding_Audit_Log_Files.html

How to check if a file is opened in Linux?

The thing is, I want to track if a user tries to open a file on a shared account. I'm looking for any record/technique that helps me know if the concerned file is opened, at run time.
I want to create a script which monitors if the file is open, and if it is, I want it to send an alert to a particular email address. The file I'm thinking of is a regular file.
I tried using lsof | grep filename for checking if a file is open in gedit, but the command doesn't return anything.
Actually, I'm trying this for a pet project, and thus the question.
The command lsof -t filename shows the IDs of all processes that have the particular file opened. lsof -t filename | wc -w gives you the number of processes currently accessing the file.
The fact that a file has been read into an editor like gedit does not mean that the file is still open. The editor most likely opens the file, reads its contents and then closes the file. After you have edited the file you have the choice to overwrite the existing file or save as another file.
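A sketch of the same check without lsof, scanning /proc directly (Linux only). is_file_open is a hypothetical helper name; without root it can only see descriptors of your own processes, and as noted above it will miss an editor that has already closed the file.

```shell
# Return 0 if some visible process currently has the file open, 1 otherwise.
is_file_open() {
    # $1: file to check; resolve it to the canonical path /proc reports
    target=$(readlink -f -- "$1") || return 2
    for link in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$link" 2>/dev/null)" = "$target" ] && return 0
    done
    return 1
}
```

A monitoring script could call this in a loop with a sleep between checks and send the alert when it returns 0; how the email is sent is left out here because it depends on the local mail setup.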
You could (in addition of other answers) use the Linux-specific inotify(7) facilities.
I understand that you want to track one (or a few) particular files with a fixed file path (actually a given i-node). E.g. you would want to track when /var/run/foobar is accessed or modified, and do something when that happens.
In particular, you might want to install and use incrond(8) and configure it through incrontab(5).
If you want to run a script when some given file (on a native local file system, e.g. Ext4, Btrfs, ... but not NFS) is accessed or modified, use inotify; incrond was made for exactly that purpose.
PS. AFAIK, inotify doesn't work well for remote network files, e.g. on NFS filesystems (in particular when another NFS client machine is modifying a file).
If the files you care about are source files, you might be interested in revision control systems (like git) or build systems (like GNU make); in a certain way these tools are related to file modification.
You could also have the particular file sit in some FUSE filesystem, and write your own FUSE daemon.
If you can restrict and modify the programs accessing the file, you might want to use advisory locking, e.g. flock(2), lockf(3).
Perhaps the data sitting in the file should be in some database (e.g. sqlite or a real DBMS like PostgreSQL or MongoDB). ACID properties are important.
Notice that the filesystem and the mount options may matter a lot.
You might want to use the stat(1) command.
It is difficult to help more without understanding the real use case and the motivation. You should avoid the XY problem.
Probably the workflow is wrong (having a file shared between several users who are able to write to it), and you should approach the overall issue in some other way. For a pet project I would at least recommend using some advisory lock, and accessing and modifying the information only through your own programs (perhaps setuid) using flock (this excludes ordinary editors like gedit or commands like cat ...). However, your implicit use case seems well suited to a DBMS approach (a database does not have to contain a lot of data, it might be tiny), or to some indexed, locked file such as the GDBM library handles.
Remember that on POSIX systems and Linux, several processes can access (and even modify) the same file simultaneously (unless you use some locking or synchronization).
Reading the Advanced Linux Programming book (freely available) would give you a broader picture (but it does not mention inotify, which appeared after the book was written).
You can use ls -lrt, which lists files sorted by modification time, oldest first. This shows when the file was last modified, which can hint at recent activity, but it does not show whether the file is currently open. Make sure that you are in the right directory.

How to tell if a given process opened files with O_DIRECT?

I would like to tell if a process has opened any files using O_DIRECT, but I can only examine it after the process was launched (i.e. strace is not an option). I tried looking in /proc/$pid/fd/ to see if there was anything useful, but there wasn't. My goal is to track down if any of several hundred users on a system have opened files with O_DIRECT. Is this possible?
Since kernel 2.6.22, /proc/$pid/fdinfo/$fd contains a flags field, in octal. See http://www.kernel.org/doc/man-pages/online/pages/man5/proc.5.html
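A sketch of how the flags field could be checked from a script. Loud assumption: the octal value 040000 is O_DIRECT on x86 and x86_64 only; other architectures (ARM, for example) use a different value, so check your platform's fcntl.h. The function names are hypothetical, and reading another user's fdinfo normally requires root.

```shell
# Assumption: O_DIRECT is 040000 (octal) on x86/x86_64. Adjust elsewhere.
O_DIRECT_OCTAL=040000

# Succeed if the given octal flags value has the O_DIRECT bit set.
flags_have_o_direct() {
    # $1: the "flags:" value from /proc/PID/fdinfo/FD (octal, leading 0)
    [ $(( $1 & $O_DIRECT_OCTAL )) -ne 0 ]
}

# Print "FD -> path" for each descriptor of a process opened with O_DIRECT.
fds_with_o_direct() {
    # $1: process ID
    for info in /proc/"$1"/fdinfo/*; do
        flags=$(sed -n 's/^flags:[[:space:]]*//p' "$info" 2>/dev/null)
        [ -n "$flags" ] || continue
        if flags_have_o_direct "$flags"; then
            fd=${info##*/}
            printf '%s -> %s\n' "$fd" "$(readlink "/proc/$1/fd/$fd")"
        fi
    done
    return 0
}
```

To cover several hundred users, the second function could be run as root over every numeric directory in /proc.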
I don't think it's visible in /proc or elsewhere in user space.
With kernel code, it's possible:
1. Get the process's task_struct (use find_task_by_pid).
2. Go over files - use task->files->count and task->files->fd_array.
3. Look for file->f_flags & O_DIRECT.

Symbolic link to latest file in a folder

I have a program which requires the path to various files. The files live in different folders and are constantly updated, at irregular intervals.
When the files are updated, they change name, so, for instance, in the folder dir1 I have fv01 and fv02. Later in the day someone adds fv02_v1; the day after someone adds fv03 and so on. In other words, I always have an updated file but with a different name.
I want to create a symbolic link in my "run" folder to these files, such that said link always points to the latest file created.
I can do this in Python or Bash, but I was wondering what is out there, as this is hardly an uncommon problem.
How would you go about it?
Thank you.
Juan
PS. My operating system is Linux. I currently have a simple daemon (Python) that looks every once in a while (refreshes every minute) for the latest file. Seems kind of an overkill to me.
Unless there is some compelling reason that you have left unstated (e.g. thousands of files in the directory) just do it the way you suggest with a script sorting the files by modification time. There is no secret method that I am aware of.
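The script approach can be as small as the sketch below. link_latest is a hypothetical helper name; it uses modification time as a stand-in for "latest created", which only holds if older files are not edited after newer ones arrive, and it assumes file names without newlines.

```shell
# Point the symlink $2 at the most recently modified entry in directory $1.
link_latest() {
    latest=$(ls -t -- "$1" | head -n1)
    [ -n "$latest" ] || return 1          # nothing in the directory yet
    # Use an absolute target so the link works from any location;
    # -f replaces an existing link, -n avoids following it if it is a dir link.
    ln -sfn -- "$(cd "$1" && pwd)/$latest" "$2"
}
```

Calling link_latest dir1 run/fv_latest from cron or from the existing daemon would refresh the link; dir1 comes from the question, while run/fv_latest is just an example link name.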
You could write a daemon using inotify to monitor your directories and immediately set your links but that seems like overkill.
Edit: I just saw your edit. Since you have the daemon already, inotify might not be such a bad idea. It would be somewhat more efficient than constantly querying since the OS will tell you when something in your directories has changed.
I don't know python well enough to point you to anything specific but there must exist a wrapper for inotify.

Resources