When does a process acquire a file to read - linux

I have a script that takes a list of servers from an input file one by one and executes some commands on each server. I want to be able to update the input file while this script is running, without affecting the input of the first process, and re-run the script with the second list of servers. Can this be done safely?

When you run a command like my_script < file, the contents of file are fed to my_script on its standard input (as an already-open file descriptor). This decouples the contents from the name, meaning you can immediately rename or replace file in another process.
If you instead run a command like my_script file, you're passing the name "file" to my_script, which may open and read that file at any point (or write to it, delete it, etc.), so you can't safely change file while the script is running. Notably, this doesn't necessarily happen right away; a long-running process might not read from file until much later, after you've already edited it.
Therefore if you design your program to read from stdin you can safely modify the input file and re-run the command while the first process is still running.
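A minimal sketch of such a script (assuming Python; the ssh/uptime command and the file names are illustrative), run as ./my_script < servers.txt:

#!/usr/bin/env python3
# my_script: read the server list from stdin. The shell opened the
# input file before this process started, so renaming or replacing
# servers.txt on disk does not affect this run.
import subprocess
import sys

for line in sys.stdin:
    server = line.strip()
    if not server:
        continue
    # illustrative placeholder for the real per-server commands
    subprocess.run(['ssh', server, 'uptime'])

While this runs, you can move a new list over servers.txt and start a second instance immediately.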

Let's say your process is running and you want to change the file: just mv the file aside and move your new input file into place. If the process hasn't completely read the input file into memory, it will still have a file descriptor open to the previous file and will run unaffected. Of course, this all depends on how the process is implemented; if it re-opens the file during the course of execution, it will see the new file's contents.
process inputfile &              # start the long-running process (or run it in another terminal)
mv inputfile inputfile.running   # set the original aside; the open descriptor still points to it
mv newinput inputfile            # put the new list in place under the original name
process inputfile                # the second run reads the new list

Related

Why do I get no error when running the same Python script on multiple terminals at the same time?

I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error. (Maybe because of temporary files?)
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
I know from experience that if I try to open the same file in Vim in multiple terminals at the same time, I get an error.
That's not actually true. Vim will let you open the same file in multiple terminals at the same time; it just gives you a warning first to let you know this is happening, so you can abort before you make changes. (It's not safe to modify the file concurrently in two different instances of Vim, because the two instances won't coordinate at all.)
Furthermore, Vim will only give you this warning if you try to open the same file for editing in multiple terminals at the same time. It won't complain if you're just opening the file for reading (using the -R flag).
And I know from experience that if I open a text file in Python and read through it, I have to reset the pointer when I'm done.
That's not exactly true, either. If you make multiple separate calls to open, you'll have multiple separate file objects, and each separately maintains its position in the file. So something like
with open('filename.txt', 'r') as first:
    with open('filename.txt', 'r') as second:
        print(first.read())
        print(second.read())
will print the complete contents of filename.txt twice.
The only reason you'd need to reset the position when you're done reading a file is if you want to use the same file object to read the file again, or if you've opened the file in read/write mode (r+ rather than r) and you now want to switch from reading to writing.
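For example (a small sketch; filename.txt is assumed to exist):

with open('filename.txt', 'r') as f:
    print(f.read())   # reads to the end; the position is now at EOF
    print(f.read())   # prints an empty string: still at EOF
    f.seek(0)         # rewind this file object's position
    print(f.read())   # the full contents again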
But I've found that if I run the same Python script in multiple terminals at the same time, I don't get any error; it just successfully runs the script in both. How does this work? Doesn't Python need to read my script from the beginning in order to run it? Is the script copied to a temporary file, or something?
As I think should now be clear, there's no problem here. There's no reason that two instances of Python can't both read the same script file at the same time. Linux allows that. (And in fact, if you delete the file, Linux will keep its contents on disk until every program that had it open has either closed it or exited.)
In fact, there's also no reason that two processes can't write to the same file at the same time, though here you have to be very careful to avoid the processes causing problems for each other or corrupting the file.
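One common way to take that care, sketched here with Python's fcntl module (the file name is illustrative, and every writer must follow the same convention, since the lock is only advisory):

import fcntl

with open('shared.log', 'a') as f:
    fcntl.flock(f, fcntl.LOCK_EX)    # block until this process holds the lock
    f.write('one complete record\n')
    f.flush()
    fcntl.flock(f, fcntl.LOCK_UN)    # release so the other writer can proceed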
The terminal is just running the command you told it to execute; it doesn't keep any file pointer of its own, so the two runs can't interfere with each other.

Cannot write to file when using Task Scheduler for Python script

I have created a Python script that, given an input file, will run Nmap with arguments from the input file. It then writes to an output file in CSV format. My script works fine and as intended when I run it from IDLE, but when my script runs from Task Scheduler, it never overwrites the CSV file I tell it to write to. The path I provide in the file:
ipResults = r'C:\Users\________\Documents\Results.csv'
I've left out the username for security concerns.
I've set the script to run when I log on. When I log on, I see the output/results in a taskeng.exe window with a Python symbol and a rocket ship. But when it finishes running, Results.csv does not get updated. As said previously, when running through IDLE, the script does overwrite Results.csv.
I have tried opening the file in both w and w+ mode to see if that was the error, but no such luck. I'm fine with the program overwriting my past results; in fact, that's what I want. But when my script is run through Task Scheduler, it does not overwrite the Results.csv file.
Simply checking the "Run with highest privileges" box when setting the task up in Task Scheduler fixed my error; I am now able to write to my output file.

Output redirection should recreate the destination file

I can redirect the output of a process to a file
./prog > a.txt
But if I delete a.txt and do not restart prog, then no more output will get into a.txt. The same is the case if I use the append-redirect >>.
Is there a way to make my redirection recreate the file when it is deleted during the runtime of prog?
Redirection is done by the shell/OS, I think, and not by prog. So maybe there are some tools or settings.
Thanks!
At the OS level, a file is made up of many components:
the content, stored somewhere on the storage device;
an i-node that keeps all file information except the name;
the name, listed in a directory (also stored on the storage device);
when the file is open, each application that opened it holds memory buffers that cache some of the file's content.
All these pieces are linked, and the OS does the bookkeeping.
If you delete the file while it is open in another application (the redirect operator > keeps it open until ./prog completes), only the name is removed from the directory. The other pieces of the puzzle are still there, and they keep working until the last application that has the file open closes it. Only then is the file's content discarded from the storage medium.
If you delete the file while ./prog keeps running and producing output, the file still grows and uses space on the storage medium, but it cannot be opened again because there is no longer any name by which to access it. Only the programs that already had it open when it was deleted can still access it, until they close it.
Even if you re-create the file, it is a different file that happens to have the same name as the deleted one. ./prog is not affected; its output keeps going to the old, deleted file.
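You can demonstrate this from Python (demo.txt is an illustrative name):

import os

f = open('demo.txt', 'w+')
f.write('hello')
f.flush()
os.unlink('demo.txt')   # the name is gone from the directory
f.seek(0)
print(f.read())         # still prints 'hello': the open descriptor keeps the content alive
f.close()               # only now is the content actually discarded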
As long as its output is redirected, apart from restarting ./prog, there is no way to persuade it to store its output in a different file once a.txt is deleted.
There are several ways to make this happen if ./prog opens a.txt itself rather than having its output redirected (they all require changing the code of ./prog).
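For example, a hedged Python sketch of one such change: before each write, the program checks whether the name a.txt still refers to the inode it has open, and reopens the file if not:

import os

log = open('a.txt', 'a')

def write_log(line):
    global log
    try:
        # Does the name still point at the inode we have open?
        same = os.stat('a.txt').st_ino == os.fstat(log.fileno()).st_ino
    except FileNotFoundError:
        same = False               # the name was deleted
    if not same:
        log.close()
        log = open('a.txt', 'a')   # recreate a.txt under the same name
    log.write(line + '\n')
    log.flush()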
You can use gdb to redirect the output of the program to a new file when the original file has been deleted.
Refer to this post.
For later reference, here is the relevant excerpt from the post:
Find the files that are opened by the process using /proc/<pid>/fd.
Attach gdb to the PID of the program.
Close the file descriptor of the deleted file through gdb session.
Redirect the program output to another file using gdb calls.
Example
Suppose that the PID of the program is 19080 and the file descriptor of the deleted file is 2.
ls -l /proc/19080/fd   # identify the descriptor of the deleted file
gdb -p 19080
(gdb) p close(2)
$1 = 0
(gdb) p fopen("/tmp/file", "w")
$2 = 20746416
(gdb) p fileno($2)
$3 = 7
(gdb) quit
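Note that in this session fopen happened to land on descriptor 7 (the $3 above) rather than on the descriptor 2 that was just closed. If the program keeps writing to descriptor 2, you would additionally need to duplicate the new descriptor onto it before quitting, e.g.:

(gdb) p dup2(7, 2)

(dup2 is the standard system call; the descriptor numbers follow the example above.)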
N.B.: If the data of the deleted file is required, recover it before closing the file descriptor:
cp -pv /proc/19080/fd/2 recovered_file.txt

Check if file was created and deleted?

I need to test a program that creates a temporary file. When the run finishes, it deletes the file. How can I check that the file has been created and deleted?
I am thinking about sending some signal to the process (like Ctrl-Z) to suspend it and check, but there should be a simpler way.
I am using bash in Linux.
Since you don't have access to the program's code, you could use the strace tool to intercept all the system calls issued by the process. Then, with simple greps, you can look for file creation, deletion and all related operations. You probably have to use the -f option to make sure everything is logged, including the operations performed by any of the process's children.
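A sketch (./the_program and the log name are illustrative):

strace -f -o trace.log ./the_program
grep -E 'open|creat|unlink' trace.log

The first command logs every system call of the program and its children to trace.log; the grep then surfaces the calls that create, open or delete files.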
If you can, with some certainty, know when the file will be created, you can use the link command to hang onto the file. For example, in the code you are testing there is a sequence like:
open some temporary file for writing
# do stuff
write stuff to open file
close file
unlink file
If, in between the open and unlink in the tested program, you can run
ln the_temp_file my_temp_file
then when the unlink occurs, the tested program will have no idea that you have a hard link to the file, so the data doesn't get removed from the file system.
This will not work with a symbolic link (ln -s); your link needs to be a hard link on the same filesystem (physical device) as the temporary file.
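If the window between creation and unlink is short, a small watcher can grab the hard link automatically (a best-effort Python sketch; the file names are illustrative, and the link must land before the program's unlink):

import os
import time

while True:
    try:
        os.link('the_temp_file', 'my_temp_file')   # hard link: same inode, second name
        break
    except FileNotFoundError:
        time.sleep(0.01)   # not created yet; poll again shortly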

Recover Deleted File Stuck In Linux shell process

I have a background process that has been running for a long time and using a file to write its logs. Its size has grown too large. I deleted the file and created a new one with the same name and the same permissions and ownership, but the new file does not get any entries.
The old file is marked as deleted and is still being used by the process, as can clearly be seen with the lsof command.
Please let me know if there is any way I can recover that file. Any help would be much appreciated.
If the file is still open by some process, you can recover it using the /proc filesystem.
First, check the file descriptor number under which that file is opened in that process. If the file is opened in a process with PID X, use the lsof command as follows:
lsof -p X
This will show a list of the files currently open in process X. The 4th column shows the file descriptor numbers, and the last column shows the file's path (a deleted file is marked as such); ignore the u, r and other flags after the file descriptor number, they just indicate whether the file is open for reading, writing, etc.
If the file descriptor number is Y, you can access its contents in /proc/X/fd/Y. So, something like this would recover it:
cp /proc/X/fd/Y /tmp/recovered_file
