I need to test a program that creates a temporary file and deletes it when the run finishes. How can I check that the file has been created and deleted?
I am thinking about sending signals to the process (like Ctrl-Z) to suspend it and check, but there should be a simpler way.
I am using bash in Linux.
Since you don't have access to the program's code, you could use the strace tool to intercept all the system calls issued by the process. Then, with simple greps, you can look for file creation, deletion, and related operations. You will probably have to use the "-f" option to make sure everything is logged, including the operations performed by any of the process's children.
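For example, a minimal sketch of the approach (./myprog and trace.log are placeholder names):
strace -f -o trace.log -e trace=open,openat,creat,unlink,unlinkat ./myprog
grep -E 'creat|O_CREAT' trace.log    # was the temporary file created?
grep unlink trace.log                # was it deleted?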
If you know, with some certainty, when the file will be created, you can use the link command to hang onto the file. For example, say the code you are testing contains a sequence like:
open some temporary file for writing
# do stuff
write stuff to open file
close file
unlink file
If, in between the open and unlink in the tested program, you can run
ln the_temp_file my_temp_file
then when the unlink occurs, the tested program will have no idea that you have a hard link to the file, so the file will not actually be removed from the file system.
This will not work with a symbolic link (ln -s), so your hard link will need to be on the same physical device (the same file system) as the temporary file.
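As a rough sketch of automating that race from bash (the program name and temp-file path are made up for illustration):
./the_tested_program &                                       # start the program under test
while kill -0 $! 2>/dev/null; do                             # while it is still running...
    ln /tmp/the_temp_file my_temp_file 2>/dev/null && break  # ...try to grab a hard link
done
wait
[ -e my_temp_file ] && echo "temp file was created; its contents survive in my_temp_file"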
I am writing a program that handles some data on a server. Throughout the program, many files are created and passed as input to other programs. To do this, I usually build the command string and then run it like so:
cmd = "prog input_file1 input_file2 > outputfile"
os.system(cmd)
When I run the command, however, the programs being called report that they cannot open the files. If I run the python code on my local computer, it is fine; when I loaded it onto the server, it started to fail. I think this is related to permissions issues, but I am not sure how to fix it. Many of the files, particularly the output files, are being created at run time. The input files have full permissions for all users. Any help or advice would be appreciated!
Cheers!
The python code you list is simple and correct, so the problem is likely not in the two lines of your example. Here are some related areas for you to check out.
Permissions
The user running the python script must have the appropriate permission (read, write, execute). I see from comments that you've already checked this.
What command are you running?
If the command is literally typed into your source code like in the example, then you know what command is being run; but if you are generating any part of it (e.g. the list of operands, the name of the output file, other parameters, etc.), make sure there are no bugs in the portions of your code that generate the command. For example, before the call to os.system(cmd), consider including a line like print("About to execute: " + cmd) so you can see exactly what will be run.
Directly invoke the command
If all of the above looks good, try executing the command directly at a terminal on your server. What output do you get then? It's possible that the problem is with the underlying command itself rather than with your python code.
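A quick sanity check on the server might look like this (names taken from the example above; substitute your real command):
which prog                                  # is the program on this user's PATH?
ls -l input_file1 input_file2               # do the inputs exist and are they readable?
prog input_file1 input_file2 > outputfile   # run the exact command by hand
echo "exit status: $?"                      # non-zero points at the command itself, not the python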
I have a script that reads a list of servers from an input file, one by one, and executes some commands on each server. I want to be able to update the input file while this script is running, without affecting the input of the first process, and re-run the script with the second list of servers. Can this be done safely?
When you run a command like my_script < file, the contents of file are fed into my_script on its standard input (as an open file descriptor). This decouples the contents from the name, meaning you can immediately modify/replace file in another process.
If you instead run a command like my_script file you're passing the name "file" to my_script, which may read from that file at any point (or write to it, delete it, etc.), thus you can't safely change file while the script is running. Notably this doesn't happen immediately; a long running process might not read from file until much later, after you've already edited the file.
Therefore if you design your program to read from stdin you can safely modify the input file and re-run the command while the first process is still running.
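A sketch of that pattern (the script and file names are hypothetical):
./run_servers.sh < servers.txt &   # first run reads the list through stdin
mv servers.txt servers.old         # safe: the running script keeps its open descriptor
cp servers.new servers.txt         # drop in the updated list
./run_servers.sh < servers.txt     # second run picks up the new list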
Let's say that your process is running and you want to change the file: just mv the file aside and copy your new input file into place. That way, if the process hasn't completely read the input file into memory, it will still have a file descriptor open to the previous file and will run unaffected. Of course, this all depends on how the process is implemented; if it tries to re-open the file during the course of execution, it will see the new file's contents.
process inputfile               # already running, holding a descriptor on the original file
mv inputfile inputfile.running  # (in another shell) move the original aside; the reader is unaffected
mv newinput inputfile           # put the new input in place for the next run
I thought about a concurrency issue (on Solaris): what happens if someone tries to delete a file while another process is reading it? I have a question regarding file existence on Solaris/Linux. Suppose I have a file test.txt; I have opened it in the vi editor, then opened a second session and removed the file, but even after deleting it I am still able to read it. So here are my questions:
1. Do I need to think about a locking mechanism while reading, so that no one is able to delete the file while it is being read?
2. What is the reason for the different behavior from Windows (on Windows, if a file is open in some editor, we cannot delete that file)?
3. After removing the file, how am I still able to read it if I haven't closed it in the vi editor?
I am asking about files in general, but yes, platform-specific, i.e. Unix. What will happen if I am using a Java program (BufferedReader) to read a file and the file is deleted while it is being read: will the BufferedReader still be able to read the next chunk of the file, or not?
You have basically 2 or 3 unrelated questions there. Text editors like to read the whole file into memory at the start of the editing session. Imagine every character you type being saved to disk immediately, with all characters after it in the file being rewritten one place further along to make room. That would be awful. Much better that the thing you're actually editing is a memory representation of the file (array of pointers to lines, probably with some metadata attached) which only gets converted back into a linear stream when you explicitly save.
Any relatively recent version of vim will notify you if the file you are editing is deleted from its original location with the message
E211: File "filename" no longer available
This warning is not just for unix. gvim on Windows will give it to you if you delete the file being edited. It serves as a reminder that you need to save the version you're working on before you exit, if you don't want the file to be gone.
(Note: the warning doesn't appear instantly - vim only checks for the original file's existence when you bring it back into the foreground after having switched away from it.)
So that's question 1, the behavior of text editors - there's no reason for them to keep the file open for the whole session because they aren't actually using it except at startup and during a save operation.
Question 2, why do some Windows editors keep the file open and locked - I don't know, Windows people are nuts.
Question 3, the one that's actually about unix, why do open files stay accessible after they're deleted - this is the most interesting one. The answer, guaranteed to shock you when presented directly:
There is no command, function, syscall, or any other method which actually requests deletion of a file.
Underlying rm and any other command that may appear to delete a file there is the system call unlink. And it's called unlink, not remove or deletefile or anything similar, because it doesn't remove a file. It removes a link (a.k.a. directory entry) which is an association between a file and a name in a directory. (Note: ANSI C added remove as a more generic function to appease non-unix people who had no intention of implementing unix filesystem semantics, but on unix, remove is just a rmdir if the target is a directory, and unlink for everything else.)
A file can have multiple links (see the ln command for how they are created), which means that the same file is known by multiple names. If you rm one of them, the others stick around and the file is not deleted. What happens when you remove the last link? Well, now you have a file with no name. But names are only one kind of reference to a file. There are at least 2 others: file descriptors and mmap regions. When the last reference to a file goes away, that's when the file is deleted.
Since references come in several forms, there are many kinds of events that can cause a file to be deleted. Here are some examples:
unlink (rm, etc.)
close file descriptor
dup2 (implicitly closes a file descriptor before replacing it with a copy of a different file descriptor)
exec (can cause file descriptors to be closed via close-on-exec flag)
munmap (unmap memory region)
mmap (if you create a new memory map at an address that's already mapped, the old mapping is unmapped)
process death (which closes all file descriptors and unmaps all memory mappings of the process)
normal exit
fatal signal generated by the kernel (^C, segfault)
fatal signal sent from another process (kill)
I won't call that a complete list. And I don't encourage anyone to try to build a complete list. Just know that rm is "remove name", not "remove file", and files go away as soon as they're not in use.
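You can watch this happen from a shell (the file name is just for the demo):
echo hello > demo.txt
exec 3< demo.txt   # take another kind of reference: an open file descriptor
rm demo.txt        # removes the name (the link), not the file
ls demo.txt        # "No such file or directory" - the name is gone
cat <&3            # prints "hello" - the data is still reachable through fd 3
exec 3<&-          # close the descriptor; the last reference is gone, and so is the file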
If you want to destroy the contents of a file immediately, truncate it. All processes already using it will find that its size has suddenly become 0. (This is destruction as far as the normal file access methods are concerned. To destroy it more thoroughly so that even someone with raw disk access can't read what used to be there, you need to overwrite it. There's a tool called shred for that.)
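For example (somefile is a placeholder, and these are two separate options, not a sequence):
truncate -s 0 somefile   # option 1: every process with the file open now sees size 0
shred -u somefile        # option 2: overwrite the on-disk blocks, then remove the name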
I think your question has nothing to do with the difference between Windows and Linux. It's about how vi works.
When you use vi to edit a file, vi creates a .swp file, and the .swp file is what you are actually editing. Because of that, another user deleting the original file does not affect your editing.
And when you type :w in vi, vi uses the .swp file to overwrite the original file.
I am trying to write a script or a piece of code to archive files, but I do not want to archive anything that is currently open. I need to find a way to determine which files in a directory are open. I want to use either Perl or a shell script, but can try other languages if needed. It will be in a Linux environment, and I do not have the option to use lsof. I have also had inconsistent results with fuser. Thanks for any help.
I am trying to take log files in a directory and move them to another directory. If the files are open, however, I do not want to do anything with them.
You are approaching the problem incorrectly. You wish to keep files from being modified underneath you while you are reading, and cannot do that without operating system support. The best that you can hope for in a multi-user system is to keep your archive metadata consistent.
For example, if you are creating the archive directory, make sure that the number of bytes stored in the archive matches what is in the directory. You can checksum the file contents before and after reading the filesystem, compare that with what you wrote to the archive, and perhaps flag the entry as "inconsistent" if they differ.
What are you trying to accomplish?
Added in response to comment:
Look at logrotate to steal ideas about how to handle this consistently, or just have it do the work for you. If you are concerned that renaming files will break processes that are currently writing to them, take a look at man 2 rename:
rename() renames a file, moving it between directories if required. Any other hard links to the file (as created using link(2)) are unaffected. Open file descriptors for oldpath are also unaffected.
If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.
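A rough sketch of a rotation built on those rename semantics (the file names and the SIGHUP-to-reopen convention are assumptions about your particular daemon):
mv app.log app.log.1                  # rename(2): a process writing to the open file keeps writing, now into app.log.1
kill -HUP "$(cat /var/run/app.pid)"   # many daemons reopen their log file on SIGHUP - check yours
gzip app.log.1                        # archive the rotated file at leisure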
Try ls -l /proc/*/fd/* as root.
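If you need to narrow that down to files open under one particular directory without lsof, a rough sketch (the directory path is a placeholder):
for fd in /proc/[0-9]*/fd/*; do
    target=$(readlink "$fd" 2>/dev/null)
    case $target in
        # print the /proc/<pid> directory of each process holding a file open there
        /var/log/myapp/*) echo "${fd%/fd/*} has open: $target" ;;
    esac
done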
msw has answered the question correctly, but if you want to see the list of open files and the processes holding them, the lsof command will give it to you.
How can I track changes to a specific directory on UNIX? For example, I launch some utility which creates some files during its execution. I want to know exactly which files were created during one particular launch. Is there a simple way to get this information? The problem is that:
I cannot flush the directory's contents after the script executes.
The files are created with names that contain a hash as one component, and there is no way to obtain this hash from the script for a subsequent search.
Several scripts could be executing simultaneously, and I do not want to see files created by another process in the same folder.
Please note that I do not want to know merely whether the directory has been changed, as stated here; I need the filenames, which ideally could be grepped to match a specific pattern.
You need to subscribe to file system change notifications.
You should use something like FAM, gamin, or inotify to detect when a file has been created, closed, etc.
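For example, a sketch with inotifywait from the inotify-tools package (the directory, pattern, and utility name are placeholders):
inotifywait -m -e create --format '%f' /path/to/dir > created.txt &
WATCHER=$!
./my_utility                      # the launch whose new files you want to capture
kill "$WATCHER"
grep 'your-pattern' created.txt   # the new filenames, ready to grep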
You could use strace -f myscript to trace all system calls made by the script, and use grep to filter the system calls that create new files.
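Something along these lines (myscript stands in for the utility you launch; the parsing is approximate):
strace -f -e trace=open,openat,creat -o calls.log ./myscript
grep O_CREAT calls.log | grep -o '"[^"]*"' | tr -d '"' | sort -u   # candidate names of created files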
You could use the Linux Auditing System. Here is a howto link:
http://www.cyberciti.biz/tips/linux-audit-files-to-see-who-made-changes-to-a-file.html
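As a sketch (run as root, with auditd running; the path and key are placeholders):
auditctl -w /path/to/dir -p w -k dirwatch                  # audit writes/creations under the directory
./my_utility
ausearch -k dirwatch | grep -o 'name="[^"]*"' | sort -u    # filenames touched during the run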
You can use the script command to track the commands launched.