How to find the source of a copied file? - linux

I have a file that I copied some time back, but I forgot its source. Is there a way to find the source of a copied file? I don't remember which terminal I used, so I can't recall the command by checking with Esc+P.
Command used: cp -rf $source/file $destination/file
Thanks in advance!

You could try history | grep your_filename.

A Linux system has many files (and if you think of /proc/, they can change at every moment), and other processes can write, create, append to, or truncate files at any time (e.g. some crontab(1) job...)
Assume you do know some parent directory containing the source file. Suppose it is /home/foo.
Then, you might use find(1) and some hashing command like md5sum(1) to compute and collect the hash of every file.
Use the property that two files A and B with identical contents (the same sequence of bytes) have the same md5sum. The converse is not guaranteed, but a collision is unlikely in practice.
So run first
find /home/foo -type f -exec md5sum '{}' \; > /tmp/foo-md5
then do seekingmd5=$(md5sum A | cut -d' ' -f1) to keep just the hash (md5sum also prints the file name),
then grep "^$seekingmd5" /tmp/foo-md5 will find the lines for files having the same md5 as your original A
Depending on your filesystem and hardware, this could take hours.
You could slightly accelerate things by writing a C program using nftw(3) with MD5_Init(3) etc...
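Putting the three steps together, here is a minimal sketch, demonstrated on a throwaway directory so it is self-contained (the paths and file names are illustrative, not from the question):

```shell
# Demo setup: a throwaway tree with a duplicate of file A inside it.
root=$(mktemp -d)
mkdir -p "$root/sub"
printf 'hello\n' > "$root/A"            # the file whose source we seek
printf 'hello\n' > "$root/sub/copy"     # same content, different name
printf 'other\n' > "$root/unrelated"

# Step 1: hash every regular file under the search root.
find "$root" -type f -exec md5sum '{}' \; > /tmp/foo-md5

# Step 2: md5sum prints "<hash>  <name>"; keep only the hash of A.
seekingmd5=$(md5sum "$root/A" | cut -d' ' -f1)

# Step 3: list every recorded file whose content hashes to the same value.
grep "^$seekingmd5" /tmp/foo-md5
```

Here both A and sub/copy are printed, while unrelated is not.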

Related

Bash Scripting with xargs to BACK UP files

I need to copy a file from multiple locations to the BACK UP directory by retaining its directory structure. For example, I have a file "a.txt" at the following locations /a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt, I now need to copy this file from multiple locations to the backup directory /tmp/backup. The end result should be:
when i list /tmp/backup/a --> it should contain /b/a.txt /c/a.txt /d/a.txt & /e/a.txt.
For this, I had used the command: echo /a/*/a.txt | xargs -I {} -n 1 sudo cp --parent -vp {} /tmp/backup. This is throwing the error "cp: cannot stat '/a/b/a.txt /a/c/a.txt a/d/a.txt a/e/a.txt': No such file or directory"
The -I option is taking the complete input line from echo instead of individual values (the way -n 1 would). If someone can help debug this issue, that would be very helpful, rather than providing an alternative command.
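Staying with xargs -I as requested, the fix is to feed it one path per line. A hedged sketch on a throwaway tree mirroring the /a/*/a.txt layout (sudo dropped for the demo; GNU cp assumed for --parents; -n 1 is redundant here, since -I already consumes one line at a time):

```shell
# Demo setup (hypothetical directories standing in for /a/b, /a/c, ...).
src=$(mktemp -d); dst=$(mktemp -d)
for d in b c d e; do mkdir -p "$src/$d"; echo hi > "$src/$d/a.txt"; done

# printf emits one path per line, so xargs -I {} substitutes one file at a
# time. (echo joins all names into a single line, and -I then hands that
# whole line to cp as one file name, hence the "cannot stat" error.)
printf '%s\n' "$src"/*/a.txt | xargs -I {} cp --parents -vp {} "$dst"

# Each copy keeps its directory structure under the backup root.
find "$dst" -name a.txt
```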
Use rsync with the --relative (-R) option to keep (parts of) the source paths.
I've used a wildcard for the source to match your example command rather than the explicit list of directories mentioned in your question.
rsync -avR /a/*/a.txt /tmp/backup/
Do the backups need to be exactly the same as the originals? In most cases, I'd prefer a little compression. [tar](https://man7.org/linux/man-pages/man1/tar.1.html) does a great job of bundling things including the directory structure.
tar cvzf /path/to/backup/tarball.tgz /source/path/
tar can't update compressed archives, so if you want to update the archive in place, skip the compression:
tar uf /path/to/backup/tarball.tar /source/path/
This gives you versioning of a sort, as it only appends changed files, keeping both the before and after versions.
If you have time and cycles and still want the compression, you can decompress before and recompress after.
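That decompress/update/recompress round trip can be sketched like this (throwaway paths; the sleep just guarantees the updated file has a newer timestamp than the archived copy):

```shell
# Demo: update a compressed tarball by round-tripping the compression.
work=$(mktemp -d); cd "$work"
mkdir src; echo v1 > src/file.txt
tar czf backup.tgz src                 # initial compressed backup

sleep 1                                # ensure a newer mtime for the update
echo v2 > src/file.txt                 # the source changes later
gunzip backup.tgz                      # -> backup.tar (tar -u needs plain tar)
tar uf backup.tar src                  # append only files newer than archived
gzip backup.tar                        # -> backup.tar.gz
mv backup.tar.gz backup.tgz

tar tzf backup.tgz                     # src/file.txt is listed twice
```

On extraction the later entry wins, so you get the newest version back, while the older copy still sits in the archive.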

Find out the path to the original file of a hard link

Now, I get the feeling that some people will think that there was no original file of a hard link, but I would strongly disagree because of the following experiment I did.
Let's create a file with the content pwd and make a hard link to a subfolder:
echo "pwd" > original
mkdir subfolder
cp -l original subfolder/hardlink
Now let's see what the files output if I run it with shell:
sh original
sh subfolder/hardlink
The output is the same, even though the file hardlink is in a subfolder!
Sorry for the long intro, but I wanted to make sure that nobody says my following question is irrelevant.
So my question now is: If the content of the original file was not conveniently pwd, how do I find out the path to the original file from a hard link file?
I know that linux programs seem to know the path somehow, but not the filename, because some programs returned error messages that <path to original file>/hardlinkname was not found. But how do they do that?
Thanks in advance for an answer!
Edit: Btw, I fixed the error messages mentioned above by naming the hard links the same as the original file.
But how do they do that?
By looking for the same inode value. Here's one way you can list files with the same inode:
find /home -xdev -samefile original
replace /home with any other starting directory for find to start searching.
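A self-contained illustration of the inode mechanics, using the question's own original/hardlink setup (GNU stat assumed for the -c %i syntax):

```shell
# Demo: hard links share an inode; stat exposes it, find matches on it.
dir=$(mktemp -d)
echo data > "$dir/original"
mkdir "$dir/subfolder"
cp -l "$dir/original" "$dir/subfolder/hardlink"   # hard link, as in the question

# Both names report the same inode number.
ino=$(stat -c %i "$dir/original")
stat -c %i "$dir/subfolder/hardlink"              # prints the same number

# find can enumerate every name, by example file or by inode number:
find "$dir" -xdev -samefile "$dir/original"
find "$dir" -xdev -inum "$ino"
```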
how do I find out the path to the original file from a hard link file?
For hard links there are no multiple files, just one file (inode) with multiple (file) names.
ADDENDUM:
is there no other way to find the hard links of an inode than searching through folders?
ln, ls, find, and stat are the common ways of discovering and querying the filesystem for inodes. Then depending on what next you want to accomplish, many file, directory, archiving, and searching commands recognize inode values. Some may require a special -inum or --follow or equivalent option to specify inodes.
The find example I gave above is just one such usage. Another is to combine with xargs to operate on all the found files. Here's one way to delete them all:
find /home -xdev -samefile original | xargs rm
Look under --help for other standard os commands. Most Linux distributions also come with help files that explain inodes and which tools work with inodes.
pwd prints the present working directory, so of course the output should be the same, since you didn't cd into your subfolder before running it.
Sorry to say, but there is no "original" file if you create other hardlinks. If you want to get other hardlinks of a file, look at How to find all hard links to a given file? for example.
Agree with @Emacs User. Your example with pwd is irrelevant and has confused you.
There is no concept of an original file for hard links. The file names just act as references, counted against the content on disk pointed to by the inode (see ls -li original subfolder/hardlink). So even if you delete the original file, the hard link still points to the same content.
It is impossible to find out as all hard links are treated the same way pointing to one inode.

How to list recently deleted files from a directory?

I'm not even sure if this is easily possible, but I would like to list the files that were recently deleted from a directory, recursively if possible.
I'm looking for a solution that does not require the creation of a temporary file containing a snapshot of the original directory structure against which to compare, because write access might not always be available. Edit: If it's possible to achieve the same result by storing the snapshot in a shell variable instead of a file, that would solve my problem.
Something like:
find /some/directory -type f -mmin -10 -deletedFilesOnly
Edit: OS: I'm using Ubuntu 14.04 LTS, but the command(s) would most likely be running in a variety of Linux boxes or Docker containers, most or all of which should be using ext4, and to which I would most likely not have access to make modifications.
You can use the debugfs utility,
debugfs is an interactive file system debugger for ext2/ext3/ext4 file
systems, designed for examining and changing their state
First, run debugfs /dev/hda13 in your terminal (replacing /dev/hda13 with your own disk/partition).
(NOTE: You can find the name of your disk by running df / in the terminal).
Once in debug mode, you can use the command lsdel to list inodes corresponding with deleted files.
When files are removed in Linux they are only unlinked; their
inodes (the on-disk structures recording where the file data actually
lives) are not immediately removed
To get the paths of these deleted files you can use debugfs -R "ncheck 320236" /dev/hda13, replacing the number with your particular inode (and the device with your own).
Inode Pathname
320236 /path/to/file
From here you can also inspect the contents of deleted files with cat. (NOTE: You can also recover from here if necessary).
Great post about this here.
So a few things:
1. You may have zero success if your partition is ext2; it works best with ext4.
2. Find your partition: df /
3. Open it in debugfs, using the device shown by step 2; in my case: sudo debugfs /dev/mapper/q4os--desktop--vg-root
4. lsdel
5. q (to exit out of debugfs)
6. sudo debugfs -R 'ncheck 528754' /dev/sda2 2>/dev/null (replace the number with one from step 4)
Thanks for your comments & answers guys. debugfs seems like an interesting solution to the initial requirements, but it is a bit overkill for the simple & light solution I was looking for; if I'm understanding correctly, the kernel must be built with debugfs support and the target directory must be in a debugfs mount. Unfortunately, that won't really work for my use-case; I must be able to provide a solution for existing, "basic" kernels and directories.
As this seems virtually impossible to accomplish, I've been able to negotiate and relax the requirements down to listing the amount of files that were recently deleted from a directory, recursively if possible.
This is the solution I ended up implementing:
A simple find command piped into wc to count the original number of files in the target directory (recursively). The result can then easily be stored in a shell or script variable, without requiring write access to the file system.
DEL_SCAN_ORIG_AMOUNT=$(find /some/directory -type f | wc -l)
We can then run the same command again later to get the updated number of files.
DEL_SCAN_NEW_AMOUNT=$(find /some/directory -type f | wc -l)
Then we can store the difference between the two in another variable and update the original amount.
DEL_SCAN_DEL_AMOUNT=$(($DEL_SCAN_ORIG_AMOUNT - $DEL_SCAN_NEW_AMOUNT));
DEL_SCAN_ORIG_AMOUNT=$DEL_SCAN_NEW_AMOUNT
We can then print a simple message if the number of files went down.
if [ $DEL_SCAN_DEL_AMOUNT -gt 0 ]; then echo "$DEL_SCAN_DEL_AMOUNT deleted files"; fi;
Return to step 2.
Unfortunately, this solution won't report anything if the same amount of files have been created and deleted during an interval, but that's not a huge issue for my use case.
To circumvent this, I'd have to store the actual list of files instead of the amount, but I haven't been able to make that work using shell variables. If anyone could figure that out, it would help me immensely, as it would meet the initial requirements!
I'd also like to know if anyone has comments on either of the two approaches.
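For what it's worth, here is one hedged sketch of the list-based variant the question asks about, holding the snapshot in a shell variable and diffing it with comm. It assumes bash (for process substitution) and breaks on file names containing newlines:

```shell
# Demo setup on a throwaway directory.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# Snapshot the listing itself in a variable; no write access to disk needed.
DEL_SCAN_ORIG_LIST=$(find "$dir" -type f | sort)

rm "$dir/b"                                      # something gets deleted...

DEL_SCAN_NEW_LIST=$(find "$dir" -type f | sort)

# comm -23 prints lines present only in the first input: the deleted paths.
deleted=$(comm -23 <(printf '%s\n' "$DEL_SCAN_ORIG_LIST") \
                   <(printf '%s\n' "$DEL_SCAN_NEW_LIST"))
echo "$deleted"
```

Unlike the counting approach, this also catches the case where equal numbers of files were created and deleted in the same interval.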
Try:
lsof -nP | grep -i deleted
history >> history.txt
Look for all rm statements.

cronjob to remove files older than 99 days

I have to make a cronjob to remove files older than 99 days in a particular directory but I'm not sure the file names were made by trustworthy Linux users. I must expect special characters, spaces, slash characters, and others.
Here is what I think could work:
find /path/to/files -mtime +99 -exec rm {} \;
But I suspect this will fail if there are special characters or if it finds a file that's read-only, (cron may not be run with superuser privileges). I need it to go on if it meets such files.
When you use -exec rm {} \;, you shouldn't have any problems with spaces, tabs, returns, or special characters because find calls the rm command directly and passes it the name of each file one at a time.
Directories won't be removed with that command because you aren't passing the -r parameter, and you probably don't want to; that could end up being a bit dangerous. You might also want to include the -f parameter to force removal in case you don't have write permission. Run the cron script as root, and you should be fine.
The only thing I'd worry about is that you might end up hitting a file that you don't want to remove, but has not been modified in the past 100 days. For example, the password to stop the autodestruct sequence at your work. Chances are that file hasn't been modified in the past 100 days, but once that autodestruct sequence starts, you wouldn't want the one to be blamed because the password was lost.
Okay, more reasonable might be applications that are used but rarely modified. Maybe someone's resume that hasn't been updated because they are holding a current job, etc.
So, be careful with your assumptions. Just because a file hasn't been modified in 100 days doesn't mean it isn't used. A better criterion (although still questionable) is whether the file has been accessed in the last 100 days. Maybe this as a final command:
find /path/to/files -atime +99 -type f -exec rm -f {} \;
One more thing...
Some find implementations have a -delete action which can be used instead of -exec rm:
find /path/to/files -atime +99 -delete
That will delete found files as well as found directories, though a directory is only removed if it is already empty.
One more small recommendation: for the first week, write the files found to a log file instead of removing them, and then examine the log. Once you're satisfied that there's nothing in the log you want to keep, revert the find command to do the delete for you.
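A sketch of that log-first week, with a backdated file standing in for a stale one (GNU touch assumed for -d '120 days ago'; the paths and crontab line are illustrative):

```shell
# Demo setup: one fresh file, one file last touched 120 days ago.
dir=$(mktemp -d); log=$(mktemp)
touch "$dir/recent"
touch -d '120 days ago' "$dir/stale"    # backdates atime and mtime

# Dry run: log the deletion candidates instead of removing them.
find "$dir" -atime +99 -type f >> "$log"
cat "$log"                              # only the stale file shows up

# Once a week of logs looks safe, install the real job, e.g. in crontab:
# 30 2 * * * find /path/to/files -atime +99 -type f -exec rm -f -- {} \;
```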
If you run rm with the -f option, your file is going to be deleted regardless of whether you have write permission on the file itself (what matters is write permission on the containing folder). So either you can erase all the files in the folder, or none. Also add -r if you want to erase subfolders.
And I have to say it: be very careful! You're playing with fire ;) I suggest you debug with something less harmful, like the file command.
You can test this out by creating a bunch of files like, e.g.:
touch {a,b,c,d,e,f}
And setting permissions as desired on each of them
You should use -execdir instead of -exec. Even better, read the full Security considerations for find chapter in the findutils manual.
Please, always use rm [opts] -- [files]; this will save you from errors with files named like -rf, which would otherwise be parsed as options. The -- marks the end of the options, so everything after it is treated as a file name.
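A quick demonstration of why the -- matters, using a throwaway file actually named -rf:

```shell
# Demo: a file literally named "-rf" confuses rm unless options are ended.
dir=$(mktemp -d); cd "$dir"
touch -- '-rf'                 # a hostile file name

rm '-rf' 2>/dev/null           # rm parses this as options -r -f, not a name
test -e ./-rf && echo "still there"

rm -- '-rf'                    # "--" ends option parsing; now it is removed
test -e ./-rf || echo "gone"
```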

Linux shell:Is it possible to speedup finding files using "find" by using a predefined list of files/folders?

I primarily program on Linux, using the tcsh shell. By default, my current directory is the root of my code base; I use "find" to locate whichever file I'm interested in modifying, and once find shows the file's location, I can edit it in Vim.
The problem is that, due to the size of the code base, every time I ask find for a file's location it takes at least 4-5 seconds to complete the search: long enough to be annoying, yet too short to use for anything else!! So, since the rate at which new files are added to the code base is very small, I'm looking for a way to do the following:
1) Generate the list of all files in my code base
2) Have find look in only those locations/files to answer my query
I've seen how opening up files in cscope is lightning fast, as it stores the list of files previously. I'd like to use the same mechanism for find, just not from within the cscope window, but from the generic cmd line.
Any ideas ?
Install the locate, mlocate, or slocate package from your distribution, and either wait for cron to run the update task :) or run the updatedb command manually via the /etc/cron.daily/mlocate or similar file.
$ time locate kernel.txt
/home/sarnold/Local/linux-2.6/Documentation/sysctl/kernel.txt
/home/sarnold/Local/linux-2.6-config-all/Documentation/sysctl/kernel.txt
/home/sarnold/Local/linux-apparmor/Documentation/sysctl/kernel.txt
/usr/share/doc/libfuse2/kernel.txt.gz
real 0m0.595s
Yes. See slocate (or updatedb & locate).
The -U flag is particularly interesting because you can index just the directory that contains your code (and thus, updating or creating the database will be quick).
You could write a list of directories to a file and use them in your find command:
$ find /path/to/src -type d > dirs
$ find $(cat dirs) -type f -name "foo"
Alternatively, write a list of files to a file and use grep on it. The list of files is more likely to change than the list of dirs though.
$ find /path/to/src -type f > files
$ vi $(grep foo files)
Using find in conjunction with xargs (instead of -exec) can differ significantly in execution time:
http://forrestrunning.wordpress.com/2011/08/01/find-exec-xargs/
