How do I find missing files from a backup? - linux

So I want to back up all my music to an external hard drive. This worked well for the most part using Grsync, but some files didn't copy over because of encoding issues with their file names.
I would like to compare my two music directories (current and backup) to see what files were missed, so I can copy these over manually.
What is a good solution for this? Note that there are many, many files, so ideally I don't want a tool that wastes time comparing file contents. I just need to know whether a file that is in the source is missing from the backup.
Are there good command-line or GUI solutions that can do this in good time?

Go to the top-level directory of each tree and build a sorted list of its files:
find . -type f -print | sort > /tmp/listfile.txt
Do this once for each directory (writing to a different list file each time), and diff should help you spot the differences.
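For example, a minimal sketch of that approach (the directory paths are placeholders, substitute your own source and backup locations):
( cd /path/to/Music && find . -type f | sort > /tmp/source.txt )
( cd /path/to/backup/Music && find . -type f | sort > /tmp/backup.txt )
# comm -23 prints only the lines unique to the first file,
# i.e. files present in the source but missing from the backup
comm -23 /tmp/source.txt /tmp/backup.txt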

Related

Unix create multiple files with same name in a directory

I am looking for some kind of logic on Linux by which I can place files with the same name in a directory or file system.
For example, if I create a file abc.txt, then the next time any process creates abc.txt it should automatically check and create the file as abc.txt.1, the next time as abc.txt.2, and so on...
Is there a way to achieve this?
Any logic or third-party tools are also welcome.
You ask,
For example, if I create a file abc.txt, then the next time any process creates abc.txt it should automatically check and create the file as abc.txt.1
(emphasis added). To obtain such an effect automatically, for every process, without explicit provision by processes, it would have to be implemented as a feature of the filesystem containing the files. Such filesystems are called versioning filesystems, though typically the details are slightly different from what you describe. Most importantly, however, although such filesystems exist for Linux, none of them are mainstream. To the best of my knowledge, none of the major Linux distributions even offers one as a distribution-supported option.
Although it's a bit dated, see also Linux file versioning?
You might be able to approximate that for many programs via a customized version of the C standard library, but that's not foolproof, and you should not expect it to have universal effect.
It would be an altogether different matter for an individual process to be coded for such behavior. It would need to check for existing files and choose an appropriate name when opening each new file. In doing so, some care needs to be taken to avoid related race conditions, but it can be done. Details would depend on the language in which you are writing.
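As a rough illustration, here is a minimal Bash sketch of that per-process approach. It relies on noclobber so that creating the file fails (rather than silently overwriting) if another process wins the race; the abc.txt name is just an example:
#!/bin/bash
set -o noclobber          # make '>' fail if the target already exists
name=abc.txt
n=0
target=$name
# keep trying abc.txt, abc.txt.1, abc.txt.2, ... until one can be created
until { : > "$target"; } 2>/dev/null; do
    n=$((n + 1))
    target=$name.$n
done
echo "created $target"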
You can use Bash brace expansion to achieve this. For example, if I wanted to make 10 files all with the same name but each carrying a unique number, I would do the following:
# touch my_file{01..10}.txt
This would create 10 files numbered from 01 all the way to 10. This method is also handy for looping over files in a sequence, or if you're also creating directories.
Now, if I am reading your question right, you're asking that when you move or create a file in a directory, a script should automatically create a new, uniquely named file for you. If that is the case, then just use a test: if a file with that name is already present, move it aside and mark it. Personally I use timestamps to do so.
Logic:
# The [ -f ] test checks whether a file with that name is already present
if [ -f "$MY_FILE_NAME" ]; then
    # If it is, move it aside and append this shell's PID
    # so that the old name stays unique
    mv "$MY_FILE_NAME" "${MY_FILE_NAME}_$$"
    mv "$MY_NEW_FILE" .
else
    # No clash: just move (or create) the file here
    mv "$MY_NEW_FILE" .
fi
As you can see the logic is very simple. Hope this helps.
Cheers
I don't know about your particular use case, but you may try looking at logrotate:
https://wiki.archlinux.org/index.php/Logrotate

Finding files that are hard links in Solaris under a specific folder

I need to find hard-linked files under a specific folder in Solaris. I tried the command below, which lists files based on their link count.
find . -type f -links +1
The above command lists both the source and target files, but I need to list only the target file.
For example, under the Test folder there is source.txt:
Test
->source.txt
Created hardlink:
ln source.txt target.txt
The above find command returns both source.txt and target.txt, but I need a command that fetches only target.txt. Is that possible?
No. Once the hard link exists, both names of the file are equal in every way; there is no original and no copy.
Since they share the underlying inode, both names have the same attributes -- change one and you change them all.
Either switch to symbolic links, or find a heuristic for choosing which name you don't want to see, e.g. it has a particular extension, or it sorts later.
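If a "show every name after the first" heuristic is good enough, a sketch along these lines may work: it groups the names by inode number and hides the first name found for each inode (which name survives is arbitrary, and it breaks on file names containing spaces):
find . -type f -links +1 -exec ls -i {} \; |
sort -n |
awk 'seen[$1]++ { print $2 }'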

Is it possible to delete the "C:\cygwin64\usr\share\" directory to decrease the Cygwin library size?

I have the Cygwin package library installed on my system (Win7 x64) at C:\Cygwin64\.
That directory contains over 185,000 files, and its size passed 5 GB this week, and that is without the package source directory.
Now I want to decrease that size, and of course I'm going to uninstall some packages that I don't need anymore. But first I want to ask about deleting a specific directory, located at C:\cygwin64\usr\share.
(Please forgive my ignorance if my question is silly.)
While I was trying to figure out the cause of that large file count, I noticed that this directory alone has over 90,000 files!
I don't know what that directory is used for, but would someone please tell me whether I can delete that folder safely, without affecting the installed packages? - Thanks :)
I cannot speak for the entirety of the folder, but awk uses it for include files, which I would miss:
delete a column with awk or sed
awk - how to delete first column with field separator
how to remove the first two columns in a file using shell (awk, sed, whatever)
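Before deleting anything under /usr/share, one way to check what you would break is to ask Cygwin which installed package owns a given file; cygcheck can do that (the awk file below is only an example path, substitute whatever you are considering deleting):
# run from a Cygwin shell; prints the package that installed the file
cygcheck -f /usr/share/awk/getopt.awk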

How do you search for all the files that contain a particular string?

Let's say you're working on a big project with multiple files, directories, and subdirectories. In one of these directories/subdirectories/files, you've defined a method, but now you want to know exactly which files in your entire project have been calling your method. How do you do this?
You mentioned grep so I'll throw this solution out there. A more robust solution would be to implement a version control system as Fibbe suggested.
find . -exec grep 'method_name' {} \; -print 2> /dev/null
The idea is, for each file that is found in the current directory and sub-directories, a grep for 'method_name' is executed on that file. The 2> /dev/null is nice if you don't want to get warned about all of the directories and files you don't have access to.
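If your grep supports recursive search, a simpler sketch of the same idea is the following, which prints the file name and line number for every match:
grep -rn 'method_name' .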
The most common way to do this is by using your editor. For example, Emacs can do this if you create a tag index with etags.
Source: http://www.gnu.org/software/emacs/emacs-lisp-intro/html_node/etags.html
Then you just press M-. and type the name of the function you want to visit, and Emacs will take you there.
I don't know what system or which editor you are using, but most editors have a similar function.
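For Emacs specifically, a rough sketch of building the TAGS index that M-. relies on, assuming a C project (delete any stale TAGS file first, since -a appends):
rm -f TAGS
find . -name '*.[ch]' -print | xargs etags -a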
If you don't use Emacs, another good way to keep track of functions, and to get lots of other good features, is to use a version control system such as Git, which provides really fast search.
If you don't use a version control system, you may want to look at a program designed just for searching, like OpenGrok.
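For example, if the project is already a Git repository, this one-liner searches only the tracked files, which is usually very fast:
git grep -n 'method_name'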

How can you tell what files are currently open by any user?

I am trying to write a script or a piece of code to archive files, but I do not want to archive anything that is currently open. I need to find a way to determine which files in a directory are open. I want to use either Perl or a shell script, but can try to use other languages if needed. It will be in a Linux environment and I do not have the option to use lsof. I have also had inconsistent results with fuser. Thanks for any help.
I am trying to take log files in a directory and move them to another directory. If the files are open however, I do not want to do anything with them.
You are approaching the problem incorrectly. You wish to keep files from being modified underneath you while you are reading them, and you cannot do that without operating system support. The best that you can hope for in a multi-user system is to keep your archive metadata consistent.
For example, if you are creating the archive directory, make sure that the number of bytes stored in the archive matches what is in the source directory. You can checksum the file contents before and after reading from the filesystem, compare that with what you wrote to the archive, and perhaps flag the entry as "inconsistent".
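A minimal sketch of that checksum idea (the $src and $archive_dir variables are placeholders for your own paths):
before=$(md5sum "$src" | cut -d' ' -f1)   # checksum before copying
cp "$src" "$archive_dir/"
after=$(md5sum "$src" | cut -d' ' -f1)    # checksum after copying
# if they differ, the file was modified while we were archiving it
[ "$before" = "$after" ] || echo "warning: $src changed during archiving" >&2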
What are you trying to accomplish?
Added in response to comment:
Look at logrotate to steal ideas about how to handle this consistently, or just have it do the work for you. If you are concerned that renaming files while processes are still writing to them will break things, take a look at man 2 rename:
rename() renames a file, moving it between directories if required. Any other hard links to the file (as created using link(2)) are unaffected. Open file descriptors for oldpath are also unaffected.
If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.
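In other words, a writer that keeps its descriptor open is not disturbed by the rename. A rough sketch of the rotate-then-reopen pattern (the paths and the HUP convention are assumptions about your particular logger):
mv /var/log/myapp.log /var/log/myapp.log.1   # writers keep writing to the renamed file
# then ask the writer to reopen its log, if it honours that convention, e.g.:
# kill -HUP "$(cat /var/run/myapp.pid)"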
Try ls -l /proc/*/fd/* as root.
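Building on that, a small sketch that reports which processes currently have a particular file open by scanning /proc (the log path is just an example; run it as root so all fd directories are readable):
for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink "$fd" 2>/dev/null)" = "/var/log/myapp.log" ]; then
        echo "open via $fd"
    fi
done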
msw has answered the question correctly, but if you want to find the list of open files, the lsof command will give it to you.
