Linux: Find a list of files in a directory recursively

I have a text file with one filename per row:
Interpret 1 - Song 1.mp3
Interpret 2 - Song 2.mp3
...
(About 200 filenames)
Now I want to search a folder recursively for these filenames to get the full path for each filename in Filenames.txt.
How do I do this? :)
(Purpose: I copied files to my MP3 player, but some of them are broken and I want to recopy them all without spending hours hunting for them in my music folder.)

The easiest way may be the following:
while IFS= read -r file ; do find /dest/directory -name "$file" ; done < orig_filenames.txt > output_file_with_paths

A much faster way is to run the find command only once and filter its output with fgrep:
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -J % cp % /path/to/destdir

You can use a while read loop along with find:
filecopy.sh
#!/bin/bash
while IFS= read -r line
do
find . -iname "$line" -exec cp '{}' /where/to/put/your/files \;
done < list_of_files.txt
Where list_of_files.txt is the list of files line by line, and /where/to/put/your/files is the location you want to copy to. You can just run it like so in the directory:
$ bash filecopy.sh

+1 for @jm666's answer, but the -J option doesn't work with my flavor of xargs, so I changed it to:
find . -type f -print0 | fgrep -zFf ./file_with_filenames.txt | xargs -0 -I{} cp "{}" /path/to/destdir/
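Note that the fgrep pipelines above match each listed name as a substring of the whole path, so a list entry can also match a directory name or a longer filename. A minimal sketch of a single-pass variant that compares only the basename, assuming no filename contains a newline (the helper name find_listed is hypothetical, not from the answers above):

```shell
# find_listed LIST_FILE SEARCH_DIR   (hypothetical helper name)
# Prints the path of every regular file under SEARCH_DIR whose basename
# is exactly equal to one of the lines in LIST_FILE.
find_listed() {
  list_file=$1
  search_dir=$2
  find "$search_dir" -type f -print |
    awk -F/ '
      FILENAME == ARGV[1] { wanted[$0] = 1; next }  # first input: the name list
      ($NF in wanted)                               # second input: paths from find
    ' "$list_file" -
}
```

It could be called as find_listed Filenames.txt /path/to/music > output_file_with_paths, after which the resulting list can be inspected before anything is copied.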

Related

delete all files except a pattern list file

I need to delete all the files in the current directory except a list of patterns that are described in a whitelist file (delete_whitelist.txt) like this:
(.*)dir1(/)?
(.*)dir2(/)?
(.*)dir2/ser1(/)?(.*)
(.*)dir2/ser2(/)?(.*)
(.*)dir2/ser3(/)?(.*)
(.*)dir2/ser4(/)?(.*)
(.*)dir2/ser5(/)?(.*)
How can I perform this in one bash line?
Any bash script can fit on one line:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 printf '%s\n'
Check the output and then, if it's OK:
find . -type f -print0 | grep -EzZvf delete_whitelist.txt | xargs -0 rm
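Where grep lacks the -z option, a newline-delimited dry run can still show what would be deleted. This is only a sketch, assuming no filename contains a newline; the helper name list_unlisted is hypothetical:

```shell
# list_unlisted WHITELIST_FILE DIR   (hypothetical helper name)
# Prints the files under DIR that match none of the extended regular
# expressions in WHITELIST_FILE. Nothing is deleted.
# WHITELIST_FILE must be an absolute path because of the cd below.
list_unlisted() {
  whitelist=$1
  dir=$2
  (cd "$dir" && find . -type f -print | grep -E -v -f "$whitelist")
}
```

For example, list_unlisted "$PWD/delete_whitelist.txt" . prints the candidates without removing anything.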

Find/list only file types recursive in a directory

How can I search for files and output only a list of their MIME types or extensions?
Example:
dir
-file1.pdf
-file2.pdf
-dir2
--file3.png
--file4.pdf
Desired output:
pdf
png
Edit
I also found a solution, but it makes no allowance for upper- vs. lowercase, or for .peng vs. .png:
find . -type f -printf '%f\n' | sed 's/^.*\.//' | sort -u
Use find's -exec option together with any solution that extracts the extension. Then pipe through sort -u to remove duplicates.
find dir -type f -exec bash -c 'printf %s\\n "${@##*.}"' x {} + | sort -u
Files without extensions will be listed too. To filter them out add the option -name '*\.*' before -exec.
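If uppercase and lowercase extensions should count as one type (an assumption about what the asker wants), the list can be folded to lowercase before duplicates are removed. A minimal sketch, assuming no filename contains a newline; the helper name list_extensions is hypothetical:

```shell
# list_extensions DIR   (hypothetical helper name)
# Prints each distinct file extension under DIR once, in lowercase.
# Files without a dot in their name are skipped by -name '*.*'.
list_extensions() {
  find "$1" -type f -name '*.*' -print |
    sed 's/.*\.//' |               # keep only the text after the last dot
    tr '[:upper:]' '[:lower:]' |   # treat PDF and pdf as the same type
    sort -u
}
```

Hidden files such as .bashrc contain a dot too, so they would be listed as an extension of their own.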

linux bash count size of files in folder

I saw a few posts in the forum but I can't manage to make them work for me.
I have a script that runs in a folder, and I want it to count the size of only the files in that folder, without the folders inside.
So if I have
file1.txt
folder1
file2.txt
it will return the size in bytes of file1 + file2, without folder1.
find . -maxdepth 1 -type f
gives me a list of all the files I want to count, but how can I get the total size of all these files?
The tool for this is xargs:
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 wc -c
Note that find -print0 and xargs -0 originated as GNU extensions rather than standard options; if you know they are available, they are well worth using in your script, because you don't know what characters might be present in the filenames in the target directory.
You will need to post-process the output of wc; alternatively, use cat to give it a single input stream, like this:
find "$dir" -maxdepth 1 -type f -print0 | xargs -0 cat | wc -c
That gives you a single number you can use in following commands.
(I've assumed you meant "size" in bytes; obviously substitute wc -m if you meant characters or wc -l if you meant lines).
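The same total can also be obtained without xargs by letting find batch the filenames itself with -exec ... +. A minimal sketch (the helper name dir_file_bytes is hypothetical; -maxdepth is not a standard find option, although GNU and BSD find both accept it):

```shell
# dir_file_bytes DIR   (hypothetical helper name)
# Prints the combined size in bytes of the regular files directly
# inside DIR, ignoring anything in subdirectories.
dir_file_bytes() {
  find "$1" -maxdepth 1 -type f -exec cat {} + | wc -c
}
```

Because the filenames never pass through a pipe, spaces and newlines in names need no special handling here.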

Is it possible to pipe the results of FIND to a COPY command CP?

Is it possible to pipe the results of find to a COPY command cp?
Like this:
find . -iname "*.SomeExt" | cp Destination Directory
When searching, I always find this kind of formula, such as this one from another post:
find . -name "*.pdf" -type f -exec cp {} ./pdfsfolder \;
This raises some questions:
Why can't you just use a | pipe? Isn't that what it's for?
Why does everyone recommend -exec?
How do I know when to use -exec over a pipe |?
There's a little-used option for GNU cp, -t destination (see the man page):
find . -iname "*.SomeExt" | xargs cp -t Directory
Good question!
Why can't you just use a | pipe? Isn't that what it's for?
You can pipe, of course; xargs is made for these cases:
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
Why does everyone recommend -exec?
The -exec is good because it provides more control of exactly what you are executing. Whenever you pipe there may be problems with corner cases: file names containing spaces or newlines, etc.
How do I know when to use -exec over a pipe |?
It is really up to you and there can be many cases. I would use -exec whenever the action to perform is simple. I am not a big fan of xargs; I tend to prefer an approach in which the find output is fed to a while loop, such as:
while IFS= read -r result
do
# do things with "$result"
done < <(find ...)
You can use | like below:
find . -iname "*.SomeExt" | while IFS= read -r line
do
cp "$line" DestDir/
done
Answering your questions:
| can be used to solve this issue, but as seen above it involves more code. Moreover, the pipeline adds a shell loop between find and cp.
With -exec, find runs cp itself, so no pipe or loop is needed (cp is still started as a separate process for each file).
Try this:
find . -iname "*.SomeExt" -print0 | xargs -0 cp -t Directory
# ........................^^^^^^^..........^^
In case there is whitespace in filenames.
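For comparison, the -exec form needs no special handling of whitespace at all, because find passes each path to cp as a single argument. A minimal sketch (the helper name copy_by_ext is hypothetical, and the destination is assumed to lie outside the searched tree so that find does not pick up the copies):

```shell
# copy_by_ext SRC_DIR EXT DEST_DIR   (hypothetical helper name)
# Copies every regular file under SRC_DIR whose name ends in .EXT
# (compared case-insensitively) into DEST_DIR.
copy_by_ext() {
  src=$1
  ext=$2
  dest=$3
  find "$src" -type f -iname "*.$ext" -exec cp -- {} "$dest"/ \;
}
```

With GNU cp, the \; form could be swapped for -exec cp -t "$dest" -- {} + to start fewer cp processes.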
I like the spirit of the response from @fedorqui-so-stop-harming, but it needed a tweak to work in my bash terminal.
In this version...
find . -iname "*.SomeExt" | xargs cp Destination_Directory/
The cp command incorrectly takes Destination_Directory/ as the first argument. I needed to add a replacement string in order to get xargs to insert the argument in the right position for cp. I used a percent symbol for the replacement string, but you can use anything that doesn't conflict with the input from the pipe. This version works for me.
find . -iname "*.SomeExt" | xargs -I % cp % Destination_Directory/
This SOLVED my problem.
find . -type f | grep '\.pdf$' | while IFS= read -r line
do
cp "$line" REPLACE_WITH_TARGET_DIRECTORY
done
If there are spaces in the filenames, try:
find . -iname '*.ext' > list.txt
cat list.txt | awk 'BEGIN {a="'"'"'"}{print "cp "a$0a" Directory"}' > script.sh
sh script.sh
You can inspect list.txt and script.sh before sh script.sh. Remember to delete the list.txt and script.sh afterwards.
I had some files with parentheses and wanted a progress bar, so I replaced the cat line with:
cat list.txt | awk -v X='"' '{print "rsync -Pa "X$0X" /Volumes/Untitled/"}' > script.sh

Find files and tar them (with spaces)

Alright, so simple problem here. I'm working on a simple backup script. It works fine except when the filenames have spaces in them. This is how I'm finding files and adding them to a tar archive:
find . -type f | xargs tar -czvf backup.tar.gz
The problem is when the file has a space in the name because tar thinks that it's a folder. Basically is there a way I can add quotes around the results from find? Or a different way to fix this?
Use this:
find . -type f -print0 | tar -czvf backup.tar.gz --null -T -
It will:
deal with files with spaces, newlines, leading dashes, and other funniness
handle an unlimited number of files
not repeatedly overwrite your backup.tar.gz, as tar -c with xargs does when you have a large number of files
Also see:
GNU tar manual
How can I build a tar from stdin?, search for null
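The null-delimited approach can be wrapped in a small function. This is a sketch that assumes a tar supporting --null and -T (GNU tar does); the helper name archive_files is hypothetical:

```shell
# archive_files SRC_DIR ARCHIVE   (hypothetical helper name)
# Creates a gzip-compressed tar archive of every regular file under
# SRC_DIR, storing paths relative to SRC_DIR.
# ARCHIVE should be an absolute path outside SRC_DIR, because the
# function changes directory and must not archive its own output.
archive_files() {
  src=$1
  archive=$2
  (cd "$src" && find . -type f -print0 | tar -czf "$archive" --null -T -)
}
```

Listing the result with tar -tzf shows one entry per file, including names that contain spaces.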
There could be another way to achieve what you want. Basically,
Use the find command to output path to whatever files you're looking for. Redirect stdout to a filename of your choosing.
Then tar with the -T option which allows it to take a list of file locations (the one you just created with find!)
find . -name "*.whatever" > yourListOfFiles
tar -cvf yourfile.tar -T yourListOfFiles
Try running:
find . -type f | xargs -d "\n" tar -czvf backup.tar.gz
Why not:
tar czvf backup.tar.gz *
Sure it's clever to use find and then xargs, but you're doing it the hard way.
Update: Porges has commented with a find-option that I think is a better answer than my answer, or the other one: find -print0 ... | xargs -0 ....
If you have multiple files or directories and you want to compress each into its own *.gz file, you can do this (each .gz is a gzip-compressed tar archive holding one item). The -type f and -mtime tests are optional.
find -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;
This will compress
httpd-log01.txt
httpd-log02.txt
to
httpd-log01.txt.gz
httpd-log02.txt.gz
Would add a comment to @Steve Kehlet's post but need 50 rep (RIP).
For anyone that has found this post through numerous googling, I found a way to not only find specific files given a time range, but also NOT include the relative paths OR whitespaces that would cause tarring errors. (THANK YOU SO MUCH STEVE.)
find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
. relative directory
-name "*.pdf" look for pdfs (or any file type)
-type f type to look for is a file
-mtime 0 look for files modified in the last 24 hours
-printf "%f\0" Regular -print0 OR -printf "%f" did NOT work for me. From man pages:
This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.
-czvf create archive, filter the archive through gzip , verbosely list files processed, archive name
Edit 2019-08-14:
I would like to add that I was also able to do essentially the same thing as the command above using tar itself:
tar -czvf /archiveDir/test.tar.gz --newer-mtime=0 --ignore-failed-read *.pdf
Needed --ignore-failed-read in-case there were no new PDFs for today.
Why not give something like this a try (it still breaks on filenames containing whitespace): tar cvf scala.tar `find src -name '*.scala'`
Another solution as seen here:
find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +
The best solution seems to be to create a file list first and then archive the files, because you can fill the list from other sources and do something else with it as well.
For example this allows using the list to calculate size of the files being archived:
#!/bin/sh
backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""
archivePath=$backupOutPath$backupFileName.tar.gz
listOfFilesPath=$backupOutPath$backupFileName.filelist
#
# Make a list of files/directories to archive
#
: > "$listOfFilesPath"
echo "${backupRoot}/uploads" >> "$listOfFilesPath"
echo "${backupRoot}/extra/user/data" >> "$listOfFilesPath"
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> "$listOfFilesPath"
#
# Size calculation
#
sizeForProgress=$(
while IFS= read -r nextFile; do
if [ -n "$nextFile" ]; then
du -sb "$nextFile"
fi
done < "$listOfFilesPath" | awk '{size+=$1} END {print size}'
)
#
# Archive with progress
#
## simple with dump of all files currently archived
#tar -czvf $archivePath -T $listOfFilesPath
## progress bar
sizeForShow=$((sizeForProgress/1024/1024))
printf '\nRunning backup [source files are %s MiB]\n\n' "$sizeForShow"
tar -cPp -T "$listOfFilesPath" | pv -s "$sizeForProgress" | gzip > "$archivePath"
Big warning on several of the solutions (and your own test) :
When you do : anything | xargs something
xargs will try to fit "as many arguments as possible" after "something", but then you may end up with multiple invocations of "something".
So your attempt: find ... | xargs tar czvf file.tgz
may end up overwriting "file.tgz" at each invocation of "tar" by xargs, and you end up with only the last invocation! (the chosen solution uses a GNU -T special parameter to avoid the problem, but not everyone has that GNU tar available)
You could do instead:
find . -type f -print0 | xargs -0 tar -rvf backup.tar
gzip backup.tar
Proof of the problem on cygwin:
$ mkdir test
$ cd test
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs touch
# create the files
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar czvf archive.tgz
# will invoke tar several times, as it can't fit 10000 long filenames into one command line
$ tar tzvf archive.tgz | wc -l
60
# on my machine, I end up with only the last 60 filenames,
# as the last invocation of tar by xargs overwrote the previous one(s)
# proper way to invoke tar: with -r (which appends to an existing tar file, whereas c would overwrite it)
# caveat: you can't have it compressed (you can't add to a compressed archive)
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar rvf archive.tar #-r, and without z
$ gzip archive.tar
$ tar tzvf archive.tar.gz | wc -l
10000
# we have all our files, despite xargs making several invocations of the tar command
Note: this behavior of xargs is a well-known difficulty, and it is also why, when someone wants to do:
find .... | xargs grep "regex"
they instead have to write:
find ..... | xargs grep "regex" /dev/null
That way, even if the last invocation of grep by xargs appends only 1 filename, grep sees at least 2 filenames (each time it gets /dev/null, where it won't find anything, plus the filename(s) appended by xargs after it) and thus will always display the file names when something matches "regex". Otherwise you may end up with the last results showing matches without a filename in front.
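The batching itself is easy to observe on a small scale. In the sketch below, -n is used only to force small batches for illustration; in the cases described above, xargs splits wherever the system's command-line length limit requires. The helper name count_invocations is hypothetical:

```shell
# count_invocations TOTAL_ARGS BATCH_SIZE   (hypothetical helper name)
# Feeds TOTAL_ARGS arguments to xargs, allows at most BATCH_SIZE of
# them per command, and prints how many times the command was run.
# Each echo invocation prints exactly one line, so counting lines
# counts invocations.
count_invocations() {
  seq 1 "$1" | xargs -n "$2" echo | wc -l
}
```

Running count_invocations 5 2 reports three runs of echo. If the command were tar -c, each of those runs would have recreated the archive from scratch.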
