Finding a file within recursive directory of zip files - linux

I have an entire directory structure with zip files. I would like to:
Traverse the entire directory structure recursively grabbing all the zip files
I would like to find a specific file "*myLostFile.ext" within one of these zip files.
What I have tried
1. I know that I can list files recursively pretty easily:
find myLostfile -type f
2. I know that I can list files inside zip archives:
unzip -ls myfilename.zip
How do I find a specific file within a directory structure of zip files?

You can omit using find for single-level (or recursive in bash 4 with globstar) searches of .zip files using a for loop approach:
for i in *.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done
for recursive searching in bash 4:
shopt -s globstar
for i in **/*.zip; do grep -iq "mylostfile" < <( unzip -l "$i" ) && echo "$i"; done

You can use xargs to process the output of find or you can do something like the following:
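find . -type f -name '*zip' -exec sh -c 'unzip -l "$1" | grep -q myLostfile' sh {} \; -print
which starts searching in . for files that match *zip, runs unzip -l on each one, and searches the listing for your filename. If that filename is found, find prints the name of the zip file that contains it. Passing the archive to sh as "$1" rather than embedding {} in the script keeps unusual file names from being interpreted as shell code.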
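A variant of the find approach that also shows which entry matched might look like the sketch below. The function name find_in_zips is made up for illustration, and it assumes Info-ZIP's unzip, whose -Z1 option prints one entry name per line.

```shell
# find_in_zips DIR NAME  (hypothetical helper, not from the answers above)
# Print "archive: entry" for every zip file under DIR that has an entry
# whose path contains NAME, ignoring case.
# Assumption: Info-ZIP unzip, where -Z1 lists entry names only.
find_in_zips() (
    find "$1" -type f -iname '*.zip' -exec sh -c '
        name=$1; shift
        for zip in "$@"; do
            # -F: NAME is a literal string, -i: ignore case
            unzip -Z1 "$zip" 2>/dev/null | grep -iF -- "$name" |
                while IFS= read -r entry; do
                    printf "%s: %s\n" "$zip" "$entry"
                done
        done' sh "$2" {} +
)
```

Calling find_in_zips . myLostFile.ext would search every zip below the current directory. Because find hands the archive names to sh as arguments, names with spaces should be handled without extra quoting tricks.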

Some have suggested to use ugrep to search zip files and tarballs. To find the zip files that contain a mylostfile file, specify it as a -g glob pattern like so:
ugrep -z -l -g'myLostfile' ''
With the empty regex pattern '' this recursively searches all files below the working directory, including any zip, tar, and cpio/pax archives, for mylostfile. If you only want to search the zip files located in the working directory:
ugrep -z -l -g'myLostfile' '' *.zip

Related

Copying a type of file, in specific directories, to another directory

I have a .txt file that contains a list of directories. I want to make a script that goes through this .txt file, copies anything in the directory thats listed of a certain file type, to another directory.
I've never done this with directories, only files.
How can i edit this simple script to work for reading a directory list, looking for a .csv file, and copy it to another directory?
cat filenames.list | \
while read FILENAME
do
find . -name "$FILENAME" -exec cp '{}' new_dir \;
done
for DIRNAME in $(cat dirname.list); do find "$DIRNAME" -type f -name "*.csv" -exec cp {} dest \; ; done
Sorry, in my first answer I didn't understand what you were asking for.
The line above takes each entry in your directory list as a path, searches it for every file ending with the ".csv" extension, and copies each one into the destination you want. Because the list is read through word splitting, it only works when the directory names contain no spaces.
But you could do it with less code:
for DIRNAME in $(cat dirname.list); do cp "$DIRNAME"/*.csv dest ; done
Despite the filename of the list filenames.list, let me assume the file contains the list of directory names, not filenames. Then would you please try:
while IFS= read -r dir; do
find "$dir" -type f -name "*.mp3" -exec cp -p -- {} new_dir \;
done < filenames.list
The find command searches in "$dir" for files which have an extension .mp3 then copies them to the new_dir.
The script above does not handle duplicate file names: a later file silently overwrites an earlier one with the same name. If you want to keep the original directory tree and/or need a countermeasure for duplicate file names, please let me know.
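One way to keep the original directory tree, and so avoid the name clashes mentioned above, is sketched here. The function name copy_keep_tree is made up for illustration, and the sketch assumes the list holds relative directory paths without ".." components.

```shell
# copy_keep_tree LIST DEST PATTERN  (hypothetical helper name)
# For every directory named in LIST (one per line), copy the files that
# match PATTERN into DEST, recreating each file's directory path below
# DEST so that two files with the same name cannot overwrite each other.
# Assumption: LIST holds relative paths without ".." components.
copy_keep_tree() {
    while IFS= read -r dir; do
        [ -d "$dir" ] || continue        # skip blank or stale entries
        find "$dir" -type f -name "$3" -exec sh -c '
            dest=$1; shift
            for f in "$@"; do
                mkdir -p -- "$dest/$(dirname -- "$f")" &&
                cp -p -- "$f" "$dest/$f"
            done' sh "$2" {} +
    done < "$1"
}
```

For example, copy_keep_tree filenames.list new_dir '*.csv' would place dir2/data.csv at new_dir/dir2/data.csv.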
Using find inside a while loop works, but find will run once for each line of the file. An alternative is to save the list in an array, so that find can search all the directories in the list in a single run.
If you have bash4+ you can use mapfile.
mapfile -t directories < filenames.list
If you're stuck at bash3.
directories=()
while IFS= read -r line; do
directories+=("$line")
done < filenames.list
Now if you're just after one file type like files ending in *.csv.
find "${directories[@]}" -type f -name '*.csv' -exec sh -c 'cp -v -- "$@" /newdirectory' _ {} +
If you have multiple file type to match and multiple directories to copy the files.
while IFS= read -r -d '' file; do
case $file in
*.csv) cp -v -- "$file" /foodirectory;; ##: csv file copy to foodirectory
*.mp3) cp -v -- "$file" /bardirectory;; ##: mp3 file copy to bardirectory
*.avi) cp -v -- "$file" /bazdirectory;; ##: avi file copy to bazdirectory
esac
done < <(find "${directories[@]}" -type f -print0)
find's -print0 works with read's -d '' when dealing with file names that contain white space and newlines. See How can I find and deal with file names containing newlines, spaces or both?
The -- is there so if you have a problematic filename that starts with a dash - cp will not interpret it as an option.
Given find's ability to process multiple folders, and assuming the goal is to 'flatten' all csv files into a single destination, consider the following.
Note that it assumes folder names do not have special characters (including spaces, tabs, new lines, etc).
As a side benefit, it will minimize the number of 'cp' calls, making the process efficient across large number of files/folders.
find $(<filename.list) -name '*.csv' | xargs cp -t DESTINATION/
For the more complex case, where folder names/file name can be anything (including space, '*', etc.), consider using NUL separator (-print0 and -0).
xargs -I{} -t find '{}' -name '*.csv' <dd -print0 | xargs -0 -I{} -t cp -t new/ '{}'
Which will fork multiple find and multiple cp.

Rename all files in multiple folders with some condition in single linux command or script.

I have multiple folders with multiple files. I need to rename those files to the name of the folder in which they are stored, with a "_partN" suffix.
As example,
I have a folder named as "new_folder_for_upload" which have 2 files. I need to convert the name of these 2 files like,
new_folder_for_upload_part1
new_folder_for_upload_part2
I have so many folders like above which have multiple files. I need to convert all the file names as I describe above.
Can anybody help me to find out for a single linux command or script to do this work automatically?
Assuming bash shell, and assuming you want the file numbering to restart for each subdirectory, and doing the moving of all files to the top directory (leaving empty subdirectories). Formatted as script for easier reading:
find . -type f -print0 | while IFS= read -r -d '' file
do
myfile=${file#./}
mydir=$(dirname "$myfile")
if [[ $mydir != "$lastdir" ]]
then
NR=1
fi
lastdir=${mydir}
mv "$myfile" "$(dirname "$myfile")_part${NR}"
((NR++))
done
Or as one-line command:
find . -type f -print0 | while IFS= read -r -d '' file; do myfile=${file#./}; mydir=$(dirname "$myfile"); if [[ $mydir != "$lastdir" ]]; then NR=1; fi; lastdir=${mydir}; mv "$myfile" "$(dirname "$myfile")_part${NR}"; ((NR++)); done
Beware. This is armed, and will do a bulk renaming / moving of every file in or below your current work directory. Use at your own risk.
To delete the empty subdirs:
find . -depth -empty -type d -delete
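If the files should stay inside their folders instead of moving to the top directory, a more cautious alternative is sketched below. This is not the method from the answer above: the function name rename_parts and its --run switch are made up for illustration, numbering follows the shell's glob order, and hidden files are skipped.

```shell
# rename_parts TOP [--run]  (hypothetical helper name)
# For every directory below TOP, rename its regular files in place to
# <dirname>_part1, <dirname>_part2, ... in glob (sorted) order.
# Without --run it only prints what it would do.
rename_parts() {
    find "$1" -mindepth 1 -type d -exec sh -c '
        mode=$1; shift
        for dir in "$@"; do
            base=$(basename -- "$dir")
            n=1
            for f in "$dir"/*; do
                [ -f "$f" ] || continue      # regular files only
                target=$dir/${base}_part$n
                if [ "$mode" = --run ]; then
                    mv -n -- "$f" "$target"  # -n: never overwrite
                else
                    printf "%s -> %s\n" "$f" "$target"
                fi
                n=$((n + 1))
            done
        done' sh "${2:-}" {} +
}
```

Running rename_parts . first prints the planned renames; rename_parts . --run performs them. File extensions are dropped, matching the names shown in the question.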

copy entire directory excluding a file

As we know, cp -r source_dir intended_new_directory creates a copy of source directory with a new name. Now I want to do the same but want to exclude a particular file. I have found some related answers here, using tar and rsync, but in those solutions I need to create the destination directory first (using mkdir).
I honestly searched a lot, but didn't find exactly what I want.
So far the best I got is this:
tar -c --exclude=\*.dll --exclude=\*.exe sourceDir | tar -x -C destDir
(from http://www.linuxquestions.org/questions/programming-9/how-to-copy-an-entire-directory-structure-except-certain-files-385321/)
If you have cpio, you can use find to filter and then cpio to copy (and create directories):
find sourceDir \( ! -name '*.dll' \) -a \( ! -name '*.exe' \) | cpio -dumpv destDir
Try this by excluding the file using 'grep -v' ->
cp `ls | grep -v <exclude-file>` <dest-dir>
If the directory is not very large I used to write something like this:
src=path/to/source/directory
dst=path/to/destination/directory
find "$src" -type f | while IFS= read -r f ; do mkdir -p "$dst/$(dirname "$f")"; cp "$f" "$dst/$f" ; done
Here we list all regular files in $src, iterate over this list and for each file make a directory in $dst if it does not exist yet (-p option of mkdir), then copy the file to that directory.
The above command will copy all the files. Finally, just use
find $src -type f | grep -v whatever | while ...... # same as above
to filter out the files you don't need (e.g. \.bak$, \.orig$, or whatever files you don't want to copy).
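The find, mkdir -p, and cp steps above can be wrapped so that the destination is created on the fly and the exclusion is a shell pattern instead of a grep filter. This is only a sketch: copy_excluding is a made-up name, empty directories are not copied, and the destination is assumed to lie outside the source tree.

```shell
# copy_excluding SRC DST PATTERN  (hypothetical helper name)
# Copy the tree under SRC to DST, skipping files whose name matches the
# shell pattern PATTERN. DST and its subdirectories are created as needed.
# Assumptions: DST is not inside SRC; empty directories are not copied.
copy_excluding() (
    mkdir -p -- "$2" || exit
    dst=$(cd -- "$2" && pwd) || exit     # absolute path of DST
    cd -- "$1" || exit
    find . -type f ! -name "$3" -exec sh -c '
        dst=$1; shift
        for f in "$@"; do
            mkdir -p -- "$dst/$(dirname -- "$f")" &&
            cp -p -- "$f" "$dst/$f"
        done' sh "$dst" {} +
)
```

For example, copy_excluding sourceDir destDir '*.dll' copies everything except .dll files, with paths taken relative to sourceDir.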
Move all the excluded files to /home or another directory, copy the directory containing the remaining files to the destination folder, then restore the excluded files.
#cd mydirectory
#mv exclude1 exclude2 /home/
#cd ..
#cp -r mydirectory destination_folder/
#cd /home/
#mv exclude1 exclude2 mydirectory/

Find all directories containing a file that contains a keyword in linux

In my hierarchy of directories I have many text files called STATUS.txt. These text files each contain one keyword such as COMPLETE, WAITING, FUTURE or OPEN. I wish to execute a shell command of the following form:
./mycommand OPEN
which will list all the directories that contain a file called STATUS.txt, where this file contains the text "OPEN"
In future I will want to extend this script so that the directories returned are sorted. Sorting will be determined by a numeric value stored in the file PRIORITY.txt, which lives in the same directory as STATUS.txt. However, this can wait until my competence level improves. For the time being I am happy to list the directories in any order.
I have searched Stack Overflow for the following, but to no avail:
unix filter by file contents
linux filter by file contents
shell traverse directory file contents
bash traverse directory file contents
shell traverse directory find
bash traverse directory find
linux file contents directory
unix file contents directory
linux find name contents
unix find name contents
shell read file show directory
bash read file show directory
bash directory search
shell directory search
I have tried the following shell commands:
This helps me identify all the directories that contain STATUS.txt
$ find ./ -name STATUS.txt
This reads STATUS.txt for every directory that contains it
$ find ./ -name STATUS.txt | xargs -I{} cat {}
This doesn't return any text, I was hoping it would return the name of each directory
$ find . -type d | while read d; do if [ -f STATUS.txt ]; then echo "${d}"; fi; done
... or the other way around:
find . -name "STATUS.txt" -exec grep -lF "OPEN" \{} +
If you want to wrap that in a script, a good starting point might be:
#!/bin/sh
[ $# -ne 1 ] && echo "One argument required" >&2 && exit 2
find . -name "STATUS.txt" -exec grep -lF "$1" \{} +
As pointed out by @BroSlow, if you are looking for directories containing the matching STATUS.txt files, this might be more what you are looking for:
fgrep --include='STATUS.txt' -rl 'OPEN' | xargs -L 1 dirname
Or better
fgrep --include='STATUS.txt' -rl 'OPEN' |
sed -e 's|^[^/]*$|./&|' -e 's|/[^/]*$||'
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# simulate `xargs -L 1 dirname` using `sed`
# (no trailing `\`; returns `.` for path without dir part)
Maybe you can try this:
grep -rl "OPEN" . --include='STATUS.txt'| sed 's/STATUS.txt//'
where grep -r means recursive , -l means only list the files matching, '.' is the directory location. You can pipe it to sed to remove the file name.
You can then wrap this in a bash script file where you can pass in keywords such as 'OPEN', 'FUTURE' as an argument.
#!/bin/bash
grep -rl "$1" . --include='STATUS.txt'| sed 's/STATUS.txt//'
Try something like this
find -type f -name "STATUS.txt" -exec grep -q "OPEN" {} \; -exec dirname {} \;
or in a script
#!/bin/bash
(($#==1)) || { echo "Usage: $0 <pattern>" && exit 1; }
find -type f -name "STATUS.txt" -exec grep -q "$1" {} \; -exec dirname {} \;
You could use grep and awk instead of find:
grep -r OPEN * | awk '{split($1, path, ":"); print path[1]}' | xargs -I{} dirname {}
The above grep will list all files containing "OPEN" recursively inside your directory structure. The result will be something like:
dir_1/subdir_1/STATUS.txt:OPEN
dir_2/subdir_2/STATUS.txt:OPEN
dir_2/subdir_3/STATUS.txt:OPEN
Then the awk script will split this output at the colon and print the first part of it (the dir path).
dir_1/subdir_1/STATUS.txt
dir_2/subdir_2/STATUS.txt
dir_2/subdir_3/STATUS.txt
The dirname will then return only the directory path, not the file name, which I suppose is what you want.
I'd consider using Perl or Python if you want to evolve this further, though, as it might get messier if you want to add priorities and sorting.
Taking up the accepted answer, it does not output a sorted and unique directory list. At the end of the "find" command, add:
| sort -u
or:
| sort | uniq
to get the unique list of the directories.
Credits go to Get unique list of all directories which contain a file whose name contains a string.
IMHO you should write a Python script which:
Examines your directory structure and finds all files named STATUS.txt.
For each found file:
reads the file and executes mycommand depending on what the file contains.
If you want to extend the script later with sorting, you can find all the interesting files first, save them to a list, sort the list and execute the commands on the sorted list.
Hint: http://pythonadventures.wordpress.com/2011/03/26/traversing-a-directory-recursively/
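The find, collect, and sort steps described above can also be sketched in shell, to stay with the language of the other answers. The function name list_by_status is made up for illustration; the sketch assumes PRIORITY.txt holds a single integer, treats a missing or empty file as priority 0, and assumes directory names contain no tabs or newlines.

```shell
# list_by_status KEYWORD [TOP]  (hypothetical helper name)
# Print each directory under TOP (default: .) whose STATUS.txt contains
# KEYWORD, ordered by the number stored in that directory's PRIORITY.txt.
# Assumptions: PRIORITY.txt holds one integer; missing or empty means 0;
# directory names contain no tabs or newlines.
list_by_status() {
    find "${2:-.}" -type f -name STATUS.txt -exec grep -lF -- "$1" {} + |
    while IFS= read -r status; do
        dir=$(dirname -- "$status")
        prio=0
        [ -r "$dir/PRIORITY.txt" ] && read -r prio < "$dir/PRIORITY.txt"
        printf '%s\t%s\n' "${prio:-0}" "$dir"    # "priority<TAB>directory"
    done |
    sort -n | cut -f2-                           # sort, then drop the number
}
```

Saved in a script that ends with list_by_status "$@", this would give the ./mycommand OPEN interface from the question.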

Search a directory of tarballs for a file?

So I have a directory of tarballs. i.e /home/username/dir_w_tarballs. In one of those tarballs is a license key from an engineer that used to work here. How can I search across each of the tarballs in the directory for a specific file? Something similar to find . -name some_file? I know I can search a single tarball with tar -jtvf file.tar.bz2. I am hoping for something a bit broader in scope.
How about this:
find -name '*.tar' -o -name '*.tar.*' | while IFS= read -r f; do
tar -tf "$f" | grep some_file | sed "s|^|$f:|"
done
It will recursively find all tarballs under the current directory, list each one of them, use grep to search for a specific file and then sed to prepend the tarball name to each match so that you would be able to tell which tarball contains each match...
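The same idea can be packaged as a function that copes with spaces in archive names. This is a sketch rather than a tested tool: search_tarballs is a made-up name, and it assumes a tar (such as GNU tar or bsdtar) that detects compression automatically when listing.

```shell
# search_tarballs DIR PATTERN  (hypothetical helper name)
# Print "archive:member" for every member whose path contains the literal
# string PATTERN, in every tarball found below DIR.
# Assumption: tar detects compression on its own when listing (-t).
search_tarballs() {
    find "$1" -type f \( -name '*.tar' -o -name '*.tar.*' -o -name '*.tgz' \) \
        -exec sh -c '
            pattern=$1; shift
            for archive in "$@"; do
                tar -tf "$archive" 2>/dev/null | grep -F -- "$pattern" |
                    while IFS= read -r member; do
                        printf "%s:%s\n" "$archive" "$member"
                    done
            done' sh "$2" {} +
}
```

For the question's case this would be called as search_tarballs /home/username/dir_w_tarballs some_file.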
You can search for a specific file or pattern using tar, grep, and a bash for loop. Just change pattern:
for i in *.tar.gz;
do echo "$i";
tar tf "$i" | grep pattern;
echo '------------';
done
It goes through every .tar.gz file in the folder, lists only the archive's contents (nothing is extracted), and greps through that listing.
If you want to search recursively:
find . -type f -name "*.tar.gz" | while IFS= read -r f; do echo "$f"; tar tf "$f" | grep db_dump; done

Resources