bash script to iterate directories and create tar files - linux

I was searching for a way to create a bash script that would iterate over all the folders in a directory and create a tar.gz file for each of those directories.
(This is used specifically for ubuntu/drupal website - but could be useful in other scenarios.)

After lots of searching, combining scripts, and testing, I found that the following works very well when run from within the main directory.
The exact behavior might differ slightly depending on your version of bash, your version of Ubuntu, and where you schedule or run the script from. (Run it by typing sh createDirectoryTarFiles.sh at the command line from within the parent folder.)
The echo line is not necessary; it is there just so you can watch the progress.
for D in *; do
    if [ -d "${D}" ]; then
        # strip the last four characters from the directory name
        # to build the archive name
        tx="${D%????}"
        echo "Directory is ${D} - and name of file would be $tx"
        tar -zcvf "$tx.tar.gz" "${D}"
    fi
done
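Note that ${D%????} drops the last four characters of each directory name, so this assumes your directories carry a four-character suffix you want removed from the archive name. If you simply want each archive named after its directory, a minimal variant (my sketch, not from the original post) is:

for D in *; do
    # archive name matches the directory name exactly
    [ -d "$D" ] && tar -zcvf "$D.tar.gz" "$D"
done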

You can also use find to select the directories between -mindepth and -maxdepth and create the tars from there. Note that $(basename {}) would be expanded by the calling shell before find ever runs, so the basename call has to happen inside a helper shell:
find . -maxdepth 1 -mindepth 1 -type d -exec sh -c 'tar czf "$(basename "$1").tar.gz" "$1"' sh {} \;

Related

Finding subdirectories of depth 1 that do _not_ include a file

I am working on an open-source project. In most, but not all, of the subdirectories of depth 1, a file called "test.c" can be found. How can I find the directories that do not include "test.c"?
For example, I have subdirectories dir1, dir2, dir3; dir2 and dir3 have "test.c". Currently I check them manually with "ls" to determine that "dir1" does not have "test.c". Is there a simpler way (such as a bash command) to do this? I am on Ubuntu 16, so a bash command would be preferred.
You may use this find command from base directory of all the sub-directories:
find . -type d -exec bash -c 'for d; do [[ -f "$d"/test.c ]] || echo "$d"; done' - {} +
This finds all subdirectories of the current directory and, in the embedded bash command, checks each one for the presence of the file test.c. If the file is not present, the directory name is printed.
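Since the question only concerns subdirectories of depth 1, you can restrict the search the same way as in the tar examples above:
find . -maxdepth 1 -mindepth 1 -type d -exec bash -c 'for d; do [[ -f "$d"/test.c ]] || echo "$d"; done' - {} +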

Run a qsub command in all subdirectories

I am using CentOS on an HPC to run my code. I typically have a folder that contains a run_calc file, which is what I want to submit:
qsub run_calc
Now I want to write a script "submit_all.sh" that submits all run_calc files from within their own subfolders, not from the parent folder where I run the submit_all.sh script.
I found similar questions posted here (Solution) and here (Solution2), which seem to partially answer this question. I am not confident enough to just start submitting scripts until I have found a proper solution, which is why I ask:
In the second link I found this solution:
for i in {1..1000}; do
    cd "$i"
    qsub submit.sh
    cd ..
done
were "i" was a list of folders with the names 1-100. Is it somehow possible to use find to create a list of all the subdirectories and path it to the for loop? How would i deal with subsubdirectories? Would I be able to change the cd .. statement such that I always go back to the parent folder directly in that case?
I fond this solution here: Solution
#!/bin/sh
for i in `find /var/www -type d -maxdepth 1 -mindepth 1`; do
    cd $i
    # do something here
done
But I do not understand what is going on there. Is it possible to change the above script so that it only dives into folders containing a run_calc file, and also covers sub-subdirectories?
Thank you in advance
Assuming that you are using bash as your shell:
$ cat ./test.sh
#!/bin/bash
IFS=$'\n'
while read -r fname; do
    pushd "$(dirname "${fname}")" > /dev/null   # enter the directory containing run_calc
    qsub run_calc
    popd > /dev/null                            # return to the previous directory
done < <(find . -type f -name 'run_calc')
find . -type f -name 'run_calc' finds all paths to files named run_calc inside the current directory and its subdirectories; this is the input for the while loop.
pushd and popd are bash-specific; they push a directory onto, or pop one off, the directory stack, changing into it as they do (the redirects to /dev/null just silence their output).
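If you have not met the directory stack before, a quick illustration (the paths here are just examples):
$ pwd
/home/user
$ pushd /tmp > /dev/null    # cd to /tmp and remember /home/user
$ pwd
/tmp
$ popd > /dev/null          # return to the remembered directory
$ pwd
/home/user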
for d in $(find . -type d)
do (
    cd "$d" || exit
    # skip this directory if it has no run_calc file
    if test ! -f run_calc; then exit; fi
    qsub run_calc
) done
( commands ) executes the commands in a separate process (a subshell), so the effect of cd does not "leak" into the parent shell. Note that exit, not continue, is used to leave the subshell early, since continue cannot reach the loop running in the parent process.
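A quick demonstration, assuming you start in /home/user:
$ pwd
/home/user
$ ( cd /tmp && pwd )
/tmp
$ pwd
/home/user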

Execute multiple commands on target files from find command

Let's say I have a bunch of *.tar.gz files located in a hierarchy of folders. What would be a good way to find those files and then execute multiple commands on them?
I know if I just need to execute one command on the target file, I can use something like this:
$ find . -name "*.tar.gz" -exec tar xvzf {} \;
But what if I need to execute multiple commands on the target file? Must I write a bash script here, or is there any simpler way?
Sample commands that need to be executed on a file A.tar.gz:
$ tar xvzf A.tar.gz # assume it untars to folder logs
$ mv logs logs_A
$ rm A.tar.gz
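For reference, find alone can also run several commands per file if you hand them to one helper shell; a minimal sketch along the lines of the example above (the logs folder name is the same assumption as there):
find . -name "*.tar.gz" -exec sh -c '
    tar xvzf "$1"                            # assume it untars to folder logs
    mv logs "logs_$(basename "$1" .tar.gz)"  # e.g. logs -> logs_A for A.tar.gz
    rm "$1"
' sh {} \;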
Here's what works for me (thanks to Etan Reisner's suggestions):
#!/bin/bash
# the target folder (to search for tar.gz files) is passed on the command line
find "$1" -name "*.tar.gz" -print0 | while IFS= read -r -d '' file; do
    # `file` now holds each tar.gz path in turn, even with spaces in the name
    echo "$file"
    tar xvzf "$file"
    # mv untar_folder "$file.suffix"   # untar_folder is the name of the folder after untarring
    rm "$file"
done
As suggested, the array way is unsafe if a file name contains spaces, and it also didn't seem to work properly in this case.
Writing a shell script is probably easiest. Take a look at shell for loops. You could store the output of a find command in an array, and then loop over that array to perform a set of commands on each element.
For example,
arr=( $(find . -name "*.tar.gz") )   # note: word-splitting here breaks on names with spaces
for i in "${arr[@]}"; do
    # $i now holds each of the filenames output by find
    tar xvzf "$i"
    mv "$i" "$i.suffix"
    rm "$i.suffix"   # the archive was renamed above, so remove it under its new name
    # etc., etc.
done
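If you want the array approach to survive odd file names and your bash is 4.4 or newer, mapfile can read find's NUL-delimited output safely (a sketch, not part of the original answer):
mapfile -t -d '' arr < <(find . -name "*.tar.gz" -print0)
for i in "${arr[@]}"; do
    tar xvzf "$i"   # "$i" is safe even with spaces or newlines in the name
done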

Find the name of subdirectories and process files in each

Let's say /tmp has subdirectories test1, test2, test3 and so on, each with multiple files inside.
I have to run a while or for loop that finds the names of the directories (in this case test1, test2, ...) and runs a command that processes all the files inside each directory.
So, for example,
I have to get the directory names under /tmp which will be test1, test2, ...
For each subdirectory, I have to process the files inside of it.
How can I do this?
Clarification:
This is the command that I want to run:
find /PROD/140725_D0/ -name "*.json" -exec /tmp/test.py {} \;
where 140725_D0 is an example of one subdirectory to process - there are multiples, with different names.
So, by using a for or while loop, I want to find all subdirectories and run a command on the files in each.
The for or while loop should iteratively replace the hard-coded name 140725_D0 in the find command above.
You should be able to do with a single find command with an embedded shell command:
find /PROD -type d -execdir sh -c 'for f in *.json; do /tmp/test.py "$f"; done' \;
Note: -execdir is not POSIX-compliant, but the BSD (OSX) and GNU (Linux) versions of find support it; see below for a POSIX alternative.
The approach is to let find match directories, and then, in each matched directory, execute a shell with a file-processing loop (sh -c '<shellCmd>').
If not all subdirectories are guaranteed to have *.json files, change the shell command to for f in *.json; do [ -f "$f" ] && /tmp/test.py "$f"; done
Update: Two more considerations; tip of the hat to kenorb's answer:
By default, find processes the entire subtree of the input directory. To limit matching to immediate subdirectories, use -maxdepth 1[1]:
find /PROD -maxdepth 1 -type d ...
As stated, -execdir - which runs the command passed to it in the directory currently being processed - is not POSIX compliant; you can work around this by using -exec instead and by including a cd command with the directory path at hand ({}) in the shell command:
find /PROD -type d -exec sh -c 'cd "{}" && for f in *.json; do /tmp/test.py "$f"; done' \;
[1] Strictly speaking, you can place the -maxdepth option anywhere after the input file paths on the find command line - as an option, it is not positional. However, GNU find will issue a warning unless you place it before tests (such as -type) and actions (such as -exec).
Try the following usage of find:
find . -type d -exec sh -c 'cd "{}" && echo Do some stuff for {}, files are: $(ls *.*)' ';'
Use -maxdepth if you'd like to limit your directory levels.
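For example, to operate on the first level of subdirectories only:
find . -maxdepth 1 -mindepth 1 -type d -exec sh -c 'cd "{}" && echo Do some stuff for {}, files are: $(ls *.*)' ';'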
You can do this using bash's subshell feature like so
for i in /tmp/test*; do
    # don't do anything if there's no /test* directory in /tmp
    [ "$i" != "/tmp/test*" ] || continue
    for j in "$i"/*.json; do
        # don't do anything if there's nothing to run
        [ "$j" != "$i/*.json" ] || continue
        (cd "$i" && ./file_to_run)
    done
done
When you wrap a command in ( and ), it starts a subshell to run the command. A subshell is essentially another instance of bash, just slightly cheaper to start, and any cd inside it does not affect the outer shell.
You can also simply ask the shell to expand the directories/files you need, e.g. using command xargs:
echo /PROD/*/*.json | xargs -n 1 /tmp/test.py
or even using your original find command:
find /PROD/* -name "*.json" -exec /tmp/test.py {} \;
Both commands will process all JSON files contained in any subdirectory of /PROD.
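One caveat, assuming your file names may contain spaces: the echo | xargs form will split them. A NUL-delimited pipeline with find and GNU xargs avoids that:
find /PROD -name "*.json" -print0 | xargs -0 -n 1 /tmp/test.py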
Another solution is to change the Python code inside your script slightly so that it accepts and processes multiple files.
For example, if your script contains something like:
def process(fname):
    print 'Processing file', fname

if __name__ == '__main__':
    import sys
    process(sys.argv[1])
you could replace the last line with:
    for fname in sys.argv[1:]:
        process(fname)
After this simple modification, you can call your script this way:
/tmp/test.py /PROD/*/*.json
and have it process all the desired JSON files.

for each dir create a tar file

I have a bunch of directories that need to be restored, but they first have to be packaged into a .tar. Is there a script that would let me package each of the 100+ directories into its own tar, so that dir becomes dir.tar?
My attempt so far:
for i in *; do tar czf $i.tar $i; done
The script that you wrote will not work if a directory name contains spaces, because the name will be split; it will also tar plain files if any exist at this level.
You can use this command to list directories not recursively:
find . -maxdepth 1 -mindepth 1 -type d
and this one to perform a tar on each one:
find . -maxdepth 1 -mindepth 1 -type d -exec tar cvf {}.tar {} \;
Do you have any directory names with spaces in them at that level? If not, your script will work just fine.
What I usually do is write the loop with the command I want to execute echoed out:
$ for i in *
do
    echo tar czf $i.tar $i
done
Then you can look at the output and see if it's doing what you want. After you've determined that the program will work, edit the command line and remove the echo command.
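For example, with two directories ab and cd at the current level, the echoed dry run would print:
tar czf ab.tar ab
tar czf cd.tar cd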
If there are spaces in the directory names, then just put the variables inside double quotes:
for i in *
do
    tar czf "$i.tar" "$i"
done
Get them all done simply and in parallel with GNU Parallel:
parallel tar -cf {}.tar {} ::: *
If you want to check what it is going to do without actually doing anything, add --dry-run like this:
parallel --dry-run tar -cf {}.tar {} ::: *
Sample Output
tar -cf ab.tar ab
tar -cf cd.tar cd
Note that if the number of directories is very large and their names are long, then executing the first snippet,
for i in *
do
    echo tar czf $i.tar $i
done
can run into shell limits and fail with a "string too long" error.
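If you do hit that limit, the find-based form from the earlier answer sidesteps the single big glob expansion by handling one directory per -exec invocation:
find . -maxdepth 1 -mindepth 1 -type d -exec tar cvf {}.tar {} \;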
