Bash Script to find, process and rename files? - linux

I am trying to put together a script which will run through all the files on my server (under various subdirectories), look for .jpg files and run them through a translator which converts them to non-progressive JPEGs.
I have:
find /home/disk2/ -type f -iname "*.jpg"
Which finds all the files.
Then, if it finds for example 1.jpg, I need to run:
/usr/bin/jpegtrans "/file location/1.jpg" > "/file location/1.jpg.temp"
The jpegtrans app converts the file to a temp file which needs to replace the original file.
So then I need to delete the original and rename 1.jpg.temp to 1.jpg:
rm "/file location/1.jpg"
mv "/file location/1.jpg.temp" "/file location/1.jpg"
I can easily do this for single files, but I need to do it for hundreds of files on my server.

Use find with -exec:
find /home/disk2/ -type f -iname "*.jpg" -exec sh -c "/usr/bin/jpegtrans {} > {}.temp && mv -f {}.temp {}" \;
EDIT: To handle spaces (and other special characters) in filenames, pass the name to the shell as a positional parameter instead of splicing {} into the command string:
find /home/disk2/ -type f -iname "*.jpg" -exec sh -c '/usr/bin/jpegtrans "$1" > "$1.temp" && mv -f "$1.temp" "$1"' sh {} \;
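If you prefer an explicit loop, here is a minimal sketch using a NUL-delimited find (assuming GNU find and bash) that replaces each original only when the converter succeeds:
find /home/disk2/ -type f -iname "*.jpg" -print0 |
while IFS= read -r -d '' f; do
    # write to a temp file first; replace the original only on success
    if /usr/bin/jpegtrans "$f" > "$f.temp"; then
        mv -f "$f.temp" "$f"
    else
        rm -f "$f.temp"    # conversion failed; keep the original untouched
    fi
done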

Related

Copy recursive files of all the subdirectories

I want to copy all the log files from a directory which does not itself contain log files but has subdirectories that do. Those subdirectories also contain other subdirectories, so I need something recursive.
I tried
cp -R *.log /destination
But it doesn't work, because the first directory does not contain log files. The answer can also be a loop in bash.
find /path/to/logdir -type f -name "*.log" |xargs -I {} cp {} /path/to/destinationdir
Explanation:
find searches recursively
-type f tells it to search for regular files only
-name specifies the name pattern
xargs builds and executes command lines from its standard input
-I {} sets {} as the placeholder replaced by each input item
Another version without xargs:
find /path/to/logdir -type f -name '*.log' -exec cp '{}' /path/to/destinationdir \;
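If the filenames may contain spaces or newlines, a NUL-delimited pipeline is safer. A minimal sketch, assuming GNU find, xargs and coreutils (cp -t takes the destination first, so xargs can append many source files at the end):
find /path/to/logdir -type f -name '*.log' -print0 | xargs -0 cp -t /path/to/destinationdir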

Bash: Find files containing a certain string and copy them into a folder

What I want:
In a bash script: find all files in the current directory that contain a certain string "teststring" and copy them into a subfolder "./testfolder".
I found this to find the filenames I'm looking for:
find . -type f -print0 | xargs -0 grep -l "teststring"
...and this to copy the found files to another folder (here selecting by a string in the filename):
find . -type f -iname "stringinfilename" -exec cp {} ./testfolder/ \;
What's the best way to combine both commands to achieve what I described at the top?
Just let find do both:
find . -name subdir -prune -o -type f -exec \
grep -q teststring "{}" \; -exec cp "{}" subdir \;
Note that things like this are much easier if you don't try to add to the directory you're working in. In other words, write to a sibling dir instead of writing to a subdirectory. If you want to wind up with the data in a subdir, mv it when you're done. That way, you don't have to worry about the prune (i.e., you don't have to worry about find descending into the subdir and attempting to duplicate the work).
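Applied to the names in the question, a minimal sketch along the same lines (pruning ./testfolder so the copies aren't searched and copied again):
mkdir -p ./testfolder
find . -path ./testfolder -prune -o -type f \
    -exec grep -q "teststring" {} \; -exec cp {} ./testfolder/ \;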

Unzip all gz files in all subdirectories in the terminal

Is there a way to unzip all the gz files into the folders that contain them, when the gz files are in subdirectories?
A query for
find -type f -name "*.gz"
Gives results like this:
./datasets/auto/auto.csv.gz
./datasets/prnn_synth/prnn_synth.csv.gz
./datasets/sleep/sleep.csv.gz
./datasets/mfeat-zernike/mfeat-zernike.csv.gz
./datasets/sonar/sonar.csv.gz
./datasets/wine-quality-white/wine-quality-white.csv.gz
./datasets/ring/ring.csv.gz
./datasets/diabetes/diabetes.csv.g
If you want to launch "gzip -d" on each of those:
cd theparentdir && gzip -d $(find ./ -type f -name '*.gz')
and then, to gzip them back:
cd theparentdir && gzip $(find ./ -type f -name '*.csv')
This will however choke in many cases:
if filenames have some special characters (spaces, tabs, newlines, etc.) in them
other similar cases
or if there are TOO MANY files to be put after the gzip command!
A solution would be instead, if you have GNU find, to do :
find ... -print0 | xargs -0 gzip -d # for the gunzip one; being NUL-delimited, this also handles spaces and even newlines in filenames
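Written out for this case, a sketch assuming GNU find and xargs:
cd theparentdir && find ./ -type f -name '*.gz' -print0 | xargs -0 gzip -d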
Another (arguably better?) solution, if you have GNU find at your disposal:
cd theparentdir && find ./ -type f -name '*.gz' -exec gzip -d '{}' '+'
and to re-zip all csv in that parentdir & all subdirs:
cd theparentdir && find ./ -type f -name '*.csv' -exec gzip '{}' '+'
"+" tells GNU find to try to put as many found files as it can on each gzip invocation (instead of doing 1 gzip incocation per file, very very ressource intensive and very innefficient and slow), similar to xargs, but with some benefits (1 command only, no pipe needed)
There is an option for recursion (-r).
gzip -dr ./datasets
All archives will be decompressed in their own directories.
Example: gzip -dr ./a
a/b/c/test1.gz
a/b/d/test2.gz
a/e/test3.gz
After execution:
a/b/c/test1
a/b/d/test2
a/e/test3

Create a list with the content of multiple zip files in linux

I am trying to create a script for linux that will make a list of all the files inside all the zip files from a directory.
#! /bin/bash
for file in `find /home -iname "*.zip*" -type f`
do
    unzip -l $(echo ${file}) >> /home/list.txt
done
It works, but only when there are no white spaces in the filenames.
What can I do to make it work?
You can use the find command to execute a command for each file it finds. Perhaps try something like:
find /home -iname "*.zip*" -type f -exec unzip -l {} \; > /home/list.txt
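If you'd rather keep an explicit loop, a NUL-delimited read avoids the word splitting that broke the original script. A minimal sketch, assuming bash and a find that supports -print0:
find /home -iname "*.zip*" -type f -print0 |
while IFS= read -r -d '' file; do
    # quoting "$file" keeps names with spaces intact
    unzip -l "$file"
done > /home/list.txt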

cat files in subdirectories using linux commands

I have the following directories:
P922_101
P922_102
.
.
Each directory, for instance P922_101, has the following subdirectories:
140311_AH8MHGADXX 140401_AH8CU4ADXX
Each subdirectory, for instance 140311_AH8MHGADXX has the following files:
1_140311_AH8MH_P922_101_1.fastq.gz 1_140311_AH8MH_P922_101_2.fastq.gz
2_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_2.fastq.gz
And files in 140401_AH8CU4ADXX are:
1_140401_AH8CU_P922_101_1.fastq.gz 1_140401_AH8CU_P922_4001_2.fastq.gz
2_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_4001_2.fastq.gz
I want to do 'cat' for the files in the subdirectories in the following way:
cat 1_140311_AH8MH_P922_101_1.fastq.gz 2_140311_AH8MH_P922_101_1.fastq.gz
1_140401_AH8CU_P922_101_1.fastq.gz 2_140401_AH8CU_P922_101_1.fastq.gz > P922_101_1.fastq.gz
which means that files ending with _1.fastq.gz should be concatenated into a single file and files ending with _2.fastq.gz into another file.
It should be run for all files in subdirectories in all directories. Could someone give a linux solution to do this?
Since they're compressed, you should probably use gzip -dc (decompress and write to stdout):
find /somePath -type f -name "*.fastq.gz" -exec gzip -dc {} \; |
    tee -a /someOutFolder/out.txt
You can use find for this:
find /top/path -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > one_file
find /top/path -mindepth 2 -type f -name "*_2.fastq.gz" -exec cat {} \; > another_file
This will look for all the files starting from /top/path whose names match the pattern *_1.fastq.gz / *_2.fastq.gz and cat them into the desired file. -mindepth 2 makes find descend at least two levels before matching anything; this way, files sitting directly in /top/path won't be matched.
Note that concatenated gzip streams form a valid gzip file, so plain cat is fine if you want the output to stay compressed; use zcat instead of cat if you want the decompressed data.
As you keep adding details in comments, let's see what else we can do:
Say you have the list of directories in a file directories_list, one directory per line:
while read -r directory
do
    find "$directory" -mindepth 2 -type f -name "*_1.fastq.gz" -exec cat {} \; > "$directory/output"
done < directories_list
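Putting the pieces together for the layout in the question, a minimal sketch (assuming the P922_* directories sit under the current directory, and relying on the fact noted above that gzip streams can be concatenated with plain cat):
for dir in P922_*/; do
    name=${dir%/}
    # concatenate the read-1 and read-2 files from every run subdirectory
    cat "$dir"*/*_1.fastq.gz > "${name}_1.fastq.gz"
    cat "$dir"*/*_2.fastq.gz > "${name}_2.fastq.gz"
done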
