Convert all files of a specified extension within a directory to pdf, recursively for all sub-directories - linux

I'm using the following code (from this answer) to convert all CPP files in the current directory to a file named code.pdf and it works well:
find . -name "*.cpp" -print0 | xargs -0 enscript -Ecpp -MLetter -fCourier8 -o - | ps2pdf - code.pdf
I would like to improve this script to:
Make it a .sh file that can take an argument specifying the extension instead of having it hardcoded to CPP;
Have it run recursively, visiting all subdirectories of the current directory;
For each subdirectory encountered, convert all files of the specified extension to a single PDF that is named $NameOfDirectory$.PDF and is placed in that subdirectory;

First, if I understand it correctly, this requirement:
For each subdirectory encountered, convert all files of the specified extension to a single PDF that is named $NameOfDirectory$.PDF
is unwise. If that means, say, a/b/c/*.cpp gets enscripted to ./c.pdf, then you're screwed if you also have a/d/x/c/*.cpp, since both directories' contents map to the same PDF. It also means that *.cpp (i.e. CPP files in the current dir) gets enscripted to a file named ./..pdf.
Something like this, which names the PDF according to the desired extension and places it in each subdirectory alongside its source files, doesn't have those problems:
#!/usr/bin/env bash
# USAGE: ext2pdf [<ext> [<root_dir>]]
# DEFAULTS: <ext> = cpp
# <root_dir> = .
ext="${1:-cpp}"
rootdir="${2:-.}"
shopt -s nullglob
find "$rootdir" -type d | while read d; do
# With "nullglob", this loop only runs if any $d/*.$ext files exist
for f in "$d"/*.${ext}; do
out="$d/$ext".pdf
# NOTE: Uncomment the following line instead if you want to risk name collisions
#out="${rootdir}/$(basename "$d")".pdf
enscript -Ecpp -MLetter -fCourier8 -o - "$d"/*.${ext} | ps2pdf - "$out"
break # We only want this to run once
done
done
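For reference, a hypothetical invocation (assuming the script above is saved as ext2pdf in the current directory):
chmod +x ext2pdf
./ext2pdf h src    # writes h.pdf into every directory under src/ that contains *.h files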

First, if I understand correctly, what you are using is in fact wrong for your goal - find retrieves files from all sub-directories at once. To work directory by directory, processing only the files that live directly in each one, use a script like this (I named it do.bash):
#!/bin/bash
ext=$1
if ls *."$ext" &> /dev/null; then
    enscript -Ecpp -MLetter -fCourier8 -o - *."$ext" | ps2pdf - "$(basename "$(pwd)")".pdf
fi
for subdir in */; do
    if [ "$subdir" == "*/" ]; then break; fi
    cd "$subdir"
    /path/to/do.bash "$ext"
    cd ../
done
The checks make sure that a file with the extension, or a sub-directory, actually exists. The script operates on the current directory and calls itself recursively; if you do not want to hardcode a full path, put the script somewhere in your PATH, though a full path is fine.
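A hypothetical run (assuming the script is saved as /path/to/do.bash and made executable):
chmod +x /path/to/do.bash
cd ~/projects/myproject
/path/to/do.bash cpp    # creates <dirname>.pdf in every directory containing *.cpp files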

Related

Linux - How to zip files per subdirectory separately

I have a directory structure like data/A/A_1/A_11, data/A/A_1/A_12, data/B/B_1/B_11, and data/C/C_1/C_11.
From this I want to create different zip files such as
data-A-A_1-A_11.zip
data-A-A_1-A_12.zip
data-B-B_1-B_11.zip
data-C-C_1-C_11.zip
while IFS= read -r line; do
    echo "zip -r ${line//\//-}.zip $line"
    # zip -r "${line//\//-}.zip" "$line"
done <<< "$(find data -maxdepth 3 -mindepth 2 -type d)"
Redirect the result of a find command into a while loop. The find command searches the directory data for directories only, between 2 and 3 levels deep. In the while loop we use bash parameter expansion to convert all forward slashes to "-" and append ".zip", building a zip command for each directory. Once you are happy that the echoed zip command looks right for each directory, comment in the actual zip command.
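To illustrate the substitution that builds each archive name (the path here is hypothetical):
line="data/A/A_1/A_11"
echo "${line//\//-}.zip"    # -> data-A-A_1-A_11.zip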

How to randomly distribute the files across 3 folders using Bash script?

I have many subdirectories and files in the folder mydata/files. I want to take files and copy them randomly into 3 folders:
train
test
dev
For example, mydata/files/ss/file1.wav could be copied into the train folder:
train
file1.wav
And so on and so forth, until all files from mydata/files are copied.
How can I do it using Bash script?
Steps to solve this:
Need to gather all the files in the directory
Assign directories to a map
Generate random number for each file
Move the file to the corresponding directory
The script:
#!/bin/bash
original_dir=test/
## define 3 directories to copy into
# define an associative array (like a map)
declare -A target_dirs
target_dirs[0]="/path/to/train/"
target_dirs[1]="/path/to/test/"
target_dirs[2]="/path/to/dev/"
# recursively find all the files, and loop through them
find "$original_dir" -type f | while read -r file ; do
    # pick a random index between 0 and (size of target_dirs - 1)
    num=$(($RANDOM % ${#target_dirs[@]}))
    # get that index in the associative array
    target_dir=${target_dirs[$num]}
    # copy the file to that directory
    echo "Copying $file to $target_dir"
    cp "$file" "$target_dir"
done
Things you'll need to change:
Change the destination directories to match the paths on your system
Add executable privileges to the file so that you can run it:
chmod 744 copy_script_name
./copy_script_name
Notes:
This script should easily be extendable to any number of directories if needed (just add the new directories, and the script will adjust the random numbers accordingly; see the sketch after these notes).
If you need to only get the files in the current directory (not recursively), you can add -maxdepth 1 (see How to list only files and not directories of a directory Bash?).
I was able to leverage previous bash experience plus the bash documentation (it's generally pretty good). If you end up writing any scripts, be very careful about spaces.
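For instance, extending the map with a fourth destination only takes one more line (the path is hypothetical):
target_dirs[3]="/path/to/validation/"
The modulus in $(($RANDOM % ${#target_dirs[@]})) then automatically covers the new index.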
You can create a temp file, echo your destination folder to it, then use the shuf command.
dest=$(mktemp)
echo -e "test\ndev\ntrain" >> "$dest"
while IFS= read -r file; do
    mv "$file" "$(shuf -n1 < "$dest")/."
done < <(find mydata/files -type f 2>/dev/null)
rm -f "$dest"

Add name of each directory to files inside the corresponding directory in linux

I have a directory containing multiple directories. here is an example of the list of directories:
dir1_out
dir2_out
dir3_out
dir4_out
Each directory contains multiple files.
For example dir1_out contains the following files:
file1
file2
file3
In the same fashion, the other directories contain several files.
I would like to add the name of each directory to file name in the corresponding directory.
I would like to have the following result in first directory(dir1_out):
dir1.file1
dir1.file2
dir1.file3
Since I have around 50 directories I would like to write a loop that takes the name of each directory and add the name to the beginning of all subfiles.
Do you have any idea how I can do that in Linux?
A simple bash one-liner, if there aren't too many files, is:
for p in */*; do [ -f "$p" ] && mv -i "$p" "${p%/*}/${p/\//.}"; done
This uses parameter expansions to generate the new filenames, after checking that we are trying to rename an actual file - see the bash manpage descriptions of ${parameter%word} and ${parameter/pattern/string}.
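To see what the two expansions do, take a hypothetical path:
p="dir1_out/file1"
echo "${p%/*}"      # -> dir1_out         (strip the last /-component)
echo "${p/\//.}"    # -> dir1_out.file1   (replace the first / with .)
so the file is moved to dir1_out/dir1_out.file1.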
If there may be too many files to safely expand them all into a single list:
#!/bin/bash
find . -mindepth 2 -maxdepth 2 -type f -print |
while IFS= read -r p; do
    p="${p#./}"
    mv -i "$p" "${p%/*}/${p/\//.}"
done

How to create empty txt files in a directory reflecting files in another directory?

I need to do some testing and need the same file names as I have in the directory /home/recordings to exist in the /home/testing folder.
For example, if I have a file recording01.mp4 in /home/recordings, I would want an empty file recording01.txt or recording01.mp4 in /home/testing.
I understand I can use the following command?
for i in /home/recordings/*; do touch "$i"; done
I'm not sure how to specify the extension or the destination directory in this case.
A simple addition of /home/testing/ to the touch command will do it.
for i in /home/recordings/*; do
    temp=$(echo "$i" | cut -f4 -d'/')   # field 4 of /home/recordings/<name> is the filename
    cd /home/testing/
    touch "$temp"
    cd - > /dev/null
done
The paths used are absolute, so the script can be run from anywhere.
You can also do this without a loop
find /home/recordings/ -type f -printf '/home/testing/%f\n' | xargs -n1 touch
Try this:
for i in /home/recordings/*; do touch "/home/testing/${i##*/}"; done
The ${i##*/} expansion strips the directory portion, so only the filename is appended to the destination. A bunch of 0-length files are created in /home/testing, their names corresponding to those in /home/recordings.

Get grandparent directory in bash script - rename files for a directory in their paths

I have the following script, which I normally use when I get a bunch of files that need to be renamed to the directory name which contains them.
The problem now is I need to rename the file to the directory two levels up. How can I get the grandparent directory to make this work?
With the following I get errors like this example:
"mv: cannot move ./48711/zoom/zoom.jpg to ./48711/zoom/./48711/zoom.jpg: No such file or directory". This is running on CentOS 5.6.
I want the final file to be named: 48711.jpg
#!/bin/bash
function dirnametofilename() {
    for f in $*; do
        bn=$(basename "$f")
        ext="${bn##*.}"
        filepath=$(dirname "$f")
        dirname=$(basename "$filepath")
        mv "$f" "$filepath/$dirname.$ext"
    done
}
export -f dirnametofilename
find . -name "*.jpg" -exec bash -c 'dirnametofilename "{}"' \;
find .
Another method could be to use
(cd ../../; pwd)
If this were executed in any top-level paths such as /, /usr/, or /usr/share/, you would get a valid directory of /, but when you get one level deeper, you would start seeing results: /usr/share/man/ would return /usr, /my/super/deep/path/is/awesome/ would return /my/super/deep/path, and so on.
You could store this in a variable as well:
GRANDDADDY="$(cd ../../; pwd)"
and then use it for the rest of your script.
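A quick sanity check (the paths are hypothetical):
cd /usr/share/man
GRANDDADDY="$(cd ../../; pwd)"
echo "$GRANDDADDY"    # -> /usr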
Assuming filepath doesn't end in /, which it shouldn't if you use dirname, you can do
parent="${filepath%/*}"
grandparent="${filepath%/*/*}"
So do something like this
[[ "${filepath%/*/*}" == "" ]] && echo "Path isn't long enough" || echo "${filepath%/*/*}"
Also this likely won't work if you're using relative paths (like those produced by find .), in which case you will want to use
filepath=$(dirname "$f")
filepath=$(readlink -f "$filepath")
instead of
filepath=$(dirname "$f")
Also you're never stripping the extension, so there is no reason to get it from the file and then append it again.
Note:
* This answer solves the OP's specific problem, in whose context "grandparent directory" means: the parent directory of the directory containing a file (it is the grandparent path from the file's perspective).
* By contrast, given the question's generic title, other answers here focus (only) on getting a directory's grandparent directory; the succinct answer to the generic question is: grandParentDir=$(cd ../..; printf %s "$PWD") to get the full path, and grandParentDirName=$(cd ../..; basename -- "$PWD") to get the dir. name only.
Try the following:
find . -name '*.jpg' \
-execdir bash -c \
'old="$1"; new="$(cd ..; basename -- "$PWD").${old##*.}"; echo mv "$old" "$new"' - {} \;
Note: echo was prepended to mv to be safe - remove it to perform the actual renaming.
-execdir ... \; executes the specified command in the specific directory that contains a given matching file and expands {} to the filename of each.
bash -c is used to execute a small ad-hoc script:
$(cd ..; basename -- "$PWD") determines the parent directory name of the directory containing the file, which is the grandparent path from the file's perspective.
${old##*.} is a Bash parameter expansion that returns the input filename's suffix (extension).
Note how {} - the filename at hand - is passed as the 2nd argument to the command in order to bind to $1, because bash -c uses the 1st one to set $0 (which is set to dummy value - here).
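A tiny illustration of that argument binding (the command is hypothetical):
bash -c 'echo "0=$0 1=$1"' - hello.jpg    # prints: 0=- 1=hello.jpg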
Note that each file is merely renamed, i.e., it stays in its original directory.
Caveat:
Each directory with a matching file should only contain 1 matching file, otherwise multiple files will be renamed to the same target name in sequence - effectively, only the last file renamed will survive.
Can't you use realpath ../../ or readlink -f ../../ ? See this, readlink(1), realpath(3), canonicalize_file_name(3), and realpath(1). You may want to install the realpath package on Debian or Ubuntu. Probably CentOS has an equivalent package. (readlink should always be available, it is in GNU coreutils)
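Assuming GNU coreutils is available, either command prints the absolute grandparent of the current directory:
readlink -f ../../
realpath ../../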
