Best practice for organizing many file paths for a Bash script - linux

I have a Bash script which copies files from one location to another. Currently, all file paths are defined as string literals directly in the script, and they are bloating it. How can I organize all these file paths somewhere other than directly in the script? Should I use a config file?

I am not sure about your particular requirements here.
It all depends on the size, complexity, importance, and longevity of your script.
If the script is complex (many lines of code with significant logic and many dependencies), it is good to do the following:
Have an environment setup script (or config file) that creates standardized variables such as job_home, job_temp, job_log, and job_log_archive, and that follows a naming convention for a hierarchy of directories under a common parent (job_home in this case). Using upper case for these names could be a neat idea, though it is against POSIX conventions, as highlighted by Charles Duffy in his comment.
Create a function that sets up this hierarchy of directories as needed. Have a library of functions, if that would make sense.
Source the environment setup and the library scripts in the main script.
Carefully avoid all hardcoding of paths - use the standard environment variables instead.
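As an illustration, here is a minimal sketch of such a setup; the file and variable names (job_env.sh, JOB_HOME, and so on) are hypothetical, not a fixed convention:
# job_env.sh -- hypothetical environment setup file, sourced by each job script
JOB_HOME="${JOB_HOME:-$HOME/jobs/copyjob}"   # common parent directory
JOB_TEMP="$JOB_HOME/temp"
JOB_LOG="$JOB_HOME/log"
JOB_LOG_ARCHIVE="$JOB_LOG/archive"

# Create the directory hierarchy on demand
setup_job_dirs() {
    mkdir -p "$JOB_TEMP" "$JOB_LOG" "$JOB_LOG_ARCHIVE"
}
The main script then sources it and uses only the variables:
#!/bin/bash
. /path/to/job_env.sh    # source the environment setup
setup_job_dirs
cp "$JOB_TEMP/staging.csv" "$JOB_LOG_ARCHIVE/"   # no hard-coded paths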

A common convention is to accept a filename which can be read from to retrieve a list of other files to operate on as an optional argument.
If you aren't using stdin for any other purpose, it too is a reasonable stream to use for the purpose. For example:
printf '%s\0' /first/directory /second/directory >dirs
your-script < dirs
...or...
find /top/directory -mindepth 1 -type d -print0 | your-script
...or...
# pass results from "find" via a file name
your-script -d <(find /top/directory -mindepth 1 -type d -print0)
The use of printf '%s\0' and find -print0 above ensures that the list is NUL-delimited; this means that a maliciously-constructed directory name (e.g. one created with mkdir -p $'/top/directory/\n/etc/passwd\n') can't inject extra entries (in the above example, /etc/passwd).
Reading a NUL-delimited stream into an array looks like:
array=( )
while IFS= read -r -d '' dirname; do
    array+=( "$dirname" )
done
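Putting it together with the stdin-based invocation above, a minimal sketch of what your-script might look like (the processing step at the end is a placeholder):
#!/bin/bash
# your-script: consume NUL-delimited names from stdin into an array
array=( )
while IFS= read -r -d '' dirname; do
    array+=( "$dirname" )
done

printf 'read %d directories\n' "${#array[@]}"
for dir in "${array[@]}"; do
    printf 'processing: %s\n' "$dir"   # placeholder for the real work
done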

Related

How to copy a file to a new file with a new name in same directory but across multiple directories in bash?

I am trying to copy an existing file (found across multiple directories) to a new file with a new name in Bash. For example, 'Preview.json' to 'Performance.json'. I have tried using
find * -type f -name 'Preview.json' -exec cp {} {}"QA" \;
But ended up with 'Preview.jsonQA'. (I am new to Bash.) I have tried moving the "QA" in front of the {} but I got errors because of an invalid path.
In an -exec predicate, the symbol {} represents a path that is being considered, starting at one of the starting-point directories designated in the command. Example: start/dir2/Preview.json. You can form other file names by either prepending or appending characters, but whether that makes sense depends on the details. In your case, appending produces commands such as
cp start/dir2/Preview.json start/dir2/Preview.jsonQA
which is a plausible command in the event that start/dir2/Preview.json exists. But cp does not automatically create directories in the destination path, so the result of prepending characters ...
cp start/dir2/Preview.json QAstart/dir2/Preview.json
... is not as likely to be accepted -- it depends on directory QAstart/dir2 existing.
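If you did want that prepended layout, you would have to create the destination directory first; a minimal illustration:
mkdir -p QAstart/dir2 && cp start/dir2/Preview.json QAstart/dir2/Preview.json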
I think what you're actually looking for may be cp commands of the form ...
cp start/dir2/Preview.json start/dir2/QAPreview.json
... but find cannot do this by itself.
For more flexibility in handling the file names discovered by find, pipe its output into another command. If you want to pass them as command-line arguments to another command, then you can interpose the xargs command to achieve that. The command on the receiving end of the pipe can be a shell function or a compound command if you wish.
For example,
# Using ./* instead of * ensures that file names beginning with - will not
# be misinterpreted as options:
find ./* -type f -name 'Preview.json' |
    while IFS= read -r name; do   # read one line into the variable $name
        # The destination name must be computed differently if the name
        # contains a / character (the usual case) than if it doesn't:
        case "${name}" in
            */*) cp "${name}" "${name%/*}/QA${name##*/}" ;;
            *)   cp "${name}" "QA${name}" ;;
        esac
    done
Note that this assumes none of your file or directory names contain newline characters (the read command would split up newline-containing names). That's a reasonably safe assumption, but not an absolutely safe one.
Of course, you would generally want to have that in a script, not to try to type it on the fly on the command line.
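If you do need to be robust against newlines in the names, the same -print0/read -d '' pattern shown earlier on this page applies here as well; a minimal sketch:
find ./* -type f -name 'Preview.json' -print0 |
    while IFS= read -r -d '' name; do
        # every name produced by find ./* contains a slash,
        # so only the */* branch of the case is needed
        cp "$name" "${name%/*}/QA${name##*/}"
    done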

how to iterate over files using find in bash/ksh shell

I am using find in a loop to search recursively for files of a specific extension, and then do something with that loop.
cd $DSJobs
jobs=$(find $DSJobs -name "*.dsx")
for j in jobs; do
    echo "$j"
done
Assuming $DSJobs is a relevant folder, the output of $j is "jobs", printed one time; the loop doesn't even repeat.
I want to list all *.dsx files in a folder recursively through subfolders as well.
How do I make this work?
Thanks
The idiomatic way to do this is:
cd "$DSJobs"
find . -name "*.dsx" -print0 | while IFS= read -r -d "" job; do
    echo "$job"
done
The complication derives from the fact that space and newline are perfectly valid filename characters, so you get find to output the filenames separated by the null character (which is not allowed to appear in a filename). Then you tell read to use the null character (with -d "") as the delimiter while reading the names.
IFS= read -r var is the way to get bash to read the characters verbatim, without dropping any leading/trailing whitespace or any backslashes.
There are further complications regarding the use of the pipe: each side of a pipeline runs in a subshell, so any variables you set inside the loop are lost when it finishes. Whether that matters depends on what you do inside the loop.
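If those pipe complications do matter (say, you want a counter to survive the loop), process substitution keeps the loop in the current shell; a minimal sketch:
count=0
while IFS= read -r -d "" job; do
    echo "$job"
    (( count++ ))   # still visible after the loop: no pipe, no subshell
done < <(find . -name "*.dsx" -print0)
echo "found $count files"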
Note: take care to quote your variables, unless you know exactly when the quotes can safely be left off.
Having said that, bash can do this without find:
shopt -s globstar
cd "$DSJobs"
for job in **/*.dsx; do
    echo "$job"
done
This approach removes all the complications of find | while read.
Incorporating @Gordon's comment:
shopt -s globstar nullglob
for job in "$DSJobs"/**/*.dsx; do
    do_stuff_with "$job"
done
The "nullglob" setting is useful when no files match the pattern. Without it, the for loop will have a single iteration where job will have the value job='/path/to/DSJobs/**/*.dsx' (or whatever the contents of the variable) -- including the literal asterisks.
Since all you want is to find files with a specific extension...
find ${DSJobs} -name "*.dsx"
Want to do this for several directories?
for d in <some list of directories>; do
    find "${d}" -name "*.dsx"
done
Want to do something interesting with the files?
find ${DSJobs} -name "*.dsx" -exec dostuffwith.sh "{}" \;

How to make this (l)unix script dynamically accept directory name in for-loop?

I am teaching myself more (l)unix skills and wanted to see if I could begin to write a program that will eventually read all .gz files and expand them. However, I want it to be super dynamic.
#!/bin/bash
dir=~/derp/herp/path/goes/here
for file in $(find "$dir" -name '*gz')
do
    echo $file
done
So when I execute this file, I simply run
bash derp.sh.
I don't like this. I feel the script is too brittle.
How can I rework my for loop so that I can say
bash derp.sh ~/derp/herp/path/goes/here (1)
I tried re-coding it as follows:
for file in $*
However, I don't want to have to type in:
bash derp.sh ~/derp/herp/path/goes/here/*.gz
How could I rewrite this so that I could simply type what is in (1)? I feel I must be missing something simple.
Note
I tried
for file in $*/*.gz, and that obviously did not work. I appreciate your assistance; my sources have been a Wrox Unix text, carpentry v5, and man pages. Unfortunately, I haven't found anything that does what I want.
Thanks,
GeekyOmega
for dir in "$#"
do
for file in "$dir"/*.gz
do
echo $file
done
done
Notes:
In the outer loop, dir is assigned successively to each argument given on the command line. The special form "$@" is used so that directory names containing spaces will be processed correctly.
The inner loop runs over each .gz file in the given directory. By placing $dir in double quotes, the loop works correctly even if the directory name contains spaces. This form also works correctly if the .gz file names have spaces.
#!/bin/bash
for file in $(find "$@" -name '*.gz')
do
    echo "$file"
done
You'll probably prefer "$@" instead of $*; if you were to have spaces in directory names, as with a directory named My Documents and a directory named Music, $* would effectively expand into:
find My Documents Music -name '*.gz'
where "$#" would expand into:
find "My Documents" "Music" -name '*.gz'
Requisite note: using for file in $(find ...) is generally regarded as bad practice, because it tends to break if you have spaces or newlines in your directory structure. Using nested for loops (as in John's answer) is often a better idea, or using find -print0 with read, as shown below.
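For completeness, a minimal sketch of that find -print0 and read combination applied to this question:
#!/bin/bash
# NUL-delimited: robust even for names containing spaces or newlines
find "$@" -name '*.gz' -print0 | while IFS= read -r -d '' file; do
    echo "$file"
done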

How to open all files in a directory in Bourne shell script?

How can I use the relative path or absolute path as a single command line argument in a shell script?
For example, suppose my shell script is on my Desktop and I want to loop through all the text files in a folder that is somewhere in the file system.
I tried sh myshscript.sh /home/user/Desktop, but this doesn't seem feasible. And how would I avoid problems with directory names and file names containing whitespace?
myshscript.sh contains:
for i in `ls`
do
    cat $i
done
Superficially, you might write:
cd "${1:-.}" || exit 1
for file in *
do
cat "$file"
done
except you don't really need the for loop in this case:
cd "${1:-.}" || exit 1
cat *
would do the job. And you could avoid the cd operation with:
cat "${1:-.}"/*
which lists (cats) all the files in the given directory, even if the directory or the file names contains spaces, newlines or other difficult to manage characters. You can use any appropriate glob pattern in place of * — if you want files ending .txt, then use *.txt as the pattern, for example.
This breaks down if you might have so many files that the argument list is too long. In that case, you probably need to use find:
find "${1:-.}" -type f -maxdepth 1 -exec cat {} +
(Note that -maxdepth is a GNU find extension.)
Avoid using ls to generate lists of file names, especially if the script has to be robust in the face of spaces, newlines etc in the names.
Use a glob instead of ls, and quote the loop variable:
for i in "$1"/*.txt
do
cat "$i"
done
PS: ShellCheck automatically points this out.

How would I flatten and overlay multiple directories into one directory?

I want to take a list of directory hierarchies and flatten them into a single directory. Any duplicate file later in the list will replace an earlier file. For example...
foo/This/That.pm
bar/This/That.pm
bar/Some/Module.pm
wiff/This/That.pm
wiff/A/Thing/Here.pm
This would wind up with
This/That.pm # from wiff/
Some/Module.pm # from bar/
A/Thing/Here.pm # from wiff/
I have a probably overcomplicated Perl program to do this. I'm interested in the clever ways SO users might solve it. The big hurdle is "create the intermediate directories if necessary", perhaps with some combination of basename and dirname.
The real problem I'm solving is checking the difference between two installed Perl libraries. I'm first flattening the multiple library directories for each Perl into a single directory, simulating how Perl would search for a module. I can then diff -r them.
If you do not mind the final order of the entries, I guess this can do the job:
#!/bin/bash
declare -A directory
while IFS= read -r line; do
    directory["${line#*/}"]=${line%%/*}
done < "$1"

for entry in "${!directory[@]}"; do
    printf "%s\t# from %s/\n" "$entry" "${directory[$entry]}"
done
Output:
$ ./script.sh files.txt
A/Thing/Here.pm # from wiff/
This/That.pm # from wiff/
Some/Module.pm # from bar/
And if you need to move the files, then you can simply replace the printing step with an mv or cp, creating the intermediate directories first (the dirname combination you suspected you'd need):
for entry in "${!directory[@]}"; do
    mkdir -p "your_dir_path/$(dirname "$entry")"   # create intermediate directories as needed
    mv "${directory[$entry]}/$entry" "your_dir_path/$entry"
done
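In case it is useful, a hedged sketch of driving the script end to end, reusing the hypothetical names from above (files.txt, script.sh, your_dir_path):
find foo bar wiff -type f > files.txt   # starting points in priority order; later entries win
./script.sh files.txt
For the diff use case from the question, you would flatten each set of library directories into its own directory with the mv/cp variant and then run diff -r on the two results.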
