I am trying to write a shell script that loops through all the directories under a Parent Directory and skip the directories that have empty folder "I_AM_Already_Processed" at leaf level.
Parent directory is provided as input to shell script as:
. selectiveIteration.sh /Employee
Structure under parent directory is shown below
( Employee directory contains data bifurcated by yearly -> monthly -> daily -> hourly basis )
/Employee/alerts/output/2014/10/08/HOURS/Actual_Files
Shell script is trying to find out which directory is not already processed. For Example:
Let us consider three hours of data for Date : 10/08/2014
1. /USD/alerts/output/2014/10/08/2(hourly_directory)/Actual_file +
directory_with_name(I_AM_Already_Processed)
2. /USD/alerts/output/2014/10/08/3(hourly_directory)/Actual_file +
directory_with_name(I_AM_Already_Processed)
3. /USD/alerts/output/2014/10/08/(hourly_directory)/Actual_file
in above example leaf directories 2 and 3 are already processed as they contain the folder named
"I_AM_Already_Processed" and whereas directory 4 is not already processed.
So shell script should skip folders 2, 3 but should process directory 4 ( print this directory in output).
Research/work I did:
Till now i am being able to iterate through the directory structure and go through all folders/files from root to leaf level, but i am not sure how to check for specific file and skip the directory if that file is present. ( i was able to do this much after referring few tutorials and older posts on StackOverflow)
I am newbie to shell scripting, this is my first time writing shell script, so if this too basic question to ask please excuse me. Trying to learn.
Any suggestion is welcome. Thanks in advance.
To check if a some_directory has already been processed, just do something like
find some_directory -type d -links 2 -name 'I_AM_Already_Processed'
Which will return the directory name if it has, or nothing if it hasn't. Note -links 2 tests if the directory is a leaf (meaning it only has links to its parent and itself, but not to any subdirectories). See this answer for more information.
So in a script, you could do
#!/bin/bash
directory_list=(/dir1 /dir2)
for dir in "${directory_list[#]}"; do
if [[ -n $(find "$dir" -type d -links 2 -name 'I_AM_Already_Processed' -print -quit) ]]; then
echo 'Has been processed'
else
echo 'Has not been processed'
fi
Related
I've got (what feels like) a fairly simple problem but my complete lack of experience in bash has left me stumped. I've spent all day trying to synthesize a script from many different SO threads explaining how to do specific things with unintuitive commands, but I can't figure out how to make them work together for the life of me.
Here is my situation: I've got a directory full of nested folders each containing a file with extension .7 and another file with extension .pc, plus a whole bunch of unrelated stuff. It looks like this:
Folder A
Folder 1
Folder x
data_01.7
helper_01.pc
...
Folder y
data_02.7
helper_02.pc
...
...
Folder 2
Folder z
data_03.7
helper_03.pc
...
...
Folder B
...
I've got a script that I need to run in each of these folders that takes in the name of the .7 file as an input.
pc_script -f data.7 -flag1 -other_flags
The current working directory needs to be the folder with the .7 file when running the script and the helper.pc file also needs to be present in it. After the script is finished running, there are a ton of new files and directories. However, I need to take just one of those output files, result.h5, and copy it to a new directory maintaining the same folder structure but with a new name:
Result Folder/Folder A/Folder 1/Folder x/new_result1.h5
I then need to run the same script again with a different flag, flag2, and copy the new version of that output file to the same result directory with a different name, new_result2.h5.
The folders all have pretty arbitrary names, though there aren't any spaces or special characters beyond underscores.
Here is an example of what I've tried:
#!/bin/bash
DIR=".../project/data"
for d in */ ; do
for e in */ ; do
for f in */ ; do
for PFILE in *.7 ; do
echo "$d/$e/$f/$PFILE"
cd "$DIR/$d/$e/$f"
echo "Performing operation 1"
pc_script -f "$PFILE" -flag1
mkdir -p ".../results/$d/$e/$f"
mv "results.h5" ".../project/results/$d/$e/$f/new_results1.h5"
echo "Performing operation 2"
pc_script -f "$PFILE" -flag 2
mv "results.h5" ".../project/results/$d/$e/$f/new_results2.h5"
done
done
done
done
Obviously, this didn't work. I've also tried using find with -execdir but then I couldn't figure out how to insert the name of the file into the script flag. I'd appreciate any help or suggestions on how to carry this out.
Another, perhaps more flexible, approach to the problem is to use the find command with the -exec option to run a short "helper-script" for each file found below a directory path that ends in ".7". The -name option allows find to locate all files ending in ".7" below a given directory using simple file-globbing (wildcards). The helper-script then performs the same operation on each file found by find and handles moving the result.h5 to the proper directory.
The form of the command will be:
find /path/to/search -type f -name "*.7" -exec /path/to/helper-script '{}` \;
Where the -f option tells find to only return files (not directories) ending in ".7". Your helper-script needs to be executable (e.g. chmod +x helper-script) and unless it is in your PATH, you must provide the full path to the script in the find command. The '{}' will be replaced by the filename (including relative path) and passed as an argument to your helper-script. The \; simply terminates the command executed by -exec.
(note there is another form for -exec called -execdir and another terminator '+' that can be used to process the command on all files in a given directory -- that is a bit safer, but has additional PATH requirements for the command being run. Since you have only one ".7" file per-directory -- there isn't much benefit here)
The helper-script just does what you need to do in each directory. Based on your description it could be something like the following:
#!/bin/bash
dir="${1%/*}" ## trim file.7 from end of path
cd "$dir" || { ## change to directory or handle error
printf "unable to change to directory %s\n" "$dir" >&2
exit 1
}
destdir="/Result_Folder/$dir" ## set destination dir for result.h5
mkdir -p "$destdir" || { ## create with all parent dirs or exit
printf "unable to create directory %s\n" "$dir" >&2
exit 1
}
ls *.pc 2>/dev/null || exit 1 ## check .pc file exists or exit
file7="${1##*/}" ## trim path from file.7 name
pc_script -f "$file7" -flags1 -other_flags ## first run
## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result1.h5"
pc_script -f "$file7" -flags2 -other_flags ## second run
## check result.h5 exists and non-empty and copy to destdir
[ -s "result.h5" ] && cp -a "result.h5" "$destdir/new_result2.h5"
Which essentially stores the path part of the file.7 argument in dir and changes to that directory. If unable to change to the directory (due to read-permissions, etc..) the error is handled and the script exits. Next the full directory structure is created below your Result_Folder with mkdir -p with the same error handling if the directory cannot be created.
ls is used as a simple check to verify that a file ending in ".pc" exits in that directory. There are other ways to do this by piping the results to wc -l, but that spawns additional subshells that are best avoided.
(also note that Linux and Mac have files ending in ".pc" for use by pkg-config used when building programs from source -- they should not conflict with your files -- but be aware they exists in case you start chasing why weird ".pc" files are found)
After all tests are performed, the path is trimmed from the current ".7" filename storing just the filename in file7. The file7 variabli is then used in your pc_script command (which should also include the full path to the script if not in you PATH). After the pc_script is run [ -s "result.h5" ] is used to verify that result.h5 exists and is non-empty before moving that file to your Result_Folder location.
That should get you started. Using find to locate all .7 files is a simple way to let the tool designed to find the files for you do its job -- rather than trying to hand-roll your own solution. That way you only have to concentrate on what should be done for each file found. (note: I don't have pc_script or the files, so I have not testes this end-to-end, but it should be very close if not right-on-the-money)
There is nothing wrong in writing your own routine, but using find eliminates a lot of area where bugs can hide in your own solution.
Let me know if you have further questions.
I have no background of shell script. I would like to perform the concatenation of files in all subdirectories recursively, which I was able to achieve by the help of previous posts on stack overflow. But I am unable to name the concatenated file on the name of files which are getting concatenated e.g.
Subdirectories:
Sample_001
Sample_002
Sample_003
Sample_004
Files in sub directory: Sample_001:
Sample_001-123_R1_001.fastq.gz
Sample_001-123_R1_002.fastq.gz
I used command :
For d in ./*/; do( cd "$d" && cat R1 > d_R1.fastq); done
My question is how I can use 001 from my file name instead of d in "d_R1.fastq".
I was unable to understand how I can achieve this and where I should put another command in the loop I was using.
Thank you in advance for kind help.
I came across a problem in Bash when I would try to only open images based upon the information stored in .txt files about them. I am trying to sort a number of images by size or height, and display an image with them in the sorted order, but if there exists a .jpg in the folder without a .txt file with the same name, it should not process it.
I have the sorting piece of my situation done, and am trying to figure out how I would go about opening only the images that have a .jpg extension WITH a .txt file.
I figured a solution would look like me putting every .jpg's name (without extension) in a list and then process through the list and run something like:
[if -f $filename.txt ]; then ~~~
but I came across the problem of iterating through without a for-loop, or else all the pictures would open multiple times. My attempt was:
for i in *jpg; do
y=$y ${i.jpg}
done
if[ -f $y.txt ] then
(sorting parts)
This only looked at the last filename in y, as it should, but I am trying to figure out a way to look at each separate filename and see if there exists that textfile, in order to include it in the sorting.
Thanks so much for your help!
Collecting a list of file names in a single variable is an antipattern. You want to collect them in an array instead.
a=()
for f in *.jpg; do
if [ -e "${f%.jpg}".txt ]; then
continue
fi
a+=("$f")
done
# now do things with "${a[#]}"
Frequently, you don't really need to collect the files in an array -- just do everything you were doing inside the for loop to each individual file as you traverse the files.
(And actually y=$y ${i%.jpg} doesn't append to y -- it sets y to itself for the duration of attempting to execute a file named i sans the .jpg extension, which would most likely fail in the vast majority of cases.)
I would do the file check first such that find just reports files that have a corresponding text file. The following snippet will just display jpg files that have a corresponding txt file:
find . -name "*.jpg" -maxdepth 1 -exec /bin/bash -c '[ -e "${0%.*}.txt" ] && echo "$0";' {} \;
I need to unzip a bunch of student assignment (jar) files so that I can use a script to submit the contents to the Moss (Stanford) plagiarism detection server. I did the same thing in Java which was trivial but I'm trying to re-implement to as a bash script.
I am trying to do the following:
Get a list of student names (each student has a directory).
In each student directory, sub-directories exist numbered from 1 to the
latest submission. I need to get the directory with the highest
number.
Inside of each of those submission directories contains a
jar file that I need. I copy each jar into a temp directory with the
same name as the student and unzip it.
I need that temp directory listing formatted as a string in the form
/tempDir/studentName1/.languageExt /tempDir/studentName2/.languageExt
The student directory has the basic structure:
Student_Root_Directory:
Student1
Student2
Student1
Sub-Directories: 1 2 3 4 5
1: student1.jar
2: student1.jar
...
Student2
Sub-Directories: 1 2 3
1. student2.jar
...
To do the first 3 steps above I did:
#!/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: .c, .cpp, .java, .py
students=`ls $1`
student_dir=$1
languageExt=$2
mossDir="/home/moss"
tempDir="/home/moss/tempJarStorage"
for student in $students
do
latestSubmissionDir=`ls -t $student_dir/$student | head -1`
for jarDir in $latestSubmissionDir
do
mkdir $tempDir/$student
cp $student_dir/$student/$jarDir/*.jar $tempDir/$student
unzip -d $tempDir/$student/ -o -j $tempDir/$student/$student.jar *.$languageExt
rm $tempDir/$student/$student.jar
done
done
...which results in a number of student directories being created in a temp directory that contains only the unzipped contents for the student submissions.
I need the ls output of the new temp directories formatted as a string that contains:
/tempDir/studentName1/\*.languageExt /tempDir/studentName2/\*.languageExt
I have tried variations on
find "$tempDir" -iname "*.$languageExt" -printf "%p/*.$languageExt"
using iname and not - but I either have output that contains extra directory information such as $tempDir/*.languageExt (when I just need the subdirectories $tempDir/$studentName/*.languageExt) or I have output where the path for every source file is also listed such as:
$tempDir/$studentName/studentNameA.java
$tempDir/$studentName/studentNameB.java
when I only need
$tempDir/$studentName/*.java
I think this should be really easy and I'm just over thinking it. Any hints for improving the script also appreciated.
Here's a revised version of the script hat may work:
#/bin/bash
# Extract all jar files into a temp directory called /home/moss/tempJarFiles/studentName
# $1 is the command line argument that contains the path to the institution submission dir.
# $2 is the language extension: c, cpp, java, py
students_dir=$1
languageExt=$2
studentPathsT=( "$students_dir"/*/ )
mossDir='/home/moss'
tempDir='/home/moss/tempJarStorage'
for studentPathT in "${studentPathsT[#]}"; do
student=$(basename "$studentPathT")
mkdir "$tempDir/$student"
submissionDirsT=( "$studentPathT"*/ )
latestSubmissionDirT=${submissionDirsT[${#submissionDirsT[#]-1]}
cp "$latestSubmissionDirT"*.jar "$tempDir/$student/"
unzip -d "$tempDir/$student/" -o -j "$tempDir/$student/*.jar" "*.$languageExt"
rm "$tempDir/$student"/*.jar
done
# Note that at this point `"$tempDir"/*/*.$languageExt` would expand
# to all extracted submission files, across all students.
# Finally, output each student's extracted files as an unexpanded glob à la
# /{tempDir}/{studentName1}/*.{languageExt}
for pT in "$tempDir"/*/; do
echo "$pT*.$languageExt"
# Note: If there is a chance that your filenames contain
# embedded newlines (rare in practice) using `echo` won't work properly
# as #Charles Duffy points out.
# If that is a concern, use
# printf '%s\0' "$pT*.$languageExt"
# and process the output with a utility that can process NUL characters
# as separators, such as `xargs -0`.
done
It avoids using ls and only uses pathname expansion and array variables so as to properly deal with paths that contain embedded spaces and other shell metacharacters.
suffix ...T in variable names indicates that a particular path or array of paths is *T*erminated, i.e, that it ends in a /.
The assumption is that the numbered subdirectories do not go beyond 9, as the implicit lexical sorting of pathname expansion is relied upon; if the numbers go higher, explicit numerical sorting must be applied.
Note that the globs (pathname patterns) passed to unzip are intentionally double-quoted, as they should be interpreted by unzip, not the shell.
Note that, based on your original code, I've assumed that $languageExt does NOT start with . (e.g., cpp rather than .cpp), despite what your comment says.
I am not a Linux user, so bash and shell are new for me.
I need a code that runs 2 scripts for all file extensions ".fal" that are located in the folder(and sub-folders preferably) that I run the code in.
E.g:
dos2unixfortxtandfal """""""that code runs for all files in the folder already
and
for all ".fal" files in this folder,
Do
eine_fal_macher (here the .fal files 1 by one) Versuch.txt
Done
eine_fal_marcher --> this is the script that runs in the moment only once
(here the .fal files 1 by one) --> this is input file 1
Versuch.txt--> this is input file 2 (same for all) (from the same
folder)
In the end I want to do the following in the terminal:
frdc09927:\Frdc09927\z183464\DOE_Wellen\21a>
frdc09927:\Frdc09927\z183464\DOE_Wellen\21a>script.bash --> Enter
frdc09927:\Frdc09927\z183464\DOE_Wellen\21b>script.bash --> Enter
frdc09927:\Frdc09927\z183464\DOE_Wellen\21c>script.bash --> Enter
find . -name \*.fal -exec eine_fal_macher {} Versuch.txt \;
This runs for all *.fal files in the current directory and its subdirectories. Use -maxdepth 1 as first option to limit it to the current directory only, or give a different working directory than . to have find search somewhere else. {} is replaced with the "found" filename, honoring things like spaces in the filename automatically.
I could start explaining find at this point, but you should really rather have a look at man find instead. This tool is extremely powerful, and can reduce rather complex problems (like acting on the age of files, their owners etc.) to a one-liner.
Try something like this:
for i in `ls *.fal`; do command1 $i && command2 $i; done
command2 is only executed for a specific file if command1 does not return an errorcode
I'm not sure I fully understand the requirement, but here goes (trying to follow your pseudo code):
for FILE in `find . -name "*.fal"`
do
eine_fal_macher "${FILE}" Versuch.txt
done