Facing issues in making a bash script work - linux

I'm new to Bash scripting. My script's intended role is to access a provided path and then apply some software (RTG - Real Time Genomics) commands to the data in that path. However, when I try to execute the script from the CLI, it gives me the following error:
ERROR:There were invalid input file paths
The path I have provided in the script is accurate. That is, in the original directory, where the program 'RTG' resides, I have made folders accordingly, like /data/reads/NA19240, and placed both *_1.fastq and *_2.fastq files inside NA19240.
Here is the script:
#!/bin/bash
for left_fastq in /data/reads/NA19240/*_1.fastq; do
    right_fastq=${left_fastq/_1.fastq/_2.fastq}
    lane_id=$(basename ${left_fastq/_1.fastq})
    rtg format -f fastq -q sanger -o ${lane_id} -l ${left_fastq} -r ${right_fastq} --sam-rg "#RG\tID:${lane_id}\tSM:NA19240\tPL:ILLUMINA"
done
I have tried many workarounds but still haven't been able to get past this error. I would be really grateful if you could help me fix this problem. Thanks.
After adding set -aux to the script for debugging purposes, I'm now getting the following output:
adnan#adnan-VirtualBox[Linux] ./format.sh
+ for left_fastq in '/data/reads/NA19240/*_1.fastq'
+ right_fastq='/data/reads/NA19240/*_2.fastq'
++ basename '/data/reads/NA19240/*'
+ lane_id='*'
+ ./rtg format -f fastq -q sanger -o '*' -l '/data/reads/NA19240/*_1.fastq' -r '/data/reads/NA19240/*_2.fastq' --sam-rg '#RG\tID:*\tSM:NA19240\tPL:ILLUMINA'
Error: File not found: "/data/reads/NA19240/*_1.fastq"
Error: File not found: "/data/reads/NA19240/*_2.fastq"
Error: There were 2 invalid input file paths

You need to set the nullglob option in the script, like so:
shopt -s nullglob
By default, non-matching globs expand to themselves. The output you got after setting set -aux shows that the glob /data/reads/NA19240/*_1.fastq is being passed through literally. That only happens when no files match and nullglob is disabled.
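A minimal way to see the two behaviours side by side, using a throwaway empty directory:

```shell
#!/bin/bash
# Compare globbing behaviour with and without nullglob,
# in a temporary directory that contains no matching files.
tmpdir=$(mktemp -d)

for f in "$tmpdir"/*_1.fastq; do
    # Without nullglob, the unmatched pattern is passed through literally.
    echo "without nullglob: $f"
done

shopt -s nullglob
for f in "$tmpdir"/*_1.fastq; do
    # With nullglob, an unmatched pattern expands to nothing,
    # so this loop body never runs.
    echo "with nullglob: $f"
done

rmdir "$tmpdir"
```

The first loop iterates once with the literal pattern as its value; the second never enters its body.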

In the original directory, where the program 'RTG' resides, I have
made folders accordingly like /data/reads/NA19240 and placed both
*_1.fastq and *_2.fastq files inside NA19240.
So you say your data folders are in the original directory (whatever that may be), but in the script you wrongly specify them as being under the root directory (because of the leading /).
Since you start the script in the original directory, just drop the leading / and use a relative path:
for left_fastq in data/reads/NA19240/*_1.fastq

Related

Exclude specific files from zip command line

I'm using zip to create a backup.
In the directory I'm processing, there are some files not intended to be included. My issue is that the filenames are named in this style:
abc-1 (excluded)
abc-2 (excluded)
abc-3.ini (included)
I don't know how to specify the -x option on the zip command line so that the first two files, which have no extension, are left out, and the third one is included.
I've tried
zip -r mybackup.zip mydir -x mydir/abc-*.
but it's not working.
Thanks!
Well, this worked:
zip -r mybackup.zip mydir -x mydir/abc-!(*.*)
...but only from the command line. When I tried to include the command in a bash script, it didn't work, and I had to add the following line before it:
shopt -s extglob
which enables extended globbing.
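A small self-contained reproduction of this, using a throwaway directory in place of the real backup directory:

```shell
#!/bin/bash
# In a script, extglob must be enabled before the extended
# pattern is parsed, so shopt comes first.
shopt -s extglob

tmp=$(mktemp -d)
cd "$tmp"
touch abc-1 abc-2 abc-3.ini

# !(*.*) matches names containing no dot, so only the
# extensionless files abc-1 and abc-2 match here.
printf '%s\n' abc-!(*.*)

cd / && rm -rf "$tmp"
```

The same pattern works for zip's -x argument once the shell is allowed to parse it.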

cpanel shell script on cloud server centos no such file or dir

I have a shell script copy_files.sh which I call once a day using a cron job.
However, it has never worked; I keep getting 'no such file or directory'.
!#/bin/sh
for f in /home/site1/public_html/admin/data/*.csv
do
    cp -v "$f" /home/site2/app/cron_jobs/data/"${f%.csv}".csv
done
I have checked via ssh that all paths are correct. I have verified the path to /bin/sh using ls -l /bin/sh. I have set the permissions, user and group to root for copy_files.sh. I have disabled php open_basedir protection.
The shell script is in /home/site2/.
Any ideas why I am still getting 'no such file or directory'?
Is there any way to check that open_basedir protection is off? That said, considering the script is owned by root, I don't see that being the problem, unless it's executed as the site2 user and not root.
Because of the way you use shell expansion, the variable in your for loop contains the absolute path to your files. Since you already have the absolute path, there is no need to use string manipulation (%), nor do you need to append ".csv" to the filename; get rid of all of that and simply provide the directory you're copying to as the second argument to cp, as in the example below.
#!/bin/sh
for f in /home/site1/public_html/admin/data/*.csv; do
    cp -v "$f" /home/site2/app/cron_jobs/data
done
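If you do want to rebuild the destination filename explicitly, strip the directory part with basename first; in the original script, "$f" was absolute, so "${f%.csv}.csv" reproduced the whole source path and glued it onto the destination directory, yielding a nonexistent path. A sketch using mktemp stand-ins for the real site1/site2 paths:

```shell
#!/bin/sh
# Copy CSV files one by one, reconstructing the destination name.
# The key point: strip "$f" down to the bare filename before
# appending it to the destination directory.
src=$(mktemp -d)
dst=$(mktemp -d)
echo "a,b,c" > "$src/report.csv"

for f in "$src"/*.csv; do
    name=$(basename "$f")          # "report.csv", directory stripped
    cp -v "$f" "$dst/$name"
done
```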

General file paths for replicating shell scripts on any linux OS

I'm writing a shell script that is intended to be replicated in any user's directory. The same script should be able to recognize any user's directory structure. Since I don't know every user's directory structure, I wasn't able to determine the parent folder in which to run some commands inside my script (the find command in the script). I would really appreciate your help.
As you can see in my code below, I have three different types of paths ~/Desktop/input_folder/source.txt, ~/Desktop/output_folder/FILE_${array[$i]}_${array[$((i+1))]}.txt, and ../shares/path/FOLDER_WITH_MANY_FILES.
The third path points somewhere I don't know the location of, so I used ../ to tell the script to assume the parent folder. For the first and second paths, I used ~/ because they are under /home/username/. Am I doing it right? Do these paths need double quotes ("/path/blah") in order to be read by the shell? I would appreciate your help.
Thanks!
My code:
#!/bin/bash
#source file location
input=$(cat ~/Desktop/input_folder/source.txt)
#read file
IFS=',' read -r -a array <<< "$input"
for ((i=0; i<${#array[@]}-1; i+=2)); do
    #Create the file
    touch ~/Desktop/output_folder/FILE_${array[$i]}_${array[$((i+1))]}.txt
    echo "Search result for these parameters [${array[$i]},${array[$((i+1))]}]: "$'\r' >> ~/Desktop/output_folder/FILE_${array[$i]}_${array[$((i+1))]}.txt
    #Find pattern and save it in file
    find ../shares/path/FOLDER_WITH_MANY_FILES -name "*${array[$((i+1))]}*" -exec grep -l "${array[$i]}" {} + | tee -a ~/Desktop/output_folder/FILE_${array[$i]}_${array[$((i+1))]}.txt
done

Change directory to path of parent/calling script in bash

I have dozens of scripts, all in different directories. (exported/expanded Talend jobs)
At this moment each job has 1 or 2 scripts, starting with the same lines, the most important one being:
cd ***path-to-script***
and several lines to set the Java path and start the job.
I want to create a script which will be run from all these scripts.
e.g.:
/scripts/talend.sh
And in all Talend scripts, the first line will run /scripts/talend.sh. Some examples of where these scripts are run from:
/talend-job1_0.1/talend-job1_0.1/talend-job1/talend-job1.sh
/talend-task2_0.1/talend-task2_0.1/talend-task2/talend-task2.sh
/talend-job3_0.1/talend-job3_0.1/talend-job3/talend-job3.sh
How can I determine where /scripts/talend.sh is started from, so I can cd to that path from within /scripts/talend.sh?
The Talend scripts are not run from within the directory itself, but from a cron job, or from a different user's home directory.
EDIT:
The question was marked as duplicate, but Getting the source directory of a Bash script from within is not answering my question 100%.
Problem is:
- The basic script is being called from different scripts.
- Those different scripts can be run from the command line, with or without a symbolic link.
- $0, $BASH_SOURCE and pwd each get partway there, but no solution mentioned covers all the difficulties.
Example:
/scripts/talend.sh
In this script I want to configure the $PATH and $HOME_PATH of Java, and CD to the place where the Talend job is placed. (It's a package, so that script MUST be run from that location).
Paths to the jobs are, for example:
/u/talend/talendjob1/sub../../talendjob1.sh
/u/talend/talendjob2/sub../../talendjob2.sh
/u/talend/talendjob3/sub../../talendjob3.sh
Multiple jobs are run from a TMS application. This application cannot run these scripts by their whole name (too long; the name can be at most 6 characters), so in a different location I have symbolic links:
/u/tms/links/p00001 -> /u/talend/talendjob1/sub../../talendjob1.sh
/u/tms/links/p00002 -> /u/talend/talendjob1/sub../../talendjob2.sh
/u/tms/links/p00003 -> /u/talend/talendjob1/sub../../talendjob3.sh
/u/tms/links/p00004 -> /u/talend/talendjob1/sub../../talendjob4.sh
I think this gives you an overview of the complexity, and why I want only one basic Talend script where I can keep all the shared setup. But I can only do that if I know the source directory of the calling Talend script, because that is where I have to be to start the Talend job.
These answers (beyond the first) are specific to Linux, but should be very robust there -- working with directory names containing spaces, literal newlines, wildcard characters, etc.
To change to your own source directory (a FAQ covered elsewhere):
cd "$(dirname "$BASH_SOURCE")"
To change to your parent process's current directory:
cd "/proc/$PPID/cwd"
If you want to change to the directory passed as the first command-line argument to your parent process:
{ IFS= read -r -d '' _ && IFS= read -r -d '' argv1; } <"/proc/$PPID/cmdline"
cd "$argv1"
That said, personally, I'd just export the job directory as an environment variable in the parent process, and read that environment variable in the children. Much, much simpler, more portable, more accurate, and more in line with best practice.
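A sketch of that environment-variable approach, with mktemp stand-ins for the real job and script paths:

```shell
#!/bin/bash
# Each job script exports its own directory before calling the shared
# script, which just reads the variable. Both scripts below are
# generated under a temporary directory purely for demonstration.
tmp=$(mktemp -d)

# The shared script: /scripts/talend.sh in the question.
cat > "$tmp/talend.sh" <<'EOF'
#!/bin/bash
# Fail loudly if the calling job forgot to export JOB_DIR.
cd "${JOB_DIR:?JOB_DIR must be exported by the calling job script}"
pwd
EOF

# A job script: exports its own location, then calls the shared script.
cat > "$tmp/job1.sh" <<EOF
#!/bin/bash
export JOB_DIR="\$(cd "\$(dirname "\$0")" && pwd)"
bash "$tmp/talend.sh"
EOF

bash "$tmp/job1.sh"    # prints the directory containing job1.sh
```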
You can store pwd in a variable and then cd to it when you want to go back.
This works for me:
In
/scripts/talend.sh
do
cd "${1%/*}"
${1%/*} strips off everything after the last /, effectively giving the dirname of $1, which is the path to the script that calls this one.
and then call the script with the line:
/scripts/talend.sh "$0"
Calling the script with $0 passes the name of the current script as an argument to the child which as shown above can be used to cd to the correct directory.
When you source /scripts/talend.sh the current directory is unchanged:
The scripts
# cat /scripts/talend.sh
echo "Talend: $(pwd)"
# cat /talend-job1_0.1/talend-job1_0.1/talend-job1/talend-job1.sh
echo Job1
. /scripts/talend.sh
Executing job1:
# cd /talend-job1_0.1/talend-job1_0.1
# talend-job1/talend-job1.sh
Job1
Talend: /talend-job1_0.1/talend-job1_0.1
When you want to see the dir the calling script is in, see get dir of script.
EDIT:
When you want to have the path of the calling script (talend-job1.sh) without having to cd to that dir first, you should get the dir of the script (see the link above) and source talend.sh:
# cat /scripts/talend.sh
cd "$( dirname "${BASH_SOURCE[0]}" )"
echo "Talend: $(pwd)"
In talend.sh get the name of the calling script and then the directory:
parent_cmd=$(ps -o args= $PPID)
set -- $parent_cmd
parent_cmd=$(dirname $2)
Update: as pointed by Charles Duffy in the comments below this will cause havoc when used with paths containing white-space or glob patterns.
If procfs is available, you could read the content of /proc/$PPID/cmdline; or, if portability is a concern, do a better job of parsing the args.
In /scripts/talend.sh:
cd "$(dirname "$0")"
Or:
cd "$(dirname "$BASH_SOURCE")"
Another one is:
cd "$(dirname "$_")"
#This must be the first line of your script after the shebang line
#Otherwise don't use it
Note: The most reliable of the above is $BASH_SOURCE
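One more wrinkle for this particular question: the jobs are started through symbolic links (/u/tms/links/p00001 and friends), so $0 and $BASH_SOURCE hold the link path, not the real script path. On Linux you can resolve the link first with readlink -f; a sketch:

```shell
#!/bin/bash
# Sketch of /scripts/talend.sh: resolve the symlink chain first, so a
# job started via a short link still cds to the directory of the real
# script rather than the directory of the link.
script_path=$(readlink -f "${BASH_SOURCE[0]}")
cd "$(dirname "$script_path")"
pwd
```

Note that readlink -f is a GNU extension; on other systems, realpath or a manual loop over readlink is needed.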

shell script : appending directory path and filename

I want to copy a file from a directory using a shell script.
Suppose I save the directory and file name separately as
dir=/home/user/directory/
file=file_1
To copy the file, I'm using this command in my script:
cp $dir$file .
But I get this error:
/bin/cp omitting directory '/home/user/directory'
I have tried all combinations, e.g. omitting the trailing slash from the variable dir, etc., but nothing is working. I can't understand what is wrong with this code. Please help.
Maybe the expression $dir$file is not being expanded by the shell (i.e. only the directory variable is being expanded, not the file variable)!
It looks like you are having a problem with expansion in cp $dir$file. To prevent possible problems, it is better to protect your variables with braces and double-quote the full path/file, to make sure you don't get caught out by spaces in either the filename or, heaven forbid, the user's dirname:
cp "${dir}${file}" .
This prevents the possibility that the second $ is missed. Also make sure you have read access to the other user's /home (if you are root or using sudo you should be fine).
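A small demonstration of why the braces and quotes matter, using a throwaway directory whose name contains a space:

```shell
#!/bin/bash
# Concatenating a directory variable and a filename variable safely.
# All paths are mktemp stand-ins for the real /home/user/directory/.
tmp=$(mktemp -d)
mkdir "$tmp/my dir"
echo data > "$tmp/my dir/file_1"

dir="$tmp/my dir/"
file=file_1

# Unquoted, cp $dir$file would be word-split at the space into two
# arguments; braced and quoted, the path survives as one argument.
cp "${dir}${file}" "$tmp/copy_of_file_1"
cat "$tmp/copy_of_file_1"    # prints: data
```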
If you see this, then you somehow assigned an empty string to file somewhere. Search your script for file= and unset file.
You can also debug this by adding
echo ".${file}."
in the line before the cp command. I'm pretty sure it prints .., i.e. the variable is empty or doesn't exist.
