I have to use a .sh script to unpack and prepare some databases. The code is the following:
#
# Downloads and unzips all required data for AlphaFold.
#
# Usage: bash download_all_data.sh /path/to/download/directory
set -e
DOWNLOAD_DIR="$1"
for f in $(ls ${DOWNLOAD_DIR}/*.tar.gz)
do
tar --extract --verbose --file="${DOWNLOAD_DIR}/${f}" /
--directory="${DOWNLOAD_DIR}/mmseqs_dbs"
rm "${f}"
BASENAME="$(basename {f%%.*})"
DB_NAME="${BASENAME}_db"
OLD_PWD=$(pwd)
cd "${DOWNLOAD_DIR}/mmseqs_dbs"
mmseqs tar2exprofiledb "${BASENAME}" "${DB_NAME}"
mmseqs createindex "${DB_NAME}" "${DOWNLOAD_DIR}/tmp/"
cd "${OLD_PWD}"
done
When I run the code, I get this error:
(openfold_venv) watson#watson:~/pedro/openfold$ sudo bash scripts/prep_mmseqs_dbs.sh data/
tar: data//data//colabfold_envdb_202108.tar.gz: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
I don't understand why the code repeats my DOWNLOAD_DIR. The correct path should be:
data/colabfold_envdb_202108.tar.gz
and not
data//data//colabfold_envdb_202108.tar.gz
Could anyone help me?
New code:
set -e
DOWNLOAD_DIR="$1"
for f in ${DOWNLOAD_DIR}/*.tar.gz;
do
tar --extract --verbose --file="$f" /
--directory="${DOWNLOAD_DIR}/mmseqs_dbs"
rm "${f}"
BASENAME="$(basename {f%%.*})"
DB_NAME="${BASENAME}_db"
OLD_PWD=$(pwd)
cd "${DOWNLOAD_DIR}/mmseqs_dbs"
mmseqs tar2exprofiledb "${BASENAME}" "${DB_NAME}"
mmseqs createindex "${DB_NAME}" "${DOWNLOAD_DIR}/tmp/"
cd "${OLD_PWD}"
done
To answer your first question: why is it repeating? Because you are repeating it in your code:
for f in ${DOWNLOAD_DIR}/*.tar.gz;
do
tar --extract --verbose --file="${DOWNLOAD_DIR}/$f"
If f is downloads/file.tar.gz then ${DOWNLOAD_DIR}/${f} will resolve to downloads/downloads/file.tar.gz.
As to your second question: the escape character is the backslash \, not the forward slash. Your multiline command should look like this:
tar --extract --verbose --file="${f}" \
--directory="${DOWNLOAD_DIR}/mmseqs_dbs"
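Putting both fixes together, a minimal sketch of the loop could look like the one below. It assumes the archives sit directly in the download directory. The names db_basename and list_archives are hypothetical helpers introduced here for illustration, and the tar call is left as a comment. Note that the script in the question also writes {f%%.*} without the leading $, so basename receives that literal string; the sketch uses ${name%%.*} instead.

```shell
#!/bin/bash
# Sketch only: db_basename and list_archives are hypothetical helpers,
# not part of the original script.

# Strip the directory and everything from the first dot onward:
#   data/colabfold_envdb_202108.tar.gz -> colabfold_envdb_202108
db_basename() {
    local name
    name="$(basename "$1")"
    printf '%s\n' "${name%%.*}"
}

# Iterate over the glob directly. Each $f already contains the directory
# prefix, so it must not be prefixed with the download directory again.
list_archives() {
    local dir="${1%/}"   # drop a trailing slash to avoid data//file
    local f
    for f in "${dir}"/*.tar.gz; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        printf '%s -> %s\n' "$f" "$(db_basename "$f")"
        # tar --extract --verbose --file="$f" \
        #     --directory="${dir}/mmseqs_dbs"
    done
}
```

Calling list_archives data/ prints each archive path next to the database base name derived from it, which is a cheap way to check the paths before running the real extraction.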
I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget then run the CRB-BLAST. I have multiple files and would like to be able to download them each to their own directory (the name perhaps should come from the URL list files), then run the CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/.
I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast
Currently I just run the commands in line:
mkdir TC_3370
cd TC_3370/
wget http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
file=${url##*/}
dir=${file%%_chr.*}
mkdir -v -- "$dir"
(
cd "./$dir" || exit 1
wget -- "$url"
crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
)
done <URL.txt
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${url##*/} etc.
The subshell, ( ... ), is used to ensure that the cd doesn't affect the main program.
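As a small illustration of the two expansions and of the subshell, the following sketch applies them to one of the example URLs from the question. The variables before and after are only there to show that the working directory is unchanged.

```shell
#!/bin/bash
url='http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz'

file=${url##*/}        # remove the longest prefix ending in "/"
dir=${file%%_chr.*}    # remove the longest suffix starting at "_chr."

printf 'file=%s\ndir=%s\n' "$file" "$dir"

# The cd inside ( ... ) happens in a child shell, so the caller's
# working directory is the same before and after.
before=$PWD
( cd / && : )
after=$PWD
```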
Another implementation
#!/bin/sh
# Read lines as url as long as it can
while read -r url
do
# Get file name by stripping-out anything up to the last / from the url
file_name=${url##*/}
# Get the destination dir name by stripping everything from the first _chr onward
dest_dir=${file_name%%_chr*}
# Compose the wget output path
fasta_path="$dest_dir/$file_name"
if
# Successfully created the destination directory AND
mkdir -p -- "$dest_dir" &&
# Successfully downloaded the file
wget --output-document="$fasta_path" --quiet -- "$url"
then
# Process the fasta file into fna
fna_path="$dest_dir/TCV2_annot_cds.fna"
crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
else
# Cleanup remove destination directory if any of mkdir or wget failed
rm -fr -- "$dest_dir"
fi
# reading from the URL.txt file for the whole while loop
done < URL.txt
Downloading files from a list is a job for wget's -i file option. If you have a file named, say, urls.txt with one URL per line, you can simply run
wget -i urls.txt
Note that this puts all files in the current working directory, so if you want them in separate directories, you need to move them after wget finishes.
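One possible way to do that move afterwards is sketched below. It assumes the downloaded files follow the NAME_chr.* pattern from the question; sort_into_dirs is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: sort_into_dirs is a hypothetical helper.
# Moves every *_chr.* file in the given directory into a subdirectory
# named after the part of the file name before "_chr".
sort_into_dirs() {
    local base="$1" f name dir
    for f in "$base"/*_chr.*; do
        [ -f "$f" ] || continue          # nothing matched, or not a regular file
        name=${f##*/}
        dir=$base/${name%%_chr.*}
        mkdir -p -- "$dir" && mv -- "$f" "$dir/"
    done
}

# Usage, after `wget -i urls.txt` has finished in the current directory:
# sort_into_dirs .
```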
I am trying to run the following command using subprocess.check_output(). The command (shown below) works fine if run directly in bash :
tar -C /tmp/models/ -czvf model.tar.gz .
It also runs fine if I don't use the "C" option when run via subprocess.
cmd = ['tar', 'czf', "/tmp/model.tar.gz", "/tmp/models/"]
output = subprocess.check_output(cmd).decode("utf-8").strip() # Works
But when I try to use the -C option with the above tar command, I get an exception which says tar: Must specify one of -c, -r, -t, -u, -x.
cmd = ['tar', 'C', '/tmp/models', 'cf', 'model.tar.gz', '.'] # fails. Other variations of this fail too.
How do I run the above tar command using subprocess correctly? Thanks.
I am using python3.8
Looks like the dashes - should be specified:
$ tar C whatever czvf thing.tar .
tar: Must specify one of -c, -r, -t, -u, -x
$ tar C whatever -czvf thing.tar .
tar: could not chdir to 'whatever'
So the command should look like this:
cmd = ['tar', '-C', '/tmp/models', '-cf', 'model.tar.gz', '.']
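For reference, the same invocation can be checked from the shell. The sketch below wraps it in a hypothetical helper, pack_dir, and shows that with -C the archive stores paths relative to the source directory.

```shell
#!/bin/bash
# Sketch only: pack_dir is a hypothetical helper.
# Archive the contents of directory $1 into $2, storing paths relative to $1.
pack_dir() {
    tar -C "$1" -czf "$2" .
}

# Example with the paths from the question:
# pack_dir /tmp/models /tmp/model.tar.gz
# tar -tzf /tmp/model.tar.gz    # members are listed as ./name, not /tmp/models/name
```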
I have 200,000 rows in the file not_found_test1.txt.
I am running the command below, but I get an error on the first result:
tar czvf /home/bukanadmin/test.tar.gz -T $(sed -n 1,10p /home/bukanadmin/not_found_test1.txt)
This is the error I got:
tar: RT #StCecilias_PE\: Sara McBay y10 finished an impressive 4th in the JG 75m hurdles final. Sara only took up the hurdles a few months ago! #dedicated #workshard: Cannot stat: No such file or directory
tar: By stcecilias_re on 11-May-2018 17\:49: Cannot stat: No such file or directory
tar: at http\://twitter.com/stcecilias_re/statuses/994892363523874816: Cannot stat: No such file or directory
tar: : Cannot stat: No such file or directory
2018/05/2018-05-11/TWITTER.DATA_POST/abfeda55a6f5b9ad1622f5484c7452f1.txt
2018/05/2018-05-11/TWITTER.DATA_POST/73a38258c9e91110065c3973b90fc841.txt
2018/05/2018-05-11/TWITTER.DATA_POST/240ae384d7e1e1d2f5f4fa1f70e7f0e8.txt
2018/05/2018-05-11/TWITTER.DATA_POST/e5a6f6c8bccc3c1d0ed9f11eb543c0a2.txt
2018/05/2018-05-11/TWITTER.DATA_POST/23a051f72192affbe2e57e91df62e372.txt
2018/05/2018-05-11/TWITTER.DATA_POST/f629b60d212a04dc4d42695f348446f3.txt
2018/05/2018-05-11/TWITTER.DATA_POST/c7037ea6e3912496fc546b7135a763f3.txt
2018/05/2018-05-11/TWITTER.DATA_POST/93675eeb45dbd6385cbf37b0d9d39341.txt
2018/05/2018-05-11/TWITTER.DATA_POST/ded62f41db4a069bd4fd36e83661cdd2.txt
tar: Exiting with failure status due to previous errors
When I remove sed from the tar command, there is no issue:
tar czvf /home/bukanadmin/test.tar.gz -T /home/bukanadmin/not_found_test1.txt
When I use another command with tar, such as head, I get the same issue.
Can someone help me and explain, please?
**NEW ISSUE :)**
The previous issue is solved with:
tar czvf /home/bukanadmin/test.tar.gz $(sed -n 1,10p /home/bukanadmin/not_found_test1.txt)
Now I get an error when I change my command to:
tar czvf /home/bukanadmin/test.tar.gz $(sed -n 100000,200000p /home/bukanadmin/not_found_test1.txt)
This is the error:
-bash: /usr/bin/tar: Argument list too long
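The -T option expects the name of a file that lists the members to archive. $(sed -n 1,10p ...) expands to the ten paths themselves, so tar treats the first path as the list file and reads its contents as file names, which is why the tweet text shows up in the errors.
You can likely eliminate -T altogether and simply do: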
tar czvf file.tar.gz $( sed -n 1,10p file )
...or using head...
tar czvf file.tar.gz $( head -n 10 file )
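The later "Argument list too long" error appears because the paths produced by the command substitution exceed the system limit on the size of a command line. One way around it, assuming GNU tar (which accepts -T - to read the member list from standard input), is to pipe the selected lines into tar instead of expanding them as arguments. archive_lines is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: archive_lines is a hypothetical helper; assumes GNU tar.
# Usage: archive_lines FIRST LAST LISTFILE ARCHIVE
# Archives the files named on lines FIRST..LAST of LISTFILE (one path per line).
# Relative paths in LISTFILE are resolved against the current directory.
archive_lines() {
    sed -n "${1},${2}p" "$3" | tar -czf "$4" -T -
}

# Example with the values from the question:
# archive_lines 100000 200000 /home/bukanadmin/not_found_test1.txt /home/bukanadmin/test.tar.gz
```

Because the names travel through a pipe rather than the argument list, the number of lines selected no longer matters.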
I am working on a bash script that automatically downloads phpMyAdmin and extracts it. I would like to add one more step to this installer script.
Copy config.sample.inc.php as config.inc.php and update this file's line with a random blowfish secret:
$cfg['blowfish_secret'] = ''; /* YOU MUST FILL IN THIS FOR COOKIE AUTH! */
So, this is what I have tried:
#!/bin/bash
wget -O phpMyAdmin-4.5.3.1-english.zip https://files.phpmyadmin.net/phpMyAdmin/4.5.3.1/phpMyAdmin-4.5.3.1-english.zip;
unzip phpMyAdmin-4.5.3.1-english.zip >/dev/null 2>/dev/null;
cd phpMyAdmin-4.5.3.1-english;
mv * ..;
cd ..;
rm -rf phpMyAdmin-4.5.3.1-english;
rm -rf phpMyAdmin-4.5.3.1-english.zip;
randomBlowfishSecret=`openssl rand -base64 32`;
cat config.sample.inc.php | sed -e "s/cfg['blowfish_secret'] = ''/cfg['blowfish_secret'] = '$randomBlowfishSecret'/" > config.inc.php
When this script runs, phpMyAdmin is downloaded and extracted and the file is copied, however it does not appear to be setting the randomBlowfishSecret to $cfg['blowfish_secret'].
Any ideas?
A few points:
You don't have to end your lines with ; – a newline has the same effect.
If you want to redirect both stdout and stderr, you can use &>/dev/null instead of >/dev/null 2>/dev/null, but in the case of unzip, you can just use unzip -q to suppress output (or even -qq, but -q was already silent for me).
Instead of
cd phpMyAdmin-4.5.3.1-english;
mv * ..;
cd ..;
you can just use mv phpMyAdmin-4.5.3.1-english/* .
There are two files starting with ., which aren't moved with your command (unless you have the dotglob shell option set), so you have to move them separately:
mv phpMyAdmin-4.5.3.1-english/.*.yml .
The phpMyAdmin-4.5.3.1-english directory is now empty, so you can remove it with rmdir instead of rm -rf (which would have let you know that it's not empty yet).
phpMyAdmin-4.5.3.1-english.zip is just a file; no need to recursively delete it, rm -f is enough.
Instead of the deprecated backticks for command substitution, you could use the more modern $():
randomBlowfishSecret=$(openssl rand -base64 32)
The sed can be improved in three ways:
No need for cat. cat file | sed "s/x/y/g" > output (replace all x in file with y, save to output) is equivalent to sed "s/x/y/g" file > output, but the latter doesn't spawn an extra process.
Your regular expression
s/cfg['blowfish_secret'] = ''/
is interpreted as "cfg, then any ONE character from the list between [ and ]", but you want literal [ and ], so they have to be escaped: \[ and \]. In the replacement string, they don't have to be escaped.
The password generated by openssl rand can contain forward slashes, which confuses sed. You can use a different delimiter for sed, for example "s|x|y|" instead of "s/x/y/".
All of these are cosmetic, except the last two sed bullet points: those can break the script. Well, and the missing hidden files might be annoying, too.
Cleaned up version that works for me:
#!/bin/bash
wget -O phpMyAdmin-4.5.3.1-english.zip https://files.phpmyadmin.net/phpMyAdmin/4.5.3.1/phpMyAdmin-4.5.3.1-english.zip
unzip -q phpMyAdmin-4.5.3.1-english.zip
mv phpMyAdmin-4.5.3.1-english/* .
mv phpMyAdmin-4.5.3.1-english/.*.yml .
rmdir phpMyAdmin-4.5.3.1-english
rm -f phpMyAdmin-4.5.3.1-english.zip
randomBlowfishSecret=$(openssl rand -base64 32)
sed -e "s|cfg\['blowfish_secret'\] = ''|cfg['blowfish_secret'] = '$randomBlowfishSecret'|" config.sample.inc.php > config.inc.php
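To see why the escaping matters, here is a small experiment on a single line of the config file. The value of secret is made up for illustration and contains a slash, as base64 output can.

```shell
#!/bin/bash
line="\$cfg['blowfish_secret'] = '';"
secret='ab/cd+ef=='   # made-up value containing a slash

# Unescaped: [...] is a bracket expression, so the pattern never matches
# the literal text and the line comes back unchanged.
unescaped=$(printf '%s\n' "$line" |
    sed "s|cfg['blowfish_secret'] = ''|cfg['blowfish_secret'] = '$secret'|")

# Escaped brackets, with "|" as the delimiter: the line is rewritten.
escaped=$(printf '%s\n' "$line" |
    sed "s|cfg\['blowfish_secret'\] = ''|cfg['blowfish_secret'] = '$secret'|")

printf '%s\n%s\n' "$unescaped" "$escaped"
```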
This is a really short question: is there something syntactically wrong with passing a variable such as $example as an argument to tar in a bash script?
I have the file written as
//only portion that really matters
#!/bin/bash
...
tar -cvpzf $filename $backup_source
//here's the actual code
#!/bin/bash
backup_source="~/momobobo"
backup_dest="~/momobobo_backup/"
dater=`date '+%m-%d-%Y-%H-%M-%S'`
filename="$backup_dest$dater.tgz"
echo “Backing Up your Linux System”
tar -cvpzf $filename $backup_source
echo tar -cvpzf $filename $backup_source
echo “Backup finished”
//and heres the error
“Backing Up your Linux System”
tar: ~/momobobo: Cannot stat: No such file or directory
tar (child): ~/momobobo_backup/07-02-2013-18-34-12.tgz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
tar -cvpzf ~/momobobo_backup/07-02-2013-18-34-12.tgz ~/momobobo
Notice the "echo tar ..." line. When I copy and paste its output and run it in my terminal, there is no problem tarring the file. I'm currently running Xubuntu and I have already done an update.
~ doesn't expand to your home directory in double quotes.
Just remove the double quotes:
backup_source=~/momobobo
backup_dest=~/momobobo_backup/
In cases where you have things you would want to quote, you can use ~/"momobobo"
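The difference can be checked with a few assignments. The variable names below are only for illustration; $HOME is shown as an alternative that does expand inside double quotes.

```shell
#!/bin/bash
quoted="~/momobobo"        # stays the literal string ~/momobobo
unquoted=~/momobobo        # tilde is expanded at assignment time
mixed=~/"momobobo"         # tilde unquoted, rest quoted: still expanded
via_home="$HOME/momobobo"  # $HOME does expand inside double quotes

printf '%s\n' "$quoted" "$unquoted" "$mixed" "$via_home"
```

This also explains the echo output in the question: echo prints the unexpanded ~/momobobo, and pasting that text into a terminal lets the interactive shell expand the tilde.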