How to replace backup file with timestamp in its name without producing duplicates in Linux Bash (shell script)

#!/usr/bin/env bash
# usage: wttr [location], e.g. wttr Berlin, wttr New\ York
# Standard location if no parameters were passed
location=''
language=''
time=`date`
# Expand terminal display
if [ -z "$language" ]; then
    language=${LANG%_*}
fi
curl \
-H "Accept-Language: ${language}" \
wttr.in/"${1:-${location}}" |
head -n 7 |
tee /home/of/weather.txt |
tee -a /home/of/weather.log |
tee /home/of/BACKUP/weather_"$time".txt
#cp weather.txt /home/of/BACKUP
#mv -f /home/of/BACKUP/weather.txt /home/of/BACKUP/weather_"$time".txt
I'm very new to Linux Bash and Shell scripting and can't figure out the following.
I have a problem with the shell script above.
It works fine so far (curling ASCII data from website and writing it to weather.txt and .log).
It is also set in crontab to run every 5 minutes.
Now I need to make a backup of weather.txt under /home/of/, in /home/of/BACKUP with the filename weather_<timestamp>.txt.
I tried to delete (rm weather*.txt) the old timestamped files in /home/of/BACKUP and then copy and rename the file every time the cron job runs.
I tried piping cp and mv and so on, but somehow I either end up producing many duplicates (because the timestamp makes every filename different) or with nothing at all when I try to delete the contents of the folder first.
All I need is ONE backup file of weather.txt as weather_<timestamp>.txt which gets updated every 5 minutes with the current timestamp, but I can't figure it out.

If I understand your question at all, then simply
rm -f /home/of/BACKUP/weather_*.txt
cp /home/of/weather.txt /home/of/BACKUP/weather_"$time".txt
cp lets you rename the file you are copying to; it doesn't make sense to separately cp and then mv.
For convenience, you might want to cd /home/of so you don't have to spell out the full paths, or put them in a variable.
dir=/home/of
rm -f "$dir"/BACKUP/weather_*.txt
cp "$dir"/weather.txt "$dir"/BACKUP/weather_"$time".txt
If you are running this from the crontab of the user named of, then your current working directory will be /home/of (though if you need to be able to run the script manually from anywhere, that cannot be guaranteed).
Obviously, make sure the wildcard doesn't match any files you actually want to keep.
As an aside, you can simplify the tee commands slightly. If this should only update the files and not print anything to the terminal, you could even go with
curl \
-H "Accept-Language: ${language}" \
wttr.in/"${1:-${location}}" |
head -n 7 |
tee /home/of/weather.txt \
>>/home/of/weather.log
I took out the tee to the backup file since you are deleting it immediately after anyway. You could alternatively empty the backup directory first, but then you will have no backups if the curl fails.
If you want to keep printing to the terminal, too, probably run the script with redirection to /dev/null in the cron job to avoid having your email inbox fill up with unread copies of the output.
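Putting the pieces together, a minimal sketch of the whole cron script might look like this (the paths are the ones from the question; switching the timestamp to a sortable format without spaces is an assumption on my part):
#!/usr/bin/env bash
# Sketch only: adjust paths and the timestamp format to taste.
location=''
language=${LANG%_*}
# A sortable timestamp with no spaces, e.g. 2024-01-31_12-05-01
time=$(date +%Y-%m-%d_%H-%M-%S)
dir=/home/of

curl -H "Accept-Language: ${language}" wttr.in/"${1:-$location}" |
head -n 7 |
tee "$dir"/weather.txt \
>>"$dir"/weather.log

# Keep exactly one timestamped backup: remove the old one, then copy the new one.
rm -f "$dir"/BACKUP/weather_*.txt
cp "$dir"/weather.txt "$dir"/BACKUP/weather_"$time".txt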

Related

Create Directory, download file and execute command from list of URL

I am working on a Red Hat Linux server. My end goal is to run CRB-BLAST on multiple fasta files and have the results from those in separate directories.
My approach is to download the fasta files using wget then run the CRB-BLAST. I have multiple files and would like to be able to download them each to their own directory (the name perhaps should come from the URL list files), then run the CRB-BLAST.
Example URLs:
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_CB_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_13_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_37_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_123_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_195_chr.v0.1.liftover.CDS.fasta.gz
http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_31_chr.v0.1.liftover.CDS.fasta.gz
Ideally, the file name determines the directory name, for example, TC_3370/.
I think there might be a solution with cat URL.txt | mkdir | cd | wget | crb-blast
Currently I just run the commands one at a time:
mkdir TC_3370
cd TC_3370/
wget url
http://assemblies/Genomes/final_assemblies/10x_meta_assemblies_v1.0/TC_3370_chr.v1.0.maker.CDS.fasta.gz
crb-blast -q TC_3370_chr.v1.0.maker.CDS.fasta.gz -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
Try this Shellcheck-clean program:
#! /bin/bash -p
while read -r url; do
    file=${url##*/}
    dir=${file%%_chr.*}
    mkdir -v -- "$dir"
    (
        cd "./$dir" || exit 1
        wget -- "$url"
        crb-blast -q "$file" -t TCV2_annot_cds.fna -e 1e-20 -h 4 -o rbbh_TC
    )
done <URL.txt
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${url##*/} etc.
A subshell, ( ... ), is used to ensure that the cd doesn't affect the main program.
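For example, with one of the URLs from the question, the two parameter expansions behave like this (illustrative only):
url=http://assemblies/Genomes/final_assemblies/10x_assemblies_v0.1/TC_3370_chr.v0.1.liftover.CDS.fasta.gz
file=${url##*/}      # strip everything up to the last "/"   -> TC_3370_chr.v0.1.liftover.CDS.fasta.gz
dir=${file%%_chr.*}  # strip from the first "_chr." onwards  -> TC_3370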
Another implementation
#!/bin/sh
# Read URLs line by line for as long as there are lines
while read -r url
do
    # Get the file name by stripping out everything up to the last / from the url
    file_name=${url##*/}
    # Get the destination dir name by stripping everything from the first _chr
    dest_dir=${file_name%%_chr*}
    # Compose the wget output path
    fasta_path="$dest_dir/$file_name"
    if
        # Successfully created the destination directory AND
        mkdir -p -- "$dest_dir" &&
        # Successfully downloaded the file
        wget --output-document="$fasta_path" --quiet -- "$url"
    then
        # Run crb-blast with the downloaded fasta file as the query
        fna_path="$dest_dir/TCV2_annot_cds.fna"
        crb-blast -q "$fasta_path" -t "$fna_path" -e 1e-20 -h 4 -o rbbh_TC
    else
        # Cleanup: remove the destination directory if either mkdir or wget failed
        rm -fr -- "$dest_dir"
    fi
# The whole while loop reads from the URL.txt file
done < URL.txt
Downloading files from a list is a task for the -i file option: if you have a file named, say, urls.txt with one URL per line, you can simply do
wget -i urls.txt
Note that this will put all files inside the current working directory, so if you wish to have them in separate dirs, you will need to move them after wget finishes.
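If you go that route, one way to sort the downloads afterwards could be a small loop over the filenames (a sketch, assuming the same TC_*_chr naming convention as the URLs above):
wget -i urls.txt
for f in TC_*_chr.*.fasta.gz; do
    d=${f%%_chr*}        # e.g. TC_3370
    mkdir -p -- "$d"
    mv -- "$f" "$d"/
done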

Nested if in for loop appears to use all results instead of single one

I'm not sure if this is a curl issue or if it's an issue with my code.
I am uploading a directory's contents to a remote server using curl -T, and since curl does not, according to the help file and some googling, have a way to skip a file if it already exists, I am trying to script around it.
#!/bin/sh
cd /foo/bar
curl --user foo:bar -ls ftp://ftp.server.foo/remote/dir/ > /foo/bar/temp.txt
for file in ./*.dat
do
    if ! grep -Fq "$file" temp.txt
    then
        curl --user foo:bar -T "$file" ftp://ftp.server.foo/remote/dir/
    fi
done
rm -f /foo/bar/temp.txt
I currently have 6 files in /foo/bar and those are uploaded to remote already. I removed one of the files from remote for testing purposes, but all 6 files were transferred anew.
Using the code above, as long as there is a non-match against temp.txt, curl uploads ALL files found in the for loop, regardless of whether they match the if condition or not. Am I missing something royally obvious, or is this a curl thing and I'm better off over on Super User?
Thanks to a comment from Jonathan Leffler, I found via sh -x script.sh that my for statement was the issue. The way I had written it, it sent ./file1.dat to grep to look for in temp.txt, which contained file1.dat, not ./file1.dat.
Changed to for file in *.dat and this fixed the issue.
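For reference, the corrected loop. The listing comparison could also be made stricter with grep's -x flag so that partial filename matches don't count (a sketch, not tested against a real FTP server):
#!/bin/sh
cd /foo/bar
curl --user foo:bar -ls ftp://ftp.server.foo/remote/dir/ > temp.txt
for file in *.dat
do
    # -x matches whole lines only, so file1.dat does not match e.g. otherfile1.dat
    if ! grep -Fxq "$file" temp.txt
    then
        curl --user foo:bar -T "$file" ftp://ftp.server.foo/remote/dir/
    fi
done
rm -f temp.txt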

Execute commands in specific location and depending on answer of previous command

I am currently working on a text-to-speech project and I need to write a bash script which will, when it is called, execute two commands. If the first command returns the proper answer (if it returns an answer at all), the second command will be called and executed.
My question is: how can I write a script that executes shell commands in a specific file system location?
For example, I need to be in the directory /opt/text/example and execute this command:
sudo ./bin/sample_read -I ../languages/ -I ../languages -v dave -T 2 \
-i /opt/text/example.txt -F 22 -O embedded-pro -o out_file.pcm
and then to wait for the answer, then (if it is good) execute the second command.
The second command is
aplay -f S16_LE -r 22050 -c 1 out_file.pcm
This should help:
pushd /path/to/directory
my_var=$(command1)
if [ "$my_var" == "expected_result" ]; then
    command2
fi
popd
You basically run command1 and store its output in my_var. Then you compare the content of $my_var with whatever you're expecting.
Also, pushd <path> and popd let you move to a directory and back.
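Applied to the example in the question, and checking the first command's exit status rather than comparing its output (a sketch; whether sample_read signals success through its exit code is an assumption):
#!/bin/bash
cd /opt/text/example || exit 1
if sudo ./bin/sample_read -I ../languages/ -I ../languages -v dave -T 2 \
       -i /opt/text/example.txt -F 22 -O embedded-pro -o out_file.pcm
then
    aplay -f S16_LE -r 22050 -c 1 out_file.pcm
fi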

Is there a way to perform a "tail -f" from an url?

I currently use tail -f to monitor a log file: this way I get an autorefreshing console monitoring a web server.
Now, said webserver was moved to another host and I have no shell privileges for that.
Nevertheless I have a .txt network path, which in the end is a log file which is constantly updated.
So, I'd like to do something like tail -f, but on that url.
Would it be possible? In the end, "in Linux everything is a file", so...
You can do auto-refresh with the help of watch combined with wget.
It won't show history like tail -f does; rather, it updates the screen like top.
Example of a command that shows the content of file.txt on the screen and updates the output every five seconds:
watch -n 5 wget -qO- http://fake.link/file.txt
Also, you can output the last n lines instead of the whole file:
watch -n 5 "wget -qO- http://fake.link/file.txt | tail"
In case you still need behaviour like tail -f (keeping history), I think you need to write a script that downloads the log file at each time period, compares it to the previously downloaded version, and then prints the new lines. That should be quite easy.
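A minimal sketch of that idea, assuming the log is append-only (the URL and the 5-second interval are placeholders):
#!/bin/sh
url=http://fake.link/file.txt
old=$(mktemp)
while true; do
    new=$(mktemp)
    wget -qO "$new" "$url"
    # print whatever was appended since the previous download
    lines=$(wc -l < "$old")
    tail -n +"$((lines + 1))" "$new"
    mv "$new" "$old"
    sleep 5
done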
I wrote a simple bash script to fetch the URL content every 2 seconds, compare it with the local file output.txt, and then append the diff to the same file.
I wanted to stream AWS Amplify logs in my Jenkins pipeline.
while true; do comm -13 --output-delimiter="" <(cat output.txt) <(curl -s "$URL") >> output.txt; sleep 2; done
Don't forget to create an empty output.txt file first:
: > output.txt
View the stream:
tail -f output.txt
Original comment: https://stackoverflow.com/a/62347827/2073339
UPDATE:
I found a better solution using wget here:
while true; do wget -ca -o /dev/null -O output.txt "$URL"; sleep 2; done
https://superuser.com/a/514078/603774
I've made this small function and added it to the .*rc of my shell. This uses wget -c, so it does not re-download the whole page:
# Poll logs continuously over HTTP
logpoll() {
    FILE=$(mktemp)
    echo "———————— LOGPOLLING TO $FILE ————————"
    tail -f "$FILE" &
    tail_pid=$!
    bg %1
    stop=0
    trap "stop=1" SIGINT SIGTERM
    while [ $stop -ne 1 ]; do wget -co /dev/null -O "$FILE" "$1"; sleep 2; done
    echo "——————————— LOGPOLL DONE ————————————"
    kill $tail_pid
    rm "$FILE"
    trap - SIGINT SIGTERM
}
Explanation:
Create a temporary logfile using mktemp and save its path to $FILE
Make tail -f output the logfile continuously in the background
Make ctrl+c set stop to 1 instead of exiting the function
Loop until stop bit is set, i.e. until the user presses ctrl+c
wget given URL in a loop every two seconds:
-c - "continue getting partially downloaded file", so that wget continues instead of truncating the file and downloading again
-o /dev/null - wget's log messages shall be thrown into the void
-O $FILE - output the contents to the temp logfile we've created
Clean up after yourself: kill the tail -f, delete the temporary logfile, unset the signal handlers.
The proposed solutions periodically download the full file.
To avoid that, I've created a package and published it on npm; it does a HEAD request (getting the size of the file) and then requests only the last bytes.
Check it out and let me know if you need any help.
https://www.npmjs.com/package/@imdt-os/url-tail
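The same idea can be sketched in plain curl, assuming the server reports Content-Length on a HEAD request and honours Range requests (the URL is a placeholder):
url=http://fake.link/file.txt
offset=$(curl -sI "$url" | awk 'tolower($1)=="content-length:" {print $2+0}')
while true; do
    sleep 2
    size=$(curl -sI "$url" | awk 'tolower($1)=="content-length:" {print $2+0}')
    # only fetch and print the bytes added since the last check
    if [ "$size" -gt "$offset" ]; then
        curl -s -r "$offset-$((size - 1))" "$url"
        offset=$size
    fi
done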

Parallel download using Curl command line utility

I want to download some pages from a website, and I did it successfully using curl, but I was wondering whether curl can download multiple pages at a time, just like most download managers do; that would speed things up a little bit. Is it possible to do this with the curl command line utility?
The current command I am using is
curl 'http://www...../?page=[1-10]' 2>&1 > 1.html
Here I am downloading pages from 1 to 10 and storing them in a file named 1.html.
Also, is it possible for curl to write the output of each URL to a separate file, say URL.html, where URL is the actual URL of the page being processed?
My answer is a bit late, but I believe all of the existing answers fall just a little short. The way I do things like this is with xargs, which is capable of running a specified number of commands in subprocesses.
The one-liner I would use is, simply:
$ seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s $url'
This warrants some explanation. The use of -n 1 instructs xargs to process a single input argument at a time. In this example, the numbers 1 ... 10 are each processed separately. And -P 2 tells xargs to keep 2 subprocesses running all the time, each one handling a single argument, until all of the input arguments have been processed.
You can think of this as MapReduce in the shell. Or perhaps just the Map phase. Regardless, it's an effective way to get a lot of work done while ensuring that you don't fork-bomb your machine. It's possible to do something similar in a for loop in a shell, but you end up doing process management, which starts to seem pretty pointless once you realize how insanely great this use of xargs is.
Update: I suspect that my example with xargs could be improved (at least on Mac OS X and BSD with the -J flag). With GNU Parallel, the command is a bit less unwieldy as well:
parallel --jobs 2 curl -O -s http://example.com/?page{}.html ::: {1..10}
Well, curl is just a simple UNIX process. You can have as many of these curl processes running in parallel and sending their outputs to different files.
curl can use the filename part of the URL to generate the local file. Just use the -O option (man curl for details).
You could use something like the following
urls="http://example.com/?page1.html http://example.com?page2.html" # add more URLs here
for url in $urls; do
    # run the curl job in the background so we can start another job
    # and disable the progress bar (-s)
    echo "fetching $url"
    curl $url -O -s &
done
wait # wait for all background jobs to terminate
As of 7.66.0, the curl utility finally has built-in support for parallel downloads of multiple URLs within a single non-blocking process, which should be much faster and more resource-efficient compared to xargs and background spawning, in most cases:
curl -Z 'http://httpbin.org/anything/[1-9].{txt,html}' -o '#1.#2'
This will download 18 links in parallel and write them out to 18 different files, also in parallel. The official announcement of this feature from Daniel Stenberg is here: https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/
For launching parallel commands, why not use the venerable make command line utility? It supports parallel execution and dependency tracking and whatnot.
How? In the directory where you are downloading the files, create a new file called Makefile with the following contents:
# which page numbers to fetch
numbers := $(shell seq 1 10)

# default target which depends on files 1.html .. 10.html
# (patsubst replaces % with %.html for each number)
all: $(patsubst %,%.html,$(numbers))

# the rule which tells how to generate a %.html dependency
# $@ is the target filename e.g. 1.html
%.html:
        curl -C - 'http://www...../?page='$(patsubst %.html,%,$@) -o $@.tmp
        mv $@.tmp $@
NOTE The last two lines should start with a TAB character (instead of 8 spaces) or make will not accept the file.
Now you just run:
make -k -j 5
The curl command I used will store the output in 1.html.tmp and only if the curl command succeeds then it will be renamed to 1.html (by the mv command on the next line). Thus if some download should fail, you can just re-run the same make command and it will resume/retry downloading the files that failed to download during the first time. Once all files have been successfully downloaded, make will report that there is nothing more to be done, so there is no harm in running it one extra time to be "safe".
(The -k switch tells make to keep downloading the rest of the files even if one single download should fail.)
Curl can also accelerate a download of a file by splitting it into parts:
$ man curl |grep -A2 '\--range'
-r/--range <range>
       (HTTP/FTP/SFTP/FILE) Retrieve a byte range (i.e. a partial
       document) from a HTTP/1.1, FTP or SFTP server or a local FILE.
Here is a script that will automatically launch curl with the desired number of concurrent processes: https://github.com/axelabs/splitcurl
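As a rough illustration of the approach, assuming the server reports Content-Length and supports byte ranges (the URL and the fixed part count are placeholders; the splitcurl script linked above handles the details more robustly):
url=http://example.com/big.file
size=$(curl -sI "$url" | awk 'tolower($1)=="content-length:" {print $2+0}')
parts=4
chunk=$(( (size + parts - 1) / parts ))
# fetch each byte range in the background, then stitch the parts together
for i in $(seq 0 $((parts - 1))); do
    start=$((i * chunk))
    curl -s -r "$start-$((start + chunk - 1))" -o "part.$i" "$url" &
done
wait
cat part.0 part.1 part.2 part.3 > big.file
rm -f part.[0-3]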
Starting from 7.68.0, curl can fetch several URLs in parallel. This example will fetch URLs from the urls.txt file with 3 parallel connections:
curl --parallel --parallel-immediate --parallel-max 3 --config urls.txt
urls.txt:
url = "example1.com"
output = "example1.html"
url = "example2.com"
output = "example2.html"
url = "example3.com"
output = "example3.html"
url = "example4.com"
output = "example4.html"
url = "example5.com"
output = "example5.html"
curl and wget cannot download a single file in parallel chunks, but there are alternatives:
aria2 (written in C++, available in Deb and Cygwin repos)
aria2c -x 5 <url>
axel (written in C, available in Deb repo)
axel -n 5 <url>
wget2 (written in C, available in Deb repo)
wget2 --max-threads=5 <url>
lftp (written in C++, available in Deb repo)
lftp -n 5 <url>
hget (written in Go)
hget -n 5 <url>
pget (written in Go)
pget -p 5 <url>
Running a limited number of processes is easy if your system has commands like pidof or pgrep which, given a process name, return the PIDs (the count of the PIDs tells you how many are running).
Something like this:
#!/bin/sh
max=4
running_curl() {
    set -- $(pidof curl)
    echo $#
}
while [ $# -gt 0 ]; do
    while [ "$(running_curl)" -ge "$max" ]; do
        sleep 1
    done
    curl "$1" --create-dirs -o "${1##*://}" &
    shift
done
Call it like this:
script.sh $(for i in `seq 1 10`; do printf "http://example/%s.html " "$i"; done)
The curl line of the script is untested.
I came up with a solution based on fmt and xargs. The idea is to specify multiple URLs inside braces, http://example.com/page{1,2,3}.html, and run them in parallel with xargs. The following would start downloading in 3 processes:
seq 1 50 | fmt -w40 | tr ' ' ',' \
| awk -v url="http://example.com/" '{print url "page{" $1 "}.html"}' \
| xargs -P3 -n1 curl -O
So 4 lines of downloadable URLs are generated and sent to xargs:
curl -O http://example.com/page{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}.html
curl -O http://example.com/page{17,18,19,20,21,22,23,24,25,26,27,28,29}.html
curl -O http://example.com/page{30,31,32,33,34,35,36,37,38,39,40,41,42}.html
curl -O http://example.com/page{43,44,45,46,47,48,49,50}.html
Bash 3 or above lets you populate an array with multiple values as it expands sequence expressions:
$ urls=( "" http://example.com?page={1..4} )
$ unset urls[0]
Note the empty value at index [0], which was provided as a placeholder to make the indices line up with page numbers, since bash arrays number from zero. This strategy obviously might not always work; anyway, you can just unset it, as in this example.
Now you have an array, and you can verify the contents with declare -p:
$ declare -p urls
declare -a urls=([1]="http://example.com?page=1" [2]="http://example.com?page=2" [3]="http://example.com?page=3" [4]="http://example.com?page=4")
Now that you have a list of URLs in an array, expand the array into a curl command line:
$ curl $(for i in ${!urls[@]}; do echo "-o $i.html ${urls[$i]}"; done)
The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1) to a common server, but it needs the -o option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.
I am not sure about curl, but you can do that using wget.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/
