Bash script to download graphic files from website - linux

I'm trying to write a bash script on Linux (Debian) that downloads graphic files from a website given by the user at start-up. I'm not sure if my code is correct, but the first problem is that when I try to run my script with a website, e.g. http://www.bbc.com/, an error shows: http://www.bbc.com/ : invalid identifier. I even tried a simple website that has only a few JPG files. My next problem is to find out how to download files from a .txt file that contains the images' Internet addresses.
#!/bin/bash
# $1 - URL $2 - new catalog name
read $1 $2
url=$1
fold=$2
mkdir -p $fold
if [$# -ne 3];
then
echo "Wrong command"
exit -1
fi
curl $url | grep -o -e "<img src=\".*\"+>" > img_list.txt |wc -l img_list.txt | lin=${% *}
baseurl=$(echo $url | grep -o "https?://[a-z.]*"")
curl -s $url | egrep -o "<img src\=[^>]*>" | sed 's/<img src=\"\([^"]*\).*/\1/.*/\1/g' > url_list.txt
sed -i "s|^/|$baseurl/|" url_list.txt
cd $fold;
what can I do next?

To download every image from the webpage I would use:
mech-dump --absolute --images http://example.com | xargs -n1 curl -O
but this requires the mech-dump command from the WWW::Mechanize package to be installed.
Using the list file
while read -r url folder
do
mkdir -p "$folder" || exit 1
(cd "$folder" && mech-dump --absolute --images "$url" | xargs -n1 curl -O)
done < list.txt
(assuming that neither the URL nor the folder contains a space).
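If installing WWW::Mechanize is not an option, the grep and sed approach from the question can be tightened into a rough approximation. The sketch below is only that: the helper name extract_img_src is made up for illustration, it assumes double-quoted src attributes, and a regular expression is no substitute for a real HTML parser.

```shell
#!/bin/bash
# Hypothetical helper: print the src attribute of every <img> tag read from stdin.
# Assumes src values are double-quoted; other quoting styles are not handled.
extract_img_src() {
    # grep -o emits one match per <img ... src="..."> occurrence,
    # sed then strips everything except the attribute value.
    grep -o '<img[^>]*src="[^"]*"' | sed 's/.*src="\([^"]*\)"/\1/'
}
```

It could be used as `curl -s "$url" | extract_img_src > url_list.txt`; relative paths in the result still have to be prefixed with the site's base URL.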

an error shows: http://www.bbc.com/ : invalid identifier
Your use of read is wrong; change
read $1 $2
url=$1
fold=$2
to
read url fold
or decide to specify the arguments on the command line and omit only read $1 $2.
Also, each operand in [ ] must be separated from the brackets; change
if [$# -ne 3];
to
if [ -z "$fold" ]
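If the arguments are passed on the command line instead, the check could look roughly like the sketch below. The function name check_args and the usage text are illustrative assumptions; it expects exactly two arguments, the URL and the folder name.

```shell
#!/bin/bash
# Hypothetical helper: accept exactly two arguments (URL and folder name).
check_args() {
    # Note the spaces around [ and ], and the quoted "$#".
    if [ "$#" -ne 2 ]; then
        echo "Usage: script.sh URL FOLDER" >&2
        return 1
    fi
    url=$1
    fold=$2
}
```

In a script it would be called as `check_args "$@" || exit 1` before `mkdir -p "$fold"`.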

Related

Multiple process curl command for urls to output to individual files

I am attempting to curl multiple urls in a bash command. Eventually I will be curling a large number of Urls so I am using xargs to use multiple processes to speed up the process.
My file consists of x number of URLs:
https://someurl.com
https://someotherurl.com
My issue comes when attempting to output the results to separate files named after the URLs I curl.
The bash command I have is:
xargs -P 5 -n 1 -I% curl -k -L % -0 % < urls.txt
When I run this I get 'Failed to create file https://someotherurl.com'
You cannot create a file with / in the filename. You could do it this way:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename="${line#https://}"
echo "FILENAME: $filename"
# YOUR CURL COMMAND HERE, USING $filename
fi
done < urls.txt
This script ignores empty lines and uses parameter expansion to remove the https:// part of each URL, which makes it possible to create the file.
Note: if your URLs contain sub-directories, they must be dealt with as well.
Example: you want to download https://www.exemple.com/some/sub/dir
The script I suggested here would try to create a file named "www.exemple.com/some/sub/dir". In this case, you could replace each / with _ using tr.
The script would become:
#!/bin/bash
while IFS= read -r line
do
echo "LINE: $line"
if [[ "$line" != "" ]]
then
filename=$(echo "$line" | tr '/' '_')
filename2=${filename#https:__}
echo "FILENAME: $filename2"
# YOUR CURL COMMAND HERE, USING $filename2
fi
done < urls.txt
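The two steps (dropping the scheme and flattening the slashes) can also be folded into one small function. This is a minimal sketch; the name url_to_filename is hypothetical, and it strips any scheme, not only https://.

```shell
#!/bin/bash
# Hypothetical helper: derive a flat file name from a URL.
url_to_filename() {
    name=${1#*://}    # drop the scheme (http://, https://, ...)
    name=${name%/}    # drop one trailing slash, if present
    # Replace every remaining / with _ so the result is a valid file name.
    printf '%s\n' "$name" | tr '/' '_'
}
```

Inside the loop it might be used as `curl -k -L "$line" -o "$(url_to_filename "$line")"`.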
Because your question is ambiguous, I will assume:
You have a file urls.txt that contains URLs separated by LF.
You want to download all URLs with curl and use each URL as its filename.
Unfortunately, that's not possible because a URL contains characters that are invalid in a filename, such as the slash /. As an alternative, I would suggest encoding the URL with the URL- and filename-safe Base64 alphabet from RFC 3548 before using it as the filename.
With that change, your script would look like:
seq 100 | xargs -I# echo 'https://example.com?#' > urls.txt
xargs -P0 -L1 sh -c 'curl -SskL0 -o $(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_) "$1"' sh < urls.txt
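The same encoding can be sketched with the base64 utility instead of uuencode, which also makes it easy to recover the URL from a file name later. The helper names below are made up for illustration, and the sketch assumes a base64 command that accepts --decode (as the GNU coreutils and macOS versions do).

```shell
#!/bin/bash
# Hypothetical helpers: map a URL to a file-name-safe string and back.
url_to_b64name() {
    # Encode, join any wrapped output lines, then swap + and / for - and _.
    printf '%s' "$1" | base64 | tr -d '\n' | tr '+/' '-_'
}

b64name_to_url() {
    # Reverse the character swap, then decode.
    printf '%s' "$1" | tr -- '-_' '+/' | base64 --decode
}
```

A download would then look like `curl -sSkL -o "$(url_to_b64name "$u")" "$u"`. Very long URLs can still exceed the file system's maximum file name length.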

How to develop a Condition to close program only when log file has been updated in Bash Script [duplicate]

I want to run a shell script when a specific file or directory changes.
How can I easily do that?
You can try the entr tool to run arbitrary commands when files change. Example for files:
$ ls -d * | entr sh -c 'make && make test'
or:
$ ls *.css *.html | entr reload-browser Firefox
or print Changed! when file file.txt is saved:
$ echo file.txt | entr echo Changed!
For directories use -d, but you have to run it in a loop, e.g.:
while true; do find path/ | entr -d echo Changed; done
or:
while true; do ls path/* | entr -pd echo Changed; done
I use this script to run a build script on changes in a directory tree:
#!/bin/bash -eu
DIRECTORY_TO_OBSERVE="js" # might want to change this
function block_for_change {
inotifywait --recursive \
--event modify,move,create,delete \
$DIRECTORY_TO_OBSERVE
}
BUILD_SCRIPT=build.sh # might want to change this too
function build {
bash $BUILD_SCRIPT
}
build
while block_for_change; do
build
done
Uses inotify-tools. Check inotifywait man page for how to customize what triggers the build.
Use inotify-tools.
The linked GitHub page has a number of examples; here is one of them.
#!/bin/sh
cwd=$(pwd)
inotifywait -mr \
--timefmt '%d/%m/%y %H:%M' --format '%T %w %f' \
-e close_write /tmp/test |
while read -r date time dir file; do
changed_abs=${dir}${file}
changed_rel=${changed_abs#"$cwd"/}
rsync --progress --relative -vrae 'ssh -p 22' "$changed_rel" \
username@example.com:/backup/root/dir && \
echo "At ${time} on ${date}, file $changed_abs was backed up via rsync" >&2
done
How about this script? It uses the stat command to get a file's change time (%Z is the time of the last status change, not the access time) and runs a command whenever that time changes.
#!/bin/bash
while true
do
ATIME=`stat -c %Z /path/to/the/file.txt`
if [[ "$ATIME" != "$LTIME" ]]
then
echo "RUN COMMAND"
LTIME=$ATIME
fi
sleep 5
done
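If only content changes matter, the modification time (%Y) is probably the better field to poll. The sketch below isolates the comparison in a function; the name mtime_changed is hypothetical, and it assumes GNU stat (BSD/macOS stat uses -f %m instead of -c %Y).

```shell
#!/bin/bash
# Hypothetical helper: return 0 when FILE's modification time differs
# from the previously recorded value, 1 when it is the same.
# Usage: mtime_changed FILE LAST_SEEN_MTIME
mtime_changed() {
    current=$(stat -c %Y "$1") || return 2   # seconds since the epoch
    [ "$current" != "$2" ]
}
```

In a polling loop it could be used as `if mtime_changed "$file" "$last"; then last=$(stat -c %Y "$file"); run_something; fi`, where run_something stands for whatever command should be triggered.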
Check out the kernel filesystem monitor daemon
http://freshmeat.net/projects/kfsmd/
Here's a how-to:
http://www.linux.com/archive/feature/124903
As mentioned, inotify-tools is probably the best idea. However, if you're programming for fun, you can try and earn hacker XPs by judicious application of tail -f .
Just for debugging purposes, when I write a shell script and want it to run on save, I use this:
#!/bin/bash
file="$1" # Name of file
command="${*:2}" # Command to run on change (takes rest of line)
t1="$(ls --full-time $file | awk '{ print $7 }')" # Get latest save time
while true
do
t2="$(ls --full-time $file | awk '{ print $7 }')" # Compare to new save time
if [ "$t1" != "$t2" ];then t1="$t2"; $command; fi # If different, run command
sleep 0.5
done
Run it as
run_on_save.sh myfile.sh ./myfile.sh arg1 arg2 arg3
Edit: Above tested on Ubuntu 12.04, for Mac OS, change the ls lines to:
"$(ls -lT $file | awk '{ print $8 }')"
Add the following to ~/.bashrc:
function react() {
if [ -z "$1" -o -z "$2" ]; then
echo "Usage: react <[./]file-to-watch> <[./]action> <to> <take>"
elif ! [ -r "$1" ]; then
echo "Can't react to $1, permission denied"
else
TARGET="$1"; shift
ACTION="$@"
while sleep 1; do
ATIME=$(stat -c %Z "$TARGET")
if [[ "$ATIME" != "${LTIME:-}" ]]; then
LTIME=$ATIME
$ACTION
fi
done
fi
}
Quick solution for fish shell users who wanna track a single file:
while true
set old_hash $hash
set hash (md5sum file_to_watch)
if [ $hash != $old_hash ]
command_to_execute
end
sleep 1
end
Replace md5sum with md5 if you are on macOS.
Here's another option: http://fileschanged.sourceforge.net/
See especially "example 4", which "monitors a directory and archives any new or changed files".
inotifywait can do this.
Here is a common sample for it:
inotifywait -m /path -e create -e moved_to -e close_write | # -m is --monitor, -e is --event
while read path action file; do
if [[ "$file" =~ \.rst$ ]]; then # if suffix is '.rst'
echo ${path}${file} ': '${action} # execute your command
echo 'make html'
make html
fi
done
Suppose you want to run rake test every time you modify any ruby file ("*.rb") in app/ and test/ directories.
Just get the most recent modified time of the watched files and check every second if that time has changed.
Script code
t_ref=0; while true; do t_curr=$(find app/ test/ -type f -name "*.rb" -printf "%T+\n" | sort -r | head -n1); if [ $t_ref != $t_curr ]; then t_ref=$t_curr; rake test; fi; sleep 1; done
Benefits
You can run any command or script when the file changes.
It works between any filesystem and virtual machines (shared folders on VirtualBox using Vagrant); so you can use a text editor on your Macbook and run the tests on Ubuntu (virtual box), for example.
Warning
The -printf option works on Ubuntu (GNU find), but does not work on macOS.
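Spelled out over several lines, the timestamp lookup from the one-liner might be factored into a function like the one below. This is a sketch under the same GNU find assumption; the name newest_mtime is made up, and it uses %T@ (seconds since the epoch) so the values sort numerically.

```shell
#!/bin/bash
# Hypothetical helper: print the newest modification time among files
# matching PATTERN under the given directories.
# Usage: newest_mtime PATTERN DIR...
newest_mtime() {
    pattern=$1
    shift
    find "$@" -type f -name "$pattern" -printf '%T@\n' | sort -n | tail -n 1
}
```

The polling loop then compares `newest_mtime '*.rb' app/ test/` against the value from the previous iteration and runs `rake test` when they differ.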

wget does not terminate

I have the following problem with my code:
After the downloads are all finished the script does not terminate. It seems to wait for more urls.
My code:
#!/bin/bash
cd "$1"
test=$(wget -qO- "$3" | grep --line-buffered "tarball_url" | cut -d '"' -f4)
echo test:
echo $test
echo ==============
wget -nd -N -q --trust-server-names --content-disposition -i- ${test}
An example for $test:
https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-1-open https://api.github.com/repos/matrixssl/matrixssl/tarball/3-9-0-open
-i means to get the list of URLs from a file, and using - in place of the file means to get them from standard input. So it's waiting for you to type the URLs.
If $test contains the URLs, you don't need to use -i, just list the URLs on the command line:
wget -nd -N -q --trust-server-names --content-disposition $test
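If -i- is kept for some reason, the URLs have to actually arrive on standard input, one per line. A minimal sketch of that variant follows; the helper name urls_per_line is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn a whitespace-separated list of URLs
# into one URL per line, the format wget -i expects.
urls_per_line() {
    # Translate every run of whitespace into a single newline.
    printf '%s\n' "$1" | tr -s '[:space:]' '\n'
}
```

The download line would then read `urls_per_line "$test" | wget -nd -N -q --trust-server-names --content-disposition -i-`, so wget sees end-of-file once the list is exhausted and terminates.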

Changing directory and to download file using bash script and also extract it

I created a script that downloads a file from a URL, and I want it saved to a specific directory. The problem is that the download does not go into the given directory, and the extraction should also happen in that directory.
diskspace=$(df -h /var/ | sed '1d' | awk '{print $5}' | cut -d'%' -f1)
bundle=$(awk -F = '{print $2}' config.txt)
allowed=10
if [ "${diskspace}" -gt "${allowed}" ]; then
cd `/var/`
wget $bundle
else
echo "Not enough space to download the bundle"
echo $output
exit
fi
while true; do
for f in *.tar.gz; do
case $f in '*.tar.gz') exit 0;; esac
tar zxf "$f"
rm -v "$f"
done
done
Can someone help me with this problem? What I want is to download the file into the given directory and also extract it there. Help is greatly appreciated.

Resources