Retrieve URL components using bash - string

I have a massive list of URLs in a text file, which I'd like to download using wget. This seems simple enough:
#!/bin/bash
cat list.txt | \
while read CMD; do
wget $CMD; done;
However, wget uses the basename of the file as the download location, which results in renaming schemes, such as file.txt.1, file.txt.2 and so on.
An $URL can look like this:
http://sub.domain.com/some/folder/to/file.txt
Where http://sub.domain.com/some/ is always the same. Now, in JS I would do $URL.split("http://sub.domain.com/some/")[1], but this doesn't quite seem to work in Bash:
IFS="http://sub.domain.com/some/" read -a url <<< "http://sub.domain.com/some/folder/to/file.txt"
echo "${url[1]}" # always empty.

Use the shell's parameter expansion operator to remove the prefix:
base=${CMD#http://sub.domain.com/some/}
BTW, you should get out of the habit of using all-uppercase variable names in shell scripts. These are conventionally used for environment variables.
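As a rough sketch of how that expansion could be used for the download names: the prefix below is the one from the question, and the helper names strip_prefix and flat_name are made up for illustration. Replacing the slashes is just one assumed way to get a unique, flat file name.

```shell
#!/bin/bash
# Prefix taken from the question; adjust it to your own URLs.
prefix="http://sub.domain.com/some/"

# Print the part of a URL that follows the fixed prefix.
strip_prefix() {
    local url=$1
    # Quoting "$prefix" makes the shell treat it literally, not as a glob pattern.
    printf '%s\n' "${url#"$prefix"}"
}

# Turn the remaining path into a flat file name by replacing "/" with "_".
flat_name() {
    local rest
    rest=$(strip_prefix "$1")
    printf '%s\n' "${rest//\//_}"
}

strip_prefix "http://sub.domain.com/some/folder/to/file.txt"   # folder/to/file.txt
flat_name "http://sub.domain.com/some/folder/to/file.txt"      # folder_to_file.txt
```

Inside the read loop you could then call something like wget -O "$(flat_name "$url")" "$url".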

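If the length of the prefix is static, you can skip that many characters with a substring expansion:
#!/bin/bash
prefix="http://sub.domain.com/some/"
while IFS= read -r line
do
    suffix=${line:${#prefix}}          # everything after the fixed-length prefix
    mkdir -p "$(dirname "$suffix")"    # -O needs the target directory to exist
    wget "$line" -O "$suffix"
done < "list.txt"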

Related

Rename large folder of Jpegs

I have a large folder of JPEGs, which I would like to rename sequentially to image01.jpg, image02.jpg ... image533.jpg, etc.
I have tried using the following
find ‘/myImages/‘ -maxdepth 1 -name ‘*.jpg’ | sort -n | awk 'BEGIN{ x=1 }{printf "mv \"%s\" \”/myImages/image%04d.jpg\”\n”, $0, x++ }' | bash
which I got from here: http://www.algissalys.com/how-to/how-to-quickly-rename-modify-and-scale-all-images-in-a-directory-using-linux
However, this is only returning
>
And then nothing happens, any suggestions would be great.
The easiest way to do that is with rename which you can install with homebrew using:
brew install rename
Then, you can go into your directory containing the images and run:
rename --dry-run -X -e '$_ = "$N"' *jpg
Sample Output
'a.jpg' would be renamed to '1.jpg'
'article.jpg' would be renamed to '2.jpg'
'blob-0.jpg' would be renamed to '3.jpg'
'blob-1.jpg' would be renamed to '4.jpg'
'blob-2.jpg' would be renamed to '5.jpg'
'blob-3.jpg' would be renamed to '6.jpg'
If that looks correct, you can run it again without the --dry-run to actually do it, rather than just telling you what it will do.
If you want your names zero-padded, the easiest is to let rename work out how much padding you need automatically like this:
rename --dry-run -X -N ...01 -e '$_ = "$N"' *jpg
The benefits of using rename are that:
it is simple and powerful
it will warn you before overwriting any files
it can do a dry run and tell you what would happen without actually doing anything
If you want an explanation of the command '$_ = "$N"' then read on...
The rename command is actually a Perl script, so the part I mention above is just a Perl script enclosed in single quotes. The $N is just a Perl variable that expands to be a sequentially increasing number. The Perl special variable $_ is filled with the name of the current file before your little Perl script is executed, and crucially, you are expected to set it to the name you want that input file renamed as.
You could do that with a bash script. Say you have the following in a file called rename_images.
#!/bin/bash
# Usage: ./rename_images '*.jpg' image .jpg
files=( $1 )    # unquoted on purpose so the quoted pattern expands here
newname=$2
ext=$3
for (( i = 0; i < ${#files[@]}; i++ ))
do
    # Quote the names so files containing spaces are handled correctly.
    mv -- "${files[$i]}" "$newname$i$ext"
done
To do what you need, run the script from within the folder with all the images as follows:
./rename_images '*.jpg' image .jpg
And you should be sorted.
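If you also want zero-padded names such as image01.jpg, a pure-bash sketch could look like the following. The function name rename_sequential and the DRY_RUN variable are made up for this example; it assumes the files end in .jpg and that the shell's glob order is the order you want.

```shell
#!/bin/bash
# rename_sequential DIR PREFIX
# Renames DIR/*.jpg to PREFIX01.jpg, PREFIX02.jpg, ... in glob (sorted) order.
# Set DRY_RUN=1 to print the mv commands instead of running them.
rename_sequential() {
    local dir=$1 prefix=$2 n=0 width f new
    local -a files=( "$dir"/*.jpg )
    [ -e "${files[0]}" ] || return 0        # the pattern matched nothing
    width=${#files[@]}                      # number of files, e.g. 533
    width=${#width}                         # digits in that number, e.g. 3
    (( width < 2 )) && width=2              # pad to at least two digits
    for f in "${files[@]}"; do
        n=$((n + 1))
        printf -v new '%s/%s%0*d.jpg' "$dir" "$prefix" "$width" "$n"
        if [ -n "${DRY_RUN:-}" ]; then
            printf 'mv %q %q\n' "$f" "$new"
        else
            mv -n -- "$f" "$new"            # -n: never overwrite an existing file
        fi
    done
}
```

Something like DRY_RUN=1 rename_sequential /myImages image would show the planned renames first. Because of mv -n, a file whose target name already exists is skipped rather than overwritten.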

Multiple files rename using linux shell script

I have following images.
10.jpg
11.jpg
12.jpg
I want to rename the above images. I used the following shell script:
for file in /home/scrapping/imgs/*
do
COUNT=$(expr $COUNT + 1)
STRING="/home/scrapping/imgs/""Img_"$COUNT".jpg"
echo $STRING
mv "$file" "$STRING"
done
This renamed the files to:
Img_1.jpg
Img_2.jpg
Img_3.jpg
But, I want to replace the file name like this:
Img_10.jpg
Img_11.jpg
Img_12.jpg
So how do I make COUNT start at 10 to get this output?
The expr syntax is pretty outdated; POSIX shells support arithmetic expansion with the $(( )) syntax. You can just do:
#!/usr/bin/env bash
count=10
for file in /home/scrapping/imgs/*; do
[ -f "$file" ] || continue
mv "$file" "/home/scrapping/imgs/Img_$((count++)).jpg"
done
Also, from the errors reported in the comments, you seem to be running it with dash. dash does not implement every bash extension; for example, it does not support the ++ operator in arithmetic expansion, which POSIX does not require. Run the script with bash instead.
And always use lowercase letters for user-defined variables in your shell scripts. Uppercase names are conventionally reserved for environment variables and variables set by the shell itself.
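If the script has to run under a plain POSIX sh such as dash, a counter written as count=$((count + 1)) is a portable alternative. Here is a small sketch; the function name img_names is made up, and it only prints the names instead of moving files.

```shell
#!/bin/sh
# img_names START COUNT: print COUNT file names beginning at Img_START.jpg.
img_names() {
    count=$1
    end=$(( $1 + $2 ))
    while [ "$count" -lt "$end" ]; do
        echo "Img_$count.jpg"
        count=$((count + 1))    # portable: no ++ operator needed
    done
}

img_names 10 3
# Img_10.jpg
# Img_11.jpg
# Img_12.jpg
```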
With the rename command you can prefix your files with Img_:
rename 's/^/Img_/' *
The ^ anchors the match at the start of the filename, so the substitution inserts Img_ there, i.e. it adds a prefix.

Using Bash to cURL a website and grep for keywords

I'm trying to write a script that will do a few things in the following order:
cURL websites from a list of urls contained within a "url_list.txt" (new-line delineated) file.
For each website in the list, I want to grep that website looking for keywords contained within a "keywords.txt" (new-line delineated) file.
I want to finish by printing to the terminal in the following format (or something similar):
$URL (that contained match) : $keyword (that made the match)
It needs to be able to run in Ubuntu (GNU grep, etc.)
It does not need to be cURL and grep; as long as the functionality is there.
So far I've got:
#!/bin/bash
keywords=$(cat ./keywords.txt)
urllist=$(cat ./url_list.txt)
for url in $urllist; do
content="$(curl -L -s "$url" | grep -iF "$keywords" /dev/null)"
echo "$content"
done
But for some reason, no matter what I try to tweak or change, it keeps failing to one degree or another.
How can I go about accomplishing this task?
Thanks
Here's how I would do it:
#!/bin/bash
keywords="$(<./keywords.txt)"
while IFS= read -r url; do
curl -L -s "$url" | grep -ioF "$keywords" |
while IFS= read -r keyword; do
echo "$url: $keyword"
done
done < ./url_list.txt
What did I change:
I used $(<./keywords.txt) to read the keywords.txt. This does not rely on an external program (cat in your original script).
I changed the for loop that loops over the URL list into a while loop. This guarantees that we use Θ(1) memory (i.e. we don't have to load the entire URL list into memory).
I removed /dev/null from the grep command. Grepping /dev/null alone is meaningless, since grep will find nothing there. Instead, I invoke grep with no file arguments so that it filters its stdin (which happens to be the output of curl in this case).
I added the -o flag for grep so that it outputs only the matched keyword.
I removed the command substitution where you were capturing the output of curl. Instead, I run the command directly and feed its output to a while loop. This is necessary because we might get more than one keyword match per URL.
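To try the matching part without touching the network, you could feed sample text through the same grep pipeline. This is only a sketch: the function name report_matches, the keyword list and the sample text are made up for illustration.

```shell
#!/bin/bash
# report_matches LABEL KEYWORDS
# Reads text on stdin and prints "LABEL: keyword" for every match.
# KEYWORDS is a newline-separated list of fixed strings (as read from keywords.txt).
report_matches() {
    local label=$1 keywords=$2 keyword
    grep -ioF -e "$keywords" |
    while IFS= read -r keyword; do
        printf '%s: %s\n' "$label" "$keyword"
    done
}

keywords=$(printf 'bash\ngrep')    # hypothetical keyword list, one per line
printf 'Bash and grep.\nMore BASH.\n' | report_matches "http://example.com" "$keywords"
# http://example.com: Bash
# http://example.com: grep
# http://example.com: BASH
```

In the real script the input would come from curl -L -s "$url" instead of printf. Since -o prints each match as it appears in the page, the same keyword can show up more than once and in different letter case.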

shell string bad substitution

I'm new to shell programming. I want to get the directory name after a zip file has been extracted. The expected output is:
$ test.sh helloworld.zip
helloworld
Let's take a look at test.sh:
#! /bin/sh
length=echo `expr index "$1" .zip`
a=$1
echo $(a:0:length}
However, I got a Bad substitution error from the shell.
When I say 'shell', I just mean the terminal; I don't know the difference between bash and the others. I'm using the terminal on Ubuntu 10.04. (I am using bash.)
If your shell is a sufficiently recent version of bash, that parameter expansion notation should work.
In many other shells, it will not work, and a bad substitution error is the way the shell says 'You asked for a parameter substitution but it does not make sense to me'.
Also, given the script:
#! /bin/sh
length=echo `expr index "$1" .zip`
a=$1
echo $(a:0:length}
The second line exports variable length with value echo for the command that is generated by running expr index "$1" .zip. It does not assign to length. That should be just:
length=$(expr index "${1:?}" .zip)
where the ${1:?} notation generates an error if $1 is not set (if the script is invoked with no arguments).
The last line should be:
echo ${a:0:$length}
Note that if $1 holds filename.zip, the output of expr index $1 .zip is 2, because the letter i appears at index 2 in filename.zip. If the intention is to get the base name of the file without the .zip extension, then the classic way to do it is:
base=$(basename $1 .zip)
and the more modern way is:
base=${1%.zip}
There is a difference; if the name is /path/to/filename.zip, the classic output is filename and the modern one is /path/to/filename. You can get the classic output with:
base=${1%.zip}
base=${base##*/}
Or, in the classic version, you can get the path with:
base=$(dirname $1)/$(basename $1 .zip)
If the file names can contain spaces, you need to think about using double quotes, especially in the invocations of basename and dirname.
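For instance, with a name that contains a space, the quoted forms might look like this. The path is a made-up example.

```shell
#!/bin/sh
path="/path/to/my archive.zip"      # hypothetical name containing a space

no_ext=${path%.zip}                 # /path/to/my archive
base=${no_ext##*/}                  # my archive
classic=$(basename "$path" .zip)    # my archive (via the external command)
dir=$(dirname "$path")              # /path/to

printf '%s\n' "$no_ext" "$base" "$classic" "$dir"
```

Without the double quotes, basename and dirname would each receive the name as two separate arguments.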
Try running it with bash.
bash test.sh helloworld.zip
Likewise, as @shellter suggested in a comment: "try changing the first line to #!/bin/bash".
Try this in bash:
a=$1
len=$(( ${#a} - 4 )) # length of the name without the 4-character ".zip" suffix
echo "${a:0:$len}"
Adapt it to fit your needs.

Shell Script: Truncating String

I have two folders full of trainings and corresponding testfiles and I'd like to run the fitting pairs against each other using a shell script.
This is what I have so far:
for x in SpanishLS.train/*.train
do
timbl -f $x -t SpanishLS.test/$x.test
done
This is supposed to take file1(-n).train in one directory, look for file1(-n).test in the other, and run them through a tool called timbl.
What it does instead is look for a file called SpanishLS.train/file1(-n).train.test which of course doesn't exist.
What I tried to do, to no avail, is truncate $x in a way that lets the script find the correct file, but whenever I do this, $x is truncated way too early, resulting in the script not even finding the .train file.
How should I code this?
If I got you right, this will do the job:
for x in SpanishLS.train/*.train
do
y=${x##*/} # strip the directory path
y=${y%.*} # strip the extension
timbl -f $x -t SpanishLS.test/$y.test
done
Use basename:
for x in SpanishLS.train/*.train
do
timbl -f $x -t SpanishLS.test/$(basename "$x" .train).test
done
That removes the directory prefix and the .train suffix from $x, and builds up the name you want.
In bash (and other POSIX-compliant shells), you can do the basename operation with two shell parameter expansions without invoking an external program. (I don't think there's a way to combine the two expansions into one.)
for x in SpanishLS.train/*.train
do
y=${x##*/} # Remove path prefix
timbl -f $x -t SpanishLS.test/${y%.train}.test # Remove .train suffix
done
Beware: bash supports quite a number of (useful) expansions that are not defined by POSIX. For example, ${y//.train/.test} is a bash-only notation (or bash and compatible shells notation).
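To compare the two styles side by side, here is a small sketch using the sample path from the question. It only builds the file names and does not call timbl.

```shell
#!/bin/bash
x="SpanishLS.train/file1.train"     # sample path from the question

# POSIX-style: strip the directory, then swap the suffix.
y=${x##*/}                          # file1.train
portable="SpanishLS.test/${y%.train}.test"

# bash-only: replace every ".train" in the path, directory name included.
bash_only=${x//.train/.test}

echo "$portable"     # SpanishLS.test/file1.test
echo "$bash_only"    # SpanishLS.test/file1.test
```

The bash-only form works here only because the directory name also ends in .train; with a different layout the two results would differ.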
Replace all occurrences of .train in the path with .test:
timbl -f $x -t $(echo $x | sed 's/\.train/.test/g')

Resources