Linux script to return domains on a web page

I was tasked with this question:
Write a bash script that takes a URL as its first argument and prints out statistics of the number of links per host/domain in the HTML of the URL.
So for instance given a URL like www.bbc.co.uk it might print something like
www.bbc.co.uk: 45
bbc.com: 1
google.com: 2
Facebook.com: 4
That is, it should analyse the HTML of the page, pull out all the links, examine the href attribute, decide which links are to the same domain (figure that one out of course), and which are foreign, then produce statistics for the local ones and for the remote ones.
Rules: You may use any set of standard Linux commands in your script. You may not use any higher-level programming languages such as C or Python or Perl. You may however use awk, sed, etc.
I came up with the solution as follows:
#!/bin/sh
echo "Enter a url eg www.bbc.com:"
read url
content=$(wget "$url" -q -O -)
echo "Enter file name to store URL output"
read file
echo "$content" > "$file"
echo "Enter file name to store filtered links:"
read links
# pull the href values out of the page and keep only the http links
grep -o -E 'href="([^"#]+)"' "$file" | cut -d'"' -f2 | sort | uniq | awk '/http/' > "$links"
# count how many links share each URL prefix up to the first path component
egrep -o '^http://[^/]+/' "$links" | sort | uniq -c > out
cat out
I was then told that "I must look at the data, and then check that the program deals satisfactorily with all the scenarios. This reports URLs but not the domains."
Is there someone who can help me or point me in the right direction so I can achieve my goal? What am I missing, or what is the script not doing? I thought I had made it work as required.

The output of your script is:
7 http://news.bbc.co.uk/
1 http://newsvote.bbc.co.uk/
1 http://purl.org/
8 http://static.bbci.co.uk/
1 http://www.bbcamerica.com/
23 http://www.bbc.com/
179 http://www.bbc.co.uk/
1 http://www.bbcknowledge.com/
1 http://www.browserchoice.eu/
I think they mean that it should look more like:
7 news.bbc.co.uk
1 newsvote.bbc.co.uk
1 purl.org
8 static.bbci.co.uk
1 www.bbcamerica.com
23 www.bbc.com
179 www.bbc.co.uk
1 www.bbcknowledge.com
1 www.browserchoice.eu
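One way to get that domain-only report is to reduce each absolute URL to its host part (the text between the second and third slash) before counting. A minimal sketch building on your pipeline, taking the URL as the first argument as the task asks; note it ignores relative links, which should really be counted toward the page's own host:
#!/bin/sh
# usage: ./linkstats.sh www.bbc.co.uk
url="$1"
# fetch the page, pull out the href values, keep only absolute links,
# reduce each one to its host name, then count and sort by frequency
wget -q -O - "$url" \
    | grep -o -E 'href="([^"#]+)"' \
    | cut -d'"' -f2 \
    | grep -E '^https?://' \
    | awk -F/ '{print $3}' \
    | sort | uniq -c | sort -rn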

Related

How to download URLs from the website and save them in a file (wget, curl)?

How can I use wget to extract the marked links from this page? Can it be done with curl?
I want to download the URLs from this page and save them in a file.
I tried this:
wget -r -p -k https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
Link Gopher separated the addresses for me.
Edit: How can I download the addresses to a file from the terminal, using wget or curl? These are the links I want to save:
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf
Edit 1:
You will need to use something like
Download Serialized DOM
I added that to my Firefox browser and it works, although it is a bit slow; the only way to know it has finished is that the *.html.part file disappears, leaving the corresponding *.html file, which you save using the add-on's button.
Basically, that will save the complete web page (excluding binaries, i.e. images, videos, etc.) as a single text file.
Also, the developer indicates there is a bug that affects saving these files; you MUST allow "Use in private mode" to circumvent it.
(A screenshot of the full season 44 index page, with its URL visible in the address bar, appeared here.)
Since I don't have your login I can't fully reproduce this: the service hides the individual video page (what you get when you click on a picture) and serves me the index page instead of the address shown in the address bar (their security processes at work). The index page URL should, however, show something different after the ".../sezon-44/5027472/" part.
Using that saved DOM file as input, the following will extract the necessary references:
#!/bin/sh
###
### LOGIC FLOW => CONFIRMED VALID
###
DBG=1
#URL="https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4"
###
### Completely expanded and populated DOM file,
### as captured by Firefox extension "Download Serialized DOM"
###
### Extension is slow but apparently very functional.
###
INPUT="test_77_Serialized.html"
BASE=$(basename "$0" ".sh")
TMP="${BASE}.tmp"
HARVESTED="${BASE}.harvest"
DISTILLED="${BASE}.urls"
#if [ ! -s "${TMP}" ]
#then
# ### Non-serialized
# wget -O "${TMP}" "${URL}"
#fi
### Each 'more' step is to allow review of outputs to identify patterns which are to be used for the next step.
cp -p ${INPUT} "${TMP}"
test ${DBG} -eq 1 && more ${TMP}
sed 's+\<a\ +\n\<a\ +g' "${TMP}" >"${TMP}.2"
URL_BASE=$( grep 'tiba=' ${TMP}.2 |
    sed 's+tiba=+\ntiba=+' |
    grep -v 'viewport' |
    cut -f1 -d\; |
    cut -f2 -d\= |
    cut -f1 -d\% )
echo "\n=======================\n${URL_BASE}\n=======================\n"
sed 's+\<a\ +\n\<a\ +g' "${TMP}" | grep '<a ' >"${TMP}.2"
test ${DBG} -eq 1 && more ${TMP}.2
grep 'title="Pierwsza Miłość - Odcinek' "${TMP}.2" >"${TMP}.3"
test ${DBG} -eq 1 && more ${TMP}.3
### FORMAT: Typical entry identified for video files
#<a data-testing="list.item.0" title="Pierwsza Miłość - Odcinek 2984" href="/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4" class="ifodj3-0 yIiYl"></a><div class="sc-1vdpbg2-2 hKhMfx"><img data-src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg" alt="Pierwsza Miłość - Odcinek 2984" class="rebvib-0 hIBTLi" src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg"></div><div class="sc-1i4o84g-2 iDuLtn"><div class="orrg5d-0 gBnmbk"><span class="orrg5d-1 AjaSg">Odcinek 2984</span></div></div></div></div><div class="sc-1vdpbg2-1 bBDzBS"><div class="sc-1vdpbg2-0 hWnUTt"><
sed 's+href=+\nhref=+' "${TMP}.3" |
sed 's+class=+\nclass=+' |
grep '^href=' >"${TMP}.4"
test ${DBG} -eq 1 && more ${TMP}.4
awk -v base="${URL_BASE}" -v splitter=\" '{
    printf("https://%s", base) ;
    pos=index( $0, "href=" ) ;
    if( pos != 0 ){
        rem=substr( $0, pos+6 ) ;
        n=split( rem, var, splitter ) ;
        printf("%s\n", var[1] ) ;
    } ;
}' "${TMP}.4" >${TMP}.5
more ${TMP}.5
exit
That will give you a report in ${TMP}.5 like this:
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2987/2ebc2e7b13268e74d90cc64c898530ee
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2988/2031529377d3be27402f61f07c1cd4f4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2989/eaceb96a0368da10fb64e1383f93f513
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2990/4974094499083a8d67158d51c5df2fcb
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2991/4c79d87656dcafcccd4dfd9349ca7c23
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2992/26b4d8808ef4851640b9a2dfa8499a6d
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2993/930aaa5b2b3d52e2367dd4f533728020
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2994/fa78c186bc9414f844f197fd2d673da3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2995/c059c7b2b54c3c25996c02992228e46b
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2996/4a016aeed0ee5b7ed5ae1c6117347e6a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2997/1e3dca41d84471d5d95579afee66c6cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2998/440d069159114621939d1627eda37aec
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2999/f54381d4b61f76bb83f072059c15ea84
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3000/b272901a616147cd9f570750aa450f99
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3001/3aca6bd8e81962dc4a45fcc586cdcc7f
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3002/c6500c6e261bd5d65d0bd3a57cd36288
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3003/35a13bc5e5570ed223c5a0221a8d13f3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3004/a5cfb71ed30e704730b8891323ff7d92
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3005/d86c1308029d78a6b7090503f8bab88e
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3006/54bba327bc7a1ae7b9b609e7ee11c07c
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3007/17d199a0523df8430bcb1f21d4a5b573
NOTE: In the screenshot referred to above (not reproduced here), the icon between the "folder" and the "star" icons in the address bar is the button for the Download Serialized DOM extension, which captures the currently displayed page as a fully-instantiated DOM file.
To save the output of the wget command that you provided above, add the following at the end of your command line:
-O ${vidfileUniqueName}.${fileTypeSuffix}
Before that wget, you will need to define something like the following:
vidfileUniqueName=$(echo "${URL}" | cut -f10 -d\/ )
fileTypeSuffix="mp4|avi|mkv"
You need to choose only one of the suffix types from that list and remove the others.
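Putting the pieces together, a minimal sketch of that download loop might look like the one below; urls.txt stands for the ${TMP}.5 list produced above, and mp4 is only an assumed suffix:
#!/bin/sh
# Sketch: fetch every harvested address, naming each output file after
# the episode segment of its URL (field 10 when the URL is split on "/").
fileTypeSuffix="mp4"     # assumption; pick the suffix you actually expect
while read -r URL
do
    vidfileUniqueName=$(echo "${URL}" | cut -f10 -d\/ )
    wget "${URL}" -O "${vidfileUniqueName}.${fileTypeSuffix}"
done < urls.txt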

Mail output with Bash Script

I want to SSH from Host A to a few hosts (only one is listed below right now) using the SSH key I generated, then go to a specific file, grep for a specific word with yesterday's date, and email the output to myself.
It is sending an email, but the email contains the command itself rather than the output of the command.
#!/bin/bash
HOST="XXXXXXXXXXXXXXXXXX, XXXXXXXXXXXXX"
DATE=$(date -d "yesterday")
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
COUNT=$(echo "$INVALID" | wc -c)
for x in $HOSTS
do
ssh BLA#"$x" $COUNT
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
This isn't properly an attempt to answer your question, but I think you should be aware of some fundamental problems with your code.
INVALID=' cat /xxx/xxx/xxxxx | grep 'WORD' | sed 's/$/.\n/g' | grep "$DATE"'
This assigns a simple string to the variable INVALID. Because of quoting issues, s/$/.\n/g is not quoted at all, and will probably be mangled by the shell. (You cannot nest single quotes -- the first single-quoted string extends from the first quote to the next one, and then WORD is outside of any quotes, followed by the next single-quoted string, etc.)
If your intent is to execute this as a command at this point, you are looking for a command substitution; with the multiple layers of uselessness peeled off, perhaps something like
INVALID=$(sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx)
which looks for a line matching WORD and $DATE and prints the match with a dot appended at the end -- I believe that's what your code boils down to, but without further insights into what this code is supposed to do, it's impossible to tell if this is what you actually need.
COUNT=$(echo "$INVALID" | wc -c)
This assigns a number to $COUNT. With your static definition of INVALID, the number will always be 62; but I guess that's not actually what you want here.
for x in $HOSTS
do
ssh BLA#"$x" $COUNT
This attempts to execute that number as a command on a number of remote hosts (except the loop is over HOSTS and the variable containing the hosts is named just HOST). This cannot possibly be useful, unless you have a battery of commands named as natural numbers which do something useful on these remote hosts; but I think it's safe to assume that that is not what is supposed to be going on here (and if it was, it would absolutely be necessary to explain this in your question).
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT=""
if [ "$COUNT" -gt 1 ];
then
EMAILTEXT="$INVALID"
fi
fi
So EMAILTEXT is either an empty string or the value of INVALID. You assigned it to be a static string above, which is probably the source of your immediate question. But even if it was somehow assigned to a command on the local host, why do you need to visit remote hosts and execute something there? Or is your intent actually to execute the command on each remote host and obtain the output?
done | echo -e "$EMAILTEXT" | mail XXXXXXXXXXX.com
Piping into echo makes no sense at all, because it does not read its standard input. You should probably just have a newline after done; though a possibly more useful arrangement would be to have your loop produce output which we then pipe to mail.
Purely speculatively, perhaps something like the following is what you actually want.
for host in $HOSTS; do
ssh BLA#"$host" sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx |
grep . || echo INVALID
done | mail XXXXXXXXXXX.com
If you want to check that there is strictly more than one line of output (which is what the -gt 1 suggests) then this may need to be a little bit more complicated.
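Purely as a sketch, that stricter check could count the returned lines on the local side and only report hosts with more than one match:
for host in $HOSTS; do
    # collect the matching lines from the remote host
    # (BLA# is the user/host placeholder used in the question)
    output=$(ssh BLA#"$host" sed -n -e '/WORD/!d' -e "/$DATE/s/$/./p" /xxx/xxx/xxxx)
    # count non-empty lines and only report hosts with more than one match
    count=$(printf '%s\n' "$output" | grep -c .)
    if [ "$count" -gt 1 ]; then
        printf '%s:\n%s\n' "$host" "$output"
    fi
done | mail XXXXXXXXXXX.com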
Your command substitution is not working. You should read up on how it works but here are the problem lines:
COUNT=$(echo "$INVALID" | wc -c)
[...]
ssh BLA#"$x" $COUNT
should be:
COUNT_CMD="'${INVALID} | wc -c'"
[...]
COUNT=$(ssh BLA#"$x" $COUNT_CMD)
This inserts the value of $INVALID into the string, and puts the whole thing in single quotes. The single quotes are necessary for the ssh call so the pipes aren't evaluated in the script but on the remote host. (COUNT is changed to COUNT_CMD for readability/clarity.)
EDIT:
I misread the question and have corrected my answer.

Bash script to find which server is missing from the file?

I'm relatively new to bash scripting. I was hoping someone could help. I have two files. File 1 is a .csv file that contains certain server attributes.
cmdb_ci_linux_server.csv
"CLS000","csl000","Linux SuSe","9","HP"
"CLS001","cls001","Linux SuSe","9","VMware, Inc."
"CLS002","cls002","Linux Red Hat","5.11","VMware, Inc."
...
"VSRQ1CS1","vrsq1cs1","Linux SuSe","11","VMware, Inc."
These are servers that have been checked out. I need to compare this file to the list of all the servers and find out which ones haven't been checked. The list of all the servers is in this format:
hosts.txt
cls000
cls001
cls002
cls003
cls004
cls005
...
cls499
I have tried a few different scripts, but none have worked for me. I tried to do the various steps in separate scripts, hoping to keep things relatively simple. This one makes sense to me, but it does not return anything. Any help is much appreciated.
#!/bin/bash
while IFS="," read name host_name os os_version manufacturer
do
    cat cmdb_ci_linux_server.csv
    cat hosts.txt
    grep -vf cmdb_ci_linux_server.csv hosts.txt
done
I know my way around Linux, but not very well; I'm much more familiar with Windows. I was kind of thrown into this job unexpectedly. :/
Thanks in advance!
Try this:
(cat cmdb_ci_linux_server.csv |
    awk 'BEGIN{FS=","}{print substr($2,2,length($2)-2)}' |
    sort | uniq;
 cat hosts.txt | sort | uniq) | sort | uniq -c
example result:
1 cls000
2 cls001
2 cls002
1 cls003
1 cls004
1 cls005
1 csl000
The left number indicates the number of occurrences found: a count of 2 means the host appears in both lists, while a count of 1 means it appears in only one of them, so any host from hosts.txt with a count of 1 has not been checked out.
This should work:
while read line
do
    IFS=","
    arr=($line)
    text=${arr[1]//\"/""}
    [[ $(grep $text hosts.txt -c) -le 0 ]] && echo "$text: Not Matched"
done <cmdb_ci_linux_server.csv
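For what it's worth, the same "which hosts were never checked" question can also be answered with comm, assuming the hostnames are spelled identically in both files; a rough sketch:
# column 2 of the CSV holds the checked-out hostnames; strip the quotes
awk -F',' '{gsub(/"/, "", $2); print $2}' cmdb_ci_linux_server.csv | sort -u > checked.txt
sort -u hosts.txt > all.txt
# lines that appear only in all.txt are servers missing from the CSV
comm -13 checked.txt all.txt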

Is there a way to automatically and programmatically download the latest IP ranges used by Microsoft Azure?

Microsoft has provided a URL where you can download the public IP ranges used by Microsoft Azure.
https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653
The question is, is there a way to download the XML file from that site automatically using Python or other scripts? I am trying to schedule a task to grab the new IP range file and process it for my firewall policies.
Yes! There is!
But of course you need to do a bit more work than expected to achieve this.
I discovered this question when I realized that Microsoft's page only works via JavaScript when loaded in a browser, which makes automation via curl practically unusable; and I'm not going to download this file manually every time it is updated. Instead I came up with this Bash "one-liner" that works well for me.
First, let’s define that base URL like this:
MICROSOFT_IP_RANGES_URL="https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653"
Now just run this curl command, which parses the returned web page HTML with a few chained greps:
curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+'
The returned output is a nice and clean final URL like this:
https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20181224.xml
That is exactly what one needs to automate the direct download of that XML file. If you then need to assign that value to a variable, just wrap the command in $() like this:
MICROSOFT_IP_RANGES_URL_FINAL=$(curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+')
And just access that URL via $MICROSOFT_IP_RANGES_URL_FINAL and there you go.
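From there you can hand the variable straight to curl or wget, for example:
# download the file under the name it carries on the server
curl -Lfs -o "$(basename "${MICROSOFT_IP_RANGES_URL_FINAL}")" "${MICROSOFT_IP_RANGES_URL_FINAL}"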
If you can use wget, writing the following in a text editor and saving it as scriptNameYouWant.sh will give you a bash script that downloads the XML file you want.
#! /bin/bash
wget http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml
Of course, you can also run the command directly from a terminal and get exactly the same result.
If you want to use Python, one way is to write the following in a text editor and save it as scriptNameYouWant.py:
import urllib2
page = urllib2.urlopen("http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml").read()
f = open("xmlNameYouWant.xml", "w")
f.write(str(page))
f.close()
I'm not that good with Python or bash though, so I guess there is a more elegant way to do this in either language.
EDIT
You can get the XML despite the dynamic URL by using curl. What worked for me is this:
#! /bin/bash
FIRSTLONGPART="http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_"
INITIALURL="http://www.microsoft.com/EN-US/DOWNLOAD/confirmation.aspx?id=41653"
OUT="$(curl -s $INITIALURL | grep -o -P '(?<='$FIRSTLONGPART').*(?=.xml")'|tail -1)"
wget -nv $FIRSTLONGPART$OUT".xml"
Again, I'm sure that there is a more elegant way of doing this.
Here's a one-liner to get the URL of the JSON file:
$ curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq
I enhanced the previous answer a bit, and I'm able to download the JSON file:
#! /bin/bash
download_link=$(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq | grep -v refresh)
if [ $? -eq 0 ]
then
    wget $download_link ; echo "Latest file downloaded"
else
    echo "Download failed"
fi
Slight mod to ushuz's script from 2020, as that was producing two different entries and uniq doesn't work for me on Windows:
$list=@(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?.json')
$list.Get(2)
That outputs the valid path to the json, which you can punch into another variable for use for grepping or whatever.

Using google as a dictionary lookup via bash, How can one grab the first definition?

#!/bin/bash
# Command line look up using Google's define feature - command line dictionary
echo "Type in your word:"
read word
/usr/bin/curl -s -A 'Mozilla/4.0' 'http://www.google.com/search?q=define%3A+'$word \
| html2text -ascii -nobs -style compact -width 500 | grep "*"
This dumps a whole series of definitions from google.com; an example is below:
Type in your word:
world
* universe: everything that exists anywhere; "they study the evolution of the universe"; "the biggest tree in existence"
* people in general; especially a distinctive group of people with some shared interest; "the Western world"
* all of your experiences that determine how things appear to you; "his world was shattered"; "we live in different worlds"; "for them demons were as much a part of reality as trees were"
Thing is, I don't want all the definitions, just the first one:
universe: everything that exists anywhere; "they study the evolution of the universe"; "the biggest tree in existence"
How can I grab that sentence out of the output? It's between two *; could that be used?
This will strip the bullet from the beginning of the first line, printing it and discarding the rest of the output.
sed 's/^ *\* *//; q'
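Dropped into the original script, only the last stage of the pipeline changes:
/usr/bin/curl -s -A 'Mozilla/4.0' 'http://www.google.com/search?q=define%3A+'$word \
| html2text -ascii -nobs -style compact -width 500 | grep "*" | sed 's/^ *\* *//; q'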
Add this:
head -n 1 -q | tail -n 1
So it becomes:
#!/bin/bash
# Command line look up using Google's define feature - command line dictionary
echo "Type in your word:"
read word
/usr/bin/curl -s -A 'Mozilla/4.0' 'http://www.google.com/search?q=define%3A+'$word \
| html2text -ascii -nobs -style compact -width 500 | grep "*" | head -n 1 -q | tail -n 1
Try the head command.
