Using google as a dictionary lookup via bash, How can one grab the first definition? - linux

#!/bin/bash
# Command line look up using Google's define feature - command line dictionary
echo "Type in your word:"
read word
/usr/bin/curl -s -A 'Mozilla/4.0' 'http://www.google.com/search?q=define%3A+'$word \
| html2text -ascii -nobs -style compact -width 500 | grep "*"
This dumps a whole series of definitions from google.com; an example is below:
Type in your word:
world
* universe: everything that exists anywhere; "they study the evolution of the universe"; "the biggest tree in existence"
* people in general; especially a distinctive group of people with some shared interest; "the Western world"
* all of your experiences that determine how things appear to you; "his world was shattered"; "we live in different worlds"; "for them demons were as much a part of reality as trees were"
Thing is, I don't want all the definitions, just the first one:
universe: everything that exists anywhere; "they study the evolution of the universe"; "the biggest tree in existence"
How can I grab that sentence from the output? It's between two *; could that be used?

This will strip the bullet from the beginning of the first line, printing it and discarding the rest of the output.
sed 's/^ *\* *//; q'
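The sed command above expects grep to have filtered the bulleted lines already. As a rough sketch, the filtering and the bullet stripping can be folded into a single sed call. The function name first_definition and the sample input are assumptions made for illustration; nothing here is fetched from Google.

```shell
# Hypothetical helper: print only the first bulleted line, without its bullet.
first_definition() {
  # -n suppresses automatic printing; on the first line containing '*',
  # strip the leading bullet, print the result, and quit.
  sed -n '/\*/{
    s/^ *\* *//
    p
    q
  }'
}

# Sample input standing in for the html2text output (made up for this demo).
printf '%s\n' \
  'Definitions of world on the Web:' \
  '* universe: everything that exists anywhere' \
  '* people in general' |
  first_definition
```

In the original script this would replace the trailing grep "*" stage, so the pipeline ends in html2text ... | first_definition.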

Add this:
head -n 1
So it becomes:
#!/bin/bash
# Command line look up using Google's define feature - command line dictionary
echo "Type in your word:"
read word
/usr/bin/curl -s -A 'Mozilla/4.0' "http://www.google.com/search?q=define%3A+$word" \
| html2text -ascii -nobs -style compact -width 500 | grep "*" | head -n 1

Try the head command.

Related

Unix/Linux and MQ scripts explanation

echo "DISPLAY QL($Queue) CURDEPTH" \
| runmqsc Queue_Managr \
| grep 'CURDEPTH(' \
| sed 's/.*CURDEPTH//' \
| tr -d '()'
Can anyone explain how this script works? The command displays the current depth of a particular queue on a particular queue manager.
I understand echo "DISPLAY QL($Queue) CURDEPTH" | runmqsc Queue_Managr: it displays the queue name and CURDEPTH(value).
But I don't understand grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'. How does this part work?
It's a pipeline. It contains five stages, separated by the pipe character |. The output of one stage is used as the input to the next stage.
echo "DISPLAY blatti blatti" - this just outputs some text.
runmqsc Queue_Managr - Uses the text as input to the runmqsc-command, which does some MQ magic and outputs data.
grep 'CURDEPTH(' - Grep is a standard unix utility. It filters its input. In this case, only lines containing the text CURDEPTH( are allowed through to the next stage.
sed 's/.*CURDEPTH//' - Sed is another standard utility. It's short for "stream editor", and allows you to edit the input as it passes through. In this case, the expression 's/.*CURDEPTH//' means to delete everything from the start of each line, up to and including the text CURDEPTH. (Remember, only lines containing that text were passed through from the previous stage.)
tr -d '()' - Finally, another standard utility, tr, which also allows editing the text that flows through from input to output. -d '()' means delete the characters ( and ) from the text.
The output from the final stage is shown in the terminal (if you ran your script in a terminal).
It's a fairly common way of building scripts in a unix shell. Generate the input data somehow, push it to a command, and massage the output data through a couple of stages each doing its little bit.
Long dissertations can be (and probably have been) written about all of grep, sed and tr. Look them up if you're interested.
CURDEPTH(3) DEFBIND(OPEN)
Notice that there are 2 pairs of attribute-value in this output. We need to handle only the appropriate pair.
We might be tempted to use the "cut" command to do simple trimming of the first pair to get the value.
However, the output from runmqsc for queues that have very long names (such as 48 characters) shows CURDEPTH as the 2nd pair (as shown below). Thus, a simple use of "cut" is no longer possible:
CRTIME(09.08.08) CURDEPTH(3)
Using sed (the stream editor) can help us get the value. Notice that the parentheses are still included.
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//'
(3)
Notice that the answer is: (3)
Finally, it is necessary to remove the opening and closing parentheses. This can be done using "tr" as follows:
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'
3
Notice that the answer is: 3
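One caveat with the pipeline above: s/.*CURDEPTH// keeps whatever follows the value, so a line shaped like the first sample (CURDEPTH(3) DEFBIND(OPEN)) would leave the second attribute in the output. A possible sketch that captures only the digits, wherever the pair sits on the line, is shown below. The helper name extract_curdepth and the sample lines are illustrative assumptions; real runmqsc output may be laid out differently.

```shell
# Hypothetical helper: read runmqsc-style output on stdin, print the CURDEPTH value.
extract_curdepth() {
  # Capture only the digits inside CURDEPTH(...) and print lines that matched.
  sed -n 's/.*CURDEPTH(\([0-9][0-9]*\)).*/\1/p'
}

# Sample lines modelled on the two layouts quoted above.
printf '%s\n' '   CURDEPTH(3)                             DEFBIND(OPEN)' | extract_curdepth
printf '%s\n' '   CRTIME(09.08.08)                        CURDEPTH(3)' | extract_curdepth
```

With this, the grep and tr stages become unnecessary: echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | extract_curdepth.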

Bash loop is not working — cannot find command "[0%"

I just wrote a ping sweep script in Bash this morning, and guess what: it's not working. Can you please check what it is that I'm missing.
Here's the script:
for i in `seq 1 255`
do
if ["$(ping -c1 -W1 -n 192.168.1.$i | grep '%' | cut -d',' -f3 | cut -d' ' -f2)" -eq "0%"]
then echo "Host live"
else echo "Host down"
fi
done
And here's the error:
bash: [0%: command not found
Host down
bash: [100%: command not found
Host down
My purpose is to make a ping sweep program which scans the range 192.168.1.1-255 and reports each host's status. I know about nmap, but I wanted to learn Bash, so I made this one. Please explain what the error means: which command is the "command not found" message referring to?
The ping command returns a non-zero exit status if there was any problem, so you do not need to parse the output:
for i in {1..255}
do
if ping -c1 -W1 -n "192.168.1.$i"
then
echo 'Host live'
else
echo 'Host down'
fi
done
Primary diagnosis
The [ command needs a space after its name, just like the rm command needs a space after its name and the ls command does, and … The [ command also requires its last argument to be ], spelled thus, so there needs to be a space before that, too.
You have:
if ["$(ping -c1 -W1 -n 192.168.1.$i | grep '%' | cut -d',' -f3 | cut -d' ' -f2)" -eq "0%"]
At minimum, you need:
if [ "$(ping -c1 -W1 -n 192.168.1.$i | grep '%' | cut -d',' -f3 | cut -d' ' -f2)" -eq "0%" ]
Secondary issues
Note that 'at minimum' means, amongst other things, that I've not spent time analyzing why you are executing the complex sequence of 4 commands in the test condition, or looked for ways to cut that down to two (using grep and cut twice suggests that sed or a more powerful tool would be better).
I griped about the formatting in the original version of the question, where the loop (it isn't a nested loop, incidentally, or at least it isn't in the code shown) was all on one line thanks to Bash flattening it in history. My version of the code would have far fewer semicolons in it, for example.
The -eq operator in [ is for testing the equality of numbers (the converse convention applies in Perl, where eq is for testing strings and == tests numbers). Note that POSIX standard [ (aka test) does not support == as a synonym for =, though Bash does.
It isn't entirely clear that "0%" is OK as an argument for numeric comparison. Many programs would not object: the zero can be converted and the residue doesn't matter. Others might legitimately complain that the whole string could not be converted, so it is erroneous. Careful code wouldn't risk the disconnect.
See Steven Penny's answer for a more thorough rewrite of the code. My answer remains a valid diagnosis of the immediate problem of not being able to find commands named [0% and [100%.
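To illustrate the secondary points, here is a minimal sketch that keeps the output-parsing approach but compares the loss figure as a string with = rather than -eq. The helper names loss_percent and host_status are made up for this example, and the sample summary line assumes the format printed by Linux ping; other implementations may word it differently.

```shell
# Hypothetical helper: read ping output on stdin, print the packet-loss number.
loss_percent() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Hypothetical helper: classify a host from ping output on stdin.
host_status() {
  # '=' compares strings; '-eq' is meant for integers only.
  if [ "$(loss_percent)" = "0" ]; then
    echo "Host live"
  else
    echo "Host down"
  fi
}

# Sample summary line (an assumed format, not captured from a real run).
echo '1 packets transmitted, 1 received, 0% packet loss, time 0ms' | host_status

# In the sweep itself this would be used as:
#   ping -c1 -W1 -n "192.168.1.$i" | host_status
```

Note the spaces after [ and before ], which is the fix for the original "command not found" error.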

Line from bash command output stored in variable as string

I'm trying to find a solution to a problem analogous to this one:
#command_A
A_output_Line_1
A_output_Line_2
A_output_Line_3
#command_B
B_output_Line_1
B_output_Line_2
Now I need to compare A_output_Line_2 and B_output_Line_1 and echo "Correct" if they are equal and "Not Correct" otherwise.
I guess the easiest way to do this is to copy a line of output in some variable and then after executing the two commands, simply compare the variables and echo something.
This I need to implement in a bash script and any information on how to get certain line of output stored in a variable would help me put the pieces together.
Also, it would be great if anyone could tell me not only how to copy/store a whole line, but also just a word or a sequence such as line 1, bytes 4-12, stored as a string in a variable.
I am not a complete beginner, but I am nowhere near an advanced Linux bash user either. Thanks in advance for any help, and sorry for my bad English!
An easier way might be to use diff, no?
Something like:
command_A > command_A.output
command_B > command_B.output
diff command_A.output command_B.output
This will work for comparing multiple strings.
But, since you want to know about single lines (and words in the lines) here are some pointers:
# first line of output of command_A
command_A | head -n 1
The -n 1 option tells head to print only the first line (the default is 10 lines).
# second line of output of command_A
command_A | head -n 2 | tail -n 1
That will take the first two lines of the output of command_A and then the last of those two lines. Happy times :)
You can now store this information in a variable:
output_A=$(command_A | head -n 2 | tail -n 1)
output_B=$(command_B | head -n 1)
And then compare it:
if [ "$output_A" = "$output_B" ]; then echo 'Correct'; else echo 'Not Correct'; fi
To just get parts of a string, try looking into cut or (for more powerful stuff) sed and awk.
Also, just learning a good general-purpose scripting language like python or ruby (even perl) can go a long way with this kind of problem.
Use the IFS (internal field separator) to separate on newlines and store the outputs in an array.
#!/bin/bash
IFS='
'
array_a=( $(./a.sh) )
array_b=( $(./b.sh) )
if [ "${array_a[1]}" = "${array_b[0]}" ]; then
echo "CORRECT"
else
echo "INCORRECT"
fi
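For the part of the question about storing a single line, or a byte range such as line 1, bytes 4-12, one possible sketch uses sed -n to pick the line and cut -c to pick the characters. The helper nth_line and the stand-in definitions of command_A and command_B are assumptions for the demo; the second line of command_A is made equal to the first line of command_B on purpose.

```shell
# Hypothetical helper: print line number $1 of stdin.
nth_line() {
  sed -n "${1}p"
}

# Stand-ins for the real commands (made up so the example is self-contained).
command_A() { printf '%s\n' A_output_Line_1 shared_line A_output_Line_3; }
command_B() { printf '%s\n' shared_line B_output_Line_2; }

output_A=$(command_A | nth_line 2)   # second line of command_A
output_B=$(command_B | nth_line 1)   # first line of command_B

if [ "$output_A" = "$output_B" ]; then
  result="Correct"
else
  result="Not Correct"
fi
echo "$result"

# Characters 4-12 of the first line of command_A, stored in a variable.
fragment=$(command_A | nth_line 1 | cut -c4-12)
echo "$fragment"
```

cut -c counts characters from 1, so 4-12 selects nine characters starting at the fourth.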

Is it possible to do simple arithmetic in sed addresses?

Is it possible to do simple arithmetic in sed addresses?
Judging by the "addresses" manual section, the answer seems no. But maybe there is a workaround?
For example, how can I print the second-last line of a file? Something like this would be nice:
sed -n '$-1 p' file
But it obviously does not work... so I usually have to do multiple sed calls, first for identifying the line, then do the arithmetic using the shell $((expr)) and then finally call sed again. Like this:
sed -n "$(($(sed -n '$ =' file)-1)) p" file
Is there a "better", more compact, more readable way of doing arithmetic with sed addresses?
In a serious moment of procrastination, I decided to write a small script that quickly changes the xterm colorscheme. The idea is that you have an .Xresources file with a start marker and an end marker:
...
START_MARKER
...
END_MARKER
...
and you want to delete everything that is between the markers, but not the markers themselves. Again, it would be great to do something like:
sed '/START_MARKER/+1,/END_MARKER/-1 d' file
...but you can't!
You're right, one can't directly do math in sed [1], even in addresses. But you can use some trickery to do what you want:
Second-last row:
$ seq 5 | sed -n -e '${ # On the last line
> g # Replace the buffer with the hold space
> p # and print it
> }
> h' # All lines, store the current line in the hold space.
4
Between START and END:
$ cat test.in
1
START
2
3
END
4
$ cat test.in | sed '/^START$/,/^END$/{
> /^START$/d
> /^END$/d
> p
> }
> d'
2
3
$ cat test.in | sed -n -e '/^START$/,/^END$/!d' -e '/^START$/d' -e '/^END$/d' -e p
2
3
I'm using a BSD (mac) sed; on GNU systems you can use ; between lines instead of a newline. Or stick it in a script.
[1] Sed is Turing complete, so you can do math, but it's unwieldy at best: http://rosettacode.org/wiki/A%2BB#sed
(Yes, I know, the cat above is a useless use of cat; it's for illustration only.)
Delete the second last line:
sed ':r;$!{N;br};s/\n[^\n]*\(\n[^\n]*\)$/\1/' file
Delete everything inside markers:
sed ':r;$!{N;br};s/START_MARKER.*END_MARKER/START_MARKER\nEND_MARKER/' file
Far from being elegant, but kinda works.
As mentioned in the comments, sed operates on lines. However, you can read another line into the pattern space with the N command. Both lines will then be in the pattern space, separated by \n. sed also has flow control, namely labels and conditional/unconditional branches. Everything is documented in man sed; also, here is a full reference with examples. In the code above, r is a label; $!{..} means "on every line except the last, do .."; N;br reads another line and branches unconditionally back to r. So with :r;$!{N;br} you read all of the input into the pattern space and then operate on it as a single string, with \n separating the input lines.
This might work for you (GNU sed), to delete the second-last line:
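As a small illustration of the read-everything loop described here, the sketch below slurps all input into the pattern space and then replaces every embedded newline with a comma. The function name join_lines is made up for the example, and the script is split across several -e options so that it does not depend on GNU sed accepting ; after a label.

```shell
# Hypothetical helper: join all input lines into one comma-separated line.
join_lines() {
  # :r      label
  # $!{...} on every line except the last: append the next line (N), loop (br)
  # s///g   runs once, on the whole input held in the pattern space
  sed -e ':r' -e '$!{' -e 'N' -e 'br' -e '}' -e 's/\n/,/g'
}

printf '%s\n' a b c | join_lines
```

Because the whole input ends up in memory, this pattern is best kept for small files.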
sed '$!N;$s/.*\n//;P;D' file
and this works and should be easy to understand:
sed '/start/,/end/!d;//d' file
These are solutions to your questions but as for arithmetic best use awk or perl.
You have some good sed suggestions, here's one based on GNU awk:
awk -v RS='START_MARKER|END_MARKER' 'RT == "END_MARKER"' infile
RS='START_MARKER|END_MARKER' splits input with the markers as separators.
RT is set to the separator that was matched; when it equals "END_MARKER", the default action {print $0} is executed.
So, for example, if you wanted to print all but the last three lines, set FS to \n and apply the appropriate loop:
awk -v RS='START_MARKER|END_MARKER' -v FS='\n' 'RT == "END_MARKER" { for(i=1; i<NF-3; i++) print $i }' infile
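RT and a regular-expression RS are gawk features, so the commands above will not behave the same in other awk implementations. For the original task (delete what lies between the markers but keep the markers), a sketch that should work in any POSIX awk tracks the state with a flag instead. The function name strip_between_markers and the variable inside are illustrative choices.

```shell
# Hypothetical helper: drop the lines between START_MARKER and END_MARKER,
# keeping the marker lines themselves and everything outside them.
strip_between_markers() {
  awk '
    /END_MARKER/   { inside = 0 }   # leaving the block: resume printing
    !inside        { print }        # print markers and everything outside
    /START_MARKER/ { inside = 1 }   # entering the block: stop printing
  '
}

printf '%s\n' 1 START_MARKER 2 3 END_MARKER 4 | strip_between_markers
```

The order of the three rules matters: the end marker clears the flag before the print rule runs, and the start marker sets it only after it has been printed.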
You can use a simple method to show the second-last line of the file.
TOTAL_LENGTH=$(wc -l < file_name)
SECOND_LAST_LINE=$((TOTAL_LENGTH - 1))
head -n "$SECOND_LAST_LINE" file_name | tail -n 1
If you want to delete the second-last line from the file:
sed -i "${SECOND_LAST_LINE}d" file_name
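If only printing is needed, the line count does not have to be computed at all: tail can take the last N lines and head can keep the first of those. This is a minimal sketch with a made-up helper name, nth_from_end.

```shell
# Hypothetical helper: print the line that is $1 lines from the end of stdin
# (1 = last line, 2 = second-last line, and so on).
nth_from_end() {
  tail -n "$1" | head -n 1
}

printf '%s\n' 1 2 3 4 5 | nth_from_end 2
```

For a file, this is simply tail -n 2 file_name | head -n 1. If the input has fewer than N lines, the first line is printed instead, so callers that care should check the length first.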
A more comprehensive treatment of doing arithmetic in sed is given in solution #2. An introduction to using sed to `sed' its own script is here.
As the brain pain strain incurred in solution #2 from the quixotic comment demands of too much "hand waving" actually is too much "hand waving" of code, in juxtaposition, this is solution #3:
echo -e 'a\nb\nc\nd\ne' | sed -n '1!G;h;$p' | sed -n 3p
which still uses piping ("But maybe there is a workaround?"), where the numeral 3 must be replaced "by hand" for the desired line from the end of the file ala $-3.
Suppose the sed script is '$-4 p; $-6p; $-8 p;'
echo -e 'a\nb\nc\nd\ne\nf\ng\nh\ni' |
sed -n '1!G;h;$p' |
sed -n '4 p; 6p; 8 p;' |
sed -n '1!G;h;$p'
does the job via
echo '$-4 p; $-6p; $-8 p;' | sed 's/\$-//g'
Caveats:
The sed commands must be as simple as print.
The "simple arithmetic" can only be of the form '$-n'.
The arithmetic is not calculated "normally".
A "single" 'sed' command string (a "line" if the previous piping is considered as such) would embed and combine these two commands as outlined in the next answer #2.
The coup de grâce.
Given the perfunctory dismissal of the first answer here is #2:
As this is only the 2nd or 3rd time writing a substantial sed script, serious syntax subtlety (s)circumvention scuppering solutions seemed sufficient: ala
# file prep
echo -e ' a\n b\n c\n d\n e\n f' >test
The following strikeout is not incorrect but after playing and "messing about" with sed with an SO problem over here the sed execute can be simpler w/o IO redirection if run from the pattern buffer to get the file length line count $ via:
sed -e '1{h; s/.*/sed -n "$=" test /e' -e 'p;x}; ${p;x;}' test
The $= enumeration is held in the hold buffer from the get go and printed again at the end.
# get "sed -n $= test" command output into sed script
sed -n '1esed -n "$=" test >sedr' test
# see where this is headed? so far "sed -n ... test" is irrelevant
# a pedantic "sed" only solution would keep it this way with
# all the required "sed"'ng as part of an 'e' command or '$e'
# where the 'sedr' file is itself "sed"'d ultimately to a final
# command 'sed -n /<the calculated line number>/p'
# one could quibble whether '>sedr' io redirection is "pure sed"
# modify 'sedr'with [the sed RPN][1] to get <the calculated line number>
# with judicious use of "sed"'s 'r' command and buffering will
# realize the effective script to compute the desired result
# this is left as an exercise needing perverse persistence with
# a certain amount of masochistic agony
As a hint as to how to proceed; using the technique of solution #3 the sed script $- addresses are now replaced by the $= value and -. So sed is again used to edit its own script.
Parsing the sed script must accurately modify just the $- in addresses only.
Also, to use the RPN calculator the infix arithmetic must have post fixed operators. It is a conventional paradigm in theories of automata and formal languages to convert Polish Notation or its Reverse to Infix and vice versa.
Hopefully, this establishes the answer in the affirmative that it can be done (mais, pas par moi) and in the negative that it is not a trivial exercise (c'est par moi).
Excruciating rationale for an arbitrary solution is at the end.
The environment used for the empirical tests:
linuxuser@ubuntu:~$ sed --version
sed (GNU sed) 4.4
Copyright (C) 2017 Free Software Foundation, Inc.
linuxuser@ubuntu:~$ uname -a
Linux ubuntu 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:00 UTC 2019 i686 i686 i686 GNU/Linux
linuxuser@ubuntu:~$ lsbname -a
lsbname: command not found
linuxuser@ubuntu:~$ apropos lsb
lsb_release (1) - print distribution-specific information
lsblk (8) - list block devices
linuxuser@ubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic
Solution #1
A technique thinking outside inside the box:
seq 60 | sed -n '$!p' | sed -n '$!p' | sed -n '$!p' | sed -n '$p'
which prints:
57
specifically, for the second last line:
sed -n '$!p' file | sed -n '$p'
More generally, a script can iterate over sed -n '$!p' to "count backwards" from the end of a file.
Well, the answer to:
Is it possible to do simple arithmetic in sed addresses?
rhetorically is, it depends on one's abilities, wishes and desires as well as a realistic assessment of practicality. As well, the implication is that a single sed invocation should be used for this task exclusively. But yes it is possible.
A firm grounding in the studies of Automata, Formal Languages and Recursive Function Theory does not hurt.
As stated in previous answers: Not only can sed do simple arithmetic but also any computable function which includes complex arithmetic. To do so however requires implementing the Primitive Recursive Functions (PRF) (which of course sed does) of Recursive Function Theory (RFT). Of course the finite size of machine architecture does limit the computation without infinite tape resources as a Turing machine proves. In any case not wishing to demonstrate this the precedents are to be found in the sed manual.
Specifically, to do arithmetic (finitely) an RPN calculator:
https://www.gnu.org/software/sed/manual/html_node/Increment-a-number.html#FOOT9
Now then, using such a tool can create a sed script that precomputes the arithmetic that is then embedded in a sed script to print the desired output. A simple demonstration is given by the OP noting now that the shell arithmetic computation can be done using the RPN sed script.
This reduces to a form such as (very crude)
sed '/$(sed RPN($= - 3*4) file)/;p;' file
but still requires feeding sed a sed'd script.
Also, there is arguably the quibbling over the use of bash $() but it could be argued bash is already used to execute the first 'sed' so no harm no foul.
Recognizing that sed implements the PRF or equivalently is Turing complete means that yes, a single invocation of sed is adequate.
The paradigm can therefore do this.
Some commands that could expedite this task are:
e, e command, r, R, w, W
in addition to the usual hold and pattern buffer commands.
The r, R, w, W commands are particularly advantageous as temporary buffer space.
e [command] [3.7 Commands Specific to GNU sed][2]
This command allows one to pipe input from a shell command into
pattern space. Without parameters, the e command executes the
command that is found in pattern space ...
More abstractly, it is completely possible, though highly impractical, to write a sed script to execute the sed paradigm itself that also includes arithmetic calculations even in addresses.
A sed peculiarity. The expression /\n/ will not match any address and matches in pattern space only if a sed command like 'N'ext or s/.*/\n/ introduces one.
Confirmed via:
echo -e '\n\n' | sed -n ' /\n/ {s//hello/;p}'
But
echo -e '\n\n' | sed -n '0,/\n\n\n/ {s//hello/;p}'
outputs 3 blank lines and
echo -e '\n\n' | sed -n '0,/\n/ {s/.*/hello/;p}'
echo -e '\n\n' | sed -n '0,/\n\n\n/ {s/.*/hello/;p}'
each output 3 hello's
hello
hello
hello
while this is well-behaved:
echo -e '\n\n' | sed -n '0,/^$/ {s//hello/;p}'

Error with a script in bash

I have a small error in a script I wrote in bash and I can't figure out what I'm doing wrong.
Note that I'm using this script for thousands of calculations and this error happened only a few times (about 20 or so), but it still happened.
What the script does is this: it takes as input a web page that I fetched from a site with the w3m utility and counts the occurrences of every word in it. Afterwards it orders them from the most common down to the ones that occur only once.
This is the code:
#!/bin/bash
# counts the numbers of words from specific sites #
# writes in a file the occurrences ordered from the most common #
touch check # file used to analyze the occurrences
touch distribution # final file ordered
page=$1 # the web page that needs to be analyzed
occurrences=$2 # temporary file for the occurrences
dictionary=$3 # dictionary used for another purpose (ignore this)
# write the words one by column
cat $page | tr -c [:alnum:] "\n" | sed '/^$/d' > check
# loop to analyze the words
cat check | while read words
do
word=${words}
strlen=${#word}
# ignores blacklisted words or small ones
if ! grep -Fxq $word .blacklist && [ $strlen -gt 2 ]
then
# if the word isn't in the file
if [ `egrep -c -i "^$word: " $occurrences` -eq 0 ]
then
echo "$word: 1" | cat >> $occurrences
# else if it is already in the file, it calculates the occurrences
else
old=`awk -v words=$word -F": " '$1==words { print $2 }' $occurrences`
### HERE IS THE ERROR, EITHER THE LET OR THE SED ###
let "new=old+1"
sed -i "s/^$word: $old$/$word: $new/g" $occurrences
fi
fi
done
# orders the words
awk -F": " '{print $2" "$1}' $occurrences | sort -rn | awk -F" " '{print $2": "$1}' > distribution
# ignore this, not important
grep -w "1" distribution | awk -F ":" '{print $1}' > temp_dictionary
for line in `cat temp_dictionary`
do
if ! grep -Fxq $line $dictionary
then
echo $line >> $dictionary
fi
done
rm check
rm temp_dictionary
This is the error (I'm translating it, so it could be different in English):
./wordOccurrences line:30 let:x // where x is a number, usually 9 or 10 (but also 11, 13, etc)
1: syntax error in the expression (the error token is 1)
sed: expression -e #1, character y: command 's' not terminated // where y is another number (this one is also usually 9 or 10) with y being different from x
EDIT:
Talking with kev it looks like it's a newline problem
I added an echo between let and sed to print the sed and it worked perfectly for like 5 to 10 minutes until that error. Usually the sed without error looked like this:
s/^CONSULENTI: 6$/CONSULENTI: 7/g
but when I got the error it was like this:
s/^00145: 1
1$/00145: 4/g
How can I fix this?
If you get a newline in $old, it means awk printed two lines, so there is a duplicate entry in $occurrences.
The script seems complicated for counting words, and it is inefficient because it launches many processes and rescans the file inside a loop;
maybe you can do something similar with
sort | uniq -c
You should also consider that your case-insensitivity is not consistent throughout the program. I created a page with just "foooo" in it and ran the program, then created one with "Foooo" in it and ran the program again. The 'old=`awk...' line sets 'old' to the empty string because awk is matching case sensitively. This results in the occurrences file not being updated. The subsequent sed and possibly some of the greps are also case sensitive.
This may not be the only error since it doesn't explain the error message you saw, but it is an indication that the same word with different capitalization will be handled erroneously by your script.
The following would separate the words, lowercase them, and then remove the ones smaller than three characters:
tr -cs '[:alnum:]' '\n' <foo | tr '[:upper:]' '[:lower:]' | grep -Ev '^.{0,2}$'
Using this at the front of your script would mean that the rest of the script would not have to be case insensitive to be correct.
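Putting the two suggestions together (the normalizing front end and sort | uniq -c), the whole counting loop could be replaced by one pipeline, roughly as sketched below. The function name word_distribution is an assumption, the blacklist and dictionary steps of the original script are left out, and the output format word: count is copied from the question.

```shell
# Hypothetical helper: read text on stdin, print "word: count" lines,
# most frequent first, ignoring case and words shorter than three characters.
word_distribution() {
  tr -cs '[:alnum:]' '\n' |          # one word per line
    tr '[:upper:]' '[:lower:]' |     # fold case so Foooo and foooo match
    grep -Ev '^.{0,2}$' |            # drop empty lines and short words
    sort | uniq -c |                 # count identical words
    sort -k1,1nr -k2 |               # highest count first, then by word
    awk '{ print $2 ": " $1 }'       # reformat as "word: count"
}

printf 'Foooo bar foooo to BAR\nfoooo baz\n' | word_distribution
```

Since each word is counted exactly once by uniq -c, there is no occurrences file to update and no possibility of duplicate entries producing a multi-line value.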

Resources