How can I change my program so that it displays the output - linux

I'm trying to print out the happiest countries in the world for 2022 by fetching the data from https://en.wikipedia.org/wiki/World_Happiness_Report?action=raw and then displaying the first 5 countries. Here is my code:
#!/bin/bash
content=$(curl -s "https://en.wikipedia.org/wiki/World_Happiness_Report?action=raw")
lines=$(echo "$content" | grep '^\|' | sed -n '/2022/{n;p;}')
top_5=$(echo "$lines" | awk '{print $3}' | sort | head -n 5)
echo "$top_5"
However, when I run this code in Ubuntu, nothing shows up; the output is just blank, like this:
....(My computer server).....:~$ bash happy_countriesnew.sh
#(I'm expecting there to be a list here)
....(My computer server).....:~$
I'm expecting something like this instead of the blank space my terminal is displaying:
Finland
Norway
Denmark
Iceland
Switzerland
Netherlands
Canada
New Zealand
Sweden
Australia
What am I doing wrong and what should I change?

echo | grep | sed | awk is a bit of an anti-pattern. Typically, you want to refactor such pipelines to just be a call to awk. In your case, it looks like your code that is attempting to extract the 2022 data is flawed. The data is already sorted, so you can drop the sort and get the data you want with:
sed -n '/^=== 2022 report/,/^=/{ s/}}//; /^|[12345]|/s/.*|//p; }'
The first portion (the /^=== 2022 report/,/^=/) tells sed to only work on lines between those that match the two given patterns, which is the data you are interested in. The rest just cleans up and extracts the country name, printing only those lines in which the 2nd field is exactly one of the single digits 1, 2, 3, 4, or 5.
Note that this is not terribly flexible, and it is difficult to modify it to print the top 7 or the top 12, so you might want something like:
sed -n '/^=== 2022 report/,/^=/{ s/}}//; /^|[[:digit:]]/s/.*|//p; }' | head -n 5
Note that it could be argued that sed | head is also a bit of an anti-pattern, but keeping track of lines of output in sed is tedious and the pipe to head is less egregious than attempting to write such code.
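Putting it together, a minimal sketch of the whole script (reusing the curl call from the question, and assuming the page layout stays as it currently is) might look like:
#!/bin/bash
curl -s "https://en.wikipedia.org/wiki/World_Happiness_Report?action=raw" |
    sed -n '/^=== 2022 report/,/^=/{ s/}}//; /^|[[:digit:]]/s/.*|//p; }' |
    head -n 5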

I guess you are seeing this error (but you are ignoring it):
grep: empty (sub)expression
The problem is with your grep expression; remove the escape:
lines=$(echo "$content" | grep '^|' | sed -n '/2022/{n;p;}')
and check for errors.
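For reference, GNU grep treats \| as alternation even in basic regular expressions, so '^\|' contains an empty branch, which many grep versions reject with exactly that error. A quick demonstration on made-up input:
printf '|foo\nbar\n' | grep '^\|'    # grep: empty (sub)expression
printf '|foo\nbar\n' | grep '^|'     # prints |foo (the | is a literal character here)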

Using awk:
awk -F"{{|}}|[|]" '/^=== 2022 rep/ {f=1} /^=== 2021 rep/ {f=0} {if(f==1 && /flag/) {print $6}}' <<<"$content" | head -n 5
Finland
Denmark
Iceland
Switzerland
Netherlands
-F"{{|}}|[|]" # set field separator to '{{' or '}}' or '|'
/^=== 2022 rep/ {f=1} # set flag if line starts with '=== 2022 rep'
/^=== 2021 rep/ {f=0} # unset flag if line starts with '=== 2021 rep'
{if(f==1 && /flag/) {print $6}} # if f is set and line contains 'flag' text print 6th field
Note: Assumes "$content" variable is populated via content=$(curl -s "https://en.wikipedia.org/wiki/World_Happiness_Report?action=raw")
-- or --
You could use bash process substitution and avoid the intermediate content variable altogether:
awk -F"{{|}}|[|]" '/^=== 2022 rep/ {f=1} /^=== 2021 rep/ {f=0} {if(f==1 && /flag/) {print $6}}' < <(curl -s "https://en.wikipedia.org/wiki/World_Happiness_Report?action=raw") | head -n 5
Output:
Finland
Denmark
Iceland
Switzerland
Netherlands

curl …………… |
gawk 'NF *= 2<NF' FS='^[|][1-5][|][|][{][{]flag[|]|[}][}]$' OFS=
Finland
Denmark
Iceland
Switzerland
Netherlands
If you want to shrink it even further:
mawk 'NF *= 2<NF' FS='^[|][1-5][|].+[|]|[}]+$' OFS=
this approach makes it easy to expand the list to, say, the Top 17:
nawk 'NF *= 2<NF' FS='^[|]([1-9]|1[0-7])[|].+[|]|[}]+$' OFS=
1 Finland
2 Denmark
3 Iceland
4 Switzerland
5 Netherlands
6 Luxembourg
7 Sweden
8 Norway
9 Israel
10 New Zealand
11 Austria
12 Australia
13 Ireland
14 Germany
15 Canada
16 United States
17 United Kingdom

Related

How do I count lines where a specific column matches either of two patterns?

year start year end location topic data type data value
2016 2017 AL Alcohol Crude Prevalence 16.9
2016 2017 CA Alcohol Other 15
2016 2017 AZ Neuropathy Other 13.1
2016 2017 HI Smoke Crude Prevalence 20
2016 2017 IL Cancer Other 20
2016 2017 KS Cancer Other 14
2016 2017 AZ Smoke Crude Prevalence 16.9
2016 2017 KY Cancer Other 13.8
2016 2017 LA Alcohol Crude Prevalence 18
The task is to count the lines whose "topic" column is either "Alcohol" or "Cancer".
I already got the index of the column named "topic", but the contents I extract from it are not correct, so I am not able to count the lines containing "Alcohol" and "Cancer". How can I solve this?
Here is my code:
awk '{print $4}' AAA.csv > topic.txt
head -n5 topic.txt | less
You could try the following:
The call to awk gets the column in question, grep filters for the keywords, and the word count counts the lines:
$ awk '{ print $4 }' data.txt | grep -e Alcohol -e Cancer | wc -l
6
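For completeness, a single awk call can do the filtering and the counting in one step; a minimal sketch, assuming the topic really is field 4 as in the sample data:
awk '$4 == "Alcohol" || $4 == "Cancer" { n++ } END { print n+0 }' data.txt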
Using a regexp with grep:
cat data.txt|tr -s " "|cut -d " " -f 4|grep -E '(Alcohol|Cancer)'|wc -l
If you are sure that words "Alcohol" and "Cancer" only appear in the 4th column you can just do
grep -E '(Alcohol|Cancer)' data.txt|wc -l
Addition
The OP asks in the comment:
If there are many columns and I don't know their indexes, how can I extract the column just based on its name ("topic")?
This code stores in the variable i the index of the column named "topic". Essentially, it reads the first line of data.txt into an array s, then walks through the array elements until it finds the desired word. (You have to increase i by one at the end because bash array indexes start from 0, while field numbers start from 1.)
Note: the code only works if a column named "topic" is actually present.
read -a s < <(head -n 1 data.txt)   # read the header into the array s (piping head into read would run read in a subshell and lose s)
for (( i=0; i<${#s[@]}; i++ ))
do
    if [ "${s[$i]}" == "topic" ]
    then
        break
    fi
done
i=$(( i + 1 ))
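Alternatively, a minimal awk-only sketch that locates the column by name and counts the matching lines in one pass; it assumes the file is data.txt, the fields are whitespace-separated, and every header name is a single word (the sample header above has multi-word names such as "data type", which would shift the field numbering):
awk 'NR == 1 { for (j = 1; j <= NF; j++) if ($j == "topic") col = j; next }
     $col == "Alcohol" || $col == "Cancer" { n++ }
     END { print n+0 }' data.txt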

how to sort this in bash

Hello I have a file containing these lines:
apple
12
orange
4
rice
16
How can I use bash to sort it by the numbers?
Suppose each number is the price of the item above it.
I want them formatted like this:
12 apple
4 orange
16 rice
or
apple 12
orange 4
rice 16
Thanks
A solution using paste + sort to get each product sorted by its price:
$ paste - - < file|sort -k 2nr
rice 16
apple 12
orange 4
Explanation
From paste man:
Write lines consisting of the sequentially corresponding lines from
each FILE, separated by TABs, to standard output. With no FILE, or
when FILE is -, read standard input.
paste reads the stream coming from stdin (your < file) and treats each line as belonging to the fictional file represented by -, so using - - gives us two columns.
sort uses the flag -k 2nr to sort paste's output by the second column in reverse numerical order.
you can use awk:
awk '!(NR%2){printf "%s %s\n" ,$0 ,p}{p=$0}' inputfile
(slightly adapted from this answer)
If you want to sort the output afterwards, you can use sort (quite logically):
awk '!(NR%2){printf "%s %s\n" ,$0 ,p}{p=$0}' inputfile | sort -n
this would give:
4 orange
12 apple
16 rice
Another solution using awk
$ awk '/[0-9]+/{print prev, $0; next} {prev=$0}' input
apple 12
orange 4
rice 16
while read -r line1 && read -r line2;do
printf '%s %s\n' "$line1" "$line2"
done < input_file
If you want the lines to be sorted by price, pipe the result to sort -k2n (numeric sort on the second field):
while read -r line1 && read -r line2;do
printf '%s %s\n' "$line1" "$line2"
done < input_file | sort -k2n
You can do this using paste and awk
$ paste - - <lines.txt | awk '{printf("%s %s\n",$2,$1)}'
12 apple
4 orange
16 rice
An awk-based solution that needs no external paste / sort, no regex matching, no modulo (%) arithmetic, and no awk/bash loops:
{m,g}awk '(_*=--_) ? (__ = $!_)<__ : ($++NF = __)_' FS='\n'
12 apple
4 orange
16 rice

Shell script problems

So I'm doing some work on shell script. I have this code:
Echo "5 Matt male"
Echo "8 Sarah female"
Echo "9 Paul male"
I am meant to set a threshold number of 6 so that only the lines whose numbers are above 6 are output, hence the lines containing Sarah and Paul. But I have no idea how to do this. I'm sorry, but it is also meant to print only the ones that contain "female".
Your data needs to be stored in file.txt.
file.txt:
5 Matt male
8 Sarah female
9 Paul male
awk '{ if ($1 > 6 && $3 == "female") print $0 }' file.txt
If you don't know how to use awk, take a look at this: http://cm.bell-labs.com/cm/cs/awkbook/
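If the lines really do come from echo statements rather than a file, a minimal sketch piping them straight into awk (using the same data as in the question) could be:
{
    echo "5 Matt male"
    echo "8 Sarah female"
    echo "9 Paul male"
} | awk '$1 > 6 && $3 == "female"'
which prints only the line 8 Sarah female.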

How to grep two words in a line in a file when an arbitrary number of other words appear between them

Given a file with this content:
Feb 1 ohio a1 rambo
Feb 1 ny a1 sandy
Feb 1 dc a2 rambo
Feb 2 alpht a1 jazzy
I only want the count of those lines containing Feb 1 and rambo.
You can use awk to do this more efficiently:
$ awk '/Feb 1/ && /rambo/' file
Feb 1 ohio a1 rambo
Feb 1 dc a2 rambo
To count matches:
$ awk '/Feb 1/ && /rambo/ {sum++} END{print sum}' file
2
Explanation
awk '/Feb 1/ && /rambo/' is saying: match all lines in which both Feb 1 and rambo are matched. When this evaluates to True, awk performs its default behaviour: print the line.
awk '/Feb 1/ && /rambo/ {sum++} END{print sum}' does the same, only that instead of printing the line, increments the var sum. When the file has been fully scanned, it enters in the END block, where it prints the value of the var sum.
Is Feb 1 always before rambo? If yes:
grep -c "Feb 1 .* rambo" file
Try this, as per Marc's suggestion:
grep 'Feb 1.*rambo' file |wc -l
In case the positions of the two strings are not guaranteed to be as shown in the question, the following command will be useful:
grep 'rambo' file|grep 'Feb 1'|wc -l
The output will be,
2
Here is what I tried. The awk solution is probably clearer, but this is a nice sed technique:
sed -n '/Feb 1/{/rambo/p; }' file | wc -l

Cannot get this simple sed command

This sed command is described as follows:
Delete the cars that are $10,000 or more. Pipe the output of the sort into a sed to do this, by quitting as soon as we match a regular expression representing 5 (or more) digits at the end of a record (DO NOT use repetition for this):
So far the command is:
$ grep -iv chevy cars | sort -nk 5
I think I have to add another pipe at the end of that command which "quits as soon as we match a regular expression representing 5 (or more) digits at the end of a record".
I tried things like
$ grep -iv chevy cars | sort -nk 5 | sed "/[0-9][0-9][0-9][0-9][0-9]/ q"
and other variations within the // but nothing works! What is the command which matches a regular expression representing 5 or more digits and quits according to this question?
Nominally, you should add a $ before the second / to match 5 digits at the end of the record. If you omit the $, then any sequence of 5 digits will cause sed to quit, so if there is another number (a VIN, perhaps) before the price, it might match when you didn't intend it to.
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/q'
On the whole, it's safer to use single quotes around the regex, unless you need to substitute a shell variable into it (or unless the regex contains single quotes itself). You can also specify the repetition:
grep -iv chevy cars | sort -nk 5 | sed '/[0-9]\{5,\}$/q'
The \{5,\} part matches 5 or more digits. If for any reason that doesn't work, you might find you're using GNU sed and you need to do something like sed --posix to get it working in the normal mode. Or you might be able to just remove the backslashes. There certainly are options to GNU sed to change the regex mechanism it uses (as there are with GNU grep too).
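If your sed supports extended regular expressions (GNU sed and BSD sed both accept -E, though that is an assumption about your platform), the repetition form reads a little more naturally:
grep -iv chevy cars | sort -nk 5 | sed -E '/[0-9]{5,}$/q'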
Another way.
As you didn't post a file sample, I did it as a guess.
Here I'm looking for lines with the word "chevy" where field 5 is less than 10000.
awk '/chevy/ {if ( $5 < 10000 ) print $0} ' cars
I forgot grep's -i flag ... so the correct version is:
awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
$ cat > cars
Chevy 2 3 4 10000
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 10000
CHEVY 2 3 4 2000
Prevy 2 3 4 1000
Prevy 2 3 4 10000
$ awk 'BEGIN{IGNORECASE=1} /chevy/ {if ( $5 < 10000 ) print $0} ' cars
Chevy 2 3 4 5000
chEvy 2 3 4 1000
CHEVY 2 3 4 2000
Alternatively, d deletes every matching line instead of quitting (note that q also prints the first matching line before exiting, while d suppresses all of them):
grep -iv chevy cars | sort -nk 5 | sed '/[0-9][0-9][0-9][0-9][0-9]$/d'
