I'm studying bash scripting and I'm stuck on an exercise from this site: https://ryanstutorials.net/bash-scripting-tutorial/bash-variables.php#activities
The task is to write a bash script that outputs a random word from a dictionary whose length equals the number supplied as the first command line argument.
My idea was to create a sub-dictionary, assign each word a line number, select a random number from those lines and filter the output. This worked for a similar, simpler script, but not for this one.
This is the code I used:
6 DIC='/usr/share/dict/words'
7 SUBDIC=$( egrep '^.{'$1'}$' $DIC )
8
9 MAX=$( $SUBDIC | wc -l )
10 RANDRANGE=$((1 + RANDOM % $MAX))
11
12 RWORD=$(nl "$SUBDIC" | grep "\b$RANDRANGE\b" | awk '{print $2}')
13
14 echo "Random generated word from $DIC which is $1 characters long:"
15 echo $RWORD
and this is the error I get when using "21" as input:
bash script.sh 21
script.sh: line 9: counterintelligence's: command not found
script.sh: line 10: 1 + RANDOM % 0: division by 0 (error token is "0")
nl: 'counterintelligence'\''s'$'\n''electroencephalograms'$'\n''electroencephalograph': No such file or directory
Random generated word from /usr/share/dict/words which is 21 characters long:
I tried splitting the code into smaller pieces in bash and got no error (input = 21):
egrep '^.{'21'}$' /usr/share/dict/words | wc -l
3
but once in the script, lines 9 and 10 give errors.
Where do you think the error is?
problems
SUBDIC=$( egrep '^.{'$1'}$' $DIC ) will store all words of the given length in the SUBDIC variable, so its content is now something like foo bar baz.
MAX=$( $SUBDIC | ... ) will try to run the command foo bar baz, which is obviously bogus; it should be more like MAX=$(echo $SUBDIC | ... )
MAX=$( ... | wc -l ) will count the lines; when using the above mentioned echo $SUBDIC you will have multiple words, but all in one line...
RWORD=$(nl "$SUBDIC" | ...) has the same problem as above: there's only one line (also note @armali's answer that nl requires a file or stdin)
RWORD=$(... | grep "\b$RANDRANGE\b" | ...) might match a dictionary entry like catch 22
RWORD=$(... | awk '{print $2}') likely won't handle lines containing spaces
a simple solution
Doing a "random sort" over all the possible words and taking the first line should be sufficient:
egrep "^.{$1}$" "${DIC}" | sort -R | head -1
MAX=$( $SUBDIC | wc -l ) - A pipe connects a command's output to another command, but $SUBDIC isn't a command; an appropriate syntax is MAX=$( <<<$SUBDIC wc -l ).
nl "$SUBDIC" - The argument to nl has to be a filename, which "$SUBDIC" isn't; an appropriate syntax is nl <<<"$SUBDIC".
This code will do it. My test dictionary of words is in the file file. It's a good idea to get all words of the given length first, but put them in an array, not in a plain variable. Then pick a random index and echo the word at it.
dic=( $(sed -n "/^.\{$1\}$/p" file) )
ind=$((0 + RANDOM % ${#dic[@]}))
echo ${dic[$ind]}
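One caveat with this approach (my addition, same filenames assumed): if no word has the requested length, ${#dic[@]} is 0 and the modulo divides by zero, the same error the question ran into. A small guard avoids that:
dic=( $(sed -n "/^.\{$1\}$/p" file) )
if (( ${#dic[@]} == 0 )); then          # nothing matched: bail out instead of dividing by zero
    echo "no words of length $1 found" >&2
    exit 1
fi
ind=$((RANDOM % ${#dic[@]}))
echo ${dic[$ind]}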
I am also doing this activity and I came up with one simple solution.
I created this script:
#!/bin/bash
awk "NR==$1 {print}" /usr/share/dict/words
If you want a random word, run the script from the terminal with the command below:
./script.sh $RANDOM
If you want to print the word at a specific line number, run it like this:
./script.sh 465
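One caveat worth noting here (my addition): bash's $RANDOM only yields values from 0 to 32767, while /usr/share/dict/words is typically much longer, so later words can never be selected this way. A sketch that draws uniformly from the whole file using awk's own rand():
awk 'BEGIN { srand() } { line[NR] = $0 } END { print line[int(rand() * NR) + 1] }' /usr/share/dict/words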
cat /usr/share/dict/american-english | head -n $RANDOM | tail -n 1
$RANDOM - Returns a different random number each time it is referred to.
This simple line outputs a random word from the mentioned dictionary.
Otherwise, as umläute mentioned, you can do:
cat /usr/share/dict/american-english | sort -R | head -1
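To also honor the exercise's length argument (assuming the desired length arrives as $1), the same idea works with a length filter in front:
grep -E "^.{$1}$" /usr/share/dict/american-english | sort -R | head -n 1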
My question is not easy to ask; I'll try to explain the problem with the following example:
/home/luther/tipical_surnames.txt
Smith
Johnson
Williams
Jones
Brown
#Davis
Miller
Wilson
#Moore
Taylor
Anderson
/home/luther/employers.txt
2000 Johnson A lot-of details / BJC3000,6000, i550 0
2101 Smith A lot-of details / BJC3000,6000, i550 0
2102 Smith A lot-of details / BJC3000,6000, i550 0
2103 Jones A lot-of details / BJC3000,6000, i550 0
2104 Johnson A lot-of details / BJC3000,6000, i550 0
2100 Smith A lot-of details / BJC3000,6000, i550 0
I have a list with the typical surnames and another file with the employee records.
Let's check how many people in the company have the most popular surname, using the console:
grep -v "#" /home/luther/tipical_surnames.txt | sed -n 1'p' | cut -f 1
Smith
grep Smith /home/luther/employers.txt | wc -l
230
Works perfectly.
Now let's check the first 5 most popular surnames using a simple bash script:
#!/bin/bash
counter=1
while [ $counter -le 5 ]
do
surname=`grep -v "#" /home/luther/tipical_surnames.txt | sed -n "$counter"'p' | cut -f 1`
qty=`grep "$surname" /home/luther/employers.txt | wc -l`
echo $surname
echo $qty
counter=$(( $counter + 1 ))
done
And the result is as follows:
Smith
0
Johnson
0
Williams
0
Jones
0
Brown
0
What's wrong?
Update:
As I wrote, I tested the script on another computer and everything works fine there.
Then I tried the following:
root@problematic:/var/www# cat testfile.bash
#!/bin/bash
for (( c=1; c<=5; c++ ))
{
echo $c
}
root@problematic:/var/www# bash testfile.bash
testfile.bash: line 2: syntax error near unexpected token `$'\r''
'estfile.bash: line 2: `for (( c=1; c<=5; c++ ))
root@problematic:/var/www# echo $BASH_VERSION
4.2.37(1)-release
root@problematic:/var/www#
Of course, on the other computer this simple script works as expected, without error.
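A note on the symptom (my addition): the $'\r' in the error message is the classic sign of Windows-style CRLF line endings in the files on the problematic machine. It would also explain why grep finds nothing, since each $surname would carry an invisible trailing carriage return. Assuming sed or dos2unix is available, a check and fix might look like:
cat -A testfile.bash | head -n 3    # a \r shows up as ^M before the line-end $
sed -i 's/\r$//' testfile.bash /home/luther/tipical_surnames.txt
dos2unix testfile.bash /home/luther/tipical_surnames.txt    # alternative, where installed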
This is obviously untested since you haven't posted sample input, but this is the kind of approach you should use:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (name in cnt) {
        print name, cnt[name]
        if (++c == 5) {
            break
        }
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt
Replace "WHATEVER" with the field number where employee surnames are stored in employers.txt.
The above uses GNU awk for sorted_in, with other awks I'd just remove the PROCINFO line and the count from the output loop and pipe the output to sort then head, e.g.:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5
or whatever the right sort options are.
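For the sample employers.txt shown in the question, the surname is the second whitespace-separated field, so the placeholder would be filled in like this (my concretization of the above):
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$2]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5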
I'm actually not quite sure. I tested your script, by copying it and pasting it, with imagined data (/usr/share/dict/words) and it seems to work as expected. I wonder if there is a difference between the script you posted and the script you're running?
While at it, I took the liberty of making it run a bit more smoothly. Notice how, in the loop, you read the entirety of the surnames file in each iteration? Also, grep + wc -l may be replaced by grep -c. I'm also adding -F to the first invocation of grep since the pattern (#) is a fixed string. The grep into the employee file uses \<$name\> to make sure we only get the Johns and no Johnssons when $name is John.
#!/bin/bash
employees_in="/usr/share/dict/words"
names_in="/usr/share/dict/words"
grep -v -F "#" "$names_in" | head -n 5 | cut -f 1 |
while read -r name; do
count="$( grep -c "\<$names\> " "$employees_in" )"
printf "name: %-10s\tcount: %d\n" "$name" "$count"
done
Testing it:
$ bash script.sh
name: A          count: 1
name: a          count: 1
name: aa         count: 1
name: aal        count: 1
name: aalii      count: 1
Note: I get only ones in the count because the dictionary (not surprisingly) contains only unique words.
I'm writing a script in unix for obtaining specific data. After running a program, it gives a very huge string as output, for example (this is just a random example):
In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;
and I'd like that whenever it finds the string "scorex: " it gives me the actual score: 34, 687 or 9, for this example.
Thank you
I forgot: my string is inside a variable called RESULTADO.
You can use grep:
grep -oP 'scorex:\s?\K\d*' input
or
<command> | grep -oP 'scorex:\s?\K\d*'
For your example:
$ echo "In this example, the null scorex: 34;hypothesis of "marginal homogeneity" would mean there was no effect of the treatment. From the above data, the McNemar scorex: 687;test statistic with Yates's continuity correction is scorex: 9;" | grep -oP 'scorex:\s?\K\d*'
34
687
9
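Since the question mentions the string is inside a variable called RESULTADO, the same grep can read it from a here-string:
grep -oP 'scorex:\s?\K\d*' <<< "$RESULTADO"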
This can be solved via a regex. Consider the following pattern (using a POSIX bracket expression, since grep -E does not support \d):
scorex: ([0-9]+)
Using this pattern with grep would look like this:
grep -Eo "scorex: [0-9]+" file_containing_string | cut -d' ' -f2
The output of this, one line per match, is:
34
687
9
Anyone know of a command-line CSV viewer for Linux/OS X? I'm thinking of something like less but that spaces out the columns in a more readable way. (I'd be fine with opening it with OpenOffice Calc or Excel, but that's way too overpowered for just looking at the data like I need to.) Having horizontal and vertical scrolling would be great.
You can also use this:
column -s, -t < somefile.csv | less -#2 -N -S
column is a standard unix program that is very convenient -- it finds the appropriate width of each column, and displays the text as a nicely formatted table.
Note: whenever you have empty fields, you need to put some kind of placeholder in it, otherwise the column gets merged with following columns. The following example demonstrates how to use sed to insert a placeholder:
$ cat data.csv
1,2,3,4,5
1,,,,5
$ column -s, -t < data.csv
1  2  3  4  5
1  5
$ sed 's/,,/, ,/g;s/,,/, ,/g' data.csv | column -s, -t
1  2  3  4  5
1           5
Note that the substitution of ,, for , , is done twice. If you do it only once, 1,,,4 will become 1, ,,4 since the second comma is matched already.
You can install csvtool (on Ubuntu) via
sudo apt-get install csvtool
and then run:
csvtool readable filename | view -
This will make it nice and pretty inside of a read-only vim instance, even if you have some cells with very long values.
Have a look at csvkit. It provides a set of tools that adhere to the UNIX philosophy (meaning they are small, simple, single-purposed and can be combined).
Here is an example that extracts the ten most populated cities in Germany from the free Maxmind World Cities database and displays the result in a console-readable format:
$ csvgrep -e iso-8859-1 -c 1 -m "de" worldcitiespop | csvgrep -c 5 -r "\d+" |
    csvsort -r -c 5 -l | csvcut -c 1,2,4,6 | head -n 11 | csvlook
---------------------------------------------------
| line_number | Country | AccentCity | Population |
---------------------------------------------------
| 1           | de      | Berlin     | 3398362    |
| 2           | de      | Hamburg    | 1733846    |
| 3           | de      | Munich     | 1246133    |
| 4           | de      | Cologne    | 968823     |
| 5           | de      | Frankfurt  | 648034     |
| 6           | de      | Dortmund   | 594255     |
| 7           | de      | Stuttgart  | 591688     |
| 8           | de      | Düsseldorf | 577139     |
| 9           | de      | Essen      | 576914     |
| 10          | de      | Bremen     | 546429     |
---------------------------------------------------
Csvkit is platform independent because it is written in Python.
Tabview: a lightweight Python curses command-line CSV file viewer (which also handles other tabular Python data, like a list of lists), available here on Github.
Features:
Python 2.7+, 3.x
Unicode support
Spreadsheet-like view for easily visualizing tabular data
Vim-like navigation (h,j,k,l, g (top), G (bottom), 12G goto line 12, m - mark, ' - goto mark, etc.)
Toggle persistent header row
Dynamically resize column widths and gap
Sort ascending or descending by any column. 'Natural' order sort for numeric values.
Full-text search, n and p to cycle between search results
'Enter' to view the full cell contents
Yank cell contents to clipboard
F1 or ? for keybindings
Can also be used from the Python command line to visualize any tabular data (e.g. a list of lists)
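Assuming a pip-based install (an assumption; the project can also be installed from its GitHub page), getting it running is typically:
pip install tabview
tabview data.csv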
If you're a vimmer, use the CSV plugin, which is juuust beautiful.
The nodejs package tecfu/tty-table can be globally installed to do precisely this:
apt-get install nodejs
npm i -g tty-table
cat data.csv | tty-table
It can also handle streams.
For more info, see the docs for terminal usage here.
xsv is more than a viewer. I recommend it for most CSV tasks on the command line, especially when dealing with large datasets.
I used pisswillis's answer for a long time.
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
But then I combined it with some code I found at http://chrisjean.com/2011/06/17/view-csv-data-from-the-command-line, which works better for me:
csview()
{
local file="$1"
cat "$file" | sed -e 's/,,/, ,/g' | column -s, -t | less -#5 -N -S
}
The reason it works better for me is that it handles wide columns better.
Ofri's answer gives you everything you asked for.
But.. if you don't want to remember the command you can add this to your ~/.bashrc (or equivalent):
csview()
{
local file="$1"
sed "s/,/\t/g" "$file" | less -S
}
This is exactly the same as Ofri's answer, except I have wrapped it in a shell function and am using the less -S option to stop the wrapping of lines (which makes less behave more like office/oocalc).
Open a new shell (or type source ~/.bashrc in your current shell) and run the command using:
csview <filename>
Here's a (probably too) simple option:
sed "s/,/\t/g" filename.csv | less
Yet another multi-functional CSV (and not only CSV) manipulation tool: Miller. By its own description, it is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON (GitHub repository: https://github.com/johnkerl/miller).
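As a quick taste of Miller as a viewer (assuming mlr is installed and a data.csv exists), the pretty-print output format aligns columns much like column -t:
mlr --icsv --opprint cat data.csv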
tblless in the Tabulator package wraps the unix column command, and also aligns numeric columns.
I've created tablign for these (and other) purposes. Install with
pip install tablign
and
$ cat test.csv
Header1,Header2,Header3
Pizza,Artichoke dip,Bob's Special of the Day
BLT,Ham on rye with the works,
$ tablign test.csv
Header1 , Header2                   , Header3
Pizza   , Artichoke dip             , Bob's Special of the Day
BLT     , Ham on rye with the works ,
It also works if the data is separated by something other than commas. Most importantly, it preserves the delimiters, so you can also use it to style your ASCII tables without sacrificing your [Markdown,CSV,LaTeX] syntax.
I wrote this csv_view.sh to format CSVs from the command line. It reads the entire file to figure out the optimal width of each column (it requires perl, assumes there are no commas in fields, and also uses less):
#!/bin/bash
perl -we '
    sub max( @ ) {
        my $max = shift;
        map { $max = $_ if $_ > $max } @_;
        return $max;
    }

    sub transpose( @ ) {
        my @matrix = @_;
        my $width  = scalar @{ $matrix[ 0 ] };
        my $height = scalar @matrix;
        return map { my $x = $_; [ map { $matrix[ $_ ][ $x ] } 0 .. $height - 1 ] } 0 .. $width - 1;
    }

    # Read all lines, as arrays of fields
    my @lines = map { s/\r?\n$//; [ split /,/ ] } <>;

    my $widths =
        # Build a pack expression based on column lengths
        join "",
        # For each column get the longest length plus 1
        map { "A" . ( 1 + max map { length } @$_ ) }
        # Get arrays of columns
        transpose
        @lines
        ;

    # Format all lines with pack
    map { print pack( $widths, @$_ ) . "\n" } @lines;
' $1 | less -NS
Tabview is really good. It worked with 200+ MB files that displayed nicely, files which were buggy in LibreOffice as well as in the csv plugin for gvim.
The Anaconda version is available here: https://anaconda.org/bioconda/tabview
Using TxtSushi you can do:
csvtopretty filename.csv | less -S
I wrote a script, viewtab, in Groovy for just this purpose. You invoke it like:
viewtab filename.csv
It is basically a super-lightweight spreadsheet that can be invoked from the command line, handles CSV and tab separated files, can read VERY large files that Excel and Numbers choke on, and is very fast. It's not command-line in the sense of being text-only, but it is platform independent and will probably fit the bill for many people looking for a solution to the problem of quickly inspecting many or large CSV files while working in a command line environment.
The script and how to install it are described here:
http://bayesianconspiracy.blogspot.com/2012/06/quick-csvtab-file-viewer.html
There's this short command line script in python: https://github.com/rgrp/csv2ascii/blob/master/csv2ascii.py
Just download and place in your path. Usage is like
csv2ascii.py [options] csv-file-path
Convert csv file at csv-file-path to ascii form returning the result on
stdout. If csv-file-path = '-' then read from stdin.
Options:
-h, --help show this help message and exit
-w WIDTH, --width=WIDTH
Width of ascii output
-c COLUMNS, --columns=COLUMNS
Only display this number of columns