Parsing in Linux shell script

I am trying to parse the output below:
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| 1 | m1.tiny | 512 | 1 | 0 | | 1 | 1.0 | True |
| 2 | m1.small | 2048 | 20 | 0 | | 1 | 1.0 | True |
| 214b272c-e6a4-4bb5-96a4-c74c64984e5a | MC | 2048 | 100 | 0 | | 1 | 1.0 | True |
| 3 | m1.medium | 4096 | 40 | 0 | | 2 | 1.0 | True |
| 4 | m1.large | 8192 | 80 | 0 | | 4 | 1.0 | True |
| 5 | m1.xlarge | 16384 | 160 | 0 | | 8 | 1.0 | True |
| 71aa57d1-52e3-4499-abd2-23985949aeb4 | slmc | 4096 | 32 | 0 | | 2 | 1.0 | True |
| 7cf1d926-c904-47b8-af70-499196a1f65f | new test flavor | 1 | 1 | 0 | | 1 | 1.0 | True |
| 97b3dc38-f752-437b-881d-c3415c8a682c | slstore | 10240 | 32 | 0 | | 4 | 1.0 | True |
+--------------------------------------+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
It is the list of flavors in OpenStack. I am expecting output as below:
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
What I tried:
I came up with the following command for parsing:
nova flavor-list | grep '|' | awk 'NR>1 {print $4}' | tr '\n' ';'
but the issue is that the command returns output as follows:
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new;slstore;
The problem is the space in "new test flavor": with the default whitespace field separator, awk sees "new" as the fourth field and prints only that.

The command below gives the expected output:
nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | tr '\n' ';'
but the fields keep their surrounding white space, i.e.:
$ nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | tr '\n' ';'
m1.tiny ; m1.small ; MC ; m1.medium ; m1.large ; m1.xlarge ; slmc ; new test flavor ; slstore ;
To get the output without the white space, use the following command:
$ nova flavor-list | grep '|' | awk -F "|" 'NR>1 {print $3}' | awk -F "\n" '{gsub(/^[ \t]+|[ \t]+$/, "", $1)}1' | tr '\n' ';'
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
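The grep and the separate trimming pass can also be folded into the field-splitting awk itself. A sketch against a small inline sample (the sample table is a stand-in, since real nova flavor-list output is not reproducible here):

```shell
# Inline stand-in for `nova flavor-list` output
table='+----+-----------------+
| ID | Name |
+----+-----------------+
| 1 | m1.tiny |
| 7 | new test flavor |
+----+-----------------+'

# Split on "|"; skip border lines (no "|" fields, so NF==1) and the header
# row; trim the Name field with gsub before printing it
names=$(printf '%s\n' "$table" \
  | awk -F'|' 'NF > 2 && $2 !~ /ID/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $3)
        printf "%s;", $3
    }')
echo "$names"
```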

Use the following command
sed -e '1,3d' < flavor-list | sed -e 's/-//g' | sed -e 's/+//g' | cut -f 3 -d "|" | sed -e 's/^ //g' | sed -e 's/\s\+$//g' | tr '\n' ';'

Software tools (no sed, no awk):
tail -n +4 flavor-list | grep -v '\-\-' | cut -d'|' -f3 | cut -d' ' -f2- | \
tr -s ' ' | rev | cut -d' ' -f2- | rev | tr '\n' ';' ; echo
Pure bash (uses no utils at all):
while read a b c d ; do \
d="${d/ [ |]*/}" ; \
[ -z "$d" -o "$d" = Name ] && continue ; \
echo -n "$d;" ; \
done < flavor-list ; echo
Output (either one):
m1.tiny;m1.small;MC;m1.medium;m1.large;m1.xlarge;slmc;new test flavor;slstore;
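The pure-bash loop above can be exercised against an inline sample (the two sample rows below are assumptions; the original reads the flavor-list file):

```shell
# Inline sample rows in the flavor-list table format
table='| 1 | m1.tiny | 512 |
| 7 | new test flavor | 1 |'

out=""
while read -r a b c d ; do
    d="${d/ [ |]*/}"            # cut everything from the next " |" onward
    [ -z "$d" ] && continue     # skip lines with no name field
    out="$out$d;"
done <<< "$table"
echo "$out"
```

Because `read` splits on whitespace, `d` collects the name plus the rest of the row, and the pattern replacement keeps everything up to the first space followed by a pipe, so multi-word names survive.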

Related

Using awk and/or grep to take two columns from file1, look up the column-2 value in file2, and insert column 1 of file1 before column 1 of file2

Good day.
I have two files, vmList and flavorList, the vmList containing the following:
$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at | -
cf0012vm001| accessIPv4 |
cf0012vm001| accessIPv6 |
cf0012vm001| cf0012v_internal_network network | 192.168.210.10
cf0012vm001| created | 2021-09-17T17:21:39Z
cf0012vm001| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name | -
cf0012vm002| OS-SRV-USG:terminated_at | -
cf0012vm002| accessIPv4 |
cf0012vm002| accessIPv6 |
cf0012vm002| cf0012v_internal_network network | 192.168.210.11
cf0012vm002| created | 2021-09-17T17:21:37Z
cf0012vm002| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id | 0e292900-6b50-4055-9842-d95e54fa1490
and the flavorList containing the following information:
$ cat /tmp/flavorList
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| 711f0ff2f01d403689819b6cbab36e42 | nd.c4r8d21s8e21 | 8192 | 21 | 21 | 8192 | 4 | | N/A |
| 78a70b62efae4fbcb35994aeb0f87678 | nd.c8r16d31s8e31 | 16384 | 31 | 31 | 8192 | 8 | | N/A |
| 78f4fe71cc3340a59c62fc0b32d81e3f | nd.c4r16d100 | 16384 | 100 | 0 | | 4 | | N/A |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce | nd.c2r8d40 | 8192 | 40 | 0 | | 2 | | N/A |
| 832169fed2244bb6b1739ab3db0f232e | nd.c1r4d100 | 4096 | 100 | 0 | | 1 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 8e968623e5c44674b33e1cc1f892e32d | nd.c9r40d50 | 40960 | 50 | 0 | | 9 | | N/A |
| 8e96a7044566406f9ef7eba48c2a8c55 | nd.c5r4d81 | 4096 | 81 | 0 | | 5 | | N/A |
| 8fd07e2004f84658a76af1cd8b9cea43 | nd.c2r8d50 | 8192 | 50 | 0 | | 2 | | N/A |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:
$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
or even:
cf0012vm001
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
Please advise.
Bjoern
Assumptions:
a flavor does not contain spaces
a specific ordering of the output has not been stated
vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]
One GNU awk idea that uses an array of arrays:
awk -F'|' ' # input field delimiter = "|" for both files
FNR==NR { # for 1st file ...
name=gensub(/ /,"","g",$2) # remove all spaces from field #2 and save in awk variable "name"
if (name == "flavor") { # if field #2 == "flavor" ...
split($3,arr,"(") # split field #3 using "(" as delimiter, storing results in array arr[]
gsub(" ","",arr[1]) # remove all spaces from first array entry
flavors[arr[1]] # keep track of unique flavors
col1[arr[1]][$1] # keep track of associated values from column/field #1
}
next
}
FNR>3 { # for 2nd file, after ignoring first 3 lines ...
if (NF == 1) next # skip line if it only has 1 "|" delimited field
name=gensub(/ /,"","g",$3) # remove all spaces from field #3 and save in awk variable "name"
if (name in flavors) # if name is in our list of flavors ...
for (i in col1[name]) # loop through list of columns (from 1st file)
print i,$0 # print column (from 1st file) plus current line
}
' vmList flavorList
This generates:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
NOTE: while this output appears to be sorted by the first column, that is merely a coincidence; if a specific order needs to be guaranteed, it can likely be done by adding an appropriate PROCINFO["sorted_in"] entry. The OP just needs to state the desired ordering.
Would you please try the following:
echo "VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |"
echo "------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+"
awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]].+/, "", $3); a[$1]=$3; next}
{
for (i in a) {
if (a[i] == $3) print i " " $0
}
}
' /tmp/vmList /tmp/flavorList | sort -k1.9,1.11n
Output:
VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
The field separator [[:blank:]]*\\|[[:blank:]]* splits the record
on the pipe character with preceding / following blank characters if any.
The condition NR==FNR && $2=="flavor" matches the flavor line
in vmList.
The statement sub(/[[:blank:]].+/, "", $3) extracts the nd.xxx
field by removing the substring after the blank character.
a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
The final for (i in a) loop prints the matched lines in flavorList, prepending the cfxxx field.
sort -k1.9,1.11n sorts the output by the substring from the 1st field 9th character to the 1st field 11th character. The trailing n option specifies the numerical sort.
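The character-position key syntax can be tried on a tiny sample; the file /tmp/vms.txt and its contents below are made up for the demonstration:

```shell
# Unsorted VM names; the numeric suffix starts at character 9 ("cf0012vm" is 8 chars)
printf 'cf0012vm010\ncf0012vm002\ncf0012vm001\n' > /tmp/vms.txt

# -k1.9,1.11n: the sort key runs from character 9 through character 11 of
# field 1, compared numerically (the trailing n)
sorted=$(sort -k1.9,1.11n /tmp/vms.txt)
echo "$sorted"
```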

Print footer with column numbers

I found this command {print NF} to show total number of columns:
$ nova list | awk '{print NF}' | sort -n | uniq
1
9
10
But I wish to print for every column their number.
See example with field separator |:
$ nova list | head
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Let's first understand what awk is doing here:
nova list | awk '{print NF}' | sort -n | uniq
In awk '{print NF}', NF is the number of fields, and the field separator defaults to whitespace. So for the header line below NF is 9 (the pipe '|' symbols count as fields too).
| ID | Name | Status | Networks |
and same goes for below data line
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
Since your output also contains 1 and 10, some lines in the nova list output must have a single field or 10 fields.
Now, coming to your problem: you wish to print each field number together with its field value.
nova list | awk '{for(I=1;I<=NF;I++){printf I"-"$I" "}printf "\n"}'
This prints each field number next to the data itself rather than as a footer at the end of the output.
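The loop can be tried on a single sample line (the line below is an assumption standing in for nova list output):

```shell
# One sample header line standing in for `nova list` output
line='| ID | Name | Status |'

# For each whitespace-separated field, print its number and value inline
numbered=$(printf '%s\n' "$line" \
  | awk '{for (i = 1; i <= NF; i++) printf "%d-%s ", i, $i; printf "\n"}')
echo "$numbered"
```

Note that the '|' separators count as fields of their own, which is why the numbers run past the four visible columns.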
Perl to the rescue:
nova list | \
perl -ne 'print;
$s = $_ if /\|/;
}{
$s =~ s/[^|]/ /g;
$s =~ s/\|/++$i/ge;
print " $s\n"
'
-n reads the input line by line
each line is printed and remembered in $s if it contains | (to skip the final border)
when the input ends }{, everything that's not a | is replaced by a space
all | are replaced by numbers
the result is printed
More tweaking needed if the number of columns > 10 (numbers get wider than 1 char):
$s =~ s/(??{" {".((length(0+$i))-1)."}"})\|/++$i/ge;
nova list | {
read line; echo "$line" # read and print the first line
read header; echo "$header" # read, remember and print the 2nd line
cat # all the rest of the nova list output
# then, use the header, and transform the words into numbers
echo "$header" | perl -pe 's/(\w+)/ sprintf "%-*d", length($1), ++$n /ge';
}
output
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |
Actually, I can put it all in a quick Perl script:
echo "$nova_list" | perl -ne '
$header = $_ if $. == 2;
print;
} {
$header =~ s/(\w+)/ sprintf "%-*d", length($1), ++$n /ge;
print $header;
'
You can get the last line of the file and "clean" it so that it becomes the footer with the field numbers:
nova list | awk -F"|" '1;
END {gsub (/[^|]/," ")
for(i=1;i<=NF;i++)
sub(/\| /, " " i)
gsub(/\|/," ")
print
}'
This:
replaces everything but | with a space.
replaces all strings "| " with an autoincremented number.
replaces the trailing |.
prints the result.
Test
$ cat a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
See output:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, " " i); gsub(/\|/," "); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Or keeping the field separators:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, "| " i); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |

How to join two files with awk/sed/grep/bash similar to SQL JOIN

How can I join two files with awk/sed/grep/bash, similar to an SQL JOIN?
I have two files; here is a text version of one of them:
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
| 21548598 | DSND001906102.2 | 0107 | 001906102 | 02 | FROZEN / O.S.T. | FROZEN / O.S.T. | 001 | 024 | | | 11.49 | 13.95 | 050087295745 | 11/25/2013 | | | N | N | 30 | | 1 | E | 1 | 10/07/2013 | 02/27/2014 | 10/07/2013 | 10/07/2013 |
| 25584998 | WD1194190DVD | 0819 | 1194190 | 18 | FROZEN / (WS DOL DTS) | FROZEN / (WS DOL DTS) | 050 | 110 | | G | 21.25 | 29.99 | 786936838961 | 03/18/2014 | | | N | N | 0 | | 1 | A | 2 | 12/20/2013 | 03/13/2014 | 12/20/2013 | 12/20/2013 |
| 25812794 | WHV1000292717BR | 0526 | 1000292717 | BR | GRAVITY / (UVDC) | GRAVITY / (UVDC) | 050 | 093 | | PG13 | 29.49 | 35.99 | 883929244577 | 02/25/2014 | | | N | N | 30 | | 1 | E | 3 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 24475594 | SNY303251.2 | 0085 | 303251 | 02 | BEYONCE | BEYONCE | 001 | 004 | | | 14.99 | 17.97 | 888430325128 | 12/20/2013 | | | N | N | 30 | | 1 | A | 4 | 12/19/2013 | 01/02/2014 | 12/19/2013 | 12/19/2013 |
| 25812787 | WHV1000284958DVD | 0526 | 1000284958 | 18 | GRAVITY (2PC) / (UVDC SPEC 2PK) | GRAVITY (2PC) / (UVDC SPEC 2PK) | 050 | 093 | | PG13 | 21.25 | 28.98 | 883929242528 | 02/25/2014 | | | N | N | 30 | | 1 | E | 5 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 21425462 | PBSDMST64400DVD | E349 | 64400 | 18 | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | 050 | 095 | 094 | | 30.49 | 49.99 | 841887019705 | 01/28/2014 | | | N | N | 30 | | 1 | A | 6 | 09/06/2013 | 01/15/2014 | 09/06/2013 | 09/06/2013 |
| 25584974 | WD1194170BR | 0819 | 1194170 | BR | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | 050 | 110 | | G | 27.75 | 39.99 | 786936838923 | 03/18/2014 | | | N | N | 0 | | 2 | A | 7 | 12/20/2013 | 03/13/2014 | 01/15/2014 | 01/15/2014 |
| 21388262 | HBO1000394029DVD | 0203 | 1000394029 | 18 | GAME OF THRONES: SEASON 3 | GAME OF THRONES: SEASON 3 | 050 | 095 | 093 | | 47.99 | 59.98 | 883929330713 | 02/18/2014 | | | N | N | 30 | | 1 | E | 8 | 08/29/2013 | 02/28/2014 | 08/29/2013 | 08/29/2013 |
| 25688450 | WD11955700DVD | 0819 | 11955700 | 18 | THOR: THE DARK WORLD / (AC3 DOL) | THOR: THE DARK WORLD / (AC3 DOL) | 050 | 093 | | PG13 | 21.25 | 29.99 | 786936839500 | 02/25/2014 | | | N | N | 30 | | 1 | A | 9 | 12/24/2013 | 02/20/2014 | 12/24/2013 | 12/24/2013 |
| 23061316 | PRT359054DVD | 0818 | 359054 | 18 | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | 050 | 110 | | R | 21.75 | 29.98 | 097363590545 | 01/28/2014 | | | N | N | 30 | | 1 | E | 10 | 12/06/2013 | 03/12/2014 | 12/06/2013 | 12/06/2013 |
| 21548611 | DSND001942202.2 | 0107 | 001942202 | 02 | FROZEN / O.S.T. (BONUS CD) (DLX) | FROZEN / O.S.T. (BONUS CD) (DLX) | 001 | 024 | | | 14.09 | 19.99 | 050087299439 | 11/25/2013 | | | N | N | 30 | | 1 | E | 11 | 10/07/2013 | 02/06/2014 | 10/07/2013 | 10/07/2013 |
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
The 2nd column from the first file can be joined to the 14th column of the second file!
Here's what I've been trying to do:
join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
but i am getting these errors:
$ join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
sort: unknown option -- F
Try sort --help' for more information.
sort: unknown option -- F
Try sort --help' for more information.
-700476409 [waitproc] -bash 10336 sig_send: error sending signal 20 to pid 10336, pipe handle 0x84, Win32 error 109
The output I would like would be something like this:
+-------+-------+---------------+
| 12.99 | 14.77 | 3383510002151 |
| 13.97 | 17.96 | 3383510002175 |
| 13.2 | 13 | 3383510002267 |
| 13.74 | 14.19 | 3399240165349 |
| 9.43 | 9.52 | 3399240165363 |
| 12.99 | 4.97 | 3399240165479 |
| 7.16 | 7.48 | 3399240165677 |
| 11.24 | 9.43 | 4011550620286 |
| 13.86 | 13.43 | 4260182980316 |
| 13.98 | 12.99 | 4260182980507 |
| 10.97 | 13.97 | 4260182980514 |
| 11.96 | 13.2 | 4260182980545 |
| 15.88 | 13.74 | 4260182980552 |
+-------+-------+---------------+
What am I doing wrong?
You can do all the work in join and sort
join -1 2 -2 14 -t $'\t' -o 2.12,1.1,0 \
<( sort -t $'\t' -k 2,2 output1.csv ) \
<( sort -t $'\t' -k 14,14 aecprda12.tab )
Notes:
$'\t' is a bash ANSI-C quoted string which is a tab character: neither join nor sort seem to recognize the 2-character string "\t" as a tab
-k col,col sorts the file on the specified column
join has several options to control how it works; see the join(1) man page.
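A self-contained sketch of the same join/sort/process-substitution pattern on two made-up tab-separated files (/tmp/left.tab and /tmp/right.tab are hypothetical names):

```shell
# key<TAB>value files to join on column 1
printf 'k1\tapple\nk2\tbanana\n' > /tmp/left.tab
printf 'k1\t1.99\nk3\t5.00\n'    > /tmp/right.tab

# -t $'\t' sets the tab delimiter; -o picks the output fields (file.field,
# 0 = the join key); each input is sorted on its join field via process
# substitution, since join requires sorted input
joined=$(join -t $'\t' -o 1.2,2.2,0 \
    <(sort -t $'\t' -k 1,1 /tmp/left.tab) \
    <(sort -t $'\t' -k 1,1 /tmp/right.tab))
echo "$joined"
```

Only k1 appears in both files, so only its row is emitted.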
sort awk -F...
is not a valid command; it tells sort to sort a file named awk, and, as the error message says, sort has no -F option. The syntax you are looking for is
awk -F ... | sort
However, you might be better off doing the joining in Awk directly.
awk -F"\t" 'NR==FNR{k[$14]=$12; next}
k[$2] { print $2, $1, k[$2] }' aecprda12.tab output1.csv
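The NR==FNR two-pass pattern can be sketched on two tiny made-up tab-separated files:

```shell
# File A: value<TAB>key; file B: key<TAB>price (both made up)
printf 'apple\tk1\nbanana\tk2\n' > /tmp/a.tab
printf 'k1\t1.99\nk3\t5.00\n'    > /tmp/b.tab

# First pass (NR==FNR, true only while reading file A) fills the lookup
# array; second pass prints only keys of file B that were seen in file A
joined=$(awk -F'\t' 'NR==FNR {name[$2] = $1; next}
                     $1 in name {print $1, name[$1], $2}' /tmp/a.tab /tmp/b.tab)
echo "$joined"
```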
I am assuming that you don't know whether every item in the first file has a corresponding item in the second file, and that you want only "matching" items. There is indeed a good way to do this in awk. Create the following script as a text file (call it myJoin.txt):
BEGIN {
FS="\t"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
a[$2]=$1 # create one array element for each $1/$2 pair
next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
# see if the associative array element exists:
gsub(/ /,"",$14) # remove all spaces from $14
if (a[$14]) { # see if the value in $14 was seen in the first file
# print out the three values you care about:
print $12 " " a[$14] " " $14
}
}
Now execute this with
awk -f myJoin.txt file1 file2
Seems to work for me...

Compare two files line by line and find the largest and smallest number using shell scripting

I have two files which both have one number per line and need to compare both files to find the largest and smallest numbers.
eg:-
file1
2
34
5
file2
44
5
66
4
I need to get 66 as the largest number and 2 as the smallest number.
If anyone can guide me on the commands I need to focus on, that would be a great help, as I have just started to learn shell scripting.
You can use:
sort -n file1 file2 > _sorted.tmp
min=$(head -1 _sorted.tmp)
max=$(tail -1 _sorted.tmp)
Without temporary file:
arr=( $(sort -n file1 file2) )
min=${arr[0]}
max=${arr[@]: -1}
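As a runnable check (bash arrays are 0-indexed, so the minimum is ${arr[0]}, and ${arr[@]: -1} expands to the last element), using the question's sample numbers:

```shell
# Sample numbers from the question
printf '2\n34\n5\n'     > /tmp/file1
printf '44\n5\n66\n4\n' > /tmp/file2

arr=( $(sort -n /tmp/file1 /tmp/file2) )  # numerically sorted values
min=${arr[0]}                             # first (smallest) element
max=${arr[@]: -1}                         # last (largest) element
echo "min=$min max=$max"
```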
Chain reaction:
sort --numeric --unique nu1 nu2 | sed '/^$/d' | sed -n '1p;$p'
Reading the pipeline from left to right:
sort sorts numerically (--numeric) and drops duplicate lines (--unique)
nu1 and nu2 are the two input files
sed '/^$/d' removes empty lines
sed -n '1p;$p' suppresses default printing (-n) and prints only the first line (1p) and the last line ($p)
The sed '/^$/d' part can be removed if you are 100% sure there are no empty lines in either file. In that case --unique can also be removed from sort.
In other words, if those two criteria are met, this also works:
sort --numeric nu1 nu2 | sed -n '1p;$p'
As in short version:
sort -n nu1 nu2 | sed -n '1p;$p'
It can be a one-liner too, if you do not want to store the values:
sort -n file1 file2 | head -1
sort -n file1 file2 | tail -1
Using awk:
$ head f1 f2
==> f1 <==
10
32
14
==> f2 <==
9
42
4
$ awk 'NR==1{min=$1;max=$1}
{max=(max>$1)?max:$1;min=(min<$1)?min:$1}
END{print "max is: "max; print "min is: "min}' f1 f2
max is: 42
min is: 4

Performing file type counting in all directories

I have a bash script that counts files, recursively in all directories, that were edited in the last 45 days:
find . -type f -mtime -45 | rev | cut -d . -f1 | rev | sort | uniq -ic | sort -rn
I have a directory called
\parent
and in parent I have:
\parent\a
\parent\b
\parent\c
I would run the above script once on folder a, once on b and once on c.
The current output is:
91 xls
85 xlsx
49 doc
46 db
31 docx
24 jpg
22 pub
10 pdf
4 msg
2 xml
2 txt
1 zip
1 thmx
1 htm
1 /ic
I would like to run the script from \parent on all the folders inside \parent and get an output like this:
+-------+------+--------+
| count | ext | folder |
+-------+------+--------+
| 91 | xls | a |
| 85 | xlsx | a |
| 49 | doc | a |
| 46 | db | a |
| 31 | docx | a |
| 24 | jpg | a |
| 22 | pub | a |
| 10 | pdf | a |
| 4 | msg | a |
| 98 | jpg | b |
| 92 | pub | b |
| 62 | pdf | b |
| 2 | xml | b |
| 2 | txt | b |
| 1 | zip | b |
| 1 | thmx | b |
| 1 | htm | b |
| 1 | /ic | b |
| 66 | txt | c |
| 48 | msg | c |
| 44 | xml | c |
| 30 | zip | c |
| 12 | doc | c |
| 6 | db | c |
| 6 | docx | c |
| 3 | jpg | c |
+-------+------+--------+
How can I accomplish this with bash?
Put the following into a script, make it executable with chmod +x script.sh, and run it with ./script.sh:
#!/bin/sh
find . -type f -mtime -45 2>/dev/null \
| sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d' \
| sort | uniq -ic \
| sort -b -k2,2 -k1,1rn \
| awk '
BEGIN{
sep = "+-------+------+--------+"
print sep "\n| count | ext | folder |\n" sep
}
{ printf("| %5d | %-4s | %-6s |\n", $1, $3, $2) }
END{ print sep }'
sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d'
s|^\./\([^/]*\)/|\1/| substitutes ./a/file.xls with a/file.xls.
s|/.*/|/| substitutes b/some/dir/file.mp3 with b/file.mp3.
s|/.*.\.| |p substitutes a/file.xls with a xls; the p flag prints the line when the substitution succeeds (files without an extension are thus skipped).
d deletes the line (to avoid printing matching (again) or non-matching lines).
sort | uniq -ic counts each group of extension and directory name.
sort -b -k2,2 -k1,1rn sorts first by directory (field 2), small -> large, and then by count (field 1) in reverse order (large -> small) and numerically. -b makes sort(1) ignore blanks (spaces/tabs).
the last awk part pretty prints the output, maybe you want to put this into a separate script.
If you want to see how each pipe filters the results just try to remove each and you will see the output.
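The sed/sort/uniq stage can be tried on a throwaway sample tree (the directory and file names are made up):

```shell
# Build a tiny sample tree in a temporary directory
dir=$(mktemp -d)
mkdir -p "$dir/a" "$dir/b"
touch "$dir/a/one.xls" "$dir/a/two.xls" "$dir/b/note.txt"

# Reduce each path to "<top-level-dir> <extension>", then count the pairs
counts=$(cd "$dir" && find . -type f \
  | sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d' \
  | sort | uniq -c | sort -rn)
echo "$counts"
```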
Here you can find good tutorials about sh/awk/sed, etc.
http://www.grymoire.com/Unix/
