I wonder if you can help... I'm currently implementing container scanning on one of my images in GitLab and want to use grep to search for any CRITICAL vulnerabilities.
So far I have the code below, but the problem is that the report mentions CRITICAL in the summary line followed by the number of vulnerabilities found, whereas I want to ignore that and only match CRITICAL where it appears under the SEVERITY column.
Ideally I'd want the grep to succeed only if CRITICAL > 0 in the Total row, but I'm not sure how to do this with grep, so any help is appreciated!
Code:
if cat REPORT.txt | grep -e 'CRITICAL'; then
    echo 'Critical vulnerability found -- fail build'
    currentBuild.result = 'FAILURE'
else
    echo 'All Good'
fi
Report example:
Total: 2 (UNKNOWN: 0, LOW: 1, MEDIUM: 1, HIGH: 0, CRITICAL: 0)
+------------------------------+---------------------+----------+--------------------------+--------------------------+--------------------------------------------------------------+
| LIBRARY | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION | TITLE |
+------------------------------+---------------------+----------+--------------------------+--------------------------+--------------------------------------------------------------+
| apt | CVE-2020-3810 | MEDIUM | 1.4.9 | 1.4.10 | Missing input validation in |
| | | | | | the ar/tar implementations of |
| | | | | | APT before version 2.1.2... |
+ +---------------------+----------+ +--------------------------+--------------------------------------------------------------+
| | CVE-2011-3374 | LOW | | | It was found that apt-key |
| | | | | | in apt, all versions, do not |
| | | | | | correctly... |
+------------------------------+---------------------+ +--------------------------+--------------------------+--------------------------------------------------------------+
To return only lines where CRITICAL appears in the column under SEVERITY, try:
awk -F'|' '$4 ~ /CRITICAL/' reports.txt
awk reads its input one line at a time and breaks each line into fields. -F'|' tells awk to use | as the field separator. Because each line begins with a |, the first field is empty, so the SEVERITY column is the fourth field, and $4 ~ /CRITICAL/ tests whether that field contains CRITICAL.
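To see how the fields line up, here is a quick standalone check (the sample line is made up):

```shell
# Because the line starts with "|", $1 is empty and SEVERITY lands in $4.
echo '| apt | CVE-2020-3810 | MEDIUM | 1.4.9 |' \
  | awk -F'|' '{printf "f2=[%s] f3=[%s] f4=[%s]\n", $2, $3, $4}'
# f2=[ apt ] f3=[ CVE-2020-3810 ] f4=[ MEDIUM ]
```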
Example
Consider this input file (which has one CRITICAL that we want and several that we want to ignore):
$ cat reports.txt
+------------------------------+---------------------+----------+--------------------------+--------------------------+--------------------------------------------------------------+
| LIBRARY | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION | TITLE |
+------------------------------+---------------------+----------+--------------------------+--------------------------+--------------------------------------------------------------+
| apt-CRITICAL | CVE-2020-3810 | MEDIUM | 1.4.9 | 1.4.10 | Missing input validation in |
| | | | CRITICAL | | the ar/tar implementations of-CRITICAL |
| | | | | | APT before version 2.1.2... |
+ +---------------------+----------+ +--------------------------+--------------------------------------------------------------+
| | CVE-2011-3374 | CRITICAL | | | It was found that apt-key |
| | | | | | in apt, all versions, do not |
| | | | | | correctly... |
+------------------------------+---------------------+----------+--------------------------+--------------------------+--------------------------------------------------------------+
Our command correctly returns only the line with the CRITICAL severity:
$ awk -F'|' '$4 ~ /CRITICAL/' reports.txt
| | CVE-2011-3374 | CRITICAL | | | It was found that apt-key
Use in an if-statement
We can use awk to set the correct exit-code so that it works correctly in an if statement while producing no extraneous output:
if awk -F'|' -v c=1 '$4 ~ /CRITICAL/{c=0; exit} END{exit c}' reports.txt; then
    echo 'Critical vulnerability found -- fail build'
else
    echo 'All Good'
fi
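Alternatively, if you'd rather test the summary Total: line directly, as the question suggested, one grep sketch (assuming the "Total: ... CRITICAL: N" format shown above) is:

```shell
# Succeeds only when the CRITICAL count in the summary line is non-zero.
if grep -qE 'CRITICAL: [1-9][0-9]*' REPORT.txt; then
    echo 'Critical vulnerability found -- fail build'
else
    echo 'All Good'
fi
```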
Related
Good day.
I have two files, vmList and flavorList. The vmList contains the following:
$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at | -
cf0012vm001| accessIPv4 |
cf0012vm001| accessIPv6 |
cf0012vm001| cf0012v_internal_network network | 192.168.210.10
cf0012vm001| created | 2021-09-17T17:21:39Z
cf0012vm001| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name | -
cf0012vm002| OS-SRV-USG:terminated_at | -
cf0012vm002| accessIPv4 |
cf0012vm002| accessIPv6 |
cf0012vm002| cf0012v_internal_network network | 192.168.210.11
cf0012vm002| created | 2021-09-17T17:21:37Z
cf0012vm002| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id | 0e292900-6b50-4055-9842-d95e54fa1490
and the flavorList contains the following information:
$ cat /tmp/flavorList
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| 711f0ff2f01d403689819b6cbab36e42 | nd.c4r8d21s8e21 | 8192 | 21 | 21 | 8192 | 4 | | N/A |
| 78a70b62efae4fbcb35994aeb0f87678 | nd.c8r16d31s8e31 | 16384 | 31 | 31 | 8192 | 8 | | N/A |
| 78f4fe71cc3340a59c62fc0b32d81e3f | nd.c4r16d100 | 16384 | 100 | 0 | | 4 | | N/A |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce | nd.c2r8d40 | 8192 | 40 | 0 | | 2 | | N/A |
| 832169fed2244bb6b1739ab3db0f232e | nd.c1r4d100 | 4096 | 100 | 0 | | 1 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 8e968623e5c44674b33e1cc1f892e32d | nd.c9r40d50 | 40960 | 50 | 0 | | 9 | | N/A |
| 8e96a7044566406f9ef7eba48c2a8c55 | nd.c5r4d81 | 4096 | 81 | 0 | | 5 | | N/A |
| 8fd07e2004f84658a76af1cd8b9cea43 | nd.c2r8d50 | 8192 | 50 | 0 | | 2 | | N/A |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:
$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
or even:
cf0012vm001
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
Please advise.
Bjoern
Assumptions:
a flavor does not contain spaces
a specific ordering of the output has not been stated
vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]
One GNU awk idea that uses an array of arrays:
awk -F'|' ' # input field delimiter = "|" for both files
FNR==NR { # for 1st file ...
name=gensub(/ /,"","g",$2) # remove all spaces from field #2 and save in awk variable "name"
if (name == "flavor") { # if field #2 == "flavor" ...
split($3,arr,"(") # split field #3 using "(" as delimiter, storing results in array arr[]
gsub(" ","",arr[1]) # remove all spaces from first array entry
flavors[arr[1]] # keep track of unique flavors
col1[arr[1]][$1] # keep track of associated values from column/field #1
}
next
}
FNR>3 { # for 2nd file, after ignoring first 3 lines ...
if (NF == 1) next # skip line if it only has 1 "|" delimited field
name=gensub(/ /,"","g",$3) # remove all spaces from field #3 and save in awk variable "name"
if (name in flavors) # if name is in our list of flavors ...
for (i in col1[name]) # loop through list of columns (from 1st file)
print i,$0 # print column (from 1st file) plus current line
}
' vmList flavorList
This generates:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
NOTE: while this output appears to be sorted by the first column, that is merely a coincidence. If a specific order needs to be guaranteed, it can likely be done by adding an appropriate PROCINFO["sorted_in"] entry; the OP just needs to state the desired ordering.
Would you please try the following:
echo "VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |"
echo "------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+"
awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]].+/, "", $3); a[$1]=$3; next}
{
for (i in a) {
if (a[i] == $3) print i " " $0
}
}
' /tmp/vmList /tmp/flavorList | sort -k1.9,1.11n
Output:
VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
The field separator [[:blank:]]*\\|[[:blank:]]* splits the record
on the pipe character with preceding / following blank characters if any.
The condition NR==FNR && $2=="flavor" matches the flavor line
in vmList.
The statement sub(/[[:blank:]].+/, "", $3) extracts the nd.xxx
field by removing everything from the first blank character onward.
a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
The final for (i in a) loop prints the matched lines in flavorList, prepending the cfxxx field.
sort -k1.9,1.11n sorts the output by the substring from the 1st field 9th character to the 1st field 11th character. The trailing n option specifies the numerical sort.
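As a standalone illustration of the character-position sort key (sample lines made up):

```shell
# -k1.9,1.11n: the key is characters 9 through 11 of field 1, compared numerically.
printf 'cf0012vm002 x\ncf0012vm001 y\n' | sort -k1.9,1.11n
# cf0012vm001 y
# cf0012vm002 x
```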
I am checking the stats of the background writer process with the command below:
select * from pg_stat_bgwriter ;
But after resetting the stats with:
select pg_stat_reset() ;
I expected the stats_reset column to show the time at which the stats were reset, but it still shows a very old time. Any idea or guidance on this?
Example output:
 checkpoints_timed     | 2525
 checkpoints_req       | 9
 checkpoint_write_time | 193751796
 checkpoint_sync_time  | 322501
 buffers_checkpoint    | 3162662
 buffers_clean         | 30839
 maxwritten_clean      | 176
 buffers_backend       | 451310
 buffers_backend_fsync | 0
 buffers_alloc         | 4120735
 stats_reset           | 2016-09-27 08:32:43.638545-05
Thanks
I found the answer in the documentation: Doc- link
The bgwriter statistics are shared among all databases, so they have to be reset with a different function:
pg_stat_reset_shared('bgwriter') ;
I found this command, {print NF}, to show the total number of columns:
$ nova list | awk '{print NF}' | sort -n | uniq
1
9
10
But I wish to print each column's number beneath it.
See this example with field separator |:
$ nova list | head
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Let's first understand what awk is doing here:
nova list | awk '{print NF}' | sort -n | uniq
In awk '{print NF}', NF is the number of fields, where the field separator defaults to whitespace. So for the line below, NF=9 (count the pipe '|' symbols too).
| ID | Name | Status | Networks |
The same goes for this data line:
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
Since your output also contains 1 and 10, some lines in the nova list output must have a single field or 10 fields.
Now, coming to your problem: you wish to print each field together with its field number.
nova list | awk '{for(I=1;I<=NF;I++){printf I"-"$I" "}printf "\n"}'
This prints the field number not as a footer at the end of the output but together with each field's data.
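For instance, on the header line alone (a standalone sketch):

```shell
# Each whitespace-separated field is printed with its number in front.
echo '| ID | Name |' \
  | awk '{for (i = 1; i <= NF; i++) printf "%d-%s ", i, $i; printf "\n"}'
# 1-| 2-ID 3-| 4-Name 5-|
```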
Perl to the rescue:
nova list | \
perl -ne 'print;
$s = $_ if /\|/;
}{
$s =~ s/[^|]/ /g;
$s =~ s/\|/++$i/ge;
print " $s\n"
'
-n reads the input line by line
each line is printed and remembered in $s if it contains | (to skip the final border)
when the input ends }{, everything that's not a | is replaced by a space
all | are replaced by numbers
the result is printed
More tweaking needed if the number of columns > 10 (numbers get wider than 1 char):
$s =~ s/(??{" {".((length(0+$i))-1)."}"})\|/++$i/ge;
nova list | {
read line; echo "$line" # read and print the first line
read header; echo "$header" # read, remember and print the 2nd line
cat # all the rest of the nova list output
# then, use the header, and transform the words into numbers
echo "$header" | perl -pe 's/(\w+)/ sprintf "%-*d", length($1), ++$n /ge';
}
output
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |
Actually, I can put all that in a quick perl script:
echo "$nova_list" | perl -ne '
$header = $_ if $. == 2;
print;
} {
$header =~ s/(\w+)/ sprintf "%-*d", length($1), ++$n /ge;
print $header;
'
You can get the last line of the file and "clean" it so that it becomes the footer with the field numbers:
nova list | awk -F"|" '1;
END {gsub (/[^|]/," ")
for(i=1;i<=NF;i++)
sub(/\| /, " " i)
gsub(/\|/," ")
print
}'
This:
replaces everything but | with a space.
replaces all strings "| " with an autoincremented number.
replaces the trailing |.
prints the result.
Test
$ cat a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
See output:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, " " i); gsub(/\|/," "); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
1 2 3 4
Or keeping the field separators:
$ awk -F"|" '1; END {line=$0; fields=NF; gsub (/[^\|]/," "); for(i=1;i<=fields;i++) sub(/\| /, "| " i); print}' a
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+-----------------------------------------+--------+----------------------------------------------+
| 45bd0bc3-96b4-4193-ae76-59115b4ee528 | rert | ACTIVE | netblock5=192.168.0.10 |
| 6682aa37-b766-437e-9b16-ce1076ce2410 | test5 | ACTIVE | netblock5=192.168.0.110 |
| 6f08fcf3-ea71-4f33-a01a-9b0712385511 | test2 | ACTIVE | netblock5=192.168.0.111 |
| 8f628408-1ace-4792-85b6-e134fe1f07cb | test55 | ACTIVE | netblock5=192.168.0.52, 192.168.222.46 |
| 458aa8cb-42c2-4aa6-ab30-c6858bcd85f3 | derggdre | ACTIVE | netblock5=192.168.0.63, 192.168.222.49 |
| 67f4bd0c-0e4d-4ba1-8765-dc7d7831c8f8 | dgrfdrf | ACTIVE | netblock5=192.168.1.86 |
| 846ffa7d-76a4-4c70-8d82-23b5a205ad77 | ttttt | ACTIVE | netblock5=192.168.1.27 |
| 1 | 2 | 3 | 4 |
how to join to files with awk/sed/grep/bash similar to SQL JOIN?
I have two files (both were shown as images in the original post). Here's a text version of the second one:
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
| 21548598 | DSND001906102.2 | 0107 | 001906102 | 02 | FROZEN / O.S.T. | FROZEN / O.S.T. | 001 | 024 | | | 11.49 | 13.95 | 050087295745 | 11/25/2013 | | | N | N | 30 | | 1 | E | 1 | 10/07/2013 | 02/27/2014 | 10/07/2013 | 10/07/2013 |
| 25584998 | WD1194190DVD | 0819 | 1194190 | 18 | FROZEN / (WS DOL DTS) | FROZEN / (WS DOL DTS) | 050 | 110 | | G | 21.25 | 29.99 | 786936838961 | 03/18/2014 | | | N | N | 0 | | 1 | A | 2 | 12/20/2013 | 03/13/2014 | 12/20/2013 | 12/20/2013 |
| 25812794 | WHV1000292717BR | 0526 | 1000292717 | BR | GRAVITY / (UVDC) | GRAVITY / (UVDC) | 050 | 093 | | PG13 | 29.49 | 35.99 | 883929244577 | 02/25/2014 | | | N | N | 30 | | 1 | E | 3 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 24475594 | SNY303251.2 | 0085 | 303251 | 02 | BEYONCE | BEYONCE | 001 | 004 | | | 14.99 | 17.97 | 888430325128 | 12/20/2013 | | | N | N | 30 | | 1 | A | 4 | 12/19/2013 | 01/02/2014 | 12/19/2013 | 12/19/2013 |
| 25812787 | WHV1000284958DVD | 0526 | 1000284958 | 18 | GRAVITY (2PC) / (UVDC SPEC 2PK) | GRAVITY (2PC) / (UVDC SPEC 2PK) | 050 | 093 | | PG13 | 21.25 | 28.98 | 883929242528 | 02/25/2014 | | | N | N | 30 | | 1 | E | 5 | 01/16/2014 | 02/11/2014 | 01/16/2014 | 01/16/2014 |
| 21425462 | PBSDMST64400DVD | E349 | 64400 | 18 | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | MASTERPIECE CLASSIC: DOWNTON ABBEY SEASON 4 (3PC) | 050 | 095 | 094 | | 30.49 | 49.99 | 841887019705 | 01/28/2014 | | | N | N | 30 | | 1 | A | 6 | 09/06/2013 | 01/15/2014 | 09/06/2013 | 09/06/2013 |
| 25584974 | WD1194170BR | 0819 | 1194170 | BR | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | FROZEN (2PC) (W/DVD) / (WS AC3 DTS 2PK DIGC) | 050 | 110 | | G | 27.75 | 39.99 | 786936838923 | 03/18/2014 | | | N | N | 0 | | 2 | A | 7 | 12/20/2013 | 03/13/2014 | 01/15/2014 | 01/15/2014 |
| 21388262 | HBO1000394029DVD | 0203 | 1000394029 | 18 | GAME OF THRONES: SEASON 3 | GAME OF THRONES: SEASON 3 | 050 | 095 | 093 | | 47.99 | 59.98 | 883929330713 | 02/18/2014 | | | N | N | 30 | | 1 | E | 8 | 08/29/2013 | 02/28/2014 | 08/29/2013 | 08/29/2013 |
| 25688450 | WD11955700DVD | 0819 | 11955700 | 18 | THOR: THE DARK WORLD / (AC3 DOL) | THOR: THE DARK WORLD / (AC3 DOL) | 050 | 093 | | PG13 | 21.25 | 29.99 | 786936839500 | 02/25/2014 | | | N | N | 30 | | 1 | A | 9 | 12/24/2013 | 02/20/2014 | 12/24/2013 | 12/24/2013 |
| 23061316 | PRT359054DVD | 0818 | 359054 | 18 | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | JACKASS PRESENTS: BAD GRANDPA / (WS DUB SUB AC3) | 050 | 110 | | R | 21.75 | 29.98 | 097363590545 | 01/28/2014 | | | N | N | 30 | | 1 | E | 10 | 12/06/2013 | 03/12/2014 | 12/06/2013 | 12/06/2013 |
| 21548611 | DSND001942202.2 | 0107 | 001942202 | 02 | FROZEN / O.S.T. (BONUS CD) (DLX) | FROZEN / O.S.T. (BONUS CD) (DLX) | 001 | 024 | | | 14.09 | 19.99 | 050087299439 | 11/25/2013 | | | N | N | 30 | | 1 | E | 11 | 10/07/2013 | 02/06/2014 | 10/07/2013 | 10/07/2013 |
+----------+------------------+------+------------+----+---------------------------------------------------+---------------------------------------------------+-----+-----+-----+------+-------+-------+--------------+------------+--+--+---+---+----+--+---+---+----+------------+------------+------------+------------+
The 2nd column from the first file can be joined to the 14th column of the second file!
here's what i've been trying to do:
join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
but i am getting these errors:
$ join <(sort awk -F"\t" '{print $14,$12}' aecprda12.tab) <(sort awk -F"\t" '{print $2,$1}' output1.csv)
sort: unknown option -- F
Try 'sort --help' for more information.
sort: unknown option -- F
Try 'sort --help' for more information.
-700476409 [waitproc] -bash 10336 sig_send: error sending signal 20 to pid 10336, pipe handle 0x84, Win32 error 109
the output i would like would be something like this:
+-------+-------+---------------+
| 12.99 | 14.77 | 3383510002151 |
| 13.97 | 17.96 | 3383510002175 |
| 13.2 | 13 | 3383510002267 |
| 13.74 | 14.19 | 3399240165349 |
| 9.43 | 9.52 | 3399240165363 |
| 12.99 | 4.97 | 3399240165479 |
| 7.16 | 7.48 | 3399240165677 |
| 11.24 | 9.43 | 4011550620286 |
| 13.86 | 13.43 | 4260182980316 |
| 13.98 | 12.99 | 4260182980507 |
| 10.97 | 13.97 | 4260182980514 |
| 11.96 | 13.2 | 4260182980545 |
| 15.88 | 13.74 | 4260182980552 |
+-------+-------+---------------+
What am I doing wrong?
You can do all the work in join and sort
join -1 2 -2 14 -t $'\t' -o 2.12,1.1,0 \
<( sort -t $'\t' -k 2,2 output1.csv ) \
<( sort -t $'\t' -k 14,14 aecprda12.tab )
Notes:
$'\t' is a bash ANSI-C quoted string which is a tab character: neither join nor sort seem to recognize the 2-character string "\t" as a tab
-k col,col sorts the file on the specified column
join has several options to control how it works; see the join(1) man page.
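A minimal self-contained demo of those options (the file contents are made up; both files must be sorted on their join fields):

```shell
# -1 2 / -2 1 pick the join field in each file; -o selects the output columns.
printf 'apple k1\nbanana k2\n' > f1.txt   # join field is column 2
printf 'k1 red\nk2 yellow\n'   > f2.txt   # join field is column 1
join -1 2 -2 1 -o 1.1,2.2,0 f1.txt f2.txt
# apple red k1
# banana yellow k2
rm -f f1.txt f2.txt
```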
sort awk -F...
is not a valid command; it means "sort a file named awk", and of course, as the error message says, there is no -F option to sort. The syntax you are looking for is
awk -F ... | sort
However, you might be better off doing the joining in Awk directly.
awk -F"\t" 'NR==FNR{k[$14]=$12; next}
k[$2] { print $2, $1, k[$2] }' aecprda12.tab output1.csv
I am assuming that you don't know whether every item in the first file has a corresponding item in the second file - and that you want only "matching" items. There is indeed a good way to do this in awk. Create the following script (as a text file, call it myJoin.txt):
BEGIN {
FS="\t"
}
# loop around as long as the total number of records read
# is equal to the number of records read in this file
# in other words - loop around the first file only
NR==FNR {
a[$2]=$1 # create one array element for each $1/$2 pair
next
}
# loop around all the elements of the second file:
# since we're done processing the first file
{
# see if the associative array element exists:
gsub(/ /,"",$14) # remove all spaces from $14
if (a[$14]) { # see if the value in $14 was seen in the first file
# print out the three values you care about:
print $12 " " a[$14] " " $14
}
}
Now execute this with
awk -f myJoin.txt file1 file2
Seems to work for me...
I have a bash script that counts, by file extension, the files in all directories recursively that were edited in the last 45 days:
find . -type f -mtime -45| rev | cut -d . -f1 | rev | sort | uniq -ic | sort -rn
I have a directory called
\parent
and in parent I have:
\parent\a
\parent\b
\parent\c
I would run the above script once on folder a, once on b and once on c.
The current output is:
91 xls
85 xlsx
49 doc
46 db
31 docx
24 jpg
22 pub
10 pdf
4 msg
2 xml
2 txt
1 zip
1 thmx
1 htm
1 /ic
I would like to run the script from \parent on all the folders inside \parent and get an output like this:
+-------+------+--------+
| count | ext | folder |
+-------+------+--------+
| 91 | xls | a |
| 85 | xlsx | a |
| 49 | doc | a |
| 46 | db | a |
| 31 | docx | a |
| 24 | jpg | a |
| 22 | pub | a |
| 10 | pdf | a |
| 4 | msg | a |
| 98 | jpg | b |
| 92 | pub | b |
| 62 | pdf | b |
| 2 | xml | b |
| 2 | txt | b |
| 1 | zip | b |
| 1 | thmx | b |
| 1 | htm | b |
| 1 | /ic | b |
| 66 | txt | c |
| 48 | msg | c |
| 44 | xml | c |
| 30 | zip | c |
| 12 | doc | c |
| 6 | db | c |
| 6 | docx | c |
| 3 | jpg | c |
+-------+------+--------+
How can I accomplish this with bash?
Put it into a script, make it executable with chmod +x script.sh, and run it with ./script.sh:
#!/bin/sh
find . -type f -mtime -45 2>/dev/null \
| sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d' \
| sort | uniq -ic \
| sort -b -k2,2 -k1,1rn \
| awk '
BEGIN{
sep = "+-------+------+--------+"
print sep "\n| count | ext | folder |\n" sep
}
{ printf("| %5d | %-4s | %-6s |\n", $1, $3, $2) }
END{ print sep }'
sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d'
s|^\./\([^/]*\)/|\1/| substitutes ./a/file.xls with a/file.xls (it strips the leading ./).
s|/.*/|/| substitutes b/some/dir/file.mp3 with b/file.mp3 (it drops intermediate directories).
s|/.*.\.| |p substitutes a/file.xls with a xls; when the s/// succeeds, the p flag also prints the line to standard output (so files without an extension are not printed).
d deletes the line (to avoid printing matching lines a second time, or printing non-matching lines at all).
sort | uniq -ic counts each group of extension and directory name.
sort -b -k2,2 -k1,1rn sorts first by directory (field 2), small -> large, and then by count (field 1) in reverse order (large -> small) and numerically. -b makes sort(1) ignore blanks (spaces/tabs).
the last awk part pretty prints the output, maybe you want to put this into a separate script.
If you want to see how each pipe filters the results just try to remove each and you will see the output.
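As a standalone example of the two-key sort used above (sample data made up):

```shell
# Primary key: field 2 ascending; secondary key: field 1 numeric, descending.
printf '2 xls\n9 doc\n5 xls\n' | sort -b -k2,2 -k1,1rn
# 9 doc
# 5 xls
# 2 xls
```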
Here you can find good tutorials about sh/awk/sed, etc.
http://www.grymoire.com/Unix/