I have two files of varying length, from 1 to 30 lines, containing data like this:
[File1]
Time | Name | Name | ID1 | ID2
10:50 | Volume | Xxx | 55 | 65
12:50 | Kate | Uh | 35 | 62
15:50 | Maria | Zzz | 38 | 67
15:50 | Alex | Web | 38 | 5
...
[File2]
Time | Name | Name | ID1 | ID2
10:50 | Les | Xxx | 31 | 75
15:50 | Alex | Web | 38 | 5
...
How can I compare the two files [File1] and [File2] on the ID1 and ID2 columns only, so that every line of [File1] is compared against every line of [File2]?
If a line's ID1/ID2 pair exists in both files, it should be written to a file [File3] with a * character added.
[File3] should also contain the remaining, unmatched lines from [File1].
Result:
[File3]
Time | Name | Name | ID1 | ID2
15:50 | Alex | Web | * 38 | 5
10:50 | Volume | Xxx | 55 | 65
12:50 | Kate | Uh | 35 | 62
15:50 | Maria | Zzz | 38 | 67
Using awk
awk 'BEGIN{t="Time | Name | Name | ID1 | ID2"}
FNR==1{next}
NR==FNR{a[$4 FS $5];next}
{ if ($4 FS $5 in a)
{$4="*"$4;t=t RS $0}
else{s=s==""?$0:s RS $0}
}
END{print t RS s}' FS=\| OFS=\| file2 file1
Time | Name | Name | ID1 | ID2
15:50 | Alex | Web |* 38 | 5
10:50 | Volume | Xxx | 55 | 65
12:50 | Kate | Uh | 35 | 62
15:50 | Maria | Zzz | 38 | 67
Explanation
BEGIN{t="Time | Name | Name | ID1 | ID2"} # set the title
FNR==1{next} # ignore the title line; FNR is the record number within the current file, so this skips the header of each file
NR==FNR{a[$4 FS $5];next} # while reading the first file given (file2), record $4 and $5 as a key in associative array a
{ if ($4 FS $5 in a)
    {$4="*"$4;t=t RS $0} # if found in file2, mark $4 with a star "*" and append to var t
  else{s=s==""?$0:s RS $0} # if not found, append to var s
}
END{print t RS s} # print the result
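The heart of this is the classic NR==FNR two-file idiom. Here is a minimal, self-contained sketch of just that idiom, using made-up keys rather than the OP's data:

```shell
# Build two tiny sample files (hypothetical data, just to show the idiom)
printf '%s\n' 'k1|a' 'k2|b' > /tmp/f1
printf '%s\n' 'k2|x' 'k3|y' > /tmp/f2

# While reading the first file NR==FNR holds, so we only fill the array;
# for the second file we print lines whose key was seen in the first.
awk -F'|' 'NR==FNR{seen[$1];next} $1 in seen' /tmp/f1 /tmp/f2
```

Note that `seen[$1]` creates the array entry without assigning a value, and `$1 in seen` tests membership without creating one.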
Good day.
I have two files, vmList and flavorList, the vmList containing the following:
$ cat /tmp/vmList
cf0012vm001| OS-SRV-USG:terminated_at | -
cf0012vm001| accessIPv4 |
cf0012vm001| accessIPv6 |
cf0012vm001| cf0012v_internal_network network | 192.168.210.10
cf0012vm001| created | 2021-09-17T17:21:39Z
cf0012vm001| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm001| hostId | fcf39100bcc6ae57a8212f97d3251ac43913719f2aebcaa72006956e
cf0012vm001| key_name | -
cf0012vm002| OS-SRV-USG:terminated_at | -
cf0012vm002| accessIPv4 |
cf0012vm002| accessIPv6 |
cf0012vm002| cf0012v_internal_network network | 192.168.210.11
cf0012vm002| created | 2021-09-17T17:21:37Z
cf0012vm002| flavor | nd.c8r16d50e60 (89ba4c986a28447aa27de65bca986db1)
cf0012vm002| hostId | e1590af8ddd57f1e2e74617d6c3631195e410bdd188a0b59813ffbef
cf0012vm002| id | 0e292900-6b50-4055-9842-d95e54fa1490
and the flavorList containing the following information:
$ cat /tmp/flavorList
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| ID | Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
| 711f0ff2f01d403689819b6cbab36e42 | nd.c4r8d21s8e21 | 8192 | 21 | 21 | 8192 | 4 | | N/A |
| 78a70b62efae4fbcb35994aeb0f87678 | nd.c8r16d31s8e31 | 16384 | 31 | 31 | 8192 | 8 | | N/A |
| 78f4fe71cc3340a59c62fc0b32d81e3f | nd.c4r16d100 | 16384 | 100 | 0 | | 4 | | N/A |
| 7a7e6ae4bfe34ac4ab3983b8f764a8ce | nd.c2r8d40 | 8192 | 40 | 0 | | 2 | | N/A |
| 832169fed2244bb6b1739ab3db0f232e | nd.c1r4d100 | 4096 | 100 | 0 | | 1 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 8e968623e5c44674b33e1cc1f892e32d | nd.c9r40d50 | 40960 | 50 | 0 | | 9 | | N/A |
| 8e96a7044566406f9ef7eba48c2a8c55 | nd.c5r4d81 | 4096 | 81 | 0 | | 5 | | N/A |
| 8fd07e2004f84658a76af1cd8b9cea43 | nd.c2r8d50 | 8192 | 50 | 0 | | 2 | | N/A |
+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
My goal is to find the 'flavor' in the vmList, then grep the flavor value (nd.c8r16d50e60) from the flavorList, which in itself works:
$ for f in `grep flavor /tmp/vmList|awk '{print $4}'`;do grep ${f} /tmp/flavorList;done
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
However, I would like to add the first parameter from the vmList (cf0012vm001 and cf0012vm002) to precede the output, either in a line above the output or in front of the line:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
or even:
cf0012vm001
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002
| 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
Please advise.
Bjoern
Assumptions:
a flavor does not contain spaces
a specific ordering of the output has not been stated
vmList: column/field #1 could be associated with different flavors [NOTE: not supported by sample data set; OP would need to refute/confirm]
One GNU awk idea that uses an array of arrays:
awk -F'|' ' # input field delimiter = "|" for both files
FNR==NR { # for 1st file ...
name=gensub(/ /,"","g",$2) # remove all spaces from field #2 and save in awk variable "name"
if (name == "flavor") { # if field #2 == "flavor" ...
split($3,arr,"(") # split field #3 using "(" as delimiter, storing results in array arr[]
gsub(" ","",arr[1]) # remove all spaces from first array entry
flavors[arr[1]] # keep track of unique flavors
col1[arr[1]][$1] # keep track of associated values from column/field #1
}
next
}
FNR>3 { # for 2nd file, after ignoring first 3 lines ...
if (NF == 1) next # skip line if it only has 1 "|" delimited field
name=gensub(/ /,"","g",$3) # remove all spaces from field #3 and save in awk variable "name"
if (name in flavors) # if name is in our list of flavors ...
for (i in col1[name]) # loop through list of columns (from 1st file)
print i,$0 # print column (from 1st file) plus current line
}
' vmList flavorList
This generates:
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
NOTE: while this output appears to be sorted by the first column, this is merely a coincidence; if a specific order needs to be guaranteed, it can likely be achieved by adding an appropriate PROCINFO["sorted_in"] entry; OP just needs to state the desired ordering
Would you please try the following:
echo "VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |"
echo "------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+"
awk -F '[[:blank:]]*\\|[[:blank:]]*' '
NR==FNR && $2=="flavor" {sub(/[[:blank:]].+/, "", $3); a[$1]=$3; next}
{
for (i in a) {
if (a[i] == $3) print i " " $0
}
}
' /tmp/vmList /tmp/flavorList | sort -k1.9,1.11n
Output:
VM Name | ID | Flavor Name | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
------------+--------------------------------------+------------------+-----------+------+-----------+-------+-------+-------------+-----------+
cf0012vm001 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
cf0012vm002 | 89ba4c986a28447aa27de65bca986db1 | nd.c8r16d50e60 | 16384 | 50 | 60 | | 8 | | N/A |
The field separator [[:blank:]]*\\|[[:blank:]]* splits the record
on the pipe character with preceding / following blank characters if any.
The condition NR==FNR && $2=="flavor" matches the flavor line
in vmList.
The statement sub(/[[:blank:]].+/, "", $3) extracts the nd.xxx
field by removing the substring after the blank character.
a[$1]=$3 stores the nd.xxx field keyed by the 1st cfxxx field.
The final for (i in a) loop prints the matched lines in flavorList, prepending the cfxxx field.
sort -k1.9,1.11n sorts the output by the substring running from the 9th through the 11th character of the 1st field. The trailing n specifies a numeric sort.
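To see how field.character sort keys behave in isolation, here is a tiny standalone example with made-up lines in the same cfXXXXvmNNN shape:

```shell
# Characters 9-11 of field 1 are the vm number (002, 010, 001),
# so -k1.9,1.11n sorts the lines numerically by that number.
printf '%s\n' 'cf0012vm002 | x' 'cf0012vm010 | y' 'cf0012vm001 | z' |
  sort -k1.9,1.11n
```

Without the n modifier the key would be compared as text, which happens to give the same order here but would break for mixed-width numbers.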
+----+-------+-----+
| ID | STORE | QTY |
+----+-------+-----+
| | | |
| 9 | 101 | 18 |
| | | |
| 8 | 154 | 19 |
| | | |
| 7 | 111 | 13 |
| | | |
| 9 | 154 | 18 |
| | | |
| 8 | 101 | 19 |
| | | |
| 7 | 101 | 13 |
| | | |
| 9 | 111 | 18 |
| | | |
| 8 | 111 | 19 |
| | | |
| 7 | 154 | 14 |
+----+-------+-----+
Suppose that I have 3 stores, and I'd like to find every ID whose QTY is the same in every store.
E.g. ID 9 is in 3 stores and has QTY 18 in every one of them,
but ID 7 has equal QTY in only two stores (111 and 101; in store 154 it has QTY 14). How can I get that result using grep?
Or is that impossible to do in one expression? I thought about a regex, but I don't know how to extract the QTY and compare it to another row. (My file looks like the table above.)
Extract the ID and QTY columns (the first and last of the table) with cut, count the unique combinations, and output only those whose count is 3 (i.e. the QTY is the same for all three stores):
$ cut -d\| -f2,4 file | sort | uniq -c | grep '^ *3 '
3 8 | 19
3 9 | 18
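The same pipeline can be tried end to end with a few of the question's rows inlined (only ID 9's three rows and one ID 7 row, for brevity):

```shell
# cut keeps the ID and QTY fields; identical ID/QTY pairs collapse
# under uniq -c, and grep keeps only pairs seen in all three stores.
printf '%s\n' \
  '| 9 | 101 | 18 |' \
  '| 9 | 154 | 18 |' \
  '| 9 | 111 | 18 |' \
  '| 7 | 101 | 13 |' |
  cut -d'|' -f2,4 | sort | uniq -c | grep '^ *3 '
```

Only the ID 9 combination (count 3) survives; the single ID 7 row is filtered out by the grep.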
I have a set of data as below.
SHEET 1
+------+-------+
| JANUARY |
+------+-------+
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | ALFRED | 11 | 150 |
| 2 | ARIS | 22 | 120 |
| 3 | JOHN | 33 | 170 |
| 4 | CHRIS | 22 | 190 |
| 5 | JOE | 55 | 120 |
| 6 | ACE | 11 | 200 |
+----+----------+------+-------+
SHEET2
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | CHRIS | 13 | 123 |
| 2 | ACE | 26 | 165 |
| 3 | JOE | 39 | 178 |
| 4 | ALFRED | 21 | 198 |
| 5 | JOHN | 58 | 112 |
| 6 | ARIS | 11 | 200 |
+----+----------+------+-------+
The RESULT should look like this in sheet1:
+------+-------++------+-------+
| JANUARY | FEBRUARY |
+------+-------++------+-------+
+----+----------+------+-------++-------+-------+
| ID | NAME |COUNT | PRICE || COUNT | PRICE |
+----+----------+------+-------++-------+-------+
| 1 | ALFRED | 11 | 150 || 21 | 198 |
| 2 | ARIS | 22 | 120 || 11 | 200 |
| 3 | JOHN | 33 | 170 || 58 | 112 |
| 4 | CHRIS | 22 | 190 || 13 | 123 |
| 5 | JOE | 55 | 120 || 39 | 178 |
| 6 | ACE | 11 | 200 || 26 | 165 |
+----+----------+------+-------++-------+-------+
I need a formula for the "FEBRUARY" columns; the formula should look up each row's match in sheet 2.
Assuming the first COUNT value should go in cell E3 of Sheet1, the following formula would be the usual way of doing it:
=INDEX(Sheet2!C:C,MATCH($B3,Sheet2!$B:$B,0))
Then the Price (in F3) would be given by
=INDEX(Sheet2!D:D,MATCH($B3,Sheet2!$B:$B,0))
I think this query will work fine for your requirement:
SELECT `Sheet1$`.ID,`Sheet1$`.NAME, `Sheet1$`.COUNT AS 'Jan-COUNT',`Sheet1$`.PRICE AS 'Jan-PRICE', `Sheet2$`.COUNT AS 'Feb-COUNT',`Sheet2$`.PRICE AS 'Feb-PRICE'
FROM `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet1$` `Sheet1$`, `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet2$` `Sheet2$`
WHERE (`Sheet1$`.NAME=`Sheet2$`.NAME)
Provide the actual path instead of
C:\Users\Nagendra\Desktop\aaaaa.xlsx
First you need to know how to make the connection; refer to http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html
I have a bash script that gives me counts of files in all of the directories recursively that were edited in the last 45 days
find . -type f -mtime -45 | rev | cut -d . -f1 | rev | sort | uniq -ic | sort -rn
I have a directory called
\parent
and in parent I have:
\parent\a
\parent\b
\parent\c
I would run the above script once on folder a, once on b and once on c.
The current output is:
91 xls
85 xlsx
49 doc
46 db
31 docx
24 jpg
22 pub
10 pdf
4 msg
2 xml
2 txt
1 zip
1 thmx
1 htm
1 /ic
I would like to run the script from \parent on all the folders inside \parent and get an output like this:
+-------+------+--------+
| count | ext | folder |
+-------+------+--------+
| 91 | xls | a |
| 85 | xlsx | a |
| 49 | doc | a |
| 46 | db | a |
| 31 | docx | a |
| 24 | jpg | a |
| 22 | pub | a |
| 10 | pdf | a |
| 4 | msg | a |
| 98 | jpg | b |
| 92 | pub | b |
| 62 | pdf | b |
| 2 | xml | b |
| 2 | txt | b |
| 1 | zip | b |
| 1 | thmx | b |
| 1 | htm | b |
| 1 | /ic | b |
| 66 | txt | c |
| 48 | msg | c |
| 44 | xml | c |
| 30 | zip | c |
| 12 | doc | c |
| 6 | db | c |
| 6 | docx | c |
| 3 | jpg | c |
+-------+------+--------+
How can I accomplish this with bash?
Put it into a script, make it executable (chmod +x script.sh), and run it with ./script.sh:
#!/bin/sh
find . -type f -mtime -45 2>/dev/null \
| sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d' \
| sort | uniq -ic \
| sort -b -k2,2 -k1,1rn \
| awk '
BEGIN{
sep = "+-------+------+--------+"
print sep "\n| count | ext | folder |\n" sep
}
{ printf("| %5d | %-4s | %-6s |\n", $1, $3, $2) }
END{ print sep }'
sed 's|^\./\([^/]*\)/|\1/|; s|/.*/|/|; s|/.*.\.| |p; d'
s|^\./\([^/]*\)/|\1/| substitutes ./a/file.xls with a/file.xls.
s|/.*/|/| substitutes b/some/dir/file.mp3 with b/file.mp3.
s|/.*.\.| |p substitutes a/file.xls with a xls; when the s/// succeeds, the p flag also prints the result to standard output (files without an extension never match, so they are skipped).
d deletes the line (to avoid printing matching (again) or non-matching lines).
sort | uniq -ic counts each group of extension and directory name.
sort -b -k2,2 -k1,1rn sorts first by directory (field 2), small -> large, and then by count (field 1) in reverse order (large -> small) and numerically. -b makes sort(1) ignore blanks (spaces/tabs).
the last awk part pretty prints the output, maybe you want to put this into a separate script.
If you want to see how each stage filters the results, just remove it from the pipeline and compare the output.
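For instance, the effect of the sort stage alone can be seen on a few made-up uniq -c style lines:

```shell
# -b ignores leading blanks in the keys; -k2,2 groups by directory,
# then -k1,1rn orders counts numerically, largest first, within each group.
printf '%s\n' '  24 a jpg' '  91 a xls' '  66 c txt' |
  sort -b -k2,2 -k1,1rn
```

The two "a" lines come out with 91 before 24, and the single "c" line follows.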
Here you can find good tutorials about sh/awk/sed, etc.
http://www.grymoire.com/Unix/
How can I delete all lines below a word, except the last line, in a file? Suppose I have a file which contains:
| 02/04/2010 07:24:20 | 20-24 | 26 | 13 | 2.60 |
| 02/04/2010 07:24:25 | 25-29 | 6 | 3 | 0.60 |
+---------------------+-------+------------+----------+-------------+
02-04-2010-07:24 --- ER GW 03
+---------------------+-------+------------+----------+-------------+
| date | sec | BOTH_MO_MT | MO_or_MT | TPS_PER_SEC |
+---------------------+-------+------------+----------+-------------+
| 02/04/2010 07:00:00 | 00-04 | 28 | 14 | 2.80 |
| 02/04/2010 07:00:05 | 05-09 | 27 | 14 | 2.70 |
...
...
...
...
END OF TPS PER 5 REPORT
and I need to delete all contents from "02-04-2010-07:24 --- ER GW 03" onwards, except "END OF TPS PER 5 REPORT", and save the file.
This has to be done for around 700 files; all files have the same format, with a datemonthday filename.
sed -ni '/ER GW/ b end; p; d; :end $p; n; b end' "$file"
$file should be the filename. E.g.:
for file in *.txt ; do
sed -ni '/ER GW/ b end; p; d; :end $p; n; b end' "$file"
done
The following awk script will do it:
awk '
/^02-04-2010-07:24 --- ER GW 03$/ {skip=1}
{ln=$0;if (skip!=1){print}}
END {if (skip==1){print ln}}'
as shown in the following transcript:
$ echo '| 02/04/2010 07:24:20 | 20-24 | 26 | 13 | 2.60 |
| 02/04/2010 07:24:25 | 25-29 | 6 | 3 | 0.60 |
+---------------------+-------+------------+----------+-------------+
02-04-2010-07:24 --- ER GW 03
+---------------------+-------+------------+----------+-------------+
| date | sec | BOTH_MO_MT | MO_or_MT | TPS_PER_SEC |
+---------------------+-------+------------+----------+-------------+
| 02/04/2010 07:00:00 | 00-04 | 28 | 14 | 2.80 |
| 02/04/2010 07:00:05 | 05-09 | 27 | 14 | 2.70 |
...
...
...
...
END OF TPS PER 5 REPORT' | awk '
/^02-04-2010-07:24 --- ER GW 03$/ {skip=1}
{ln=$0;if (skip!=1){print}}
END {if (skip==1){print ln}}'
which produces:
| 02/04/2010 07:24:20 | 20-24 | 26 | 13 | 2.60 |
| 02/04/2010 07:24:25 | 25-29 | 6 | 3 | 0.60 |
+---------------------+-------+------------+----------+-------------+
END OF TPS PER 5 REPORT
as requested.
Breaking it down:
skip is initially 0 (false).
if you find a line you want to start skipping from, set skip to 1 (true) - change this pattern where necessary.
if skip is false, output the line.
regardless of skip, store the last line.
at the end, if skip is true, output the last line (the skip check prevents a double print).
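A minimal sketch of that skip-and-remember technique, using made-up marker lines rather than the OP's report format:

```shell
# Everything before MARKER is printed; everything from MARKER on is
# skipped, except that the very last line is remembered and re-emitted.
printf '%s\n' 'keep 1' 'keep 2' 'MARKER here' 'junk' 'LAST LINE' |
  awk '/MARKER/{skip=1} {ln=$0; if (!skip) print} END{if (skip) print ln}'
```

This prints "keep 1", "keep 2", and "LAST LINE", dropping only the lines between the marker and the end.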
For doing it to multiple files, you can just use for:
for fspec in *.txt ; do
awk 'blah blah' <"${fspec}" >"${fspec}.new"
done
The command required for your update in the comment (searching for "--- ER GW 03") is:
awk '
/--- ER GW 03/ {skip=1}
{ln=$0;if (skip!=1){print}}
END {if (skip==1){print ln}}'
This might work for you:
sed -i '$q;/^02-04-2010-07:24 --- ER GW 03/,$d' *.txt
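The trick is that sed runs its commands in order on each line: on the last line, $q auto-prints the line and quits before the range delete can remove it. A quick demo without -i, on made-up marker lines:

```shell
# $q keeps the final line; /MARK/,$d deletes everything from the
# marker to the end on every other line.
printf '%s\n' 'row 1' 'row 2' 'MARK start of junk' 'junk' 'FOOTER' |
  sed '$q; /MARK/,$d'
```

This prints "row 1", "row 2", and "FOOTER": the marker line and the junk after it are deleted, but the final line survives.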