awk command that compares strings for differences

I have a gz file which contains values in $12 and $33, where they contain strings (e.g. $12: 33-A and $33: 33A). I am trying to create an awk command that reads the values and counts the number of times "-" is in $12 but not in $13.
I have: gzcat test.gz | awk '{if ($12!=$33 && $12~/ -/ && $33!~/ -/) wc -l; else null} | wc -l'
But that command doesn't work and doesn't get me the outcome I would like.

No need to check equality separately since it's implied, and no need to use wc; awk is capable of counting:
... | awk '$12~/-/ && $33!~/-/{count++} END{print count+0}'
P.S. Your script is not a valid awk script. Also, is the field 33 or 13?
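Plugged back into the pipeline from the question (a sketch, assuming as there that the file is test.gz and the default whitespace field splitting applies), the whole command would be:

gzcat test.gz | awk '$12~/-/ && $33!~/-/{count++} END{print count+0}'

The count+0 in the END block ensures a 0 is printed instead of a blank line when nothing matches.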

Related

Print lines not containing a period (Linux)

I have a file with thousands of rows. I want to print the rows which do not contain a period.
awk '{print$2}' file.txt | head
I have used this to print the column I am interested in, column 2 (The file only has two columns).
I have removed the head and then did
awk '{print$2}' file.txt | grep -v "." | head
But I only get blank lines, not any actual values, which is not what I expected. I think it has included the spaces between the rows, but I am not sure.
Is there an alternative command?
As suggested by Jim, I did-
awk '{print$2}' file.txt | grep -v "\." | head
However, the number of lines is greater than before; is this expected? Also, my output is a list of numbers but with spaces in between them (vertically); is this normal?
file.txt example below-
120.4 3
270.3 7.9
400.8 3.9
200.2 4
100.2 8.7
300.2 3.4
102.3 6
49.0 2.3
38.0 1.2
So the expected (and correct) output would be 3 lines, as there are 3 values in column 2 without a period:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
However, when running the code as above, I instead get 5 lines, which I think is also counting the spaces between the rows:
$ awk '{print$2}' file.txt | grep -v "\." | head
3

4

6
You seldom need to use grep if you're already using awk
This would print the second column on each line where that second column doesn't contain a dot:
awk '$2 !~ /\./ {print $2}'
But you also wanted to skip empty lines, or perhaps only keep lines where the second column is not empty. So just test for that, too:
awk '$2 != "" && $2 !~ /\./ {print $2}'
(A more amusing version would be awk '$2 ~ /./ && $2 !~ /\./ {print $2}' )
As you said, grep -v "." gives you only blank lines. That's because the dot means "any character", and with -v, the only lines printed are those that don't contain, well, any characters.
grep is interpreting the dot as a regex metacharacter (the dot will match any single character). Try escaping it with a backslash:
awk '{print$2}' file.txt | grep -v "\." | head
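If you prefer not to escape the dot, grep can also treat the pattern as a fixed string with -F (just another option, not part of the original answer):

awk '{print$2}' file.txt | grep -vF '.' | head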
If I understand correctly, you can try this sed:
sed ':A;N;${s/.*/&\n/};/\n$/!bA;s/\n/ /g;s/\([^ ]*\.[^ ]* \)//g' file.txt
output
3
4
6

Print out only last 4 digits of mac addresses from 2nd column using awk in linux

I have made a shell script for getting the list of MAC addresses using awk and the arp-scan command. I want to strip the MAC address to only the last 4 digits, i.e. I want to print only the letters yy:
ac:1e:04:0e:yy:yy
ax:8d:5c:27:yy:yy
ax:ee:fb:55:yy:yy
dx:37:42:c9:yy:yy
cx:bf:9c:a4:yy:yy
Try cut -d: -f5-
(Options meaning: delimiter : and fields 5 and up.)
EDIT: Or in awk, as you requested:
awk -F: '{ print $5 ":" $6 }'
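For example, fed the first sample address from the question:

$ echo "ac:1e:04:0e:yy:yy" | cut -d: -f5-
yy:yy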
Here are a few ways:
line=cx:bf:9c:a4:yy:yy
echo ${line:(-5)}
line=cx:bf:9c:a4:yy:yy
echo $line | cut -d":" -f5-
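Both of these print the last five characters of the sample value:

yy:yy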
I imagine you want to strip the trailing spaces, but it isn't clear whether you want yy:yy or yyyy.
Anyhow, there are multiple ways to do it, but you are already running AWK and have the MAC in $2.
In the first case it would be:
awk '{match($2,/([^:]{2}:[^:]{2}) *$/,m); print m[0]}'
yy:yy
In the second (no colon :):
awk '{match($2,/([^:]{2}):([^:]{2}) *$/,m); print m[1]m[2]}'
yyyy
In case you don't have the three-argument match available in your AWK, you'd need to resort to gensub.
awk '{print gensub(/.*([^:]{2}:[^:]{2}) *$/,"\\1","g",$2)}'
yy:yy
or:
awk '{print gensub(/.*([^:]{2}):([^:]{2}) *$/,"\\1\\2","g",$0)}'
yyyy
Edit:
I now realized the trailing spaces were added by anubhava in his edit; they were not present in the original question! You can then simply keep the last n characters:
awk '{print substr($2,13,5)}'
yy:yy
or:
awk '{print substr($2,13,2)substr($2,16,2)}'
yyyy
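If gawk extensions aren't available at all, a portable sketch using plain split on the colon (assuming, as above, that the MAC sits in $2) would be:

awk '{n = split($2, a, ":"); print a[n-1] ":" a[n]}'
yy:yy

Printing a[n-1] a[n] instead gives yyyy.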
Taking into account that the MAC address is always 6 octets, you could probably just do something like this to get the last 2 octets:
awk '{print substr($0,13)}' input.txt
While testing on the fly using arp -an, I noticed that the output was not always printing the MAC addresses; in some cases it returned something like:
(169.254.113.54) at (incomplete) on en4 [ethernet]
Therefore it is probably better to filter the input to guarantee a MAC address; this can be done by applying this regex:
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
Applying the regex in awk and only printing the last 2 octets:
arp -an | awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) print substr($4,13)}'
This will filter column $4 and verify that it is a valid MAC address, then use substr to return just the last "letters".
You could also split by : and print the output in multiple ways, for example:
awk '{if ($4 ~ /^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$/) {split($4,a,":"); print a[5] ":" a[6]}}'
Notice the exp ~ /regexp/
This is true if the expression exp (taken as a string) is matched by regexp.
The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:
$ awk '$1 ~ /J/' inventory-shipped
-| Jan 13 25 15 115
-| Jun 31 42 75 492
-| Jul 24 34 67 436
-| Jan 21 36 64 620
So does this:
awk '{ if ($1 ~ /J/) print }' inventory-shipped

Obtaining the number of matches for multiple patterns using the grep command

I have a file in Linux contains strings:
CALLTMA
Starting
Starting
Ending
Starting
Ending
Ending
CALLTMA
Ending
I need the count of each string (e.g. #Ending, #Starting, #CALLTMA). In my example I need to obtain:
CALLTMA : 2
Starting: 3
Ending : 4
I can obtain this output when I execute 3 commands:
grep -i "Starting" "/myfile.txt" | wc -l
grep -i "Ending" "/myfile.txt" | wc -l
grep -i "CALLTMA" "/myfile.txt" | wc -l
I want to know if it is possible to obtain the same output using only one command.
I tried running this command:
grep -iE "CALLTMA|Starting|Ending" "/myfile.txt" | wc -l
But this returned the total of all matches combined. I appreciate your help.
Use sort and uniq:
sort myfile.txt | uniq -c
The -c adds the counts to the unique lines. If you want to sort the output by frequency, add
| sort -n
to the end (and change to -nr if you want the descending order).
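For the sample file above, sort myfile.txt | uniq -c | sort -nr would print something like:

      4 Ending
      3 Starting
      2 CALLTMA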
A simple awk way to handle this:
awk '{counts[$1]++} END{for (c in counts) print c, counts[c]}' file
Starting 3
Ending 4
CALLTMA 2
grep -c will work. You can put it all together in a short script:
for i in Starting CALLTMA Ending; do
printf "%-8s : %d\n" "$i" $(grep -c "$i" file.txt)
done
(to enter the search terms as arguments, just use the arguments array for the loop list, e.g. for i in "$@"; do)
Output
Starting : 3
CALLTMA : 2
Ending : 4

Linux shell script read columns into variable and then add the attribute

I have a file test.txt looking like this:
2092 Mary
103 Tom
1239 Mary
204 Mark
1294 Tom
1092 Mary
I am trying to create a shell script that will
Read each line and put the data from the two columns into variables var1 and var2.
If var2 is the same in several lines, add up the var1 values of those lines.
Output the result into a text file.
The result should be unique values in the var2 column. Here's what I have so far:
#!/bin/sh
#!/usr/bin/sh
cat test.txt| while read line;
do
$var1=$(echo $line| awk -F\; '{print $1}')
$var2=$(echo $line| awk -F\; '{print $2}')
How can I reference the variable in each line and then compare them?
The expected output would be:
4423 Mary
1397 Tom
204 Mark
Using awk it is easy:
awk '{sum[$2] += $1} END {for (i in sum) printf "%4d %s\n", sum[i], i; }'
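Run against the sample test.txt, that prints one line per name (the for (i in sum) loop makes no ordering guarantee):

$ awk '{sum[$2] += $1} END {for (i in sum) printf "%4d %s\n", sum[i], i; }' test.txt
4423 Mary
1397 Tom
 204 Mark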
If you want to do it with bash 4.x (not 3.x), then:
declare -A sum
while read number name
do
((sum[$name] += $number))
done
for name in "${!sum[@]}"
do
echo ${sum[$name]} $name
done
The structure here is essentially isomorphic with the awk script, but a little less notationally convenient. It will read from standard input, using the names as indexes into the associative array sum. The ${!sum[@]} notation is described in the Shell Parameter Expansion section of the manual, and not even hinted at in the section on Arrays. The information is there if you know where to look.
If you want to process an arbitrary number of input files (like the awk script would) then you need to use cat to collect the data:
cat "$#" |
{
declare -A sum
while read number name
do
((sum[$name] += $number))
done
for name in "${!sum[@]}"
do
echo ${sum[$name]} $name
done
}
This is not UUOC because it handles no arguments (read standard input), one argument or many arguments.
For all the scripts, if you want to sort the output in number or name order, apply an appropriate sort to the output of the script:
script file1 file2 file3 | sort -k 1,1n # By sum increasing order
script file1 file2 file3 | sort -k 1,1nr # By sum decreasing order
script file1 file2 file3 | sort -k 2,2 # By name increasing order
script file1 file2 file3 | sort -k 2,2r # By name decreasing order

Find value from one csv in another one (like vlookup) in bash (Linux)

I have already tried all options that I found online to solve my issue but without good result.
Basically I have two csv files (pipe separated):
file1.csv:
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
file2.csv:
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
I need a Linux bash script to find the value of position 3 in file2 based on the content of position 7 in file1.
Example:
file1, line1, pos 7: MAYOBAN
find MAYOBAN in file2, return pos 3 (2400)
the output should be something like this:
**2400**
**2200**
**2200**
**etc...**
Please help
Jacek
A simple approach, far from perfect:
DELIMITER="|"
for i in $(cut -f 7 -d "${DELIMITER}" file1.csv );
do
grep "${i}" file2.csv | cut -f 3 -d "${DELIMITER}";
done
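For the sample files above this prints one value per file1 line:

2400
2200
2200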
This will work, but since the input files must be sorted, the output order will be affected:
join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output would look like:
2200
2200
2400
which is useless. In order to have a useful output, include the key value:
join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output then looks like this:
CORKCOR|2200
CORKKIN|2200
MAYOBAN|2400
Edit:
Here's an AWK version:
awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
This loops through file1.csv and creates array entries for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files. When they're equal, the first file is being processed. The next instruction reads the next record, creating a loop. When FNR == NR is no longer true, the subsequent file(s) are processed.
So file2.csv is now processed and if it has a field 1 that exists in the array, then its field 3 is printed.
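For the sample files this prints the rates in file2 order:

2400
2200
2200

Note that it prints one line per matching file2 record, so a code repeated in file1 would still only appear once; the join-based answers above print once per file1 line instead.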
You can use Miller (https://github.com/johnkerl/miller).
Starting from input01.txt
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
and input02.txt
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
and running
mlr --csv -N --ifs "|" join -j 7 -l 7 -r 1 -f input01.txt then cut -f 3 input02.txt
you will have
2400
2200
2200
Some notes:
-N to set input and output without header;
--ifs "|" to set the input fields separator;
-l 7 -r 1 to set the join fields of the input files;
cut -f 3 to extract the field named 3 from the join output
cut -d\| -f7 file1.csv | while read line
do
grep "$line" file2.csv | cut -d\| -f3
done
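With file2.csv as the lookup file, this prints the expected values for the sample data:

2400
2200
2200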
