read line by line awk and if - linux

I have a file called contenido.txt. The file contains the following table:
Nombre column1 column2 valor3
Marcos 1 0 0
Jose 1 0 0
Andres 0 0 0
Oscar 1 0 0
Pablo 0 0 0
I need a final file, or a printout, of only the lines that have 0 in column2.
Could you help me, please?
cat contenido.txt | while read LINE; do
var=$(cat $LINE | awk '{print $2}')
if ["$var" == 0]
then
echo $LINE | awk '{print $1}'
fi
done

After reading your code, the "column 2" you mean is actually the 2nd column (the one with the header "column1"), not the column with the header "column2". So this line will help you:
awk 'NR==1{print;next}$2==0' file
Testing with your data:
kent$ echo "Nombre column1 column2 valor3
Marcos 1 0 0
Jose 1 0 0
Andres 0 0 0
Oscar 1 0 0
Pablo 0 0 0"|awk 'NR==1{print;next}$2==0'
Nombre column1 column2 valor3
Andres 0 0 0
Pablo 0 0 0
The 2nd part of your code seems to extract the first column (the names?). You can do this in one shot with awk (ignoring the header):
kent$ echo "Nombre column1 column2 valor3
Marcos 1 0 0
Jose 1 0 0
Andres 0 0 0
Oscar 1 0 0
Pablo 0 0 0"|awk '$2==0{print $1}'
Andres
Pablo
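Since the question also asks for "a final file", either of the one-liners above can simply be redirected; a minimal sketch, assuming contenido.txt is the input (the output file names here are just placeholders):
awk 'NR==1{print;next}$2==0' contenido.txt > filtered.txt     # header + matching rows
awk '$2==0{print $1}' contenido.txt > names_only.txt          # names only, no header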

column2 is $3 in awk. So:
$ awk '$3 == 0' < in.txt
Marcos 1 0 0
Jose 1 0 0
Andres 0 0 0
Oscar 1 0 0
Pablo 0 0 0
{print $0} is the implicit action.
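If you only want the names from those rows, as in the previous answer, the same idea works with $3; a small variant sketch, not from the original answer:
awk 'NR > 1 && $3 == 0 {print $1}' in.txt    # skip the header line, print the name where column2 ($3) is 0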

Related

AWK / script, return the number of values from a specific field in the /etc/group file

I'm trying to write something that will give me this type of output using awk.
I want to extract the group name, the group ID and the number of users in each group from the /etc/group file:
Group : root ID:0 : 2 accounts
Group : daemon ID: 1 : 1 account
Group : bin ID: 2 : 1 account
I've tried this for now:
#!/bin/bash
NbrsUtil=$(cut -d ":" -f4 /etc/group | awk -F "," '{print NF}')
awk -v utils=$NbrsUtil -F ":" '{print "Groupe:",$1,"ID:" $3,utils," :accounts"} ' /etc/group
This is not working.
I can use "cut" to select the field I want, and then pipe it into awk to count the number of fields, and I get the right values, but the output is not good and it does not work in my script.
cut -d ":" -f4 /etc/group | awk -F "," '{print NF}'
0
0
0
0
2
0
0
0
0
0
0
0
0
0
2
0
If I echo the variable in the script, it all shows up on one line:
#!/bin/bash
NbrsUtil=$(cut -d ":" -f4 /etc/group | awk -F "," '{print NF}')
echo $NbrsUtil
awk -F ":" '{print "Groupe:",$1,"ID:" $3,$4," :accounts"} ' /etc/group
-->
0 0 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 1 1 0 1 2 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 2 0 0 1 0 0 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0
Groupe: root ID:0 :accounts
Groupe: daemon ID:1 :accounts
Groupe: bin ID:2 :accounts
Groupe: sys ID:3 :accounts
Groupe: adm ID:4 franco,root :accounts
Groupe: tty ID:5 :accounts
Groupe: disk ID:6 :accounts
Groupe: lp ID:7 :accounts
Groupe: mail ID:8 :accounts
awk has a split function that can help:
split(s, a[, fs ])
Split the string s into array elements a[1], a[2], ..., a[n], and return n. [...] The separation shall be done with the ERE fs or with the field separator FS if fs is not given.
So you can just do:
awk -F: '{print "Group:",$1,"ID:",$3,"Accounts:",split($4,_,",")}' /etc/group
You might also use:
awk -F: '{
nr = ($4 == "") ? 0 : gsub(/,/, "", $4) + 1
print "Groupe:",$1,"ID:",$3,"Accounts:", nr " :accounts"
}' /etc/group
Or a slightly shortened version, as suggested by RARE Kpop Manifesto:
awk -F: '{
nr = ( "" < $4 ) + gsub(/,/, "", $4)
print "Groupe:",$1,"ID:",$3,"Accounts:", nr " :accounts"
}' /etc/group
The nr in the script is zero when $4 is empty.
If it is not empty, gsub replaces all the commas in $4 and returns the number of replacements.
Add 1 to the result of gsub because when $4 is not empty but contains no commas, there is still 1 account.
Which gives a result like:
Groupe: root ID: 0 Accounts: 0 :accounts
Groupe: daemon ID: 1 Accounts: 0 :accounts
Groupe: bin ID: 2 Accounts: 0 :accounts
Groupe: sys ID: 3 Accounts: 0 :accounts
Groupe: adm ID: 4 Accounts: 2 :accounts
Groupe: tty ID: 5 Accounts: 1 :accounts
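If you want output closer to the exact format asked for in the question (Group : root ID:0 : 2 accounts), the same counting expression can be combined with printf; a sketch under the same assumptions about /etc/group:
awk -F: '{
    n = ("" < $4) + gsub(/,/, "", $4)                      # number of accounts in the members field
    printf "Group : %s ID:%s : %d account%s\n", $1, $3, n, (n == 1 ? "" : "s")
}' /etc/group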

Nested Loop Over Two Files

I have two text files: the first one contains 3rd party names, the second one contains message statuses like sent, failed, technical errors, etc.
I want to search a log file for each 3rd party name (from the first file) and get a count of each message status (listed in the second file).
Example of the 1st file.txt (3rd party names):
BNF_IPL
one97
pajwok
RadioAzadi
SPICDIGITAL
U2OPIA
UNIFUN
UNIFUNRS
vectracom
VNTAF
YRMP
INFOTT
Second file.txt (message statuses):
success
partial
failed
Error absentSubscriber
UnknownSubscriber
smDeliveryFailure
userSpecificReason
CallBarred
systemFailure
My goal is to produce a report containing the totals per status for each 3rd party, something like:
sent | failed | TechErrpr | Absent | subscriber
IBM someValue someValue someValue someValue someValue
Microsoft someValue someValue someValue someValue someValue
Oracle someValue someValue someValue someValue someValue
google someValue someValue someValue someValue someValue
To get the values I will grep those names and statuses in a log file and get the totals. For that I am trying to use a nested loop, but with no luck. Something like:
for ((i = 0; i < wc -l 3rdPList.txt ; i++)); do
for ((j = i; j < wc -l status.txt ; j++)); do
grep 3rdPList.txt logFile | grep status.txt | wc -l > outputFile.txt
echo $st[j]
done
done
Example of the log file:
2018-10-30 00:07:19,640 DEBUG [org.mobicents.smsc.library.CdrGenerator] 2018-10-29 14:42:45,789 +0430,588,5,0,93706315646,1,1,temp_failed,BNF_IPL,26674477,0702700006,412012004908984,null,ایید.,Error absentSubscriber after MtForwardSM Request: MAPErrorMessageAbsentSubscriber []
2018-10-30 00:07:41,034 DEBUG [org.mobicents.smsc.library.CdrGenerator] 2018-10-29 16:21:27,260 +0430,588,5,0,0700375593,1,1,temp_failed,BNF_IPL,27008401,null,null,null,عدد1 را به588 ارسال ,AbsentSubscriber response from HLR: MAPErrorMessageAbsentSubscriber []
This does pretty much what you ask, but I didn't work too much on pretty formatting!
{ sed 's/^/1,/' 1.txt; sed 's/^/2,/' 2.txt; cat log.txt; } | awk -F, '$1==1{c=substr($0,3);cc[c]++;next} $1==2{s=substr($0,3); ss[s]++;next} {s=$10;c=$11;res[c SEP s]++} END{for(s in ss){printf("%s ",s)};printf("\n");for(c in cc){printf("%s ",c);for(s in ss){printf("%d ",res[c SEP s]+0)}printf("\n")}}'
Sample Output
systemFailure temp_failed CallBarred userSpecificReason smDeliveryFailure UnknownSubscriber Error absentSubscriber partial success
pajwok 0 0 0 0 0 0 0 0 0
SPICDIGITAL 0 0 0 0 0 0 0 0 0
YRMP 0 0 0 0 0 0 0 0 0
UNIFUN 0 0 3 0 0 0 0 0 0
U2OPIA 0 0 0 0 0 0 0 0 0
UNIFUNRS 0 0 0 0 0 0 0 0 0
RadioAzadi 0 0 0 0 0 0 0 0 0
one97 0 0 0 0 0 0 0 0 0
BNF_IPL 0 2 0 0 0 0 0 0 0
VNTAF 0 0 0 0 0 0 0 0 0
INFOTT 0 0 0 0 0 0 0 0 0
vectracom 0 0 0 0 0 0 0 0 0
If you want to understand it, try running the parts separately. For the first part, I prefix all the company names with a 1 so that awk can differentiate them from status codes and log lines:
sed 's/^/1,/' 1.txt
Output
1,BNF_IPL
1,one97
1,pajwok
1,RadioAzadi
1,SPICDIGITAL
1,U2OPIA
1,UNIFUN
1,UNIFUNRS
1,vectracom
1,VNTAF
1,YRMP
1,INFOTT
Then, I prefix all the status messages with a 2 so that awk can differentiate those from company names and log lines:
sed 's/^/2,/' 2.txt
Output
2,success
2,partial
2,temp_failed
2,Error absentSubscriber
2,UnknownSubscriber
2,smDeliveryFailure
2,userSpecificReason
2,CallBarred
2,systemFailure
Then I cat the log file into awk:
cat log.txt
The awk script can be written across multiple lines and commented:
{ sed ...; sed ...; cat ...; } | awk -F, '
$1==1 {c=substr($0,3); cc[c]++; next} # Process company name in "1.txt", "c" holds name, "cc[]" is an array of names
$1==2 {s=substr($0,3); ss[s]++; next} # Process status code in "2.txt", "s" holds status, "ss[]" is an array of statuses
{s=$10; c=$11; res[c SEP s]++} # Process line from log, status is field 10, company is field 11. Increment results array "res[]"
END {
# Print line of status codes
for(s in ss){printf("%s ",s)};
printf("\n");
for(c in cc){printf("%s ",c);
for(s in ss){printf("%d ",res[c SEP s]+0)}printf("\n")}
}'
SEP is just an uninitialized (hence empty) variable used as a separator to fake 2-D arrays.
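awk also has a built-in SUBSEP variable and a res[c,s] subscript syntax for exactly this purpose; a small self-contained sketch of the same idea using the built-in separator instead:
awk 'BEGIN {
    res["BNF_IPL", "temp_failed"]++     # same as res["BNF_IPL" SUBSEP "temp_failed"]++
    for (k in res) {
        split(k, parts, SUBSEP)         # recover the two "dimensions" of the key
        print parts[1], parts[2], res[k]
    }
}'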

add header to columns from list text file awk

I have a very large text file with hundreds of columns. I want to add a header to every column from an independent text file containing a list.
My large file looks like this:
largefile.txt
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
my list of headers:
headers.txt
h1
h2
h3
wanted output:
output.txt
h1 h2 h3 h4 h5 h6 h7 etc..
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
$ awk 'NR==FNR{h=h OFS $0; next} FNR==1{print OFS OFS h} 1' head large | column -s ' ' -t
h1 h2 h3
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
or if you prefer:
$ awk -v OFS='\t' 'NR==FNR{h=h OFS $0; next} FNR==1{print OFS OFS h} {$1=$1}1' head large
h1 h2 h3
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
Well, here's one. OFS is tab for eye candy. From the OP I concluded that the headers should start from the fourth field, hence the +3s in the code.
$ awk -v OFS="\t" ' # tab OFS
NR==FNR { a[NR]=$1; n=NR; next } # has headers
FNR==1 { # print headers in the beginning of 2nd file
$1=$1 # rebuild record for tabs
b=$0 # buffer record
$0="" # clear record
for(i=1;i<=n;i++) # spread head to fields
$(i+3)=a[i]
print $0 ORS b # output head and buffered first record
}
{ $1=$1 }1' head data # implicit print with record rebuild
h1 h2 h3
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
Then again, this would also do the trick:
$ awk 'NR==FNR{h=h (NR==1?"":OFS) $0;next}FNR==1{print OFS OFS OFS h}1' head data
h1 h2 h3
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
Use paste to pivot the headers into a single line and then cat them together with the main file (- instead of a file name means stdin to cat):
$ paste -s -d' ' headers.txt | cat - largefile.txt
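To see what the paste step alone contributes, running just the first half of the pipeline on the headers.txt above produces a single space-joined line (sketch of the expected output based on the sample data):
$ paste -s -d' ' headers.txt
h1 h2 h3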
If you really need the headers to line up as in your example output you can preprocess (either manually or with a command) the headers file, or you can finish with sed (for just one option) as below:
$ paste -s -d' ' headers.txt | cat - largefile.txt | sed '1 s/^/ /'
h1 h2 h3
chrom start end 0 1 0 1 0 0 0 etc
chrom start end 0 0 0 0 1 1 1 etc
chrom start end 0 0 0 1 1 1 1 etc
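If largefile.txt happens to be tab-separated (the question does not say), the same alignment can be achieved without sed by emitting three leading tabs before the pasted header line; a sketch under that assumption:
{ printf '\t\t\t'; paste -s -d'\t' headers.txt; cat largefile.txt; } > output.txt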

Counting number of rows depending on more than 1 column condition

I have a data file like this
H1 H2 H3 E1 E2 E3 C1 C2 C3
0 0 0 0 0 0 0 0 1
1 0 0 0 1 0 0 0 1
0 1 0 0 1 0 1 0 1
Now I want to count the rows where H1,H2,H3 has the same pattern as E1,E2,E3. For example, I want to count the number of times H1,H2,H3 and E1,E2,E3 are both 010 or 000.
I tried to use this code but it doesn't really work:
awk -F "" '!($1==0 && $2==1 && $3==0 && $4==0 && $5==1 && $6==0)' file | wc -l
Something like
>>> awk '$1$2$3 == $4$5$6' input | wc -l
2
What does it do?
$1$2$3 == $4$5$6 checks whether the string formed by columns 1, 2 and 3 is equal to the string formed by columns 4, 5 and 6. When it is true, awk takes the default action of printing the entire line, and wc takes care of counting those lines.
Or, if you want a complete awk solution, you can write:
>>> awk '$1$2$3 == $4$5$6{count++} END{print count}' input
2
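One caveat not raised above: comparing concatenated strings can in principle match unintentionally if the fields were ever more than one character wide (e.g. fields 1,10,1 and 11,0,1 both concatenate to "1101"). With single 0/1 flags as in the question this is not an issue, but a field-by-field comparison avoids it entirely and gives the same count of 2 on the sample data:
awk '$1==$4 && $2==$5 && $3==$6 {count++} END {print count+0}' input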

"Finding and extracting matches with single hit" from blat output, Mac vs. linux syntax?

Problem: the output file "single_hits.txt" is blank:
cut -f10 genome_v_trans.pslx | sort | uniq -c | grep ' 1 ' | sed -e 's/ 1 /\\\</' -e 's/$/\\\>/' > single_hits.txt
I have downloaded a script written for Linux to be used on Mac OS X 10.7.5. Some changes need to be made, as it is not working. I have nine "contigs" of DNA data that need to be filtered to remove all but the unique contigs. blat is used to compare two datasets and output a .pslx file with these contigs, which worked:
964 0 0 0 0 0 3 292 + m.1 1461 0 964 3592203 ...
501 0 0 0 0 0 3 468 - m.1 1461 960 1461 5269699 ...
1168 0 0 0 1 2 7 1232 - m.7292 1170 0 1170 5233270 ...
Then this script is supposed to remove identical contigs, such as the top two (m.1).
This seems to work on the limited data you gave,
grep -v `awk '{print $10}' genome_v_trans.pslx | uniq -d` genome_v_trans.pslx
unless you want it to have <> in place of the duplicates; in that case you can substitute the duplicate entries with sed, and then do something like:
IFS=$(echo -en "\n\b") && for a in $(awk '{print $10}' genome_v_trans.pslx | uniq -d); do sed -i "s/$a/<>/g" genome_v_trans.pslx; done && unset IFS
results in:
964 0 0 0 0 0 3 292 + <> 1461 0 964 3592203 ...
501 0 0 0 0 0 3 468 - <> 1461 960 1461 5269699 ...
1168 0 0 0 1 2 7 1232 - m.7292 1170 0 1170 5233270 ...
or if you wanted that in the singlehits file:
IFS=$(echo -en "\n\b") && for a in $(awk '{print $10}' dna.txt | uniq -d); do sed "s/$a/<>/g" dna.txt >> singlehits.txt; done && unset IFS
SINGLE_TMP=/tmp/_single_tmp_$$ && awk '{if ($10 == "<>") print}' singlehits.txt > "$SINGLE_TMP" && mv "$SINGLE_TMP" singlehits.txt && unset SINGLE_TMP
or, more elegantly: sed -ni '/<>/p' singlehits.txt
singlehits.txt:
964 0 0 0 0 0 3 292 + <> 1461 0 964 3592203 ...
501 0 0 0 0 0 3 468 - <> 1461 960 1461 5269699 ...
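If the goal is just the rows whose column-10 contig name occurs exactly once (the "single hits"), a two-pass awk over the same file avoids the grep/backtick quoting issues entirely; a sketch, assuming the contig name is in field 10 as in the answer above:
awk 'NR==FNR { count[$10]++; next }    # first pass: count occurrences of each contig name
     count[$10] == 1                   # second pass: print rows whose name occurred only once
' genome_v_trans.pslx genome_v_trans.pslx > single_hits.txt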
