Awk coping when column is occasionally empty - linux

I have output from a foreach loop in the form:
ABC123603LP 44Bq AAAA
ABC123603P 3BU AAAA
ABC123603ZZP AAAA
ABC123604DP 3BU BBBB
ABC123604LP 44Bq BBBB
ABC123605AP 4q CCCC
ABC123605DP 33BGU CCCC
ABC123606AP 35Bjq DDDD
ABC123606DP 4B DDDD
From this I wish to print columns 1 and 2 to the terminal with
echo ... | awk '{print $1, $2}'
However, the third row (and others like it) prints ABC123603ZZP AAAA, because the second column is blank in this case. How do I get around this?

Check the number of fields before you print:
$ awk 'BEGIN {OFS="\t"}{ if (NF==2) print $1; else print $1, $2}' file
ABC123603LP 44Bq
ABC123603P 3BU
ABC123603ZZP
ABC123604DP 3BU
ABC123604LP 44Bq
ABC123605AP 4q
ABC123605DP 33BGU
ABC123606AP 35Bjq
ABC123606DP 4B
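Sometimes the output must always have exactly two columns, for example when it is fed into another tool. In that case, one option is to print a placeholder whenever column 2 is missing. This is a minimal sketch that assumes the last field of every row is the group label (AAAA, BBBB, ...). The function name `two_cols_padded` and the `-` placeholder are arbitrary choices for illustration:

```shell
# Hypothetical helper; reads rows on stdin.
# Rows with three fields have a real column 2. Rows with only two fields
# (ID + trailing group label) get "-" as a placeholder for column 2.
two_cols_padded() {
    awk 'BEGIN { OFS = "\t" } { print $1, (NF >= 3 ? $2 : "-") }'
}

printf '%s\n' 'ABC123603LP 44Bq AAAA' 'ABC123603ZZP AAAA' | two_cols_padded
```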

You can use sed instead:
echo ... | sed 's/\(A[^ ]*[\t ]*\)\([^ \t]*[0-9][^ \t]*\)*.*/\1 \2/'
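The sed pattern above keeps field 2 only when that field contains a digit. That holds for this sample (44Bq, 3BU, ...) but not for the trailing labels (AAAA, ...). Here is the same heuristic expressed in awk. It is a sketch that assumes the labels never contain digits, and `cols_if_digit` is a hypothetical name:

```shell
# Treat field 2 as a real column only if it contains a digit;
# otherwise the row only has the ID and the label, so print the ID alone.
cols_if_digit() {
    awk '{ if ($2 ~ /[0-9]/) print $1, $2; else print $1 }'
}

printf '%s\n' 'ABC123603LP 44Bq AAAA' 'ABC123603ZZP AAAA' | cols_if_digit
```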

One technique to remove the last column is to make awk think there is one fewer field:
awk -v OFS='\t' '{NF--; print}'
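Assigning to NF may not behave the same way in every awk implementation. A more portable sketch rebuilds the line from fields 1 through NF-1 instead. `drop_last_col` is a hypothetical helper name:

```shell
# Rebuild each line from fields 1 .. NF-1 joined with OFS, instead of
# decrementing NF. A line with a single field prints as an empty line.
drop_last_col() {
    awk -v OFS='\t' '{
        out = ""
        for (i = 1; i < NF; i++)
            out = (i == 1 ? $i : out OFS $i)
        print out
    }'
}

printf '%s\n' 'ABC123603LP 44Bq AAAA' 'ABC123603ZZP AAAA' | drop_last_col
```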

Related

How to replace rows starting with a certain character with empty lines in a text file with Linux/Unix?

I'm using Ubuntu and have a large text file where certain rows start with a < character. I'd like to replace each of these with an empty row. So from this:
eeeeee
<
<
aaaa
bbbb
cccc
<
dddddd
<
ff
I want this:
eeeeee

aaaa
bbbb
cccc

dddddd

ff
(If there are multiple consecutive < rows, ideally only one empty row is needed.)
How can I do this on the command line?
This Perl one-liner should do what you're asking for:
perl -ne 'if (/^</) {print "\n" if !$f; $f=1} else {$f=0; print}' tmp.txt
Here it is in action:
# cat tmp.txt
eeeeee
<
<
aaaa
bbbb
cccc
<
dddddd
<
ff
# perl -ne 'if (/^</) {print "\n" if !$f; $f=1} else {$f=0; print}' tmp.txt
eeeeee

aaaa
bbbb
cccc

dddddd

ff
Commented code:
# Found a line starting with '<'
if (/^</) {
    # Print a blank line if $f is false
    print "\n" if !$f;
    # Set $f to true so subsequent lines starting with '<' are ignored
    $f = 1;
} else {
    # Not a line starting with '<'; reset $f to false
    $f = 0;
    # Print the current line
    print;
}
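For comparison, the same flag logic can be sketched in awk. This version is not part of the answer above, and `collapse_lt_lines` is a hypothetical name. As with `$f` in Perl, `f` is uninitialized, and therefore false, until the first `<` line:

```shell
# f is set while inside a run of lines starting with '<', so only the
# first line of each run is replaced by an empty line.
collapse_lt_lines() {
    awk '/^</ { if (!f) print ""; f = 1; next }
         { f = 0; print }'
}

printf '%s\n' eeeeee '<' '<' aaaa bbbb cccc '<' dddddd '<' ff | collapse_lt_lines
```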
You could use sed. In this case the command would look something like:
sed '/^</c\\' file.txt
This finds each line that starts with a '<' character and replaces it with an empty line. It does not, however, collapse consecutive '<' lines into a single empty row.
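To get the collapsing behavior with sed alone, one option is to blank the `<` lines first and then squeeze each run of empty lines with the classic `/^$/N;/\n$/D` idiom. This is a sketch, not part of the answer. It also squeezes any runs of blank lines that were already in the file. `lt_to_single_blank` is a hypothetical name:

```shell
# First sed: empty every line that starts with '<'.
# Second sed: on an empty line, append the next line (N); if that one is
# empty too, delete the first line (D) and re-check the rest, so each run
# of empty lines shrinks to one. Two separate seds are needed so that
# lines pulled in by N have already been blanked.
lt_to_single_blank() {
    sed 's/^<.*//' | sed '/^$/N;/\n$/D'
}

printf '%s\n' eeeeee '<' '<' aaaa bbbb cccc '<' dddddd '<' ff | lt_to_single_blank
```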

Linux grep: how can I display lines that contain neither word1 nor word2, but still display the lines that contain both words?

I need some help with displaying all lines that contain neither word1 nor word2. Lines that contain both of them must be shown as well.
Example:
aaaa bbbb cccc
bbbb bbbb bbbb
cccc cccc cccc
dddd dddd aaaa
If word1 = aaaa and word2 = bbbb, the output should be:
aaaa bbbb cccc
cccc cccc cccc
I tried:
grep -Ewv "word1/word2" file.txt
but this only shows lines that contain neither word; it doesn't show lines containing both.
I need to do this with the grep command (I forgot to mention this).
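Grep version of "both or neither":
grep -v -P '^((?=.*aaaa)(?!.*bbbb)|(?=.*bbbb)(?!.*aaaa))' file.txt
The ^ anchor matters. Without it, grep also tries the look-aheads at later positions in the line. A line containing both words can then match: from the second character of "aaaa bbbb cccc" there is a bbbb ahead but no complete aaaa. As a result, -v would wrongly drop that line.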
But please do not use grep in this case: positive and negative look-aheads can easily lead to catastrophic backtracking.
GNU grep supports Perl-compatible regular expression (PCRE) syntax with the -P option. This is still called a "regular" expression, although it is not regular anymore. Some people are more explicit and call backtracking patterns "irregular expressions".
How it works:
(?=.*aaaa) succeeds when aaaa occurs anywhere after the current position, but it does not move the cursor, so the next assertion is checked from the same position.
(?!.*bbbb) succeeds when no bbbb follows the current position, and it does not move the cursor either.
Together they match lines that include aaaa but not bbbb.
This is one of the cases you want to exclude from your search results. The second alternative, after the or operator (|), is the other case to exclude: a bbbb without an aaaa.
With the above, you have defined what you do not want. Then use -v to invert the match and get what you want.
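Bash version of "both or neither" (it reads the lines from standard input):
#!/bin/bash
word1=${1:-aaaa}
word2=${2:-bbbb}
# IFS= and -r keep leading blanks and backslashes in each line intact
while IFS= read -r line; do
    if [[ $line =~ $word1 ]]; then
        # word1 present: print only if word2 is present too
        if [[ $line =~ $word2 ]]; then
            printf '%s\n' "$line"
        fi
    else
        # word1 absent: print only if word2 is absent too
        if [[ ! $line =~ $word2 ]]; then
            printf '%s\n' "$line"
        fi
    fi
done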
In my opinion, the simplest way (even though possibly not the fastest) is to find the lines that contain neither word and the lines that contain both words separately, and then concatenate the results. Note that the "both" lines come first, so the original line order is not preserved. For example, assume file.txt is a text file in directory test, the words are passed as shell variables for generality, and we are only looking for whole words, not word fragments:
[mathguy#localhost test]$ more file.txt
aaaa bbbb cccc
bbbb bbbb bbbb
cccc cccc cccc
dddd dddd aaaa
[mathguy#localhost test]$ word1=aaaa
[mathguy#localhost test]$ word2=bbbb
[mathguy#localhost test]$ ( grep "\b$word1\b" file.txt | grep "\b$word2\b" ; \
> grep -v "\b$word1\b" file.txt | grep -v "\b$word2\b" ) | cat
aaaa bbbb cccc
cccc cccc cccc
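Outside of grep, the same "both or neither" test can be written as a single awk filter that keeps the original line order. This sketch does not satisfy the grep-only requirement. It matches substrings (regexes) rather than whole words, and `both_or_neither` is a hypothetical function name:

```shell
# a and b are 1 if the line matches word1 / word2, else 0.
# Printing when a == b keeps lines with both words or with neither.
both_or_neither() {
    awk -v w1="$1" -v w2="$2" '{ a = ($0 ~ w1); b = ($0 ~ w2) } a == b'
}

printf '%s\n' 'aaaa bbbb cccc' 'bbbb bbbb bbbb' 'cccc cccc cccc' 'dddd dddd aaaa' |
    both_or_neither aaaa bbbb
```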

How to use awk/sed to process these two files to get the result I want

I want to use awk/sed to process the two files below (a.txt and b.txt) to get a merged result.
cat a.txt
a UK
b Japan
c China
d Korea
e US
And cat b.txt shows:
c Russia
e Canada
The result that I want is as below:
a UK
b Japan
c Russia
d Korea
e Canada
With awk:
First fill the array a with each complete row ($0), using the first column ($1) of that row as the index. Because b.txt is read after a.txt, its rows overwrite the a.txt rows that have the same key. Finally, print all elements of a with a loop.
awk '{a[$1]=$0} END{for(i in a) print a[i]}' a.txt b.txt
Output:
a UK
b Japan
c Russia
d Korea
e Canada
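`for (i in a)` visits keys in an unspecified order, so the output above is not guaranteed to come out sorted. The following sketch keeps keys in the order they first appear, and later files still override earlier ones. `merge_by_key` is a hypothetical name:

```shell
# Reads every file given as an argument. A later line with the same key
# ($1) replaces the earlier one, but the key keeps its first position.
merge_by_key() {
    awk '!($1 in row) { order[++n] = $1 }
         { row[$1] = $0 }
         END { for (i = 1; i <= n; i++) print row[order[i]] }' "$@"
}

# Usage: merge_by_key a.txt b.txt
```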
Try:
awk 'FNR==NR{A[$1]=$NF;next} {printf("%s %s\n",$1,$1 in A?A[$1]:$NF)}' b.txt a.txt
The condition FNR==NR is true only while the first file (b.txt) is being read. For those lines, the command stores the last column in an array named A, indexed by $1, and skips to the next line. For a.txt, printf prints two strings. The first is $1. The second is A[$1] if $1 is a key in A, or otherwise the last column of a.txt itself.
EDIT: since the OP's input files contained carriage-return characters (\r), remove them first, for example:
tr -d '\r' < b.txt > temp_b.txt && mv temp_b.txt b.txt
You can use the below one-liner:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt ) | sed 's/ --$//' | awk -F ' -- ' '{print $NF}'
We use awk to prefix each line in b.txt with a key and -- to give us a split point later:
<( awk '{print $1, "--", $0, "--"}' < b.txt )
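Use the join command to join the files on common keys. The -a 1 and -a 2 options tell join to also print unpairable lines from the first and second file, respectively. join expects both inputs to be sorted on the join field, which they are here: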
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt )
Use sed to remove the trailing -- left at the end of some lines:
sed 's/ --$//'
Use awk to print the last item on each line:
awk -F ' -- ' '{print $NF}'
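If every value after the key is a single word, a shorter join pipeline is possible. This sketch works under that assumption. It also relies on both files being sorted on column 1, as join requires, and it drops keys that appear only in b.txt. `merge_with_join` is a hypothetical name:

```shell
# join -a 1 also prints a.txt lines whose key has no match in b.txt.
# Matched lines look like "c China Russia", so $NF is b.txt's value;
# unmatched lines look like "a UK", so $NF is a.txt's own value.
merge_with_join() {
    join -a 1 "$1" "$2" | awk '{ print $1, $NF }'
}

# Usage: merge_with_join a.txt b.txt
```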
$ awk 'NR==FNR{b[$1]=$2;next} {print $1, ($1 in b ? b[$1] : $2)}' b.txt a.txt
a UK
b Japan
c Russia
d Korea
e Canada

awk: how to print the rest of the line

My file contains lines like this:
any1 aaa bbb ccc
The delimiter is a space, and the number of words per line is unknown.
I want to put the first word into var1. That's simple with
awk '{print $1}'
Now I want to put the rest of the line into var2 with awk.
How can I print the rest of the line with awk?
Better to use read here:
s="any1 aaa bbb ccc"
read var1 var2 <<< "$s"
echo "$var1"
any1
echo "$var2"
aaa bbb ccc
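`<<<` (a here-string) is a bash feature. In a plain POSIX shell, parameter expansion can do the same split. This sketch assumes the words are separated by single spaces. If the line contains no space at all, var2 ends up equal to the whole line:

```shell
s='any1 aaa bbb ccc'
# ${s%% *}: remove the longest suffix that starts with a space -> first word
var1=${s%% *}
# ${s#* }: remove the shortest prefix that ends with a space -> the rest
var2=${s#* }
printf '%s\n' "$var1" "$var2"
```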
For an awk-only solution, use:
echo "$s" | awk '{print $1; print substr($0, index($0, " ")+1)}'
any1
aaa bbb ccc
$ var=$(awk '{sub(/^[^[:space:]]+[[:space:]]+/,"")}1' file)
$ echo "$var"
aaa bbb ccc
Or, in general, to skip some number of fields, use an RE interval:
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1' file
aaa bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' file
bbb ccc
$ awk '{sub(/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}/,"")}1' file
ccc
Note that this gets much more complicated if you have an FS that is more than a single character. The above only works for the default FS, because it additionally skips any leading blanks. Remove the first [[:space:]]* if you have a non-default but still single-character FS.
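Some older awk implementations do not support RE intervals such as `{2}`. This sketch avoids them by stripping one field per loop iteration. It assumes fields are separated by spaces or tabs, and `skip_fields` is a hypothetical name:

```shell
# $1 = number of leading fields to skip. Leading blanks are removed
# first; each loop iteration then removes one field plus the blanks
# after it, leaving the spacing inside the rest of the line untouched.
skip_fields() {
    awk -v n="$1" '{
        s = $0
        sub(/^[ \t]+/, "", s)
        for (i = 0; i < n; i++)
            sub(/^[^ \t]+[ \t]*/, "", s)
        print s
    }'
}

printf '%s\n' 'any1 aaa bbb ccc' | skip_fields 2
```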
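awk solution. Setting $1 to an empty string rebuilds $0 with OFS, which leaves a leading separator (removed here with sub) and collapses runs of blanks into single spaces:
awk '{$1 = ""; sub(/^ /, ""); print}'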

In Linux, replace \n with \n\n in a text file

Let's say we have this simple text:
aaaa
bbbb
cccc
Output:
aaaa

bbbb

cccc
Any ideas?
Why not just this awk command:
awk '{print $0"\n"}' file
aaaa

bbbb

cccc
Or this:
awk 1 ORS="\n\n" file
aaaa

bbbb

cccc
Or this:
awk '$0=$0"\n"' file
aaaa

bbbb

cccc
You could try the following awk command:
$ awk -v ORS="\n\n" '{print}' file
aaaa

bbbb

cccc
If you only want blank lines between lines, without an extra blank line at the end:
awk 'NR > 1 { printf ORS } 1' file
Or
awk 'NR > 1 { $0 = ORS $0 } 1' file
This might work for you (GNU sed):
sed G file
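`sed G` appends a newline plus the (empty) hold space to every line, including the last, so it also leaves a trailing blank line. Restricting `G` to every line except the last avoids that. A minimal sketch, where `double_space` is a hypothetical name:

```shell
# "$!" applies G to every line except the last one.
double_space() {
    sed '$!G'
}

printf '%s\n' aaaa bbbb cccc | double_space
```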
The pure bash solution would be:
while IFS= read -r line; do printf '%s\n\n' "$line"; done < file
