how to use awk/sed to deal with these two files to get a result that I want - linux

I want to use awk/sed to combine the two files below (a.txt and b.txt).
cat a.txt
a UK
b Japan
c China
d Korea
e US
cat b.txt
c Russia
e Canada
The result that I want is as below:
a UK
b Japan
c Russia
d Korea
e Canada

With awk:
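First, fill the array/hash a with each complete row ($0), using the first column ($1) of that row as the index, so a row from the second file overwrites the row with the same key from the first file. Finally, print all elements of a with a loop. Note that the order in which for (i in a) visits the elements is unspecified, so pipe the output through sort if the order matters.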
awk '{a[$1]=$0} END{for(i in a) print a[i]}' file1 file2
Output:
a UK
b Japan
c Russia
d Korea
e Canada

try:
awk 'FNR==NR{A[$1]=$NF;next} {printf("%s %s\n",$1,$1 in A?A[$1]:$NF)}' b.txt a.txt
The condition FNR==NR is true only while the first file (b.txt) is being read. During that pass, the array A is filled with $1 as the index and the last column as the value. For each line of a.txt, printf then prints two strings: first $1, then A[$1] if $1 is present in A, or otherwise the last column of a.txt itself.
EDIT: since the OP had carriage return characters in the input files, remove them first with the following:
tr -d '\r' < b.txt > temp_b.txt && mv temp_b.txt b.txt
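If rewriting b.txt in place is not desirable, the carriage returns can also be stripped inside awk itself. A minimal sketch, assuming b.txt has DOS line endings; the temporary directory, the array name repl and the sample data are illustrative only:

```shell
# Illustrative sample data; b.txt is deliberately written with CRLF line endings.
dir=$(mktemp -d)
printf '%s\n' 'a UK' 'b Japan' 'c China' 'd Korea' 'e US' > "$dir/a.txt"
printf '%s\r\n' 'c Russia' 'e Canada' > "$dir/b.txt"

# sub(/\r$/, "") drops a trailing carriage return from each line before the
# fields are used, so the replacement values do not carry a stray \r into the output.
merged=$(awk '{ sub(/\r$/, "") }
              FNR == NR { repl[$1] = $2; next }
              { print $1, ($1 in repl ? repl[$1] : $2) }' "$dir/b.txt" "$dir/a.txt")

printf '%s\n' "$merged"
rm -rf "$dir"
```

This leaves both input files untouched, at the cost of one extra sub() call per line.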

You can use the below one-liner:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt ) | sed 's/ --$//' | awk -F ' -- ' '{print $NF}'
We use awk to prefix each line in b.txt with a key and -- to give us a split point later:
<( awk '{print $1, "--", $0, "--"}' < b.txt )
Use the join command to join the files on common keys. The -a 1 and -a 2 options tell join to also print the unpairable lines from the first and second file, respectively:
join -a 1 -a 2 a.txt <( awk '{print $1, "--", $0, "--"}' < b.txt )
Use sed to remove the -- parts left at the end of some lines:
sed 's/ --$//'
Use awk to print the last item on each line:
awk -F ' -- ' '{print $NF}'

$ awk 'NR==FNR{b[$1]=$2;next} {print $1, ($1 in b ? b[$1] : $2)}' b.txt a.txt
a UK
b Japan
c Russia
d Korea
e Canada

Related

Print lines not containing a period linux

I have a file with thousands of rows. I want to print the rows which do not contain a period.
awk '{print$2}' file.txt | head
I have used this to print the column I am interested in, column 2 (The file only has two columns).
I have removed the head and then did
awk '{print$2}' file.txt | grep -v "." | head
But I only get blank lines not any actual values which is expected, I think it has included the spaces between the rows but I am not sure.
Is there an alternative command?
As suggested by Jim, I did-
awk '{print$2}' file.txt | grep -v "\." | head
However the number of lines is greater than before, is this expected? Also, my output is a list of numbers but with spaces in between them (Vertical), is this normal?
file.txt example below-
120.4 3
270.3 7.9
400.8 3.9
200.2 4
100.2 8.7
300.2 3.4
102.3 6
49.0 2.3
38.0 1.2
So the expected (and correct) output would be 3 lines, as there are 3 values in column 2 without the period:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
However, when running the code as above, I instead get 5 lines, which I think also counts the blank lines between the rows:
$ awk '{print$2}' file.txt | grep -v "\." | head
3
4
6
You seldom need to use grep if you're already using awk
This would print the second column on each line where that second column doesn't contain a dot:
awk '$2 !~ /\./ {print $2}'
But you also wanted to skip empty lines, or rather lines where the second column is empty. So just test for that, too:
awk '$2 != "" && $2 !~ /\./ {print $2}'
(A more amusing version would be awk '$2 ~ /./ && $2 !~ /\./ {print $2}' )
As you said, grep -v "." gives you only blank lines. That's because the dot means "any character", and with -v, the only lines printed are those that don't contain, well, any characters.
grep is interpreting the dot as a regex metacharacter (the dot will match any single character). Try escaping it with a backslash:
awk '{print$2}' file.txt | grep -v "\." | head
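The difference is easy to check on a few values. A small sketch using column-2 values like those in the question; the two empty lines are made-up stand-ins for empty rows in the real file, and grep -F is shown as an alternative to escaping because it treats the pattern as a literal string:

```shell
# Hypothetical column-2 values, including two empty lines.
values=$(printf '%s\n' '3' '7.9' '' '4' '8.7' '' '6')

# Unescaped dot: every non-empty line matches, so -v leaves only the empty lines.
unescaped=$(printf '%s\n' "$values" | grep -v '.' | grep -c '')

# Escaped dot, or -F for a fixed string: only lines with a literal period are dropped.
escaped=$(printf '%s\n' "$values" | grep -v '\.')
fixed=$(printf '%s\n' "$values" | grep -vF '.')

echo "lines left by the unescaped pattern: $unescaped"
printf '%s\n' "$escaped"
```

The empty lines survive both the escaped and the -F variant, which would explain blank rows showing up between the numbers.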
If I understand well, you can try this sed
sed ':A;N;${s/.*/&\n/};/\n$/!bA;s/\n/ /g;s/\([^ ]*\.[^ ]* \)//g' file.txt
output
3
4
6

Linux command (Calculating the sum)

I have a .txt file with the following content:
a 3
a 4
a 5
a 6
b 1
b 3
b 5
c 9
c 10
I am wondering if there is any command (no awk if possible) that can read the .txt file and give the following output (Sorted by the second column):
c 19
a 18
b 9
You can use awk piped to sort:
awk '{sums[$1] += $2} END {for (i in sums) print i, sums[i]}' file | sort -rnk2
c 19
a 18
b 9
sums[$1] += $2 is adding value of $2 in an array sums that is indexed by field #1 ($1).
sort -rnk2 is reverse sorting numerically output of awk on field 2
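Because the order of for (i in sums) is unspecified in awk, the final sort is what fixes the output order. A sketch on the question's data; the second sort key on the name is an assumption added here about how equal sums should be ordered, and the temporary file is illustrative:

```shell
dir=$(mktemp -d)
printf '%s\n' 'a 3' 'a 4' 'a 5' 'a 6' 'b 1' 'b 3' 'b 5' 'c 9' 'c 10' > "$dir/file"

# Sum column 2 per key, then sort by the sum (numeric, descending) and by key for ties.
totals=$(awk '{ sums[$1] += $2 } END { for (k in sums) print k, sums[k] }' "$dir/file" |
         sort -k2,2nr -k1,1)

printf '%s\n' "$totals"
rm -rf "$dir"
```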
You can use this code:
cat 1.txt | awk '{arr[$1]+=$2}END{for (var in arr) print var, arr[var]}' | sort -rnk 2
Explanation:
cat 1.txt - reads the content of the file 1.txt
awk - a language that is very useful for data manipulation
{arr[$1]+=$2} - for each line, adds the value of the second field to the array item whose key is the first field. The default field separator is whitespace.
END{for (var in arr) print var, arr[var]} - after all lines have been processed, prints the array content
sort -rnk 2 - reverse numeric sort on field 2
Non-awk solutions.
perl
perl -lane '
$sum{$F[0]} += $F[1]
} END {
$, = " ";
print $_, $sum{$_} for reverse sort {$sum{$a} <=> $sum{$b}} keys %sum
' file.txt
bash version 4
declare -A sum
while read key val; do (( sum[$key] += $val )); done < file.txt
for key in "${!sum[@]}"; do echo "$key ${sum[$key]}"; done | sort -rn -k2
non-awk challenge accepted
vars=$(cut -d" " -f1 nums | uniq); paste <(echo "$vars") <(cat <(sed -e 's/ /+=/' nums) <(echo "$vars" | sed 's/$/;/') | bc) | sort -k2,2nr
c 19
a 18
b 9

Two text file comparison with grep

I have two files (a.txt, b.txt)
a.txt is a list of English words (one word in every row)
b.txt contains in every row: a number, a space character, a 5-65 char long string
(for example b.txt can contain: 1234 dsafaaraehawada)
I would like to know which row in b.txt contains words from a.txt and how many of them?
Example input:
a.txt
green
apple
bar
b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
output:
2 1212 greensdsdappleded
1 123 bardws
First row contains 'green' and 'apple' (2)
Second row contains nothing.
Third row contains 'bar' (1)
That's all I would like to know.
The code (By Mr. Barmar):
grep -F -o -f a.txt b.txt | sort | uniq -c | sort -nr
But it needs to be modified.
Try something like this:
awk 'NR==FNR{A[$1]; next} {t=0; for (i in A) t+=gsub(i,"&",$2)} t{print t, $0}' file1 file2
Try something like this:
awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
Test:
$ cat a.txt
green
apple
bar
$ cat b.txt
1212 greensdsdappleded
12124 dfsfsd
123 bardws
$ awk '
NR==FNR { list[$1]++; next }
{
cnt=0
for(word in list) {
if(index($2,word) > 0)
cnt++
}
if(cnt>0)
print cnt,$0
}' a.txt b.txt
2 1212 greensdsdappleded
1 123 bardws
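The index() version counts each word at most once per row, while the gsub() version counts every occurrence but treats the words as regular expressions. If every literal occurrence should count, a sketch along these lines may help; the third row of b.txt is changed to barbardws here purely to illustrate a repeated word, and the variable names rest and pos are made up:

```shell
dir=$(mktemp -d)
printf '%s\n' 'green' 'apple' 'bar' > "$dir/a.txt"
printf '%s\n' '1212 greensdsdappleded' '12124 dfsfsd' '123 barbardws' > "$dir/b.txt"

counts=$(awk '
    NR == FNR { if ($1 != "") list[$1]; next }   # skip empty words
    {
        cnt = 0
        for (word in list) {
            rest = $2
            # Walk through the string, counting each literal occurrence of word.
            while ((pos = index(rest, word)) > 0) {
                cnt++
                rest = substr(rest, pos + length(word))
            }
        }
        if (cnt > 0) print cnt, $0
    }' "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$counts"
rm -rf "$dir"
```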

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, i want to find the line number of the last appearance of b which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
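A slightly shorter variant of the same idea lets grep do the numbering. With GNU cat, the number printed by cat -n is padded with leading spaces, so the cut -f 1 result keeps that padding; grep -n prints a bare number followed by a colon instead. A sketch on the question's data, with an illustrative temporary file:

```shell
dir=$(mktemp -d)
printf '%s\n' '1 a 1' '2 a 2' '3 a 3' '1 b 1' '2 b 2' '3 b 3' \
              '1 c 1' '2 c 2' '3 c 3' > "$dir/textfile.txt"

# grep -n prefixes each match with "LINE:"; keep the last match and cut out the number.
last=$(grep -n ' b ' "$dir/textfile.txt" | tail -n 1 | cut -d: -f1)

echo "$last"
rm -rf "$dir"
```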
Or you can use Perl if you wish (It's very similar to what you'd do in awk, but frankly, I personally don't ever use awk if I have Perl handy - Perl supports 100% of what awk can do, by design, as 1-liners - YMMV):
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check whether the second field is "b" and record the line number. The value is overwritten on every match, so by the time we finish reading the file, it holds the last one.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O's advice:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid having some abc in 2nd field.
This shorter version is also valid (I keep the one above because it is the one I thought of first :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach if $2 is always grouped (may be more efficient than waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file
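Both early-exit commands print only when a line follows the group, so they print nothing if the group runs to the end of the file. A sketch that keeps the early exit but reports the result from END, which awk also runs after exit; the function name last_line_of_group and its arguments are illustrative, and the input is still assumed to be grouped:

```shell
# last_line_of_group KEY FILE: prints the line number of the last row whose
# second field equals KEY, assuming rows with the same KEY are adjacent.
last_line_of_group() {
    awk -v key="$1" '
        $2 == key { seen = 1; last = NR; next }
        seen      { exit }                      # first line past the group: stop reading
        END       { if (seen) print last }      # also reached after exit
    ' "$2"
}

dir=$(mktemp -d)
printf '%s\n' '1 a 1' '2 a 2' '3 a 3' '1 b 1' '2 b 2' '3 b 3' \
              '1 c 1' '2 c 2' '3 c 3' > "$dir/file"
last_b=$(last_line_of_group b "$dir/file")
last_c=$(last_line_of_group c "$dir/file")
echo "$last_b $last_c"
rm -rf "$dir"
```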

Linux shell script read columns into variable and then add the attribute

I have a file test.txt looking like this:
2092 Mary
103 Tom
1239 Mary
204 Mark
1294 Tom
1092 Mary
I am trying to create a shell script that will
Read each line and put the data in two columns into variable var1 and var2
If var2 in each line is the same, then add the var1 in those lines.
output the file into a text file.
The result should be unique values in the var2 column. Here's what I have so far:
#!/bin/sh
#!/usr/bin/sh
cat test.txt| while read line;
do
$var1=$(echo $line| awk -F\; '{print $1}')
$var2=$(echo $line| awk -F\; '{print $2}')
How can I reference the variable in each line and then compare them?
The expected output would be:
4423 Mary
1397 Tom
204 Mark
Using awk it is easy:
awk '{sum[$2] += $1} END {for (i in sum) printf "%4d %s\n", sum[i], i; }'
If you want to do it with bash 4.x (not 3.x), then:
declare -A sum
while read number name
do
((sum[$name] += $number))
done
for name in "${!sum[@]}"
do
echo ${sum[$name]} $name
done
The structure here is essentially isomorphic with the awk script, but a little less notationally convenient. It will read from standard input, using the names as indexes into the associative array sum. The ${!sum[@]} notation is described in the Shell Parameter Expansion section of the manual, and not even hinted at in the section on Arrays. The information is there if you know where to look.
If you want to process an arbitrary number of input files (like the awk script would) then you need to use cat to collect the data:
cat "$@" |
{
declare -A sum
while read number name
do
((sum[$name] += $number))
done
for name in "${!sum[@]}"
do
echo ${sum[$name]} $name
done
}
This is not UUOC because it handles no arguments (read standard input), one argument or many arguments.
For all the scripts, if you want to sort the output in number or name order, apply an appropriate sort to the output of the script:
script file1 file2 file3 | sort -k 1,1n # By sum increasing order
script file1 file2 file3 | sort -k 1,1nr # By sum decreasing order
script file1 file2 file3 | sort -k 2,2 # By name increasing order
script file1 file2 file3 | sort -k 2,2r # By name decreasing order
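For a plain POSIX sh without associative arrays, the same result can be approximated by sorting on the name first and summing each run of equal names. A sketch under that assumption, keeping the question's var1 and var2 variable names; the function name sum_by_name and the temporary file are hypothetical:

```shell
sum_by_name() {
    # Sort by name so that equal names are adjacent, then total each run.
    sort -k2,2 "$1" | {
        prev='' total=0
        while read -r var1 var2; do
            if [ -n "$prev" ] && [ "$var2" != "$prev" ]; then
                echo "$total $prev"     # name changed: emit the finished total
                total=0
            fi
            prev=$var2
            total=$((total + var1))
        done
        if [ -n "$prev" ]; then
            echo "$total $prev"         # emit the total of the last run
        fi
    }
}

dir=$(mktemp -d)
printf '%s\n' '2092 Mary' '103 Tom' '1239 Mary' '204 Mark' '1294 Tom' '1092 Mary' > "$dir/test.txt"
result=$(sum_by_name "$dir/test.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

The output comes out in name order; piping it through sort -k 1,1nr gives the decreasing-sum order shown in the question.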
