Use a file to extract specified rows from another file - linux

input1:
1 s1
100 s100
90 s90
input2:
a 1
b 3
c 7
d 100
e 101
f 90
Output:
a 1
d 100
f 90
I know join can do this, but (1) it needs the common fields to be sorted, and (2) after the join I need to remove the second column that came from input1. Does anyone have a better solution?

Here's one way using awk:
awk 'FNR==NR { a[$1]; next } $2 in a' file1 file2
Results:
a 1
d 100
f 90

This might work for you (GNU sed):
sed -r 's|(\S+).*|/\\<\1$/p|' input1 | sed -nf - input2

Depending on your requirements, grep might do:
grep -wFf <(cut -d' ' -f1 input1) input2
Output:
a 1
d 100
f 90
Note that grep is not column-aware and will happily match where it can.
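To illustrate that caveat, here is a small sketch that wraps the grep and awk approaches in two functions. The names grep_select and awk_select are made up for this example, and reading the pattern list from standard input with -f - is assumed to be supported by your grep (GNU grep does). When a key from input1 also shows up in the first column of input2, grep keeps that line, while the awk version compares the second column only.

```shell
# Hypothetical helper names; $1 is the key file, $2 is the file to filter.

# grep matches a key as a whole word anywhere on the line.
grep_select() {
    cut -d' ' -f1 "$1" | grep -wFf - "$2"
}

# awk compares the keys against the second column only.
awk_select() {
    awk 'FNR==NR { keys[$1]; next } $2 in keys' "$1" "$2"
}
```

With a line such as "90 5" in input2, grep_select prints it because 90 is one of the keys, whereas awk_select skips it because its second column is 5.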

As far as I know, awk is the better solution for this, but since that has already been provided, below is a Perl solution.
> perl -F -lane '$H{$F[0]}=$F[1];END{%T=reverse(%H);foreach (values %H){if(exists($H{$_})){print $T{$_}." ".$_;}}}' file1 file2
a 1
d 100
f 90

Related

If first two columns are equal, select top 3 based on descending order of 3rd column

I want to select top 3 results for every line that has the same first two column.
For example the data will look like,
cat data.txt
A A 10
A A 1
A A 2
A A 5
A A 8
A B 1
A B 2
A C 6
A C 5
A C 10
A C 1
B A 1
B A 1
B A 2
B A 8
And for the result I want
A A 10
A A 8
A A 5
A B 2
A B 1
A C 10
A C 6
A C 5
B A 8
B A 2
B A 1
Note that some of the "groups" do not contain 3 rows.
I have tried
sort -k1,1 -k2,2 -k3,3nr data.txt | sort -u -k1,1 -k2,2 > 1.txt
comm -23 <(sort data.txt) <(sort 1.txt)| sort -k1,1 -k2,2 -k3,3nr| sort -u -k1,1 -k2,2 > 2.txt
comm -23 <(sort data.txt) <(cat 1.txt 2.txt | sort)| sort -k1,1 -k2,2 -k3,3nr| sort -u -k1,1 -k2,2 > 3.txt
It seems to work, but since I am learning to code better, I was wondering if there is a better way to go about this. Also, my code generates many files that I then have to delete.
You can do:
$ sort -k1,1 -k2,2 -k3,3nr file | awk 'a[$1,$2]++<3'
A A 10
A A 8
A A 5
A B 2
A B 1
A C 10
A C 6
A C 5
B A 8
B A 2
B A 1
Explanation:
There are two key items to understanding the awk program: associative arrays and fields.
If you reference an awk array element that does not exist yet, it is created empty, ready for anything you put into it. You can use that as a counter.
You state "If first two columns are equal..."
The sort puts the file in the desired order. The expression a[$1,$2] uses the values of the first two fields as a unique key into an associative array.
You then state "...select top 3 based on descending order of 3rd column..."
Once again, the sort puts the file into the desired order, and the expression a[$1,$2]++ counts the lines for each key. Now just count up to three.
awk is organized into blocks of condition {action}. The condition a[$1,$2]++<3 is true for the first three lines with a given pair of values and false for every later one.
A wordier version of the program would be:
awk 'a[$1,$2]++<3 {print $0}'
But the default action if the condition is true is to print $0 so it is not needed.
If you are processing text in Unix, you should get to know awk. It is the most powerful tool that POSIX guarantees you will have, and is commonly used for these tasks.
A great place to start is the online book Effective AWK Programming by Arnold D. Robbins.
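The counting idea described above can be sketched as a small function that takes the number of rows to keep as an argument. The name top_n and its argument order are made up for this example. It is the same sort followed by a per-key counter, just written out with the increment and the test on separate lines.

```shell
# top_n is a hypothetical helper: print the N largest rows (by column 3)
# for each distinct pair of values in columns 1 and 2.
# Usage: top_n N FILE
top_n() {
    sort -k1,1 -k2,2 -k3,3nr "$2" |
    awk -v n="$1" '
        {
            seen[$1, $2]++            # rows seen so far for this pair
        }
        seen[$1, $2] <= n             # no action given, so awk prints the row
    '
}
```

For example, top_n 3 data.txt should give the same result as the one-liner, and top_n 1 data.txt keeps only the largest row of each group.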
@Dawg has the best answer. This one will be a little lighter on memory, which probably won't be a concern for your data:
sort -k1,2 -k3,3nr file |
awk '
{key = $1 FS $2}
prev != key {prev = key; count = 1}
count <= 3 {print; count++}
'
You can sort the file by the first two columns, then by the third column numerically in descending order, and then read the output and print only the first three lines for each combination of the first two columns.
sort -k1,2 -k3,3rn data.txt \
| while read c1 c2 n ; do
    if [[ $c1 == $l1 && $c2 == $l2 ]] ; then
        ((c++))
    else
        c=0
    fi
    if (( c < 3 )) ; then
        echo "$c1 $c2 $n"
        l1=$c1
        l2=$c2
    fi
done

Joining a pair of lines with specific starting points

I know that with sed I can join consecutive pairs of lines:
cat current.txt | sed 'N;s/\n/,/' > new.txt
This turns
A
B
C
D
E
F
to
A,B
C,D
E,F
What I would like to do is following:
A
B
C
D
E
F
to
A,D
B,E
C,F
I'd like to join 1 with 4, 2 with 5, 3 with 6 and so on.
Is this possible with sed? Any idea how it could be achieved?
Thank you.
Try printing in columns:
pr -s, -t -2 current.txt
This is longer than I was hoping, but:
$ lc=$(( $(wc -l < current.txt) / 2 ))
$ paste <(head -"$lc" current.txt) <(tail -"$lc" current.txt) | column -t -o,
The variable lc stores the number of lines in current.txt divided by two. Then head and tail are used to print lc first and lc last lines, respectively (i.e. the first and second half of the file); then paste is used to put the two together and column changes tabs to commas.
An awk version
awk '{a[NR]=$0} NR>3 {print a[NR-3]","$0}' current.txt
A,D
B,E
C,F
This solution is easy to adjust if you want a different interval.
Just change NR>3 and NR-3 to the desired number.
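Note that the awk command above pairs every line with the one three lines before it, so it gives the wanted output only when the file has exactly six lines. Here is a sketch that works the offset out from the line count instead. The function name join_halves is made up for this example, and it assumes the file has an even number of lines and fits in memory.

```shell
# join_halves is a hypothetical helper: pair line i with line i + N/2,
# where N is the total number of lines (assumed to be even).
join_halves() {
    awk '
        { line[NR] = $0 }                 # remember every line
        END {
            half = int(NR / 2)
            for (i = 1; i <= half; i++)
                print line[i] "," line[i + half]
        }
    ' "$1"
}
```

Calling join_halves current.txt on the six-line example should print A,D then B,E then C,F.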

Making horizontal String vertical shell or awk

I have a string
ABCDEFGHIJ
I would like it to print:
A
B
C
D
E
F
G
H
I
J
i.e. go from horizontal, with nothing between the characters, to vertical. Bonus points for how to put a number next to each one with a single line. It'd be nice if this were an awk or shell script, but I am open to learning new things. :) Thanks!
If you just want to convert a string to one-char-per-line, you just need to tell awk that each input character is a separate field and that each output field should be separated by a newline and then recompile each record by assigning a field to itself:
awk -v FS= -v OFS='\n' '{$1=$1}1'
e.g.:
$ echo "ABCDEFGHIJ" | awk -v FS= -v OFS='\n' '{$1=$1}1'
A
B
C
D
E
F
G
H
I
J
and if you want field numbers next to each character, see @Kent's solution or pipe to cat -n.
The sed solution you posted is non-portable and will fail with some seds on some OSs. It also adds an undesirable blank line to the end of the output, which then becomes a trailing line number after the pipe to cat -n, so it is not a good alternative. You should accept @Kent's answer.
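The awk command in this answer relies on an empty FS splitting the record into single characters. As far as I know, POSIX leaves that behaviour unspecified, although gawk and several other awks support it. A sketch that avoids it by walking the line with substr, which every POSIX awk should have (the function name to_vertical is made up for this example):

```shell
# to_vertical is a hypothetical helper: print each character of every
# input line on its own line, prefixed with its position.
to_vertical() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            print i, substr($0, i, 1)
    }'
}
```

Use it as echo "ABCDEFGHIJ" | to_vertical. Drop the i, from the print statement if you do not want the numbers.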
awk one-liner:
awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)print i,$i}'
test :
kent$ echo "ABCDEF"|awk 'BEGIN{FS=""}{for(i=1;i<=NF;i++)print i,$i}'
1 A
2 B
3 C
4 D
5 E
6 F
So I figured this one out on my own with sed.
sed 's/./&\n/g' horiz.txt > vert.txt
One more awk
echo "ABCDEFGHIJ" | awk '{gsub(/./,"&\n")}1'
A
B
C
D
E
F
G
H
I
J
This might work for you (GNU sed):
sed 's/\B/\n/g' <<<ABCDEFGHIJ
for line numbers:
sed 's/\B/\n/g' <<<ABCDEFGHIJ | sed = | sed 'N;y/\n/ /'
or:
sed 's/\B/\n/g' <<<ABCDEFGHIJ | cat -n

linux command to get the last appearance of a string in a text file

I want to find the last appearance of a string in a text file with linux commands. For example
1 a 1
2 a 2
3 a 3
1 b 1
2 b 2
3 b 3
1 c 1
2 c 2
3 c 3
In such a text file, I want to find the line number of the last appearance of b, which is 6.
I can find the first appearance with
awk '/ b / {print NR;exit}' textFile.txt
but I have no idea how to do it for the last occurrence.
cat -n textfile.txt | grep " b " | tail -1 | cut -f 1
cat -n prints the file to STDOUT prepending line numbers.
grep greps out all lines containing "b" (you can use egrep for more advanced patterns or fgrep for faster grep of fixed strings)
tail -1 prints last line of those lines containing "b"
cut -f 1 prints first column, which is line # from cat -n
Or you can use Perl if you wish (It's very similar to what you'd do in awk, but frankly, I personally don't ever use awk if I have Perl handy - Perl supports 100% of what awk can do, by design, as 1-liners - YMMV):
perl -ne '{$n=$. if / b /} END {print "$n\n"}' textfile.txt
This can work:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
We check whether the second field of each line matches "b" and, if so, record the line number. The variable is overwritten on every match, so by the time we finish reading the file it holds the last one.
Test:
$ awk '{if ($2~"b") a=NR} END{print a}' your_file
6
Update based on sudo_O's advice:
$ awk '{if ($2=="b") a=NR} END{print a}' your_file
to avoid matching something like abc in the 2nd field.
This one is also valid (it is shorter, but I keep the one above because it is the one I thought of first :D):
$ awk '$2=="b" {a=NR} END{print a}' your_file
Another approach if the rows with the same $2 are always grouped together (it may be more efficient than waiting until the end):
awk 'NR==1||$2=="b",$2=="b"{next} {print NR-1; exit}' file
or
awk '$2=="b"{f=1} f==1 && $2!="b" {print NR-1; exit}' file
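One thing to watch with the two early-exit commands above: if the b rows run to the end of the file, no following line triggers the print, so nothing is printed. Here is a sketch that keeps the early exit but reports the result from an END block, so both cases are covered. The function name last_in_group is made up for this example, and it still assumes that rows with the same second field are contiguous.

```shell
# last_in_group is a hypothetical helper. It assumes all rows whose second
# field equals KEY are contiguous. Usage: last_in_group KEY FILE
last_in_group() {
    awk -v key="$1" '
        $2 == key         { last = NR; next }     # still inside the group
        last              { exit }                # group just ended: stop reading
        END               { if (last) print last }
    ' "$2"
}
```

An exit in a normal rule still runs the END block, which is why the print can live there. If the key never occurs, nothing is printed.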

In a *nix environment, how would I group columns together?

I have the following text file:
A,B,C
A,B,C
A,B,C
Is there a way, using standard *nix tools (cut, grep, awk, sed, etc), to process such a text file and get the following output:
A
A
A
B
B
B
C
C
C
You can do:
tr , \\n
and that will generate
A
B
C
A
B
C
A
B
C
which you could sort.
Unless you want to pull the first column then second then third, in which case you want something like:
awk -F, '{for(i=1;i<=NF;++i) print i, $i}' | sort -s -n -k1,1 | awk '{print $2}'
To explain this, the first part generates
1 A
2 B
3 C
1 A
2 B
3 C
1 A
2 B
3 C
the second part will stably sort (so the internal order is preserved)
1 A
1 A
1 A
2 B
2 B
2 B
3 C
3 C
3 C
and the third part will strip the numbers
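The three steps above can also be folded into a single awk pass that collects each column as it goes and prints the columns one after another at the end. This is only a sketch: the function name columns_first is made up for this example, and it assumes the input fits in memory.

```shell
# columns_first is a hypothetical helper: print all values of column 1,
# then all values of column 2, and so on, for comma-separated input.
columns_first() {
    awk -F, '
        {
            for (i = 1; i <= NF; i++)
                col[i] = col[i] $i "\n"     # append this value to its column
            if (NF > max) max = NF
        }
        END {
            for (i = 1; i <= max; i++)
                printf "%s", col[i]
        }
    ' "$1"
}
```

Because values are appended in the order they are read, the original row order within each column is preserved without needing a stable sort.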
You could use a shell for-loop combined with cut if you know in advance the number of columns. Here is an example using bash syntax:
for i in {1..3}; do
cut -d, -f $i file.txt
done
Try:
awk 'BEGIN {FS=","} /([A-C],)+([A-C])?/ {for (i=1;i<=NF;i++) print $i}' YOURFILE | sort
