How to sort a file according to another file? - Linux

Is there a Unix one-liner or some other quick way on Linux to sort a file according to the permutation produced by sorting another file?
i.e.:
file1: (separated by CRLFs, not spaces)
2
3
7
4
file2:
a
b
c
d
sorted file1:
2
3
4
7
So the result of this one-liner should be
sorted file2:
a
b
d
c

paste file1 file2 | sort | cut -f2
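Plain sort compares the pasted lines lexically, which happens to work for the single-digit keys above; if the keys in file1 can be multi-digit numbers, sort numerically so that, e.g., 10 does not land before 2:
paste file1 file2 | sort -n | cut -f2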

Below is a Perl one-liner that will print the contents of file2 based on the sorted input of file1.
perl -n -e 'BEGIN{our($x,$t,@a)=(0,1,)}if($t){$a[$.-1]=$_}else{$a[$.-1].=$_ unless($.>$x)};if(eof){$t=0;$x=$.;close ARGV};END{foreach(sort @a){($j,$l)=split(/\n/,$_,2);print qq($l)}}' file1 file2
Note: If the files are different lengths, the output will only print up to the shortest file length.
For example, if file-A has 5 lines and file-B has 8 lines then the output will only be 5 lines.

Related

grep matches counted per keyword?

I am using
$ grep -cf keyword_file Folder1/*
to generate a count of how many lines in the files within Folder1 match a keyword from keyword_file.
This generates an aggregate count total such as
file1: 7
file2: 4
file3: 9
My problem:
I would like the output to be in the form below:
first_keyword
file1: 5
file2: 0
file3: 5
second_keyword
file1: 0
file2: 3
file3: 1
third_keyword
file1: 2
file2: 1
file3: 3
so that I can see how many lines of each file contain each individual keyword.
How do I achieve this?
===== added detail =====
keyword_file is at Documents/script_pad/keyword_file
Folder1 is at Documents/script_pad/Folder1
What worked for me was creating a file "Documents/script_pad/loop2" containing:
#!/bin/bash
cat Documents/script_pad/keyword_file | while read -r line
do
echo "$line"; grep -c "$line" Documents/script_pad/Folder1/*
done
which when run resulted in
$ bash Documents/script_pad/loop2
first_keyword
Documents/script_pad/Folder1/file1:5
Documents/script_pad/Folder1/file2:0
Documents/script_pad/Folder1/file3:5
second_keyword
Documents/script_pad/Folder1/file1:0
Documents/script_pad/Folder1/file2:3
Documents/script_pad/Folder1/file3:1
third_keyword
Documents/script_pad/Folder1/file1:2
Documents/script_pad/Folder1/file2:1
Documents/script_pad/Folder1/file3:3
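As a side note, an unquoted $line is fragile: keywords containing spaces, glob characters, or regex metacharacters will misbehave. A slightly more defensive sketch of the same loop, assuming the same paths:
#!/bin/bash
# Count, per file, how many lines match each keyword.
# -F treats each keyword as a fixed string rather than a regex;
# -- protects against keywords that begin with a dash.
while IFS= read -r kw; do
    echo "$kw"
    grep -cF -- "$kw" Documents/script_pad/Folder1/*
done < Documents/script_pad/keyword_file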

How to extract missing rows by comparing two different files in Linux?

I have two different files, and some rows are missing from one of them. I want to make a new file containing the non-common rows between the two files. As an example, I have the following files:
file1:
id1
id22
id3
id4
id43
id100
id433
file2:
id1
id2
id22
id3
id4
id8
id43
id100
id433
id21
I want to extract the rows that exist in file2 but not in file1:
new file:
id2
id8
id21
Any suggestions?
Use the comm utility (assumes bash as the shell):
comm -13 <(sort file1) <(sort file2)
Note how the input must be sorted for this to work, so your delta will be sorted, too.
comm uses an (interleaved) 3-column layout:
column 1: lines only in file1
column 2: lines only in file2
column 3: lines in both files
-13 suppresses columns 1 and 2, which prints only the values exclusive to file2.
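For a quick illustration of the layout, take two tiny sorted files, f1 containing a, b, c and f2 containing b, c, d (a made-up example, not the question's data):
comm f1 f2
a
		b
		c
	d
a sits in column 1 (only in f1), b and c in column 3 (in both files, indented by two tabs), and d in column 2 (only in f2, indented by one tab).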
Caveat: For lines to be recognized as common to both files they must match exactly - seemingly identical lines that differ in terms of whitespace (as is the case in the sample data in the question as of this writing, where file1 lines have a trailing space) will not match.
cat -et is a command that visualizes line endings and control characters, which is helpful in diagnosing such problems.
For instance, cat -et file1 would output lines such as id1 $, making it obvious that there's a trailing space at the end of the line (represented as $).
If instead of cleaning up file1 you want to compare the files as-is, try:
comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)
A generalized solution that trims leading and trailing whitespace from the lines of both files:
comm -13 <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file1 | sort) \
<(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file2 | sort)
Note: The above sed commands require either GNU or BSD sed.
You can sort both files, count the duplicate rows, and select only those rows where the count is 1:
sort file1 file2 | uniq -c | awk '$1 == 1 {print $2}'
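Note that uniq -u already prints only the lines that occur exactly once, so the awk step can be dropped; this also sidesteps the fact that awk's $2 would truncate rows containing spaces (assumes no row is duplicated within a single file):
sort file1 file2 | uniq -u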

Cat headers and renaming a column header using awk?

I've got an input file (input.txt) like this:
name value1 value2
A 3 1
B 7 4
C 2 9
E 5 2
And another file with a list of names (names.txt) like so:
B
C
Using grep -f, I can get all the lines with names "B" and "C"
grep -wFf names.txt input.txt
to get
B 7 4
C 2 9
However, I want to keep the header at the top of the output file, and also rename the column "name" to "ID". Using grep to keep the rows with names B and C, the output should be:
ID value1 value2
B 7 4
C 2 9
I'm thinking awk should be able to accomplish this, but being new to awk I'm not sure how to approach this. Help appreciated!
While it is certainly possible to do this in awk, the fastest way to solve your actual problem is to simply prepend the header you want in front of the grep output.
echo "ID value1 value2" > Output.txt && grep -wFf names.txt input.txt >> Output.txt
Update: Since the OP has multiple files, we can modify the above line to pull the first line out of the input file instead.
head -n 1 input.txt | sed 's/name/ID/' > Output.txt && grep -wFf names.txt input.txt >> Output.txt
Here is how to do it with awk
awk 'FNR==NR {a[$1];next} FNR==1 {$1="ID";print} {for (i in a) if ($1==i) print}' names.txt input.txt
ID value1 value2
B 7 4
C 2 9
Store the names in an array a
Then test field #1 against the names stored in array a
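As a side note, the explicit loop over the array can be replaced by awk's in operator, which does a direct hash lookup; a sketch with the question's file names:
awk 'FNR==NR {a[$1]; next} FNR==1 {$1="ID"; print; next} $1 in a' names.txt input.txt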

Change format of text file

I have a file with many lines of tab separated data in the following format:
1 1 2 2
3 3 4 4
5 5 6 6
...
and I would like to change the format to:
1 1
2 2
3 3
4 4
5 5
6 6
Is there a not too complicated way to do this? I don't have any experience with using awk, sed, etc.
Thanks
If you just want to group your file in blocks of X columns, you can make use of xargs -nX:
$ xargs -n2 < file
1 1
2 2
3 3
4 4
5 5
6 6
To have more control and print an empty line after the 4th field, you can also use this awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=1;i<=NF;i++) printf "%s%s", $i, (i%2?OFS:RS); print ""}' file
1 1
2 2

3 3
4 4

5 5
6 6
# <-- note the empty line after each group, including a trailing one
Explanation
On odd fields, it prints OFS after the field.
On even fields, it prints RS.
Note that OFS stands for output field separator, which defaults to a space, and RS stands for record separator, which defaults to a newline. As the input uses tab as the field separator, we redefine FS and OFS in the BEGIN block.
This is probably the simplest way, and it allows for customisation:
awk '{print $1,$2"\n"$3,$4}' file
For a blank line between pairs:
awk '{print $1,$2"\n"$3,$4"\n"}' file
although fedorqui's answer with xargs is probably the simplest if this isn't needed.
As Ed pointed out, this wouldn't work if there were blanks in the fields; that can be resolved using
awk 'BEGIN{FS=OFS="\t"} {print $1,$2 ORS $3,$4 ORS}' file
Through Perl:
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file
Note that the regex assumes the last two fields are single digits. Feed the above command's output to sed to delete the trailing blank line:
perl -pe 's/\t(\d\t\d)$/\n$1\n/g' file | sed '$d'
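For completeness, GNU sed can do the same job by replacing the second tab on each line with a newline (assumes GNU sed, where \t in the pattern and \n in the replacement are supported):
sed 's/\t/\n/2' file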

Linux file compare: lines not included

How can I compare two files in Linux containing, for example:
file1
1
2
3
4
5
file2
1
2
3
and get the result
file3
4
5
How about using comm, which selects or rejects lines common to two files?
comm -3 file1 file2 > file3
would work for your simple example.
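If you specifically want the lines that are in file1 but not in file2 (as in the example), suppress columns 2 and 3 instead; note that comm expects its inputs to be sorted:
comm -23 file1 file2 > file3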
If you want to list all the lines that are in file1, but not in file2, you can do this:
diff file1 file2 | grep "^<" | sed "s/^< //" > file3
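An alternative that requires no sorting and preserves file1's original order is to treat file2 as a set of fixed, whole-line patterns (can be slow when file2 is large):
grep -Fxv -f file2 file1 > file3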
