How to sort column 3, and change the corresponding value of column 2, in a new file using shell scripting? [closed] - linux

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to sort a file using the sort command.
The input file is 1.txt:
1 2 2
1 3 5.5
1 4 1.5
1 5 2.2
2 1 1.1
2 3 0.7
2 4 0.9
2 5 0.4
The output file should be:
1 4 1.5
1 2 2
1 5 2.2
1 3 5.5
2 5 0.4
2 3 0.7
2 4 0.9
2 1 1.1
Column 3 should be sorted within each value of column 1, and the corresponding column 2 value should move with it.

Seems like you just want to do a numeric sort on two keys:
$ sort -n -k1 -k3 file
1 4 1.5
1 2 2
1 5 2.2
1 3 5.5
2 5 0.4
2 3 0.7
2 4 0.9
2 1 1.1
-n does a numeric sort, first on field 1 (-k1) and then on field 3 (-k3).
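Strictly speaking, -k1 makes the sort key run from field 1 to the end of the line; it works here because a numeric comparison stops at the first non-numeric character. A slightly more precise variant (same output on this data) limits each key to a single field:
$ sort -n -k1,1 -k3,3 file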

Try this one, then tune it:
sed -E 's/[[:blank:]]+/ /g' 1.txt | awk '{print $1, $3, $2}' | sort -k1,1n -k2,2n | awk '{print $1, $3, $2}'
sed - unifies the column separators to a single space
awk - swaps columns 2 and 3
sort - sorts numerically on the first two fields (after the swap, the second field holds the original column 3)
awk - swaps the columns back to the original order

Here is one using GNU awk. It reads the data into memory and sorts while outputting, so a huge file may cause you problems:
$ awk '{
    a[$1][$3] = $0                             # index rows by column 1, then by column 3
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"     # traverse array indices in ascending numeric order
    for (i in a)
        for (j in a[i])
            print a[i][j]
}' file
Output:
1 4 1.5
1 2 2
1 5 2.2
1 3 5.5
2 5 0.4
2 3 0.7
2 4 0.9
2 1 1.1
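A note beyond the original answer: PROCINFO["sorted_in"] accepts several predefined orderings in GNU awk; for example, swapping in "@ind_num_desc" reverses the traversal:
$ awk '{a[$1][$3]=$0} END{PROCINFO["sorted_in"]="@ind_num_desc"; for(i in a) for(j in a[i]) print a[i][j]}' file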

Related

How to replace a number to another number in a specific column using awk

This is probably basic but I am completely new to command-line and using awk.
I have a file like this:
1 RQ22067-0 -9
2 RQ34365-4 1
3 RQ34616-4 1
4 RQ34720-1 0
5 RQ14799-8 0
6 RQ14754-1 0
7 RQ22101-7 0
8 RQ22073-1 0
9 RQ30201-1 0
I want the 0s in column 3 to change to 1, and any occurrence of 1 to change to 2. So essentially I am only changing numbers in column 3, but I am not changing the -9.
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
I have tried the following, but it has not worked:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
Thank you.
With this code in your question:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
There are two problems:
1. You're running both commands on the same input file and writing their output to the same output file, so only the output of the 2nd script will be present in the output.
2. You're trying to change 0 to 1 first and THEN change 1 to 2, so the $3s that start out as 0 would end up as 2. You need to reverse the order of the operations.
This is what you should be doing, using your existing code:
awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
For example:
$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The gsub()s should really just be sub()s, as you only want to perform each substitution once; and you don't need to enclose the numbers in quotes, so you could just do:
awk '{sub(1,2,$3); sub(0,1,$3)}1' file
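One caution beyond the original answer: sub() matches its pattern anywhere in the field, so multi-digit values in column 3 would be mangled, e.g.:
$ echo "a b 10" | awk '{sub(1,2,$3)}1'
a b 20
The exact-match comparisons in the next answer avoid this.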
You can check the value of column 3 and then update the field value.
Check for 1 in the first rule, because if the first check were for 0, the value would be set to 1 and the next check would then set it to 2, resulting in all 2's.
awk '
{
    if ($3 == 1) $3 = 2
    if ($3 == 0) $3 = 1
}
1' file
Output
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
With your shown samples, try the following code using ternary operators. A simple explanation: if the 3rd field is 1, set it to 2; else if it is 0, set it to 1; else keep it as it is; finally, print the line.
awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file
Generic solution: adding a generic solution here with 3 awk variables: fieldNumber, which lists all the field numbers to check; existValue, the values to match; and newValue, the replacement value for each matched value.
awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
    num  = split(fieldNumber, arr1, ",")
    num1 = split(existValue, arr2, ",")
    num2 = split(newValue, arr3, ",")
    for (i = 1; i <= num1; i++) {
        value[arr2[i]] = arr3[i]    # map each existing value to its replacement
    }
}
{
    for (i = 1; i <= num; i++) {
        if ($arr1[i] in value) {
            $arr1[i] = value[$arr1[i]]
        }
    }
}
1
' Input_file
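For example (a hypothetical invocation, not from the original answer), to apply the same 1-to-2, 0-to-1 mapping to both columns 2 and 3, only the -v assignments change, keeping the program itself as above:
awk -v fieldNumber="2,3" -v existValue="1,0" -v newValue="2,1" ' ...same program as above... ' Input_file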
This might work for you (GNU sed):
sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file
Surround the 3rd column with newlines.
Make a copy.
Replace all 0's with 1's and all 1's with 2's.
Append the original.
Pattern match on the newlines and replace the 3rd column in the original with the 3rd column in the amended line.
Also with awk:
awk 'NR > 1 {s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The substitutions are made with sub() on a copy of $3, and then the copy with the changes is assigned back to $3.
If you don't like the simple
sed 's/1$/2/; s/0$/1/' file
you might want to play with
sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e' file
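A note beyond the original answer: the e flag is GNU-specific; it executes the pattern space as a shell command and replaces it with the command's output, so the shell arithmetic $((\2+1)) turns 0 into 1 and 1 into 2. A quick check:
$ echo "9 RQ30201-1 0" | sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e'
9 RQ30201-1 1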

How to match two different length and different column text file with header using join command in linux

I have two different length text files A.txt and B.txt
A.txt looks like :
ID pos val1 val2 val3
1 2 0.8 0.5 0.6
2 4 0.9 0.6 0.8
3 6 1.0 1.2 1.3
4 8 2.5 2.2 3.4
5 10 3.2 3.4 3.8
B.txt looks like :
pos category
2 A
4 B
6 A
8 C
10 B
I want to match the pos column in both files and get output like this:
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I used the join command: join -1 2 -2 1 <(sort -k2 A.txt) <(sort -k1 B.txt) > C.txt
but C.txt comes out without a header:
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
I want to get output with a header from join. Kindly help me out.
Thanks in advance.
If you are OK with awk, try the following. Written and tested with the shown samples in GNU awk.
awk 'FNR==NR{a[$1]=$2;next} ($2 in a){$2=a[$2] OFS $2} 1' B.txt A.txt | column -t
Explanation: a detailed explanation of the above.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition FNR==NR which will be TRUE when B.txt is being read.
a[$1]=$2 ##Creating array a with index of 1st field and value is 2nd field of current line.
next ##next will skip all further statements from here.
}
($2 in a){ ##Checking condition if 2nd field is present in array a then do following.
$2=a[$2] OFS $2 ##Adding array a value along with 2nd field in 2nd field as per output.
}
1 ##1 will print current line.
' B.txt A.txt | column -t ##Mentioning Input_file names and passing awk program output to column to make it look better.
As you requested... It is perfectly possible to get the desired output using just GNU join:
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$
The key to getting the correct output is using the sort -g option and specifying the join output column order with the -o option. With -g, the non-numeric header fields compare as zero, so the header lines sort ahead of the positive pos values in both files and end up joined to each other.
To "pretty print" the output, pipe to column -t
$ join -1 2 -2 1 <(sort -k2 -g A.txt) <(sort -k1 -g B.txt) -o 1.1,2.2,1.2,1.3,1.4,1.5 | column -t
ID category pos val1 val2 val3
1 A 2 0.8 0.5 0.6
2 B 4 0.9 0.6 0.8
3 A 6 1.0 1.2 1.3
4 C 8 2.5 2.2 3.4
5 B 10 3.2 3.4 3.8
$
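An addition beyond the original answer: when both files are already ordered the same way on the join field apart from their headers (as the samples here are), GNU join's --header option treats the first line of each file as a header and prints it without trying to pair it, which should also respect the -o ordering. A sketch under that assumption (--nocheck-order silences join's lexical-order warnings for numerically sorted keys):
$ join --header --nocheck-order -1 2 -2 1 -o 1.1,2.2,1.2,1.3,1.4,1.5 A.txt B.txt | column -t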

AWK (or something else) Average of multiple columns from multiple files

I would appreciate some help with an awk script, or whatever would do the job.
So, I've got multiple files (with the same number of lines and columns) and I want to average every number in every column (except the first) across all the files. I have no idea how many columns there are in a file (though I could probably get the number if needed).
filename.1
1 1 2 3 4
2 3 4 5 6
3 2 3 5 6
filename.2
1 3 4 6 6
2 5 6 7 8
3 4 5 7 8
output
1 2 3 5 5
2 4 5 6 7
3 3 4 6 7
I've found this somewhere on here, which does it for a single column (as far as I understand it):
awk '{a[FNR]+=$2;b[FNR]++;}END{for(i=1;i<=FNR;i++)print i,a[i]/b[i];}' fort.*
So the only change would be to replace the +=$2 with a loop over all columns? Is there a way to do that without knowing the exact number of columns?
Thanks.
$ cat tst.awk
{
    key[FNR] = $1                              # remember column 1 for each row
    for (colNr=2; colNr<=NF; colNr++) {
        sum[FNR,colNr] += $colNr               # accumulate every other column across files
    }
}
END {
    for (rowNr=1; rowNr<=FNR; rowNr++) {
        printf "%s%s", key[rowNr], OFS
        for (colNr=2; colNr<=NF; colNr++) {
            # divide by the number of files (ARGIND) and round to the nearest integer
            printf "%s%s", int(sum[rowNr,colNr]/ARGIND+0.5), (colNr<NF ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file1 file2
1 2 3 5 5
2 4 5 6 7
3 3 4 6 7
The above uses GNU awk for ARGIND; with other awks, just add a line FNR==1{ARGIND++} at the start.
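For instance, a minimal sketch of that portable variant for a POSIX awk (in gawk itself, leave the first line out, since ARGIND is built in):
$ awk 'FNR==1{ARGIND++}     # count files as they are opened
       { key[FNR]=$1; for (c=2; c<=NF; c++) sum[FNR,c]+=$c }
       END { for (r=1; r<=FNR; r++) { printf "%s%s", key[r], OFS
             for (c=2; c<=NF; c++) printf "%s%s", int(sum[r,c]/ARGIND+0.5), (c<NF?OFS:ORS) } }' file1 file2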

linux command to combine multiple columns in a tab-delim file?

Hi everyone!
How can I convert this
a 2 3 4
b 3 1 6
c 3 5 2
d 6 3 5
to below?
a-2:3 4
b-3:1 6
c-3:5 2
d-6:3 5
Thank you!!!
You can use awk; in your case:
awk -F'\t' '{print $1 "-" $2 ":" $3 " " $4}' input.txt
assuming the input is in the file input.txt; you can also pipe input into awk instead.
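A quick check of the one-liner on sample lines built with printf:
$ printf 'a\t2\t3\t4\nb\t3\t1\t6\n' | awk -F'\t' '{print $1 "-" $2 ":" $3 " " $4}'
a-2:3 4
b-3:1 6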

How to sum column of different file in bash scripting

I have two files:
file-1
1 2 3 4
1 2 3 4
1 2 3 4
file-2
0.5
0.5
0.5
Now I want to add column 1 of file-2 to column 3 of file-1
Output
1 2 3.5 4
1 2 3.5 4
1 2 3.5 4
I've tried this, but it does not work correctly:
awk '{print $1, $2, $3+file-2 }' file-2=$1_of_file-2 file-1 > file-3
I know the awk statement is not right but I want to use something like this; can anyone help me?
Your data isn't very exciting…
awk 'FNR == NR { for (i = 1; i <= NF; i++) { line[NR,i] = $i }; fields[NR] = NF }
     FNR != NR { line[FNR,3] += $1
                 pad = ""
                 for (i = 1; i <= fields[FNR]; i++) { printf "%s%s", pad, line[FNR,i]; pad = " " }
                 printf "\n"
               }' file-1 file-2
The first pattern matches the lines in the first file; it saves each field into the pseudo-multidimensional array line, and also records how many fields there are in that line.
The second pattern matches the lines in the second file; it adds the value in column one to column three of the saved data, then prints out all the fields with a space between them, and adds a newline to the end.
Given this (mildly) modified input, the script (saved in file so-25657951.sh) produces the output shown:
$ cat file-1
1 2 3 4
2 3 6 5
3 4 9 6
$ cat file-2
0.1
0.2
0.3
$ bash so-25657951.sh
1 2 3.1 4
2 3 6.2 5
3 4 9.3 6
$
Note that because this slurps the whole of the first file into memory before reading anything from the second file, the input files should not be too large (say sub-gigabyte size). If they're bigger than that, you should probably devise an alternative strategy.
For example, there is a getline function (even in POSIX awk) which could be used to read a line from file 2 for each line in file 1, and you could then simply print the data without needing to accumulate anything:
awk '{ getline add < "file-2"; $3 += add; print }' file-1
This works reasonably cleanly for any size of file (as long as the files have the same number of lines — or, more precisely, as long as file-2 has at least as many lines as file-1).
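A small robustness tweak beyond the original answer: checking getline's return value avoids silently reusing the last value read if file-2 runs out of lines early:
awk '{ if ((getline add < "file-2") > 0) $3 += add; print }' file-1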
This may work:
cat f1
1 2 3 4
2 3 6 5
3 4 9 6
cat f2
0.1
0.2
0.3
awk 'FNR==NR {a[NR]=$1;next} {$3+=a[FNR]}1' f2 f1
1 2 3.1 4
2 3 6.2 5
3 4 9.3 6
After I posted it, I see that it's the same as what Jaypal posted in a comment.
