I want to replace the second column of my first file
file 1:
2 rs58086319 0 983550 T C
2 rs56809628 0 983571 T C
2 rs7608441 0 983572 A G
2 rs114910509 0 983579 A G
2 var_chr2_983614 0 983614 T C
2 var_chr2_983624 0 983624 A G
2 rs115188027 0 983632 A C
2 var_chr2_983636 0 983636 T C
2 var_chr2_983650 0 983650 A G
2 var_chr2_983660 0 983660 T C
with the first column of my second file
file 2:
2_983550_T_C
2_983571_T_C
2_983572_A_G
2_983579_A_G
2_983614_T_C
2_983624_A_G
2_983632_A_C
2_983636_T_C
2_983650_A_G
2_983660_T_C
I've tried join and awk, but somehow neither seems to work. I suspect the '_' characters in my second file are the cause.
Thank you
I'm a bit puzzled why you need a second file. All the information in file2 seems to be encoded in file1. You could just do something like this:
awk '{$2=$1"_"$4"_"$5"_"$6}1' file1
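If you want to confirm that file2 really is redundant before dropping it, a quick check along these lines might help. This is only a sketch: count_id_mismatches is a made-up helper name, and it assumes both files have the same number of lines in the same order and that file2 has a single column.

```shell
# count_id_mismatches FILE1 FILE2   (hypothetical helper, not from the thread)
# Prints how many rows have a FILE2 ID that differs from the
# chr_pos_allele1_allele2 string built from columns 1, 4, 5, 6 of FILE1.
count_id_mismatches() {
    # paste puts the FILE2 ID at the end of each FILE1 row, so it is $NF
    paste -d ' ' "$1" "$2" |
        awk '{ if ($NF != ($1 "_" $4 "_" $5 "_" $6)) bad++ }
             END { print bad + 0 }'
}
```

Called as count_id_mismatches file1 file2, a result of 0 would mean the one-file awk command above reproduces file2 exactly.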
Your file2 has only one column, so with awk:
awk -v f='file2' '{getline $2 <f}1' file1
If the separator of file2 is "_" and you only want its first field:
awk -v f='file2' '{getline a <f;split(a,b,"_");$2=b[1]}1' file1
EDIT: If you want to use _ as the field separator for file2, the following may help.
awk 'FNR==NR{a[FNR]=$1;next} (FNR in a){$2=a[FNR]} 1' FS="_" file2 FS=" " file1 | column -t
The following awk command may help here.
awk 'FNR==NR{a[FNR]=$0;next} (FNR in a){$2=a[FNR]} 1' file2 file1 | column -t
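For readers new to the FNR==NR idiom used in the two commands above, here is the same idea spelled out with comments. Treat it as a sketch: replace_col2 is a made-up helper name, and it assumes file2 is non-empty and lines up with file1 row for row.

```shell
# replace_col2 FILE1 FILE2   (hypothetical helper, not from the thread)
# Replaces column 2 of FILE1 with column 1 of the same-numbered line of FILE2.
replace_col2() {
    awk '
        FNR == NR { id[FNR] = $1; next }   # first file read (FILE2): remember line N
        FNR in id { $2 = id[FNR] }         # second file read (FILE1): swap in column 2
        { print }                          # print every FILE1 row, changed or not
    ' "$2" "$1"
}
```

FNR restarts at 1 for each input file while NR keeps counting, so FNR == NR is only true while the first file named on the command line is being read. Assigning to $2 rebuilds the row with single spaces, which is why the answers above pipe through column -t.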
I would go with paste and awk, e.g.:
paste file1 file2 | awk '{ $2 = $NF } NF--' OFS='\t'
Output:
2 2_983550_T_C 0 983550 T C
2 2_983571_T_C 0 983571 T C
2 2_983572_A_G 0 983572 A G
2 2_983579_A_G 0 983579 A G
2 2_983614_T_C 0 983614 T C
2 2_983624_A_G 0 983624 A G
2 2_983632_A_C 0 983632 A C
2 2_983636_T_C 0 983636 T C
2 2_983650_A_G 0 983650 A G
2 2_983660_T_C 0 983660 T C
I'm working with GWAS data.
Using the PLINK command, I was able to get SNPslist, SNPs.map, and SNPs.ped.
Here are the data files and commands I have for 2 SNPs (rs6923761, rs7903146):
$ cat SNPs.map
0 rs6923761 0 0
0 rs7903146 0 0
$ cat SNPs.ped
6 6 0 0 2 2 G G C C
74 74 0 0 2 2 A G T C
421 421 0 0 2 2 A G T C
350 350 0 0 2 2 G G T T
302 302 0 0 2 2 G G C C
bash commands I used:
echo -n IID > SNPs.csv
cat SNPs.map | awk '{printf ",%s", $2}' >> SNPs.csv
echo >> SNPs.csv
cat SNPs.ped | awk '{printf "%s,%s%s,%s%s\n", $1, $7, $8, $9, $10}' >> SNPs.csv
cat SNPs.csv
Output:
IID,rs6923761,rs7903146
6,GG,CC
74,AG,TC
421,AG,TC
350,GG,TT
302,GG,CC
This example has only 2 SNPs, so I could look up their column positions by hand and hard-code them in the command above. But now I have 2000 SNP IDs and their values. I need a bash command that handles all 2000 SNPs in the same way.
One awk idea that replaces all of the current code:
awk '
BEGIN   { printf "IID" }
# process 1st file:
FNR==NR { printf ",%s", $2; next }
# process 2nd file:
FNR==1  { print "" }                      # terminate 1st line of output
        { printf "%s", $1                 # print 1st column (explicit format, so a % in the data is harmless)
          for (i=7; i<=NF; i=i+2)         # loop through columns 7-NF, incrementing index +2 on each pass
              printf ",%s%s", $i, $(i+1)  # print (i)th and (i+1)th columns
          print ""                        # terminate line
        }
' SNPs.map SNPs.ped
NOTE: remove comments to declutter code
This generates:
IID,rs6923761,rs7903146
6,GG,CC
74,AG,TC
421,AG,TC
350,GG,TT
302,GG,CC
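With 2000 SNPs it is hard to eyeball whether the header and the rows line up. The loop over columns 7 to NF relies on each .ped row having 6 leading columns plus two allele columns per SNP listed in the .map file, so a check like the following could be run first. It is a sketch under that assumption, and check_ped_width is a made-up helper name.

```shell
# check_ped_width MAP PED   (hypothetical helper, not from the thread)
# Reports PED rows whose field count is not 6 + 2 * (number of SNPs in MAP).
# Exit status is 0 when every row matches, 1 otherwise.
check_ped_width() {
    awk '
        FNR == NR { if (NF) nsnp++; next }   # first file: count non-blank .map lines
        NF != 6 + 2 * nsnp {                 # second file: compare each row width
            printf "line %d: expected %d fields, found %d\n", FNR, 6 + 2 * nsnp, NF
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

For example, check_ped_width SNPs.map SNPs.ped && echo "widths OK" prints the confirmation only when no row is reported.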
You can use the --recodeA flag in PLINK to get the IIDs as rows and the SNPs as columns.
I have output like the one below. I want the values of the first column that correspond to a given value in the second column.
Example: in column 1, the values 0 and 1 belong to the value 0 in column 2.
So I need a command that, when I pass 0 (a second-column value), returns 0,1.
dmpgdo dbsconfig 0 | grep AMP | grep Online | awk -F' ' '{print $1,$4}'
0 0
1 0
2 1
3 1
4 2
5 2
6 3
7 3
Will this do?
printf "0 0\n1 0\n2 1\n3 1\n4 2\n5 2\n6 3\n7 3" | awk '{if ($2 == 0) print $1}'
0
1
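To pass the second-column value as an argument instead of hard-coding it, and to get the comma-separated form asked for in the question, something like this might work. It is a sketch: ids_for is a made-up helper name, and it expects the two-column output of the question's pipeline on standard input.

```shell
# ids_for VALUE   (hypothetical helper, not from the thread)
# Reads "col1 col2" pairs on stdin and prints the col1 values whose col2
# equals VALUE, joined with commas.
ids_for() {
    awk -v want="$1" '
        $2 == want { out = out sep $1; sep = "," }   # append with a comma after the first hit
        END { print out }
    '
}
```

Usage would be along the lines of dmpgdo dbsconfig 0 | grep AMP | grep Online | awk '{print $1,$4}' | ids_for 0.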
The following question seems simple but is somewhat tricky. I need to use bash.
Suppose I have 2 text files. The first one is
FirstFile.txt
0 1
0 2
1 1
1 2
2 0
SecondFile.txt
0 1
0 2
0 3
0 4
0 5
1 0
1 1
1 2
1 3
1 4
1 5
2 1
2 2
2 3
2 4
2 5
I want to create a new ThirdFile.txt containing the pairs from SecondFile.txt that are not in FirstFile.txt; that is, any pair that also occurs in FirstFile.txt should be removed. Note that 2 0 and 0 2 count as the same pair.
Can you help me out?
Using awk, you can rearrange the columns so that the lower number is always first. When reading the first file, save them as keys in an associative array. When reading the second file, test if they're not found in the array.
awk '{if ($1 <= $2) { a = $1; b = $2; } else { a = $2; b = $1 } }
FNR==NR { arr[a, b] = 1; next; }
!arr[a, b]' FirstFile.txt SecondFile.txt > ThirdFile.txt
Results:
0 3
0 4
0 5
1 3
1 4
1 5
2 2
2 3
2 4
2 5
Here a.txt is the first file and b.txt is the second:
paste <(cut -f2 a.txt) <(cut -f1 a.txt) > tmp.txt
cat a.txt b.txt tmp.txt | sort | uniq -u
or
cat a.txt b.txt <(paste <(cut -f2 a.txt) <(cut -f1 a.txt)) | sort | uniq -u
Result
0 3
0 4
0 5
1 3
1 4
1 5
2 2
2 3
2 4
2 5
Explanation
uniq removes duplicate rows from a text file.
uniq requires that its input be sorted.
uniq -u prints only the rows that do not have duplicates.
So, cat a.txt b.txt | sort | uniq -u will almost get you there: rows of b.txt that also appear in a.txt are dropped. Two caveats: a row of a.txt that is missing from b.txt is printed too (the result is a symmetric difference), and reversed pairs such as '1 2' <-> '2 1' are not matched.
The second caveat is why you need a temp file that holds all the reversed removal keys. That's what paste <(cut -f2 a.txt) <(cut -f1 a.txt) produces.
Note that cut assumes columns are separated by tabs. If they are not, specify the delimiter, for example with -d ' ', and pass the same delimiter to paste so the reversed rows match the originals.
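One possible way to make the sort | uniq -u approach less dependent on the data is to feed every removal key, forward and reversed, twice, so that none of them can ever be unique. The sketch below does that with awk instead of cut and paste, so it works for both space- and tab-separated input. pairs_not_in is a made-up helper name, and it assumes the file being filtered contains no duplicate rows of its own (uniq -u would drop those as well).

```shell
# pairs_not_in EXCLUDE_FILE INPUT_FILE   (hypothetical helper, not from the thread)
# Prints rows of INPUT_FILE whose pair does not occur in EXCLUDE_FILE in
# either order.
pairs_not_in() {
    {
        # each removal key twice, forward and reversed, so it is never unique
        awk '{ print $1, $2; print $1, $2; print $2, $1; print $2, $1 }' "$1"
        # rows to filter, normalised to a single space between columns
        awk '{ print $1, $2 }' "$2"
    } | LC_ALL=C sort | uniq -u
}
```

For the files in the question this would be called as pairs_not_in FirstFile.txt SecondFile.txt > ThirdFile.txt.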
I have two files of different lengths, with file2 being a big reference file, which I extract data from for file 1.
I have a line of awk which I normally tweak to do find and replace in my files, but it is always find and replace in the same column.
So for something like, if $1 of file1 = $7 of file2, replace $1 of file1 with $2 of file2, I would normally use:
awk 'FNR==NR{a[$7]=$2;next}a[$1]{$1=a[$1]}1' file2 file1 > newfile
However, I am trying to think of a way to code:
If $2 of file1 = $2 of file2, replace $1 file1 with $1 of file2.
But in the above code, I do not know which $1 refers to "find" and which $1 refers to "replace".
file1 looks like
0 rs58108140 0 0 G A
0 rs189107123 0 0 C G
0 rs180734498 0 0 C T
file2 looks like
1 rs58108140 0 10583 G A 1:10583
1 rs189107123 0 10611 C G 1:10611
1 rs180734498 0 13302 C T 1:13302
Desired output would be:
1 rs58108140 0 10583 G A
1 rs189107123 0 10611 C G
1 rs180734498 0 13302 C T
Thanks in advance for any help given.
This one-liner would do it (the trailing 1 is an always-true pattern, so every line is printed):
awk 'NR==FNR{a[$2]=$1;b[$2]=$4;next}$2 in a{$1=a[$2];$4=b[$2]}1' file2 file1
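To answer the "which $1 is which" part of the question, here is the same logic laid out with comments. It is only a sketch: fill_from_ref is a made-up helper name, and it assumes the reference file is non-empty and that SNP IDs in column 2 are unique.

```shell
# fill_from_ref REF TARGET   (hypothetical helper, not from the thread)
# For each TARGET row whose column 2 occurs in REF, copies columns 1 and 4
# from the matching REF row.
fill_from_ref() {
    awk '
        # While reading REF (the first file), $1 and $4 are REF fields:
        NR == FNR { chr[$2] = $1; pos[$2] = $4; next }
        # From here on, every $N refers to the TARGET row being read:
        $2 in chr { $1 = chr[$2]; $4 = pos[$2] }
        { print }   # print every TARGET row, changed or not
    ' "$1" "$2"
}
```

The next statement is what keeps the two meanings apart: rows of the reference file never reach the second rule, so any field named there belongs to the file being rewritten. Usage would be fill_from_ref file2 file1 > newfile.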
I have the following text file:
A,B,C
A,B,C
A,B,C
Is there a way, using standard *nix tools (cut, grep, awk, sed, etc), to process such a text file and get the following output:
A
A
A
B
B
B
C
C
C
You can do:
tr , \\n
and that will generate
A
B
C
A
B
C
A
B
C
which you could sort.
Unless you want to pull the first column then second then third, in which case you want something like:
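awk -F, '{for(i=1;i<=NF;++i) print i, $i}' | sort -s -n -k1,1 | awk '{print $2}'
To explain this, the first part generates
1 A
2 B
3 C
1 A
2 B
3 C
1 A
2 B
3 C
the second part will stably sort on the column number alone (-k1,1 limits the key to the first field and -n compares it numerically, so the original order within each column is preserved)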
1 A
1 A
1 A
2 B
2 B
2 B
3 C
3 C
3 C
and the third part will strip the numbers.
If you know the number of columns in advance, you could use a shell for-loop combined with cut. Here is an example using bash syntax:
for i in {1..3}; do
    cut -d, -f "$i" file.txt
done
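If the number of columns is not known in advance, a single awk pass that buffers the cells and prints them column by column at the end is another option. This is a sketch: columns_in_order is a made-up helper name, and it holds the whole file in memory, so it assumes the input is small enough for that.

```shell
# columns_in_order [FILE...]   (hypothetical helper, not from the thread)
# Prints all values of column 1 in row order, then column 2, and so on.
# Reads stdin when no file is given.
columns_in_order() {
    awk -F, '
        {
            for (i = 1; i <= NF; i++) cell[NR, i] = $i   # remember each cell by row and column
            if (NF > width) width = NF                   # track the widest row
        }
        END {
            for (i = 1; i <= width; i++)                 # column by column
                for (r = 1; r <= NR; r++)                # rows in original order
                    if ((r, i) in cell) print cell[r, i]
        }
    ' "$@"
}
```

It can be used either as columns_in_order file.txt or at the end of a pipeline.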
Try:
awk 'BEGIN {FS=","} /([A-C],)+([A-C])?/ {for (i=1;i<=NF;i++) print $i}' YOURFILE | sort