HaplotypeCaller provides more variants than expected - vcf-variant-call-format

I used HaplotypeCaller (GATK 4.2.6.1) with its standard command line for variant calling on a WES picard.sorted.MarkedDup.bam file.
Apparently everything worked well and I received a standard .vcf file, but the number of identified variants is far too high for a WES result: close to one million variants for a single sample!
Did I perform something wrong?
What solution do you recommend?
Any help would be appreciated.
The command line I used was as follows:
gatk --java-options -Xmx8g HaplotypeCaller \
    -R $refFile \
    -I ${base}.picard.sorted.markedDup.bam \
    --dont-use-soft-clipped-bases \
    -stand-call-conf 20.0 \
    --emit-ref-confidence GVCF \
    -O ${base}.rrrrealigned.vcf
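(For context: with --emit-ref-confidence GVCF the output is a GVCF that also contains non-variant reference blocks, which by itself inflates the record count, and as far as I understand such a file is normally joint-genotyped afterwards. A rough sketch of that step, reusing the variables above; the output file name is only an example:)
gatk --java-options -Xmx8g GenotypeGVCFs \
    -R $refFile \
    -V ${base}.rrrrealigned.vcf \
    -O ${base}.genotyped.vcf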

Related

Annotating a Corpus (Syntaxnet)

I downloaded and installed SyntaxNet following the official SyntaxNet documentation on GitHub. Following the documentation (annotating a corpus), I tried to have SyntaxNet read a .conll file named wj.conll and write the results to wj-tagged.conll, but I could not. My questions are:
Does SyntaxNet always read .conll files (not .txt files)? I got a bit confused: I knew SyntaxNet reads .conll files for the training and testing process, but I suspect it is necessary to convert a .txt file to a .conll file in order to get its part-of-speech tags and dependency parse.
How can I make SyntaxNet read from files? (I tried all the ways explained in the GitHub documentation about SyntaxNet and it didn't work for me.)
Add these declaration lines at the end of "context.pbtxt". Here "inp" and "out" are text files present in the root directory of syntaxnet.
input {
  name: 'inp_file'
  record_format: 'english-text'
  Part {
    file_pattern: 'inp'
  }
}
input {
  name: 'out_file'
  record_format: 'english-text'
  Part {
    file_pattern: 'out'
  }
}
Add the sentences you want tagged to the "inp" file, and pass the file names on the shell the next time you run SyntaxNet, using the --input and --output flags.
Just to help you a bit more, I am pasting an example shell command.
bazel-bin/syntaxnet/parser_eval \
--input inp_file \
--output stdout-conll \
--model_path syntaxnet/models/parsey_mcparseface/tagger-params \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--hidden_layer_sizes 64 \
--arg_prefix brain_tagger \
--graph_builder structured \
--slim_model \
--batch_size 1024 | bazel-bin/syntaxnet/parser_eval \
--input stdout-conll \
--output out_file \
--hidden_layer_sizes 512,512 \
--arg_prefix brain_parser \
--graph_builder structured \
--task_context syntaxnet/models/parsey_mcparseface/context.pbtxt \
--model_path syntaxnet/models/parsey_mcparseface/parser-params \
--slim_model --batch_size 1024
In the above script, the output (POS tagging) of the first shell command is used as the input for the second shell command; the two commands are separated by "|".
Just a quick tip if you want to save the output of the demo in a .txt file:
try echo "open file X with application Y" | ./demo.sh > output.txt
It writes the sentence tree to output.txt in the current directory.

Unknown option: %{strftime("%c")}%=0x%B\\

I wrote this code according to the book *Learning vi and Vim*, p. 202:
set statusline=%<%t%h%m%r\ \ %a\ %{strftime(\"%c\")}%=0x%B\
\\ line:%1,\ \ col:%c%V\ %P
I put this line in my _vimrc; when I open a file, an error occurs:
Unknown option: %{strftime("%c")}%=0x%B\\
What is the matter?
Just before the %{strftime, you have two space characters, and only the first one is properly escaped with \. Therefore, Vim thinks the option value ends there and another option name begins. You need to either remove that additional space, or escape it (same for later occurrences of multiple spaces):
set statusline=%<%t%h%m%r\ \ %a\ \ %{strftime(\"%c\")}%=0x%B\
\\ \ line:%1,\ \ \ \ col:%c%V\ %P
As this is cumbersome and hard to read and edit, an alternative is to use :let, which avoids that escaping:
let &statusline = '%<%t%h%m%r %a %{strftime("%c")}%=0x%B line:%1, col:%c%V %P'

Target for make gives "nothing to be done"

I have an issue with "make" (Oh, the horror!).
We're trying to migrate some COBOL code from Windows to Linux. The compiler and such are from Micro Focus. Under Windows the code is developed with Micro Focus Net Express. Linux has Micro Focus Server Express as the equivalent. The programs are compiled and linked using "make" scripts.
So much for the background.
The problem is a "make" script that doesn't want to compile and link an executable under Linux. The targets look like this:
# HP: load INIT data
#
datLoad$O: \
	$(UI)/defretrn.cpy \
	$(UI)/e12sy00s.cpy \
	$(UI)/e12sy005.cpy \
	$(UI)/e12sy006.cpy \
	$(UI)/e12sy010.cpy \
	$(UI)/e12sy013.cpy \
	$(UI)/e12sy050.cpy \
	$(UI)/e12db001.cpy \
	$(UI)/e12db050.cpy \
	$(UI)/evlg.cpy \
	$(UI)/deffehl.cpy \
	datLoad.xcbl $(FRC)
#	#echo "dollar-O is \"$O\" in $@"

datLoad$X: $(LIBDSQL) datLoad$O \
	$(LP)/evlg$O $(LP)/alock$O
	$(LCOB) -o $(@:$X=) -e $(@:$X=) $(LCOBFLAGS) \
		-d e12db001 -d e12db003 -d e12db012 \
		-d e12sy005 -d e12sy006 -d e12sy009 \
		-d e12sy010 -d e12sy012 -d e12sy013 \
		-d e12sy050 \
		-I EvLgSetCategory $(LP)/evlg$O \
		-I ALckSetDebug $(LP)/alock$O \
		$(LIBEXEEXT) "$(LIBXSQL)"
	if [ -f $B/$@ -a ! -w $B/$@ ] ; then rm -f $B/$@ ; fi
	cp $@ $B
To put this into context, $O = ".o" (i.e. the object file extension). $(LCOB) is the link command. $X = ".exe" (an executable ... just forget about the extension, we'll fix that in due course). All the other stuff relates to paths ==> not relevant to the issue at hand and, yes, they've all been checked and verified.
Ultimately, I am trying to get "make" to resolve a target called "datLoad.o".
Included is a second "make" script containing the following:
COBFLAGS = -cx # create object file
GNTFLAGS = -ug # create .gnt file
SOFLAGS = -z # create
LCOB = cob
...
.cbl$O:
	$(CCOB) $(COBFLAGS) $*.cbl, $*$O, NUL, NUL
	if [ -f $(LP)/$*$O -a ! -w $(LP)/$*$O ] ; then rm -f $(LP)/$*$O ; fi
	cp $*$O $(LP)
The relevant part is the target which resolves to ".cbl.o:". Yes, that's the shorthand (suffix rule) version and I don't really like it, but I did not write this script. I'm assured that it really means "build the .o from the corresponding .cbl", and that other similar constructs in the script do work correctly.
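If I read the suffix rule correctly, for datLoad it should behave roughly like this explicit rule (my own sketch, reusing $(CCOB), $(COBFLAGS) and $(LP) from the included script):
datLoad.o: datLoad.cbl
	$(CCOB) $(COBFLAGS) datLoad.cbl, datLoad.o, NUL, NUL
	if [ -f $(LP)/datLoad.o -a ! -w $(LP)/datLoad.o ] ; then rm -f $(LP)/datLoad.o ; fi
	cp datLoad.o $(LP)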
With a simple "make" I get a link error:
In function `cbi_entry_point': (.data+0x384): undefined reference to `datLoad'
/tmp/cobwZipkt/%cob0.o: In function `main': (.text+0x28): undefined reference to `datLoad'
make: *** [datLoad.exe] Error 1
That means datLoad.o was not created. If I do create it explicitly with:
cob -cx datload
Then "make" still gives the same error as above. Weird! However, what I really cannot understand is the response I get from "make datLoad.o" when the target does not exist:
make: Nothing to be done for `datLoad.o'.
I assumed (heaven help me) that the target "datLoad.o" would try to create the required target file if that file does not already exist. Am I going mad?
Sorry if this seems a bit obscure, I'm not sure how to phrase it better. If anybody has an idea what might be going on, I'd be really grateful...
Thank you Mad Scientist. Your tip was correct.
The included .mk contained a .SUFFIXES rule. The problem was that the $O was not being used consistently. $O was originally set to ".obj" for Windows. Under Linux it's ".o". However, the .SUFFIXES rule had the ".obj" hard coded into it, so of course the ".o" targets were not being recognised. I replaced the hard coded suffix with the $O variable and it now works.
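Roughly, the change amounted to this (a from-memory sketch; only the hard-coded suffix mattered):
# before: Windows object extension hard-coded
.SUFFIXES: .cbl .obj
# after: use the platform-dependent $O (.obj on Windows, .o on Linux)
.SUFFIXES: .cbl $O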
Achim

Export vim syntax highlighting to latex

I would like to exploit Vim's syntax highlighting capabilities to highlight code (any language) in LaTeX (using the xcolor package). Therefore I wonder whether it is possible to have a Vim script export Vim's internal information about the highlighted text in the buffer. It would be sufficient to know the start, end and color of each highlighted entity. Generating the LaTeX code (or other formats such as HTML) would then be straightforward.
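(For reference, the per-character information I have in mind is reachable from a Vim script via the built-in synID()/synIDtrans()/synIDattr() functions; a rough, untested sketch that prints the effective highlight group and GUI foreground color for each character of the current line:)
" Dump the effective syntax group and foreground color for every
" character of the current line, using only built-in functions.
function! DumpLineHighlights() abort
  let lnum = line('.')
  for col in range(1, strlen(getline(lnum)))
    let id = synIDtrans(synID(lnum, col, 1))
    echo printf('%3d  %-20s  fg=%s', col,
          \ synIDattr(id, 'name'), synIDattr(id, 'fg#'))
  endfor
endfunction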
You can use my formatvim plugin: it can export to the latex-xcolor format with
Format format latex-xcolor
If you are not fine with the result (it is completely untested and I have almost never used this option), feel free to send patches; the dictionary with the format specification can be seen here, and everything you need to create your own format is in the documentation.
Note: if you need to export to any other language, all you need to do is write a specification for it in terms of my plugin. Here is code that will add a latex-xcolor-clone format to the plugin:
scriptencoding utf-8
execute frawor#Setup('0.0', {'plugin/format': '3.0'})
let s:texescape=
\'substitute('.
\ 'substitute(###, ''\v[\\\[\]{}&$_\^%#]'', '.
\ '''\=''''\char''''.char2nr(submatch(0))."{}"'', '.
\ '"g"),'.
\'" ", ''\\enskip{}'', "g")'
let s:texstylestart=
\'((#inverse#)?'.
\ '(''\colorbox[HTML]{''.'.
\ '((#fgcolor#!=#"")?'.
\ '(toupper(#fgcolor#[1:])):'.
\ '(toupper(#_fgcolor#[1:])))."}{".'.
\ '''\textcolor[HTML]{''.'.
\ '((#bgcolor#!=#"")?'.
\ '(toupper(#bgcolor#[1:])):'.
\ '(toupper(#_bgcolor#[1:])))."}{"):'.
\ '(((#bgcolor#!=#"")?'.
\ '(''\colorbox[HTML]{''.toupper(#bgcolor#[1:])."}{"):'.
\ '("")).'.
\ '''\textcolor[HTML]{''.'.
\ '((#fgcolor#!=#"")?'.
\ '(toupper(#fgcolor#[1:])):'.
\ '(toupper(#_fgcolor#[1:])))."}{"))'
let s:texstyleend=
\'repeat("}", '.
\ '((#inverse#)?'.
\ '(2):'.
\ '((#bgcolor#!=#"")+1)))'
let s:format={
\'begin': '\documentclass[a4paper,12pt]{article}'.
\ '\usepackage[utf8]{inputenc}'.
\ '\usepackage[HTML]{xcolor}'.
\ '\pagecolor[HTML]{%''toupper(#_bgcolor#[1:])''%}'.
\ '\color[HTML]{%''toupper(#_fgcolor#[1:])''%}'.
\ '\begin{document}{\ttfamily\noindent',
\'line': '%>'.s:texstylestart.".".
\ s:texescape.".".
\ s:texstyleend,
\'lineend': '\\',
\'end': '}\end{document}',
\'strescape': s:texescape,
\}
call s:_f.format.add('latex-xcolor-clone', s:format)
The :TOhtml command is built into Vim. It, rather obviously, generates HTML rather than LaTeX, though.

Ghostscript and high resolutions?

I am writing a script that reads some markup data, generates a TeX document and converts it to a PNG image.
As long as I use a resolution of up to 286 px/inch everything works fine. Unfortunately Ghostscript, which I use to create the picture data, does nothing when I use higher values.
How can I fix this behaviour?
Since the info about your problem is not very detailed (What kind of fonts are used in the TeX document? Are they Chinese, Japanese, Korean, or...? Which Ghostscript command line are you using?), here is one thing to check, but it is only a first guess: try adding -c "100000000 setvmthreshold" -f /path/to/pdffile.pdf to your command:
gswin32c.exe ^
-o c:/path/to/output.png ^
-sDEVICE=png16m ^
-r600x600 ^
-c "100000000 setvmthreshold" ^
-f /path/to/pdffile.pdf
This will allow for ~100 MByte of extra RAM usage by Ghostscript. If you are on X Windows (Linux, Unix), then "-dMaxBitmap=..." could help (provided you have enough RAM):
gs \
-o /path/to/output.png \
-sDEVICE=png16m \
-r600x600 \
-dMaxBitmap=100000000 \
-c "100000000 setvmthreshold" \
-f /path/to/pdffile.pdf