I have trained Kaldi models (tri1b ... tri3b) and I am getting WERs. I have also successfully installed sclite inside kaldi/tools.
I have read through the few pages of sclite documentation available. The only useful information I have been able to gather is that I need a ref.txt and a hyp.txt.
Can anyone please guide me, step by step, through how to run the sclite tool?
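My best guess so far, pieced together from the usage message, is something like the sketch below, assuming ref.txt and hyp.txt are in the "trn" format (one utterance per line, ending with the utterance id in parentheses); I am not sure the format and id arguments are right:

sclite -r ref.txt trn -h hyp.txt trn -i rm -o all stdout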
The website corenlp.run, which is supposed to be CoreNLP's demo site, shows results quite different from what I get when I run the CoreNLP pipeline on my local machine.
The website actually shows the correct result, while the local machine version does not. I was wondering if anyone close to the CoreNLP project can explain the difference?
Case in point: this is what happens when I use the sentence "Give me a restaurant on Soquel Drive that serves good french food" as input (it is from the RestQuery dataset).
On CoreNLP (local machine, with Stanford's default model), I get this result:
root(ROOT-0, Give-1)
iobj(Give-1, me-2)
det(restaurant-4, a-3)
dobj(Give-1, restaurant-4)
case(Drive-7, on-5)
compound(Drive-7, Soquel-6)
nmod:on(Give-1, Drive-7) <--- WRONG HEAD
nsubj(serves-9, that-8)
acl:relcl(Drive-7, serves-9) <--- WRONG HEAD
amod(food-12, good-10)
amod(food-12, french-11)
dobj(serves-9, food-12)
While on corenlp.run, I get this result:
root(ROOT-0, Give-1)
iobj(Give-1, me-2)
det(restaurant-4, a-3)
dobj(Give-1, restaurant-4)
case(Drive-7, on-5)
compound(Drive-7, Soquel-6)
nmod:on(restaurant-4, Drive-7) <--- CORRECT HEAD
nsubj(serves-9, that-8)
acl:relcl(restaurant-4, serves-9) <--- CORRECT HEAD
amod(food-12, good-10)
amod(food-12, french-11)
dobj(serves-9, food-12)
You will note that there are two wrong heads in the local machine version. I have no idea why this happens, or whether it is a model issue (I'm currently trying to debug the output of each annotator to see what each step returns).
These are the annotators I used: "tokenize,ssplit,pos,lemma,ner,parse,openie". The models are straight out of CoreNLP version 3.6.0.
So can anyone help me understand why my results differ from the demo site's results?
CoreNLP comes with multiple parsers for obtaining constituency and dependency trees. The default parser is the PCFG constituency parser, which outputs constituency trees that are then converted to dependency trees.
corenlp.run, on the other hand, uses the neural-network dependency parser, which directly outputs dependency trees that can differ from the output of the default pipeline.
In order to get the same output on your local machine, use the following annotators:
tokenize,ssplit,pos,lemma,ner,depparse,openie
(lemma, ner, and openie are all optional in case you only need a dependency parse.)
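For reference, a minimal sketch of how that pipeline could be set up in Java; the class name, the input sentence, and the choice of the collapsed-dependencies annotation key are just for illustration:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class DepparseDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        // depparse runs the neural net dependency parser directly, like corenlp.run
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,openie");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation(
                "Give me a restaurant on Soquel Drive that serves good french food");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            SemanticGraph deps = sentence.get(
                    SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
            // prints one typed dependency per line, in the root(...)/nmod:... style shown above
            System.out.println(deps.toList());
        }
    }
}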
I've been trying to use the Natural Logic Inference component (Naturalli) packaged with Stanford CoreNLP 3.5.2 to extract relation triples. However, upon creating a new OpenIE instance, I get the following exception:
Could not load affinity model at edu/stanford/nlp/naturalli/: Could not find a part of the path '...\edu\stanford\nlp\naturalli\pp.tab.gz'
I tried searching the web for the pp.tab.gz file, but I couldn't find it. Then I tried to work around the problem by disabling the affinity model:
import java.util.Properties;
import edu.stanford.nlp.naturalli.OpenIE;

Properties props = new Properties();
props.setProperty("ignoreaffinity", "true");
OpenIE ie = new OpenIE(props);
But then I started getting the following exception:
Could not load clause splitter model at edu/stanford/nlp/naturalli/clauseSplitterModel.ser.gz: Unable to resolve "edu/stanford/nlp/naturalli/clauseSplitterModel.ser.gz" as either class path, filename or URL
Same issue with this file: I couldn't find it anywhere.
Any help with these issues is greatly appreciated. Thanks to everyone in advance!
This was recently put up; there are some downloads available here:
http://nlp.stanford.edu/software/openie.shtml
I would recommend using the jars pointed to there instead of the Stanford CoreNLP 3.5.2 release.
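For what it's worth, with those jars on the classpath, the usual way to get relation triples is through the StanfordCoreNLP pipeline rather than by constructing OpenIE directly. A rough sketch (the example sentence is arbitrary, and the exact annotator list may differ between releases):

import java.util.Collection;
import java.util.Properties;
import edu.stanford.nlp.ie.util.RelationTriple;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.naturalli.NaturalLogicAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class OpenIEDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,depparse,natlog,openie");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Obama was born in Hawaii.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            Collection<RelationTriple> triples =
                    sentence.get(NaturalLogicAnnotations.RelationTriplesAnnotation.class);
            for (RelationTriple triple : triples) {
                // each triple carries a confidence plus subject/relation/object spans
                System.out.println(triple.confidence + "\t"
                        + triple.subjectLemmaGloss() + "\t"
                        + triple.relationLemmaGloss() + "\t"
                        + triple.objectLemmaGloss());
            }
        }
    }
}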
I am trying to use the Sphinx4 library for speech recognition, but I cannot seem to figure out the correct combination of acoustic model, dictionary, and language model. I have tried various combinations and I get a different error every time.
I am trying to follow the tutorial at http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. I do not have a config.xml as I would if I were using ConfigurationManager instead of Configuration, because there is no apparent way of passing the location of a config file to Configuration itself (ConfigurationManager takes it as a constructor argument); that might be my problem right there. I just do not know how to point to one, and since the tutorial says "It is possible to configure low-level components of the application through XML file although you should do that ONLY IF you understand what is going on.", I assume having a config.xml file is not compulsory.
Combining the latest dictionary (7b, obtained from SourceForge) with the latest acoustic model (cmusphinx-en-us-5.2.tar.gz, also from SourceForge) and the language model (cmusphinx-5.0-en-us.lm.gz, also from SourceForge) results in a NullPointerException in startRecognition. The issue is similar to the one described in "sphinx-4 NullPointerException at startRecognition", but the link given in that answer no longer works. I obtained 0.7a from SourceForge (since that is the dictionary the link seems to point at), but with that one I get an error even earlier in the execution: Error loading word: ;;;. I also tried downloading the latest models and dictionary from the GitHub repo, which results in java.lang.IndexOutOfBoundsException: Index: 16128, Size: 16128.
Any help is much appreciated!
You need to use the latest code from GitHub:
http://github.com/cmusphinx/sphinx4
as described in the tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
The correct models (en-us) are already included; you should not replace anything. You should not configure any XML files; use the samples provided in the sources.
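As a sketch, the tutorial-style setup that uses the bundled models looks roughly like the code below; the resource paths and the language-model file name (en-us.lm.bin vs. en-us.lm.dmp) depend on the sphinx4-data version on your classpath, and test.wav is just a placeholder:

import java.io.FileInputStream;
import java.io.InputStream;
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class TranscriberDemo {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // models bundled in the sphinx4-data jar; no separate downloads, no config.xml
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        try (InputStream stream = new FileInputStream("test.wav")) {
            recognizer.startRecognition(stream);
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}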
I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not suitable for my project. I found the phpmorphy project, but it doesn't include a Java API.
At this point I am looking for a database or a text file of English words with their different forms, for example:
run running ran ...
include including included ...
...
Thank you for your help or advice.
You could download LanguageTool (Disclaimer: I'm the maintainer), which comes with a binary file english.dict. The LanguageTool Wiki describes how to dump that file as a text file:
java -jar morfologik-tools-1.6.0-standalone.jar fsa_dump -x -d english.dict
For run, the file will contain this:
ran run VBD
run run NN
run run VB
run run VBN
run run VBP
running run VBG
runs run NNS
runs run VBZ
The first column is the inflected form, the second is the base form, and the third is the part-of-speech tag according to the (slightly extended) Penn Treebank tagset.
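If all you need in Java is a lookup from an inflected form to its base form, you could then load the dumped text file into a map, for example along these lines (the dump file name is a placeholder, and the code assumes whitespace-separated columns as shown above):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class LemmaLookup {
    public static void main(String[] args) throws IOException {
        // map from inflected form to base form, e.g. "running" -> "run"
        Map<String, String> lemmas = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("english_dump.txt"), StandardCharsets.UTF_8)) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2) {
                lemmas.putIfAbsent(cols[0], cols[1]); // keep the first base form seen for ambiguous words
            }
        }
        System.out.println(lemmas.get("running")); // run
        System.out.println(lemmas.get("ran"));     // run
    }
}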