CMUSphinx with custom features - cmusphinx

I'm trying to train a new acoustic model for speech recognition with custom features. Is it possible to feed the feature matrix instead of a WAV file into CMUSphinx for training? I have searched on Google and the tutorial page and can't find any information about it. Am I missing something?

Is it possible to feed the feature matrix instead of wav file into CMUSphinx for training?
It is better to train with a more modern toolkit such as Kaldi.
I have tried searching on Google and the tutorial page and can't find any information about it. Am I missing something?
You just prepare the .mfc files yourself and skip the first feature extraction step. You have to write the code for that yourself, as described here:
https://cmusphinx.github.io/wiki/mfcformat/
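Here is a rough sketch of what preparing the .mfc files yourself might involve. My understanding of the format described on that wiki page is a 4-byte integer header giving the total number of float values (frames × feature dimension), followed by the features as 32-bit floats, frame by frame. The code assumes big-endian byte order; as far as I know the Sphinx tools compare the header against the file size and byte-swap if needed, but check the wiki page before relying on that. The function names `write_mfc` and `read_mfc` are my own, not part of CMUSphinx.

```python
import struct
from typing import List, Sequence


def write_mfc(path: str, frames: Sequence[Sequence[float]]) -> None:
    """Write a feature matrix (one row per frame) as a Sphinx-style .mfc file.

    Assumed layout: big-endian int32 holding the total number of floats,
    then all frames flattened row by row as big-endian float32.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    flat = [float(v) for frame in frames for v in frame]
    with open(path, "wb") as fh:
        fh.write(struct.pack(">i", len(flat)))             # header: float count
        fh.write(struct.pack(f">{len(flat)}f", *flat))     # feature values


def read_mfc(path: str, dim: int) -> List[List[float]]:
    """Read a file written by write_mfc back into frames of length `dim`."""
    with open(path, "rb") as fh:
        data = fh.read()
    (count,) = struct.unpack(">i", data[:4])
    if count * 4 != len(data) - 4:
        raise ValueError("header does not match file size (wrong byte order?)")
    if count % dim:
        raise ValueError("float count is not a multiple of dim")
    values = struct.unpack(f">{count}f", data[4:])
    return [list(values[i:i + dim]) for i in range(0, count, dim)]
```

For example, `write_mfc("utt001.mfc", my_features)` would write one utterance (the file name and `my_features` are placeholders). You would presumably also need to make the feature dimension in your training configuration (the vector length setting in sphinx_train.cfg, if I remember correctly) match your custom features.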
In Kaldi you just prepare the .ark files yourself, for example with kaldiio:
https://pypi.org/project/kaldiio/

Related

Can't use custom 3D model for visualization

I'm using Modelica.Mechanics.MultiBody.Visualizers.FixedShape to render custom 3D models. I tried .dxf and .stl files in ASCII format, but neither works.
For .dxf I get no error but the model doesn't show up.
For .stl I get an error saying there is no plugin for that file type.
Please see the highlighted portions in the above image. OpenModelica supports adding an external shape using the format shapeType="modelica://<Modelica-name>/<relative-path-file-name>". Preferably, create a Resources folder inside the library folder and put the .dxf files in it. The following animation shows the same. Hope this helps!

How to prepare test data for textsum?

I have been able to successfully run the pre-trained model of TextSum (TensorFlow 1.2.1). The output consists of summaries of CNN & Daily Mail articles (which are chunked into bin format prior to testing).
I have also been able to create the aforementioned bin-format test data for the CNN/Daily Mail articles and the vocab file (per the instructions here). However, I am not able to create my own test data to check how good the summary is. I have tried modifying the make_datafiles.py code to remove hard-coded values. I am able to create tokenized files, but the next step seems to fail. It would be great if someone could help me understand what url_lists is used for. Per the GitHub readme:
"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."
How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? If someone has had success with this, please do let me know how to go about this. Thanks in advance!
Update: I was able to figure out how to use my own data to create bin files for testing (and avoid using url_lists altogether).
This will be helpful: https://github.com/dondon2475848/make_datafiles_for_pgn
I will update this answer once I figure out how to fix ROUGE scoring for this.
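For reference, the URL-to-story mapping asked about in the question works, as far as I can tell from make_datafiles.py in the CNN/Daily Mail preprocessing repository, by hashing: each URL is hashed with SHA-1, and the story file is named `<hex digest>.story`. The sketch below reproduces that naming and writes a story in the same layout (article text, then each summary sentence preceded by an `@highlight` line), so any unique string can act as a "URL" for your own documents. `write_story` and `write_url_list` are hypothetical helpers, not part of the original script, and you would still need to run the tokenization step afterwards.

```python
import hashlib
import os
from typing import List


def hashhex(s: str) -> str:
    """SHA-1 hex digest of a URL, used to name the corresponding .story file."""
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def story_filename(url: str) -> str:
    return hashhex(url) + ".story"


def write_story(stories_dir: str, key: str, article: str, highlights: List[str]) -> str:
    """Write one story in CNN/Daily Mail layout; `key` is any unique string."""
    os.makedirs(stories_dir, exist_ok=True)
    path = os.path.join(stories_dir, story_filename(key))
    parts = [article.strip()]
    for sentence in highlights:
        parts.append("@highlight")   # marker preceding each summary sentence
        parts.append(sentence.strip())
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n\n".join(parts) + "\n")
    return path


def write_url_list(path: str, keys: List[str]) -> None:
    """Write the keys one per line, like all_test.txt."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(keys) + "\n")
```

For example, calling `write_story("my_stories", "my-doc-1", text, [summary])` for each document and then `write_url_list("all_test.txt", keys)` should give the script a URL list whose hashes match your story files.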

OFFIS DICOM scope toolkit Structured report link to image

Does anybody know how to create a structured report with a link to a related image, using the DICOM scope toolkit from the console (Ubuntu 16.04)?
The thing is that I have an image of some kind of trauma, and I have to connect it with a report which is in a text file. The final file should be in .dcm format and contain annotations and a link to the image. I have to use the DICOM scope program.
Maybe others refrain from answering because your question needs a very long answer. I cannot provide step-by-step instructions, but here are a few hints.
The way I would go (assuming that your image is available in DICOM format) is to:
1. Obtain a sample structured report. I think that the "simple" Basic Text SR is what you want to go for. You can find some samples here.
2. Convert the SR to an XML file using dsr2xml.
3. Edit the contents in XML. Do not forget to include your image reference in (0040,a730) Content Sequence -> (0008,1199) Referenced SOP Sequence.
4. Convert the XML back to a DICOM SR using xml2dsr.
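The dsr2xml / xml2dsr round trip described above could be scripted roughly as follows. This is a sketch that assumes the DCMTK command-line tools are installed and on PATH and that both tools accept an input file followed by an output file; `edit_xml` is a hypothetical function you would write yourself to insert the image reference into the XML.

```python
import subprocess
from typing import Callable, List


def dsr2xml_cmd(sr_in: str, xml_out: str) -> List[str]:
    # dsr2xml <DICOM SR input> <XML output>
    return ["dsr2xml", sr_in, xml_out]


def xml2dsr_cmd(xml_in: str, sr_out: str) -> List[str]:
    # xml2dsr <XML input> <DICOM SR output>
    return ["xml2dsr", xml_in, sr_out]


def roundtrip(sr_in: str, xml_path: str, sr_out: str,
              edit_xml: Callable[[str], None]) -> None:
    """Convert an SR to XML, let edit_xml modify the XML file in place
    (e.g. to add the image reference), then convert it back to DICOM."""
    subprocess.run(dsr2xml_cmd(sr_in, xml_path), check=True)
    edit_xml(xml_path)
    subprocess.run(xml2dsr_cmd(xml_path, sr_out), check=True)
```

With `check=True`, a failing conversion (for example, XML that xml2dsr rejects after your edit) raises `subprocess.CalledProcessError` instead of silently producing no output file.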
By the way: From your question, I did not really understand why you want to use a structured report, as you wrote that your report is plain text. Instead of digging into the complex structure of SR, you may want to consider exporting the report to an Encapsulated PDF document which can reference images as well.

TensorFlow example for text classification - how to evaluate your own text?

Does anyone have full steps and an example for passing your own text files to the TensorFlow text-classification example and evaluating them against the existing model that comes with the examples (using train.py as documented)?
Also, how would I train on a different input set of, say, 1000 text files of my own, and then use that model for new text files? I know there is documentation, but it is terse for someone who is not familiar with the text-classification process.
I was able to run the image example against my own images, as that only required swapping one .jpg file name for my new image file, but for text it seems to be more involved.
Thanks
Here is an example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification.py
You can set the flag test_with_fake_data to use the fake data in text_train.csv (training samples) and text_test.csv (testing samples) here. Next, you can modify these two files to include whatever data you'd like to have. You will need to do some preprocessing if your existing text files are in a different format.
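A possible preprocessing step for your own text files is sketched below. It assumes, based on my reading of the example, that each CSV row has no header and holds an integer class label in column 0, a title in column 1 and the document text in column 2; please check this against the data loader in your TensorFlow version. The folder layout `root/<integer label>/<name>.txt` and the function names are hypothetical.

```python
import csv
import os
from typing import List, Tuple


def collect_samples(root: str) -> List[Tuple[int, str, str]]:
    """Read root/<integer label>/<name>.txt into (label, title, text) tuples."""
    samples = []
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        for name in sorted(os.listdir(label_dir)):
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(label_dir, name), encoding="utf-8") as fh:
                text = " ".join(fh.read().split())  # collapse newlines and extra spaces
            samples.append((int(label), os.path.splitext(name)[0], text))
    return samples


def write_csv(path: str, samples: List[Tuple[int, str, str]]) -> None:
    """Write the samples as header-less CSV rows: label, title, text."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for label, title, text in samples:
            writer.writerow([label, title, text])
```

The `csv` module quotes fields that contain commas, so document text does not need to be escaped by hand.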
You need to load the vocabulary file saved during training and process your new text with it. See the eval.py file here.
Change the data parameters with your input text and proceed.
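To illustrate why the saved vocabulary matters: new text has to be mapped to the same word IDs the model saw during training. The sketch below is a plain-Python stand-in for that idea, not TensorFlow's own vocabulary processor; the lowercase word tokenizer, the JSON file format and the use of ID 0 for both unknown words and padding are my own assumptions.

```python
import json
import re
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    """Assign IDs in order of first appearance; ID 0 is reserved for unknowns."""
    vocab = {"<UNK>": 0}
    for text in texts:
        for tok in tokenize(text):
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(vocab, fh)


def load_vocab(path: str) -> Dict[str, int]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def transform(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map words to training-time IDs, truncating or zero-padding to max_len."""
    ids = [vocab.get(tok, 0) for tok in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))
```

For example, with a vocabulary built from "the cat sat" and "the dog", `transform("The bird sat", vocab, 5)` gives `[1, 0, 3, 0, 0]`: "bird" was never seen in training, so it maps to the unknown ID.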

Java converter from kml/shapefile to Geojson

I would like to write a command-line program in Java that takes a KML file or shapefile as input and outputs a GeoJSON file.
What I usually do is run ogr2ogr and convert my file manually.
Once I have the GeoJSON, I modify its content a little before writing the final GeoJSON.
I would like to skip the manual part and find an API that does the conversion for me.
Could anyone help, please?
Thanks
OSMBonusPack provides a KML+GeoJSON toolkit, with both a KML parser/writer and a GeoJSON parser/writer, all in Java.
This allows you to read KML content and write it as GeoJSON.
You can test this conversion using the demo app OSMNavigator.
It targets Android, so for your needs you would have to pick the relevant classes and remove the code sections you don't need (icon loading, overlay building and the Parcelable implementation, for instance).
