Standards for writing a grammar file for Sphinx 4 - cmusphinx

I want to write a grammar file for a speech recognition system. I am using Sphinx 4 for this, but I am confused by the syntax of grammar files. I have searched for how to write a .gram file but have been unable to find anything. Can anyone provide a link to the standards and syntax for writing grammar files?
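For reference, Sphinx 4 reads grammars in JSGF (the Java Speech Grammar Format), stored in .gram files. A minimal example of the syntax:

```jsgf
#JSGF V1.0;

grammar commands;

public <command> = <action> <object>;

<action> = open | close | delete;
<object> = [the] (window | file | menu);
```

Rules in angle brackets are named; `public` rules are the entry points the recognizer can match; `|` is alternation, `[...]` marks optional tokens, and `(...)` groups alternatives.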

Related

Create a C and C++ preprocessor using ANTLR

I want to create a tool that can analyze C and C++ code and detect unwanted behaviors, based on a config file. I thought about using ANTLR for this task, as I already built a simple compiler with it from scratch a few years ago (variables, conditionals, loops, and functions).
I grabbed C.g4 and CPP14.g4 from the ANTLR grammars repository. However, I noticed that they don't handle preprocessing, as that's a separate step in compilation.
I tried to find a grammar that does the preprocessing part (updated to ANTLR4), with no luck. Moreover, I understood that if I go with two-step parsing, I won't be able to retain the original location of each character, since I will have already modified the input stream.
I wonder if there's a good ANTLR grammar or program (preferably in Python, but I can deal with other languages as well) that can preprocess the C code for me. I also thought about using gcc -E, but then I wouldn't be able to inspect the macro definitions. For example, I want to warn when a user uses #pragma GCC (some students at my university, for whom I am writing this program, used it to bypass some of the course's coding-style restrictions). Moreover, gcc -E will include library header contents, which I don't want to process.
My question, therefore, is whether you can recommend a grammar or program that I can use to preprocess C and C++ code. Alternatively, if you can guide me on how to create such a grammar myself, that would be perfect. I was able to write basic #define and #pragma processing, but I am unable to deal with conditionals and with macro functions, as I'm unsure how to handle them.
Thanks in advance!
This question is almost off-topic, as it asks for an external resource. However, it also bears a part that deserves some attention.
The term "preprocessor" already indicates where the handling of macros and the like belongs. The parser never sees the disabled parts of the input, which also means those parts can contain anything, including text that is not part of the actual language being parsed. Hence a good approach for parsing C-like languages is to send the input through a preprocessor (which can be a specialized input stream) that strips out all preprocessing constructs, resolves macros, and removes disabled text. The parse position is not a problem: push the current token position before you open a new input stream and restore it when you are done with that stream. Store reported errors together with your input stream stack; this way you keep the correct token positions. I have used exactly this approach in my Windows resource file parser.
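To make the idea concrete, here is a minimal Python sketch of such a stripping preprocessor. It is an illustration only, not a full C preprocessor: it handles object-like #define and #ifdef/#ifndef/#else/#endif, and it keeps a line map from the preprocessed output back to the original source, which is the position-tracking idea described above. Function-like macros, #include, and #if expressions are deliberately left out.

```python
import re

def preprocess(source, predefined=None):
    """Strip conditional blocks and expand simple object-like macros,
    keeping a map from output line numbers back to input line numbers
    so later error reports can point at the original file."""
    macros = dict(predefined or {})
    active = [True]            # stack of #if/#else enable states
    out_lines, line_map = [], []

    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split(None, 1)
            name = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
            if name == "define" and all(active):
                defn = rest.split(None, 1)
                macros[defn[0]] = defn[1] if len(defn) > 1 else ""
            elif name == "ifdef":
                active.append(all(active) and rest.strip() in macros)
            elif name == "ifndef":
                active.append(all(active) and rest.strip() not in macros)
            elif name == "else":
                taken = active.pop()
                active.append(all(active) and not taken)
            elif name == "endif":
                active.pop()
            continue           # directives never reach the parser
        if all(active):
            # naive whole-word macro expansion
            for macro, value in macros.items():
                line = re.sub(r"\b%s\b" % re.escape(macro), value, line)
            out_lines.append(line)
            line_map.append(lineno)
    return "\n".join(out_lines), line_map
```

When the parser later reports an error at output line *k*, `line_map[k]` gives the position in the original file, which is exactly the "push and restore the token position" bookkeeping described above, done eagerly.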

Speech to Text Method Using Python

Good day. I am currently working on machine translation (Speech-(Text--Text)-Speech) with our local dialects, and I already have the speech and text corpora. However, I am facing a problem recording speech as input and transcribing it to a text file, because the modules available for speech recognition do not cover our dialects; mostly they support only English and other major languages.
Does anyone know how I can fix this? I would be honored to accept your valuable suggestions, and they would help me a lot with my studies. Thanks!
Working on speech recognition for under-resourced dialects is a big challenge, since the acoustic models frequently do not exist and have to be created from scratch. A good place to start is with one of the tutorials from http://voxforge.org. At that site you will find not only tutorials covering a number of audio decoders and model generators, but also a useful forum where students of languages other than English have found solutions to their own dialect problems.
A general plan might be as follows: build a simple English model following the examples, to get used to the terminology, concepts, and process involved. Once you have succeeded with English, you can turn your possession of a native corpus to advantage by building models for your own dialect. It is a reasonable goal and has been done many times before. Be warned, however, that to get good recognition across a broad vocabulary you will need a very comprehensive corpus.
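Whichever toolchain you pick, you will want a way to measure progress as you train against your corpus. Word error rate (WER) is the standard metric for this, and it is simple enough to sketch from scratch:

```python
def word_error_rate(reference, hypothesis):
    """WER: Levenshtein distance over words (substitutions, insertions,
    deletions) divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                       # delete all of ref[:i]
    for j in range(len(hyp) + 1):
        dp[0][j] = j                       # insert all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match/substitution
    return dp[len(ref)][len(hyp)] / len(ref)
```

Scoring every transcription in a held-out test set with this function gives you a single number to track as you grow the dialect corpus and retrain.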

Bulk load XML files into Cassandra

I'm looking into using Cassandra to store 50M+ documents that I currently have in XML format. I've been hunting around but I can't seem to find anything I can really follow on how to bulk load this data into Cassandra without needing to write some Java (not high on my list of language skills!).
I can happily write a script to convert this data into any format if it would make the loading easier, although CSV might be tricky, given that the body of a document could contain just about anything!
Any suggestions welcome.
Thanks
Si
If you're willing to convert the XML to a delimited format of some kind (e.g. CSV), then here are a couple of options:
The COPY command in cqlsh. This actually got a big performance boost in a recent version of Cassandra.
The cassandra-loader utility. This is a lot more flexible and has a bunch of different options you can tweak depending on the file format.
If you're willing to write code other than Java (for example, Python), there are Cassandra drivers available for a bunch of programming languages. No need to learn Java if you've got another language you're better with.
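As a sketch of the conversion step, here is one way to do it in Python's standard library. The document layout is an assumption for illustration (top-level `<id>` and `<body>` elements; adjust the tag names to your real schema); the point is that the `csv` module's quoting handles arbitrary body text, so "the body could contain anything" is not actually a problem for cqlsh's COPY:

```python
import csv
import xml.etree.ElementTree as ET

def xml_docs_to_csv(xml_paths, csv_path):
    """Flatten a batch of XML documents into one CSV file suitable for
    cqlsh's COPY command.  Assumes each file has top-level <id> and
    <body> children -- a hypothetical schema; adjust to your own."""
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)
        for path in xml_paths:
            root = ET.parse(path).getroot()
            doc_id = root.findtext("id")
            body = root.findtext("body")   # commas/newlines are safely quoted
            writer.writerow([doc_id, body])
```

Run over the 50M files in batches (or stream filenames from `os.walk`), then point `COPY keyspace.table (id, body) FROM 'out.csv'` at the result.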

Read .SAV or SPSS file from c# code

Could you please tell me whether it is possible to read a .SAV file from code-behind in C#? If so, could you guide me on the procedure and the required DLLs?
Also, can we convert .SAV to .CSV from C# code-behind?
SPSS makes available free libraries that provide APIs for reading and writing .sav files. They would need some slight wrapping for use with C#. You can obtain these libraries and their documentation from the SPSS Community website at www.ibm.com/developerworks/spssdevcentral, in the Downloads for SPSS Statistics section.

Generation of a .cpp file

I am new to Visual C++ and am using Microsoft Visual C++ 6.0 to build an application.
For now, the application has to generate a .cpp file from a proprietary .cfg file. Can anyone please guide me on how this can be achieved? Any help or guidance is much appreciated.
Thanks,
Viren
Your question is a little vague; however, it sounds like you need to develop some kind of parser that reads in the cfg files, translates them into some form of intermediate language or object graph, optimizes that, and then outputs it as C++. Sounds to me like a job for a home-grown compiler.
If you aren't familiar with the different phases of a compiler, I would highly recommend you check out the famous dragon book:
http://www.amazon.com/Compilers-Principles-Techniques-Tools-2nd/dp/0321486811/ref=sr_1_2?ie=UTF8&s=books&qid=1244657404&sr=8-2
Then again, if this is for an important project with a deadline, you probably don't have a lot of time to spend in the world of compiler theory. Instead you might want to check out ANTLR. It is really useful for creating a lexer and parser for you, based on grammar rules that you define from the syntax of the cfg files. You can use the ANTLR parser to translate the cfg files into an AST or some other form of object graph. At that point you are responsible for manipulating, optimizing, and outputting the C++ syntax to a new file.
I haven't read it yet, but this is supposed to be an excellent book for novice and experienced ANTLR users:
http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
Plus, there are plenty of ANTLR tutorials and examples online that I've used to help learn it. Hope that helps put you in the right direction.
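Since the .cfg format here is proprietary and unspecified, here is a deliberately simplified sketch of the translate-and-emit step, assuming a made-up key=value format purely for illustration. The real format would need its own parser (hand-written or ANTLR-generated), but the output stage of the generator would look similar:

```python
def cfg_to_cpp(cfg_text):
    """Translate a simple key=value .cfg (an assumed format -- the real
    proprietary format needs its own parser) into C++ constant
    definitions, returned as the text of a generated .cpp file."""
    lines = ["// auto-generated from .cfg -- do not edit",
             "#include <string>", ""]
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # allow trailing comments
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value.lstrip("-").isdigit():
            lines.append("const int %s = %s;" % (key, value))
        else:
            lines.append('const std::string %s = "%s";' % (key, value))
    return "\n".join(lines) + "\n"
```

The design point is the separation: parse the cfg into data first, then have a single emit pass own all knowledge of C++ syntax, so either side can change independently.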
