Pocketsphinx setKeywordThreshold() issue - cmusphinx

I am thinking to use pocketsphinx offline speech recognition for my app but its documentation is not clear. If anybody can give answers of following question then it will really help me a lot.
What is the role (use) of setKeywordThreshold(1e-5f) method. What is minimum and maximum value allowed in this method.
I want to give support for different languages and find in built acoustic models for some languages on this link http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/. but i cant understand which model will be best for which language because of lag of documentation. Can anybody please suggest me best in-build acoustic models for following languages -
(a). Australian English
(b). American English
(c). British English
(d). Canadian English
(e). European English
(f). Indian English
(g). Irish English
(h). New Zealand English
(i). South African English
(j). Russian
(k). Spanish
(l). French
(m). Dutch
(n). German
I just want to recognize numbers from 1 to 200 in each language. What is the best way to do this ?
I created a digits.gram file to recognize digits from 1 to 99 but it recognize background voice also. For example, When any background voice of drill machine occur then it recognize it as one. How could we recognize digits only when that particular digits is spoken ?
digits.gram file
#JSGF V1.0;
grammar digits;
<single> = one | two | three | four | five | six | seven | eight | nine ;
<digit> = <single> |
zero |
ten |
eleven |
twelve |
thirteen |
fourteen |
fifteen |
sixteen |
seventeen |
eighteen |
nineteen |
twenty |
thirty |
forty |
fifty |
sixty |
seventy |
eighty |
ninety |
twenty <single> |
thirty <single> |
forty <single> |
fifty <single> |
sixty <single> |
seventy <single> |
eighty <single> |
ninety <single> ;

The best way to solve problem 4 is to add a keyword to start the recognition.
When you have a keyword than you can suggest that user knows how to use your system and will say "hello, Pocketsphinx" before the real command.
So can try:
Use a keyword.
Filter the output by a confidence that should be returned by a decoder.
Also you can add few more common words as fallback to your dictionary so Pocketsphinx will match them instead your "correct" list, maybe this will increase accuracy. (but it can be even worth, you should play with it to find the best way to solve your scenario)

Related

CMU Sphinx4 - Custom Language Model

I have a very specific requirement. I am working on an application which will allow users to speak their employee number which is of the format HN56C12345 (any alphanumeric characters sequence) into the app. I have gone through the link: http://cmusphinx.sourceforge.net/wiki/tutoriallm but I am not sure if that would work for my usecase.
So my question is three-folds :
Can Sphinx4 actually recognize an alphanumeric sequence with high accuracy like an emp number in my case?
If yes, can anyone point me to a concrete example / reference page where someone has built custom language support in Sphinx4 from scratch. I haven't found a detailed step-by-step doc yet on this. Did anyone work on alphanumeric sequence based dictionaries or language models?
How to build an acoustic model for this scenario?
You don't need a new acoustic model for this, but rather a custom grammar. See http://cmusphinx.sourceforge.net/wiki/tutoriallm#building_a_grammar and http://cmusphinx.sourceforge.net/doc/sphinx4/edu/cmu/sphinx/jsgf/JSGFGrammar.html to learn more. Sphinx4 recognizes characters just fine if you put them space-separated in the grammar:
#JSGF V1.0
grammar jsgf.emplID;
<digit> = zero | one | two | three | four | five | six | seven | eight | nine ;
<digit2> = <digit> <digit> ;
<digit4> = <digit2> <digit2> ;
<digit5> = <digit4> <digit> ;
// This rule accepts IDs of a kind: hn<2 digits>c<5 digits>.
public <id> = h n <digit2> c <digit5> ;
As to accuracy, there are two ways to increase it. If the numbers of employees isn't too large, you can just make the grammar with all possible employee IDs. If this is not your case, than to have a generic grammar is your only option. Although it's possible to make a custom scorer which will use the context information to predict the employee ID better than the generic algorithm. This way requires some knowledge in both ASR and CMU Sphinx code.

Create (mathematical) function from set of predefined values

I want to create an excel table that will help me when estimating implementation times for tasks that I am given. To do so, I derived 4 categories in which I individually rate the task from 1 to 10.
Those are: Complexity of system (simple scripts or entire business systems), State of requirements (well defined or very soft), Knowledge about system (how much I know about the system and the code base) and Plan for implementation (do I know what to do or don't I have any plan what to do or where to start).
After rating each task in these categories, I want to have a resulting factor of how expensive and how long the task will likely take, as a very rough estimate that I can tell my bosses.
What I thought about doing
I thought to create a function where I define the inputs and then get the result in form of a number, see:
| a | b | c | d | Result |
| 1 | 1 | 1 | 1 | 160 |
| 5 | 5 | 5 | 5 | 80 |
| 10 | 10 | 10 | 10 | 2 |
And I want to create a function that, when given a, b, c, d will produce the results above for the extreme cases (max, min, avg) and of course any values (float) in between.
How can I go about doing this? I imagine this is some form of polynomial problem, but how can I actually create the function that creates these results?
I have tasks like this often, so it would be cool to have a sort of pattern to follow whenever I need to create such functions for any amount of parameters and results needed.
I tried using wolfram alphas interpolate polynomial command for this, but the result is just a mess of extremely large fractions...
How can I create this function properly with reasonable results?
While writing this edit, I realize this may be better suited over at programmers.SE - If no one answers here, I will move the question there.
You don't have enough data as it is. The simplest formula which takes into account all your four explanatory variables would be linear:
x0 + x1*a + x2*b + x3*c + x4*d
If you formulate a set of equations for this, you have three equations but five unknowns, which means that you don't have a unique solution. On the other hand, the data points which you did provide are proof of the fact that the relation between scores and time is not exactly linear. So you might have to look at some family of functions which is even more complex, and therefore has even more parameters to tune. While it would be easy to tune parameters to match the input, that choice would be pretty arbitrary, and therefore without predictive power.
So while your system of four distinct scores might be useful in the long run, I'd not use that at the moment. I'd suggest you collect some more data points, see how long a given task actually did take you, and only use that fine-grained a model once you have enough data points to fit all of its parameters.
In the meantime, aggregate all four numbers into a single number. E.g. by taking their average. Then decide on a formula to choose. E.g. a quadratic one:
182 - 22.9*a + 0.49*a*a
That's a fair fit for your requirements, and not too complex or messy. But the choice of function, i.e. a polynomial one, is still pretty arbitrary. So revisit that choice once you have more data. Note that this polynomial is almost the one Wolfram Alpha found for your data:
1642/9 - 344/15*a + 22/45*a*a
I only converted these rational numbers to decimal notation, which I truncated pretty early on since all of this is very rough in any case.
On the whole, this question appears more suited to CrossValidated than to Programmers SE, in my opinion. But don't bother them unless you have sufficient data to actually fit a model.

Multi dimensional Scenario Outlines in Specflow

I'm creating a Scenario Outline similar to the following one (it is a simplified version but gives a good indication of my problem):
Given I have a valid operator such as 'MyOperatorName'
When I provide a valid phone number for the operator
And I provide an '<amount>' that is of the following '<type>'
And I send a request
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Examples:
| type | description | amount |
| Negative | An amount that is negative | -1.0 |
| Zero | An amount that is equal to zero | 0 |
| ......... | .......... | .... |
The Examples table provides the test data that I need but I would add another Examples table with just the names of the operators (instead of MyOperatorName) in order to replicate the tests for different operators
Examples:
| operator |
| op_numb_1 |
| op_numb_2 |
| op_numb_3 |
in order to avoid repeating the same scenario outline three times; I know that this is not possible but I'm wondering what is the best approach to avoid using three different scenario outlines inside the feature that are pretty the same apart from the operator name.
I know that I can reuse the same step definitions but I'm trying to understand if there is a best practice to prevent cluttering the feature with scenarios that are too much similar.
Glad you know this isn't possible...
So what options are there?
Seems like there are 5:
a: Make a table with every option (the cross product)
Examples:
| type | description | amount | operator |
| Negative | An amount that is negative | -1.0 | op_numb_1 |
| Zero | An amount that is equal to zero | 0 | op_numb_1 |
| Negative | An amount that is negative | -1.0 | op_numb_2 |
| Zero | An amount that is equal to zero | 0 | op_numb_2 |
| ......... | .......... | .... | ... |
b. Repeat the scenario for each operator, with a table of input rows
- but you said you didn't want to do this.
c. Repeat the scenario for each input row, with a table of operators
- I like this option, because each rule is a separate test. If you really, really want to ensure that every different implementation of your "operator" strategy passes and fails in the same validation scenarios, then why not write each validation scenario as a single Scenario Outline: e.g.
Scenario Outline: Operators should fail on Negative inputs
Given I have a valid operator such as 'MyOperatorName'
When I provide a valid phone number for the operator
And I send a request with the amount "-1.0"
Then the following validation message will be displayed: 'The Format of Amount is not valid'
And the following Status Code will be received: 'AmountFormatIsInvalid'
Scenario Outline: Operators should fail on Zero inputs
...etc...
d. Rethink how you are using Specflow - if you only need KEY examples to illustrate your features (as described by Specification by Example by Gojko Adzic), then you are overdoing it by checking every combination. If however you are using specflow to automate your full suite of integration tests then your scenarios could be appropriate... but you might want to think about e.
e. Write integration / unit tests based on the idea that your "operator" validation logic is applied only in one place. If the validation is the same on each operator, why not test it once, and then have all the operators inherit from or include in their composition the same validator class?

SpecFlow/Cucumber/Gherkin - Using tables in a scenario outline

Hopefully I can explain my issue clearly enough for others to understand, here we go, imagine I have the two following hypothetical scenarios:
Scenario: Filter sweets by king size and nut content
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.KingSize | True |
| Filter.ContainsNuts | False |
Then I should see :
| Value |
| Yorkie King Size |
| Mars King Size |
Scenario: Filter sweets by make
Given I am on the "Sweet/List" Page
When I filter sweets by
| Field | Value |
| Filter.Make | Haribo |
Then I should see :
| Value |
| Starmix |
These scenarios are useful because I can add as many When rows of Field/Value and Then Value entries as I like without changing the associated compiled test steps. However copy/pasting scenarios for different filter tests will become repetitive and take up alot of code - something I would like to avoid. Ideally I would like to create a scenario outline and keep the dynamic nature I have with the tests above, however when I try to do that I run into a problem defining the example table I cant add new rows as I see fit because that would be a new test instance, currently I have this:
Scenario Outline: Filter Sweets
Given I am on the <page> Page
When I filter chocolates by
| Field | Value |
| <filter> | <value> |
Then I should see :
| Output |
| <output> |
Examples:
| page | filter | value | output |
| Sweet/List | Filter.Make | Haribo | Starmix |
So I have the problem of being able to dynamically add rows to my filter and expected data when using a scenario outline, is anyone aware of a way around this? Should I be approaching this from a different angle?
A workaround could be something like :
Then I should see :
| Output |
| <x> |
| <y> |
| <z> |
Examples:
| x | y | z |
But thats not very dynamic.... hoping for a better solution? :)
I don't think what you're asking for is possible with SpecFlow, Gherkin, and out-of-the-box Cucumber. I can't speak for the authors, but I bet it purposely is not meant to be used this way because it goes against the overall "flow" of writing and implementing these specs. Among many things, the specs are meant to be readable to non-programmers, to give the programmer a guide to implement code that matches the specs, for integration testing, and to give a certian amount of flexibility when refactoring.
I think this is one of the situations where the pain you're feeling is a sign that there's a problem, but it may not be the one you think. You said:
"However copy/pasting scenarios for different filter tests will become repetitive and take up alot of code - something I would like to avoid. "
First, I'd disagree that explaining yourself in writing is "repetitive," at least any more than it's repetitive to use specific words like "the, apple, car, etc." over and over again. The issue is: Are these words properly explaining what you're doing? If they are, and explaining your situation requires you to write out multiple scenarios, then that's just what it requires. Communication requires words, and sometimes the same ones.
In fact, what you call "repetitive" is one of the benefits of using Gherkin and a tool like Cucumber or SpecFlow. If you're able to use that sentence over and over and over and over, it means you're not having to write the test code over and over and over and over.
Second, are you sure you're writing a spec for the right thing? I ask only because if the number of scenarios gets out-of-hand, to the point where you have so many that a human can't follow what you write, it's possible that your spec isn't targeted at the right thing.
A possible example of this could be how you're testing the filtering and the pagination in this scenario. Yes, you want your specs to cover full features and your site will have pagination on the same page as your filtering, but at what cost? It takes experience and practice to know when giving up on the supposed "ideal" of no-mocking, full-integration tests will yield better results.
Third, don't think that specs are meant to be perfect coverage for every possible scenario. The scenarios are basically snapshots of state, which means that there are some features that could cover an infinitely-large set of scenarios, which is impossible. So what do you do? Write features that tell the story as best you can. Even let the story drive the development. However, details that don't translate to your specs or other cases are best left to straight-up TDD, done in addition to the specs.
In your example, it seems that you basically are telling a story about a site that lets a user create a dynamic search against sweets and candy. They enter one of a large set of possible search criteria, click a button, and get results. Just stick to that, writing only enough specs to fulfill the story. If you're not satisfied with your coverage, clean it up with more specs or unit tests.
Anyway, that's just my thoughts, hope it helps.
Technically, I think you could try calling steps from within a step definition:
Calling Steps from Step Definitions
For example I think you could rewrite the
Then I should see :
| Output |
| <output> |
To be a custom step like
I should have output that contains <output>
Where output is a comma separated list of expected values. In the custom step you could break the comma separated list into an array and iterate over it calling
Then "I should see #{iterated_value}"
You could use a similar technique to pass in lists of filters and filter values. Your example row for the king size test might look like
| page | filter | value | output |
| Sweet/List | Filter.KingSize, Filter.ContainsNuts | True, False | Yorkie King Size, Mars King Size |
Or maybe
| page | filter-value-pairs | output |
| Sweet/List | Filter.KingSize:True, Filter.ContainsNuts:False | Yorkie King Size, Mars King Size |
That being said, you should perhaps take Darren's words to heart. I'm not really sure that this method would help the ultimate goal of having scenarios that are readable by non-developers.

Shotgun vs Sequential Input Layout

I would like to determine which of the two layouts below is the better layout. I would like usability to be the main concern. Which one is better (in terms of usability) and why is it better?
Shotgun
Use as much of the horizontal screen width as possible without causing horizontal scrolling to occur. Obvious benefit is that vertical scrolling will be minimized/eliminated and screen real estate is maximized.
Sequential
One input per line. Downside is that there could be significantly more scrolling than the Shotgun layout.
Shotgun Sequential
|----------------------------------| |-----------------------------------|
| | | |
| Input1: ______ Input2: ______ | | Input1: ______ |
| | vs | |
| Input3: ______ Input4: ______ | | Input2: ______ |
| | | |
|----------------------------------| | Input3: ______ |
| |
| Input4: ______ |
| |
|-----------------------------------|
The sequential has better usability.
In both layouts user discerns lines. In the Shotgun case each line is about two things which requires extra mental processing to understand. In the Sequential case each line is about a single concept which is simpler.
Having more than one concept on a line not only divides attention but also takes additional brain power to identify possible relations between the concepts, to analyze whether the inputs are meant to be related until the analysis subroutine says "no".
As a general rule, dense interfaces with high ratio of elements per space area are more tiring and slowing down than "white space" interfaces. Elements include any UI entity, be it an active input element, a passive textual comment or a graphical element.
I would agree with New in town, with the exception of times when the fields make more sense to be beside each other. Such as when you are entering first and last names for a number of people, such as:
First Name: __________ Last Name: __________
First Name: __________ Last Name: __________
First Name: __________ Last Name: __________
If these were to be in a sequential, it would be much harder to understand and group the fields together (in your head).

Resources