Parameters for configuring the decoder in pocketsphinx - cmusphinx

I have started studying PocketSphinx.
I have a list of the possible parameters for configuring the decoder, but there is no explanation of what each parameter controls. The CMUSphinx tutorial covers only a small part of them, and that is not enough for me. Does anybody have material explaining which parameter is responsible for which setting? I will be very grateful for any help!
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+00
-alpha 0.97 9.700000e-01
-ascale 20.0 2.000000e+01
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-48
-bestpath yes yes
-bestpathlw 9.5 9.500000e+00
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-08
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-64
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+00
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-29
-fwdtree yes yes
-hmm
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+00
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+02
-lpbeam 1e-40 1.000000e-40
-lponlybeam 7e-29 7.000000e-29
-lw 6.5 6.500000e+00
-maxhmmpf -1 -1
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-07
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+00
-pbeam 1e-48 1.000000e-48
-pip 1.0 1.000000e+00
-pl_beam 1e-10 1.000000e-10
-pl_pbeam 1e-5 1.000000e-05
-pl_window 0 0
-rawlogdir
-remove_dc no no
-round_filters yes yes
-samprate 16000 1.600000e+04
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-03
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-04
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+03
-usewdphones no no
-uw 1.0 1.000000e+00
-var
-varfloor 0.0001 1.000000e-04
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-29
-wip 0.65 6.500000e-01
-wlen 0.025625 2.562500e-02

Type pocketsphinx_continuous on the command line and you will get the full parameter list along with default values and descriptions, like this:
Arguments list definition:
[NAME] [DEFLT] [DESCR]
-adcdev Name of audio device to use for input.
-agc none Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
-agcthresh 2.0 Initial threshold for automatic gain control
-allphone Perform phoneme decoding with phonetic lm
-allphone_ci yes Perform phoneme decoding with phonetic lm and context-independent units only
-alpha 0.97 Preemphasis parameter
-argfile Argument file giving extra arguments.
-ascale 20.0 Inverse of acoustic model scale for confidence score calculation
...

There is no need to install pocketsphinx.
The whole list is in the source repo:
https://github.com/cmusphinx/pocketsphinx/blob/master/doc/pocketsphinx_continuous.1

I'd also like to add that short descriptions of the parameters are not very helpful, since most of them are parameters of the complex algorithms used in speech recognition, such as Gaussian selection or trellis search. If you are interested in the details, you had better read about the algorithms themselves. A good source is the thesis by Dr. Mosur K. Ravishankar:
Efficient Algorithms for Speech Recognition
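As a side note, these flags can also be collected in an argument file and passed via the -argfile option listed above, instead of being typed on the command line each time. A hypothetical sketch of such a file (the model paths are placeholders, not real files):

```
-hmm /path/to/acoustic-model
-lm /path/to/language-model.lm
-dict /path/to/pronunciation.dict
-samprate 16000
-logfn decode.log
```

One flag and its value per line; it would then be used as pocketsphinx_continuous -argfile decoder.args.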

Related

How to calculate confidence intervals for crude survival rates?

Let's assume that we have a survfit object as follows.
fit = survfit(Surv(data$time_12m, data$status_12m) ~ data$group)
fit
Call: survfit(formula = Surv(data$time_12m, data$status_12m) ~ data$group)
n events median 0.95LCL 0.95UCL
data$group=HF 10000 3534 NA NA NA
data$group=IGT 70 20 NA NA NA
The fit object does not show CIs. How can I calculate confidence intervals for the survival rates? Which R packages and code should be used?
The print result of survfit gives confidence intervals by group for median survival time. I'm guessing the NAs for the estimates of median times occur because your groups do not have enough events to actually reach median survival. You should show the output of plot(fit) to see whether my guess is correct.
You might try to plot the KM curves, noting that the plot.survfit function does have a confidence interval option constructed around proportions:
plot(fit, conf.int=0.95, col=1:2)
Please read ?summary.survfit. It is one of the generic summary functions that package authors typically use to deliver parameter estimates and confidence intervals. There you will see that it is not "rates" which are summarized by summary.survfit, but rather estimates of survival proportion. These estimates can either be medians (in which case the estimate is on the time scale) or estimates at particular times (in which case they are proportions).
If you actually do want rates, then use a function designed for that sort of model, perhaps starting with ?survreg. Compare what you get from survreg versus survfit on the supplied dataset ovarian:
> reg.fit <- survreg( Surv(futime, fustat)~rx, data=ovarian)
> summary(reg.fit)
Call:
survreg(formula = Surv(futime, fustat) ~ rx, data = ovarian)
Value Std. Error z p
(Intercept) 6.265 0.778 8.05 8.3e-16
rx 0.559 0.529 1.06 0.29
Log(scale) -0.121 0.251 -0.48 0.63
Scale= 0.886
Weibull distribution
Loglik(model)= -97.4 Loglik(intercept only)= -98
Chisq= 1.18 on 1 degrees of freedom, p= 0.28
Number of Newton-Raphson Iterations: 5
n= 26
#-------------
> fit <- survfit( Surv(futime, fustat)~rx, data=ovarian)
> summary(fit)
Call: survfit(formula = Surv(futime, fustat) ~ rx, data = ovarian)
rx=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
59 13 1 0.923 0.0739 0.789 1.000
115 12 1 0.846 0.1001 0.671 1.000
156 11 1 0.769 0.1169 0.571 1.000
268 10 1 0.692 0.1280 0.482 0.995
329 9 1 0.615 0.1349 0.400 0.946
431 8 1 0.538 0.1383 0.326 0.891
638 5 1 0.431 0.1467 0.221 0.840
rx=2
time n.risk n.event survival std.err lower 95% CI upper 95% CI
353 13 1 0.923 0.0739 0.789 1.000
365 12 1 0.846 0.1001 0.671 1.000
464 9 1 0.752 0.1256 0.542 1.000
475 8 1 0.658 0.1407 0.433 1.000
563 7 1 0.564 0.1488 0.336 0.946
It might have been easier if I had used "exponential" instead of "weibull" as the distribution type. Exponential fits have a single estimated parameter and are more easily back-transformed to give estimates of rates.
Note: I answered an earlier question about survfit, although the request was for survival times rather than for rates. Extract survival probabilities in Survfit by groups
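For intuition about where those std.err and CI columns come from: summary.survfit's proportions are Kaplan-Meier estimates, with variances from Greenwood's formula. The sketch below redoes that arithmetic in plain Python on toy data shaped like the rx=1 group above (plain-scale intervals; survfit's default conf.type="log" transforms the interval, so the printed bounds differ slightly near 0 and 1):

```python
import math

# Toy right-censored data shaped like the ovarian rx=1 group above:
# (time, event) pairs, event = 1 means the death was observed.
data = [(59, 1), (115, 1), (156, 1), (268, 1), (329, 1), (431, 1),
        (448, 0), (477, 0), (638, 1), (803, 0), (855, 0), (1040, 0), (1106, 0)]

def km_with_ci(obs, z=1.96):
    """Kaplan-Meier estimates with plain-scale Greenwood 95% CIs."""
    obs = sorted(obs)
    at_risk = len(obs)
    surv, greenwood = 1.0, 0.0
    rows = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        tied = [e for (tt, e) in obs[i:] if tt == t]   # all records at time t
        d = sum(tied)                                  # events at time t
        if d:
            surv *= 1 - d / at_risk                    # KM product-limit step
            greenwood += d / (at_risk * (at_risk - d)) # Greenwood running sum
            se = surv * math.sqrt(greenwood)
            rows.append((t, at_risk, d, surv,
                         max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= len(tied)                           # censored also leave the risk set
        i += len(tied)
    return rows

rows = km_with_ci(data)
for r in rows:
    print("t=%4d n.risk=%2d events=%d S=%.3f  95%% CI (%.3f, %.3f)" % r)
```

The survival column reproduces the 0.923, 0.846, ... sequence from the R output above; only the interval endpoints differ, because of the plain-versus-log scale choice.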

Gnuplot - How to join smoothly ordered points?

I've a set of data in three columns:
1st column: order criterion between 0 and 1
2nd: x vals
3rd: y vals
As a data file example:
0.027 -29.3 -29.6
0.071 -26.0 -31.0
0.202 -14.0 -32.8
0.304 -3.4 -29.3
0.329 -0.5 -26.0
0.409 6.7 -14.0
0.458 11.7 -3.4
0.471 12.8 -0.5
0.495 12.5 6.7
0.588 18.8 11.7
0.600 20.4 12.8
0.618 20.8 12.5
0.674 20.9 18.8
0.754 22.1 20.4
0.810 27.0 20.8
0.874 24.7 20.9
0.892 9.4 22.1
0.911 -11.5 27.0
0.943 -23.7 24.7
0.962 -29.6 9.4
0.991 -31.0 -11.5
0.999 -32.8 -23.7
My goal is to plot the (x,y) points and a trend curve passing through each point, ordered ascending by the first-column values.
I use the following script:
set terminal png small size 600,450
set output "my_data_mcsplines_joined_points.png"
set table "table_interpolation.dat"
plot 'my_data.dat' using 2:3 smooth mcsplines
unset table
plot 'my_data.dat' using 2:3:(sprintf("%'.3f", $1)) with labels point pt 7 offset char 1,1 notitle ,\
"table_interpolation.dat" with lines notitle
Here mcspline results as an example:
mcspline joined points figure
The resulting curve should have the shape of a spindle or a loop.
Whatever smooth option I use, Gnuplot seems unable to produce this.
Unfortunately, most smooth options (mcsplines, csplines, ...) first sort the data monotonically in x.
How can I plot a trend curve passing through each point, ordered ascending by the first-column values?
Thanks.
I cannot post an image in a comment, so I place it here. I don't think a 2D plot will be sufficient, based on this 3D scatterplot of the data in your question.
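Another route, if a 2D loop is really wanted: since the first column already orders the points, one can interpolate x(t) and y(t) separately and only then plot the resulting (x,y) pairs with lines, i.e. a parametric spline rather than gnuplot's x-sorted smooth. A pure-Python sketch using Catmull-Rom splines (the abbreviated point list is from the question):

```python
# Parametric smoothing: treat column 1 as a parameter t and interpolate
# x(t) and y(t) independently with Catmull-Rom splines.
points = [  # (t, x, y) rows from the question, abbreviated
    (0.027, -29.3, -29.6), (0.071, -26.0, -31.0), (0.202, -14.0, -32.8),
    (0.304,  -3.4, -29.3), (0.329,  -0.5, -26.0), (0.409,   6.7, -14.0),
]

def catmull_rom(pts, samples_per_seg=10):
    """Dense (x, y) polyline through pts in t-order; passes through every point."""
    pts = sorted(pts)                      # order by the parameter column
    def cr(p0, p1, p2, p3, u):
        # Standard Catmull-Rom basis evaluated at u in [0, 1).
        return 0.5 * (2*p1 + (p2 - p0)*u + (2*p0 - 5*p1 + 4*p2 - p3)*u**2
                      + (3*p1 - p0 - 3*p2 + p3)*u**3)
    out = []
    n = len(pts)
    for i in range(n - 1):
        # Clamp the ends by duplicating the first/last control points.
        p0, p1, p2, p3 = (pts[max(i-1, 0)], pts[i], pts[i+1], pts[min(i+2, n-1)])
        for s in range(samples_per_seg):
            u = s / samples_per_seg
            out.append((cr(p0[1], p1[1], p2[1], p3[1], u),
                        cr(p0[2], p1[2], p2[2], p3[2], u)))
    out.append((pts[-1][1], pts[-1][2]))
    return out

curve = catmull_rom(points)
for x, y in curve:
    print("%f %f" % (x, y))
```

Redirecting the output to a file and plotting it in gnuplot with lines gives a smooth curve that passes exactly through every input point, in first-column order.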

replacing specific lines below the line containing a certain string using sed inplace editing in linux

I am trying to script the automatic input of a file, which is as follows:
*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE
$# cid title
$# ssid msid sstyp mstyp sboxid mboxid spr mpr
1 2 3 3 0 0 0 0
$# fs fd dc vc vdc penchk bt dt
0.0100 0.000 0.000 0.000 0.000 0 0.000 1.0000E+7
$# sfs sfm sst mst sfst sfmt fsf vsf
1.000000 1.000000 0.000 0.000 1.000000 1.000000 1.000000 1.000000
*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE
$# cid title
$# ssid msid sstyp mstyp sboxid mboxid spr mpr
1 3 3 3 0 0 0 0
$# fs fd dc vc vdc penchk bt dt
0.0100 0.000 0.000 0.000 0.000 0 0.000 1.0000E+7
$# sfs sfm sst mst sfst sfmt fsf vsf
1.000000 1.000000 0.000 0.000 1.000000 1.000000 1.000000 1.000000
I want to change the fifth line after the string
*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE
with a line from another file, frictionValues.txt.
What I am using is as follows:
sed -i -e '/^\*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE/{n;n;n;n;n;R frictionValues.txt' -e 'd}' input.txt
but this changes all five lines after the string, and it reads the values twice from frictionValues.txt. I want it to read only the first line and then copy it at every instance where it finds the string. Can anybody tell me how to do this using sed with in-place editing?
This might work for you (I might be well off the mark as to what you want!):
sed '1s|.*|1{x;s/^/&/;x};/^\*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE/{n;n;n;n;n;G;s/.*\\n//}|;q' frictionValues.txt |
sed -i -f - input.txt
Explanation:
Build a sed script from the first line of frictionValues.txt that stuffs that first line into the hold space (HS). The remaining script is as before, but instead of R frictionValues.txt it appends the HS to the pattern space using G.
Run the above sed script against the input.txt file using the -f - switch; the sed script is passed via stdin from the previous pipeline.
Try with this:
Content of frictionValues.txt:
monday
tuesday
Content of input.txt will be the same that you pasted in the question.
Content of script.sed:
## Match literal string.
/^\*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE/ {
## Append next five lines.
N
N
N
N
N
## Delete the last one.
s/\(^.*\)\n.*$/\1/
## Print the rest of lines.
p
## Queue a line from external file.
R frictionValues.txt
## Read next line (it will be the external one).
b
}
## Print line.
p
Run it like:
sed -nf script.sed input.txt
With the following result:
*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE
$# cid title
$# ssid msid sstyp mstyp sboxid mboxid spr mpr
1 2 3 3 0 0 0 0
$# fs fd dc vc vdc penchk bt dt
monday
$# sfs sfm sst mst sfst sfmt fsf vsf
1.000000 1.000000 0.000 0.000 1.000000 1.000000 1.000000 1.000000
*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE
$# cid title
$# ssid msid sstyp mstyp sboxid mboxid spr mpr
1 3 3 3 0 0 0 0
$# fs fd dc vc vdc penchk bt dt
tuesday
$# sfs sfm sst mst sfst sfmt fsf vsf
1.000000 1.000000 0.000 0.000 1.000000 1.000000 1.000000 1.000000
Here is a two-step approach.
First, find the line number containing your matching text:
linenum=`grep -n -m 1 '\*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE' input.txt | cut -d: -f1`
Now, combine sed commands to replace based on that line number:
change the data at linenum+5 to the value from frictionValues.txt, and also delete the original data at linenum+5:
sed -e "$((linenum+5)) c `cat frictionValues.txt`" -e "$((linenum+5)) d" input.txt
Assumptions
frictionValues.txt - has only one line
You are using one of the modern Linux OSs
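For comparison, the "first line of frictionValues.txt at every occurrence" behaviour the question asks for is arguably simpler to express in awk than in sed. A sketch, with abbreviated stand-ins for the question's two files:

```shell
# Stand-ins for the question's files (input abbreviated to 6 lines per block):
printf 'monday\ntuesday\n' > frictionValues.txt
printf '%s\n' \
  '*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE' \
  '$# cid title' '$# ssid msid' '1 2 3 3' '$# fs fd' '0.0100 0.000' \
  '*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE' \
  '$# cid title' '$# ssid msid' '1 3 3 3' '$# fs fd' '0.0100 0.000' > input.txt

# Remember the first line of the first file, then copy the second file,
# swapping in that line 5 lines after every match.
awk 'NR == FNR { if (FNR == 1) repl = $0; next }
     /^\*CONTACT_FORMING_ONE_WAY_SURFACE_TO_SURFACE/ { hit = FNR }
     hit && FNR == hit + 5 { print repl; hit = 0; next }
     { print }' frictionValues.txt input.txt > output.txt
```

To edit in place, redirect to a temporary file and mv it over input.txt (or use GNU awk's -i inplace extension, available since gawk 4.1).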

Gnuplot Graph Is Disjointed (Data Seems Shifted/Misordered)

Gnuplot gives me the following picture, with an odd disjoint in the second graph whose origin I cannot determine. I've included the data below, in which the x-values are monotonically increasing, which should rule out the possibility of such a disjoint. Any help appreciated!
Generated from the following script:
set size 0.8,0.4
set lmargin 1
set terminal png
set output "test.png"
set multiplot
set origin 0.1,0.1
set xtics 5
set xrange[0:25]
set xlabel "Year"
plot "./g1" u ($1+1):2 w lines t "4 years"
set xlabel ""
set origin 0.1,0.5
set xtics format ""
set x2tics 5
plot "./g2" u ($1+1):2 w lines t "5 years"
unset multiplot
Data for g1 is:
0.000000 1.000000
1.000000 3.000000
2.000000 9.000000
3.000000 27.000000
4.000000 0.809131
5.000000 2.427394
6.000000 7.282183
7.000000 21.846549
8.000000 0.654694
9.000000 1.964081
10.000000 5.892243
11.000000 8.935199
12.000000 0.529733
13.000000 1.589200
14.000000 3.983240
15.000000 2.509780
16.000000 0.428624
17.000000 1.233139
18.000000 1.951804
19.000000 0.595792
20.000000 0.343980
21.000000 0.809600
22.000000 0.729229
23.000000 0.171423
24.000000 0.258384
25.000000 0.426250
Data for g2 is:
0.000000 1.000000
1.000000 3.000000
2.000000 9.000000
3.000000 27.000000
4.000000 81.000000
5.000000 2.427394
6.000000 7.282183
7.000000 21.846549
8.000000 65.539647
9.000000 196.618942
10.000000 5.892243
11.000000 17.676730
12.000000 53.030190
13.000000 159.090569
14.000000 241.250367
15.000000 14.302798
16.000000 42.908394
17.000000 128.725182
18.000000 322.642448
19.000000 203.292210
20.000000 34.718531
21.000000 104.155593
22.000000 299.652772
23.000000 474.288428
24.000000 144.777335
25.000000 84.275565
That's strange. On my system (Ubuntu 11.10, 64-bit) I don't see the problem you have:
$ gnuplot --version
gnuplot 4.4 patchlevel 3
$ gnuplot < a.gnuplot # a.gnuplot is your script, unmodified
And it produces this:
If I were you I'd check:
the gnuplot version
the input files: in vim, use :set list to see if there are any stray hidden characters
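The vim check can also be done non-interactively. A small sketch; the g2 here is a stand-in file seeded with a DOS line ending, just to show what the checks report:

```shell
# Create a stand-in data file containing one stray CRLF line ending:
printf '0 1\r\n1 3\n' > g2

cat -A g2                    # GNU cat: shows ^M for CR, ^I for tab, $ at end of line
LC_ALL=C grep -n $'\r' g2    # bash syntax: lists the lines carrying carriage returns
```

Running file g2 also reports "CRLF line terminators" when this is the problem; stray DOS line endings are a common cause of mysteriously broken gnuplot input.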

Periodic function generator for Linux

In my work I need samples of mathematical functions in the form of text streams. For example, I need a program that generates values of the sine function at discrete time points and prints them to stdout. Then I need to combine these samples in some way, for example summing two samples shifted by some phase. So my question can be split in two:
Is there a fairly standard way to generate samples of a mathematical function, such as sine, with given parameters (frequency, phase, amplitude, time step) in the form of a simple text stream with two columns: time and function value? I know that a simple Perl/Tcl script can do this work, but I'd like to know the generic solution.
What programs can manipulate these streams? I know about awk, but what can I do with awk when I have several streams as input? For example, how can I form the sum or product of two or three sine samples?
I'm using Debian Linux and I prefer The Unix Way, where each program does one simple task and does it well, and the results of separate programs can be combined by another program.
Thanks.
You can do simple numeric calculations with bc; see its man page. More complicated calculations can be done with Octave, which is a free Matlab clone.
For example this calculates the values of an interval:
$ octave -q --eval 'printf ("%f\n", [0:0.1:pi/2])'|nl|tee x.txt
1 0.000000
2 0.100000
3 0.200000
4 0.300000
5 0.400000
6 0.500000
7 0.600000
8 0.700000
9 0.800000
10 0.900000
11 1.000000
12 1.100000
13 1.200000
14 1.300000
15 1.400000
16 1.500000
This calculates the sin values:
$ octave -q --eval 'printf ("%f\n", sin([0:0.1:pi/2]))'|nl|tee y.txt
1 0.000000
2 0.099833
3 0.198669
4 0.295520
5 0.389418
6 0.479426
7 0.564642
8 0.644218
9 0.717356
10 0.783327
11 0.841471
12 0.891207
13 0.932039
14 0.963558
15 0.985450
16 0.997495
And the join command can be used to join the two files:
$ join -1 1 -2 1 -o 1.2 2.2 x.txt y.txt
0.000000 0.000000
0.100000 0.099833
0.200000 0.198669
0.300000 0.295520
0.400000 0.389418
0.500000 0.479426
0.600000 0.564642
0.700000 0.644218
0.800000 0.717356
0.900000 0.783327
1.000000 0.841471
1.100000 0.891207
1.200000 0.932039
1.300000 0.963558
1.400000 0.985450
1.500000 0.997495
But it is probably better to stay in Octave for the whole computation:
$ octave -q --eval 'for x = .1:0.1:pi/2 ; printf ("%f %f\n", x, sin(x)); end'
0.100000 0.099833
0.200000 0.198669
0.300000 0.295520
0.400000 0.389418
0.500000 0.479426
0.600000 0.564642
0.700000 0.644218
0.800000 0.717356
0.900000 0.783327
1.000000 0.841471
1.100000 0.891207
1.200000 0.932039
1.300000 0.963558
1.400000 0.985450
1.500000 0.997495
General text manipulation programs that would be useful:
paste or join (Merging two files together)
combine (Preform set-like operations on lines in files)
colrm (Remove columns)
sort (General sorting)
sed (Search and replace, and other ed commands)
grep (Searching)
awk (General text manipulation)
tee (A T-junction. Though if you need this you're probably doing something too complex and should break it down.)
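To make the second question concrete (combining several streams): paste aligns the samples row by row, and awk does the arithmetic. A small self-contained sketch with two toy "time value" streams:

```shell
# Two toy sample streams in the two-column "time value" format:
printf '0.0 1.0\n0.1 2.0\n' > s1.txt
printf '0.0 0.5\n0.1 0.5\n' > s2.txt

# paste glues the rows side by side; $2 and $4 are the two value columns.
paste s1.txt s2.txt | awk '{ printf "%s %g\n", $1, $2 + $4 }'
```

For a product instead of a sum, replace $2 + $4 with $2 * $4; a third stream just adds a $6 column.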
I see no problem with using a perl script to generate the values. Using a bc script would of course also be an option.
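For what it's worth, the generator itself fits in a few lines of plain awk as well; a sketch where f, ph, a, dt and n are ad-hoc variable names for frequency (Hz), phase (rad), amplitude, time step and sample count:

```shell
# Print n samples of a*sin(2*pi*f*t + ph) as "time value" lines.
awk -v f=1 -v ph=0 -v a=1 -v dt=0.0625 -v n=17 'BEGIN {
    pi = atan2(0, -1)                     # awk has no pi constant
    for (i = 0; i < n; i++) {
        t = i * dt
        printf "%f %f\n", t, a * sin(2 * pi * f * t + ph)
    }
}'
```

The output can be piped straight into the stream-manipulation tools listed above.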
Did you have a look at bc?
