Finding the duration of testing data in SphinxTrain - cmusphinx

I am training an acoustic model with pocketsphinx and sphinxtrain. The total duration of the training data can be seen in the log file; for example, my current training data is reported as:
Phase 5: Determine amount of training data, see if n_tied_states seems reasonable.
Estimated Total Hours Training: 1.00766111111111
After training, testing is done. For testing I have added 20 files, but I don't know the total length of these files. Finding it manually is hard, since I am going to keep increasing the testing data.
So is there a log file, or any other non-manual way, to check the duration of my testing data?

I just found it; I am posting my own answer so it may help others.
You can find it under logdir/decode/dbname-1-1.log, where dbname is your main folder name; in my case it is logdir/decode/tester-1-1.log.
Open this file and there will be a line
INFO: batch.c(778): TOTAL 81.24 seconds speech, 30.43 seconds CPU, 37.54 seconds wall
Here "TOTAL 81.24 seconds speech" is the duration of my testing audio data.
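If the decode step is split over several log files, the same line can be summed with a small script. A rough Python sketch, assuming the logdir/decode path above; adjust the glob pattern to your own setup:

# Rough sketch: sum the "TOTAL ... seconds speech" figures from the decode logs.
# The directory is the logdir/decode path mentioned above; adjust it to your setup.
import glob
import re

total_speech = 0.0
for log_path in glob.glob("logdir/decode/*.log"):
    with open(log_path) as log_file:
        for line in log_file:
            match = re.search(r"TOTAL\s+([\d.]+) seconds speech", line)
            if match:
                total_speech += float(match.group(1))

print(f"Testing audio: {total_speech:.2f} seconds")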

Related

Why does the last part of a run in Databricks take so much time?

I am using Databricks to create an algorithm for big data, and I am wondering why the last 1% of my run takes so long.
I am writing the results to S3. The results for 111991 records (out of 116367) are done in 5 minutes, yet just the last 5000 take more than an hour!
Can I fix this issue?
In the following picture it takes an hour for 119 to become 120, although it reached 119 in a few minutes.
Please check whether you are writing the file in one shot or in chunks.
If you are writing it in one shot, that single write can sometimes take a long time. Also check whether you are printing logs; that can take time as well.
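As a rough illustration of the difference, here is a minimal PySpark sketch; the DataFrame, paths, output format and partition count are placeholders, not taken from the question:

# Minimal sketch -- the input/output paths and partition counts are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3a://my-bucket/input/")  # hypothetical input data

# "One shot": coalesce(1) funnels the whole result through a single task,
# so the final write can dominate the job's run time.
df.coalesce(1).write.mode("overwrite").csv("s3a://my-bucket/output-single/")

# Chunked: each partition is written by its own task in parallel.
df.repartition(16).write.mode("overwrite").csv("s3a://my-bucket/output-parts/")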

Azure Anomaly Detector - only detects spikes

I am testing Anomaly Detector on a metric of the count of a specific event per hour over the last 90 days. For some reason I only ever get spikes (isPositive) and never drops, while I'm mostly interested in detecting drops.
The data has weekly seasonality (expected drops on weekends) and definitely has abnormal mid-week drops that are unusual for that day of the week.
I also tried editing specific hours to make their values extremely low for that time and weekday, and I tried different values for sensitivity (between 90 and 20).
On the positive side I get too many spikes, which creates a lot of noise, and a low sensitivity value didn't help to get rid of them.
Below is a link to the request JSON.
Request JSON
You can try setting maxRatio to 0.01; that should give you what you expect.
Currently the sensitivity control is not good enough at low values, but we will roll out a new version next week to improve it.
You can also use https://aka.ms/addemo and upload a CSV to run more tests.
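For reference, a minimal Python sketch of such a request, assuming the v1.0 REST endpoint where the ratio field is called maxAnomalyRatio; the resource name, key and series values are placeholders, not from the original request JSON:

# Sketch of an entire-series detection request (v1.0 API assumed).
import requests

endpoint = "https://<your-resource>.cognitiveservices.azure.com"
url = endpoint + "/anomalydetector/v1.0/timeseries/entire/detect"
headers = {"Ocp-Apim-Subscription-Key": "<your-key>"}

body = {
    "series": [
        {"timestamp": "2019-11-01T00:00:00Z", "value": 120},
        {"timestamp": "2019-11-01T01:00:00Z", "value": 118},
        # ... one point per hour for the whole 90-day window ...
    ],
    "granularity": "hourly",
    "period": 168,            # weekly seasonality at hourly granularity (7 * 24)
    "sensitivity": 90,
    "maxAnomalyRatio": 0.01,  # the "maxRatio" suggested above
}

result = requests.post(url, headers=headers, json=body).json()
# isNegativeAnomaly flags the drops; isPositiveAnomaly flags the spikes.
print(result.get("isNegativeAnomaly"))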

node.js persistent JSON store with minimal performance penalty

I'm running a node web server using the express module and would like to include the following features in it:
primarily, track every visitor's source IP, time, and unique or repeated visits by saving them to a JSON file.
secondly, if someone hits my server more than 10 times in the last 15 seconds looking for vulnerabilities (non-existent pages), then collect those attempts in a buffer (that holds 30 seconds' worth of data) and, once the threshold is reached, start blocking that source IP for X number of hours.
I'm interested in finding out the fastest way to save this information with very minimal performance penalty.
My choice so far is to create a RAMDISK and save this info into a continuous file on that RAMDISK.
The visitor info gets written to a database every few minutes.
The info for notorious visitors will be reset every 30 seconds so as to keep lookups quick.
The question I have is: is writing to a RAMDISK the fastest way to retain the information (so it's not lost during a crash), or is there a better/faster way to achieve this goal?
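To make the second point concrete, here is a sketch of the sliding-window check and temporary block being described (written in Python for brevity; the same logic would sit in an express middleware). The thresholds come from the question; everything else is illustrative:

# Illustrative sketch of the in-memory rate check described above; all names are made up.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 15        # "more than 10 times in the last 15 seconds"
MAX_HITS = 10
BLOCK_SECONDS = 4 * 3600   # block for X hours; X = 4 is only a placeholder

recent_misses = defaultdict(deque)  # source IP -> timestamps of hits on non-existent pages
blocked_until = {}                  # source IP -> time at which the block expires

def register_miss(ip):
    """Record a hit on a non-existent page; return True if the IP should be blocked."""
    now = time.time()
    if blocked_until.get(ip, 0) > now:
        return True
    hits = recent_misses[ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:  # drop entries older than the window
        hits.popleft()
    if len(hits) > MAX_HITS:
        blocked_until[ip] = now + BLOCK_SECONDS
        return True
    return False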

Caffe loading LMDB batches very slowly

I generated an LMDB database using the SSD-Caffe fork here. I have successfully generated the VOC LMDB trainval/test LMDB directories and am able to train the model.
However, during training it takes inordinately long to load data from the LMDB database. For example, when profiling with Caffe's time function using this command:
ssdcaffe time --model "jobs/VGGNet/VOC0712/SSD_300x300/train.prototxt" --gpu 0 --iterations 20
I get that the forward pass takes 8.9 s on average, and the backward pass 0.5 s. On a layer-by-layer inspection, the data ingestion layer takes the bulk of that time at 8.7 s. See below:
I1129 10:14:11.094445 8011 caffe.cpp:404] data forward: 8660.38 ms.
...
I1129 10:14:11.095383 8011 caffe.cpp:412] Average Forward pass: 8933.31 ms.
I1129 10:14:11.095389 8011 caffe.cpp:414] Average Backward pass: 519.549 ms.
If I halve the batch size from 32 to 16, the data ingestion layer's time roughly halves:
I1129 10:20:07.975527 8093 caffe.cpp:404] data forward: 3906.53 ms.
This is clearly not the intended speed, and something is wrong. Any help would be greatly appreciated!
Found my issue:
My images were too big. The standard VOC images the repo uses are ~350x500 pixels, whereas my images were 1080x1920. When I downsized my images by 3x per side (i.e. 9x fewer pixels), the data ingestion layer took only 181 ms (a 48x speedup over the previous 8.6 s).
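For anyone in the same situation, the downsizing can be scripted before regenerating the LMDB. A minimal Pillow sketch follows; the directory names and JPEG extension are assumptions, not part of the original setup:

# Minimal sketch: downscale images 3x per side before rebuilding the LMDB.
# Directory names and the .jpg extension are placeholders.
from pathlib import Path
from PIL import Image

src = Path("images_full_size")
dst = Path("images_resized")
dst.mkdir(exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path)
    # e.g. 1080x1920 -> 360x640: 3x smaller per side, i.e. 9x fewer pixels
    img.resize((img.width // 3, img.height // 3), Image.BILINEAR).save(dst / path.name)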

Performance testing - JMeter results

I am using JMeter (I started using it a few days ago) to simulate a load of 30 threads, using a CSV data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set my thread group as:
Number of threads: 30
Ramp-up Period: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean, what counts as a good or bad measurement, and what could be done to improve the results. Below is a table of the results collated in JMeter's Summary Report.
I have done some research, only to find blogs/sites telling me the same information that is already defined on jmeter.apache.org. One blog (Nicolas Vahlas) that I came across gave me some very useful information, but it still hasn't helped me understand what to do next with my results.
Can anyone help me understand these results and what I could do next after executing this test plan, or point me towards an informative blog/site that will help me understand what to do next?
Many thanks.
In my opinion, the deviation is high.
You know your application better than any of us.
You should focus on whether the average response time you got, and the maximum response time and how often it occurs, are acceptable to you and your users. This applies to throughput as well.
It shows the average response time is below 0.5 seconds and the maximum response time is below 1 second, which is generally acceptable, but that should be defined by you (is it acceptable to your users?). If the answer is yes, try with more load to check scaling.
Your requirement mentions that you need 30 concurrent users performing different actions. The response time of your requests is low and your ramp-up is 30 seconds, so please check the total number of active threads during the test. I believe the time for which there are actually 30 concurrent users in the system is quite short, so the average response time you are seeing is probably misleading. I would suggest running the test for longer, so that there are 30 concurrent users in the system for a sustained period; that would give a reading that actually matches your requirements.
You can use the Aggregate Report instead of the Summary Report. In performance testing,
Throughput - requests/second,
Response Time - 90th percentile, and
target application resource utilization (CPU, processor queue length and memory)
can be used for analysis. Normally the SLA for websites is 3 seconds, but this requirement changes from application to application.
Your test results are good, provided the users are actually logging into the system/portal.
Samples: the number of requests sent for a particular module.
Average: the average response time over the 300 samples.
Min: the minimum response time among the 300 samples (the fastest of the 300).
Max: the maximum response time among the 300 samples (the slowest of the 300).
Standard Deviation: a measure of the variation across the 300 samples.
Error: the failure percentage.
Throughput: the number of requests processed per second.
Hope this will help.
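To make those definitions concrete, here is a short Python sketch with made-up response times showing how each Summary Report figure is derived; the sample values, error count and test duration are illustrative only:

# Made-up response times in milliseconds; none of these figures come from the test above.
import statistics

elapsed_ms = [210, 180, 240, 950, 200, 190, 230, 220, 205, 215]
errors = 0                 # number of failed samples
test_duration_s = 30.0     # wall-clock duration of the test

samples = len(elapsed_ms)
average = statistics.mean(elapsed_ms)
minimum, maximum = min(elapsed_ms), max(elapsed_ms)
std_dev = statistics.pstdev(elapsed_ms)             # population form, close to JMeter's figure
error_pct = 100.0 * errors / samples
throughput = samples / test_duration_s              # requests per second
p90 = sorted(elapsed_ms)[int(0.9 * (samples - 1))]  # simple 90th-percentile approximation

print(samples, round(average), minimum, maximum, round(std_dev, 1),
      f"{error_pct:.1f}%", round(throughput, 2), p90)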
