Is there a way to change (increase) the half-second pause time for the Watson Speech to Text recognize API? - speech-to-text

As per the Watson Speech to Text documentation, when the 'continuous' parameter is set to false (which is the default), "the service stops transcription at the first pause, which is denoted by a half-second of non-speech". Is there a way to change this half-second to a longer period, say 2 seconds (or some other configurable value)?
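If the service exposes a configurable end-of-phrase silence parameter (newer IBM docs describe end_of_phrase_silence_time; treat the parameter name, the service URL and the API key below as assumptions/placeholders to verify against the current documentation), it could be passed as a query parameter on the sessionless recognize call. A minimal TypeScript sketch (Node 18+):

// Sketch only: end_of_phrase_silence_time, the service URL and the API key are assumptions/placeholders.
import { readFileSync } from "fs";

async function recognize(audioPath: string): Promise<void> {
  const params = new URLSearchParams({
    model: "en-US_BroadbandModel",
    end_of_phrase_silence_time: "2.0", // assumed parameter: pause length (seconds) before a final result
  });
  const response = await fetch(
    `https://api.us-south.speech-to-text.watson.cloud.ibm.com/v1/recognize?${params}`,
    {
      method: "POST",
      headers: {
        "Content-Type": "audio/wav",
        Authorization: "Basic " + Buffer.from("apikey:YOUR_API_KEY").toString("base64"),
      },
      body: readFileSync(audioPath),
    },
  );
  console.log(JSON.stringify(await response.json(), null, 2));
}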

Related

Delay in receiving DialogFlow results with very short sentences

I stream from the microphone of an Android client to a Node.js server (built by me), which forwards the audio to a DialogFlow agent.
The streaming code of the Node.js server is based on this snippet: https://cloud.google.com/dialogflow/docs/detect-intent-stream#streaming_detect_intent
The Node.js server first receives the intermediate results (with the words spotted by the automatic speech recognition, ASR) and then the final DialogFlow result (with the NLU analysis). The connection protocol between the Android client and the Node.js server is WebSocket.
The problem that I am facing is that when I stream the audio of short sentences (in my case the Italian words "sì"/"no"), the final DialogFlow result sometimes arrives with a delay close to 10 seconds.
I measure the delay of the final DialogFlow result from the moment the first partial ASR result arrives.
So my experiments are carried out like this:
I open the microphone and start speaking (I just say "sì" or "no")
the first intermediate result arrives (the timer starts)
the final DialogFlow result arrives (the timer stops)
The intermediate speech recognition results arrive almost immediately, that is, with a very low delay.
As for the delay of the final DialogFlow result, about half the time it arrives within 4 seconds, which is acceptable; otherwise it takes longer (in the worst case even 10 or 11 seconds of delay). In the latter case the delay is not acceptable, as the user experience becomes too slow.
Again, I must say that this problem occurs only with extremely short sentences. With sentences of even just a few words, the DialogFlow delay is always negligible and everything is fine.
I also tried the English models, but sentences consisting of "yes" and "no" suffer from the same delay. I also ran some tests with models other than the default one (in particular the command_and_search model, which is described as "Best for short queries such as voice commands or voice search"), but without luck.
I understand that sentences as short as "yes" and "no" are not the most frequent use case for a DialogFlow agent, but I believe that in certain circumstances they are useful (and sometimes necessary).
So I am asking whether anyone has experienced this problem and knows how to overcome it.
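For reference, here is a minimal TypeScript sketch of the timing measurement described above, based on the streaming snippet linked earlier; the project/session IDs are placeholders, and the response field names should be checked against the client library version in use.

// Sketch: time the gap between the first partial ASR result and the final DialogFlow result.
import * as dialogflow from "@google-cloud/dialogflow";

const sessionClient = new dialogflow.SessionsClient();
const sessionPath = sessionClient.projectAgentSessionPath("my-project", "my-session");
let firstPartialAt: number | undefined;

const detectStream = sessionClient
  .streamingDetectIntent()
  .on("error", console.error)
  .on("data", (data: any) => {
    if (data.recognitionResult) {
      if (firstPartialAt === undefined) firstPartialAt = Date.now(); // timer starts
      console.log(`partial: ${data.recognitionResult.transcript}`);
    } else if (data.queryResult) {
      console.log(`final result after ${Date.now() - (firstPartialAt ?? Date.now())} ms`); // timer stops
    }
  });

// The first request carries only the audio config; later writes carry the raw audio chunks
// forwarded from the websocket connection with the Android client, e.g.
// ws.on("message", (chunk) => detectStream.write({ inputAudio: chunk }));
detectStream.write({
  session: sessionPath,
  queryInput: {
    audioConfig: {
      audioEncoding: "AUDIO_ENCODING_LINEAR_16",
      sampleRateHertz: 16000,
      languageCode: "it-IT",
    },
  },
});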

Proper way of implementing Azure Stream Analytics notifications/alarms service

I'm working with sensor systems where each sensor sends a new reading every 15 seconds.
Each sensor type also has defined rules that, when triggered, generate an alarm output - e.g. a sensor of type "temperature" sends a value that is higher than the MAX temperature allowed.
Let's assume a sensor with ID "XXX_01" sends 2 readings in 30 seconds, each with a value higher than the MAX allowed.
Event in: 01/10/2018 12:00:00
{ id: "XXX_01", value: 90, type: "temperature" }
Event in: 01/10/2018 12:15:00
{ id: "XXX_01", value: 95, type: "temperature" }
Now, I want to notify the end user that there is an alarm - I have to send out some sort of notification to the end user(s). The problem and confusion is that I do not want to send out the alarm twice.
Assuming I use something like Twilio to send SMS or just send out email notifications, I don't want to spam my end users with a new notification every 15 seconds while incoming sensor readings stay above the MAX value allowed.
What kind of Azure service, architecture or design paradigm could I use to avoid this issue?
I have to say that A (don't spam users with notifications) and B (alarm on high temperature as soon as it touches the MAX line) are somewhat contradictory, which makes this hard to implement.
In my opinion, you can send notifications to users at a fixed frequency.
1. Within that period (say, 1 minute), use the Azure Stream Analytics service to receive the sensor data that arrives every 15 seconds.
2. Then output the data to an Azure Storage Queue.
3. Then use a time trigger to get the latest temperature value from the Azure Storage Queue's current messages every 1 minute (a sketch of this step follows the list). If it touches the MAX line, send a notification to the end users. If you want to notify the user that it touched the MAX line even if it has already dropped back, just sort the messages by value and check that.
4. Finally, empty the queue.
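A minimal TypeScript sketch of step 3, assuming a timer-triggered Azure Function reading the Storage Queue; the queue name, connection string, threshold and notify() helper are placeholders:

// Runs every minute: drain the queue, check the readings against MAX, notify at most once, then clear.
import { AzureFunction, Context } from "@azure/functions";
import { QueueClient } from "@azure/storage-queue";

const MAX_TEMPERATURE = 85; // placeholder threshold
const queue = new QueueClient(process.env.STORAGE_CONNECTION_STRING!, "sensor-alarms");

const timerTrigger: AzureFunction = async function (context: Context): Promise<void> {
  const batch = await queue.receiveMessages({ numberOfMessages: 32 });
  const readings = batch.receivedMessageItems.map((m) => JSON.parse(m.messageText));

  // The highest value in this window decides whether to alert (at most once per minute).
  const worst = readings.reduce((max, r) => (r.value > max ? r.value : max), -Infinity);
  if (worst > MAX_TEMPERATURE) {
    await notify(`Temperature ${worst} exceeded MAX ${MAX_TEMPERATURE}`); // e.g. Twilio SMS or email
  }

  // Step 4: empty the queue so the next window starts fresh.
  await queue.clearMessages();
};

async function notify(message: string): Promise<void> {
  /* hypothetical: call Twilio / send email */
}

export default timerTrigger;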
Using Azure Stream Analytics, you can trigger the alert when the threshold is passed AND it is the first time in the last 30 seconds, for example.
Here is sample SQL for this example:
SELECT *
FROM input
WHERE ISFIRST(second, 30) OVER (WHEN value > 90) = 1
Let us know if you have any further questions.
I also agree with Jay's response about the contradiction.
But there is one more way we can handle it. I faced a similar issue in one of my assignments; what I tried was keeping track of already-sent alarms in a cache (e.g. Redis cache, Memcached, etc.) and checking every time whether the alarm was already sent - if so, don't send it again. The obvious trade-off is that we need to check the cache every time, but that is a concern you need to weigh.
We can also extend the same approach to notify the user when the temperature drops back to normal.
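A small TypeScript sketch of that cache-based de-duplication; the threshold, Redis URL and notify() helper are placeholders:

// Remember that an alarm was already sent for a sensor and skip repeats until the value returns to normal.
import { createClient } from "redis";

const cache = createClient({ url: process.env.REDIS_URL });
const MAX_TEMPERATURE = 85; // placeholder threshold

async function handleReading(reading: { id: string; value: number }): Promise<void> {
  if (!cache.isOpen) await cache.connect();
  const key = `alarm-sent:${reading.id}`;
  if (reading.value > MAX_TEMPERATURE) {
    if (!(await cache.get(key))) {
      await notify(`Sensor ${reading.id} exceeded MAX: ${reading.value}`);
      await cache.set(key, "1"); // remember we alerted; optionally add an expiry
    }
  } else if (await cache.get(key)) {
    await notify(`Sensor ${reading.id} is back to normal: ${reading.value}`); // the extension mentioned above
    await cache.del(key);
  }
}

async function notify(message: string): Promise<void> {
  /* hypothetical: Twilio SMS or email */
}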
Hope this helps.

Partial playback using playbackDuration/startTime in Google Cast Chrome API (v3)

I am trying to cast just a snippet of a file (say, only from 00:00:30 to 00:00:40) from a Chrome sender to the default receiver. Reading the API reference documentation for LoadRequest, MediaInfo, and QueueItem, it seemed like I should be able to do this with some combination of these. In particular, the first queued item (loaded with CastSession#loadMedia) would need LoadRequest#currentTime set to the offset (30 seconds in my example above) and MediaInfo#duration set to the duration (10 seconds in my example), while subsequently queued items would set QueueItem#startTime and QueueItem#playbackDuration to the offset and duration (respectively).
However, this isn't happening in practice. I can confirm that the queue on the receiver has these fields set, but no matter how I go about this, I can't get the right snippet to play. When I add the first media item as described above, the receiver just plays the track from beginning to end, respecting neither the offset nor the duration. Since the combination of LoadRequest#currentTime and MediaInfo#duration is a bit odd, I tried using only the QueueItem method (add the first media item with autoplay = false, add another queue item, remove the first, and then start playing the queue). In this case, the offset was still not respected, and the duration ended up being (very strangely) the sum of startTime and playbackDuration (in addition, any subsequently queued items would load, and then "finish" playing without starting, which I also can't figure out).
Does anyone else have experience with this part of the API? Am I reading the documentation incorrectly and what I'm doing just isn't supported, or am I just piecing things together incorrectly?
I am not sure I understand why you are attempting to use a queue with multiple items. First, the duration field is not what you think it is; it is not the duration of playback that you want, it is the total duration of the media that is being loaded, regardless of where you start or stop the playback. In fact, in most cases you don't even need to set it; the receiver gets the total duration of the media when it loads the item, at least in the majority of cases. The currentTime should work (if it does not, please file a bug on our SDK issue tracker); alternatively, you can load a media item (with autoplay off), "seek" to the time you want and then play. To stop at a certain point, you need to monitor the playback location and, when it reaches that point, pause the playback.
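A TypeScript sketch of that approach with the CAF Web Sender API, assuming the sender framework script is loaded; the media URL and times are example values:

// Start playback at an offset via LoadRequest.currentTime, then watch the position and pause at the end of the snippet.
declare const cast: any;   // provided by the Cast sender framework script
declare const chrome: any; // provided by the Cast sender framework script

const SNIPPET_START = 30; // seconds
const SNIPPET_END = 40;

const session = cast.framework.CastContext.getInstance().getCurrentSession();
const mediaInfo = new chrome.cast.media.MediaInfo("https://example.com/track.mp3", "audio/mp3");
const request = new chrome.cast.media.LoadRequest(mediaInfo);
request.currentTime = SNIPPET_START; // ask the receiver to start 30 s in

session.loadMedia(request).then(() => {
  const player = new cast.framework.RemotePlayer();
  const controller = new cast.framework.RemotePlayerController(player);
  controller.addEventListener(
    cast.framework.RemotePlayerEventType.CURRENT_TIME_CHANGED,
    () => {
      // playOrPause() toggles, so only call it while the media is still playing
      if (player.currentTime >= SNIPPET_END && !player.isPaused) {
        controller.playOrPause();
      }
    },
  );
});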

Realtime MIDI input and synchronisation with audio

I have built a standalone app version of a project that until now was just a VST/audiounit. I am providing audio support via rtaudio.
I would like to add MIDI support using rtmidi but it's not clear to me how to synchronise the audio and MIDI parts.
In VST/audiounit land, I am used to MIDI events that have a timestamp indicating their offset in samples from the start of the audio block.
rtmidi provides a delta time in seconds since the previous event, but I am not sure how I should grab those events and how I can work out their time in relation to the current sample in the audio thread.
How do plugin hosts do this?
I can understand how events can be sample accurate on playback, but it's not clear how they could be sample accurate when using realtime input.
rtaudio gives me a callback function. I will run at a low block size (32 samples). I guess I will pass a pointer to an rtmidi instance as the userData part of the callback and then call midiin->getMessage( &message ); inside the audio callback, but I am not sure whether this is sensible from a threading point of view.
Many thanks for any tips you can give me.
In your case, you don't need to worry about it. Your program should send the MIDI events to the plugin with a timestamp of zero as soon as they arrive. I think you have perhaps misunderstood the idea behind what it means to be "sample accurate".
As @Brad noted in his comment on your question, MIDI is indeed very slow. But that's only part of the problem... when you are working in a block-based environment, incoming MIDI events cannot be processed by the plugin until the start of a block. When computers were slower and block sizes of 512 (or, god forbid, >1024) were common, this introduced a non-trivial amount of latency, which resulted in the arrangement not sounding as "tight". Therefore sequencers came up with a clever way to get around this problem. Since the MIDI events are already known ahead of time, these events can be sent to the instrument one block early with an offset in sample frames. The plugin then receives these events at the start of the block and knows not to start actually processing them until N samples have passed. This is what "sample accurate" means in sequencers.
However, if you are dealing with live input from a keyboard or some sort of other MIDI device, there is no way to "schedule" these events. In fact, by the time you receive them, the clock is already ticking! Therefore these events should just be sent to the plugin at the start of the very next block with an offset of 0. Sequencers such as Ableton Live, which allow a plugin to simultaneously receive both pre-sequenced and live events, simply send any live events with an offset of 0 frames.
Since you are using a very small block size, the worst-case scenario is a latency of about 0.7 ms (e.g. 32 samples at a 48 kHz sample rate is roughly 0.67 ms), which isn't bad at all. In the case of rtmidi, the timestamp does not represent an offset which you need to schedule around, but rather the time at which the event was captured. But since you only intend to receive live events (you aren't writing a sequencer, are you?), you can simply pass any incoming MIDI to the plugin right away.
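An illustrative sketch of that handoff, in TypeScript only for brevity (the real code would be C++ around rtaudio/rtmidi); `plugin` and its methods are hypothetical stand-ins for the hosted instrument:

// The MIDI input callback enqueues incoming events; the audio callback drains the queue
// at the start of each block and hands everything to the plugin at offset 0.
interface MidiEvent { bytes: number[] }

declare const plugin: {
  handleMidi(bytes: number[], sampleOffset: number): void;
  process(frames: number): void;
};

const pendingMidi: MidiEvent[] = []; // in C++ use a lock-free FIFO rather than a plain array

// Called from the MIDI input callback: just enqueue; live input needs no scheduling.
function onMidiMessage(bytes: number[]): void {
  pendingMidi.push({ bytes });
}

// Called once per audio block (e.g. 32 frames): deliver queued events at offset 0, then render.
function onAudioBlock(frameCount: number): void {
  while (pendingMidi.length > 0) {
    plugin.handleMidi(pendingMidi.shift()!.bytes, 0);
  }
  plugin.process(frameCount);
}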

Probability distribution for SMS answer delays

I'm writing an app that uses SMS for communication.
I have chosen to subscribe to an SMS gateway, which provides me with an API for doing so.
The API has functions for sending as well as pulling new messages. However, it does not have any kind of push functionality.
In order to do my queries most efficiently, I'm seeking data on how long people wait before they answer a text message - as a probability distribution.
Extra info:
The application is interactive (as much as it can be), so I suppose the times will be pretty similar to real-life human-to-human communication.
I don't believe differences in personal style will have a big impact on the right times and frequencies to query, so average data should be fine.
Update
I'm impressed and honored by the many great answers received. I have concluded that my best shot will be a few adaptable heuristics, including exponential (or maybe polynomial) backoff.
All along I will be gathering statistics for later analysis. Maybe something will show up. I think I will get a head start on the algorithm for generating poll frequencies from a probability distribution. That'll be fun.
Thanks again many times.
In the absence of any real data, the best solution may be to write the code so that the application adjusts the wait time based on the recent history of response times.
The basic idea is as follows (a sketch of this loop is shown below):
Step 1: Set the initial frequency of pulling to once every x seconds.
Step 2: Pull messages at the above frequency for a duration y.
Step 3: If you discover that messages are always waiting for you to pull, decrease x; otherwise increase x.
Several design considerations:
Adjust forever or stop after some time
You can repeat steps 2 and 3 forever, in which case the application dynamically adjusts itself according to SMS patterns. Alternatively, you can stop after some time to reduce application overhead.
Adjustment criteria: per customer or across all customers
You can choose to do the adjustment in step 3 on a per-customer basis or across all customers.
I believe Gmail's SMTP service works along the same lines.
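A minimal TypeScript sketch of that adjust-as-you-go loop; the gateway client, the handle() helper and the interval bounds are assumptions/placeholders:

// Shrink the interval when messages were already waiting, grow it when the mailbox was empty.
declare const gateway: { pullMessages(): Promise<string[]> };
declare function handle(messages: string[]): void;

const MIN_INTERVAL_MS = 5_000;
const MAX_INTERVAL_MS = 120_000;
let intervalMs = 15_000; // Step 1: initial frequency ("x seconds")

async function pollLoop(): Promise<void> {
  while (true) {
    const messages = await gateway.pullMessages(); // Step 2: pull at the current frequency
    if (messages.length > 0) {
      handle(messages);
      intervalMs = Math.max(MIN_INTERVAL_MS, intervalMs / 2); // Step 3: messages were waiting, poll faster
    } else {
      intervalMs = Math.min(MAX_INTERVAL_MS, intervalMs * 1.5); // Step 3: nothing waiting, back off
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}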
Well, I would suggest finding some statistics on daily SMS/text-messaging usage by geographical location and age group and coming up with a daily average; it won't be an exact measurement for everyone, though.
Good question.
Consider that people might have multiple tasks and that answering a text message might be one of those tasks. If each of those tasks takes an amount of time that is exponentially distributed, the time to get around to answering the text message is the sum of those task completion times. The sum of n iid exponential random variables has a Gamma distribution.
The number of tasks ahead of the text reply also has a discrete distribution - let's say it's Poisson. I don't have the time to derive the resulting distribution, but simulating it using @RISK, I get either a Weibull or Gamma distribution.
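A quick Monte-Carlo sketch of that model in TypeScript, for anyone who wants to eyeball the shape without @RISK; the parameters (mean number of tasks, mean minutes per task) are made up:

// A Poisson-distributed number of pending tasks, each exponentially distributed in length; the reply delay is the sum.
function sampleExponential(mean: number): number {
  return -mean * Math.log(1 - Math.random());
}

function samplePoisson(mean: number): number {
  // Knuth's method, fine for small means
  const limit = Math.exp(-mean);
  let k = 0;
  let p = 1;
  do {
    k++;
    p *= Math.random();
  } while (p > limit);
  return k - 1;
}

function sampleReplyDelay(meanTasks = 3, meanTaskMinutes = 2): number {
  const tasks = samplePoisson(meanTasks);
  let delay = 0;
  for (let i = 0; i <= tasks; i++) delay += sampleExponential(meanTaskMinutes); // include the reply itself
  return delay; // minutes
}

// e.g. build a histogram from 100k samples and inspect the shape
const samples = Array.from({ length: 100_000 }, () => sampleReplyDelay());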
SMS is a store-and-forward messaging service, so you have to add in the delay that can be introduced by the various SMSCs (Short Message Service Centers) along the way. If you are connecting to one of the big aggregation houses (Sybase, TNS, mBlox, etc.) or commercial bulk SMS providers (Clickatel, etc.), then you need to allow for the message to traverse their network as well as the carrier's network. If you are using a smaller shop, then most likely they are using a GSM modem (or modems), and there is a throughput limit on the messages they can receive and process (as well as push out).
All that said, if you are using a direct connection or one of the big guys, MO (mobile-originated) messages coming to you as a CP (content provider) should take less than 5 seconds. Add to that the time it takes the mobile subscriber to reply.
I would say, from anecdotal evidence from services I've worked on before, that where the mobile subscriber needs to provide a simple reply, it's usually within 10 seconds or not at all.
If you are polling for specific replies, I would poll at 5 and 10 seconds and then apply an exponential backoff.
All of this is from a North American point of view. Europe will be fairly close, but places like Africa and Asia will be a bit slower, as the networks are a bit slower (unless you are connected directly to the operator, and even then some of them are slow).
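A tiny TypeScript sketch of the polling schedule suggested above (5 s, 10 s, then exponential backoff); the cap is an arbitrary placeholder:

// Generate successive poll delays: 5 s, 10 s, then keep doubling up to a cap.
function* pollDelaysMs(capMs = 160_000): Generator<number> {
  yield 5_000;
  yield 10_000;
  let delay = 20_000;
  while (true) {
    yield Math.min(delay, capMs);
    delay *= 2;
  }
}

// Usage: const delays = pollDelaysMs(); setTimeout(poll, delays.next().value);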
