Speech Services STT - Possible to Link Request to Result? - speech-to-text

I have a use case where a mobile app records a long series of commands. Each command is a short, single word (or number). They can happen quickly one right after the other, but the use case does not care if it takes several seconds to get results back from the Cognitive server. It is currently being implemented as discrete asynchronous requests rather than streaming (this seems to be more reliable for us).
Since results are coming back async, I see no easy way to map the result back to its corresponding request (and ultimately the app command). Can I embed a unique ID somewhere that will get passed back to me? Is there some other option?

Are you using the SDK?
If you use recognizeOnce, you get the result for the audio as the result of that call (synchronous).
If you use continuous recognition, there is currently no way to tag the audio segment.
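With discrete requests, the app can keep the mapping itself: the handle returned when a request is submitted stays tied to the command that produced it, whatever order the results come back in. A minimal sketch using Python's `concurrent.futures`, assuming a hypothetical blocking `recognize_command(audio)` function that stands in for whatever single-shot SDK call you make (it is not a Speech SDK API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def recognize_command(audio: bytes) -> str:
    """Hypothetical stand-in for one single-shot STT request (not a real SDK call)."""
    return audio.decode("ascii").upper()


def transcribe_commands(commands):
    """commands: iterable of (command_id, audio) pairs.

    Returns {command_id: transcript}. Each future is keyed by its command ID at
    submission time, so results can complete in any order and still be matched.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(recognize_command, audio): cmd_id
                   for cmd_id, audio in commands}
        for future in as_completed(pending):
            results[pending[future]] = future.result()
    return results
```

For example, `transcribe_commands([(1, b"left"), (2, b"seven")])` gives `{1: "LEFT", 2: "SEVEN"}` regardless of which request finishes first.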

Related

How to manage the conversation flow when facing the timeout limit (5 seconds) in Dialogflow / Api.ai?

I am making a bot on Dialogflow with a Fulfillment. Considering the given strict 5-second window in DialogFlow, I am getting [empty response] as a response.
I want to overcome this issue, but my web service requires more than 9 seconds for the execution.
I am considering redesigning the conversation flow so that we stream audio until the response is processed.
Example:
User Question: xx xxx xxx xxxx xxxxx?
Response: a) We'll play fixed audio to keep the user engaged for a few seconds until a response text is found in the back end; b) Receive answers from the web service and save them in the session to display later.
How can I achieve this and how can I handle the Timeout issue?
You're on the right track, but there are a number of other things to consider.
First, however, keep in mind that anything that is trying to "avoid" the 5 second timeout already indicates some issues with the design. Waiting 10 seconds for a reply is a pretty long time with something as interactive as voice! Even 5 seconds, which is the timeout, is a long time. (And there is no way to change this timeout.)
So the first thing you may want to do is consider if there is a better/faster way to do what you want.
If not, the rough approach would be something like this:
1. Get the request from the user.
2. Track a unique identifier, either tied to the user or tied to the session. You'll be using this as a key into some kind of database or data store.
3. Start the API call as part of an asynchronous request or in another thread.
4. Reply immediately that you're working on it, in a way that ensures the user will send another request. (See below for this issue.) You'll want to make sure that the ID is maintained as part of this session, so you'll need to save it as part of the session data.
5. At this point, you're basically doing two things in parallel:
   - When the API call completes, it needs to save the result in the datastore against the identifier. (It can't save it in the session itself; that response was already sent back to the Assistant.)
   - You're also waiting for a reply from the user. When it comes in:
     - Check to see if you have a response saved for this session yet.
     - If not, then go back to step 4. (You may want to track how many times you get here and give up at some point.)
     - If you do have the result, reply to the user with the information.
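The steps above can be sketched in framework-neutral Python. This is only an illustration: `slow_lookup`, `handle_turn`, and the in-memory `RESULTS` dict are hypothetical stand-ins (a real deployment would use a shared datastore, and the actual Dialogflow webhook wiring is not shown).

```python
import threading
import time
import uuid

RESULTS = {}                 # stand-in for a shared datastore: session id -> result
RESULTS_LOCK = threading.Lock()
MAX_POLLS = 3                # give up after this many "still working" replies


def slow_lookup(query):
    """Hypothetical slow web service call (the real one takes ~9 s)."""
    time.sleep(0.2)
    return f"answer to {query!r}"


def _worker(session_id, query):
    result = slow_lookup(query)
    with RESULTS_LOCK:       # save against the identifier, not in the session
        RESULTS[session_id] = result


def handle_turn(session, query=None):
    """Handle one fulfillment request; `session` is a dict kept per conversation."""
    sid = session.setdefault("id", uuid.uuid4().hex)       # step 2
    if query is not None:                                  # steps 1 and 3
        session["polls"] = 0
        threading.Thread(target=_worker, args=(sid, query), daemon=True).start()
        return "Let me look that up for you..."            # step 4
    with RESULTS_LOCK:
        result = RESULTS.pop(sid, None)
    if result is not None:
        return result                                      # the result is ready
    session["polls"] = session.get("polls", 0) + 1
    if session["polls"] >= MAX_POLLS:
        return "Sorry, that is taking too long. Please try again later."
    return "Still working on it..."                        # back to step 4
```

Each "Still working on it..." reply is where you would play hold music or ask a follow-up question, as discussed next.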
There is an issue with how you reply in step 4, since you want to do something that will guarantee you another request from the person expecting an answer. There are a few possible approaches:
The most straightforward way would be to send back a Media response to play a few seconds of "hold music". This has the advantage that, when the music stops, it will send an event to Dialogflow which you can capture as an Intent and then continue with step 5.
But there are some problems:
Not all versions of the Assistant support the Media response. You will need to check to confirm the feature is supported before you use it and, if not, use another approach (see below).
The media player that is presented on some Assistants allows the user to stop playback, or will not correctly send an event when the audio stops in some situations. So you may never get another request in this session.
Another approach involves some more advanced conversation design tricks, so may not always be suitable for your conversation. Your response can say that you're looking up the results but then ask the user a question - possibly one that is related to other information that you will need. With their reply, you can collect this information (if you need it) and then see if you have a result yet.
In some conversations - this works really well. For example, if you're looking up flights to somewhere, while you're looking that up you might ask them if they will need a hotel or rental car, which you might ask about anyway.
Other conversations, however, don't easily have such questions. In these cases, you may need to ask something that isn't relevant while you stall for time.

Best practices for internal api calls to external apis with buffer

I have different external APIs doing basically the same thing but in different ways: adding product information (ext_api).
I would like to make an adapter API that would call, behind the scene, the different external APIs (adapter_api).
My problem is the following: the external APIs are optimised for calls with a batch of product attributes. However, my API would be optimised on a product-by-product basis.
I would like to somehow make a buffer of product attributes that would grow when I call my adapter_api. When the number of product attributes reaches a certain limit, the ext_api would be called and the buffer would be reset and ready to receive more product attributes.
I'm wondering how to achieve that. I was thinking of making a REST API in Python that would store the buffer of product attributes. I would like this REST API to be able to scale on a Kubernetes cluster: it would need low latency, and several instances of this API would write to the buffer of products until one of them reaches the limit and makes the call to the external API.
Here is what I have in mind:
Are there any best practices concerning the buffer in this use case? To add some extra information: my main purpose here is to hide from internal business APIs (not drawn) the complexity of calling many different external APIs, each of which has its own rules and credentials.
Thank you very much for your help.
You didn't tell us your performance evaluation criteria.
You did tell us this:
don't know how to store the buffer : I would like to avoid databases or files.
which makes little sense, since there's a simple answer to this question:
Is there any best practices on this use case ?
Yes. The best practice is to append requests to buffer.txt and send the batch when that file exceeds some threshold. A convenient way to implement the threshold would be to send when os.path.getsize() reports a large enough value. If requests are of quite different sizes and the batch size really matters to you, then append a single byte to a second file, and use the size of that file to indicate how many entries are enqueued.
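A minimal single-process sketch of the file-based approach, assuming one JSON record per line and a hypothetical `send_batch` callable standing in for the ext_api call. With several writers (for example, multiple pods) you would additionally need file locking or a shared volume with atomic renames, which this sketch does not handle.

```python
import json
import os


class FileBuffer:
    """Append records to a file; flush them to `send_batch` once the file is big enough."""

    def __init__(self, path, send_batch, threshold_bytes=4096):
        self.path = path
        self.send_batch = send_batch          # hypothetical ext_api caller
        self.threshold_bytes = threshold_bytes

    def add(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        if os.path.getsize(self.path) >= self.threshold_bytes:
            self.flush()

    def flush(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            batch = [json.loads(line) for line in f if line.strip()]
        if batch:
            self.send_batch(batch)            # if this raises, the file is kept
        os.remove(self.path)                  # reset the buffer only after sending
```

A side benefit of the file is that queued records survive a process restart, which an in-memory buffer does not give you.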
requirements
The heart of your question seems to revolve around what was left unsaid:
What is the cost function for sending too many "small" batches to ext_api?
What is the cost function for the consumer of the adapter_api, what does it care about? Low latency return, perhaps?
If ext_api permanently fails (say, a day of downtime), do we have some responsibility for quickly notifying the consumer that its updates are going into a black hole?
And why would using the filesystem be inappropriate?
It seems a perfect match for your needs.
Consider using a global in-memory object, such as a list or queue, for the batch you're accumulating. You might want to protect accesses with a lock.
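A sketch of the in-memory variant, assuming a hypothetical `send_batch` callable for the ext_api call. Note the trade-offs under this assumption: each process (each Kubernetes replica) gets its own buffer, so batches form per instance, and buffered items are lost if the process dies before a flush.

```python
import threading


class BatchBuffer:
    """Thread-safe in-memory buffer that hands full batches to `send_batch`."""

    def __init__(self, send_batch, batch_size=100):
        self.send_batch = send_batch      # hypothetical ext_api caller
        self.batch_size = batch_size
        self._items = []
        self._lock = threading.Lock()

    def add(self, item):
        batch = None
        with self._lock:                  # only the list swap happens under the lock
            self._items.append(item)
            if len(self._items) >= self.batch_size:
                batch, self._items = self._items, []
        if batch is not None:
            self.send_batch(batch)        # slow network call made outside the lock

    def flush(self):
        """Send whatever is left, e.g. on a timer or at shutdown."""
        with self._lock:
            batch, self._items = self._items, []
        if batch:
            self.send_batch(batch)
```

Calling the external API outside the lock keeps other request threads from stalling behind a slow batch upload.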
Maybe your client doesn't really want a one-product-at-a-time API. Maybe you'd prefer to have your client accumulate items, sending only when its batch size is big enough.

Why is urllib.request so slow?

When I use urllib.request and decode the JSON response into a Python dictionary, it takes far too long. However, upon looking at the data, I realized that I don't even want all of it.
Is there any way that I can only get some of the data, for example get the data from one of the keys of the JSON dictionary rather than all of them?
Alternatively, if there was any faster way to get the data that could work as well?
Or is it simply a problem with the connection and cannot be helped?
Also, is the problem with urllib.request.urlopen, with json.loads, or with .read().decode()?
The main symptom is that it takes roughly 5 seconds to receive information that is not even that much (less than one page of unformatted dictionary). The other symptom is that as I try to receive more and more information, there is a point when I simply receive no response from the webpage at all!
The 2 lines which take up the most time are:
response = urllib.request.urlopen(url) # url is a string with the url
data = json.loads(response.read().decode())
For some context on what this is part of, I am using the Edamam Recipe API.
Help would be appreciated.
Is there any way that I can only get some of the data, for example get the data from one of the keys of the JSON dictionary rather than all of them?
You could try a streaming JSON parser, but I don't think you're going to get any speedup from it.
Alternatively, if there was any faster way to get the data that could work as well?
If you have to retrieve a json document from an url and parse the json content, I fail to imagine what could be faster than sending an http request, reading the response content and parsing it.
Or is it simply a problem with the connection and cannot be helped?
Given the figures you mention, the issue is almost certainly in the networking part, which means anything between your Python process and the server's process. Note that this includes your whole system (proxy/firewall, your network card, your OS TCP/IP stack, etc., and possibly some antivirus on Windows), your network itself, and of course the end server, which may be slow or a bit overloaded at times, or may be deliberately throttling your requests to avoid overload.
Also is the problem with the urllib.request.urlopen or is it with the json.loads or with the .read().decode().
How can we know without timing it on your own machine? But you can easily check this: just time the execution of the various parts and log them.
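A standard-library sketch of timing each stage separately with `time.perf_counter()`. The function name and timing labels are illustrative; pass in the same Edamam URL you already build.

```python
import json
import time
import urllib.request


def timed_fetch_json(url, timeout=10):
    """Fetch and parse JSON from `url`, returning (data, timings in seconds)."""
    t0 = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        t1 = time.perf_counter()          # connection made, status and headers received
        raw = response.read()
    t2 = time.perf_counter()              # body downloaded (and connection closed)
    text = raw.decode("utf-8")
    t3 = time.perf_counter()
    data = json.loads(text)
    t4 = time.perf_counter()
    timings = {
        "urlopen": t1 - t0,
        "read": t2 - t1,
        "decode": t3 - t2,
        "json.loads": t4 - t3,
    }
    return data, timings
```

Printing `timings` for a few calls should make it obvious whether the seconds go into `urlopen`/`read` (network) or into `decode`/`json.loads` (local parsing).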
The other symptom is that as I try to receive more and more information, there is a point when I simply receive no response from the webpage at all!
Cf. above: if you're sending hundreds of requests in a row, the server might either throttle your requests to avoid overload (most API endpoints behave that way) or just plain be overloaded. Do you at least check the HTTP response status code? You may get 503 (server overloaded) or 429 (too many requests) responses.
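`urllib.request.urlopen` raises `urllib.error.HTTPError` for error status codes, so checking the status means catching that exception. A hedged sketch that retries on 429/503 with exponential backoff; it honours a `Retry-After` header only when it is given in whole seconds (the HTTP-date form is not handled), and the retry counts and delays are arbitrary assumptions.

```python
import json
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 503}   # too many requests / service unavailable


def fetch_json_with_retry(url, max_attempts=4, base_delay=1.0, timeout=10):
    """GET `url` and parse JSON, backing off and retrying on 429/503."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            # Any other status, or the last attempt, is re-raised to the caller
            if err.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After", "")
            if retry_after.isdigit():
                delay = float(retry_after)           # server told us how long to wait
            else:
                delay = base_delay * 2 ** attempt    # 1 s, 2 s, 4 s, ...
            time.sleep(delay)
```

Spacing out requests this way is also kinder to the API than hammering it until it stops answering.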

Streaming output from program to an arbitrary number of programs under Linux?

How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream, but the programs reading the stream do block if there's no output from the first-mentioned program?
I've been trying to Google around for a while now, but all I find is methods where the program does block if nothing is reading the stream.
How should I stream the output from one program to an undefined number of programs in such a fashion that the data isn't buffered anywhere and that the application where the stream originates from doesn't block even if there's nothing reading the stream
Your requirements as stated cannot possibly be satisfied without some form of buffer.
The most straightforward option is to write the output to a file and let consumers read that file.
Another option is to have a ring buffer in the form of a memory-mapped file. As the capacity of a ring buffer is normally fixed, there needs to be a policy for dealing with slow consumers. Options are: block the producer; terminate the slow consumer; or let the slow consumer somehow recover when it has missed data.
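To illustrate the third policy (the producer never blocks; a slow consumer notices it has missed data and skips ahead), here is a minimal in-process Python sketch. It assumes a plain list rather than a memory-mapped file; a real cross-process version would keep the slots and the write counter in shared memory and would need careful memory ordering between writer and readers.

```python
class RingBuffer:
    """Fixed-capacity ring: writes never block; readers that fall behind skip ahead."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.next_seq = 0                 # sequence number of the next item to write

    def write(self, item):
        self.slots[self.next_seq % self.capacity] = item   # overwrite the oldest slot
        self.next_seq += 1

    def read_from(self, cursor):
        """For a consumer whose next wanted sequence number is `cursor`,
        return (items, new_cursor, missed)."""
        oldest = max(0, self.next_seq - self.capacity)      # oldest seq still stored
        missed = max(0, oldest - cursor)                    # items already overwritten
        start = max(cursor, oldest)
        items = [self.slots[s % self.capacity] for s in range(start, self.next_seq)]
        return items, self.next_seq, missed
```

Each consumer only keeps its own cursor, so any number of readers can attach or leave without the producer knowing about them.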
Many years ago I wrote something like what you describe for an audio stream processing app (http://hewgill.com/nwr/). It's on GitHub as splitter.cpp and has a small man page.
The splitter program currently does not support dynamically changing the set of output programs. The output programs are fixed when the command is started.
Without knowing exactly what sort of data you are talking about (how large the data is, what format it is in, etc.), it is hard to come up with a concrete answer. Let's say, for example, you want a "ticker-tape" application that sends out information on share purchases on the stock exchange: you could quite easily have a server that accepts a socket from each application, starts a thread, and sends the relevant data as it appears from the recorder at the stock market. I'm not aware of any "multiplexer" that exists today (but Greg's one may be a starting point). If you use (for example) XML to package the data and a client receives only the second half of a packet, the client code can detect that the packet is not complete and throw it away.
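A small sketch of the client-side "throw away incomplete packets" idea, assuming each packet is a single `<tick>` XML element terminated by a newline (the framing and element names are assumptions for illustration). A client that joins mid-stream first sees the tail of a packet, which fails to parse and is dropped.

```python
import xml.etree.ElementTree as ET


def parse_ticks(stream_text):
    """Split newline-delimited <tick> packets and keep only complete ones."""
    ticks = []
    for packet in stream_text.split("\n"):
        if not packet.strip():
            continue
        try:
            elem = ET.fromstring(packet)
        except ET.ParseError:
            continue                      # incomplete or garbled packet: discard it
        if elem.tag != "tick":
            continue                      # a fragment that happens to be well-formed
        ticks.append({child.tag: child.text for child in elem})
    return ticks
```

A production protocol might prefer a length prefix or checksum over relying on XML well-formedness, but the principle of dropping partial packets is the same.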
If, on the other hand, you are sending out high detail live update weather maps for the whole country, the data is probably large enough that you don't want to wait for a full new one to arrive, so you need some sort of lock'n'load protocol that sets the current updated map, and then sends that one out until (say) 1 minute later you have a new one. Again, it's not that complex to write some code to do this, but it's quite a different set of code to the "ticker tape" solution above, because the packet of data is larger, and getting "half a packet" is quite wasteful and completely useless.
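The "lock'n'load" idea (always serve the latest complete map, never a half-built one) can be sketched as publishing a fully built object and swapping a reference under a lock. This is an assumed minimal design, not a named protocol: each per-client sender thread would call `get()` and resend whenever the version number changes.

```python
import threading


class LatestSnapshot:
    """Holds the most recent complete snapshot; readers never see a partial one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._data = None

    def publish(self, data):
        # The producer builds `data` completely first, then swaps it in atomically.
        with self._lock:
            self._version += 1
            self._data = data

    def get(self):
        """Return (version, data) so a sender can tell whether anything changed."""
        with self._lock:
            return self._version, self._data
```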
If you are streaming live video from the 2016 Olympics in Brazil, then you probably want yet another solution, as timing is everything with video: you need the client to buffer, pick up key-frames, throw away "stale" frames, etc., and the server will have to be different too.

Does anybody have advice on programming real-time audio synthesis?

I'm currently working on a personal project: creating a library for real-time audio synthesis in Flash. In short: tools to connect wave generators, filters, mixers, etc. with each other and supply the sound card with raw (real-time) data. Something like Max/MSP or Reaktor.
I already have some working stuff, but I'm wondering if the basic setup that I wrote is right. I don't want to run into problems later on that force me to change the core of my app (although that can always happen).
Basically, what I do now is start at the end of the chain, at the place where the (raw) sound data goes 'out' (to the sound card). To do that, I need to write chunks of bytes (ByteArrays) to an object, and to get that chunk I ask whatever module is connected to my 'Sound Out' module to give me its chunk. That module makes the same request to the module connected to its input, and that keeps happening until the start of the chain is reached.
Is this the right approach? I can imagine running into problems if there's a feedback loop, or if there's another module with no output: if I were to connect a spectrum analyzer somewhere, that would be a dead end in the chain (a module with no outputs, just an input). In my current setup, such a module wouldn't work because I only start calculating from the sound-output module.
Does anyone have experience with programming something like this? I'd be very interested in some thoughts about the right approach. (For clarity: I'm not looking for Flash-specific implementations, and that's why I didn't tag this question under flash or actionscript.)
I did a similar thing a while back, and I used the same approach as you: start at the virtual line out, and trace the signal back to the top. I did this per sample, though, not per buffer; if I were to write the same application today, I might choose per-buffer instead, because I suspect it would perform better.
The spectrometer was designed as an insert module, that is, it would only work if both its input and its output were connected, and it would pass its input to the output unchanged.
To handle feedback, I had a special helper module that introduced a 1-sample delay and would only fetch its input once per cycle.
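A per-sample Python sketch of the 1-sample delay trick, assuming every node caches its value per tick (the tick-stamping idea described later in this thread). All class names are hypothetical. `render` pulls one sample from the sink per cycle and then lets each delay fetch its input exactly once, so the loop `y[n] = x[n] + gain * y[n-1]` never recurses.

```python
class Node:
    """Pull-based node that computes at most once per tick (one sample here)."""

    def __init__(self):
        self._tick = -1
        self._value = 0.0

    def pull(self, tick):
        if tick != self._tick:            # first request this tick: compute and cache
            self._tick = tick
            self._value = self.compute(tick)
        return self._value

    def compute(self, tick):
        raise NotImplementedError


class Impulse(Node):
    def compute(self, tick):
        return 1.0 if tick == 0 else 0.0


class FeedbackMix(Node):
    """y[n] = x[n] + gain * (delayed y)."""

    def __init__(self, source, gain):
        super().__init__()
        self.source, self.gain, self.delay = source, gain, None

    def compute(self, tick):
        return self.source.pull(tick) + self.gain * self.delay.pull(tick)


class UnitDelay(Node):
    """Returns last cycle's input; fetches its new input once per cycle."""

    def __init__(self):
        super().__init__()
        self.source, self._held = None, 0.0

    def pull(self, tick):
        return self._held                 # never recurses into the loop mid-cycle

    def end_of_cycle(self, tick):
        self._held = self.source.pull(tick)   # cached value, no recomputation


def render(sink, delays, n):
    out = []
    for tick in range(n):
        out.append(sink.pull(tick))
        for d in delays:
            d.end_of_cycle(tick)
    return out
```

Wiring `mix = FeedbackMix(Impulse(), 0.5)` to a `UnitDelay` whose source is `mix` produces the decaying echo `[1.0, 0.5, 0.25, 0.125]` over four samples.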
Also, I think doing all your internal processing with floats, and thus arrays of floats as the buffers, would be a lot easier than byte arrays, and it would save you the extra effort of converting between integers and floats all the time.
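As an illustration of keeping everything in floats and converting only once at the output boundary, here is a sketch that assumes the output wants 16-bit little-endian PCM (whatever format your sound output actually expects, the point is that the conversion happens in one place):

```python
import struct


def floats_to_pcm16le(samples):
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)   # clip rather than wrap around
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)
```

Internally, mixing two float buffers is then just `[0.5 * a + 0.5 * b for a, b in zip(buf1, buf2)]`, with no integer conversion in between.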
In later versions you may have different packet rates in different parts of your network. One example would be if you extend it to transfer data to or from disk. Another example would be that low-data-rate control variables, such as one controlling echo delay, may later become a part of your network. You probably don't want to process control variables with the same frequency that you process audio packets, but they are still 'real time' and part of the function network. They may, for example, need smoothing to avoid sudden transitions.
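One common way to smooth a control variable is a one-pole low-pass applied per sample; this is an assumed choice for illustration, not something the answer prescribes. Each output moves a fraction `1 - coeff` of the way toward the current target, so a jump in the control value becomes a short glide.

```python
def smooth_control(targets, coeff=0.9, start=0.0):
    """One-pole smoothing: y[n] = coeff * y[n-1] + (1 - coeff) * target[n]."""
    y = start
    out = []
    for t in targets:
        y = coeff * y + (1.0 - coeff) * t
        out.append(y)
    return out
```

With `coeff=0.5` and a step from 0 to 1, the output goes 0.5, 0.75, 0.875, ...; a `coeff` closer to 1 gives a slower, smoother glide.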
As long as you are calling all your functions at the same rate, and all the functions are essentially taking constant time, your pull-the-data approach will work fine. There will be little to choose between pulling data and pushing. Pulling is somewhat more natural for playing audio, pushing is somewhat more natural for recording, but either works and ends up making the same calls to the underlying audio processing functions.
For the spectrometer you've got the issue of multiple sinks for data, but it is not a problem. Introduce a dummy link to it from the real sink. The dummy link can cause a request for data that is not honoured. As long as the dummy link knows it is a dummy and does not care about the lack of data, everything will be OK. This is a standard technique for reducing multiple sinks or sources to a single one.
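A per-buffer sketch of the dummy-link technique: the single real sink also pulls from a dead-end analyzer and ignores that it returns nothing. Class names and the peak-meter "analysis" are hypothetical placeholders.

```python
class Source:
    def __init__(self, values):
        self.values = values

    def pull(self, n):
        return self.values[:n]


class Analyzer:
    """Dead-end module: consumes its input but produces no output."""

    def __init__(self, source):
        self.source = source
        self.last_peak = None

    def pull(self, n):
        block = self.source.pull(n)
        self.last_peak = max(abs(x) for x in block)   # stand-in for a spectrum
        return None                                   # nothing for the sink to use


class Sink:
    """The single real sink; dummy links are pulled too, and their result ignored."""

    def __init__(self, main, dummies=()):
        self.main = main
        self.dummies = list(dummies)

    def pull(self, n):
        out = self.main.pull(n)
        for d in self.dummies:
            d.pull(n)                     # request whose (absent) data is not honoured
        return out
```

Note that here `Source` is pulled twice per update (once by the sink, once by the analyzer), which is exactly the duplicate work the next paragraph's tick-stamping avoids.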
With this kind of network you do not want to do the same calculation twice in one complete update. For example if you mix a high-passed and low-passed version of a signal you do not want to evaluate the original signal twice. You must do something like record a timer tick value with each buffer, and stop propagation of pulls when you see the current tick value is already present. This same mechanism will also protect you against feedback loops in evaluation.
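A per-buffer sketch of the tick-stamping idea: each module remembers which tick its cached buffer belongs to and returns it instead of recomputing. The two `Scale` modules are hypothetical stand-ins for the high-pass and low-pass filters, and `Osc` produces a placeholder ramp.

```python
class Module:
    """Pull-based module that caches its output buffer per tick."""

    def __init__(self, *inputs):
        self.inputs = inputs
        self._tick = None
        self._buffer = None
        self.evaluations = 0

    def pull(self, tick):
        if self._tick == tick:            # already computed in this update: reuse
            return self._buffer
        # Stamp before recursing, so a feedback loop that reaches this module again
        # stops here (getting the previous buffer, or None on the very first tick).
        self._tick = tick
        self.evaluations += 1
        self._buffer = self.process([m.pull(tick) for m in self.inputs], tick)
        return self._buffer

    def process(self, in_buffers, tick):
        raise NotImplementedError


class Osc(Module):
    def process(self, ins, tick):
        return [float(tick + i) for i in range(4)]    # placeholder signal


class Scale(Module):
    def __init__(self, src, k):
        super().__init__(src)
        self.k = k

    def process(self, ins, tick):
        return [self.k * x for x in ins[0]]


class Mix(Module):
    def process(self, ins, tick):
        return [a + b for a, b in zip(*ins)]
```

Mixing `Scale(osc, 0.5)` and `Scale(osc, 0.25)` evaluates `osc` only once per tick even though two branches pull from it.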
So, those two issues of concern to you are easily addressed within your current framework.
Rate matching where there are different packet rates in different parts of the network is where the problems with the current approach will start. If you are writing audio to disk then for efficiency you'll want to write large chunks infrequently. You don't want to be blocking your servicing of the more frequent small audio input and output processing packets during those writes. A single rate pulling or pushing strategy on its own won't be enough.
Just accept that at some point you may need a more sophisticated way of updating than a single rate network. When that happens you'll need threads for the different rates that are running, or you'll write your own simple scheduler, possibly as simple as calling less frequently evaluated functions one time in n, to make the rates match. You don't need to plan ahead for this. Your audio functions are almost certainly already delegating responsibility for ensuring their input buffers are ready to other functions, and it will only be those other functions that need to change, not the audio functions themselves.
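The "one time in n" scheduler mentioned above can be as small as this sketch (the task names and periods are hypothetical). It only matches rates; a slow task called this way still runs inline, so a long disk write would eventually need its own thread.

```python
def run_schedule(tasks, cycles):
    """Run each (period, fn) task on every `period`-th cycle, passing the cycle number."""
    for cycle in range(cycles):
        for period, fn in tasks:
            if cycle % period == 0:
                fn(cycle)
```

For example, `run_schedule([(1, process_audio_block), (64, write_chunk_to_disk)], n)` would run the audio every cycle and the disk writer once every 64 cycles.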
The one thing I would advise at this stage is to be careful to centralise audio buffer allocation, noticing that buffers are like fenceposts. They don't belong to an audio function; they lie between the audio functions. Centralising the buffer allocation will make it easy to retrospectively modify the update strategy for different rates in different parts of the network.
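A minimal sketch of centralised allocation, with hypothetical names: the graph owns one buffer per edge (the fencepost between two functions), and audio functions only fill the buffers they are handed. The optional per-edge size is where a different rate, such as a large disk-write buffer, could later be introduced without touching the audio functions.

```python
class Graph:
    """Owns every buffer; one buffer per (src, dst) connection."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.buffers = {}

    def connect(self, src, dst, block_size=None):
        size = self.block_size if block_size is None else block_size
        self.buffers[(src, dst)] = [0.0] * size
        return self.buffers[(src, dst)]


def gain(in_buf, out_buf, k):
    """An audio function: allocates nothing, just fills the buffer it is given."""
    for i, x in enumerate(in_buf):
        out_buf[i] = k * x
```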
