Identify bytes corresponding to the transcription response in Google Cloud Transcription API - speech-to-text

While performing live streaming recognition using Google's ASR APIs, we send byte arrays in real time and receive tentative textual responses.
Is there a way to annotate each request with the bytes being sent and receive an identifier in the response indicating which text corresponds to which bytes?
The use case is that I want to get an accurate handle on the latency of the transcription API.
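One rough way to get a handle on that latency without per-request identifiers is to timestamp each chunk as it is written to the stream and compare against when results arrive. A minimal sketch, assuming the Node.js @google-cloud/speech client; the audio source and chunk handler are illustrative only:

    // Latency probe: remember when each audio chunk was written and compare
    // with the arrival time of each (tentative) result. This only gives an
    // aggregate estimate, since the response carries no identifier tying a
    // transcript back to specific bytes.
    const speech = require('@google-cloud/speech');

    const client = new speech.SpeechClient();
    const request = {
      config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
      interimResults: true,
    };

    const sendTimes = [];

    const recognizeStream = client
      .streamingRecognize(request)
      .on('error', console.error)
      .on('data', (response) => {
        const lastSent = sendTimes[sendTimes.length - 1];
        const transcript = response.results[0] &&
          response.results[0].alternatives[0].transcript;
        console.log('ms since last chunk:', Date.now() - lastSent, 'text:', transcript);
      });

    // Call this from wherever your audio source hands you a byte array.
    function sendChunk(buffer) {
      sendTimes.push(Date.now());
      recognizeStream.write(buffer);
    }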

Related

How to give data in small parts from BE to the FE?

My case is that a web application (FE / react.js) is trying to generate a CSV file from the response coming from the gateway (BE / node.js) service.
Because the data is too large, the FE sends partial requests using limit and offset values and then tries to merge the results.
But the FE wants to get the data in a single request. For this problem, it looks like we can use a stream. However, when I searched for its usage, I couldn't find an example.
On the gateway service, how can I send multiple requests to the internal service using limit and offset, and serve the result to the FE via a stream?
I'm expecting to return the data in parts to the web application.
Yes, Stream or Server-Sent Events (SSE) is the way to go. In your case, I recommend trying SSE.
Here are some examples to start:
https://dev.to/dhiwise/how-to-implement-server-sent-events-in-nodejs-11d9
https://www.digitalocean.com/community/tutorials/nodejs-server-sent-events-build-realtime-app
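As a concrete starting point, here is a minimal sketch of the gateway side, assuming Express, Node 18+ (for the global fetch), and a hypothetical internal-service URL; the page size is illustrative:

    // Gateway (BE / node.js): fetch the data page by page from the internal
    // service and push each page to the FE as a Server-Sent Event, so the
    // FE gets everything over a single request.
    const express = require('express');
    const app = express();

    // Hypothetical helper calling the internal service with limit/offset.
    async function fetchPage(limit, offset) {
      const res = await fetch(`http://internal-service/data?limit=${limit}&offset=${offset}`);
      return res.json();
    }

    app.get('/export', async (req, res) => {
      res.set({
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
      });
      res.flushHeaders();

      const limit = 1000;
      for (let offset = 0; ; offset += limit) {
        const rows = await fetchPage(limit, offset);
        if (rows.length === 0) break;
        // One SSE message per page; the FE appends each to its CSV.
        res.write(`data: ${JSON.stringify(rows)}\n\n`);
      }
      res.write('event: done\ndata: end\n\n');
      res.end();
    });

    app.listen(3000);

On the FE, new EventSource('/export') can listen for message events and append each chunk until the done event fires.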

How to parse a continuous multipart response in NodeJS

I'm creating a tool that needs to consume a Security System API. I can't give much information about it but as the API documentation explains:
A stream of events data will be sent using HTTP Multipart x-mixed-replace transmission. The response data stream is continuous.
Each event is separated by the multipart boundary --DummyBoundary.
From what I understood with some network sniffing tools, whenever an event happens this stream returns data related to it, but how can I read this data into variables?
Thanks.
You can try the multer library. More details here
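If you would rather handle it by hand, you can also read the raw response stream and split it on the documented boundary yourself. A minimal sketch, assuming plain HTTP, that the boundary is literally --DummyBoundary, and that each part body is text; the host and path are illustrative:

    // Consume a continuous multipart/x-mixed-replace stream: buffer incoming
    // data and emit one callback per complete part, splitting on the boundary.
    const http = require('http');

    const BOUNDARY = '--DummyBoundary';

    function consumeEvents(options, onEvent) {
      http.get(options, (res) => {
        let buffered = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
          buffered += chunk;
          // A part is complete once the *next* boundary has arrived.
          let next;
          while ((next = buffered.indexOf(BOUNDARY, BOUNDARY.length)) !== -1) {
            const part = buffered.slice(0, next);
            buffered = buffered.slice(next);
            // Drop the leading boundary and the part headers; keep the body.
            const pieces = part.split('\r\n\r\n');
            const body = pieces.slice(1).join('\r\n\r\n').trim();
            if (body) onEvent(body);   // store or parse into your variables here
          }
        });
      });
    }

    // Usage: log each event body as it arrives.
    consumeEvents({ host: 'security-system.local', path: '/events' }, (body) => {
      console.log('event:', body);
    });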

How to parse GPS data sent in NMEA protocol by multiple devices?

We are building a real-time bus tracking system for a client. The buses send GPS data to the configured server in NMEA 0183 protocol.
We tested the configuration on one bus. Our NMEA parser on our server is able to decode the sentences and give us the latitude longitude of the bus location.
But we are unable to verify that the data is actually coming from that particular bus. So how will we detect and parse data sent by multiple buses?
The buses send GPS data to the configured server in NMEA 0183 protocol.
The NMEA protocol is very wordy and contains duplicated fields in different sentences. It would be much more efficient to parse the NMEA sentences in the bus. Then you can send a "message" to the server that contains the parsed values and a bus ID: latitude, longitude, date/time, speed and bus ID. This message is 10 to 80 times smaller than the raw NMEA data. The records in the server can then be used to display bus locations.
If you are using an Arduino microcontroller to connect to a GPS module, you should take a look at my NMEA parsing library, NeoGPS. It is supported on all the Arduino platforms.
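On the server side, the idea then looks roughly like this; a minimal sketch, assuming a hypothetical JSON message shape and a UDP transport (the field names and port are illustrative):

    // Server side: receive compact, already-parsed position messages and keep
    // the latest fix per bus, keyed by the bus ID included in every message.
    const dgram = require('dgram');

    const server = dgram.createSocket('udp4');
    const latestByBus = new Map();   // busId -> last known position

    server.on('message', (msg) => {
      // Hypothetical message shape sent by the bus after parsing NMEA locally:
      // { "busId": "BUS-042", "lat": 12.9716, "lon": 77.5946,
      //   "speedKmh": 31.5, "time": "2024-01-01T10:15:00Z" }
      const fix = JSON.parse(msg.toString());
      latestByBus.set(fix.busId, fix);
      console.log(`bus ${fix.busId} at ${fix.lat}, ${fix.lon}`);
    });

    server.bind(5005);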

How to send raw data using socket.io

I'm trying to reduce socket.io bandwidth when using websockets. I switched to binary data, but looking in the browser developer console, the packets are sent as:
[ 'type of the packet (first argument of .emit)', associated data ]
I'm using only one packet type, so this causes unnecessary overhead: useless bytes are sent, and the whole thing is JSON-encoded for no reason.
How can I get rid of the packet type and just send raw data?
socket.io is an abstraction on top of webSocket. In order to support the features it provides, it adds some overhead to the messages. The message name is one such piece of that overhead, since socket.io is a messaging system, not just a packet-delivery system.
If you want to squeeze all bytes out of the transport, then you probably need to get rid of socket.io and just use a plain webSocket where you control more of the contents of each packet (though you will have to reimplement some things that socket.io does for you).
With socket.io in node.js, you can send binary by sending an ArrayBuffer or Buffer. In the browser, you can send binary by sending an ArrayBuffer or Blob.
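A minimal sketch of both options, assuming the ws package for the plain-webSocket route; the port numbers and the one-letter event name are illustrative:

    const { Server } = require('socket.io');
    const { WebSocketServer } = require('ws');

    // Option 1: keep socket.io but send binary payloads. The short event name
    // still travels with every message, but the payload itself is a raw Buffer
    // rather than JSON.
    const io = new Server(3000);
    io.on('connection', (socket) => {
      socket.on('d', (buf) => {          // buf arrives as a Buffer in node.js
        socket.emit('d', Buffer.from([1, 2, 3, 4]));
      });
    });

    // Option 2: drop socket.io and use a plain webSocket (the ws package),
    // where each frame carries only your bytes.
    const wss = new WebSocketServer({ port: 8080 });
    wss.on('connection', (ws) => {
      ws.on('message', (data) => {       // raw Buffer: no event name, no JSON
        ws.send(data);                   // echo the bytes back unchanged
      });
    });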

Connecting two Node/Express apps with streaming JSON

I currently have two apps running...
One is my REST API layer that provides a number of services to the frontend.
The other is a 'translation app': it can be fed a JSON object (over an HTTP POST call), perform some data translation and mappings on that object, and return it to the REST layer.
My situation is that I want to do this for a large number of objects. The flow I want is:
User requests 100,000 objects in a specific format -> REST layer retrieves them from the database -> passes each JSON data object to the translation service for formatting -> each one is passed back to the REST layer -> REST layer returns the new objects to the user.
What I don't want to do is call translate.example.com/translate in 100,000 separate calls, or pass megabytes of data through one single huge POST request.
So the obvious answer is streaming data to the translate app, and then streaming data back.
There seem to be a lot of solutions for streaming data across apps: open a websocket (socket.io), open a raw TCP connection between the two, or, since Node's HTTP request and response objects are actually streams, use those and emit a JSON object whenever one is successfully translated.
My question is: is there a best practice here for streaming data between two apps? It seems I should use the http (req, res) stream and keep a long-lived connection open to preserve the 'REST' model. Any samples that could be provided would be great.
This is one of the best use cases for message queues. You basically create a queue for data to be translated by the translation service, and a queue for data which is already translated and ready to be sent back to the user. Your REST layer and translation layer publish and subscribe to the applicable queues, and can process the data as it comes in. This has the added benefit of decoupling your REST and translation layers, meaning it becomes trivial to add multiple translation layers later to handle additional load if necessary.
Take a look at RabbitMQ, but there are plenty of other options as well.
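A minimal sketch of the REST-layer side with RabbitMQ, assuming the amqplib package; the queue names and the objectsFromDatabase source are hypothetical:

    // REST layer: push each object that needs translation onto one queue and
    // consume the translated results from another, decoupling the two apps.
    const amqp = require('amqplib');

    async function run(objectsFromDatabase) {   // hypothetical: rows already loaded
      const conn = await amqp.connect('amqp://localhost');
      const ch = await conn.createChannel();
      await ch.assertQueue('to_translate');
      await ch.assertQueue('translated');

      // Publish every object; the translation service consumes this queue.
      for (const obj of objectsFromDatabase) {
        ch.sendToQueue('to_translate', Buffer.from(JSON.stringify(obj)));
      }

      // Receive translated objects as the translation service finishes them.
      ch.consume('translated', (msg) => {
        const translated = JSON.parse(msg.content.toString());
        // ...stream/append `translated` into the response going back to the user...
        ch.ack(msg);
      });
    }

    run([{ id: 1, name: 'example' }]).catch(console.error);

The translation service would do the mirror image: consume to_translate, transform each message, and publish the result to translated.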