How do I create an expressjs endpoint that uses azure tts to send audio to a web app?

I am trying to figure out how to expose an Express route (e.g. GET /api/word/:some_word) which uses the Azure TTS SDK (microsoft-cognitiveservices-speech-sdk) to generate an audio version of some_word (in any format playable by a browser) and res.send()s the resulting audio, so that a front-end JavaScript web app can consume the API to play the audio pronunciation of the word.
I have the azure sdk 'working' - it is creating an 'ArrayBuffer' inside my expressjs code. However, I do not know how to send the data in this ArrayBuffer to the front end. I have been following the instructions here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=import%2Cwindowsinstall&pivots=programming-language-javascript#get-result-as-an-in-memory-stream
Another way to phrase my question would be: in Express, I have an ArrayBuffer whose contents are an .mp3/.ogg/.wav file. How do I send that file via Express? Do I need to convert it into some other data type (like a Base64-encoded string, or a Buffer)? Do I need to set some particular response headers?

I finally figured it out seconds after asking this question 😂
I am pretty new to this area, so any pointers on how this could be improved would be appreciated.
// Assumes: import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
//          const { SpeechSynthesisOutputFormat } = sdk;
//          and that azureKey holds the Speech resource key.
app.get('/api/tts/word/:word', async (req, res) => {
  const word = req.params.word;
  const subscriptionKey = azureKey;
  const serviceRegion = 'australiaeast';
  const speechConfig = sdk.SpeechConfig.fromSubscription(
    subscriptionKey as string,
    serviceRegion
  );
  speechConfig.speechSynthesisOutputFormat =
    SpeechSynthesisOutputFormat.Ogg24Khz16BitMonoOpus;
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);
  synthesizer.speakSsmlAsync(
    `
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
        xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
      <voice name="zh-CN-XiaoxiaoNeural">
        ${word}
      </voice>
    </speak>
    `,
    (resp) => {
      // resp.audioData is an ArrayBuffer containing the synthesized Ogg/Opus audio
      const audio = resp.audioData;
      synthesizer.close();
      // Wrap the ArrayBuffer in a Node Buffer so Express sends it as binary
      const buffer = Buffer.from(audio);
      res.set('Content-Type', 'audio/ogg; codecs=opus; rate=24000');
      res.send(buffer);
    },
    (err) => {
      // On synthesis failure, close the synthesizer and report the error
      synthesizer.close();
      res.status(500).send(err);
    }
  );
});
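A minimal front-end sketch for consuming this endpoint and playing the pronunciation (only the route path comes from the code above; everything else is an assumption):

// Hypothetical front-end consumer of the /api/tts/word/:word route above
async function playWord(word) {
  const response = await fetch(`/api/tts/word/${encodeURIComponent(word)}`);
  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);
  // Read the binary body and hand it to an Audio element via an object URL
  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  audio.addEventListener('ended', () => URL.revokeObjectURL(url));
  await audio.play();
}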

Related

How to send audio saved as a Buffer, from my api, to my React client and play it?

I've been chasing my tail for two days figuring out how best to send the <Buffer ... > object generated by Google's Text-To-Speech service from my express-api to my React app. I've come across tons of different opinionated resources that point me in different directions and only potentially "solve" isolated parts of the bigger process. At the end of all of this, while I've learned a lot more about ArrayBuffer, Buffer, binary arrays, etc., I still feel just as lost as before when it comes to implementation.
At its simplest, all I aim to do is provide one or more strings of text to tts, generate the audio files, send the audio files from my express-api to my react client, and then automatically play the audio in the background on the browser when appropriate.
I am successfully sending text to Google's TTS and triggering it to generate the audio files. It responds with a <Buffer ...> representing the binary data of the file. That arrives in my express-api endpoint; from there I'm not sure whether I should:
convert the Buffer to a string and send it to the browser?
send it as a Buffer object to the browser?
set up a websocket using socket.io and stream it?
Then, once it's on the browser:
do I use an <audio /> tag?
should I convert it to something else?
I suppose the problem I'm having is that searching for answers to this results in information overload: lots of different answers written over the past 10 years using different approaches and technologies. I really don't know where one ends and the next begins, what's bad practice, what's best practice, and, moreover, what is actually suitable for my case. I could really use some guidance here.
Synthesise function from Google
// returns: <Buffer ff f3 44 c4 ... />
const synthesizeSentence = async (sentence) => {
  const request = {
    input: { text: sentence },
    voice: { languageCode: "en-US", ssmlGender: "NEUTRAL" },
    audioConfig: { audioEncoding: "MP3" },
  };
  // client is assumed to be a @google-cloud/text-to-speech TextToSpeechClient;
  // synthesizeSpeech resolves to a one-element array whose first item carries
  // audioContent as a Buffer
  const response = await client.synthesizeSpeech(request);
  return response[0].audioContent;
};
(current shape) of express-api POST endpoint
app.post("/generate-story-support", async (req, res) => {
try {
// ? generating the post here for simplicity, eventually the client
// ? would dictate the sentences to send ...
const ttsResponse: any = await axios.post("http://localhost:8060/", {
sentences: SAMPLE_SENTENCES,
});
// a resource said to send the response as a string and then convert
// it on the client to an Array buffer? -- no idea if this is a good practice
return res.status(201).send(ttsResponse.data[0].data.toString());
} catch (error) {
console.log("error", error);
return res.status(400).send(`Error: ${error}`);
}
});
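One hedged sketch, mirroring the Azure answer above rather than anything confirmed in this question: skip the string round-trip and send the MP3 bytes as binary with an audio Content-Type.

// Hypothetical alternative shape of the same endpoint: send the MP3 Buffer as binary
app.post("/generate-story-support", async (req, res) => {
  try {
    const ttsResponse: any = await axios.post("http://localhost:8060/", {
      sentences: SAMPLE_SENTENCES,
    });
    // ttsResponse.data[0].data is assumed to be the serialized Buffer's byte array,
    // matching the .data access in the code above
    const audioBuffer = Buffer.from(ttsResponse.data[0].data);
    res.set("Content-Type", "audio/mpeg");
    return res.status(201).send(audioBuffer);
  } catch (error) {
    console.log("error", error);
    return res.status(400).send(`Error: ${error}`);
  }
});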
React client:
useEffect(() => {
  const fetchData = async () => {
    const data = await axios.post(
      "http://localhost:8000/generate-story-support"
    );
    // converting it to an ArrayBuffer per another so post
    const encoder = new TextEncoder();
    const encodedData = encoder.encode(data.data);
    setAudio(encodedData);
    return data.data;
  };
  fetchData();
}, []);
// no idea what to do from here, if this is even the right path :/
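Assuming the endpoint returned binary audio as sketched above, a hedged client-side sketch (setAudioUrl and the <audio /> rendering are assumptions, not from the question) could fetch the response as a Blob and play it:

useEffect(() => {
  const fetchAudio = async () => {
    // responseType: "blob" keeps the MP3 bytes intact instead of decoding them as text
    const response = await axios.post(
      "http://localhost:8000/generate-story-support",
      null,
      { responseType: "blob" }
    );
    // Hand the Blob to an <audio> element via an object URL
    const url = URL.createObjectURL(response.data);
    setAudioUrl(url); // assumed state hook; render e.g. <audio src={audioUrl} autoPlay />
  };
  fetchAudio();
}, []);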

Node.js: Converting text/pdf/word/jpg to speech with Amazon Polly

I'm using the AWS Polly service to convert text into speech. If headers, footers, and long URLs exist in the file/jpg (which will get converted to speech), how can we remove these for a better user experience?
my code sequence and function used:
router.post(
  "/abcApi",
  fileUploader.fields([{ name: "file", maxCount: 1 }]),
  textExtractingFromFile,
  textController.sendSms
);
“fileUploader” uploads the file to S3. Then it passes the request to the “textExtractingFromFile” middleware, which uses AWS Textract (deep learning) to extract the text.
const response = await textract.getDocumentTextDetection(data).promise();
Then the text is passed to Polly for voice conversion (ML) in the controller.
const pollyUrls = await PollyTextConverted(textString);
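One hedged approach, purely a sketch (buildSpeakableText and the margin threshold are invented here, not from the question): filter the Textract LINE blocks before building the string handed to Polly, dropping lines that look like URLs and lines in the top or bottom page margins where headers and footers usually live.

// Hypothetical filter over the Textract response before calling Polly
const buildSpeakableText = (response) => {
  const urlPattern = /https?:\/\/\S+|www\.\S+/i;
  return response.Blocks
    .filter((block) => block.BlockType === "LINE")
    // Drop lines in the top or bottom 8% of the page (likely headers/footers)
    .filter((block) => {
      const top = block.Geometry.BoundingBox.Top;
      return top > 0.08 && top < 0.92;
    })
    // Drop lines that contain a URL
    .filter((block) => !urlPattern.test(block.Text))
    .map((block) => block.Text)
    .join(" ");
};

// Then, in the controller:
// const pollyUrls = await PollyTextConverted(buildSpeakableText(response));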

Alexa - How to send image to user?

I'm using a Lambda function for my Alexa Skill. For my launch intent, I query DynamoDB and return a String that I first want to convert into a QR code, and then return to the Alexa device as an image inside the responseBuilder.
Alexa works fine displaying images from external URLs, such as:
const rabbitImage = "https://i.imgur.com/U6eF0oH.jpeg";
return responseBuilder
  .speak(say)
  .withStandardCard("Welcome to Alexa", "description", rabbitImage, rabbitImage)
  .reprompt('try again, ' + say)
  .getResponse();
But I'm stuck on how to send the QRCode back to the Alexa Device in the responseBuilder.
I'm using a nodejs library called qrcode that can convert the String into a QRCode and then into base64.
https://www.npmjs.com/package/qrcode
But according to the Alexa docs, to send a "card" (aka image) back to the user, it has to be a URL.
https://developer.amazon.com/en-US/docs/alexa/custom-skills/include-a-card-in-your-skills-response.html
The Alexa Skills Kit provides different types of cards:
A Standard card also displays plain text, but can include an image. You provide the text for the title and content, and the URL for the image to display.
So I'm not sure if the base64 generated by the qrcode library would work in this case.
What's the best way to send the dynamically generated QRCode back to the Alexa Device as a response in this scenario?
const LaunchRequest_Handler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    const responseBuilder = handlerInput.responseBuilder;
    // Perform query to DynamoDB
    var stringToCreateQRWith = "8306e21d-0c9e-4465-91e9-0cf86fca110d";
    // Generate qr code and send back to user here
    // ??? Unsure how to do this and what format to send it in
    var qrImageToSendToUser = ???
    return responseBuilder
      .speak(say)
      .withStandardCard("Welcome to Alexa", "description", qrImageToSendToUser, qrImageToSendToUser)
      .reprompt('try again, ' + say)
      .getResponse();
  },
};
As #kopaka proposed, this is the way to go; there is no way around it.
As per the documentation, there are a few things you need to keep in mind.
For the images themselves, you will want to create two versions, one at 720px x 480px and one at 1200px x 800px, to make sure they display nicely on multiple screen sizes. Otherwise the best experience for your users isn't guaranteed, as the image may be scaled up/down to fit.
For storage, you need to make sure you can serve those images over HTTPS, with a valid SSL certificate trusted by Amazon.
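A hedged sketch of that flow (the bucket name and key are assumptions, and the object would need to be publicly readable over HTTPS; the qrcode and aws-sdk calls are the libraries' documented APIs):

const QRCode = require('qrcode');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Hypothetical helper: render the string as a PNG, upload it, return an HTTPS URL
const createQrCardUrl = async (stringToCreateQRWith) => {
  // Render at the larger card size; provide a 720x480 variant as well if needed
  const png = await QRCode.toBuffer(stringToCreateQRWith, { type: 'png', width: 1200 });
  const upload = await s3.upload({
    Bucket: 'my-skill-card-images',            // assumed bucket
    Key: `qr/${stringToCreateQRWith}.png`,
    Body: png,
    ContentType: 'image/png',
  }).promise();
  return upload.Location;
};

// Inside handle() (which would need to be async):
// const qrImageToSendToUser = await createQrCardUrl(stringToCreateQRWith);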

how to load atlas using base64 in phaser3?

I am making a Phaser 3 playable ad and need to put images and other assets into a single HTML file. I can load the images using textures.addBase64, but how can I load an atlas using base64?
Also, if you could tell me how to put the JSON in the HTML file so that I can refer to it while loading the atlas.
Thank you 🙂
Untested personally, but take a look at this solution:
private loadBase64Atlas(key: string, data: string, json: object): void {
  const imageElement = new Image();
  imageElement.onload = () => {
    this.scene.textures.addAtlas(key, imageElement, json);
    this.onAssetLoaded();
  };
  const spriteBlob = this.base64ToBlob(data.split(',')[1], 'image/png');
  imageElement.src = URL.createObjectURL(spriteBlob);
}
Adapt as needed if you're not using TypeScript.
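The base64ToBlob helper itself isn't reproduced in the thread; a typical implementation (a sketch, not necessarily the one the answer referenced) looks something like this:

private base64ToBlob(base64Data: string, contentType: string): Blob {
  // Decode the base64 payload into raw bytes
  const byteCharacters = atob(base64Data);
  const byteNumbers = new Uint8Array(byteCharacters.length);
  for (let i = 0; i < byteCharacters.length; i++) {
    byteNumbers[i] = byteCharacters.charCodeAt(i);
  }
  return new Blob([byteNumbers], { type: contentType });
}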
The JSON object can just be bounced back and forth to and from base64 if you need to embed it in base64 as well:
// get the base64 string to embed in the ad and physically store it
const atlasJSON64 = btoa(JSON.stringify(atlasJSON));

[...]

const objectatlasJSON64 = JSON.parse(atob(atlasJSON64));

AWS Lambda base64 encoded form data with image in Node

So, I'm trying to pass an image to a Node Lambda through API Gateway and this is automatically base64 encoded. This is fine, and my form data all comes out correct, except somehow my image is being corrupted, and I'm not sure how to decode this properly to avoid this. Here is the relevant part of my code:
const multipart = require('aws-lambda-multipart-parser');
exports.handler = async (event) => {
  console.log({ event });
  const buff = Buffer.from(event.body, 'base64');
  // using utf-8 appears to lose some of the data
  const decodedEventBody = buff.toString('ascii');
  const decodedEvent = { ...event, body: decodedEventBody };
  const jsonEvent = multipart.parse(decodedEvent, false);
  const asset = Buffer.from(jsonEvent.file.content, 'ascii');
};
First off, it would be good to know if aws-sdk has a way of parsing the multipart form data rather than using this unsupported third-party code. Next, the value of asset ends up as a buffer that's exactly the same size as the original file, but some of the byte values are off. My assumption is that the way this is being encoded vs. decoded is slightly different, and maybe some of the characters are being interpreted differently.
Just an update in case anybody else runs into a similar problem: I updated 'ascii' to 'latin1' in both places and then it started working fine.
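For reference, here is the same decode path with that change applied (a sketch of the stated fix; only the encoding argument differs from the handler above):

// 'latin1' maps every byte to a single character 1:1, so binary image bytes
// survive the string round-trip, unlike 'ascii' which strips the high bit
const decodedEventBody = Buffer.from(event.body, 'base64').toString('latin1');
const decodedEvent = { ...event, body: decodedEventBody };
const jsonEvent = multipart.parse(decodedEvent, false);
const asset = Buffer.from(jsonEvent.file.content, 'latin1');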
