Unable to stream microphone audio to Google Speech to Text with NodeJS

I am developing a simple web-based Speech to Text project with NodeJS, ws (WebSocket), and Google's Speech to Text API.
However, I have had no luck getting a transcript back from Google's Speech to Text API.
Below is my server-side code (server.js):
ws.on('message', function (message) {
    if (typeof message === 'string') {
        if (message == "connected") {
            console.log(`Web browser connected postback.`);
        }
    } else {
        if (recognizeStream !== null) {
            const buffer = new Int16Array(message, 0, Math.floor(message.byteLength / 2));
            recognizeStream.write(buffer);
        }
    }
});
Below is my client-side code (ws.js):
function recorderProcess(e) {
    var floatSamples = e.inputBuffer.getChannelData(0);
    const ConversionFactor = 2 ** (16 - 1) - 1;
    var floatSamples16 = Int16Array.from(floatSamples.map(n => n * ConversionFactor));
    ws.send(floatSamples16);
}

function successCallback(stream) {
    window.stream = stream;
    var audioContext = window.AudioContext;
    var context = new audioContext();
    var audioInput = context.createMediaStreamSource(stream);
    var recorder = context.createScriptProcessor(2048, 1, 1);
    recorder.onaudioprocess = recorderProcess;
    audioInput.connect(recorder);
    recorder.connect(context.destination);
}
When I run the project, open http://localhost/ in my browser, and speak some sentences into the microphone, unfortunately no transcription is returned and no error messages appear in the NodeJS console.
When I check the status in the Google Cloud Console, it only displays a 499 code in the dashboard.
Many thanks for helping!

I think the issue could be related to the stream processing. Maybe some streaming process is stopped before the end of an operation. My suggestion is to review the callbacks in the JavaScript code in order to find some "broken" promises.
Also, maybe it's obvious, but there is a different doc for audio longer than one minute:
https://cloud.google.com/speech-to-text/docs/async-recognize
CANCELLED - The operation was cancelled, typically by the caller.
HTTP Mapping: 499 Client Closed Request
Given that error code, this could also be related to the asynchronous nature of Node.js: the streaming request may be getting closed by the client before recognition completes.
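For reference, here is a minimal sketch of how the server-side recognizeStream could be created with explicit error and result callbacks, assuming the @google-cloud/speech client library; the encoding, sampleRateHertz, and languageCode values are assumptions and must match what the browser actually sends (note that an AudioContext usually runs at 44100 or 48000 Hz, not 16000, unless you resample):

const speech = require('@google-cloud/speech');
const speechClient = new speech.SpeechClient();

// Assumption: the browser sends 16-bit PCM chunks over the WebSocket.
const recognizeStream = speechClient
    .streamingRecognize({
        config: {
            encoding: 'LINEAR16',
            sampleRateHertz: 16000, // must equal the real capture rate
            languageCode: 'en-US',
        },
        interimResults: true,
    })
    .on('error', err => console.error('Speech API error:', err))
    .on('data', data => {
        const result = data.results[0];
        if (result && result.alternatives[0]) {
            console.log(`Transcript: ${result.alternatives[0].transcript}`);
        }
    });

// In the ws 'message' handler, write the raw binary chunk directly:
// recognizeStream.write(message);

If the error callback fires or no data events ever arrive, logging there should at least turn the silent failure into a concrete message to work from.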
Hope this works!

Related

Filtering and mixing WebRTC sound in NodeJs

A few weeks ago I wrote a WebRTC browser client using the mediasoup library. Now I am in the middle of rewriting it as a NodeJS client.
I am stuck with one thing. I want to receive multiple WebRTC audio sources, mix them into a single track, then apply some filters (e.g. a biquad filter) and resend this track via WebRTC.
In the browser I could achieve this using the Web Audio API; this is the code I used:
this.audioContext = new AudioContext();
this.outgoingStream = this.audioContext.createMediaStreamDestination();
this.addSoundFilter();
this.mixedTrack = this.outgoingStream.stream.getAudioTracks()[0];
this.handleIncomingSound();

addSoundFilter() {
    this.filter = this.audioContext.createBiquadFilter();
    this.filter.type = "lowpass";
    this.filter.frequency.value = this.mapFrequencyValue();
    this.gainer = this.audioContext.createGain();
    this.gainer.gain.value = this.mapGainValue();
}

handleIncomingSound() {
    this.audios.forEach((audio, peerId) => {
        this.filterAudio(audio);
    });
}

filterAudio(audioConsumerId) {
    let audio = this.audios.get(audioConsumerId);
    const audioElement = document.getElementById(audio.id);
    const incomingSource = this.audioContext.createMediaStreamSource(
        audioElement.srcObject
    );
    incomingSource.connect(this.filter).connect(this.gainer).connect(this.outgoingStream);
}
With this code I could then send this.mixedTrack via WebRTC.
However, in NodeJS there is no Web Audio API.
So how can this be achieved, if it's even possible?

Sending messages to IoT hub from Azure Timer Trigger Function as device

At the moment I'm simulating a device that sends telemetry data to the IoT hub every 30 seconds.
Here is the simple code:
s_deviceClient = DeviceClient.Create(s_iotHubUri, new DeviceAuthenticationWithRegistrySymmetricKey(s_myDeviceId, s_deviceKey), TransportType.Mqtt);
using var cts = new CancellationTokenSource();
var messages = SendDeviceToCloudMessagesAsync(cts.Token);
await s_deviceClient.CloseAsync(cts.Token);
await messages;
cts.Cancel();
And the function that sends a message:
string combinedString = fileStrings[0] + fileStrings[1];
var telemetryDataString = converter.SerializeObject(combinedString);
using var message = new Message(Encoding.UTF8.GetBytes(telemetryDataString))
{
    ContentEncoding = "utf-8",
    ContentType = "application/json",
};
await s_deviceClient.SendEventAsync(message);
await Task.Delay(interval);
Everything works fine, and I created an .exe file that ran without problems. But the computer where the code runs tends to shut off from time to time, which is problematic. So I tried to move this to an Azure Timer Trigger Function. While everything looks fine in the logs, messages aren't actually posted to the IoT hub.
I tried to find a solution but haven't been able to find anything. Is it possible to send messages as a device from an Azure Function?
You seem to be closing your DeviceClient before you start using it to send messages. Try the following:
public async Task Do()
{
    // Using statement will dispose your client after you're done with it.
    // No need to close it manually.
    using (var client = DeviceClient.Create(s_iotHubUri, new DeviceAuthenticationWithRegistrySymmetricKey(s_myDeviceId, s_deviceKey), TransportType.Mqtt))
    {
        // Send messages, await for completion.
        await SendDeviceToCloudMessagesAsync(client);
    }
}

private async Task SendDeviceToCloudMessagesAsync(DeviceClient client)
{
    string combinedString = fileStrings[0] + fileStrings[1];
    var telemetryDataString = converter.SerializeObject(combinedString);
    using var message = new Message(Encoding.UTF8.GetBytes(telemetryDataString))
    {
        ContentEncoding = "utf-8",
        ContentType = "application/json",
    };
    await client.SendEventAsync(message);
    await Task.Delay(interval);
}

Azure TTS neural voice audio file is created abnormally in 1 byte size

Azure TTS standard voice audio files are generated normally. However, for a neural voice, the audio file is generated abnormally with a size of 1 byte. The code is below.
C# code
public static async Task SynthesizeAudioAsync()
{
    var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
    using var synthesizer = new SpeechSynthesizer(config, null);
    var ssml = File.ReadAllText("C:/ssml.xml");
    var result = await synthesizer.SpeakSsmlAsync(ssml);
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
ssml.xml - The file below, set to a standard voice, works fine.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-GB-George-Apollo">
    When you're on the motorway, it's a good idea to use a sat-nav.
  </voice>
</speak>
ssml.xml - However, the following file, set to a neural voice, does not work, and an empty audio file is created.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    When you're on the motorway, it's a good idea to use a sat-nav.
  </voice>
</speak>
Looking at the behavior you have described, the Speech service has returned no audio bytes due to some issue.
I have checked the SSML file at my end and it works completely fine, i.e. there is no issue with the SSML itself.
As a next step, I would recommend that you add error-handling code to get a better picture of the error and take action accordingly:
var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
using var synthesizer = new SpeechSynthesizer(config, null);
var ssml = File.ReadAllText("C:/ssml.xml");
var result = await synthesizer.SpeakSsmlAsync(ssml);

if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine("No error");
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
    }
}
This modification will print a friendly error message in the console app.
Note: if you are not using a console app, you will have to modify the output code accordingly.
The sample output is just an example; the error you see may be different.

Get user join / leave events retroactively from Channels

I'm trying to do some analytics on average response time from some of our users on Twilio Chat.
I'm iterating through my channels and I'm able to pull the info about messages, so I can compare the times a message went unresponded to. However, I can't determine which users were in the channel at that time.
Is there anything on the channel that would give me historic member data? Who was in the channel? The channel.messages().list() method only gives me the text of the messages sent to the channel and who sent them, but the set of users who may have been in a channel to respond changes throughout a channel's lifetime.
This is on the backend using the Node.js SDK. Note: this isn't a complete implementation of what I'm trying to do; I'm taking it in steps to get access to the information I'd need. Once I have these messages and know which users are supposed to be in a channel at a given time, I can do the analytics to see how long it took the users I'm looking for to respond.
var fs = require('fs');
const Twilio = require('twilio');

const env = process.env; // assumes credentials are supplied via environment variables
const client = new Twilio(env.TWILIO_ACCOUNT_SID, env.TWILIO_AUTH);
const service = client.chat.services(env.TWILIO_IPM_SERVICE_SID);

async function getChatMessages() {
    const fileName = 'fileName.csv';
    const max = 1000; // cap on channels to process (placeholder value, not shown in the original snippet)
    let i = 0;

    const getLine = message => {
        return `${message.channelSid},${message.sid},${message.dateCreated},${message.from},${message.type},${message.body}\n`;
    };
    const writeToFile = message => { fs.appendFileSync(fileName, getLine(message)); };

    const headerLine = `channelSid,messageSid,dateCreated,author,type,body\n`;
    fs.writeFileSync(fileName, headerLine);

    await service.channels.each(
        async (channel, done) => {
            i++;
            let channelSid = channel.sid;
            if (channel.messagesCount == 0) return;
            try {
                await channel.messages().list({ limit: 1000, order: "asc" }).then(
                    messages => {
                        messages.forEach(writeToFile);
                    }
                );
            } catch (e) {
                console.log(`There was an error getting messages for ${channelSid}: ${e.toString()}`);
            }
            if (i >= max) done();
        }
    );
}
I'm beginning to be resigned to the fact that maybe this would only have been possible to track had I set up the proper event listeners / webhooks to begin with. If that's the case, I'd like to know so I can get that set up. But, if there's any endpoint I can reach out to and see who was joining / leaving a channel, that would be ideal for now.
The answer is that, unfortunately, you cannot get this data retroactively. Twilio offers a webhooks API for Chat which you can use to track this data yourself as it happens, but if you don't capture the events, you do not get access to them again.
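For anyone setting this up going forward, below is a minimal sketch of a post-event webhook listener, assuming an Express server and that the Chat service's post-event webhooks are configured to point at a /chat-events endpoint with the onMemberAdded and onMemberRemoved events enabled. The endpoint path and CSV storage are illustrative, and the request field names (EventType, ChannelSid, Identity) should be verified against the current Chat webhook reference:

const fs = require('fs');
const express = require('express');

const app = express();
// Twilio posts webhook events as application/x-www-form-urlencoded.
app.use(express.urlencoded({ extended: false }));

app.post('/chat-events', (req, res) => {
    const { EventType, ChannelSid, Identity } = req.body;
    if (EventType === 'onMemberAdded' || EventType === 'onMemberRemoved') {
        // Append join/leave events so channel membership can be reconstructed later.
        const line = `${ChannelSid},${Identity},${EventType},${new Date().toISOString()}\n`;
        fs.appendFileSync('membership-events.csv', line);
    }
    res.sendStatus(200);
});

app.listen(3000);

Replaying that log alongside the message timestamps would then give the per-message channel membership needed for the response-time analytics.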

Microsoft Azure Web Chat, trigger mic from other function

I have used the Microsoft Bot Framework to create a bot for the client side, i.e. the WebChat control. I have also added the Speech SpeechRecognizer. However, I am trying to trigger the mic when a phrase is recited.
I couldn't find a function from Microsoft that does this. So I added my own speech recognizer that is called every second, and once the phrase is detected I want to call the mic function from Cognitive Services.
How can I achieve this?
I have got the speech recognizer from here.
And the one I have written to identify a phrase is this:
function startDictation() {
    if (window.hasOwnProperty('webkitSpeechRecognition')) {
        var recognition = new webkitSpeechRecognition();
        recognition.continuous = false;
        recognition.interimResults = false;
        recognition.lang = "en-US";
        recognition.start();

        recognition.onresult = function (e) {
            var foundText = e.results[0][0].transcript;
            console.log(foundText);
            if (foundText == "hello hello") {
                console.log("found text");
                // call cognitive service mic function
                recognition.stop();
            }
            else {
                console.log("text not found");
                recognition.stop();
                startDictation();
            }
        };

        recognition.onerror = function (e) {
            console.log("found error", e);
            recognition.stop();
        };
    }
}
Do let me know if any information is missing or misleading.
For more context:
I tried to leverage the startRecognizing() function in the SpeechRecognizer class at https://github.com/Microsoft/BotFramework-WebChat/blob/master/src/CognitiveServices/SpeechRecognition.ts#L72 to trigger the recognize functionality. However, I found that only after I have clicked the mic item can I use the startRecognizing() function to recognize the voice.
There is a workaround trick you can try at present:
I inspected the mic item and tried to trigger its click event in JS, which worked exactly as expected to recognize my speech.
You can try the following JS code snippet with jQuery:
$('.wc-mic').trigger("click")
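If jQuery isn't available on the page, the same trick should work with plain DOM APIs; a minimal sketch, assuming the WebChat mic button still uses the wc-mic class referenced above:

// Hypothetical helper: programmatically "click" the WebChat mic button.
function triggerWebChatMic() {
    var micButton = document.querySelector('.wc-mic');
    if (micButton) {
        micButton.click();
    } else {
        console.log('WebChat mic button (.wc-mic) not found');
    }
}

// In the onresult handler above, replace the placeholder comment with:
// triggerWebChatMic();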
Hope it helps.
