Dialogflow, detect intent from audio

I'm trying to send an audio file to the Dialogflow API for intent detection. I already have an agent that works quite well, but only with text. I'm trying to add the audio feature, with no luck so far.
I'm using the example (Java) provided in this page:
https://cloud.google.com/dialogflow-enterprise/docs/detect-intent-audio#detect-intent-text-java
This is my code:
public DetectIntentResponse detectIntentAudio(String projectId, byte[] bytes, String sessionId,
    String languageCode) throws Exception {
  // Set the session name using the sessionId (UUID) and projectId (my-project-id)
  SessionName session = SessionName.of(projectId, sessionId);
  System.out.println("Session Path: " + session.toString());

  // Note: hard coding audioEncoding and sampleRateHertz for simplicity.
  // Audio encoding of the audio content sent in the query request.
  AudioEncoding audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16;
  int sampleRateHertz = 16000;

  // Instructs the speech recognizer how to process the audio content.
  InputAudioConfig inputAudioConfig = InputAudioConfig.newBuilder()
      .setAudioEncoding(audioEncoding)     // audioEncoding = AudioEncoding.AUDIO_ENCODING_LINEAR_16
      .setLanguageCode(languageCode)       // languageCode = "en-US"
      .setSampleRateHertz(sampleRateHertz) // sampleRateHertz = 16000
      .build();

  // Build the query with the InputAudioConfig
  QueryInput queryInput = QueryInput.newBuilder().setAudioConfig(inputAudioConfig).build();

  // Read the bytes from the audio file
  byte[] inputAudio = Files.readAllBytes(Paths.get("/home/rmg/Audio/book_a_room.wav"));
  byte[] encodedAudio = Base64.encodeBase64(inputAudio);

  // Build the DetectIntentRequest
  DetectIntentRequest request = DetectIntentRequest.newBuilder()
      .setSession("projects/" + projectId + "/agent/sessions/" + sessionId)
      .setQueryInput(queryInput)
      .setInputAudio(ByteString.copyFrom(encodedAudio))
      .build();

  // Performs the detect intent request
  DetectIntentResponse response = sessionsClient.detectIntent(request);

  // Display the query result
  QueryResult queryResult = response.getQueryResult();
  System.out.println("====================");
  System.out.format("Query Text: '%s'\n", queryResult.getQueryText());
  System.out.format("Detected Intent: %s (confidence: %f)\n",
      queryResult.getIntent().getDisplayName(), queryResult.getIntentDetectionConfidence());
  System.out.format("Fulfillment Text: '%s'\n", queryResult.getFulfillmentText());
  return response;
}
I have tried several formats, WAV (PCM 16-bit, several sample rates) and FLAC, and also tried converting the bytes to base64 in the two different ways described here (by code or via the console):
https://dialogflow.com/docs/reference/text-to-speech
I have even tested with the .wav provided in that example, creating a new intent in my agent called "book a room" with that training phrase. It works with both text and audio from the Dialogflow console, but from my code it only works with text, not audio... and I'm sending the same wav they provide! (code above)
I always receive the same response (QueryResult), no matter which audio I send.
I need a clue or something, I'm totally stuck here. No logs, no errors in the response... it just does not work.
Thanks

I wrote to Dialogflow support and they replied with a working piece of code. It is basically the same as posted above; the only difference is the Base64 encoding, which is not necessary.
So I removed:
byte[] encodedAudio = Base64.encodeBase64(inputAudio);
and passed inputAudio directly to ByteString.copyFrom().
Now it is working as expected...

Related

Azure Text to Speech (Cognitive Services) in web app - how to stop it from outputting audio?

I'm using Azure Cognitive Services for Text to Speech in a web app.
I return the bytes to the browser and it works great, however on the server (or local machine) the speechSynthesizer.SpeakTextAsync(inp) line outputs the audio to the speaker.
Is there a way to turn this off, since this runs on a web server? (Even if I ignore it, there's the delay while it outputs the audio before sending back the data.)
Here's my code ...
var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";
speechConfig.OutputFormat = OutputFormat.Detailed;
using (var speechSynthesizer = new SpeechSynthesizer(speechConfig))
{
    // todo - how to disable it saying it here?
    var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(inp);
    return Convert.ToBase64String(speechSynthesisResult.AudioData);
}
What you can do is pass an AudioConfig to the SpeechSynthesizer.
In this AudioConfig object you can specify the path of a .wav file that already exists on the server.
Whenever you run SpeakTextAsync, the audio data is redirected to that .wav file instead of the speaker.
You can then read this audio file and apply your own logic later.
Just add the following code before creating the SpeechSynthesizer object:
var audioconfig = AudioConfig.FromWavFileOutput(filepath);
where filepath is the location of the .wav file as a string.
Complete code:
string filepath = "<file path>";
var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
var audioconfig = AudioConfig.FromWavFileOutput(filepath);
speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";
speechConfig.OutputFormat = OutputFormat.Detailed;
using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioconfig))
{
    // The synthesized audio is written to the .wav file instead of being played on the speaker.
    var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(inp);
    return Convert.ToBase64String(speechSynthesisResult.AudioData);
}
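As a side note (not part of the original answer): the SDK also accepts a null AudioConfig, as the neural-voice example further down this page does, which should avoid routing audio to any output device while still returning the synthesized bytes in AudioData. A minimal sketch of that variant, reusing speechKey, speechRegion and inp from the question:
var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";
// Passing null instead of an AudioConfig should skip server-side playback entirely;
// the audio remains available in speechSynthesisResult.AudioData.
using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, null))
{
    var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(inp);
    return Convert.ToBase64String(speechSynthesisResult.AudioData);
}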

How can I handle embedded text tracks on Chromecast

I'm trying to play a live stream (an .m3u8 file) that uses embedded text tracks. (I mean the text track is muxed with the video segments in the same .ts files.) It works well on a videoElement, since I can access the text track with videoElement.textTracks. However, when I cast it to Chromecast, the text tracks do not show up.
Here is code snippet:
Sender:
const englishSubtitle = new chrome.cast.media.Track('1', // track ID
    window.chrome.cast.media.TrackType.TEXT);
englishSubtitle.subtype = chrome.cast.media.TextTrackType.CAPTIONS;
englishSubtitle.name = 'English';
englishSubtitle.language = 'en-US';
englishSubtitle.customData = null;
englishSubtitle.trackContentId = '1';
englishSubtitle.trackContentType = 'text/vtt';
mediaInfo.tracks = [englishSubtitle];
const loadRequest = new chrome.cast.media.LoadRequest(mediaInfo);
Receiver:
const textTracksManager = this.playerManager.getTextTracksManager();
const tracks = textTracksManager.getTracks();
textTracksManager.setActiveByIds([tracks[0].trackId]);
BTW: this configuration works well when casting VOD content, but for VOD we use out-of-band text tracks. (I mean we use a separate .vtt file URL.)

Azure TTS neural voice audio file is created abnormally with a size of 1 byte

Azure TTS standard voice audio files are generated normally. However, for neural voice, the audio file is generated abnormally with the size of 1 byte. The code is below.
C# code
public static async Task SynthesizeAudioAsync()
{
    var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
    using var synthesizer = new SpeechSynthesizer(config, null);
    var ssml = File.ReadAllText("C:/ssml.xml");
    var result = await synthesizer.SpeakSsmlAsync(ssml);
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
ssml.xml - The file below, set to standard voice, works fine.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-GB-George-Apollo">
When you're on the motorway, it's a good idea to use a sat-nav.
</voice>
</speak>
ssml.xml - However, the following file set for neural voice does not work, and an empty sound source file is created.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AriaNeural">
When you're on the motorway, it's a good idea to use a sat-nav.
</voice>
</speak>
Looking at the behavior you have described, the Speech service has returned no audio bytes due to some issue.
I have checked the SSML file at my end and it works completely fine, i.e. there is no issue with the SSML itself.
As a next step, I would recommend adding error-handling code to get a better picture of the error and act accordingly:
var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
using var synthesizer = new SpeechSynthesizer(config, null);
var ssml = File.ReadAllText("C:/ssml.xml");
var result = await synthesizer.SpeakSsmlAsync(ssml);
if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine("No error");
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
    }
}
The above modification will print a friendly error message in the console app.
Note: if you are not using a console app, you will have to adapt the code.
Sample output: this is just an example; the error you see may be different.

Cannot get back the DICOM metadata after decoding from a base64 string

I am sending DICOM images to my API by encoding them as base64 on the frontend, which is an Angular CLI app. I also have a REST API that receives those encoded DICOM images and decodes them back before doing some processing on them. But after decoding the DICOM image into a memory stream, the metadata of the DICOM image is lost. A better solution would be appreciated. Please find my code below.
//Angular code
var file = event.dataTransfer ? event.dataTransfer.files[i] : event.target.files[0];
//var pattern = /.dcm/;
var reader = new FileReader();
reader.onload = this._handleReaderLoaded.bind(this);
reader.readAsDataURL(file);
//Web API Code
[HttpPost("UploadFile/{Id}")]
public async Task<IActionResult> UploadFile(int Id, [FromBody] DICOMFiles dicomfiles)
{
    // encodedImage is the base64 data URL string, presumably taken from the posted DICOMFiles payload
    String base64Encoded = encodedImage;
    string output = encodedImage.Substring(encodedImage.IndexOf(',') + 1);
    byte[] data = Convert.FromBase64String(output);
    MemoryStream stream = new MemoryStream(data);
    client.UploadFile(stream, "Projects/test_images/Test.dcm");
}
At last, I found a solution for this. The problem is not the decoding from base64. The actual problem is with the client.UploadFile() method call.
Before calling client.UploadFile(), we need to make sure that the memory stream object is pointing to position 0. This allows client.UploadFile() to create and write all of the file's content from the start of the byte[] array. We can do this as shown below.
stream.Position = 0;
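For context, here is a minimal sketch of the corrected upload path (same names as in the question, with the client helper left unchanged):
byte[] data = Convert.FromBase64String(output);
MemoryStream stream = new MemoryStream(data);
// Rewind to the start so client.UploadFile writes the full DICOM content, as described above.
stream.Position = 0;
client.UploadFile(stream, "Projects/test_images/Test.dcm");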

Azure Service Bus GetBody<Stream> encoding cannot be determined

I'm trying to get a BrokeredMessage from Azure Service Bus in a .NET client and choose how to deal with the message based on the type of message coming in, but ContentType and other message properties are not set.
My test message sending looks like this:
var client = QueueClient.CreateFromConnectionString(connectionString, queueName);
var message = new BrokeredMessage("test");
client.Send(message);
My code to receive is using GetBody so that I can inspect the serialized data and decide how to deal with it:
var stream = message.GetBody<Stream>();
string s = null;
using (StreamReader sr = new StreamReader(stream))
{
    s = sr.ReadToEnd();
}
The problem is that "s" above ends up with what looks like it should be XML created from a DataContractSerializer, but it is strangely encoded. I've tried many encodings on the receiving side and none seem to get me valid XML. Example result:
#string3http://schemas.microsoft.com/2003/10/Serialization/�test
I see the serialization namespace and what looks like it should start with <string, but as you can see I'm getting control characters. Does anyone know how I can get the serialized data here as valid XML so I can handle it dynamically?
TIA for any help.
To be really clear I want to test the body so I can do something like:
if (BodyIsString(s)) { do something }
if (BodyIsPerson(s)) { do something else }
If I could getbody twice this would be really easy.
As Sean Feldman mentioned, when the sent message is a string type we could use
var body = message.GetBody<string>();
to get the message body. After decompiling WindowsAzure.ServiceBus.dll I get this code:
public T GetBody<T>()
{
    if (typeof (T) == typeof (Stream))
    {
        this.SetGetBodyCalled();
        return (T) this.BodyStream;
    }
    if (!this.bodyObjectDecoded || this.bodyObject == null)
        return this.GetBody<T>((XmlObjectSerializer) new DataContractBinarySerializer(typeof (T)));
    this.SetGetBodyCalled();
    return (T) this.bodyObject;
}
I find that if the sent message is not a Stream type, it is serialized with DataContractBinarySerializer, so we could also get the message body in the following way:
var stream = message.GetBody<Stream>();
var messageBody = new DataContractSerializer(typeof(string)).ReadObject(XmlDictionaryReader.CreateBinaryReader(stream, XmlDictionaryReaderQuotas.Max));
From the decompiled code we also know that if we send a stream message, we can get the message body in the way you mentioned.
Send stream message code:
var client = QueueClient.CreateFromConnectionString(connectionString, queueName);
var byteArray = Encoding.UTF8.GetBytes("test stream");
MemoryStream stream = new MemoryStream(byteArray);
client.Send(new BrokeredMessage(stream));
Then receiving the message as you mentioned should work:
var stream = message.GetBody<Stream>();
string s = null;
using (StreamReader sr = new StreamReader(stream))
{
    s = sr.ReadToEnd();
}
Edit: According to the updated question:
"If I could getbody twice this would be really easy."
We could clone the BrokeredMessage:
var newMessage = receiveMessage.Clone();
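For example, a small sketch (variable names assumed) of reading the body twice by cloning first:
// Clone before consuming the body, since GetBody can only be called once per message.
var newMessage = receiveMessage.Clone();
var rawStream = receiveMessage.GetBody<Stream>();   // inspect the raw serialized bytes
var asString  = newMessage.GetBody<string>();       // or deserialize to the expected type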
Edit2:
We can also use the message properties to know the body type, if we set them during sending. Take Label for example:
var message = new BrokeredMessage(object);
message.Label = "Type of message body";
client.Send(message);
When we receive the message we can read its Label value and then select the corresponding way to get the body, as in the sketch below.
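For illustration (not part of the original answer), here is a receive-side sketch of that dispatch; the label values and the Person data contract are assumed purely for the example:
var receivedMessage = client.Receive();   // client is a QueueClient as above
switch (receivedMessage.Label)
{
    case "String":
        var text = receivedMessage.GetBody<string>();
        // handle the plain string payload
        break;
    case "Person":
        var person = receivedMessage.GetBody<Person>();   // Person is a hypothetical [DataContract] type
        // handle the Person payload
        break;
    default:
        // unknown or missing label
        break;
}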
You passed your payload as a string:
var message = new BrokeredMessage("test");
therefore it was serialized as a string. Upon receiving it you should get the body as a string as well, in the following manner:
var body = message.GetBody<string>();
You would use Stream if you'd actually construct your brokered message using a stream.
