How to use the keyword spotting feature of the IBM Watson Speech to Text API?

I am using the IBM Watson Speech to Text API to convert audio files into text. Every other feature is working fine for me, but I am unable to use the keyword spotting feature: the output does not include any information about spotted keywords.
Here is my code:
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("*********", "********");
//SpeechModel model =service.getModel("en-US_NarrowbandModel");
service.setEndPoint("https://stream.watsonplatform.net/speech-to-text/api");
String[] keys= {"abuse","bullying","parents","physical","assaulting"};
RecognizeOptions options = new RecognizeOptions().contentType("audio/wav").model("en-US_NarrowbandModel").continuous(true).inactivityTimeout(500).keywords(keys).keywordsThreshold(0.7);
File audio = new File("C:\\Users\\AudioFiles\\me.wav");
SpeechResults transcript = service.recognize(audio, options);
//Speech t1 = service.recognize(audio, options);
System.out.println(transcript);
Is there a special function to get the spotted keywords in the output along with the transcript?

This was fixed in the Java SDK v3.2.0. Make sure you download the latest version (4.2.1) of the jar (java-sdk-4.2.1-jar-with-dependencies.jar) or update your Gradle/Maven configuration to pull the latest version.
The code below is based on the code in your question.
SpeechToText service = new SpeechToText();
service.setUsernameAndPassword("USERNAME", "PASSWORD");
File audio = new File("C:\\Users\\AudioFiles\\me.wav");
RecognizeOptions options = new RecognizeOptions.Builder()
    .contentType("audio/wav")
    .inactivityTimeout(500)
    .keywords(new String[] {"abuse", "bullying", "parents", "physical", "assaulting"})
    .keywordsThreshold(0.5)
    .build();
SpeechResults transcript = service.recognize(audio, options).execute();
System.out.println(transcript);
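There is no separate call for the spotted keywords; they come back in the same response, per result, under keywords_result. Below is a rough sketch of reading them from the transcript above, assuming the SpeechResults/Transcript/KeywordsResult model classes of the 3.x/4.x Java SDK (exact getter names may differ slightly between SDK versions):
for (Transcript result : transcript.getResults()) {
    // keywords_result maps each spotted keyword to its list of matches
    Map<String, List<KeywordsResult>> spotted = result.getKeywordsResult();
    if (spotted == null) {
        continue; // no keywords spotted in this result
    }
    for (Map.Entry<String, List<KeywordsResult>> entry : spotted.entrySet()) {
        for (KeywordsResult match : entry.getValue()) {
            System.out.println(entry.getKey() + " spotted at "
                + match.getStartTime() + "s-" + match.getEndTime() + "s"
                + " (confidence " + match.getConfidence() + ")");
        }
    }
}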

Related

Azure TTS neural voice audio file is created abnormally in 1 byte size

Azure TTS standard voice audio files are generated normally. However, for a neural voice, the audio file is generated abnormally with a size of 1 byte. The code is below.
C# code
public static async Task SynthesizeAudioAsync()
{
    var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
    using var synthesizer = new SpeechSynthesizer(config, null);
    var ssml = File.ReadAllText("C:/ssml.xml");
    var result = await synthesizer.SpeakSsmlAsync(ssml);
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
ssml.xml - The file below, set to standard voice, works fine.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-GB-George-Apollo">
When you're on the motorway, it's a good idea to use a sat-nav.
</voice>
</speak>
ssml.xml - However, the following file, set to a neural voice, does not work, and an empty audio file is created.
<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-AriaNeural">
When you're on the motorway, it's a good idea to use a sat-nav.
</voice>
</speak>
Looking at the behavior you have described, the Speech service has returned no audio bytes due to some issue.
I have checked the SSML file at my end and it works completely fine, i.e. there are no issues with the SSML itself.
As a next step, I would recommend adding error-handling code to get a better picture of the error and act accordingly:
var config = SpeechConfig.FromSubscription("xxxxxxxxxKey", "xxxxxxxRegion");
using var synthesizer = new SpeechSynthesizer(config, null);
var ssml = File.ReadAllText("C:/ssml.xml");
var result = await synthesizer.SpeakSsmlAsync(ssml);

if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine("No error");
    using var stream = AudioDataStream.FromResult(result);
    await stream.SaveToWaveFileAsync("C:/file.wav");
}
else if (result.Reason == ResultReason.Canceled)
{
    var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

    if (cancellation.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
        Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
    }
}
The above modification will print a friendly error message in the console app.
Note: if you are not using a console app, you will have to modify the code accordingly.
Sample output: not reproduced here; the error you see may be different.

Google Vision text detection for Node.js using base64 encoding

Just started exploring Google Cloud Vision APIs. From their guide:
const client = new vision.ImageAnnotatorClient();
const fileName = 'Local image file, e.g. /path/to/image.png';
const [result] = await client.textDetection(fileName);
However, I want to use a base64 representation of the binary image data, since they claim it's possible to use.
I found this reference on SO:
Google Vision API Text Detection with Node.js set Language hint
Instead of imageUri I used "content": string, as mentioned there. But the SO sample uses the const [result] = await client.batchAnnotateImages(request); method. I tried using the same technique with the const [result] = await client.textDetection(...) method and it gave me an error.
So my question is: is it possible to use a base64-encoded string to represent the image in order to perform TEXT_DETECTION? And if so, how?
Any kind of help is highly appreciated.
You can use the quickstart guide and, from there, replace the lines after the creation of the client with the following:
// Value of the image in base64
const img_base64 = '/9j/...';
const request = {
  image: {
    content: Buffer.from(img_base64, 'base64')
  }
};
const [result] = await client.textDetection(request);
console.log(result.textAnnotations);
console.log(result.fullTextAnnotation);
You can take a look at the function here and read the description of the request parameter, in particular the following part:
A dictionary-like object representing the image. This should have a
single key (source, content).
If the key is content, the value should be a Buffer.
This leads to the structure used in the sample code above, as opposed to using imageUri or filename, which have to be nested inside another object whose key is source, as shown in that sample.
The content field needs to be a Buffer.
You are using the Node.js client library. The library uses the gRPC API internally, and the gRPC API expects a bytes type in the content field.
However, the JSON (REST) API expects a base64 string.
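For contrast, here is a minimal sketch of the JSON (REST) API path, where content is passed as the plain base64 string rather than a Buffer. This assumes key-based auth with a placeholder API_KEY and Node 18+ for the global fetch; it is not part of the Node.js client library.
// image.content is the raw base64 string when calling the REST endpoint
const API_KEY = 'YOUR_API_KEY'; // placeholder: an API key with Vision access
const body = {
  requests: [{
    image: { content: img_base64 },        // base64 string, no Buffer needed
    features: [{ type: 'TEXT_DETECTION' }]
  }]
};
const resp = await fetch(`https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(body)
});
const json = await resp.json();
console.log(json.responses[0].textAnnotations);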
References
https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1#image
https://googleapis.dev/nodejs/vision/latest/v1.ImageAnnotatorClient.html#textDetection

Saving conversation transcripts - bot framework v4.4 node.js [duplicate]

I have to log the user-bot conversation in CosmosDB for audit/history purposes. In V3 using .NET, I was using the table logger module, as in the code below.
builder.RegisterModule(new TableLoggerModule(account, chatHistoryTableName));
Now we are upgrading/rewriting the bot to V4 in Node.js. Is there a similar approach available for V4 in Node.js to save the entire conversation?
This example hasn't been merged yet: https://github.com/Microsoft/BotBuilder-Samples/pull/1266
It uses AzureBlobTranscriptStore and TranscriptLoggerMiddleware.
const { AzureBlobTranscriptStore } = require('botbuilder-azure');
const { TranscriptLoggerMiddleware } = require('botbuilder-core');
// Get blob service configuration as defined in .bot file
const blobStorageConfig = botConfig.findServiceByNameOrId(BLOB_CONFIGURATION);
// The transcript store has methods for saving and retrieving bot conversation transcripts.
let transcriptStore = new AzureBlobTranscriptStore({
  storageAccountOrConnectionString: blobStorageConfig.connectionString,
  containerName: blobStorageConfig.container
});
// Create the middleware layer responsible for logging incoming and outgoing activities
// into the transcript store.
var transcriptMiddleware = new TranscriptLoggerMiddleware(transcriptStore);
adapter.use(transcriptMiddleware);
This should provide a good start.
https://learn.microsoft.com/en-us/azure/bot-service/bot-builder-howto-v4-state?view=azure-bot-service-4.0&tabs=javascript

Google TTS in Crossrider

I developed a Google Chrome extension that contains Google TTS.
I rewrote it with Crossrider to make it work on different platforms (it works great until it comes to the TTS part).
Here is the code:
function PlayGoogleTTS(EngWord){
voices = speechSynthesis.getVoices();
msg = new SpeechSynthesisUtterance();
msg.volume = 1; // 0 to 1
msg.rate = 10; // 0.1 to 10
msg.pitch = 2; //0 to 2
msg.text = EngWord;
msg.lang = 'en-US';
msg.voice = voices[1]; // Note: some voices don't support altering params
speechSynthesis.speak(msg);
}
// Fetch the list of voices and populate the voice options.
function loadVoices() {
// Fetch the available voices.
var voices = speechSynthesis.getVoices();
}
// Chrome loads voices asynchronously.
window.speechSynthesis.onvoiceschanged = function(e) {
loadVoices();
};
So how can I convert it to make it work in Crossrider?
It's not clear from your question which speechSynthesis library/API you are using. However, assuming it is based on Chrome's TTS API, the required "tts" permission is not available in Crossrider.
[Disclosure: I am a Crossrider employee]
This is more of a workaround than an answer:
I just used another TTS that is able to generate OGG waves in Firefox.

Windows.Media.Capture.CameraCaptureUI not working well on all platforms

In my Windows Store app I used the Windows.Media.Capture.CameraCaptureUI class to capture images and videos. The recorded videos work fine in the Windows environment, but the same recorded file does not play well on an iPad.
I will include the code here:
CameraCaptureUI dialog = new CameraCaptureUI();
dialog.PhotoSettings.Format = CameraCaptureUIPhotoFormat.Jpeg;
dialog.VideoSettings.Format = CameraCaptureUIVideoFormat.Mp4;
dialog.VideoSettings.MaxResolution = CameraCaptureUIMaxVideoResolution.LowDefinition;
StorageFile capturedMedia = null;
if( _showVideo )
capturedMedia = await dialog.CaptureFileAsync(CameraCaptureUIMode.PhotoOrVideo);
else
capturedMedia = await dialog.CaptureFileAsync(CameraCaptureUIMode.Photo);
Please help.
This looks more like a bug report; sometimes there are bugs.
This is something you should report at http://connect.microsoft.com rather than here.
Your contribution to Connect helps other developers.
