Speech to text in Electron - text

Is there a way I can record audio in electron?
I would also like to find out if it is possible to use Google speech API to convert speech to text in electron

Is there a way I can record audio in electron?
Of course. Electron is based on Chromium and Node.js so you have access to:
any APIs you can use in the browser (like the MediaStream Recording API)
all Node.js features and libraries that work with Node.js (like node-audiorecorder)
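For the browser-side option, a minimal sketch of recording a short clip with the MediaStream Recording API in a renderer process might look like this. The name `recordClip` and its `env` parameter are made up for illustration; `env` defaults to the renderer's globals and exists only so the function can be tried with stand-in objects outside a browser.

```javascript
// Sketch: record `durationMs` of microphone audio and resolve with a Blob.
// `recordClip` and `env` are illustrative names, not part of Electron.
async function recordClip(durationMs, env = globalThis) {
  // Standard browser call; resolves with a MediaStream for the microphone.
  const stream = await env.navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new env.MediaRecorder(stream);
  const chunks = [];

  // Collect encoded audio chunks as the recorder produces them.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  // Set up the "stopped" promise before starting, so the event is not missed.
  const stopped = new Promise((resolve) => {
    recorder.onstop = resolve;
  });

  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  // Release the microphone once recording is finished.
  stream.getTracks().forEach((track) => track.stop());

  // The container/codec is chosen by Chromium and reported in recorder.mimeType.
  return new env.Blob(chunks, { type: recorder.mimeType });
}

// Usage in a renderer process:
//   recordClip(3000).then((blob) => console.log(blob.type, blob.size));
```

The resulting Blob can be turned into an ArrayBuffer with `blob.arrayBuffer()` if it needs to be handed to Node.js code.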
I would also like to find out if it is possible to use Google speech API to convert speech to text in electron
Yes. There's an official Node.js package for the Google Cloud Speech API. Again, an Electron-specific solution is not needed in this case.
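As a rough sketch of what that looks like from Node.js (for example, Electron's main process): the helper names `buildRecognizeRequest` and `transcribe` below are invented for illustration, the file name `clip.wav` is hypothetical, and the defaults assume 16 kHz, 16-bit PCM audio in US English. The client is passed in as a parameter so the sketch itself has no dependencies.

```javascript
// Build the request object for a synchronous recognize call.
// The defaults are assumptions; they must match the audio you actually send.
function buildRecognizeRequest(audioBuffer, options = {}) {
  return {
    config: {
      encoding: options.encoding || 'LINEAR16',
      sampleRateHertz: options.sampleRateHertz || 16000,
      languageCode: options.languageCode || 'en-US',
    },
    // Inline audio is sent as a base64-encoded string.
    audio: { content: audioBuffer.toString('base64') },
  };
}

// `client` is expected to be a SpeechClient from @google-cloud/speech.
async function transcribe(client, audioBuffer, options) {
  const [response] = await client.recognize(
    buildRecognizeRequest(audioBuffer, options)
  );
  // Each result covers one stretch of audio; take its top alternative.
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
}

// Usage (needs `npm install @google-cloud/speech` and Google Cloud credentials):
//   const speech = require('@google-cloud/speech');
//   const fs = require('fs');
//   const client = new speech.SpeechClient();
//   transcribe(client, fs.readFileSync('clip.wav')).then(console.log);
```

Audio recorded in the renderer with the MediaStream Recording API is usually not raw PCM, so the `encoding` option would need to be set to match it, or the audio converted first.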

Related

Is it possible to use Google's WaveNet Text-To-Speech model for the Actions-On-Google integration of a Dialogflow agent?

Google Cloud's Text-To-Speech API has a WaveNet model whose output, in my opinion, sounds much better than the standard speech. This model can be used in Dialogflow agents (Settings > Speech > Text To Speech), which results in the generated speech being included in the DetectIntentResponse. However, I can find no way to use this speech with the Actions-On-Google integration, i.e. in an actual Google Assistant app. Have I overlooked this, or is it really not possible? If it is not possible, does anyone know when they plan to enable it?
In the Actions console, going to the Invocation page lets you select a TTS voice.
All of the voices can be demoed on the Languages & Locales page of the docs, and the vast majority of them use WaveNet voices.

Dialogflow SDK or Dialogflow REST API, which is faster in terms of response time?

I have used Dialogflow for developing the app for Google Assistant. I have created intents and entities in the Dialogflow web GUI and I'm using a webhook response for further conversation.
Now I want to build a chatbot that is part of an existing Android or iOS app and use the code I already wrote for Dialogflow as part of this. What do I need to be aware of when I do so? It looks like I can use the SDK for that platform or make calls to the Dialogflow REST API. Which is faster or are there any tradeoffs? Can I use the Dialogflow NLP without going over the network?
Note: Dialogflow API V1 is deprecated and will be shut down on October 23rd, 2019.
That means that the official JavaScript, native Android, native iOS and Cordova clients will stop working, since they all use V1. There's no word on whether or when these clients will be upgraded to V2.
So the best bet right now is to use the REST APIs.
There are a few things to be aware of when moving from fulfillment that was built for Actions on Google to using this to also provide responses for other platforms. Actions on Google expects the responses to be formatted slightly differently, and if you're using AoG specific characteristics (such as a SimpleResponse object or a Card object), then it might not appear for other Dialogflow integrations. So you'll need to go over your webhook code to make sure what you send back works across platforms. Your logic and the Dialogflow UI builder should pretty much remain the same - it is just your backend that might need some work.
To make the call, as you say, you can either do the REST call yourself or use the SDK built by Dialogflow. The SDK will be slightly faster, since it uses gRPC with Protocol Buffers rather than REST with JSON, but the difference is likely to be small in most cases. If you're planning to stream audio, you will likely need either the SDK or your own gRPC/Protocol Buffers implementation, because REST doesn't handle streaming as well. If you're just sending text and are more comfortable with REST APIs, then REST is a perfectly reasonable approach.
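To give an idea of how small the REST side is for plain text, here is a hedged sketch of a V2 `detectIntent` call. It is written in JavaScript purely for illustration; the same HTTP request can be made from Android or iOS. The helper names, the project ID and the session ID are placeholders, and obtaining the OAuth access token is assumed to happen elsewhere (typically on your own backend rather than inside the mobile app).

```javascript
// Build the URL and fetch options for a Dialogflow V2 text detectIntent call.
// All argument values are supplied by the caller; nothing here is a real ID.
function buildDetectIntentRequest(projectId, sessionId, text, accessToken, languageCode = 'en-US') {
  const session =
    `projects/${encodeURIComponent(projectId)}` +
    `/agent/sessions/${encodeURIComponent(sessionId)}`;
  return {
    url: `https://dialogflow.googleapis.com/v2/${session}:detectIntent`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${accessToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ queryInput: { text: { text, languageCode } } }),
    },
  };
}

// `fetchImpl` is any fetch-compatible function (for example the global fetch).
async function detectIntent(fetchImpl, projectId, sessionId, text, accessToken) {
  const { url, init } = buildDetectIntentRequest(projectId, sessionId, text, accessToken);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`detectIntent failed with HTTP ${res.status}`);
  const data = await res.json();
  // The text reply produced by the agent or its webhook fulfillment.
  return data.queryResult.fulfillmentText;
}

// Usage (placeholder values):
//   detectIntent(fetch, 'my-project', 'session-1', 'hello', token).then(console.log);
```

Reusing the same session ID across calls is what lets Dialogflow keep conversation context between turns.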
There is no "local Dialogflow" library. All calls have to go over the network. There are other libraries that do Speech-to-Text and NLP locally if that is what you need.

How can I use Cortana voice commands in Electron?

Is it possible to use Cortana voice commands in Electron? I'm talking about the actual UWP API, not Cortana skills. I don't need a bot; I want to be able to use my voice commands offline, and the type of actions that my app provides doesn't need any third-party API (something like "hey Cortana, ask [MY APP] how many movies do I have?").
I have seen the Cortana voice command sample with WinJS, and it is possible to use WinJS in Electron. But how am I actually going to use a VCD file in Electron with WinJS? The sample code is for Visual Studio and WinJS only,
so I'm hoping for some clarification or a guideline on how to use VCD in Electron.
Electron enables developers to build desktop apps using JavaScript and Node modules. To find out whether a given UWP API is callable from a classic desktop app, you can check this document: https://msdn.microsoft.com/en-us/library/windows/desktop/mt695951(v=vs.85).aspx
Once you know whether the specific UWP API is callable from a desktop app, the next step is how to call this API in Electron. There’s an open-source project named NodeRT.
NodeRT automatically exposes Microsoft’s UWP/WinRT APIs to the Node.js environment by generating Node modules. This enables Node.js developers to write code that consumes native Windows capabilities. The generated modules' APIs are (almost) the same as the UWP/WinRT APIs listed in MSDN.
So, you could use it to call the specific UWP APIs in Electron.

Enabling the microphone in a browser using node.js and capturing the information spoken

I have been struggling for a while and have been looking through many examples of how to enable the mic in a browser with Node.js. I have seen several JavaScript examples, but I can't get the spoken content out of them and store it in variables. How can I enable the mic using Node.js? Will I need a specific npm package? I am currently working with the IBM Watson Speech to Text API. Any help is appreciated! Thanks in advance!
You will need to enable the mic in the browser using a client side library.
Use the Speech-to-Text SDK here:
https://github.com/watson-developer-cloud/speech-javascript-sdk
And a working example here:
https://watson-speech.mybluemix.net/microphone-streaming.html
Please be aware that streaming microphone will not work on any version of Safari. You will need to use Firefox, Chrome or IE to stream the microphone into Watson Speech to Text.
There's a YouTube tutorial on building a simple Bluemix App using Speech to Text here (see Chapter 3): YouTube Tutorial
The supporting code is in a public git repo here: Zero To Cognitive Repo

Using Bing Speech Recognition API with node.js Bot Framework on Facebook Messenger

I would like to use the Bing Speech Recognition API to convert speech to text when using the audio recording (microphone) button in Facebook Messenger to chat with my node.js chatbot. I have managed to convert speech to text using the instructions from BotBuilder-Samples. However, according to the Speech API's documentation, only the audio/wav codec is supported. I have checked the content type of the audio recording attachments in Messenger and they are encoded in audio/aac.
I would like to ask whether there is a nice way to convert audio/aac to audio/wav, or if there is some other way to make Messenger work with the Bing Speech Recognition API. Ideally, there is already existing node.js code that I could adapt for my existing chatbot.
Thanks and best regards!

Resources