How to record audio in a Chrome Extension?

What's the simplest way to set up a Chrome extension to record audio from the microphone?
I see there is a working experimental speech input API, but why don't you have access to the recorded file? Hooking into that should be simple enough, since it's a step earlier in the process, no? There is also a text-to-speech API, so you could effectively record into text and then have the computer speak it back, but unless you want a standard voice, how lame, redundant and error-prone is that?
Then there seem to be Flash solutions like this, but how can I use one in a Chrome extension without having to set up anything server-side? (I don't actually need to send anything to a server; it's all local and client-side.)
Is NPAPI a possibility? Is there a ready-made plugin?
I don't know of other possible alternatives (HTML5 isn't ready yet, it seems), but I welcome anything functional and simple to implement and hook into a Chrome extension.

Finally a native solution appeared: Introducing getUserMedia

You cannot use the speech input API, since it records only the microphone. Okay, you could grab the speakers that way, but it's clearly not the solution.
Using an NPAPI plugin is a solution. You can identify the sound made by a particular tab and then record that source, but it is no longer web development.
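For the microphone case the question asks about, the native getUserMedia route mentioned above can be sketched roughly as follows. This is a minimal sketch, not a tested extension: `recordMicrophone` and `pickMimeType` are hypothetical helper names, the list of candidate MIME types is an assumption, and the code assumes it runs in a page where the user can grant microphone access. Browser globals are reached through `globalThis` so the snippet does not depend on DOM typings.

```typescript
// Candidate container/codec strings, in order of preference (an assumption; adjust as needed).
const CANDIDATE_TYPES = ["audio/webm;codecs=opus", "audio/webm", "audio/ogg;codecs=opus"];

// Hypothetical helper: return the first type the recorder reports as supported,
// or "" to let the browser choose its default format.
function pickMimeType(candidates: string[], isSupported: (t: string) => boolean): string {
  for (const t of candidates) {
    if (isSupported(t)) return t;
  }
  return "";
}

// Hypothetical helper: record the microphone for `durationMs` milliseconds and
// resolve with a Blob holding the recorded audio.
async function recordMicrophone(durationMs: number): Promise<unknown> {
  const g = globalThis as any; // browser globals are looked up at call time
  const stream = await g.navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType(CANDIDATE_TYPES, (t) => g.MediaRecorder.isTypeSupported(t));
  const recorder = new g.MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: unknown[] = [];
  recorder.ondataavailable = (e: any) => {
    if (e.data.size > 0) chunks.push(e.data);
  };

  return new Promise((resolve, reject) => {
    recorder.onerror = (e: any) => reject(e.error ?? e);
    recorder.onstop = () => {
      // Stop every track so the browser releases the microphone.
      stream.getTracks().forEach((track: any) => track.stop());
      resolve(new g.Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.start();
    g.setTimeout(() => recorder.stop(), durationMs);
  });
}
```

Everything stays client-side: the resulting Blob can be played back or saved locally without any server.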

Related

Using Google Cloud Speech on mobile never returns an isFinal

I am currently implementing a ChatBot using Google Cloud Speech.
I am using socket.io to record a microphone stream and then sending that through node to Google Cloud Speech.
Everything is working fine on my laptop and my Android phone (Nexus 5X, Chrome 68).
I record the audio and, having set single_utterance to true, get a result with "isFinal" as soon as I pause speaking.
But if I set the language code to 'da-DK', I never get an "isFinal" result on mobile (unless I end the stream myself). It works fine on my laptop, but not on mobile.
Has anyone experienced anything similar?
As bonus info:
If I set interimResults to true, I do get multiple results, but they are just never isFinal.
So just to be clear: everything is working perfectly apart from the one case: da-DK on mobile.
Since this behavior occurs only when using the da-DK language on a mobile device, it might be related to an internal service issue. I therefore think you should use the Issue Tracker to raise a Speech-to-Text API issue, so that the scenario can be verified with the Google Technical Support Team. That way, you will be able to share your code, audio files and internal project information if the troubleshooting process requires them.
Additionally, I suggest you take a look at this link, which contains some useful documentation and examples for using the Google Cloud Speech API in an Android environment that you may use as a reference for your project.
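When preparing such a report, it may help to pin down the exact settings and what the stream returned. The sketch below shows the request described in the question, using the camelCase field names of the Node.js client (`singleUtterance`, `interimResults`, `languageCode`). The encoding and sample rate are assumptions, since the question does not state them, and `summarize` is a hypothetical helper working on locally declared types that mirror only the response fields used here.

```typescript
// Minimal local types mirroring the parts of a streaming response used below.
interface SpeechAlternative { transcript: string; }
interface SpeechResult { isFinal: boolean; alternatives: SpeechAlternative[]; }
interface StreamingResponse { results: SpeechResult[]; }

// Request settings as described in the question.
// Assumption: encoding and sample rate are placeholders, not taken from the question.
const request = {
  config: {
    encoding: "LINEAR16",
    sampleRateHertz: 16000,
    languageCode: "da-DK",
  },
  singleUtterance: true,
  interimResults: true,
};

// Hypothetical helper: walk the responses received so far and report the most
// recent interim transcript and the final transcript, if one ever arrived.
function summarize(responses: StreamingResponse[]): { lastInterim: string | null; finalText: string | null } {
  let lastInterim: string | null = null;
  let finalText: string | null = null;
  for (const response of responses) {
    for (const result of response.results) {
      const text = result.alternatives[0]?.transcript ?? "";
      if (result.isFinal) {
        finalText = text;
      } else {
        lastInterim = text;
      }
    }
  }
  return { lastInterim, finalText };
}
```

If `finalText` stays `null` while `lastInterim` keeps changing, that matches the symptom described above and is a compact thing to include in the issue report.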

How to develop Spotify Desktop Applications, now Libspotify is discontinued

I have done my due diligence and have not found any other posts that answer this question, but as usual, if you know of a similar question, point me that way!
I noticed a long time ago that Libspotify has been discontinued:
(https://developer.spotify.com/technologies/libspotify/)
So, my question is - what should we do for developing Desktop applications?
They do state: "We hope to be able to provide you with a new library for other platforms." But, this has been going on since 2015!
I have seen many projects in GitHub still using Libspotify - so what should we do? An update was promised "in the upcoming months" but I've not seen anything yet.
What should we do for developing Desktop Applications?
We at Spotify don't currently provide playback as part of our platform offering outside of our iOS and Android SDKs, and I don't have any updates on that at the moment. As mentioned on the website, we hope to be able to provide playback SDKs for more platforms in the future. We don't support any new development on libspotify.
You can use the Spotify Web API to interact with Spotify in a variety of ways, including getting information about metadata, and accessing/modifying user libraries and playlists, which may be useful. You can also use the Applescript API to control playback on macOS, which may also help.
The Spotify Web API is pretty straightforward to use. Of course, it defines the protocol rather than implementing it, so it is OS independent.
I put together a few classes to help unwrap some of the JSON parameters simply. These were written in Swift for macOS.
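The classes themselves are not included in the answer. As an illustration of the kind of JSON unwrapping described, here is a minimal sketch in TypeScript rather than Swift. It is not the answerer's code: `Track` and `parseTrack` are hypothetical names, and the sketch assumes the track object fields `id`, `name`, `duration_ms`, `artists[].name` and `album.name` as returned by the Web API.

```typescript
// Flattened view of a track, keeping only a few fields.
interface Track {
  id: string;
  name: string;
  artists: string[];
  album: string;
  durationSeconds: number;
}

// Hypothetical helper: unwrap a Web API track object (as a JSON string) into a Track.
// Throws if the required string fields are missing.
function parseTrack(json: string): Track {
  const raw = JSON.parse(json);
  if (typeof raw?.id !== "string" || typeof raw?.name !== "string") {
    throw new Error("not a track object");
  }
  const artists = Array.isArray(raw.artists) ? raw.artists : [];
  return {
    id: raw.id,
    name: raw.name,
    artists: artists.map((a: { name?: unknown }) => String(a.name ?? "")),
    album: typeof raw.album?.name === "string" ? raw.album.name : "",
    // The API reports duration in milliseconds; convert to whole seconds.
    durationSeconds: Math.round((Number(raw.duration_ms) || 0) / 1000),
  };
}
```

Because the input is just the JSON body of an HTTP response, the same unwrapping works on any operating system.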

Detect audio from the user and convert it to text to command AI bots in Unity

I am making a game where I want to command the AI using words I speak.
For example, I say "go" and the AI bot moves a certain distance.
The problem is that I have been looking for an asset, and no provider will give me a guarantee that this is possible.
What are the difficulties in doing it?
I am a programmer, so if someone suggests a way to handle it, I can do it.
Should I keep a mic listener on all the time, read the audio, and then pass it to some external SDK that can convert my voice to text?
These are the asset providers I have contacted:
https://www.assetstore.unity3d.com/en/#!/content/73036
https://www.assetstore.unity3d.com/en/#!/content/45168
https://www.assetstore.unity3d.com/en/#!/content/47520
and a few more!
If someone just explains the steps I need to follow then I can try it for sure.
I am currently using this external api for pretty much the same thing: https://api.ai/
It comes with a Unity SDK that works quite well:
https://github.com/api-ai/api-ai-unity-sample#apiai-unity-plugin
You have to connect an audio source to the SDK and tell it to start listening. It will then convert your voice audio to text, and even detect pre-selected intents from your voice audio or text.
You can find all the steps for integrating the Unity plugin in the api.ai Unity SDK documentation on GitHub.
EDIT: It's free too btw :)
If you want to recognize speech offline without sending data to a server, try this plugin:
https://github.com/dimixar/unity3DPocketSphinx-android-lib
It uses the open source speech recognition engine CMUSphinx.
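Whichever recognizer produces the text, the last step is turning that text into a bot command. The sketch below illustrates that step only. It is written in TypeScript to show the logic (a Unity project would express the same idea in C#), and the command set, the default distance and the name `parseCommand` are all hypothetical.

```typescript
// Commands the bot understands in this sketch (hypothetical set).
type BotCommand =
  | { kind: "go"; distance: number }
  | { kind: "stop" }
  | { kind: "unknown"; heard: string };

// Assumption: distance used when the player just says "go"; units are whatever the game uses.
const DEFAULT_DISTANCE = 5;

// Hypothetical helper: map recognized text to a command by simple keyword matching.
function parseCommand(transcript: string): BotCommand {
  const words = transcript
    .toLowerCase()
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0);

  if (words.indexOf("stop") !== -1) {
    return { kind: "stop" };
  }

  const goIndex = words.indexOf("go");
  if (goIndex !== -1) {
    // Use the first positive number after "go" as the distance, if the
    // recognizer produced digits; otherwise fall back to the default.
    let distance = DEFAULT_DISTANCE;
    for (const word of words.slice(goIndex + 1)) {
      const n = Number(word);
      if (isFinite(n) && n > 0) {
        distance = n;
        break;
      }
    }
    return { kind: "go", distance };
  }

  return { kind: "unknown", heard: transcript };
}
```

Keeping this mapping separate from the recognizer means the online SDK and the offline plugin can be swapped without touching the game logic.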

Apps Script vs Chrome Extension: Writing an alternative spellchecker to Google Docs

Say I want to develop an alternative spellcheck module for Google Docs.
This means that I have to get corrections from my backend, and color the misspelled text's background, and do a small popup bubble when user hovers over it, where I'd display the correction. (please mind that spellcheck is not the actual goal of my project, but it does address my problems in a more simplified way)
What are my options? Any ideas how to do this?
A few possible solutions I came up with:
Chrome extension vs. Apps Script
Chrome extension
pros: the user has to grant permissions only once; a content script can freely traverse the DOM and append anything to it
cons: it is a "hacky" approach: if Google changes class names or the JS source, it would stop working; also, reverse engineering Google Docs's editor engine is impossible
Apps Script
pros: supported by Google: if it works, I don't need to be afraid of Docs updates
cons: it seems that I can't just fiddle with the DOM (because of the Caja compiler), and it has very limited support (if any) for custom highlighting or hover functionality
As I see it, neither of these is a perfect solution for this project. What do you think? Any suggestions are very welcome.
I know this is an old question, but I recently ran into the same problem and believe I have a solution. So for future Googlers, I will post my answer here.
My solution was to create a Chrome extension and work out how the Google Docs DOM is structured in order to interact with it.
You can find my code for working with the Google Docs DOM here.
In Apps Script you can't "fiddle" with the DOM, and you won't be able to implement hover functionality. Also, even crude highlighting would involve changing the current document itself (which would end up in the revision history, undo queue, etc.).
Therefore, your only alternative is the Chrome extension. But I agree with you on the cons. It is a super hard task that could break at any minute without notice.

Do you know of any NPAPI Chrome plugin that plays local media and communicates with a web app?

Need some inspiration for such a plugin. We aim to mix media, stored locally and in the cloud, in one online experience. Access via the File API sucks badly. We need something better.
Sure... Flash does that, Silverlight does that, Quicktime does that, Move Media Player does that... I'm sure there are others, but I'm not sure how it helps you, since none that I am aware of except VLC is open source, and the VLC open source plugin isn't the best written one I have seen (no offense to those involved, it's a hard task).
If you haven't already, I'd take a look at FireBreath; it simplifies the plugin creation process and lets you focus more on what you want to do with it.
