Use Web Speech API implementation in Google Chrome to circumvent rate limit of standard API? - node.js

To use Google's Speech API directly, you now need an API key. To get one, you must subscribe to the chromium-dev@chromium.org newsgroup and then follow a few steps, after which Google gives you a developer key that is "not for distribution." The key is rate-limited to 50 requests/day.
For example, node-google-speech-api outlines the need for having this key for a node application to access Google's Speech API directly (without the use of a browser):
https://github.com/psirenny/node-google-speech-api
There are also PHP libraries and Java libraries for accessing Google's Speech API, also requiring this key.
I would like to write a desktop application that uses Google's speech recognition technology, but the 50 requests/day limit is unacceptable for wide distribution, and even for a single desktop deployment of the software I have in mind. I expect up to 500 requests/day from an individual desktop user if the voice recognition is split into separate requests; most of it would probably be long-polling/continuous, so it might be only 2 or 3 requests/day, but lasting hours at a time. Multiply that by a few hundred users and I'd easily exceed 50 requests/day.
I was trying to think of a way to use Google's superior speech recognition technology on the desktop in my own app without this limit (the language doesn't matter, but node.js would likely be part of the mix, so a node.js solution is preferred), and that led me to consider the Web Speech API standard, which Google Chrome happens to implement.
As far as I know, Google Chrome's implementation of the Web Speech API has no hard requests/day limit, and I could happily write websites that use the Web Speech API all day long with few or no restrictions compared to using the Google Speech API directly. That got me thinking: what if I distributed Chrome (not Chromium), i.e. the bona fide Google Chrome browser, with an added "extension" that lets JavaScript in a custom HTML5 page talk to other applications on the client's system (e.g. a Node.js app running alongside this special installation of Chrome)? I would write the speech recognition part in JavaScript, Web Speech API style, and pipe the output into the other application that I design and install on clients' systems.
Would/could that work?
What are the pitfalls of this approach?
Do you have suggestions of another approach or would you perhaps recommend a commercially-licensed solution that is comparable to the ease of use and extreme natural language accuracy of Google's speech technology?

One possible approach to try is a Chrome App.
It runs in a sandboxed instance of Chrome and is implemented with HTML + JavaScript.
To the user it looks just like a desktop application.
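For the browser half of the setup in the question (a page using the Web Speech API whose output is piped to a separate Node.js app), the page would listen for the recognizer's result event; in Chrome the constructor is exposed as webkitSpeechRecognition. The sketch below, assuming continuous recognition, only shows the step of pulling newly finalized transcripts out of event.results, using minimal structural types so it can run outside a browser. The message format and the channel to the Node.js app (for example a fetch() POST to a localhost endpoint) are hypothetical; neither the question nor the answer specifies them.

```typescript
// Minimal structural stand-ins for the parts of the Web Speech API read here
// (SpeechRecognitionAlternative, SpeechRecognitionResult and
// SpeechRecognitionResultList). In a browser, pass event.results directly.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number;
}

interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}

interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}

// Return the top alternative of every final result from resultIndex onward.
// resultIndex is the event's index of the first result that changed, so
// results before it have not changed since the previous event.
function newFinalTranscripts(results: ResultListLike, resultIndex: number): string[] {
  const transcripts: string[] = [];
  for (let i = resultIndex; i < results.length; i++) {
    const result = results[i];
    if (result.isFinal && result.length > 0) {
      transcripts.push(result[0].transcript.trim());
    }
  }
  return transcripts;
}

// Hypothetical wire format for the local Node.js app; the field names are
// illustrative only.
function toBridgeMessage(transcripts: string[]): string {
  return JSON.stringify({ type: "transcripts", items: transcripts });
}
```

In the page, this would be called as newFinalTranscripts(event.results, event.resultIndex) inside the result handler, and the returned strings sent on to the local app.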

Related

The technology behind Google Translate and DeepL websites

I am working on a front-end solution for translating text on Google Cloud Platform. I want to know what technology is behind the front end of apps like DeepL.
Which cloud components do they use to efficiently translate the text as the user types new characters inside the input field?
Translator services like Google Translate, or the one you mentioned, are normally built on top of many different software components, layers and services (quite often also involving ML/NLU, etc.).
If you are a front-end developer looking for an easy way to translate UIs or some user input, have a look at the Google Translate API. Be aware of its free quotas and pricing.
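The Google Translate API mentioned above is Google Cloud Translation. As a minimal sketch, assuming the v2 (Basic) REST endpoint and an API key, a front end would POST the text and target language and read data.translations[0].translatedText from the reply. The function names here are illustrative; actually sending the request and throttling calls while the user types are left out. A key embedded in front-end code is visible to users, so proxying through your own backend is usually preferable.

```typescript
// Fields of the Cloud Translation v2 REST response that this sketch reads.
interface TranslateV2Response {
  data: {
    translations: { translatedText: string; detectedSourceLanguage?: string }[];
  };
}

const TRANSLATE_V2_URL = "https://translation.googleapis.com/language/translate/v2";

// Build the POST request for one piece of text. q, target and format are v2
// request fields; source is omitted so the service detects the language.
function buildTranslateRequest(text: string, target: string, apiKey: string): { url: string; body: string } {
  const url = `${TRANSLATE_V2_URL}?key=${encodeURIComponent(apiKey)}`;
  const body = JSON.stringify({ q: text, target, format: "text" });
  return { url, body };
}

// Pull the first translation out of a parsed response, if there is one.
function firstTranslation(response: TranslateV2Response): string | undefined {
  const first = response.data.translations[0];
  return first === undefined ? undefined : first.translatedText;
}
```

A caller would then send it with fetch(url, { method: "POST", headers: { "Content-Type": "application/json" }, body }) and pass the parsed JSON to firstTranslation.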

Detecting known words using the Web Speech API

I'm in the planning stages of a web app that is intended to help children learn vocabulary. We would like to make the app as interactive as possible. For example, we would show a picture of an apple and ask the child to identify the object. The child would then say "apple" and we would determine whether they are correct, etc.
The new Web Speech API seems like a promising tool for this project. However, looking through the documentation, I believe it will only produce transcripts from speech (i.e. it cannot match a spoken word to a known word and produce a confidence value – at least not out of the box).
Does anyone have experience with leveraging the Web Speech API in this way (or any other API for that matter)? I'm trying to stick to technologies that can run in the browser, if possible.
Try ispikit.com; it's much better suited to your needs, since it's specifically designed for education tasks and runs in the browser on the client side. The Web Speech API is not designed for detection, so you will not be able to use it for this.
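The Web Speech API does report per-alternative scores: setting maxAlternatives on a SpeechRecognition object asks for several SpeechRecognitionAlternative entries per result, each with a transcript and a confidence. A rough sketch of the check the question describes is to look for the expected word in those transcripts. The confidence belongs to the whole transcript, not to the target word, so this is transcript matching rather than true keyword detection, in line with the answer's point. The 0.5 threshold and the function names are assumptions.

```typescript
interface Alternative {
  transcript: string;
  confidence: number; // recognizer's confidence in the whole transcript, 0..1
}

// Lower-case, drop common punctuation and split into words.
function normalizeWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,!?;:"]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Best confidence among alternatives that contain the target as a whole word.
// minConfidence (hypothetical default 0.5) decides whether it counts as a match.
function matchKnownWord(
  alternatives: Alternative[],
  target: string,
  minConfidence = 0.5,
): { matched: boolean; confidence: number } {
  const goal = target.toLowerCase();
  let best = 0;
  for (const alt of alternatives) {
    if (normalizeWords(alt.transcript).indexOf(goal) !== -1 && alt.confidence > best) {
      best = alt.confidence;
    }
  }
  return { matched: best >= minConfidence, confidence: best };
}
```

Whole-word matching means "grapple" does not count as "apple"; a child's mispronunciation will only match if the recognizer happens to transcribe it as the target word.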

Tradeoffs of browser-based development vs. Smart Client

I've got an app that was started on the Microsoft stack as a smart client (notionally WCF/WS-enabled), with a small client app that gets deployed and the rest of the app running in our private cloud. Its only real dependencies are internet connectivity, .NET 4 and a Windows operating system.
I am under pressure to move to a browser-based architecture for all future development. Based on other web apps I've worked on, I'm concerned that, because client IT organizations can control the browser, it will cause more problems down the line than I want to deal with.
Do you have experience making this kind of decision? What technical factors did you consider when deciding to go smart-client vs. browser? What resources were helpful in making this decision?
My app is a healthcare app targeted at healthcare providers (e.g. hospitals), so everywhere I go I have to worry about the healthcare CIO looking over my shoulder.
Interesting. I started out as a C# WinForms and WPF desktop programmer and was later assigned to web development. I haven't worked with smart clients yet, but I think they are much the same as native apps. From experience, the technical things to consider are:
Multi-browser support
Especially for reporting and graphics processing, without some library, plugin or framework for your components it will be insanely hard to keep your app working across browsers, more so for CSS styling than for JavaScript.
Client programming (JavaScript)
You will lose the ability to build controls and animations with C# controls; instead you must use JavaScript (jQuery or another library). JavaScript's object model is prototype-based rather than class-based like C#, and it is interpreted, so there are no compile-time errors, which makes development harder (compile-to-JavaScript languages such as CoffeeScript may help, but I haven't explored them yet). It is also harder because the process needs server request/response round trips in between, which I describe next.
Request/response client-server architecture
This means that most client-side operations need a request to the server (to fetch data to display, to modify data, etc.). It also means you lose the event model of desktop controls, even if you use ASP.NET Web Forms (its events still need some tweaking to work). However, I assume you already use WCF, so this kind of architecture shouldn't be that hard for you.
Security
Don't keep sensitive information such as passwords in the client (hidden fields, JavaScript variables, etc.). The principle is the same as with a multi-tenant client, but in a browser the user has free access to inspect and debug your web page.
Concurrency and multithreading
In a browser it is easy to open the same page in several tabs, so concurrent operations are very likely. Your client-side code must be able to cope with that concurrency. On the server side you can still use WCF to handle concurrency.
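The answer does not name a technique for the multi-tab case. One common approach, shown here as an assumption rather than part of the answer, is optimistic concurrency on the server: each record carries a version number, every tab sends back the version it loaded, and an update based on a stale version is rejected so the user can reload instead of silently overwriting another tab's change. The in-memory Map and the class name are hypothetical stand-ins for a database table.

```typescript
interface VersionedRecord<T> {
  readonly version: number;
  readonly value: T;
}

type UpdateResult<T> =
  | { ok: true; record: VersionedRecord<T> }
  | { ok: false; current: VersionedRecord<T> | undefined };

// In-memory stand-in for a table whose rows carry a version column.
class OptimisticStore<T> {
  private readonly records = new Map<string, VersionedRecord<T>>();

  create(id: string, value: T): VersionedRecord<T> {
    const record: VersionedRecord<T> = { version: 1, value };
    this.records.set(id, record);
    return record;
  }

  get(id: string): VersionedRecord<T> | undefined {
    return this.records.get(id);
  }

  // Apply the update only if the caller edited the version that is current
  // now; otherwise report the conflict along with the current row.
  update(id: string, expectedVersion: number, value: T): UpdateResult<T> {
    const current = this.records.get(id);
    if (current === undefined || current.version !== expectedVersion) {
      return { ok: false, current };
    }
    // Store a new object rather than mutating, so snapshots held by other
    // callers keep the version they loaded.
    const next: VersionedRecord<T> = { version: current.version + 1, value };
    this.records.set(id, next);
    return { ok: true, record: next };
  }
}
```

With a real database, the same check is usually a conditional update that only matches the row when its version column still equals the version the client loaded.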
My 2 cents.
Obviously, web applications have their own challenges. I hope this link helps with some aspects: http://msdn.microsoft.com/en-us/library/ee658099.aspx
Along with those, you also need to focus on non-functional requirements such as extensibility and scalability.

(Continuous) speech-recognition of limited words in the web-browser

Is there a solution for speech recognition that:
Only needs to recognize a few words (2 is enough, 10 would be cool, 100 is awesome; more isn't needed)
Runs on mobile browsers too (is it possible to use Flash rather than Java for this?)
Can be installed on your own server, preferably with PHP+MySQL (if server-side code is required)
I tried searching, but I only found actual transcription services (like Google Voice Search for Android).
An example of such a solution is touchless-timer, which is based on pocketsphinx.js (also mentioned in Nikolay Shmyrev's post). To answer your bullet points:
it supports a simple alarm clock grammar with ~60 words (phrases like "wake me up in five minutes");
I've managed to run it in Chrome Beta 32.0.1700.99 on Android 4.1.2 (Samsung Galaxy S2); it requires a modern JavaScript engine but does not require Flash;
it does not require a server, because speech recognition is done offline in JavaScript, and all the required files can be cached using ApplicationCache.
For this application, the grammar was written in Grammatical Framework and automatically converted to the finite state model and dictionary required by pocketsphinx.js. For a simple "MP3 play/pause" grammar you can easily write the FSA directly.
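For a small command grammar like the "MP3 play/pause" case, the finite state grammar can be written by hand. The sketch below mirrors the JSON shape of grammars in the pocketsphinx.js README as I recall it (numStates, start, end, and transitions with from, to, word); verify the field names against the version you use. The words are upper case on the assumption that the loaded dictionary uses CMU-style upper-case entries, and every word must exist in that dictionary. The accepts function is a hypothetical offline helper for checking which phrases the grammar allows; it is not part of pocketsphinx.js.

```typescript
// Assumed shape of a pocketsphinx.js finite state grammar.
interface Transition {
  from: number;
  to: number;
  word: string;
}

interface FsaGrammar {
  numStates: number;
  start: number;
  end: number;
  transitions: Transition[];
}

// "MP3 play/pause" grammar: PLAY | PAUSE | NEXT SONG
const mp3Grammar: FsaGrammar = {
  numStates: 3,
  start: 0,
  end: 1,
  transitions: [
    { from: 0, to: 1, word: "PLAY" },
    { from: 0, to: 1, word: "PAUSE" },
    { from: 0, to: 2, word: "NEXT" },
    { from: 2, to: 1, word: "SONG" },
  ],
};

// Does the grammar accept this word sequence? Tracks a set of current states,
// so it also works for nondeterministic grammars.
function accepts(grammar: FsaGrammar, phrase: string): boolean {
  let states = new Set<number>([grammar.start]);
  const tokens = phrase.toUpperCase().split(/\s+/).filter((w) => w.length > 0);
  for (const token of tokens) {
    const next = new Set<number>();
    for (const t of grammar.transitions) {
      if (states.has(t.from) && t.word === token) {
        next.add(t.to);
      }
    }
    if (next.size === 0) {
      return false; // no transition for this word from any current state
    }
    states = next;
  }
  return states.has(grammar.end);
}
```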
The English acoustic models in this app are not very good; for example, they might get confused by MP3 music playing in the background. You might be able to improve on that by training better models. However, better models might be larger (e.g. > 20 MB in JavaScript) and no longer fit into memory, or just make the app load and run very slowly.
[Screenshot of the app running on mobile]
These days you don't even need a server to run speech recognition; you only need a browser that supports the Web Audio API (recent Firefox and Chrome both do). CMUSphinx can now run in JavaScript in your browser.
For more details see
https://github.com/syl22-00/pocketsphinx.js
http://cmusphinx.sourceforge.net/2013/06/voice-enable-your-website-with-cmusphinx/

How to implement a website like YouTube?

I'm building a language website for my university's language center, where students log in and watch videos to learn English. It has to work like this:
a person logs in to the system, uses a search box, and finds the details, lessons and videos relevant to their search. This functionality exactly matches the YouTube scenario.
For Twitter-like functionality we can use StatusNet. Is there a similar library (a well-known StatusNet-style implementation of YouTube), or some kind of platform or framework like CodeIgniter, that we can use to build a YouTube-like site easily?
Please suggest some options, either open source or commercial.
And what is the best video format for such a site: FLV, MP4 or MOV?
regards,
Rangana
Your best option is to use a 'cloud' based video processing service. Most have a sample project / library for many different languages and frameworks. Here is a list of a few I've tried and liked:
http://zencoder.com/
http://transloadit.com/
http://pandastream.com/
The typical steps involve uploading the video files through the browser to a large 'cloud' static asset host (such as S3). If you are inexperienced, it is best to pick a processor that provides an uploader (it will handle putting the files in the right spot). Of the three, Transloadit and Panda both have custom uploaders.
Usually the service lets you either pass the encoding settings (which formats and qualities to output) as parameters or configure them in your account. To support all current HTML5 browsers you just need H.264 (.mp4) and Ogg Theora (.ogv). However, the new trend in the video world is WebM (.webm), so you might want to include it as well.
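Once the service has produced the H.264, Ogg and WebM outputs mentioned above, the page can offer all of them to an HTML5 video element as separate source elements; the browser plays the first one it supports, so list the preferred format first. This minimal sketch builds that markup as a string; the URLs and the Rendition type are placeholders, and the MIME types are the standard video/webm, video/mp4 and video/ogg.

```typescript
type Container = "webm" | "mp4" | "ogv";

interface Rendition {
  url: string; // placeholder URL of one encoded output
  container: Container;
}

// Standard MIME types for the three HTML5 video formats.
const MIME_TYPES: Record<Container, string> = {
  webm: "video/webm",
  mp4: "video/mp4",
  ogv: "video/ogg",
};

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Build a <video> element; browsers play the first <source> they support,
// so pass renditions in order of preference.
function videoTag(renditions: Rendition[], poster?: string): string {
  const sources = renditions
    .map((r) => `  <source src="${escapeAttr(r.url)}" type="${MIME_TYPES[r.container]}">`)
    .join("\n");
  const posterAttr = poster ? ` poster="${escapeAttr(poster)}"` : "";
  return `<video controls${posterAttr}>\n${sources}\n</video>`;
}
```

The players listed below wrap the same element with their own controls and a Flash fallback, so this is mainly useful for understanding what they generate.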
Next you will receive a unique code from the web service that you must store in persistent storage (database). The web service can be configured to 'callback' (perform an HTTP POST or GET request to your service) once the video is encoded.
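The store-the-code-then-wait-for-a-callback flow can be sketched as a small registry keyed by the service's job ID. Everything here is hypothetical: the callback payload shape ({ jobId, state }), the "finished" and "failed" values, and the class itself are illustrative stand-ins, since each service (Zencoder, Transloadit, Panda) documents its own notification format. In a real app the Map would be a database table and handleCallback would sit behind the HTTP endpoint the service calls.

```typescript
type VideoState = "processing" | "ready" | "failed";

interface VideoRow {
  videoId: string; // your own ID for the lesson video
  jobId: string; // unique code returned by the encoding service
  state: VideoState;
}

// In-memory stand-in for a videos table, indexed by the service's job ID.
class VideoRegistry {
  private readonly byJob = new Map<string, VideoRow>();

  // Call right after submitting the upload for encoding.
  registerJob(videoId: string, jobId: string): void {
    this.byJob.set(jobId, { videoId, jobId, state: "processing" });
  }

  // Call from the HTTP endpoint the service notifies. Other state values
  // leave the row unchanged.
  handleCallback(payload: { jobId: string; state: string }): VideoRow | undefined {
    const row = this.byJob.get(payload.jobId);
    if (row === undefined) {
      return undefined; // unknown job: ignore, anyone can POST to a public URL
    }
    if (payload.state === "finished") {
      row.state = "ready";
    } else if (payload.state === "failed") {
      row.state = "failed";
    }
    return row;
  }

  // Only "ready" videos should be shown on lesson pages.
  isPlayable(jobId: string): boolean {
    const row = this.byJob.get(jobId);
    return row !== undefined && row.state === "ready";
  }
}
```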
Once you receive a callback, you can activate your video and start displaying it on your pages. For display, if you are inexperienced, I'd highly recommend one of the following players:
http://sublimevideo.net/
http://longtailvideo.com/
http://videojs.com/
They all do similar things for different prices. My current personal favourite is Sublime Video (it offers cool light box effects and a gorgeous player).
Why re-implement YouTube when you can just use it to host your videos for free? Many online e-learning portals (e.g. Khan Academy) do exactly that.
As for the best video format to use, read up on H.264/AVC. It's what YouTube currently uses.
I don't think you will find an already-built solution ;)
But it's not really that hard. You can use existing frameworks to make your life easier while you build the account management system, and the rest shouldn't be that hard either (assuming you don't really want to rebuild the whole of YouTube ;D).
For playing videos, you can use JW Player. A great piece of software, you should check it out.
