Hi everyone, I am learning speech recognition in Python and I am quite interested in whether it can be used offline. I mean, we use:
import speech_recognition as sr

rec = sr.Recognizer()
with sr.Microphone() as source:
    audio = rec.listen(source)

said = rec.recognize_google(audio)
print(said)
to recognize our speech; however, recognize_google() doesn't work without an internet connection. Is there any other way that works offline? I'll be grateful if someone can help.
I am assuming that you're using the Python SpeechRecognition library. This library can be used with CMU Sphinx, which works offline.
The pocketsphinx library is set up to work offline by default, so it might be a good choice if you're just getting started.
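As a minimal sketch, assuming you have installed pocketsphinx (pip install pocketsphinx), SpeechRecognition exposes it through recognize_sphinx(), which runs entirely on your machine:

import speech_recognition as sr

rec = sr.Recognizer()
with sr.Microphone() as source:
    audio = rec.listen(source)

try:
    # recognize_sphinx() decodes locally with CMU pocketsphinx -- no network needed
    said = rec.recognize_sphinx(audio)
    print(said)
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error: {}".format(e))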
I am making an AI assistant using Python's TensorFlow module. Now I am trying to make a voice for my AI assistant. Google Assistant, Cortana, and Siri all have their own voices, but I don't know how to make an artificial voice. I searched the web but didn't get any helpful answer.
Can someone please tell me a way of making an artificial voice, or just the methods I should look for? I don't know what this process is called; that's probably why I can't find any answer on the web. It would be nice if someone could help me!
The easiest way to add a voice to your AI assistant is to use a text-to-speech library (see the sketch after this list), such as:
pyttsx3
gTTS
Google's text-to-speech
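As a minimal sketch with pyttsx3, which works offline and needs no API key:

import pyttsx3

# init() picks the platform's TTS backend (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux)
engine = pyttsx3.init()
engine.setProperty("rate", 150)  # speaking speed in words per minute
engine.say("Hello, I am your assistant.")
engine.runAndWait()  # blocks until the queued speech has been spoken

gTTS works similarly but generates the audio through Google's service, so it needs an internet connection.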
If you want the assistant to speak with your own voice, you could use deep learning for that, as in:
Real-Time-Voice-Cloning
more approaches in this article
I have been struggling for a while and have looked through many examples of how to enable the mic in a browser with Node.js. I have seen several JavaScript examples, but I can't get the spoken content out of them and store it in variables. How can I enable the mic using Node.js? Will I need a specific npm package? I am currently working with the IBM Watson Speech to Text API. Any help is appreciated! Thanks in advance!
You will need to enable the mic in the browser using a client-side library.
Use the Speech-to-Text SDK here:
https://github.com/watson-developer-cloud/speech-javascript-sdk
And a working example here:
https://watson-speech.mybluemix.net/microphone-streaming.html
Please be aware that streaming from the microphone will not work in any version of Safari; you will need Firefox, Chrome, or IE to stream microphone audio into Watson Speech to Text. There's a YouTube tutorial on building a simple Bluemix app using Speech to Text here (see Chapter 3): Youtube Tutorial. The supporting code is in a public Git repo here: Zero To Cognitive Repo
I am making a game where I want to command the AI using words I speak.
For example, I say "go" and the AI bot moves a certain distance.
The problem is that I am looking for an asset, and no provider will guarantee that it is possible.
What are the difficulties in doing it?
I am a programmer, so if someone suggests a way to handle it, I can do it.
Should I keep a mic listener on all the time, read the audio, and then pass it to some external SDK which can convert my voice to text?
These are the asset providers I have contacted:
https://www.assetstore.unity3d.com/en/#!/content/73036
https://www.assetstore.unity3d.com/en/#!/content/45168
https://www.assetstore.unity3d.com/en/#!/content/47520
and a few more!
If someone just explains the steps I need to follow, I can certainly try it.
I am currently using this external API for pretty much the same thing: https://api.ai/
It comes with a Unity SDK that works quite well:
https://github.com/api-ai/api-ai-unity-sample#apiai-unity-plugin
You have to connect an audio source to the SDK and tell it to start listening. It will then convert your voice audio to text, and can even detect pre-defined intents from your voice audio / text.
You can find all the steps for integrating the Unity plugin in the api.ai Unity SDK documentation on GitHub.
EDIT: It's free too btw :)
If you want to recognize speech offline, without sending data to a server, you should try this plugin:
https://github.com/dimixar/unity3DPocketSphinx-android-lib
It uses the open-source speech recognition engine CMUSphinx.
The Jarvis application, as currently developed, is in English. I want to customize it to use a local language. How do I develop this kind of app for local languages? What kind of programming languages must I know to proceed with the development? I have tested the English version of Jarvis, and it works well for me. How do I attach C# to HTK for the purpose of this development?
How do I develop this kind of app for local languages?
You don't need to develop from scratch; take existing software and build on it. For example, you can consider https://github.com/jasperproject/jasper-client, which is pretty actively developed.
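To give a feel for what "building on it" looks like, here is a minimal sketch of a Jasper-style module in Python. The WORDS / isValid() / handle() names follow Jasper's documented module convention, but treat the details as an assumption and check the repo before relying on them:

# time_module.py -- a minimal Jasper-style module
import re
from datetime import datetime

WORDS = ["TIME"]  # keywords Jasper compiles into the recognizer's vocabulary

def isValid(text):
    # Called with the recognized text; return True if this module should handle it
    return bool(re.search(r"\btime\b", text, re.IGNORECASE))

def handle(text, mic, profile):
    # mic.say() replies through the configured text-to-speech engine
    mic.say("It is " + datetime.now().strftime("%I:%M %p"))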
What kind of programming languages must I know to proceed with the development?
Most NLP libraries are in Python or Java. You also need shell-scripting (awk/Perl) experience, because models are often built with Linux tools.
For speech recognition it's easiest to use CMUSphinx; the tutorial for adding your language to CMUSphinx is at http://cmusphinx.sourceforge.net/wiki/tutorialam.
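Once you have built the acoustic model, language model, and dictionary from that tutorial, you can point pocketsphinx at them from Python. A sketch, assuming the pocketsphinx-python package; the file paths are placeholders for your own model files:

from pocketsphinx import LiveSpeech

# Placeholder paths -- replace with the models you built in the tutorial
speech = LiveSpeech(
    hmm="my_language/acoustic-model",      # acoustic model directory
    lm="my_language/language-model.lm",    # statistical language model
    dic="my_language/pronunciation.dict",  # pronunciation dictionary
)
for phrase in speech:  # iterates over recognized utterances from the microphone
    print(phrase)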
I have tested the English version of Jarvis, and it works well for me. How do I attach C# to HTK for the purpose of this development?
There are several options for interoperability:
1) C# can invoke the HTK tools as external binaries through Process.Start: http://msdn.microsoft.com/en-us/library/system.diagnostics.process.start(v=vs.110).aspx
2) You can build a library from HTK and invoke it with P/Invoke through the interop framework.
3) You can wrap the HTK tools in a TCP or HTTP server and connect to it from the C# application to get speech recognition results (a sketch of this approach follows).
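As an illustration of option 3, here is a minimal Python sketch that wraps an HTK decoder behind an HTTP endpoint. The HVite arguments and file names are placeholders (assumptions), to be replaced with your own models; the C# side would then POST recorded audio to the server and read the transcription from the response body:

# htk_server.py -- hypothetical HTTP wrapper around an HTK decoder
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder HVite invocation -- substitute your own HMMs, word network, and dictionary
HVITE_CMD = ["HVite", "-H", "hmmdefs", "-w", "wdnet", "dict", "tiedlist"]

class RecognizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the posted audio and hand it to HVite via a temporary file
        length = int(self.headers["Content-Length"])
        with tempfile.NamedTemporaryFile(suffix=".wav") as f:
            f.write(self.rfile.read(length))
            f.flush()
            result = subprocess.run(HVITE_CMD + [f.name],
                                    capture_output=True, text=True)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(result.stdout.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RecognizeHandler).serve_forever()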
Overall, you can probably use existing solutions like those mentioned above; they have all the hard parts implemented, and you only need to configure your local language.
I would suggest going with HTK, or, if you have lots of training data, with Kaldi, one of the best toolkits for speech recognition in local languages, which uses deep learning.
I'm trying to develop an application based on native audio in Gingerbread. I ran the native-audio sample program that ships with the NDK, but I'm not clear on it. I need an example to learn how to use the OpenSL library.
Can anyone suggest an example of OpenSL ES-based code?
The OpenSL ES documentation and that sample app are the best resources out there. Not to say that they're great, but they are definitely sufficient, provided you have knowledge of object-oriented programming and audio. If you don't, those are the things you should look into first.