I'm currently working on a tool that lets me read all my notifications by connecting to different APIs.
It works great, but now I would like to add voice commands to trigger actions.
For example, when the software says "One mail from Bob", I would like to be able to say "Read it" or "Archive it".
My software runs on a Node server. I don't have a browser implementation yet, but that could be planned.
What is the best way to do speech-to-text in Node.js?
I've seen a lot of threads on this, but they mostly rely on the browser, and I would like to avoid that at the beginning if possible. Is it possible?
Another issue is that some software requires a WAV file as input. I don't have any file; I just want my software to always be listening so it can react when I say a command.
Do you have any information on how I could do that?
Cheers
Both of the answers here already are good, but what I think you're looking for is Sonus. It takes care of audio encoding and streaming for you. It's always listening offline for a customizable hotword (like Siri or Alexa). You can also trigger listening programmatically. In combination with a module like say, you could enable your example by doing something like:
say.speak('One mail from Bob', function(err) {
  Sonus.trigger(sonus, 1); // start listening
});
You can also use different hotwords to handle the subsequent recognized speech in a different way. For instance:
"Notifications. Most recent." and "Send message. How are you today"
Throw that onto a Pi or a CHIP with a microphone on your desk and you have a personal assistant that reads your notifications and reacts to commands.
Simple Example:
https://twitter.com/_evnc/status/811290460174041090
Something a bit more complex:
https://youtu.be/pm0F_WNoe9k?t=20s
Full documentation:
https://github.com/evancohen/sonus/blob/master/docs/API.md
Disclaimer: This is my project :)
To recognize a few commands without streaming audio to a server, you can use the node-pocketsphinx module, which is available on npm.
The code to recognize a few commands in a continuous stream should look like this:
var fs = require('fs');
var ps = require('pocketsphinx').ps;

// Directory holding the acoustic model and the dictionary.
var modeldir = "../../pocketsphinx/model/en-us/";

var config = new ps.Decoder.defaultConfig();
config.setString("-hmm", modeldir + "en-us");
config.setString("-dict", modeldir + "cmudict-en-us.dict");
// "keyword list" stands for the path to the keyword list file.
config.setString("-kws", "keyword list");
var decoder = new ps.Decoder(config);

fs.readFile("../../pocketsphinx/test/data/goforward.raw", function(err, data) {
  if (err) throw err;
  decoder.startUtt();
  decoder.processRaw(data, false, false);
  decoder.endUtt();
  console.log(decoder.hyp());
});
Instead of using readFile, you read the audio data from the microphone and pass it to the recognizer. The list of keywords to detect should look like this:
read it /1e-20/
archive it /1e-20/
For more details on keyword spotting with pocketsphinx, see "Keyword Spotting in Speech" and "Recognizing multiple keywords using PocketSphinx".
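Once the decoder reports a keyword, the recognized phrase still has to be mapped to an action. Here is a minimal sketch of that step, assuming the hypothesis text has already been extracted as a plain string. The names commands and dispatch and the action names are hypothetical and are not part of pocketsphinx:

```javascript
// Hypothetical command table: recognized phrase -> action name.
// The phrases match the keyword list above; the action names are made up.
const commands = {
  'read it': 'READ_CURRENT',
  'archive it': 'ARCHIVE_CURRENT',
};

// Normalize a hypothesis string and look up the matching action.
// Returns null when the phrase is not a known command.
function dispatch(hypothesis) {
  if (typeof hypothesis !== 'string') return null;
  const phrase = hypothesis.trim().toLowerCase().replace(/\s+/g, ' ');
  return Object.prototype.hasOwnProperty.call(commands, phrase)
    ? commands[phrase]
    : null;
}

console.log(dispatch('Read it'));          // READ_CURRENT
console.log(dispatch('  archive   it '));  // ARCHIVE_CURRENT
console.log(dispatch('delete it'));        // null
```

Keeping the phrase table separate from the recognizer means the same phrases can be written to the keyword list file and used for dispatch.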
To get audio data into your application, you could try a module like microphone, which I haven't used but it looks promising. This could be a way to avoid having to use the browser for audio input.
To do actual speech recognition, you could use the Speech to Text service of IBM Watson Developer Cloud. This service supports a websocket interface, so that you can have a full duplex service, piping audio data to the cloud and getting back the resulting transcription. You may want to consider implementing a form of onset detection in order to avoid transmitting a lot of (relative) silence to the service - that way, you can stay within the free tier.
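The onset detection mentioned above can be as simple as an energy threshold. The following is a rough sketch, assuming the microphone delivers chunks of 16-bit little-endian mono PCM as Node Buffers. The function names, the threshold of 0.02 and the hangover length are illustrative assumptions, not values recommended by any service:

```javascript
// Root-mean-square level of a chunk of 16-bit little-endian mono PCM,
// normalized to the range 0..1.
function rms(chunk) {
  const samples = Math.floor(chunk.length / 2);
  if (samples === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples; i++) {
    const s = chunk.readInt16LE(i * 2) / 32768;
    sumSquares += s * s;
  }
  return Math.sqrt(sumSquares / samples);
}

// Returns a function that decides, chunk by chunk, whether audio should be
// forwarded. After the level drops below the threshold, it keeps forwarding
// for `hangoverChunks` more chunks so word endings are not clipped.
function createGate(threshold = 0.02, hangoverChunks = 5) {
  let remaining = 0;
  return function shouldSend(chunk) {
    if (rms(chunk) >= threshold) {
      remaining = hangoverChunks;
      return true;
    }
    if (remaining > 0) {
      remaining--;
      return true;
    }
    return false;
  };
}

// Demo with synthetic data: a silent chunk and a loud chunk.
const silence = Buffer.alloc(320); // all-zero samples
const loud = Buffer.alloc(320);
for (let i = 0; i < 160; i++) loud.writeInt16LE(i % 2 ? 8000 : -8000, i * 2);

const gate = createGate(0.02, 1);
console.log(gate(silence)); // false
console.log(gate(loud));    // true
console.log(gate(silence)); // true (hangover)
console.log(gate(silence)); // false
```

In a real setup the threshold would need tuning for the microphone and the room, and only chunks for which the gate returns true would be written to the websocket.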
There is also a text-to-speech service, but it sounds like you have a solution already for that part of your tool.
Disclosure: I am an evangelist for IBM Watson.
I hope I'm saying this correctly. What I'm trying to do is write to a JSON file using fs.writeFile.
I can get it to work from the command line, but what I want is to call a function, maybe on a button click, to update the JSON file.
I figure I would need some type of call to the Node server, which runs locally on port 8080. While researching, I saw somebody mention using .post, but I still can't wrap my head around how to write the logic.
$(".button").on("click", function(event) {
  fs.writeFile("./updateme.json", "{test: 1}", function(err) {
    if (err) {
      return console.log(err);
    }
    console.log("The file was saved!");
  });
});
Using jQuery along with fs? Wow, that could be great! Unfortunately, it is not as simple as that!
Let me introduce you to server-side vs. client-side JavaScript. There are a lot of resources on the net about that: just google it, or check the answers to this other StackOverflow question. Basically, JavaScript can run either in a browser (Chrome, Mozilla...) or as a program (usually a server written in NodeJS), and while the language is (almost) the same, the two platforms don't have the same features.
The script that you're showing should run in a browser, because it uses jQuery and interacts with buttons and other page elements (aka the DOM). Can you imagine what a mess it would be if that script could interact with the file system? Any page you visit would be able to crawl around in your holiday pictures and other personal stuff you keep on your computer. Bad idea! That is why some libraries like fs are not available in the browser.
Similarly, some libraries like jQuery are not available (or are simply useless) on the server, because there is no HTML and no user interaction, only headless programs running.
So, what can I do to write a JSON file after a user clicks on a button?
You can set up:
A NodeJS server that will write the JSON file
A jQuery call to this server, sent when the user clicks the button and carrying the data to be written
If you want further guidance on this, tell me in the comments! I'll be ready to edit my answer to include instructions on setting up such an environment.
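The server half of that setup could be sketched roughly as follows, using only Node's built-in http and fs modules. The /update route and the names saveJson and createUpdateServer are assumptions made for this illustration; port 8080 and updateme.json come from the question:

```javascript
const http = require('http');
const fs = require('fs');

// Parse the request body as JSON and write it to `filePath`.
// Throws a SyntaxError if the body is not valid JSON.
function saveJson(filePath, body) {
  const data = JSON.parse(body);
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
  return data;
}

// Build (but do not start) a server that accepts POST /update.
// The route name is a hypothetical choice for this sketch.
function createUpdateServer(filePath) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/update') {
      res.writeHead(404);
      return res.end();
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        saveJson(filePath, body);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end('{"saved":true}');
      } catch (err) {
        // Invalid JSON ends up here and is reported as a client error.
        res.writeHead(400, { 'Content-Type': 'application/json' });
        res.end('{"saved":false}');
      }
    });
  });
}

// To run it: createUpdateServer('./updateme.json').listen(8080);
```

On the browser side, the click handler would then send the data with $.ajax, using type 'POST', url '/update', contentType 'application/json' and data JSON.stringify({ test: 1 }). This assumes the page is served from the same origin as the Node server; otherwise CORS headers would be needed. Note that the string "{test: 1}" from the question is not valid JSON, so this sketch would answer it with a 400.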
I am creating an application which would watch a file and fetch the contents from that file (similar to tail but with the possibility of paging in previous data as well). I read up on quite a few solutions ranging from spawning a new process to getting only the updated bytes of the file but I am still a little confused on a few parts.
What I want to do exactly is the following:
Watch a file and trigger an event/callback whenever new data comes into the file
Read this new data from the file and efficiently send it to a client. Using a websocket or something else. (suggest a good way to do this please)
At the client end, take this data and display it to user and keep updating it with new data as it comes
If the user requests older data, provide a way to fetch that data from the file we are watching
I am looking for efficient solutions for the above sub problems and any suggestions for a better approach are also welcome.
FYI I am new to nodejs so verbosity in your solutions would be highly appreciated.
Watch for changes
I suggest you look at chokidar. It is an optimized wrapper around Node's native file-watching APIs (fs.watch and fs.watchFile, plus FSEvents on macOS).
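const chokidar = require('chokidar');
const log = console.log.bind(console);
// chokidar options; adjust as needed.
const config = { persistent: true };

// Initialize watcher.
const watcher = chokidar.watch('some/directory/**/*.xml', config);

// Add event listeners.
watcher
  .on('add', path => log(`File ${path} has been added`))
  .on('change', path => log(`File ${path} has been changed`))
  .on('unlink', path => log(`File ${path} has been removed`));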
To get the changed value
For this you can look at the diff module. You will need to keep both the previous and the current state of the file in order to compute the changes.
To notify the client
You will need to create a websocket server; I recommend socket.io. In your application, you create the diff and send it as a websocket message to the server, which then notifies/broadcasts the message to the relevant clients.
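For a file that only grows, such as a log, an alternative to diffing two full copies is to remember a byte offset and read only what was appended, which is the "updated bytes" approach the question mentions. This is a sketch of that swapped-in technique, not of the diff module; readAppended is a hypothetical helper name:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Read whatever was appended to `filePath` since byte `offset`.
// Returns the new text and the offset to use for the next call.
// If the file shrank (truncated or rotated), reading restarts from 0.
function readAppended(filePath, offset) {
  const size = fs.statSync(filePath).size;
  const start = size < offset ? 0 : offset;
  const length = size - start;
  if (length === 0) return { text: '', offset: size };

  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(filePath, 'r');
  try {
    fs.readSync(fd, buffer, 0, length, start);
  } finally {
    fs.closeSync(fd);
  }
  return { text: buffer.toString('utf8'), offset: size };
}

// Demo on a temporary file.
const demoFile = path.join(os.tmpdir(), `tail-demo-${process.pid}.log`);
fs.writeFileSync(demoFile, 'line 1\n');
let state = readAppended(demoFile, 0);
console.log(JSON.stringify(state.text)); // "line 1\n"
fs.appendFileSync(demoFile, 'line 2\n');
state = readAppended(demoFile, state.offset);
console.log(JSON.stringify(state.text)); // "line 2\n"
fs.unlinkSync(demoFile);
```

The idea would be to call this from the watcher's change handler and emit the returned text to clients. One caveat: if a write ends in the middle of a multi-byte UTF-8 character, the decoded text will contain a replacement character at the boundary, so a production version would need to buffer incomplete bytes.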
I am developing an Alexa Skill, and I am struggling a bit to understand whether I have set everything up in the best possible way for debugging while developing.
Right now I am developing locally using Node.js, uploading to the cloud when ready, and testing all the responses to intents using the Service Simulator in the Test section of the developer console.
I find the process a bit slow but working... But still, I have two questions:
1) Is there a way of avoiding the process of uploading to the cloud?
And most importantly, 2) How do I test advanced interactions, for example multi-step ones, in the console? For example, how do I test triggering the response to an intent and then asking the user for confirmation (Yes/No)? Right now the only way of doing it is using the actual device.
Any improvement is highly appreciated.
As @Tom suggested, take a look at bespoken.tools for testing skills locally.
Also, the Alexa Command Line Interface was recently released and it has some command line options you might look into.
For example, the 'api invoke-skill' command lets you invoke the skill locally via the command line (or script) so you don't have to use the service simulator. Like this...
$ ask api invoke-skill -s $SKILL_ID -f $JSON --endpoint-region $REGION --debug
Here is a quick video I did that introduces the ASK CLI. It doesn't specifically cover testing but it will provide a quick intro.
https://youtu.be/p-zlSdixCZ4
Hope that helps.
EDIT: Had another thought for testing locally. If you're using node and Lambda functions, you can call the index.js file from another local .js file (example: test.js) and pass in the event data and context. Here is an example:
// path to the Lambda index.js file
var lambdaFunction = require('../lambda/custom/index.js');

// json representing the event - just copy from the service simulator
var event = require('./events/GetUpdateByName.json');

var context = {
  'succeed': function (data) {
    console.log(JSON.stringify(data, null, '\t'));
  },
  'fail': function (err) {
    console.log('context.fail occurred');
    console.log(JSON.stringify(err, null, '\t'));
  }
};

function callback(error, data) {
  if (error) {
    console.log('error: ' + error);
  } else {
    console.log(data);
  }
}

// call the lambda function
lambdaFunction.handler(event, context, callback);
Here's how I'm testing multi-step interactions locally.
I'm using a free third-party tool called BSTAlexa:
http://docs.bespoken.tools/en/latest/api/classes/bstalexa.html
It emulates Amazon's role in accepting requests, feeding them to your skill, and maintaining the state of the interactions.
So I start my test script by configuring BSTAlexa, pointing it to my skill config (e.g. intents) and to a local instance of my skill (in my case I'm giving it a local URL).
Then I feed BSTAlexa a sequence of textual requests and verify that I'm getting back the expected responses. And I put all this in a Mocha script.
It works quite well.
Here are my answers, in reverse order.
You can test multiple steps using the simulator (echosim.io), but each time you have to press and hold the mic button (or hold the space bar). For example, if you first ask Alexa something in echosim and Alexa responds by asking for a yes/no confirmation, you have to press and hold the mic button again to answer.
You can automate the Lambda deployment process. Please see this link:
http://docs.aws.amazon.com/lambda/latest/dg/automating-deployment.html
It would be good to write complete unit tests so that you can test your logic before uploading to Lambda. This will also help reduce the number of Lambda deployments.
I have a Clarion 9 app that I want to be able to communicate with HTTP servers. I come from a PHP background and have no idea what to do.
What I wish to be able to do:
Parse JSON data and convert QUEUE data to JSON [Done]
Have a global variable like 'baseURL' that points to e.g. http://localhost.com [Done]
Call functions such as apiConnection.get('/users'), which would return the contents of the page. [I'm stuck here]
Call apiConnection.post('/users', myQueueData), which would POST the contents of myQueueData.
I tried using winhttp.dll, but LibMaker could not read it. Instead, I'm now using wininet.dll, for which LibMaker successfully created a .lib file.
I'm currently using the PROTOTYPE procedures from this code on GitHub https://gist.github.com/ddur/34033ed1392cdce1253c
What I did was include them like:
SimpleApi.clw
                    PROGRAM
                    INCLUDE('winInet.equ')
ApiLog              QUEUE, PRE(log)
LogTitle              STRING(10)
LogMessage            STRING(50)
                    END
                    MAP
                      INCLUDE('winInetMap.clw')
                    END
                    INCLUDE('equates.clw'),ONCE
                    INCLUDE('DreamyConnection.inc'),ONCE
ApiConnection       DreamyConnection
                    CODE
                    IF DreamyConnection.initiateConnection('http://localhost')
                    ELSE
                      log:LogTitle = 'Info'
                      log:LogMessage = 'Failed'
                      ADD(apiLog)
                    END
But the buffer that WinInet uses always returns 0.
I have created a GitHub repository https://github.com/spacemudd/clarion-api with all the code to look at.
I'm really lost here because I can't find proper documentation for Clarion.
I do not want a paid solution.
It kind of depends which version of Clarion you have.
Starting around v9 they added ClaRunExt which provides this kind of functionality via .NET Interop.
From the help:
Use HTTP or HTTPS to download web pages, or any other type of file. You can also post form data to web servers. Very easy way to send HTTP web requests (and receive responses) to Web Servers, REST Web Services, or standard Web Services, with the most commonly used HTTP verbs; POST, GET, PUT, and DELETE.
Otherwise, search the LibSrc\ directory for "http" and you will get an idea of what is already there. abapi.inc for example, appears to provide a wrapper around wininet.lib.
We are designing an Azure Website that will allow users to upload content (MP4, DOCX, and other MS Office files), which can then be accessed.
We will encode some video content into several quality levels before it is streamed (using Azure Media Services).
We need to add an intermediate step so we can scan uploaded files for viruses. Is there functionality built into Azure (or from a third party) that would let us call an API to scan content before processing it? Ideally we are looking for an API rather than just a background service on a VM, so that we can get feedback, potentially for use in a web or worker role.
I had a quick look at Symantec Endpoint and Windows Defender, but I'm not sure these offer an API.
I have successfully done this using the open source ClamAV. You don't specify what languages you are using, but as it's Azure I'll assume .NET.
There is a .NET wrapper that should provide the API that you are looking for:
https://github.com/tekmaven/nClam
Here is some sample code (note: this is copied directly from the nClam GitHub repo page and reproduced here just to protect against link rot)
using System;
using System.Linq;
using nClam;

class Program
{
    static void Main(string[] args)
    {
        var clam = new ClamClient("localhost", 3310);
        var scanResult = clam.ScanFileOnServer("C:\\test.txt"); //any file you would like!
        switch (scanResult.Result)
        {
            case ClamScanResults.Clean:
                Console.WriteLine("The file is clean!");
                break;
            case ClamScanResults.VirusDetected:
                Console.WriteLine("Virus Found!");
                Console.WriteLine("Virus name: {0}", scanResult.InfectedFiles.First().VirusName);
                break;
            case ClamScanResults.Error:
                Console.WriteLine("Woah an error occurred! Error: {0}", scanResult.RawResult);
                break;
        }
    }
}
There are also APIs available for refreshing the virus definition database. All the necessary ClamAV files can be included in the deployment package and any configuration can be put into the service start-up code.
ClamAV is a good idea, especially now that 0.99 is about to be released with YARA rule support. It will make it really easy for you to write custom rules and will allow ClamAV to use the many good YARA rules that are openly available today.
Another route, and a bit of shameless plugging, is to check out scanii.com. It's a SaaS for malware/virus detection and it integrates quite nicely with AWS and Azure.
There are a number of options to achieve this:
Firstly, you can use ClamAV as already mentioned. ClamAV doesn't always receive the best press for its virus databases, but as others have pointed out, it's easy to use and is expandable.
You can also install a commercial scanner, such as AVG, Kaspersky, etc. Many of these come with a C API that you can talk to directly, although getting access to it can often be expensive from a licensing point of view.
Alternatively you can make calls to the executable directly using something like the following to capture the output:
// Requires: using System.Diagnostics;
var proc = new Process {
    StartInfo = new ProcessStartInfo {
        FileName = "scanner.exe",
        Arguments = "arguments needed",
        UseShellExecute = false,
        RedirectStandardOutput = true,
        CreateNoWindow = true
    }
};
proc.Start();
while (!proc.StandardOutput.EndOfStream) {
    // Collect or inspect each line of the scanner's output here.
    string line = proc.StandardOutput.ReadLine();
}
You would then need to parse the output to get the result and use it within your application.
Finally, there are now some commercial APIs available for this kind of thing, such as attachmentscanner (disclaimer: I'm related to this product) or scanii. These provide an API and a more scalable way to scan specific files and receive the response from at least one virus checking engine.
A new option is coming in Spring/Summer 2020: Advanced Threat Protection for Azure Storage includes Malware Reputation Screening, which detects malware uploads using hash reputation analysis. It leverages Microsoft Threat Intelligence, which includes hashes for viruses, trojans, spyware and ransomware. Note: the hash reputation analysis technique cannot guarantee that every piece of malware will be detected.
https://techcommunity.microsoft.com/t5/Azure-Security-Center/Validating-ATP-for-Azure-Storage-Detections-in-Azure-Security/ba-p/1068131