How to implement activation phrase like "Hey Cortana" in SpeechRecognizer? - win-universal-app

In the SpeechAndTTS samples in the Universal Windows demo apps (link), even the continuous dictation examples require the user to click a button to start the recognizer.
So my question is: how can we implement an always-listening SpeechRecognizer, activated when it hears something like "Hey Cortana" or "Okay Google"?
The closest thing I can think of is the following (a rough sketch is below):
1. Place a SpeechRecognitionListConstraint on the SpeechRecognizer that listens only for the "wake-up word" ("Hey Cortana", for example).
2. In the ResultGenerated event handler, check whether "Hey Cortana" was heard with medium/high confidence. If it was not, use speechRecognizer.CompileConstraintsAsync() to force the recognizer to listen again.
3. In the Completed event handler, use speechRecognizer.CompileConstraintsAsync() to force the recognizer to listen again.
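Something like this, as an untested sketch (it assumes the app already has microphone access and that speechRecognizer is a field on the page; I restart via StartAsync() in Completed, since recompiling the constraints alone does not resume listening):

SpeechRecognizer speechRecognizer = new SpeechRecognizer();
speechRecognizer.Constraints.Add(
    new SpeechRecognitionListConstraint(new[] { "Hey Cortana" }, "wakeUpWord"));
await speechRecognizer.CompileConstraintsAsync();

speechRecognizer.ContinuousRecognitionSession.ResultGenerated += (s, args) =>
{
    if (args.Result.Confidence == SpeechRecognitionConfidence.Medium ||
        args.Result.Confidence == SpeechRecognitionConfidence.High)
    {
        // Wake-up phrase heard: hand off to the real recognition logic here.
    }
};

speechRecognizer.ContinuousRecognitionSession.Completed += async (s, args) =>
{
    // Session ended (e.g. a timeout): start listening again.
    await speechRecognizer.ContinuousRecognitionSession.StartAsync();
};

await speechRecognizer.ContinuousRecognitionSession.StartAsync();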
Another thing I have checked is the Timeouts property of the SpeechRecognizer: https://msdn.microsoft.com/en-us/library/windows.media.speechrecognition.speechrecognizertimeouts.aspx
But it appears we cannot have an infinite InitialSilenceTimeout.
So, is there a straightforward way to have a SpeechRecognizer that does not stop listening until the "wake up phrase" is heard?

So my question is: how can we implement an always-listening SpeechRecognizer, activated when it hears something like "Hey Cortana" or "Okay Google"?
As we know, we can't integrate Cortana into our app while the app is running in the foreground; we need to use SpeechRecognition instead. But we can do this job using continuous dictation.
even the continuous dictation examples require the user to click a button to start the recognizer.
Yes, but that is only because await speechRecognizer.ContinuousRecognitionSession.StartAsync() is called in the button click event handler, so the session starts in that event. To start it without a button click, you can start the session in the OnNavigatedTo method of the Page and stop it in the OnNavigatedFrom method, for example as sketched below. And of course you can stop the session when the "wake up phrase" is heard.
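For example (a rough sketch; it assumes speechRecognizer was already created and its constraints compiled elsewhere on the page):

protected override async void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);
    // Start listening as soon as the page is shown, without any button click.
    await speechRecognizer.ContinuousRecognitionSession.StartAsync();
}

protected override async void OnNavigatedFrom(NavigationEventArgs e)
{
    base.OnNavigatedFrom(e);
    // Stop the session when leaving the page.
    await speechRecognizer.ContinuousRecognitionSession.StopAsync();
}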
I agree that you can force it to listen again in the Completed event, but I prefer to use speechRecognizer.ContinuousRecognitionSession.StartAsync() like this in the SpeechContinuousRecognitionSession.Completed event handler:
if (args.Status != SpeechRecognitionResultStatus.Success)
{
    if (args.Status == SpeechRecognitionResultStatus.TimeoutExceeded)
    {
        await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () =>
        {
            // Show the state on the UI
        });
        await speechRecognizer.ContinuousRecognitionSession.StartAsync();
    }
    ...
}
As for the time limit: I just tested it, and by default continuous dictation lasts about 5 seconds without any voice at the start and then times out. I also tested setting the timeout like this:
speechRecognizer.Timeouts.InitialSilenceTimeout = TimeSpan.FromSeconds(10.0);
It worked on my side.
Actually, for your scenario you can refer to an official video: Cortana and Speech Platform In Depth. The sample in this video listens for two sentences: "take a note" and "save trip".

Related

How can I get who paused the video in Youtube API? (with Socket.io)

Basically, I'm challenging myself to build something similar to watch2gether, where you can watch YouTube videos simultaneously through the YouTube API and Socket.io.
My problem is that there's no way to check whether the video has been paused other than using the 'onStateChange' event of the YouTube API.
But since I can only listen for the actual pause EVENT rather than the CLICK itself, when I emit a pause command and broadcast it via the socket, the players pausing in the other clients fire the event again, so I can neither track who clicked pause first nor prevent the pauses from looping.
This is what I currently have:
// CLIENT SIDE
// onStateChange event
function YtStateChange(event) {
    if (event.data == YT.PlayerState.PAUSED) {
        socket.emit('pausevideo', $user); // I'm passing the current user for future implementations
    }
    // (...) other states
}

// SERVER SIDE
socket.on('pausevideo', user => {
    io.emit('smsg', `${user} paused the video`);
    socket.broadcast.emit('pausevideo'); // broadcast sends the pause to every socket except the one that first clicked pause, since that player is already paused from interacting with the iframe
});

// CLIENT SIDE
socket.on('pausevideo', () => {
    ytplayer.pauseVideo(); // The problem: once this pauses the video, onStateChange obviously fires again, resulting in an infinite amount of pauses (as long as there's more than one user in the room)
});
The only possible solution I've thought of is to use a separate PLAY/PAUSE button instead of the actual YouTube player controls in the iframe, catch the click events there, and pause the player from those. But I know countless websites that use the plain iframe and still catch these kinds of events, and I couldn't find a way to do it with my current knowledge.
If the goal here is to ignore a YT.PlayerState.PAUSED event when it was specifically caused by your own earlier call to ytplayer.pauseVideo(), you can do that by recording a timestamp when you call ytplayer.pauseVideo() and then checking that timestamp when you get a YT.PlayerState.PAUSED event, to see whether the pause occurred because you just called ytplayer.pauseVideo().
The general concept is like this:
let pauseTime = 0;
const kPauseIgnoreTime = 250; // experiment with what this value should be

// CLIENT SIDE
// onStateChange event
function YtStateChange(event) {
    if (event.data == YT.PlayerState.PAUSED) {
        // only send the pausevideo message if this pause wasn't caused by
        // our own call to .pauseVideo()
        if (Date.now() - pauseTime > kPauseIgnoreTime) {
            socket.emit('pausevideo', $user); // I'm passing the current user for future implementations
        }
    }
    // (...) other states
}

// CLIENT SIDE
socket.on('pausevideo', () => {
    pauseTime = Date.now();
    ytplayer.pauseVideo();
});
If you have more than one of these players in your page, then (rather than a single variable like this) you can store the pauseTime on a relevant DOM element related to the player the event is associated with.
You can experiment to see what value is best for kPauseIgnoreTime. It needs to be large enough that any YT.PlayerState.PAUSED event caused by you specifically calling ytplayer.pauseVideo() is detected, but not so long that it catches a case where someone pauses and then unpauses relatively soon after.
I actually found a solution while working around the other answer; I'm posting it here in case anyone gets stuck with the same problem and ends up here.
Since socket.broadcast.emit doesn't emit to the sender itself, I created a bool ignorePause and set it to true only when the client receives a pause request.
Then I only emit the socket event if the pause request wasn't already broadcast and received; if it was, the emit is skipped and the bool is set back to false in case this client/socket pauses the video afterwards. A sketch of this flag approach follows.
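In code, the idea looks roughly like this (a sketch; the ignorePause name and surrounding structure are illustrative):

// CLIENT SIDE
let ignorePause = false;

// onStateChange event
function YtStateChange(event) {
    if (event.data == YT.PlayerState.PAUSED) {
        if (ignorePause) {
            // This pause came from a received broadcast; don't re-emit it.
            ignorePause = false;
        } else {
            socket.emit('pausevideo', $user);
        }
    }
    // (...) other states
}

socket.on('pausevideo', () => {
    ignorePause = true; // the next PAUSED event is remote-triggered
    ytplayer.pauseVideo();
});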

How to leave a socket room with vue-socket and rejoin without duplicate messages?

When I join the room, then leave the route and go back, and then use the chat I've built, I get duplicate messages: each message arrives as many times as I have left and rejoined.
This problem goes away when I hard refresh.
I've tried everything I could find thus far, and have been unable to get it to work.
On the client side, during beforeRouteLeave, beforeDestroy and window.onbeforeunload, I tried:
this.$socket.removeListener("insertListener"); --> tried with all
this.$socket = null
this.$socket.connected = false
this.$socket.disconnected = true
this.$socket.removeAllListeners()
this.$socket.disconnect()
During the same events, I also sent this.$socket.emit("leaveChat", roomId), and on the server side, inside the io.on("connection") handler's socket.on("leaveChat", function(roomId) {}) receiver, I tried the following:
socket.leave(roomId) --> according to the docs, this is what should work;
socket.disconnect()
socket.off() -- seems to be deprecated
socket.removeAllListeners(roomId)
There were a bunch of other things I tried that I can't remember, but I will update the post if I do.
Either it somehow disconnects but, upon rejoining, the previous listeners (or something) still remain, meaning every message is received once per rejoin; or, if I disconnect, I don't seem to be able to reconnect.
On joining, I emit the room id to the server and call socket.join(roomId).
All I want is that, without a refresh, when the user leaves the page they also leave the room, and when they go back they can rejoin, with no duplicate messages occurring.
I am currently trying to chew through the source code now.
Full disclosure here: I didn't read the full response posted by roberfoenix, but this is a common issue with socket.io, and it comes down to calling the 'on' event multiple times.
When you create an .on event for your socket, it's a bind, and you can bind to the same event multiple times.
My assumption is that when a user hits a page, you run something like
socket.on("joinRoom", data)
This in turn will join the room, pull your messages from Mongo (or something else) and then emit to the room (side note: using .once can help so you don't emit to every user when a user joins a room).
Now you leave the room and call socket.emit('leaveRoom', room); cool, you left the room. Then you go back into the room, and guess what: you have now bound to the same on event again, so when you emit, it emits twice to that user, and so on.
The way we addressed this is to place all our on events into a function and call that function once. So when a user first hits a page, it runs something like socketInit();
The socketInit function will have something like this
function socketInit() {
    if (init === false) {
        // Cool, it has not run yet, so bind our on events
        socket.on("event", (data) => { /* handle event */ });
        socket.on("otherEvent", (data) => { /* handle other event */ });
        init = true;
    }
}
Basically, init is a global variable: if it is false, bind your events; otherwise don't rebind.
This could be improved to use a promise, or could be done on connect, but if a user reconnects it may run again.
If you're using Vue-Socket and feel like you're going slightly mad having tried everything, this may be your solution.
Turns out challenging core assumptions and investigating from the ground up pays off. It is possible that you buried yourself so deeply in Socket.io that you forgot you were using Vue-Socket.
The solution in my case was using Vue-Socket's built in unsubscribe function.
With Vue-Socket, one of the ways you can initially subscribe to events is as follows:
this.sockets.subscribe('EVENT_NAME', (data) => {
    this.msg = data.message;
});
Because you're using Vue-Socket, not the regular client, you also need to use Vue-Socket's way of unsubscribing right before you leave the room (unless you're after a very custom solution). I suspect this is why many of the other things I tried didn't work, or did next to nothing!
The way you do that is as follows:
this.sockets.unsubscribe('EVENT_NAME');
Do that for any events causing you trouble in the form of duplicates. The reason you get duplicates in the first place, especially after leaving and rejoining a room, is that the previous event listeners are still running, so a single user ends up acting as two or more listeners.
An alternative possibility is that you're emitting the message to everyone, including the original sender, when you should most likely be emitting it to everyone except the sender (see the socket.io emit cheatsheet).
If the above doesn't solve it for you, make sure you're actually leaving the room, and doing so server-side. You can accomplish that by emitting a signal to the server right before leaving the route (in case you're using a reactive single-page application), receiving it server-side, and calling socket.leave(yourRoomName) inside your io.on("connection", function(socket) {}) handler, as sketched below.
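For illustration, the server-side part of that last suggestion could look something like this (a sketch; the "chatMessage" event name and message handling are placeholders):

// SERVER SIDE
io.on("connection", function(socket) {
    socket.on("leaveChat", function(roomId) {
        socket.leave(roomId); // stop receiving this room's messages
    });

    socket.on("chatMessage", function(roomId, msg) {
        // emit to everyone in the room except the sender
        socket.to(roomId).emit("chatMessage", msg);
    });
});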

Is it possible for Alexa to wait in a skill without directly awaiting user input?

I realize the question may be badly phrased, but that is the best I could come up with.
My issue is that I use Alexa in a scenario where I sporadically give an Alexa skill commands (say every few minutes), and I don't want to have to re-invoke that skill every time.
Currently, after I give a command, Alexa replies that she is performing that action, but at the end she expects new user input via:
this.emit(':responseReady');
However, that isn't quite what I want, since most of the time I don't immediately want to give another command. Instead, a few minutes later, I will want to interact with that skill again.
If I completely exit the skill, though, I will have to re-invoke it next time, and get the whole skill welcome message again ("Welcome to skill name. You can say..."). I don't see that as optimal either.
Is there a way to keep that skill "open/active" so that the next command I give is interpreted in the context of that skill, without having to emit :responseReady (which expects an immediate response) and without having to relaunch the skill ("Alexa, open skill name")?
I figured out the simplest way to wait for a user response: in your reply you can include some silent audio. The user can always interrupt it by saying "Alexa, [intent]", which will trigger the corresponding intent in your app.
Note that an audio file must be less than 90 seconds long; I used an 80-second mp3. Grouping multiple audio files in one reply does not help. The best you can do is emit two replies, which gives you about 160 seconds of waiting time.
Here is some example code that I use in my skill:
// Get silence to wait for a user input.
var silence = '<audio src="' + PATHS.SILENCE_80_SEC + '" />';
var reply = 'Please go to <emphasis level="moderate">www.plumhead.xyz/pair</emphasis> and tell me your unique pairing code. Again, go to <emphasis level="moderate">www.plumhead.xyz/pair</emphasis> to get a pairing code. When you are ready say: "Alexa, and your pairing code".' + silence;
alexa.emit(':ask', reply, reply);
To generate a silent mp3 for Alexa you can use ffmpeg. The following commands generate an 80-second mp3 and then convert it to a format that Alexa accepts:
ffmpeg -f lavfi -i anullsrc=r=44100:cl=mono -t 80 -q:a 9 -acodec libmp3lame out.mp3
ffmpeg -y -i out.mp3 -ar 16000 -ab 48k -codec:a libmp3lame -ac 1 silence-80-sec.mp3
You could do this in a hacky way by enabling audio streaming in your skill. Essentially, you would give your response and then tack on a silent (or not-so-silent) audio track to keep your skill open.
The user will need to say the wake word to interrupt the audio, and you will need to say the next thing before the audio runs out; otherwise you will need to open the skill again.
Enabling audio does come with some caveats; in particular, you need to handle all of the associated built-in intents. See
https://developer.amazon.com/blogs/post/Tx1DSINBM8LUNHY/New-Alexa-Skills-Kit-ASK-Feature-Audio-Streaming-in-Alexa-Skills
Keeping your skill open indefinitely is not supported now, and I do not think it will ever be supported, given the security concerns involved in allowing a skill to listen to the user all the time.
But if your problem is the user getting the whole welcome message again when coming back to the skill, you can build a better experience by storing each user's last performed action in a database. You can use the user id sent as part of every request to identify users. The next time a user comes back to the skill, you can use the database to recover the context and give an appropriate response. Of course, to make this work you will need to update the database for each action the user performs.
In Node.js you will get the user id value in this.event.session.user.userId.
Here is a sample code snippet to showcase the welcome message logic:
"LaunchRequest": function () {
    var userId = this.event.session.user.userId;
    dbHelper.getUser(userId, function (response) {
        if (response && response.lastPerformedAction) {
            // respond based on context
        }
        else {
            // respond with the usual welcome message
        }
    });
}

Lync 2013 SDK - Join Conference & Connect AVModality when "Join meeting audio from" setting set to "Do not join audio"

I'm rather new to the Lync 2013 SDK (I've been using it for a couple of weeks now) and have been able to figure out almost everything I need except for this...
When I join a conference (using ConversationManager.JoinConference()), it joins fine. However, in certain cases (not all), I then want to connect the AVModality on the conference. Sometimes it works; sometimes it just sits in "Connecting" and never connects (even though I've called EndConnect).
What I've found is that the setting in Skype's Options -> Skype Meetings -> "Joining conference calls" section seems to override my code. Maybe a race condition?
When the setting is "Do not join audio" and "Before I join meetings, ask me which audio device I want to use" is NOT checked (meaning I get no prompt when joining): the conference joins, and the AVModality goes Disconnected -> Connecting -> Disconnected. Then my code triggers a BeginConnect, and the AVModality goes Disconnected -> Connecting and never resolves (sometimes I get a fast busy tone).
When "Before I join meetings, ask me which audio device I want to use" IS checked (meaning I get the prompt): the conference joins, the prompt asks how to connect, and if I select Skype for Business, the audio connects fine (as expected). Interestingly, if I hang up the call using the Lync UI (the AVModality goes to Disconnected), it then immediately connects back again (presumably from my BeginConnect).
Here's where it gets really convoluted:
If I call BeginConnect while the AVModality state is Connecting, within the ModalityStateChanged event handler, the following happens:
The conference joins and the prompt asks me how to connect (the AVModality state is "Connecting" at this point, until a decision is made on the prompt), which means my BeginConnect fires. Then, if I choose "Do not join audio" in the prompt, the AVModality status goes Connecting -> Disconnected -> Connecting -> Joining -> Connected. So my BeginConnect, already in progress, still works in this case, as long as it fires BEFORE the selection of "Do not join audio".
So I'm wondering whether the "Do not join audio" selection (with or without the prompt) actually sets some other property that prevents the AVModality from being connected after that point without some additional hocus pocus. If so, I'd like to know the additional hocus pocus I need to perform :)
Thanks for any and all help!
It's come down to this: whether or not joining the conference also joins the audio, I've handled every scenario except one, which I still can't figure out:
1. I need the conference audio to be joined, but the user has selected NOT to join the audio (either in the prompt or in the Skype options settings).
In this case, I've added an event handler for the modality state change event, and when NewState == Disconnected, I trigger a BeginConnect on the modality itself. This works fine. Within the callback, I call EndConnect. However, the AVModality state stays in "Connecting" and never resolves to connected. The UI shows the audio buttons, but all grayed out (as is normal while connecting). I'm not sure how to make it finish connecting.
Here's a snippet of code:
if (merge)
{
    myHandler = delegate (object sender1, ModalityStateChangedEventArgs e1)
    {
        AVModality avModality = (AVModality)sender1;
        Globals.ThisAddIn.confConvo = avModality.Conversation;
        if (e1.NewState == ModalityState.Connected)
        {
            DialNumberInSkype(meetingInfo);
            avModality.ModalityStateChanged -= myHandler;
        }
        if (e1.NewState == ModalityState.Disconnected)
        {
            object[] asyncState = { avModality, "CONNECT" };
            avModality.BeginConnect((ar) =>
            {
                avModality.EndConnect(ar);
                DialNumberInSkype(meetingInfo);
            }, asyncState);
            avModality.ModalityStateChanged -= myHandler;
        }
    };
}
EDIT:
For some reason, I'm not able to add a comment right now...
I tried setting the endpoint as you suggested. However, I get an ArgumentException: "Value does not fall within the expected range." So I tried hardcoding the uri value in CreateContactEndpoint to "sip:my_login@domain.com" (with the real value, of course) and got the same ArgumentException. I added a breakpoint before this and was able to see the value of avModality.Endpoint, and it is actually set the entire time... it's not null or unset when I'm trying to call BeginConnect.
When JoinConference() is invoked, the audio modality will be connected even without explicitly invoking BeginConnect().
When the prompt asking for audio device selection is shown (i.e., when the "ask before join" option is set in Skype), the conversation property ConferenceEscalationProgress will have the value AwaitingJoinDialogResponse.
Setting the conversation property ConferenceJoinDialogCompleted to true will initiate the modality connection even though the prompt is not closed, for example as sketched below.
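Something along these lines (only a sketch: I'm assuming ConferenceEscalationProgress and ConferenceJoinDialogCompleted are exposed as ConversationProperty enum members, which you should verify against your SDK version):

Conversation conversation = avModality.Conversation;

// Tell Lync the join dialog is complete so the modality connection proceeds.
// (ConferenceJoinDialogCompleted as an enum member is an assumption here.)
conversation.BeginSetProperty(
    ConversationProperty.ConferenceJoinDialogCompleted,
    true,
    ar => conversation.EndSetProperty(ar),
    null);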
Edited
If "Do not join audio" is selected, the modality will be disconnected, and at that point you are trying to invoke BeginConnect(). Try setting the modality endpoint before invoking BeginConnect():
conversation.Modalities[ModalityTypes.AudioVideo].Endpoint = lyncClient.Self.Contact.CreateContactEndpoint(lyncClient.Self.Contact.Uri);

How to monitor keyboard events from X11

I know there have been a few questions like this, but a lot of the answers come with a lot of buts, ifs, and "you shouldn't do that".
What I'm trying to do is have a background program that can monitor keyboard events from X11. This is on an embedded device, and it will have a main app basically running in something like a kiosk mode. We want a background app that manages a few things and probably provides a back-door hook, but this app generally will not have focus.
I can't use the main app for this, because the background app is partly there as a fail-safe if the main app ever fails, and to do some dev-type things that bypass the main app.
The best question I found is a few years old, so I'm not sure how up to date it is. This was extremely easy to do in Windows.
X KeyPress/Release events capturing irrespective of Window in focus
The correct way to do this is with Xlib. Using this library, you can write code like this:
while (1) {
    XNextEvent(display, &report);
    switch (report.type) {
        case KeyPress:
            if (XLookupKeysym(&report.xkey, 0) == XK_space) {
                fprintf(stdout, "The space bar was pressed.\n");
            }
            break;
    }
}
// This event loop is rather simple: it only checks for KeyPress events.
// XNextEvent waits for an event to occur. You can use other methods to get
// events, which are documented in the manual page for XNextEvent.
// To check whether a certain key was pressed, put "case KeyPress:" in your
// switch on report.type, in the same manner as a "case Expose:" handler.
You could also use poll or select on the special device file that is mapped to your keyboard; in my case it is /dev/input/event1. A minimal sketch of this approach is below.
If you have doubts about which special file is mapped to your keyboard, read the file /var/log/Xorg.0.log (search for the word "keyboard").
Here is another link of interest: Linux keyboard event capturing /dev/inputX
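If you go the /dev/input route, the reading side could look like this (a sketch; it assumes the keyboard is /dev/input/event1 and that the process has permission to read it, which usually means root or membership in the input group):

// Minimal sketch: read key events straight from an evdev device node.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/input.h>

int main(void)
{
    int fd = open("/dev/input/event1", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct input_event ev;
    while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {
        // EV_KEY events cover presses and releases; value 1 = press, 0 = release
        if (ev.type == EV_KEY && ev.value == 1) {
            printf("key code %d pressed\n", ev.code);
        }
    }

    close(fd);
    return 0;
}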
