Google actions sdk 2 nodejs response / chat bubble limit - node.js

I am using the Google-actions-sdk v2 and trying to build a gaming application. In the documentation it says conv.ask() is limited to 2 responses per turn. So this basically means I can only show 2 chat bubbles, and then it will not allow me to display more until after user input. But when I look at some other published applications, they have many more than 2 displayed in a row. I can't seem to understand or find any info on how they get around this limitation. 2 seems an unreasonable limit.
For speech you can merge text lines together and it will sound fine, but presentation on screen is awful without being able to break it down to more responses.
Does anyone out there have any insight on this?

In fact, putting everything in a single line would sound bad as well. I recommend separating the texts with SSML instead.
You can use the break tag to put a pause between each text.
<speak>
I can pause <break time="3s"/>.
I can pause by second time <break time="3s"/>.
</speak>
Here you have the documentation.
If what you want is to offer multiple selection options, you can also use suggestion chips.
https://developers.google.com/actions/assistant/responses#suggestion_chip
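Going back to the SSML suggestion, a rough sketch of how several lines could be merged into one spoken response with pauses between them. It is written in Python only to stay library-neutral; `build_ssml` and its `pause` parameter are hypothetical names and not part of the Actions SDK.

```python
from xml.sax.saxutils import escape


def build_ssml(lines, pause="1s"):
    """Join text lines into one SSML document with a pause between lines."""
    # Escape &, < and > so plain text cannot break the SSML markup.
    escaped = [escape(line) for line in lines]
    separator = f' <break time="{pause}"/> '
    return "<speak>" + separator.join(escaped) + "</speak>"


print(build_ssml(["I can pause.", "I can pause a second time."], pause="3s"))
```

The resulting string would be used as the speech of a single response, so it only helps with how the turn sounds. It does not add more chat bubbles on screen.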

Related

Cognitive Service show Fill words and hide personal data

We use the Azure Batch Transcription Service to get the Transcript of an Audio / Speech.
We noticed that filler words like "uhm" or "hm" are sometimes included, but very rarely. We have used this service for a few months already, and we have the feeling that there are fewer of them now (fewer "uhm"s in the transcript).
Q1: Is there a way to get the filler words? We want to receive them within the transcript.
Also, as we sometimes record conversations, it can happen that someone says a name or talks about other personal information.
Q2: Is there a way to "filter" that personal information / those words within the transcript?
Sorry, I don't think there is a way to filter personal data or words when transcribing. We can only do profanity filtering for batch transcription.
But I agree this feature would be very helpful. I will forward this feature request to the product group to see if we can have this in the future.
What I would suggest is to post-process the transcript as a last step to filter out the sensitive information.
Regards,
Yutong
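The suggestion to filter sensitive information after transcription could look roughly like the sketch below, which masks known names and long digit runs in a finished transcript. This is an assumption about what such a filter might do, not a feature of the Batch Transcription service. `redact_transcript`, the name list and the `[REDACTED]` marker are all hypothetical, and simple patterns like these will miss a lot of personal data.

```python
import re


def redact_transcript(text, names, mask="[REDACTED]"):
    """Mask the given names and any run of 4 or more digits in a transcript."""
    for name in names:
        # \b keeps "Ann" from matching inside "Annual"; matching ignores case.
        text = re.sub(rf"\b{re.escape(name)}\b", mask, text, flags=re.IGNORECASE)
    # Long digit runs are treated as phone or account numbers (an assumption).
    return re.sub(r"\d{4,}", mask, text)


print(redact_transcript("Uhm, this is Ann Smith, my number is 5551234.", ["Ann Smith"]))
```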

Having several GoogleResponses in a row without user input or interaction

I am working on a cooking recipe app for google home and I need a way to string several GoogleResponses (SimpleResponse etc..) together without requiring user interaction between them.
I have searched for other answers pertaining to this, and while I have found a few similar questions to mine, the replies tend to be along the lines of "the system was designed for dialogues so what would be the point?".
I fully understand this point of view, however because of the nature and behaviour requirements of the app that I am developing I find myself in need of this particular possibility.
The recipes are divided into steps (revolutionary, I know..) and there is roughly a 1 to 1 correspondence between steps and GoogleResponses.
To give an example of how a typical recipe unfolds it is usually like this (this is a simplification of course):
main content -> question -> main content -> question -> etc..
With each instance of "main content" being a step of the recipe and each "question" requiring user input.
If it were just like this all the time, there would not be a problem: I could just bundle each "main content -> question" section into one GoogleResponse and be done.
However there are often times where the recipe flows more like:
main content -> main content -> main content -> question
With each "main content" being a step in the recipe, it does not make sense in this context to bundle them together into the same response (there is a system for the user to move back and forth between steps).
I was originally using MediaResponses for the "main content" sections as those do not require user input to move onto the next step, but due to various reasons I won't go into here as this is already getting quite long, the project manager has decided that MediaResponses should not be used in this project.
The short answer is the one you already encountered - trying to make conversational actions not-so-conversational doesn't work very well. However, there are a few things you can look into.
Recipe Structured Data
Since you're working on a recipe action, specifically, it may be worthwhile to use the standard recipe support that comes with the Assistant.
On the upside - people will be familiar with it, and you don't need to do much code, just provide markup on a webpage.
On the downside - if you have other requirements for how you want the interaction to go, it isn't that flexible. (For example, if you're asking questions at some of the recipe points, or if you want to offer measurement adjustments based on number of people to serve.)
Misuse the "No Input" event
You can configure dynamic reprompts so you get an event if the user doesn't say anything after a few seconds. If they want to speed things up, they could ask for the next step specifically, or you can catch the actions_intent_NO_INPUT event in Dialogflow and advance yourself.
There are a few downsides here:
Not all devices support no-input. Mobile devices, for example, won't generate this event.
This may only be valid for two no-input events in a row. On the third event, the Assistant may automatically close the conversation. (The documentation is unclear on this, and the exact behavior has changed over time.)
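The no-input idea can be sketched as a small state function: advance one step on each silent turn, and ask for input before a further silence could end the conversation. This is plain Python rather than a Dialogflow handler. `on_no_input`, its parameters and the cap of two silent turns are assumptions taken from the caveat above, not documented behaviour.

```python
def on_no_input(step, silent_count, total_steps, max_silent=2):
    """Return (next_step, next_silent_count, needs_prompt) for a no-input event.

    The caller is expected to reset silent_count to 0 whenever the user speaks.
    """
    last_step = total_steps - 1
    next_step = min(step + 1, last_step)
    next_count = silent_count + 1
    # Ask the user something on the last allowed silent turn or the final step,
    # because one more silence may make the Assistant close the conversation.
    needs_prompt = next_count >= max_silent or next_step == last_step
    return next_step, next_count, needs_prompt
```

With five steps, the first silent turn moves from step 0 to step 1 without a question, and the second one moves to step 2 and asks the user to respond.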
Media Response
You don't say why Media Response "shouldn't be used", but this is one of the only ways to trigger an event when speaking is completed.
There are several downsides, however:
There are a number of bugs with Media Response around quitting
On devices with screens, there is a media player. Since the media itself is incidental to what you're doing, having the player doesn't make sense
It isn't supported on all surfaces
Interactive Canvas
A similar approach, however, would be to use the Interactive Canvas. This gives you an HTML page with JavaScript that you control, including being able to generate responses to the server as if the user spoke them (or as if they touched a suggestion chip). You can also listen to events for when the generated speech has finished.
There are, however, a number of downsides which probably prevent you from using this right now:
The biggest is that the Interactive Canvas can only be used for games right now. (But this seems to be a policy decision, rather than a technical one. So perhaps it will be lifted in the future.)
It does not work on smart speakers - only some devices with screens.
Combining the above approaches
One way to get around the device limitations of the Interactive Canvas and the poor visuals that accompany Media Response might be to mix the two. For devices that support IC, use that. If not, try using Media Response. (You may even wish to consider the no-input reprompt for some platforms.)
But this still won't work on all devices, and still has the limitation that Interactive Canvas is only for games right now.
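A minimal sketch of that mixed approach, choosing a strategy per device from its reported capabilities. The two capability strings are the names I believe Actions on Google surfaces report, but treat them as assumptions. `choose_strategy`, `supports_no_input` and the returned labels are hypothetical.

```python
# Capability strings believed to be reported by Actions on Google surfaces
# (assumptions; verify against the current documentation).
INTERACTIVE_CANVAS = "actions.capability.INTERACTIVE_CANVAS"
MEDIA_RESPONSE_AUDIO = "actions.capability.MEDIA_RESPONSE_AUDIO"


def choose_strategy(capabilities, supports_no_input=False):
    """Pick how to chain recipe steps on one device, in order of preference."""
    caps = set(capabilities)
    if INTERACTIVE_CANVAS in caps:
        return "interactive_canvas"
    if MEDIA_RESPONSE_AUDIO in caps:
        return "media_response"
    if supports_no_input:
        return "no_input_reprompt"
    # Fallback: bundle the steps and end the turn with a question.
    return "single_response"
```

The fallback branch is the ordinary conversational pattern, so devices that support none of the workarounds still get a working, if less fluid, recipe.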
Summary
There is no single, clear way to handle this... and this isn't a feature they are likely to add given the conversational nature of the platform. However, there may be some workarounds which might work for your scenario.

Bing Speech to Text API returning very wrong text

I am trying the "Bing Speech To Text API" on audio files that contain real conversations between a call-center agent and a customer who calls to resolve a question. Thus, these recordings have two people talking, and sometimes have long periods of silence while the customer is waiting for an answer from support. The recordings are 5 to 10 minutes long.
My questions are:
What is the best approach to transcribe audio like that to text, using Microsoft Cognitive Services?
What APIs do I have to use, besides Bing Speech To Text?
Do I have to cut or convert the audio before sending it to Bing Speech To Text?
I am asking because the Bing Speech To Text API is returning text that is very, very different from the audio content. It is impossible to use or understand. But, of course, I think I am making some mistake.
Please, could you explain to me the best strategy to work with audio files like this?
I would be very glad for any help.
Best Regards,
I had run into this problem with conversations as well. Make sure that the transcription mode is set to "conversation" instead of "interactive."
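As a sketch of where that mode is set: in the legacy Bing Speech REST API, as far as I recall, the recognition mode was part of the request URL. The base URL and path layout below are written from memory and should be treated as assumptions to check against the documentation. `recognition_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed base URL of the legacy Bing Speech REST endpoint.
BASE_URL = "https://speech.platform.bing.com/speech/recognition"
MODES = {"interactive", "conversation", "dictation"}


def recognition_url(mode, language="en-US", result_format="simple"):
    """Build a recognition URL with the mode in the path."""
    if mode not in MODES:
        raise ValueError(f"unknown recognition mode: {mode}")
    query = urlencode({"language": language, "format": result_format})
    return f"{BASE_URL}/{mode}/cognitiveservices/v1?{query}"


print(recognition_url("conversation"))
```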

searching YouTube for videos with specific range of views eg. between 9,000,000 and 11,000,000

first time posting.
I wanted to ask if anyone knows how I can search on YouTube for, let's say, music videos that have been viewed between a set number of times. Like the title says, for example, between 9 and 11 million times.
One reason I want to do this is because I want to find good music that I haven't heard before. The logic I'm working on is that the Got Talent type videos that get viewed millions of times are generally viewed that many times for one of two reasons. 1) they're amazing. 2) they're embarrassingly horrible.
And though I don't think a song being popular will necessarily mean I'll like it, I'm hoping this method will be successful to some degree.
Another reason is to look for trailers for independent films with a similar logic as above. Though with these movies I think I only hear about them six months to a year after they've been released because they're flying under the radar.
If I were to be able to search for movie trailers with 'x' number of views though.. for example, between 500,000 and a million, maybe I'd be able to find movies that I'll like quicker than via time passing and them getting mentioned to me by a friend.
Any help would be greatly appreciated as I've wanted to be able to perform these kinds of searches for a while now.
thanks
You will need to use YouTube API v3.
I haven't written this exact request, but it looks like you can list videos and then filter by 'chart' = 'mostPopular'
https://developers.google.com/youtube/v3/docs/videos/list
Perhaps a bit of background reading on the API would help too...
https://developers.google.com/youtube/v3/
First off, you would need the Youtube Data API. "v3" means nothing because it's simply the current version, like "Windows 10."
The API lets you get a video's view count, but doesn't put it in a range like 9 million to 11 million.
Youtube's own search function is pretty sophisticated. For instance,
https://www.youtube.com/results?search_query=movie+trailer&search_sort=video_view_count&filters=month. This gives all results for "movie trailer" within the last month, sorted by view count. You can customize the URL, e.g. "week" instead of "month" would return only trailers from the last week. Or "year", etc. Essentially this is a "Videos: List: MostPopular" query, with a subject filter.
I have a few YouTube API scripts, and I hardly think it's worth the hassle to do it that way when YouTube's advanced search gets you 99% of the way there. If you did, you would need to do a Search:list query for a given subject (e.g. "movie trailer"), limited to a given time frame (e.g. last month). Then, for each video ID, make a Videos:list query to get its view count. Then print all, sorted by views.
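The last step of that API route, keeping only the videos inside a view range, can be sketched offline. The function below assumes a Videos:list response requested with the snippet and statistics parts, where `viewCount` arrives as a string. `filter_by_views` and the sample data are hypothetical, and the network calls themselves are left out.

```python
def filter_by_views(videos_response, min_views, max_views):
    """Return (title, views) pairs within the range, most viewed first."""
    matches = []
    for item in videos_response.get("items", []):
        # viewCount is reported as a string and may be missing if hidden.
        raw = item.get("statistics", {}).get("viewCount")
        if raw is None:
            continue
        views = int(raw)
        if min_views <= views <= max_views:
            matches.append((item["snippet"]["title"], views))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


sample = {"items": [
    {"snippet": {"title": "A"}, "statistics": {"viewCount": "9500000"}},
    {"snippet": {"title": "B"}, "statistics": {"viewCount": "12000000"}},
    {"snippet": {"title": "C"}, "statistics": {"viewCount": "10800000"}},
]}
print(filter_by_views(sample, 9_000_000, 11_000_000))
```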

How to Look Up Spotify IDs (Song / Track IDs) in Bulk?

I have a list of songs - is there a way (using the Spotify / Echo Nest API) to look up the Spotify ID for each track in bulk?
If it helps, I am planning on running these IDs through the "Get Audio Features" part of their API.
Thanks in advance!
You can use the Spotify Web API to retrieve song IDs. First, you'll need to register to use the API. Then, you will need to perform searches, like in the example linked here.
The Spotify API search will be most useful for you if you can provide specifics on albums and artists. The search API allows you to insert multiple query strings. Here is an example (Despacito by Justin Bieber):
https://api.spotify.com/v1/search?q=track:"despacito"%20artist:"bieber"&type=track
You can paste that into your browser and scan the response if you'd like. Ultimately you are interested in the song id, which you can find in the uri:
spotify:track:6rPO02ozF3bM7NnOV4h6s2
Whichever programming language you choose should allow you to loop through these calls to get the song IDs you want. Good luck!
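A minimal sketch of the two pure steps in that loop: building the search URL for one track and pulling the ID out of a returned URI. `search_url` and `track_id_from_uri` are hypothetical helpers. The actual request is not shown, and it would also need an access token from the registration step.

```python
from urllib.parse import quote, urlencode


def search_url(track, artist):
    """Build a Spotify search URL using the track: and artist: field filters."""
    query = f"track:{track} artist:{artist}"
    # quote_via=quote encodes spaces as %20 rather than "+".
    params = urlencode({"q": query, "type": "track", "limit": 1}, quote_via=quote)
    return f"https://api.spotify.com/v1/search?{params}"


def track_id_from_uri(uri):
    """Extract the ID from a URI of the form spotify:track:<id>."""
    prefix, _, track_id = uri.rpartition(":")
    if prefix != "spotify:track" or not track_id:
        raise ValueError(f"not a track URI: {uri}")
    return track_id


print(search_url("despacito", "bieber"))
print(track_id_from_uri("spotify:track:6rPO02ozF3bM7NnOV4h6s2"))
```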
It has been a few years, and I am curious how far you got with this project. I was doing the same thing around 2016 as well. I am just picking up the project again, and noticing you still cannot do large bulk ID queries by Artist,Title.
For now I am just handling HttpStatusCode 429 and sleeping the thread as I loop through a library. It's kind of slow, but it gets the job done. After I get them, I do the AudioFeatures query for 100 tracks at a time, so it goes pretty quickly that way.
So far, this is the slowest part and I really wish there was a better way to do it, or even a way to make your own 'Audio Features' based on your library. It just takes a lot of computing cycles. However, one possible option might be to only do it for tracks that you cannot find on Spotify.
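The pattern described here, sleeping on 429 and then querying audio features 100 IDs at a time, might be sketched as follows. `request` is a hypothetical callable standing in for the real HTTP call and returning `(status, retry_after_seconds, body)`. The one-second fallback wait is an assumption.

```python
import time


def chunked(items, size=100):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def call_with_retry(request, max_attempts=5, sleep=time.sleep):
    """Call request() until it stops returning 429, waiting between attempts."""
    for _ in range(max_attempts):
        status, retry_after, body = request()
        if status != 429:
            return status, body
        # Wait as long as the server asked for, or 1 second if it did not say.
        sleep(retry_after or 1)
    raise RuntimeError("still rate limited after retries")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.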
