Alexa - How to send image to user? - node.js

I'm using a Lambda function for my Alexa Skill. For my launch intent, I query DynamoDB and get back a String that I want to convert into a QR code and then return to the Alexa device as an image inside the responseBuilder.
Alexa displays images from external URLs just fine, for example:
const rabbitImage = "https://i.imgur.com/U6eF0oH.jpeg";

return responseBuilder
  .speak(say)
  .withStandardCard("Welcome to Alexa", "description", rabbitImage, rabbitImage)
  .reprompt('try again, ' + say)
  .getResponse();
But I'm stuck on how to send the QRCode back to the Alexa Device in the responseBuilder.
I'm using a Node.js library called qrcode that can convert the String into a QR code and then into base64:
https://www.npmjs.com/package/qrcode
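For example, something like this gives me the QR code as a base64 data URL (roughly following the library's README; the string is just the value I get back from DynamoDB):
const QRCode = require('qrcode');

// Resolves to something like "data:image/png;base64,iVBORw0KGgo..."
QRCode.toDataURL('8306e21d-0c9e-4465-91e9-0cf86fca110d')
  .then(dataUrl => console.log(dataUrl));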
But according to the Alexa docs, to send a "card" (i.e. an image) back to the user, it has to be a URL:
https://developer.amazon.com/en-US/docs/alexa/custom-skills/include-a-card-in-your-skills-response.html
The Alexa Skills Kit provides different types of cards:
A Standard card also displays plain text, but can include an image. You provide the text for the title and content, and the URL for the image to display.
So I'm not sure if the base64 generated by the qrcode library would work in this case.
What's the best way to send the dynamically generated QRCode back to the Alexa Device as a response in this scenario?
const LaunchRequest_Handler = {
  canHandle(handlerInput) {
    const request = handlerInput.requestEnvelope.request;
    return request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    const responseBuilder = handlerInput.responseBuilder;

    // Perform query to DynamoDB
    var stringToCreateQRWith = "8306e21d-0c9e-4465-91e9-0cf86fca110d";

    // Generate qr code and send back to user here
    // ??? Unsure how to do this and what format to send it in
    var qrImageToSendToUser = ???

    return responseBuilder
      .speak(say)
      .withStandardCard("Welcome to Alexa", "description", qrImageToSendToUser, qrImageToSendToUser)
      .reprompt('try again, ' + say)
      .getResponse();
  }
};

As @kopaka proposed, this is the way to go; there is no way around it.
As per the documentation, there are a few things you need to keep in mind.
For the images themselves, you will want to create two versions, at 720px x 480px and at 1200px x 800px, so that they display nicely on multiple screen sizes. Otherwise there is no guarantee of the best experience for your users, as the device may scale the image up or down to fit.
For the storage, you need to make sure you can serve those images over HTTPS, with a valid SSL certificate trusted by Amazon.
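To make that concrete, here is a rough sketch of the flow, assuming you generate the QR code with the qrcode package and upload it to an S3 bucket you control using the aws-sdk v2 client (the bucket name and key here are placeholders):
const QRCode = require('qrcode');
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function createQrCardUrl(stringToCreateQRWith) {
  // Render the QR code to a PNG buffer in memory
  const png = await QRCode.toBuffer(stringToCreateQRWith, { width: 1200 });

  // Upload it to a bucket that serves content over HTTPS (placeholder names)
  const key = `qr/${stringToCreateQRWith}.png`;
  await s3.putObject({
    Bucket: 'my-skill-assets',   // placeholder bucket
    Key: key,
    Body: png,
    ContentType: 'image/png',
  }).promise();

  // URL passed to withStandardCard; the object must be publicly readable over HTTPS
  return `https://my-skill-assets.s3.amazonaws.com/${key}`;
}

// Inside the handler (which would need to be async):
// const qrImageToSendToUser = await createQrCardUrl(stringToCreateQRWith);
// return responseBuilder
//   .speak(say)
//   .withStandardCard("Welcome to Alexa", "description", qrImageToSendToUser, qrImageToSendToUser)
//   .reprompt('try again, ' + say)
//   .getResponse();
Note that the bucket (or a CloudFront distribution in front of it) has to allow the Alexa device to fetch the object anonymously over HTTPS with a certificate trusted by Amazon.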

Related

How do I create an expressjs endpoint that uses azure tts to send audio to a web app?

I am trying to figure out how to expose an Express route (i.e. GET /api/word/:some_word) which uses the Azure TTS SDK (microsoft-cognitiveservices-speech-sdk) to generate an audio version of some_word (in any format playable by a browser) and res.send()s the resulting audio, so that a front-end JavaScript web app can consume the API to play the audio pronunciation of the word.
I have the Azure SDK 'working': it creates an 'ArrayBuffer' inside my Express code. However, I do not know how to send the data in this ArrayBuffer to the front end. I have been following the instructions here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-text-to-speech?tabs=import%2Cwindowsinstall&pivots=programming-language-javascript#get-result-as-an-in-memory-stream
Another way to phrase my question would be: in Express, I have an ArrayBuffer whose contents are an .mp3/.ogg/.wav file. How do I send that file via Express? Do I need to convert it into some other data type (like a Base64-encoded string, or a Buffer)? Do I need to set particular response headers?
I finally figured it out seconds after asking this question 😂
I am pretty new to this area, so any pointers on how this could be improved would be appreciated.
import * as sdk from 'microsoft-cognitiveservices-speech-sdk';
const { SpeechSynthesisOutputFormat } = sdk;

// azureKey and the Express `app` are defined elsewhere in my server setup
app.get('/api/tts/word/:word', async (req, res) => {
  const word = req.params.word;
  const subscriptionKey = azureKey;
  const serviceRegion = 'australiaeast';
  const speechConfig = sdk.SpeechConfig.fromSubscription(
    subscriptionKey as string,
    serviceRegion
  );
  // Ask the service for Ogg/Opus output so the browser can play it directly
  speechConfig.speechSynthesisOutputFormat =
    SpeechSynthesisOutputFormat.Ogg24Khz16BitMonoOpus;
  const synthesizer = new sdk.SpeechSynthesizer(speechConfig);
  synthesizer.speakSsmlAsync(
    `
    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
      xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="zh-CN">
      <voice name="zh-CN-XiaoxiaoNeural">
        ${word}
      </voice>
    </speak>
    `,
    (resp) => {
      // resp.audioData is an ArrayBuffer; wrap it in a Buffer so Express can send it
      const audio = resp.audioData;
      synthesizer.close();
      const buffer = Buffer.from(audio);
      res.set('Content-Type', 'audio/ogg; codecs=opus; rate=24000');
      res.send(buffer);
    },
    (error) => {
      synthesizer.close();
      res.status(500).send(error);
    }
  );
});
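On the front end, a minimal way to consume this would be something like the following sketch (assuming the same route path; browser support for Ogg/Opus varies, so you may prefer an MP3 output format if you need wider compatibility):
// Play the pronunciation for a word returned by the /api/tts/word/:word route
const audio = new Audio('/api/tts/word/' + encodeURIComponent('你好'));
audio.play().catch(err => console.error('Playback failed:', err));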

Subscribe to long-processing request feedback on NodeJS server from client

I have created a Node JS server which does the following:
Uploads media files (videos and images) to the server using multer
If the media is an image, then resize it using sharp
If the media is a video, then resize and compress it using fluent-ffmpeg
Upload files to Firebase storage for backup
All of this is now working smoothly. The problem is that when an uploaded file is big, the request takes a long time to process, so I want to show some progress on the client side, as below:
State 1. The media is uploading -> n%
State 2. The media is compressing
State 3. The media is uploading to cloud -> n%
State 4. Result -> JSON = {status: "ok", uri: .., cloudURI: .., ..}
The Firebase Storage API has functionality like this when creating an upload task, as shown below:
let uploadTask = imageRef.put(blob, { contentType: mime });

uploadTask.on('state_changed', (snapshot) => {
  if (typeof snapshot.bytesTransferred == "number") {
    let progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
    console.log('Upload is ' + progress + '% done');
  }
});
I have found that it is possible to achieve this using WebSockets, but I am interested in whether there are other methods to do it.
The problem is also described here: http://www.tugberkugurlu.com/archive/long-running-asynchronous-operations-displaying-their-events-and-progress-on-clients
There is also one of the methods in Accessing partial response using AJAX or WebSockets?, but I am looking for a more flexible and professional solution.
I have solved this problem using GraphQL Subscriptions. The same approach can be realized using WebSockets. The steps to solve this problem are as below:
Post files to upload server
Generate a unique operation ID and send it as the response to the client
Ex: response = {op: "A78HNDGS89NSNBDV7826HDJ"}
Create a subscription by opID
Ex: subscription { uploadStatus(op: "A78HNDGS89NSNBDV7826HDJ") { status }}
Every time the status changes, send a request to the GraphQL endpoint, which publishes the data to the pubsub. To send a GraphQL request from the Node.js server you can use https://github.com/prisma-labs/graphql-request
Ex:
const { request } = require('graphql-request');

const GQL_URL = "YOUR_GQL_ENDPOINT";

// The argument name depends on your schema
const query = `query {
  notify(message: "Status text goes here")
}`;

request(GQL_URL, query).then(data =>
  console.log(data)
);
The notify resolver function then publishes the data to the pubsub:
context.pubsub.publish('uploadStatus', {
  status: "Status text"
});
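For completeness, the subscription resolver on the other side can be wired up roughly like this, assuming the graphql-subscriptions package and filtering by the operation ID (field and argument names are just illustrative):
const { PubSub, withFilter } = require('graphql-subscriptions');
const pubsub = new PubSub();

const resolvers = {
  Subscription: {
    uploadStatus: {
      // Only deliver events whose op matches the client's subscription argument
      subscribe: withFilter(
        () => pubsub.asyncIterator('uploadStatus'),
        (payload, variables) => payload.op === variables.op
      ),
    },
  },
};
// Note: for this filter to work, the publish call would also need to include
// the op field in its payload alongside the status.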
If you have more complicated architecture, you can use message brokers like RabbitMQ, Kafka etc.
If someone knows other solutions, please let us know )

Google Actions MediaResponse Callback Not Working on iPhone Google Assistant App, Works in Simulator and on Google Home Mini

I'm having an issue with my Google Assistant Action and using it in the Google Assistant Mobile app.
I am trying to play a tracklist of 1-3 minute mp3s using Media Responses and callbacks and it is working perfectly in the simulator and on my Google Home Mini, but not on the Google Assistant app on my phone.
What I've noticed happening is that the MediaResponse callback isn't sent when I test on iPhone. The first MediaResponse will play, but then the app is silent. It doesn't exit my action, though; it leaves the mic open, and when I try to talk to it again, whatever I say is sent to my action. This part is very similar to Starfish Mint's problem, though mine seems to work on my Google Home device. They said they fixed it by:
"After waiting 6 months, We manage to solve it ourselves. On
MEDIA_FINISHED, we must return Audio text within your media response
to get subsequent MEDIA_FINISHED event. We tested this with playlist
of 100 media files and it plays like a charm."
though I'm not entirely sure what that means.
This might be an obvious answer to my question, but where the docs say "Media responses are supported on Android phones and on Google Home", does this mean that they aren't supported on iPhone and that's the issue? Are there any workarounds for this, like using a Podcast action or something?
I have tried another audio-playing app, the Music Player Sample, which is one of Google's sample Dialogflow apps, and it also doesn't work on my phone though it does in the other places. Maybe it is just an iPhone thing?
The thing that I find confusing, though, is that when I check the capabilities of the action on my phone with conv.surface.capabilities.has("actions.capability.MEDIA_RESPONSE_AUDIO"), it reports that actions.capability.MEDIA_RESPONSE_AUDIO is supported. If it didn't have this I would be more inclined to believe media responses don't include iPhones, but it seems weird that it would list the capability and then not work.
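For reference, a guard on that capability would look roughly like this (the fallback reply is just illustrative):
if (conv.surface.capabilities.has('actions.capability.MEDIA_RESPONSE_AUDIO')) {
  conv.ask(new MediaObject({ name: 'name', url: 'url' }));
} else {
  conv.ask('Sorry, this device cannot play audio tracks.'); // illustrative fallback
}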
Here's the code where I am playing the first track:
app.intent('TreatmentResponse', (conv, context, params) => {
  var treatmentTracks = [{url: 'url', name: 'name'}, {url: 'url', name: 'name'}];
  var result = playNext(treatmentTracks[0].url, treatmentTracks[0].name);
  var response = result[0];
  conv.data.currentTreatment = 'treatment';
  conv.data.currentTreatmentName = 'treatmentName';
  conv.data.treatmentPos = 1;
  conv.data.treatmentTracks = treatmentTracks;
  conv.ask("Excellent, I'll play some tracks in that category.");
  conv.ask(response);
  conv.ask(new Suggestions(['skip']));
});
and here is my callback function:
app.intent('Media Status', (conv) => {
  const mediaStatus = conv.arguments.get('MEDIA_STATUS');
  var { treatmentPos, treatmentTracks, currentTreatment, currentTreatmentName } = conv.data;
  if (mediaStatus && mediaStatus.status === 'FINISHED' && treatmentPos < treatmentTracks.length) {
    playNextTrack(conv, treatmentPos, treatmentTracks);
  } else {
    endConversation(conv, currentTreatment);
  }
});
Here's playNextTrack()
function playNextTrack(conv, pos, medias) {
  conv.data.treatmentPos = pos + 1;
  var result = playNext(medias[pos].url, medias[pos].name);
  var response = result[0];
  var ssml = result[1];
  conv.ask(ssml);
  conv.ask(response);
  conv.ask(new Suggestions(['skip']));
}
and playNext()
function playNext(url, name) {
  const response = new MediaObject({
    name: name,
    url: url,
  });
  var ssml = new SimpleResponse({
    text: 'Up next:',
    speech: '<speak><break time="1" /></speak>'
  });
  return [response, ssml];
}
The other issue is that when the MediaResponse is playing on my iPhone and I interrupt it to say "Next" or "Skip", rather than triggering my "NextOrSkip" intent like it does in the simulator and on the Google Home Mini, it just says "sure" or "alright" (which I don't have in my code anywhere) and then goes silent (while still listening).

NodeJS/React: Taking a picture, then sending it to mongoDB for storage and later display

I've been working on a small twitter-like website to teach myself React. It's going fairly well, and I want to allow users to take photos and attach them to their posts. I found a library called React-Camera that seems to do what I want it to do: it brings up the camera and manages to save something.
I say something because I am very confused about what to actually do with what I save. This is the client-side code for the image capture, which I basically just copied from the documentation:
takePicture() {
  try {
    this.camera.capture()
      .then(blob => {
        this.setState({
          show_camera: "none",
          image: URL.createObjectURL(blob)
        });
        console.log(this.state);
        this.img.src = URL.createObjectURL(blob);
        this.img.onload = () => { URL.revokeObjectURL(this.src); };
        var details = {
          'img': this.img.src,
        };
        var formBody = [];
        for (var property in details) {
          var encodedKey = encodeURIComponent(property);
          var encodedValue = encodeURIComponent(details[property]);
          formBody.push(encodedKey + "=" + encodedValue);
        }
        formBody = formBody.join("&");
        fetch('/newimage', {
          method: 'post',
          headers: {'Content-type': 'application/x-www-form-urlencoded;charset=UTF-8'},
          body: formBody
        });
        console.log("Reqd post");
      });
  } catch (err) {
    console.log(err);
  }
}
But what am I actually saving here? For testing I tried adding an image to the site and setting src={this.state.img}, but that doesn't work. I can store this blob URL (which looks like, for example, blob:http://localhost:4000/dacf7a61-f8a7-484f-adf3-d28d369ae8db) or the image itself in my DB, but again the problem is I'm not sure what the correct way to go about this is.
Basically, what i want to do is this:
1. Grab a picture using React-Camera
2. Send this in a post to /newimage
3. The image will then - in some form - be stored in the database
4. Later, a client may request an image that will be part of a post (ie. a tweet can have an image). This will then display the image on the website.
Any help would be greatly appreciated, as I feel I am just getting more confused the more libraries I look at!
From your question I understand that you are storing the image in the DB itself.
If my understanding is correct, that is a bad approach.
Instead:
1. Store the images in a directory on the server using your Node application.
2. Store the paths of the images in the DB.
3. Using these paths you can fetch the images and display them on the webpage.
For uploading images with Node.js you can use the Multer package, as sketched below.
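A minimal sketch of that approach, assuming an Express app and a Mongoose model (here called Post with an imagePath field, both placeholders):
const express = require('express');
const multer = require('multer');
const path = require('path');

const app = express();

// Store uploads on disk under ./uploads with a unique filename
const storage = multer.diskStorage({
  destination: 'uploads/',
  filename: (req, file, cb) => cb(null, Date.now() + path.extname(file.originalname)),
});
const upload = multer({ storage });

// The client sends the photo as multipart/form-data under the field name "image"
app.post('/newimage', upload.single('image'), async (req, res) => {
  // Save only the path in the database (Post and imagePath are placeholders)
  const post = await Post.create({ imagePath: req.file.path });
  res.json({ id: post._id, imagePath: post.imagePath });
});

// Serve the stored files so the front end can display them later
app.use('/uploads', express.static('uploads'));
On the front end you would then send the captured blob as FormData (formData.append('image', blob)) instead of URL-encoding the object URL, and later render the image with an img tag whose src points at the stored path under /uploads.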

Uploading image as Binary Data to cognitive Services with Node

I am trying to pass the Microsoft Cognitive services facial API an image which the user has uploaded. The image is available on the server in the uploads folder.
Microsoft is expecting the image to be 'application/octet-stream' and passed as binary data.
I am currently unable to find a way to pass the image to the API in a form it will accept, and I keep receiving "decoding error, image format unsupported". As far as I'm aware the image must be uploaded in blob or file format, but being new to Node.js I'm really unsure how to achieve this.
So far I have this and have looked at a few options, but none have worked; the other options I tried returned similar errors such as 'file too small or large', but when I've manually tested the same image via Postman it works fine.
image.mv('./uploads/' + req.files.image.name, function(err) {
  if (err)
    return res.status(500).send(err);
});

var encodedImage = new Buffer(req.files.image.data, 'binary').toString('hex');

let addAPersonFace = cognitive.addAPersonFace(personGroupId, personId, encodedImage);

addAPersonFace.then(function(data) {
  res.render('pages/persons/face', { data: data, personGroupId: req.params.persongroupid, personId: req.params.personid });
})
The package it looks like you're using, cognitive-services, does not appear to support file uploads. You might choose to raise an issue on the GitHub page.
Alternative NPM packages do exist, though, if that's an option. With project-oxford, you would do something like the following:
var oxford = require('project-oxford'),
    client = new oxford.Client(YOUR_FACE_API_KEY),
    uuid = require('uuid');

var personGroupId = uuid.v4();
var personGroupName = 'my-person-group-name';
var personName = 'my-person-name';
var facePath = './images/face.jpg';

// Skip the person-group creation if you already have one
console.log(JSON.stringify({personGroupId: personGroupId}));
client.face.personGroup.create(personGroupId, personGroupName, '')
  .then(function(createPersonGroupResponse) {
    // Skip the person creation if you already have one
    client.face.person.create(personGroupId, personName)
      .then(function(createPersonResponse) {
        console.log(JSON.stringify(createPersonResponse));
        personId = createPersonResponse.personId;
        // Associate an image to the person
        client.face.person.addFace(personGroupId, personId, {path: facePath})
          .then(function (addFaceResponse) {
            console.log(JSON.stringify(addFaceResponse));
          });
      });
  });
Please update to version 0.2.0; this should work now.
