How to convert LINEAR16 text-to-speech output to an audio file

I just started to play with the Google Text-to-Speech API. I generated a POST request to:
https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key={YOUR_API_KEY}
with the following data:
{
"input": {
"text": "Hola esto es una prueba"
},
"voice": {
"languageCode": "es-419"
},
"audioConfig": {
"audioEncoding": "LINEAR16",
"speakingRate": 1,
"pitch": 0
}
}
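For reference, one way to send that body is with curl, saving the JSON above as request.json (both request.json and YOUR_API_KEY are placeholders):
curl -X POST \
  -H "Content-Type: application/json" \
  -d @request.json \
  "https://texttospeech.googleapis.com/v1/text:synthesize?fields=audioContent&key=YOUR_API_KEY"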
The request returned a 200 response with the content:
{
"audioContent" : "UklGRn6iCwBXQVZFZm10I...(super long string)"
}
I am assuming this is encoded (or decoded, I'm not sure of the right term), but I would like to actually hear what that "audioContent" is.

As Tanaike pointed out, the response is indeed Base64. To actually listen to the audio, I pasted the Base64-encoded string into a file, then ran:
base64 -d audio.txt > audio.wav
and that did the trick.
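If you are calling the API from Node.js anyway, the same decoding can be done in code. A minimal sketch, assuming audio.txt contains just the Base64 string from "audioContent" (the leading "UklGR..." decodes to a RIFF header, so the decoded bytes are already a complete WAV file):
const fs = require('fs');

// Read the Base64 string and decode it into raw bytes.
const audioContent = fs.readFileSync('audio.txt', 'utf8').trim();
const audioBuffer = Buffer.from(audioContent, 'base64');

// LINEAR16 from text:synthesize arrives in a WAV container, so writing the
// decoded bytes straight to a .wav file produces something playable.
fs.writeFileSync('audio.wav', audioBuffer);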

Related

How to insert an image into CouchDB

I'm trying to figure out how to insert an image into CouchDB using the node-CouchDB library found here: https://www.npmjs.com/package/node-couchdb
Here's what I've done:
fs.readFile('download.jpeg', (err, data) => {
binary_data = new Buffer(data, 'binary');
couch.insertAttachment("node_db", doc_number, "download.jpeg", binary_data, rev_number).then(({data, headers, status}) => {
}, err => {
console.log("ERROR"+ err.code);
});
});
The result is that CouchDB stores this in the document like so:
{
"_id": "2741d6f37d61d6bbdf63df3be5000504",
"_rev": "22-bfdbe6db35c7d9873a2cc8a38afb2833",
"_attachments": {
"attachment": {
"content_type": "application/json",
"revpos": 22,
"digest": "md5-on0A+d7045WPI6FyS1ut4g==",
"length": 22482,
"stub": true
}
}
}
This is what the data looks like in CouchDB when using the View Attachment function through the interface:
{"type":"Buffer","data":[255,216,255,224,0,16,74,70,73,70,0,1,1,0,0,1,0,1,0,0,255,219,0,132,0,9,6,7,18,18,18,21,18,19,19,22,21,21,23,23,23,24,21,21,21,23,23,21,21,24,21,21,21,23,22,22,21,21,22,24,29,40,32,24,26,37,29,21,21,33,49,33,37,41,43,46,46,46,23,31,51,56,51,45,55,40,45,46,43,1,10,10,10,14,13,14,26,16,16,26,45,37,29,37,45,45,45,45,45,45,45,241,...]
I then tried changing the Content-Type attribute to "image/jpeg" in the header of the request resulting in:
{
"_id": "2741d6f37d61d6bbdf63df3be5000504",
"_rev": "23-cf8c2076b43082fdfe605cad68ef2355",
"_attachments": {
"attachment": {
"content_type": "image/jpeg",
"revpos": 23,
"digest": "md5-SaekQP37DCCeGX2M8UVeGQ==",
"length": 22482,
"stub": true
}
}
}
However, this still results in an image that isn't viewable from the CouchDB interface (clicking View Attachments). The image in this case is only 6,904 bytes, but it's being stored with a length of ~22k (inflating the size in CouchDB), so I'm assuming I'm not passing the correct representation (encoding) of the image to CouchDB.
You can encode your image data as a Base64 string and save it, although I would not recommend it at all. What I would do is upload the file to an object storage service such as AWS S3 or its open-source alternative MinIO, and then save only a reference to the file (e.g. an image URL) in the DB.
P.S.: I'm sorry about the lack of links and references in my answer; I'm writing it on my phone. I'll edit it and include references as soon as I'm home.
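If you do go the Base64 route, the key point is to hand CouchDB a plain Base64 string rather than a JSON-serialized Buffer (the {"type":"Buffer","data":[...]} form above is what inflates 6,904 bytes to ~22k). A rough sketch with node-couchdb; the database name comes from the question, everything else is a placeholder:
const fs = require('fs');
const NodeCouchDb = require('node-couchdb');

const couch = new NodeCouchDb(); // assumes CouchDB on localhost:5984

// fs.readFileSync already returns a Buffer, so it can be Base64-encoded directly;
// no extra new Buffer(data, 'binary') step is needed.
const imageBase64 = fs.readFileSync('download.jpeg').toString('base64');

// Store either the Base64 string or, preferably, just a URL pointing at object
// storage. Both are ordinary document fields instead of a serialized Buffer.
couch.insert('node_db', {
    content_type: 'image/jpeg',
    data: imageBase64 // or: url: 'https://my-bucket.s3.amazonaws.com/download.jpeg'
}).then(({data}) => {
    console.log('Saved document ' + data.id);
}, err => {
    console.log('ERROR ' + err.code);
});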

POST request with encrypted JSON data with several keys and one image file (multipart), Android

I have to pass this JSON to the server. The encrypted params are created after building the JSON, and the server returns the same JSON with three extra parameters, but I am not able to do it with image or file data.
After encrypting the data, these params (param1, param2, param3) get created automatically.
I tried using @PartMap with a Map, and @Part with @Multipart.
{
"fullname": "hello",
"dob": "20-12-1992",
"locale": {
"language": "en",
"version": 2
},
"media": {
"media": {
"name": "",
"size": ""
}
},
"param1": "",
"param2": "",
"param3": ""
}

Failed to parse Dialogflow response into AppResponse because of empty speech response for SSML response

Trying to figure out the right Dialogflow fulfillment webhook JSON response.
The JSON is being generated by .NET Core on AWS Lambda.
{
"fulfillmentText": "<speak><p>Welcome to Alterians News. I provide news that is of interest to Alterianss from a variety of sources, and on a variety of topics.</p> \r\n <p>You can say <emphasis>read</emphasis>, then the topic name. For example, you could say <emphasis>read today's news</emphasis>, or <emphasis>read <prosody rate=\"112%\">Banking and Finance</prosody> news</emphasis>.</p> \r\n <p>I cover the following Alterians news topics: house cats, Agriculture, \r\n <prosody rate=\"112%\">Banking and Finance</prosody>, Party Politics, <prosody rate=\"112%\">Police and Crime</prosody>, and the Military</p></speak>",
"fulfillmentMessages": null,
"source": null,
"payload": {
"google": {
"text": null,
"expectUserResponse": false,
"richResponse": {
"items": [
{
"simpleResponse": {
"textToSpeech": null,
"displayText": null,
"ssml": "<speak><p>Welcome to Alterians News. I provide news that is of interest to Alterianss from a variety of sources, and on a variety of topics.</p> \r\n <p>You can say <emphasis>read</emphasis>, then the topic name. For example, you could say <emphasis>read today's news</emphasis>, or <emphasis>read <prosody rate=\"112%\">Banking and Finance</prosody> news</emphasis>.</p> \r\n <p>I cover the following Alterians news topics: house cats, Agriculture, \r\n <prosody rate=\"112%\">Banking and Finance</prosody>, Party Politics, <prosody rate=\"112%\">Police and Crime</prosody>, and the Military</p></speak>"
}
}
]
},
"SystemItent": null
}
},
"outputContexts": null,
"followupEventInput": null
}
I have also tried this format:
{
"payload":
{
"google":
{
"expectUserResponse":true,
"richResponse":
{
"items":
[
{
"simpleResponse":
{
"textToSpeech":"<speak><p>Welcome to Alterian News. I provide news that is of interest to Alterians from a variety of sources, and on a variety of topics.</p></speak>"
}
}
]
}
}
}
}
The response from the Actions on Google simulator was:
{
"responseMetadata": {
"status": {
"code": 10,
"message": "Failed to parse Dialogflow response into AppResponse because of empty speech response",
"details": [
{
"#type": "type.googleapis.com/google.protobuf.Value",
"value": "{\"id\":\"f4fdf231-5316-454c-a969-6f36bd889d67\",\"timestamp\":\"2018-08-20T18:30:33.509Z\",\"lang\":\"en-us\",\"result\":{},\"status\":{\"code\":206,\"errorType\":\"partial_content\",\"errorDetails\":\"Webhook call failed. Error: 502 Bad Gateway\"},\"sessionId\":\"1534789833483\"}"
}
]
}
}
}
The Error tab of the simulator had a lot of encoded JSON, but I believe the key information was:
Failed to parse Dialogflow response into AppResponse because of empty speech response
Finally, the Errors tab contained the following:
MalformedResponse 'final_response' must be set.
I have looked at the samples for the Dialogflow fulfillment webhook and tried to follow them.
There's comprehensive documentation for the Node.js client library, but not that much on the correct JSON response formats that use SSML.
Any help will be appreciated.
Thanks
If you're sending back a reply to be used by the Assistant:
You can omit the fulfillmentText field.
Fields that are null should be omitted entirely, including fulfillmentMessages and the SimpleResponse fields textToSpeech and displayText.
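Applying that to the first payload above, a trimmed-down response (nulls removed, SSML kept, string abbreviated here) would look roughly like this:
{
    "payload": {
        "google": {
            "expectUserResponse": false,
            "richResponse": {
                "items": [
                    {
                        "simpleResponse": {
                            "ssml": "<speak><p>Welcome to Alterians News. ...</p></speak>"
                        }
                    }
                ]
            }
        }
    }
}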

Create custom response object with Node.js SDK

I'm building a skill with the Node.js SDK for the Echo Show. I want to use the VideoApp feature (documentation) to display a video based on information I elicit from the user. I have the video in S3, and I constructed the directive and the entire response object, and called "response ready" like so:
var directive = [
{
"type": "VideoApp.Launch",
"videoItem": {
"source": "https://s3.amazonaws.com/path/to/video.mp4",
"metadata": {
"title": "Video Title",
"subtitle": "Subtitle to video"
}
}
}
];
this.handler.response = buildResponse(directive);
this.emit(':responseReady');
I expected the Echo Show to use the response object I generated to display my video, but instead it says "there was a problem with the requested skill's response." It also displays "Invalid directive" in the corner when it says that. Below is the full response object I generate; any help on how to properly launch a video would be appreciated!
{
"version": "1.0",
"response": {
"shouldEndSession": true,
"outputSpeech": null,
"reprompt": null,
"directives": [
{
"type": "VideoApp.Launch",
"videoItem": {
"source": "https://s3.amazonaws.com/path/to/video.mp4",
"metadata": {
"title": "Video title",
"subtitle": "Subtitle to video"
}
}
}
],
"card": null
}
}
I've also found that you get invalid responses if the shouldEndSession attribute is included with the VideoApp.Launch directive.
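In other words, a response the device accepts keeps essentially just the directive, roughly:
{
    "version": "1.0",
    "response": {
        "directives": [
            {
                "type": "VideoApp.Launch",
                "videoItem": {
                    "source": "https://s3.amazonaws.com/path/to/video.mp4",
                    "metadata": {
                        "title": "Video title",
                        "subtitle": "Subtitle to video"
                    }
                }
            }
        ]
    }
}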
I figured it out. I just had to make the video, and the bucket it was residing in, publicly readable via S3 permissions.

Azure Encoding Job returns no OutputAssetFiles

Using the REST API documentation, I am encoding a video with the following request to Azure:
{
"Name": "NewTestJob",
"InputMediaAssets": [{
"__metadata": {
"uri": "https://media.windows.net/api/Assets('nb%3Acid%3AUUID%3Ab5cb32de-AAAA-BBBB-a6eb-1b3a61c795be')"
}
}
],
"Tasks": [{
"Configuration": "H264 Single Bitrate 720p",
"MediaProcessorId": "nb:mpid:UUID:ff4df607-d419-42f0-bc17-a481b1331e56",
"TaskBody": "<?xml version=\"1.0\" encoding=\"utf-8\"?><taskBody><inputAsset>JobInputAsset(0)</inputAsset><outputAsset>JobOutputAsset(0)</outputAsset></taskBody>"
}
]
}
From what I can see in the Azure dashboard, this creates an encoded version of my video. The problem I have is that the returned job information does not include any OutputMediaAssets. The response is:
{
"odata.metadata": "https://wamsamsclus001rest-hs.cloudapp.net/api/$metadata#Jobs/#Element",
"Id": "nb:jid:UUID:e4bf4cff-0300-80c0-c4c5-f1e75c34a72c",
"Name": "NewTestJob",
"Created": "2017-06-28T19:04:55.8442399Z",
"LastModified": "2017-06-28T19:04:55.8442399Z",
"EndTime": null,
"Priority": 0,
"RunningDuration": 0.0,
"StartTime": null,
"State": 0,
"TemplateId": null,
"JobNotificationSubscriptions": []
}
This means I can't locate the newly created encoded asset. What am I doing wrong? Is there another way to locate the generated asset?
Please start by querying for the Task(s) in the Job, via a call like:
GET https://media.windows.net/API/Jobs('nb:jid:UUID:b1f956b3-774c-bb44-a3f7-ee47e23add31')/Tasks HTTP/1.1
The problem wasn't caused by the request body but by a header.
I was passing the header:
Accept: application/json
instead of the header:
Accept: application/json;odata=verbose
Without odata=verbose, only a subset of the available data is returned.
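For reference, the job-creation request with the verbose OData headers looks roughly like this (the DataServiceVersion and x-ms-version values are the ones used in the v2 REST documentation; treat them as assumptions and match your own setup):
POST https://media.windows.net/API/Jobs HTTP/1.1
Content-Type: application/json;odata=verbose
Accept: application/json;odata=verbose
DataServiceVersion: 3.0
MaxDataServiceVersion: 3.0
x-ms-version: 2.11
Authorization: Bearer <access_token>

{ ...job body as above... }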
