I would like to use the Microsoft Azure Cognitive Service Speech-to-text. It offers a REST API, which I have successfully used. I can point it to an Azure blob storage container using a SAS URI, and the files in the container are transcribed.
My problem is, that when I try to retrieve the transcription results from the API, they are published to a public url. Since voice data can be sensitive, I would like to keep the results stored privately. Is there any way to do this?
It does not seem like it is an option in the API schema, although you can set a destinationContainerUrl. I have tried setting destinationContainerUrl, but the result does not appear in the container.
I have only used the API reference, which is why I am not posting any code.
You've found the correct option. Setting destinationContainerUrl will write the results into that container. Make sure you provide a container SAS which allows listing and writing.
When the job succeeds, the results should be there. Please check the status of your job; maybe it was set to failed.
Documentation about transcriptions:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
If the job succeeds and the files are not in this container, please let us know the $.self link of the job and the creation time, to help us gather the logs.
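As an aside, a container SAS with those permissions could be generated like this (a minimal sketch using the newer Azure.Storage.Blobs SDK; the connection string, container name and expiry are illustrative):
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

// Sketch: generate a container SAS that allows writing and listing, which is
// what the service needs for destinationContainerUrl. Values are placeholders.
string connectionString = "<storage-connection-string>";
var container = new BlobContainerClient(connectionString, "transcription-results");
Uri destinationContainerUrl = container.GenerateSasUri(
    BlobContainerSasPermissions.Write | BlobContainerSasPermissions.List,
    DateTimeOffset.UtcNow.AddHours(12));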
OK, so the solution was super simple. I just had the POST request JSON wrong: destinationContainerUrl needs to be under properties, as shown below:
{"contentUrls": ["LINK-TO-BLOB-SAS-URI-TOKEN"],
"properties": {
"diarizationEnabled": false,
"wordLevelTimestampsEnabled": false,
"punctuationMode": "DictatedAndAutomatic",
"profanityFilterMode": "Masked",
"destinationContainerUrl": "LINK-TO-BLOB-SAS-URI-TOKEN"
},
"locale": "en-US",
"displayName": "Transcription from blob to blob"
}
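For reference, the full request could be sent like this (a rough sketch; the v3.0 endpoint and region are assumptions based on the current docs, and the key and SAS URIs are placeholders):
using System.Net.Http;
using System.Text;

// Sketch: create a batch transcription job via the REST API.
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", "<speech-key>");

string body = @"{
  ""contentUrls"": [""LINK-TO-BLOB-SAS-URI-TOKEN""],
  ""properties"": { ""destinationContainerUrl"": ""LINK-TO-BLOB-SAS-URI-TOKEN"" },
  ""locale"": ""en-US"",
  ""displayName"": ""Transcription from blob to blob""
}";

HttpResponseMessage response = await client.PostAsync(
    "https://<region>.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions",
    new StringContent(body, Encoding.UTF8, "application/json"));
// On success, the response body contains the job, including its self link.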
I'm using the following API to successfully get file data:
https://acme.sharepoint.com/sites/my-site/_api/Web/Lists(guid'xxx')/files('yyy')
This is a docx file on which I've posted comments using the web console.
How can I fetch these comments using the REST API? I tried appending /comments to the URL, but I'm getting the following 404 error:
{
  "error": {
    "code": "-1, Microsoft.SharePoint.Client.ResourceNotFoundException",
    "message": {
      "lang": "en-US",
      "value": "Cannot find resource for the request Comments."
    }
  }
}
The Comments() endpoint currently exists only under the Items() endpoint and not under the Files() endpoint.
Basically, you can access the Comments() functionality only under the below endpoint:
GET https://{site_url}/_api/web/lists/GetByTitle({list_title})/items({item_id})/Comments
You can easily test the above in a Power Automate scenario with a Send HTTP Request to SharePoint action.
In one example I attempted to target the file directly in the document library, which fails with the 404 above. On the other hand, if I target the file based on the List Item Id that it got in the document library, the comments are returned. I am also able to target a specific comment that I left.
Please take note of the below:
The Comments() endpoint is not available for MS resources, meaning docx, xlsx and similar files. It is only available for non-MS resource files like PDFs, TXTs and so on. I am not sure why this rule is in effect, but my best guess would be that it is because there is already a "commenting" functionality provided within a Word document, for example.
You could find a bit more info about the above here.
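Outside of Power Automate, the same endpoint can be called directly. A minimal sketch (the list title, item id and bearer token are placeholders, and token acquisition is out of scope here):
using System.Net.Http;
using System.Net.Http.Headers;

// Sketch: fetch the comments on a list item via the SharePoint REST API.
var client = new HttpClient();
client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", "<access-token>");
client.DefaultRequestHeaders.Accept.Add(
    new MediaTypeWithQualityHeaderValue("application/json"));

string json = await client.GetStringAsync(
    "https://acme.sharepoint.com/sites/my-site/_api/web/lists/GetByTitle('Documents')/items(1)/Comments");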
I'm in the process of creating an Azure Logic App to work with Abbyy's OCR REST API.
I use the Create SAS URI by path action, which returns Web URL. Web URL is the full URL to my blob, including the SAS token.
Web URL is passed to an HTTP action as a query parameter. In code view, the relevant part of the JSON looks like:
"method": "GET",
"uri": "https://cloud.ocrsdk.com/processRemoteImage?source=#{body('Create_SAS_Uri_by_path')?['WebUrl']}&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner"
The uri resolves thus:
https://cloud.ocrsdk.com/processRemoteImage?source=https://mysaaccount.blob.core.windows.net/inbox/180730110047_0001.pdf?sv=2017-04-17&sr=b&sig=2IGMt1qDZthaBSyvD3WJ6T1zc36Wr%2FNoiB4Wki5Lf28%3D&se=2018-08-16T11%3A16%3A48Z&sp=r&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner
This results in the error (450):
<?xml version="1.0" encoding="utf-8"?><error><message language="english">Invalid parameter: sr</message></error>
This is basically the API picking up the sr= query parameter from the SAS token, and of course the API doesn't have an sr argument; even if it did, its value would be wrong.
I did find this question and attempted to percent-escape the ampersands (&) by adjusting my code to use the replace function, thus:
"method": "GET",
"uri": "https://cloud.ocrsdk.com/processRemoteImage?source=#{replace(body('Create_SAS_Uri_by_path')?['WebUrl'],'%26','%2526' )}&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner"
However, this has no effect, i.e. the resulting URI is the same as above. Interestingly, it appears the SAS token itself already makes use of percent-escaping.
If anyone has any suggestion on how to resolve or work around this problem, I'd be most grateful if you would share your thoughts.
Does anyone know if the Logic App actions are open source and, if so, what the GitHub link is? I could then raise an issue.
Resolved it.
Basically, I was on the right track, but I told replace() to search for %26 when I should have told it to search for &, so using the code above, it should read:
"method": "GET",
"uri": "https://cloud.ocrsdk.com/processRemoteImage?source=#{replace(body('Create_SAS_Uri_by_path')?['WebUrl'],'&','%26' )}&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner"
And therefore, the URI reads:
https://cloud.ocrsdk.com/processRemoteImage?source=https://mysaaccount.blob.core.windows.net/intray/180730110047_0001.pdf?st=2018-08-17T10%3A55%3A38Z%26se=2018-08-18T10%3A55%3A38Z%26sp=r%26sv=2018-03-28%26sr=b%26sig=FTRoVgV7MRz5d5gTgrEs6D0QSy3268BqscZX1LHbJYQ%3D&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner
Next step: convert XML body into JSON...
Update 1
I have updated (17/08/2018) the replace-with value from %2526 to %26. I was clearly getting my knickers in a twist; I must have been trying to double-escape, which isn't needed.
Although Microsoft seem to partially percent-encode the SAS token (I note they percent-encode = as %3D), the Abbyy API doesn't seem to care about = (testing from Postman).
Update 2
Not sure why Update 1 worked at first. It could be because I had set the container access policy to blob (anonymous read access for blobs only) and setting it back to private hadn't taken effect yet, or it might have been the wrong content type. Anyway, it worked and then it didn't: the replace of & with %26 went through without issue, and I also tried escaping the =, but the endpoint didn't like that, testing via Postman and the Logic App.
My trigger is actually a C# BlobTrigger* Azure Function App, and I have written code to generate the SAS token, so I used .NET's Uri.EscapeDataString() method:
// Build the full blob URL, including the SAS token, then percent-encode the
// whole thing so it is safe to pass as a single query-string value.
string fullUrl = cloudBlobContainer.Uri + "/" + blobName + sasToken;
string escapedUri = Uri.EscapeDataString(fullUrl);
log.LogInformation($"Full URL: {fullUrl}");
log.LogInformation($"Escaped URL: {escapedUri}");
return escapedUri;
Update 3
Just seen this line, created by the Logic App designer:
"path": "/datasets/#{encodeURIComponent(encodeURIComponent('https://someaccount.sharepoint.com/sites/eps'))}/tables//items//attachments",
Looks like encodeURIComponent() might be a function that does the same thing as Uri.EscapeDataString() and could replace the replace() function. I haven't tested it yet.
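If it does work, the HTTP action from earlier would presumably become something like this (untested, same action name as above):
"method": "GET",
"uri": "https://cloud.ocrsdk.com/processRemoteImage?source=#{encodeURIComponent(body('Create_SAS_Uri_by_path')?['WebUrl'])}&language=English&exportformat=pdfSearchable&description=blah&imageSource=scanner"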
*the reason I am using a function app for the trigger is down to cost. Although there is a Logic App trigger that fires when a new blob is detected, it has to run on a schedule. My plan is Consumption, and I have read that Microsoft charge when a Logic App runs, regardless of whether it does anything or not. IMHO, it is inefficient to have a task triggering every 5 seconds when 90% of the time there isn't anything for it to do. The function app is better suited to my requirements, although there is apparently a ~10 minute "warm-up" period if the app has gone to sleep. I can live with that.
I'm trying to figure out how to get the route for an HTTP triggered Azure Function within an ARM template.
Thanks to a blog post I managed to find the listsecrets command, but when I execute this action via PowerShell, the output doesn't give me the trigger_url I was expecting. The URL does not match the configured route of the function; it shows the default trigger URL, as if no route had been configured.
Is there any way I can get hold of the configured route instead, since I can't seem to use the trigger_url?
My configured route has got parameters in the path as well, e.g.:
{
  "name": "req",
  "type": "httpTrigger",
  "direction": "in",
  "authLevel": "function",
  "methods": [
    "POST"
  ],
  "route": "method/{userId}/{deviceId}"
}
The output of listsecrets is:
trigger_url: https://functionapp.azurewebsites.net/api/method?code=hostkey
Is there any other way to extract the host key and route?
Try playing with the API version, but I would suspect that this is not possible as of now.
Currently, the only way to get the route is by reading the function.json file and parsing that information out, which you can do by using Kudu's VFS API.
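For example, something along these lines could pull the route back out (a sketch; the function app name, function name and Kudu deployment credentials are placeholders):
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Sketch: read function.json via Kudu's VFS API; the "route" property of the
// httpTrigger binding holds the configured route, e.g. "method/{userId}/{deviceId}".
var client = new HttpClient();
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes("<user>:<password>")));

string functionJson = await client.GetStringAsync(
    "https://<functionapp>.scm.azurewebsites.net/api/vfs/site/wwwroot/<function-name>/function.json");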
For the keys, I would actually recommend using the key management APIs instead of listSecrets. The latter is meant to address a small set of scenarios (primarily to enable some internal integrations), whereas the key management API is a more robust API and will continue to work with different secret storage providers (e.g. Azure Storage, which is what is used when slots are enabled and will eventually become the default).
When I start a WebJob using the REST API:
/api/triggeredwebjobs/{job name}/run?arguments={arguments}
I need to know whether the invoked program ran successfully or not. For now, I request the latest result from the history using .../api/triggeredwebjobs/{job name}/history.
Now, is there a way to get the {id} of the job right after I invoke it? Obviously there's no way to be sure that the latest history entry is the job I just ran. Or is there another way to get this done?
Thanks.
Yes, we added a new binder in the extensions library to allow you to get the instance ID - ExecutionContext. See an example here in the extensions repo samples. To use this binding you'll have to pull in the beta1 Microsoft.Azure.WebJobs.Extensions prerelease package, and add config.UseCore() to your startup code (as the sample app shows). This was added based on another ask similar to yours.
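A sketch of what that binding looks like in a job function (based on the extensions repo sample; the queue trigger and names are illustrative):
using System.IO;
using Microsoft.Azure.WebJobs;

public static void ProcessWorkItem(
    [QueueTrigger("test")] string message,
    ExecutionContext context, // bound when config.UseCore() is enabled
    TextWriter log)
{
    // InvocationId identifies this particular run of the job.
    log.WriteLine("Invocation ID: " + context.InvocationId);
}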
You can call this anywhere in your code and it works!
(Not in debug, but when published.)
Console.Out.WriteLine("RUN NAME : " + Environment.GetEnvironmentVariable("WEBJOBS_RUN_ID"));
We just added support for this in the WebJobs API. The way it works is that when you send the POST request to trigger the WebJob, you now get back a Location header with a URL to the details of the run that was started, e.g.
Location: https://mysite.scm.azurewebsites.net/api/triggeredwebjobs/SomeJob/history/201605192149381933
You can then query this URL to track the run, e.g.
{
  "id": "201605192149381933",
  "name": "201605192149381933",
  "status": "Success",
  "start_time": "2016-05-19T21:49:38.1933956Z",
  "end_time": "2016-05-19T21:49:39.4826458Z",
  "duration": "00:00:01.2892502",
  "output_url": "https://mysite.scm.azurewebsites.net/vfs/data/jobs/triggered/SomeJob/201605192149381933/output_log.txt",
  "error_url": null,
  "url": "https://mysite.scm.azurewebsites.net/api/triggeredwebjobs/SomeJob/history/201605192149381933",
  "job_name": "SomeJob",
  "trigger": "External - ARMClient/1.1.1.0"
}
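In code, triggering the job and then tracking that exact run could look roughly like this (a sketch; site, job name and Kudu credentials are placeholders):
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

// Sketch: trigger the WebJob, then poll the run-specific URL from the
// Location header instead of guessing which /history entry is yours.
var client = new HttpClient();
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes("<user>:<password>")));

HttpResponseMessage trigger = await client.PostAsync(
    "https://mysite.scm.azurewebsites.net/api/triggeredwebjobs/SomeJob/run", null);

Uri runUrl = trigger.Headers.Location;                   // .../history/201605192149381933
string runDetails = await client.GetStringAsync(runUrl); // JSON as shown above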
I have a Logic App with a Twitter connector and a Dropbox connector. The latter has a repeater, which loops over the Twitter body and uploads a text file in each iteration, with the Tweet_ID as the file name. The Dropbox connector often returns conflict errors; it seems the Twitter connector keeps returning the same tweets again and again, even though they have already been processed, which results in duplicate file names.
When I look at the output of the Dropbox connector, below is the body it returns.
"body": {
"status": 409,
"source": "api-content.dropbox.com",
"message": "conflict_file"
}
You have probably seen this page https://azure.microsoft.com/sv-se/documentation/articles/app-service-logic-use-logic-app-features/ where they show how to do this.
Have you checked that you don't supply the same Tweet_ID several times? The Logic App JSON format is a bit tricky right now, without much documentation.
/dag
You are right. The Twitter connector doesn't "remember" the tweets that are returned from a search; it will return the same ones again. (Just to be clear, we are discussing the Twitter connector's Search Tweets action.)