I want to transcribe longer audio files (at least 5 minutes) using REST APIs from Microsoft. There are a lot of different products and names, e.g. Speech service API or Bing Speech API. None of the REST APIs I have tried so far supports transcribing longer audio files.
The documentation states there is a REST API exactly for this case:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
What is the endpoint for this service?
There is a sample available on GitHub here: https://github.com/PanosPeriorellis/Speech_Service-BatchTranscriptionAPI
The endpoint is CRIS's endpoint, as in this code:
private const string HostName = "cris.ai";
// ...
var client = CrisClient.CreateApiV2Client(SubscriptionKey, HostName, Port);
Then I found in the documentation that the API is exposed through Swagger, which makes it easier to explore the available methods (switch from 2.0beta to 2.0 at the top):
For West Europe: https://westeurope.cris.ai/swagger/ui/index
For West US: https://westus.cris.ai/swagger/ui/index
So to create a new transcription, the path is: /api/speechtotext/v2.0/transcriptions, called with the POST method, so the full endpoint is:
For West Europe: https://westeurope.cris.ai/api/speechtotext/v2.0/transcriptions
For West US: https://westus.cris.ai/api/speechtotext/v2.0/transcriptions
Please note that the subscription key used for transcription must belong to a Standard (S0) pricing tier resource, not a Free one.
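As a minimal sketch of the endpoint pattern described above, the following Python helper builds the transcription URL for a given region. The function name and the region validation are assumptions made for illustration; only the host pattern and the path come from the answer.

```python
# Illustrative helper: the name and the validation are assumptions,
# only the host pattern and the path come from the answer above.
TRANSCRIPTIONS_PATH = "/api/speechtotext/v2.0/transcriptions"


def transcription_endpoint(region: str) -> str:
    """Build the batch transcription URL for a region such as 'westeurope'."""
    region = region.strip().lower()
    if not region.isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    return f"https://{region}.cris.ai{TRANSCRIPTIONS_PATH}"


if __name__ == "__main__":
    for name in ("westeurope", "westus"):
        print(transcription_endpoint(name))
```

A new transcription would then be created by sending a POST request to the returned URL with an S0 subscription key.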
I was trying to get API Key for AwesomeTTS (Anki)
I want to use Azure Text-to-Speech service REST API
I followed two YouTube videos (one is below and I can't find the other one right now)
https://www.youtube.com/watch?v=EcZF73bsme0
I only got an audio file, but I have no idea where to find the API key I need for Anki.
I also read this article on Microsoft website many times
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech#authentication
Could anyone tell me where/how I could get the API Key?
Thanks
Could anyone tell me where/how I could get the API Key?
You can get the API key by following these steps:
Create an account on https://portal.azure.com
Create a Subscription which will have your billing information
Create a Resource Group in East US
Create a Speech Service entry
In your Speech Service entry, your API key can be found under Resource Management -> Keys and Endpoint
You can refer to Key creation, API key to TTS and AwesomeTTS API keys
I am trying to find out, through a REST API, which type an Azure subscription belongs to (for example Pay-As-You-Go, MCA, EA, or CSP), but I cannot find an appropriate API for this.
I used the Consumption Usage Details API; in its response I get kind as either Legacy or Modern.
Is there any REST API for this?
It's not possible to get the offer type from any API, but you can get the offerId from the Consumption Usage Details API.
After you get the offerId (for example "offerId" : "MS-AZR-0003P"), take just the offer number (i.e. 0003P) and look it up here to see which type of offer it is (in this case, Pay-As-You-Go).
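The lookup described above could be sketched in Python as follows. The function names and the lookup table are hypothetical; the only mapping taken from the answer is 0003P to Pay-As-You-Go, and any other entries would have to be filled in from Microsoft's offer list.

```python
# Hypothetical lookup table: only the 0003P entry comes from the answer.
OFFER_NAMES = {"0003P": "Pay-As-You-Go"}


def offer_number(offer_id: str) -> str:
    """Return the trailing offer number, e.g. 'MS-AZR-0003P' -> '0003P'."""
    return offer_id.strip().rsplit("-", 1)[-1]


def offer_name(offer_id: str) -> str:
    """Map an offerId to a readable offer type, or 'unknown' if not listed."""
    return OFFER_NAMES.get(offer_number(offer_id), "unknown")


if __name__ == "__main__":
    print(offer_number("MS-AZR-0003P"), offer_name("MS-AZR-0003P"))
```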
I can see there are two versions of REST API endpoints for Speech to Text in the Microsoft documentation links.
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription and https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-speech-to-text
One endpoint is [https://.api.cognitive.microsoft.com/sts/v1.0/issueToken] referring to version 1.0 and another one is [api/speechtotext/v2.0/transcriptions] referring to version 2.0. How can I create a speech-to-text service in Azure Portal for the latter one?
Whenever I create a service in different regions, it always creates for speech to text v1.0.
Any tips?
PS: I have a Visual Studio Enterprise account with a monthly allowance, and I am creating a paid (S0) service rather than a free trial (F0) service.
Thanks,
Ozgur
Any official Microsoft Speech resource created in the Azure Portal is valid for Microsoft Speech 2.0.
I understand that this v1.0 in the token url is surprising, but this token API is not part of Speech API.
So go to Azure Portal, create a Speech resource, and you're done.
If you want to be sure, go to your created resource and copy your key. That's what you will use for authorization, in a header called Ocp-Apim-Subscription-Key, as explained here.
Demo:
Get your key on your created resource
Go to https://[REGION].cris.ai/swagger/ui/index (REGION being the region where you created your speech resource)
Click on Authorize: you will see both forms of Authorization
Paste your key in the 1st one (subscription_Key), validate
Close this window
Test one of the endpoints, for example the one listing the speech endpoints, by going to the GET operation on /api/speechtotext/v2.0/endpoints
Click 'Try it out' and you will get a 200 OK reply!
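The same check can be sketched outside the Swagger UI. The Python snippet below only builds the GET request with the Ocp-Apim-Subscription-Key header and does not send it; the function name and the "YOUR_KEY" placeholder are assumptions for illustration.

```python
from urllib.request import Request


def list_endpoints_request(region: str, subscription_key: str) -> Request:
    """Build (but do not send) the GET request that lists speech endpoints."""
    url = f"https://{region}.cris.ai/api/speechtotext/v2.0/endpoints"
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # "YOUR_KEY" is a placeholder; use the key copied from your Speech resource.
    req = list_endpoints_request("westeurope", "YOUR_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual call with a real key and region.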
I understand your confusion, because the Microsoft documentation on this is ambiguous. Based on my research, let me clarify: two types of Speech-to-Text service exist, v1 and v2.
v1 could be found under Cognitive Service structure when you create it:
Based on statements in the Speech-to-text REST API document:
Before using the speech-to-text REST API, understand:
Requests that use the REST API and transmit audio directly can only contain up to 60 seconds of audio.
The speech-to-text REST API only returns final results. Partial results are not provided.
If sending longer audio is a requirement for your application, consider using the Speech SDK or a file-based REST API, like batch transcription.
So v1 has some limitations on file format and audio size. If you have further requirements, please look at the v2 API, Batch Transcription, hosted by Zoom Media. You can figure it out by reading this document from ZM. You can create that Speech API in the Azure Marketplace:
That's the creation page for it :
Also, you can view the API document at the foot of the above page; it's the v2 API document.
Final tip:
v1's endpoint like: https://eastus.api.cognitive.microsoft.com/sts/v1.0/issuetoken
v2's endpoint like:
I'm developing a bot application in which I'm using the Face API and the Vision API. The app streams pictures to those APIs. According to GDPR, I will need consent from the users of the app to send those pictures to the APIs. But GDPR also states that users can withdraw their consent, so my question is: if a user has given consent, used the app, and then says "I changed my mind", can I guarantee that all personal information (pictures) of that person has been deleted? I'm not using personIds or personGroups or anything like that. The Face API documentation says:
Microsoft will receive the images, audio, video, and other data that you upload (via this app) and may use them for service improvement purpose
According to this it's not really clear what becomes of the actual pictures. I'm grateful for any input on this.
In spring 2018, Microsoft updated their Online Service Terms to align Cognitive Services with the rest of the Azure services. Meaning: they do not store or use customer data:
Under the new terms, Cognitive Services customers own, and can manage and delete their customer data. With this change, many Cognitive Services are now aligned with the same terms that apply to other Azure services.
Source: https://azure.microsoft.com/en-us/blog/microsoft-updates-cognitive-services-terms/
But you are right, many sources on the web still refer to the old terms.
How can I make a third-party API call in Dialogflow using the inline editor? Please share any code you have for this.
Thank you
You cannot make external network requests (for example an API call) from the inline editor. You need to deploy your code elsewhere. The easiest way to do this is by deploying to a paid Firebase plan. From Google:
Network calls originating from your Cloud Function for Firebase to destinations outside Google's network require billing to be enabled for the underlying Google Cloud or Firebase project.
You can create an API with whatever technology you want, as long as the response from the API can be understood by Dialogflow.
You need to configure it by going to Fulfillment, and point it to your API.
The API needs to respond with this structure: Dialogflow.
And if you plan to integrate with Actions on Google, you have this repository which includes some examples of responses.
And here you have some libraries to interact with Dialogflow with different languages.
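As a rough illustration of what such an API has to return, here is a minimal Python sketch of a fulfillment handler. It assumes the Dialogflow ES v2 webhook format, where the request carries queryResult.intent.displayName and the response may contain a fulfillmentText field; the function name and the reply text are made up for the example.

```python
import json


def build_fulfillment(request_body: dict) -> dict:
    """Turn a Dialogflow webhook request body into a minimal response body."""
    query_result = request_body.get("queryResult", {})
    intent = query_result.get("intent", {}).get("displayName", "unknown intent")
    # Assumption: a plain text reply is enough; richer messages are omitted.
    return {"fulfillmentText": f"Handled intent: {intent}"}


if __name__ == "__main__":
    sample = {"queryResult": {"intent": {"displayName": "weather"}}}
    print(json.dumps(build_fulfillment(sample)))
```

Whatever web framework hosts this function, it would need to parse the POSTed JSON, call the handler (where the third-party API call would go), and return the result as JSON.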