Client billing/client usage for Microsoft Cognitive Services speech to text? - azure

I'm working on a website that is supposed to let users make use of Azure's Cognitive Services API. They can play audio or use their microphone to transform speech into text.
I'm currently using Azure's JS SDK, and technically it's working fine. However, I noticed a big shortcoming with this approach. The SDK connects to the Azure server through a WebSocket, which exposes the subscription key to the client. So any user could theoretically read it out and sell it or the like.
Furthermore, if the client connects directly to Azure, I have no secure way of preventing clients from abusing the service. I need a way to measure roughly how much time each customer uses the service so that I can bill them individually.
I could not find anything about that in the official documentation. So what are my options?
Should I redirect the clients' audio input to my own server, do some quantitative analysis, and then forward the input to Azure over a server-side connection? I fear that with many concurrent customers, it might get laggy or connections might get dropped...
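If audio is relayed through your own server as proposed, usage can be metered roughly by counting the audio bytes each client sends. The sketch below assumes the client streams raw PCM at 16 kHz, 16-bit, mono; that format, the `UsageMeter` class, and the client ids are illustrative assumptions, not part of the Speech SDK.

```javascript
// Hypothetical per-client usage meter for audio relayed through your server.
// Assumes raw PCM: 16,000 samples/s, 2 bytes per sample, 1 channel.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;
const BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS; // 32,000

class UsageMeter {
  constructor() {
    this.bytesByClient = new Map(); // clientId -> total audio bytes relayed
  }

  // Call for every audio chunk before forwarding it to Azure.
  record(clientId, chunk) {
    const prev = this.bytesByClient.get(clientId) || 0;
    this.bytesByClient.set(clientId, prev + chunk.length);
  }

  // Seconds of audio a client has sent so far (0 for unknown clients).
  secondsUsed(clientId) {
    return (this.bytesByClient.get(clientId) || 0) / BYTES_PER_SECOND;
  }
}

const meter = new UsageMeter();
meter.record("client-42", Buffer.alloc(32000)); // one second of silence
meter.record("client-42", Buffer.alloc(16000)); // half a second more
console.log(meter.secondsUsed("client-42")); // 1.5
```

Counting bytes adds little work per chunk; it does not address whether relaying itself scales to many concurrent customers.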
Is there any way to attach at least client ids or the like to the Azure WebSocket connection that I can read out somehow later?
Do you have any advice for me?

Given your additional comment, I would suggest that you switch your implementation from using the subscription key to using authentication tokens.
That would:
generate a unique token for each client, based on a single global subscription key
not expose your subscription key to your clients
restrict use of the API, as each token is only valid for 10 minutes
Each access token is valid for 10 minutes. You can get a new token at any time; however, to minimize network traffic and latency, we recommend using the same token for nine minutes.
See the documentation here for the overall implementation. In a nutshell, you need to implement this token generation in your backend and serve the page to your client with this token instead of the key.
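A minimal backend sketch of that token flow, assuming Node 18+ (for the global `fetch`) and the regional `issueToken` endpoint of the Speech resource. `fetchSpeechToken` and `createTokenCache` are hypothetical helpers, and the region and key are placeholders. The cache reuses each token for nine minutes, as the quoted recommendation suggests.

```javascript
// Server-side cache for Speech authorization tokens: the subscription key
// stays on the server; only the short-lived token is sent to the browser.
const NINE_MINUTES_MS = 9 * 60 * 1000;

// Requests a token from the regional issueToken endpoint (Node 18+ fetch).
async function fetchSpeechToken(region, subscriptionKey) {
  const res = await fetch(
    `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    { method: "POST", headers: { "Ocp-Apim-Subscription-Key": subscriptionKey } }
  );
  if (!res.ok) throw new Error(`issueToken failed: ${res.status}`);
  return res.text(); // the token is returned as plain text
}

// Wraps any async token fetcher so one token is reused for nine minutes.
// `now` is injectable so the expiry logic can be tested without waiting.
function createTokenCache(fetcher, now = Date.now) {
  let pending = null; // promise for the current token
  let fetchedAt = 0;
  return function getToken() {
    if (pending === null || now() - fetchedAt >= NINE_MINUTES_MS) {
      fetchedAt = now();
      pending = fetcher().catch((err) => {
        pending = null; // do not cache failures
        throw err;
      });
    }
    return pending;
  };
}

// Example wiring (placeholders for your own region and key):
// const getToken = createTokenCache(() => fetchSpeechToken("westeurope", process.env.SPEECH_KEY));
```

A route on your server could return `await getToken()` together with the region, and the browser can then build its configuration with the SDK's `SpeechConfig.fromAuthorizationToken(token, region)` instead of `fromSubscription`. Caching the promise rather than the string means concurrent requests during a refresh share one `issueToken` call.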
Side note 1: be careful about the maximum number of concurrent requests (100; see here).
Side note 2: this will not help you bill clients according to their usage, since you have just one key and there is no way to distinguish individual clients' usage under it.

Related

Best practices to follow when building API service to be used by customers

Throughout my career, I've relied on and used various API services in my projects. I've seen multiple mechanisms for securing these APIs, but the most common one seems to be API keys.
I am now planning to build my own API service and, being unfamiliar with the security side of this, I have a few questions:
So far, what I've gathered is to do the following: create an API key, store its hash in the DB, show the API key to the user only once, check for the API key in requests, and rate-limit based on it.
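The storage scheme described here might look like the following in Node.js, using only the built-in `crypto` module. The function names are hypothetical; a single SHA-256 hash is assumed to be acceptable because the key is long and random, unlike a user-chosen password.

```javascript
const crypto = require("crypto");

// Hash used for storage and lookup. A fast hash is assumed to be fine here
// because the key is 256 bits of randomness, not a guessable password.
function hashApiKey(key) {
  return crypto.createHash("sha256").update(key).digest("hex");
}

// Generate a random key; show `key` to the user once and store only `hash`.
function createApiKey() {
  const key = crypto.randomBytes(32).toString("hex"); // 64 hex characters
  return { key, hash: hashApiKey(key) };
}

// Compare hashes in constant time so timing does not reveal partial matches.
function verifyApiKey(presentedKey, storedHash) {
  const a = Buffer.from(hashApiKey(presentedKey), "hex");
  const b = Buffer.from(storedHash, "hex");
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

In practice you would look up the record by `hashApiKey(presentedKey)` and apply the rate limit to that record. This covers only storage and lookup; it does not address the exposure concern raised next.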
But the above raises one concern: if someone were to inspect a customer's website, they could easily get this API key (if the customer is calling the API directly from their front end) and abuse it, correct? This could take the form of constantly hitting rate limits or sending bad data to the customer's dashboard.
I feel like I am missing a few key parts here and would appreciate it if someone could outline best practices for how this is done nowadays in NodeJS. Thank you.
EDIT: Users of such a service would be developers utilizing this API in their products.

Connect Google Calendar API and api.ai

So here is what I am trying to do:
I built a bot with api.ai for my business that is hosted on my webpage and my Facebook page right now. The bot works well.
I want to take it to the next step by allowing my customers to make queries on my calendar: ask to book a specific time, see if it is available, and if not, offer other similar times, then make a booking.
I have been reading this thread and the great answer attached to it, but I think my case is a bit different.
I was wondering whether the bot could always hold a token, so that guests won't have to authenticate to query the calendar.
Obviously I am new to this; I have been reading the guides for the Google Calendar API and api.ai, but I don't really see how to do that yet. I guess there is a way to store a token somewhere and then just trigger the query with some specific intents, but I'm not too sure how.
I have also done the Node.js quickstart guide for the G-Calendar API, and it works fine, if that helps.
Thanks for your help!
You will probably want to use a Service Account that has been given access to the calendar in question. Service Accounts are similar to regular accounts, but they are intended for server-to-server communication only, so the method of creating an auth token is a little different to keep it secure.
See https://developers.google.com/identity/protocols/OAuth2ServiceAccount for more information about using Service Accounts.
In general, you'll use the service account's private key to create and sign a JSON Web Token (JWT) that you send to Google's servers. You'll get back an access token, which you then use to call the Calendar API. The access token expires in about an hour, at which point you'll need to repeat the process.
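A sketch of the JWT step using Node's built-in `crypto` (Node 16+ for `base64url`, Node 18+ for `fetch`). It assumes the `client_email` and `private_key` fields from the service account's JSON key file and the full Calendar scope; the function names are hypothetical, and a client library handles all of this for you.

```javascript
const crypto = require("crypto");

// JWT segments are base64url-encoded without padding.
function b64url(text) {
  return Buffer.from(text).toString("base64url");
}

// Build the RS256-signed assertion that a service account exchanges for an
// access token. clientEmail/privateKeyPem come from the JSON key file.
function createServiceAccountAssertion(clientEmail, privateKeyPem, nowSeconds = Math.floor(Date.now() / 1000)) {
  const header = { alg: "RS256", typ: "JWT" };
  const claims = {
    iss: clientEmail,
    scope: "https://www.googleapis.com/auth/calendar",
    aud: "https://oauth2.googleapis.com/token",
    iat: nowSeconds,
    exp: nowSeconds + 3600, // at most one hour after iat
  };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(claims))}`;
  const signature = crypto.sign("sha256", Buffer.from(signingInput), privateKeyPem);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Exchange the assertion for an access token at Google's token endpoint.
async function fetchAccessToken(assertion) {
  const res = await fetch("https://oauth2.googleapis.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "urn:ietf:params:oauth:grant-type:jwt-bearer",
      assertion,
    }),
  });
  if (!res.ok) throw new Error(`token request failed: ${res.status}`);
  return res.json(); // { access_token, expires_in, token_type }
}
```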
There are libraries available to do much of this for you. For example, if you're using the Node.js library https://github.com/google/google-api-nodejs-client, it will take care of this for you (although you need to modify the key file; see the documentation for details).

How do I identify two requests from the same source in NodeJS?

My case is simple:
I need an application-layer solution to identify, and then apply some sort of rule to, requests coming from the same origin.
If a guy requests my server from Postman, a browser, or cURL, I want to identify this guy and then do something with this information.
In my particular case, I want to blacklist a guy who has been attacking my server for some time.
Is it possible in Node/Express?
There is no uber identifier that comes with a web request that tells you who the user is behind the request, no matter how the request was initiated (browser, cURL, Postman, node.js app, PHP app, etc...).
This question comes up pretty regularly among new web developers. In the end it boils down to two things:
Requiring users to have an account and to log in to that account in order to use your service, requiring login credentials with every use of the service, and then tracking their usage to see if it meets your usage guidelines. If it does not, you can ban that account.
Rate limiting users either by account or by IP address or some combination of both. If they exceed a certain rate limit, you can slow them down or deny access.
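A minimal in-memory, fixed-window rate limiter keyed by account or IP address, as the second option describes. It is a single-process sketch: `createRateLimiter` and the `rateLimit` middleware are hypothetical, `req.user` assumes some earlier authentication middleware, and `req.ip` is Express's client address.

```javascript
// Fixed-window limiter: at most `limit` requests per key per `windowMs`.
// In-memory and single-process; old keys are never evicted in this sketch.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key) {
    const t = now();
    let w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 }; // start a fresh window for this key
      windows.set(key, w);
    }
    w.count += 1;
    return w.count <= limit;
  };
}

// Hypothetical Express middleware: prefer the account id, fall back to IP.
const allow = createRateLimiter({ limit: 100, windowMs: 60000 });
function rateLimit(req, res, next) {
  const key = req.user ? `acct:${req.user.id}` : `ip:${req.ip}`;
  if (!allow(key)) return res.status(429).send("Too Many Requests");
  next();
}
```

A fixed window is the simplest scheme but allows short bursts at window boundaries; with several server instances, the counters would need to live in a shared store.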
A browser sends cookies, so you can attempt to identify repeat users via browser cookies. But this can be defeated by clearing cookies. Cookies are also per-browser, so you can't correlate the same user across multiple devices or multiple browsers with a plain cookie.
cURL and Postman don't provide any identifying information by default other than the originating IP address. You can attempt to track IP addresses, but there are some issues with relying only on the IP address: corporate users may be going through a proxy that makes them all appear to come from the same IP address. If you ban one user for misbehavior, that may affect lots of other innocent users.
If you look at how Google, Facebook, etc. do this, they all require you to create some sort of account and then provide credentials for that account with every request. This allows them to track your usage and manage your traffic if needed. And, for free usage, they generally all have rate limits that restrict how frequently you can make API calls. This prevents any single user from consuming more than an appropriate share of the service's capacity, and it allows them to detect and regulate accounts that are abusing the system.
One step further than this concerns how an account is created, because you don't want an abuser to be able to just run a script every 10 minutes to automatically create a new account. There are a variety of schemes for protecting this too. The most common is simply requiring some proof that a human is involved in creating the new account (CAPTCHA, question/answer, etc.), which prevents automated account creation. Other checks can require a valid credit card, unique email address verification, etc.
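For the email-verification check, one common stateless approach is an HMAC-signed token mailed to the address and checked when the link is opened. This is a sketch with hypothetical names; `VERIFY_SECRET` is an assumed environment variable, and the fallback secret is for local testing only.

```javascript
const crypto = require("crypto");

// Assumed configuration; never ship the fallback secret.
const SECRET = process.env.VERIFY_SECRET || "dev-only-secret";

function sign(payload) {
  return crypto.createHmac("sha256", SECRET).update(payload).digest();
}

// Token = base64url("email|expiresAt") + "." + base64url(HMAC of that payload).
function createVerificationToken(email, ttlMs = 24 * 60 * 60 * 1000, now = Date.now()) {
  const payload = Buffer.from(`${email}|${now + ttlMs}`).toString("base64url");
  return `${payload}.${sign(payload).toString("base64url")}`;
}

// Returns the verified email, or null if the token is forged or expired.
function verifyVerificationToken(token, now = Date.now()) {
  const [payload, mac] = token.split(".");
  if (!payload || !mac) return null;
  const expected = sign(payload);
  const given = Buffer.from(mac, "base64url");
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  const decoded = Buffer.from(payload, "base64url").toString();
  const sep = decoded.lastIndexOf("|"); // expiry is always the last field
  const email = decoded.slice(0, sep);
  const expiresAt = Number(decoded.slice(sep + 1));
  return now < expiresAt ? email : null;
}
```

Because nothing is stored, such a token cannot be revoked or made single-use; storing issued tokens (or a nonce) would be needed for that.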

Limit the Outbound Data Transfer of a Video in a Given Timespan

I've started publishing videos using Azure Media Services.
The cost of experimenting is reasonable. To start I've added one 30 second video. If nobody watches it, this will cost less than a penny per month. If it receives 1300 monthly views, it will cost only $1.00/month.
My concern is a malicious user who might rack up views. That could cost a fortune in outbound data transfer fees.
So, I need to limit views. I would like a data transfer limit that is both per video and per time frame. For instance, I would like to limit each video to 10 views per hour.
I'm afraid a simple spending limit won't work, because my Azure account hosts other services. Those may need to scale beyond the outbound limit for a video.
You can try to achieve your scenario with the Azure Media Services content protection functionality.
Before the user plays back the video, they get a JWT, and the video is configured to use token authentication. Only logged-in users, or users who get a token by solving some simple challenge (CAPTCHA or promo code), will be able to watch your video.
Pricing is $0.10 per 100 delivered keys, so 1300 monthly users will cost you $1.30.
With a JWT, you can configure the token expiration and have additional logic in your app regarding who is able to get a new JWT.
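The token-issuing logic could also enforce the question's per-video limit (for example 10 views per hour) before signing anything. This generic HS256 JWT sketch is not the Azure Media Services SDK: it assumes your content key policy accepts symmetric-key tokens with the given issuer and audience, and any extra claims your policy requires are not shown. All names are hypothetical.

```javascript
const crypto = require("crypto");

const b64json = (obj) => Buffer.from(JSON.stringify(obj)).toString("base64url");

// Sign a short-lived HS256 JWT. issuer/audience/key are placeholders that must
// match whatever your content key policy is configured to accept.
function signPlaybackToken({ issuer, audience, key, ttlSeconds, nowSeconds }) {
  const header = b64json({ alg: "HS256", typ: "JWT" });
  const claims = b64json({ iss: issuer, aud: audience, exp: nowSeconds + ttlSeconds });
  const sig = crypto.createHmac("sha256", key).update(`${header}.${claims}`).digest().toString("base64url");
  return `${header}.${claims}.${sig}`;
}

// Hand out at most `maxPerHour` tokens per video per clock hour.
// Old hour buckets are never cleaned up in this sketch.
function createPlaybackTokenIssuer({ maxPerHour, ...signOptions }) {
  const issued = new Map(); // `${videoId}:${hour}` -> tokens issued
  return function issue(videoId, nowSeconds = Math.floor(Date.now() / 1000)) {
    const bucket = `${videoId}:${Math.floor(nowSeconds / 3600)}`;
    const count = issued.get(bucket) || 0;
    if (count >= maxPerHour) return null; // quota reached: refuse playback
    issued.set(bucket, count + 1);
    return signPlaybackToken({ ...signOptions, nowSeconds });
  };
}
```

This caps tokens handed out, not bytes delivered; a token may be reusable until its `exp`, so a short `ttlSeconds` keeps the two close.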
Code samples showing how to configure token authentication can be found at https://github.com/Azure/azure-media-services-samples/tree/master/KDWithADMVC, or you can look at the tests associated with JWT usage in the Azure Media Services .NET SDK repository (see the GetHlsKeyDeliveryUrlAndFetchKeyWithJWTAuthentication test).

How to secure account creation via (private) API?

Some time ago, it was commonplace for smartphone apps to open a browser to a registration page with a CAPTCHA, or to require separate signup via web, because API signup was seen as vulnerable.
Now most apps seem to offer registration via native form, though endpoints for this are usually not documented in their public API. I haven't seen many reports of this being abused to create spam accounts.
How is this done? Is there a standard crypto/handshake process to verify real signups, or does signup typically rely on undocumented endpoints and simple API key passing?
Embedding yields a better experience but has the issue you mention. Yes, the service owners on the other end are still worried about this and combating the problem. Undocumented APIs don't help, and the service owners know this.
One of the tools in the toolbox these days is keys assigned to devices, which can be used for throttling. This essentially lets you limit the amount of service that can be consumed on a per-device basis, and it requires that a client have a device (or steal the key from one) in order to obtain service. As long as the process for issuing keys to new devices is strong (a solvable problem), you can offer a CAPTCHA-free signup experience within the confines of what you are willing to give to a device.
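Per-device throttling could be sketched with a token bucket per issued device key. The token bucket is a swapped-in choice of algorithm, not something the answer specifies, and `issueDeviceKey` is a hypothetical stand-in for whatever strong issuance process you use.

```javascript
const crypto = require("crypto");

// Token bucket per device key: a device may burst up to `capacity` requests,
// then earns `refillPerSecond` new tokens per second.
function createDeviceThrottle({ capacity, refillPerSecond, now = Date.now }) {
  const buckets = new Map(); // device key -> { tokens, last }

  // Placeholder for the issuance step; real gating (attestation, sign-in,
  // carrier handshake, etc.) would happen before a key is handed out.
  function issueDeviceKey() {
    const key = crypto.randomBytes(16).toString("hex");
    buckets.set(key, { tokens: capacity, last: now() });
    return key;
  }

  function tryConsume(deviceKey) {
    const b = buckets.get(deviceKey);
    if (!b) return false; // unknown key: never issued to a device
    const t = now();
    // Add tokens earned since the last request, capped at capacity.
    b.tokens = Math.min(capacity, b.tokens + ((t - b.last) / 1000) * refillPerSecond);
    b.last = t;
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }

  return { issueDeviceKey, tryConsume };
}
```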
I'd also note that there are other well-known approaches you can use, like IP throttling and handshakes with other service providers (such as a phone carrier). Depending on the problem domain, these are on the table too.
