Costs for the Microsoft Azure Translator API 3.0

My question does not concern programming but the cost of the Microsoft Azure Translator API, version 3.0.
Sorry if this is the wrong place for the question, but maybe someone can help me; unfortunately I could not find any exact information online.
I am wondering whether you pay per input or per output character count.
In other words, does translating from one input language into multiple output languages (i.e. with "&to=de&to=en") cost more than translating into a single output language? (I use the S1 pricing tier.)
Thanks already for the help!

The pricing is here:
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/translator-text-api/
The link for the calculator is this one:
https://azure.microsoft.com/en-us/pricing/calculator/?service=cognitive-services
Select 'Translator Text' and a unit.
As for the pricing itself: for most Western languages, the difference between billing on input versus output character counts would only be in the -30% to +30% range.
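For concreteness, here is a minimal sketch of a v3.0 request with two target languages, as in the question's "&to=de&to=en". It assumes the Python requests library and placeholder key/region values, and it only shows how the request is formed, not how it is billed; the response contains one translation per "to" value, while the input text is submitted once.
import requests

# Placeholders: substitute your own Translator resource key and region.
SUBSCRIPTION_KEY = "<your-translator-key>"
REGION = "<your-resource-region>"

endpoint = "https://api.cognitive.microsofttranslator.com/translate"
params = {"api-version": "3.0", "from": "fr", "to": ["de", "en"]}  # two target languages
headers = {
    "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
    "Ocp-Apim-Subscription-Region": REGION,
    "Content-Type": "application/json",
}
body = [{"Text": "Bonjour tout le monde"}]

response = requests.post(endpoint, params=params, headers=headers, json=body)
for item in response.json():
    for translation in item["translations"]:
        print(translation["to"], "->", translation["text"])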

Here are links to find what you need:
API Management: https://azure.microsoft.com/en-us/pricing/details/api-management/
API Text Translator: https://azure.microsoft.com/en-us/pricing/details/cognitive-services/translator-text-api/
There is also a "Calculator" link at the bottom of those pages that can assist you in estimating costs.
Pricing is calculated in units; put simply, a unit is a package of capacity. For API Management, at the time I looked it up, one unit corresponds to 2,500, meaning 2,500 requests per second.
I hope this clarifies it better.

Related

How to get unsampled data from the Google Analytics API for a specific day

I am building a package that uses the Google Analytics API for Python.
In several cases, when I request multiple dimensions, the day-level extraction is sampled.
I know that setting sampling_level = LARGE gives a more accurate sample, but does anyone know a way to reduce a request so that a single day can be extracted without sampling?
Grateful for any help.
Setting the sampling level to LARGE is the only method we have to influence the amount of sampling, but as you already know this doesn't prevent it.
The only way to reduce the chances of sampling is to request less data. A reduced number of dimensions and metrics, as well as a shorter date range, are the best ways to ensure that you don't get sampled data.
This is probably not the answer you want to hear, but one way of getting unsampled data from Google Analytics is to use unsampled reports. However, this requires that you sign up for Google Marketing Platform. With these you can create an unsampled report request using the API or the UI.
There is also a way to export the data to BigQuery, but you lose the analysis that Google provides and will have to do that yourself. This too requires that you sign up for Google Marketing Platform.
There are several tactics for building unsampled reports; the most popular is splitting your report into shorter time ranges, down to single days or even hours. Mark Edmondson did great work on anti-sampling in his R package, so you might find it useful. You may start with this blog post: https://code.markedmondson.me/anti-sampling-google-analytics-api/ A rough sketch of the day-by-day splitting approach follows below.
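As an illustration of that date-splitting tactic, independent of any particular client library, the Python sketch below assumes a hypothetical fetch_report(start, end) wrapper around whatever Reporting API call you already make, and simply issues one request per day before concatenating the rows:
from datetime import date, timedelta

def fetch_report(start, end):
    """Hypothetical wrapper around your existing Reporting API call
    (e.g. a request with sampling_level = LARGE). Replace with real code."""
    raise NotImplementedError

def fetch_unsampled(start, end):
    # Query one day at a time; smaller date ranges lower the chance of sampling.
    rows = []
    day = start
    while day <= end:
        rows.extend(fetch_report(day, day))
        day += timedelta(days=1)
    return rows

# Example usage (with a real fetch_report in place):
# data = fetch_unsampled(date(2019, 1, 1), date(2019, 1, 31))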

Mobile Data Analysis in Excel

I collected mobile data consumption using Data Usage on Android, spread over the days of the week (Monday to Sunday). I want to analyse two apps, Facebook and Messenger, to check whether there was a significant difference in data usage depending on the day of the week. Should I be using a t-test or some other method? What's the best method that can be used in Excel to analyse this?
P.S. Help will be much appreciated. Thanks.
If you believe your data is normally distributed, then statistically speaking it sounds like you want a t-test: you don't know the population's standard deviation, so that would be my choice. However, the data should be collected over at least 30 weeks if you want the figures for each weekday to be reasonably reliable.
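In Excel itself you could apply the T.TEST function to the two ranges; if you would rather sanity-check the numbers outside Excel, here is a small sketch of a two-sample t-test comparing the two apps (or any two groups of days). The figures and the use of scipy are my own assumptions, purely for illustration:
from scipy import stats

# Made-up daily data usage in MB for the two apps - for illustration only.
facebook  = [120, 95, 130, 110, 105, 98, 140, 125, 101, 118]
messenger = [60, 75, 58, 80, 66, 72, 69, 77, 64, 70]

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(facebook, messenger, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")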

Compare two web pages (A/B testing) - two-sample proportion test

I have two changes on my web page but I'm monitoring a bunch of variables. So what I'm able to extract from my website monitoring experiment is as follows:
Original solution: Visitors, body link click-visitors, most popular click-visitors, share-visitors.
Solution with some change: Visitors, body link click-visitors, most popular click-visitors, share-visitors.
I was wondering about a simple two-sample proportion test: take each of the monitored variables and compute a proportion test for the original and the changed solution.
I don't know whether that tells me something about the overall result, i.e. whether the original solution is better than the solution with the change or not.
Is there something better I can use for this purpose? I'll appreciate any advice.
Sounds to me like you're confusing two things: the business metric of interest and the test for statistical significance. The former is some business measurement that you would like to improve. This could be sales, conversion, subscription rate, or many others. See e.g. this paper for a good discussion on the perils of using the wrong metric. Statistical significance is a test that tells you whether the number of measurements you've seen so far is enough to substantiate the claim that the difference between the two experiences is very unlikely to be random. See e.g. this paper for a good discussion.
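If you do run the per-variable proportion tests the question describes, a sketch in Python would look like the following; statsmodels is assumed to be available and the visitor counts are placeholders:
from statsmodels.stats.proportion import proportions_ztest

# Placeholder numbers: body-link clickers out of total visitors, per variant.
clicks   = [340, 395]     # original page, changed page
visitors = [4000, 4100]

# Two-sided two-proportion z-test for this single monitored variable.
z_stat, p_value = proportions_ztest(count=clicks, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# Repeat for the other monitored variables (most-popular clicks, shares, ...),
# but decide up front which one is your primary business metric.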

FourSquare vs. Google Places vs. Yelp API

I am trying to create an app that will help users find restaurants/movie theaters/malls/etc. to hang out at, based on ratings and distance. Beyond the place itself, I would also like more detailed information about it. For example, if I were to look for parks, I would also like to know whether there's a basketball or tennis court there. Ratings and popularity would also be important for prioritizing suggestions.
After looking through all three APIs, I could not really find any substantial differences other than their search limits. Could anyone differentiate the APIs for me, or maybe even recommend one based on my specific need?
Thanks!
The Foursquare API would fit this use case perfectly because you can supply very specific filters through the API. Also, they have extensive coverage around the world, unlike Google or Yelp.
I would check out the venues/explore endpoint and use a categoryId of Parks. You can use a query parameter of "basketball" or "tennis" to find parks that have courts for these.
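A rough sketch of such a call against the (older) v2 venues/explore endpoint in Python is below; the credentials, coordinates, and the Parks categoryId are placeholders you would substitute from your own Foursquare app and the category list:
import requests

# Placeholders: use your own Foursquare app credentials and the real Parks categoryId.
params = {
    "client_id": "<your-client-id>",
    "client_secret": "<your-client-secret>",
    "v": "20180323",                 # API version date
    "ll": "40.7484,-73.9857",        # latitude,longitude to search around
    "query": "basketball",           # find parks that mention basketball courts
    "categoryId": "<parks-category-id>",
    "limit": 10,
}
resp = requests.get("https://api.foursquare.com/v2/venues/explore", params=params)
for item in resp.json()["response"]["groups"][0]["items"]:
    venue = item["venue"]
    print(venue["name"], venue.get("rating"))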

How to detect near duplicate rows in Azure Machine Learning?

I am new to Azure Machine Learning. We are trying to implement a question-similarity algorithm using Azure Machine Learning. We have a large set of questions and answers. Our objective is to identify whether newly added questions are duplicates or not, just like Stack Overflow suggests existing questions when we ask new ones. Can we use Azure Machine Learning services to solve this? Can someone guide us in the right direction?
Yes you can use Azure Machine Learning studio and could use the method Jennifer proposed.
However, I would assume it is much better to run an R script against a database containing all current questions in your experiment and return a similarity metric for each comparison.
Have a look at the following paper for some examples (from simple/basic to more advanced) how you could do this:
https://www.researchgate.net/publication/4314910_Question_Similarity_Calculation_for_FAQ_Answering
A simple way to start would be to implement a plain "bag of words" comparison. This will yield a distance matrix that you could use for clustering, or use directly to give back similar questions. The following R code does such a thing: in essence, you build a vector of strings with the new question first, followed by all known questions. This method will, obviously, not really take the meaning of the questions into consideration and will just trigger on shared word usage.
library(tm)
library(Matrix)
# strings.with.all.questions: character vector with the new question first,
# followed by all existing questions
x <- TermDocumentMatrix( Corpus( VectorSource( strings.with.all.questions ) ) )
# convert to a sparse matrix so distances between questions can be computed
y <- sparseMatrix( i=x$i, j=x$j, x=x$v, dimnames = dimnames(x) )
# cluster the questions by word-usage distance and plot the dendrogram
plot( hclust(dist(t(y))) )
Yes, you can definitely do this with Azure Machine Learning. It sounds like you have a clustering problem (you are trying to group together similar questions).
There is a "Clustering: Find similar companies" sample that does a similar thing at https://gallery.cortanaanalytics.com/Experiment/60cf8e46935c4fafbf86f669121a24f0. You can read the description on that page and click the "Open in Studio" button in the right-hand sidebar to actually open the workspace in Azure Machine Learning Studio. In that sample, they are finding similar companies based on the text from the company's Wikipedia article (for example: Microsoft and Apple are similar companies because the word "computer" appears a lot in both articles). Your problem is very similar except you would use the text in your questions to find similar questions and cluster them into groups accordingly.
In k-means clustering, "k" is the number of clusters that you want to form, so this number will probably be pretty big for your specific problem. If you have 500 questions, perhaps start with 250 centroids? But mess around with this number and see what works. For performance reasons, you might want to start with a small dataset for testing and then run all of your data through the model after it seems to be grouping well.
Also, the documentation for K-means clustering is here.
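Outside of Azure Machine Learning Studio, the same bag-of-words-plus-k-means idea can be sketched in Python with scikit-learn; the questions and the cluster count below are made up purely for illustration:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Made-up corpus: your existing questions plus the newly added one.
questions = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "How can I change my account password?",
    "Where can I see my past orders?",
]
new_question = "I forgot my password, how do I change it?"

# Bag-of-words / TF-IDF representation of all questions.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(questions + [new_question])

# Cluster; with a real corpus k would be much larger (see the answer above).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("new question lands in cluster:", kmeans.labels_[-1])
print("existing questions in that cluster:",
      [q for q, lab in zip(questions, kmeans.labels_[:-1]) if lab == kmeans.labels_[-1]])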
