What is the relationship between bandwidth and CDN in Windows Azure?
Let's say I have 3 MB of content seen by 100,000 users monthly = 300 GB of bandwidth without a CDN.
If I want to use their CDN, how does this work? Is bandwidth also calculated for feeding the various CDN nodes (i.e. 3 MB × number of nodes)? And from there on, is the price calculated at the CDN rate?
The CDN bills for egress in two places: first to fill the cache, and second to serve the resource. You are also billed for transactions. Here is an example:
You have an icon, 'icon.png', that will be served from the CDN. It is 1 KB and cached for a long time. 1M users hit it from every location in the world.
In this scenario, you would be billed for the bandwidth from blob storage to each CDN location used (there were 26 or so locations not too long ago). That would be 26 × 1 KB, or 26 KB of egress, plus 26 transactions from blob storage, one to each location. You would then serve the file 1M times: about 1 GB of bandwidth and 1M transactions. Your total charge would be 1 GB of bandwidth (broken up by region prices) + 1M transactions + 26 transactions to fill the cache + 26 KB of bandwidth (again, by region).
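To make the arithmetic concrete, here is a rough sketch of that first example in TypeScript. The node count (26) is the figure assumed in the answer, and the output is only the raw egress/transaction quantities, not priced amounts:

```typescript
// Rough quantities for the icon example above.
// The node count (26) is an assumption carried over from the answer.
const fileKB = 1;                 // icon.png size
const cdnNodes = 26;              // assumed number of CDN edge locations
const requests = 1_000_000;       // monthly hits

const cacheFillKB = fileKB * cdnNodes;              // 26 KB of egress from blob storage
const cacheFillTx = cdnNodes;                       // 26 transactions to fill the caches
const serveGB = (fileKB * requests) / 1024 / 1024;  // ~0.95 GB served from the CDN
const serveTx = requests;                           // 1M CDN transactions

console.log({ cacheFillKB, cacheFillTx, serveGB: serveGB.toFixed(2), serveTx });
```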
The CDN is good for serving data that does not change frequently. This is not a bad deal at all and a great use of the CDN. However, if you introduce the added complexity of frequently expiring objects, you will hit the case where you need to repopulate the cache. Final example: you have 1 MB that changes frequently (every 15 minutes) and 100K users requesting it over a month from around the world. Here is what you would be billed:
1 MB × 26 CDN nodes × 4 updates/hr × 24 hrs ≈ 2.5 GB/day (~73 GB/month) of egress to populate the caches
100K × 1 MB ≈ 97 GB of bandwidth to serve the actual file
100K transactions for serving, plus the transactions for filling the cache
In this case, you can see that you spend almost as much money filling the caches as you do serving the file itself. It might be better to just serve from blob storage here, assuming latency is not a huge factor.
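The same kind of back-of-the-envelope calculation for the frequently-expiring file, again assuming 26 edge locations and a 30-day month:

```typescript
// Quantities for the frequently-changing 1 MB file (15-minute expiry).
// 26 edge locations and a 30-day month are assumptions, as above.
const fileMB = 1;
const cdnNodes = 26;
const refreshesPerDay = 4 * 24;   // every 15 minutes
const daysPerMonth = 30;
const requests = 100_000;         // monthly downloads

const fillPerDayGB = (fileMB * cdnNodes * refreshesPerDay) / 1024;  // ~2.44 GB/day
const fillPerMonthGB = fillPerDayGB * daysPerMonth;                 // ~73 GB/month
const serveGB = (fileMB * requests) / 1024;                         // ~97.7 GB/month

console.log(fillPerDayGB.toFixed(2), fillPerMonthGB.toFixed(1), serveGB.toFixed(1));
```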
In the high-level service description, Microsoft mentions that I can stream millions of events per second and that the service is highly scalable:
Event Hubs is a fully managed, real-time data ingestion service that’s simple, trusted, and scalable. Stream millions of events per second from any source
https://azure.microsoft.com/en-us/services/event-hubs/
But when I go to the official documentation, the maximum throughput unit (TU) limit is 20, which translates into 1,000 events/sec per TU × 20 TUs = 20,000 events per second:
Event Hubs traffic is controlled by throughput units. A single throughput unit allows 1 MB per second or 1000 events per second of ingress and twice that amount of egress. Standard Event Hubs can be configured with 1-20 throughput units, and you can purchase more with a quota increase support request.
https://azure.microsoft.com/en-us/services/event-hubs/
How do 20 TUs translate into streaming millions of events?
You can increase beyond 20 TUs by raising a support request.
But if you need to go much higher, you can also use Dedicated clusters for Event Hubs.
Two important notes from the docs:
A Dedicated cluster guarantees capacity at full scale, and can ingress up to gigabytes of streaming data with fully durable storage and sub-second latency to accommodate any burst in traffic.
At high ingress volumes (>100 TUs), a cluster costs significantly less per hour than purchasing a comparable quantity of throughput units in the Standard offering.
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-dedicated-overview
The throughput capacity of Event Hubs is controlled by throughput units. Throughput units are pre-purchased units of capacity. A single throughput unit lets you: Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first). Egress: Up to 2 MB per second or 4096 events per second.
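For a rough feel of the sizing, here is a small sketch that estimates how many TUs a given ingress rate would need, based on the quoted limits (1 MB/s or 1,000 events/s per TU, whichever is hit first). The event size in the example is an illustrative assumption:

```typescript
// Estimate TUs needed for a target ingress rate, per the quoted per-TU limits:
// 1 MB/s OR 1,000 events/s, whichever limit is reached first.
function requiredTUs(eventsPerSecond: number, avgEventBytes: number): number {
  const mbPerSecond = (eventsPerSecond * avgEventBytes) / (1024 * 1024);
  const tusByCount = eventsPerSecond / 1000; // 1,000 events/s per TU
  const tusBySize = mbPerSecond;             // 1 MB/s per TU
  return Math.ceil(Math.max(tusByCount, tusBySize));
}

// One million 500-byte events per second would need roughly 1,000 TUs --
// far beyond the 20 available by default on Standard, which is where
// Dedicated clusters come in.
console.log(requiredTUs(1_000_000, 500)); // 1000
```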
I am inserting data into Azure Cosmos DB. After some time it throws an error (Request Timeout: 408). I have increased the request timeout to 10 minutes.
Also, I iterate over each item from the API and call the CreateItemAsync() method instead of using the bulk executor.
Data to insert = 430K items
Microsoft.Azure.Cosmos SDK used = v3
Container throughput = 400 RU/s
Can anyone help me fix this issue?
Just increase your throughput, but be aware that it's going to cost you a lot of money if you leave it increased. 400 RU/s isn't going to cut it unless you batch your operations to the point where it takes a very long time to insert 430K items.
If this is a one-time load, increase your RU/s to 2,000+, then start slowly inserting items. Depending on the size of your documents, maybe do 50 at a time, wait 250 milliseconds, then do 50 more until you are done. You will have to experiment with this, though.
Once you are done, move your RU/s back down to 400.
Cosmos DB can be ridiculously expensive, so be careful.
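The question uses the .NET v3 SDK's CreateItemAsync(), but as a rough illustration of the "50 items, then pause" pattern described above, here is a sketch using the JavaScript/TypeScript SDK (@azure/cosmos). The endpoint, key, database and container names are placeholders:

```typescript
import { CosmosClient } from "@azure/cosmos";

// Placeholders -- substitute your own account, key, database and container.
const client = new CosmosClient({
  endpoint: "https://<account>.documents.azure.com",
  key: "<key>",
});
const container = client.database("mydb").container("mycontainer");

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Insert in small batches and pause between them so the provisioned RU/s
// (e.g. 2,000) is not exceeded and requests stop timing out.
async function insertInBatches(items: any[], batchSize = 50, pauseMs = 250) {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    await Promise.all(batch.map((item) => container.items.create(item)));
    await sleep(pauseMs);
  }
}
```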
ETA:
This is from some documentation:
Increase throughput: The duration of your data migration depends on the amount of throughput you set up for an individual collection or a set of collections. Be sure to increase the throughput for larger data migrations. After you've completed the migration, decrease the throughput to save costs. For more information about increasing throughput in the Azure portal, see performance levels and pricing tiers in Azure Cosmos DB.
The documentation page for 408 timeouts lists a number of possible causes to investigate.
Aside from addressing the root cause in the SDK client app or increasing throughput, you might also consider using Azure Data Factory to ingest the data, as in this example. This assumes your data load is an initialization process and your data can be made available as a blob file.
Some of the articles I have read suggest that items cached by a service worker (the web Cache API) are stored in the browser forever.
I have come across a scenario where some cached resources were evicted automatically for users who revisit my website after a long time (more than ~2 months).
I know for a fact that assets cached via HTTP caching are removed by the browser after a certain time. Does the same apply to service workers too?
If that is the case, how does the browser decide which assets to remove, and is there a way to tell the browser that if it removes something from the cache, it should remove everything cached under the same cache name?
It seems it lasts forever, until it doesn't :) (i.e. when storage space runs low).
https://developers.google.com/web/ilt/pwa/caching-files-with-service-worker
You are responsible for implementing how your script (service worker) handles updates to the cache. All updates to items in the cache must be explicitly requested; items will not expire and must be deleted. However, if the amount of cached data exceeds the browser's storage limit, the browser will begin evicting all data associated with an origin, one origin at a time, until the storage amount goes under the limit again. See Browser storage limits and eviction criteria for more information.
If their storage is running low, then it may be evicted (see the section on storage limits):
https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Browser_storage_limits_and_eviction_criteria
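As a sketch of what you can do from the page itself: you can ask the browser to treat the origin's storage as persistent (which makes eviction under storage pressure less likely, though it is not guaranteed), and you can drop an entire named cache in one call. The cache name here is hypothetical:

```typescript
// Runs in the page (window) context, not inside the service worker.
// "my-site-v1" is a hypothetical cache name.
async function requestPersistence() {
  // Ask the browser to treat this origin's storage as persistent,
  // which reduces the chance of eviction under storage pressure.
  if (navigator.storage && navigator.storage.persist) {
    const persisted = await navigator.storage.persist();
    console.log("Persistent storage granted:", persisted);
  }
}

// Deleting a cache by name removes every entry stored under that name at once,
// which is the closest app-controlled equivalent of "remove everything cached
// with the same cache name".
async function dropWholeCache(cacheName: string) {
  const deleted = await caches.delete(cacheName);
  console.log(`Cache "${cacheName}" deleted:`, deleted);
}

requestPersistence();
dropWholeCache("my-site-v1");
```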
We have lots of images in Azure Blob Storage (LRS, Hot tier). We estimate around 15 million downloads per month for a total of 5,000 GB of egress (files are on average 350 kB). I can calculate the price for the Blob Storage, but the cost of the Functions proxy is unknown. The Azure Functions pricing document doesn't say anything about proxies, and specifically nothing about bandwidth.
Question 1: Are these calculations correct?
Execution count price is €0,169 per million executions, which equals 15 × €0,169 = €2,54/month.
GB-s price is €0,000014/GB-s and memory usage is rounded to the nearest 128 MB. If the file download time is 0,2 s and memory is 128 MB, we have 0,2 × (128/1024) × 15 000 000 × €0,000014 = €5,25/month.
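For reference, a tiny sketch that reproduces those two calculations (using the prices quoted in the question, which may be out of date):

```typescript
// Consumption-plan arithmetic from the question; prices are the quoted ones.
const executions = 15_000_000;
const pricePerMillionExec = 0.169;  // EUR per million executions
const priceGbS = 0.000014;          // EUR per GB-s
const memoryGB = 128 / 1024;        // rounded to 128 MB
const durationS = 0.2;              // assumed download time per request

const execCost = (executions / 1_000_000) * pricePerMillionExec; // ~2.54 EUR
const gbsCost = executions * durationS * memoryGB * priceGbS;    // ~5.25 EUR

console.log(execCost.toFixed(2), gbsCost.toFixed(2));
```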
Question 2: What about bandwidth? Is there any cost for that?
Q1: Mostly yes.
Azure Functions Proxies (preview) work just like regular functions, meaning that any routing done by your proxy counts as one execution. Also, just like standard functions, a proxy consumes GB-s while it's running. Your calculation approach is correct, with the caveat that reading from blob storage is actually a streaming activity, which will consume a fixed amount of memory multiplied by the time it takes each file to download.
Q2: This works the same way as Azure App Service. From the pricing page:
165 MB outbound network traffic included, additional outbound network bandwidth charged separately.
I cannot figure out the cause of the bottleneck on this site: response times become very bad once about 400 users are reached. The site runs on Google Compute Engine, using an instance group with network load balancing. We created the project with Sails.js.
I have been doing load testing with Google Container Engine using Kubernetes, running the locust.py script.
The main results for one of the tests are:
RPS: 30
Spawn rate: 5 users/s
Total users: 1000
Avg response time: 27,500 ms (27.5 seconds!)
The response time is initially great, below one second, but when the test reaches about 400 users the response time starts to jump massively.
I have tested the obvious factors that could influence the response time; results below:
Compute Engine instances (2 × standard-n2, 200 GB disk, 7.5 GB RAM per instance):
Only about 20% CPU utilization
Outgoing network: 340k bytes/sec
Incoming network: 190k bytes/sec
Disk operations: 1 op/sec
Memory: below 10%
MySQL:
Max_used_connections : 41 (below total possible)
Connection errors: 0
All other results for MySQL also seem fine, no reason to cause bottleneck.
I tried the same test with a freshly created Sails.js project, and it did better, but still had poor results: around 5 seconds response time at about 2000 users.
What else should I test? What could be the bottleneck?
Are you doing any file reading/writing? This is a major obstacle in Node.js and will always cause some issues. Caching read files, or removing the need for such code, should be done as much as possible. In my own experience, serving files like images, CSS and JS through my Node server started causing trouble as the number of concurrent requests increased. The solution was to serve all of this through a CDN.
Another problem could be the MySQL driver. We had some issues with connections not being closed correctly (not using Sails.js, but I think it used the same driver at the time I encountered this), which caused problems on the MySQL server, resulting in long delays when fetching data from the database. You should time/track the number of MySQL queries and make sure they aren't delayed.
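One way to check the MySQL angle (outside of Sails' own adapter, so treat this purely as an illustration) is to run queries through a small timing wrapper around a connection pool, for example with the mysql2 library. The pool settings and the 200 ms threshold below are arbitrary:

```typescript
import { createPool } from "mysql2/promise";

// Illustrative pool settings: bound the number of open connections
// so leaks show up as waiting rather than piling up on the server.
const pool = createPool({
  host: "localhost",
  user: "app",
  database: "appdb",
  connectionLimit: 20,
  waitForConnections: true,
});

// Wrap queries so slow ones (e.g. > 200 ms) show up in the logs --
// a cheap way to confirm whether the database is really the bottleneck.
async function timedQuery(sql: string, params: unknown[] = []) {
  const start = Date.now();
  try {
    const [rows] = await pool.query(sql, params);
    return rows;
  } finally {
    const elapsed = Date.now() - start;
    if (elapsed > 200) console.warn(`Slow query (${elapsed} ms): ${sql}`);
  }
}
```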
Lastly, it could be a specific issue with Sails.js and Google Compute Engine. You should make sure there aren't any open issues on either of these about the same problem you are experiencing.