Questions About Azure Instance Allocated Bandwidth

I know a few facts about Azure:
Each instance has limited bandwidth; for example, an Extra Small instance gets 5 Mbps.
We only pay for outbound bandwidth; inbound is free.
Traffic within the same data center is free.
Even on a small instance, the network interface I checked reports a 10 Gbps connection speed. So the allocated bandwidth is always lower than the speed the network interface itself is capable of.
I'd like to know:
Is the allocated bandwidth applied to outbound traffic only, or to all traffic?
Is it applied only to traffic leaving the data center, or to any data?
More specifically, for the answers to be useful to me:
Will the allocated bandwidth limit traffic between instances in one deployment (they will be in the same data center, on the same network)? I will exchange data between instances, so I'd like to know whether the full network interface capacity can be utilized.
Will the allocated bandwidth limit traffic between an instance and a CloudDrive in the same data center? If so, does the limit apply to reads, writes, or both? I will use CloudDrive a lot, and that bandwidth won't cost anything since they are in the same center, so I'd like to know the speed limitation.
Will the allocated bandwidth limit the instance's connections to the outside, for example, sending email through an external SMTP server?
Any official source is highly appreciated.

At the end of the day, you are mostly constrained by the NIC on the box from within the datacenter. Each of the physical machines has a 1 Gbps NIC on it, which was found to sustain about 800 Mbps of transfer. Since each host could also currently have 8 guests on it, the reserved share of the NIC was a multiple of the cores: a Small instance was reserved 100 Mbps, a Medium 200 Mbps, a Large 400 Mbps, etc. However, this does not take bursting into account. Small instances can burst to 250 Mbps or so in practice - it just depends on what your neighbors on the box are doing. The more cores you reserve, the higher you can burst, up to the max.
The XS instance size is actually limited to 5 Mbps, so it does not follow that pattern.
Again, your connection within the datacenter will mostly be limited by the NIC bandwidth. This is true for instance-to-instance and instance-to-storage traffic. For traffic out of the datacenter (where prevailing network conditions generally matter more), the NIC is still the limit, but other factors outside the datacenter are generally the bottleneck.
The sort-of exception to all this is when accessing a particular partition in storage: you can get around 60 MB/s of transfer per blob (or per partition, really), as this is limited by the rate at which a partition server will service your requests rather than by NIC speed. However, the entire storage account's limit is 3 Gbps (more than your NIC), and that can only be achieved when accessing multiple partitions.
Back in your context, that means you will get at most around 60 MB/s (480 Mbps) for a particular cloud drive. To fully saturate that, you would need a Large instance or higher. That is why I said you are mostly constrained by the NIC.
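For quick reference, here is the arithmetic behind those numbers as a small Python sketch (the reserved figures are the approximate per-size values quoted above, not official limits):

    # Convert the ~60 MB/s per-partition storage limit to Mbps and compare it
    # with the approximate reserved NIC share quoted above for each size.
    MB_PER_SEC = 60
    partition_mbps = MB_PER_SEC * 8          # 60 MB/s ~= 480 Mbps
    reserved_mbps = {"XS": 5, "Small": 100, "Medium": 200, "Large": 400, "XL": 800}
    for size, mbps in reserved_mbps.items():
        print(f"{size:>6}: reserved ~{mbps} Mbps vs. ~{partition_mbps} Mbps per partition")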

Regarding your questions:
Bandwidth throttling is applied to outbound traffic only.
It applies to any data, even if the VM talks to another VM in the same cluster.
If you have multiple instances in one deployment, bandwidth throttling will apply to the traffic between them.
See How Does Bandwidth Throttling Work in Azure? for more technical details.
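If you want to confirm the effective rate between two instances in the same deployment yourself, a raw TCP transfer is a quick check. A minimal Python sketch, assuming a hypothetical port that is open between the VMs (run receive() on one instance and send() on the other):

    # Minimal throughput probe: run receive() on one VM, send() on the other,
    # and compare the reported rate with the instance's allocated bandwidth.
    import socket
    import time

    PORT = 5001            # hypothetical port, assumed open between the VMs
    CHUNK = 64 * 1024      # 64 KiB per send

    def receive(bind_addr="0.0.0.0"):
        srv = socket.socket()
        srv.bind((bind_addr, PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        total, start = 0, time.time()
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
        secs = time.time() - start
        print(f"received {total / 1e6:.0f} MB in {secs:.1f}s "
              f"~= {total * 8 / secs / 1e6:.0f} Mbps")

    def send(peer_ip, seconds=30):
        cli = socket.socket()
        cli.connect((peer_ip, PORT))
        payload = b"\0" * CHUNK
        deadline = time.time() + seconds
        while time.time() < deadline:
            cli.sendall(payload)
        cli.close()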

Related

Choosing the right EC2 instance for three NodeJS Applications

I'm running three MEAN-stack applications. Each application receives over 10,000 monthly users. Could you please assist me in finding an EC2 instance for my apps?
I've been using a t3.large instance with two vCPUs and eight gigabytes of RAM, but it costs $62 to $64 per month.
I need help deciding which EC2 instance to use for the three Node.js applications.
First check the CloudWatch metrics for the current instance (see the sketch after the links below). Is CPU and memory usage consistent over time? Analysing the metrics can help you decide whether to select a smaller or bigger instance.
One way to avoid unnecessary costs is to use Auto Scaling groups and load balancers. By finding and applying the proper settings, you can always have the right amount of computing power for your applications.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html
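As a starting point, here is a hedged boto3 sketch for pulling the CPU metric mentioned above (the instance ID and region are placeholders; memory metrics additionally require the CloudWatch agent):

    # Pull two weeks of hourly CPU utilisation for one instance so you can
    # judge whether the t3.large is over- or under-provisioned.
    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")   # placeholder region

    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(days=14)

    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder ID
        StartTime=start,
        EndTime=end,
        Period=3600,                        # one datapoint per hour
        Statistics=["Average", "Maximum"],
    )

    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], f"avg {point['Average']:.1f}%", f"max {point['Maximum']:.1f}%")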
It depends on your applications: do they need more compute power, more memory, or more storage? Choosing a server is similar to installing an app on a system: check its basic requirements first, then choose the server.
If you have 10k+ monthly customers, think about using an ALB so that traffic gets distributed evenly. Try caching to serve some content if possible. Use the unlimited burst mode of t3 instances if the CPU keeps hitting 100%. Also, try to optimize the code so that fewer resources are consumed. Once you are comfortable with your EC2 choice, purchase Savings Plans or Reserved Instances to reduce the cost.
Also, monitor the servers and traffic using features such as the CloudWatch agent and Internet Monitor.

How to improve download speed on Azure VMs?

My organization is spinning down its private cloud and preparing to deploy a complete analytics and data warehousing solution on Azure VMs. Naturally, we are doing performance testing so we can identify and address any unforeseen issues before fully decommissioning our datacenter servers.
So far, the only significant issue I've noticed in Azure VMs is that my download speeds don't seem to change at all no matter which VM size I test. The speed is always right around 60 Mbps downstream. This is despite guidance such as this, which indicates I should see improvements in ingress based on VM size.
I have done significant and painstaking research into the issue, but everything I've read so far only really addresses "intra-cloud" optimizations (e.g., ExpressRoute, Accelerated Networking, VNet peering) or communications to specific resources. My requirement is for fast download speeds on the open internet (secure and trusted sites only). To preempt any attempts to question this requirement, I will not specifically address the reasons why I can't consider alternatives such as establishing private/dedicated connections, but suffice it to say, I have no choice but to rule out those options.
That said, I'm open to any other advice! What, if anything, can I do to improve download speeds to Azure VMs when data needs to be transferred to the VM from the open web?
Edit: Corrected comment about Azure guidance.
I finally figured out what was going on. #evilSnobu was onto something -- there was something flawed about my tests. Namely, all of them (seemingly by sheer coincidence) "throttled" my data transfers. I was able to confirm this by examining the network throughput carefully. Since my private cloud environment never provisioned enough bandwidth to hit the 50-60 Mbps ceiling that seems to be fairly common among certain hosts, it had not occurred to me that I could be throttled at a higher throughput rate. Real bummer. What this experiment did teach me is that you should NOT assume more bandwidth will solve all your throughput problems. Throttling appears to be exceptionally common, and I would suggest planning for what to do when you encounter it.
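If you hit the same symptom, one way to tell a per-host throttle from a VM-level cap is to measure downloads from several unrelated origins; if each source tops out at a similar figure regardless of VM size, suspect the remote end. A rough Python sketch (the URLs are placeholders for your own trusted test sources):

    # Download up to ~500 MB from each source and report the observed rate.
    import time
    import requests

    TEST_URLS = [
        "https://example.com/large-file-1.bin",   # placeholder
        "https://example.org/large-file-2.bin",   # placeholder
    ]

    def measure(url, chunk=1 << 20, max_bytes=500 * (1 << 20)):
        total, start = 0, time.time()
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            for block in resp.iter_content(chunk_size=chunk):
                total += len(block)
                if total >= max_bytes:
                    break
        secs = time.time() - start
        return total * 8 / secs / 1e6   # Mbps

    for url in TEST_URLS:
        print(url, f"{measure(url):.0f} Mbps")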

In Azure, is it possible to increase the NIC bandwidth limit without increasing VM size?

I see in this answer that the bandwidth limit depends on the machine size: Azure cloud service network bandwith
But is there a way to increase that limit without increasing the machine size?
No, bandwidth limits are tied to VM size.
Bandwidth is purely based on your VM size, so you will not be able to modify that. However, you can use Accelerated Networking, a feature designed to improve network performance, including latency, throughput, and CPU utilisation. While accelerated networking can improve a virtual machine's throughput, it can do so only up to the virtual machine's allocated bandwidth.
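If you do want to try it, accelerated networking is toggled per NIC. A hedged sketch with the Azure Python SDK (azure-mgmt-network); the resource names are placeholders, older SDK versions use create_or_update instead of begin_create_or_update, and depending on the VM series you may need to deallocate the VM before the change is accepted:

    # Enable accelerated networking on an existing NIC (names are placeholders).
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.network import NetworkManagementClient

    client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

    nic = client.network_interfaces.get("my-resource-group", "my-vm-nic")
    nic.enable_accelerated_networking = True
    client.network_interfaces.begin_create_or_update(
        "my-resource-group", "my-vm-nic", nic
    ).result()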

Best way to manage big files downloads

I'm looking for the best way to manage the download of my products on the web.
Each of them weighs between 2 and 20 GB.
Each of them is downloaded approximately 1 to 1,000 times a day by our customers.
I've tried Amazon S3, but the download speed isn't good and it quickly becomes expensive.
I've tried Amazon S3 + CloudFront, but the files are too large and the downloads too rare: the files didn't stay in the cache.
Also, I can't create torrent files in S3 because the files are too large.
I guess the cloud solutions (such as S3, Azure, Google Drive...) work well only for small files, such as images / CSS / etc.
Now I'm using my own servers. It works quite well, but it is really more complex to manage...
Is there a better way, a perfect way to manage this sort of downloads?
This is a huge problem and we see it when dealing with folks in the movie or media business: they generate HUGE video files that need to be shared on a tight schedule. Some of them resort to physically shipping hard drives.
When "ordered and guaranteed data delivery" is required (e.g. HTTP, FTP, rsync, nfs, etc.) the network transport is usually performed with TCP. But TCP implementations are quite sensitive to packet loss, round-trip time (RTT), and the size of the pipe between the sender and receiver. Some TCP implementations also have a hard time filling big pipes (limits on the max bandwidth-delay product; BDP = bit rate * propagation delay).
The ideal solution would need to address all of these concerns.
Reducing the RTT usually means reducing the distance between sender and receiver. Rule of thumb: reducing your RTT by half can double your max throughput (or cut your turnaround time in half). Just for context, I'm seeing an RTT from US East Coast to US West Coast of ~80-85ms.
Big deployments typically use a content delivery network (CDN) like Akamai or AWS CloudFront, to reduce the RTT (e.g. ~5-15ms). Simply stated, the CDN service provider makes arrangements with local/regional telcos to deploy content-caching servers on-premise in many cities, and sells you the right to use them.
But control over a cached resource's time-to-live (TTL) can depend on your service level agreement ($). And cache memory is not infinite, so idle resources might be purged to make room for newly requested data, especially if the cache is shared with others.
In your case, it sounds to me like you want to meaningfully reduce the RTT while retaining full control of the cache behaviour, so you can set a really long cache TTL. The best price/performance solution IMO is to deploy your own cache servers running CentOS 7 + NGINX with proxy_cache turned on and enough disk space, and deploy a cache server for each major region (e.g. west coast and east coast). Your end users could select the region closest to them, or you could add some code to automatically detect the closest regional cache server.
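The "closest region" detection can be as simple as timing a TCP connect to each cache and picking the fastest. A minimal Python sketch with placeholder hostnames:

    # Pick the regional cache with the lowest TCP connect time (hostnames are
    # placeholders for your own NGINX cache servers).
    import socket
    import time

    CACHES = {
        "us-east": "cache-east.example.com",   # placeholder
        "us-west": "cache-west.example.com",   # placeholder
    }

    def connect_time(host, port=443, timeout=3.0):
        start = time.time()
        with socket.create_connection((host, port), timeout=timeout):
            return time.time() - start

    timings = {region: connect_time(host) for region, host in CACHES.items()}
    print("serve downloads from:", min(timings, key=timings.get))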
Deploying these cache servers on AWS EC2 is definitely an option. Your end users will probably see much better performance than by connecting to AWS S3 directly, and there are no BW caps.
The current AWS pricing for your volume is about $0.09/GB for BW out to the internet. Assuming your ~50 files at an average of 10 GB, that's about $50/month for BW from the cache servers to your end users - not bad? You could start with c4.large for low/average-usage regions ($79/month). Higher-usage regions might cost you about ~$150/month (c4.xl), ~$300/month (c4.2xl), etc. You can get better pricing with spot instances, and you can tune performance based on your business model (e.g. VIP vs Best-Effort).
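For what it's worth, the back-of-envelope arithmetic behind that BW figure (assuming roughly one full copy of the ~50-file catalogue leaving AWS per month; heavier download volume scales the cost linearly):

    files, avg_gb = 50, 10
    rate_per_gb = 0.09                                     # USD per GB out to the internet
    print(f"${files * avg_gb * rate_per_gb:.0f}/month")    # ~= $45/month, i.e. about $50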
In terms of being able to "fill the pipe" and sensitivity to network loss (e.g. congestion control, congestion avoidance), you may want to consider an optimized TCP stack like SuperTCP (full disclosure: I'm the director of development). The idea here is to have a per-connection, auto-tuning TCP stack with a lot of engineering behind it, so it can fill huge pipes like the ones between AWS regions and not overreact to network loss as regular TCP often does, especially when sending to Wi-Fi endpoints.
Unlike UDP solutions, it's a single-sided install (<5 min), you don't get charged for hardware or storage, you don't need to worry about firewalls, and it won't flood/kill your own network. You'd want to install it on your sending devices: the regional cache servers and the origin server(s) that push new requests to the cache servers.
An optimized TCP stack can increase your throughput by 25%-85% over healthy networks, and I've seen anywhere from 2X to 10X throughput on lousy networks.
Unfortunately I don't think AWS is going to have a solution for you. At this point I would recommend looking into some other CDN providers like Akamai https://www.akamai.com/us/en/solutions/products/media-delivery/download-delivery.jsp that provide services specifically geared toward large file downloads. I don't think any of those services are going to be cheap though.
You may also want to look into file acceleration software, like Signiant Flight or Aspera (disclosure: I'm a product manager for Flight). Large files (multiple GB in size) can be a problem for traditional HTTP transfers, especially over large latencies. File acceleration software goes over UDP instead of TCP, essentially masking the latency and increasing the speed of the file transfer.
One negative to using this approach is that your clients will need to download special software to download their files (since UDP is not supported natively in the browser), but you mentioned they use a download manager already, so that may not be an issue.
Signiant Flight is sold as a service, meaning Signiant will run the required servers in the cloud for you.
With file acceleration solutions you'll generally see network utilization of about 80 - 90%, meaning 80 - 90 Mbps on a 100 Mbps connection, or 800 Mbps on a 1 Gbps network connection.

Azure compute instances

On Azure I can get 3 Extra Small instances for the price of 1 Small. I'm not worried about my site not scaling.
Are there any other reasons I should not go for 3 extra small instead of 1 small?
See: Azure pricing calculator.
An Extra Small instance is limited to approx. 5Mbps bandwidth on the NIC (vs. approx. 100Mbps per core with Small, Medium, Large, and XL), and has less than 1GB of RAM. So, let's say you're running something that's very storage-intensive. You could run into bottlenecks accessing SQL Azure or Windows Azure storage.
As for RAM: if you're running 3rd-party apps such as MongoDB, you'll likely run into memory issues.
From a scalability standpoint, you're right that you can spread the load across 2 or 3 Extra Small instances, and you'll have a good SLA. You just need to make sure your memory and bandwidth are good enough for your performance targets.
For more details on exact specs for each instance size, including NIC bandwidth, see this MSDN article.
Look at the fine print - the I/O performance is supposed to be much better with the Small instance compared to the Extra Small instance. I am not sure whether this is due to a technology-related bottleneck or a business decision, but that's the way it is.
Also, I'm guessing the OS takes up a bit of RAM in each instance, so with 3 Extra Small instances that overhead is paid three times instead of just once in a Small instance. That would reduce the resources actually available for your application's needs.
While 3 Extra Small instances may theoretically equal or even beat one Small instance "on paper", do remember that Extra Small instances do not have dedicated cores and their raw computing resources are shared with other tenants. I've tried these Extra Small instances in an attempt to save money on a tiny-load website and must say that there were outages and periods of horrible performance that I found unacceptable.
In short: I would not use Extra Small instances for any sort of production environment.
