Azure compute power: Extra Large VM slow


Can anyone offer me any insights into why my cloud deployment would be slower than an on-premises computer in "horsepower" terms?
I have a compute intensive application which uses a worker role to carry out millions of computations (in parallel).
Currently in Azure I'm testing using an Extra Large (8 core, 16GB) VM to do the processing. On average it's taking 45 minutes per iteration whereas the same code running on a 4 core, 8GB on-premises machine was taking only 15 minutes.
Azure logs indicate total processor utilisation is 99% but I have 12GB memory free so I'll definitely try loading more data into memory for each iteration.
Are the 8 cores just individually very low spec? Is local storage really local? That is, is local storage really on a different physical device and therefore fetching data from file and writing results to disk is slow?

Scott Guthrie (who leads the Windows Azure team) wrote to me:
Hi Ivan,
We have other VM HW configurations as well – including multi-proc and high memory options. You’ll see even more options in the future.
Hope this helps,
Scott
My test (100% of processor time):
Lucas-Lehmer math calculations. The multithreaded version uses a Parallel.For implementation.
Home computer - Core i7 3770K (4 cores x 3.5 GHz) (Win 8)
SINGLETHREADED (17 prime numbers): 11676 ms (11.6 secs.)
MULTITHREADED (17 prime numbers): 2816 ms (2.8 secs.)
Azure Large VM (4 cores x 1.6 GHz) (Win S 2008)
SINGLETHREADED (17 prime numbers): 37275 ms
MULTITHREADED (17 prime numbers): 10118 ms
Azure Extra Large VM (8 cores x 1.6 GHz) (Win S 2008)
SINGLETHREADED (17 prime numbers): 36232 ms
MULTITHREADED (17 prime numbers): 6498 ms
Work computer - AMD FX 6100 (6 cores x 3.3 GHz) (Win 7 w upd)
SINGLETHREADED (17 prime numbers): 48758 ms
MULTITHREADED (17 prime numbers): 16486 ms
Vote for this idea on first page http://www.mygreatwindowsazureidea.com/forums/34192-windows-azure-feature-voting/suggestions/3622286-upgrade-windows-azure-processor-from-1-6-ghz-to-mi
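For anyone who wants to repeat this kind of comparison on their own machines, here is a minimal sketch in Python. It is not the code behind the numbers above (that used .NET and Parallel.For); the function names, the trial-division helper and the use of a process pool are my own assumptions.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below is for odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def small_primes(limit):
    """Primes below `limit` by trial division; enough for small exponents."""
    return [n for n in range(2, limit)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


def mersenne_exponents(limit, workers=1):
    """Return (exponents p < limit with 2**p - 1 prime, elapsed seconds)."""
    candidates = small_primes(limit)
    start = time.perf_counter()
    if workers == 1:
        flags = [lucas_lehmer(p) for p in candidates]
    else:
        # One process per worker, so each test runs on its own core.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            flags = list(pool.map(lucas_lehmer, candidates))
    elapsed = time.perf_counter() - start
    return [p for p, ok in zip(candidates, flags) if ok], elapsed
```

Calling `mersenne_exponents(2300, workers=1)` and then the same limit with `workers` set to the core count gives a single-process and a multi-process timing for the same work; a limit of 2300 covers the first 17 Mersenne exponents. On Windows, the multi-process call has to sit under an `if __name__ == "__main__":` guard.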

I am experiencing the same issue. My web app with the database (on sql azure) is also really slow compared to my on-premise computer.
Local server details:
- Dell's entry-level server (< $1000), with 4 cores and 8GB memory.
- The web server runs as a VM on it
- even the DB server is on the same machine (sharing the same hardware with the web server)
Azure:
- Webrole on Extra large server with 8 cores.
- SQL Azure (I guess on the different physical server)
My expectation was that performance would improve when I deployed to Azure! :(
Guess what, it is 4 times slower (verified using profiler code that times every request).
I am disappointed; I think the 8 cores are really slow.
I ran the test on my old computer (Intel Pentium), with the same local VMs installed on it (VMware host). Even that is faster than Azure.

Couple questions in here, I'll try to answer some...
Local storage is local - means on the same disk, in a restricted area. Are you using the local storage APIs to access it? Local storage is also disposable - if your app is redeployed, all data in local storage is lost. If you are using an Azure Drive, then yes I would expect some delays since this writes to blob storage but you haven't mentioned that.
CPU spec is defined on the Azure website.
It is difficult to solve your actual slowness problem though without getting a better idea of the architecture and process your background work is following. But as a general rule, I would be surprised to see the results you are indicating. (Is your on prem machine a VM or dedicated hardware?)

I find the same thing when running analytics-heavy code (i.e. little disk usage, not too much RAM needed). I guess the problem is that they select CPUs based on price and number of cores rather than power. The theory is that you should be parallelizing your code to take advantage of all those cores, but sometimes that's hard or expensive (in coding time). Consider voting for more CPU power.

Related

Azure Functions: Is using gcServer recommended for consumption plans?

I'm going through the list of perf improvements that can be made against Cosmos DB. My APIs are hosted in a Function app in consumption mode. Is turning on gcServer recommended for Azure Functions?
There is more information on gcServer here.
For single-processor computers, the default workstation garbage
collection should be the fastest option. Either workstation or server
can be used for two-processor computers. Server garbage collection
should be the fastest option for more than two processors. Most
commonly, multiprocessor server systems disable server GC and use
workstation GC instead when many instances of a server app run on the
same machine.
How many processors run in an active instance in a consumption plan?

How many processors run in an active instance in a consumption plan?
Each instance of the Functions host in the Consumption plan is limited to 1.5 GB of memory and one CPU. So there is only 1 processor for that. For more details, you can refer to this article.

Oracle IDAM installation | system requirement

I have installed OIM [11gR2 PS2] and OAM [R2 PS2] on my PC, but the system hangs even with 12 GB of RAM.
I have a 5th-generation i3 processor with 12 GB of RAM. I use Win10 as my host OS; for the Oracle products I use a VM on which I have installed Win7 [Ultimate].
According to Oracle's prerequisite chart, 8 GB of RAM is enough to run a single instance of OIM / OAM. I have allocated almost 10.5 GB of RAM to the VM running OIM / OAM, but each time, after the admin server starts, whenever I try to start any of the managed servers, CPU consumption reaches 100% and everything hangs, and I have to shut down the VM.
The question is a basic one, but I have not found an exact answer anywhere. Looking for help/suggestions.

The memory requirement of 8 GB is the bare minimum; 16 GB is recommended. See this 11gR2 memory requirements and 11gR2 requirements. Also refer to section 3.1, Minimum Memory Requirements for Oracle Identity and Access Management, and section 3.3, Examples: Determining Memory Requirements for an Oracle Identity and Access Management Production Environment. (Even though it says Production, it is valid for your instance, since you have one VM hosting all the components, including WebLogic server, OIM server, SOA server and also OAM server.)
Here is the estimate of RAM from the above Oracle 11gR2 reference
To estimate the suggested memory requirements, you could use the following formula:
4 GB for the operating system and other software
+ 4 GB for the Administration Server
+ 8 GB for the two Managed Servers (OIM Server and SOA Server)
-----------------------------------------------------------
16 GB
With 4 GB for the OS and 4 GB for the Admin server, 8 GB of RAM is already consumed. Starting a Managed server would bring that to 12 GB, which the VM does not have... Hence as soon as you start a Managed server, all the RAM is consumed, which makes your VM hang.
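The same arithmetic as a small Python sketch, in case you want to try other allocations. The figures come from the Oracle estimate quoted above; the dictionary and function names are made up for illustration.

```python
# Figures in GB, taken from the Oracle estimate quoted above.
REQUIRED_GB = {
    "operating system and other software": 4,
    "Administration Server": 4,
    "Managed Servers (OIM Server and SOA Server)": 8,
}


def memory_shortfall(allocated_gb, required=REQUIRED_GB):
    """Return (total suggested GB, GB missing from the allocation)."""
    total = sum(required.values())
    return total, max(0.0, total - allocated_gb)


total, missing = memory_shortfall(10.5)
print(f"suggested: {total} GB, allocated: 10.5 GB, short by: {missing} GB")
```

For the 10.5 GB VM this reports a 5.5 GB gap, and that is before adding anything for the OAM server.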
As you can see, Oracle recommends 16 GB, and that is without the OAM server (which you have also installed on the same VM). So you are definitely constrained with your current 10.5 GB. Since your PC maxes out at 12 GB, I suggest you install only OIM on one VM on the current PC and OAM on a different VM on a separate PC if possible. Yes, Oracle IAM software is definitely a memory hog.
BTW, I have two suggestions for you. First, if you want to install the 11gR2 version then go for PS3 (11.1.2.3), or better, go with 12c, which is the latest; 11.1.2.2 is considered old now. Here is the link for the PS3 download. Second, consider Oracle's free downloadable pre-built VMs here, although the pre-built VMs will be on Linux.

Migrating an On Premise solution to Azure

I have an on-premise solution that consists of a server (1 machine) and X users (each on 1 machine). All the users are using the same Win32 application. The question is: how do I translate this into an Azure environment? Each of the users' machines uses 4 CPUs and 8 GB of RAM (this is necessary).
Do I have to configure a new machine with 4 CPUs and 8 GB per user, or is there a more efficient way to get this done? Otherwise this is not economically viable.
I was thinking about using XenApp and only one VM for all the users to solve this problem, but I'm not quite sure.
Any help is welcome.

You could automate the provisioning of the VMs with ARM templates. Your virtualizationception could also work. But then: how do you guarantee the 8 GB of RAM that is necessary? It would be good to have some more details about the requirements.

Cassandra SSD / VHDX

I'm about to create my first Cassandra cluster, starting from the first node :) But I've immediately run into a dilemma on how to set up the drives, so any word of advice will be appreciated. Here are the musts:
The node must run as Hyper-V (Win srv 2012 R2) VM
I have 2 SSD 256 GB drives available for it
Preferably Ubuntu 14.04 guest OS
My options:
Create a dynamic striped drive (basically software RAID0) in the host OS (Win srv), and then create a VHDX on top of it that will be used by the guest Ubuntu;
No RAID, simply create two VHDX (one per SSD drive), and create guest Ubuntu that uses both VHDXs. Later within Ubuntu use one of the drives for logs and another for SSTables;
Do not create VHDX but connect (passthrough) physical SSDs to the newly created Ubuntu guest, then software RAID0 both SSD during Ubuntu setup process;
Similar with the previous but without software RAID0, and with assigning one drive for logs and other for SSTables
Which of the above configurations would suit Cassandra best? Any resources (or experience) on the performance differences?
It is also important to know that the following is NOT important:
SSD life time - If SSD will survive a year or 10 years is not important at all.
Fault tolerance - I'm not afraid of zero fault tolerance of RAID0 configuration. Fault tolerance of the system will be achieved by using multiple nodes and the appropriate replication policy, so failure of one node is not important.
Also, I'll say that I would be happiest with the first option since it allows me to use my existing VM snapshot-based backup infrastructure, and maybe even add another VHDX on the same RAID0 that will be used by another, non-IO intensive VM.
Finally, when it comes to VHDX on top of SSD - dynamically expanding or fixed?
Many thanks!
I forgot to say (not sure if important but...):
The cluster should be write-optimized. Expected ingesting rate is 50,000 data points per second. Rare reads - probably no more than one per second.

Azure VM pricing - Is it better to have 80 single core machines or 10 8-core machines?

I am limited by a piece of software that utilizes a single core per instance of the program run. It will run off an SQL server work queue and deposit results to the server. So the more instances I have running the faster the overall project is done. I have played with Azure VMs a bit and can speed up the process in two ways.
1) I can run the app on a single core VM, clone that VM and run it on as many as I feel necessary to speed up the job sufficiently.
OR
2) I can run the app 8 times on an 8-core VM, ...again clone that VM and run it on as many as I feel necessary to speed up the job sufficiently.
I have noticed in testing that the speed-up is roughly the same for adding 8 single core VMs and 1 8-core VM. Assuming this is true, would it be better price-wise to have single core machines?
The pricing is a bit of a mystery, whether it is real cpu usage time, or what. It is a bit easier using the 1 8-core approach as spinning up machines and taking them down takes time, but I guess that could be automated.
It does seem from some pricing pages that the multiple single core VM approach would cost less?
Side question: so could I do like some power shell scripts to just keep adding VMs of a certain image and running the app, and then start shutting them down once I get close to finishing? After generating the VMs would there be some way to kick off the app without having to remote in to each one and running it?

I would argue that all else being equal, and this code truly being CPU-bound and not benefitting from any memory sharing that running multiple processes on the same machine would provide, you should opt for the single core machines rather than multi-core machines.
Reasons:
Isolate fault domains
Scaling out rather than up is better to do when possible because it naturally isolates faults. If one of your small nodes crashes, that only affects one process. If a large node crashes, multiple processes go down.
Load balancing
Windows Azure, like any multi-tenant system, is a shared resource. This means you will likely be competing for CPU cycles with other workloads. Having small VMs gives you a better chance of having them distributed across physical servers in the datacenter that have the best resource situation at the time the machines are provisioned (you would want to make sure to stop and deallocate the VMs before starting them again to allow the Azure fabric placement algorithms to select the best hosts). If you used large VMs, it would be less likely to find a suitable host with optimal contention to accommodate many virtual cores.
Virtual processor scheduling
It's not widely understood how scheduling a virtual CPU is different than scheduling a physical one, but it is something worth reading up on. The main thing to remember is that hypervisors like VMware ESXi and Hyper-V (which runs Azure) schedule multiple virtual cores together rather than separately. So if you have an 8-core VM, the physical host must have 8 physical cores free simultaneously before it can allow the virtual CPU to run. The more virtual cores, the more unlikely the host will have sufficient physical cores at any given time (even if 7 physical cores are free, the VM cannot run). This can result in a paradoxical effect of causing the VM to perform worse as more virtual CPU cores are added to it. http://www.perfdynamics.com/Classes/Materials/BradyVirtual.pdf
In short, a single vCPU machine is more likely to get a share of the physical processor than an 8 vCPU machine, all else equal.
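To put rough numbers on that intuition, here is a toy model in Python. It is only a sketch under strong assumptions of my own: the host has a fixed number of physical cores, each one is independently idle with the same probability at any scheduling instant, and the VM can run only when as many cores as it has vCPUs are idle at once (the strict co-scheduling described above). Real hypervisor schedulers are more sophisticated than this.

```python
from math import comb


def prob_can_run(host_cores, vcpus, p_idle):
    """Probability that at least `vcpus` of `host_cores` are idle at once.

    Toy model: each physical core is idle independently with probability
    `p_idle` (an assumption, not a measured property of any real host).
    """
    return sum(
        comb(host_cores, i) * p_idle ** i * (1 - p_idle) ** (host_cores - i)
        for i in range(vcpus, host_cores + 1)
    )


# Compare a 1-vCPU VM with an 8-vCPU VM on a hypothetical 16-core host.
for vcpus in (1, 8):
    print(vcpus, "vCPU:", prob_can_run(16, vcpus, p_idle=0.3))
```

Under these assumptions the chance of finding 8 idle cores at the same instant is much lower than the chance of finding 1, which is the effect the paragraph above describes.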
And I agree that the pricing is basically the same, except for a little more storage cost to store many small VMs versus fewer large ones. But storage in Azure is far less expensive than the compute, so likely doesn't tip any economic scale.
Hope that helps.

Billing
According to Windows Azure Virtual Machines Pricing Details, Virtual Machines are charged by the minute (of wall clock time). Prices are listed as hourly rates (60 minutes) and are billed based on total number of minutes when the VMs run for a partial hour.
In July 2013, 1 Small VM (1 virtual core) costs $0.09/hr; 8 Small VMs (8 virtual cores) cost $0.72/hr; 1 Extra Large VM (8 virtual cores) costs $0.72/hr (same as 8 Small VMs).
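A quick way to compare the two layouts for a given run length, as a Python sketch. The hourly rates are the July 2013 figures above; the function name is made up, and it assumes billing is simply pro-rated by the minute with no other charges (storage, bandwidth) included.

```python
def vm_cost(hourly_rate, minutes, instances=1):
    """Compute cost for `instances` VMs billed per minute at an hourly rate."""
    return hourly_rate * (minutes / 60) * instances


run_minutes = 90  # a hypothetical job that runs for an hour and a half
print("8 Small VMs:      $%.2f" % vm_cost(0.09, run_minutes, instances=8))
print("1 Extra Large VM: $%.2f" % vm_cost(0.72, run_minutes))
```

Both layouts come out the same for any run length, because 8 x $0.09 equals $0.72 per hour.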
VM Sizes and Performance
The VMs sizes differ not only in number of cores and RAM, but also on network I/O performance, ranging from 100 Mbps for Small to 800 Mbps for Extra Large.
Extra Small VMs are rather limited in CPU and I/O power and are inadequate for workloads such as you described.
For single-threaded, I/O bound applications such as described in the question, an Extra Large VM could have an edge because of faster response times for each request.
It's also advisable to benchmark workloads running 2, 4 or more processes per core. For instance, 2 or 4 processes in a Small VM and 16, 32 or more processes in an Extra Large VM, to find the adequate balance between CPU and I/O loads (provided you don't use more RAM than is available).
Auto-scaling
Auto-scaling of Virtual Machines is built into Windows Azure directly. It can be based either on CPU load or on Windows Azure Queues length.
Another alternative is to use specialized tools or services to monitor load across the servers and run PowerShell scripts to add or remove virtual machines as needed.
Auto-run
You can use the Windows Scheduler to automatically run tasks when Windows starts.

The pricing is "uptime of the machine in hours * hourly rate of the VM size * number of instances".
E.g. you have an 8-core VM (Extra Large) running for a month (30 days):
(30 * 24) * $0.72 * 1 = $518.40
For 8 single-core VMs it will be:
(30 * 24) * $0.09 * 8 = $518.40
So I doubt if there will be any price difference. One advantage of using smaller machines and "scaling out" is when you have more granular control over scalability. An Extra-large machine will eat more idle dollars than 2-3 small machines.
Yes you can definitely script this. Assuming they are IaaS machines you could add the script to windows startup, if on PaaS you could use the "Startup Task".