We scale our single-threaded app by running it in separate VMs; each instance is configured to work on a particular partition of the overall workload. An idea has been circulating that we could get better performance by adding threads to some parts of the app, though we would not be eliminating the current dependence on VMs.
Is architecting threading for an app that has been designed for a VM environment different from architecting it for an app designed for a non-VM environment? My primary concern is that, for every thread designed into the app, the actual number of threads that may be spun up per physical machine is multiplied by the number of VM instances running on that machine, which may actually lead to performance degradation.
Thanks in advance.
Edit: By VM above I mean a virtual machine as provided by VMware.
I think your concerns about "performance degradation" are warranted. If you are running multiple VMs on a machine and add multiple threads to the VMs, you are most likely going to be increasing the context switching only -- not getting more work out of a VM.
It depends a lot on the jobs you are running, of course. If they are IO bound, then adding threads may give you better parallelization. However, if they are CPU/computation bound, then you will most likely not get a win, and may well see a drop in performance.
Is architecting threading for an app that has been designed for a vm environment different than for an app designed for a non-vm environment?
Not IME, but then I don't tend to write CPU-intensive apps - I most often thread off to get stuff out of the GUI and to simplify design for multiple users/clients. I just design the apps as if I am on a native OS.
I don't know how the threads are mapped. I have an XP VM running now. The XP Task Manager shows 518 threads; the host (Vista 64) Task Manager shows only 11 threads for 'VMware Workstation VMX', though there are some 22 other threads for the NAT Service, VMnet DHCP, Tray Process etc. I have 2 'processors' assigned to the VM to give any multithreading bugs more chance of showing up.
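For anyone who wants to reproduce those numbers without staring at Task Manager, a quick PowerShell sketch (the host-side process name vmware-vmx is an assumption; check the exact image name on your host):

    # Inside the guest: total thread count across all processes
    (Get-Process | ForEach-Object { $_.Threads.Count } | Measure-Object -Sum).Sum

    # On the host: threads of the VM's worker process
    # (a running VMware Workstation VM usually shows up as vmware-vmx)
    (Get-Process -Name vmware-vmx).Threads.Count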
Related
I have written an application (Qt/C++) that creates a lot of concurrent worker threads to accomplish its task, utilizing QThreadPool (from the Qt framework). It has worked flawlessly running on a dedicated server/hardware.
A copy of this application is now running in a virtual machine (RHEL 7), and performance has suffered significantly in that the queue (from the thread pool) is being exercised quite extensively. This has resulted in things getting backed up a bit. This, despite having more cores available to the application through this VM version than the dedicated, non-virtualized server.
Today, I did some troubleshooting with the top -H -p <pid> command, and found that there were 16 llvmpipe-# threads in total running all at once, apparently for software rendering of my application's very simple graphical display. It looks to me like the presence of so many of these rendering threads has left limited resources available for my actual application's threads to run concurrently; in other words, my worker threads are yielding to, and taking a back seat to, these rendering threads.
As this is a small/simple GUI running on a server, I don't care to dedicate so many threads to software rendering of its display. I read some Mesa3D documentation about using the LP_NUM_THREADS environment variable to limit its thread usage. I set LP_NUM_THREADS=4, and as a result I seem to have effectively opened up 12 cores for my application's worker threads to use.
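A minimal sketch of the launch-wrapper idea (the binary path is a placeholder; a plain shell export, or an Environment= line in a systemd unit, achieves the same thing):

    # Cap Mesa's llvmpipe software-rendering threads before the app starts,
    # leaving the remaining cores free for the application's own worker threads.
    $env:LP_NUM_THREADS = "4"

    # Hypothetical binary path - replace with the real one
    Start-Process -FilePath "/opt/myapp/bin/myapp" -Wait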
Does this sound reasonable, or will I pay some sort of other consequence for doing this?
I am currently investigating the feasibility of an architecture where we will have potentially thousands of AppPools and therefore Worker Processes for each of our micro-services running in IIS (10+). (It is one of a few options)
I understand the overhead of each worker process. Currently my estimate is that each worker is going to be about 20-30MB. Server resourcing should not be too much of an issue, as we are likely going to be provisioning servers with 32-64GB of RAM. On top of this, not all workers would be active at all times, so we should gain headroom when AppPools are idle.
My question: Can IIS handle this many AppPools/Worker processes?
I don't see a reason it shouldn't, given sufficient resources; however, I have not been able to find any documentation on it after some brief searching.
So I'll add some answers to my own question here as I did a little bit of testing.
Server
Intel Xeon - X5550
32GB Ram
Windows Server 2012 R2
Application
Created a barebones WebAPI only ASP.Net application with a single controller and action.
When installed in IIS this is the observed memory footprint.
Memory (Idle) = ~5,172 K
Memory (Running) = ~26,000 K
Prep
I created some PowerShell scripts (sorry, can't share them as they leverage our closed-source deployment scripts) to do the following (a rough sketch of the idea follows the list):
Create - Unique folder for each application to prevent possible resource sharing
Launch - Makes a web request
Cleanup - Deletes all applications, pools and folders
Recycle - Unloads the application, sets it back to Idle state
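A trimmed-down sketch of what the Create/Launch/Recycle/Cleanup steps amount to, using the stock WebAdministration cmdlets (the application name, paths and request URL are all made up for illustration):

    Import-Module WebAdministration

    $name = "App001"                      # hypothetical application name
    $root = "C:\Sites\$name"              # unique folder per application

    # Create - unique folder, dedicated AppPool, and a web application under the default site
    New-Item -ItemType Directory -Path $root | Out-Null
    Copy-Item -Path "C:\Build\WebApi\*" -Destination $root -Recurse    # hypothetical build output
    New-WebAppPool -Name $name | Out-Null
    New-WebApplication -Name $name -Site "Default Web Site" -PhysicalPath $root -ApplicationPool $name | Out-Null

    # Launch - a single request is enough to spin up the worker process
    Invoke-WebRequest -Uri "http://localhost/$name/api/values" -UseBasicParsing | Out-Null

    # Recycle - restart the pool so the worker unloads and the app drops back to idle
    Restart-WebAppPool -Name $name

    # Cleanup - remove the application, the pool and the folder
    Remove-WebApplication -Name $name -Site "Default Web Site"
    Remove-WebAppPool -Name $name
    Remove-Item -Path $root -Recurse -Force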
Test
Below are my results observed from PerfMon
As you will note, I could not get all 1000 running at once. I ran into a few things:
Trying to fire a call to all 1000 so they are all running simultaneously is not as easy as it sounds.
The ASP.NET temporary internet files folder is on C:\, which ran out of space.
Things began running slowly since memory was being paged.
Conclusion
It seems that IIS really has no limit on the number of processes. The core constraint is the resourcing on the machine.
What is interesting is that it is unlikely all applications would be running simultaneously, so one can take advantage of the fact that IIS only provisions the memory for a worker process while it is actually active.
I am limited by a piece of software that utilizes a single core per instance of the program. It runs off an SQL Server work queue and deposits results back to the server, so the more instances I have running, the faster the overall project is done. I have played with Azure VMs a bit and can speed up the process in two ways.
1) I can run the app on a single-core VM, clone that VM, and run as many clones as I feel necessary to speed up the job sufficiently.
OR
2) I can run the app 8 times on an 8-core VM, then again clone that VM and run as many clones as I feel necessary to speed up the job sufficiently.
I have noticed in testing that the speed-up is roughly the same for adding 8 single-core VMs as for adding 1 8-core VM. Assuming this is true, would it be better price-wise to have single-core machines?
The pricing is a bit of a mystery to me - whether it is based on real CPU usage time, or something else. It is a bit easier using the single 8-core approach, as spinning up machines and taking them down takes time, but I guess that could be automated.
It does seem from some pricing pages that the multiple single core VM approach would cost less?
Side question: could I write some PowerShell scripts to just keep adding VMs of a certain image and running the app, and then start shutting them down once I get close to finishing? After generating the VMs, would there be some way to kick off the app without having to remote in to each one and run it?
I would argue that, all else being equal, and this code truly being CPU-bound and not benefiting from any memory sharing that running multiple processes on the same machine would provide, you should opt for single-core machines rather than multi-core machines.
Reasons:
Isolate fault domains
Scaling out rather than up is better to do when possible because it naturally isolates faults. If one of your small nodes crashes, that only affects one process. If a large node crashes, multiple processes go down.
Load balancing
Windows Azure, like any multi-tenant system, is a shared resource. This means you will likely be competing for CPU cycles with other workloads. Having small VMs gives you a better chance of having them distributed across physical servers in the datacenter that have the best resource situation at the time the machines are provisioned (you would want to make sure to stop and deallocate the VMs before starting them again, to allow the Azure fabric placement algorithms to select the best hosts). If you used large VMs, it would be less likely that a host with low enough contention could be found to accommodate that many virtual cores.
Virtual processor scheduling
It's not widely understood how scheduling a virtual CPU differs from scheduling a physical one, but it is something worth reading up on. The main thing to remember is that hypervisors like VMware ESXi and Hyper-V (which runs Azure) schedule the virtual cores of a VM together rather than separately. So if you have an 8-core VM, the physical host must have 8 physical cores free simultaneously before it can allow the virtual CPU to run. The more virtual cores, the less likely the host will have sufficient physical cores free at any given time (even if 7 physical cores are free, the VM cannot run). This can have the paradoxical effect of making the VM perform worse as more virtual CPU cores are added to it. http://www.perfdynamics.com/Classes/Materials/BradyVirtual.pdf
In short, a single vCPU machine is more likely to get a share of the physical processor than an 8 vCPU machine, all else equal.
And I agree that the pricing is basically the same, except for a little more storage cost to store many small VMs versus fewer large ones. But storage in Azure is far less expensive than the compute, so likely doesn't tip any economic scale.
Hope that helps.
Billing
According to Windows Azure Virtual Machines Pricing Details, Virtual Machines are charged by the minute (of wall clock time). Prices are listed as hourly rates (60 minutes) and are billed based on total number of minutes when the VMs run for a partial hour.
In July 2013, 1 Small VM (1 virtual core) cost $0.09/hr; 8 Small VMs (8 virtual cores) cost $0.72/hr; 1 Extra Large VM (8 virtual cores) cost $0.72/hr (the same as 8 Small VMs).
VM Sizes and Performance
The VM sizes differ not only in number of cores and RAM, but also in network I/O performance, ranging from 100 Mbps for Small to 800 Mbps for Extra Large.
Extra Small VMs are rather limited in CPU and I/O power and are inadequate for workloads such as you described.
For single-threaded, I/O bound applications such as described in the question, an Extra Large VM could have an edge because of faster response times for each request.
It's also advisable to benchmark workloads running 2, 4 or more processes per core. For instance, 2 or 4 processes in a Small VM and 16, 32 or more processes in an Extra Large VM, to find the adequate balance between CPU and I/O loads (provided you don't use more RAM than is available).
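One rough way to run that kind of benchmark from PowerShell, assuming the workers all pull from the same shared queue (the worker path and instance counts are placeholders):

    # Time how long it takes N worker processes to drain the same batch of work,
    # then compare wall-clock times for different process counts.
    foreach ($n in 1, 2, 4, 8) {
        $elapsed = Measure-Command {
            $workers = 1..$n | ForEach-Object {
                Start-Process -FilePath "C:\App\worker.exe" -PassThru   # hypothetical worker binary
            }
            $workers | Wait-Process
        }
        "{0} processes: {1:N0} seconds" -f $n, $elapsed.TotalSeconds
    }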
Auto-scaling
Auto-scaling of Virtual Machines is built into Windows Azure directly. It can be based on either CPU load or Windows Azure Queue length.
Another alternative is to use specialized tools or services to monitor load across the servers and run PowerShell scripts to add or remove virtual machines as needed.
Auto-run
You can use the Windows Task Scheduler to automatically run tasks when Windows starts.
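For example, a one-time registration like this (task name and path are placeholders) starts the app at boot without anyone having to log in; the ScheduledTasks cmdlets are available on Server 2012 and later:

    # Register a task that runs the worker at machine startup, no interactive logon required
    $action  = New-ScheduledTaskAction -Execute "C:\App\worker.exe"     # hypothetical path
    $trigger = New-ScheduledTaskTrigger -AtStartup
    Register-ScheduledTask -TaskName "RunWorker" -Action $action -Trigger $trigger `
        -User "SYSTEM" -RunLevel Highest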
The pricing is "Uptime of the machine in hours * rate of the VM size/hour * number of instances"
e.g. You have an 8-core VM (Extra Large) running for a month (30 days):
(30 * 24) * $0.72 * 1 = $518.40
For 8 single-core VMs it will be:
(30 * 24) * $0.09 * 8 = $518.40
So I doubt there will be any price difference. One advantage of using smaller machines and "scaling out" is that you have more granular control over scalability. An Extra Large machine will also eat more idle dollars than 2-3 small machines.
Yes, you can definitely script this. Assuming they are IaaS machines, you could add the script to Windows startup; if on PaaS, you could use a "Startup Task".
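A rough sketch of the scale-out loop with today's Az PowerShell cmdlets (resource group, region, image and size are placeholders; the classic cmdlets of the time followed the same pattern):

    # Spin up N single-core workers from the same image; the app starts on each VM
    # via a startup/scheduled task. Stop them again when the queue is drained.
    $rg   = "worker-rg"          # hypothetical resource group
    $cred = Get-Credential       # local admin credentials for the VMs

    1..10 | ForEach-Object {
        New-AzVM -ResourceGroupName $rg -Name ("worker{0:D2}" -f $_) -Location "westus2" `
                 -Image "Win2019Datacenter" -Size "Standard_A1" -Credential $cred
    }

    # Later, when the work queue is nearly empty:
    Get-AzVM -ResourceGroupName $rg | ForEach-Object {
        Stop-AzVM -ResourceGroupName $rg -Name $_.Name -Force   # a stopped (deallocated) VM stops accruing compute charges
    }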
When it comes to virtualization, I have been deliberating on the relationship between physical cores and virtual cores, especially how it affects applications employing parallelism. For example, in a VM scenario, if there are fewer physical cores than virtual cores (if that's even possible), what effect or limits does that place on the application's parallel processing? I'm asking because, in my environment, the physical architecture isn't disclosed. Is there still much advantage to parallelizing if the application lives on a dual core VM hosted on a single core physical machine?
Is there still much advantage to parallelizing if the application lives on a dual core VM hosted on a single core physical machine?
Always.
The OS-level parallel processing (i.e., Linux pipelines) will improve performance dramatically, irrespective of how many cores -- real or virtual -- you have.
Indeed, you have to create fairly contrived problems or really dumb solutions to not see performance improvements from simply breaking a big problem into lots of smaller problems along a pipeline.
Once you've got a pipelined solution, and it ties up 100% of your virtual resources, you have something you can measure.
Start trying different variations on logical and physical resources.
But only after you have an OS-level pipeline that uses up every available resource. Until then, you've got fundamental work to do just creating a pipeline solution.
Since you included the F# tag, and you're interested in parallel performance, I'll assume that you're using F# asynchronous IO, so threads never block; they just swap between CPU-bound tasks.
In this case it's ideal to have the same number of threads as virtual cores (at least based on my experiments with F# in Ubuntu under VirtualBox hosted by Windows 7). Having more threads than that decreases performance slightly; having fewer decreases it quite a bit.
Also, having more virtual cores than physical cores decreases performance a little. But if this is something you can't control, just make sure you have an active worker thread for each virtual core.
I need to run 8-10 instances of my application on IIS 6.0 that are all identical but point to different backends (handled via config files, which would be different for each virtual directory). I want to create multiple virtual directories that point to different versions of the app, and I want to know if there is any significant performance penalty for this. The server (Windows Server 2003) is a quad-core with 4 GB of RAM, and the single install of the app barely touches the CPU or memory, so that doesn't seem to be a concern. This doesn't seem to justify another server, especially since some of the instances will be very lightly used. Obviously, performance depends on the server and the application, but are there any concerns with this situation?
IIS on Windows Server 2003 is built to handle lots of sites, so the number of sites itself is not a concern. The resource needs of your application are much more of a factor, i.e., how much I/O, CPU, thread and database resource does it consume?
We have a quad-core Windows Server 2003 server here handling several hundred sites no problem. But one resource-intensive app can eat a whole server no problem.
If you find your application is CPU bound, you can put each instance in its own application pool and then limit the amount of CPU each pool can use, so that no one instance can bottleneck any of the others.
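The question is about IIS 6, where the equivalent knob is the CPU monitoring setting in the application pool's properties; on IIS 7.5 and later the same limit can be scripted through the WebAdministration provider. A hedged sketch (the pool name and the 25% figure are just examples; the limit attribute is stored in 1/1000ths of a percent):

    Import-Module WebAdministration

    # Cap the pool at 25% CPU
    Set-ItemProperty "IIS:\AppPools\App1" -Name cpu.limit -Value 25000
    Set-ItemProperty "IIS:\AppPools\App1" -Name cpu.action -Value Throttle   # Throttle needs IIS 8+; older versions only offer KillW3wp / NoAction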
I suggest you add a few at a time and see how it goes.
No concerns. If you run into any performance issues, it won't be with IIS for 10 apps that size.
You should consider using multiple application pools. If you do that, and the CPU, memory, IO and network resources of the server are in order, then there should be no performance issue.
It is possible to run them all in the same application pool, but then you add thread pool contention to the list of concerns, because all applications will share one thread pool; and if it is a 32-bit server, there is a memory limit (around 1.5 GB) for the w3wp process.
We constantly run 15-20 per server on a 10-server load-balanced farm and don't come across any issues.
The short answer is no, there should be no concerns.
In effect, you are asking if IIS can host 8-10 websites... of course it can. Perhaps you might want to configure them as individual websites rather than virtual directories, each with its own application pool, so that each instance is entirely independent.
You mention that these aren't very demanding applications; assuming they aren't all linking to the same Access database, I can't see any problems.