We have the following situation at work: we currently have problems obtaining reports, which are generated by an application developed in .NET and hosted on a SharePoint 2010 site. We have tested generating reports for short time periods (per month) and that works perfectly; the problem arises only when trying to obtain the report for a full year (due to the large number of records).
There are two servers, production and QA, and the problem only occurs in the production environment (which is, of course, the one with the highest workload).
We were reviewing the configurations of both servers and found the following difference in IIS:
PROD
Connection Time-out (seconds): 120
QA
Connection Time-out (seconds): 180
We would like to increase this time in the production environment; however, we would like to know whether there is any risk of impacting some other process or application.
Any recommendation?
Thank you
IIS Connection timeouts help reduce the amount of memory resources that are consumed by idle connections. Time-out settings also allow you to specify how long server resources are allocated to specific tasks or clients.
Increasing the time-out value should help when processing a large number of records, since such requests need more time before IIS drops the connection.
Related
I have a requirement to write a load test measuring message transmission latencies. In order to simulate a large number of simultaneous users without running into thread contention problems on one box, I'm spinning up multiple servers in Azure.
When I got my first results back, I was a little shocked to see that the results indicated the message was received before it was sent. I immediately realized that, while I had an implicit assumption that all the VMs would have their clocks synced to within milliseconds, that was clearly not the case.
I've spent several hours googling ways to resolve this, and I'm not getting anywhere. One thought was to have each VM query the time on a central server using NetRemoteTOD(), and then establish a per-machine correction factor to be added to the time measured from the local machine's clock. However, when I tried to run that method, I got error 2184, "The service has not been started". I have verified that both the RPC service and the Windows Time service are running on both the client and target machines, and I have not been successful in finding any information indicating what other service needs to be running (or even whether the error really means what it seems to mean). (I also get the same error when running between my development desktop and a server on our corporate network. However, I can run it successfully against a PDC on the corporate network, but I can't find a PDC on Azure, since neither machine is part of a domain.)
So, does anyone have any information on what service needs to be started to get NetRemoteTOD (or the Windows NET TIME command, which relies on NetRemoteTOD under the covers) working? Alternatively, does anyone have a suggestion for some other technique to get a consistent time reference across multiple VMs in Azure? (Note, I don't necessarily need their clocks synced; I just need a way to establish a consistent correction factor to reference the times to a common source. Note also that I need sub-second accuracy; about 100 msec will probably do.) Basically, I just need a Windows function or shell command that will get me the time to sub-second accuracy on a given remote server.
Thanks in advance.
PS. Azure servers are running Server 2008 R2 SP1
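The per-machine correction factor described above can be estimated without NetRemoteTOD, as long as each VM has some way to ask the reference server for its current time. The following is a minimal sketch in Python, not a Windows API: the `Probe` type and both function names are hypothetical, and how the server timestamp is fetched is left open. It assumes the network delay is roughly symmetric, so the server's clock reading is taken to correspond to the midpoint of the round trip, and it keeps the probe with the shortest round trip because its error bound (half the round-trip time) is the tightest.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    t_send: float    # local clock (seconds) when the request was sent
    t_server: float  # reference server's clock (seconds) in its reply
    t_recv: float    # local clock (seconds) when the reply arrived


def estimate_offset(probes):
    """Return (offset, max_error) using the probe with the shortest round trip.

    offset is what must be added to a local timestamp to express it on the
    reference server's clock. max_error is half the round-trip time, which
    bounds the error of the estimate.
    """
    best = min(probes, key=lambda p: p.t_recv - p.t_send)
    round_trip = best.t_recv - best.t_send
    local_midpoint = best.t_send + round_trip / 2
    return best.t_server - local_midpoint, round_trip / 2


def to_reference_time(local_timestamp, offset):
    """Convert a locally measured timestamp to the reference clock."""
    return local_timestamp + offset
```

If the best round trip is under 200 msec, the error bound is under 100 msec, which matches the accuracy asked for; taking several probes per VM makes it more likely that at least one of them is that fast.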
I maintain an Azure cloud service. It is set to auto-scale based on load. To monitor the health of this service, I have another service which pings it every 2 minutes. The usual response time from the service is around 100 ms.
Once or twice a week I see that the service does not respond. It is not really a worry for me, because it happens quite infrequently, but I am still trying to figure out what could be causing it. I do not think the problem is with the pinging service: none of the other services it pings (not on Azure, but on other servers) show any issues.
What could be causing these occasional delays? Are any other Azure service owners seeing such delays?
I'm having quite similar problems, but I use Application Insights, so I have some statistics: for example, response time increases together with SQL Azure access time and CPU usage. My average response time according to Application Insights is about 600 ms and my average RPS is about 0.6. During these problems RPS is usually higher than average, up to 1.5, but average response time grows to as much as 1 minute! (During the day my RPS can grow to 3 or even higher without any growth in response time.) As I have a 1-minute SQL connection timeout and I see a dramatic growth of total SQL Azure access time during these periods, I assume that the problem is caused by SQL Azure. This also happens once every day or two, for about 10-15 minutes at most, and my ping service also always reports that the service doesn't respond.
So my advice here: install Application Insights to analyze what happens during these response delays. It would be great if you shared your results here.
P.S. I also use autoscale based on load, though it doesn't really help in these particular situations.
Do these already exist? Does anyone have any real experience with high-load applications using the meteor.js framework? Are there any known services, applications, or designs based on this technology?
update apr 4, 2013: since we were not able to scale on Heroku due to Meteor's session affinity, we moved to EC2 with ELB. We've tested this with about 800 concurrent users, which is 4x what we were getting with Heroku. FMI: [1] http://www.ripariandata.com/blog/creating-an-aws-elastic-load-balancer
update jan 18, 2013: we've done some load testing on our app using blitz.io. On a single Heroku dyno, with a MongoHQ back end, we are able to get to 35 concurrent operations before response times start exceeding 1 second. We are able to get to 180 concurrent operations before the dyno dies and needs a restart. This was against our most complex query, using a five-field compound index, returning between 10 and 30 documents, and randomly using a skip value of 0-30 (to simulate paging).
So far, increasing the number of dynos has produced no substantial increase in performance. We are investigating the bottlenecks and will report back our findings.
First, a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. I now have a Node.js app that I've been developing for some time and that is running in production on the web. My main confusion stems from the fact that I am very new to the world of web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production for nearly a year, starting from the 0.4 series and currently on 0.8. The web app is based on express 2 and 3, with mongo, redis and memcached.
A few facts.
node cannot handle a large v8 heap; when it grows over 200 MB you will start seeing increased CPU usage
node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, as neither v8 profiling nor valgrind shows leaks in JS space or the resident heap. Early 0.8 was awful in this respect: RSS could be 1 GB with a 50 MB heap.
hanging requests are hard to track. We wrote our own middleware to monitor these, especially as our app is long-poll based
My suggestions.
use multiple instances per machine, at least 1 per CPU. Balance with haproxy, nginx or similar, with session affinity
write middleware to report hung connections, i.e. ones the code never responded to or whose latency was over a threshold
restart instances often, at least weekly
write a poller that prints out memory stats from the process module once per minute
use supervisord and fabric for easy process management
monitor CPU and reported memory stats, and restart on a threshold
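The hung-connection reporting suggested above boils down to remembering when each request started and comparing against a threshold. Below is a minimal, framework-independent sketch in Python (not the Express middleware the answer refers to); the `RequestTracker` class and its method names are hypothetical, and the clock is injectable only so the logic can be exercised without waiting.

```python
import time


class RequestTracker:
    """Tracks in-flight requests and flags slow or never-answered ones."""

    def __init__(self, threshold_s, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock
        self.in_flight = {}  # request id -> (start time, description)
        self.slow = []       # (description, latency) of answered but slow requests

    def start(self, request_id, description):
        """Call when a request arrives."""
        self.in_flight[request_id] = (self.clock(), description)

    def finish(self, request_id):
        """Call when the response has been sent."""
        started, description = self.in_flight.pop(request_id)
        latency = self.clock() - started
        if latency > self.threshold_s:
            self.slow.append((description, latency))

    def hung(self):
        """Requests that are still open after more than the threshold."""
        now = self.clock()
        return [(description, now - started)
                for started, description in self.in_flight.values()
                if now - started > self.threshold_s]
```

A periodic job (for example the once-per-minute poller mentioned above) could log whatever `hung()` returns; for a long-poll app the threshold has to be longer than the longest legitimate poll.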
Whatever the type of web app, NodeJS or otherwise, load testing will tell you whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increases? A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the number of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 developer tools available in all browsers. Open up your website in any browser and press F12 (or Ctrl+Shift+I in Opera), which should open the browser's developer tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests and the bandwidth used per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of pages loaded per user per day X Expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X the average number of pages loaded per user per day X Expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
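The two formulas above are simple enough to put in a small helper. This is a sketch in Python with hypothetical names; the figures in the usage note are made up for illustration, not measurements. The formulas are reproduced as stated, so for the result to be a per-day total the user count passed in should be the number of users expected over the day.

```python
def daily_load(requests_per_page, mb_per_page, pages_per_user_per_day, users):
    """Return (total HTTP requests per day, total MB transferred per day)."""
    page_loads_per_day = pages_per_user_per_day * users
    total_requests = requests_per_page * page_loads_per_day
    total_mb = mb_per_page * page_loads_per_day
    return total_requests, total_mb
```

For example, a page that makes 40 requests and transfers 1.5 MB per load, viewed 5 times a day by each of 1000 users, gives `daily_load(40, 1.5, 5, 1000)`, i.e. 200000 requests and 7500.0 MB per day.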
Node.js is single-threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart dead workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
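Once a load test has produced response times at several concurrency levels, finding the upper limit described above is a matter of locating the last level that stays within the acceptable response time. A minimal sketch in Python follows; the function name is hypothetical, it assumes the samples come from whatever load-testing tool is used, and the numbers in the usage note are invented.

```python
def max_sustainable_load(samples, max_response_s):
    """Return the highest concurrency level reached before the limit is exceeded.

    samples is an iterable of (concurrent_users, response_time_seconds).
    Returns None if even the lowest level is already too slow.
    """
    limit = None
    for users, response_s in sorted(samples):
        if response_s > max_response_s:
            break  # everything beyond this point is past the tipping point
        limit = users
    return limit
```

With made-up samples `[(10, 0.2), (50, 0.4), (100, 0.9), (200, 2.5)]` and a 1-second limit, the function returns 100.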
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring node processes.
Per this helpful article I have confirmed I have a connection pool leak in some application on my IIS 6 server running W2k3.
The tough part is that I'm serving 300 websites written by 700 developers from this server in 6 application pools, 50% of which are .NET 1.1 which doesn't even show connections in the CLR Data performance counter. I could watch connections grow on my end if everything were .NET 2.0+, but I'm even out of luck on that slim monitoring tool.
My 300 websites connect to probably 100+ databases spread out between Oracle, SQL Server and outliers, so I cannot watch the connections from the database end either.
Right now my best and only plan is to do a loose binary search for my worst offenders. I will kill application pools and slowly remove applications from them until I find which individual applications result in the most connections dropping when I kill their pool. But since this is a production box and I like continued employment, this could take weeks as a tracing method.
Does anyone know of a way to interrogate the IIS connection pools to learn their origin or owner? Is there an MSMQ trigger I might be able to attach to when they are created? Anything silly I'm overlooking?
Kevin
(I'll include the error code to facilitate others finding your answers through search:
Exception: System.InvalidOperationException
Message: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.)
Try starting with this first article from Bill Vaughn.
Todd Denlinger wrote a fantastic class, http://www.codeproject.com/KB/database/connectionmonitor.aspx, which watches SQL Server connections and reports on ones that have not been properly disposed of within a period of time. Wire it into your site, and it will let you know when there is a leak.
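To illustrate the general idea behind such a monitor (this is not the linked class, and it is written in Python rather than .NET), the sketch below records where each connection was opened and reports the ones that are still open after a given age. `ConnectionMonitor` and its methods are hypothetical names, and the code that opens and closes connections is assumed to call `opened` and `closed` itself.

```python
import time
import traceback


class ConnectionMonitor:
    """Remembers where connections were opened and reports stale ones."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.open = {}  # id(connection) -> (time opened, origin stack text)

    def opened(self, conn):
        """Record the connection together with the call stack of its opener."""
        # Keep the last few frames, dropping this method's own frame.
        origin = "".join(traceback.format_stack(limit=3)[:-1])
        self.open[id(conn)] = (self.clock(), origin)

    def closed(self, conn):
        """Forget a connection that was properly closed or disposed."""
        self.open.pop(id(conn), None)

    def leaks(self, max_age_s):
        """Origins of connections that have been open longer than max_age_s."""
        now = self.clock()
        return [origin for opened_at, origin in self.open.values()
                if now - opened_at > max_age_s]
```

Keying on `id(conn)` is a simplification: it is only reliable while the connection object is alive, so a real monitor would hold a reference to the connection or use a stronger identifier.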