I was wondering if anyone can educate me on an area of perf testing an API. I have an API which caches all its operations for extensive periods let's say a day. Based on this I felt there was no need to perf test the API as I would be testing infrastructure, configuration and http caching. Am I correct in my thinking? I am eager to know peoples opinions and any useful docs and papers would be beneficial.
Cache should be load-tested too, just to know your system's overall capacity (yes, most load goes to infrastructure, but at least you'll know that it does actually cache)
Second, but not by priority - you have the cache-miss branch and its capacity, if there's lot of cache misses - cache will not save your app, it still has to perform.
Related
I've come across many clients who aren't really able to provide real production data about a website's peak usage. I often do not get peak pageviews per hour, etc.
In these circumstances, besides just guessing or going with what "feels right" (i.e. making it all up), how exactly does one come up with a realistic workload model with an appropriate # of virtual users and a good pacing value?
I use Loadrunner for my performance/load testing.
Ask for the logs for a month.
Find the stats for session duration, then count the number of distinct IP's blocked by session duration.
Once you have the high volume hour, count the number of page instances. Business processes will typically have a termination page which is distinct and allows you to understand how many times a particular action takes place, such as request new password, update profile, business process 1, etc...
With this you will have a measurement of users and actions. You will want your stakeholder to take ownership of this data. As Quality assurance, we should not own both the requirement and the test against it. We should own one, but not both. If your client will not own the requirement, cascading it down to rest of the organization, assume you will be left out in the cold with a result they do not like....i.e., defects that need to be addressed before deployment to production.
Now comes your largest challenge, which is something that needs to be fixed with a process issue by your client.....You are about to test using requirements that no other part of the organization, architecture, development, platform engineering, had when they built the solution. Even if your requirements are a perfect recovery, plus some amount for growth, any defects you find will be challenged aggressively.
Your test will not match any assumptions or requirements used by any other portion of the organization.
And, in a sense, these other orgs will be correct in aggressively challenging your results. It really isn't fair to hold their designed solution to a set of requirements which were not in place when they made decisions which impacted scalability and response times for the system. You would be wise to call this out with your clients before the first execution of any performance test.
You can buy yourself some time. If the client does have a demand for a particular response time, such as an adoption of the Google RAIL model, then you can implement a gate before accepting any code for multi-user performance testing that the code SHALL BE compliant for a single user. It is not going to get any faster for two or more users. Implementing this hard gate will solve about 80% of your performance issues, for the changes required to bring code into compliance for a single user most often will have benefits on the multi-user front.
You can buy yourself some time in a second way as well. Take a look at their current site using tools such as Google Lighthouse and GTMetrix. Most of us are creatures of habit, that includes architect, developers, and ops personnel. We design, build, deploy to patterns we know and are comfortable with....usually the same ones over and over again until we are forced to make a change. It is highly likely that the performance antipatterns pulled from Lighthouse and GTMetrix will be carried forward into a future release unless they are called out for mitigation. Begin citing defects directly off of these tools before you even run a performance test. You will need management support, but you might consider not even accepting a build for multi-user performance testing until GTMetrix scores at least a B across the board and Lighthouse a score of 90 or better.
This should leave edge cases when you do get to multi-user performance testing, such as too early allocation of a resource, holding onto resources too long, too large of a resource allocation, hitting something too often, lock contention on a shared resource. An architectural review might pick up on these, where someone might say, "we are pre-allocating this because.....," or "Marketing says we need to hold the cart for 30 minutes before de-allocation," or "...." Well, you get the idea.
Don't forget to have the database profiler running while functional testing is going on. You are likely to pick up a few missing indexes or high cost queries here which should be addressed before multi-user performance testing as well.
You are probably wondering why am I pointing out all of these things before your performance test takes place. Darnit, you are hired to engage in a performance test! The test you are about to conduct is very high risk politically. Even if it finds something ugly, because the other parts of the organization did not benefit from the requirements, the result is likely to be rejected until the issue shows up in production. By shifting the focus to objective measures even before you need to run two users in anger together there are many avenues to finding and fixing performance issues which are far less politically volatile. Food for thought.
For small app they are no problem.
But for apps with traffic you can hit limits easily.
Http protocol is req-res driven. Just because your backend is stuck with limit, you can't really wait to send respond back until rate limit allows you to resume making your api calls.
What do you do?
I can think of several scenarios:
Wait it out: while it sucks, but sometimes it's easy fix, as you don't need to do anything.
Queue it: this a lot of work oppose to making just api call. This requires that first you store it in database, then have background task go through database and do the task. Also user would be told "it is processing" not "it's done"
Use lot of apis: very hacky... and lot of trouble to manage. Say you are using amazon, now you would have to create, verify, validate like 10 accounts. Not even possible for where you need to verify with say domain name. Since amazon would know account abc already owns it.
To expand on what your queueing options are:
Unless you can design the problem of hitting this rate limit out of existence as #Hammerbot walks through, I would go with some implementation of queue. The solution can scale in complexity and robustness according to what loads you're facing and how many rate limited APIs you're dealing with.
Recommended
You use some library to take care of this for you. Node-rate-limiter looks promising. It still appears you would have to worry about how you handle your user interaction (make them wait, write to a db/cache-service and notify them later).
"Simplest case" - not recommended
You can implement a minimally functioning queue and back it with a database or cache. I've done this before and it was fine, initially. Just remember you'll run into needing to implement your own retry logic, will have to worry about things like queue starvation **. Basically, the caveats of rolling your own < insert thing whose implementation someone already worried about > should be taken into consideration.
**(e.g. your calls keep failing for some reason and all of a sudden your background process is endlessly retrying large numbers of failing queue work elements and your app runs out of memory).
Complex case:
You have a bunch of API calls that all get rate-limited and those calls are all made at volumes that make you start considering decoupling your architecture so that your user-facing app doesn't have to worry about handling this asynchronous background processing.
High-level architecture:
Your user-facing server pushes work units of different type onto different queues. Each of these queues corresponds to a differently rate-limited processing (e.g. 10 queries per hour, 1000 queries per day). You then have a "rate-limit service" that acts as a gate to consuming work units off the different queues. Horizontally distributed workers then only consume items from the queues if and only if the rate limit service says they can. The results of these workers could then be written to a database and you could have some background process to then notify your users of the result of the asynchronous work you had to perform.
Of course, in this case you're wading into a whole world of infrastructure concerns.
For further reading, you could use Lyft's rate-limiting service (which I think implements the token bucket algorithm to handle rate limiting). You could use Amazon's simple queueing service for the queues and Amazon lambda as the queue consumers.
There are two reasons why rate limits may cause you problems.
Chronic: (that is, a sustained situation). You are hitting rate limits because your sustained demand exceeds your allowance.
In this case, consider a local cache, so you don't ask for the same thing twice. Hopefully the API you are using has a reliable "last-modified" date so you can detect when your cache is stale. With this approach, your API calling is to refresh your cache, and you serve requests from your cache.
If that can't help, you need higher rate limits
Acute: your application makes bursts of calls that exceed the rate limit, but on average your demand is under the limit. So you have a short term problem. I have settled on a brute-force solution for this ("shoot first, ask permission later"). I burst until I hit the rate limit, then I use retry logic, which is easy as my preferred tool is python, which supports this easily. The returned error is trapped and retry handling takes over. I think every mature library would have something like this.
https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html
The default retry logic is to backoff in increasingly big steps of time.
This has a starvation risk, I think. That is, if there are multiple clients using the same API, they share the same rate limit as a pool. On your nth retry, your backoff may be so long that newer clients with shorter backoff times are stealing your slots ... by the time your long backoff time expires, the rate limit has already been consumed by a younger competitor, so you now retry even longer, making the problem worse,although at the limit, this just means the same as the chronic situation: the real problem is your total rate limit is insufficient, but you might not be sharing fairly among jobs due to starvation. An improvement is to provide a less naive algorithm, it's the same locking problem that you do in computer science (introducing randomisation is a big improvement). Once again, a mature library is aware of this and should help with built-in retry options.
I think that this depends on which API you want to call and for what data.
For example, Facebook limits their API call to 200 requests per hour and per user. So if your app grows, and you are using their OAuth implementation correctly, you shouldn't be limited here.
Now, what data do you need? Do you really need to make all these calls? Is the information you call somewhat storable on any of your server?
Let's imagine that you need to display an Instagram feed on a website. So at each visitor request, you reach Instagram to get the pictures you need. And when your app grows, you reach the API limit because you have more visitors than what the Instagram API allows. In this case, you should definitely store the data on your server once per hour, and let your users reach your database rather than Instagram's one.
Now let's say that you need specific information for every user at each request. Isn't it possible to let that user handle his connection to the API's? Either by implementing the OAuth 2 flow of the API or by asking the user their API informations (not very secure I think...)?
Finally, if you really can't change the way you are working now, I don't see any other options that the ones you listed here.
EDIT: And Finally, as #Eric Stein stated in his comment, some APIs allow you to rise your API limit by paying (a lot of SaaS do that), so if your app grows, you should afford to pay for those services (they are delivering value to you, it's fair to pay them back)
I'm working on a node.js web app that's mainly an image based bulletin board. I read somewhere that we always need to cache since it helps improve performance, so normally I added my own caching which basically works like this: when a user visits the bulletin board the number of uploaded posts is cached, when visited again a comparison is made between the cached number and the the available number of posts, if it's the same then the cached query kicks in if not the new posts gets uploaded. Now this isn't a matter of code or anything. My code works perfectly and it does what I want it to do even if I might've explained it wrong or gave a wrong impression somehow. Problem is I read after doing all the work that caching frequently updated data is wrong and this is a bulletin board I'm talking about where users always upload images. So is it best to remove or keep the cache?
"There are two hard things in computer science: cache invalidation,
naming things, and off-by-one errors."
— Phil Karlton
Caching depends heavily on the kind of application. Just because you've heard "caching is always better" doesn't mean you should. A good rule of thumb:
Lean toward caching the data that is read-heavy but seldom updated.
Have a strategy for cache invalidation on write-heavy operations.
If you believe you will be doing crazy amounts of read-writes against the same entity, consider microcaching if you're comfortable with a bit of stale data for, say, less than a minute. Just set an expiry on your cache on a minute.
If you feel like you will be benefit from serving from the cache (and, by the way, what kind of cache are you talking about? In-memory? HTTP caching? Some kind of backend, such as Redis/Memcached?), have a plan to always check cache, and if you can't find it there, serve from your data store and then cache. Then, on POST/PATCH/DELETE calls, invalidate by that key.
In the case of your bulletin board, you might want to consider invalidating cache any time someone posts or uploads anything, then cache the subsequent call. But, as I mentioned, be sure to have a strategy for invalidating.
Adding caching makes your application more complex. You shouldn't add it just because some random people on the internet say you should. There are tons of situations were caching makes things better and there are other that things are made much worse.
Caching is a solution to a problem. So what problem are you seeing? Slow client experience because of repeated downloads? Is your server load high? Bandwidth high? Then perhaps you should add in some kind of caching.
Too often people hear of solutions and work towards a problem. Identify your problems and then look into solutions.
We are currently in the process of organising a student conference.
The issue is that we offer several different events at the same time over the course of a week. The conference runs the whole day.
It's currently been operating on a first come, first served basis, however this has led to dramatic problems in the past, namely the server crashing almost immediately, as 1000+ students all try to get the best events as quickly as they can.
Is anyone aware of a way to best handle this so that each user has the best chance of enrolling in the events they wish to attend, firstly without the server crashing and secondly with people registering for events which have a maximum capacity, all within a few minutes? Perhaps somehow staggering the registration process or something similar?
I'm aware this is a very broad question, however I'm not sure where to look to when trying to solve this problem...
Broad questions have equally broad answers. There are broadly two ways to handle it
Write more performant code so that a single server can handle the load.
Optimize backend code; cache data; minimize DB queries; optimize DB queries; optimize third party calls; consider storing intermediate things in memory; make judicious use of transactions trading off consistency with performance if possible; partition DB.
Horizontally scale - deploy multiple servers. Put a load balancer in front of your multiple front end servers. Horizontally scale DB by introducing multiple read slaves.
There are no quick fixes. It all starts with analysis first - which parts of your code are taking most time and most resources and then systematically attacking them.
Some quick fixes are possible; e.g. search results may be cached. The cache might be stale; so there would be situations where the search page shows that there are seats available, when in reality the event is full. You handle such cases on registration time. For caching web pages use a caching proxy.
Say I have a bunch of webservers each serving 100's of requests/s, and I want to see real time stats like:
Request rate over last 5s, 60s, 5 min etc
Number of unique users seen again per time window
Or in general for a bunch of timestamped events, I want to see real-time derived statistics - what's the best way to go about it?
I've considered having each GET request update a global counter somewhere, then sampling that at various intervals, but at the event rates I'm seeing it's hard to get a distributed counter that's fast enough.
Any ideas welcome!
Added: Servers are Linux running Apache/mod_wsgi, with a Python (Django) stack.
Added: To give a sense of the event rates I want to track stats for, they're coming in at over 10K events/s. Even incrementing a distributed counter at that rate is a challenge.
You might like to help us try out the beta of our agent for application performance monitoring in Python web applications.
http://newrelic.com
It delves more into the application performance rather than just the web server, but since any bottlenecks aren't generate going to be the web server, but your application then that is going to be more useful anyway.
Disclaimer. I work for New Relic and this is the project I am working on. It is a paid product, but the beta means it is free for now with all features. Later when that changes, if you didn't want to pay for it, their is still a Lite subscription level which is free and which gives you basic web metrics reporting which still covers some of what you are after. Anyway, right now would be a great opportunity to make use of it to debug your performance while you can.
Virtually all good servers provide this kind of functionality out of the box. For example, Apache has the mod_status module and Glassfish supports JMX. Furthermore, there are many commercial packages for monitoring clusters, such as Hyperic and Zenoss.
What web or application server are you using? It is difficult to provide a solution without that information.
Look at using WebSockets, their overhead is much smaller than a HTTP request, they are very well suited to real-time web applications. See: http://nodeknockout.com/ for Node based websocket examples.
http://en.wikipedia.org/wiki/WebSocket
You will need to run a daemon if you want to run it on your apache server.
Also take a look at:
http://kaazing.com/ if you wan't less hassle, but are willing to fork out some cash.
On the Windows side, Perfmonance monitor is the tool you should investigate.
As Jared O'Connor said, you should precise what kind of web server you want to monitor.