How do I determine virtual user amount and pacing if the client cannot give me any real data about the website? - performance-testing

I've come across many clients who aren't really able to provide real production data about a website's peak usage. I often do not get peak pageviews per hour, etc.
In these circumstances, besides just guessing or going with what "feels right" (i.e. making it all up), how exactly does one come up with a realistic workload model with an appropriate # of virtual users and a good pacing value?
I use LoadRunner for my performance/load testing.

Ask for the logs for a month.
Find the stats for session duration, then count the number of distinct IPs, broken out by session duration. From the same logs, identify your high-volume hour.
Once you have the high volume hour, count the number of page instances. Business processes will typically have a termination page which is distinct and allows you to understand how many times a particular action takes place, such as request new password, update profile, business process 1, etc...
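As a rough illustration, here is a minimal sketch of pulling the high-volume hour, distinct IPs, and termination-page counts out of a month of logs. It assumes an Apache/Nginx combined-format access log, and the termination-page paths are invented placeholders you would replace with your own.

    # Minimal sketch: derive the peak hour, distinct IPs, and termination-page
    # counts from a month of access logs. Assumes combined log format; the
    # TERMINATION_PAGES paths are hypothetical examples.
    import re
    from collections import Counter, defaultdict

    LOG_LINE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')
    TERMINATION_PAGES = {"/password/reset/done", "/profile/updated", "/checkout/confirmation"}  # assumed

    hits_per_hour = Counter()
    ips_per_hour = defaultdict(set)
    actions_per_hour = defaultdict(Counter)

    with open("access.log") as f:
        for line in f:
            m = LOG_LINE.match(line)
            if not m:
                continue
            # e.g. "10/Oct/2023:13:55:36 +0000" -> "10/Oct/2023:13"
            hour = m.group("ts")[:14]
            hits_per_hour[hour] += 1
            ips_per_hour[hour].add(m.group("ip"))
            if m.group("path") in TERMINATION_PAGES:
                actions_per_hour[hour][m.group("path")] += 1

    peak_hour, peak_hits = hits_per_hour.most_common(1)[0]
    print(f"Peak hour: {peak_hour} ({peak_hits} requests)")
    print(f"Distinct IPs in peak hour: {len(ips_per_hour[peak_hour])}")
    for page, count in actions_per_hour[peak_hour].most_common():
        print(f"  {page}: {count} completions")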
With this you will have a measurement of users and actions. You will want your stakeholder to take ownership of this data. As quality assurance, we should not own both the requirement and the test against it; we should own one, but not both. If your client will not own the requirement, cascading it down to the rest of the organization, assume you will be left out in the cold with a result they do not like, i.e., defects that need to be addressed before deployment to production.
Now comes your largest challenge, and it is a process issue your client needs to fix: you are about to test against requirements that no other part of the organization (architecture, development, platform engineering) had when they built the solution. Even if your requirements are a perfect reconstruction of production behavior, plus some allowance for growth, any defects you find will be challenged aggressively.
Your test will not match any assumptions or requirements used by any other portion of the organization.
And, in a sense, these other orgs will be correct in aggressively challenging your results. It really isn't fair to hold their designed solution to a set of requirements which were not in place when they made decisions which impacted scalability and response times for the system. You would be wise to call this out with your clients before the first execution of any performance test.
You can buy yourself some time. If the client does have a demand for a particular response time, such as an adoption of the Google RAIL model, then you can implement a gate before accepting any code for multi-user performance testing: the code SHALL be compliant for a single user. It is not going to get any faster for two or more users. Implementing this hard gate will solve about 80% of your performance issues, since the changes required to bring code into compliance for a single user most often have benefits on the multi-user front.
You can buy yourself some time in a second way as well. Take a look at the client's current site using tools such as Google Lighthouse and GTmetrix. Most of us are creatures of habit, and that includes architects, developers, and ops personnel. We design, build, and deploy to patterns we know and are comfortable with, usually the same ones over and over again until we are forced to make a change. It is highly likely that the performance antipatterns flagged by Lighthouse and GTmetrix will be carried forward into a future release unless they are called out for mitigation. Begin citing defects directly off of these tools before you even run a performance test. You will need management support, but you might consider not even accepting a build for multi-user performance testing until GTmetrix scores at least a B across the board and Lighthouse reports a score of 90 or better.
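As a rough sketch of such an acceptance gate (assuming the Lighthouse CLI is installed, headless Chrome is available, the staging URL is hypothetical, and 0.90 in the JSON report is your agreed threshold):

    # Sketch of a build-acceptance gate: refuse a build for multi-user
    # performance testing until its Lighthouse performance score meets an
    # agreed threshold. Assumes the Lighthouse CLI (npm install -g lighthouse).
    import json
    import subprocess
    import sys

    URL = "https://staging.example.com"   # hypothetical build under test
    THRESHOLD = 0.90                      # Lighthouse reports category scores as 0.0-1.0

    subprocess.run(
        ["lighthouse", URL, "--output=json", "--output-path=report.json",
         "--chrome-flags=--headless"],
        check=True,
    )

    with open("report.json") as f:
        report = json.load(f)

    score = report["categories"]["performance"]["score"]
    print(f"Lighthouse performance score: {score:.2f}")

    if score < THRESHOLD:
        print("Build rejected: fix single-user performance before load testing.")
        sys.exit(1)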
This should leave edge cases when you do get to multi-user performance testing, such as too early allocation of a resource, holding onto resources too long, too large of a resource allocation, hitting something too often, lock contention on a shared resource. An architectural review might pick up on these, where someone might say, "we are pre-allocating this because.....," or "Marketing says we need to hold the cart for 30 minutes before de-allocation," or "...." Well, you get the idea.
Don't forget to have the database profiler running while functional testing is going on. You are likely to pick up a few missing indexes or high cost queries here which should be addressed before multi-user performance testing as well.
You are probably wondering why I am pointing out all of these things before your performance test takes place. Darnit, you were hired to run a performance test! The test you are about to conduct is very high risk politically. Even if it finds something ugly, because the other parts of the organization did not benefit from the requirements, the result is likely to be rejected until the issue shows up in production. By shifting the focus to objective measures even before you need to run two users in anger together, there are many avenues for finding and fixing performance issues which are far less politically volatile. Food for thought.

Related

How to deal with api that rate limits requests?

For small apps they are no problem.
But for apps with traffic you can hit those limits easily.
HTTP is request-response driven. Just because your backend is stuck at a rate limit, you can't really hold off sending the response back until the rate limit allows you to resume making your API calls.
What do you do?
I can think of several scenarios:
Wait it out: it sucks, but sometimes it's an easy fix, as you don't need to do anything.
Queue it: this is a lot of work compared to just making an API call. It requires you to first store the request in a database, then have a background task go through the database and do the work. The user would also be told "it is processing" rather than "it's done".
Use a lot of APIs: very hacky... and a lot of trouble to manage. Say you are using Amazon; now you would have to create, verify, and validate something like 10 accounts. That's not even possible where you need to verify with, say, a domain name, since Amazon would know account abc already owns it.
To expand on what your queueing options are:
Unless you can design the problem of hitting this rate limit out of existence, as @Hammerbot walks through, I would go with some implementation of a queue. The solution can scale in complexity and robustness according to what loads you're facing and how many rate-limited APIs you're dealing with.
Recommended
You use some library to take care of this for you. Node-rate-limiter looks promising. It still appears you would have to worry about how you handle your user interaction (make them wait, write to a db/cache-service and notify them later).
"Simplest case" - not recommended
You can implement a minimally functioning queue and back it with a database or cache. I've done this before and it was fine, initially. Just remember you'll need to implement your own retry logic and will have to worry about things like queue starvation**. Basically, the caveats of rolling your own < insert thing whose implementation someone already worried about > should be taken into consideration.
**(e.g. your calls keep failing for some reason and all of a sudden your background process is endlessly retrying large numbers of failing queue work elements and your app runs out of memory).
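To make that caveat concrete, here is a minimal sketch of a hand-rolled, database-backed queue worker with a retry cap so failing items can't be retried forever. The table layout, column names, and the call_rate_limited_api() helper are all made up for illustration.

    # Sketch of a minimal hand-rolled queue worker with a bounded retry policy.
    # Table/column names and call_rate_limited_api() are hypothetical.
    import sqlite3
    import time

    MAX_ATTEMPTS = 5          # cap retries so failing items can't starve the queue
    POLL_INTERVAL_SECONDS = 10

    def call_rate_limited_api(payload):
        """Placeholder for the real, rate-limited API call."""
        raise NotImplementedError

    def process_queue(db_path="queue.db"):
        conn = sqlite3.connect(db_path)
        while True:
            row = conn.execute(
                "SELECT id, payload, attempts FROM work_items "
                "WHERE status = 'pending' AND attempts < ? ORDER BY id LIMIT 1",
                (MAX_ATTEMPTS,),
            ).fetchone()
            if row is None:
                time.sleep(POLL_INTERVAL_SECONDS)
                continue
            item_id, payload, attempts = row
            try:
                call_rate_limited_api(payload)
                conn.execute("UPDATE work_items SET status = 'done' WHERE id = ?", (item_id,))
            except Exception:
                # Record the attempt; items that exceed MAX_ATTEMPTS become dead
                # letters instead of being retried endlessly and eating memory.
                status = "failed" if attempts + 1 >= MAX_ATTEMPTS else "pending"
                conn.execute(
                    "UPDATE work_items SET attempts = ?, status = ? WHERE id = ?",
                    (attempts + 1, status, item_id),
                )
            conn.commit()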
Complex case:
You have a bunch of API calls that all get rate-limited and those calls are all made at volumes that make you start considering decoupling your architecture so that your user-facing app doesn't have to worry about handling this asynchronous background processing.
High-level architecture:
Your user-facing server pushes work units of different types onto different queues. Each of these queues corresponds to a different rate limit (e.g. 10 queries per hour, 1000 queries per day). You then have a "rate-limit service" that acts as a gate to consuming work units off the different queues. Horizontally distributed workers then consume items from a queue if and only if the rate-limit service says they can. The results of these workers can then be written to a database, and you can have some background process notify your users of the result of the asynchronous work you had to perform.
Of course, in this case you're wading into a whole world of infrastructure concerns.
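For a flavour of that architecture, here is a toy sketch. The "rate-limit service" is reduced to an in-process token bucket; in a real deployment it would be a shared service (for example backed by Redis), and the queues would be real message queues rather than in-memory ones.

    # Toy sketch of the architecture above: workers consult a rate-limit "gate"
    # before consuming work units. Everything here is in-process for brevity.
    import queue
    import threading
    import time

    class RateLimitGate:
        """Token bucket: allow() returns True only when a token is available."""
        def __init__(self, rate_per_second, burst):
            self.rate = rate_per_second
            self.capacity = burst
            self.tokens = burst
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def allow(self):
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                return False

    def worker(work_queue, gate, call_api):
        while True:
            item = work_queue.get()
            while not gate.allow():          # consume only when the gate says we may
                time.sleep(0.5)
            result = call_api(item)
            # ...write `result` to a database; a background process notifies the user later...
            work_queue.task_done()

    # Usage sketch: one gate per differently rate-limited API, one queue per work type.
    slow_queue = queue.Queue()
    slow_api_gate = RateLimitGate(rate_per_second=10 / 3600, burst=10)   # roughly 10 calls/hour
    threading.Thread(target=worker, args=(slow_queue, slow_api_gate, print), daemon=True).start()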
For further reading, you could look at Lyft's rate-limiting service (which I think implements the token bucket algorithm to handle rate limiting). You could use Amazon's Simple Queue Service (SQS) for the queues and AWS Lambda as the queue consumers.
There are two reasons why rate limits may cause you problems.
Chronic: (that is, a sustained situation). You are hitting rate limits because your sustained demand exceeds your allowance.
In this case, consider a local cache, so you don't ask for the same thing twice. Hopefully the API you are using has a reliable "last-modified" date so you can detect when your cache is stale. With this approach, your API calling is to refresh your cache, and you serve requests from your cache.
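As a minimal sketch of that approach, assuming the upstream API honours Last-Modified/If-Modified-Since and using the requests library (the endpoint URL is hypothetical):

    # Sketch of a local cache refreshed with conditional GETs, so repeated
    # requests for the same resource don't burn rate-limit quota unnecessarily.
    import requests

    _cache = {}  # url -> {"last_modified": str, "body": bytes}

    def fetch_with_cache(url):
        headers = {}
        entry = _cache.get(url)
        if entry and entry.get("last_modified"):
            headers["If-Modified-Since"] = entry["last_modified"]

        resp = requests.get(url, headers=headers, timeout=10)

        if resp.status_code == 304 and entry:
            # Not modified: serve the cached copy.
            return entry["body"]

        resp.raise_for_status()
        _cache[url] = {
            "last_modified": resp.headers.get("Last-Modified"),
            "body": resp.content,
        }
        return resp.content

    # data = fetch_with_cache("https://api.example.com/v1/things")  # hypothetical endpoint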
If that doesn't help, you need higher rate limits.
Acute: your application makes bursts of calls that exceed the rate limit, but on average your demand is under the limit, so you have a short-term problem. I have settled on a brute-force solution for this ("shoot first, ask permission later"): I burst until I hit the rate limit, then I use retry logic, which is easy since my preferred tool is Python, which supports this well. The returned error is trapped and retry handling takes over. I think every mature library would have something like this.
https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html
The default retry logic is to back off in increasingly big steps of time.
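For example, a minimal sketch using urllib3's Retry mounted on a requests session, retrying with exponential backoff on rate-limit responses (the endpoint URL is hypothetical):

    # Sketch: exponential backoff on rate-limit responses using urllib3's Retry.
    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    retry = Retry(
        total=5,                      # give up after 5 attempts
        backoff_factor=1,             # exponential backoff between retries
        status_forcelist=[429, 503],  # retry on "Too Many Requests" / "Service Unavailable"
        respect_retry_after_header=True,
    )

    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))

    resp = session.get("https://api.example.com/v1/things", timeout=10)  # hypothetical endpoint
    resp.raise_for_status()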
This has a starvation risk, I think. That is, if there are multiple clients using the same API, they share the same rate limit as a pool. On your nth retry, your backoff may be so long that newer clients with shorter backoff times steal your slots; by the time your long backoff expires, the rate limit has already been consumed by a younger competitor, so you now retry even longer, making the problem worse. At the limit, this just amounts to the chronic situation: the real problem is that your total rate limit is insufficient, but you might not be sharing it fairly among jobs due to starvation. An improvement is to use a less naive algorithm; it's the same contention problem you run into with locking in computer science (introducing randomisation is a big improvement). Once again, a mature library is aware of this and should help with built-in retry options.
I think that this depends on which API you want to call and for what data.
For example, Facebook limits their API call to 200 requests per hour and per user. So if your app grows, and you are using their OAuth implementation correctly, you shouldn't be limited here.
Now, what data do you need? Do you really need to make all these calls? Is the information you're calling for storable on your own servers?
Let's imagine that you need to display an Instagram feed on a website. On each visitor request, you reach out to Instagram to get the pictures you need. When your app grows, you hit the API limit because you have more visitors than the Instagram API allows for. In this case, you should definitely store the data on your server once per hour and let your users hit your database rather than Instagram's.
Now let's say that you need specific information for every user at each request. Isn't it possible to let that user handle their own connection to the API? Either by implementing the API's OAuth 2 flow, or by asking the user for their API credentials (not very secure, I think...)?
Finally, if you really can't change the way you are working now, I don't see any other options than the ones you listed here.
EDIT: And finally, as @Eric Stein stated in his comment, some APIs allow you to raise your API limit by paying (a lot of SaaS providers do that), so if your app grows, you should be able to afford to pay for those services (they are delivering value to you; it's fair to pay them back).

How to securely distinguish traffic from my app and browser traffic

I'm designing a game that makes queries to a database on the web. The database is fronted by a web service. For example, a request could look like this:
Endpoint: "server.com/user/UID/buygold"
POST:
amount: 100
The web service would make sure that userid has enough funds to purchase 100 gold, then would return a Boolean answer based on the success of the transaction.
However, I want to limit the amount of scripting someone could possibly do to automate gameplay. For example, they could figure out their userid and have automated tasks that buy gold for them while they are at work.
On the web service side, what are some sound security measures that I can put in place to decline all but real app traffic? Is there also a way to trump reverse engineers who will take the app apart and look for keys/certs?
I hope that this is not for a production environment; the security implications alone are mind-boggling and certainly go beyond the scope of the allowed response, as a full treatment would require a rather lengthy and in-depth list of requirements and recommendations that are really contingent on the many other factors that make up your web-service environment. For example, the network topology, authentication, session control and management, and various other variables all play important roles in fostering and implementing sound cyber-security countermeasures.
However, assuming that you have all that taken care of, I will answer your main questions as follows:
For question:
"what are some sound security measures that I can put in place to decline all but real app traffic"
Answer:
This is one of many options out there that would address your particular concern, and that is to check the client's User-Agent header in the request, which may look something like this:
"Mozilla/5.0 (Macintosh; Intel Mac OS X 32.11; rv:49.0) Gecko/20122101 Firefox/42.0"
It depends on how the script is being run to automate gameplay. If it is in the form of a browser extension, then the User-Agent performs very poorly as a countermeasure. If, on the other hand, the script is being run directly from the client against your server (web service), then you can detect it right away, and there are ways to detect whether someone has spoofed a User-Agent just to bypass this countermeasure.
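As a toy illustration of that check (the framework choice, route, and the expected User-Agent prefix are all assumptions, and a User-Agent check on its own is trivially spoofable, so treat it as one signal among many):

    # Toy sketch: reject requests whose User-Agent doesn't look like the official
    # game client. EXPECTED_UA_PREFIX is hypothetical; this is not a real defence.
    from flask import Flask, abort, request

    app = Flask(__name__)
    EXPECTED_UA_PREFIX = "MyGameClient/"   # assumed User-Agent sent by the real app

    @app.before_request
    def reject_non_app_traffic():
        user_agent = request.headers.get("User-Agent", "")
        if not user_agent.startswith(EXPECTED_UA_PREFIX):
            abort(403)

    @app.route("/user/<user_id>/buygold", methods=["POST"])
    def buy_gold(user_id):
        amount = int(request.form.get("amount", 0))
        # ...check funds, perform the transaction, return success/failure...
        return {"success": True, "amount": amount}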
Another counter measure you can utilize is session management at the client level. So, this would require an architectural overview of how you implemented your particular project, but a general summary would follow a pattern like this:
Customer/GamePlayer would naturally be required to login (authentication of some sort)
The client system (which is the user interface) that the game player is using will have countermeasures implemented in a front-end scripting language, e.g. JavaScript or any framework that makes use of JS, such as jQuery, Dojo, etc.
Register event handlers that monitor actions, such as type of input, some Boolean logic that will follow something like this:
"if input is not from keyboard or mouse, then send flag with request"
The server/web service will have logic to handle this request appropriately. This is a way to catch/detect the game player after they commit the violation, which can be used for legal purposes, such as building a profile of evidence. If, on the other hand, you want to prevent it from happening, then you could have some Boolean logic that goes something like this: "if input is not from the keyboard or mouse (or whatever permitted input device), then do not allow the action (gameplay), and still report back to the web server".
There are a dozen other ways, but this one seems to generally address your question, provided you take into consideration that there are hundreds of other factors to think about, from the networking level all the way to the application layer, down to which pattern you are using for your web service, such as whether it's a REST/API type of environment, whether it follows an MVC pattern, and so on. There is no silver bullet when it comes to cyber security; it's really a proactive and constant initiative on your end to ensure that all stakeholders' assets are protected. In this case, the asset is the web service and gameplay, and the threat is the risk of gameplay tampering, which would affect the integrity of your game.
Now, regarding your second question:
" Is there also a way to trump reverse engineers who will take the app apart and look for keys/certs?"
When reverse engineers really put their minds to it, there is nothing you can really do; whatever countermeasure you may implement, they will find a workaround. That's why it's called reverse engineering: they will reverse engineer your "countermeasure". So, not to be all cynical about it, but you have to accept the reality that there is really no such thing as a "trump" countermeasure when it comes to cyber security. You can, however, employ various mechanisms at both the network layer and the application level, combined with proactive initiatives, intrusion detection, and monitoring for abnormal behavioral characteristics in gameplay patterns; all of these will mitigate your risk. With all that said, your final frontier is to ensure you have a good legal policy in place in your TOS (terms of service); depending on where you're hosting your web service (geographically), it will protect you when users violate such terms, especially when you have verbiage that precludes users from attempting to reverse-engineer, tamper with gameplay scoreboards or currency, and so on.
Another good way is to really connect with users. Users are people, and people sometimes forget that their actions also affect others. Once a user is aware of how his or her actions, such as scripting their way to an extra 100 gold, may financially and emotionally affect the people who have put real time and effort into making this service possible, they may think twice; a simple introductory welcome video upon signing up can do wonders, for example. Then again, sometimes the user may not really know that they were prohibited from using auto-scripts for gameplay, or can at least raise an affirmative defense to that effect, so having well-published policies can really mitigate and potentially eliminate these types of risks. Despite this optimistic outlook on users, you still have to exercise good programming practices and have security countermeasures in place.
I hope that I have given you some insight and direction to assist you with this matter, and that, as you can see, it is really an involved and potentially very complicated process.
Good luck with your initiatives.

Best way to manage 1000+ users registering for conference events at once

We are currently in the process of organising a student conference.
The issue is that we offer several different events at the same time over the course of a week. The conference runs the whole day.
It's currently been operating on a first come, first served basis, however this has led to dramatic problems in the past, namely the server crashing almost immediately, as 1000+ students all try to get the best events as quickly as they can.
Is anyone aware of a way to best handle this so that each user has the best chance of enrolling in the events they wish to attend, firstly without the server crashing and secondly with people registering for events which have a maximum capacity, all within a few minutes? Perhaps somehow staggering the registration process or something similar?
I'm aware this is a very broad question, however I'm not sure where to look to when trying to solve this problem...
Broad questions have equally broad answers. There are broadly two ways to handle it:
Write more performant code so that a single server can handle the load.
Optimize backend code; cache data; minimize DB queries; optimize DB queries; optimize third party calls; consider storing intermediate things in memory; make judicious use of transactions trading off consistency with performance if possible; partition DB.
Horizontally scale - deploy multiple servers. Put a load balancer in front of your multiple front end servers. Horizontally scale DB by introducing multiple read slaves.
There are no quick fixes. It all starts with analysis first - which parts of your code are taking most time and most resources and then systematically attacking them.
Some quick fixes are possible; e.g. search results may be cached. The cache might be stale, so there would be situations where the search page shows that there are seats available when in reality the event is full. You handle such cases at registration time. For caching whole web pages, use a caching proxy.
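As a small sketch of that stale-cache-plus-registration-time-check idea (the table layout, column names, and SQLite backing store are illustrative only): serve event availability from a short-lived cache so browsing doesn't hammer the database, but enforce capacity atomically when someone actually registers.

    # Sketch: cached availability for display, authoritative capacity check at
    # registration time. Schema and storage choice are hypothetical.
    import sqlite3
    import time

    CACHE_TTL_SECONDS = 30
    _availability_cache = {}  # event_id -> (timestamp, seats_left)

    def seats_left_cached(conn, event_id):
        now = time.time()
        cached = _availability_cache.get(event_id)
        if cached and now - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]  # possibly slightly stale, which is fine for display
        (seats_left,) = conn.execute(
            "SELECT capacity - registered FROM events WHERE id = ?", (event_id,)
        ).fetchone()
        _availability_cache[event_id] = (now, seats_left)
        return seats_left

    def register(conn, event_id, student_id):
        # The cache may say a seat is free when it isn't; the authoritative check
        # happens here, atomically, when the student actually registers.
        with conn:
            updated = conn.execute(
                "UPDATE events SET registered = registered + 1 "
                "WHERE id = ? AND registered < capacity", (event_id,)
            ).rowcount
            if not updated:
                return False  # event filled up since the cached page was rendered
            conn.execute(
                "INSERT INTO registrations (event_id, student_id) VALUES (?, ?)",
                (event_id, student_id),
            )
        return True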

Best/common practices for website transfer (what time of day?)

Basically, my question is to get a feel for what time of day you do things like DNS transfers when moving sites across servers, and provide updates to sites. Does anyone do this during the day, or do most developers do this at night?
I think different approaches can be valid, depending on your situation, the actual change to be performed, how often a change happens and what your users expect.
If you do it during the day, you potentially have a longer window to fix issues if there are any, including issues that need to be dealt with by your service provider (which might be more responsive during business hours than at night or during week-ends).
If you enjoy working at night or during week-ends and you don't rely on any service provider to solve any issue related to the move/migration you're asking about, doing it at this time might be less disruptive to your users.
But all that also depends on who your actual users are. More and more users are spread throughout the globe and can interact with your sites at any time of day or night.
Common systems administration practice, when dealing with big changes or migrations, usually recommends not doing them just before off-time (be it night, a week-end, or holidays), unless you're sure that you (or the team) will be available to fix issues. Basically, you need to plan for issues, because they always happen.
Something else to consider is any SLA you, or your company, might have with the users. In that case, you might need to perform your maintenance action outside of business hours, or with prior notice, or risk having a penalty applied.
A proper answer to the question you're asking really depends on your situation and the business you're in.

Deploying software on compromised machines

I've been involved in a discussion about how to build internet voting software for a general election. We've reached a general consensus that there exist plenty of secure methods for two way authentication and communication.
However, someone came along and pointed out that in a general election some of the machines being used are almost certainly going to be compromised. To quote:
Let me be an evil electoral fraudster. I want to sample people's votes as they vote and hope I get something scandalous. I hire a botnet from some really shady dudes who control 1000 compromised machines in the UK just for election day.
I capture the voting habits of 1000 voters on election day. I notice 5 of them have voted BNP. I look these users up and check out their machines; I look through the documents on their machines and find out their names and addresses. I find out one of them is the wife of a Tory MP. I leak 'wife of Tory MP is a fascist!' to some blogger I know. It hits the internet, goes viral, and swings an election.
That's a serious problem!
So, what are the best techniques for running software where user interactions with the software must be kept secret, on a machine which is possibly compromised?
It can't be done. Fortunately, banks face exactly the same problem, so those little home chip'n'pin doohickies are pretty cheap.
So, if you want secure online voting, you send a custom voting doohicky to everyone who applies for one. This doohicky signs and encrypts their vote before sending it to the PC to be transmitted over the wire. The only thing an attacker on the wire can do is eavesdrop on whether or not the voter voted at all. Since political parties already do this, by posting party workers outside polling stations, that's not a significant risk to the system ;-)
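For a flavour of what "signs and encrypts" might mean on such a device, here is a toy sketch using PyNaCl. The key provisioning, vote format, and election-authority key are all invented for illustration; this is nowhere near a real end-to-end voting protocol.

    # Toy sketch of "sign then encrypt" as a dedicated voting device might do it.
    # All keys and the vote format are invented for illustration only.
    from nacl.public import PrivateKey, SealedBox
    from nacl.signing import SigningKey

    # Provisioned onto the device before it is mailed out (assumption).
    device_signing_key = SigningKey.generate()

    # Key pair held by the election authority (assumption).
    authority_private_key = PrivateKey.generate()
    authority_public_key = authority_private_key.public_key

    def prepare_ballot(vote: bytes) -> bytes:
        signed = device_signing_key.sign(vote)                     # ties the ballot to this device
        sealed = SealedBox(authority_public_key).encrypt(signed)   # only the authority can read it
        return sealed  # this opaque blob is all the (possibly compromised) PC ever sees

    ballot = prepare_ballot(b"candidate: 42")
    # The PC just relays `ballot`; an eavesdropper learns only that *a* vote was sent.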
You still face some of the problems of postal voting, such as vote buying and coercion, or stealing someone's doohicky, but only via physical access, not by compromising their PC. There are obvious DoS attacks if you rely on home internet connections, but there's no reason the voter can't have the option of going to the polling station if their connection goes down.
Whether the doohicky is cheap enough is still doubtful - I guess they cost a few pounds each, which I don't think is cheap on the scale of what is actually spent on elections. But they're not infeasibly expensive. I doubt they save much money at polling stations, unfortunately. The cost of polling in the UK depends pretty much on the number of polling stations. Problems this time notwithstanding, the number of polling stations isn't driven by the need to provide a fast enough throughput, it's driven by a desire that people not have to travel far to get to them. So having fewer voters doesn't really allow you to reduce the number of polling stations. Reducing paper might save time and money at the count, but surely not enough to pay for doohickies.
Finally of course there's still a risk of attack on the hardware. Someone could maybe intercept them in the post and replace them with identical-looking devices. But unlike attacking the hardware at a polling station, the attacker only affects one vote per piece of dedicated voting hardware compromised, so at least the bar is set high to begin with.
So, what are the best techniques for running software where user interactions with the software must be kept secret, on a machine which is possibly compromised?
The only answer is that you cannot / must not do it. If the hardware or OS might have been compromised you cannot guarantee to keep the user interactions secret.
But the other take on this is that no voting system known to mankind (electronic or otherwise) is incorruptible. That is why you need to have people checking for fraud, and people watching the people, and a culture where corrupt behavior is not the norm.
EDIT
... if one can reduce the impact of compromised machines to below the level of corruption in a paper voting system you're achieving a positive gain.
You also have to take into account other forms of corruption that are much easier with electronic voting from home: stand-over tactics, votes for sale, the fact that most people do not properly protect their electronic credentials, and so on. In short, what you are proposing is hypothetical and (IMO) unrealistic.
It is simpler to fix the flaws with in-person, on-paper voting than to address a whole bunch of potentially worse problems with a hypothetical from-home, electronic voting.
(Also, you are implying a level of corruption with UK paper voting that surprises me as an ex UK resident. This is off topic, but can you provide references / links that back this up?)
You have two main choices: either sidestep the compromised part of the machine (e.g. provide the full OS) or work within the compromise and make it hard to get hold of the data.
The second choice is more practical. Although you can't stop the shady dudes from eventually getting the data, you can make it difficult enough that it takes longer than a day, rendering the leaked voting habits harmless.
Assuming a web application: not using standard UI components, varying their locations on the screen, using multiple layers of encryption, disabling keyboard input, and using animations to fool screen grabbers can all make the process trickier and buy more time.
Obviously you cannot ensure the confidentiality of the vote if the machine the vote is entered on is compromised. Whatever measures you take, all an attacker needs to do is execute your software in a virtual machine that records all access to the keyboard, mouse, and screen. By playing back the recording, the attacker can see how the user voted...
However, when designing an e-voting protocol this is the least of your worries. How do you prevent somebody from hacking the election server and manipulating results? How do you even detect tampering? What about the secrecy of my vote if the server is compromised? Can I be forced to reveal my vote?
The biggest threat facing e-voting is the ability of an attacker to influence the election. By sending CDs to people you make massive identity leaks more valuable. Not only can an attacker destroy their credit, but they can also destroy their country.
Even forcing people to use specific hardware doesn't work. Look at console modding, or ATM skimmers and hardware keyloggers. You also have to worry about transferring the votes to be counted; even SSL has security problems. Then there is the problem of the centralized database; SQL injection would be devastating.
The real question is, "Is e-voting more secure than paper voting?" What is harder for an attacker to influence? To be honest I don't think e-voting machines would have changed the outcome of the recent Iranian election.
An obvious solution is to send the software to the end user on a bootable CD. The user simply restarts their computer and they're now on a non-compromised computer.
However, this is not terribly simple to develop (trying to make the OS on the CD compatible with all the variations of hardware we're going to encounter on users' machines). Also, I can't imagine that the average home user has their BIOS set to "Boot from CD", and telling voters to modify their BIOS settings is just going too far.
