How to keep track of performance testing

I'm currently doing performance and load testing of a complex many-tier system investigating the effect of different changes, but I'm having problems keeping track of everything:
There are many copies of different assemblies
Originally released assemblies
Officially released hotfixes
Assemblies that I've built containing further additional fixes
Assemblies that I've built containing additional diagnostic logging or tracing
There are many database patches, some of the above assemblies depend on certain database patches being applied
Many different logging levels exist, in different tiers (Application logging, Application performance statistics, SQL server profiling)
There are many different scenarios; sometimes it is useful to test only one scenario, other times I need to test combinations of different scenarios.
Load may be split across multiple machines or only a single machine
The data present in the database can change, for example some tests might be done with generated data, and then later with data taken from a live system.
There is a massive amount of potential performance data to be collected after each test, for example:
Many different types of application specific logging
SQL Profiler traces
Event logs
DMVs
Perfmon counters
The database(s) are several GB in size, so instead of using backups to revert to a previous state I tend to apply changes to whatever database is left after the last test, which causes me to quickly lose track of things.
I collect as much information as I can about each test I do (the scenario tested, which patches are applied, what data is in the database), but I still find myself having to repeat tests because of inconsistent results. For example, I just did a test that I believed to be an exact duplicate of a test I ran a few months ago, but with updated data in the database. I know for a fact that the new data should cause a performance degradation, yet the results show the opposite!
At the same time I find myself spending a disproportionate amount of time recording all these details.
One thing I considered was using scripting to automate the collection of performance data etc., but I wasn't sure this was such a good idea - not only is it time spent developing scripts instead of testing, but bugs in my scripts could cause me to lose track of things even quicker.
I'm after some advice/hints on how to better manage the test environment, in particular how to strike a balance between collecting everything and actually getting some testing done, at the risk of missing something important.

Scripting the collection of the test parameters + environment is a very good idea to check out. If you're testing across several days, and the scripting takes a day, it's time well spent. If after a day you see it won't finish soon, reevaluate and possibly stop pursuing this direction.
But you owe it to yourself to try it.

I would tend to agree with orip: scripting at least part of your workload is likely to save you time. You might consider taking a moment to ask which tasks are the most time-consuming in terms of your labor and how amenable they are to automation. Scripts are especially good at collecting and summarizing data - much better than people, typically. If the performance data requires a lot of interpretation on your part, you may have problems.
An advantage to scripting some of these tasks is that you can then check them in alongside the source / patches / branches, and you may find you benefit from imposing organizational structure on your system's complexity rather than struggling to chase it as you do now.
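For example, here is a minimal sketch of what such a capture script might look like - the assembly folder, patch-level file and logging-level entries are placeholders for whatever your environment actually uses:
```python
# capture_run.py - minimal sketch: snapshot the test parameters before each run.
# All names below (scenario, assembly folder, patch-level file) are placeholders.
import hashlib
import json
import os
import sys
import time

def file_fingerprints(folder):
    """Record name, size and MD5 of every assembly so a run can be reproduced later."""
    fingerprints = {}
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                fingerprints[name] = {
                    "size": os.path.getsize(path),
                    "md5": hashlib.md5(f.read()).hexdigest(),
                }
    return fingerprints

def main():
    run_id = time.strftime("%Y%m%d-%H%M%S")
    metadata = {
        "run_id": run_id,
        "scenario": sys.argv[1] if len(sys.argv) > 1 else "unspecified",
        "assemblies": file_fingerprints(r"C:\app\bin"),                 # placeholder path
        "db_patch_level": open("db_patch_level.txt").read().strip(),    # however you track it
        "logging_levels": {"app": "Info", "sql_profiler": "on"},        # placeholder
        "notes": "",
    }
    os.makedirs(run_id)
    with open(os.path.join(run_id, "run-metadata.json"), "w") as f:
        json.dump(metadata, f, indent=2)
    print("Recorded metadata for run", run_id)

if __name__ == "__main__":
    main()
```
Collected logs, traces and perfmon output can then be dropped into the same timestamped folder, so every result stays tied to the exact configuration that produced it.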

If you can get away with testing against only a few fixed configurations, that will keep the admin simple. It may also make it easier to put one on each of several virtual machines, which can be quickly redeployed to give clean baselines.
If you genuinely need the complexity you describe, I'd recommend building a simple database to allow you to query your multivariate results. Having a column for each of the important factors will allow you to answer questions like "which testing config had the lowest variance in latency?" and "which test database allowed the most bugs to be raised?". I use sqlite3 (probably through the Python wrapper or the Firefox plug-in) for this kind of lightweight collection, because it keeps maintenance overhead relatively low and allows you to avoid perturbing the system under test too far, even if you need to run on the same box.
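As a minimal sketch of what that results database might look like - the column names and sample values are purely illustrative of "one column per important factor":
```python
# results_db.py - minimal sketch of a one-table results store, one column per factor.
# Column names and the inserted values are illustrative; use whatever factors matter to you.
import sqlite3

conn = sqlite3.connect("perf_results.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS test_run (
        run_id          TEXT PRIMARY KEY,
        scenario        TEXT,
        db_patch_level  TEXT,
        data_set        TEXT,     -- e.g. 'generated' or 'live-snapshot'
        load_machines   INTEGER,
        logging_level   TEXT,
        mean_latency_ms REAL,
        latency_stddev  REAL,
        notes           TEXT
    )
""")

conn.execute(
    "INSERT OR REPLACE INTO test_run VALUES (?,?,?,?,?,?,?,?,?)",
    ("20090301-1", "checkout", "patch-42", "generated", 3, "Info", 185.0, 12.4, ""),
)
conn.commit()

# "Which testing config had the lowest variance in latency?"
for row in conn.execute(
    "SELECT run_id, scenario, latency_stddev FROM test_run ORDER BY latency_stddev LIMIT 5"
):
    print(row)
```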
Scripting the tests will make them quicker to execute and permit results to be gathered in an already-ordered way, but it sounds like your system may be too complex to make this easy to do.

Related

Node server: code weight and server performance

I would like to know how significant the impact would be of using a 15.5k library just for doing very simple validations, versus using my own 1k super-simple validation class, once I have more than 10k users on my system (Node + Mongo running on a super Pentium 8-core with 32 GB RAM).
Is it worth worrying about this 14.5k of code?
I can't find any clue in my bleak but always-wondering mind.
I'd appreciate your opinion very much.
A nice thing about server development is that you usually have significant RAM available and the code is generally loaded just once at the startup of the server so load time is not part of the user experience.
Because of this, you'd be hard pressed to even measure a meaningful impact between a 1k library and a 15k library. You might care about 15k of memory usage per active user, but you will not care about an extra 15k of code loaded into memory one time, and it will not affect your server performance in any way.
So, I'd say you should not worry about skimping on code size (within reason). Instead, pick the tool that best solves your problem and makes your development quickest and most reliable. And, where possible, use what has been built and tested before rather than building your own from scratch. That will leave you more development time to spend on the things that really matter to making your website better, different or great. Or, it will get you to market faster.
For reference, 15k is 0.000045% of your total computer's RAM.
I agree with jfriend00. There's almost no impact on memory/performance for the code sizes you describe. You can always benchmark different modules against your usage profile and choose for yourself. However, I think you should ask yourself some other (similar) questions:
Why is the package I use so 'big'? Maybe there's a much 'smaller' one that does the same job with the same performance. When I say big or small here I mean in terms of functionality. Most of the time you'd want to go with the minimum functionality, even if its size might seem big. If you use a validation module that also validates emails, but you don't need that, it doesn't mean you shouldn't use it - just know the tradeoffs: it might get updated more frequently because of bugs in the email validation, which might introduce bugs into the integer validation you do use, and you have more code to read if you want your production code to feel safe (explained below).
Does the package function as I expect? (read the tests)
Is the package I use "secured"/"OK for production"? Read the code of the packages you use, make sure there isn't something fishy going on - usually node packages are not that big because most are minimal (I never used it, but I know https://requiresafe.com/ exists for these types of questions - you might want to check it out). Note that if they are larger in size that might mean you would have to read more code.
Ask these questions (and others if you feel you should) recursively on the package's dependencies.

Screwturn Wiki Search Latency

Does anyone know of ways to optimize the screwturn search functionality?
We've been using it for internal documentation, and I'm the tech expert on it, but I have not had the opportunity to analyze it much. After inputting a lot of information into it, we've noticed that text searches have a noticeable delay, on the order of up to 10 seconds in some cases. I'm pasting a screengrab of the search index status here. We have a good 30 different namespaces, which I suspect is more than we really need, but the decision was made to use them for organizational purposes, and I couldn't think of a reason why not. Is it possible the high number of namespaces impacts the search time?
When doing tests on the search, the only resource spike I could find was a big CPU usage spike on the webserver.
If you profile the SQL for a single search from your original question, you will likely notice that the web app is very chatty with the database. Having a larger number of namespaces to search will have an impact on search performance due to the way the search is performed (by namespace, which from what I have seen seems very inefficient). After reviewing the code a bit, I can see where you might see a spike on the web server. The best bet would be to refactor the search function to work better with a larger number of namespaces.

What are the advantages of merging multiple .NET assemblies into one?

I've seen discussions of techniques for merging multiple assemblies into one (e.g. ILMerge). I am scratching my head over why one would want to do so. Is there any reason other than the obvious "one file is easier to deploy/track/maintain/reference"?
Some reasons may be:
Small utility programs, for example when you want to just copy/paste the whole program to a server without any folders or anything.
Deployment scenarios where you want to keep the number of files that have to be copied, uploaded to FTP, etc. to a minimum, for example installers.
NuGet packages where you want to keep the amount of added references to a minimum.
There is a faster startup time associated with decreased I/O on an application's cold startup. The best way to see how much time can be saved is to measure, as each app is different: number of assemblies, size of assemblies, etc.
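As a rough sketch of such a measurement - the executable paths and the --exit-after-startup flag are hypothetical, and a true cold-start comparison also needs the OS file cache cleared (or a reboot) between runs:
```python
# startup_timing.py - rough sketch: time how long an app takes from launch to exit,
# to compare a merged build against the original. Paths and the exit flag are placeholders.
import subprocess
import time

def time_startup(exe_path, runs=5):
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([exe_path, "--exit-after-startup"], check=True)  # hypothetical flag
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)

for exe in (r"C:\app\Original.exe", r"C:\app\Merged.exe"):  # placeholder paths
    best, avg = time_startup(exe)
    print(f"{exe}: best {best:.3f}s, average {avg:.3f}s")
```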

Reorganizing a project for expansion/reuse

The scope of the project I'm working on is being expanded. The application is fairly simple but currently targets a very specific niche. For the immediate future I've been asked to fork the project to target a new market and continue developing the two projects in tandem.
Both projects will be functionally similar so there is a very strong incentive to generalize a lot of the guts of the original project. Also I'm certain I'll be targeting more markets in the near future (the markets are geographic).
The problem is that previous maintainers of the project made a lot of assumptions that tie it to its original market. It's going to take quite a bit of refactoring to separate the generic from the market-specific code.
To make things more complex several suggestions have been tossed around on how to organize the projects for the growing number of markets:
Each market is a separate project, commonalities between projects are moved to a shared library, projects are deployed independently.
Expand the existing project to target multiple markets, limiting functionality based on purchased license.
Create a parent application and redesign projects as plugins, purchased separately
All three suggestions have merit, and ideally I would like to structure the code to be flexible enough that any of these is possible with minor adjustments. Suggestion 3 appears to be the most daunting as it would require building a plugin architecture. The first two suggestions are a bit more plausible.
Are there any good resources available on the pros and cons of these different architectures?
What are the pros and cons of sharing code between projects versus copying and forking?
Forking is usually going to get you a quicker result initially, but almost always going to come around and bite you in maintenance -- bug fixes and feature enhancements from one fork get lost in the other forks, and eventually you find yourself throwing out whole forks and having to re-add their features to the "best" fork. Avoid it if you can.
Moving on: all three of your options can work, but they have trade-offs in terms of build complexity, cost of maintenance, deployment, communication overhead and the amount of refactoring you need to do.
1. Each market is a separate project
A good solution if you're going to be developing simultaneously for multiple markets.
Pros:
It allows developers for market A to break the A build without interfering with ongoing work on B
It makes it much less likely that a change made for market A will cause a bug for market B
Cons:
You have to take the time to separate out the shared code
You have to take the time to set up parallel builds
Modifications to the shared code now have more overhead since they affect both teams.
2. Expand the existing project to target multiple markets
Can be made to work okay for quite a while. If you're going to be working on releases for one market at a time, with a small team, it might be your best bet.
Pros:
The license work is probably valuable anyway, even if you move toward (1) or (3).
The single code base allows refactoring across all markets.
Cons:
Even if you're just working on something for market A, you have to build and ship the code for markets B, C and D as well -- okay if you have a small code base, but increasingly annoying as you get into thousands of classes
Changes to one market risk breaking the code for other markets
Changes to one market require other markets to be re-tested
3. Create a parent application and redesign projects as plugins
Feels technically sweet, and may allow you to share more code.
Pros:
All the pros of (1), potentially, plus:
clearer separation of shared and market-specific code
may allow you to move toward a public API, which would allow offloading some of your work onto your customers and/or selling lucrative service projects
Cons:
All the cons of (1), plus requires even more refactoring.
I would guess that (2) is sort of where you find yourself now, apart from the licensing. I think it's okay to stay there for a little while, but put some effort into moving toward (1) -- moving the shared code into a separate project even if it's all built together, for instance, trying to make sure the dependencies from market code to shared code are all one-way.
Whether you end up at (1) or (3) kind of depends. Mostly it comes down to who's "in charge" -- the shared code, or the market-specific code? The line between a plugin, and a controller class that configures some shared component, can be pretty blurry. My advice would be, let the code tell you what it needs.
1) NO! You don't want to manage different branches of the same code base... Because as common as the code may be, you will want to make sweeping changes, and one project will "at the moment" not be as important as the others, and then you will get one branch growing faster than the others.... insert snowball.
2) This is more or less the industry standard: a big config file, limiting things based on license/configuration (see the sketch after this list). It can make the app a bit cumbersome, but as long as the code complains about mutually exclusive settings and all the developers are in constant communication about new features and how they ripple through the entire application, you should do fine. This is also the easiest to hack, if that is a concern.
3) This also 'can' work. If you are using C#, plugins are relatively simple; you only have to worry about dependency hell. If the plugins have any chance of becoming circularly interdependent (that is, a requires b, which requires c, which requires a), then this will quickly explode and you will revert (quite easily) back to #2.
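As a minimal sketch of what option (2)'s license/config gating might look like - the file name, market and feature names are purely illustrative:
```python
# features.py - minimal sketch of option (2): one code base, features gated by a
# license/market configuration file. All names and markets here are illustrative.
import json

class FeatureGate:
    def __init__(self, config_path):
        # e.g. license.json: {"market": "DE", "features": ["invoicing", "vat_report"]}
        with open(config_path) as f:
            self.config = json.load(f)

    def enabled(self, feature):
        return feature in self.config.get("features", [])

gate = FeatureGate("license.json")
if gate.enabled("vat_report"):
    print("showing VAT report menu item")
else:
    print("feature not licensed for this market")
```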
The best resources you have are probably the past experiences of your coworkers on different projects, and the experience of people yammering about it on here or Slashdot or wherever. Certainly the cheapest.
Pros of sharing code:
One change changes everything.
Unified data model.
There is only one truth. (Much easier for everyone to be on the same page)
Cons of sharing code:
One change changes everything. Be careful.
If one bug is in it, it affects everything.
Pros of copying/forking:
Usually quicker to implement a specific feature for a specific customer.
Faster to hack when you realize that assumption A is only applicable for markets B and C, not D.
Cons of copying/forking:
One or more of the copied projects will eventually fail, due to a lack of cohesion in your code.
As said above: sweeping changes take a lot longer.
Good luck.
You said "copying and forking" which leads me to think that perhaps you haven't considered managing this "fork" as a branch in a revision control system like SVN. By doing it this way, when you refactor the branch to accomodate a different industry, you can merge those changes back into the main trunk with the aid of the revision control system.
If you are following a long term strategy of moving to a single app where all the variations are controlled by a config file (or an SQLITE config database) then this approach will help you. You don't have to merge anything until you are confident that you have generalised it for both industries, so you can still build two unique systems as long as you need to. But, you aren't backing yourself into a corner because it is all in one source code tree, the trunk for the legacy industry, and one branch for each new industry.
If your company really wants to atack multiple industries, then I don't think that the config database solution will meet all your needs. You will still need to have special code modules of some sort. A plug-in architecture is a good thing to put in because it will help, particularly if you embed a scripting engine like Python into your app. However, I don't think that plugins will be able to meet all your code variation requirements when you get into the "thousands of classes" scale.
You need to take a pragmatic approach that allows you to build a separate app today for the new industry, but makes it relatively easy to merge the improvements into the existing app as you go along. You may never reach the nirvana of a single trunk with thousands of classes and several industries, but you will at least have tamed the complexity, and will only have to deal with really important variations where there is real divergence in the industry need.
If I were in your shoes, I would also be looking at any and all features in the app which might be considered "reporting" and trying to factor them out, maybe even into an off the shelf reporting tool.

How do you evaluate reliability in software?

We are currently setting up the evaluation criteria for a trade study we will be conducting.
One of the criteria we selected is reliability (and/or robustness - are these the same?).
How do you assess that software is reliable without being able to afford much time evaluating it?
Edit: Along the lines of the response given by KenG, to narrow the focus of the question:
You can choose among 50 existing software solutions. You need to assess how reliable they are, without being able to test them (at least initially). What tangible metrics or other can you use to evaluate said reliability?
Reliability and robustness are two different attributes of a system:
Reliability: the IEEE defines it as ". . . the ability of a system or component to perform its required functions under stated conditions for a specified period of time."
Robustness: a system is robust if it continues to operate despite abnormalities in input, calculations, etc.
So a reliable system performs its functions as it was designed to, within constraints; a robust system continues to operate if the unexpected/unanticipated occurs.
If you have access to any history of the software you're evaluating, some idea of reliability can be inferred from reported defects, number of 'patch' releases over time, even churn in the code base.
Does the product have automated test processes? Test coverage can be another indication of confidence.
Some projects using agile methods may not fit these criteria well - frequent releases and a lot of refactoring are expected
Check with current users of the software/product for real world information.
It depends on what type of software you're evaluating. A website's main (and maybe only) criteria for reliability might be its uptime. NASA will have a whole different definition for reliability of its software. Your definition will probably be somewhere in between.
If you don't have a lot of time to evaluate reliability, it is absolutely critical that you automate your measurement process. You can use continuous integration tools to make sure that you only ever have to manually find a bug once.
I recommend that you or someone in your company read Continuous Integration: Improving Software Quality and Reducing Risk. I think it will help lead you to your own definition of software reliability.
Talk to people already using it. You can test yourself for reliability, but it's difficult, expensive, and can be very unreliable depending on what you're testing, especially if you're short on time. Most companies will be willing to put you in contact with current clients if it will help sell you their software and they will be able to give you a real-world idea of how the software handles.
As with anything, if you don't have the time to assess something yourself, then you have to rely on the judgement of others.
Reliability is one of three aspects of something's effectiveness; the other two are Maintainability and Availability.
An interesting paper, http://www.barringer1.com/pdf/ARMandC.pdf, discusses this in more detail, but generally:
Reliability is based on the probability that a system will break: the more likely it is to break, the less reliable it is. In systems other than software it is often measured as Mean Time Between Failures (MTBF), a common metric for things like a hard disk (10,000 hrs MTBF). In software, I guess you could measure it as the mean time between critical system failures, between application crashes, between unrecoverable errors, or between errors of any kind that impede or adversely affect normal system productivity.
Maintainability is a measure of how long/how expensive (how many man-hours and/or other resources) it takes to fix it when it does break. In software, you could add to this concept how long/how expensive it is to enhance or extend the software (if that is an ongoing requirement)
Availability is a combination of the first two. It tells a planner: if I had 100 of these things running for ten years, after accounting for the failures and how long each failed unit was unavailable while it was being fixed or repaired, how many of the 100, on average, would be up and running at any one time? 20%, or 98%?
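As a rough worked example of how the three relate - the standard steady-state relationship is availability = MTBF / (MTBF + MTTR), and the numbers below are made up for illustration:
```python
# availability.py - sketch of how MTBF and mean time to repair (MTTR) combine into
# steady-state availability. The figures are made up for illustration.
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A unit that fails every 1000 hours on average and takes 20 hours to repair:
print(f"{availability(1000, 20):.1%}")   # ~98.0%
# Out of 100 such units, roughly 98 would be up and running at any one time.
```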
Well, the keyword 'reliable' can lead to different answers... When thinking of reliability, I think of two aspects:
always giving the right answer (or the best answer)
always giving the same answer
Either way, I think it boils down to some repeatable tests. If the application in question is not built with a strong suite of unit and acceptance tests, you can still come up with a set of manual or automated tests to perform repeatedly.
The fact that the tests always return the same results will show that aspect #2 is taken care of. For aspect #1 it really is up to the test writers: come up with good tests that would expose bugs or imperfections.
I can't be more specific without knowing what the application is about, sorry. For instance, a messaging system would be reliable if messages were always delivered, never lost, never contain errors, etc etc... a calculator's definition of reliability would be much different.
My advice is to follow SRE methodology around SLI, SLO and SLA, best summarized in free ebooks:
Site Reliability Engineering which provides principal introduction
The Site Reliability Workbook which comes with concrete examples
Looking at reliability more from a tooling perspective, you need:
monitoring infrastructure (I recommend Prometheus)
alerting (I recommend Prometheus AlertManager, OpsGenie or PagerDuty)
SLO computation tooling, for instance slo-exporter
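As a minimal sketch of the SLI/SLO/error-budget arithmetic those books describe - the SLO target and request counts are made up for illustration:
```python
# error_budget.py - sketch of the basic SLI / SLO / error-budget arithmetic.
# The SLO target and request counts below are made up for illustration.
slo_target = 0.99            # SLO: 99% of requests served under 1 second
total_requests = 1_250_000   # requests observed over the measurement window
fast_requests = 1_243_000    # requests that met the latency threshold (the SLI numerator)

sli = fast_requests / total_requests
error_budget = 1.0 - slo_target                          # fraction of requests allowed to miss
budget_spent = (total_requests - fast_requests) / total_requests

print(f"SLI: {sli:.4%}")
print(f"Error budget remaining: {(error_budget - budget_spent) / error_budget:.1%}")
```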
You will have to go into the process by understanding and fully accepting that you will be making a compromise, which could have negative effects if reliability is a key criterion and you don't have (or are unwilling to commit) the resources to appropriately evaluate based on that.
Having said that - determine what the key requirements are that make software reliability critical, then devise tests to evaluate based on those requirements.
Robustness and reliability cross in their relationship to each other, but are not necessarily the same.
If you have a data server that cannot handle more than 10 connections and you expect 100000 connections - it is not robust. It will be unreliable if it dies at > 10 connections. If that same server can handle the number of required connections but intermittently dies, you could say that it is still not robust and not reliable.
My suggestion is that you consult with an experienced QA person who is knowledgeable in the field for the study you will conduct. That person will be able to help you devise tests for key areas -hopefully within your resource constraints. I'd recommend a neutral 3rd party (rather than the software writer or vendor) to help you decide on the key features you'll need to test to make your determination.
If you can't test it, you'll have to rely on the reputation of the developer(s) along with how well they followed the same practices on this application as their other tested apps. Example: Microsoft does not do a very good job with the version 1 of their applications, but 3 & 4 are usually pretty good (Windows ME was version 0.0001).
Depending on the type of service you are evaluating, you might get reliability metrics or SLIs (service level indicators) - metrics capturing how well the service/product is doing. For example: process 99% of requests in under 1 second.
Based on the SLIs you might set up service level agreements (SLAs): a contract between you and the software provider on what SLOs (service level objectives) you would like, with the consequences if they do not deliver on them.
