Screwturn Wiki Search Latency - search

Does anyone know of ways to optimize the screwturn search functionality?
We've been using it for internal documentation, and I'm the tech expert on it, but I have not had the opportunity to analyze it much. After inputting a lot of information into it, we've noticed that text searches take a noticeable delay; on the order of up to 10 seconds in some cases. I'm pasting a screengrab of the search index status here. We have a good 30 different namespaces, which I suspect is more than we really need, but the decision was made to use them for organizational purposes, and I couldn't think of a reason why not. Is it possible the high number of namespaces impacts the search time?
When doing tests on the search, the only resource spike I could find was a big CPU usage spike on the webserver.

If you profile the SQL for a single search from your original question, you will likely notice that the web app is very chatty with the database. Having a larger number of namespaces to search will have an impact on search performance due to the way the search is performed (by namespace, from what I have seen, seems very inefficient). After reviewing the code a bit, I can see where you might see a spike on the web server. The best bet would be to refactor the search function to work better with a larger # of namespaces.

Related

Maximum/Optimal number of Cucumber test scenarios in one file

I have a Cucumber feature file with over 66 scenarios! The title of the feature file does represent what the scenarios are all about.
But 66 (200 steps) feels like quite a large number. Does this suggest that my feature title is too broad?
What is the maximum number of scenarios one should have in a single feature file (from a best practice point of view)?
Thanks in advance :)
Although I don't know your system and feature file, I can surely say that there is a misunderstanding of scenarios and their purpose.
The purpose of scenarios is to bring a clarification for the feature by examples. Usually, people tend to write scenarios to cover all use cases. If you do scenarios that way, the feature loses the ability to be human-readable.
Keep in mind that acceptance tests are expensive to write and expensive to change. Write the minimum scenarios. If there is a scenario that doesn't bring any additional value for the understanding of the feature, then that scenario shouldn't be there. Move all use cases into a lower level of testing - unit tests.
In most cases, the feature has the number of scenarios in units, or tens if it's a complex feature.
Edit: If the number of scenarios would go close to 10, I would rather split the feature file into more files describing deeper part of the feature.
Yes, 200 is an unusually large number of scenarios for a single file. It is likely to be hard to find a particular scenario in the file or to keep it organized. (Multiple smaller files are easier to organize; a directory of files is easier for people to understand and maintain than a long file with comments or worse yet some uncommented ordering scheme.) It will also take a long time to run the file, which will make development difficult.
More importantly, 200 scenarios for a single feature might mean that the feature is extremely complex or that it is very broad. In either case it can probably be broken up into multiple smaller feature files. It also might mean that there are too many scenarios. There might be a scenario for every value of some variable (it might be sufficient to write a single scenario and not worry about different values) or a scenario for every detail of every feature (it might be better to write unit tests, which are smaller and more focused and faster, for details).
But, as with any software metric about the size of a piece of code, there might be a typical size, but every problem is different. Your feature might really be that complex. We can't say without understanding the domain and seeing the feature file.

Node server: code weight and server performance

I would like to know how important could be the impact between using a 15.5k library just for doing very simple validations, and, using my own 1k super-simple validation class, in the time when I'll have more than 10k users on my system (Node + Mongo running on a super pentium 8 core 32gb ram).
Is it worst to care about this 14.5k of code?
I cant find any clue in my so bleak but always wondering mind.
I'll apretiate very much your opinion.
A nice thing about server development is that you usually have significant RAM available and the code is generally loaded just once at the startup of the server so load time is not part of the user experience.
Because of these, you'd be hard pressed to even measure a meaningful impact between a 1k library and a 15k library. You might care about 15k of memory usage per active user, but you will not care about an extra 15k of code loaded into memory one time and it will not affect your server performance in any way.
So, I'd say you should not worry about skrimping on code size (within reason). Instead, pick the tool that best solves your problem, makes your development quickest and the most reliable. And, where possible, use what has been built and tested before rather than build your own from scratch. That will leave you more development time to spend on the things that really matter to making your website better, different or great. Or, it will get you to market faster.
For reference, 15k is 0.000045% of your total computer's RAM.
I agree with #jfriend00. There's almost no impact on memory/performance for the code sizes you describe. You can always benchmark different modules according to your usage profile and choose by yourself. However, I think you should ask yourself some other (similar) questions -
Why the package I use is so 'big'? maybe there's a much 'smaller' one
that does the same job with the same performance. When I say big or small here I mean in terms of functionality. Most of the times you'd want to go with minimum functionality, even if its size might seem big. If you use a validation module that also validates emails, but you don't need it - doesn't mean that you shouldn't use it, just know the tradeoffs - it might get updated more frequently because bugs in the email validation that might cause other bugs in integer validations that you use, you have more code to read if you want your production code to feel more safe (explained bellow).
Does the package function as I expect? (read the tests)
Is the package I use "secured"/"OK for production"? Read the code of the packages you use, make sure there isn't something fishy going on - usually node packages are not that big because most are minimal (I never used it, but I know https://requiresafe.com/ exists for these types of questions - you might want to check it out). Note that if they are larger in size that might mean you would have to read more code.
Ask these questions (and others of you feel you should) recursively on the package' dependencies.

What are the disadvantage of having a lot of dynamic URLs in Nodejs-ExpressJs server

Currently I am working with ExpressJs and NodeJs. My question is, If I have a lot of dynamically registered URLs in server (using app.get("/xyz", page.xyz)), what are the issues associated with it? Will it affect performance or memory usages of server?
Regards,
Harikrishnan
Disadvantages:
you're probably repeating a lot of code
manageability of your methods
improper design patterns
larger code base affects performance
easier to performance measure
more manageable in a team setting
easier/more manageable to add features
Advantages:
explicit
potentially less functions utilized (i.e. per call)
more small data passed on the wire
easier to debug
easier to read by you
There shouldn't be an outright need for many endpoints, however, that's kind of your decision. It boils down to whether or not the app works, meeting deadlines, and of course you can performance test to see whether or not areas in your app could improve with design patterns or different data structures. This is a tough question to answer, however, I'd suggest you look at the process as a learning opportunity with the expectation to improve on your following iteration or next app. Good luck!

Finding Vulnerabilities in Software

I'm insterested to know the techniques that where used to discover vulnerabilities. I know the theory about buffer overflows, format string exploits, ecc, I also wrote some of them. But I still don't realize how to find a vulnerability in an efficient way.
I don't looking for a magic wand, I'm only looking for the most common techniques about it, I think that looking the whole source is an epic work for some project admitting that you have access to the source. Trying to fuzz on the input manually isn't so comfortable too. So I'm wondering about some tool that helps.
E.g.
I'm not realizing how the dev team can find vulnerabilities to jailbreak iPhones so fast.
They don't have source code, they can't execute programs and since there is a small number of default
programs, I don't expect a large numbers of security holes. So how to find this kind of vulnerability
so quickly?
Thank you in advance.
On the lower layers, manually examining memory can be very revealing. You can certainly view memory with a tool like Visual Studio, and I would imagine that someone has even written a tool to crudely reconstruct an application based on the instructions it executes and the data structures it places into memory.
On the web, I have found many sequence-related exploits by simply reversing the order in which an operation occurs (for example, an online transaction). Because the server is stateful but the client is stateless, you can rapidly exploit a poorly-designed process by emulating a different sequence.
As to the speed of discovery: I think quantity often trumps brilliance...put a piece of software, even a good one, in the hands of a million bored/curious/motivated people, and vulnerabilities are bound to be discovered. There is a tremendous rush to get products out the door.
There is no efficient way to do this, as firms spend a good deal of money to produce and maintain secure software. Ideally, their work in securing software does not start with a looking for vulnerabilities in the finished product; so many vulns have already been eradicated when the software is out.
Back to your question: it will depend on what you have (working binaries, complete/partial source code, etc). On the other hand, it is not finding ANY vulnerability but those that count (e.g., those that the client of the audit, or the software owner). Right?
This will help you understand the inputs and functions you need to worry about. Once you localized these, you may already have a feeling of the software's quality: if it isn't very good, then probably fuzzing will find you some bugs. Else, you need to start understanding these functions and how the input is used within the code to understand whether the code can be subverted in any way.
Some experience will help you weight how much effort to put at each task and when to push further. For example, if you see some bad practices being used, then delve deeper. If you see crypto being implemented from scratch, delve deeper. Etc
Aside from buffer overflow and format string exploits, you may want to read a bit on code injection. (a lot of what you'll come across will be web/DB related, but dig deeper) AFAIK this was a huge force in jailbreaking the iThingies. Saurik's mobile substrate allow(s) (-ed?) you to load 3rd party .dylibs, and call any code contained in those.

How to keep track of performance testing

I'm currently doing performance and load testing of a complex many-tier system investigating the effect of different changes, but I'm having problems keeping track of everything:
There are many copies of different assemblies
Orignally released assemblies
Officially released hotfixes
Assemblies that I've built containing further additional fixes
Assemblies that I've build containing additional diagnostic logging or tracing
There are many database patches, some of the above assemblies depend on certain database patches being applied
Many different logging levels exist, in different tiers (Application logging, Application performance statistics, SQL server profiling)
There are many different scenarios, sometimes it is useful to test only 1 scenario, other times I need to test combinations of different scenarios.
Load may be split across multiple machines or only a single machine
The data present in the database can change, for example some tests might be done with generated data, and then later with data taken from a live system.
There is a massive amount of potential performance data to be collected after each test, for example:
Many different types of application specific logging
SQL Profiler traces
Event logs
DMVs
Perfmon counters
The database(s) are several Gb in size so where I would have used backups to revert to a previous state I tend to apply changes to whatever database is present after the last test, causing me to quickly loose track of things.
I collect as much information as I can about each test I do (the scenario tested, which patches are applied what data is in the database), but I still find myself having to repeat tests because of inconsistent results. For example I just did a test which I believed to be an exact duplicate of a test I ran a few months ago, however with updated data in the database. I know for a fact that the new data should cause a performance degregation, however the results show the opposite!
At the same time I find myself sepdning disproportionate amounts of time recording these all these details.
One thing I considered was using scripting to automate the collection of performance data etc..., but I wasnt sure this was such a good idea - not only is it time spent developing scripts instead of testing, but bugs in my scripts could cause me to loose track of things even quicker.
I'm after some advice / hints on how better to manage the test environment, in particular how to strike a balance between collecting everything and actually getting some testing done at the risk of missing something important?
Scripting the collection of the test parameters + environment is a very good idea to check out. If you're testing across several days, and the scripting takes a day, it's time well spent. If after a day you see it won't finish soon, reevaluate and possibly stop pursuing this direction.
But you owe it to yourself to try it.
I would tend to agree with #orip, scripting at least part of your workload is likely to save you time. You might consider taking a moment to ask what tasks are the most time consuming in terms of your labor and how amenable are they to automation? Scripts are especially good at collecting and summarizing data - much better then people, typically. If the performance data requires a lot of interpretation on your part, you may have problems.
An advantage to scripting some of these tasks is that you can then check them in along side the source / patches / branches and you may find you benefit from organizational structure of your systems complexity rather than struggling to chase it as you do now.
If you can get away with testing only against a few set configurations that will keep the admin simple. It may also make it easier to put one on each of several virtual machines which can be quickly redeployed to give clean baselines.
If you genuinely need the complexity you describe I'd recommend building a simple database to allow you to query the multivariate results you have. Having a column for each of the important factors will a allow you to query in for questions like "what testing config had the lowest variance in latency?" and "which test database allowed the raising of most bugs?". I use sqlite3 (probably through the Python wrapper or the Firefox plug-in) for this kind of lightweight collection, because it keeps maintenance overhead relatively low and allows you to avoid perturbing the system under test too far, even if you need to run on the same box.
Scripting the tests will make them quicker to execute and permit results to be gathered in an already-ordered way, but it sounds like your system may be too complex to make this easy to do.

Resources