Cleaning up MongoDB after integration tests in Node.js

I have an API written in Node with a MongoDB back end.
I'm using supertest to automate testing of the API. Of course this results in a lot of changes to the database, and I'd like to get some input on options for managing this. The goal is for each test to have no permanent impact on the database: the database should look exactly the same after a test finishes as it did before the test ran.
In my case, I don't want the database to be dropped or fully emptied out between tests. I need some real data maintained in the database at all times; I just want the changes made by the tests themselves to be reverted.
With a relational database, I would put a transaction around each unit test and roll it back after the test was done (pass or fail). As far as I know, this is not an option with Mongo.
Some options I have considered:
Fake databases
I've heard of in-memory databases like fongo (which is a Java thing) and tingodb. I haven't used these, but the issue with this type of solution is always that it requires good parity with the actual product to remain a viable option. As soon as I use a Mongo feature that the fake doesn't support, I'll have a problem unit testing.
Manual cleanup
There is always the option of just having a routine that finds all the data added by the test (marked in some way) and removes it. You'd have to be careful about updates and deletes here. Also there is likely a lot of upkeep making sure the cleanup routine accurately cleans things up.
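For illustration, here is a rough sketch of what such a routine could look like with the official Node driver, assuming documents created by tests are tagged with a hypothetical `_testRunId` field; the collection names are placeholders.

```js
// Sketch of a manual cleanup helper. Assumes test-created documents carry a
// hypothetical `_testRunId` field; collection names are examples only.
const { MongoClient } = require('mongodb');

async function cleanupTestData(uri, dbName, testRunId) {
  const client = await MongoClient.connect(uri);
  try {
    const db = client.db(dbName);
    // Remove anything a test inserted and tagged with this run's id.
    for (const name of ['users', 'orders']) {
      await db.collection(name).deleteMany({ _testRunId: testRunId });
    }
    // Note: this does not revert updates or deletes of pre-existing documents;
    // those would have to be restored from a saved snapshot separately.
  } finally {
    await client.close();
  }
}
```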
Database copying
If it were fast enough, maybe having a baseline test database and making a copy of it before each test could work. It'd have to be pretty fast though.
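As a rough sketch of that idea with the official Node driver (database names are placeholders; for a large baseline, mongodump/mongorestore would likely be faster, and note that indexes are not copied by this naive approach):

```js
// Sketch: rebuild a working copy of a baseline database before a test.
// Copies documents only; indexes would need to be recreated separately.
const { MongoClient } = require('mongodb');

async function cloneDatabase(client, fromName, toName) {
  const source = client.db(fromName);
  const target = client.db(toName);
  await target.dropDatabase(); // start the working copy from scratch
  const collections = await source.listCollections().toArray();
  for (const { name } of collections) {
    const docs = await source.collection(name).find().toArray();
    if (docs.length > 0) {
      await target.collection(name).insertMany(docs);
    }
  }
}

// Usage sketch:
// const client = await MongoClient.connect('mongodb://localhost:27017');
// await cloneDatabase(client, 'baseline', 'myapp_test');
```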
So how do people generally handle this?

I think this is a brand new way of testing without transactions.
IMHO, using Mongo >= 3.2, we can set up the inMemory storage engine, which is perfect for this kind of scenario (a sketch of the per-test steps is shown after the list):
1. Start mongod with the inMemory storage engine
2. Restore the baseline database
3. Create a working copy of it for the test
4. Run the test against the working copy
5. Drop the working copy
6. If there are more tests, go to step 3
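A minimal sketch of steps 3-5 as mocha hooks, assuming mongod is already running with the inMemory storage engine and a database named "baseline" has been restored into it. The connection string, database names, and the `cloneDatabase` helper (for example, the one sketched under "Database copying" above) are assumptions for illustration.

```js
// Minimal sketch of steps 3-5 as mocha hooks. Assumes an in-memory mongod is
// running and a "baseline" database has been restored. Names are illustrative.
const { MongoClient } = require('mongodb');
const { cloneDatabase } = require('./clone-database'); // hypothetical helper module

let client;

before(async () => {
  client = await MongoClient.connect('mongodb://localhost:27017');
});

beforeEach(async () => {
  // Step 3: create a working copy of the baseline for this test.
  await cloneDatabase(client, 'baseline', 'work');
});

afterEach(async () => {
  // Step 5: drop the working copy so the next test starts from the baseline.
  await client.db('work').dropDatabase();
});

after(async () => {
  await client.close();
});
```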

Related

Testing programs written with libpq / libpqxx

I'm working on a program written with libpqxx. It has extensive unit tests except for the libpqxx queries, which is becoming problematic as more and more logic gets pushed to SQL.
The only way I've found to add tests that cover this part is to run them against a test database whose data is constantly set up for tests and then removed afterwards. The downside of that is that it is reasonably heavy, requiring a container or VM with a full PostgreSQL instance and harnesses to bring up and tear down tests.
This seems like it must be a solved problem, but I've not found anything I can just copy and use. It also means our developers have to wait longer for test results, since the tests are heavier, though perhaps there's no way around that.
Is there a standard solution to this problem? My friends who write web frameworks test their database code so easily that I'm hesitant to believe the problem is really roll-your-own here.

Populate TingoDB with data for acceptance test

I have a NodeJS app that uses MongoDB as its database. I'm using the native Mongo driver (not Mongoose).
The application allows users to work on projects and share them, and the logic that decides which projects a user is allowed to see is built as a Mongo criteria selector.
In order to test that, I've found TingoDB, which looks like a great candidate for mocking MongoDB so that I can run the real model code and check that it works.
My question is: what is the best way to load the initial data? Keep it in a separate file? Keep it as another model?
Thank you,
Ido.
TingoDB actually stores its data in flat files, so if you want, you could just keep a copy of the database in a directory and load that.
However, if you're just testing with a small amount of data, you'd probably be better off keeping the test data in your testing scripts and inserting it through your application as part of the test. That way, you can easily compare the data in the application to the data you loaded in your assertions.
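For example, a minimal sketch of keeping the fixture data in the test itself, assuming TingoDB's API mirrors the classic callback-style MongoDB driver (which it aims to do); the data directory, collection name, and selector are placeholders:

```js
// Minimal fixture-loading sketch with TingoDB. The './test-data' directory
// must already exist; names and the selector are illustrative only.
const Engine = require('tingodb')();
const assert = require('assert');

const db = new Engine.Db('./test-data', {});
const projects = db.collection('projects');

const fixtures = [
  { name: 'demo project', ownerId: 1, sharedWith: [2] },
  { name: 'private project', ownerId: 2, sharedWith: [] },
];

projects.insert(fixtures, { w: 1 }, (err) => {
  assert.ifError(err);
  // Run the real criteria selector against the fixtures and compare the
  // result to the documents defined above.
  projects.find({ $or: [{ ownerId: 2 }, { sharedWith: 2 }] }).toArray((err2, docs) => {
    assert.ifError(err2);
    assert.strictEqual(docs.length, 2);
  });
});
```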
Finally, if you're running MongoDB in production, then you should probably use MongoDB in your tests. While the two have nearly identical APIs, they have very different performance characteristics, which is something you should be keeping track of in testing. Unless there's a real need to use TingoDB during testing, I'd try to keep the test environment as similar to production as possible.

Unit testing queries with MongoDB

I'm currently building a REST API and I'm struggling to find the best way to unit test each route.
A route handler performs various things, and one of them is executing a query against MongoDB. I can unit test the route handler by using stubs, but if I'm testing the query I cannot stub the query itself; I need an in-memory MongoDB that I can reset and insert new data into for each test.
How do you test queries? I'm thinking that the only real way to ensure that the query does what I need is to use a real MongoDB database installed on the testing machine (typically the same machine used for development).
Yes, just as for relational databases, you need to have a real database. If Mongo offers an in-memory, auto-created version then it's easy; if not, then each developer has to have Mongo running before they run integration tests. For CI you can have one single dedicated Mongo, but then you have to prevent concurrent access (schema creation, multiple transactions, etc.). You should also implement automatic creation of the schema if needed and empty the database before each test. In a relational DB a rollback is usually enough; when it's not enough, truncating all tables helps, although we had to implement it manually as we couldn't find any existing tools.
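For example, a rough sketch of the "empty the database before each test" part, using the official Node driver and mocha hooks; the connection string and database name are assumptions for illustration:

```js
// Sketch: clear every collection in a local test database before each test,
// then insert whatever fixture data the individual test needs.
const { MongoClient } = require('mongodb');

let client;
let db;

before(async () => {
  client = await MongoClient.connect('mongodb://localhost:27017');
  db = client.db('myapp_test');
});

beforeEach(async () => {
  // Equivalent of "truncating all tables" in a relational database.
  const collections = await db.listCollections().toArray();
  for (const { name } of collections) {
    await db.collection(name).deleteMany({});
  }
});

after(async () => {
  await client.close();
});
```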

DB migration: Single creation script vs change sets

I am creating a DB schema per customer, so whenever a new customer registers I need to quickly create their schema at runtime.
Option 1
At runtime, use Liquibase (or equivalent) to run all the changesets to generate the latest schema.
Cons:
This is slow; there can be multiple historical changesets which are no longer relevant (create a table and a year later drop it).
Liquibase is used here at runtime and not just at "migration time". Not sure if this is a good idea.
Standardizing on Liquibase as a means to create the schema will force all developers to use it during development. We try to avoid loading more tools onto the developers.
Option 2
After each build we generate a temporary DB using Liquibase changesets. Then from the DB we create a clean schema creation script based on the current snapshot. Then when a new customer comes we just run the clean script, not the full change set history.
Cons:
The next time I run Liquibase it will try to run from changeset 1. A workaround might be to have the generated script also create the changeset table and insert the latest changeset into it.
New schemas are created using one script, while old schemas go through the changeset process. In theory this might produce a different schema. However, the single script went through the changeset process as well, so I can't think of an exact case that would cause an error; this is a theoretical problem for now.
What do you think?
I would suggest option #1 for consistency.
Database updates can be complex, and the less chance for variation the better. That means you should have your developers create the Liquibase changeSets initially to update their databases as they are implementing new features, so they know the changeSets run as they expect and that those same steps will be run in QA and all the way through production. It is an extra tool they need to deal with, but it should be easy to integrate into their standard workflow in a way that is easy for them to use.
Similarly, I usually recommend leaving non-relevant historical changeSets in your changeLog because if you remove them you are deviating from your known-good update path. Databases are fast with most operations, especially on a system with little to no data. If you have specific changeSets that are no longer needed and are excessively expensive for one reason or another you can remove them on a case by case basis, but I would suggest doing that very rarely.
You are right that creating a database snapshot from a Liquibase script should be identical to running the changeLog, as long as you include the databasechangelog table in the snapshot. However, sticking with an actual Liquibase update all the way through to production will allow you to use features such as contexts, preconditions, and changelog parameters that may be helpful in your case as well.
There are two approaches for database deployment:
Build once deploy many – this approach uses the same principle as native code: compile once and copy the binaries across the environments. From a database point of view, this approach means that the deploy script is generated once and then executed across environments.
Build & Deploy on demand – this approach generates the delta script when needed in order to handle any out of process changes.
If you use the Build & Deploy on demand approach you can generate the delta script for the entire schema or work-item / changeset.

Azure Web Site Migrations & Concurrency

I have two Azure Websites set up - one that serves the client application with no database, another with a database and WebApi solution that the client gets data from.
I'm about to add a new table to the database and populate it with data using a temporary Seed method that I only plan on running once. I'm not sure what the best way to go about it is though.
Right now I have the database initializer set to MigrateDatabaseToLatestVersion and I've tested this update locally several times. Everything seems good to go but the update / seed method takes about 6 minutes to run. I have some questions about concurrency while migrating:
What happens when someone performs CRUD operations against the database while business logic and tables are being updated in this 6-minute window? I mean - the time between when I hit "publish" from VS, and when the new bits are actually deployed. What if the seed method modifies every entry in another table, and a user adds some data mid-seed that doesn't get hit by this critical update? Should I lock the site while doing it just in case (far from ideal...)?
Any general guidance on this process would be fantastic.
Operations like creating a new table or adding new columns should have only minimal impact on the performance and be transparent, especially if the application applies the recommended pattern of dealing with transient faults (for instance by leveraging the Enterprise Library).
Mass updates or reindexing could cause contention and affect the application's performance or even cause errors. Depending on the case, transient fault handling could work around that as well.
Concurrent modifications to data that is being upgraded could cause problems that would be more difficult to deal with. These are some possible approaches:
Maintenance window
The simplest and safest approach is to take the application offline, back up the database, upgrade the database, update the application, test, and bring the application back online.
Read-only mode
This approach avoids making the application completely unavailable, by keeping it online but disabling any feature that changes the database. The users can still query and view data while the application is updated.
Staged upgrade
This approach is based on carefully planned sequences of changes to the database structure and data and to the application code so that at any given stage the application version that is online is compatible with the current database structure.
For example, let's suppose we need to introduce a "date of last purchase" field to a customer record. This sequence could be used:
Add the new field to the customer record in the database (without updating the application). Set the new field's default value to NULL.
Update the application so that for each new sale, the date of last purchase field is updated. For old sales the field is left unchanged, and the application at this point does not query or show the new field.
Execute a batch job on the database to update this field for all customers where it is still NULL. A delay could be introduced between updates so that the system is not overloaded.
Update the application to start querying and showing the new information.
There are several variations of this approach, such as the concept of "expansion scripts" and "contraction scripts" described in Zero-Downtime Database Deployment. This could be used along with feature toggles to change the application's behavior dynamically as the upgrade stages are executed.
New columns could be added to records to indicate that they have been converted. The application logic could be adapted to deal with records in the old version and in the new version concurrently.
Entity Framework may impose some additional limitations on your options, because it generates the SQL statements on behalf of the application, so you would have to take that into consideration when planning the stages.
Staging environment
Changing the production database structure and executing mass data changes is risky business, especially when it must be done in a specific sequence while data is being entered and changed by users. Your options to revert mistakes can be severely limited.
It would be necessary to do extensive testing and simulation in a separate staging environment before executing the upgrade procedures on the production environment.
I agree with the maintenance window idea from Fernando. But here is the approach I would take given your question.
Make sure your database is backed up before doing anything (I am assuming it's SQL Azure)
Put up a maintenance page on the Client Application
Run the migration via Visual Studio to your database (I am assuming you are doing this through the console) or a unit test
Publish the website/web api websites
Verify your changes.
The main thing about working with the seed method via Entity Framework is that it's easy to get it wrong, and without a proper backup while running against prod you could get yourself in trouble real fast. I would probably run it through your test database/environment first (if you have one) to verify that what you want is happening.
