The goal: run our database tests (which use different databases) in parallel.
We are using ReSharper's NUnit runner to integrate our unit tests into Visual Studio. ReSharper allows you to set the number of assemblies to run in parallel. The tests never fail when run serially; however, when we set the number of parallel assemblies to 3 or higher (possibly 2 as well, although none have failed at that level yet), some database tests consistently fail.
Our guess is that data providers are getting switched out from under the tests. We use Spring as an IoC container, and in our tests it also handles transaction management. Our older tests required seeded data, but our new tests expect an empty database and create any data they need. We think that because the bootstrap for each test fixture sets the connection string properties of the DB provider, a test running in parallel can swap the provider out from under another test.
Either way, it would be nice to run the database tests in parallel without losing the transaction management and test cleanup we get from Spring.
The bootstrap that runs for each test sets the connection string (and the DB provider's connection string).
Any ideas on how to get these tests (with different connection strings) to run in parallel?
I have built an application using Express, Postgres, and Sequelize on Google App Engine, and I'm having trouble running a longer migration. This migration simply dumps the data from one of my large tables into Elasticsearch.
As of right now, I have been running my migrations in the pre-start command like so:
npm i && sequelize db:migrate
but I've noticed that Google App Engine keeps running my migration over and over again due to the auto-scaling nature of the instances. Is there a better practice for running migrations? Is there a way to run this migration only once, and to avoid auto-scaling just for the pre-start command?
First, it is necessary to understand how App Engine handles the different scaling types:
Automatic scaling creates instances based on request rate, response latencies, and other application metrics. You can specify thresholds for each of these metrics, as well as a minimum number of instances to keep running at all times.
Basic scaling creates instances when your application receives requests. Each instance will be shut down when the application becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.
Manual scaling specifies the number of instances that continuously run regardless of the load level. This allows for tasks such as complex initializations and for applications that rely on the state of memory over time.
I recommend choosing manual scaling so you can set the exact number of instances you need, or, if you are going to use automatic scaling, paying close attention to the limits (maximum/minimum (idle) instances) so you can cap them explicitly. Ultimately, it is up to you to choose the configuration that best suits your requirements.
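For example, a minimal sketch of what this could look like in app.yaml (illustrative only; option names vary slightly between the standard and flexible environments, so check the docs for your runtime):

# app.yaml - illustrative sketch, adjust to your runtime/environment
runtime: nodejs

# Manual scaling: a fixed number of instances, regardless of load
manual_scaling:
  instances: 1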
That said, regardless of the scaling method you choose, it seems that your script is being re-run every time App Engine scales, or that the script itself is telling your application to repeat the process over and over. It would be useful if you shared details on how you are executing your script and what it does, to get a better perspective.
A possible workaround for this task is to port the functionality of the migration script into the body of an admin-protected handler in the GAE app, which can be triggered with an HTTP request to a particular URL.
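As a rough sketch (the route name, header check, and runMigration helper below are illustrative placeholders, not part of your codebase), such a handler could look like this:

// Sketch only: an admin-protected endpoint that runs the migration on demand.
// runMigration() is a placeholder for your existing Sequelize -> Elasticsearch logic.
const express = require('express');
const app = express();

async function runMigration() {
  // dump the rows from the large table into Elasticsearch here
}

app.post('/admin/run-migration', async (req, res) => {
  // Simple token check as an example; protect the URL with whatever auth your app already uses.
  if (req.get('X-Admin-Token') !== process.env.ADMIN_TOKEN) {
    return res.status(403).send('Forbidden');
  }
  try {
    await runMigration();
    res.send('Migration finished');
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(process.env.PORT || 8080);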
It should also be possible to split the potentially long-running migration into a sequence of smaller operations (using push task queues), which is much more GAE-friendly.
Also, I suggest you take a look at this thread.
Since I understand you want to migrate your data from PostgreSQL into an Elasticsearch index, I found a tutorial that recommends creating a CSV file from your PostgreSQL database and then converting the CSV data to JSON, because elasticdump can load that JSON file as Elasticsearch documents. These steps run on Node.js, so you can create a script on App Engine or in Cloud Functions (depending on the data size) and execute the import, for example:
# import
node_modules/elasticdump/bin/elasticdump --input=formatted.json --output=http://localhost:9200/
I'm using pytest to run a few thousand tests against an API.
The need now is to not only use multiprocessing (pytest-xdist) and multithreading (pytest-parallel) but also to run the tests on multiple machines (while still keeping the multiprocessing and multithreading capabilities).
This is the current state; the need is basically to duplicate this chart:
https://i.imgur.com/AKj2nmL.jpg
Our last resort would be to develop a test-runner service that is deployed on as many machines as needed, with an SQS queue these machines can pull work from.
Is there a better way of achieving this, with pytest alone or perhaps combined with Jenkins?
I'm currently writing a Node library to execute untrusted code within Docker containers. It basically maintains a pool of running containers and provides an interface to run code in one of them. Once the execution is complete, the corresponding container is destroyed and replaced by a new one.
The four main classes of the library are:
Sandbox. Exposes a constructor with various options including the pool size, and two public methods: executeCode(code, callback) and cleanup(callback)
Job. A class with two attributes, code and callback (to be called when the execution is complete)
PoolManager, used by the Sandbox class to manage the pool of containers. Provides the public methods initialize(size, callback) and executeJob(job, callback). It has internal methods related to the management of the containers (_startContainer, _stopContainer, _registerContainer, etc.). It uses an instance of the dockerode library, passed in the constructor, to do all the docker related stuff per se.
Container. A class with the attributes tmpDir, dockerodeInstance, and IP, and a public method executeCode(code, callback), which basically sends an HTTP POST request to ContainerIP:3000/compile along with the code to compile (a minimalist API runs inside each Docker container).
In the end, the final users of the library will only be using the Sandbox class.
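To give an idea, usage looks roughly like this (the option name and require path below are simplified/assumed, and the shape of the result depends on the in-container API):

// Illustrative usage only; poolSize and the require path are assumptions.
const Sandbox = require('./sandbox');

const sandbox = new Sandbox({ poolSize: 5 });

sandbox.executeCode('console.log("hello from the container");', (err, result) => {
  if (err) {
    console.error('execution failed:', err);
  } else {
    console.log('output:', result);
  }
  // Destroy the remaining containers once we are done.
  sandbox.cleanup((cleanupErr) => {
    if (cleanupErr) console.error('cleanup failed:', cleanupErr);
  });
});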
Now, my question is: how should I test this?
First, it seems pretty clear to me that I should begin by writing functional tests against my Sandbox class:
it should create X containers, where X is the required pool size
it should correctly execute code (including the security aspects: handling timeouts, fork bombs, etc. which are in the library's requirements)
it should correctly cleanup the resources it uses
But then I'm not sure what else it would make sense to test, how to do it, and if the architecture I'm using is suitable to be correctly tested.
Any idea or suggestion related to this is highly appreciated! :) And feel free to ask for a clarification if anything looks unclear.
Christophe
Try and separate your functional and unit testing as much as you can.
If you make a minor change to the Sandbox constructor, I think testing will become easier: Sandbox should take a PoolManager directly. Then you can mock the PoolManager and test Sandbox in isolation, which it appears is just creating Jobs, calling PoolManager for Containers, and handling cleanup. OK, now Sandbox is unit tested.
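A minimal sketch of what that could look like, assuming Sandbox takes the pool manager as its first constructor argument and that results flow back through the job's callback (Mocha-style test; these are assumptions about your implementation, not its actual API):

// Sketch only: Sandbox unit test with a stubbed PoolManager.
const assert = require('assert');
const Sandbox = require('./sandbox');

describe('Sandbox', () => {
  it('delegates execution to the pool manager', (done) => {
    const fakePool = {
      initialize: (size, cb) => cb(null),
      // This fake assumes the job's own callback carries the result back.
      executeJob: (job, cb) => {
        job.callback(null, 'fake result');
        if (cb) cb(null);
      },
    };

    const sandbox = new Sandbox(fakePool, { poolSize: 1 });
    sandbox.executeCode('1 + 1', (err, result) => {
      assert.ifError(err);
      assert.strictEqual(result, 'fake result');
      done();
    });
  });
});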
PoolManager may be harder to unit test, as the dockerode client might be hard to mock (its API is fairly big). Regardless of whether you mock it or not, you'll want to test the following (see the sketch after this list):
Growing/shrinking the pool size correctly
Testing sending more requests than available containers in the pool
How stuck containers are handled. Both starting and stopping
Handling of network failures (easier when you mock things)
Retries
Any other failure cases you can think of
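For example, a rough sketch of the first item with a hand-rolled dockerode stub. Which dockerode calls PoolManager actually makes is an assumption here; adapt the stub to whatever _startContainer really uses:

// Sketch only: verify the pool spins up the requested number of containers.
const assert = require('assert');
const PoolManager = require('./pool-manager');

it('starts as many containers as the requested pool size', (done) => {
  let created = 0;
  const fakeDocker = {
    // Only the calls this sketch assumes PoolManager makes are stubbed.
    createContainer: (opts, cb) => {
      created += 1;
      cb(null, { id: 'fake-' + created, start: (startCb) => startCb(null) });
    },
  };

  const pool = new PoolManager(fakeDocker);
  pool.initialize(3, (err) => {
    assert.ifError(err);
    assert.strictEqual(created, 3);
    done();
  });
});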
The Container can be tested by firing up the API from within the tests (in a container or locally). If it's that minimal, recreating it should be straightforward. Once you have that, it sounds like it's really just a matter of testing an HTTP client.
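Something along these lines, assuming a copy of the in-container API is listening on localhost:3000 while the test runs (the constructor shape and response format are assumptions):

// Sketch only: integration-style test for Container against a locally running API.
const assert = require('assert');
const Container = require('./container');

it('POSTs code to /compile and returns the API response', (done) => {
  const container = new Container({ IP: '127.0.0.1' });
  container.executeCode('1 + 1', (err, result) => {
    assert.ifError(err);
    assert.ok(result); // assert on whatever shape your API actually returns
    done();
  });
});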
The source code for the actual API within the container can be tested however you like with standard unit tests. Because you're dealing with untrusted code there are a lot of possibilities:
Doesn't compile
Never completes execution
Never starts
All sorts of bombs
Uses all host's disk space
Is a bot and talks over the network
The code could do basically anything. You'll have to pick the things you care about. Try and restrict everything else.
Functional tests are going to be important too; there are a lot of pieces to deal with here, and mocking Docker isn't going to be easy.
Code isolation is a difficult problem; I wish Docker had been around the last time I had to deal with it. Just remember that your customers will always do things you didn't expect! Good luck!
Let's say we have 100 functional Coded UI scripts. What kind of infrastructure is needed to run these tests in parallel? I'm afraid it would take down the Visual Studio server.
Because Coded UI interacts with a UI, only one instance can run on a machine at a time. If you set up a lab, or VM instances that each run part of your tests, your limit is the number of instances and how much load your application under test can handle.
I have a collection of integration tests running with SpringJUnit4ClassRunner. I'm trying to run these in parallel using Maven Surefire. However, I have noticed that the code blocks before entering the synchronized block in CacheAwareContextLoaderDelegate.loadContext().
Is there a way to bypass this cache? I tried doing this, but it seems there is more shared state than just the cache itself, since my application deadlocked inside Spring code. Or could the synchronization be made more fine-grained, by somehow synchronizing on the map key rather than the entire map?
My motivation for parallelising tests is twofold:
In some tests, I replace beans with mocks. Since mocks are inherently stateful, I have to build a fresh ApplicationContext for every test method using @DirtiesContext.
In other tests, I only want to deploy a subset of Jersey resources. To do this, I specify a subset of Spring configuration classes. Since Spring uses the MergedContextConfiguration as a key in the context cache, these tests will be unable to share ApplicationContexts.
It is possible that you may get a better turnaround time for your test suite if you disable parallel test execution. In the testing chapter of Spring's reference docs there is a paragraph about context caching:
Once the TestContext framework loads an ApplicationContext (or WebApplicationContext) for a test, that context will be cached and reused for all subsequent tests that declare the same unique context configuration within the same test suite.
Why is it implemented like this?
This means that the setup cost for loading an application context is incurred only once (per test suite), and subsequent test execution is much faster.
How does the cache work?
The Spring TestContext framework stores application contexts in a static cache. This means that the context is literally stored in a static variable. In other words, if tests execute in separate processes the static cache will be cleared between each test execution, and this will effectively disable the caching mechanism.
To benefit from the caching mechanism, all tests must run within the same process or test suite. This can be achieved by executing all tests as a group within an IDE. Similarly, when executing tests with a build framework such as Ant, Maven, or Gradle it is important to make sure that the build framework does not fork between tests. For example, if the forkMode for the Maven Surefire plug-in is set to always or pertest, the TestContext framework will not be able to cache application contexts between test classes and the build process will run significantly slower as a result.
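For illustration, a Surefire configuration along these lines keeps all tests in a single forked JVM so the static cache can actually be reused (a sketch, not a drop-in config):

<!-- Illustrative only: one reused fork so Spring's static context cache survives across test classes -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>1</forkCount>
    <reuseForks>true</reuseForks>
    <!-- avoid forkMode=always / pertest: every new fork starts with an empty cache -->
  </configuration>
</plugin>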
One easy thing that comes to mind is using @DirtiesContext.