I have an app written in Node.js which uses RethinkDB. At startup time, the app does a bunch of database setup, including creating the necessary tables if they do not exist. The code (simplified) looks something like:
// create any required tables that don't exist yet
r.tableList().run(conn).then(existingTables =>
  Promise.all(requiredTables
    .filter(t => existingTables.indexOf(t) === -1)
    .map(name => r.tableCreate(name).run(conn))));
This works fine. The problem is that the app runs inside a Docker container, and I need to be able to scale out using, for example, docker-compose scale app=3. When the deploy job runs this, three new containers are created immediately, each of which creates the set of tables, resulting in database issues that I need to resolve manually. I think I understand why this happens, but I can't see how to solve it. I had thought of trying to write it all in a single query, but the real use case is quite a bit more complex (it also creates indexes, runs migrations, and populates sample data) and I don't think there is any way I can do the lot in a single query.
RethinkDB currently doesn't guarantee that administrative actions are atomic. The best thing to do would probably be to separate out the administrative actions (creating databases, tables, and indexes) and run those in a separate setup step that runs in only one container.
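For example, the setup could live in its own small script that is run once per environment (e.g. node setup.js) before the app containers are scaled out. A minimal sketch, assuming the official rethinkdb JavaScript driver and hypothetical table names and connection settings:
// setup.js - run once per environment, before scaling the app containers
const r = require('rethinkdb');

const requiredTables = ['users', 'sessions']; // hypothetical table names

async function setup() {
  const conn = await r.connect({ host: process.env.RETHINKDB_HOST || 'localhost', port: 28015 });
  const existingTables = await r.tableList().run(conn);
  for (const name of requiredTables.filter(t => existingTables.indexOf(t) === -1)) {
    await r.tableCreate(name).run(conn);
  }
  // index creation, migrations and sample data would go here as well
  await conn.close();
}

setup().catch(err => { console.error(err); process.exit(1); });
The app containers then only assume that the schema already exists, so docker-compose scale app=3 no longer races on table creation.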
I'm building my CI tests and would like to get a fresh database on every start. How can I tell ArangoDB to reset/clear/clean the database and initialize, say, a "test" db? I start it with:
arangodb --starter.local --starter.port=8529 start
There are two ways I usually do something like that:
Run ArangoDB in a Docker container. The official ArangoDB image is easy to use, and you can create containers that either keep the data or start empty every time. The official image is available on Docker Hub.
Create a Foxx microservice with setup and teardown scripts. These scripts run automatically when you install/upgrade/replace the service. The setup script could create the necessary collections; the teardown script could remove them again. You can learn more about these life-cycle scripts in the ArangoDB Foxx documentation.
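As a rough illustration of the second option, the setup script is a plain JavaScript file referenced from the service's manifest.json; a minimal sketch (the collection name is a placeholder):
// scripts/setup.js - runs automatically on install/upgrade/replace
'use strict';
const db = require('@arangodb').db;

const collectionName = 'test_data'; // hypothetical collection name

// create the collection only if it does not exist yet
if (!db._collection(collectionName)) {
  db._createDocumentCollection(collectionName);
}
The teardown script would mirror this, calling db._drop(collectionName) if the collection exists, so every install/replace of the service starts from a clean state.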
I have a Node.js app running on Google App Engine.
I want to run sequelize migrations.
Is it possible to run a command from within the Instance of my node.js app?
Essentially something like Heroku's run command, which runs a one-off process inside a Heroku dyno.
If this isn't possible what's the best practice in running migrations?
I could always just add it to the gcp-build script, but that will run on every deploy.
It's not possible to run standalone scripts/apps in GAE; see How do I run custom python script in Google App engine (in a Python context, but the general idea applies to all runtimes).
The way I ran my (datastore) migrations was to port the functionality of the migration script itself into the body of an admin-protected handler in my GAE app, which I triggered with an HTTP request to a particular URL. I reworked it a bit to split the potentially long-running migration operation into a sequence of smaller operations (using push task queues), which is much more GAE-friendly. This allowed me to live-test the migration one datastore entity set at a time and only go for multiple sets once I was completely confident in its operation. I also didn't have to worry about eventual consistency (I was using queries to determine the entities to be migrated); I just repeatedly invoked the migration until there was nothing left to do.
Once the migration was completed I removed the respective code (but kept the handler itself for future migrations). As a positive side effect I pretty much had the migration history captured in my repository's history itself.
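In a Node.js app like the one in the question, the same idea could look roughly like the sketch below: an HTTP handler that triggers the migration and is protected by something like a shared secret. This is only an illustration; runMigrations() and the header name are hypothetical placeholders for whatever actually applies the pending sequelize migrations (e.g. via Umzug):
// A hypothetical admin-only migration endpoint (Express assumed)
const express = require('express');
const app = express();

// runMigrations() is a placeholder for whatever actually applies the
// pending migrations and resolves when they are done.
const { runMigrations } = require('./migrations');

app.post('/admin/migrate', async (req, res) => {
  // simple guard: only accept requests that carry the expected secret
  if (req.get('X-Migration-Secret') !== process.env.MIGRATION_SECRET) {
    return res.status(403).send('Forbidden');
  }
  try {
    const applied = await runMigrations();
    res.json({ applied });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(process.env.PORT || 8080);
As with the handler described above, it can be invoked repeatedly (or split into smaller steps) until there is nothing left to migrate, and the route can simply be kept around for future migrations.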
Potentially of interest: Handling Schema Migrations in App Engine
I have my entire stack in a docker-compose setup. Currently, the load is such that it can all run on a single instance. I have two separate applications, and they both use Redis and Elasticsearch.
I have seen people suggest that, in cases like MySQL, proper container practice is to have two separate containers for two separate databases if you have two separate applications using them.
That seems fine for MySQL, because my understanding is that separate instances of MySQL don't really add much memory or processor overhead.
I'm wondering whether the same strategy should apply to Redis and Elasticsearch. My understanding is that both of these can come with considerable overhead, so it seems like it might be inefficient to run more than one instance of them.
It's an interesting question, but I'm not sure there is a universal answer to this. It mostly depends on your situation.
However, there are advantages and drawbacks you should be aware of if you are using a single container for multiple applications. As an example, let's say you have only two application containers, A and B, and a shared DB container, whatever the technology behind it.
Advantages
resource usage is limited. Nonetheless, as you state in your question, if the DB container overhead is not that significant, then it's not really an advantage
Drawbacks
If A and B are independent applications, then the main disadvantage of sharing a DB is that you break that independence and tightly couple your applications via the DB:
you cannot update the DB container independently. The DB version needs to be aligned for both applications: if A requires a new DB version (because it needs new features, for example), then the DB must be upgraded, potentially breaking B
the DB configuration cannot differ for A and B: if A issues more writes than reads and B reads data intensively, you probably won't find a configuration that is perfect for both usages
a crash of the DB impacts both applications: A could even take down B by crashing the DB
security concerns: even if A and B have separate databases inside the shared DB instance, A could possibly access B's database unless you set up different users/roles; it's probably easier to have one container per application and not worry about access, as long as they are on the same network (and the DB cannot be accessed from outside, of course)
you have to put the A, B and DB services inside the same docker-compose file
Conclusion
If A and B are already tightly coupled apps, then you can probably go for one DB. If you don't have many resources, you can also share the DB. But don't forget that by doing this you couple your apps, which you probably don't want. Otherwise, the cleanest solution is to go for one DB per application.
The main benefit I see from having all linked services in the docker-compose stack is that Docker will then ensure that all required services are up. However, with services like Redis and Elasticsearch it is fine to have them installed standalone, with the application just pointing to them via environment variables passed in the docker-compose file.
e.g.
myapp:
  image: myawesomerepo/myapp:latest
  depends_on:
    - someother_svc_in_same_docker_compose
  environment:
    - DB_HOST=172.17.2.73
    - REDIS_HOST=172.17.2.103
    - APP_ENV=QA
    - APM_ENABLE=false
    - APM_URL=http://172.17.2.103:8200
    - CC_HOST=cc-3102
  volumes:
    - /opt/deploy/cc/config:/server/app/config
    - /opt/deploy/cc/slogs:/server/app/logs
  command: node ./app/scheduler/app.js
If in the future you decide you want these services hosted elsewhere, for example, you just need to point the URLs in the right direction.
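On the application side, those values might be consumed along these lines (a minimal sketch; node-redis and @elastic/elasticsearch are just example client libraries, and ELASTIC_HOST is an assumed variable added alongside REDIS_HOST):
// read the hosts that docker-compose passes in as environment variables
const { createClient } = require('redis');
const { Client } = require('@elastic/elasticsearch');

const redis = createClient({ url: `redis://${process.env.REDIS_HOST}:6379` });
const elastic = new Client({ node: `http://${process.env.ELASTIC_HOST}:9200` }); // hypothetical variable

async function init() {
  await redis.connect(); // node-redis v4 connects explicitly
}

module.exports = { redis, elastic, init };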
I am currently changing our database deployment strategy to use FluentMigrator and have been reading up on how to run it. Some people have suggested that it can be run from Application_Start. I like this idea, but other people are saying no without specifying reasons, so my questions are:
Is it a bad idea to run the database migration on application start and if so why?
We are planning on moving our sites to deploy to Azure Cloud Services, and if we don't run the migration from Application_Start, how and when should we run it, considering we want to make the deployment as simple as possible?
Wherever it is run, how do we ensure it runs only once, given that we will have a website and multiple worker roles as well? (We could just ensure the migration code is only called from the website, but in the future we may increase to two or more instances; would that mean it could run more than once?)
I would appreciate any insight into how others handle database migration during deployment, particularly from the perspective of deployments to Azure Cloud Services.
EDIT:
Looking at the comment below I can see the potential problems of running during Application_Start. Perhaps the issue is that I am trying to solve the problem with the wrong tool. FluentMigrator may not be the way to go in our case, as we have a large number of stored procedures, views, etc., so as part of the migration I would have to use SQL scripts to keep them at the right version, and I don't think migrating down would be possible.
What I liked about the idea of running during Application_Start was that I could build a single deployment package for Azure, upload it to staging, have the database migration run, and that would be it, rather than running manual scripts, and then just swap into production.
Running migrations during Application_Start can be a viable approach, especially during development.
However there are some potential problems:
Application_Start will take longer and FluentMigrator will be run every time the App Pool is recycled. Depending on your IIS configuration this could be several times a day.
if you do this in production, users might be affected, i.e. trying to access a table while it is being changed will result in an error.
DBAs don't usually approve.
What happens if the migrations fail on startup? Is your site down then?
My opinion ->
For a site with a decent amount of traffic I would prefer to have a build script and more control over when I change the database schema. For a hobby (or small non-critical project) this approach would be fine.
An alternative approach that I've used in the past is to make your migrations non-breaking: that is, you write your migrations in such a way that they can be deployed before any code changes and work with the existing code. This way, code and migrations can be deployed independently 95% of the time. For example, instead of changing an existing stored procedure you create a new one, or if you want to rename a table column you add a new column instead.
The benefits of this are:
Your database changes can be applied before any code changes. You're then free to roll back any breaking code changes or breaking migrations.
Breaking migrations won't take the existing site down.
DBAs can run the migrations independently.
I have already spent a lot of time googling for a solution, but I'm stuck!
I have an MVC application and I'm trying to do "integration testing" for my views using Coypu and SpecFlow. But I don't know how I should manage the IIS server for this. Is there a way to actually run the server (at the first start of the tests) and make it use a special "test" DB (for example an in-memory RavenDB) that is emptied after each scenario (and filled during the Background)?
Is there a better or simpler way to do this?
I'm fairly new to this too, so take this answer with a pinch of salt, but as no one else has answered...
Is there a way to actually run the server (first start of tests) ...
You could use IIS Express, which can be called via the command line. You can spin up your website before any tests run (which I believe you can do with the [BeforeTestRun] attribute in SpecFlow) with a call via System.Diagnostics.Process.
The actual command line would be something like:
iisexpress.exe /path:c:\iisexpress\<your-site-published-to-filepath> /port:<anyport> /clr:v2.0
... and making the server use a special "test" DB (for example an in-memory RavenDB) emptied after each scenario (and filled during the background).
In order to use a special test DB, I guess it depends on how your data access works. If you can swap in an in-memory DB fairly easily, then I guess you could do that, although my understanding is that integration tests should be as close to the production environment as possible, so if possible use the same DBMS you're using in production.
What I'm doing is just doing a data restore to my test DB from a known backup of the prod DB, each time before the tests run. I can again call this via command-line/Process before my tests run. For my DB it's a fairly small dataset, and I can restore just the tables relevant to my tests, so this overhead isn't too prohibitive for integration tests. (It wouldn't be acceptable for unit tests however, which is where you would probably have mock repositories or in-memory data.)
Since you're already using SpecFlow take a look at SpecRun (http://www.specrun.com/).
It's a test runner which is designed for SpecFlow tests and adds all sorts of capabilities, from small conveniences like better formatting of the Test names in the Test Explorer to support for running the same SpecFlow test against multiple targets and config file transformations.
With SpecRun you define a "Profile" which will be used to run your tests, not dissimilar to the VS .runsettings file. In there you can specify:
<DeploymentTransformation>
  <Steps>
    <IISExpress webAppFolder="..\..\MyProject.Web" port="5555"/>
  </Steps>
</DeploymentTransformation>
SpecRun will then start up an IISExpress instance running that Website before running your tests. In the same place you can also set up custom Deployment Transformations (using the standard App.Config transformations) to override the connection strings in your app's Web.config so that it points to the in-memory DB.
The only problem I've had with SpecRun is that the documentation isn't great, there are lots of video demonstrations but I'd much rather have a few written tutorials. I guess that's what StackOverflow is here for.