How to create a fresh test database at startup

I am building my CI tests and would like to get a fresh database on every start. How can I tell ArangoDB to reset/clear/clean the database and initialize, say, a "test" db?
arangodb --starter.local --starter.port=8529 start

There are two ways I usually do something like that:
Run ArangoDB in a Docker container. The official ArangoDB image is easy to use, and you can create containers that either keep their data or start empty every time. The official image is available on Docker Hub.
Create a Foxx microservice and provide setup and teardown scripts. These lifecycle scripts run automatically: setup when you install, upgrade, or replace the service, and teardown when you uninstall or replace it. The setup script could create the necessary collections, and the teardown script could remove them. The Foxx documentation on lifecycle scripts has more detail.
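If neither option fits, a CI job can also reset the database itself before the tests run. The following is a minimal sketch, not the only way to do it: it assumes a client object exposing `listDatabases()`, `dropDatabase(name)` and `createDatabase(name)`, which is how the arangojs driver's `Database` class behaves when connected to `_system`. The database name "test" and the connection URL are placeholders.

```javascript
// Drop and recreate a database so every test run starts from an empty state.
// `sys` is assumed to expose listDatabases(), dropDatabase(name) and
// createDatabase(name), as the arangojs Database class does when it is
// connected to the _system database.
async function resetDatabase(sys, name) {
  const existing = await sys.listDatabases();
  if (existing.includes(name)) {
    await sys.dropDatabase(name); // removes the database and all its data
  }
  await sys.createDatabase(name);
}

// Hypothetical usage with arangojs (requires `npm install arangojs`;
// add credentials if authentication is enabled on your server):
//   const { Database } = require("arangojs");
//   const sys = new Database({ url: "http://localhost:8529" });
//   await resetDatabase(sys, "test");
```

Because the client is passed in, the same function can be exercised against an in-memory fake without a running server.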

Related

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running and charging me when I am not scraping data.
So, preferably a system that I can request to start scraping the site and close when it finishes.
So far I have looked into Google Cloud Functions, but they are capped at 9 minutes per function, so they won't fit my requirement, as scraping would take much more time than that. I have also looked into the AWS SDK. It allows us to create VMs at runtime and also shut them down, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible. Like I have many different scripts that scrape different websites. So, a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks

I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts, all dependencies and run them. This would be executed when new instances are launched.
If you want the system to be scalable, you can launch your instances in an Auto Scaling Group and scale it up or down as you require.
The other option is to run your scripts as Docker containers, for example using AWS Fargate.
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Google Cloud Functions.
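As a rough illustration of the UserData approach, the sketch below builds a boot script and the parameter object for an EC2 `RunInstances` call. The repository URL, entry point, AMI ID and instance type are hypothetical placeholders, and the AMI is assumed to already have git and Node.js installed. The `trap` line shuts the machine down when the script exits, and `InstanceInitiatedShutdownBehavior: "terminate"` turns that shutdown into a termination so the instance stops costing money.

```javascript
// Build a shell script that EC2 runs once on first boot.
// repoUrl and entryPoint are hypothetical placeholders for your own scraper.
function buildUserData({ repoUrl, entryPoint }) {
  return [
    "#!/bin/bash",
    "set -e",
    // Shut down whether the scraper succeeds or fails.
    "trap 'shutdown -h now' EXIT",
    `git clone ${repoUrl} /opt/scraper`,
    "cd /opt/scraper",
    "npm ci",
    `node ${entryPoint}`,
  ].join("\n");
}

// Build the parameters for an EC2 RunInstances request.
function buildRunInstancesParams({ imageId, instanceType, userData }) {
  return {
    ImageId: imageId,
    InstanceType: instanceType,
    MinCount: 1,
    MaxCount: 1,
    // The RunInstances API expects user data to be base64-encoded.
    UserData: Buffer.from(userData, "utf8").toString("base64"),
    // Terminate (rather than stop) when the script powers the machine off.
    InstanceInitiatedShutdownBehavior: "terminate",
  };
}

// Hypothetical usage with the AWS SDK for JavaScript v3:
//   const { EC2Client, RunInstancesCommand } = require("@aws-sdk/client-ec2");
//   const params = buildRunInstancesParams({ imageId, instanceType, userData });
//   await new EC2Client({}).send(new RunInstancesCommand(params));
```

Supporting several scrapers then only means passing a different `entryPoint` (or repository) per request.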

How to run a migration on Google App Engine

I have a Node.js app running on Google App Engine.
I want to run sequelize migrations.
Is it possible to run a command from within the Instance of my node.js app?
Essentially something like Heroku's run command, which runs a one-off process inside a Heroku dyno.
If this isn't possible what's the best practice in running migrations?
I could always just add it to the gcp-build but this will run on every deploy.

It's not possible to run standalone scripts or apps in GAE; see "How do I run custom python script in Google App engine" (asked in a Python context, but the general idea applies to all runtimes).
The way I ran my (datastore) migrations was to port the functionality of the migration script itself into the body of an admin-protected handler in my GAE app, which I triggered with an HTTP request for a particular URL. I reworked it a bit to split the potentially long-running migration operation into a sequence of smaller operations (using push task queues), which is much more GAE-friendly. This allowed me to live-test the migration one datastore entity set at a time and only go for multiple sets when completely confident in its operation. I also didn't have to worry about eventual consistency (I was using queries to determine the entities to be migrated): I just repeatedly invoked the migration until there was nothing left to do.
Once the migration was completed I removed the respective code (but kept the handler itself for future migrations). As a positive side effect I pretty much had the migration history captured in my repository's history itself.
Potentially of interest: Handling Schema Migrations in App Engine
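The "small steps, repeated until nothing is left" idea can be sketched roughly as follows. This is an illustration only, not the code from the answer: `store`, `findUnmigrated`, `save` and the `schemaVersion` field are hypothetical names standing in for whatever data access layer the app uses, and in a real app `runMigrationStep` would be called from the admin-protected handler or a task queue worker.

```javascript
// One small, repeatable migration step. `store` is a hypothetical data
// access object with findUnmigrated(limit) and save(entity).
async function runMigrationStep(store, batchSize) {
  // Query only entities that still need migrating, so repeated calls
  // converge even if an earlier query returned stale results.
  const pending = await store.findUnmigrated(batchSize);
  for (const entity of pending) {
    await store.save({ ...entity, schemaVersion: 2 });
  }
  return pending.length; // 0 means there is nothing left to do
}

// Keep invoking the step until a pass finds nothing to migrate.
async function migrateAll(store, batchSize) {
  let total = 0;
  for (;;) {
    const done = await runMigrationStep(store, batchSize);
    if (done === 0) return total;
    total += done;
  }
}
```

Keeping each step small makes it safe to run a single batch first, inspect the result, and only then let the loop process everything.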

How to upgrade a NodeJS Docker container?

I have a NodeJS image based on the official node Docker image running in a production environment.
How to keep the NodeJS server up-to-date?
How do I know when or how often to rebuild and redeploy the docker image? (I'd like to keep it always up to date)
How do I keep the npm packages inside of the Docker image up to date?

You can use Jenkins to schedule a job that builds the Node.js image at a desired interval.
The best way to handle package updates for Docker images is to create a separate tag for each update. Separate tags let you roll back in case of a backward-compatibility issue.
Build your application image on top of this new image, and always run the test suite if you want to achieve continuous delivery.

[UPDATE] - Based on comments from OP
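To get the newest images from Docker and deploy them through the following process, you can use the Docker Hub API (based on the Registry HTTP API) to query for the tags of an image. Then find the image variant you use (Alpine, Slim, whatever) and take its most recent tag. After this, run it through your test pipeline and register that tag as a deploy candidate.
REPO="node"
USERNAME="MyDockerHubUsername"
# Log in to get a JWT; DOCKERHUB_PASSWORD is assumed to be set in the environment, and jq must be installed
TOKEN=$(curl -s -H "Content-Type: application/json" -X POST -d "{\"username\": \"${USERNAME}\", \"password\": \"${DOCKERHUB_PASSWORD}\"}" https://hub.docker.com/v2/users/login/ | jq -r .token)
# List the tags of your repository (for the official node image, the namespace is "library" instead of your username)
TAGS=$(curl -s -H "Authorization: JWT ${TOKEN}" "https://hub.docker.com/v2/repositories/${USERNAME}/${REPO}/tags/")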
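Once the tag list has been fetched, picking the most recent tag for a given variant could look like the sketch below. It assumes the response has the shape `{ results: [{ name, last_updated }] }`; check that against the actual API output before relying on it, and note that the tag list is paginated, so a single response may not contain every tag.

```javascript
// Pick the most recently updated tag whose name ends with a given suffix.
// Assumes a response shaped like { results: [{ name, last_updated }] }.
function latestTag(response, suffix) {
  const matching = response.results.filter((t) => t.name.endsWith(suffix));
  if (matching.length === 0) return null;
  // Newest first, comparing the last_updated timestamps.
  matching.sort((a, b) => new Date(b.last_updated) - new Date(a.last_updated));
  return matching[0].name;
}
```

For example, `latestTag(JSON.parse(tagsJson), "alpine")` would return the newest tag ending in "alpine", where `tagsJson` is a hypothetical variable holding the fetched response body.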
Your question is deceptively simple. In reality, keeping a production image up to date requires a lot more than just updating the image on some interval. To achieve true CI/CD of your image, you'll need to run a series of steps each time you want to update.
A successful pipeline (Jenkins, Bamboo, CircleCI, CodePipeline, etc.) will incorporate all of these steps and will, ideally, be run on each commit:
Static Analysis
First, analyze your code using a linter (eslint) and some code coverage metric. I won't say what is considered an acceptable level of coverage, as that is largely opinion-based, but at least some amount of coverage should be expected.
Test (Unit)
Use something like Karma/Mocha/Cucumber to run unit tests on your code.
Build
Now you can build your Docker image. I prefer tools like Hashicorp's Packer for building images.
Since I assume you're running a node server (Express or something like it) from within the container, you may also want to spin up the container and run some local acceptance testing after this stage.
Register
After you've accepted local testing of the container, register the image with whichever service you use (ECR, Dockerhub, Nexus) and tag it in some meaningful way.
Deploy
Now that you have a functioning container, you'll need to deploy it to your orchestration environment. This might be Kubernetes, Docker Swarm, AWS ECS or whatever. It's important that you don't yet serve traffic to this container, however.
Test (Integration)
With the container running in a meaningful test environment (nonprod, stage, test, whatever), you can now run integration tests against it. These would check that it can connect to the data tier, or would look for a high rate of 500/400 errors.
Don't forget: security should always be a part of your testing too. This is a good place for that.
Switch
Now that you've tested in nonprod, you can either deploy to the production environment or switch routing to the standing containers which you just tested against. Here you should decide whether you'll use blue/green or A/B deployment. If blue/green, start routing all traffic to the new container. If A/B, set up a routing policy based on some ratio. Whichever you use, make sure you have an idea of what failure rate is considered acceptable. Monitor the new deployment for any failures (500 error codes or whatever you think is important) and make sure you have the ability to quickly roll back to the old containers if something goes wrong.
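The "acceptable failure rate" check can be reduced to a very small decision function. The sketch below is only an illustration: the request counters and the threshold are hypothetical inputs that would come from your load balancer or monitoring system.

```javascript
// Decide whether to roll back based on the share of 5xx responses.
// total and serverErrors are hypothetical counters for the new deployment;
// maxFailureRate is the fraction you consider acceptable (e.g. 0.01 for 1%).
function shouldRollBack({ total, serverErrors }, maxFailureRate) {
  if (total === 0) return false; // no traffic yet, nothing to judge
  return serverErrors / total > maxFailureRate;
}
```

Agreeing on `maxFailureRate` before the switch keeps the rollback decision mechanical rather than a judgment call made during an incident.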
Acceptance
After enough time has passed without defects, you can accept the new container as a stable candidate. Retag the image, or save the image tag somewhere with a note that it is "stable", and make that the new de facto image for launching.
Frequency
Now to answer "How Often". Frequency is a side effect of good iterative development. If your code changes are limited in size and scope, then you should feel very confident in launching whenever code passes tests. Thus, with strong DevOps practices, you'll be able to deploy a new image whenever code is committed to the repo. This might be once, twice or fifty times a day. The number eventually becomes arbitrary.
Keep NPM Packages Up To Date
This'll depend on what packages you're using. For public packages, you might want to constrain to a version. Then create pipelines that test certain releases of those packages in a sandbox environment before allowing them into your environment.
For private packages, make sure you have a pipeline for each of those too. The pipeline should run analysis, testing and other important tasks before registering new code with npm or your private repos (Nexus, for example).
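One way to enforce "constrain to a version" is a pipeline step that flags dependencies declared as ranges. The sketch below is a simplified check, not a full semver parser: anything that is not a plain `x.y.z` (optionally with a prerelease suffix) is reported, which includes `^` and `~` ranges, `*`, dist-tags and URLs.

```javascript
// Report dependencies whose version is not an exact "x.y.z" version.
// pkg is the parsed content of a package.json file.
function findUnpinned(pkg) {
  const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return Object.entries(deps)
    .filter(([, version]) => !exact.test(version))
    .map(([name]) => name)
    .sort();
}

// Hypothetical usage in a pipeline step:
//   const unpinned = findUnpinned(require("./package.json"));
//   if (unpinned.length > 0) process.exit(1);
```

Committing a lockfile and installing with `npm ci` gives a similar guarantee for transitive dependencies.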

Rethinkdb race conditions creating table

I have an app written in Node.js which uses RethinkDB. At startup time, the app does a bunch of database setup, including creating necessary tables if they do not exist. The code (simplified) looks something like:
r.tableList().run(conn).then(existingTables =>
  // Create every required table that does not exist yet, and wait for all of them
  Promise.all(requiredTables
    .filter(t => !existingTables.includes(t))
    .map(name => r.tableCreate(name).run(conn))));
This works fine. The problem is that the app is running inside a Docker container, and I need to be able to scale out using docker-compose scale app=3, for example. When the deploy job runs this, three new containers are immediately created, each of which creates a set of tables, resulting in database issues which I need to resolve manually. I think I can understand why this happens, but I can't see how to solve it. I had thought of trying to write it all in a single query, but the real use is quite a bit more complex (i.e. it creates indexes, runs migrations, populates sample data) and I don't think there is any way I can do the lot in a single query.

RethinkDB currently doesn't guarantee that administrative actions are atomic. The best thing to do would probably be to separate out the administrative actions (creating databases, tables, and indexes) and run those in a separate setup step that runs in only one container.
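A separate setup step along those lines might look like the following sketch. The two callbacks are injected so the function stays independent of the driver; with RethinkDB they would wrap `r.tableList().run(conn)` and `r.tableCreate(name).run(conn)` from the question. The function name and the idea of a dedicated setup script are assumptions for illustration.

```javascript
// Create any missing tables one at a time. Meant to be run once, from a
// single process, before the app containers are scaled out.
async function ensureTables(listTables, createTable, requiredTables) {
  const existing = await listTables();
  const created = [];
  for (const name of requiredTables) {
    if (!existing.includes(name)) {
      await createTable(name); // sequential: one admin operation at a time
      created.push(name);
    }
  }
  return created;
}

// Hypothetical usage with the rethinkdb driver:
//   await ensureTables(
//     () => r.tableList().run(conn),
//     (name) => r.tableCreate(name).run(conn),
//     requiredTables
//   );
```

The deploy job could then run this script once, for example with `docker-compose run --rm app node setup.js` (where `setup.js` is a hypothetical file name), and only afterwards scale the app service.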

How to create a nodejs instance to run cron jobs at set schedule?

I need to create a Node.js "server" which won't actually serve any assets or content, but will just run a scheduled job to fetch contents from one database and update another database. The schedule of the job should be configurable, and it should be possible to cancel the job at any time. Basically, what I need is to run a Node script periodically. In the past, I have created Node/Express projects, but I am having a hard time understanding how to implement such a Node instance which will run on a remote machine, and how to start or terminate it. I found an npm package called "node-schedule" which runs a job periodically, but how do I put this package on a remote machine instance and run it?
One possibility that was considered was to schedule a cron job on remote machine which will execute "node updateDB.js" on set schedule, but it is a requirement to keep everything in node package and not depend on cron.

Sounds like a job for ssh.
Personally, I wouldn't use NodeJS for this. It should be pretty trivial to do, with Node or otherwise, so I'm honestly not sure why you are stuck. I have nothing against Node, and you could certainly use it for such a thing, but I don't see why it would be necessary for this task.
EDIT: After reading your comment I'm convinced someone thinks Node is a good tool for this task. I guess I don't understand where you are stuck. What part are you stuck on?
I think you should be able to puzzle this out pretty fast. The link below should be enough to put this together. http://book.mixu.net/node/ch9.html
If you need to execute ad hoc commands on a remote server, you could use Node to call an Ansible playbook. In that case you'll need to share the public SSH key on the target instance(s) with the instance issuing the commands. There are other ways to skin this cat, but based on the information given, that's how I'd do it: Node and Ansible (requires Python) + SSH.
Oh neato, maybe if I were forced to use NodeJS I'd use this package. https://www.npmjs.com/package/ssh2-exec
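For the "configurable and cancellable job inside a plain Node process" part of the question, a minimal sketch using only built-in timers is shown below. It is not taken from the answer above, and it uses a fixed interval rather than the cron-style rules that node-schedule offers; `createJob` and its parameters are hypothetical names.

```javascript
// A cancellable periodic job built on setInterval. intervalMs and task are
// supplied by the caller, so the schedule can come from a config file or an
// environment variable.
function createJob(intervalMs, task) {
  let timer = null;
  return {
    start() {
      if (timer !== null) return; // already running
      timer = setInterval(() => {
        // Catch errors so one failed run does not crash the process.
        Promise.resolve()
          .then(task)
          .catch((err) => console.error("job failed:", err));
      }, intervalMs);
    },
    cancel() {
      if (timer === null) return;
      clearInterval(timer);
      timer = null;
    },
    isRunning() {
      return timer !== null;
    },
  };
}

// Hypothetical usage in updateDB.js:
//   const job = createJob(60 * 60 * 1000, syncDatabases); // every hour
//   job.start();
//   process.on("SIGTERM", () => job.cancel());
```

While the interval is active the Node process stays alive, so starting the script with `node updateDB.js` on the remote machine is enough to keep the job running until it is cancelled or the process is stopped.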