Heroku workers in dev

Heroku workers in dev - node.js

I'm looking into using a worker as well as a web for the first time as I have to scrape a website. I'm just wondering before I commit to this about working in a dev environment. How do jobs in a queue get handled when I'm testing my app before it's pushed to Heroku?
I will probably be using RabbitMQ if that's relevant here.

I guess it depends on what you mean by testing. You can unit test the code that does the scraping in isolation from any queue, and you can provide a mock implementation of the queue operations to handle a goodly portion of your integration tests.
I suppose you might want a real instance of the queue for certain tests, but depending on the nature of your project, you might be satisfied with the sorts of tests described in the first paragraph.
If you simply must test the queue operation and/or you want to run a complete copy of production locally then you'll have to stand up an instance of Rabbitmq. You can stand one up locally or use one of the SAAS providers.
If you have multiple developers working on the project, you might want to make it easy for them by creating something like a vagrant script that sets up a complete environment in a vm. Or better still something like docker. Doing so also gives you a lot more deployment options (making you less dependent on the heroku tooling).
Lastly, numerous CI solutions like Travis CI provide instances of popular services for running tests (including rabbit).

Related

Intermediate step(s) between manual prod and CI/CD for Node/Next on EC2

For about 18 months now I've been working in Node; and for the last 6 months I've been slowly migrating my existing WordPress websites to NextJS.
To date, I've been deploying to production manually. I log into my production server, checkout the latest release from GitHub, build, and do a pm2 restart.
Even though the above workflow seems to be the most commonly documented around the internet, it's always felt a little wrong to me.
Recently, I found myself in a situation where I needed to customise some 3rd party code. So, my main code now has a line in package.json that says
{
...
"dependencies": {
...
"react-share": "file:../react-share/react-share-4.4.1.tgz",
...
},
...
}
which implies that I'm going to checkout my custom react-share, build it somewhere on the production server, change this line to point to wherever I put it, and then rebuild.
Also, I'm using Prisma, which means that every time I deploy, before I do a build, I need to do an npx prisma generate to create the client.
This now all seems really, really wrong.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill. It's just me doing development, and my production environment is a single EC2 server sitting behind AWS CloudFront.
It seems to me that I should be doing something more/different than what I'm currently doing, in service to someday moving to a CI/CD model, if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
In particular, it feels like I shouldn't be building on the production server.
Are there any intermediary step(s) I can/should be taking for faster/less-error-prone/less-down-time deployment to a single EC2 instance for Next/Node apps, between manually deploying as I am currently, and some sort of CI/CD setup? Or are my only choices to do what I'm doing now, or go research how to do CI/CD?

You're approaching towards your initial stages of what technically is called DevOps, if not already as it appears from your context. What you're asking is a broad topic, which is an understatement, and explaining each and everything here will almost be like writing an article about it, at the very least.
However, I'll brief you overall on how to approach with this.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill.
Simplicity & complexity are relative terms. A system which is complicated for one might be simple for another. CI/CD doesn't define any laws that you need to follow in order to create a perfect deployment procedure, as everyone's deployment requirement is unique (at some point).
If I mention it in bullet points, what you need to figure out before you start with setting up CI&CD, is -
The sequence of steps your deployment procedure needs in order to deploy your latest version. As you have stated already that you've been doing deployment manually, that means you already know your steps. All you need to do is to fine-tune each step so that it shouldn't require manual intervention while being executed automatically by the CI program.
Choose a CI program, like Travis CI, Circle CI, or if you're using GitHub, it has it's own GitHub Actions for the purpose, you can read their documentation for more details. Your CI program will be responsible for executing your deployment steps which you'll mention to it in whichever format it understands (mostly .yml).
The CI program will execute your steps on behalf of you based on the condition which you'll provide, (like when code is pushed on prod branch). It will execute the commands on a machine (like your EC2), specifically, GitHub actions runner will be responsible for running your commands on your machine, the runner should be setup beforehand in the instance you intend to deploy your code on. More details on runners can be found in relevant documentations.
Since the runner will actually execute the commands on your machine, make sure that all required commands and parameters, including the concerned files & directories are accessible to the runner program, from permissions point of view at least. For example, running your npx prisma generate command should require that npx command is available and executable in the system, and the concerned folders in which the command will CRUD files is accessible by the runner program. Similarly for all other commands.
Get your hands on bash scripting as well.
If your steps contain dynamic info, like the one you mentioned that in your package.json an npm script needs to be updated, then a custom bash script created to update the same automatically will help, for instance. There will be however, several other ways depending on the specific nature of the dynamic changes.
The above points are huge (by huge, I mean astronomically huge) oversimplification of the ways through which CI&CD pipelines are setup. But I hope you get the idea of it at least.
In particular, it feels like I shouldn't be building on the production server.
Your feeling is legitimate. You should replicate your production environment (including deployment procedures) into a separate development environment as close as possible, in order to have all your experiments, development and testing done separately from production environment, and after successful evaluation on the development environment, deploy on production one. Steps like building will most likely be done on both environments, as it is something your program needs to run, irrespective of the environment it is running in. Your future team will appreciate this separation of environments.
if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
Again, this small statement in itself is a proper domain of IT department, known as System Design, in which, to put it simply, you or your team will create an architecture for your whole system which will support your business requirements and scaling as your audience increases, which is something a simple Stackoverflow QnA won't suffice to explain.
Therefore,
or go research how to do CI/CD?
is what I'd recommend and you should also feel is the right way ahead, after reading everything above.
Useful references to begin with (not endorsing any resources, you can search for relevant/better resources too)
GitHub Actions self-hosted runners
System Design - Getting started
Bash scripting
Development, Staging, Production

Simplest setup for a staging server and a production server -

What's the simplest way to manage a staging server vs production?
What's the point of having a staging server if you could just push changes to a different branch in production?
What's the best way to merge the staging server with production? Cron job?
Current setup is staging server which we don't use we are just pushing straight to production, but trying to improve the process

What's the simplest way to manage a staging server vs production?
The simplest and cheapest way is to get rid of your staging server. Staging servers don't inherently make deploys safer, but generally developers want at least a dev environment (functionally not necessarily distinct from the idea of staging server) to host their code in a prod-like environment before they push it to prod.
What's the point of having a staging server if you could just push changes to a different branch in production?
If you have 2 branches running in production simultaneously, that's functionally equivalent to a staging server. Most shops prefer to have a staging environment not server so that their data tier, 3rd party integrations, etc. are completely separate between staging and prod.
Simply deploying another copy of your application in prod is deceptively dangerous because if you mess up the data tier or 3rd party integrations you can easily effect prod.
trying to improve the process
Feature flags. if you can enable new features or even fixes for specific users you can then roll it out to your QA team (or the devs, whomever is going to test) and then when you're happy with it, roll it out to the general user base. This isn't inherently safer than anything else, but it has the advantage that it front loads the work of planning for multiple concurrent code paths and makes that planning more explicit.
Unfortunately there's no magic bullet for having testing (dev ,staging , whatever you want to call them) environments increase reliability.
What's the best way to merge the staging server with production? Cron job?
For code, usually the preferred method is to "promote" the artifact you deployed to staging over to prod without rebuilding, guaranteeing the same thing is shipped.
for runtime environment, using containerization makes most of that part of the code artifact and that's the simplest way. If you're running on a container-centric hosting like ECS Fargate or Google's docker oriented service, there's nothing else on the app side to ship. This is what I recommend, it's straight forward and easy to reason about. Adding virtual servers into the mix just adds an OS level to manage and there's little benefit to that. If you can make your app serverless so it's not sitting waiting for connections but instead is invoked when connections come in, the same thing applies, no OS to manage (AWS lambda for example has serverless docker image support)
Data is generally considered the tricky bit to having test environments by those who have experience with them. If your production data is not at all sensitive you can copy it over, but that may or may not actually work depending on what's in the data and how distributed your data ends up being. Generally production data is sensitive enough that you don't want to expose it to dev environments, which makes it tricky to ensure the dev data is appropriate for testing features. One common methodology for overcoming that obstacle is automating end-to-end tests via something like selenium for web broswers, and automated API tests for non-browser-centric endpoints. This allows you to write the test along with the app to prove it's working.

How to upgrade a NodeJS Docker container?

I have a NodeJS image based on the official node Docker image running in a production environment.
How to keep the NodeJS server up-to-date?
How do I know when or how often to rebuild and redeploy the docker image? (I'd like to keep it always up to date)
How do I keep the npm packages inside of the Docker image up to date?

You can use jenkins to schedule job that create nodejs image on desired interval.
Best way to handle the package and updates for docker images is to create separate tags with all updates. Separate tags for all new updates enable you to rollback in case of any backward compatibility issue.
With this new image create your application image and always run test suite if you want to achieve continuous delivery.

[UPDATE] - Based on comments from OP
To get the newest images from Docker, and then deploy them through the following process, you can use the DockerHub API (Based on the Registry HTTP API) to query for tags of an image. Then find the image you use (Alpine, Slim, Whatever) and take it's most recent tag. After this, run through your test pipeline and register that tag as a deploy candidate
TOKEN=//curl https://hub.docker.com/v2/users/login with credentials
REPO="node"
USERNAME="MyDockerHubUsername"
TAGS=$(curl -H "Authorization: JWT ${TOKEN}" https://hub.docker.com/v2/repositories/${USERNAME}/${REPO}/tags/)
Your question is deceptively simple. In reality, Keep a production image up-to-date requires a lot more than just updating the image on some interval. To achieve true CI/CD of your image you'll need to run a series of steps each time you want to update.
A successful pipeline (Jenkins, Bamboo, CircleCi, CodePipeline, etc) will incorporate all of these steps. And will, ideally, be ran on each commit:
Static Analysis
First, analyze your code using a linter (eslint) and some code coverage metric. I won't say what is considered acceptable level of coverage as that is largely opinion based, but at least some amount of coverage should be expected.
Test (Unit)
Use something like Karma/Mocha/Cucumber to run unit tests on your code.
Build
Now you can build your Docker image. I prefer tools like Hashicorp's Packer for building images.
Since I assume you're running a node server (Express or something like it) from within the container, you may also want to spin up the container and run some local acceptance testing after this stage.
Register
After you've accepted local testing of the container, register the image with whichever service you use (ECR, Dockerhub, Nexus) and tag it in some meaningful way.
Deploy
Now that you have a functioning container, you'll need to deploy it to your orchestration environment. This might be Kubernetes, Docker Swarm, AWS ECS or whatever. It's important that you don't yet serve traffic to this container, however.
Test (Integration)
With the container running in a meaningful test environment (nonprod, stage, test, whatever) you can now run integration tests against it. These would check to make sure it can connect with data tier, or would look for a large occurrence of 500/400 errors.
Don't forget - Security should always be a part of your testing also. This is a good place for that
Switch
Now that you've tested in nonprod, you can either deploy to the production env or switch routing to the standing containers which you just tested against. Here you should decide if you'll use green/blue or A/B deployment. If blue/green then start routing all traffic to the new container. If A/B, set up a routing policy based on some ratio. Which ever you use, make sure you have an idea of what failure rate is considered acceptable. Monitor the new deployment for any failures (500 error codes or whatever you think is important) and make sure you have the ability to quickly roll back to the old containers if something goes wrong.
Acceptance
After enough time has passed without defects, you can accept the new container as a stable candidate. Retag the image, or save the image tag somewhere with the denotation that it is "stable" and make that the new defacto image for launching.
Frequency
Now to answer "How Often". Frequency is a side effect of good iterative development. If your code changes are limited in size and scope, then you should feel very confident in launching whenever code passes tests. Thus, with strong DevOps practices, you'll be able to deploy a new image whenever code is committed to the repo. This might be once, twice or fifty times a day. The number eventually becomes arbitrary.
Keep NPM Packages Up To Date
This'll depend on what packages you're using. For public packages, you might want to constrain to a version. Then create pipelines that test certain releases of those packages in a sandbox environment before allowing them into your environment.
For private packages, make sure you have a pipeline for each of those also. The pipeline should run analysis, testing and other important tasks before registering new code with npm or your private repos (Nexus, for example)

Running integration tests automatically on github

Is there any existing tooling/platform I can use to do the following?
On any github PR or commit, have a custom "check", e.g the same as how travis-ci works.
Have this task talk to a remote machine on azure.
Execute a script on this machine and collect logs/exit code
Fail the check if the code is none zero or timeout is reached.
Handle queuing if two PR's come in, clean up on abort etc.
Have some sort of "status" badge like travis-ci to see the current test state/pass rate.
So far only travis-ci itself seems to work something like this, but whatever I execute will run in their cloud so I don't "own" the machine. Additionally my integration tests require copyrighted data which needs to be kept safe on my own cloud machine, and could take multiple hours to complete.

Yes you can. https://help.github.com/articles/about-webhooks/ describes how to do this. Your machine will need to be accessible to github to do this.

Running IIS server with Coypu and SpecFlow

I have already spending a lot of time googling for some solution but I'm helpless !
I got an MVC application and I'm trying to do "integration testing" for my Views using Coypu and SpecFlow. But I don't know how I should manage IIS server for this. Is there a way to actually run the server (first start of tests) and making the server use a special "test" DB (for example an in-memory RavenDB) emptied after each scenario (and filled during the background).
Is there a better or simpler way to do this?

I'm fairly new to this too, so take the answers with a pinch of salt, but as noone else has answered...
Is there a way to actually run the server (first start of tests) ...
You could use IIS Express, which can be called via the command line. You can spin up your website before any tests run (which I believe you can do with the [BeforeTestRun] attribute in SpecFlow) with a call via System.Diagnostics.Process.
The actual command line would be something like e.g.
iisexpress.exe /path:c:\iisexpress\<your-site-published-to-filepath> /port:<anyport> /clr:v2.0
... and making the server use a special "test" DB (for example an in-memory RavenDB) emptied after each scenario (and filled during the background).
In order to use a special test DB, I guess it depends how your data access is working. If you can swap in an in-memory DB fairly easily then I guess you could do that. Although my understanding is that integration tests should be as close to production env as possible, so if possible use the same DBMS you're using in production.
What I'm doing is just doing a data restore to my test DB from a known backup of the prod DB, each time before the tests run. I can again call this via command-line/Process before my tests run. For my DB it's a fairly small dataset, and I can restore just the tables relevant to my tests, so this overhead isn't too prohibitive for integration tests. (It wouldn't be acceptable for unit tests however, which is where you would probably have mock repositories or in-memory data.)

Since you're already using SpecFlow take a look at SpecRun (http://www.specrun.com/).
It's a test runner which is designed for SpecFlow tests and adds all sorts of capabilities, from small conveniences like better formatting of the Test names in the Test Explorer to support for running the same SpecFlow test against multiple targets and config file transformations.
With SpecRun you define a "Profile" which will be used to run your tests, not dissimilar to the VS .runsettings file. In there you can specify:
<DeploymentTransformation>
<Steps>
<IISExpress webAppFolder="..\..\MyProject.Web" port="5555"/>
</Steps>
</DeploymentTransformation>
SpecRun will then start up an IISExpress instance running that Website before running your tests. In the same place you can also set up custom Deployment Transformations (using the standard App.Config transformations) to override the connection strings in your app's Web.config so that it points to the in-memory DB.
The only problem I've had with SpecRun is that the documentation isn't great, there are lots of video demonstrations but I'd much rather have a few written tutorials. I guess that's what StackOverflow is here for.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string