Can test and production share the same cloud kubernetes environment? - azure

I have created a Kubernetes cluster and successfully deployed my Spring Boot application plus an nginx reverse proxy for testing purposes.
Now I'm moving to production. The only differences between test and prod are the database connection and the nginx basic auth (and, of course, the scaling parameters).
In this case, considering I'm using a cloud provider's infrastructure, what are the best practices for Kubernetes?
Should I create a new cluster only for prod? Or could I use the same cluster and use labels to distinguish test and production machines?
For now, having two clusters seems wasteful to me: the provider assures me that I have the hardware capacity, and I can set different request/limit/replica parameters per environment. Also, for now, I only have two images to deploy per environment (although for production I will opt for horizontal scaling with two replicas).

I would absolutely 100% set up a separate test cluster. (...assuming a setup large enough where Kubernetes makes sense; I might consider an easier deployment system for a simple three-tier app like what you're describing.)
At a financial level this shouldn't make much difference to you. You'll need some amount of hardware to run the test copy of your application, and your organization will be paying for it whether it's in the same cluster or a different cluster. The additional cost will only be the cost of the management plane, which shouldn't be excessive.
At an operational level, there are all kinds of things that can go wrong during a deployment, and in particular there are cases where one Kubernetes resource can "step on" another. Deploying to a physically separate cluster helps minimize the risk of accidents in production; you won't accidentally overwrite the prod deployment's ConfigMap holding its database configuration, for example. If you have some sort of crash reporting or alerting set up, "it came from the test cluster" is a very clear check you can use to not wake up the DevOps team. It also gives you a place to try out possibly risky configuration changes: if you run your update script once in the test cluster and it passes then you can re-run it in prod, but if the first time you run it is in prod and it fails, that's an outage.
Depending on what you're using for a CI system, the other thing you can set up is fully automated deploys to the test environment. If a commit passes its own unit tests, you can have the test environment always running current master and run integration tests there. If and only if those integration tests pass, you can promote to the production environment.
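As an illustration of that promotion flow, here is a minimal sketch of what it could look like in GitHub Actions (any CI system with job dependencies works the same way); the kubeconfig contexts, manifest path, and integration-test script are placeholders for whatever your setup actually uses:

name: deploy
on:
  push:
    branches: [master]
jobs:
  deploy-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # "test" is an assumed kubeconfig context pointing at the test cluster
      - run: kubectl --context test apply -f k8s/
      - run: ./run-integration-tests.sh   # placeholder integration-test script
  deploy-prod:
    # runs only if the test deploy and the integration tests succeeded
    needs: deploy-test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: kubectl --context prod apply -f k8s/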

It is true that using a separate cluster is definitely the better practice, since in your test cluster you could do something wrong (especially resource-wise) and take down your prod environment. But if you can't afford it, and if you feel confident with k8s, you can put your prod environment in a different namespace (a minimal sketch follows below).
I don't know about Azure, but on GKE you can scale the number of nodes down to zero. If that is possible on Azure, you could scale the test environment's nodes down to zero whenever you're not using it and still keep two clusters.
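For the namespace approach, a minimal sketch might look like the following (names and numbers are made up); a per-namespace ResourceQuota keeps a runaway test workload from starving prod:

apiVersion: v1
kind: Namespace
metadata:
  name: test
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-quota
  namespace: test
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi

You would then deploy the same manifests into each namespace (kubectl apply -n test vs. kubectl apply -n prod), varying only the ConfigMaps/Secrets and replica counts.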

It's better to use different clusters for production and dev/testing. Please refer here for best practices.

Related

Intermediate step(s) between manual prod and CI/CD for Node/Next on EC2

For about 18 months now I've been working in Node, and for the last 6 months I've been slowly migrating my existing WordPress websites to NextJS.
To date, I've been deploying to production manually. I log into my production server, check out the latest release from GitHub, build, and do a pm2 restart.
Even though the above workflow seems to be the most commonly documented around the internet, it's always felt a little wrong to me.
Recently, I found myself in a situation where I needed to customise some 3rd party code. So, my main code now has a line in package.json that says
{
  ...
  "dependencies": {
    ...
    "react-share": "file:../react-share/react-share-4.4.1.tgz",
    ...
  },
  ...
}
which implies that I'm going to check out my custom react-share, build it somewhere on the production server, change this line to point to wherever I put it, and then rebuild.
Also, I'm using Prisma, which means that every time I deploy, before I do a build, I need to do an npx prisma generate to create the client.
This now all seems really, really wrong.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill. It's just me doing development, and my production environment is a single EC2 server sitting behind AWS CloudFront.
It seems to me that I should be doing something more/different than what I'm currently doing, in service to someday moving to a CI/CD model, if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
In particular, it feels like I shouldn't be building on the production server.
Are there any intermediary step(s) I can/should be taking for faster/less-error-prone/less-down-time deployment to a single EC2 instance for Next/Node apps, between manually deploying as I am currently, and some sort of CI/CD setup? Or are my only choices to do what I'm doing now, or go research how to do CI/CD?
You're approaching the initial stages of what is technically called DevOps, if you're not there already, as it appears from your context. What you're asking about is a broad topic, to put it mildly, and explaining everything here would almost amount to writing an article about it, at the very least.
However, I'll give you a brief overview of how to approach this.
I don't know how a "simple" CI/CD environment might look, but whatever it looks like, it feels like overkill.
Simplicity and complexity are relative terms. A system that is complicated for one person might be simple for another. CI/CD doesn't define any laws you need to follow in order to create a perfect deployment procedure, as everyone's deployment requirements are unique (at some point).
To put it in bullet points, here is what you need to figure out before you start setting up CI/CD:
The sequence of steps your deployment procedure needs in order to deploy your latest version. Since you've already been deploying manually, you already know the steps. All you need to do is fine-tune each step so that it requires no manual intervention when executed automatically by the CI program.
Choose a CI program, like Travis CI or CircleCI, or, if you're using GitHub, its own GitHub Actions; you can read their documentation for more details. Your CI program will be responsible for executing the deployment steps you describe to it in whichever format it understands (usually .yml).
The CI program will execute your steps on your behalf based on a condition you provide (for example, when code is pushed to the prod branch). It will execute the commands on a machine (like your EC2 instance); specifically, a GitHub Actions runner will be responsible for running your commands on your machine, and that runner should be set up beforehand on the instance you intend to deploy your code to. More details on runners can be found in the relevant documentation.
Since the runner will actually execute the commands on your machine, make sure that all required commands and parameters, including the relevant files and directories, are accessible to the runner program, at least from a permissions point of view. For example, running your npx prisma generate command requires that the npx command is available and executable on the system and that the folders in which the command will create, read, update, or delete files are accessible to the runner program. The same goes for all other commands. (A sketch of such a workflow appears after these points.)
Get comfortable with bash scripting as well.
If your steps contain dynamic info, like the package.json entry you mentioned that needs to be updated, then a custom bash script that updates it automatically will help, for instance. There are, however, several other ways, depending on the specific nature of the dynamic changes.
The above points are a huge (by huge, I mean astronomically huge) oversimplification of the ways CI/CD pipelines are set up, but I hope you get the idea at least.
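To make the runner point concrete, here is a minimal sketch of what such a workflow could look like for the steps you already run by hand; the branch name, process name, and build commands are placeholders, not a prescription:

name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: self-hosted          # the runner registered on your EC2 instance
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx prisma generate   # generate the Prisma client before building
      - run: npm run build
      - run: pm2 restart my-app    # "my-app" is a placeholder pm2 process name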
In particular, it feels like I shouldn't be building on the production server.
Your feeling is legitimate. You should replicate your production environment (including deployment procedures) as closely as possible in a separate development environment, so that all your experiments, development, and testing happen separately from production, and only after successful evaluation in the development environment do you deploy to production. Steps like building will most likely be done in both environments, since the build is something your program needs in order to run, irrespective of the environment it runs in. Your future team will appreciate this separation of environments.
if/when I have a whole team working on this, or sufficient users that I have multiple load-balanced servers and need production to be continually up.
Again, this small statement is in itself a proper domain of IT, known as System Design, in which, to put it simply, you or your team design an architecture for your whole system that supports your business requirements and scales as your audience grows; that is something a simple Stack Overflow Q&A won't suffice to explain.
Therefore,
or go research how to do CI/CD?
is what I'd recommend, and what you should also feel is the right way ahead after reading everything above.
Useful references to begin with (not endorsing any particular resources; you can search for other or better ones too):
GitHub Actions self-hosted runners
System Design - Getting started
Bash scripting
Development, Staging, Production

How would one create an isolated jenkins build node (without access to secrets)?

As the Top 10 CI/CD Security Risks SEC-04 states:
Ensure that pipelines running unreviewed code are executed on isolated nodes, not exposed to secrets and sensitive environments.
The above statement seems especially true when the code (or pipeline code itself) is in a pull request which has not yet been seen/approved/merged but from a developer perspective you want to know if it builds successfully in the first place. Running code that nobody has laid eyes upon while having access to build secrets is definitely a security risk.
I'm wondering whether isolation is achievable with Jenkins build nodes, as I cannot find any specific options for this.
My assumption is that dynamically provisioned containerized agents are best suited for isolated environments; I'm just not sure how to prevent their access to secrets from the Jenkins controller.

Simplest setup for a staging server and a production server

What's the simplest way to manage a staging server vs production?
What's the point of having a staging server if you could just push changes to a different branch in production?
What's the best way to merge the staging server with production? Cron job?
Our current setup is a staging server which we don't use; we are just pushing straight to production, but we're trying to improve the process.
What's the simplest way to manage a staging server vs production?
The simplest and cheapest way is to get rid of your staging server. Staging servers don't inherently make deploys safer, but generally developers want at least a dev environment (not necessarily functionally distinct from the idea of a staging server) to host their code in a prod-like environment before they push it to prod.
What's the point of having a staging server if you could just push changes to a different branch in production?
If you have 2 branches running in production simultaneously, that's functionally equivalent to a staging server. Most shops prefer to have a staging environment, not just a staging server, so that their data tier, 3rd-party integrations, etc. are completely separate between staging and prod.
Simply deploying another copy of your application in prod is deceptively dangerous, because if you mess up the data tier or 3rd-party integrations you can easily affect prod.
trying to improve the process
Feature flags. If you can enable new features or even fixes for specific users, you can roll them out to your QA team (or the devs, or whoever is going to test) and then, when you're happy with them, roll them out to the general user base. This isn't inherently safer than anything else, but it has the advantage that it front-loads the work of planning for multiple concurrent code paths and makes that planning more explicit.
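What this looks like in practice varies by framework or flag service, but conceptually it is just configuration the app reads; a hypothetical flag file (the format here is entirely made up) might be:

features:
  new-checkout-flow:
    enabled_for: [qa-team, internal-users]   # visible to testers only
  payment-fix:
    enabled_for: ["*"]                       # fully rolled out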
Unfortunately there's no magic bullet for making testing environments (dev, staging, whatever you want to call them) increase reliability.
What's the best way to merge the staging server with production? Cron job?
For code, usually the preferred method is to "promote" the artifact you deployed to staging over to prod without rebuilding, guaranteeing the same thing is shipped.
For the runtime environment, using containerization makes most of that part of the code artifact, and that's the simplest way. If you're running on container-centric hosting like ECS Fargate or Google's Docker-oriented services, there's nothing else on the app side to ship. This is what I recommend; it's straightforward and easy to reason about. Adding virtual servers into the mix just adds an OS layer to manage, and there's little benefit to that. If you can make your app serverless, so it's not sitting waiting for connections but is instead invoked when connections come in, the same thing applies: no OS to manage (AWS Lambda, for example, supports serverless Docker images).
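A minimal sketch of the promote-without-rebuilding idea, written as a generic pipeline config (the registry path, variable name, and deploy script are placeholders for whatever your CI and hosting provide):

build:
  script:
    - docker build -t registry.example.com/myapp:$COMMIT_SHA .
    - docker push registry.example.com/myapp:$COMMIT_SHA
deploy-staging:
  script:
    - ./deploy.sh staging registry.example.com/myapp:$COMMIT_SHA
deploy-prod:
  # same image tag as staging; nothing is rebuilt between environments
  script:
    - ./deploy.sh prod registry.example.com/myapp:$COMMIT_SHA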
Data is generally considered the tricky bit of having test environments by those who have experience with them. If your production data is not at all sensitive you can copy it over, but that may or may not actually work depending on what's in the data and how distributed your data ends up being. Generally production data is sensitive enough that you don't want to expose it to dev environments, which makes it tricky to ensure the dev data is appropriate for testing features. One common methodology for overcoming that obstacle is automating end-to-end tests via something like Selenium for web browsers, and automated API tests for non-browser-centric endpoints. This allows you to write the tests along with the app to prove it's working.

How many agents should I have?

I'm trying to build a branch-based GitOps declarative infrastructure for Kubernetes. I plan to create clusters on a cloud provider with Crossplane, and those clusters will be stored in GitLab. However, as I start building, I seem to be running into gitlab-agent sprawl.
Every application I will be deploying to each of my environments is stored in a separate git repo, and I'm wondering if I need a separate agent for each repo and environment. For example, I have my three clusters (prod, stage, and dev) and my three apps (API, Kafka, and DB). I've started with three agents per repo (gitlab-agent-api-prod, gitlab-agent-kafka-stage, ...), which seems a bit excessive. Do I really need 9 agents?
Additionally, I now have to install as many agents as I have apps onto each of my clusters, which already eats up significant resources. I'd imagine I can get away with one GitLab agent per cluster; I'm just not seeing how that is done. Any help would be appreciated!
PS: if anyone has a guide on how to automatically add gitlab agents to new clusters created with crossplane, I'm all ears. Thanks!

Heroku workers in dev

I'm looking into using a worker as well as a web process for the first time, as I have to scrape a website. Before I commit to this, I'm just wondering about working in a dev environment. How do jobs in a queue get handled when I'm testing my app before it's pushed to Heroku?
I will probably be using RabbitMQ if that's relevant here.
I guess it depends on what you mean by testing. You can unit test the code that does the scraping in isolation from any queue, and you can provide a mock implementation of the queue operations to handle a goodly portion of your integration tests.
I suppose you might want a real instance of the queue for certain tests, but depending on the nature of your project, you might be satisfied with the sorts of tests described in the first paragraph.
If you simply must test the queue operation, and/or you want to run a complete copy of production locally, then you'll have to stand up an instance of RabbitMQ. You can stand one up locally or use one of the SaaS providers.
If you have multiple developers working on the project, you might want to make it easy for them by creating something like a Vagrant script that sets up a complete environment in a VM, or, better still, something like Docker. Doing so also gives you a lot more deployment options (making you less dependent on the Heroku tooling).
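If you go the Docker route, a minimal docker-compose sketch for a local RabbitMQ might look like this (the official rabbitmq image exposes AMQP on 5672 and the management UI on 15672; the service name is arbitrary):

services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"     # AMQP, what your app connects to
      - "15672:15672"   # management web UI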
Lastly, numerous CI solutions like Travis CI provide instances of popular services for running tests (including RabbitMQ).
