TL;DR
I'm trying to understand how to set up a Jenkins or GitLab installation "properly", security-wise.
We'd like jobs to be able to pull from and push to Maven repositories and Docker registries, but to do this in a safe way, so that even malicious Jenkinsfile or .gitlab-ci.yml files can't get direct access to the credentials, print them on-screen, or send them somewhere with e.g. curl.
It seems the straightforward and documented way to do it for both Jenkins and GitLab is to create "Credentials" in Jenkins and "Variables" in GitLab CI/CD. These are then made available as environment variables for the Jenkinsfile or .gitlab-ci.yml to access and use in their scripts. Which is super handy!
BUT!
That means that anybody who can create a job in Jenkins/GitLab, or has write access to any repository that has an existing job, can get at the raw credentials if they're malicious. Is that really the best one can hope for? That we trust every single person who can log in to a Jenkins/GitLab installation with the keys to the kingdom?
Sure, we can limit credentials so they're only accessible to certain jobs, but all jobs need access to Maven repos and Docker registries...
In these post-SolarWinds times, surely we can and must do better than that when securing our build pipeline...
Details
I was hoping for something like the ability for e.g. a Jenkinsfile to declare up front that it wants to use these X Docker images and these Y Maven dependencies before any script runs, so that those dependencies are downloaded in advance and the credentials used to pull them stay hidden from the scripts. Likewise, a number of artifacts would be declared, so that after the script has concluded, "hidden" credentials are used to push the artifacts to e.g. a Nexus repository and/or Docker registry.
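Purely to illustrate the wish, something along these lines (this is entirely hypothetical syntax, supported by neither tool):

# hypothetical, wished-for pipeline manifest -- not real Jenkins or GitLab CI syntax
inputs:
  docker-images:
    - registry.example.com/base/jdk:17      # pulled for me before my script runs
  maven-dependencies:
    - com.example:common-lib:1.2.3          # resolved for me, credentials never exposed
script:
  - mvn -o package                          # runs with no registry/repo credentials in scope
outputs:
  artifacts:
    - target/app.jar                        # pushed for me after my script has finished
  docker-images:
    - registry.example.com/team/app:latest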
But the Jenkins documentation entry for Using Docker with Pipeline describes how to use a registry with:
docker.withRegistry('https://registry.example.com', 'credentials-id') {
    // ... build and push steps ...
}
And that looks all safe and good, but if I put this in the body:
sh 'cat $DOCKER_CONFIG/config.json | base64'
then it is game over. I have direct access to the credentials. (The primitive security of string matching for credentials in script output is easily defeated with base64.)
GitLab doesn't even try to hide how easy it is in its docs:
before_script:
- docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
Could be replaced with
before_script:
- "echo $CI_REGISTRY_USER:$CI_REGISTRY_PASSWORD | base64"
Likewise, game over.
Is there no general way to have credentials that are safely protected from the Jenkinsfile or .gitlab-ci.yml scripts?
These two articles describe the situation perfectly:
Accessing and dumping Jenkins credentials | Codurance "The answers you seek, Jenkins shall leak."
Storing Secrets In Jenkins - Security Blogs - this last article even describes how to print Jenkins' own encrypted /var/lib/jenkins/credentials.xml and then use Jenkins itself to decrypt them. Saves the hacker the trouble.
Related
I have a project filled with various pipeline templates that other projects can use for their work, keeping my CI DRY. These are not in a .gitlab-ci.yml file, but in separate ./templates/language.yml files.
I'm already using yamllint to make sure they are valid YAML, but I also want to make sure I'm using valid GitLab CI syntax.
I'd like to lint my CI templates when I'm merging. I have rejected running gitlab-runner exec shell because I can't figure out how to trigger specific copies. It looks like there might be something in the API for this, but I haven't been able to nail down the secret sauce.
How are you doing this?
We are using three different approaches to achieve this:
via API - https://docs.gitlab.com/ee/api/lint.html
with a fake project setup within my templates
with gitlab-ci-local
via API
The first approach uses GitLab's linter via the API:
curl --header "Content-Type: application/json" \
     --header "PRIVATE-TOKEN: <your_access_token>" \
     "https://gitlab.example.com/api/v4/ci/lint" \
     --data '{"content": "{ \"image\": \"ruby:2.6\", \"services\": [\"postgres\"], \"before_script\": [\"bundle install\", \"bundle exec rake db:create\"], \"variables\": {\"DB_NAME\": \"postgres\"}, \"types\": [\"test\", \"deploy\", \"notify\"], \"rspec\": { \"script\": \"rake spec\", \"tags\": [\"ruby\", \"postgres\"], \"only\": [\"branches\"]}}"}'
The problem here is that you cannot use the JOB_TOKEN for this, so you need to generate an access token and inject it as a secret. There is even a linting library available: https://github.com/BuBuaBu/gitlab-ci-lint
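As a rough sketch (not from the GitLab docs), a merge request job that feeds each template to the lint API could look like this. It assumes a CI/CD variable LINT_TOKEN holding an access token with api scope, templates under ./templates/ as in the question, and a "status" field in the response, which can differ between GitLab versions:

lint-templates:
  stage: test
  image: alpine:3.19
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - apk add --no-cache curl jq
    - |
      set -e   # fail the job on the first invalid template
      for template in templates/*.yml; do
        echo "Linting $template"
        jq --null-input --arg content "$(cat "$template")" '{content: $content}' > payload.json
        curl --silent --header "Content-Type: application/json" \
             --header "PRIVATE-TOKEN: ${LINT_TOKEN}" \
             --data @payload.json \
             "https://gitlab.example.com/api/v4/ci/lint" > result.json
        cat result.json
        jq -e '.status == "valid"' result.json > /dev/null
      done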
fake project
The second approach mimics the real setup with its own .gitlab-ci.yml, which includes the templates and executes them, like normal merge request pipelines. This way we ensure the scripts do not fail and are safe to use. We do this for Docker images as well as for Gradle build templates, etc.
E.g. for Docker images we build the image, include the template, and override the image property of the jobs with the temporary Docker image.
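For illustration, a trimmed-down piece of such a fake-project pipeline might look like this; the template path, the job name gradle-build (assumed to be defined in the template) and the image tag are made up for the example:

include:
  - local: templates/gradle-build.yml          # the template under test (illustrative path)

# An earlier job in this pipeline is assumed to have built and pushed a
# throwaway image tagged with the commit SHA, e.g.
#   $CI_REGISTRY_IMAGE/candidate:$CI_COMMIT_SHORT_SHA

# Override the image property of the job defined in the template, so the
# template is exercised against that freshly built image.
gradle-build:
  image: $CI_REGISTRY_IMAGE/candidate:$CI_COMMIT_SHORT_SHA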
gitlab-ci-local
The third approach is not as complete and, depending on the feature, lacks functionality. The tool gitlab-ci-local (https://github.com/firecow/gitlab-ci-local) can be used to verify GitLab CI builds and execute them locally. But it is not an official implementation and not all features are present. In the end you again need some kind of project setup.
If I could choose, I would go with the first approach; in our case it has proven useful. The initial effort of faking a project is not that big, given the benefit of a long-term, safe solution.
We are working on the s4sdk pipeline implementation for delivery of SAP Cloud Foundry applications (Spring Boot microservices) using the SAP Cloud SDK for Java.
We have multiple developers working on multiple microservices, but all these microservices have some common dependencies.
We want to control the versions of all the common dependencies from a central location.
For this we have created a Maven BOM (Bill of Materials) dependency and added it as the parent in the pom.xml of all the microservices.
The aforementioned BOM is housed in a Nexus repository, and all the pom.xml files (of the microservices) can access the parent using a repository tag like the one below.
<repository>
  <id>my-repo</id>
  <name>nexus-repo</name>
  <url>http://some/url</url>
</repository>
The credentials for the above Nexus repository are placed in the settings.xml file.
We want to run the above model in the cloud-s4-sdk pipeline. Although it works fine, the problem is that we need to expose the Nexus repo access credentials in the settings.xml file.
Per the documentation at https://github.com/SAP/cloud-s4-sdk-pipeline/blob/master/configuration.md#mavenexecute, the settings.xml for Maven builds needs to be placed relative to the project root. This is a security concern for us, as the project repository is in GitHub and so the projectSettingsFile can be accessed by the developers.
We don't want these credentials to be exposed to the developers. Access should be limited to the admin team only.
Is there a way we can achieve this using the cloud-s4-sdk pipeline?
Although Nexus supports user tokens for the Maven settings.xml, that does not work here, as GUI login is still possible using the token values.
I think you could consider the following options:
Allow anonymous read access for artifacts
The developers need a way to build the artifacts locally anyway. How could developers build your service without having access to a dependency? Allowing anonymous read access would also enable them to do that.
Commit credentials to git but make git repository private
If you don't want to allow all employees (I guess only employees have access to your Nexus), you can commit the credentials together with the settings.xml, but make the repository private so these details are not shared with everyone.
Inject credentials as environment variable
You can inject the credentials as environment variables into your settings.xml file. See also: How to pass Maven settings via environment variables.
To set up the environment variables, you can surround the full pipeline in your Jenkinsfile with the withCredentials step. For details see: https://jenkins.io/doc/pipeline/steps/credentials-binding/
String pipelineVersion = "master"
node {
    deleteDir()
    sh "git clone --depth 1 https://github.com/SAP/cloud-s4-sdk-pipeline.git -b ${pipelineVersion} pipelines"
    withCredentials([usernamePassword(credentialsId: 'nexus', usernameVariable: 'NEXUS_USERNAME', passwordVariable: 'NEXUS_PASSWORD')]) {
        load './pipelines/s4sdk-pipeline.groovy'
    }
}
and a settings.xml whose <server> entry references the injected variables:
<username>${env.NEXUS_USERNAME}</username>
<password>${env.NEXUS_PASSWORD}</password>
I have a GitLab project that is pull-mirroring a private GitHub repo. Because of its origins, the repo has a "config/private.js" file with all the API keys and server config that it needs. Or rather, that file isn't in the repo; it's in .gitignore.
How do I populate my GitLab environment with this file? It would be ideal if I could reserve a special file that is not in the repo and does not update with commits, and is used to populate the dist environment with a build command like:
- cat secrets.file > src/config/private.js
But I'm having no luck finding that in the documentation. I do see project and group secrets, but 1. it would be tedious just to add them all and 2. I would need to rewrite the code, or else create another just-as-tedious script to echo each one into the file.
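For illustration, the kind of job I have in mind would look something like this, assuming the file could be stored as a File-type CI/CD variable (called PRIVATE_CONFIG here just for the example, with a placeholder build command), which GitLab exposes to the job as the path of a temporary file:

build:
  script:
    # PRIVATE_CONFIG would be a File-type CI/CD variable; the env var holds the
    # path of a temp file containing the variable's value.
    - cp "$PRIVATE_CONFIG" src/config/private.js
    - npm run build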
This was a tad complicated.
GitLab does not install the repo, it installs the build results; thus you can inject API key files in GitLab's CI/CD, but you would have to change them and rebuild for each env. (You couldn't test the results and then redeploy the known-working results to prod.) In my case, I was building once, and committed to only applying the relevant keys at stage and prod.
What I do is keep the secrets as variables on the destination. I inject a key file that refers to the env during CI/CD. For example, it might set a key to __MY_API_KEY__. I use a postinstall script in the deployment to apply these env keys to the built scripts that are installed (this is just a tr command over a set of env variables and the /build files).
This way, I can use a hard-coded, gitignored private file locally, and still inject private keys specific to each env separately.
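A rough sketch of the CI side of this, keeping the file path from the question; the image, placeholder name and build commands are made up, and the actual substitution happens later in the destination's postinstall step:

build:
  stage: build
  image: node:20
  script:
    # Write a config full of placeholders instead of real keys; the destination's
    # postinstall rewrites them from its own env variables, roughly like
    #   sed -i "s/__MY_API_KEY__/$MY_API_KEY/g" build/*.js
    # (the answer above uses tr for this).
    - echo "module.exports = { apiKey: '__MY_API_KEY__' };" > src/config/private.js
    - npm ci
    - npm run build
  artifacts:
    paths:
      - build/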
I think I'm fundamentally missing something. I'm new to CI/CD and trying to set up my first pipeline ever with gitlab.
The project is a pre-existing PHP project.
I don't want to clean it up just yet; at the moment I've pushed the whole thing into a Docker container, and it's running fine, talking to Google Cloud's MySQL databases etc. as it should, both locally and on a remote Google Cloud testing VM.
The dream is to be able to push to the development branch, and then merge the dev branch into the test branch, which then TRIGGERS the automated tests (easy part), and also causes the remote test VM (hosted on Google Cloud) to PULL the newest changes, rebuild the image from the latest Dockerfile (or pull the latest image from the GitLab image registry)... and then rebuild the container with the newest image.
I'm playing around with GitLab's runner, but I'm not understanding what it's actually for, despite looking through almost all the online content about it.
Do I just install it on the Google Cloud VM, and then, when I push to GitLab from my development machine, the repo will 'signal' the runner (which is running on the VM) to execute a bunch of scripts (which might include a git pull of the newest changes)?
Because I already pre-package my app into a container locally (and push the image to the image registry), do I need to use docker as my executor on the runner? Or can I just use shell and run the commands that way?
What am I missing?
TLDR and extra:
Questions:
What is the runner actually for, and where is it meant to be installed?
Does it care which directory it is run in? If it doesn't, where does it execute its script commands? At root?
If I am locally building my own images and uploading them to GitLab's registry, do I need to set my executor to docker? Shouldn't I just set it to shell, pull the image, and build it? (Assuming the runner is running on the remote VM.)
What is the runner actually for?
You have your project along with a .gitlab-ci.yml file. .gitlab-ci.yml defines what stages your CI/CD pipeline has and what to do in each stage. This typically consists of build, test and deploy stages. Within each stage you can define multiple jobs. For example, in the build stage you may have 3 jobs to build on Debian, CentOS and Windows (in GitLab terms: build:debian, build:centos, build:windows). A GitLab runner clones the project, reads the .gitlab-ci.yml file and does what it is instructed to do. So basically a GitLab runner is a Go process that executes the instructed tasks.
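For example, a minimal .gitlab-ci.yml with that shape (job names, images and scripts are just placeholders):

stages:
  - build
  - test
  - deploy

build:debian:
  stage: build
  image: debian:bookworm
  script:
    - ./build.sh

build:centos:
  stage: build
  image: centos:7
  script:
    - ./build.sh

test:
  stage: test
  script:
    - ./run-tests.sh

deploy:
  stage: deploy
  script:
    - ./deploy.sh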
where is it meant to be installed?
You can install a runner in any of the environments listed here: https://docs.gitlab.com/runner/install/
or
you can use a shared runner that is already installed on GitLab's infrastructure.
Does it care which directory it is run in?
Yes. Every task executed by the runner runs relative to CI_PROJECT_DIR, defined in https://gitlab.com/help/ci/variables/README. But you can alter this behaviour.
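A quick way to see it for yourself (nothing here is specific to any project):

where-do-i-run:
  script:
    - echo "$CI_PROJECT_DIR"   # the directory the runner cloned the project into
    - pwd                      # the job starts in this same directory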
where does it execute its script commands? At root?
Do I need to set my executor to docker? Shouldn't I just set it to shell, pull the image, and build it?
A runner can have multiple executors such as docker, shell, virtualbox etc., with docker being the most common one. If you use docker as the executor, you can pull any image from Docker Hub or your configured registry, and you can do loads of stuff with Docker images. In a docker environment you normally run as the root user.
https://docs.gitlab.com/runner/executors/README.html
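For example, with a runner whose executor is docker, a job like the following runs inside the named image, typically as root unless the image defines another user:

build:
  image: alpine:3.19
  script:
    - whoami                   # usually prints "root" inside the container
    - apk add --no-cache git   # root access lets the job install what it needs
    - git --version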
See the GitLab access logs: the runner is constantly polling the server.
I'm a bit new to version control and deployment environments and I've come to a halt in my learning about the matter: how do deployment environments work if developers can't work on the same local machine and are forced to always work on a remote server?
How should the flow of the deployment environments be set up according to best practices?
For this example I considered three deployment environments: development, staging and production; and three storage environments: local, repository server and final server.
This is the flow chart I came up with but I have no idea if it's right or how to properly implement it:
PS. I was thinking the staging tests on the server could have restricted access through login or ip checks, in case you were wondering.
I can give you (based on my experience) a good and straightforward practice; this is not the only approach, as there is no single standard for how to work on all projects:
Use a distributed version control system (like git/github):
Make a private/public repository to handle your project
Local development:
Developers will clone the project from your repo and contribute to it; it is recommended that each one work on a branch and create a new branch for each new feature
Within your team, there is one person responsible for merging the branches that are ready into the master branch
I strongly suggest working on a virtual machine during development:
To isolate the dev environment from the host machine and deal with dependencies
To have a virtual machine identical to the remote production server
Easy to reset, remove, reproduce
...
I suggest using VirtualBox as the VM provider and Vagrant for provisioning
I suggest that your project folder be a shared folder between your host machine and your VM, so you write your source code on your host OS using the editor you love, and at the same time this code exists and runs inside your VM. Isn't that amazingly awesome?!
If you are working with Python, I also strongly recommend using virtual environments (like virtualenv or Anaconda) to isolate and manage inner dependencies
Then each developer, after writing some source code, can commit and push his changes to the repository
I suggest using project automation/setup tools (like Fabric/fabtools for Python):
Make a script or something that, with one click or a few commands, reproduces the whole environment, all the dependencies and everything needed by the project to be up and running, so that all developers (backend, frontend, designers...), no matter their knowledge or their host machine type, can get the project running very easily. I also suggest doing the same thing for the remote servers, whether manually or with tools like Fabric/fabtools
The script will mainly install OS dependencies, then project dependencies, then clone the project repo from your version control. To do so, you need to give the remote servers (testing, staging, and production) access to the repository: add the SSH public key of each server to the keys in your version control system (or use agent forwarding with Fabric)
Remote servers:
You will need at least a production server which makes your project accessible to the end-users
It is recommended that you also have testing and staging servers (I suppose you know the purpose of each one)
Deployment flow: local - repo - remote server. How does it work?
Give the remote servers (testing, staging, and production) access to the repository: add the SSH public key of each server to the keys in your version control system (or use agent forwarding with Fabric)
The developer writes the code on his machine
Eventually writes tests for his code and runs them locally (and on the testing server)
The developer commits and pushes his code, on the branch he is using, to the remote repository
Deployment:
5.1 If you would like to deploy a feature branch to testing or staging:
SSH into the server and then cd into the project folder (cloned from the repo manually or by the automation script)
git checkout <the branch used>
git pull origin <the branch used>
5.2 If you would like to deploy to production:
Make a pull request; after the pull request gets validated by the manager, it is merged into the master branch
SSH into the server and then cd into the project folder (cloned from the repo manually or by the automation script)
git checkout master # not needed because it should already be on master
git pull origin master
I suggest writing a script with Fabric/fabtools, or using tools like Jenkins, to automate the deployment task.
Voilà! Deployment is done!
This is a somewhat simplified approach; there are still a bunch of other recommended and best-practice tools and tasks.