How to write automated tests for GitLab CI/CD configuration?

I have noticed that the complexity of my GitLab CI/CD configuration grows fairly fast. Normally, for either a programming language or infrastructure code, I could write automated tests for the code.
Is it possible to write automated tests for the gitlab-ci.yml file itself?
Is there any library or testing framework for it?
Ideally I would like to set up the different environment variables, execute the gitlab-ci.yml file, and do assertions on the output.

I am not aware of any tool that can really test the behaviour, but I ended up with two approaches:
1. Testing in smaller chunks, i.e. unit testing my building blocks.
I extracted my builds into smaller chunks which I can include and test separately. This works fine for script blocks etc. but is not really sufficient for rule blocks. It still offers great value: e.g. I use the templates for some steps as verification for the generated Docker image, which will be used by those steps (see the sketch below).
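A minimal sketch of that idea, with illustrative file and job names that are not from the original answer: the reusable block lives in its own template file, and a small pipeline in the template's repository includes it and exercises it in isolation.

```yaml
# templates/docker-build.yml -- the reusable building block (illustrative)
.docker-build:
  image: docker:24
  services:
    - docker:24-dind
  script:
    - docker build -t "$IMAGE_TAG" .

# .gitlab-ci.yml of the template repository -- a small "unit test" for the block
include:
  - local: templates/docker-build.yml

test-docker-build-template:
  extends: .docker-build
  variables:
    IMAGE_TAG: test-image:ci
  after_script:
    - docker run --rm "$IMAGE_TAG" --version   # assert the produced image is actually usable
```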
2. Verification via gitlab-ci-local
https://github.com/firecow/gitlab-ci-local is a tool which allows you to run the pipeline locally and to provide environment variables. There are some minor issues with branch name resolution in PR pipelines, but besides that it works great. I use it to test GitLab CI files in GitHub Actions (see the sketch below).
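A hedged sketch of that setup: a GitHub Actions workflow that installs gitlab-ci-local from npm and runs a selected job with an injected variable. The job name, the variable, and the --variable flag are assumptions based on the project's README, so verify them against the current gitlab-ci-local documentation.

```yaml
# .github/workflows/test-gitlab-ci.yml (illustrative)
name: test-gitlab-ci
on: [pull_request]

jobs:
  gitlab-ci-local:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install --global gitlab-ci-local
      # Run one job from .gitlab-ci.yml locally, with an environment variable injected
      - run: gitlab-ci-local --variable DEPLOY_ENV=staging my-build-job
```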
I hope this helps.

GitLab has a linter for .gitlab-ci.yml.
To access the CI Lint tool, navigate to CI/CD > Pipelines or CI/CD > Jobs in your project and click CI lint.
Docs
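The same linter is also exposed through the API, so a job can validate a generated or hand-edited CI file before it is used, e.g. for a dynamic child pipeline. The following is a hedged sketch, not from the answer above: GITLAB_API_TOKEN is a placeholder CI/CD variable holding a token with API scope, generated-pipeline.yml is assumed to be produced by an earlier step, and you should check the Lint API documentation for your GitLab version.

```yaml
lint-generated-config:
  image: alpine:latest
  script:
    - apk add --no-cache curl jq
    # Send the file content to the project-scoped CI Lint endpoint and fail if it is invalid
    - |
      curl --silent \
           --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
           --header "Content-Type: application/json" \
           --data "$(jq -n --rawfile cfg generated-pipeline.yml '{content: $cfg}')" \
           "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/ci/lint" | tee lint-result.json
    - test "$(jq -r '.valid' lint-result.json)" = "true"
```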

Related

What is a good Databricks workflow?

I'm using Azure Databricks for data processing, with notebooks and pipelines.
I'm not satisfied with my current workflow:
The notebook used in production can't be modified without breaking the production. When I want to develop an update, I duplicate the notebook, change the source code until I'm satisfied, then I replace the production notebook with my new notebook.
My browser is not an IDE! I can't easily go to a function definition. I have lots of notebooks, if I want to modify or even just see the documentation of a function, I need to switch to the notebook where this function is defined.
Is there a way to do efficient and systematic testing?
Git integration is very simple, but this is not my main concern.
Great question. Definitely don't modify your production code in place.
One recommended pattern is to keep separate folders in your workspace for dev-staging-prod. Do your dev work and then run tests in staging before finally promoting to production.
You can use the Databricks CLI to pull and push a notebook from one folder to another without breaking existing code. Going one step further, you can incorporate this pattern with git to sync with version control. In either case, the CLI gives you programmatic access to the workspace and that should make it easier to update code for production jobs.
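A hedged sketch of that CLI-based promotion, written as an Azure Pipelines step since the question mentions Azure Databricks: the workspace folder paths and the DATABRICKS_* variables are placeholders, and the commands assume the legacy databricks-cli, so check them against your CLI version.

```yaml
steps:
  - script: pip install databricks-cli
    displayName: Install Databricks CLI
  - script: |
      # Pull the reviewed notebooks out of the dev folder and push them into staging
      databricks workspace export_dir -o /Shared/dev/my_project ./notebooks
      databricks workspace import_dir -o ./notebooks /Shared/staging/my_project
    displayName: Promote notebooks from dev to staging
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)
      DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
```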
Regarding your second point about IDEs - Databricks offers Databricks Connect, which lets you use your IDE while running commands on a cluster. Based on your pain points I think this is a great solution for you, as it will give you more visibility into the functions you have defined, and so on. You can also write and run your unit tests this way.
Once you have your scripts ready to go, you can always import them into the workspace as a notebook and run it as a job. Also note that you can run .py scripts as a job using the REST API.
I personally prefer to package my code and copy the *.whl package to DBFS, where I can install the tested package and import it.
Edit: To be more explicit.
The notebook used in production can't be modified without breaking the production. When I want to develop an update, I duplicate the notebook, change the source code until I'm satisfied, then I replace the production notebook with my new notebook.
This can be solved either by having separate DEV/TST/PRD environments, or by having versioned packages that can be modified in isolation. I'll clarify below.
My browser is not an IDE! I can't easily go to a function definition. I have lots of notebooks, if I want to modify or even just see the documentation of a function, I need to switch to the notebook where this function is defined.
Is there a way to do efficient and systematic testing?
Yes, using the versioned packages method I mentioned in combination with databricks-connect, you can use your IDE, implement tests, and have proper Git integration.
Git integration is very simple, but this is not my main concern.
Built-in Git integration is actually very poor when working in bigger teams. You can't develop in the same notebook simultaneously, as there's a flat and linear accumulation of changes that are shared with your colleagues. Besides that, you have to link and unlink repositories, which is prone to human error, causing your notebooks to be synchronized into the wrong folders and runs to break because notebooks can't be imported. I advise you to also use my packaging solution.
The packaging solution works as follows:
- On your desktop, install pyspark
- Download some anonymized data to work with
- Develop your code with small bits of data, writing unit tests
- When ready to test on big data, uninstall pyspark and install databricks-connect
- When performance and integration are sufficient, push the code to your remote repo
- Create a build pipeline that runs the automated tests and builds the versioned package
- Create a release pipeline that copies the versioned package to DBFS (a sketch of both pipelines follows this list)
- In a "runner notebook", accept "process_date" and "data folder/filepath" as arguments, and import modules from your versioned package
- Pass the arguments to your module to run your tested code
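As an illustration of the last few steps, here is a hedged Azure Pipelines sketch of the build and release stages: run the unit tests, build the wheel, then copy it to DBFS with the databricks-cli. Project names, paths, and the DATABRICKS_* variables are placeholders, not part of the original answer.

```yaml
stages:
  - stage: Build
    jobs:
      - job: TestAndPackage
        steps:
          - script: |
              pip install -r requirements.txt wheel
              pytest tests/
            displayName: Run unit tests
          - script: python setup.py bdist_wheel
            displayName: Build versioned wheel
          - publish: dist
            artifact: wheel
  - stage: Release
    dependsOn: Build
    jobs:
      - job: DeployToDBFS
        steps:
          - download: current
            artifact: wheel
          - script: |
              pip install databricks-cli
              # Copy the versioned package to DBFS so the runner notebook can install it
              databricks fs cp --recursive --overwrite "$(Pipeline.Workspace)/wheel" dbfs:/packages/my_project/
            displayName: Copy wheel to DBFS
            env:
              DATABRICKS_HOST: $(DATABRICKS_HOST)
              DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
```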
The way we are doing it:
- Integrate the dev notebooks with Azure DevOps.
- Create custom build and deployment tasks for notebook, job, package and cluster deployments. This is fairly easy to do with the Databricks REST API: https://docs.databricks.com/dev-tools/api/latest/index.html
- Create a release pipeline for Test, Staging and Production deployments.
- Deploy to Test and test (a sketch of such a test step follows below).
- Deploy to Staging and test.
- Deploy to Production.
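A hedged sketch of one such "deploy and test" step: an Azure Pipelines script that triggers a test run of a deployed notebook through the Databricks Jobs REST API. The notebook path, cluster id, and host/token variables are placeholders rather than anything from the original answer.

```yaml
steps:
  - script: |
      # Submit a one-off run of the deployed smoke-test notebook on an existing test cluster
      curl --silent --request POST \
        --header "Authorization: Bearer $(DATABRICKS_TOKEN)" \
        --header "Content-Type: application/json" \
        --data '{
          "run_name": "smoke-test",
          "existing_cluster_id": "$(TEST_CLUSTER_ID)",
          "notebook_task": { "notebook_path": "/Shared/test/smoke_test" }
        }' \
        "$(DATABRICKS_HOST)/api/2.0/jobs/runs/submit"
    displayName: Trigger smoke test run on Databricks
```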
Hope this can help.

Variable substitution in build pipeline

There are tons of resources online on how to replace JSON configuration files in a release pipeline, like this one. I configured this and it works. However, we have multiple integration tests which reach the database too, and these tests are run during build time. I haven't seen any option yet to replace config values in the build pipeline. Does it exist? Or do I really have to use a third-party custom task?
Microsoft recently added an out-of-the-box task for this, called File Transform. It's currently in preview, but it works really well! I haven't had any issues whatsoever with it, and you configure it the same way as in the release pipeline. Would recommend this any day!
Below you can see my configuration.
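Since the configuration was originally shown as a screenshot, here is a hedged YAML sketch of what such a File Transform step can look like in a build pipeline; the folder path and target file pattern are placeholders for your own test project layout.

```yaml
steps:
  - task: FileTransform@1
    displayName: Substitute config values for integration tests
    inputs:
      # Point the task at the build output of the integration test project (placeholder path)
      folderPath: '$(Build.SourcesDirectory)/tests/IntegrationTests/bin/$(BuildConfiguration)'
      fileType: json
      targetFiles: '**/appsettings.json'
```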
There is no out-of-the-box task just for replacing tokens/values in files (in the release pipeline, too, the task is Azure App Service Deploy, which is not only for replacing JSON configuration).
You need to use an external extension from here or write a PowerShell script for that.

VSTS CI/CD definitions as scripts

We are building a microservices-based architecture and we have 50-odd CI and 50-odd CD pipelines. Is there a way to script the CI/CD build and release definitions? We want this to be a repeatable process and do not want to leave it to our DevOps engineer(s), as it is prone to errors. Please note that I am not talking about ARM (which we already use). Is there a way to do the above?
For builds, you can use YAML builds, which are currently in preview.
For releases, there's nothing equivalent yet.
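To illustrate the YAML builds mentioned above, a minimal build definition checked in next to each service could look like the following sketch; the trigger, agent pool, and build commands are placeholders for your own stack.

```yaml
# azure-pipelines.yml (illustrative)
trigger:
  branches:
    include:
      - main

pool:
  vmImage: ubuntu-latest

steps:
  - script: dotnet build --configuration Release
    displayName: Build service
  - script: dotnet test --configuration Release
    displayName: Run tests
```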
You could always use the REST APIs to extract the build and release definitions as JSON, source control them, and then create a continuous delivery pipeline to update them when the definitions in source control change.
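A hedged sketch of that REST-based approach: a pipeline step that exports a release definition as JSON so it can be committed to source control. The organisation, project, definition id, and api-version are placeholders; check what your VSTS/Azure DevOps instance supports, and note that the job's access token must be allowed to reach the release API.

```yaml
steps:
  - script: |
      # Export release definition 42 from the release management API as JSON
      curl --silent -u ":$(System.AccessToken)" \
        "https://vsrm.dev.azure.com/my-org/my-project/_apis/release/definitions/42?api-version=5.1" \
        > release-definition-42.json
    displayName: Export release definition 42
```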

Is there a way to add a coverage report to GitLab?

I understand that GitLab has support for Jenkins CI, but what I need is a lot less than that.
I have a Rails application and get the coverage from the tests using simplecov. It generates HTML output in a directory by running a rake task. I would like to see the current coverage through GitLab. Is there a simple way to integrate this report with GitLab?
I fear there is still no easy way to integrate code coverage reports, but GitLab now supports build jobs for your code (integrated since version 8.0). Unfortunately, you have to implement your solution yourself by writing a custom .gitlab-ci.yml to run your coverage tests. For viewing the reports, you can specify the generated artifacts or publish them on GitLab Pages.
For more information, see here: https://about.gitlab.com/gitlab-ci/
Additionally, you can have GitLab parse the test output to display a short code coverage figure (enable builds and output test coverage):
Go to "Project Settings" -> Builds
Add a regular expression to "Test coverage parsing" (a simplecov-compatible example is included in the sketch below)
See Publish Code Coverage Report with GitLab Pages
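Putting the pieces together, a hedged .gitlab-ci.yml sketch for the simplecov case could look like the following: keep the HTML report as an artifact, publish it with GitLab Pages, and let GitLab parse the coverage percentage from the job log (the coverage: keyword is the newer per-job equivalent of the project-settings regex). Job names, the rake task, and the regex are illustrative.

```yaml
test:
  script:
    - bundle exec rake test
  # Matches simplecov's "(82.0%) covered" line in the job log
  coverage: '/\(\d+\.\d+\%\) covered/'
  artifacts:
    paths:
      - coverage/

pages:
  stage: deploy
  dependencies:
    - test
  script:
    - mv coverage/ public/
  artifacts:
    paths:
      - public
  only:
    - master
```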
The short answer: unfortunately, there is no easy way to do this.
The longer answer:
GitLab does not have Jenkins support yet.
What you basically need is a service like GitLab CI or Jenkins CI which starts simplecov and posts the output back to GitLab. Unfortunately, GitLab does not offer such functionality yet.
But I know of other organizations which do have a Jenkins service for GitLab that automatically comments on git pushes with the Jenkins result.
You now (June 2020, GitLab 13.1) have code coverage history, in addition to test coverage parsing.
Graph code coverage changes over time for a project
All too often, a project has a code coverage target but development teams might not have much visibility into which direction that target value is trending over time.
There needs to be an easier way to track changes in code coverage over time without that extra hassle.
The Code Coverage graph now provides better visibility into how code coverage is trending over time.
It displays a simple graph of the coverage value(s) calculated in pipelines.
See Documentation and Issue
With GitLab 13.6 (November 2020), you also have (although not in the free tier):
Display code coverage data for selected projects
In 13.4, we released the first iteration of Code Coverage data for Groups, which enables you to compare the coverage of multiple projects and download the data in a single file from a single screen. However, to analyze the data, you had to open the file to check it manually, and probably import it into a spreadsheet for further analysis.
In GitLab 13.6, you can now select specific projects in a group to see their latest coverage values directly in GitLab itself without needing to download a file or waste development time accessing code coverage data. We welcome feedback on the functionality and possible iterations for this feature in our feedback issue.
See Documentation and Issue.

Deployment after CI builds

I'm pretty new to CI, so bear with me here. I have just set up an instance of TeamCity on a local machine, and I can clearly see the benefits.
The one thing we want to understand is how we can manage the deployment aspect of CI. What we really want to achieve are two builds:
1) We check in to our source repository and the CI server notices the change and compiles the code, tests etc.
2) We manually trigger a build that compiles the code, copies the code to a remote server and update its IIS mappings.
Now the first build is pretty much wrapped up with TeamCity. But I assume that the deployment aspect of this is going to involve some scripting (NAnt, MSBuild, Rake etc.) - is this correct?
If this is the case, I can see that transferring files from the build machine to a remote server will be OK, but will we be able to update IIS mappings without being on the same network? For that matter, where is THE correct place to deploy a CI server - should it live on the same network as the apps we deploy?
Finally, we have been (rather unorthodoxly) using IronRuby to run Rake scripts as our build runner. This is simply because we like Rake, but if we were to look at NAnt/MSBuild, do they have any baked-in tasks that would simplify what we are trying to achieve?
Cheers, Chris.
We use MSBuild exclusively, but that's just a choice; I am sure NAnt and the others do things just as well. We only publish to a dev environment (for dev testing) and a stage environment (where QA actually tests). I would not suggest putting the production push on this, as the temptation to force builds might be too great for some people.
We use some of the MSBuild Community Tasks.
