According to the docs:
Since the cache is shared between jobs, if you’re using different paths for different jobs, you should also set a different cache:key otherwise cache content can be overwritten.
This sounds weird to me.
So if I'm "using different paths for different jobs" like this
job_a:
  cache:
    paths:
      - binaries/

job_b:
  cache:
    paths:
      - node_modules/
How could the cache be overwritten?
Does it mean node_modules will overwrite binaries because the cache key is the same?
Does anyone know the details of how caching is implemented in GitLab?
Does it work like this?
$job_cache_key = $job_cache_key || 'default';  // fall back to the default key when none is set
if ($cache[$job_cache_key]) {
    return $cache[$job_cache_key];             // a cache already exists under this key: reuse it
}
$cache[$job_cache_key] = $job_cache;           // otherwise store this job's cache under the key
return $job_cache;
Cache keys in GitLab mimic Rails caching, although, as app/models/concerns/faster_cache_keys.rb mentions:
# Rails' default "cache_key" method uses all kind of complex logic to figure
# out the cache key. In many cases this complexity and overhead may not be
# needed.
#
# This method does not do any timestamp parsing as this process is quite
# expensive and not needed when generating cache keys. This method also relies
# on the table name instead of the cache namespace name as the latter uses
# complex logic to generate the exact same value (as when using the table
# name) in 99% of the cases.
The pipeline itself starts with initializing its local cache: lib/gitlab/ci/pipeline/seed/build/cache.rb
You can see a cache example in spec/lib/gitlab/ci/pipeline/seed/build/cache_spec.rb
Does it mean node_modules will overwrite binaries because the cache key is the same?
No: each job uses its own set of paths, which overrides any paths defined in the global cache.
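For illustration, here is a minimal sketch of what the docs are recommending (the key names are made up): give each job its own cache:key when the paths differ, so one job's cache archive does not replace the other's under the shared default key.
job_a:
  cache:
    key: binaries-cache        # illustrative key name
    paths:
      - binaries/

job_b:
  cache:
    key: node-modules-cache    # illustrative key name
    paths:
      - node_modules/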
gitlab-org/gitlab-runner issue 2838 asks about a cache per job, and gives this example:
stages:
  - build
  - build-image

# the following line is the global cache configuration but also defines an anchor with the name of "cache"
# you can refer to the anchor and reuse this cache configuration in your jobs.
# you can also add and replace properties
# In the job definitions you will find examples.
# for more information regarding reuse in YAML files, see https://blog.daemonl.com/2016/02/yaml.html
cache: &cache
  paths:
    - api/node_modules/
    - global/node_modules/
    - frontend/node_modules/

# first job, it does not have an explicit cache definition:
# therefore it uses the global cache definition!
build-app:
  stage: build
  image: node:8
  before_script:
    - yarn
    - cd frontend
  script:
    - npm run build

# a job in a later stage, have a look at the cache block!
# it "inherits" from the global cache block and adds the "policy: pull" key / value
build-image-api:
  stage: build-image
  image: docker
  dependencies: []
  cache:
    <<: *cache
    policy: pull
  before_script:
    # .... and so on
That inheritance mechanism is also documented in the "Inherit global config, but override specific settings per job" section of the caching documentation:
You can override cache settings without overwriting the global cache by using anchors.
For example, if you want to override the policy for one job:
cache: &global_cache
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/
    - public/
    - vendor/
  policy: pull-push

job:
  cache:
    # inherit all global cache settings
    <<: *global_cache
    # override the policy
    policy: pull
1+ year later (Q2 2021):
See GitLab 13.11 (April 2021)
Use multiple caches in the same job
GitLab CI/CD provides a caching mechanism that saves precious development time when your jobs are running. Previously, it was impossible to configure multiple cache keys in the same job. This limitation may have caused you to use artifacts for caching, or use duplicate jobs with different cache paths. In this release, we provide the ability to configure multiple cache keys in a single job which will help you increase your pipeline performance.
https://about.gitlab.com/images/13_11/cache.png -- Use multiple caches in the same job
See Documentation and Issue.
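Since 13.11 the cache keyword also accepts a list, so one job can declare several caches. A rough sketch of what that can look like (the lock files and paths are assumptions, not taken from the question):
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby/
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
  script:
    - bundle install --path=vendor/ruby
    - yarn install --cache-folder .yarn-cache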
Related
I have a GitLab job that does not seem to update the repository before being run. Sometimes it leaves some files in their old state and runs the script... Any ideas?
For instance when I have a
packagePython:
  stage: package
  script:
    - .\scripts\PackagePython.ps1
  tags:
    - myServer
  cache:
    paths:
      - .\python\cache\
  only:
    changes:
      - python/**/*
I finally managed to understand what was happening:
I realised that the gitlab-runner did not use exactly the same path for each run on my server, and my script assumed that it did... So I ended up pointing to a build made on the wrong path.
I guess if you think that it is not updating the repository (like I did), make sure you are not referencing a hardcoded path or package in your scripts that could refer to a previous version!
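As a sketch of the fix (assuming the job runs under a PowerShell shell, as the .ps1 script suggests): build paths from the variables the runner provides for each job, such as CI_PROJECT_DIR, instead of hard-coding a checkout location.
packagePython:
  stage: package
  script:
    # work relative to the directory the runner actually checked out for this job;
    # under a PowerShell shell the predefined variable is exposed as $env:CI_PROJECT_DIR
    - cd "$env:CI_PROJECT_DIR"
    - .\scripts\PackagePython.ps1
  tags:
    - myServer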
I've been trying to get more familiar with GitLab's CI functionality and find the idea of splitting up a CI pipeline into multiple separate jobs interesting. This would allow me to maintain one project of "known jobs" and include them in other projects.
So far, I have something like this:
$ ls
jobA.yaml jobB.yaml jobC.yaml jobD.yaml
Those 4 are all identical (for now), and have the following:
job-name:
  stage: my-stage   # Might be needed to differentiate later on
  tags:             # used to figure out where/how the job should be done: directly on a server, in a container, etc.
    - runner-tag
  script:
    - echo "beep beep"
In the actual .gitlab-ci.yaml I want to use, I would then (I think) put something like this. In this case, I would use the jobs defined in the project for itself:
include:
  project: '$CI_PROJECT_PATH'
  file: "*.yaml"

stages:
  - my-stage
That gives me back a linter error though. Perhaps I'm misreading the documentation, but I think that should be possible somehow....
This should be a comment, but I can't put formatted code in one.
We use a main .yml which just includes all the others; it doesn't use wildcards like yours.
Have you tried changing "file" to "local", with the leading "- "?
include:
- template: Code-Quality.gitlab-ci.yml
- local: '/.gitlab/py.yml'
- local: '/.gitlab/static.yml'
- local: '/.gitlab/lint.yml'
- local: '/.gitlab/docs.yml'
- local: '/.gitlab/publish.yml'
According to the docs, wildcard includes are only possible with local. Furthermore, you need to move your jobA.yaml into a subdirectory, otherwise a wildcard at the top level would include your .gitlab-ci.yml as well.
So the following works with jobA.yaml in a config/ directory:
include:
  - local: 'config/*.yaml'

stages:
  - my-stage
I have the following logic in my .gitlab-ci.yml:
stages:
  - build
  - deploy

job_make_zip:
  tags:
    - test123
  image: node:10.19
  stage: build
  script:
    - npm install
    - make
    - make source-package
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
  artifacts:
    when:
    paths:
      - test.bz2
    expire_in: 2 days
When the job runs, I see the following message:
17 Restoring cache
18 Checking cache for master...
19 No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
20 Successfully extracted cache
I'm new to GitLab, so I can't tell whether this is an error or not. I basically don't want to have to download the same npm modules every single time this build runs.
I found a similar post here: GitLab CI caching key
But I'm already using the correct GitLab CI variable.
Any suggestions would be appreciated.
In my GitLab CI setup at home I am getting this warning (in my case I am not considering it to be an error) in all of my build jobs. According to https://gitlab.com/gitlab-org/gitlab/-/issues/201861 and https://gitlab.com/gitlab-org/gitlab-runner/-/issues/16097 there seem to be cases where this message should be taken seriously.
This is especially true if you upload the cache to a particular URL (and later download and extract it from there) that several runners use to fetch and sync the cache. In the general case though, where the cache is stored on a single GitLab Runner rather than on a shared location used by several runners, I don't think this message has any real meaning. On my runners, which are usually project- or group-specific, this was never a problem and the cache was always properly extracted from the local copy.
How can I configure GitLab CI to store the SSTATE_DIR and the DL_DIR between several jobs? Currently, bitbake rebuilds the complete project every time, which is very time consuming, so I would like to reuse the sstate cache. I tried caching, but build time effectively increases due to the large zip/unzip overhead.
I don't even need a shared sstate between several projects, just a method to store the output between jobs.
I'm using GitLab 11.2.3 with a shell executor as the runner.
Thanks a lot!
In version 11.10, GIT_CLEAN_FLAGS was added, which could be used to achieve what you want to do with the shell executor.
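As a rough sketch of that approach with the shell executor (the directory names and tag are just examples): git clean's -e option excludes paths from cleaning, so the sstate and download directories inside the workspace survive between jobs.
variables:
  # keep these workspace directories when the runner cleans the working copy between jobs
  GIT_CLEAN_FLAGS: -ffdx -e sstate-cache/ -e downloads/

build:
  stage: build
  tags:
    - my-shell-runner        # example tag for the shell-executor runner
  script:
    - bitbake core-image-minimal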
For completeness: when using the docker executor, this can be achieved by using docker volumes, which are persistent across builds.
If you're only using one runner for this, you could potentially use GIT_STRATEGY: none, which would re-use the project workspace for the following job (see the relevant documentation). However, this isn't very reliable if you have multiple jobs running that require the runner, as jobs started from different pipelines could pollute the workspace.
Another way, if you're still using one runner, is to simply copy the directories out of the workspace and back in for the job that requires them.
Otherwise, you may potentially be out of luck, and have to wait for the sticky runners issue.
You can reuse a shared-state cache between jobs simply as follows:
Specify the path to the sstate-cache directory in the .yml file of your
gitlab-ci pipeline. An example fragment from one of mine:
myrepo.yml
stages:
  ...
  ...

variables:
  ...
  TCM_MACHINE: buzby2
  ...
  SSTATE_CACHE: /sstate-cache/$TCM_MACHINE/PLAT3/
  PURGE_SSTATE_CACHE: N
  ...
In my case /sstate-cache/$TCM_MACHINE/PLAT3/ is a directory in the docker container
that runs the build. This path is mounted in the docker container from a
permanent sstate cache directory on the build-server's filesystem, /var/bitbake/sstate-cache/<machine-id>/PLAT3.
PURGE_SSTATE_CACHE is overridable by a private variable
in the pipeline schedule settings so that I can optionally delete the cache for a squeaky clean
build.
Ensure that the setting of SSTATE_CACHE is appended to the bitbake conf/local.conf
file of the build, e.g.
.build_image: &build_image
  stage: build
  tags:
    ...
  before_script:
    ...
  script:
    - echo "SSTATE_DIR ?= \"$SSTATE_CACHE\"" >> conf/local.conf
    ...
Apply the same pattern to DL_DIR if you use it (see the sketch below).
Variables you use in the .yml file can be overridden by gitlab-ci trigger or schedule variables. See Priority of variables.
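Here is a hedged sketch of that DL_DIR pattern; the DL_CACHE variable name is mine, not from the answer above, and it would be mounted into the container the same way as the sstate path.
variables:
  SSTATE_CACHE: /sstate-cache/$TCM_MACHINE/PLAT3/
  DL_CACHE: /dl-cache/$TCM_MACHINE/          # hypothetical download-cache mount

.build_image: &build_image
  stage: build
  script:
    - echo "SSTATE_DIR ?= \"$SSTATE_CACHE\"" >> conf/local.conf
    - echo "DL_DIR ?= \"$DL_CACHE\"" >> conf/local.conf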
I want to set up a build pipeline in Concourse for my web application. The application is built using Node.
The plan is to do something like this:
                                         ,-> build style guide -> dockerize
source code -> npm install -> npm test -|
                                         `-> build website -> dockerize
The problem is, after npm install, a new container is created, so the node_modules directory is lost. I want to pass node_modules into the later tasks, but because it is "inside" the source code, Concourse doesn't like it and gives me:
invalid task configuration:
you may not have more than one input or output when one of them has a path of '.'
Here's my job setup:
jobs:
  - name: test
    serial: true
    disable_manual_trigger: false
    plan:
      - get: source-code
        trigger: true
      - task: npm-install
        config:
          platform: linux
          image_resource:
            type: docker-image
            source: {repository: node, tag: "6" }
          inputs:
            - name: source-code
              path: .
          outputs:
            - name: node_modules
          run:
            path: npm
            args: [ install ]
      - task: npm-test
        config:
          platform: linux
          image_resource:
            type: docker-image
            source: {repository: node, tag: "6" }
          inputs:
            - name: source-code
              path: .
            - name: node_modules
          run:
            path: npm
            args: [ test ]
Update 2016-06-14
Inputs and outputs are just directories. So you put whatever you want to output into an output directory, and you can then pass it to another task in the same job. Inputs and outputs cannot overlap, so in order to do it with npm, you'd have to copy either node_modules or the entire source folder from the input folder to an output folder, then use that in the next task (see the sketch after this update).
This doesn't work between jobs though. The best suggestion I've seen so far is to use a temporary git repository or bucket to push everything up. There has to be a better way of doing this, since part of what I'm trying to do is avoid huge amounts of network IO.
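A sketch of that copy-to-output approach, assuming the whole source tree (with node_modules installed) is copied into an output called built-source which the test task then consumes; all names besides source-code are illustrative:
- task: npm-install
  config:
    platform: linux
    image_resource:
      type: docker-image
      source: {repository: node, tag: "6"}
    inputs:
      - name: source-code
    outputs:
      - name: built-source        # illustrative output name
    run:
      path: sh
      args:
        - -exc
        - |
          cp -a source-code/. built-source/   # copy the source tree into the output
          cd built-source
          npm install                          # node_modules now lives in the output dir
- task: npm-test
  config:
    platform: linux
    image_resource:
      type: docker-image
      source: {repository: node, tag: "6"}
    inputs:
      - name: built-source        # consume the previous task's output
    run:
      path: sh
      args: [-exc, "cd built-source && npm test"]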
There is a resource specifically designed for this use case of npm between jobs. I have been using it for a couple of weeks now:
https://github.com/ymedlop/npm-cache-resource
It basically allows you to cache the first npm install and simply inject it as a folder into the next job of your pipeline. You could quite easily set up your own caching resources by reading the source of that one as well, if you want to cache more than node_modules.
I am actually using this npm-cache-resource in combination with a Nexus proxy to speed up the initial npm install further.
Be aware that some npm packages have native bindings that need to be built against the standard library of the container's Linux distribution, so if you move between different types of containers a lot you may run into issues with musl libc and the like. In that case I recommend either standardizing on the same container type throughout the pipeline or rebuilding the node_modules in question...
There is a similar one for gradle (on which the npm one is based):
https://github.com/projectfalcon/gradle-cache-resource
This doesn't work between jobs though.
This is by design. Each step (get, task, put) in a Job is run in an isolated container. Inputs and outputs are only valid inside a single job.
What connects Jobs is Resources. Pushing to git is one way. It'd almost certainly be faster and easier to use a blob store (e.g. S3) or file store (e.g. FTP).
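For the blob-store route, a hedged sketch of what an S3-backed handoff could look like with the community s3 resource; the bucket, regexp and credential names are assumptions:
resources:
  - name: node-modules-tarball
    type: s3
    source:
      bucket: my-ci-cache                     # assumed bucket name
      region_name: us-east-1
      regexp: node_modules-(.*).tgz           # versioned tarball of node_modules
      access_key_id: ((aws_access_key_id))
      secret_access_key: ((aws_secret_access_key))

jobs:
  - name: install
    plan:
      - get: source-code
        trigger: true
      - task: npm-install-and-pack
        config:
          platform: linux
          image_resource:
            type: docker-image
            source: {repository: node, tag: "6"}
          inputs:
            - name: source-code
          outputs:
            - name: packed
          run:
            path: sh
            args:
              - -exc
              - |
                cd source-code && npm install
                tar czf ../packed/node_modules-$(date +%s).tgz node_modules
      - put: node-modules-tarball             # upload so a later job can "get" it
        params: {file: packed/node_modules-*.tgz}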