GitLab CI caching key

Say I have the following step in my .gitlab-ci.yml file:
setup_vue:
  image: ....
  stage: setup
  script:
    - cd vue/
    - npm install --no-audit
  cache:
    key: node-cache
    paths:
      - vue/node-modules/
I see:
Checking cache for node-cache-1...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
And after the script runs:
Creating cache node-cache-1...
Created cache
WARNING: vue/node-modules/: no matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Job succeeded
When I try to get the cache on the next step like so:
test_vue:
  image: ....
  stage: test
  cache:
    key: node-cache
  script:
    - cd docker-hotreload-vue
    - cqc src
    - npm test
It doesn't try to retrieve any cache and just runs the script (which obviously fails). According to the GitLab docs this is the correct way to do this. (I'm using a Docker runner.)
Here's the output I get:
Fetching changes...
fatal: remote origin already exists.
Removing vue/node_modules/
HEAD is now at ....
Checking out ...
Skipping Git submodules setup
$ cd docker-hotreload-vue
$ cqc src
I am using tags to ensure the same runner is executing the jobs.

Try updating your key to the below:
cache:
  key: ${CI_COMMIT_REF_SLUG}
This solved my problem. I had 3 stages: build, test, package. Without the key set to ${CI_COMMIT_REF_SLUG}, the cache only worked for the test stage. After updating the key, the package stage can also extract the cache properly.
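For reference, here is a minimal sketch of how both jobs from the question could look with that key. Two assumptions beyond the answer itself: the paths list is repeated in the pulling job (every job that wants the cache must declare the paths), and the directory is spelled node_modules rather than node-modules, which is what the "no matching files" warning in the question points at:

setup_vue:
  image: ....
  stage: setup
  script:
    - cd vue/
    - npm install --no-audit
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - vue/node_modules/   # note: node_modules, not node-modules

test_vue:
  image: ....
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - vue/node_modules/   # the pulling job must declare the paths too
    policy: pull            # assumption: this job only reads the cache
  script:
    - cd docker-hotreload-vue
    - cqc src
    - npm test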

Related

GitLab CI/CD cache expires and therefore the build fails

I have an AWS CDK application in TypeScript and a pretty simple GitLab CI/CD pipeline with 2 stages, which takes care of the deployment:
image: node:latest

stages:
  - dependencies
  - deploy

dependencies:
  stage: dependencies
  only:
    refs:
      - master
    changes:
      - package-lock.json
  script:
    - npm install
    - rm -rf node_modules/sharp
    - SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    policy: push

deploy:
  stage: deploy
  only:
    - master
  script:
    - npm run deploy
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    policy: pull
npm run deploy is just a wrapper for the cdk command.
But for some reason it sometimes happens that the node_modules cache (probably) expires: the deploy stage is simply unable to fetch it and therefore fails:
Restoring cache
Checking cache for ***-protected...
WARNING: file does not exist
Failed to extract cache
I checked that the cache name is the same as the one built by the dependencies stage in the previous pipeline run.
I suppose it happens because this CI/CD often doesn't run for multiple weeks, since I rarely contribute to that repo. I tried to find the root cause but failed miserably. I understand that a cache can expire after some time (30 days by default, from what I found), but I would expect the CI/CD to recover from that by running the dependencies stage even though package-lock.json wasn't updated.
So my question is simply: "What am I missing? Is my understanding of caching in GitLab's CI/CD completely wrong? Do I have to turn on some feature switch?"
Basically, my ultimate goal is to skip building node_modules as often as possible, without failing on a non-existent cache even if I don't run the pipeline for months.
A cache is only a performance optimization; it is not guaranteed to always be available. Your expectation that the cache might have expired is most likely correct, and thus you'll need a fallback in your deploy script.
One thing you could do is change your dependencies job to:
Always run
Both push & pull the cache
Short-circuit the job if the cache was found
E.g. something like this:
dependencies:
  stage: dependencies
  only:
    refs:
      - master
    # the `changes: package-lock.json` filter is dropped so the job always runs
  script:
    - |
      # short-circuit: if the cache restored node_modules, there is nothing to do
      if [[ -d node_modules ]]; then
        exit 0
      fi
    - npm install
    - rm -rf node_modules/sharp
    - SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    # no `policy:` means the default pull-push, i.e. both pull and push
See also this related question.
If you want to avoid spinning up unnecessary jobs, you could also consider merging the dependencies & deploy jobs and taking a similar short-circuit approach in the combined job, as sketched below.
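For example, the combined job could look roughly like this (a sketch reusing the commands from the question; the short-circuit now only guards the install step):

deploy:
  stage: deploy
  only:
    refs:
      - master
  script:
    - |
      # install dependencies only when the cache did not restore them
      if [[ ! -d node_modules ]]; then
        npm install
        rm -rf node_modules/sharp
        SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
      fi
    - npm run deploy
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules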

GitLab cache key: files - file does not exist

I have a short pipeline, and it constantly fails because it cannot find the cache:
node:
  stage: Install
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [node_modules]
    - key: npm
      paths: [.npm]
  rules:
    - changes:
        - package.json
        - package-lock.json
  script:
    - npm i

mocha:
  stage: Test
  script:
    - npm test
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [ node_modules ]
      policy: pull
This pipeline ran well on Branch 1. On Branch 2 the node job was skipped, as expected; however, the mocha job failed with:
Checking cache for node-313ff968911abee510931abad7ccd29ed21954b5-17-non_protected...
WARNING: file does not exist
Failed to extract cache
This is strange, because it should use the cache from the Branch 1 pipeline run.
I use shared runners with merge pipelines, if that's important.
Even though this is an old question, this might save someone else's day when using caches across different branches. From what I understand, your cache works as expected on your feature branch, which is probably a non-protected branch, and it fails when you create a merge request to merge your changes into a protected branch, probably dev/main.
Basically, protected and non-protected branches don't share the cache in GitLab CI by default, as mentioned in the docs:
By default, protected and non-protected branches do not share the cache. However, you can change this behavior.
https://docs.gitlab.com/ee/ci/caching/
Use the same cache for all branches
Introduced in GitLab 15.0.
If you do not want to use cache key names, you can have all branches (protected and unprotected) use the same cache.
The cache separation with cache key names is a security feature and should only be disabled in an environment where all users with Developer role are highly trusted.
To use the same cache for all branches:
1. On the top bar, select Main menu > Projects and find your project.
2. On the left sidebar, select Settings > CI/CD.
3. Expand General pipelines.
4. Clear the Use separate caches for protected branches checkbox.
5. Select Save changes.
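If you would rather not flip the project-wide setting, newer GitLab versions also support the cache:unprotect keyword (introduced in 15.8, if I recall correctly; verify against your instance), which shares a single job's cache between protected and unprotected branches. The same security caveat applies. A sketch applied to the mocha job from the question:

mocha:
  stage: Test
  script:
    - npm test
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [ node_modules ]
      policy: pull
      unprotect: true   # share this cache entry across protected and unprotected branches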

Why does GitLab CI not find my cached folder?

I have a list of CI jobs running in my GitLab instance, and the caching does not work as expected.
This is how my docu-generation job ends:
[09:19:33] Documentation generated in ./documentation/ in 4.397 seconds using gitbook theme
Creating cache angular...
WARNING: frontend/node_modules: no matching files
frontend/documentation: found 136 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded
I then start a deployment job (to GitLab Pages), but it fails because it doesn't find the documentation folder:
$ cp -r frontend/documentation .public/frontend
cp: cannot stat 'frontend/documentation': No such file or directory
This is the cache config of the generation job:
generate_docu_frontend:
  image: node:12.19.0
  stage: build
  cache:
    key: angular
    paths:
      - frontend/node_modules
      - frontend/documentation
  needs: ["download_angular"]
and this is for deployment:
deploy_documentation:
  stage: deploy
  cache:
    - key: angular
      paths:
        - frontend/node_modules
        - frontend/documentation
      policy: pull
    - key: laravel
      paths:
        - backend/vendor
        - backend/public/docs
      policy: pull
Does anyone know why my documentation folder is missing?
The message in your job output No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally. just means that your runners are not using Amazon S3 to store your cache, or something similar like Minio.
Without S3/Minio, the cache only lives on the runner that first ran the job and cached the resources. This means that the next time the job runs and it happens to be picked up by a different runner, it won't have the cache. In that case, you'd run into an error like this.
There are a couple of ways around this:
Configure your runners to use S3/Minio (Minio has an open source, free-to-use license if you're interested in hosting it yourself).
Only use one runner (not a great solution since generally more runners means faster pipelines and this would slow things down considerably, though it would solve the cache problem).
Use tags. Tags are used to ensure that a job runs on a specific runner (or runners). Let's say, for example, that 1 out of your 10 runners has access to your production servers, but all of them have access to your lower-environment servers. Your lower-env jobs can run on any runner, but your production deployment job has to run on the one runner with prod access. You can do this by putting a tag on the runner called, say, prod-access, and putting the same tag on the prod deploy job. This ensures that the job runs on the runner with prod access. The same mechanism can be used here to ensure the cache is available (see the sketch after this list).
Use artifacts instead of cache. I'll explain this option below as it's really what you should be using for this use case.
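As a minimal illustration of the tags approach above (the job name and deploy script are hypothetical; prod-access is the example tag from option 3):

deploy_production:
  stage: deploy
  tags:
    - prod-access   # hypothetical tag; the job only runs on runners carrying it
  script:
    - ./deploy.sh   # hypothetical deploy script

Pinning cache-dependent jobs works the same way: give the runner that holds the cache a tag and put that tag on every job that needs the cache.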
Let's briefly explain the difference between Cache and Artifacts:
Cache is generally best used for dependency installation like npm or Composer (for PHP projects). When you have a job that runs npm ci or composer install, you don't want it to run every single time your pipeline runs when the dependencies haven't necessarily changed, as that wastes time. Use the cache keyword to cache the dependencies so that subsequent pipelines don't have to install them again.
Artifacts are best used when you need to share files or directories between jobs in the same pipeline. For example, after installing npm dependencies, you might need to use the node_modules directory in another job in the pipeline. Artifacts are also uploaded to the GitLab server by the runner at the end of the job, as opposed to being stored locally on the runner that ran the job. All previous artifacts will be downloaded for all subsequent jobs, unless controlled with either dependencies or needs.
Artifacts are the better choice for your use case.
Let's update your .gitlab-ci.yml file to use artifacts instead of cache:
stages:
  - build
  - deploy

generate_docu_frontend:
  image: node:12.19.0
  stage: build
  script:
    - ./generate_docs.sh # this is just a representation of whatever steps you run to generate the docs
  artifacts:
    paths:
      - frontend/node_modules
      - frontend/documentation
    expire_in: 6 hours # your GitLab instance will have a default, you can override it like this
    when: on_success # don't attempt to upload the docs if generating them failed

deploy_documentation:
  stage: deploy
  script:
    - ls # just an example showing that frontend/node_modules and frontend/documentation are present
    - deploy.sh # whatever else you need to run this job

GitLab pipeline throwing an error "Error: Could not find or load main class"

While trying to run the GitLab pipeline, I am getting an error
"Error: Could not find or load main class Testing\GitLab-Runner\builds\EgKZ847y\0\sandeshmms\LearningSelenium..m2.repository"
Also, it is giving this message:
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Below is the console message:
Running with gitlab-runner 14.2.0 (58ba2b95)
on my-runner1 EgKZ847y
Preparing the "shell" executor 00:00
Using Shell executor...
Preparing environment
Running on HOMEPC...
Getting source from Git repository 00:10
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in D:/Java Testing/GitLab-Runner/builds/EgKZ847y/0/sandeshmms/LearningSelenium/.git/
Checking out 41ee697d as develop...
git-lfs/2.12.1 (GitHub; windows 386; go 1.14.10; git 85b28e06)
Skipping Git submodules setup
Restoring cache 00:02
Version: 14.2.0
Git revision: 58ba2b95
Git branch: 14-2-stable
GO version: go1.13.8
Built: 2021-08-22T19:47:56+0000
OS/Arch: windows/386
Checking cache for default-14...
Runtime platform arch=386 os=windows pid=5420 revision=58ba2b95 version=14.2.0
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
Executing "step_script" stage of the job script 00:03
$ echo "Testing Job Triggered"
Testing Job Triggered
$ echo $CI_PROJECT_DIR
D:\Java Testing\GitLab-Runner\builds\EgKZ847y\0\sandeshmms\LearningSelenium
$ mvn $MAVEN_OPTS clean test
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Error: Could not find or load main class Testing\GitLab-Runner\builds\EgKZ847y\0\sandeshmms\LearningSelenium..m2.repository
Uploading artifacts for failed job 00:02
Version: 14.2.0
Git revision: 58ba2b95
Git branch: 14-2-stable
GO version: go1.13.8
Built: 2021-08-22T19:47:56+0000
OS/Arch: windows/386
Uploading artifacts...
Runtime platform arch=386 os=windows pid=4312 revision=58ba2b95 version=14.2.0
WARNING: target/surefire-reports/*: no matching files
ERROR: No files to upload
Cleaning up file based variables 00:01
ERROR: Job failed: exit status 1
Below is the complete YAML file:
stages:
  - test

variables:
  # This will suppress any download for dependencies and plugins or upload messages which would clutter the console log.
  # `showDateTime` will show the passed time in milliseconds. You need to specify `--batch-mode` to make this work.
  MAVEN_OPTS: "-Dmaven.repo.local=$CI_PROJECT_DIR/.m2/repository"

# Cache downloaded dependencies and plugins between builds.
# To keep cache across branches add 'key: "$CI_JOB_NAME"'
cache:
  paths:
    - .m2/repository

test job:
  stage: test
  tags:
    - testing
  script:
    - echo "Testing Job Triggered"
    - echo $CI_PROJECT_DIR
    - 'mvn $MAVEN_OPTS clean test'
    - echo "Testing Job Finished"
  artifacts:
    when: always
    paths:
      - target/surefire-reports/*
But if I remove the variables and cache sections from the YAML file, and in the script section just run mvn clean test, then the build runs fine.
Also, it is downloading the Maven repository to 'C:\Windows\System32\config\systemprofile\.m2\repository'. Any reason why it is downloading to this directory?
Can anyone please help with this?
The message No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted. just means that your GitLab instance isn't configured to use a service like AWS S3 or MinIO to store your cached items. Without it, the cache can only be stored locally, where your GitLab runners are running. This also means that the cache stored on one runner cannot be shared with another runner, which is most likely how you ran into this error. You also don't have a key, so the runner doesn't know when to download which cached items.
Here's an example of a job building NPM dependencies that uses the cache and a key for the specific ref name (a branch, commit, or tag):
...
Run NPM Install:
  stage: build
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - node_modules
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules
...
In this job, in the first pipeline for the branch, commit, or tag under CI_COMMIT_REF_NAME, it will run npm ci and upload node_modules as an artifact for jobs later in the pipeline to use. However, if a pipeline for the same branch, commit, or tag runs again, instead of running npm ci it will download the cached node_modules directory and upload that as an artifact.
See Caching in GitLab CI/CD for more information on caching, and Distributed Caching for information on using S3 or Minio to distribute your cache across all runners.

Stop GitLab runner from removing a directory

I have a directory which is generated during a build, and it should not be deleted in subsequent builds. I tried to keep the directory using cache in .gitlab-ci.yml:
cache:
  key: "$CI_BUILD_REF_NAME"
  untracked: true
  paths:
    - target_directory/

build-runner1:
  stage: build
  script:
    - ./build-platform.sh target_directory
In the first build a cache.zip is generated, but in subsequent builds the target_directory is deleted and the cache.zip is extracted, which takes a very long time. Here is a log of the second build:
Running with gitlab-ci-multi-runner 1.11.
on Runner1
Using Shell executor...
Running on Runner1...
Fetching changes...
Removing target_directory/
HEAD is now at xxxxx Update .gitlab-ci.yml
From xxxx
Checking out xxx as master...
Skipping Git submodules setup
Checking cache for master...
Successfully extracted cache
Is there a way to make the GitLab runner not remove the directory in the first place?
What you need is to use job artifacts:
Artifacts is a list of files and directories which are attached to a
job after it completes successfully.
.gitlab-ci.yml file:
your job:
  before_script:
    - do something
  script:
    - do another thing
    - do something to generate your zip file (example: myFiles.zip)
  artifacts:
    paths:
      - myFiles.zip
After a job finishes, if you visit the job's specific page, you can see that there is a button for downloading the artifacts archive.
Note
If you need to pass artifacts between different jobs, you need to use dependencies.
GitLab has good documentation about that if you really have this need: http://docs.gitlab.com/ce/ci/yaml/README.html#dependencies
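A minimal sketch of passing a directory between jobs with artifacts and dependencies (build-platform.sh and build-runner1 are taken from the question; the verify job and its stage are hypothetical):

build-runner1:
  stage: build
  script:
    - ./build-platform.sh target_directory
  artifacts:
    paths:
      - target_directory/

verify:
  stage: test
  dependencies:
    - build-runner1          # download only the artifacts produced by build-runner1
  script:
    - ls target_directory/   # the directory restored from the artifact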
