Why does GitLab CI not find my cached folder?

I have a list of CI jobs running in my GitLab instance, and the caching does not work as expected.
This is how my docu-generation job ends:
[09:19:33] Documentation generated in ./documentation/ in 4.397 seconds using gitbook theme
Creating cache angular...
00:02
WARNING: frontend/node_modules: no matching files
frontend/documentation: found 136 matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Created cache
Job succeeded
I then start a deployment job (to GitLab Pages), but it fails because it doesn't find the documentation folder:
$ cp -r frontend/documentation .public/frontend
cp: cannot stat 'frontend/documentation': No such file or directory
This is the cache config of the generation job:
generate_docu_frontend:
  image: node:12.19.0
  stage: build
  cache:
    key: angular
    paths:
      - frontend/node_modules
      - frontend/documentation
  needs: ["download_angular"]
And this is for deployment:
deploy_documentation:
  stage: deploy
  cache:
    - key: angular
      paths:
        - frontend/node_modules
        - frontend/documentation
      policy: pull
    - key: laravel
      paths:
        - backend/vendor
        - backend/public/docs
      policy: pull
Does anyone know why my documentation folder is missing?

The message in your job output, "No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.", just means that your runners are not using Amazon S3 (or something similar, like MinIO) to store your cache.
Without S3/MinIO, the cache only lives on the runner that first ran the job and cached the resources. This means that the next time the job runs and happens to be picked up by a different runner, it won't have the cache, and you run into an error like yours.
There are a couple of ways around this:
Configure your runners to use S3/MinIO (MinIO has an open-source, free-to-use license if you're interested in hosting it yourself).
Only use one runner (not a great solution since generally more runners means faster pipelines and this would slow things down considerably, though it would solve the cache problem).
Use tags. Tags are used to ensure that a job runs on a specific runner (or set of runners). Say, for example, that 1 of your 10 runners has access to your production servers, but all of them have access to your lower-environment servers. Your lower-env jobs can run on any runner, but your production deployment job has to run on the runner with prod access. You can do this by putting a tag on that runner, say prod-access, and putting the same tag on the prod deploy job; this ensures the job runs on the runner with prod access. The same mechanism can be used here to make sure both jobs land on the runner that holds the cache (see the sketch after this list).
Use artifacts instead of cache. I'll explain this option below as it's really what you should be using for this use case.
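As a rough illustration of the tags option, here is a minimal sketch; the docs-runner tag name and the generate_docs.sh script are assumptions, not taken from the original pipeline:
# both jobs carry the same runner tag, so they are picked up by the same
# runner and can share its locally stored cache
generate_docu_frontend:
  image: node:12.19.0
  stage: build
  tags:
    - docs-runner        # hypothetical tag configured on one specific runner
  cache:
    key: angular
    paths:
      - frontend/documentation
  script:
    - ./generate_docs.sh # placeholder for whatever generates the docs

deploy_documentation:
  stage: deploy
  tags:
    - docs-runner        # same tag, same runner, same local cache
  cache:
    key: angular
    paths:
      - frontend/documentation
    policy: pull
  script:
    - cp -r frontend/documentation .public/frontend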
Let's briefly explain the difference between Cache and Artifacts:
Cache is generally best used for dependency installation, like npm or composer (for PHP projects). When you have a job that runs npm ci or composer install, you don't want it to run every single time your pipeline runs when you haven't necessarily changed the dependencies, as that wastes time. Use the cache keyword to cache the dependencies so that subsequent pipelines don't have to install them again.
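For illustration, the dependency-caching pattern described above looks roughly like this (a sketch only; the job name and the lockfile-based key are assumptions, not part of the original pipeline):
# cache node_modules keyed on the lockfile, so dependencies are only
# re-installed from scratch when package-lock.json actually changes
install_dependencies:
  image: node:12.19.0
  stage: build
  cache:
    key:
      files:
        - frontend/package-lock.json
    paths:
      - frontend/node_modules
  script:
    - cd frontend && npm ci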
Artifacts are best used when you need to share files or directories between jobs in the same pipeline. For example, after installing npm dependencies, you might need the node_modules directory in another job in the pipeline. Artifacts are also uploaded to the GitLab server by the runner at the end of the job, as opposed to being stored locally on the runner that ran the job. All previous artifacts will be downloaded for all subsequent jobs, unless controlled with either dependencies or needs.
Artifacts are the better choice for your use case.
Let's update your .gitlab-ci.yml file to use artifacts instead of cache:
stages:
  - build
  - deploy

generate_docu_frontend:
  image: node:12.19.0
  stage: build
  script:
    - ./generate_docs.sh # this is just a representation of whatever steps you run to generate the docs
  artifacts:
    paths:
      - frontend/node_modules
      - frontend/documentation
    expire_in: 6 hours # your GitLab instance will have a default, you can override it like this
    when: on_success # don't attempt to upload the docs if generating them failed

deploy_documentation:
  stage: deploy
  script:
    - ls # just an example showing that frontend/node_modules and frontend/documentation are present
    - deploy.sh # whatever else you need to run this job

Related

Use cache with multiple images in GitLab CI/CD

I am making a GitLab CI/CD pipeline that uses two different images.
One of them requires the installation of some packages using npm. To avoid installing them multiple times, I've added a cache.
Let's see this example :
stages:
  - build
  - quality

cache:
  paths:
    - node_modules/

build-one:
  image: node:latest
  stage: build
  script:
    - npm install <some package>

build-two:
  image: foo_image:latest
  stage: build
  script:
    - some cmd

quality:
  image: node:latest
  stage: quality
  script:
    - <some cmd using the previously installed package>
Having two different Docker images forces me to specify the image inside each job definition. From my tests, the cache isn't actually used, and the command executed by the quality job fails because the package isn't installed.
Is there a solution to this problem ?
Many thanks !
Kev'.
There can be two cases:
The same runner is used to run all the jobs. In this case, the cache as you've specified it should work fine.
Different runners are used to run different jobs. Suppose the build job runs on runner 1 and the quality job runs on runner 2; then the cache will only be present on runner 1.
To make use of caching in case 2, you will have to use distributed caching.
Then, when runner 1 runs the build job, it will push the cache to S3, and runner 2 will pull the cache during the quality job and can then use it.

How to access artifacts in next stage in GitLab CI/CD

I am trying to build a GitLab CI/CD pipeline for the first time. I have two stages, build and deploy. The job in the build stage produces artifacts, and the job in the deploy stage then uploads those artifacts to AWS S3. Both jobs use the same runner but different Docker images.
default:
  tags:
    - dev-runner

stages:
  - build
  - deploy

build-job:
  image: node:14
  stage: build
  script:
    - npm install
    - npm run build:prod
  artifacts:
    paths:
      - deploy/build.zip

deploy-job:
  image: docker.xx/xx/gitlab-templates/awscli
  stage: deploy
  script:
    - aws s3 cp deploy/build.zip s3://mys3bucket
The build-job successfully creates the artifacts. The GitLab documentation says artifacts will be automatically downloaded and available in the next stage, but it does not specify where and how these artifacts will be available in the next stage.
Question
In the deploy-job, will the artifacts be available at the same location, i.e. deploy/build.zip?
The artifacts should be available to the second job in the same location where the first job saved them using the artifacts directive.
I think this question already has an answer on the GitLab forum:
https://forum.gitlab.com/t/access-artifact-in-next-task-to-deploy/9295
You may also need to make sure the jobs run in the correct order using the dependencies directive, which is also mentioned in the forum discussion accessible via the link above.
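As an illustration, a minimal sketch of the dependencies directive applied to the deploy-job from the question (the dependencies key and the ls check are the only additions; the rest is copied from the question):
deploy-job:
  image: docker.xx/xx/gitlab-templates/awscli
  stage: deploy
  dependencies:
    - build-job            # only download artifacts produced by build-job
  script:
    - ls deploy/           # build.zip should be present at the same relative path
    - aws s3 cp deploy/build.zip s3://mys3bucket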

GitLab cache key: files - file does not exist

I have a short pipeline, and it constantly fails because it cannot find the cache:
node:
  stage: Install
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [node_modules]
    - key: npm
      paths: [.npm]
  rules:
    - changes:
        - package.json
        - package-lock.json
  script:
    - npm i

mocha:
  stage: Test
  script:
    - npm test
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [ node_modules ]
      policy: pull
This pipeline runs well on Branch 1.
On Branch 2, the node job is skipped, as expected; however, the mocha job fails with
Checking cache for node-313ff968911abee510931abad7ccd29ed21954b5-17-non_protected...
WARNING: file does not exist
Failed to extract cache
This is strange, because it should use the cache from the Branch 1 pipeline run.
I use shared runners with merge pipelines, if that's important.
Even though this is an old question, this might save someone else's day when using the cache across different branches. From what I understand, your cache works as expected on your feature branch, which is probably non-protected, and stops working when you create a merge request to merge your changes into a protected branch, probably dev/main.
Basically, protected and non-protected branches don't share the cache in GitLab CI by default, as mentioned in the docs:
By default, protected and non-protected branches do not share the cache. However, you can change this behavior.
https://docs.gitlab.com/ee/ci/caching/
Use the same cache for all branches
Introduced in GitLab 15.0.
If you do not want to use cache key names, you can have all branches (protected and unprotected) use the same cache.
The cache separation with cache key names is a security feature and should only be disabled in an environment where all users with Developer role are highly trusted.
To use the same cache for all branches:
On the top bar, select Main menu > Projects and find your project.
On the left sidebar, select Settings > CI/CD.
Expand General pipelines.
Clear the Use separate caches for protected branches checkbox.
Select Save changes.

GitLab CI job takes 20+ minutes

With GitLab CI I am using a simple .yml file. I have defined various stages to run synchronously. I have set up a cache for node_modules, but the problem is that this cache is actually slowing down the process. The cache is required to keep node_modules the same across each stage (each stage automatically clears node_modules for some reason).
When building locally, this whole process takes less than 2 minutes, but on the CI machine it takes between 20 and 25 minutes. Looking into how GitLab CI works internally, I've learned that it zips the node_modules files (about 36K small files), and that process is extremely slow.
tl;dr: What is the proper way to handle node_modules caching with GitLab CI without uploading node_modules as artifacts? I would like to avoid uploading artifacts that are over 400MB.
See configuration below:
cache:
  untracked: true
  key: "%CI_COMMIT_REF_NAME%"
  paths:
    - node_modules

stages:
  - install
  - eslint-check
  - eslint
  - prettier
  - test
  - dist

# install dependencies
install:
  stage: install
  script:
    - yarn install
  environment:
    name: development

# run eslint-check
eslint-check:
  stage: eslint-check
  script:
    - yarn eslint-check
  environment:
    name: development

# Other scripts below
It would appear that there will be a solution for this in the future as the issue has been discussed here for almost two years. A milestone has been set so this can be resolved eventually.
https://gitlab.com/gitlab-org/gitlab-runner/issues/1797
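In the meantime, a common mitigation (not part of the answer above, just a sketch) is to cache the package manager's own download cache instead of node_modules, so far fewer files have to be archived per job; each job then runs yarn install against that warm cache. The .yarn-cache folder name is an assumption:
cache:
  key: "%CI_COMMIT_REF_NAME%"
  paths:
    - .yarn-cache/              # yarn's download cache: far fewer files than node_modules

install:
  stage: install
  script:
    - yarn install --cache-folder .yarn-cache
  environment:
    name: development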

Stop GitLab runner from removing a directory

I have a directory which is generated during a build and should not be deleted in subsequent builds. I tried to keep the directory using cache in .gitlab-ci.yml:
cache:
  key: "$CI_BUILD_REF_NAME"
  untracked: true
  paths:
    - target_directory/

build-runner1:
  stage: build
  script:
    - ./build-platform.sh target_directory
In the first build a cache.zip is generated, but for the next builds the target_directory is deleted and the cache.zip is extracted instead, which takes a very long time. Here is a log of the second build:
Running with gitlab-ci-multi-runner 1.11.
on Runner1
Using Shell executor...
Running on Runner1...
Fetching changes...
Removing target_directory/
HEAD is now at xxxxx Update .gitlab-ci.yml
From xxxx
Checking out xxx as master...
Skipping Git submodules setup
Checking cache for master...
Successfully extracted cache
Is there a way to make the GitLab runner not remove the directory in the first place?
What you need is to use job artifacts:
Artifacts is a list of files and directories which are attached to a
job after it completes successfully.
.gitlab-ci.yml file:
your job:
  before_script:
    - do something
  script:
    - do another thing
    - do something to generate your zip file (example: myFiles.zip)
  artifacts:
    paths:
      - myFiles.zip
After a job finishes, if you visit the job's specific page, you can see that there is a button for downloading the artifacts archive.
Note
If you need to pass artifacts between different jobs, you need to use dependencies.
GitLab has good documentation about that if you really have this need: http://docs.gitlab.com/ce/ci/yaml/README.html#dependencies
