GitLab CI/CD cache expires and therefore build fails - node.js

I have an AWS CDK application in TypeScript and a pretty simple GitLab CI/CD pipeline with two stages, which takes care of the deployment:
image: node:latest

stages:
  - dependencies
  - deploy

dependencies:
  stage: dependencies
  only:
    refs:
      - master
    changes:
      - package-lock.json
  script:
    - npm install
    - rm -rf node_modules/sharp
    - SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    policy: push

deploy:
  stage: deploy
  only:
    - master
  script:
    - npm run deploy
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    policy: pull
npm run deploy is just a wrapper for the cdk command.
But sometimes the node_modules cache (probably) expires: the deploy stage is unable to fetch it and therefore fails:
Restoring cache
Checking cache for ***-protected...
WARNING: file does not exist
Failed to extract cache
I checked that the cache key is the same as the one created by the dependencies stage in the previous pipeline run.
I suppose this happens because the pipeline sometimes doesn't run for multiple weeks, since I rarely contribute to this repo. I tried to find the root cause but failed miserably. I understand that a cache can expire after some time (30 days by default, from what I found), but I would expect the pipeline to recover from that by running the dependencies stage, even though package-lock.json wasn't updated.
So my question is simply: what am I missing? Is my understanding of caching in GitLab CI/CD completely wrong? Do I have to turn on some feature flag?
Basically, my ultimate goal is to skip building node_modules as often as possible, but not to fail on a non-existent cache, even if I don't run the pipeline for months.

A cache is only a performance optimization and is not guaranteed to always be available. Your suspicion that the cache has expired is most likely correct, so you'll need a fallback in your deploy setup.
One thing you could do is change your dependencies job to:
- Always run
- Both push and pull the cache
- Short-circuit the job if the cache was found
E.g. something like this:
dependencies:
  stage: dependencies
  only:
    refs:
      - master
  script:
    - |
      # short-circuit: the cache was restored, nothing to do
      if [[ -d node_modules ]]; then
        exit 0
      fi
    - npm install
    - rm -rf node_modules/sharp
    - SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules

Note that the changes: rule is gone, so the job always runs on master, and the cache has no policy, so it uses the default pull-push.
See also this related question.
If you want to avoid spinning up unnecessary jobs, you could also consider merging the dependencies and deploy jobs and taking a similar approach in the combined job.
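A minimal sketch of such a combined job, reusing the npm run deploy wrapper and the sharp rebuild steps from the question; the install steps run only when the cache was not restored:

deploy:
  stage: deploy
  only:
    refs:
      - master
  script:
    - |
      # rebuild dependencies only if the cache was missing or expired
      if [[ ! -d node_modules ]]; then
        npm install
        rm -rf node_modules/sharp
        SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux --libc=glibc sharp
      fi
    - npm run deploy
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules

With the default pull-push policy, the job uploads a fresh cache whenever it had to reinstall, so the next run can skip the install again.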

Related

GitLab cache key: files - file does not exist

I have a short pipeline, and it constantly fails because it cannot find the cache:
node:
  stage: Install
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [node_modules]
    - key: npm
      paths: [.npm]
  rules:
    - changes:
        - package.json
        - package-lock.json
  script:
    - npm i

mocha:
  stage: Test
  script:
    - npm test
  cache:
    - key:
        files:
          - package.json
          - package-lock.json
        prefix: node
      paths: [node_modules]
      policy: pull
This pipeline ran fine on Branch 1.
On Branch 2 the node job was skipped, as expected; however, the mocha job failed with:
Checking cache for node-313ff968911abee510931abad7ccd29ed21954b5-17-non_protected...
WARNING: file does not exist
Failed to extract cache
This is strange, because it should use the cache from the Branch 1 pipeline run.
I use shared runners with merge pipelines, if that's important.
Even though this is an old question, this might save someone else's day when using caches across different branches. From what I understand, your cache works as expected on the feature branch, which is probably non-protected, and the failure appears when you create a merge request to merge your changes into a protected branch, probably dev/main.
Basically, protected and non-protected branches don't share the cache in GitLab CI by default, as mentioned in the docs:
By default, protected and non-protected branches do not share the cache. However, you can change this behavior.
https://docs.gitlab.com/ee/ci/caching/
Use the same cache for all branches
Introduced in GitLab 15.0.
If you do not want to use cache key names, you can have all branches (protected and unprotected) use the same cache.
The cache separation with cache key names is a security feature and should only be disabled in an environment where all users with the Developer role are highly trusted.
To use the same cache for all branches:
1. On the top bar, select Main menu > Projects and find your project.
2. On the left sidebar, select Settings > CI/CD.
3. Expand General pipelines.
4. Clear the Use separate caches for protected branches checkbox.
5. Select Save changes.

Gitlab-CI avoid unnecessary rebuilds of react portion of project

I have a stage in my CI pipeline (gitlab-ci) as follows:
build_node:
  stage: Build Prerequisites
  only:
    - staging
    - production
    - ci
  image: node:15.5.0
  artifacts:
    paths:
      - http
  cache:
    key: "node_modules"
    paths:
      - ui/node_modules
  script:
    - cd ui
    - yarn install --network-timeout 600000
    - CI=false yarn build
    - mv build ../http
The UI, however, is not the only part of the project; other parts have their own build processes. So whenever we commit changes to only those other files, this stage gets rerun every time, even if nothing in the ui folder changed.
Is there a way to have GitLab cache this, or otherwise not rebuild it every time when there were no changes? Any change that should trigger a rebuild would be under the ui folder. Just have it reuse the older build if possible?
This is possible in recent GitLab versions using the rules:changes keyword:

rules:
  - changes:
      - ui/**/*

(Note: ui/**/* also matches files in nested subdirectories, whereas a plain ui/* only matches files directly inside ui.)
Link: https://docs.gitlab.com/ee/ci/jobs/job_control.html#variables-in-ruleschanges
This triggers the job only when something inside the ui folder has changed.
Check this link for more info: https://docs.gitlab.com/ee/ci/yaml/#ruleschanges
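For reference, a sketch of how the build_node job from the question might look with this rule; since only: and rules: cannot be combined in one job, the branch filter moves into the rule as well (branch names taken from the question):

build_node:
  stage: Build Prerequisites
  image: node:15.5.0
  rules:
    # run only on these branches, and only when something under ui/ changed
    - if: '$CI_COMMIT_BRANCH =~ /^(staging|production|ci)$/'
      changes:
        - ui/**/*
  artifacts:
    paths:
      - http
  cache:
    key: "node_modules"
    paths:
      - ui/node_modules
  script:
    - cd ui
    - yarn install --network-timeout 600000
    - CI=false yarn build
    - mv build ../http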

Caching npm dependencies in circleci

I'm setting up CI for an existing Express server project that lives in my repo's backend/core folder. Starting with just basic setup and linting. I was able to get npm install and linting to work, but I wanted to cache the dependencies so that it wouldn't take 4 minutes on each push.
I used the caching scheme described in the CircleCI docs, but it still seemed to run the full install each time. Or, if it was using the cached dependencies, it still installed grpc each time, which took a while. Any ideas what I can do?
My config.yml for reference:
# Use the latest 2.1 version of CircleCI pipeline process engine. See: https://circleci.com/docs/2.0/configuration-reference
# default executors
executors:
  core-executor:
    docker:
      - image: 'cimg/base:stable'

commands:
  init_env:
    description: initialize environment
    steps:
      - checkout
      - node/install
      - restore_cache:
          keys:
            # when lock file changes, use increasingly general patterns to restore cache
            - node-v1-{{ .Branch }}-{{ checksum "backend/core/package-lock.json" }}
            - node-v1-{{ .Branch }}-
            - node-v1-
      - run: npm --prefix ./backend/core install
      - save_cache:
          paths:
            - ~/backend/core/usr/local/lib/node_modules # location depends on npm version
          key: node-v1-{{ .Branch }}-{{ checksum "backend/core/package-lock.json" }}

jobs:
  install-node:
    executor: core-executor
    steps:
      - checkout
      - node/install
      - run: node --version
      - run: pwd
      - run: ls -A
      - run: npm --prefix ./backend/core install
  lint:
    executor: core-executor
    steps:
      - init_env
      - run: pwd
      - run: ls -A
      - run: ls backend
      - run: ls backend/core -A
      - run: npm --prefix ./backend/core run lint

orbs:
  node: circleci/node@4.1.0

version: 2.1

workflows:
  test_my_app:
    jobs:
      # - install-node
      - lint
      #     requires:
      #       - install-node
I think the best thing to do is to use npm ci, which is faster. The best explanation of this is here: https://stackoverflow.com/a/53325242/4410223. Even though it reinstalls every time, it is consistent, which makes it better than caching node_modules. When using it, I'm not sure what the point of keeping a cache in your pipeline is, but caching still seems to be recommended alongside npm ci.
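The usual pairing (an assumption here, not from the linked answer) is to cache npm's download directory rather than node_modules: npm ci always deletes node_modules, but it can still reuse already-downloaded tarballs from ~/.npm, npm's default cache location on Linux. A sketch of the relevant steps:

- restore_cache:
    keys:
      - npm-v1-{{ checksum "backend/core/package-lock.json" }}
# npm ci removes node_modules but reuses downloaded packages from ~/.npm
- run: npm --prefix ./backend/core ci
- save_cache:
    paths:
      - ~/.npm
    key: npm-v1-{{ checksum "backend/core/package-lock.json" }}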
However, the best way to do this is to just use the node orb you already have in your config. A single step, node/install-packages, will do all that work for you: you can replace your restore_cache, npm install, and save_cache steps with it. You can even see all the steps it performs here: https://circleci.com/developer/orbs/orb/circleci/node#commands-install-packages. Just open the command source and look at the steps on line 71.
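A sketch of the simplified command under that approach; app-dir is the orb's parameter for projects whose package-lock.json is not at the repo root (the path is assumed from the question):

commands:
  init_env:
    description: initialize environment
    steps:
      - checkout
      - node/install
      # handles restore_cache, install, and save_cache internally
      - node/install-packages:
          app-dir: ./backend/core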

GitLab CI caching key

Say I have the following step in my .gitlab-ci.yml file:
setup_vue:
  image: ....
  stage: setup
  script:
    - cd vue/
    - npm install --no-audit
  cache:
    key: node-cache
    paths:
      - vue/node-modules/
I see:
Checking cache for node-cache-1...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
And after the script runs:
Creating cache node-cache-1...
Created cache
WARNING: vue/node-modules/: no matching files
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally.
Job succeeded
When I try to retrieve the cache in the next stage, like so:
test_vue:
  image: ....
  stage: test
  cache:
    key: node-cache
  script:
    - cd docker-hotreload-vue
    - cqc src
    - npm test
It doesn't try to retrieve any cache and just runs the script (which obviously fails). According to the GitLab docs, this is the correct way to do it. (I'm using a Docker runner.)
Here's the output I get:
Fetching changes...
fatal: remote origin already exists.
Removing vue/node_modules/
HEAD is now at ....
Checking out ...
Skipping Git submodules setup
$ cd docker-hotreload-vue
$ cqc src
I am using tags to ensure the same runner is executing the jobs.
Try updating your key to the below:

cache:
  key: ${CI_COMMIT_REF_SLUG}
This solved my problem. I had three stages: build, test, and package. Without the key set to ${CI_COMMIT_REF_SLUG}, the cache only worked for the test stage. After updating the key, the package stage can also extract the cache properly.
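Applied to the jobs from the question, that would look roughly like this; note the paths entry uses vue/node_modules/ with an underscore, matching the directory name shown in the job log, rather than the vue/node-modules/ spelling from the original config, which is what produced the "no matching files" warning:

setup_vue:
  image: ....
  stage: setup
  script:
    - cd vue/
    - npm install --no-audit
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - vue/node_modules/

test_vue:
  image: ....
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - vue/node_modules/
    policy: pull
  script:
    - cd docker-hotreload-vue
    - cqc src
    - npm test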

GitLab CI job takes 20+ minutes

With GitLab CI I am using a simple .yml file. I have defined various stages that run sequentially, and I have set up a cache for node_modules. The problem is that the node_modules cache is actually slowing down the process. The cache is required to keep node_modules the same across stages (each stage automatically clears node_modules for some reason).
When building locally, this whole process takes less than 2 minutes, but on the CI machine it takes between 20 and 25 minutes. Looking into how GitLab CI works internally, I learned that it zips the node_modules files (about 36k small files), and that process is extremely slow.
tl;dr: What is the proper way to handle node_modules caching with GitLab CI without uploading node_modules as artifacts? I would like to avoid uploading artifacts that are over 400 MB.
See configuration below:
cache:
  untracked: true
  key: "%CI_COMMIT_REF_NAME%"
  paths:
    - node_modules

stages:
  - install
  - eslint-check
  - eslint
  - prettier
  - test
  - dist

# install dependencies
install:
  stage: install
  script:
    - yarn install
  environment:
    name: development

# run eslint-check
eslint-check:
  stage: eslint-check
  script:
    - yarn eslint-check
  environment:
    name: development

# Other scripts below
It would appear that a solution for this is coming, as the issue has been discussed for almost two years and a milestone has been set, so it should be resolved eventually:
https://gitlab.com/gitlab-org/gitlab-runner/issues/1797
