GitLab pipeline: execute a job when a cache is not available - gitlab

We use GitLab pipelines and are trying to implement a cache mechanism so that node packages are not downloaded multiple times by different jobs in a pipeline. Our scenario is as follows:
The pipeline should check whether a cache is available in the local cache directory.
If found, it should pull the cached files and move on to the SonarQube scan.
If not found, the pipeline should execute a specific job that runs npm install, and once that is finished it should move on to the SonarQube scan.
As far as I can tell from the documentation, there is no option to run a specific job only when there is no cache available in the local cache location. Any suggestions?
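No answer is included in this excerpt, but a common pattern (an assumption, not confirmed by the thread) is to let a prepare job always run and make it a cheap no-op when the cache was restored, since GitLab cannot skip a job based on cache availability alone. A minimal .gitlab-ci.yml sketch; the job names, cache key, node image, and scanner command are placeholders:

# Sketch only - names and paths are placeholders, not from the thread
stages:
  - prepare
  - scan

install_dependencies:
  stage: prepare
  image: node:18
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    # If the cache was restored, node_modules already exists and npm ci is skipped.
    - |
      if [ -d node_modules ]; then
        echo "Cache hit - skipping npm install"
      else
        npm ci
      fi

sonarqube_scan:
  stage: scan
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull   # downstream job only pulls the cache, never re-uploads it
  script:
    - sonar-scanner

With policy: pull, the scan job reuses whatever the prepare job cached without spending time uploading the cache again.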

Related

Depend on named artifact in GitLab pipeline

The first job in my GitLab pipeline builds multiple large artifacts of different types:
Applications that should be packaged as Docker images
NuGet packages
Database migrations
The job publishes multiple artifacts, at the moment one for each type. The total size of the artifacts is in the realm of hundreds of megabytes.
In the next stage, multiple jobs run in parallel to process and publish the artifacts from the build step. A significant amount of time is spent downloading all artifacts from the build job, even though each job only needs one artifact. How can I configure GitLab CI so that a job depends on a specific artifact from a job?
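No answer is included above; one common approach (an assumption, not from the thread) is to split the build into one artifact-producing job per type, so each downstream job can use needs with artifacts: true and fetch only the artifact it requires. A sketch with placeholder job names and scripts:

# Sketch - job names and scripts are placeholders
stages:
  - build
  - publish

build_nuget:
  stage: build
  script: ./build.sh nuget          # placeholder build command
  artifacts:
    paths:
      - packages/

build_migrations:
  stage: build
  script: ./build.sh migrations     # placeholder build command
  artifacts:
    paths:
      - migrations/

publish_nuget:
  stage: publish
  needs:
    - job: build_nuget
      artifacts: true               # download only this job's artifacts
  script: ./publish-nuget.sh        # placeholder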

(How) can I use a job from a different pipeline?

Gitlab CI/CD
I want to run a job for the merge request that needs jar artifacts built during a job run in another pipeline (for commits).
I don't want to build them again, because it takes too much time and the job will fail.
Can I do that?
I checked
needs:
  - project:
but that's only available with Premium licences.
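For reference, the cross-project needs syntax mentioned above looks roughly like this (project path, job name, ref, and script are placeholders), and as noted it requires a Premium licence:

mr-test-job:
  needs:
    - project: my-group/my-other-project   # placeholder project path
      job: build-jars                      # placeholder job in the other pipeline
      ref: main
      artifacts: true                      # download that job's jar artifacts
  script:
    - ./run-tests-with-jars.sh             # placeholder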

How to run gitlab-runner job locally that requires artifacts

We have a pipeline with many jobs and the last job failed. I'm trying to debug the issue, but the job requires artifacts from previous jobs.
How can I run this job locally with gitlab-runner so it has access to these artifacts?
That's not possible (yet).
See the limitations of gitlab-runner exec compared to regular CI (artifacts → not available).
Consider upvoting the issue to get this fixed.

How to Scale out Gitlab EE

Currently I am running the whole GitLab EE as a single container. I need to scale out the service so that it can support more users and more operations (pull/push/merge requests, etc.) simultaneously.
I need to run a Redis cluster of its own
I need to run a separate PostgreSQL cluster
I need to integrate Elasticsearch for search
But how can I scale out the remaining core GitLab services? Do they support a scale-out architecture?
gitlab workhorse
unicorn (gitlab rails)
sidekiq (gitlab rails)
gitaly
gitlab shell
Do they support a scale-out architecture?
Not exactly, considering the GitLab Omnibus image is one package with bundled dependencies.
But I never experienced so much traffic that it needed to be split up and scaled out.
There is, though, a proposal for splitting up the Omnibus image: gitlab-org/omnibus-gitlab issue 1800.
It points to gitlab-org/build/CNG, which does just what you are looking for:
Each directory contains the Dockerfile for a specific component of the infrastructure needed to run GitLab.
rails - The Rails code needed for both API and web.
unicorn - The Unicorn container that exposes Rails.
sidekiq - The Sidekiq container that runs async Rails jobs.
shell - Running GitLab Shell and OpenSSH to provide git over ssh, and authorized keys support from the database.
gitaly - The Gitaly container that provides access to the Git repositories.
The other option, for Kubernetes, is charts/gitlab:
The gitlab chart is the best way to operate GitLab on Kubernetes. This chart contains all the required components to get started, and can scale to large deployments.
Some of the key benefits of this chart and corresponding containers are:
Improved scalability and reliability
No requirement for root privileges
Utilization of object storage instead of NFS for storage
The default deployment includes:
Core GitLab components: Unicorn, Shell, Workhorse, Registry, Sidekiq, and Gitaly
Optional dependencies: Postgres, Redis, Minio
An auto-scaling, unprivileged GitLab Runner using the Kubernetes executor
Automatically provisioned SSL via Let's Encrypt.
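As an illustration only, scaling the stateless components of charts/gitlab is done through the chart's values; the key names below are an assumption from memory (the Rails tier was called unicorn in older chart versions, webservice later), so verify them against the chart's documentation:

# values.yaml sketch - key names are illustrative, check the chart docs
gitlab:
  webservice:          # Rails web tier (unicorn in older chart versions)
    minReplicas: 2
    maxReplicas: 8
  sidekiq:
    minReplicas: 1
    maxReplicas: 4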
Update Sept. 2020:
GitLab 13.4 offers one feature which can help scaling out GitLab on-premise:
Gitaly Cluster majority-wins reference transactions (beta)
Gitaly Cluster allows Git repositories to be replicated on multiple warm Gitaly nodes. This improves fault tolerance by removing single points of failure.
Reference transactions, introduced in GitLab 13.3, cause changes to be broadcast to all the Gitaly nodes in the cluster, but only the Gitaly nodes that vote in agreement with the primary node persist the changes to disk.
If all the replica nodes dissented, only one copy of the change would be persisted to disk, creating a single point of failure until asynchronous replication completed.
Majority-wins voting improves fault tolerance by requiring a majority of nodes to agree before persisting changes to disk. When the feature flag is enabled, writes must succeed on multiple nodes. Dissenting nodes are automatically brought in sync by asynchronous replication from the nodes that formed the quorum.
See Documentation and Issue.

Automated Spark testing environment

My problem: I am developing a Spark extension and I would like to run tests and performance benchmarks at scale before making the changes public. Currently such tests are a bit too manual: I compile & package my libraries, copy the jar files to a cluster where I have a private Spark deployment, restart Spark, then fire off tests and benchmarks by hand. After each test I manually inspect logs and console output.
Could someone with more experience offer hints on how to make this more automatic? I am particularly interested in:
Ability to integrate with Github & Jenkins. Ideally I would only have to push a commit to the GitHub repo, then Jenkins would automatically pull and build, add the new libraries to a Spark environment, start Spark & trigger the tests and benchmarks, and finally collect & make output files available.
How to run and manage the Spark cluster. I see a number of options:
a) continue with having a single Spark installation: The test framework would update my jar files, restart Spark so the new libraries are picked up and then run the tests/benchmarks. The advantage would be that I only have to set up Spark (and maybe HDFS for sharing data & application binaries, YARN as the resource manager, etc) once.
b) run Spark in containers: My cluster would run a container management system (like Kubernetes). The test framework would create/update the Spark container image, fire up & configure a number of containers to start Spark, submit the test/benchmarks and collect results. The big advantage of this is that multiple developers can run tests in parallel and that I can test various versions of Spark & Hadoop.
Create a Docker container that contains your entire solution, including tests, push it to GitHub, and have Drone CI or Travis CI build it and listen for updates. It works great for me. 😀
There are many Spark Docker images on GitHub or Docker Hub; I use this one:
https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook
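A minimal .drone.yml sketch of what this answer describes; the image is the one linked above, while the step name, build tool, jar path, and benchmark script are assumptions to adapt to your project:

# Sketch - build tool, paths, and script names are placeholders
kind: pipeline
type: docker
name: spark-extension-tests

steps:
  - name: build-and-benchmark
    image: jupyter/all-spark-notebook        # Spark image suggested above
    commands:
      - sbt package                          # placeholder: build/package your extension
      - spark-submit --master "local[*]" --jars target/my-extension.jar benchmarks/run_benchmarks.py   # placeholder paths

Drone runs the step inside the image on every push, which gives the "push a commit, get test and benchmark output" flow the question asks for, on a single node; for option (b) the same container image can be scheduled on Kubernetes instead.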
