How to make a cluster of GitLab instances - gitlab

Is it possible to create a cluster of multiple GitLab instances (multiple machines)? My instance is overutilized and I would like to add more machines, but access should stay transparent for the users: they shouldn't have to care which instance hosts their project.
What would be the best solution to help the users?
I'm on GitLab Community Edition 10.6.4.
Thanks for your help,
Leonardo

I reckon you are talking about scaling the GitLab server, not GitLab runners.
GitLab Omnibus is a fairly complex system with multiple components; some are stateless and some are stateful.
If you currently have everything on the same server, the easiest option is to scale up (move to a bigger machine).
If you can't, you can extract the stateful components and host them separately: PostgreSQL, Redis, and files on NFS.
Funnily enough, this can make performance worse, since database and file access then go over the network.
As a next step, you can scale out the stateless side.
But it is in no way an easy task.
I'd suggest starting by setting up proper monitoring to see where your limits are (CPU, RAM, IO) and where the bottlenecks are (in which components).
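As a rough illustration of the "find the limits first" advice, here is a minimal Python sketch that turns two raw readings into comparable numbers: the fraction of RAM in use and the load average per core. It assumes a Linux host where /proc/meminfo is available; the function names are made up for this example, and it is no substitute for a real monitoring stack.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Fraction of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def load_per_core(load_1min, cores):
    """1-minute load average divided by the number of cores."""
    return load_1min / cores


def snapshot():
    """Read the current values from the local machine (Linux only)."""
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    return {
        "memory_used": memory_used_fraction(info),
        "load_per_core": load_per_core(os.getloadavg()[0], os.cpu_count() or 1),
    }
```

Calling snapshot() periodically on the GitLab host gives a first hint: a load per core that stays well above 1.0 points at CPU or IO pressure, while a memory fraction close to 1.0 points at RAM. Per-component numbers still need proper monitoring.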
See docs, including some examples of scaling:
https://docs.gitlab.com/ee/administration/high_availability/
https://about.gitlab.com/solutions/high-availability/
https://docs.gitlab.com/charts/
https://docs.gitlab.com/ee/development/architecture.html
https://docs.gitlab.com/ee/administration/high_availability/gitlab.html

Related

How to manage patching on multiple AWS accounts with different schedules

I'm looking for the best way to manage patching Linux systems across AWS accounts with the following things to consider:
Separate schedules to roll patches through Dev, QA, Staging and Prod sequentially
Production patches to be released on approval, not automatic
No newer patches can be deployed to Production than what was already deployed to lower environments (as new patches come out periodically throughout the month)
We have started by caching all patches in all environments on the first Sunday of every month. The goal there was to then install patches from cache. This helps prevent un-vetted patches being installed in prod.
Most, not all, instances are managed by OpsWorks, but there are numerous OpsWorks stacks. We have some other instances managed by Chef Server. Still others are not managed, but are just simple EC2 instances created from the EC2 console. This means that with recipes we have to kick off approved patches on a stack-by-stack or instance-by-instance basis. Not optimal.
More recently, we have looked at the new features of SSM using a central AWS account to manage instances. However, this causes problems with some applications because the AssumeRole for SSM adds credentials to the .aws/config file that interfere with other tasks we need to run.
We have considered other tools, such as Ansible, but we would like to explore staying within the toolset we currently have, which is largely OpsWorks and Chef Server. I'm looking for higher-level ideas: an architecture for how one would approach this scenario.
Thanks for any thoughts or ideas.

This sounds like exactly the kind of scenario RunCommand was designed for.
You can create multiple groups of servers with different schedules based on tags. More importantly, you don't need to rely on secrets/keys being deployed anywhere.
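To make the tag-based targeting concrete, here is a minimal Python sketch that builds the arguments for an SSM SendCommand call using the AWS-RunPatchBaseline document. The tag name Environment, the environment names, the concurrency values and the approval rule for prod are assumptions for illustration, not something taken from the question or from AWS defaults.

```python
def build_patch_command(environment, operation="Scan", approved=False):
    """Build keyword arguments for an SSM SendCommand call.

    Instances are selected by a tag named "Environment" (an assumed
    naming convention), so no instance IDs need to be listed.
    """
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    # Assumed policy: installing on prod needs an explicit approval flag.
    if environment == "prod" and operation == "Install" and not approved:
        raise PermissionError("prod installs require approval")
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        "Targets": [{"Key": "tag:Environment", "Values": [environment]}],
        "Parameters": {"Operation": [operation]},
        "MaxConcurrency": "10%",
        "MaxErrors": "1",
        "Comment": f"{operation} patches on {environment}",
    }


def send_patch_command(environment, operation="Scan", approved=False):
    """Send the command with boto3 and return the command ID."""
    import boto3  # third-party AWS SDK, imported lazily

    ssm = boto3.client("ssm")
    kwargs = build_patch_command(environment, operation, approved)
    response = ssm.send_command(**kwargs)
    return response["Command"]["CommandId"]
```

A scheduler (cron, or an SSM maintenance window per environment) could call send_patch_command("dev", "Install") first and the later environments on their own dates. Note that this sketch does not pin prod to the patch set already tested in the lower environments; that still has to come from the patch baseline or the cached repository described in the question.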

Why does nobody do it this way in Docker? (All-in-one container/"black box")

I need to run many different web applications and microservices.
I also need easy backup/restore and the ability to move them between servers/cloud providers.
I started studying Docker for this, and I'm puzzled when I see advice like this: "create first container for your application, create second container for your database and link these together".
But why do I need a separate container for the database? If I understand correctly, Docker's main message is: "run and move applications with all their dependencies in an isolated environment". So, as I understand it, it is appropriate to put the application and all its dependencies in one container (especially if it's a small application that doesn't require an external database).
Here is how I see the best way to use Docker in my case:
Take a base image (e.g. phusion/baseimage).
Build my own image based on it (with nginx, database and application code).
Expose a port for interacting with my application.
Create a data volume based on this image on the target server (to store application data, database, uploads, etc.) or restore the data volume from a previous backup.
Run this container and have fun.
Pros:
Easy to back up/restore/move the whole application. (Move only the data volume and simply start it on the new server/environment.)
The application is a "black box", with no external dependencies to worry about.
If I need to store data in external databases or use data from them, nothing prevents me from doing so (though it is rarely necessary). And I prefer to use the API of other black boxes instead of accessing their databases directly.
Much better isolation and security than with a single database shared by all containers.
Cons:
Greater consumption of RAM and disk space.
A little harder to scale. (If I need several instances of the app to handle thousands of requests per second, I can move the database into a separate container and link several app instances to it. But that is needed only in very rare cases.)
Why haven't I found recommendations for this approach? What's wrong with it? What pitfalls have I missed?

First of all, you need to understand that a Docker container is not a virtual machine; it is just a wrapper around the kernel features chroot, cgroups and namespaces, using layered filesystems, with its own packaging format. A virtual machine is usually a heavyweight, stateful artifact with extensive configuration options regarding the resources available on the host machine, and you can set up complex environments within a VM.
A container is a lightweight, disposable runtime environment, with a recommendation to make it as stateless as possible. All changes are stored within the container, which is just a running instance of the image, and you'll lose all diffs if the container is deleted. Of course you can map volumes for more persistent data, but this is available in the multi-container architecture too.
If you pack everything into one container, you lose the ability to scale the components independently of each other, and you build in tight coupling.
With this tight coupling you can't build fail-over, redundancy and scalability features into your app config. Most modern NoSQL databases are built to scale out easily, and data redundancy becomes possible when you run more than one backing database instance.
On the other hand, defining these single-responsibility containers is easy with docker-compose, where you can declare them in a simple yml file.
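To show what such a declaration looks like, here is a minimal Python sketch that builds a two-service compose definition (one app container, one database container with its own named volume) and renders it as JSON. The image names, port mapping, environment variable and volume name are placeholders chosen for this example. Since JSON is valid YAML, the rendered text should be usable as a compose file, but treat that as an assumption to verify with your Compose version.

```python
import json


def compose_definition(app_image, db_image="postgres:15"):
    """Return a compose-style definition with separate app and db services."""
    return {
        "services": {
            "app": {
                "image": app_image,
                "ports": ["8080:80"],      # host:container, placeholder values
                "depends_on": ["db"],      # start the database first
                "environment": {"DATABASE_HOST": "db"},
            },
            "db": {
                "image": db_image,
                # The named volume keeps the data when the container is removed.
                "volumes": ["db_data:/var/lib/postgresql/data"],
            },
        },
        "volumes": {"db_data": {}},
    }


def render(definition):
    """Serialize the definition as indented JSON text."""
    return json.dumps(definition, indent=2)
```

With the two services declared separately, the app container can be replaced or started several times while the database and its volume stay untouched, which is the independence the answer describes.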

Setting up Puppet in the first place

I am trying to understand the best practice for setting up Puppet in the first place. Let's say I have 1000 existing servers that need to be managed by Puppet.
Do I manually install the Puppet agent on each one, or is there a better way?
Sorry if this question is too generic; I just want to get some idea.

1000 servers could be a lot for a single master instance. Of course, it will depend on the master's specs and other factors related to the Puppet runs.
There are a few questions you need to answer first to determine how you are going to go about it, such as:
Puppet Enterprise or Open Source?
What is the current configuration nightmare you are trying to solve?
What is the current configuration data related to the challenge or problem you have?
What are the current business roles (e.g. web server, load balancer, database, etc.) related to the problem you have?
What makes a role in terms of configuration?
I would suggest that you start small to learn more about the Puppet DSL and its ecosystem (master, agent, puppetdb, console/dashboard). I also recommend you start with the free 10-node Puppet Enterprise, as it will let you focus on the problem at hand rather than on how to configure the Puppet masters and agents, how to scale them, etc.
One more thing: install the Puppet agent everywhere if you can, in NOOP/disabled mode, to at least get facts, and run it in a masterless fashion using puppet apply when you need to. I find NOOP mode more useful as it tells you what needs to be changed; you can also enforce changes using --no-noop.
Hope that gets you started.

To answer your question: yes, the Puppet agent would need to be installed on every node. If you are managing 1000 nodes, I would assume you have your own OS image. In that case, it's best to add the agent to the OS image and use this image on all 1000 nodes.
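For the NOOP-first approach mentioned in the first answer, here is a minimal Python sketch that builds the two command lines involved: an agent run against a master, and a masterless puppet apply run. It only assembles argument lists; the helper names and the manifest path are made up for this example, and running the commands (for instance with subprocess.run) is left to the caller.

```python
def puppet_agent_command(enforce=False):
    """Command line for a one-off agent run against the master.

    With enforce=False the run is a dry run (--noop) that only reports
    what would change; enforce=True applies the changes (--no-noop).
    """
    return ["puppet", "agent", "--test", "--no-noop" if enforce else "--noop"]


def puppet_apply_command(manifest, enforce=False):
    """Command line for a masterless run of a local manifest."""
    return ["puppet", "apply", "--no-noop" if enforce else "--noop", manifest]
```

A rollout could start with puppet_apply_command("site.pp") on every node to collect the reported differences, and switch to enforce=True only for roles whose reports look right.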

Keeping Multiple Servers in a Cluster In-Sync?

I'm currently managing a cluster of PHP-FPM servers, all of which tend to get out of sync with each other. The application that I'm running on top of the app servers (Magento) allows admins to modify various files on the system, but now that the site is in a clustered setup, modifying a file only changes it on a single machine (one of the app servers) in the cluster.
Is there an open-source application for Linux that may allow me to keep all of these servers in sync? I have no problem with creating a small VM instance that can listen for changes from machines to sync. In theory, the perfect application would have small clients that run on each machine to be synced, which would talk to the master server which would then decide how/what to sync from each machine.
I have already examined the possibility of running a centralized file server, but unfortunately my app servers are spread out between EC2 and physical machines, which makes this infeasible. As there are multiple app servers (some of which are dynamically created depending on the load of the site), simply setting up an rsync cron job is not efficient, as the cron job would have to be modified on each machine to send files to every other machine in the cluster, and that would just be a whole bunch of unnecessary data transfers/ssh connections.

I'm dealing with setting up a similar solution, and I'm halfway there. I would recommend you use lsyncd, which basically monitors the disk for changes and then immediately (or at whatever interval you want) syncs files to a list of servers using rsync.
The only issue I'm having is keeping the server lists up to date: since I can spin up additional servers at any time, each machine in the cluster needs to be notified whenever a machine is added to or removed from the cluster.
I think lsyncd is a great solution that you should look into. The issue I'm having may turn out to be a problem for you as well, and that remains to be solved.

Instead of keeping tens or hundreds of servers cross-synchronized, it would be much more efficient, reliable, and above all simpler to maintain just one "admin node" and replicate changes from it to all your "worker nodes".
For instance, at our company we use a Development server -> Staging server -> Live backends workflow where all the changes are transferred across servers using a custom PHP+rsync front end. That allows the developers to push updates to a Staging server in the live environment, test out changes, and roll them to Live backends incrementally.
A similar approach could very well work in your case as well. Obviously it's not a plug-and-play solution, but I see it as the easiest way to go - both in terms of maintainability and scalability.
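The "one admin node pushes to all worker nodes" idea can be sketched in a few lines of Python. The function below only builds one rsync command per worker; the host names and paths in the usage note are placeholders, and how the worker list is kept current (the open problem mentioned in the lsyncd answer) is deliberately left out.

```python
def rsync_commands(source_dir, workers, dest_dir, delete=True):
    """Build one rsync command per worker to push source_dir to dest_dir.

    A trailing slash is forced on the source so that rsync copies the
    directory contents rather than the directory itself.
    """
    source = source_dir.rstrip("/") + "/"
    commands = []
    for host in workers:
        cmd = ["rsync", "-az"]
        if delete:
            # Remove files on the worker that no longer exist on the admin node.
            cmd.append("--delete")
        cmd += [source, f"{host}:{dest_dir}"]
        commands.append(cmd)
    return commands
```

For example, rsync_commands("/var/www/media", ["web1", "web2"], "/var/www/media/") yields two commands that the admin node could run (for instance with subprocess.run) after each change, so the number of transfers grows with the number of workers instead of with every pair of servers.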

Recommended approach & tools to provision a VM instance(s) from node.js?

I am trying to implement a 'lab in the cloud' to allow people to have a sandbox to experiment and learn in; i.e. for devops (chef/puppet), installing or configuring software etc.
I have a node.js server implementation to manage this and am looking for sane and reasonable ways to attack this problem.
The options are bewilderingly diverse: puppet or chef directly, or vagrant, seem appropriate. But OpenStack, Cloud Foundry and Amazon EC2 also provide their own feature sets.
A micro-cloud solution (multiple VMs per instance) would be ideal, as there isn't going to be any large computational load.
Suggestions most appreciated.
Cheers

After some investigation, it seems that LXC on EC2 might be the way forward.
It gives:
lightweight instances on a single EC2 instance
support for hibernate/restore
fast to stand up
able to be automated using chef/cucumber
EC2 virtualization using LXC
Chef-lxc
Testing infrastructure code in LXC using Cucumber

Resources