I'm just looking for some advice on puppet. Mainly what everyone uses for their puppet master's hardware and whether or not there is a limit on the maximum number of nodes that a puppet master can serve.
I'm looking at setting up a puppet cluster to manage a few thousand servers and was curious how much hardware I would need to throw at this project.
I do not manage much servers ( less than 10 ) for the moment, so I cannot help you much,
but if scaling is a issue, you can have a masterless deployment, using git and running puppet with cron ( http://current.workingdirectory.net/posts/2011/puppet-without-masters/ ).
You just need to deploy your config, using cron ( and the vcs of your choice ), and to apply it, using cron too ( or fabric, or mcollective ). Then the only thing that need to scale is the vcs server, and this is much easier to do. You can even simply use rsync or nfs, and I think serving a few thousand server would not be a problem.
However, the issue would be stored configs.
Related
Is it possible to create a cluster of multiple GitLab instances (multiple machines)? My instance is over utilized and I would like to add other machines, but at the same for the user should be transparent to access his project, he doesn't care which instance it will be hosted on.
What could be the best solution to help the users?
I'm on GitLab Community Edition 10.6.4
Thanks for your help,
Leonardo
I reckon you are talking about scaling GitLab server, not GitLab runners.
GitLab Omnibus is a fairly complex system with multiple components, some are stateless and some are stateful.
If you currently have everything on the same server, the easiest option is to scale up (move to bigger machine).
If you can't, you can extract stateful components to host them separately: PostgreSQL, Redis, files to NFS.
Funnily you can make performance worse here.
Next step you can scale out the stateless side.
But it is in no way an easy task.
I'd suggest to start with setting up proper monitoring to see where are your limitations (CPU, RAM, IO) and bottle-necks (in which components).
See docs, including some examples of scaling:
https://docs.gitlab.com/ee/administration/high_availability/
https://about.gitlab.com/solutions/high-availability/
https://docs.gitlab.com/charts/
https://docs.gitlab.com/ee/development/architecture.html
https://docs.gitlab.com/ee/administration/high_availability/gitlab.html
There is the need that one puppet agent contacts some different puppet masters.
Reason: there are different groups that create different and independent sets of manifests.
Possible groups and their tasks
Application Vendor: configuration of application
Security: hardening
Operations: routing tables, monitoring tools
Each of these groups should run it's own puppet master - the data (manifests and appropriate data) should be strictly separated. If it is possible, one group should even not see / have access to the manifests of the others (we are using MAC on the puppet agent OSes).
Thoughts and ideas that all failed:
using (only) hira is not flexible as needed - there is the need to have different manifests.
r10k: supports more than one environment, but in each environment can only access one set of manifests.
multi but same puppet server using e.g. DNS round robin: this is the other way round. We need different puppet masters.
Some ways that might be possible but...
running multiple instances of puppet agents. That 'feels' strange. Advantage: the access rights can be limited in the way as needed (e.g. the application puppet agent can run under the application user).
patching puppet that it can handle more than one puppet master. Disadvantage: might be some work.
using other mechanisms to split responsibility. Example: use different git-repositories. Create one puppet master. The puppet master pulls all the different repositories and serves the manifests.
My questions:
Is there a straight forward way implementing this requirement with puppet?
If not, is there some best practice how to do this?
While I think what you are trying to do here is better tackled by incorporating all of your modules and data onto a single master, and that utilizing environments will be effectively the exact same situation (different masters will provide a different set of modules/data) this can be achieved by implementing a standard multi-master infrastructure (one CA master for cert signing, multiple compile masters with certs signed by the same CA master, configured to forward cert traffic elsewhere) and configure each master to have whatever you need. You then end up having to specify which master you want to check in to on each run (a cronjob or some other approach), and have the potential for one checkin to change settings set by another (kinda eliminating the hardening/security concept).
I would urge you to think deeper on how to collaborate your varied aspects (git repos for each division's hiera data and modules that have access control) so that a central master can serve your needs (and access to that master would be the only way to get data/modules from everywhere).
This type of setup will be complex to implement, but the end result will be more reliable and maintainable. Puppet inc. may even be able to do consultation to help you get it right.
There are likely other approaches too, just fyi.
I've often found it convenient to multi-home a puppet agent for development purposes, because with a localĀ puppet server you can instantly test manifest changes - there's no requirement to commit, push and r10k deploy environment like there is if you're just using directory environments and a single (remote) puppet server.
I've found the best way to do that is to just vary the path configuration (otherwise you run into problems with e.g. the CA certs failing to verify against the other server) - a form of your "running multiple instances of puppet agents" suggestion. (I still run them all privileged, so they can all use apt package {} etc.)
For Puppet 3, I'd do this by varying the libdir with --libdir (because the ssldir was under the libdir), but now (Puppet 4+) it looks more sensible to vary the --confdir. So, for example:
$ sudo puppet agent -t # Runs against main puppet server
$ sudo puppet agent -t \
--server=puppet.dev.example.com \
--confdir=/etc/puppetlabs/puppet-dev # Runs against dev puppet server
I am trying to understand the best practice of setting up Puppet in the first place, let's say I have 1000 existing servers needs to be managed Puppet.
Do I manually install Puppet agent on each or there is a better way.
Sorry if this question is too generic just want to have some idea.
1000 servers could be a lot for a single master instance. of course it will depend on the master specs, and other factors related to the puppet runs.
There are few questions you need to answer first to determine how are you going to go about it such as
Puppet Enterprise or Open Source? What is the current configuration night mare you are trying to solve?
What is the current configuration data related to the challenge or
problem you have?
What are the current business roles (e.g. web server, load
balancer,database, ..etc) related to the problem you have? What
makes a role in terms of configurations?
I would suggest that you start first small to learn more about the puppet DSL, and its ECO system (master, agent, puppetdb, console/dashboard). I also recommend you start with the free 10 nodes puppet Enterprise as it will let you focus more on the problem at hand not how to configure the puppet masters, and agents, how to scale them, ..etc.
One more thing install puppet agent every where if you can in NOOP/disabled mode to get at least facts and run it in a masterless fashion using puppet apply when you need to. i find NOOP mode more useful as it tells you what needs to be changed, also you can enforce changes using --no-noop
hope that will get you started.
To answer your question: Yes, Puppet agent would need to be installed on every node. If you are managing 1000 nodes, I would assume you have your own OS image. In this case, its best to add it to the OS image, and use this image on 1000 nodes.
I currently run a Red Hat Linux server with Plesk to host a hundred or so domains. For multiple reasons I'd like to transition away from Plesk and to Docker containers with each virtual host as one or more containers. I'm unclear from what I've read so far what would be the best approach to this.
A typical site includes the doc root file area and one or two MySQL databases. We run PHP on all the sites. Some sites may have constraints on the version of PHP they can run. Some of the sites use SSL. I don't believe there are any constraints on the MySQL versions, but it's of course possible that future MySQL versions could deprecate some feature that is needed. I don't believe there's any dependency on the Apache version, but I do rely on some specific Apache modules being installed. There may be a site or two that have dependencies outside of their doc root and not part of the basic virtual host setup, but I don't believe any require a specific version of Linux.
I would like the containers to have maximum portability so that I can have flexibility in moving sites to whatever server or cloud service I choose. Part of my goal is to retire the current server and move sites to servers which best fit them.
I would also like to try upgrading the PHP version after the containers are created.
So would a single container include the entire doc root file system, including the data directories where users can upload/ftp files? Would it include the MySQL database, or would that be separate? I assume I would include the current version of PHP so that I could upgrade each one when I was ready. Would it include Apache when specific Apache modules are required? Is there a reason to include Apache and/or MySQL in all containers?
One last piece. I'm looking into using CoreOS which utilizes Docker as an integral part.
Any and all inputs are appreciated.
The whole idea of Docker is running processes/components isolated, to keep them easily upgradable. I have tinkered with this in the past and have come up with the following.
Create four containers per instance (customer):
Apache or nginx
php-fpm
MySQL
Busybox (as a data container)
Link all of them together and set volumes to all data that should persist in the data container. MySQL data and /var/www plus site config files for example.
This way you can always switch out one of the components while keeping the others. It's questionable though if Docker is a solution to a full virtual server though, as Docker containers do not have a full init system and you'll have to resort to bending things quite a bit to resemble a full virtual machine. Think more of it as "application containers", hence the idea with the separation of concerns.
Update:
Newer Docker versions come with the docker-compose tool which greatly eases this task.
I am trying to solve the same issues with cPanel instead of Plesk.
We can try and accomplish this using the plugins for cpanel or plesk however we have to worry about few things.
and we have to create some premade template images for containers that our clients can use.. it cannot be just any container from dockerhub,user Dockerfiles,etc Because cPanel/Plesk will look for specific log files available on specifc locations for bw calculations, disk quota,etc.
Biggest advantage with this solution is that we can provide CloudLinux kind of isolation and easy resource allocation/ fair sharing. However it is not as easy.
To answer your question:
Every container will be nearly a complete system so you will need to have less clients per host, because each container might be like 1G and by default have to run its own webserver/php and hence more ram foot print.
Its painful to run a Mysql inside each container and it is better to use mysql on the host or 1 dedicated container and share it. this way the Plesk's tools will help.
You may also have to use the standard apache and then reverse proxy it to each container after ssl termination so Plesk's standard tools are used but then I think containers will have to run its own webserver itself or we may have to do some trickery with php-fpm to allow host's apache to talk to each container's php-fpm processes . This is more painful than allowing each container to just run its own Nginx but possible.
It doesnt prevent users from installing their own Mysql server within their container if they need.
This kind of stuff is easy for someone from cPanel or Plesk to do.. but for others it will need a lot of Dedicated development time + testing to make sure all this works.
I was going to invest some time in creating this kind of plugin for cPanel but still undediced. I may try this if I can rope in some investors.
You can see amount of interest , CPanel shows on this issue : http://features.cpanel.net/responses/dockerio-support
I will leave you to decide
Also as an alternative solution:
so Instead of playing to the Cpanel's tune I created this . https://github.com/paimpozhil/WhatPanel
Here every site runs in its own container ( and its own VM if needed.).
Migration is simple as exporting/importing a container with tools like : on github.com /paimpozhil/docker-volume-backup & acaranta/docker-backuper
I didnt complete the migrator/ php upgrade tools ,etc here but will do when i have free time.
I'm experimenting with Puppet scripts for deployment.
I find the hardest part about the process of writing those scripts is iteratively testing them.
I don't want to puppet apply on my local development machine, that liable to screw stuff up. I have a clean-slate remote box where I want to apply. I also don't see how a puppetmaster can help me; I might be using a puppetmaster at a later point for production deployments, but for now, I just want to get my code working.
So I put together a quick shell script that would rsync the different directories from my local puppet module path to /tmp on the remote machine, and then run puppet apply. This is terribly inconvenient. It's slow, especially if we're talking about a syntax error.
I think what I want really is something like a puppetd <-> puppetmaster connection, where puppetd on the remote machine receives an already compiled manifest. Just an adhoc-one over a SSH connection, without having to actual setup an Puppetmaster, dealing with certificates etc. puppet apply user#host.
There seems to be nothing of the sort, but how do other people deal with this? I experience of working on a Puppet script is incredibly frustrating to me, as is.
I'd recommend using Vagrant. If you're not testing the puppet master setup you can use the built in provisioner integration.
Once you have everything setup you can run vagrant provision or just run puppet apply on the vagrant vm.
Here's a related article you may find helpful as well.
I would also take a look at puppet rpsec tests, using rspec-puppet and puppetlabs-spec-helper. The rspec-puppet-init will break puppet doc and geppetto and maybe some other things due to the symlinks, and there are some issues with hiera, but the tests are easy to setup otherwise and work well, and can also be tied into jenkins/hudson.
I usually have two levels of testing for my Puppet scripts.
Unit tests for quick feedback: Written using rspec-puppet, these compile a Puppet catalog for the class/define/etc being tested, and make assertions about it. Run locally each time I make a minor change, and on the build server each time I check in. The tests run quickly (<10 seconds), and pick up syntax and dependency issues.
Functional tests to make sure it really works: Written using Cucumber with the Aruba library. When I'm finished implementing a feature and the unit tests for it pass, these tests provision a VM (using Vagrant) with the appropriate Puppet manifest(s), log in, and make assertions about the VM's state. The tests themselves look something like:
Given I am SSHed into Vagrant box "webserver"
When I type "php --version"
Then the output should include "PHP 5.4.11"
Vagrant is the most useful environment for rapid infrastructure development that I've found. It most closely (99%) will mirror your production setup, and you can account for those tiny differences in puppet so everything works as expected. It takes about 30 minutes to get going with it and will pay you back many times over in saved time messing around with file copy scripts :)
If it's helpful to visualize, on my desktop I have 3 terminals side by side:
Terminal 1) Editing puppet manifests, classes, ruby code, etc
Terminal 2) Running 'vagrant provision' which simply does a puppet apply along with any facts you want to pass, etc.
Terminal 3) 'vagrant ssh' into the box so I can poke around as puppet is doing its work
Hope this helps!
Why don't you want to run a puppetmaster? It's created for exactly this situation.
If you absolutely cannot run a puppetmaster, then you would have to wrap your puppet calls in another script that first downloads the file (with curl or wget) and apply them after a successful download. Given that the puppetmaster is a fairly simple application to run, I don't see how not using it would be any better.
I stumbled across rump while looking at another question. If you're using git, it might be useful. There's a slide deck available.
From the README.md: "Rump helps you run Puppet locally against a Git checkout."
You may be interested in citac, a toolkit for automated testing of Puppet scripts. It is available on Github: https://github.com/citac/citac
Citac systematically executes your Puppet manifest in various configurations, imitating transient system faults, different resource execution orders, and more. The generated test reports inform you about issues with non-idempotent resources, convergence-related issues, etc.
The tool uses Docker containers for execution, hence your system remains untouched while testing. State changes are tracked during execution of the Puppet script, and detailed test reports are generated.
To get an idea of which bugs the tool is able to detect, a large-scale evaluation with more than 150 public Puppet scripts has been performed. The results are available here: http://citac.github.io/eval/
Please feel free to provide feedback, pull requests, etc. Happy testing!