puppet fileserver distribute binaries

Well, the question is not new, but I am still unable to find a nice solution.
I am distributing binaries (100-300 MB files) via the puppet fileserver, but performance is really bad, I'm sure because of the md5 checks. Now I have more than 100 servers and my puppet master works really hard on all those md5 computations. In Puppet 3.x the checksum parameter for file{} does not work. I'm unable to update to Puppet 4.x and I have no chance to change the flow: files should come from the puppet fileserver.
So I can't believe that there is no custom file type with a fixed checksum option, but I can't find one :(
Or maybe there is some other way to download files from the puppet fileserver?
Any advice will help!
rsync or packing as a native package are not options for me.

It is indeed reasonable to suppose that using the default checksum algorithm (MD5) when managing large files will have a substantial performance impact. The File resource has a checksum attribute that is supposed to be usable to specify an alternative checksumming algorithm among those supported by Puppet (some of which are not actually checksums per se), but it was buggy in many versions of Puppet 3. At this time, it does not appear that the fix implemented in Puppet 4 has been backported to the Puppet 3 series.
If you need only to distribute files, and don't care about afterward updating them or maintaining their consistency via Puppet, then you could consider turning off checksumming altogether. That might look something like this:
file { '/path/to/bigfile.bin':
  ensure   => 'file',
  source   => 'puppet:///modules/mymodule/bigfile.bin',
  owner    => 'root',
  group    => 'root',
  mode     => '0644',
  checksum => 'none',
  replace  => false,
}
If you do want to manage existing files, however, then Puppet needs a way to determine whether a file already present on the node is up to date; that is one of the two main purposes of checksumming. If you insist on distributing the file via the Puppet file server, and you are stuck on Puppet 3, then I'm afraid you are out of luck as far as lightening the load. Puppet's file server is tightly integrated with the File resource type and is not intended to serve general purposes. To the best of my knowledge, there is no third-party resource type that leverages it. In any case, the file server itself is a major contributor to the problem of File's checksum parameter not working: buggy versions do not perform any type of checksumming other than MD5.
As an alternative, you might consider packaging your large file in your system's native packaging format, dropping it in your internal package repository, and managing the package (via a Package resource) instead of managing the file directly. That does get away from distributing it via the file server, but that's pretty much the point.
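For illustration, here is a minimal sketch of that approach, assuming the binary has been wrapped in an RPM named 'bigfile-data' and published to an internal yum repository (both the package name and the repository URL are made up):
# Hypothetical internal repository; the baseurl is a placeholder.
yumrepo { 'internal':
  ensure   => present,
  baseurl  => 'http://repo.example.com/internal',
  enabled  => '1',
  gpgcheck => '0',
}

# The large binary travels inside this package instead of through the
# Puppet file server.
package { 'bigfile-data':
  ensure  => installed,
  require => Yumrepo['internal'],
}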


How to call a puppet provider method from puppet manifest?

I'm using the ibm_installation_manager module from the Puppet Forge and it is a bit basic, because IBM wrote Installation Manager at a time when idempotency was not much of a concern.
ref: https://forge.puppet.com/puppetlabs/ibm_installation_manager
As such it does not cater nicely for upgrades: the module will not detect that an upgrade is needed, stop the existing processes, do the upgrade, and then start the processes again. It will just detect that an upgrade is needed and try to install the desired version; if that constitutes an upgrade, great, but it will probably fail due to the running instances.
So I need to implement some "stop processes" pre-upgrade functionality.
I need to mention at this point I'm new to ruby and fairly new to puppet.
The provider that the module uses (imcl.rb) has an exists method.
The ideal way for me to detect if an upgrade is going to happen (and stop the instances if it is) would be for my puppet manifest to be able to somehow call the exists method. Is this possible?
Or how would you approach this problem?
Something like imcl.exists(ibm_pkg["my_imcl_pkg_resource"])
The ideal way for me to detect if an upgrade is going to happen (and stop the instances if it is) would be for my puppet manifest to be able to somehow call the exists method. Is this possible?
No, it is not possible, at least not in any useful way. Your manifests describe how to build a catalog of resources describing the target state of the machine. In a master / agent setup, this happens on the master. The catalog is then used as input to a separate step, in which it is transferred to the target machine and applied there. It is in this second step that providers are engaged.
To the extent that you want the contents of your catalogs to be influenced by the current state of the target machine, the Puppet mechanism for that is to convey the needed state details to the catalog builder in the form of facts. It is relatively straightforward to add your own facts. Indeed, there are at least two distinct, non-exclusive mechanisms, going under the names "external facts" and "custom facts".
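As a rough sketch of that fact-driven approach: suppose a custom or external fact (written separately) reports the currently installed Installation Manager version under the hypothetical name imcl_installed_version. The desired version, the stop command, and the resource title below are placeholders as well:
# All names here (fact, command path, resource title) are hypothetical.
$desired_version = '1.8.6'

if $::imcl_installed_version and $::imcl_installed_version != $desired_version {
  # An upgrade is pending, so stop the running instances before the
  # ibm_pkg resource is applied.
  exec { 'stop IBM processes before upgrade':
    command => '/opt/IBM/scripts/stop_all.sh',
    before  => Ibm_pkg['my_imcl_pkg_resource'],
  }
}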

Possible to have node local configuration file

is it possible to use a node local config file (hiera?) that is used by the puppet master to compile the update list during a puppet run?
My use case is that puppet will make changes to users' .bashrc files and to the users' home directories, but I would like to be able to control which users are affected using a file on the actual node itself, not in the site.pp manifest.
is it possible to use a node local config file (hiera?) that is used by the puppet master to compile the update list during a puppet run?
Sure, there are various ways to do this.
My use case is that puppet will make changes to users' .bashrc files and to the users' home directories, but I would like to be able to control which users are affected using a file on the actual node itself, not in the site.pp manifest.
All information the master has about the current state of the target node comes in the form of node facts, provided to it by the node in its catalog request. A local file under local control, whose contents should be used to influence the contents of the node's own catalog, would fall into that category. Puppet supports structured facts (facts whose values have arbitrarily-nested list and/or hash structure), which should be sufficient for communicating the needed data to the master.
There are two different ways to add your own facts to those that Puppet will collect by default:
Write a Ruby plugin for Facter, and let Puppet distribute it automatically to nodes, or
Write an external fact program or script in the language of your choice, and distribute it to nodes as an ordinary file resource.
Either variety could read your data file and emit a corresponding fact (or facts) in appropriate form. The Facter documentation contains details about how to write facts of both kinds; "custom facts" (Facter plugins written in Ruby) integrate a bit more cleanly, but "external facts" work almost as well and are easier for people who are unfamiliar with Ruby.
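As a rough sketch of the master-side half, suppose an external fact (written separately) reads a file on the node, say /etc/puppet_users.txt, and exposes the user names as a structured fact named managed_bashrc_users; both names are hypothetical, and structured facts must be enabled (stringify_facts = false on older Puppet versions):
# 'managed_bashrc_users' is a hypothetical structured fact produced on the node.
# A defined type with an array of titles works on both Puppet 3 and Puppet 4.
define mymodule::bashrc_user () {
  file { "/home/${title}/.bashrc":
    ensure => file,
    source => 'puppet:///modules/mymodule/bashrc',
    owner  => $title,
    mode   => '0644',
  }
}

mymodule::bashrc_user { $::managed_bashrc_users: }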
In principle, you could also write a full-blown custom type and accompanying provider, and let the provider, which runs on the target node, take care of reading the appropriate local files. This would be a lot more work, and it would require structuring the solution a bit differently than you described. I do not recommend it for your problem, but I mention it for completeness.

Reinstalling package if specific file is missing

I need to reinstall a package with Puppet if a specific file is missing. How can I achieve this? In newer versions of Puppet the onlyif parameter is available, but we still use Puppet 3.1.
Do not mistake Puppet for some kind of baroque script engine. Reinstalling a package is an action, whereas Puppet classes, resources, and DSL in general focus on describing state. Puppet's general paradigm is that you tell it what state you want, and it figures out for itself what actions, if any, to take to achieve that state. Even Exec resources are best conceptualized and used as representations of state to be managed.
Puppet Package resources do not recognize a state of "installed but broken", or any similar thing, and therefore they have no sense of a need to reinstall (as opposed to updating) a package, nor any mechanism for doing so.
If your concern is with only one specific file that you expect the package to provide, then you should consider putting that file under direct management (via a File resource) instead of relying on a package reinstallation to recover it if it should go missing.
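A minimal sketch of that direct-management approach, assuming the file of interest is /etc/mypackage/mypackage.conf (a made-up path) and that a pristine copy is kept in a module's files directory:
file { '/etc/mypackage/mypackage.conf':
  ensure  => file,
  source  => 'puppet:///modules/mymodule/mypackage.conf', # hypothetical module path
  owner   => 'root',
  group   => 'root',
  mode    => '0644',
  require => Package['mypackage'],
}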
However, you should consider what flaw in your system's configuration or security policy affords any plausible likelihood that random system files will unexpectedly go missing. You should especially do this if you are using the file in question as a canary to detect broader damage.
Nevertheless, if you remain firm about doing what you ask, then an Exec resource can help. The details of what you would need are unclear, but you can take this as a pattern:
exec { 'Ensure package mypackage good':
  command => '/usr/bin/yum -y reinstall mypackage',
  creates => '/path/to/some_file',
  require => Package['mypackage'],
}
The Exec type also has unless and onlyif parameters (including in Puppet 3.1), but the creates parameter serves the specific case of using the presence or absence of a file to determine whether the command needs to be run, which is exactly what you want.
Note also the require parameter. This presumes that package 'mypackage' is under Puppet management (not shown), and guarantees that the Exec will not be synced before the package. That way, if the package is altogether absent, you can be sure that Puppet will install it (supposing that's what you have specified) before testing the presence of any file it is expected to provide.
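For completeness, the presumed package management could be as simple as the following, with the package name mirroring the pattern above:
package { 'mypackage':
  ensure => installed,
}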

Is puppet efficient in synchronizing large files?

How efficient is puppet with handling large files? To give you a concrete example:
Let's assume we're dealing with configuration data (stored in files) in the order of gigabytes. Puppet needs to ensure that the files are up-to-date with every agent run.
Question: Is puppet performing some file digest type of operation beforehand, or just dummy-copying every config file during agent runs?
When using file { 'name': source => <URL> }, the file content is not sent through the network unless there is a checksum mismatch between master and agent. The default checksum type is md5.
Beware of the content property for file. Its value is part of the catalog. Don't assign it the contents of large files via the file() or template() functions.
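To make the distinction concrete, here is a minimal sketch with a made-up path and module name; the first resource sends only a checksum over the network unless the content actually differs, while the commented-out variant would embed the whole file in the catalog on every run:
# Preferred for large files: content stays on the file server until needed.
file { '/opt/data/huge.bin':
  ensure => file,
  source => 'puppet:///modules/mymodule/huge.bin',
}

# Avoid: this inlines the entire file into the catalog.
# file { '/opt/data/huge.bin':
#   ensure  => file,
#   content => file('mymodule/huge.bin'), # module-relative paths need a newer Puppet
# }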
So yes, you can technically manage files of arbitrary size through Puppet. In practice, I try to avoid it, because all of Puppet's files should be part of a git repo or similar. Don't push your tarballs inside there. Puppet can deploy them by other means (packages, HTTP, ...).
I'm not entirely certain how Puppet's file server works in the latest update, but in previous versions Puppet read the file into memory, and that's why it was not recommended to use the file server to transfer files larger than 1 GB. I suggest you go through this answer and see if it makes sense: https://serverfault.com/a/398133

Using Vagrant, why is puppet provisioning better than a custom packaged box?

I'm creating a virtual machine to mimic our production web server so that I can share it with new developers to get them up to speed as quickly as possible. I've been through the Vagrant docs, however I do not understand the advantage of using a generic base box and provisioning everything with Puppet versus packaging a custom box with everything already installed and configured. All I can think of is:
Advantages of using Puppet vs custom packaged box
Easy to keep everyone up to date - ability to put manifests under version control and share the repo so that other developers can simply pull new updates and re-run puppet, i.e. 'vagrant provision'.
Environment is documented in the manifests.
Ability to use puppet modules defined in production environment to ensure identical environments.
Disadvantages of using Puppet vs custom packaged box
Takes longer to write the manifests than to simply install and configure a custom packaged box.
Building the virtual machine the first time would take longer using puppet than simply downloading a custom packaged box.
I feel like I must be missing some important details, can you think of any more?
Advantages:
As dependencies may change over time, building a new box from scratch will involve either manually removing packages, or throwing the box away and repeating the installation process by hand all over again. You could obviously automate the installation with a bash or some other type of script, but you'd be making calls to the native OS package manager, meaning it will only run on the operating system of your choice. In other words, you're boxed in ;)
As far as I know, Puppet (like Chef) contains a generic and operating system agnostic way to install packages, meaning manifests can be run on different operating systems without modification.
Additionally, those same scripts can be used to provision the production machine, meaning that the development machine and production will be practically identical.
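As a small illustration of that abstraction, the declaration below works unmodified on apt-based and yum-based systems because Puppet selects the package provider for the platform ('ntp' is just an example package that happens to share a name across common distributions):
package { 'ntp':
  ensure => installed,
}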
Disadvantages:
Having to learn another DSL, when you may not be planning on ever switching your OS or production environment. You'll have to decide if the advantages are worth the time you'll spend setting it up. Personally, I think that having an abstract and repeatable package management/configuration strategy will save me lots of time in the future, but YMMV.
One great advantage not explicitly mentioned above is the fact that you'd be documenting your setup (properly), and your documentation will be the actual setup - not a (one-time) description of how things were/may have been intended to be.
