Using Vagrant, why is puppet provisioning better than a custom packaged box? - puppet

I'm creating a virtual machine to mimic our production web server so that I can share it with new developers to get them up to speed as quickly as possible. I've been through the Vagrant docs; however, I do not understand the advantage of using a generic base box and provisioning everything with Puppet versus packaging a custom box with everything already installed and configured. All I can think of is:
Advantages of using Puppet vs custom packaged box
- Easy to keep everyone up to date: ability to put manifests under version control and share the repo so that other developers can simply pull new updates and re-run Puppet, i.e. 'vagrant provision'.
- Environment is documented in the manifests.
- Ability to use Puppet modules defined in the production environment to ensure identical environments.
Disadvantages of using Puppet vs custom packaged box
- Takes longer to write the manifests than to simply install and configure a custom packaged box.
- Building the virtual machine the first time would take longer using Puppet than simply downloading a custom packaged box.
I feel like I must be missing some important details, can you think of any more?

Advantages:
As dependencies may change over time, building a new box from scratch will involve either manually removing packages, or throwing the box away and repeating the installation process by hand all over again. You could obviously automate the installation with a bash or some other type of script, but you'd be making calls to the native OS package manager, meaning it will only run on the operating system of your choice. In other words, you're boxed in ;)
As far as I know, Puppet (like Chef) contains a generic and operating system agnostic way to install packages, meaning manifests can be run on different operating systems without modification.
Additionally, those same scripts can be used to provision the production machine, meaning that the development machine and production will be practically identical.
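To make the "boxed in" point concrete, here is a minimal sketch of what a hand-rolled shell installer ends up doing. The function name install_cmd_for and the list of distro IDs are assumptions made for illustration, not something from the answer above; the function only prints the command it would run.

```sh
#!/bin/sh
# Hypothetical helper: print the install command for a package on a given distro.
# The distro ID is the ID= value found in /etc/os-release (e.g. "ubuntu", "fedora").
install_cmd_for() {
    os_id=$1
    pkg=$2
    case "$os_id" in
        debian|ubuntu|linuxmint) echo "apt-get install -y $pkg" ;;
        fedora)                  echo "dnf install -y $pkg" ;;
        centos|rhel)             echo "yum install -y $pkg" ;;
        *)
            # Any distro not listed above needs yet another branch.
            echo "unsupported distro: $os_id" >&2
            return 1
            ;;
    esac
}
```

Calling install_cmd_for ubuntu nginx prints "apt-get install -y nginx". Every additional distro needs another branch, and that per-OS bookkeeping is what a Puppet manifest is meant to hide.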
Disadvantages:
Having to learn another DSL, when you may not be planning on ever switching your OS or production environment. You'll have to decide if the advantages are worth the time you'll spend setting it up. Personally, I think that having an abstract and repeatable package management/configuration strategy will save me lots of time in the future, but YMMV.

One great advantage not explicitly mentioned above is the fact that you'd be documenting your setup (properly), and your documentation will be the actual setup - not a (one-time) description of how things were/may have been intended to be.


Separate environments for learning or trying out vs production (sandboxes?)

Can you suggest a way of separating learning/trying things out from production work on the same computer? I know a lot of JS and have production-ready skills, but I still sometimes need to probe or try out simpler stuff or basics. I presume that a lot of engineers are in a similar place.
This is the situation I am facing right now.
I wanted to install redis and configure it while trying out something interesting.
In a separate project I needed another clean redis configuration and installation.
On the front-end side I tried and installed a few npm packages globally.
At some point I installed python 3.4 and now require 3.6
At some point I installed nginx and configured it, and now I need another configuration and have to wipe the previous one out,
If I start a big project right now I feel like my computer will eventually let me down due to the several attempts I have previously made
et cetera; these all create friction for both my learning and exploration
Now, it crosses my mind to use separate VirtualBox installations for trying things out, but this answer is trivial, so please suggest something else.
P.S.: I am using Linux Mint.
You can install and use Docker, which is also a trivial answer.
However, if your environment is Linux, you can also use LXC.
There isn't really a single good answer to this sort of question of course; but some things that are generally a good idea are:
Use git repos to keep the source "backed up" (obviously your local PC should not be the git server). Commit your changes all the time; if you can't hold your breath for as long as the timespan between 2 commits, then you're doing it wrong (or you may have asthma, see a doctor).
Always build your project with there being not just multiple, but a variable number of "deployments" in mind. That means not hardcoding absolute paths, database names/ports/hostnames and things like that. If your project needs database/API credentials then those should be in a config file of sorts (or in the env); that config file should be stored outside the codebase and shouldn't be checked into your git repos (though there can of course be a config template in there).
Always have at least 2 deployments of any project actually deployed. Next to the (obvious) "live"/"production" deployment, which your clients/users use, you want a "dev" version for yourself where you can freely shit the bed, and for bigger projects you may well want multiple. Each deployment would have its own database, and its own copy of the code/assets.
It can be useful to deploy everything inside podman or docker containers; that makes it easier to have a near-identical system in both development and production (in case those are different servers), but that may be too much overhead for you.
Have a method (maybe a script) that makes it very easy to deploy updates from your git repo or dev deployment to the production deployment. Based on your description, I'm guessing that if a client tells you she wants some minor cosmetic changes done, you do them straight on the live version; very convenient and fast, but a horrible thing in practice. Once you switch from that workflow to having a separate dev deployment, you'll feel slowed down by that (which you are), but if you optimize that workflow over time you'll get to the point where you could still deploy cosmetic changes in a minute or so while having fully separated deployments. It is worth the time investment.
Have a personal devtools git repo or something similar. You're likely using an IDE such as VS Code? Back up your VS Code user config in that repo, and update it reasonably frequently. If you use a text editor, Photoshop or another image editor, etc., same deal. You hear that ticking sound? That's the bomb that's been placed on your motherboard. It might go off tonight, it might not go off for years, but you never know, so always expect it could be today or tomorrow and have stuff backed up externally and/or on offline media.
There's a lot more but those are some of the basics that spring to mind.
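As a rough sketch of the "config outside the codebase" advice, a deployment script could read its settings from the environment and fall back to development defaults. The variable names (APP_CONFIG_FILE, DB_HOST, DB_PORT, DB_NAME) and the default values are hypothetical, chosen only for this example.

```sh
#!/bin/sh
# Hypothetical variable names; each deployment provides its own values in the
# environment or in a config file that lives outside the repository.
load_config() {
    # Optionally source a per-deployment file if APP_CONFIG_FILE points to one.
    if [ -n "${APP_CONFIG_FILE:-}" ] && [ -f "$APP_CONFIG_FILE" ]; then
        . "$APP_CONFIG_FILE"
    fi
    # Fall back to development defaults when a value was not provided.
    DB_HOST=${DB_HOST:-localhost}
    DB_PORT=${DB_PORT:-5432}
    DB_NAME=${DB_NAME:-myapp_dev}
}
```

With nothing set, load_config leaves DB_NAME at the development default; a production deployment would export DB_NAME (or point APP_CONFIG_FILE at its own file) before calling it, so the same code runs unchanged in every deployment.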
I thought Docker was only for containerizing your app with all the installation files and configurations before pushing to production
Docker is useful whenever you need to configure the runtime environment in an isolated manner. Production, local development, other environments - all need the same runtime. All benefit from the runtime definition and isolation that docker provides. Arguably docker is even more useful in workstation-centric development, than it is in production.
I wanted to install redis and configure it while trying out something interesting.
Instead of installing redis on your os directly, run the preexisting docker image for redis.
In a separate project I needed another clean redis configuration and installation.
Instantiate the docker image again and now you have 2 isolated redis servers running locally.
In front-end side I tried and installed a few npm packages globally.
Run your npm code within a nodejs docker container
At some point I installed python 3.4 now require 3.6
Different versions of python are a great use case for docker containers, which are tagged with specific python versions.
At some point I installed nginx and configured it, now need another configuration and wipe the previous one out,
Nginx also has a very useful official container.
If I start a big project right now I feel like my computer will eventually let me down due to the several attempts I have previously made
Yeah, it gets messy quickly. That's why docker is such a great solution. Give every project dedicated services and use docker-compose to simplify the networking and building of components. Fight the temptation to use a docker container for more than one service - instead stitch them together with docker networks.
Read https://docs.docker.com/get-started/overview/ to get started with docker.
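For the two-redis case above, a minimal sketch could look like the following. The helper names start_redis and stop_redis, the container naming scheme, and the redis:7 tag are assumptions for illustration; the DOCKER variable exists only so the commands can be previewed (DOCKER=echo) without running anything.

```sh
#!/bin/sh
# DOCKER can be overridden (e.g. DOCKER=echo) to print the commands instead of running them.
DOCKER=${DOCKER:-docker}

# Hypothetical helper: start one Redis container per project.
# $1 = project name, $2 = host port mapped to Redis' default port 6379.
start_redis() {
    $DOCKER run -d --name "redis-$1" -p "$2:6379" redis:7
}

# Remove a project's Redis container once the experiment is over.
stop_redis() {
    $DOCKER rm -f "redis-$1"
}
```

Running start_redis blog 6380 and then start_redis shop 6381 would give two isolated Redis servers on different host ports, and stop_redis blog throws one away without touching the host's packages.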

How does RunKit make their virtual servers?

There are many websites providing cloud coding, such as Cloud9 and repl.it. They must use server virtualisation technologies. For example, Cloud9's workspaces are powered by Docker Ubuntu containers. Every workspace is a fully self-contained VM (see details).
I would like to know if there are other technologies for creating sandboxed environments. For example, RunKit seems to have a lightweight solution:
It runs a completely standard copy of Node.js on a virtual server
created just for you. Every one of npm's 300,000+ packages are
pre-installed, so try it out
Does anyone know how RunKit achieves this?
You can see more in "Tonic is now RunKit - A Part of Stripe!" (see discussion)
we attacked the problem of time traveling debugging not at the application level, but directly on the OS by using the bleeding edge virtualization tools of CRIU on top of Docker.
The details are in "Time Traveling in Node.js Notebooks"
we were able to take a different approach thanks to an ambitious open source project called CRIU (which stands for checkpoint and restore in user space).
The name says it all. CRIU aims to give you the same checkpointing capability for a process tree that virtual machines give you for an entire computer.
This is no small task: CRIU incorporates a lot of lessons learned from earlier attempts at similar functionality, and years of discussion and work with the Linux kernel team. The most common use case of CRIU is to allow migrating containers from one computer to another
The next step was to get CRIU working well with Docker
Part of that setup is being open-sourced, as mentioned in this Hacker News thread.
It uses Linux containers, currently powered by Docker.

What is the safest way to deliver an Application to novice Linux users?

My customers are novice Linux users, and so am I.
When I gave them my app packaged with Ansible, they ran into Ansible problems; when I gave them manual steps, they also got those wrong. Now I have three options left: a perl/bash script, a snappy/deb/rpm package, or Linux containers. Can anyone share their experience on the safest way to see fewer problems when installing my app (written in C)?
This depends on the nature of your application. Debs, rpms etc. are all fine but depend on which distro you're using.
If it's a C application, it might make sense to make it a static binary. That way, they'll have to download a single file and just click on it to make it run. It will be big, but it should work fine regardless of what else is there. Otherwise, you'll have to worry about dependencies etc.
As was commented before, it depends on what you did to deploy the product.
In general, if you have dependencies (previous packages that you assume were already installed) or your installation is complex - use rpm or deb.
However, if you target multiple platforms, bear in mind you will have at least two releases (one rpm and one deb...)
If configuration or installation is easier you can just give them an install script.
If your application requires a specific environment with specific configuration/packages I'd consider containers, although I have never done that personally.

How to be able to "move" all necessary libraries that a script requires when moving to a new machine

We work on scientific computing and regularly submit calculations to different computing clusters. For that we connect through a Linux shell and submit jobs through SGE, Slurm, etc. (it depends on the cluster). Our codes are composed of python and bash scripts and several binaries. Some of them depend on external libraries such as matplotlib. When we start to use a new cluster, it is a nightmare since we need to tell the admins all the libraries we need, and sometimes they cannot install all of them, or they only have old versions that cannot be upgraded. So we wonder what we could do here. I was wondering if we could somehow "pack" all the libraries we need along with our codes. Do you think it is possible? Otherwise, how could we move to new clusters without the need for admins to install anything?
The key is to compile all the code you need by yourself, using the compiler/library/MPI toolchains installed by the admins of the clusters, so that
your software is compiled properly for the cluster hardware, and
you do not depend on the admin to install the software.
The following are very useful in this case:
Ansible, to upload/manage configuration files, rc files, set permissions, compile your binaries, etc. and deploy a new environment easily on new clusters
Easybuild to install your version of Python with all the needed dependencies, and install other scientific software thanks to the community supported build procedures
CDE to build a package with all dependencies for your binaries on your laptop and use it as-is on the clusters.
More specifically for Python, you can use
virtual envs to set up a consistent set of Python modules across all clusters, independently from the modules already installed; or
Anaconda or Canopy to use a Python scientific distribution
to have a consistent Python install across all clusters.
Don't get me wrong, but I think what you have to do is: stop behaving like amateurs.
Meaning: the integrity of your "system configuration" is one of the core assets of your "business". And you just told us that you are basically unable to easily reproduce your system configuration.
So, the real answer here can't be a recommendation to use this or that technology. The real answer is: you, and the other teams involved in running your operations need to come together and define a serious strategy how to fix this.
Maybe you then decide that the way to go is that your development team provides Docker build files, so that your operations team can easily create images on new machines. Or you decide that you need to use something like Ansible to enable centralized control over your complete environment.
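That's what venv is for: it lets you easily create a self-contained, customized environment with exactly what you need and nothing more. Note that a venv is tied to the machine it was created on, so on each cluster you recreate it from a requirements file rather than copying the directory.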
I completely agree with https://stackoverflow.com/users/1531124/ghostcat
but here is the really bad answer that will cause you a lot of problems in near future!!!:
If you need some dynamic libraries and you are not planning to upgrade them in the future, you can try copying all the needed libs to a folder in your app and use a script to launch the app:
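#!/bin/sh
# Append the bundled library folder; skip the ":" when LD_LIBRARY_PATH is unset,
# because an empty entry makes the loader search the current directory.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/path/to/your/lib/folder"
# Replace the shell with the application and pass along any arguments.
exec ./myAPP "$@"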
but keep in mind that this is bad practice.
Create a chroot image, like here - click. Install everything you need and then you can just chroot into it on any machine.
I work on scientific clusters as well, and you are going to find that wherever you go.
I would only rely on the admins for installing the most basic stuff. That is:
- Software necessary to build your software or run the most basic stuff: compilers and most basic utilities (python, perl, binutils, autotools, cmake, etc.).
- Software libraries that make use of I/O devices: MPI, file I/O libraries...
- A queue system (they already have it most of the time).
- Environment modules. This is not a must, but it really helps you get the job done, especially if you mess with different library versions or implementations (that's my case, for example).
From that point on, you can build and install in your own directories all the software you use most of the time.
This does not mean that you cannot ask an admin to install some libraries. If you feel that many people are going to benefit from that, then you should request its installation. In addition, you may need some specific version or some special features which are not used most of the time, but you really need them. A very good example is BLAS libraries (Basic Linear Algebra Subprograms):
You have lots of BLAS implementations available: the original BLAS, Intel MKL, OpenBLAS, ATLAS, cuBLAS.
If that is not enough, the open source versions usually offer multiple configuration options: serial version, parallel version with PThreads, parallel version with OpenMP, parallel version with MPI...
In my particular case, most of the software that I felt was necessary for many users in the cluster ended up being installed by the admins without any problem (either I or other users requested it), but you also have to keep in mind that a cluster can have many users, and a single person/team cannot attend to every specific requirement you have, especially when you are able to take care of it yourself.
I think you want to containerize your application in some way. Two main options (because docker/rkt and similar things are way too heavyweight for your task if I understand it correctly) in my opinion are runc and snappy.
Runc relies on the OCI runtime specification. You need to create an environment (very similar to a chroot environment, in that you need to copy everything your software uses into one directory) and then you'll be able to run your application with the runc tool. Runc itself is just one binary. At the moment it requires root privileges to run (hello, cluster admins), but there are patches at least partly solving that, so if you build your own runc and nothing blocks you with respect to root privilege requirements, you may be able to run your application with no administration overhead at all.
Snappy is similar in that you need to prepare a snap package for your application, this time using snapcraft as an assistant tool. Snappy is probably a bit easier in creating an application image and IMO is certainly better for long-term support because it clearly separates your application from the data (kinda W^X, application image is a read-only squashfs file and application can only write to a limited set of directories). But at the moment it will require your cluster admins to install snapd and to perform some operations like snap installation that require root privileges. Still, it should be better than your current situation, because that's just one non-intrusive package to install.
If these tools don't fit for some reason, there is always the option of making something of your own. That won't be easy and there are many subtle details that can bite you when doing that, but it can be done: compile all of your dependencies and applications into some path, create wrapper scripts that set up the PATH and LD_LIBRARY_PATH environment for your components, bring that directory onto the new cluster, and run the wrapper scripts instead of the target binaries. It's similar to what XAMPP does; they have quite a number of integrated things packaged into one directory that works across many distributions.
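A wrapper for the roll-your-own approach might be sketched like this. The function name use_bundle and the bin/ and lib/ layout inside the bundle directory are assumptions for the example, not something the answer prescribes.

```sh
#!/bin/sh
# Hypothetical layout: <bundle>/bin holds executables, <bundle>/lib holds shared libraries.
# Prepend the bundle's directories so they take precedence over system copies.
use_bundle() {
    bundle=$1
    PATH="$bundle/bin${PATH:+:$PATH}"
    # Only add a ":" separator when LD_LIBRARY_PATH already has a value.
    LD_LIBRARY_PATH="$bundle/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
}
```

A wrapper script for a given tool would call use_bundle with the directory you copied to the cluster and then exec the real binary with "$@", so users never have to set these variables by hand.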
update
Let's also add AppImage into the mix. Theoretically it can be a savior for your case, as it specifically does not require root privileges. It's kind of in between Snappy and rolling your own: you need to prepare your application directory yourself (snappy can manage some of the dependencies with snapcraft when you just specify "I need this Ubuntu package") and add appropriate metadata, and then it can be packaged into a single executable.

Remote software update on Linux machines

We develop a Linux-based networking application which will run on multiple servers. We need to develop a solution for remote application updates.
All I can think of now is using rpm/deb packages but we prefer not to lock this to some distro-specific solution. Besides copying files via SSH by some Bash script what would you recommend?
Thanks.
Distros vary so much in setup and dependencies that I would actually recommend you create distro-specific packages and integrate with each distro's update tool - in the end it normally saves you a ton of trouble.
With the ease of virtualization, it's rather easy to spin up a vmware/virtualbox image for the various distros to create/test packaging for each of them.
How about puppet?
Check out Blueprint and Blueprint I/O. Blueprint is a tool that detects all of the packages, file modifications and source installs on a server. It packages them up in a reusable format called a blueprint that can be applied to another server. Blueprint I/O is a tool for pushing to and pulling from another server. Both are open-source. Hope this helps.
https://github.com/devstructure/blueprint (Blueprint # Github)
https://github.com/devstructure/blueprint-io (Blueprint I/O # Github)
I'm eight years late, but check Ansible.
Ansible is a radically simple IT automation platform that makes your
applications and systems easier to deploy. Avoid writing scripts or
custom code to deploy and update your applications— automate in a
language that approaches plain English, using SSH, with no agents to
install on remote systems.
Also, you can check this guide.
