Datastax Enterprise - 5.0 Best Practices - Installation

Datastax Enterprise - 5.0 Best Practices - Installation - cassandra

I am currently evaluating the Datastax Enterprise 5 installation for my production system. There are many methods available for installation. When we choose runinstallter unattended method by DSE using option file it provide two modes
1. Service Based - Need root permission and binaries are installed in /usr/share/dse and /etc/dse.
2. No Service Based - Not need root and binaries can be installed on custom location equivalent to tar based installation without service based.
I have following questions -
Is there any best practice available which method is best suited for production installation ( in short any problem in running no service based runinstallter installation)
Is there a way we can modify runinstaller in service based installation to point to another dse home then /usr/share/dse and /etc/dse , something like /Cassandra which is owned by casandra user.
Any other best practice on the method of installation with is currently live in production without any issues.
Regards

Any of the methods specified here are fine for production installations
Not that I know of, you might want to look at using the Tarball installation if you need this level of configuration
There are a whole lot of things you need to think about when planning a cluster for DSE 5. I would start by looking at this list here.

I'm an OpsCenter developer who works on the Lifecycle Manager feature, so I'm more than a bit biased... but I think that OpsCenter LifeCycle Manager is an excellent way to install and manage DSE if you don't already have something like Chef or Ansible that you use enterprise-wide. It's a graphical webapp with a RESTful API in case you need to do any scripting of it. It deploys DSE using deb/rpm packages over SSH and can configure pretty much every DSE option there is.
As to your other questions:
Services vs no-services installations: You probably want a services-based installation. It behaves more like a "normal" linux service that can be managed with the 'service' command. A no-services install is primarily useful if you don't have root access because of very tight security policies in your org, and if you choose to go that route you'll need to decide how you want to manage DSE startup and shutdown (for clean reboots, for example).
The DSE installer can probably handle non-standard paths, but I'm not familiar enough with the details. LCM can handle some non-standard paths but not all of them (DSE will always be installed to the standard locations, for example). If you want to very tightly control every aspect of the install, tarball is your best choice. That's a lot of complexity, though, do you REALLY need to control every path?
The OpsCenter Best Practice service is probably the best list of recommended things to do in Prod, and is very easy to turn for LCM-managed clusters. But even if you don't use LCM, I recommend you set up OpsCenter so you can use the Best Practice Service.
You can find the OpsCenter install stesp at: https://docs.datastax.com/en/latest-opsc/opsc/LCM/opscLCMOverview.html.

Related

How to be able to "move" all necessary libraries that a script requires when moving to a new machine

We work on scientific computing and regularly submit calculations to different computing clusters. For that we connect using linux shell and submitting jobs through SGE, Slurm, etc (it depends on the cluster). Our codes are composed of python and bash scripts and several binaries. Some of them depend on external libraries such as matplotlib. When we start to use a new cluster, it is a nightmare since we need to tell the admins all the libraries we need, and sometimes they can not install all of them, or they only have old versions that can not be upgraded. So we wonder what could we do here. I was wondering if we could somehow "pack" all libraries we need along with our codes. Do you think it is possible? Otherwise, how could we move to new clusters without the need for admins to install anything?

The key is to compile all the code you need by yourself, using the compiler/library/MPI toolchains installed by the admins of the clusters, so that
your software is compiled properly for the cluster hardware, and
you do not depend on the admin to install the software.
The following are very useful in this case:
Ansible, to upload/manage configuration files, rc files, set permissions, compile your binaries, etc. and deploy a new environment easily on new clusters
Easybuild to install your version of Python with all the needed dependencies, and install other scientific software thanks to the community supported build procedures
CDE to build a package with all dependencies for your binaries on your laptop and use it as-is on the clusters.
More specifically for Python, you can use
virtual envs to setup a consistent set of Python modules across all clusters, independently from the modules already installed; or
Anaconda or Canopy to use a Python scientific distribution
to have a consistent Python install across all clusters.

Don't get me wrong, but I think what you have to do so: stop behaving like amateurs.
Meaning: the integrity of your "system configuration" is one of the core assets of your "business". And you just told us that you are basically unable of easily re-producing your system configuration.
So, the real answer here can't be a recommendation to use this or that technology. The real answer is: you, and the other teams involved in running your operations need to come together and define a serious strategy how to fix this.
Maybe you then decide that the way to go is that your development team provides Docker buildfiles, so that your operations team can easily create images on new machines. Or you decide that you need to use something like ansible to enable centralized control over your complete environment.

That's what venv is for, it allows you to create a portable customized environment easily, with exactly what you need and nothing more.

I completely agree with https://stackoverflow.com/users/1531124/ghostcat
but here is the really bad answer that will cause you a lot of problems in near future!!!:
if you need some dynamic library and you are not planning to upgrade them in future, you can try copying all needed libs to a folder in your app and use an script to launch the app:
#!/bin/sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/your/lib/folder
./myAPP
but keep in mind that this is bad practice.

Create a chroot image, like here - click. Install everything you need and then you can just chroot into it on any machine.

I work on scientific clusters as well, and you are going to find that wherever you go.
I would only rely on the admins on installing the most basic stuff. That is:
- Software necessary to build your software or run the most basic stuff: compilers and most basic utilities (python, perl, binutils, autotools, cmake, etc.).
Software libraries that make use of I/O devices: MPI, file I/O libraries...
A queue system (they already have it most of the time).
Environment modules. This is not a must, but it really helps you get the job done, specially if you mess with different library versions or implementations (that's my case, for example).
From that point on, you can build and install on your own directories all the software you use most of the time.
This does not mean that you cannot ask an admin to install some libraries. If you feel that many people is going to benefit from that, then you should request its installation. In addition, you may need some specific version or some special features which are not used most of the time, but you really need them. A very good example is with BLAS libraries (basic lineal algebra subroutines):
You have lots of BLAS implementations available: the original BLAS, Intel MKL, OpenBLAS, ATLAS, cuBLAS
If that is not enough, the open source versions usually offer multiple configuration options: serial version, parallel version with PThreads, parallel version with OpenMP, parallel version with MPI...
In my particular case, most of the software that I felt was necessary for many users in the cluster ended up being installed by the admins without any problem (either me or other users requested it), but you also have to keep in mind that in a cluster there can be many users and a single person/team is not able to attend the specific requirements you need, specially if you are able to do so.

I think you want to containerize your application in some way. Two main options (because docker/rkt and similar things are way too heavyweight for your task if I understand it correctly) in my opinion are runc and snappy.
Runc relies on OCI runtime specification, you need to create an environment (that is very similar to chroot environment in that you need to copy everything you software uses in one directory) and then you'll be able to run your application with runc tool. Runc itself is just one binary, at the moment it requires root privileges to run (hello, cluster admins), but there are patches at least partly solving that, so if you build your own runc and there are no blocking things wrt root privilege requirements you may be able to run your application with no administration overhead at all.
Snappy is similar in that you need to prepare a snap package for your application, this time using snapcraft as an assistant tool. Snappy is probably a bit easier in creating an application image and IMO is certainly better for long-term support because it clearly separates your application from the data (kinda W^X, application image is a read-only squashfs file and application can only write to a limited set of directories). But at the moment it will require your cluster admins to install snapd and to perform some operations like snap installation that require root privileges. Still, it should be better than your current situation, because that's just one non-intrusive package to install.
If these tools don't fit for some reason, there is always an option to make something of your own. That won't be easy and there are many subtle details that can bite you when doing that, but it can be done, compile all of your dependencies and applications into some path, create wrapper scripts to set up PATH and LD_LIBRARY_PATH environment for your components and then bring that directory into the new cluster, run wrapper scripts instead of target binaries and that's it. It's similar to what XAMPP does, they have quite a number of integrated things packaged into one directory that works across many distributions.
update
Let's also add AppImage into the mix, theoretically it can be a savior for your case, as it specifically does not require root privileges. It's kinda inbetween Snappy and rolling your own, as you need to prepare your application directory yourself (snappy can manage some of dependencies with snapcraft when you just specify "I need this Ubuntu package"), add appropriate metadata and then it can be packaged into single executable.

Using Vagrant, why is puppet provisioning better than a custom packaged box?

I'm creating a virtual machine to mimic our production web server so that I can share it with new developers to get them up to speed as quickly as possible. I've been through the Vagrant docs however I do not understand the advantage of using a generic base box and provisioning everything with Puppet versus packaging a custom box with everything already installed and configured. All I can think of is;
Advantages of using Puppet vs custom packaged box
Easy to keep everyone up to date - Ability to put manifests under
version control and share the repo so that other developers can
simply pull new updates and re-run puppet i.e. 'vagrant provision'.
Environment is documented in the manifests.
Ability to use puppet modules defined in production environment to
ensure identical environments.
Disadvantages of using Puppet vs custom packaged box
Takes longer to write the manifests than to simply install and
configure a custom packaged box.
Building the virtual machine the first time would take longer using
puppet than simply downloading a custom packaged box.
I feel like I must be missing some important details, can you think of any more?

Advantages:
As dependencies may change over time, building a new box from scratch will involve either manually removing packages, or throwing the box away and repeating the installation process by hand all over again. You could obviously automate the installation with a bash or some other type of script, but you'd be making calls to the native OS package manager, meaning it will only run on the operating system of your choice. In other words, you're boxed in ;)
As far as I know, Puppet (like Chef) contains a generic and operating system agnostic way to install packages, meaning manifests can be run on different operating systems without modification.
Additionally, those same scripts can be used to provision the production machine, meaning that the development machine and production will be practically identical.
Disadvantages:
Having to learn another DSL, when you may not be planning on ever switching your OS or production environment. You'll have to decide if the advantages are worth the time you'll spend setting it up. Personally, I think that having an abstract and repeatable package management/configuration strategy will save me lots of time in the future, but YMMV.

One great advantages not explicitly mentioned above is the fact that you'd be documenting your setup (properly), and your documentation will be the actual setup - not a (one-time) description of how things were/may have been intended to be.

NodeJS Managed Hostings vs VPS [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 2 years ago.
Improve this question
There are a bunch of managed cloud based hosting services for nodejs out there which seem relatively new and some still in Beta.
Yet another path to host a nodejs app is setting up a stack on a VPS like Linode.
I'm wondering what's the basic difference here between these two kinds of deployment.
Which factors should one consider in choosing one over another?
Which one is more suitable for production considering how young these services are.
To be clear I'm not asking on choosing a provider but to decide whether to host on a managed nodejs specific hosting or on an old fashioned self setup VPS.

Using one of the services is for the most part hands off - you write your code and let them worry about managing the box, keep your process up, creating the publishing channel, patching the OS, etc...
In contrast having your own VM gives you more control but with more up front and ongoing time investment.
Another consideration is some hosters and cloud providers offer proprietary or distinct variations on technologies. They have reasons for them and they offer value but it does mean that if you want to switch cloud providers, it might mean you have to rewrite code, deployment scripts etc... On the other hand using VMs with standard OS as the baseline is pretty generic. If you automate/script/document the configuration of your VMs and your code stays generic, then your options stay open. If you do take a dependency on a proprietary cloud technology then it would be good to abstract it away behind an interface so it's a decoupled component and not sprinkled throughout your code.
I've done both. I did the VM path recently mostly because I wanted the learning experience. I had to:
get the VM from the cloud provider
I had to update and patch the OS
I had to install and configure git as a publishing channel
I had to write some scripts and use things like forever to keep it running
I had to configure the reverse http-proxy to get it to run multiple sites.
I had to configure DNS with the cloud provider, open ports for git etc...
The list goes on. In the end, it cost me more up front time not coding but I learned about a lot more things. If those are important to you, then give it a shot. If you want to focus on writing your code, then a node hosting provider may be for you.
At the end of it, I had also had more options - I wanted to add a second site. I added an entry to my reverse proxy, append my script to start up another app with forever, voila, another site. More control. After that, I wanted to try out MongoDB - simple - installed it.
Cost wise they're about the same but if you start hosting multiple sites with many other packages like databases etc..., then the VM can start getting cheaper.
Nodejitsu open sourced their tools which also makes it easier if you do your own.
If you do it yourself, here's some links that may help you:
Keeping the server up:
https://github.com/nodejitsu/forever/
http://blog.nodejitsu.com/keep-a-nodejs-server-up-with-forever
https://github.com/bryanmacfarlane/svchost
Upstart and Monit
generic auto start and restart through monitoring
http://howtonode.org/deploying-node-upstart-monit
Cluster Node
Runs one process per core
http://nodejs.org/docs/latest/api/cluster.html
Reverse Proxy
https://github.com/nodejitsu/node-http-proxy
https://github.com/nodejitsu/node-http-proxy/issues/232
http://blog.nodejitsu.com/http-proxy-middlewares
https://github.com/nodejitsu/node-http-proxy/issues/168#issuecomment-3289492
http://blog.argteam.com/coding/hardening-node-js-for-production-part-2-using-nginx-to-avoid-node-js-load/
Script the install
https://github.com/bryanmacfarlane/svcinstall
Exit Shell Script Based on Process Exit Code
Publish Site
Using git to publish to a website

IMHO the biggest drawback of setting up your own stack is that you need to manage things like making Node.js run forever, start it as a daemon, bring it behind a reverse-proxy such as Nginx, and so on ... the great thing about Node.js - making firing up a web server a one-liner - is one of its biggest drawbacks when it comes to production-ready systems.
Plus, you've got all the issues of managing and updating and securing your server yourself.
This is so much easier with the hosters: Usually it's a git push and that's it. Scaling? Easy. Replication? Easy. ...? Easy. All within a few clicks.
The drawback with the hosters is that you can not adjust the environment. Okay, you can probably choose which version of Node.js and / or npm to run, but that's it. You have no control over what 3rd party software is installed. You've got no control over the OS. You've got no control over where the servers are located. And so on ...
Of course, some hosters allow you access to some of these things, but there is rarely a hoster that supports all.
So, basically the question regarding Node.js is the same as with each other technology: It's a pro vs con of individualism, pricing, scalabilty, reliability, knowledge, ...
I personally chose to go with a hoster: The time and effort I save easily outperform the disadvantages. Mind you: For me, personally.
This question needs to be answered individually.

Using Docker is another way to simplify the setup on single Linux VPS. With Docker both development and production setups are faster, more robust, and more secure.
The setup is faster and more robust because you will be deploying ready Node.js image at once, without running any installation scripts. And it would be more secure because internal dependencies, such as database, can be hidden from outside world completely and accessible only from Docker internal network. On top of it, Docker significantly simplifies the upgrade process for underlying OS and Node.js runtime.
There are two ways to setup Node.js Docker environment. The first one – follow the instruction published here how to dockerize your application and deploy it with Docker, alongside with databases when needed. The guide gives the instructions for the development setup, the production setup will be similar.
Another way would be deploying official Node.js docker image and mounting application code as a volume or a folder to Node.js image. That would allow to update Node.js image going forward without re-building and re-deploying the application. Such approach solves long-standing problem with security patching of Docker images.
To help out with the setup of Docker on single machine - you can use Abberit Admin Panel. It will set up Node.js environment for you with a click of a button, including databases if you need them. The tool is free, and you can turn it off after you have completed initial setup. On the other hand, if later you decide to reduce maintenance tax of the production - you can migrate into managed service without any changes in the app.
Disclaimer: I am one of the founders of Abberit.

Use "apt" or compile from scratch for a web service?

For the first time, I am writing a web service that will call upon external programs to process requests in batch. The front-end will accept file uploads and then place them in a queue. The workers on the backend will take that file, run it through ffmpeg and the rest of my pipeline, and send an email when the process is complete.
I have my backend process working on my computer (Ubuntu 10.04). The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
Not all hosting services uses Ubuntu, and not all give me root access. (I haven't chosen a host yet.) However, they will let me upload binaries to execute, and many give me shell access with gcc.
Usually this would be a no-brainier and I'd compile it all from scratch. But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - will be quite a task which ultimately doesn't really help me ship my product.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider? Should I try creating my own .deb, which contains Ubuntu's version of ffmpeg (and other tools) with the configurations I need?
Short of a setup where I manage my own servers/VMs (which may very well be what I have to do), how might I accomplish this?

The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
It is in reverse: it is not okay to deploy unpackaged in The Real World IMHO
and not all give me root access
How would you be deploying a .deb without root access. Chroot jails?
But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - will be quite a task which ultimately doesn't really help me ship my product.
+1 You answer you own question. Don't meddle unless you have to.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider?
Only depend on wellpackaged standard libs (such as ffmpeg). Otherwise include them in your own deployment. This problem isn't too hard too solve for 10s of thousand Linux applications over decades now, so it would probably be feasible for you too.
Out of the box:
Look at rightscale and other cloud providers/agents that have specialized images/tool chains especially for video encoding.
A 'regular' VPS provider (with Xen or Virtuozzo) will not normally be happy with these kinds of workload, but EC2, Rackspace and their lot will be absolutely fine with that.
In general, I wouldn't believe that a cloud infrastructure provider that doesn't grant root access will allow for computationally intensive workloads. $0.02

Maintaining software as a user (on a cluster)

Every cluster of computers I've encountered suffers from the same problem: its software is outdated. Naturally, one has the ability as a user to install everything from source in the home directory. I was wondering if there are any tools that would allow one to install and update software within home directory the same way package managers do in Linux distributions, i.e. with minimal pain and effort.
I have found toast, which is good, but not always reliable and up-to-date. Are there alternatives?
My particular needs at the moment are a recent version of GCC, boost, python, cmake.

I recommended using a sensible distribution for your cluster nodes. Then keeping the nodes up-to-date can be as simple as running the package manager, which you can even do via a distributed shell on all nodes at once. And for what it is worth, my choice would be Debian or Ubuntu.

You could try nix (http://nixos.org/). I haven't used it, so I don't know if it's more up-to-date than toast.

Either use a package manager that installs/updates on all cluster nodes transparently or create a directory that is shared (i.e. network file system) from all nodes

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string