Can I install different software packages on one AWS EC2 virtual machine? - node.js

I want to ask a question as I'm new to AWS.
On one Ubuntu EC2 instance I installed InfluxDB and it is running, so I want to know: can I install Node.js on that same instance? Would the Node.js installation affect InfluxDB?
Basically, I want to run a background Node.js script that stays live forever, to insert data into InfluxDB from a server.
Would I need to launch a separate virtual machine to run that script, or can it run on that same virtual machine?

Generally speaking, you can install and run any software on a single EC2 instance. The only limit is the underlying resources, meaning whether the instance has sufficient memory, CPU, disk I/O or network bandwidth to run all of it.
In practice, any decision you make will have trade-offs, and it's always good to be aware of them.
In your case, I can give you some pros and cons of the two approaches.
Same-instance installation
Pros: easy to configure, as your script runs on the same instance as your InfluxDB. Also, if your NodeJS script has a small resource footprint, this approach is possibly cheaper as well.
Cons: if you are running a cluster of multiple InfluxDB instances, having a copy of the NodeJS script on every InfluxDB instance makes those instances hard to maintain, deploy, update and monitor.
This approach is only recommended if you are running a single-node InfluxDB.
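As an illustration of the same-instance setup, here is a minimal sketch of such a background writer. It assumes the "influx" npm client and an InfluxDB listening on localhost:8086; the database and measurement names are placeholders, not anything from your setup.

// writer.js - minimal background writer (sketch, not production code)
const Influx = require('influx');

// Assumes a single-node InfluxDB on the same instance; "mydb" is a placeholder.
const influx = new Influx.InfluxDB({
  host: 'localhost',
  port: 8086,
  database: 'mydb',
});

// Write one point every 10 seconds.
setInterval(async () => {
  try {
    await influx.writePoints([
      { measurement: 'heartbeat', fields: { value: 1 } },
    ]);
  } catch (err) {
    console.error('InfluxDB write failed:', err.message);
  }
}, 10 * 1000);

To keep a script like this live forever, run it under a process manager such as pm2 or a systemd service so it is restarted after crashes and reboots.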
Dedicated installation
Pros: easy to scale, manage, deploy and update, with better availability.
You can have a dedicated cluster of InfluxDB and another, much smaller cluster for your NodeJS scripts.
This separation gives you a more reliable InfluxDB cluster, since you will usually update your NodeJS script more often than your InfluxDB software. Having a dedicated NodeJS cluster gives you peace of mind that even if your script has a critical bug, your InfluxDB cluster is still running fine.
Cons: harder to configure. You also need to deal with the distributed nature of your system, as your script is now hosted on different instances from your InfluxDB. This approach is also more expensive.
You should consider this approach if you are running an InfluxDB cluster.

Related

Why does nobody build everything into one Docker container? (All-in-one container / "black box")

I need a lot of different web applications and microservices.
I also need easy backup/restore and the ability to move everything between servers/cloud providers.
I started studying Docker for this, and I get confused when I see advice like: "create a first container for your application, create a second container for your database and link them together".
But why do I need a separate container for the database? If I understand correctly, Docker's main message is "run and move applications with all their dependencies in an isolated environment". So, as I understand it, it is appropriate to put the application and all its dependencies in one container (especially if it's a small application with no requirement for an external database).
Here is how I see the best way to use Docker in my case (a rough Dockerfile sketch of this idea follows the list):
Take a base image (e.g. phusion/baseimage).
Build my own image based on this (with nginx, the database and the application code).
Expose a port for interaction with my application.
Create a data volume based on this image on the target server (to store application data, the database, uploads etc.) or restore the data volume from a previous backup.
Run this container and have fun.
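A rough sketch of what such an all-in-one image could look like; the base image tag, packages and paths are examples for illustration only, not a recommendation:

# All-in-one image: init system, nginx, database and application together.
FROM phusion/baseimage:0.11

# Install nginx and a database inside the same image (package choice is an example).
RUN apt-get update && apt-get install -y nginx postgresql && rm -rf /var/lib/apt/lists/*

# Application code (path is hypothetical).
COPY . /opt/app

# Single port exposed to the outside world.
EXPOSE 80

# Database files and uploads live in volumes so they can be backed up and moved.
VOLUME ["/var/lib/postgresql", "/opt/app/uploads"]

# phusion/baseimage ships its own init (my_init) that starts the registered services.
CMD ["/sbin/my_init"]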
Pros:
Easy to back up/restore/move the whole application (move only the data volume and simply start the container on a new server/environment).
The application is a "black box" with no external dependencies to cause headaches.
If I need to store data in external databases or consume data from them, nothing prevents me from doing so (but usually it is not necessary). And I prefer to use the API of other black boxes instead of accessing their databases directly.
More isolation and security than in the case of a single database shared by all containers.
Cons:
Greater consumption of RAM and disk space.
A little bit harder to scale. (If I need several instances of the app to handle thousands of requests per second, I can move the database into a separate container and link several app instances to it. But that is needed only in very rare cases.)
Why can't I find recommendations for this approach? What's wrong with it? What pitfalls have I not seen?
First of all you need to understand that a Docker container is not a virtual machine, just a wrapper around the kernel features chroot, cgroups and namespaces, using layered filesystems, with its own packaging format. A virtual machine is usually a heavyweight, stateful artifact with extensive configuration options regarding the resources available on the host machine, and you can set up complex environments within a VM.
A container is a lightweight, throwaway runtime environment with a recommendation to keep it as stateless as possible. All changes are stored within the container, which is just a running instance of the image, and you'll lose all diffs if the container is deleted. Of course you can map volumes for more persistent data, but this is available in a multi-container architecture too.
If you pack everything into one container you lose the ability to scale the components independently of each other, and you build in tight coupling.
With this tight coupling you can't implement fail-over, redundancy and scalability features in your app configuration. Most modern NoSQL databases are built to scale out easily, and data redundancy becomes possible when you run more than one backing database instance.
On the other side, defining these single-responsibility containers is easy with docker-compose, where you can declare them in a simple yml file.
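For example, a minimal docker-compose sketch of the app-plus-database split (the service names, images and ports here are illustrative, not taken from the question):

version: "3"
services:
  app:
    build: .                  # your application image
    ports:
      - "80:8080"             # publish the app, not the database
    depends_on:
      - db
  db:
    image: postgres:13        # single-responsibility database container
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - db-data:/var/lib/postgresql/data   # data survives container deletion
volumes:
  db-data:

With this layout the database can be upgraded, scaled or backed up without touching the application container, which is exactly the decoupling described above.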

How should I organize the containers in Docker?

I am developing new content now, so I am building the server.
On my server the base system is CentOS 7. I installed Docker, pulled the CentOS image, and set up a "web server" container running Django with uWSGI and nginx.
However, I now want to bring up another service (a database with Postgres). What is the best way to do it?
Install Postgres in my existing container (with the web server), or
Build a new container only for the database.
I want to know the advantages and weak points of each.
It's idiomatic to use two separate containers. Also, this is simpler - if you have two or more processes in a container, you need a parent process to monitor them (typically people use a process manager such as supervisord). With only one process, you won't need to do this.
By monitoring, I mainly mean that you need to make sure that all processes are correctly shut down when the container is stopped (Docker sends SIGTERM, followed by SIGKILL after a timeout). If you don't do this properly, you will end up with zombie processes. You won't need to worry about this if you only have a single process or use a process manager.
Further, as Greg points out, having separate containers allows you to orchestrate and schedule the containers separately, so you can do update/change/scale/restart each container without affecting the other one.
If you want to keep the data in the database after a restart, the database shouldn't be in a container but on the host. I will assume you want the db in a container as well.
Setting up a second container is a lot more work. You need to find a way for the containers to know each other's addresses. The address changes each time you start a container, so you need to write some scripts on the host. The host must find out the IP addresses and inform the containers.
The containers might want to update their /etc/hosts file with the address of the other container. When you want to emulate different servers and perform resilience tests, this is a nice solution. You will need quite a bit of bash knowledge before you get this running well.
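A minimal sketch of the kind of host-side glue this describes, assuming two containers named db and web (the names and the image are hypothetical):

# Find the database container's current IP address on the host...
DB_IP=$(docker inspect -f '{{ .NetworkSettings.IPAddress }}' db)
# ...and hand it to the web container as an /etc/hosts entry.
docker run -d --name web --add-host "db:${DB_IP}" my-web-image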
In almost all other situations, choose one container. Installing everything in one container is easier to set up and to develop with afterwards. Docker is just the environment in which you want to do your real work; tooling should help you with that work, not take all your time and effort.

docker and product versions

I am working for a product company and we make a lot of releases of the product. In the current approach to testing multiple releases, we create a separate VM and install all the infrastructure software (db, app server, etc.) on top of it. Later we deploy the application WARs on the respective VM. Recently I came across Docker and it seems to be very helpful, so I started exploring it with the examples listed on the site. But I am not able to figure out how Docker can be applied to build environments suitable for the various releases.
Each product version will have db schema changes.
Each application WARs will have enhancements/defects etc.
Consider below example.
Every month our company releases a new version of the software, and in order to support it and fix defects we create VMs per release. The application's overall size is about 2 GB, while the OS takes close to 5 GB (apart from disk space, it also takes up system resources as extra overhead). The VMs are required to restore any release and test any support issues reported against it. But looking at the additional infrastructure requirements, this seems like a very costly affair.
Can docker have everything required to run an application inside a container/image?
Can Docker pack an application which consists of multiple WARs/DB schemas and allocate the appropriate ports when started?
Will there be any space/memory/speed differences between a VM and Docker in the above scenario?
Do you think Docker is still an appropriate solution, or should we continue using VMs? Can someone share pointers on how I can achieve the above requirements with Docker?
tl;dr: Yes, docker can run most applications inside a container.
Docker runs a single process inside each container. When using VMs or real servers, this one process is usually the init system which starts all system services. With docker it is usually your app.
This difference will get you faster startup times for your app (not starting the whole operating system). The trade off is that, if you depend on system services (such as cron, sshd…) you will need to start them yourself. There are some base images that provide a more "VM-like" environment… check phusion's baseimage for instance. To start more than a single process, you can also use a process manager such as supervisord.
Going forward, the recommended (although not required) approach is to start one process in each container (one per application server, one per database server, and so on) and not use containers as VMs.
Docker has no problems allocating ports either. It even has an explicit instruction for this in the Dockerfile: EXPOSE. Exposed ports can also be published on the Docker host with the --publish argument of docker run, so you don't even need to know the IP assigned to the container.
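For instance (the port numbers and image name below are just placeholders):

# In the image's Dockerfile, the application port is declared with: EXPOSE 8080
# On the host, publish container port 8080 as host port 80:
docker run -d --publish 80:8080 my-app-image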
Regarding disk space, you will probably see significant savings. Docker images are created by stacking filesystem layers… this means that the common layers are only stored once on the server. In your setup, you will likely have only one copy of the base operating system layer (with VMs, you have a copy in each VM).
On memory you will probably see less significant savings (mostly from not starting all the operating system services). Speed is still a subject of research… A few things are clear so far: for faster I/O you will need to use Docker volumes, and for network-heavy use cases you should use host networking. Check the IBM research paper "An Updated Performance Comparison of Virtual Machines and Linux Containers" for details, or a summary like InfoQ's.

What is the benefit of Docker container for a memcached instance?

One of the Docker examples is a container with Memcached configured. I'm wondering why one would want this versus a VM configured with Memcached? I'm guessing that it would make no sense to have more than one Memcached Docker container running under the same host, and that the only real advantage is the speed advantage of "spinning up" the Memcached stack in a Docker container vs Memcached via a VM. Is this correct?
Also, how does one set the memory to be used by memcached in the docker container? How would this work if there were two or more docker containers with Memcached under one host? (I'm assuming again that two or more would not make sense).
I'm wondering why one would want this versus a VM configured with Memcached?
Security: If someone breaks memcached and trojans the filesystem, it doesn't matter -- the filesystem gets thrown away when you start a new memcached.
Isolation: You can hard-limit each container to prevent it from using too much RAM.
Standardization: Currently, each app/database/cache/load balancer must record what to install, what to configure and what to run. There is no standard (and no lack of tools such as puppet, chef, etc.). But these tools are very complex, not really OS independent (despite their claims), and carry the same complexity from development to deployment.
With docker, everything is just a container started with run BLAH. If your app has 5 layers, you just have 5 containers to run, with a tiny bit of orchestration on top. Developers never need to "look into the container" unless they are developing at that layer.
Resources: You can spin up 1000's of docker containers on an ordinary PC, but you would have trouble spinning up 100's of VMs. The limit is both CPU and RAM. Docker containers are just processes in an "enhanced" chroot. On a VM, there are dozens of background processes (cron, logrotation, syslog, etc), but there are no extra processes for docker.
I'm guessing that it would make no sense to have more than one memcached docker container running under the same host
It depends. There are cases where you want to split up your RAM into parcels instead of globally. (i.e. imagine if you want to devote 20% of your cache to caching users, and 40% of your cache to caching files, etc.)
Also, most sharding schemes are hard to expand, so people often start with many 'virtual' shards, then expand on to physical boxes when needed. So you might start with your app knowing about 20 memcached instances (chosen based on object ID). At first, all 20 run on one physical server. But later you split them onto 2 servers (10/10), then later onto 5 servers (4/4/4/4) and finally onto 20 physical servers (1 memcached each). Thus, you can scale your app 20x just by moving VMs around and not changing your app.
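A tiny sketch of what "chosen based on object ID" might look like in the application (the hostnames are made up): the shard list stays fixed at 20 entries while only the physical placement behind those hostnames changes as you scale out.

// 20 "virtual" memcached shards; only DNS/placement changes during scale-out.
const shards = Array.from({ length: 20 }, (_, i) => `memcached-${i}.internal:11211`);

// Pick a shard deterministically from the object ID.
function shardFor(objectId) {
  return shards[objectId % shards.length];
}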
the only real advantage is speed advantage of "spinning up" the memcached stack in a docker container vs Memcached via a VM. Is this correct?
No, that's just a slight side benefit; see above.
Also, how does one set the memory to be used by memcached in the docker container?
In the docker run command, just use -m.
How would this work if there were two or more docker containers with Memcached under one host? (I'm assuming again that two or more would not make sense).
Same way. If you didn't set a memory limit, it would be exactly like running 2 memcached processes on the host. (If one fills up the memory, both will get out of memory errors.)
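For example, something along these lines (the container names and limits are illustrative, assuming the official memcached image):

# Two memcached containers on one host, each hard-limited to 256 MB by Docker.
docker run -d --name cache-users -m 256m memcached
docker run -d --name cache-files -m 256m memcached
# memcached's own -m flag (e.g. "memcached -m 200") can additionally cap its internal cache size.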
There seem to be two questions here...
1 - The benefit is as you describe. You can sandbox the memcached instance (and its configuration) into separate containers, so you could run multiple of them on a given host. In addition, moving the memcached instance to another host is pretty trivial and, in the worst case, just requires an update to the application configuration.
2 - docker run -m <inbytes> <memcached-image> would limit the amount of memory a memcached container could consume. You can run as many of these as you want under a single host.
I might be missing something here, but memcached only says something about memory usage, right? Docker containers are very efficient in disk space usage as well. You don't need a full OS for every instance the way you do with VMs, because resources are shared. There is an insightful explanation with pictures on the docker.io website.

Recommended approach & tools to provision VM instance(s) from node.js?

I am trying to implement a 'lab in the cloud' to allow people to have a sandbox to experiment and learn in; i.e. for devops (chef/puppet), installing or configuring software etc.
I have a node.js server implementation to manage this, and I am looking for sane and reasonable ways to attack this problem.
The options are bewilderingly diverse: puppet or chef directly, or vagrant, all seem appropriate. But Openstack, cloudfoundry and Amazon EC2 also provide their own feature sets.
A micro-cloud solution (multiple VMs per instance) would be ideal, as there isn't going to be any large computational load.
Suggestions most appreciated.
Cheers
After some investigation, it seems that LXC on EC2 might be the way forward:
It gives:
lightweight instances on a single EC2 instance
support for hibernate/restore
fast stand-up
the ability to automate using chef/cucumber
EC2 virtualization using LXC
Chef-lxc
Testing infrastructure code in LXC using Cucumber
