I am a newcomer to Hadoop. For my college project we were given 4 VMs. I need to configure a multi-node Hadoop cluster on them (1 master, 3 slaves) and run my web app on it. I will be using HBase in my project. HDP is usually installed and deployed on CentOS, but I was given Ubuntu. I cannot use Apache Ambari for the installation because it is not supported on Ubuntu, so I need to deploy everything manually. Hence I went looking for tutorials.
I looked for a tutorial on installing multi-node HDP clusters on Ubuntu and found this: [http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/]
But it's too outdated (2010).
I also have the official documentation here and tried following it, but I am not able to follow it properly:
[http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1-latest/bk_installing_manually_book/content/rpm-chap2-3.html]
Could someone suggest some up-to-date links, ideally a tutorial with a decent number of screenshots, for installing a multi-node cluster on Ubuntu 14.04 (12.04 is also fine)?
Thanks a lot!!

The Michael Noll tutorial is too old, I think. I found this site:
I have a mini cluster (5 slaves and a master) in my university lab, running Hadoop 2.5.0 on Ubuntu 12.04. I also have a VM cluster on my laptop (2 slaves and a master) running Hadoop 1.2.1, also on Ubuntu 12.04.
But I couldn't install Hadoop (any version) on Ubuntu 14.04. I don't remember the cause, but I think it was some problem with the Java version (I didn't check that).
I hope that helps!

I came across the same issue when installing HDP 2.2 on Ubuntu 14.04, and found a solution.
I documented everything here: http://www.swiss-scalability.com/2014/12/install-hdp-22-on-ubuntu-1404-trusty.html
In a nutshell, the magic happens here:
sed -e "s/14.04/12.04/g" -i /etc/*-release
Then you can install or restart ambari-agent, and it will be able to communicate with ambari-server.
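A slightly more cautious variant of that one-liner, as a sketch only: it escapes the dot (unescaped, `.` matches any character) and keeps a `.bak` copy of every file it edits, so the change can be undone. The `spoof_release` function name and the throwaway demo file are made up for illustration; on a real node you would pass /etc/*-release and run it as root.

```shell
#!/bin/sh
# Rewrite the reported Ubuntu release from 14.04 to 12.04 in the given files.
# sed keeps a backup of each edited file next to it with a .bak suffix.
spoof_release() {
    sed -i.bak -e 's/14\.04/12.04/g' "$@"
}

# Demo on a temporary file instead of the real /etc/*-release.
demo_dir=$(mktemp -d)
printf 'DISTRIB_ID=Ubuntu\nDISTRIB_RELEASE=14.04\n' > "$demo_dir/lsb-release"
spoof_release "$demo_dir/lsb-release"
cat "$demo_dir/lsb-release"
```

To revert, copy each `.bak` file back over the edited one.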


Can CCM create command use a locally installed version?

I'm trying to create a Cassandra cluster locally on a single 64-bit Windows machine and followed these instructions.
I already have Cassandra 3.7 installed locally and assumed there would be a way to reuse that installation through ccm. But it looks like ccm always tries to download and install the Cassandra version. Looking through the ccm create [options] didn't give me any pointers.
Does this need to be followed instead for an already installed one?
You can create a cluster with ccm by using the --install-dir= parameter as described in the README.
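As a rough sketch of what that looks like, assuming the existing Cassandra 3.7 install lives in a directory such as the hypothetical `$HOME/apache-cassandra-3.7` (the path and the cluster name `local37` are placeholders, not values from the question). It is written in shell syntax; on Windows the same ccm arguments should apply from whichever shell you run ccm in.

```shell
#!/bin/sh
# Placeholder location of the already-installed Cassandra; adjust to your machine.
CASSANDRA_DIR="${CASSANDRA_DIR:-$HOME/apache-cassandra-3.7}"
CLUSTER_NAME="local37"   # arbitrary cluster name chosen for this example

# Show the command that would be run, so it can be checked first.
CREATE_CMD="ccm create $CLUSTER_NAME --install-dir=$CASSANDRA_DIR"
echo "$CREATE_CMD"

# Only call ccm if it is installed and the directory actually exists.
if command -v ccm >/dev/null 2>&1 && [ -d "$CASSANDRA_DIR" ]; then
    ccm create "$CLUSTER_NAME" --install-dir="$CASSANDRA_DIR"  # use the local install, no download
    ccm populate -n 3    # add three nodes to the cluster
    ccm start            # start all nodes
    ccm status           # show which nodes are up
fi
```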

Datastax Enterprise Installation on Virtual Box CentOS

Can anyone please guide me step by step through installing DataStax Enterprise on CentOS in VirtualBox?
I checked the DataStax documentation, but I got confused by a few of the steps. I also checked other resources, but I could not understand them completely.
So please help me understand the installation process one step at a time, including all the basic steps.
Thanks in advance.
You may have an easier time using OpsCenter's Lifecycle Manager to deploy DSE. (Disclaimer, I am a Lifecycle Manager dev so am biased.)
First you need to install OpsCenter in a separate VM or CentOS box. If you're able to get through the Java install and yum repository setup parts of DSE setup, this won't be difficult: https://docs.datastax.com/en/opscenter/6.0/opsc/install/opscInstallRHEL_t.html
Then run an install job from LCM: https://docs.datastax.com/en/opscenter/6.0/opsc/LCM/opscLCMinstallJob.html Examine the prerequisites section of that page carefully. It will show you the things you need to do in LCM to get ready to run the job. It's all point-and-click, though.
The only prerequisites on your target DSE machine are "python" (usually installed by default) and, for the moment, "which", though we'll be removing that dependency in an upcoming version.
Note that at the end of this process, you'll need to provide cqlsh an IP address, username, and password to connect to the cluster, even when making a "local" connection from your DSE VM. For example: "cqlsh -u cassandra -p the-password-you-chose-during-lcm-install"

Connecting SparkR to the spark cluster

I have a spark cluster running on 10 machines (1 - 10) with the master at machine 1. All of these run on CentOS 6.4.
I am trying to connect a jupyterhub installation (which is running inside an Ubuntu Docker container because of issues with installing on CentOS) to the cluster using SparkR, and get the Spark context.
The code I am using is
sc <- sparkR.init(master="spark://<master-ip>:7077")
The output I get is
attaching package: ‘SparkR’
The following object is masked from ‘package:stats’:
The following objects are masked from ‘package:base’:
intersect, sample, table
Launching java with spark-submit command spark-submit sparkr-shell/tmp/Rtmpzo6esw/backend_port29e74b83c7b3
Error in sparkR.init(master = "spark://"): JVM is not ready after 10 seconds
Error in sparkRSQL.init(sc): object 'sc' not found
I am using Spark 1.4.1. The spark cluster is also running CDH 5.
The jupyterhub installation can connect to the cluster via pyspark and I have python notebooks which use pyspark.
Can someone tell me what I am doing wrong?
I have a similar problem and have been searching all around, but found no solution. Can you please tell me what you mean by "jupyterhub installation (which is running inside a ubuntu docker because of issues with installing on CentOS)"?
We have 4 clusters too, on CentOS 6.4. One of my other problems is how to use an IDE like IPython or RStudio to interact with these 4 servers. Do I use my laptop to connect to these servers remotely (if yes, then how?), and if not, what other solution is there?
Now, to answer your question, I can give it a try. I think you have to use the --yarn-cluster option as stated here. I hope this helps you solve the problem.

Can I install and run a RHEL-based LXC on CentOS?

I have been playing with LXC for the past few days and was wondering whether this is indeed possible and, if so, how to make it work.
I'm using Docker and I highly recommend it, although it's not production-ready yet.
Here you can find the manual for installing on RHEL, which should work for CentOS as well: http://docs.docker.io/en/latest/installation/rhel/
Good luck.

Is EC2 Ubuntu 12.04 different, if compiling Haskell locally?

So I want to compile a Haskell program locally, and then upload it to my EC2 Ubuntu 12.04 (free trial) instance.
My question is: will it work on EC2 if I compile my Haskell program on an official Ubuntu 12.04 distribution (say, in VirtualBox)?
Or do I need exactly the same version of Ubuntu as Amazon is running? Does it have to have the exact same set of updates, etc.?
P.S. If yes, where do I get Amazon's version of Ubuntu?
I do this on a regular basis, and it should work just fine. Just make sure you're using the same architecture (32- or 64-bit).
You can get a list of the different Ubuntu AMIs at:
If you are using the official Ubuntu AMIs from https://cloud-images.ubuntu.com/releases/, you have the exact same binaries as the official Ubuntu distribution (as long as the architecture is the same: 32-bit or 64-bit). The only difference should be which packages are installed by default (so you might need to install a few extra packages). And as long as both are kept updated, both will also have the exact same set of updates.
Even if you are using AMIs created by someone else, it should still be the same; I believe most Ubuntu AMIs would be created by installing the official Ubuntu distribution.
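One way to check the architecture point from the answers above is to run the same small script on both the build machine and the EC2 instance and compare the output. This is only a sketch: the `describe_host` function name and the `BINARY` variable are made up for illustration, and the `ldd` step is an optional extra for spotting shared libraries that may need to be installed on the instance.

```shell
#!/bin/sh
# Print the facts that matter for moving a compiled binary between two Ubuntu machines.
describe_host() {
    echo "arch=$(uname -m)"                   # machine architecture, e.g. x86_64 or i686
    if command -v lsb_release >/dev/null 2>&1; then
        echo "release=$(lsb_release -rs)"     # distribution release, e.g. 12.04
    fi
}

describe_host

# Optional: list the shared libraries a compiled program depends on.
# BINARY is a placeholder; set it to the path of your executable before running.
if [ -n "${BINARY:-}" ] && [ -x "$BINARY" ]; then
    ldd "$BINARY"
fi
```

If `arch` and `release` match on both machines and every library listed by `ldd` is present on the instance, the binary built locally should run there.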
