Install Apache Spot - Linux

I need to install Apache Spot on Ubuntu.
http://spot.incubator.apache.org/doc/#installation
I have already set up a single-node Hadoop cluster following this guide:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
I have also installed Hive, Kafka, and Spark following the guides in the Apache documentation.
The main problem is that I'm not able to configure the file '/spot-setup/spot.conf' properly, because when I run the command:
./hdfs_setup.sh
the terminal displays:
sudo: unknown user: hdfs
sudo: unable to initialize policy plugin
./hdfs_setup.sh: line 48: hdfs: command not found
./hdfs_setup.sh: line 52: hdfs: command not found
./hdfs_setup.sh: line 62: impala-shell: command not found
my /etc/hosts file is:
127.0.0.1 localhost
127.0.1.1 osboxes
127.0.0.2 node03
127.0.0.3 node04
127.0.0.4 node16
Which values should I write in the rows of spot.conf?
Thank you very much.

The script output implies that Hadoop is not properly configured on that node. Instead of installing and configuring the dependencies individually, you can try the Cloudera QuickStart VM, which packages all the dependencies required for Apache Spot.
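If you would rather stay with your manual install, a quick first check (a rough sketch, assuming a plain Apache Hadoop single-node setup under a regular Ubuntu user) is whether the hdfs user exists and whether the hdfs and impala-shell binaries are on the PATH of the account running the script:
$ id hdfs                  # "no such user" matches the sudo error above
$ which hdfs impala-shell  # empty output means the commands are not on your PATH
$ echo $HADOOP_HOME        # should point to your Hadoop install directory
A plain Apache tarball install creates no hdfs system user and does not ship Impala at all; those come with CDH-style distributions, which is why the Cloudera QuickStart VM (or adjusting the user and database settings in spot.conf to match your own environment) is the simpler route.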

Related

Setting up Cassandra on Cloud9 IDE

I've followed these instructions to install Cassandra: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installDeb_t.html
When I run $ cqlsh, the terminal replies with:
Connection error: Could not connect to localhost:9160
I read that the issue might be with the configuration file cassandra.yaml.
However, it turned out I can't access it. My /etc/cassandra folder is empty (screenshot attached).
How do I access cassandra.yaml?
Where is Cassandra stored in my project?
Is there a way to check whether Cassandra is actually set up in my project?
The image you have attached shows the ~/.cassandra directory under your home dir. That's not the same as /etc/cassandra. You should be able to confirm this with the following command:
$ ls -al /etc/cassandra/cassandra.yaml
-rw-r--r-- 1 cassandra cassandra 43985 Mar 11 12:46 /etc/cassandra/cassandra.yaml
To verify if Cassandra is even running, this should work for you if you have successfully completed the packaged install:
$ sudo service cassandra status
Otherwise, simply running this should work, too:
$ ps -ef | grep cassandra
When you set up Cassandra, you'll want to set listen_address and rpc_address to the machine's hostname or IP. They're set to localhost by default, so if Cassandra is running, cqlsh should connect to it automatically.
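For example, the relevant cassandra.yaml lines might look like this (a minimal sketch; 192.168.1.10 is only a placeholder for your machine's address, and for a single-node setup the localhost defaults are fine):
listen_address: 192.168.1.10
rpc_address: 192.168.1.10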
My guess is that Cassandra is not starting for you. Check the system.log file, which (for the packaged install) is stored in /var/log/cassandra:
$ cat /var/log/cassandra/system.log
Check out that file, and you might find some clues as to what is happening here.
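If the file is long, filtering it for problems is a quick way to find the relevant lines (just a suggestion, assuming the default packaged-install log location):
$ grep -i -E 'error|exception' /var/log/cassandra/system.log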
Also, did you really install Cassandra 2.0? That version has been deprecated, so for a new install you shouldn't go any lower than Cassandra 2.1.

Could not connect to cassandra with cqlsh

I want to connect to Cassandra but get this error:
$ bin/cqlsh
Connection error: ('Unable to connect to any servers', {'192.168.1.200': error(10061, "Tried connecting to [('192.168.1.200', 9042)]. Last error: No connection could be made because the target machine actively refused it")})
Pretty simple.
The machine is actively refusing the connection because Cassandra is not running on it. Follow these steps to get rid of this trouble:
Install Cassandra from DataStax (DataStax-DDC; Cassandra version 3).
Go to ~\installation\path\DataStax-DDC\apache-cassandra\bin.
Open up cmd there. (Use Alt+F+P to open it if you are on Windows 8 or later.)
Type cassandra -f. This will print a lot of output, and the last line should be something like INFO 11:32:31 Created default superuser role 'cassandra'.
Now open another cmd window in the same folder.
Type cqlsh
This should give you a prompt, without any error.
I also discovered that this error doesn't pop up if I use Cassandra v2.x, found here: Archived version of Cassandra. I don't know why :( (if you find out, please comment).
So, if the above steps do not work, you can always go back to Cassandra v2.x.
Cheers.
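As an extra sanity check before retrying cqlsh, you can verify that something is actually listening on the CQL port (a sketch for the same Windows cmd window; 9042 is the default native-protocol port):
netstat -an | findstr 9042
If no line shows LISTENING, Cassandra is not up (or is still starting), and cqlsh will keep getting refused.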
Check that you have started the Cassandra server, then provide the host and port as arguments:
$ bin/cqlsh 127.0.0.1 9042
I ran into the same problem. This worked for me.
Go to any directory, for example E:\ (it doesn't have to be on the same disk as the Cassandra installation).
Create the following directories
E:\cassandra\storage\commitlogs
E:\cassandra\storage\data
E:\cassandra\storage\savedcaches
Then go to your Cassandra installation's conf path. In my case:
D:\DataStax-DDC\apache-cassandra\conf
Open cassandra.yaml. Edit the lines containing data_file_directories, commitlog_directory, and saved_caches_directory to look like the code below (change the paths according to where you created the folders):
data_file_directories:
- E:\cassandra\storage\data
commitlog_directory: E:\cassandra\storage\commitlogs
saved_caches_directory: E:\cassandra\storage\savedcaches
Then open cmd (I did it as administrator, but didn't check whether that is necessary) in your Cassandra installation's bin path. In my case:
D:\DataStax-DDC\apache-cassandra\bin
run cassandra -f
Lots of stuff will be logged to your screen.
You should now be able to run cqlsh and all other stuff without problems.
Edit: the operating system was Windows 10 64-bit.
Edit 2: if it stops working after a while, check whether the service is still running using nodetool status. If it isn't, follow this instruction.
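For reference, nodetool ships in the same bin directory as cassandra, so the check would roughly be (assuming the DataStax-DDC path used above):
cd D:\DataStax-DDC\apache-cassandra\bin
nodetool status
A healthy single node shows a line starting with UN (Up/Normal).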
I also faced the same problem on a 32-bit Windows 7 machine.
Check that you have Java installed correctly and the JAVA_HOME variable set.
Once you have checked the Java installation and set JAVA_HOME, uninstall Cassandra and install it again.
Hopefully this will solve the problem. Mine was solved after applying the above two steps.
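A quick way to check both points from cmd before reinstalling (just a sketch; your JDK path will differ):
java -version
echo %JAVA_HOME%
java -version should print a version banner, and the echo should expand to your JDK directory rather than printing the literal %JAVA_HOME%.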
You need to provide the host, user, and password for the cqlsh connection. The default cqlsh user is cassandra and the password is cassandra.
$ bin/cqlsh <host> -u cassandra -p cassandra
I also had the same problem. I tried many methods found on Google and YouTube, but none of them worked in my case. Finally, I applied the following 3 steps and it worked for me:
Create a folder without any spaces in its path on C: or D: (whichever is your system drive), e.g. C:\cassandra.
Install Cassandra in this folder instead of installing it in "Program Files".
After installation it will look like this: C:\cassandra\apache-cassandra-3.11.6
Copy Python 2.7 into the bin folder, i.e. C:\cassandra\apache-cassandra-3.11.6\bin.
Now your program is ready for work.
There is no special method to connect with cqlsh; it is as simple as:
$ bin/cqlsh 127.0.0.1 9042 (replace 127.0.0.1 with the host IP)
or, for an older version of Cassandra:
$ bin/cqlsh 127.0.0.1 9160
Don't forget to check port connectivity if you are connecting cqlsh to a remote host. You can also use a username/password if you have enabled authentication; it is disabled by default.
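For the remote-host case, a simple connectivity check from a Windows client could look like this (a sketch run from PowerShell, not cmd; replace the address with your own node):
Test-NetConnection 192.168.1.200 -Port 9042
TcpTestSucceeded : False usually means the node is down or a firewall is blocking the port.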

DataStax OpsCenter - can't connect with agents

I've installed DataStax OpsCenter (Apache Cassandra) and in OpsCenter there is an error: "0 of 1 agents connected". When I click "fix", enter credentials, and try to install nodes, I get this error:
Unable to SSH to some of the hosts
Unable to SSH to 127.0.0.1:
global name 'get_output' is not defined
Does anyone have any ideas how to fix it?
I fixed the problem with instructions from Stack Overflow.
The reason is that OpsCenter could not find the correct Cassandra config file (cassandra.yaml).
In my case I installed Cassandra to "D:\DataStax" instead of the default location "C:\Program Files\DataStax Community". Adding conf_location to opscenter\conf\clusters\local.conf solved my problem.
This is my final setting:
conf_location = [DataStax Install Dir]\apache-cassandra\conf\cassandra.yaml
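With the D:\DataStax install location mentioned above, that would presumably expand to something like (adjust to wherever apache-cassandra actually sits under your install directory):
conf_location = D:\DataStax\apache-cassandra\conf\cassandra.yaml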

Hadoop 2.2.0 multi-node cluster setup on EC2 - 4 Ubuntu 12.04 t2.micro identical instances

I have followed this tutorial to set up a Hadoop 2.2.0 multi-node cluster on Amazon EC2. I have had a number of issues with ssh and scp which I was either able to resolve or work around with the help of articles on Stack Overflow, but unfortunately I could not resolve the latest problem.
I am attaching the core configuration files core-site.xml, hdfs-site.xml etc. I am also attaching a log file with the output dumped when I run the start-dfs.sh command. It is the final step for starting the cluster, and it gives a mix of errors that I don't have a clue what to do with.
I have 4 nodes, all created from exactly the same AMI: Ubuntu 12.04 64-bit t2.micro instances with 8 GB of storage.
Namenode
SecondaryNode (SNN)
Slave1
Slave2
The configuration is almost the same as suggested in the tutorial mentioned above.
I have been able to connect with WinSCP and ssh from one instance to the other. I have copied all the configuration files, masters, slaves, and .pem files for security purposes, and the instances seem to be accessible from one another.
If someone could please look at the log, the config files, and the .bashrc file and let me know what I am doing wrong.
The same security group, HadoopEC2SecurityGroup, is used for all the instances. All TCP traffic is allowed and the ssh port is open. A screenshot is in the attached zipped folder. I am able to ssh from the Namenode to the secondary namenode (SNN), and the same goes for the slaves, which means that ssh is working; but when I start HDFS everything goes down. The error log is not throwing any useful exceptions either. All the files and screenshots can be found as a zipped folder here.
An excerpt from the error output on the console looks like:
Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
ec2-54-72-106-167.eu-west-1.compute.amazonaws.com]
You: ssh: Could not resolve hostname you: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
loaded: ssh: Could not resolve hostname loaded: Name or service not known
VM: ssh: Could not resolve hostname vm: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
Server: ssh: Could not resolve hostname server: Name or service not known
warning:: ssh: Could not resolve hostname warning:: Name or service not known
which: ssh: Could not resolve hostname which: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
might: ssh: Could not resolve hostname might: Name or service not known
.....
Add the following entries to .bashrc, where HADOOP_HOME is your Hadoop folder:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
The stack-guard warning printed by the native library is being parsed by the start scripts as a list of hostnames, which is why ssh tries to resolve words like "You" and "have"; these settings make Hadoop load the native library cleanly so the warning (and the bogus host list) goes away.
See also: "Hadoop 2.2.0: name or service not known Warning" and "hadoop 2.2.0 64-bit installing but cannot start".

Starting Hadoop without ssh'ing to localhost

I have a very tricky situation on my hands. I'm installing Hadoop on a few nodes which run Ubuntu 12.04, and our IT guys have created a user "hadoop" for me to use on all the nodes. The issue with this user is that it does not allow ssh to localhost because of some security constraints. So, I'm not able to start the Hadoop daemons at all.
I can connect to the node itself using "ssh hadoop@hadoops_address" but not using the loopback address. I also cannot make any changes to /etc/hosts. Is there a way I can tell Hadoop to ssh to itself using "ssh hadoop@hadoops_address" instead of "ssh hadoop@localhost"?
Hadoop reads the hostnames from the "masters" and "slaves" files, which are present inside the conf dir.
Edit those files and change the value from localhost to hadoops_address.
This should fix your problem.
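In concrete terms, the files would simply contain the node's real hostname (a sketch using the hadoops_address placeholder from the question; in Hadoop 2.x the slaves file usually lives under etc/hadoop rather than conf):
$ cat conf/masters
hadoops_address
$ cat conf/slaves
hadoops_address
With those entries the start scripts will ssh to hadoops_address as the hadoop user instead of trying to reach localhost.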
