Enable remote JMX monitoring for Spark running on DataStax Enterprise

What steps are required on a standard AMI DSE installation on AWS EC2 to enable remote JMX monitoring?
I followed these steps, but I am not able to connect using VisualVM:
Create /etc/dse/spark/conf/metrics.properties
Copy and paste the contents of https://github.com/apache/spark/blob/branch-1.4/conf/metrics.properties.template
Uncomment these lines:
org.apache.spark.metrics.sink.JmxSink
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Restarted DSE cluster via OpsCenter

By default, remote JMX monitoring is turned off in DSE 4.8.0 and later. Assuming the AMI is running a recent version of DSE, Spark will be running on a DSE node. The docs are here, but here's a summary:
Edit cassandra-env.sh and set the following:
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
LOCAL_JMX=no
Copy jmxremote.password.template from your JRE/JDK install location to /etc/cassandra and rename it to jmxremote.password:
chown cassandra:cassandra /etc/cassandra/jmxremote.password
chmod 400 /etc/cassandra/jmxremote.password
Edit jmxremote.password and add the user and password:
monitorRole QED
controlRole R&D
cassandra cass_password
Sorry about this one, but you also have to add the user with read/write permissions to:
/usr/lib/jvm/java-8-oracle/jre/lib/management/jmxremote.access
I did this install on Ubuntu with Oracle Java 8, so change the path to match your Java installation.
monitorRole readonly
cassandra readwrite
controlRole readwrite \
create javax.management.monitor.*,javax.management.timer.* \
unregister
Restart Cassandra and attach your JMX tools using the cassandra username and password.
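
The password-file step above could be sketched as a small shell function. This is only an illustration under stated assumptions: the name write_jmx_password is hypothetical, it writes a single user entry instead of copying the JRE template, and on a real node it would need to run as root with a chown to cassandra afterwards.

```shell
#!/bin/sh
# write_jmx_password FILE USER PASSWORD   (hypothetical helper, not part of DSE)
# Writes one "user password" entry and leaves the file readable by its owner only.
write_jmx_password() (
    file=$1
    rm -f "$file"                  # an old copy may be read-only (mode 400)
    umask 077                      # subshell body, so the umask change stays local
    printf '%s %s\n' "$2" "$3" > "$file"
    chmod 400 "$file"
)

# Example with the values from the answer above (needs root on a real node):
#   write_jmx_password /etc/cassandra/jmxremote.password cassandra cass_password
#   chown cassandra:cassandra /etc/cassandra/jmxremote.password
```

The 400 mode matters because the JVM refuses to start remote JMX authentication when the password file is readable by other users.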

Install Apache Spot

I need to install Apache Spot on Ubuntu.
http://spot.incubator.apache.org/doc/#installation
I have already set up a single-node Hadoop cluster following this guide:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
I have also installed Hive, Kafka and Spark following the guides in the Apache documentation.
The main problem is that I'm not able to configure the file '/spot-setup/spot.conf' properly, because when I run the command:
./hdfs_setup.sh
the Terminal displays:
sudo: unknown user: hdfs
sudo: unable to initialize policy plugin
./hdfs_setup.sh:line 48:hdfs:command not found
./hdfs_setup.sh:line 52:hdfs:command not found
./hdfs_setup.sh: line 62:impala-shell:command not found
my /etc/hosts file is:
127.0.0.1 localhost
127.0.1.1 osboxes
127.0.0.2 node03
127.0.0.3 node04
127.0.0.4 node16
Which values should I write in the rows of spot.conf?
Thank you very much.
The script output implies that Hadoop is not properly configured on that node. Instead of installing and configuring the dependencies individually, you can try the Cloudera QuickStart VM, which packages all the dependencies required for Apache Spot.
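
To see which prerequisites are missing before rerunning hdfs_setup.sh, a quick check along these lines may help. It is a sketch, not part of Apache Spot: the helper name missing_commands is hypothetical, and the names checked are simply the ones reported in the error output above.

```shell
#!/bin/sh
# missing_commands NAME...   (hypothetical helper)
# Prints each NAME that is not found on PATH, one per line.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# The two commands hdfs_setup.sh failed to find in the output above:
missing_commands hdfs impala-shell

# The script also runs commands as the hdfs user; id fails if that user is absent.
id hdfs >/dev/null 2>&1 || echo "user hdfs does not exist"
```

If anything is printed, the node is missing that piece, which matches the "command not found" and "unknown user" messages in the question.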

Java HotSpot(TM) Server VM warning in Cassandra

I am getting the following error while running Cassandra.
$ sudo service cassandra start
$ cassandra
Java HotSpot(TM) Server VM warning: Cannot open file /var/log/cassandra/gc.log due to Permission denied.
I guess you installed Cassandra from the repositories. Cassandra needs directories to store its data, and in your case it cannot create those directories because of permission problems. You have three options:
Become the root user using the command sudo su and run the command cassandra as the root user. You can issue the command sudo systemctl enable cassandra.service to run Cassandra automatically at startup.
Change the following settings in the cassandra.yaml file to locations where the user has permission, such as your home directory:
data_file_directories
commitlog_directory
saved_caches_directory
Add the line export CASSANDRA_HOME=path/to/cassandra to the user's .bashrc file, then run source .bashrc to reload it. This tells Cassandra where its install directory is, and it creates the necessary folders within it.
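
Before choosing one of the options above, it can help to confirm which paths the current user actually cannot write to. The following is a minimal sketch: the helper name unwritable_paths is hypothetical, and the two paths are the usual package-install defaults, assumed here rather than read from your cassandra.yaml.

```shell
#!/bin/sh
# unwritable_paths PATH...   (hypothetical helper)
# Prints every PATH the current user cannot write to (or that does not exist).
unwritable_paths() {
    for p in "$@"; do
        [ -w "$p" ] || printf '%s\n' "$p"
    done
}

# Assumed package-install defaults; adjust to the values in your cassandra.yaml.
unwritable_paths /var/log/cassandra /var/lib/cassandra
```

Any path printed is one the user running cassandra cannot write to, which is consistent with the "Permission denied" warning for gc.log.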

Setting up Cassandra on Cloud9 IDE

I've followed these instructions to install Cassandra: http://docs.datastax.com/en/cassandra/2.0/cassandra/install/installDeb_t.html
When I run $ cqlsh, the terminal replies with
Connection error: Could not connect to localhost:9160
I read that the issue might be with the configuration file cassandra.yaml.
However, it turned out I can't access it. My etc/cassandra folder is empty.
How do I access cassandra.yaml?
Where is Cassandra stored in my project?
Is there a way to check if Cassandra is actually set up in the project?
The image you attached shows the ~/.cassandra directory off of your home dir. That's not the same as /etc/cassandra. You should be able to confirm this with the following command:
$ ls -al /etc/cassandra/cassandra.yaml
-rw-r--r-- 1 cassandra cassandra 43985 Mar 11 12:46 /etc/cassandra/cassandra.yaml
To verify if Cassandra is even running, this should work for you if you have successfully completed the packaged install:
$ sudo service cassandra status
Otherwise, simply running this should work, too:
$ ps -ef | grep cassandra
When you set up Cassandra, you'll want to set the listen_address and rpc_address to the machine's hostname or IP. They're set to localhost by default, so if Cassandra is running, cqlsh should connect to it automatically.
My guess is that Cassandra is not starting for you. Check the system.log file, which (for the packaged install) is stored in /var/log/cassandra:
$ cat /var/log/cassandra/system.log
Check out that file, and you might find some clues as to what is happening here.
Also, did you really install Cassandra 2.0? That version has been deprecated, so for a new install you shouldn't go any lower than Cassandra 2.1.
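
To check which addresses Cassandra is configured to listen on, the two settings mentioned above can be pulled out of cassandra.yaml. This is a small sketch; the helper name show_addresses is hypothetical, and the path in the usage comment is the packaged-install location shown earlier.

```shell
#!/bin/sh
# show_addresses FILE   (hypothetical helper)
# Prints the uncommented listen_address and rpc_address lines of a cassandra.yaml.
show_addresses() {
    grep -E '^(listen_address|rpc_address):' "$1"
}

# Usage with the packaged-install path:
#   show_addresses /etc/cassandra/cassandra.yaml
```

If both lines show localhost, cqlsh with no arguments should target the right address, and the problem is more likely that Cassandra is not running.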

Could not connect to cassandra with cqlsh

I want to connect to cassandra but got this error:
$ bin/cqlsh
Connection error: ('Unable to connect to any servers', {'192.168.1.200': error(10061, "Tried connecting to [('192.168.1.200', 9042)]. Last error: No connection could be made because the target machine actively refused it")})
Pretty simple.
The machine is actively refusing the connection because Cassandra is not running on it. Follow these steps to get rid of the problem:
Install Cassandra from DataStax (Datastax-DDC; Cassandra version 3).
Go to ~\installation\path\DataStax-DDC\apache-cassandra\bin.
Open up cmd there. (Use Alt+F+P to open it if you are on Windows 8 or later.)
Type cassandra -f. This will print a lot of output to the window, and the last line should be INFO 11:32:31 Created default superuser role 'cassandra'
Now open another cmd window in the same folder.
Type cqlsh
This should give you a prompt, without any error.
I also discovered that this error doesn't pop up if I use Cassandra v2.x from the archived versions of Cassandra. I don't know why :( (If you find out, please comment.)
So, if the above steps do not work, you can always go back to Cassandra v2.x.
Cheers.
Check that you have started the Cassandra server, then provide the host and port as arguments.
$ bin/cqlsh 127.0.0.1 9042
I ran into the same problem. This worked for me.
Go to any directory for example E:\ (doesn't have to be the same disc as the cassandra installation)
Create the following directories
E:\cassandra\storage\commitlogs
E:\cassandra\storage\data
E:\cassandra\storage\savedcaches
Then go to your Cassandra installation's conf path. In my case:
D:\DataStax-DDC\apache-cassandra\conf
Open cassandra.yaml. Edit the lines containing data_file_directories, commitlog_directory and saved_caches_directory to look like the code below (change the paths according to where you created the folders):
data_file_directories:
- E:\cassandra\storage\data
commitlog_directory: E:\cassandra\storage\commitlogs
saved_caches_directory: E:\cassandra\storage\savedcaches
Then open cmd (I did it as administrator, but didn't check whether that is necessary) in your Cassandra installation's bin path. In my case:
D:\DataStax-DDC\apache-cassandra\bin
Run cassandra -f
Lots of stuff will be logged to your screen.
You should now be able to run cqlsh and all other stuff without problems.
Edit: The operating system was windows10 64bit
Edit2: If it stops working after a while, check whether the service is still running using nodetool status. If it isn't, follow this instruction.
I also faced the same problem on a 32-bit Windows 7 machine.
Check that you have Java installed correctly and the JAVA_HOME variable set.
Once you have checked the Java installation and set JAVA_HOME, uninstall Cassandra and install it again.
Hopefully this will solve the problem. Mine was solved after applying the above two steps.
You need to specify the host, user and password for the cqlsh connection. The default cqlsh user is cassandra and the password is cassandra.
$ bin/cqlsh <host> -u cassandra -p cassandra
I also had the same problem. I tried many methods found on Google and YouTube, but none of them worked in my case. Finally, I applied the following 3 steps and it worked:
Create a folder without any spaces in its name on C or D, whichever is your system drive, e.g. C:\cassandra
Install Cassandra in this folder instead of in "Program Files".
After installation, it will look like this: C:\cassandra\apache-cassandra-3.11.6
Copy the Python 2.7 installation into the bin folder, i.e. C:\cassandra\apache-cassandra-3.11.6\bin
Now your program is ready for work.
There is no special method to connect cqlsh; it is as simple as below:
$ bin/cqlsh 127.0.0.1 9042 (replace 127.0.0.1 with the host IP), or $ bin/cqlsh 127.0.0.1 9160 for older versions of Cassandra
Don't forget to check port connectivity if you are connecting cqlsh to a remote host. You can also pass a username and password if you have enabled authentication; by default it is disabled.

DataStax OpsCenter - can't connect with agents

I've installed DataStax OpsCenter (Apache Cassandra), and OpsCenter shows an error: "0 of 1 agents connected". When I click "fix", enter credentials and try to install nodes, I get this error:
Unable to SSH to some of the hosts
Unable to SSH to 127.0.0.1:
global name 'get_output' is not defined
Does anyone have any ideas how to fix it?
I fixed the problem with instructions from Stack Overflow.
The reason is that OpsCenter could not find the correct Cassandra config file (cassandra.yaml).
In my case I installed Cassandra to "D:\DataStax" instead of the default location "C:\Program Files\DataStax Community". Adding conf_location to opscenter\conf\clusters\local.conf solved my problem.
This is my final setting:
conf_location = [DataStax Install Dir]\apache-cassandra\conf\cassandra.yaml
