Creating Linux Service for Cassandra DDAC - linux

Creating Linux service for DataStax Distribution of Apache Cassandra (DDAC)
Hi,
Installed DataStax Distribution of Apache Cassandra (DDAC), the Cassandra community version by DataStax.
Used this link:
https://docs.datastax.com/en/ddac/doc/datastax_enterprise/install/installDDAC.html
At the end of the instructions, it says to start Cassandra using interactive command, not as a service:
$ bin/cassandra
Also, there is NO option to create a service for Cassandra using:
$ service cassandra start
I get:
Failed to start cassandra.service: Unit not found.
Does DDAC support starting as a service?
Regards,

You are right, DDAC has those instructions to launch the process from command line, if you want to set it as a service, my guess is that Datastax will provide it as part of their enterprise product.
You can still create the systemd service unit, there are multiple examples in github like this one

Related

Apache Cassandra monitoring

What is the best way to monitor if cassandra nodes are up? Due to security reasons JMX and nodetool is out of question. I have cluster metrics monitoring via Rest Api, but I understand that even if a node goes Rest Api will only report on a whole cluster.
Well, I have integrated a system where I can monitor all the metrics regarding to my cluster of all nodes. This seems like complicated but pretty simple to integrate. You will need the following components to build up a monitoring system for cassandra:
jolokia jar
telegraf
influxdb
grafana
I'm writing a short procedure, how it works.
Step 1: copy jolokia jvm jar to install_dir/apache-cassandra-version/lib/ , jolokia jvm agent can be downloaded from anywhere in google.
Step 2: add the following line to install_dir/apache-cassandra-version/conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -javaagent:<here_goes_the_path_of_your_jolokia_jar>"
Step 3: install telegraf on each node and configure the metrics you want to monitor. and start telegraf service.
Step 4: install grafana and configure your ip, port, protocol. grafana will give you a dashboard to look after your nodes and start grafana service. Your metrics will be able get visibility here.
Step 5: install influxdb on another server from where you want to store your metrics data which will come through telegraf agent.
Step 6: browse the ip you have mentioned, where you have launched your grafana through browser and add data source ip (influxdb ip), then customize your dashboard.
image source: https://blog.pythian.com/monitoring-cassandra-grafana-influx-db/
This is not for monitoring but only for node state.
Cassandra CQL driver provides info if a particular node is UP or DOWN with Host.StateListener Interface. This info is used by driver to mark a node UP or Down. Thus it could be used if node is down or up if JMX is not accessible.
Java Doc API : https://docs.datastax.com/en/drivers/java/3.3/
I came up with a script which listens for DN nodes in the cluster and reports it to our monitoring setup which is integrated with pagerduty.
The script runs on one of our nodes and executes nodetool status every minute and reports for all down nodes.
Here is the script https://gist.github.com/johri21/87d4d549d05c3e2162af7929058a00d1
[1]:

What version of Apache spark is used in my IBM Analytics for Apache Spark for IBM Cloud service?

I saw an email indicating the sunset of support for 1.6 apache spark within IBM Cloud. I am pretty sure my version is 2.x, but I wanted to confirm. I couldn't find anywhere in the UI that indicated the version, and the bx cli command that I thought would show it didn't.
[chrisr#oc5287453221 ~]$ bx service show "Apache Spark-bc"
Invoking 'cf service Apache Spark-bc'...
Service instance: Apache Spark-bc
Service: spark
Bound apps:
Tags:
Plan: ibm.SparkService.PayGoPersonal
Description: IBM Analytics for Apache Spark for IBM Cloud.
Documentation url: https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html
Dashboard: https://spark-dashboard.ng.bluemix.net/dashboard
Last Operation
Status: create succeeded
Message:
Started: 2018-01-22T16:08:46Z
Updated: 2018-01-22T16:08:46Z
How do I determine the version of spark that I am using? Also, I tried going to the "Dashboard" URL from above, and I got an "Internal Server Error" message after logging in.
The information found on How to check the Spark version doesn't seem to help, because it seems to be related to locally installed spark instances. I need to find out the information from the IBM Cloud (ie. Bluemix) using either the UI or the bluemix CLI. Other possibilities would be running some command from a Jupyter Notebook in iPython running in Data Science Experience (part of IBM Cloud).
The answer was given by ptitzler above, just adding an answer as requested by the email I was sent.
The Spark service itself is not version specific. To find out whether
or not you need to migrate you need to inspect the apps/tools that
utilize the service. For example if you've created notebooks in DSX
you associated them with a kernel that was bound to a specific Spark
version and you'd need to open each notebook to find out which Spark
version they are utilizing. – ptitzler Jan 31 at 16:32

Enable Spark on Same Node As Cassandra

I am trying to test out Spark so I can summarize some data I have in Cassandra. I've been through all the DataStax tutorials and they are very vague as to how you actually enable spark. The only indication I can find is that it comes enabled automatically when you select "Analytics" node during install. However, I have an existing Cassandra node and I don't want to have to use a different machine for testing as I am just evaluating everything on my laptop.
Is it possible to just enable Spark on the same node and deal with any performance implications? If so how can I enable it so that it can be tested?
I see the folders there for Spark (although I'm not positive all the files are present) but when I check to see if it's set to Spark master, it says that no spark nodes are enabled.
dsetool sparkmaster
I am using Linux Ubuntu Mint.
I'm just looking for a quick and dirty way to get my data averaged and so forth and Spark seems like the way to go since it's a massive amount of data, but I want to avoid having to pay to host multiple machines (at least for now while testing).
Yes, Spark is also able to interact with a cluster even if it is not on all the nodes.
Package install
Edit the /etc/default/dse file, and then edit the appropriate line
to this file, depending on the type of node you want:
...
Spark nodes:
SPARK_ENABLED=1
HADOOP_ENABLED=0
SOLR_ENABLED=0
Then restart the DSE service
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseServ.html
Tar Install
Stop DSE on the node and the restart it using the following command
From the install directory:
...
Spark only node: $ bin/dse cassandra -k - Starts Spark trackers on a cluster of Analytics nodes.
http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseStandalone.html
Enable spark by changing SPARK_ENABLED=1
using the command: sudo nano /usr/share/dse/resources/dse/conf/dse.default

titan rexster with external cassandra instance

I have a cassandra cluster (2.1.0) running fine.
After installing titan 5.1, and editing the titan-cassandra.properties to point to cluster hostname list rather than localhost, i run following -
titan.sh -c conf/titan-cassandra.properties start
It is able to recognize running cassandra instance, starts elastic search, but times out while connecting to rexster.
If i run it with local cassandra, everything runs fine using following ->br>
titan.sh start
do i need to make any change in rexster properties to point to running cassandra cluster..
Thanks in advance
Titan Server started by titan.sh represents a quick way to get started with Titan/Rexster/ES. It is designed to simplify running all those things with one startup script. Once you start breaking things apart (e.g. a separate cassandra cluster), you might not want to use titan.sh anymore because, it still forks a cassandra process when it starts up. Presumably, you don't need that anymore, given that you have a separate cassandra cluster.
Given the more advanced nature of your environment, I would simply download Rexster and configure it to connect to your cluster.

Restart tasktracker and job tracker of hadoop CDH4 using Cloudera services

I have made few entries in mapred-site.xml, to pick these changes i need to restart TT and JT running at my cluster nodes.
Is there any i can restart them using Cloud Era manager web services from command line.
So I can automate those steps any time changed made configuration files for hadoop it will restart TT and JT..
Since version 4.0, Cloudera Manager exposes its functionality through an HTTP API which allows you to do the operations through "curl" from the shell. The API is available in both the Free Edition and the Enterprise Edition.
Their repository hosts a set of client-side utilities for communicating with the Cloudera Manager API. You can find more on the documentation page.

Resources