How to load data from Cassandra into NiFi - cassandra

I want to use the QueryCassandra processor, but I'm running into some trouble. I'm trying to do something very basic: select all rows from a table and then send them to an ExecuteStreamCommand processor (a Java program).
Here is the table:
Here is the processor configuration. I've also added port 9042 to the HDP sandbox, both in the port-forwarding table and in the start_sandbox script:
username/password: cassandra
And an overall view of the flow showing the error:
All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1] Cannot connect))
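The error says the driver could not open a TCP connection to 127.0.0.1:9042 at all. Not part of the original question, but a hedged first check (assuming a Linux HDP sandbox with Cassandra running inside it; adjust hosts and paths to your setup) is to confirm the port is actually listening and reachable:
# Is anything listening on the native transport port inside the sandbox?
ss -ltn | grep 9042
# Is the Cassandra node up and in Normal state?
nodetool status
# Can the NiFi host open a TCP connection to the configured contact point?
nc -vz 127.0.0.1 9042
If Cassandra is bound to a non-loopback address (rpc_address in cassandra.yaml), the contact points configured on the QueryCassandra processor need to use that address rather than 127.0.0.1.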

Related

Unable to get metrics from PrometheusServlet on Databricks Spark 3.1.1

Trying to get Prometheus metrics with a Grafana dashboard working for Databricks clusters on AWS, but I cannot seem to get connections on the required ports. I've tried a few different setups, but will focus on PrometheusServlet in this question as it seems like it should be the quickest path to glory.
PrometheusServlet - I put this in my metrics.properties file using an init script on each worker:
sudo bash -c "cat <<EOF >> /databricks/spark/conf/metrics.properties
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
# Enable jvm source for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF"
I also have "spark.ui.prometheus.enabled true" and "spark.executor.processTreeMetrics.enabled true" in the Spark config options for the Databricks job.
I get a connection refused when trying to hit the worker URL on anything but port 8080. On port 8080 I get a weird binary response "P%" when I try to connect via curl, and a bad SSL cert error when I try to connect via the browser. I've opened up the necessary ports on the security group associated with the Spark workers. Trying to add a worker in Grafana just results in a 'Bad Gateway' error.
Has anyone gotten the PrometheusServlet working on Databricks clusters? Is there another way I should be doing this? This is the blog I was following for reference, as the PrometheusServlet documentation is pretty hard to find: https://dzlab.github.io/bigdata/2020/07/03/spark3-monitoring-1/
I'm running Databricks 8.3 runtime, Spark 3.1.1.
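For reference, and as an assumption on my part rather than something from the question: PrometheusServlet serves metrics through the existing Spark UI rather than on a separate port, so a hedged first check is to curl the UI port on the driver using the paths from the metrics.properties above (the driver address and the 4040 port are placeholders from open-source Spark; Databricks may proxy or remap them):
# Driver metrics exposed by the servlet sink
curl -s http://<driver-ip>:4040/metrics/prometheus
# Executor metrics exposed via the driver when spark.ui.prometheus.enabled is true
curl -s http://<driver-ip>:4040/metrics/executors/prometheus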

Cannot connect to Cassandra on localhost

First time using Cassandra. I have attempted to configure the yaml file according to other related posts, but have had no luck so far. Any idea how to do this on localhost?
The specified host(s) could not be reached.
All host(s) tried for query failed (tried: localhost/0:0:0:0:0:0:0:1:9042 (com.datastax.driver.core.TransportException: [localhost/0:0:0:0:0:0:0:1:9042] Cannot connect), localhost/127.0.0.1:9042 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1:9042] Cannot connect))
[localhost/0:0:0:0:0:0:0:1:9042] Cannot connect
[localhost/127.0.0.1:9042] Cannot connect
Revert the changes you have already made. To run Cassandra on localhost you don't need to change anything; it listens on localhost by default, so there is no need to modify cassandra.yaml. Read the documentation carefully.
Learn more about cassandra.yaml: https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
You can also stop the Cassandra server and start it again in the foreground:
bin/cassandra -f -R
This link further helps in understanding the cassandra.yaml config parameters:
https://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
You can also make sure that the ports are set correctly: native_transport_port (9042), native_transport_port_ssl (9142), storage_port (7000), rpc_port (9160) and the JMX port (7199).
Lastly, check the seed list:
seeds: "127.0.0.1"
assuming you are working on a single-node Cassandra setup.
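A hedged way to confirm those ports are actually being listened on (assuming a Linux host with ss available and the Cassandra tools on the PATH):
# Ports from the answer above: 9042 (native), 9160 (rpc/thrift), 7000 (storage), 7199 (JMX)
ss -ltn | grep -E ':(9042|9160|7000|7199)'
# The node should report itself as UN (Up/Normal)
nodetool status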
It might be that your DataStax Cassandra Community Server Windows service is stopped. Start it and reconnect; I hope that gets you going.
If the service stops again right after you start it, delete the logs folder (under the DataStax installation directory) and restart the service.

Apache Spark Error creating pool to EC2 Cassandra

My configuration is:
1 Spark machine on EC2 (c3.2xlarge),
communicating with 4 nodes of Cassandra on EC2.
I am getting the following error:
16/08/03 22:41:10 ERROR Session: Error creating pool to /XX.XX.XXX.XX:9042
com.datastax.driver.core.TransportException: [/XX.XX.XXX.XX:9042] Cannot connect
The XX are the public IP of the EC2 cassandra.
However, in my Spark configuration I point Spark at the seed node's internal IP; the spark-cassandra-connector driver then receives the public IP back from Cassandra.
Given how my IT team set the clusters up, I am assuming the following:
ERROR Session: Error creating pool to /127.0.0.1:9042
However, I don't want my clusters to connect via the public IPs and have to open up the firewall; I would like them to stay on the cluster's internal IPs.
Is there a way to do this at the Spark code level, or in the cassandra.yaml configuration?
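Not from the question itself, but likely relevant: the address a Cassandra node advertises to drivers is controlled by rpc_address / broadcast_rpc_address in cassandra.yaml, so if broadcast_rpc_address is set to the public IP, the spark-cassandra-connector will build its pools against the public IPs no matter which contact point you give Spark. A hedged check on one of the Cassandra nodes (the /etc/cassandra path assumes a package install; adjust to your layout):
# If broadcast_rpc_address is the public IP, drivers will be told to connect to it
grep -E '^(listen_address|rpc_address|broadcast_rpc_address)' /etc/cassandra/cassandra.yaml
Setting broadcast_rpc_address to the private IP is the usual way to keep client traffic on the internal network, assuming Spark and Cassandra share the same VPC.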

How do I connect to local cassandra db

I have a Cassandra DB running locally. I can see it working in OpsCenter. However, when I open DevCenter and try to connect, I get a cryptic "unable to connect" error.
How can I get the exact name/connection string that I need to use to connect to this local Cassandra DB via DevCenter?
The hostname/IP to connect to is specified in the listen_address property of your cassandra.yaml. If you are connecting to Cassandra from your localhost only (a sandbox machine), then you can set the listen_address in your cassandra.yaml accordingly:
listen_address: localhost
When you start Cassandra, you should see lines similar to this either in STDOUT or in your system.log (timestamps removed for brevity):
Starting listening for CQL clients on localhost/127.0.0.1:9042...
Binding thrift service to localhost/127.0.0.1:9160
Listening for thrift clients...
These lines indicate which address you should be using to connect to your cluster. The first way to test your connection is with cqlsh. Note that cqlsh will connect to "localhost" by default. If you are connecting to a host/IP other than localhost, then you will need to specify it on the command line.
$ cqlsh
Connected to Test Cluster at localhost:9042.
[cqlsh 5.0.1 | Cassandra 2.1.0-rc5-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh>
If this works, then you should also be able to connect (and test) from DataStax Dev Center (also on your local machine) by defining a connection to localhost, like this:
At this point, you should be able to connect via your application code (Java CQL3 driver shown):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
Metadata metadata = cluster.getMetadata();
System.out.println("Connected to cluster: " + metadata.getClusterName());
Session session = cluster.connect();

Unable to connect to Cassandra in Presto

I have set up Cassandra and created a keyspace ('mykeyspace') and a table in it. I started Cassandra as a service and added the cassandra.properties file, like this, in the Presto installation files:
connector.name=cassandra
cassandra.contact-points=localhost
cassandra.native-protocol-port=9142
cassandra.thrift-port=9160
After this I have issued this command in Presto but I'm not sure if it is connecting to the Cassandra data:
./presto --server localhost:8080 --catalog cassandra --schema mykeyspace
Now, when I run the command 'show tables', I get this exception message:
All host(s) tried for query failed (tried: localhost/127.0.0.1 (com.datastax.driver.core.TransportException: [localhost/127.0.0.1] Cannot connect))
I have used cqlsh to view a created table in 'mykeyspace' in Cassandra, so I am sure that Cassandra is running.
I would really appreciate any help to clear this error.
If you have a default Cassandra installation, the default native protocol port is 9042. If that is the case, you can remove the cassandra.native-protocol-port and cassandra.thrift-port properties.
If you want to keep these ports, you can change the native_transport_port property in the cassandra.yaml configuration file.
I hope this helps.
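To illustrate the first suggestion, a hedged sketch of a minimal catalog file for a Cassandra server running on the default ports, using the same heredoc style as earlier on this page (the etc/catalog path is an assumption about a typical Presto layout; adjust it to your installation):
# Minimal Cassandra catalog when the server uses the default native port 9042
cat <<EOF > etc/catalog/cassandra.properties
connector.name=cassandra
cassandra.contact-points=localhost
EOF
# Restart the Presto server so the catalog change is picked up, then retry:
# ./presto --server localhost:8080 --catalog cassandra --schema mykeyspace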
