run describe schema using datastax cassandra driver 3.0 - cassandra

When I try to run the "describe keyspace keyspacename" command using the DataStax driver 3.0, it gives me an error:
Exception in thread "main" com.datastax.driver.core.exceptions.SyntaxError: line 1:0 no viable alternative at input 'DESCRIBE' ([DESCRIBE]...)
How can I run the "describe keyspace keyspacename" command?

DESCRIBE is a cqlsh extension, not part of the CQL language, so the server rejects it when the driver sends it as a regular statement.
You can query the system tables instead (on Cassandra 3.x the equivalent table is system_schema.keyspaces):
SELECT * from system.schema_keyspaces
WHERE keyspace_name = 'keyspacename';
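From the driver, that query runs like any other statement. A minimal sketch, with a placeholder contact point:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

// Contact point is a placeholder; point it at your own cluster.
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect();
// The schema tables are ordinary CQL tables, so the driver can query them directly.
ResultSet rs = session.execute(
        "SELECT * FROM system.schema_keyspaces WHERE keyspace_name = 'keyspacename'");
System.out.println(rs.one());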

The DataStax Java driver also has a KeyspaceMetadata class that exposes an exportAsString method, so I used that to get the entire keyspace schema as a single string.
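Reusing the cluster from the sketch above, that looks roughly like this:
import com.datastax.driver.core.KeyspaceMetadata;

KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("keyspacename");
// exportAsString() returns the CQL statements that recreate the keyspace and
// its tables -- essentially what cqlsh's DESCRIBE KEYSPACE prints.
System.out.println(ks.exportAsString());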

Related

How to use Airflow SparkSQLOperator?

I'd like to run a query using Spark SQL in Airflow, and the SparkSQLOperator looks perfect for this (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/spark_sql_operator.py).
However, I can't figure out how the connection must be configured.
In DB Visualizer, I can connect to the Hive database using:
driver : jdbc
database url : jdbc:hive2://myserver.com:10000/default
database userid : me
database password : mypassword
Applying these settings to the spark_sql_default connection gives me:
[2017-12-12 11:35:33,774] {models.py:1462} ERROR - Cannot execute
on hive2://myserver.com:10000/default. Error code is: 1. Output: ,
Stderr:
Any ideas?
Spark isn't a database, so you configure your Spark connection just as you would when submitting regular Spark jobs, rather than with the JDBC-style parameters you're used to.
If you look at the signature of the operator:
def __init__(self,
             sql,
             conf=None,
             conn_id='spark_sql_default',
             executor_cores=None,
             executor_memory=None,
             keytab=None,
             master='yarn',
             name='default-name',
             num_executors=None,
             yarn_queue='default',
             *args,
             **kwargs)
You need to create a connection (see models.Connection) that specifies the Spark master, which would be one of 'yarn', 'local[x]', or 'spark://hostname:port'. That should really be it: the operator defaults everything else for you, and the rest is likely handled by the underlying Spark config.
Your SQL or HQL code/script can be passed into the first parameter.
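As for the connection itself, a sketch of how it might look in the Airflow connections UI (the values are assumptions for a YARN deployment, not something the operator documents):
Conn Id: spark_sql_default
Conn Type: Spark
Host: yarn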

Spark Cassandra Connector Issue

I am trying to integrate Cassandra with Spark and am running into the issue below.
Issue:
com.datastax.spark.connector.util.ConfigCheck$ConnectorConfigurationException: Invalid Config Variables
Only known spark.cassandra.* variables are allowed when using the Spark Cassandra Connector.
spark.cassandra.keyspace is not a valid Spark Cassandra Connector variable.
Possible matches:
spark.cassandra.sql.keyspace
spark.cassandra.output.batch.grouping.key
at com.datastax.spark.connector.util.ConfigCheck$.checkConfig(ConfigCheck.scala:50)
at com.datastax.spark.connector.cql.CassandraConnectorConf$.apply(CassandraConnectorConf.scala:253)
at org.apache.spark.sql.cassandra.CassandraSourceRelation$.apply(CassandraSourceRelation.scala:263)
at org.apache.spark.sql.cassandra.CassandraCatalog.org$apache$spark$sql$cassandra$CassandraCatalog$$buildRelation(CassandraCatalog.scala:41)
at org.apache.spark.sql.cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:26)
at org.apache.spark.sql.cassandra.CassandraCatalog$$anon$1.load(CassandraCatalog.scala:23)
Below are the versions of Spark, Cassandra, and the connector I am using.
Spark : 1.6.0
Cassandra : 2.1.17
Connector Used : spark-cassandra-connector_2.10-1.6.0-M1.jar
Below is the code snippet I am using to connect to Cassandra from Spark.
val conf: org.apache.spark.SparkConf = new SparkConf(true)
  .setAppName("Spark Cassandra")
  .set("spark.cassandra.connection.host", "abc.efg.lkh")
  .set("spark.cassandra.auth.username", "xyz")
  .set("spark.cassandra.auth.password", "1234")
  .set("spark.cassandra.keyspace", "abcded")
val sc = new SparkContext("local[*]", "Spark Cassandra", conf)
val csc = new CassandraSQLContext(sc)
csc.setKeyspace("abcded")
val my_df = csc.sql("select * from table")
When I try to create the DataFrame here, I get the error posted above. I tried without passing the keyspace in the conf, but then it tries to access the default keyspace, where the mentioned user doesn't have access.
A JIRA for this was already opened and closed:
https://datastax-oss.atlassian.net/browse/SPARKC-102
Yet I am still getting this issue. Please let me know whether I need to use the latest connector to resolve it.
Thanks in advance.
The important information is in the error message you posted [formatted for readability]:
Invalid Config Variables
Only known spark.cassandra.* variables are allowed when using the Spark Cassandra Connector.
spark.cassandra.keyspace is not a valid Spark Cassandra Connector variable.
Possible matches: spark.cassandra.sql.keyspace
spark.cassandra.keyspace is not an available property for the connector. A full list of the available properties can be found here: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md
You may have some luck using the suggested spark.cassandra.sql.keyspace; otherwise you may just need to explicitly specify the keyspace for every Cassandra interaction you perform using the connector.
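A minimal sketch of the corrected configuration, shown here in Java with the same placeholder values as the question (whether the sql-scoped property takes effect depends on your connector version):
import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf(true)
        .setAppName("Spark Cassandra")
        .set("spark.cassandra.connection.host", "abc.efg.lkh")
        .set("spark.cassandra.auth.username", "xyz")
        .set("spark.cassandra.auth.password", "1234")
        // The sql-scoped keyspace property suggested by the ConfigCheck error.
        .set("spark.cassandra.sql.keyspace", "abcded");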

spark-sql Table or view not found error

I'm trying to run a basic Java program using Spark SQL and JDBC, and I'm running into the following error. I'm not sure what's wrong here; most of the material I have read doesn't cover how to fix this problem.
It would also be great if someone could point me to some good material on Spark SQL (Spark 2.1.1). I'm planning to use Spark to implement ETLs, connecting to MySQL and other data sources.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: myschema.mytable; line 1 pos 21;
String MYSQL_CONNECTION_URL = "jdbc:mysql://localhost:3306/myschema";
String MYSQL_USERNAME = "root";
String MYSQL_PWD = "root";
Properties connectionProperties = new Properties();
connectionProperties.put("user", MYSQL_USERNAME);
connectionProperties.put("password", MYSQL_PWD);
Dataset<Row> jdbcDF2 = spark.read()
.jdbc(MYSQL_CONNECTION_URL, "myschema.mytable", connectionProperties);
spark.sql("SELECT COUNT(*) FROM myschema.mytable").show();
That's because Spark does not register any tables from the JDBC connection's schemas in the Spark SQL context by default. You must register the DataFrame yourself:
jdbcDF2.createOrReplaceTempView("mytable");
spark.sql("select count(*) from mytable");
Your jdbcDF2 is sourced from MySQL's myschema.mytable and will load data from that table when an action runs.
Remember that a MySQL table is not the same as a Spark table or view. You are telling Spark to read data from MySQL, but you must register the resulting DataFrame or Dataset as a table or view in the current Spark SQL context or Spark session.
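Putting the pieces together, a minimal sketch (the URL and credentials mirror the question; the SparkSession setup is an assumption):
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
        .appName("jdbc-example")
        .master("local[*]")
        .getOrCreate();

Properties connectionProperties = new Properties();
connectionProperties.put("user", "root");
connectionProperties.put("password", "root");

// Read the MySQL table into a DataFrame; nothing is loaded until an action runs.
Dataset<Row> jdbcDF2 = spark.read()
        .jdbc("jdbc:mysql://localhost:3306/myschema", "myschema.mytable", connectionProperties);

// Register the DataFrame so Spark SQL can resolve the name...
jdbcDF2.createOrReplaceTempView("mytable");

// ...and query the view, not the MySQL schema-qualified name.
spark.sql("SELECT COUNT(*) FROM mytable").show();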

export cassandra schema (all keyspaces and tables)

I am going through the code of the Apache Cassandra 2.2 branch and am unable to find where we can back up the entire schema (including all keyspaces and tables).
If you only need to back up the schema, you can use DESCRIBE KEYSPACE in cqlsh (or DESCRIBE SCHEMA to dump every keyspace at once):
cqlsh $(hostname) -e "DESCRIBE KEYSPACE <keyspace>;" > backup.cql
To restore the schema you can simply do:
cat backup.cql | cqlsh $(hostname)
EDIT
To describe keyspaces programmatically, you can go through the Thrift client: compile the client, and then you can use describe_keyspaces/describe_keyspace:
Cassandra.Client client = ...
for (KsDef keyspaceDefinition : client.describe_keyspaces()) {
    // process each keyspace definition
}
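If you are on the newer DataStax Java driver rather than Thrift, a sketch of the same export across every keyspace (the contact point is a placeholder):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
StringBuilder schema = new StringBuilder();
for (KeyspaceMetadata ks : cluster.getMetadata().getKeyspaces()) {
    // exportAsString() emits the CREATE statements for the keyspace and its tables.
    schema.append(ks.exportAsString()).append('\n');
}
System.out.println(schema);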

How to delete graph in Titan with Cassandra storage backend?

I use Titan 0.4.0 All, running Rexster in shared VM mode on Ubuntu 12.04.
How could I properly delete a graph in Titan which is using the Cassandra storage backend?
I have tried TitanCleanup.clear(graph), but it does not delete everything; the indices are still there. My real issue is that I have an index which I don't want (it crashes every query), and as I understand Titan's documentation, it is impossible to remove an index once it has been created.
You can clear all the edges/vertices with:
g.V.remove()
but as you have found, that won't clear the types/indices previously created. The cleanest option is to simply delete the Cassandra data directory.
If you are executing the delete via a unit test you might try to do this as part of your test setup:
this.config = new BaseConfiguration() {{
    addProperty("storage.backend", "berkeleyje")
    addProperty("storage.directory", "/tmp/titan-schema-test")
}}
GraphDatabaseConfiguration graphconfig = new GraphDatabaseConfiguration(config)
graphconfig.getBackend().clearStorage()
g = (StandardTitanGraph) TitanFactory.open(config)
Be sure to call g.shutdown() in your test teardown method.
Just to update this answer: with Titan 1.0.0 this can be done programmatically in Java with:
TitanGraph graph = TitanFactory.open(config);
// The graph must be closed before TitanCleanup can clear its storage.
graph.close();
TitanCleanup.clear(graph);
For JanusGraph, the continuation of Titan, the command is JanusGraphFactory.clear(graph), though it is soon to be JanusGraphCleanup.clear(graph).
As mentioned in one of the comments on the earlier answer, DROPping the titan keyspace using cqlsh should do it:
cqlsh> DROP KEYSPACE titan;
The name of the keyspace Titan uses is set with the storage.cassandra.keyspace configuration option. You can change it to any name you want that is acceptable to Cassandra.
storage.cassandra.keyspace=hello_titan
When Cassandra starts up, it prints the keyspace's name as follows:
INFO 19:50:32 Create new Keyspace: KSMetaData{name=hello_titan,
strategyClass=SimpleStrategy, strategyOptions={replication_factor=1},
cfMetaData={}, durableWrites=true,
userTypes=org.apache.cassandra.config.UTMetaData@767d6a9f}
In 0.9.0-M1, the name appears in Titan's log at DEBUG level (set log4j.rootLogger=DEBUG, stdout in conf/log4j-server.properties):
[DEBUG] AstyanaxStoreManager - Found keyspace titan
or the following when it does not yet exist:
[DEBUG] AstyanaxStoreManager - Creating keyspace titan...
[DEBUG] AstyanaxStoreManager - Created keyspace titan
