Example of how to use the CQL map type with the DataStax Java driver - Cassandra

I am trying to use the DataStax Java driver to update and query a column family that has a map field. Does anyone have an example of how to use CQL collections with the DataStax Java driver?
Thanks

I will add some examples of using CQL collections with both simple and prepared statements to the current Java driver docs.
You can use CQL collections with prepared statements. There's an example in the quick start section of the Java driver documentation:
http://docs.datastax.com/en/developer/java-driver/2.0/java-driver/quick_start/qsSimpleClientBoundStatements_t.html
Step 4 binds a Java HashSet object to a CQL set column in the songs table.
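For reference, here is a minimal sketch in the spirit of that step (the simplex keyspace and the songs table with a set<text> tags column are assumptions modeled on the quick start schema, not necessarily yours): a Java HashSet is bound straight to the set column through a prepared statement.
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("simplex");
// Bind a Java HashSet straight to a CQL set<text> column.
PreparedStatement ps = session.prepare("INSERT INTO songs (id, title, tags) VALUES (?, ?, ?)");
Set<String> tags = new HashSet<String>();
tags.add("jazz");
tags.add("2013");
BoundStatement bound = ps.bind(UUID.randomUUID(), "La Petite Tonkinoise", tags);
session.execute(bound);
cluster.close();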

Normally I'd ask what you've tried, but I know that this isn't in the DataStax documentation for the Java Driver. I'll go through what worked for me.
A couple of things to note:
The DataStax Cassandra Java class directed me to put my variables directly into the CQL text string (instead of binding the map). I'm guessing that binding collections wasn't working at the time of production (for the class videos).
Collections can't be queried using DevCenter, so you'll need to check their values via the command line with cqlsh if you want to see what they are outside your app.
To update an existing row (in a "users" table that has a map<varchar,varchar> phone_numbers column), try something like this:
String cqlQuery = "UPDATE users SET phone_numbers = phone_numbers + ";
cqlQuery += "{'" + phoneType + "':'" + phoneNumber+ "'} ";
cqlQuery += "WHERE username = ?";
PreparedStatement preparedStatement = getSession().prepare(cqlQuery);
BoundStatement boundStatement = preparedStatement.bind(user);
getSession().execute(boundStatement);
The better way to do this (assuming a Map<String,String> phoneNumbers) is to bind the collection to the prepared statement, like this:
String cqlQuery = "UPDATE users SET phone_numbers = ? ";
cqlQuery += "WHERE username = ?";
PreparedStatement preparedStatement = getSession().prepare(cqlQuery);
BoundStatement boundStatement = preparedStatement.bind(phoneNumbers, user);
getSession().execute(boundStatement);
Likewise, to read it back out:
String cqlQuery2 = "SELECT phone_numbers FROM users WHERE username = ?";
PreparedStatement preparedStatement2 = getSession().prepare(cqlQuery2);
BoundStatement boundStatement2 = preparedStatement2.bind(user);
ResultSet result2 = getSession().execute(boundStatement2);
Map<String,String> newMap = result2.one().getMap("phone_numbers", String.class, String.class);
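If you want to keep the append semantics from the first snippet while still binding the collection, that also works with a bound parameter. A sketch, reusing the same getSession()/user/phoneType/phoneNumber variables as above (recent driver and Cassandra versions handle this fine, though as noted earlier, binding collections did not always work in very old versions):
import java.util.HashMap;
import java.util.Map;
// Append a single entry to the existing map instead of replacing it,
// while still binding the collection rather than concatenating strings.
String cqlAppend = "UPDATE users SET phone_numbers = phone_numbers + ? WHERE username = ?";
PreparedStatement appendStatement = getSession().prepare(cqlAppend);
Map<String,String> newEntry = new HashMap<String,String>();
newEntry.put(phoneType, phoneNumber);
BoundStatement appendBound = appendStatement.bind(newEntry, user);
getSession().execute(appendBound);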
They just covered this today in the (free) CAS101J class on DataStax Academy.

Here is how I did it. This example covers mapping a tuple column as well as a map column in Cassandra. I was using Scala with the DataStax Java driver.
Needed imports
import java.lang
import java.text.SimpleDateFormat
import com.datastax.driver.core._
import com.datastax.driver.core.querybuilder.QueryBuilder
import scala.collection.Map
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.rdd.RDD
import scala.collection.JavaConversions._
Code snippet
val simpleDateFormat: SimpleDateFormat = new SimpleDateFormat("dd-MM-yyyy H:mm:ss")
// "row" is a placeholder for however your own code looks up the raw string values
val start_date: java.util.Date = simpleDateFormat.parse(row("StartTime").replaceAll(":(\\s+)", ":"))
// (assumed) end_date built the same way as start_date
val end_date: java.util.Date = simpleDateFormat.parse(row("EndTime").replaceAll(":(\\s+)", ":"))
val b_tuple = session.getCluster().getMetadata().newTupleType(DataType.cint(), DataType.cint(), DataType.text())
val time_tuple = session.getCluster().getMetadata().newTupleType(DataType.timestamp(), DataType.timestamp())
val time_tuple_value = time_tuple.newValue(start_date, end_date)
// "b" is an (Int, Int) pair from the surrounding code
val b_tuple_value = b_tuple.newValue(b._1: lang.Integer, b._2: lang.Integer, row("xxx"))
val statement_2: Statement = QueryBuilder.insertInto("keyspace", "table_name")
  .value("b_key", b_tuple_value)
  .value("time_key", time_tuple_value)
  .value("some_map", mapAsJavaMap(my_scala_map))
session.executeAsync(statement_2)

Related

CosmosDB Cassandra API - select in query throws exception

SELECT partition_int, clustering_int, value_string
FROM test_ks1.test WHERE partition_int = ? AND clustering_int IN ?
A prepared SELECT query with an IN clause throws the following exception:
java.lang.ClassCastException: class com.datastax.oss.driver.internal.core.type.PrimitiveType cannot be cast to class com.datastax.oss.driver.api.core.type.ListType
(com.datastax.oss.driver.internal.core.type.PrimitiveType and com.datastax.oss.driver.api.core.type.ListType are in unnamed module of loader 'app')
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.inspectType(CachingCodecRegistry.java:343)
at com.datastax.oss.driver.internal.core.type.codec.registry.CachingCodecRegistry.codecFor(CachingCodecRegistry.java:256)
at com.datastax.oss.driver.internal.core.data.ValuesHelper.encodePreparedValues(ValuesHelper.java:112)
at com.datastax.oss.driver.internal.core.cql.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:159)
Using the DataStax OSS driver version 4.5.1 with Cosmos DB.
The query works against Cassandra running in Docker, and works in cqlsh against Cosmos DB.
Queries used:
CREATE TABLE IF NOT EXISTS test_ks1.test (partition_int int, value_string text, clustering_int int, PRIMARY KEY ((partition_int),clustering_int))
Prepare the statement: INSERT INTO test_ks1.test (partition_int,clustering_int,value_string) values (?,?,?)
Insert values: 1,1,"a" | 1,2,"b"
Prepare the statement: SELECT partition_int, clustering_int, value_string FROM test_ks1.test WHERE partition_int = ? AND clustering_int IN ?
Execute with parameters 1,List.of(1,2)
It looks like the expected parameter is an integer and not a list of integers.
Sample code of the select prepared statement:
final CqlSessionBuilder sessionBuilder = CqlSession.builder()
.withConfigLoader(loadConfig(sessionConfig));
CqlSession session = sessionBuilder.build();
PreparedStatement statement = session.prepare(
"SELECT partition_int, clustering_int,"
+ "value_string FROM test_ks1.test WHERE partition_int = ? "
+ "AND clustering_int IN ?");
com.datastax.oss.driver.api.core.cql.ResultSet rs =
session.execute(statement.bind(1,List.of(1, 2)));
Is there a workaround to use prepared select queries with in clause?
Thanks.
My suggestion would be to use a workaround like this:
// IDs you want to use
List<Integer> list = Arrays.asList(1, 2);
// Build the repeated "?," markers; the final "?" is already written in the query below
String markers = StringUtils.repeat("?,", list.size()-1);
//Final query
final String query = "SELECT * FROM test_ks1.test where clustering_int in ("+markers+" ?)";
PreparedStatement prepared = session.prepare(query);
BoundStatement bound = prepared.bind(list.toArray()).setIdempotent(true);
List<Row> rows = session.execute(bound).all();
I have tried this on my end and it works for me.
In your case, you also have another parameter before the IN clause that you need to include in the parameter list, after building the marker placeholders, as shown in the sketch below.
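Applied to your original query (with the partition_int parameter placed before the expanded IN values), a sketch could look like the following, reusing the session from your snippet; StringUtils comes from Apache Commons Lang:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
// Clustering keys to match in the IN clause
List<Integer> clusteringIds = Arrays.asList(1, 2);
// One "?," per id; the final "?" is already written in the query string below
String markers = StringUtils.repeat("?,", clusteringIds.size() - 1);
String query = "SELECT partition_int, clustering_int, value_string "
    + "FROM test_ks1.test "
    + "WHERE partition_int = ? AND clustering_int IN (" + markers + " ?)";
PreparedStatement prepared = session.prepare(query);
// The partition key goes first, then the expanded IN-clause values
List<Object> params = new ArrayList<>();
params.add(1);                 // partition_int
params.addAll(clusteringIds);  // clustering_int values
BoundStatement bound = prepared.bind(params.toArray()).setIdempotent(true);
ResultSet rs = session.execute(bound);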

Azure Kusto Java SDK get all columns of query results

I'm using the Azure Kusto Java SDK v2.0.1 with Scala over Java8.
I'm executing some query:
val query = " ... "
val tenantId = " ... "
val queryResponse = client.execute(tenantId, query)
val queryResponseResults = queryResponse.getPrimaryResults
I want to convert the given data structure to JSON eventually, so I want to get all the columns, but I can't seem to find any kind of getColumns.
While debugging I see the object (KustoResultSetTable) has the fields columnsAsArray (which is exactly what I want) and columns, but they are private and I didn't find any getters.
A getter will be added in the next version
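Until that version is available, one possible stopgap (entirely an assumption based on the field name seen in the debugger, not a documented API) is to read the private columnsAsArray field via reflection, for example from Java:
import java.lang.reflect.Field;
// Hypothetical workaround: read the private "columnsAsArray" field by reflection.
// The field name comes from the debugger observation above, and the cast assumes
// it really is an array, as the name suggests; the element type is whatever the
// SDK uses internally, so it is returned as Object[] here.
public final class KustoColumnsWorkaround {
    public static Object[] columnsOf(Object kustoResultSetTable) throws ReflectiveOperationException {
        Field field = kustoResultSetTable.getClass().getDeclaredField("columnsAsArray");
        field.setAccessible(true);
        return (Object[]) field.get(kustoResultSetTable);
    }
}
From the Scala snippet above you could then call KustoColumnsWorkaround.columnsOf(queryResponseResults) and map the result to JSON, until the official getter ships.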

How to get Cassandra cql string given a Apache Spark Dataframe in 2.2.0?

I am trying to get a CQL string given a DataFrame. I came across this function,
where I can do something like this:
TableDef.fromDataFrame(df, "test", "hello", ProtocolVersion.NEWEST_SUPPORTED).cql()
It looks to me like the library uses the first column as the partition key and does not care about the clustering key, so how do I specify a particular set of columns of a DataFrame as the partition key and a particular set of columns as the clustering key?
It looks like I can create a new TableDef, but then I have to do the entire mapping myself, and in some cases the necessary types like ColumnType are not accessible in Java. For example, I tried to create a new ColumnDef like below:
new ColumnDef("col5", new PartitionKeyColumn(), ColumnType is not accessible in Java)
Objective: to get a CQL CREATE statement from a Spark DataFrame.
Input: my DataFrame can have any number of columns with their respective Spark types. Say I have a Spark DataFrame with 100 columns, where col8 and col9 correspond to Cassandra partition key columns and col10 corresponds to a Cassandra clustering key column:
col1| col2| ...|col100
Now I want to use the spark-cassandra-connector library to give me a CQL CREATE TABLE statement given the info above.
Desired Output
create table if not exists test.hello (
col1 bigint, (whatever col1's type is in my DataFrame; I just picked bigint randomly)
col2 varchar,
col3 double,
...
...
col100 bigint,
primary key(col8,col9)
) WITH CLUSTERING ORDER BY (col10 DESC);
Because the required components (PartitionKeyColumn and the instances of ColumnType) are Scala objects, you need to use the following syntax to access their instances:
// imports
import com.datastax.spark.connector.cql.ColumnDef;
import com.datastax.spark.connector.cql.PartitionKeyColumn$;
import com.datastax.spark.connector.types.TextType$;
// actual code
ColumnDef a = new ColumnDef("col5",
PartitionKeyColumn$.MODULE$, TextType$.MODULE$);
See the code for ColumnRole and PrimitiveTypes to find the full list of object/class names.
Update after additional requirements: Code is lengthy, but should work...
// Additional imports needed beyond the ones above: TableDef, ClusteringColumn and
// RegularColumn$ from the connector, ProtocolVersion, the Spark SQL classes,
// and the java.util collections.
SparkSession spark = SparkSession.builder()
.appName("Java Spark SQL example").getOrCreate();
Set<String> partitionKeys = new TreeSet<String>() {{
add("col1");
add("col2");
}};
Map<String, Integer> clusteringKeys = new TreeMap<String, Integer>() {{
put("col8", 0);
put("col9", 1);
}};
Dataset<Row> df = spark.read().json("my-test-file.json");
TableDef td = TableDef.fromDataFrame(df, "test", "hello",
ProtocolVersion.NEWEST_SUPPORTED);
List<ColumnDef> partKeyList = new ArrayList<ColumnDef>();
List<ColumnDef> clusterColumnList = new ArrayList<ColumnDef>();
List<ColumnDef> regColumnList = new ArrayList<ColumnDef>();
scala.collection.Iterator<ColumnDef> iter = td.allColumns().iterator();
while (iter.hasNext()) {
ColumnDef col = iter.next();
String colName = col.columnName();
if (partitionKeys.contains(colName)) {
partKeyList.add(new ColumnDef(colName,
PartitionKeyColumn$.MODULE$, col.columnType()));
} else if (clusteringKeys.containsKey(colName)) {
int idx = clusteringKeys.get(colName);
clusterColumnList.add(new ColumnDef(colName,
new ClusteringColumn(idx), col.columnType()));
} else {
regColumnList.add(new ColumnDef(colName,
RegularColumn$.MODULE$, col.columnType()));
}
}
// Note: the raw casts below assume java.util.List is accepted where a Scala Seq
// is expected; depending on the Scala/connector version you may instead need to
// convert with JavaConverters before constructing the TableDef.
TableDef newTd = new TableDef(td.keyspaceName(), td.tableName(),
(scala.collection.Seq<ColumnDef>) partKeyList,
(scala.collection.Seq<ColumnDef>) clusterColumnList,
(scala.collection.Seq<ColumnDef>) regColumnList,
td.indexes(), td.isView());
String cql = newTd.cql();
System.out.println(cql);

How do I retrieve table names in Cassandra using Java?

If I use this code in a CQL shell, I get all the names of the tables in that keyspace.
DESCRIBE TABLES;
I want to retrieve the same data using a ResultSet. Below is my code in Java.
String query = "DESCRIBE TABLES;";
ResultSet rs = session.execute(query);
for(Row row : rs) {
System.out.println(row);
}
where session and cluster have been initialized earlier as:
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("keyspace_name");
Alternatively, I'd like to know the Java code to retrieve the table names in a keyspace.
The schema for the system tables changes between versions quite a bit. It is best to rely on the driver's Metadata, which has version-specific parsing built in. From the Java driver, use:
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Collection<TableMetadata> tables = cluster.getMetadata()
.getKeyspace("keyspace_name")
.getTables(); // TableMetadata has name in getName(), along with lots of other info
// to convert to list of the names
List<String> tableNames = tables.stream()
.map(tm -> tm.getName())
.collect(Collectors.toList());
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Metadata metadata = cluster.getMetadata();
Iterator<TableMetadata> tm = metadata.getKeyspace("Your Keyspace").getTables().iterator();
while(tm.hasNext()){
TableMetadata t = tm.next();
System.out.println(t.getName());
}
The above code will give you the table names in the passed keyspace irrespective of the Cassandra version used.

How to use java.time.LocalDate in Cassandra query from Spark?

We have a table in Cassandra with column start_time of type date.
When we execute following code:
val resultRDD = inputRDD.joinWithCassandraTable(KEY_SPACE,TABLE)
.where("start_time = ?", java.time.LocalDate.now)
We get the following error:
com.datastax.spark.connector.types.TypeConversionException: Cannot convert object 2016-10-13 of type class java.time.LocalDate to com.datastax.driver.core.LocalDate.
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:45)
at com.datastax.spark.connector.types.TypeConverter$$anonfun$convert$1.apply(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$$anonfun$convertPF$14.applyOrElse(TypeConverter.scala:449)
at com.datastax.spark.connector.types.TypeConverter$class.convert(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$.com$datastax$spark$connector$types$NullableTypeConverter$$super$convert(TypeConverter.scala:439)
at com.datastax.spark.connector.types.NullableTypeConverter$class.convert(TypeConverter.scala:56)
at com.datastax.spark.connector.types.TypeConverter$LocalDateConverter$.convert(TypeConverter.scala:439)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter$$anonfun$convertPF$29.applyOrElse(TypeConverter.scala:788)
at com.datastax.spark.connector.types.TypeConverter$class.convert(TypeConverter.scala:43)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter.com$datastax$spark$connector$types$NullableTypeConverter$$super$convert(TypeConverter.scala:771)
at com.datastax.spark.connector.types.NullableTypeConverter$class.convert(TypeConverter.scala:56)
at com.datastax.spark.connector.types.TypeConverter$OptionToNullConverter.convert(TypeConverter.scala:771)
at com.datastax.spark.connector.writer.BoundStatementBuilder$$anonfun$8.apply(BoundStatementBuilder.scala:93)
I've tried to register custom converters according to the documentation:
object JavaLocalDateToCassandraLocalDateConverter extends TypeConverter[com.datastax.driver.core.LocalDate] {
def targetTypeTag = typeTag[com.datastax.driver.core.LocalDate]
def convertPF = {
case ld: java.time.LocalDate => com.datastax.driver.core.LocalDate.fromYearMonthDay(ld.getYear, ld.getMonthValue, ld.getDayOfMonth)
case _ => com.datastax.driver.core.LocalDate.fromYearMonthDay(1971, 1, 1)
}
}
object CassandraLocalDateToJavaLocalDateConverter extends TypeConverter[java.time.LocalDate] {
def targetTypeTag = typeTag[java.time.LocalDate]
def convertPF = {
case ld: com.datastax.driver.core.LocalDate => java.time.LocalDate.of(ld.getYear(), ld.getMonth(), ld.getDay())
case _ => java.time.LocalDate.now
}
}
TypeConverter.registerConverter(JavaLocalDateToCassandraLocalDateConverter)
TypeConverter.registerConverter(CassandraLocalDateToJavaLocalDateConverter)
But it didn't help.
How can I use JDK8 Date/Time classes in Cassandra queries executed from Spark?
I think the simplest thing to do in a where clause like this is to just call
sc
.cassandraTable("test","test")
.where("start_time = ?", java.time.LocalDate.now.toString)
.collect
And just pass in the string, since that will be a well-defined conversion.
There seems to be an issue in the TypeConverters where your converter is not taking precedence over the built-in converter. I'll take a quick look.
--Edit--
It seems like the registered converters are not being properly transferred to the Executors. In Local mode the code works as expected which makes me think this is a serialization issue. I would open a ticket on the Spark Cassandra Connector for this issue.
The Cassandra date/time format is yyyy-MM-dd HH:mm:ss.SSS,
so if you are using Java 8 you can use the code below to parse the Cassandra value and then apply your logic.
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
val dateTime = LocalDateTime.parse(cassandraDateTime, formatter);
Or you can convert LocalDate to Cassandra date format and check it.
