Cassandra column metadata shows "DateType" instead of "timestamp" after driver upgrade - cassandra

I have some Java code that introspects the schema of Cassandra tables. After upgrading the Cassandra driver dependency, this code no longer works as expected. With the old driver version, ColumnMetadata#getType() returned DataType.Name#TIMESTAMP for a timestamp column. With the new driver, the same call returns DataType.Name#CUSTOM, and CustomType#getCustomTypeClassName returns org.apache.cassandra.db.marshal.DateType.
The old driver version is com.datastax.cassandra:cassandra-driver-core:2.1.9:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.9</version>
</dependency>
The new driver version is com.datastax.cassandra:dse-driver:1.1.2:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.2</version>
</dependency>
The cluster version is DataStax Enterprise 2.1.11.969:
cqlsh> SELECT release_version FROM system.local;
release_version
-----------------
2.1.11.969
To illustrate the problem, I created a simple console application that prints column metadata for a specified table. (See below.) When built with the old driver, the output looks like this:
# old driver
mvn -Pcassandra-driver clean package
java -jar target/cassandra-print-column-metadata-cassandra-driver.jar <address> <user> <password> <keyspace> <table>
...
ts timestamp
...
When built with the new driver, the output looks like this:
# new driver
mvn -Pdse-driver clean package
java -jar target/cassandra-print-column-metadata-dse-driver.jar <address> <user> <password> <keyspace> <table>
...
ts 'org.apache.cassandra.db.marshal.DateType'
...
So far, I have only encountered this problem with timestamp columns. I have not seen it for any other data types, though my schema does not exhaustively use all of the supported data types.
DESCRIBE TABLE shows that the column is timestamp. system.schema_columns shows that the validator is org.apache.cassandra.db.marshal.DateType.
[cqlsh 3.1.7 | Cassandra 2.1.11.969 | CQL spec 3.0.0 | Thrift protocol 19.39.0]
cqlsh:my_keyspace> DESCRIBE TABLE my_table;
CREATE TABLE my_table (
prim_addr text,
ch text,
received_on timestamp,
...
PRIMARY KEY (prim_addr, ch, received_on)
) WITH
bloom_filter_fp_chance=0.100000 AND
caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND
comment='emm_ks' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
compaction={'sstable_size_in_mb': '160', 'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
cqlsh:system> SELECT * FROM system.schema_columns WHERE keyspace_name = 'my_keyspace' AND columnfamily_name = 'my_table' AND column_name IN ('prim_addr', 'ch', 'received_on');
keyspace_name | columnfamily_name | column_name | component_index | index_name | index_options | index_type | type | validator
---------------+-------------------+-------------+-----------------+------------+---------------+------------+----------------+------------------------------------------
my_keyspace | my_table | ch | 0 | null | null | null | clustering_key | org.apache.cassandra.db.marshal.UTF8Type
my_keyspace | my_table | prim_addr | null | null | null | null | partition_key | org.apache.cassandra.db.marshal.UTF8Type
my_keyspace | my_table | received_on | 1 | null | null | null | clustering_key | org.apache.cassandra.db.marshal.DateType
Is this a bug in the driver, an intentional change in behavior, or some kind of misconfiguration on my part?
pom.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cnauroth</groupId>
<artifactId>cassandra-print-column-metadata</artifactId>
<version>0.0.1-SNAPSHOT</version>
<description>Console application that prints Cassandra table column metadata</description>
<name>cassandra-print-column-metadata</name>
<packaging>jar</packaging>
<properties>
<maven.compiler.source>1.7</maven.compiler.source>
<maven.compiler.target>1.7</maven.compiler.target>
<slf4j.version>1.7.25</slf4j.version>
</properties>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<archive>
<manifest>
<addDefaultImplementationEntries>true</addDefaultImplementationEntries>
<mainClass>cnauroth.Main</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<finalName>${project.artifactId}</finalName>
<appendAssemblyId>false</appendAssemblyId>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>dse-driver</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<finalName>${project.artifactId}-dse-driver</finalName>
</configuration>
</plugin>
</plugins>
</build>
</profile>
<profile>
<id>cassandra-driver</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>2.1.9</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<finalName>${project.artifactId}-cassandra-driver</finalName>
</configuration>
</plugin>
</plugins>
</build>
</profile>
</profiles>
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${slf4j.version}</version>
</dependency>
</dependencies>
</project>
Main.java
package cnauroth;

import java.util.List;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ColumnMetadata;
import com.datastax.driver.core.Session;

class Main {
  public static void main(String[] args) throws Exception {
    // Skipping validation for brevity
    String address = args[0];
    String user = args[1];
    String password = args[2];
    String keyspace = args[3];
    String table = args[4];
    try (Cluster cluster = new Cluster.Builder()
        .addContactPoints(address)
        .withCredentials(user, password)
        .build()) {
      List<ColumnMetadata> columns =
          cluster.getMetadata().getKeyspace(keyspace).getTable(table).getColumns();
      for (ColumnMetadata column : columns) {
        System.out.println(column);
      }
    }
  }
}

It looks like the internal Cassandra type used for timestamp changed from org.apache.cassandra.db.marshal.DateType to org.apache.cassandra.db.marshal.TimestampType between Cassandra 1.2 and 2.0 (CASSANDRA-5723). If you created the table with Cassandra 1.2 (or a DSE version based on it), DateType would be used, even if you upgraded your cluster later.
It appears that the 2.1 version of the Java driver was able to account for this (source), but starting with 3.0 it does not (source). Instead, it parses the column as a custom type.
Fortunately, the driver is still able to serialize and deserialize this column, since the CQL timestamp type is what is communicated over the protocol in responses, but it is a bug that the driver parses this as the wrong type. I went ahead and created JAVA-1561 to track it.
If you were to migrate your cluster to C* 3.0+ or DSE 5.0+, I suspect the problem would go away, since the newer schema tables reference the CQL name instead of the representative Java class name (unless the column really is a custom type).
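Until that is fixed, one possible workaround on the introspection side is to translate the legacy marshal class back to the CQL type yourself. A minimal sketch against the 3.x driver API that dse-driver 1.1.2 is based on (the helper method and its mapping are mine, not part of the driver):
import com.datastax.driver.core.DataType;

// Hypothetical helper: report legacy DateType columns as TIMESTAMP.
static DataType.Name effectiveName(DataType type) {
    if (type.getName() == DataType.Name.CUSTOM) {
        String marshalClass = ((DataType.CustomType) type).getCustomTypeClassName();
        if ("org.apache.cassandra.db.marshal.DateType".equals(marshalClass)) {
            // Tables created on Cassandra 1.2 / older DSE report this class for timestamp columns.
            return DataType.Name.TIMESTAMP;
        }
    }
    return type.getName();
}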

Related

Configure the org.jooq.conf.Settings.backslashEscaping property

In the book "Beginning jOOQ" by Tayo Koleoso it is written:
Caution For your own peace of mind, go ahead and configure the
org.jooq.conf.Settings.backslashEscaping property on
your Settings object. MySQL and some versions of PostgreSQL
support non-standard escape characters that can cause you a lot of
grief when you least expect it. This property lets jOOQ properly handle
this “feature” from MySQL.
The problem is that, to the best of my ability, I can't find in the book how to configure this.
Pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.0.2</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.example</groupId>
<artifactId>jooq</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>jooq</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>17</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jooq</artifactId>
</dependency>
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
application.properties
spring.datasource.url=jdbc:mysql://${MYSQL_HOST:localhost}:3306/testdb
spring.datasource.username=testdb
spring.datasource.password=testdb
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
Controller
@Controller
@RequestMapping("/start")
public class StartController {
    @GetMapping("/")
    public void start() {
        try (Connection connection = DriverManager
                .getConnection("jdbc:mysql://localhost/test?user=testdb&password=testdb")) {
            DSLContext context = DSL.using(connection, SQLDialect.MYSQL);
            ResultQuery resultQuery = context
                    .resultQuery("SELECT * FROM edens_car.complete_car_listing");
            List<CompleteVehicleRecord> allVehicles =
                    resultQuery.fetchInto(CompleteVehicleRecord.class);
        } catch (SQLException sqlex) {
            assert true;
        }
    }
}
I just organised the Spring controller this way for my own convenience; don't pay attention to it.
Adapting your example code
You can pass custom Settings to one of the DSL.using() overloads, specifically:
DSLContext context = DSL.using(connection, SQLDialect.MYSQL, settings);
Note that these methods are just a convenience for creating your own DefaultConfiguration.
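For example, a minimal sketch of building those Settings (which BackslashEscaping value you actually want, ON, OFF or DEFAULT, depends on your database configuration):
import org.jooq.conf.BackslashEscaping;
import org.jooq.conf.Settings;

// Settings instance to pass to DSL.using(connection, SQLDialect.MYSQL, settings) above
Settings settings = new Settings()
    .withBackslashEscaping(BackslashEscaping.ON);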
A Spring Boot DefaultConfigurationCustomizer
Alternatively, if you're using Spring Boot, then this article shows how to use a DefaultConfigurationCustomizer, e.g.:
@Bean
public DefaultConfigurationCustomizer configurationCustomiser() {
    return (DefaultConfiguration c) -> c.settings()
        .withBackslashEscaping(...);
}

Getting error : Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf

I am working on Kafka Spark Streaming. The IDE doesn't show any errors and the program builds successfully, but I am getting this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at KafkaSparkStream1$.main(KafkaSparkStream1.scala:13)
at KafkaSparkStream1.main(KafkaSparkStream1.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
I am using Maven. I have also set up my environment variables correctly, as every component works individually. My Spark version is 3.0.0-preview2 and my Scala version is 2.12.
I have exported a spark-streaming-kafka JAR file.
Here is my pom file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.org.cpg.casestudy</groupId>
<artifactId>Kafka_casestudy</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<spark.version>3.0.0</spark.version>
<scala.version>2.12</scala.version>
</properties>
<build>
<plugins>
<!-- Maven Compiler Plugin-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<!-- Apache Kafka Clients-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.5.0</version>
</dependency>
<!-- Apache Kafka Streams-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>2.5.0</version>
</dependency>
<!-- Apache Log4J2 binding for SLF4J -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.11.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.0-preview2</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.12</artifactId>
<version>3.0.0-preview2</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
<version>3.0.0-preview2</version>
</dependency>
</dependencies>
</project>
Here is my code (word count of messages sent by the producer):
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.spark._
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.codehaus.jackson.map.deser.std.StringDeserializer
object KafkaSparkStream {
def main(args: Array[String]): Unit = {
val brokers = "localhost:9092";
val groupid = "GRP1";
val topics = "KafkaTesting";
val SparkConf = new SparkConf().setMaster("local[*]").setAppName("KafkaSparkStreaming");
val ssc = new StreamingContext(SparkConf,Seconds(10))
val sc = ssc.sparkContext
sc.setLogLevel("off")
val topicSet = topics.split(",").toSet
val kafkaPramas = Map[String , Object](
ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
ConsumerConfig.GROUP_ID_CONFIG -> groupid,
ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
)
val messages = KafkaUtils.createDirectStream[String,String](
ssc, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String,String](topicSet,kafkaPramas)
)
val line=messages.map(_.value)
val words = line.flatMap(_.split(" "))
val wordCount = words.map(x=> (x,1)).reduceByKey(_+_)
wordCount.print()
ssc.start()
ssc.awaitTermination()
}
}
Try cleaning your local Maven repository, or run the command below to re-download your dependency JARs from the remote repositories:
mvn clean install -U
Your Spark dependencies, especially spark-core_2.12-3.0.0-preview2.jar, are not on your classpath when you execute the application JAR directly, because they are declared with provided scope in your pom.
You can add them via:
spark-submit --jars <path>/spark-core_2.12-3.0.0-preview2.jar
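Alternatively, run the application through spark-submit, which puts the Spark runtime on the classpath so the provided-scope dependencies resolve at runtime. A sketch only; the class name below is taken from your stack trace and the artifact name from your pom:
spark-submit \
  --class KafkaSparkStream1 \
  --master local[*] \
  target/Kafka_casestudy-1.0-SNAPSHOT.jar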

How to skip a specific scenario having unique tag from TestNG runner

I have a feature file with lots of scenarios, which run for multiple countries. To run against different countries I have created different TestNG runner classes.
My question is: how can I skip a scenario from a specific runner file? I am running scenarios using a feature-level tag.
For example:
The feature file has the @regression tag, and I use this tag in all runner classes across the countries. Because of a data issue in some countries, I want to skip some scenarios for those countries (I am using the TestNG runner). I have seen that with the JUnit runner you can use a "not" tag expression to skip them, but the same is not working with the TestNG runner.
I tried the below:
@CucumberOptions(
    plugin = "com.cucumber.listener.ExtentCucumberFormatter:",
    monochrome = true,
    features = "src/features/Cart",
    tags = { "@regression and not @invalid" })
@regression
Feature: Validate login functionality for all countries
@valid
Scenario Outline: login with valid user access
Given site launched
And user enters "<username>"
And user enters "<password>"
When user clicks Sign In button
Then display user home page
Examples:
| username | password |
| xyz | xyz123 |
| abc | abc123 |
@invalid
Scenario Outline: login with invalid user access
Given site launched
And user enters "<username>"
And user enters "<password>"
When user clicks Sign In button
Then display user home page
Examples:
| username | password |
| xyz | xyz123 |
| abc | abc123 |
Below is my runner class file :
package runner;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.junit.runner.RunWith;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import com.cucumber.listener.Reporter;
import cucumber.api.CucumberOptions;
import cucumber.api.junit.Cucumber;
import cucumber.api.testng.AbstractTestNGCucumberTests;
import utils.ConfigManagement;
import utils.ExcelSheetManager;
import utils.ExtentReportUtills;
@CucumberOptions(plugin = "com.cucumber.listener.ExtentCucumberFormatter:",
monochrome = true, features = "src/features/Cart", tags = { "@regression and not @invalid"},
format = { "html:cucumber-html-reports1",
"json:cucumber-html-reports/cucumber.json" }, dryRun = false, glue = "steps")
public class EU_IR_EN extends AbstractTestNGCucumberTests {
public static Map<String, String> configDetails = new HashMap<>();
@BeforeClass
public static void setup() throws Exception {
Map<String, String> SheetData = new HashMap<>();
String key = "Cart";
SheetData.put("SHEETNAME", key);
configDetails = ConfigManagement.GetConfigDetailsForRCL(key);
SheetData.putAll(configDetails);
System.out.println("map at class level of runner1" + SheetData);
ExcelSheetManager.setData(SheetData);
System.out.println("first statement");
}
@AfterClass
public static void prepareReport() throws Exception {
ExtentReportUtills.UpdateExtentReport();
}
}
Below is my POM.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>cucumberTest</groupId>
<artifactId>FSCartUIAutomation</artifactId>
<version>1</version>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>com.aventstack</groupId>
<artifactId>extentreports</artifactId>
<version>3.0.6</version>
</dependency>
<dependency>
<groupId>com.vimalselvam</groupId>
<artifactId>cucumber-extentsreport</artifactId>
<version>3.0.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-java</artifactId>
<version>4.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.cucumber/cucumber-testng -->
<dependency>
<groupId>io.cucumber</groupId>
<artifactId>cucumber-testng</artifactId>
<version>4.2.0</version>
</dependency>
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>webdrivermanager</artifactId>
<version>3.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>io.github.bonigarcia</groupId>
<artifactId>selenium-jupiter</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.14</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.14</version>
</dependency>
<!-- For excel file handling -->
<dependency>
<groupId>net.sourceforge.jexcelapi</groupId>
<artifactId>jxl</artifactId>
<version>2.6.12</version>
</dependency>
<!-- https://mvnrepository.com/artifact/joda-time/joda-time -->
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.3</version>
</dependency>
</dependencies>
<pluginRepositories>
<pluginRepository>
<snapshots>
<enabled>false</enabled>
</snapshots>
<id>hindsighttesting.release</id>
<name>Hindsight Software Release Repository</name>
<url>http://repo.hindsightsoftware.com/public-maven</url>
</pluginRepository>
</pluginRepositories>
<build>
<plugins>
<plugin>
<groupId>com.hindsighttesting.behave</groupId>
<artifactId>behave-maven-plugin</artifactId>
<version>1.0.4</version>
<executions>
<execution>
<id>install6</id>
<phase>package</phase>
<goals>
<goal>features</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-failsafe-plugin</artifactId>
<version>2.12</version>
<executions>
<execution>
<id>integration-test</id>
<goals>
<goal>integration-test</goal>
<goal>verify</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.19.1</version>
<configuration>
<suiteXmlFiles>
<suiteXmlFile>GFSCart.xml</suiteXmlFile>
</suiteXmlFiles>
<printSummary>true</printSummary>
<forkCount>4</forkCount>
</configuration>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.6.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<!-- <source>${jdk.level}</source> <target>${jdk.level}</target> -->
</configuration>
</plugin>
</plugins>
</build>
</project>
You can use tags to run / skip specific scenarios.
From the docs:
You can tell Cucumber to ignore scenarios with a particular tag:
Using JUnit runner class:
@CucumberOptions(tags = "not @smoke")
class RunCucumberTest {}
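The tag expression behaves the same way with the TestNG runner, because tag filtering is done by Cucumber itself rather than by the JUnit/TestNG wrapper. A trimmed sketch based on your runner class (Cucumber 4.x):
import cucumber.api.CucumberOptions;
import cucumber.api.testng.AbstractTestNGCucumberTests;

// Country-specific runner that skips scenarios tagged @invalid
@CucumberOptions(
    features = "src/features/Cart",
    glue = "steps",
    tags = { "@regression and not @invalid" })
public class EU_IR_EN extends AbstractTestNGCucumberTests {
}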

Failed to Read data from Couchbase using Spark

I have been trying to read data from Couchbase, but the read fails due to an authentication issue.
import com.couchbase.client.java.document.JsonDocument
import org.apache.spark.sql.SparkSession
import com.couchbase.spark._
object SparkRead {
def main(args: Array[String]): Unit = {
// The SparkSession is the main entry point into spark
val spark = SparkSession
.builder()
.appName("KeyValueExample")
.master("local[*]") // use the JVM as the master, great for testing
.config("spark.couchbase.nodes", "***********") // connect to couchbase on hostname
.config("spark.couchbase.bucket.beer-sample","") // open the travel-sample bucket with empty password
.config("spark.couchbase.username", "couchdb")
.config("spark.couchbase.password", "******")
.config("spark.couchbase.connectTimeout","30000")
.config("spark.couchbase.kvTimeout","10000")
.config("spark.couchbase.socketConnect","10000")
.getOrCreate()
spark.sparkContext
.couchbaseGet[com.couchbase.client.java.document.JsonDocument](Seq("airline_10123")) // Load documents from couchbase
.collect() // collect all data from the spark workers
.foreach(println) // print each document content
}
}
Below is the Build File
name := "KafkaSparkCouchReadWrite"
organization := "my.clairvoyant"
version := "1.0.0-SNAPSHOT"
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-streaming" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
"com.couchbase.client" %% "spark-connector" % "2.1.0",
"org.glassfish.hk2" % "hk2-utils" % "2.2.0-b27",
"org.glassfish.hk2" % "hk2-locator" % "2.2.0-b27",
"javax.validation" % "validation-api" % "1.1.0.Final",
"org.apache.kafka" %% "kafka" % "0.11.0.0",
"com.googlecode.json-simple" % "json-simple" % "1.1").map(_.excludeAll(ExclusionRule("org.glassfish.hk2"),ExclusionRule("javax.validation")))
ERROR LOG
17/12/12 15:18:35 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.33.220, 52402, None)
17/12/12 15:18:35 INFO SharedState: Warehouse path is 'file:/Users/sampat/Desktop/GitClairvoyant/cpdl3-poc/KafkaSparkCouchReadWrite/spark-warehouse/'.
17/12/12 15:18:35 INFO CouchbaseCore: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword=false, sslKeystore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=8, computationPoolSize=8, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=12, queryServiceEndpoints=12, searchServiceEndpoints=12, ioPool=NioEventLoopGroup, kvIoPool=null, viewIoPool=null, searchIoPool=null, queryIoPool=null, coreScheduler=CoreScheduler, memcachedHashingStrategy=DefaultMemcachedHashingStrategy, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.4.2 (git: 2.4.2, core: 1.4.2), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2#7b7b3edb, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, dnsSrvEnabled=false}
17/12/12 15:18:37 WARN Endpoint: [null][KeyValueEndpoint]: Authentication Failure.
17/12/12 15:18:37 INFO Endpoint: [null][KeyValueEndpoint]: Got notified from Channel as inactive, attempting reconnect.
17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401
17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401
Exception in thread "main" com.couchbase.client.java.error.InvalidPasswordException: Passwords for bucket "beer-sample" do not match.
at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:601)
at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:584)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.onError(OnSubscribeMap.java:88)
Sample Code :
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.spark._
import org.apache.spark.sql.SparkSession
object SparkReadCouchBase {
def main(args: Array[String]): Unit = {
val spark = SparkSession
.builder()
.appName("KeyValueExample")
.master("local[*]") // use the JVM as the master, great for testing
.config("spark.couchbase.nodes", "127.0.0.1") // connect to couchbase on hostname
.config("spark.couchbase.bucket.travel-sample","") // open the travel-sample bucket with empty password
.config("com.couchbase.username", "*******")
.config("com.couchbase.password", "*******")
.config("com.couchbase.kvTimeout","10000")
.config("com.couchbase.connectTimeout","30000")
.config("com.couchbase.socketConnect","10000")
.getOrCreate()
println("=====================================================================================")
spark.sparkContext
.couchbaseGet[JsonDocument](Seq("airline_10123")) // Load documents from couchbase
.collect() // collect all data from the spark workers
.foreach(println) // print each document content
println("=====================================================================================")
}
}
POM.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.*******.*****</groupId>
<artifactId>KafkaSparkCouch</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<java.version>1.8</java.version>
<spark.version>2.2.0</spark.version>
<scala.version>2.11.8</scala.version>
<scala.parent.version>2.11</scala.parent.version>
<kafka.client.version>0.11.0.0</kafka.client.version>
<fat.jar.name>SparkCouch</fat.jar.name>
<scala.binary.version>2.11</scala.binary.version>
<main.class>com.*******.demo.spark.couchbase.SparkReadCouchBaseTest</main.class>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>com.couchbase.client</groupId>
<artifactId>spark-connector_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.client.version}</version>
</dependency>
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!--Create fat-jar file-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.1.6</version>
<configuration>
<scalaCompatVersion>${scala.binary.version}</scalaCompatVersion>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
<!-- other settings-->
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>mavencentral</id>
<url>http://repo1.maven.org/maven2/</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
<repository>
<id>scala</id>
<name>Scala Tools</name>
<url>http://scala-tools.org/repo-releases/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala</id>
<name>Scala Tools</name>
<url>http://scala-tools.org/repo-releases/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
<name>KafkaSparkCouch</name>
</project>
You will need to set the following Couchbase configurations as JVM system properties (for example, at the start of your main method):
System.setProperty("com.couchbase.connectTimeout", "30000");
System.setProperty("com.couchbase.kvTimeout", "10000");
System.setProperty("com.couchbase.socketConnect", "10000");

Unable to save to HBase using Phoenix from Spark

I am trying sample code to save data to HBase from a Spark DataFrame.
I am not sure where I went wrong, but the code is not working for me.
Below is the code that I tried. I am able to read the existing table, but I could not save to it. I tried a couple of ways, which I have mentioned (the commented-out lines).
Code:
import scala.reflect.runtime.universe
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
case class Person(id: String, name: String)
object PheonixTest extends App {
val conf = new SparkConf;
conf.setMaster("local");
conf.setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc);
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "table1")
hbaseConf.addResource(new Path("/Users/srini/softwares/hbase-1.1.2/conf/hbase-site.xml"));
import org.apache.phoenix.spark._;
val phDf = sqlContext.phoenixTableAsDataFrame("table1", Array("id", "name"), conf = hbaseConf)
println("===========>>>>>>>>>>>>>>>>>> " + phDf.show());
val rdd = sc.parallelize(Seq("sr,Srini","sr2,Srini2"))
import sqlContext.implicits._;
val df = rdd.map { x => {val array = x.split(","); Person(array(0), array(1))} }.toDF;
//df.write.format("org.apache.phoenix.spark").mode("overwrite") .option("table", "table1").option("zkUrl", "localhost:2181").save()
//df.rdd.saveToP
df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "table1", "zkUrl" -> "localhost:2181"))
sc.stop()
}
Pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.srini.plug</groupId>
<artifactId>data-ingestion</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-xml</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>com.splunk</groupId>
<artifactId>splunk</artifactId>
<version>1.5.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.scalaj</groupId>
<artifactId>scalaj-collection_2.10</artifactId>
<version>1.5</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>12.0</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-spark</artifactId>
<version>4.6.0-HBase-1.1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.4.1</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>ext-release-local</id>
<url>http://splunk.artifactoryonline.com/splunk/ext-releases-local</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.5</source>
<target>1.5</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.5.3</version>
<executions>
<execution>
<id>create-archive</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<descriptorRefs>
<descriptorRef>
jar-with-dependencies
</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.srini.ingest.SplunkSearch</mainClass>
</manifest>
</archive>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Error:
16/01/02 18:26:29 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152031ff8da001c, negotiated timeout = 90000
16/01/02 18:27:18 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48344 ms ago, cancelled=false, msg=
16/01/02 18:27:38 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68454 ms ago, cancelled=false, msg=
16/01/02 18:27:58 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88633 ms ago, cancelled=false, msg=
16/01/02 18:28:19 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108817 ms ago, cancelled=false, msg=
Two issues I notice:
1. ZooKeeper URL. If you are sure ZooKeeper is running locally, update your hosts file with an entry like the one below and pass that hostname to HBaseConfiguration.
ipaddress hostname
2. Phoenix by default upper-cases your table names and columns, so change the above code to the following (the same applies to the table name you pass when saving):
val phDf = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID", "NAME"), conf = hbaseConf)
