I am trying to consume from a Pulsar topic; my client code can be seen below.
But I am facing a NoClassDefFoundError. Please let me know how to resolve this.
Pulsar client version: 2.10; Pulsar server version: 2.10;
installation type: standalone (not Docker); OS: Ubuntu 20.04.4 LTS
My Pulsar client code:
private static void consumeFromPulsarAsync() throws Exception {
    logger.info("consumeFromPulsarAsync()");
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();
    logger.info("consumeFromPulsarAsync() client");
    Consumer<String> consumer = client.newConsumer(Schema.STRING)
            .topic("persistent://public/default/ack-2")
            .consumerName("pulsar-consumer-id-" + Math.random())
            .subscriptionName("pulsar-subscription-id-" + Math.random())
            .subscriptionType(SubscriptionType.Shared)
            .subscribe();
    logger.info("consumeFromPulsarAsync() consumer");
    consumer.receiveAsync().thenCompose((msg) -> {
        logger.info("consumeFromPulsarAsync() consumed msg :: {}", msg.getValue());
        try {
            consumer.acknowledge(msg);
        } catch (PulsarClientException e) {
            throw new RuntimeException(e);
        }
        return null;
    });
}
My pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.company</groupId>
<artifactId>my.work.manager</artifactId>
<version>release-1.0</version>
<description>my Worker Manager</description>
<properties>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<flink.version>1.14.3</flink.version>
<pulsar.version>2.10.0</pulsar.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.12</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_2.12</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-pulsar</artifactId>
<version>1.15.0</version>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.9</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.22</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20210307</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.32</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>1.7.30</version>
</dependency>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.30</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
<dependency>
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-client</artifactId>
<version>${pulsar.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<compilerArgs>
<arg>-Xlint:all,-options,-path</arg>
</compilerArgs>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<version>2.5</version>
<configuration>
<archive>
<manifest>
<mainClass>com.company.my.manager.flink.myWorkManager</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.3.0</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
<targetPath>${project.build.outputDirectory}</targetPath>
<includes>
<include>**/*.yml</include>
</includes>
</resource>
</resources>
</build>
</project>
The error it is showing:
2022-05-10 17:52:52,549 WARN org.apache.pulsar.client.impl.MultiTopicsConsumerImpl [] - Encountered error in partition auto update timer task for multi-topic consumer. Another task will be scheduled.
java.lang.NoClassDefFoundError: org/apache/pulsar/shade/com/google/common/primitives/Ints
at org.apache.pulsar.shade.com.google.common.collect.Lists.computeArrayListCapacity(Lists.java:152) ~[e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.com.google.common.collect.Lists.newArrayListWithExpectedSize(Lists.java:192) ~[e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl$TopicsPartitionChangedListener.onTopicsExtended(MultiTopicsConsumerImpl.java:1238) ~[e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.client.impl.MultiTopicsConsumerImpl$1.run(MultiTopicsConsumerImpl.java:1350) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [e8fe35f4-c20c-4c3e-9b81-a3ff1a0ebaec_my.work.manager-release-1.0-jar-with-dependencies.jar:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
What type of cluster? Docker, standalone, K8s, ...?
Are the Pulsar client and server both 2.10?
What JDK for the server? For the client?
Any errors on the server?
What server OS?
What client OS?
It says WARN; did it run, or crash with the missing class? Sometimes on a Mac you will get a WARN on missing SSL and it will work with warnings.
It may be related to this: https://github.com/apache/pulsar/issues/9585
Steps to debug:
Create a new topic.
Try again.
Try to produce and consume with the command-line client.
Create a single-partition topic and test with that (example commands below).
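For instance, with the CLI tools shipped in the Pulsar distribution (the topic name ack-3 here is only an example):
bin/pulsar-admin topics create-partitioned-topic persistent://public/default/ack-3 -p 1
bin/pulsar-client produce persistent://public/default/ack-3 -m "hello" -n 1
bin/pulsar-client consume persistent://public/default/ack-3 -s test-sub -n 1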
I ran your example with this test:
bin/pulsar-client produce --key "test1" "persistent://public/default/ack-2" -m "Test this thing 4" -n 25
--
public void consumeFromPulsarAsync() throws Exception {
    PulsarClient client = PulsarClient.builder()
            .serviceUrl("pulsar://localhost:6650")
            .build();
    Consumer<String> consumer = client.newConsumer(Schema.STRING)
            .topic("persistent://public/default/ack-2")
            .consumerName("pulsar-consumer-id-" + Math.random())
            .subscriptionName("pulsar-subscription-id-" + Math.random())
            .subscriptionType(SubscriptionType.Shared)
            .subscribe();
    consumer.receiveAsync().thenCompose((msg) -> {
        log.info("consumeFromPulsarAsync() consumed msg :: {}", msg.getValue());
        try {
            consumer.acknowledge(msg);
        } catch (PulsarClientException e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
        return null;
    });
}
--
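A side note on the code above (my observation, not part of the original answer): receiveAsync() completes with a single message, so the callback runs once per call, and returning null from thenCompose makes the downstream stage complete with a NullPointerException, which goes unnoticed here only because nothing consumes the returned future. A minimal sketch of a continuous variant, assuming the same consumer and log setup as above:

static void receiveLoop(Consumer<String> consumer) {
    // Sketch: re-arm the async receive after each message instead of receiving once.
    consumer.receiveAsync().thenAccept(msg -> {
        log.info("consumeFromPulsarAsync() consumed msg :: {}", msg.getValue());
        try {
            consumer.acknowledge(msg);
        } catch (PulsarClientException e) {
            consumer.negativeAcknowledge(msg); // let the broker redeliver later
        }
        receiveLoop(consumer); // schedule the next receive
    });
}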
2022-05-11 10:00:47.321 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConnectionPool : [[id: 0x2005ff4b, L:/192.168.1.63:49386 - R:pulsar1.fios-router.home/192.168.1.230:6650]] Connected to server
2022-05-11 10:00:47.321 INFO 28575 --- [r-client-io-6-1] org.apache.pulsar.client.impl.ClientCnx : [id: 0x2005ff4b, L:/192.168.1.63:49386 - R:pulsar1.fios-router.home/192.168.1.230:6650] Connected through proxy to target broker at 127.0.0.1:6650
2022-05-11 10:00:47.323 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-2][pulsar-subscription-id-0.6345962322446594] Subscribing to topic on cnx [id: 0xf5af925d, L:/192.168.1.63:49385 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 2
2022-05-11 10:00:47.324 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-2][pulsar-subscription-id-0.6808890118894377] Subscribing to topic on cnx [id: 0x2005ff4b, L:/192.168.1.63:49386 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 2
2022-05-11 10:00:47.324 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-1][pulsar-subscription-id-0.6345962322446594] Subscribing to topic on cnx [id: 0xf5af925d, L:/192.168.1.63:49385 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 1
2022-05-11 10:00:47.324 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-1][pulsar-subscription-id-0.6808890118894377] Subscribing to topic on cnx [id: 0x2005ff4b, L:/192.168.1.63:49386 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 1
2022-05-11 10:00:47.324 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-0][pulsar-subscription-id-0.6345962322446594] Subscribing to topic on cnx [id: 0xf5af925d, L:/192.168.1.63:49385 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 0
2022-05-11 10:00:47.324 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-0][pulsar-subscription-id-0.6808890118894377] Subscribing to topic on cnx [id: 0x2005ff4b, L:/192.168.1.63:49386 - R:pulsar1.fios-router.home/192.168.1.230:6650], consumerId 0
2022-05-11 10:00:47.357 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-0][pulsar-subscription-id-0.6808890118894377] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 0
2022-05-11 10:00:47.370 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-1][pulsar-subscription-id-0.6808890118894377] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 1
2022-05-11 10:00:47.370 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-1][pulsar-subscription-id-0.6345962322446594] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 1
2022-05-11 10:00:47.371 INFO 28575 --- [r-client-io-6-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-2][pulsar-subscription-id-0.6808890118894377] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 2
2022-05-11 10:00:47.372 INFO 28575 --- [r-client-io-6-1] o.a.p.c.impl.MultiTopicsConsumerImpl : [persistent://public/default/ack-2] [pulsar-subscription-id-0.6808890118894377] Success subscribe new topic persistent://public/default/ack-2 in topics consumer, partitions: 3, allTopicPartitionsNumber: 3
2022-05-11 10:00:47.394 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-0][pulsar-subscription-id-0.6345962322446594] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 0
2022-05-11 10:00:47.394 INFO 28575 --- [r-client-io-7-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/ack-2-partition-2][pulsar-subscription-id-0.6345962322446594] Subscribed to topic on pulsar1.fios-router.home/192.168.1.230:6650 -- consumer: 2
2022-05-11 10:00:47.395 INFO 28575 --- [r-client-io-7-1] o.a.p.c.impl.MultiTopicsConsumerImpl : [persistent://public/default/ack-2] [pulsar-subscription-id-0.6345962322446594] Success subscribe new topic persistent://public/default/ack-2 in topics consumer, partitions: 3, allTopicPartitionsNumber: 3
2022-05-11 10:00:52.529 INFO 28575 --- [t-internal-11-1] a.f.a.consumer.AirQualityConsumerApp : consumeFromPulsarAsync() consumed msg :: Test this thing 4
2022-05-11 10:00:52.529 INFO 28575 --- [nt-internal-9-1] a.f.a.consumer.AirQualityConsumerApp : consumeFromPulsarAsync() consumed msg :: Test this thing 4
^C2022-05-11 10:00:54.823 INFO 28575 --- [r-client-io-1-1] o.a.pulsar.client.impl.ConsumerImpl : [persistent://public/default/airquality] [airqualitysj1] Closed consumer
2022-05-11 10:00:54.824 INFO 28575 --- [ionShutdownHook] o.a.pulsar.client.impl.PulsarClientImpl : Client closing. URL: pulsar://pulsar1:6650
2022-05-11 10:00:54.826 INFO 28575 --- [r-client-io-1-1] org.apache.pulsar.client.impl.ClientCnx : [id: 0x4d247f3d, L:/192.168.1.63:49381 ! R:pulsar1.fios-router.home/192.168.1.230:6650] Disconnected
2022-05-11 10:00:54.829 INFO 28575 --- [r-client-io-1-1] org.apache.pulsar.client.impl.ClientCnx : [id: 0x5fa81a8b, L:/192.168.1.63:49382 ! R:pulsar1.fios-router.home/192.168.1.230:6650] Disconnected
Related
I am working on Kafka Spark Streaming. The IDE doesn't show any errors and the program builds successfully, but I am getting this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at KafkaSparkStream1$.main(KafkaSparkStream1.scala:13)
at KafkaSparkStream1.main(KafkaSparkStream1.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
I am using Maven. I have also set up my environment variables correctly, as every component is working individually. My Spark version is 3.0.0-preview2 and my Scala version is 2.12.
I have exported a spark-streaming-kafka jar file.
Here is my pom file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.org.cpg.casestudy</groupId>
<artifactId>Kafka_casestudy</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<spark.version>3.0.0</spark.version>
<scala.version>2.12</scala.version>
</properties>
<build>
<plugins>
<!-- Maven Compiler Plugin-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<!-- Apache Kafka Clients-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.5.0</version>
</dependency>
<!-- Apache Kafka Streams-->
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>2.5.0</version>
</dependency>
<!-- Apache Log4J2 binding for SLF4J -->
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.11.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.0-preview2</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.12</artifactId>
<version>3.0.0-preview2</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
<version>3.0.0-preview2</version>
</dependency>
</dependencies>
</project>
Here is my code (word count of messages sent by a producer):
import org.apache.kafka.clients.consumer.ConsumerConfig
// Kafka's own StringDeserializer, required by the key/value deserializer config below
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark._
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KafkaSparkStream {
  def main(args: Array[String]): Unit = {
    val brokers = "localhost:9092"
    val groupid = "GRP1"
    val topics = "KafkaTesting"

    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("KafkaSparkStreaming")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    val sc = ssc.sparkContext
    sc.setLogLevel("off")

    val topicSet = topics.split(",").toSet
    val kafkaParams = Map[String, Object](
      ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> brokers,
      ConsumerConfig.GROUP_ID_CONFIG -> groupid,
      ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
      ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer]
    )

    val messages = KafkaUtils.createDirectStream[String, String](
      ssc, LocationStrategies.PreferConsistent, ConsumerStrategies.Subscribe[String, String](topicSet, kafkaParams)
    )

    val line = messages.map(_.value)
    val words = line.flatMap(_.split(" "))
    val wordCount = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCount.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
Try cleaning your local Maven repository, or run the command below to force-update your dependency JARs from the remote repositories:
mvn clean install -U
Your Spark dependencies, specifically spark-core_2.12-3.0.0-preview2.jar, are not added to your classpath when executing the Spark JAR.
You can do it via:
spark-submit --jars <path>/spark-core_2.12-3.0.0-preview2.jar
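For completeness, a fuller form of that command might look like the line below; the main class and application jar come from the question's own project, while <path> stays a placeholder for wherever the Spark jars live:
spark-submit --class KafkaSparkStream --master local[*] --jars <path>/spark-core_2.12-3.0.0-preview2.jar,<path>/spark-streaming_2.12-3.0.0-preview2.jar target/Kafka_casestudy-1.0-SNAPSHOT.jar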
I have been trying to read data from Couchbase, but it fails due to an authentication issue.
import com.couchbase.client.java.document.JsonDocument
import org.apache.spark.sql.SparkSession
import com.couchbase.spark._

object SparkRead {
  def main(args: Array[String]): Unit = {
    // The SparkSession is the main entry point into Spark
    val spark = SparkSession
      .builder()
      .appName("KeyValueExample")
      .master("local[*]") // use the JVM as the master, great for testing
      .config("spark.couchbase.nodes", "***********") // connect to couchbase on hostname
      .config("spark.couchbase.bucket.beer-sample", "") // open the beer-sample bucket with empty password
      .config("spark.couchbase.username", "couchdb")
      .config("spark.couchbase.password", "******")
      .config("spark.couchbase.connectTimeout", "30000")
      .config("spark.couchbase.kvTimeout", "10000")
      .config("spark.couchbase.socketConnect", "10000")
      .getOrCreate()

    spark.sparkContext
      .couchbaseGet[JsonDocument](Seq("airline_10123")) // load documents from couchbase
      .collect()                                        // collect all data from the spark workers
      .foreach(println)                                 // print each document's content
  }
}
Below is the build file:
name := "KafkaSparkCouchReadWrite"
organization := "my.clairvoyant"
version := "1.0.0-SNAPSHOT"
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-streaming" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0",
"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0",
"com.couchbase.client" %% "spark-connector" % "2.1.0",
"org.glassfish.hk2" % "hk2-utils" % "2.2.0-b27",
"org.glassfish.hk2" % "hk2-locator" % "2.2.0-b27",
"javax.validation" % "validation-api" % "1.1.0.Final",
"org.apache.kafka" %% "kafka" % "0.11.0.0",
"com.googlecode.json-simple" % "json-simple" % "1.1").map(_.excludeAll(ExclusionRule("org.glassfish.hk2"),ExclusionRule("javax.validation")))
ERROR LOG
17/12/12 15:18:35 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.33.220, 52402, None)
17/12/12 15:18:35 INFO SharedState: Warehouse path is 'file:/Users/sampat/Desktop/GitClairvoyant/cpdl3-poc/KafkaSparkCouchReadWrite/spark-warehouse/'.
17/12/12 15:18:35 INFO CouchbaseCore: CouchbaseEnvironment: {sslEnabled=false, sslKeystoreFile='null', sslKeystorePassword=false, sslKeystore=null, bootstrapHttpEnabled=true, bootstrapCarrierEnabled=true, bootstrapHttpDirectPort=8091, bootstrapHttpSslPort=18091, bootstrapCarrierDirectPort=11210, bootstrapCarrierSslPort=11207, ioPoolSize=8, computationPoolSize=8, responseBufferSize=16384, requestBufferSize=16384, kvServiceEndpoints=1, viewServiceEndpoints=12, queryServiceEndpoints=12, searchServiceEndpoints=12, ioPool=NioEventLoopGroup, kvIoPool=null, viewIoPool=null, searchIoPool=null, queryIoPool=null, coreScheduler=CoreScheduler, memcachedHashingStrategy=DefaultMemcachedHashingStrategy, eventBus=DefaultEventBus, packageNameAndVersion=couchbase-java-client/2.4.2 (git: 2.4.2, core: 1.4.2), dcpEnabled=false, retryStrategy=BestEffort, maxRequestLifetime=75000, retryDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=100, upper=100000}, reconnectDelay=ExponentialDelay{growBy 1.0 MILLISECONDS, powers of 2; lower=32, upper=4096}, observeIntervalDelay=ExponentialDelay{growBy 1.0 MICROSECONDS, powers of 2; lower=10, upper=100000}, keepAliveInterval=30000, autoreleaseAfter=2000, bufferPoolingEnabled=true, tcpNodelayEnabled=true, mutationTokensEnabled=false, socketConnectTimeout=1000, dcpConnectionBufferSize=20971520, dcpConnectionBufferAckThreshold=0.2, dcpConnectionName=dcp/core-io, callbacksOnIoPool=false, disconnectTimeout=25000, requestBufferWaitStrategy=com.couchbase.client.core.env.DefaultCoreEnvironment$2#7b7b3edb, queryTimeout=75000, viewTimeout=75000, kvTimeout=2500, connectTimeout=5000, dnsSrvEnabled=false}
17/12/12 15:18:37 WARN Endpoint: [null][KeyValueEndpoint]: Authentication Failure.
17/12/12 15:18:37 INFO Endpoint: [null][KeyValueEndpoint]: Got notified from Channel as inactive, attempting reconnect.
17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401
17/12/12 15:18:37 WARN ResponseStatusConverter: Unknown ResponseStatus with Protocol HTTP: 401
Exception in thread "main" com.couchbase.client.java.error.InvalidPasswordException: Passwords for bucket "beer-sample" do not match.
at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:601)
at com.couchbase.client.java.CouchbaseAsyncCluster$OpenBucketErrorHandler.call(CouchbaseAsyncCluster.java:584)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
at rx.internal.operators.OnSubscribeMap$MapSubscriber.onError(OnSubscribeMap.java:88)
Sample code:
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.spark._
import org.apache.spark.sql.SparkSession

object SparkReadCouchBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("KeyValueExample")
      .master("local[*]") // use the JVM as the master, great for testing
      .config("spark.couchbase.nodes", "127.0.0.1") // connect to couchbase on hostname
      .config("spark.couchbase.bucket.travel-sample", "") // open the travel-sample bucket with empty password
      .config("com.couchbase.username", "*******")
      .config("com.couchbase.password", "*******")
      .config("com.couchbase.kvTimeout", "10000")
      .config("com.couchbase.connectTimeout", "30000")
      .config("com.couchbase.socketConnect", "10000")
      .getOrCreate()

    println("=====================================================================================")
    spark.sparkContext
      .couchbaseGet[JsonDocument](Seq("airline_10123")) // load documents from couchbase
      .collect()                                        // collect all data from the spark workers
      .foreach(println)                                 // print each document's content
    println("=====================================================================================")
  }
}
POM.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.*******.*****</groupId>
<artifactId>KafkaSparkCouch</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<java.version>1.8</java.version>
<spark.version>2.2.0</spark.version>
<scala.version>2.11.8</scala.version>
<scala.parent.version>2.11</scala.parent.version>
<kafka.client.version>0.11.0.0</kafka.client.version>
<fat.jar.name>SparkCouch</fat.jar.name>
<scala.binary.version>2.11</scala.binary.version>
<main.class>com.*******.demo.spark.couchbase.SparkReadCouchBaseTest</main.class>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>com.couchbase.client</groupId>
<artifactId>spark-connector_${scala.parent.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.client.version}</version>
</dependency>
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1.1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!--Create fat-jar file-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<version>2.15.2</version>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.1.6</version>
<configuration>
<scalaCompatVersion>${scala.binary.version}</scalaCompatVersion>
<scalaVersion>${scala.version}</scalaVersion>
</configuration>
<!-- other settings-->
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>mavencentral</id>
<url>http://repo1.maven.org/maven2/</url>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
<repository>
<id>scala</id>
<name>Scala Tools</name>
<url>http://scala-tools.org/repo-releases/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala</id>
<name>Scala Tools</name>
<url>http://scala-tools.org/repo-releases/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
<name>KafkaSparkCouch</name>
</project>
You will need to set the following Couchbase configurations as system properties:
System.setProperty("com.couchbase.connectTimeout", "30000");
System.setProperty("com.couchbase.kvTimeout", "10000");
System.setProperty("com.couchbase.socketConnect", "10000");
I want to run a pipeline with the Spark runner, with data stored on a remote machine. The following command has been used to submit the job:
./spark-submit --class org.apache.beam.examples.WindowedWordCount --master spark://192.168.1.214:6066 --deploy-mode cluster --supervise --executor-memory 2G --total-executor-cores 4 hdfs://192.168.1.214:9000/input/word-count-ck-0.1.jar --runner=SparkRunner
It creates a temporary file '.temp-beam-2017-07-184_19-10-19-0' in the output directory. However, it throws an IllegalArgumentException while finalizing the write operation (see the log below):
17/07/04 00:40:29 INFO Executor: Adding file:/usr/local/spark/spark-1.6.3-bin-hadoop2.6/work/app-20170704004020-0000/1/./word-count-ck-0.1.jar to class loader
17/07/04 00:40:29 INFO TorrentBroadcast: Started reading broadcast variable 0
17/07/04 00:40:29 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 7.4 KB, free 1247.2 MB)
17/07/04 00:40:29 INFO TorrentBroadcast: Reading broadcast variable 0 took 102 ms
17/07/04 00:40:30 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 15.6 KB, free 1247.2 MB)
17/07/04 00:40:30 INFO CacheManager: Partition rdd_0_1 not found, computing it
17/07/04 00:40:31 INFO MemoryStore: Block rdd_0_1 stored as values in memory (estimated size 292.9 KB, free 1246.9 MB)
17/07/04 00:40:33 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 9074 bytes result sent to driver
17/07/04 00:40:34 INFO CoarseGrainedExecutorBackend: Got assigned task 2
17/07/04 00:40:34 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
17/07/04 00:40:34 INFO MapOutputTrackerWorker: Updating epoch to 1 and clearing cache
17/07/04 00:40:34 INFO TorrentBroadcast: Started reading broadcast variable 1
17/07/04 00:40:34 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 8.4 KB, free 1246.9 MB)
17/07/04 00:40:34 INFO TorrentBroadcast: Reading broadcast variable 1 took 76 ms
17/07/04 00:40:34 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 18.4 KB, free 1246.9 MB)
17/07/04 00:40:34 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 1, fetching them
17/07/04 00:40:34 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker#192.168.1.214:35429)
17/07/04 00:40:34 INFO MapOutputTrackerWorker: Got the output locations
17/07/04 00:40:34 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
17/07/04 00:40:34 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 11 ms
17/07/04 00:40:34 INFO WriteFiles: Opening writer 59cdfe11-3fff-4188-b9ca-17fce87d3ee2 for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T19:30:00.000Z..2017-07-03T19:40:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer f69da197-b163-4eee-8456-12cd7435ba8d for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T19:50:00.000Z..2017-07-03T20:00:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer 99089ec2-d54f-492c-8dc6-7dbdb0d6ab8d for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T19:10:00.000Z..2017-07-03T19:20:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer a5f369b3-355b-4db7-b894-c8b35c20b274 for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T19:20:00.000Z..2017-07-03T19:30:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer bd717b0e-af3c-461c-a554-3b472369bc84 for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T19:40:00.000Z..2017-07-03T19:50:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer f0135931-d310-4e31-8b58-caa752e75e6b for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T20:00:00.000Z..2017-07-03T20:10:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:34 INFO WriteFiles: Opening writer e6a9e90d-ef5d-46d2-8fc2-8b374727e63e for write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}, window [2017-07-03T20:10:00.000Z..2017-07-03T20:20:00.000Z) pane PaneInfo.NO_FIRING
17/07/04 00:40:35 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 6819 bytes result sent to driver
17/07/04 00:40:35 INFO CoarseGrainedExecutorBackend: Got assigned task 5
17/07/04 00:40:35 INFO Executor: Running task 1.0 in stage 2.0 (TID 5)
17/07/04 00:40:35 INFO MapOutputTrackerWorker: Updating epoch to 2 and clearing cache
17/07/04 00:40:35 INFO TorrentBroadcast: Started reading broadcast variable 2
17/07/04 00:40:35 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 9.8 KB, free 1246.9 MB)
17/07/04 00:40:35 INFO TorrentBroadcast: Reading broadcast variable 2 took 10 ms
17/07/04 00:40:35 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 22.2 KB, free 1246.9 MB)
17/07/04 00:40:35 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 0, fetching them
17/07/04 00:40:35 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker#192.168.1.214:35429)
17/07/04 00:40:35 INFO MapOutputTrackerWorker: Got the output locations
17/07/04 00:40:35 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
17/07/04 00:40:35 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 2 ms
17/07/04 00:40:36 INFO WriteFiles: Finalizing write operation TextWriteOperation{tempDirectory=hdfs://192.168.1.214:9000/beamWorks/ckoutput/.temp-beam-2017-07-184_19-10-19-0/, windowedWrites=true}.
17/07/04 00:40:36 ERROR Executor: Exception in task 1.0 in stage 2.0 (TID 5)
org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received hdfs, ck2-19.
at org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:36)
at org.apache.beam.sdk.io.WriteFiles$1$auxiliary$LSdUfeyo.invokeProcessElement(Unknown Source)
at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:197)
at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:158)
at org.apache.beam.runners.spark.translation.DoFnRunnerWithMetrics.processElement(DoFnRunnerWithMetrics.java:64)
at org.apache.beam.runners.spark.translation.SparkProcessContext$ProcCtxtIterator.computeNext(SparkProcessContext.java:165)
at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:145)
at org.apache.beam.runners.spark.repackaged.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:140)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$32.apply(RDD.scala:912)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Expect srcResourceIds and destResourceIds have the same scheme, but received hdfs, ck2-19.
at org.apache.beam.sdk.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
at org.apache.beam.sdk.io.FileSystems.validateSrcDestLists(FileSystems.java:398)
at org.apache.beam.sdk.io.FileSystems.copy(FileSystems.java:240)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.copyToOutputFiles(FileBasedSink.java:641)
at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.finalize(FileBasedSink.java:529)
at org.apache.beam.sdk.io.WriteFiles$1.processElement(WriteFiles.java:539)
The following are the plugins and dependencies I used in my project:
<packaging>jar</packaging>
<properties>
<beam.version>2.0.0</beam.version>
<surefire-plugin.version>2.20</surefire-plugin.version>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-spark</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.6.3</version>
<scope>runtime</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-flink_2.10</artifactId>
<version>${beam.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.10</artifactId>
<version>2.8.8</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-core</artifactId>
<version>${beam.version}</version>
<!-- <exclusions>
<exclusion>
<artifactId>beam-sdks-java-core</artifactId>
</exclusion>
</exclusions> -->
</dependency>
<!-- Adds a dependency on the Beam Google Cloud Platform IO module. -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
<version>${beam.version}</version>
</dependency>
<!-- Dependencies below this line are specific dependencies needed by the examples code. -->
<dependency>
<groupId>com.google.api-client</groupId>
<artifactId>google-api-client</artifactId>
<version>1.22.0</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-bigquery</artifactId>
<version>v2-rev295-1.22.0</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.http-client</groupId>
<artifactId>google-http-client</artifactId>
<version>1.22.0</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.google.apis</groupId>
<artifactId>google-api-services-pubsub</artifactId>
<version>v1-rev10-1.22.0</version>
<exclusions>
<!-- Exclude an old version of guava that is being pulled
in by a transitive dependency of google-api-client -->
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava-jdk5</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.4</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>20.0</version>
</dependency>
<!-- Add slf4j API frontend binding with JUL backend -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.14</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-jdk14</artifactId>
<version>1.7.14</version>
<!-- When loaded at runtime this will wire up slf4j to the JUL backend -->
<scope>runtime</scope>
</dependency>
<!-- Hamcrest and JUnit are required dependencies of PAssert,
which is used in the main code of DebuggingWordCount example. -->
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>1.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-hadoop-common</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-hadoop-file-system</artifactId>
<version>${beam.version}</version>
</dependency>
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-sdks-java-io-hadoop-input-format</artifactId>
<version>${beam.version}</version>
</dependency>
<!-- The DirectRunner is needed for unit tests. -->
<dependency>
<groupId>org.apache.beam</groupId>
<artifactId>beam-runners-direct-java</artifactId>
<version>${beam.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.0.0-alpha2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${surefire-plugin.version}</version>
<configuration>
<parallel>all</parallel>
<threadCount>4</threadCount>
<redirectTestOutputToFile>true</redirectTestOutputToFile>
</configuration>
<dependencies>
<dependency>
<groupId>org.apache.maven.surefire</groupId>
<artifactId>surefire-junit47</artifactId>
<version>${surefire-plugin.version}</version>
</dependency>
</dependencies>
</plugin>
<!-- Ensure that the Maven jar plugin runs before the Maven
shade plugin by listing the plugin higher within the file. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
</plugin>
<!--
Configures `mvn package` to produce a bundled jar ("fat jar") for runners
that require this for job submission to a cluster.
-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/LICENSE</exclude>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.4.0</version>
<configuration>
<cleanupDaemonThreads>false</cleanupDaemonThreads>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
The following is the source code of the WindowedWordCount class:
package org.apache.beam.examples;
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.examples.common.ExampleBigQueryTableOptions;
import org.apache.beam.examples.common.ExampleOptions;
import org.apache.beam.examples.common.WriteOneFilePerWindow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.DefaultValueFactory;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;
public class WindowedWordCount {
static final int WINDOW_SIZE = 10; // Default window duration in minutes
static class AddTimestampFn extends DoFn<String, String> {
private static final Duration RAND_RANGE = Duration.standardHours(1);
private final Instant minTimestamp;
private final Instant maxTimestamp;
AddTimestampFn(Instant minTimestamp, Instant maxTimestamp) {
this.minTimestamp = minTimestamp;
this.maxTimestamp = maxTimestamp;
}
@ProcessElement
public void processElement(ProcessContext c) {
Instant randomTimestamp =
new Instant(
ThreadLocalRandom.current()
.nextLong(minTimestamp.getMillis(), maxTimestamp.getMillis()));
/**
* Concept #2: Set the data element with that timestamp.
*/
c.outputWithTimestamp(c.element(), new Instant(randomTimestamp));
}
}
/** A {@link DefaultValueFactory} that returns the current system time. */
public static class DefaultToCurrentSystemTime implements DefaultValueFactory<Long> {
@Override
public Long create(PipelineOptions options) {
return System.currentTimeMillis();
}
}
/** A {@link DefaultValueFactory} that returns the minimum timestamp plus one hour. */
public static class DefaultToMinTimestampPlusOneHour implements DefaultValueFactory<Long> {
@Override
public Long create(PipelineOptions options) {
return options.as(Options.class).getMinTimestampMillis()
+ Duration.standardHours(1).getMillis();
}
}
/**
* Options for {@link WindowedWordCount}.
*
* <p>Inherits standard example configuration options, which allow specification of the
* runner, as well as the {@link WordCount.WordCountOptions} support for
* specification of the input and output files.
*/
public interface Options extends WordCount.WordCountOptions,
ExampleOptions, ExampleBigQueryTableOptions {
@Description("Fixed window duration, in minutes")
@Default.Integer(WINDOW_SIZE)
Integer getWindowSize();
void setWindowSize(Integer value);
@Description("Minimum randomly assigned timestamp, in milliseconds-since-epoch")
@Default.InstanceFactory(DefaultToCurrentSystemTime.class)
Long getMinTimestampMillis();
void setMinTimestampMillis(Long value);
@Description("Maximum randomly assigned timestamp, in milliseconds-since-epoch")
@Default.InstanceFactory(DefaultToMinTimestampPlusOneHour.class)
Long getMaxTimestampMillis();
void setMaxTimestampMillis(Long value);
@Description("Fixed number of shards to produce per window, or null for runner-chosen sharding")
Integer getNumShards();
void setNumShards(Integer numShards);
}
public static void main(String[] args) throws IOException {
String[] args1 =new String[]{ "--hdfsConfiguration=[{\"fs.defaultFS\" : \"hdfs://192.168.1.214:9000\"}]","--runner=SparkRunner"};
Options options = PipelineOptionsFactory.fromArgs(args1).withValidation().as(Options.class);
final String output = options.getOutput();
final Instant minTimestamp = new Instant(options.getMinTimestampMillis());
final Instant maxTimestamp = new Instant(options.getMaxTimestampMillis());
Pipeline pipeline = Pipeline.create(options);
/**
* Concept #1: the Beam SDK lets us run the same pipeline with either a bounded or
* unbounded input source.
*/
PCollection<String> input = pipeline
/** Read from the GCS file. */
.apply(TextIO.read().from(options.getInputFile()))
// Concept #2: Add an element timestamp, using an artificial time just to show windowing.
// See AddTimestampFn for more detail on this.
.apply(ParDo.of(new AddTimestampFn(minTimestamp, maxTimestamp)));
/**
* Concept #3: Window into fixed windows. The fixed window size for this example defaults to 1
* minute (you can change this with a command-line option). See the documentation for more
* information on how fixed windows work, and for information on the other types of windowing
* available (e.g., sliding windows).
*/
PCollection<String> windowedWords =
input.apply(
Window.<String>into(
FixedWindows.of(Duration.standardMinutes(options.getWindowSize()))));
/**
* Concept #4: Re-use our existing CountWords transform that does not have knowledge of
* windows over a PCollection containing windowed values.
*/
PCollection<KV<String, Long>> wordCounts = windowedWords.apply(new WordCount.CountWords());
/**
* Concept #5: Format the results and write to a sharded file partitioned by window, using a
* simple ParDo operation. Because there may be failures followed by retries, the
* writes must be idempotent, but the details of writing to files is elided here.
*/
wordCounts
.apply(MapElements.via(new WordCount.FormatAsTextFn()))
.apply(new WriteOneFilePerWindow(output, options.getNumShards()));
PipelineResult result = pipeline.run();
try {
result.waitUntilFinish();
} catch (Exception exc) {
result.cancel();
}
}
}
I am trying sample code to save data to HBase from a Spark DataFrame.
I am not sure where I went wrong, but the code is not working for me.
Below is the code that I tried. I am able to get the RDD for an existing table, but could not save it. I tried a couple of ways, which I have mentioned.
Code:
import scala.reflect.runtime.universe
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode

case class Person(id: String, name: String)

object PheonixTest extends App {
  val conf = new SparkConf
  conf.setMaster("local")
  conf.setAppName("test")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  val hbaseConf = HBaseConfiguration.create()
  hbaseConf.set(TableInputFormat.INPUT_TABLE, "table1")
  hbaseConf.addResource(new Path("/Users/srini/softwares/hbase-1.1.2/conf/hbase-site.xml"))

  import org.apache.phoenix.spark._
  val phDf = sqlContext.phoenixTableAsDataFrame("table1", Array("id", "name"), conf = hbaseConf)
  println("===========>>>>>>>>>>>>>>>>>> " + phDf.show())

  val rdd = sc.parallelize(Seq("sr,Srini", "sr2,Srini2"))

  import sqlContext.implicits._
  val df = rdd.map { x => val array = x.split(","); Person(array(0), array(1)) }.toDF

  //df.write.format("org.apache.phoenix.spark").mode("overwrite").option("table", "table1").option("zkUrl", "localhost:2181").save()
  //df.rdd.saveToP
  df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "table1", "zkUrl" -> "localhost:2181"))

  sc.stop()
}
Pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.srini.plug</groupId>
<artifactId>data-ingestion</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.4</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-xml</artifactId>
<version>2.4.4</version>
</dependency>
<dependency>
<groupId>com.splunk</groupId>
<artifactId>splunk</artifactId>
<version>1.5.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.scalaj</groupId>
<artifactId>scalaj-collection_2.10</artifactId>
<version>1.5</version>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>12.0</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-spark</artifactId>
<version>4.6.0-HBase-1.1</version>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.4.1</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>ext-release-local</id>
<url>http://splunk.artifactoryonline.com/splunk/ext-releases-local</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.scala-tools</groupId>
<artifactId>maven-scala-plugin</artifactId>
<executions>
<execution>
<id>compile</id>
<goals>
<goal>compile</goal>
</goals>
<phase>compile</phase>
</execution>
<execution>
<id>test-compile</id>
<goals>
<goal>testCompile</goal>
</goals>
<phase>test-compile</phase>
</execution>
<execution>
<phase>process-resources</phase>
<goals>
<goal>compile</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.5</source>
<target>1.5</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.5.3</version>
<executions>
<execution>
<id>create-archive</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<descriptorRefs>
<descriptorRef>
jar-with-dependencies
</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.srini.ingest.SplunkSearch</mainClass>
</manifest>
</archive>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Error:
16/01/02 18:26:29 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x152031ff8da001c, negotiated timeout = 90000
16/01/02 18:27:18 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48344 ms ago, cancelled=false, msg=
16/01/02 18:27:38 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68454 ms ago, cancelled=false, msg=
16/01/02 18:27:58 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88633 ms ago, cancelled=false, msg=
16/01/02 18:28:19 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108817 ms ago, cancelled=false, msg=
Two issues I notice:
ZK URL: if you are sure ZooKeeper is running locally, update your hosts file with an entry like the one below and pass the hostname to HBaseConfiguration.
ipaddress hostname
Phoenix by default upper-cases your table name and columns, so change the above code to:
val phDf = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID", "NAME"), conf = hbaseConf)
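With both fixes applied, the commented-out DataFrame write from the question would look something like this (a sketch; hbasehost stands for whatever hostname your hosts-file entry maps to):
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "TABLE1")         // Phoenix upper-cases unquoted identifiers
  .option("zkUrl", "hbasehost:2181") // hostname from the hosts-file entry
  .save()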
So I'm having trouble with log4j and hbm2ddl.
When I put an SMTPAppender in my log4j.xml I get the ClassNotFoundException below.
Any hints on how to solve this?
These are my config files and the stack trace:
stack trace:
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Unnamed - Mail-logging-and-hbm2ddl:Mail-logging-and-hbm2ddl:jar:1.0
[INFO] task-segment: [package]
[INFO] ------------------------------------------------------------------------
[INFO] [resources:resources {execution: default-resources}]
[WARNING] Using platform encoding (windows-1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 3 resources
[INFO] Copying 2 resources
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Nothing to compile - all classes are up to date
[INFO] Preparing hibernate3:hbm2ddl
[WARNING] Removing: hbm2ddl from forked lifecycle, to prevent recursive invocation.
[INFO] [resources:resources {execution: default-resources}]
[WARNING] Using platform encoding (windows-1252 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 3 resources
[INFO] Copying 2 resources
[INFO] [hibernate3:hbm2ddl {execution: default}]
[INFO] Configuration XML file loaded: file:/D:/DEV/PROJECTS/Mail%20logging%20and%20hbm2ddl/src/main/resources/hibernate.cfg.xml
[FATAL ERROR] org.codehaus.mojo.hibernate3.exporter.Hbm2DDLExporterMojo#execute() caused a linkage error (java.lang.NoClassDefFoundError) and may be out-of-date. Check the realms:
[FATAL ERROR] Plugin realm = app0.child-container[org.codehaus.mojo:hibernate3-maven-plugin:2.2]
urls[0] = file:/d:/Settings/U190552/.m2/repository/org/codehaus/mojo/hibernate3-maven-plugin/2.2/hibernate3-maven-plugin-2.2.jar
urls[1] = file:/d:/Settings/U190552/.m2/repository/log4j/log4j/1.2.14/log4j-1.2.14.jar
urls[2] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/hibernate-tools/3.2.3.GA/hibernate-tools-3.2.3.GA.jar
urls[3] = file:/d:/Settings/U190552/.m2/repository/org/beanshell/bsh/2.0b4/bsh-2.0b4.jar
urls[4] = file:/d:/Settings/U190552/.m2/repository/freemarker/freemarker/2.3.8/freemarker-2.3.8.jar
urls[5] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/jtidy/r8-20060801/jtidy-r8-20060801.jar
urls[6] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/hibernate-core/3.3.1.GA/hibernate-core-3.3.1.GA.jar
urls[7] = file:/d:/Settings/U190552/.m2/repository/antlr/antlr/2.7.6/antlr-2.7.6.jar
urls[8] = file:/d:/Settings/U190552/.m2/repository/commons-collections/commons-collections/3.1/commons-collections-3.1.jar
urls[9] = file:/d:/Settings/U190552/.m2/repository/dom4j/dom4j/1.6.1/dom4j-1.6.1.jar
urls[10] = file:/d:/Settings/U190552/.m2/repository/xml-apis/xml-apis/1.0.b2/xml-apis-1.0.b2.jar
urls[11] = file:/d:/Settings/U190552/.m2/repository/org/slf4j/slf4j-api/1.5.6/slf4j-api-1.5.6.jar
urls[12] = file:/d:/Settings/U190552/.m2/repository/org/codehaus/mojo/hibernate3/maven-hibernate3-api/2.2/maven-hibernate3-api-2.2.jar
urls[13] = file:/d:/Settings/U190552/.m2/repository/org/codehaus/plexus/plexus-utils/1.1/plexus-utils-1.1.jar
urls[14] = file:/d:/Settings/U190552/.m2/repository/org/apache/geronimo/specs/geronimo-jta_1.0.1B_spec/1.1.1/geronimo-jta_1.0.1B_spec-1.1.1.jar
urls[15] = file:/d:/Settings/U190552/.m2/repository/org/slf4j/slf4j-log4j12/1.5.6/slf4j-log4j12-1.5.6.jar
urls[16] = file:/d:/Settings/U190552/.m2/repository/org/slf4j/jcl-over-slf4j/1.5.6/jcl-over-slf4j-1.5.6.jar
urls[17] = file:/d:/Settings/U190552/.m2/repository/org/codehaus/mojo/hibernate3/maven-hibernate3-jdk14/2.2/maven-hibernate3-jdk14-2.2.jar
urls[18] = file:/d:/Settings/U190552/.m2/repository/org/codehaus/mojo/hibernate3/maven-hibernate3-jdk15/2.2/maven-hibernate3-jdk15-2.2.jar
urls[19] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/hibernate-entitymanager/3.4.0.GA/hibernate-entitymanager-3.4.0.GA.jar
urls[20] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/ejb3-persistence/1.0.2.GA/ejb3-persistence-1.0.2.GA.jar
urls[21] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/hibernate-commons-annotations/3.1.0.GA/hibernate-commons-annotations-3.1.0.GA.jar
urls[22] = file:/d:/Settings/U190552/.m2/repository/org/hibernate/hibernate-annotations/3.4.0.GA/hibernate-annotations-3.4.0.GA.jar
urls[23] = file:/d:/Settings/U190552/.m2/repository/javax/transaction/jta/1.1/jta-1.1.jar
urls[24] = file:/d:/Settings/U190552/.m2/repository/javassist/javassist/3.4.GA/javassist-3.4.GA.jar
urls[25] = file:/d:/Settings/U190552/.m2/repository/jboss/jboss-common/4.0.2/jboss-common-4.0.2.jar
urls[26] = file:/d:/Settings/U190552/.m2/repository/slide/webdavlib/2.0/webdavlib-2.0.jar
urls[27] = file:/d:/Settings/U190552/.m2/repository/xerces/xercesImpl/2.6.2/xercesImpl-2.6.2.jar
[FATAL ERROR] Container realm = plexus.core
urls[0] = file:/D:/DEV/TOOLS/apache-maven-2.2.1/lib/maven-2.2.1-uber.jar
[INFO] ------------------------------------------------------------------------
[ERROR] FATAL ERROR
[INFO] ------------------------------------------------------------------------
[INFO] javax/mail/internet/AddressException
javax.mail.internet.AddressException
[INFO] ------------------------------------------------------------------------
[INFO] Trace
java.lang.NoClassDefFoundError: javax/mail/internet/AddressException
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
at java.lang.Class.getConstructor0(Class.java:2699)
at java.lang.Class.newInstance0(Class.java:326)
at java.lang.Class.newInstance(Class.java:308)
at org.apache.log4j.xml.DOMConfigurator.parseAppender(DOMConfigurator.java:174)
at org.apache.log4j.xml.DOMConfigurator.findAppenderByName(DOMConfigurator.java:150)
at org.apache.log4j.xml.DOMConfigurator.findAppenderByReference(DOMConfigurator.java:163)
at org.apache.log4j.xml.DOMConfigurator.parseChildrenOfLoggerElement(DOMConfigurator.java:425)
at org.apache.log4j.xml.DOMConfigurator.parseRoot(DOMConfigurator.java:394)
at org.apache.log4j.xml.DOMConfigurator.parse(DOMConfigurator.java:829)
at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:712)
at org.apache.log4j.xml.DOMConfigurator.doConfigure(DOMConfigurator.java:618)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:470)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:122)
at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:73)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:209)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:221)
at org.hibernate.cfg.Configuration.<clinit>(Configuration.java:151)
at org.codehaus.mojo.hibernate3.configuration.AnnotationComponentConfiguration.createConfiguration(AnnotationComponentConfiguration.java:93)
at org.codehaus.mojo.hibernate3.configuration.AbstractComponentConfiguration.getConfiguration(AbstractComponentConfiguration.java:51)
at org.codehaus.mojo.hibernate3.exporter.Hbm2DDLExporterMojo.doExecute(Hbm2DDLExporterMojo.java:87)
at org.codehaus.mojo.hibernate3.HibernateExporterMojo.execute(HibernateExporterMojo.java:152)
at org.apache.maven.plugin.DefaultPluginManager.executeMojo(DefaultPluginManager.java:490)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoals(DefaultLifecycleExecutor.java:694)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalWithLifecycle(DefaultLifecycleExecutor.java:556)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoal(DefaultLifecycleExecutor.java:535)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeGoalAndHandleFailures(DefaultLifecycleExecutor.java:387)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.executeTaskSegments(DefaultLifecycleExecutor.java:348)
at org.apache.maven.lifecycle.DefaultLifecycleExecutor.execute(DefaultLifecycleExecutor.java:180)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:328)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:138)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:362)
at org.apache.maven.cli.compat.CompatibleMain.main(CompatibleMain.java:60)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.codehaus.classworlds.Launcher.launchEnhanced(Launcher.java:315)
at org.codehaus.classworlds.Launcher.launch(Launcher.java:255)
at org.codehaus.classworlds.Launcher.mainWithExitCode(Launcher.java:430)
at org.codehaus.classworlds.Launcher.main(Launcher.java:375)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Caused by: java.lang.ClassNotFoundException: javax.mail.internet.AddressException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at org.codehaus.classworlds.RealmClassLoader.loadClassDirect(RealmClassLoader.java:195)
at org.codehaus.classworlds.DefaultClassRealm.loadClass(DefaultClassRealm.java:255)
at org.codehaus.classworlds.DefaultClassRealm.loadClass(DefaultClassRealm.java:274)
at org.codehaus.classworlds.RealmClassLoader.loadClass(RealmClassLoader.java:214)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
... 47 more
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 2 seconds
[INFO] Finished at: Fri Dec 31 11:42:20 CET 2010
[INFO] Final Memory: 10M/24M
[INFO] ------------------------------------------------------------------------
log4j.xml
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<appender name="email" class="org.apache.log4j.net.SMTPAppender">
<param name="Threshold" value="error" />
<param name="BufferSize" value="10" />
<param name="SMTPHost" value="smtp.host" />
<param name="From" value="site#domain.com" />
<param name="To" value="CDB#mail" />
<param name="Subject" value="[Site] Error - TST" />
<param name="LocationInfo" value="false" />
<layout class="org.apache.log4j.HTMLLayout">
<param name="LocationInfo" value="false" />
</layout>
</appender>
<root>
<priority value="DEBUG" />
<appender-ref ref="email" />
</root>
</log4j:configuration>
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>Mail-logging-and-hbm2ddl</groupId>
<artifactId>Mail-logging-and-hbm2ddl</artifactId>
<version>1.0</version>
<dependencies>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate</artifactId>
<version>3.2.6.ga</version>
<exclusions>
<!-- We need a higher version of ehcache -->
<exclusion>
<groupId>net.sf.ehcache</groupId>
<artifactId>ehcache</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-beans</artifactId>
<version>2.5.4</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-aop</artifactId>
<version>2.5.4</version>
</dependency>
</dependencies>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>false</filtering>
</resource>
<resource>
<directory>src/main/resources-${targetprofile}</directory>
<filtering>false</filtering>
</resource>
</resources>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.0.2</version>
<configuration>
<source>${javaVersion}</source>
<target>${javaVersion}</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>hibernate3-maven-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>process-classes</phase>
<goals>
<goal>hbm2ddl</goal>
</goals>
</execution>
</executions>
<configuration>
<componentProperties>
<propertyfile>
src/main/resources-${targetprofile}/configuration.properties
</propertyfile>
<export>false</export>
<drop>true</drop>
<outputfilename>
${project.artifactId}-${project.version}-schema.sql
</outputfilename>
</componentProperties>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<javaVersion>1.6</javaVersion>
</properties>
</project>
This other question gave me the answer.
So the answer is to add the javax.mail dependency to the plugin itself, as follows:
<build>
....
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>hibernate3-maven-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>process-classes</phase>
<goals>
<goal>hbm2ddl</goal>
</goals>
</execution>
</executions>
<configuration>
<componentProperties>
<propertyfile>
src/main/resources-${targetprofile}/configuration.properties
</propertyfile>
<export>false</export>
<drop>true</drop>
<outputfilename>
${project.artifactId}-${project.version}-schema.sql
</outputfilename>
</componentProperties>
</configuration>
<dependencies>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4.3</version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>
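Dependencies declared inside the <plugin> element are added to the plugin's own classloader realm rather than to the project classpath, which is exactly where the realm listing in the error output shows javax.mail to be missing. Re-running the build with mvn -X should then show the mail artifact being resolved for the plugin.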
From the error message as well as the debug log, it appears that the javax.mail dependency is not on the classpath when the hbm2ddl goal runs. The stack trace shows log4j's DOMConfigurator failing while instantiating the SMTPAppender class, whose constructors reference javax.mail.internet.AddressException; and because a Maven plugin executes in its own classloader realm, the realm listing in the error output (which contains no javax.mail jar) is the classpath that matters here. Since the contents above are poorly formatted and possibly incomplete, it is difficult to say exactly why: one possibility is that the javax.mail dependency is not included in that realm at all; another is that it is included but with an incorrect (say, runtime) scope.
You could try running the goal after removing the SMTPAppender from log4j.xml to see whether it works; that will help narrow down the problem.
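For that experiment, a minimal log4j.xml that swaps the email appender for a console appender might look like the sketch below (all classes and parameters here are standard log4j 1.x; the conversion pattern is just an illustrative choice):
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
<!-- console appender, so no javax.mail classes are needed at configuration time -->
<appender name="console" class="org.apache.log4j.ConsoleAppender">
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{ISO8601} %-5p %c - %m%n" />
</layout>
</appender>
<root>
<priority value="DEBUG" />
<appender-ref ref="console" />
</root>
</log4j:configuration>
If the hbm2ddl goal succeeds with this configuration, the SMTPAppender (and hence the missing javax.mail classes) is confirmed as the cause.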