The DataStax Java driver 4.5 has a lot of dependencies.
Is TinkerPop required to use the DataStax Java driver to connect to a Cassandra database?
The TinkerPop dependency is required only when you're working with DataStax Graph. As the documentation states, you can exclude it:
The driver has a non-optional dependency on that library, but if your application does not use graph at all, it is possible to exclude it to minimize the number of runtime dependencies (see the Integration>Driver dependencies section for more details).
and the linked documentation shows the driver declaration as:
<dependency>
  <groupId>com.datastax.oss</groupId>
  <artifactId>java-driver-core</artifactId>
  <version>${driver.version}</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.tinkerpop</groupId>
      <artifactId>gremlin-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.apache.tinkerpop</groupId>
      <artifactId>tinkergraph-gremlin</artifactId>
    </exclusion>
  </exclusions>
</dependency>
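With those exclusions in place, plain CQL access works unchanged. As a minimal sketch (assuming a node on 127.0.0.1:9042 and the default data center name "datacenter1"; adjust both for your cluster), nothing below touches the graph/fluent API, so TinkerPop is never needed at runtime:

import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;

public class PlainCqlExample {
    public static void main(String[] args) {
        // Nothing here uses the graph (fluent traversal) API, so the
        // Tinkerpop exclusions above do not affect this code path.
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("127.0.0.1", 9042)) // assumption: local node
                .withLocalDatacenter("datacenter1")                        // assumption: default DC name
                .build()) {
            ResultSet rs = session.execute("SELECT release_version FROM system.local");
            System.out.println(rs.one().getString("release_version"));
        }
    }
}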
I am new to Hazelcast and want to use a replicated map to share data between two microservices. I am using spring-boot-starter-cache with the following two modules. Is it open source/free?
<!-- Core hazelcast module -->
<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast</artifactId>
</dependency>
<!-- hazelcast-spring -->
<dependency>
  <groupId>com.hazelcast</groupId>
  <artifactId>hazelcast-spring</artifactId>
</dependency>
Please help
Short answer: Yes, it is.
Long answer: It's Apache 2.0 licensed, as the LICENSE file on GitHub clearly states.
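As for the replicated-map part of the question, here is a minimal sketch using the plain Hazelcast 3.x API (the map and key names are made up; with spring-boot-starter you would normally have the HazelcastInstance auto-configured and injected rather than created by hand):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;

public class ReplicatedMapExample {
    public static void main(String[] args) {
        // Each microservice runs a Hazelcast member; members that discover
        // each other form one cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A replicated map keeps a full copy of its entries on every member,
        // so each service reads its own local copy.
        ReplicatedMap<String, String> shared = hz.getReplicatedMap("shared-config");
        shared.put("feature.flag", "on");
        System.out.println(shared.get("feature.flag"));

        hz.shutdown();
    }
}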
We have a CDAP application that connects to a Phoenix table from Spark using the Phoenix driver. I have Phoenix version 4.7 in our environment. As per the standard Spark 2 Phoenix connectivity, it requires only phoenix-spark2 as a dependency, and all other dependencies are picked up from the classpath and the hbase-site.xml properties.
Now, what dependencies are required by a CDAP Spark Phoenix application, and how can I use hbase-site.xml with the CDAP application to make a successful connection?
This is an answer for plain Spark rather than CDAP, but it may help if someone lands here.
I currently use Phoenix version 4.7 and Spark version 2.3 in production, and I have the following dependencies related to Phoenix in my pom.xml:
<phoenix-version>4.7</phoenix-version>
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-spark2</artifactId>
  <version>4.7.0.2.6.5.3007-3</version>
  <exclusions>
    <exclusion>
      <groupId>sqlline</groupId>
      <artifactId>sqlline</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.phoenix</groupId>
  <artifactId>phoenix-client</artifactId>
  <version>4.14.1-HBase-1.1</version>
</dependency>
Also, say for example I want to retrieve a table from Phoenix into a Spark DataFrame; I would use the following Spark code:
val sqlContext = spark.sqlContext
val table = sqlContext.load("org.apache.phoenix.spark",
  Map("table" -> s"NAMESPACE.TABLE_NAME",
      "zkUrl" -> zookeeperUrl))
Let me know if this doesn't work out
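For what it's worth, the same read (plus a write back) can also be expressed with the Spark 2.x DataFrame API. This is only a sketch, with placeholder table names and ZooKeeper URL, and it assumes the phoenix-spark data source from the dependencies above is on the classpath:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class PhoenixReadWriteExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("phoenix-example").getOrCreate();
        String zookeeperUrl = "zk-host:2181"; // placeholder: your ZooKeeper quorum

        // Read a Phoenix table through the phoenix-spark data source
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "NAMESPACE.TABLE_NAME")
                .option("zkUrl", zookeeperUrl)
                .load();

        // Upsert the rows into another Phoenix table; the connector expects
        // SaveMode.Overwrite for this
        df.write()
                .format("org.apache.phoenix.spark")
                .mode(SaveMode.Overwrite)
                .option("table", "NAMESPACE.OTHER_TABLE")
                .option("zkUrl", zookeeperUrl)
                .save();

        spark.stop();
    }
}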
I have read that Google Cloud Dataflow pipelines, which are based on the Apache Beam SDK, can be run with Spark or Flink.
I have some Dataflow pipelines currently running on GCP using the default Cloud Dataflow runner, and I want to run them using the Spark runner, but I don't know how.
Is there any documentation or guide about how to do this? Any pointers will help.
Thanks.
I'll assume you're using Java, but the equivalent process applies with Python.
You need to migrate your pipeline to use the Apache Beam SDK, replacing your Google Dataflow SDK dependency with:
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>2.4.0</version>
</dependency>
Then add the dependency for the runner you wish to use:
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>2.4.0</version>
</dependency>
Then pass --runner=SparkRunner when submitting the pipeline to specify that this runner should be used.
See https://beam.apache.org/documentation/runners/capability-matrix/ for the full list of runners and comparison of their capabilities.
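As a rough sketch of what the submitted program looks like once it is on the Beam SDK (the class name here is illustrative, and your existing transforms stay as they are):

import org.apache.beam.runners.spark.SparkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunOnSpark {
    public static void main(String[] args) {
        // Picks up --runner=SparkRunner (and Spark-specific flags such as
        // --sparkMaster=local[4]) from the command line.
        SparkPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(SparkPipelineOptions.class);

        Pipeline pipeline = Pipeline.create(options);
        // ... apply your existing transforms here, unchanged ...
        pipeline.run().waitUntilFinish();
    }
}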
Thanks to multiple tutorials and documentation scattered all over the web, I was finally able to get a coherent idea of how to use the Spark runner with any Beam-SDK-based pipeline.
I have documented the entire process here for future reference: http://opreview.blogspot.com/2018/07/running-apache-beam-pipeline-using.html.
I am using a Redis client in a Spark job and getting an exception:
java.lang.NoSuchMethodError: io.netty.bootstrap.Bootstrap.config()Lio/netty/bootstrap/BootstrapConfig;
at org.redisson.client.RedisClient$1$1.operationComplete(RedisClient.java:234)
It's due to a Netty version mismatch.
Spark uses Netty version netty-buffer-4.0.23, but the Redis client needs 4.1. Is it possible to override the Netty jar in the spark-submit command for both the driver and the executors?
It depends on how you are assembling your project.
Basically, we create a fat jar containing all dependencies with the maven-shade-plugin or maven-assembly-plugin. To avoid this issue, you can specify a relocation in the shade plugin configuration. It looks something like this:
<relocations>
  <relocation>
    <pattern>io.netty</pattern>
    <shadedPattern>your.prefix.io.netty</shadedPattern>
  </relocation>
</relocations>
Can I read messages from Kafka without Spark Streaming? I mean, only with the Spark Core library, for batch processing purposes.
If yes, can you please show some examples of how to do it? I am using HDP 2.4, Kafka 0.9 and Spark 1.6.
There is a class called KafkaUtils in the Spark Streaming Kafka API:
https://github.com/apache/spark/blob/master/external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
From this class you can use the createRDD method, which basically expects explicit offsets and is meant for exactly this kind of non-streaming use.
Dependency jar:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
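A minimal sketch of such a batch read with createRDD (the broker address, topic name and offset range are placeholders; in practice you would supply offsets you track yourself):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.kafka.OffsetRange;
import scala.Tuple2;

public class KafkaBatchRead {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("kafka-batch-read"));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:6667"); // placeholder broker list

        // You choose which offsets to read; no StreamingContext is involved.
        OffsetRange[] ranges = { OffsetRange.create("my-topic", 0, 0L, 100L) };

        JavaPairRDD<String, String> rdd = KafkaUtils.createRDD(
                sc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, ranges);

        List<Tuple2<String, String>> sample = rdd.take(10);
        for (Tuple2<String, String> kv : sample) {
            System.out.println(kv._2());
        }
        sc.stop();
    }
}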
Also, check out Kafka Connect. For example, if you want to read Kafka topic data and populate it into HDFS, it's very simple using Kafka Connect.
http://docs.confluent.io/3.0.0/connect/
http://www.confluent.io/product/connectors/