Build failed for DataStax spark-cassandra-connector - apache-spark

I was trying to build the spark-cassandra connector by following this guide:
http://www.planetcassandra.org/blog/kindling-an-introduction-to-spark-with-cassandra/
The guide asks you to clone the connector from GitHub and build it with sbt. But when I run the command ./sbt/sbt assembly, it fails with the following errors:
Launching sbt from sbt/sbt-launch-0.13.8.jar
[info] Loading project definition from /home/naresh/Desktop/spark-cassandra-connector/project
Using releases: https://oss.sonatype.org/service/local/staging/deploy/maven2 for releases
Using snapshots: https://oss.sonatype.org/content/repositories/snapshots for snapshots
Scala: 2.10.5 [To build against Scala 2.11 use '-Dscala-2.11=true']
Scala Binary: 2.10
Java: target=1.7 user=1.7.0_79
[info] Set current project to root (in build file:/home/naresh/Desktop/spark-cassandra-connector/)
[warn] Credentials file /home/hduser/.ivy2/.credentials does not exist
[info] Compiling 140 Scala sources and 1 Java source to /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/classes...
[error] /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraCatalog.scala:48: not found: value processTableIdentifier
[error] val id = processTableIdentifier(tableIdentifier).reverse.lift
[error] ^
[error] /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraCatalog.scala:134: value toSeq is not a member of org.apache.spark.sql.catalyst.TableIdentifier
[error] cachedDataSourceTables.refresh(tableIdent.toSeq)
[error] ^
[error] /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector/src/main/scala/org/apache/spark/sql/cassandra/CassandraSQLContext.scala:94: not found: value BroadcastNestedLoopJoin
[error] BroadcastNestedLoopJoin
[error] ^
[error] three errors found
[info] Compiling 11 Scala sources to /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector-embedded/target/scala-2.10/classes...
[warn] /home/naresh/Desktop/spark-cassandra-connector/spark-cassandra-connector-embedded/src/main/scala/com/datastax/spark/connector/embedded/SparkTemplate.scala:69: value actorSystem in class SparkEnv is deprecated: Actor system is no longer supported as of 1.4.0
[warn] def actorSystem: ActorSystem = SparkEnv.get.actorSystem
[warn] ^
[warn] one warning found
[error] (spark-cassandra-connector/compile:compileIncremental) Compilation failed
[error] Total time: 27 s, completed 4 Nov, 2015 12:34:33 PM

This works for me: run
mvn -DskipTests clean package
You can find the Spark build command in the README.md file in your Spark directory. Before running that command, you'll need to configure Maven to use more memory than usual by setting MAVEN_OPTS:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
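Putting the two steps together, a minimal sketch, run from the top of the Spark source directory (-XX:MaxPermSize only applies on Java 7, which is what your build log shows):
# give Maven enough headroom first, then build, skipping tests
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -DskipTests clean package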

Related

Failure to run Liquibase checksum operations - server time zone - with JHipster project

I want to run basic Liquibase operations with Maven in a JHipster project. I'm using nvm 10.16.1 and JHipster 6.2.0, and I get an error.
I checked whether the recommended URL for the server-time error was in my application-dev.yml, and yes, it is:
url: jdbc:mysql://localhost:3306/project?useUnicode=true&characterEncoding=utf8&useSSL=false&useLegacyDatetimeCode=false&serverTimezone=UTC
Here is the error message:
[INFO] Starting Liquibase at lun., 12 août 2019 10:54:40 CEST (version 3.6.3 built at 2019-01-29 11:34:48)
[INFO] Settings
----------------------------
[INFO] driver: com.mysql.cj.jdbc.Driver
[INFO] url: jdbc:mysql://localhost:3306/project
[INFO] username: root
[INFO] password: *****
[INFO] use empty password: false
[INFO] properties file: null
[INFO] properties file will override? false
[INFO] prompt on non-local database? true
[INFO] clear checksums? false
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 4.394 s
[INFO] Finished at: 2019-08-12T10:54:41+02:00
[INFO] Final Memory: 31M/353M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.liquibase:liquibase-maven-plugin:3.6.3:clearCheckSums (default-cli) on project project: Error setting up or running Liquibase: liquibase.exception.DatabaseException: java.sql.SQLException: The server time zone value 'Paris, Madrid (heure d'été)' is unrecognized or represents more than one time zone. You must configure either the server or JDBC driver (via the serverTimezone configuration property) to use a more specifc time zone value if you want to utilize time zone support. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Process finished with exit code 1
TRY 2: put the URL in pom.xml
<groupId>org.liquibase</groupId>
<artifactId>liquibase-maven-plugin</artifactId>
<version>${liquibase.version}</version>
<configuration>
<changeLogFile>${project.basedir}/src/main/resources/config/liquibase/master.xml</changeLogFile>
<diffChangeLogFile>${project.basedir}/src/main/resources/config/liquibase/changelog/${maven.build.timestamp}_changelog.xml</diffChangeLogFile>
<driver>com.mysql.cj.jdbc.Driver</driver>
<url>jdbc:mysql://localhost:3306/project?useUnicode=true&characterEncoding=utf8&useSSL=false&useLegacyDatetimeCode=false&serverTimezone=UTC</url>
<defaultSchemaName>project</defaultSchemaName>
<username>root</username>
<password>*******</password>
<referenceUrl>hibernate:spring:fr.project.domain?dialect=org.hibernate.dialect.MySQL5InnoDBDialect&hibernate.physical_naming_strategy=org.springframework.boot.orm.jpa.hibernate.SpringPhysicalNamingStrategy&hibernate.implicit_naming_strategy=org.springframework.boot.orm.jpa.hibernate.SpringImplicitNamingStrategy</referenceUrl>
<verbose>true</verbose>
<logging>debug</logging>
<contexts>!test</contexts>
</configuration>
The new error is:
[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[FATAL] Non-parseable POM C:\Users\clari\Documents\project\pom.xml: entity reference name can not contain character =' (position: START_TAG seen ...mysql://localhost:3306/project?useUnicode=true&characterEncoding=... @561:101) @ line 561, column 101
[ERROR] The build could not read 1 project -> [Help 1]
[ERROR]
[ERROR] The project
(C:\Users\clari\Documents\project\pom.xml) has 1 error
[ERROR] Non-parseable POM
C:\Users\clari\Documents\project\pom.xml: entity reference name can not contain character =' (position: START_TAG seen ...mysql://localhost:3306/project?useUnicode=true&characterEncoding=... @561:101) @ line 561, column 101 -> [Help 2]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException
[ERROR] [Help 2]
http://cwiki.apache.org/confluence/display/MAVEN/ModelParseException
Process finished with exit code 1
Could you please help me?
Thanks :)
Your trace shows that you are using liquibase-maven-plugin to execute the clearCheckSums goal. The JDBC URL in application-dev.yml is read only by your Java application, not by the Maven Liquibase plugin.
You must also configure it in pom.xml, in the url property of liquibase-maven-plugin.
Warning: the & characters in the <url> must be encoded as the XML entity &amp;:
<url>jdbc:mysql://localhost:3306/project?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;useLegacyDatetimeCode=false&amp;serverTimezone=UTC</url>
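(The same applies to the & separators in your <referenceUrl>.) Once the URL is fixed, re-running the same goal should get past the time zone error; assuming you invoke it through the Maven wrapper that JHipster generates:
./mvnw liquibase:clearCheckSums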

spark jobserver failing to build with Spark 2.0

I am trying to run spark-jobserver with Spark 2.0.
I cloned the spark-2.0-preview branch from the GitHub repository. I followed the deployment guide, but when I try to deploy the server using bin/server_deploy.sh, I get a compilation error:
Error:
[error] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JHiveTestLoaderJob.java:4: cannot find symbol
[error] symbol: class DataFrame
[error] location: package org.apache.spark.sql
[error] import org.apache.spark.sql.DataFrame;
[error] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JHiveTestJob.java:13: java.lang.Object cannot be converted to org.apache.spark.sql.Row[]
[error] return sc.sql(data.getString("sql")).collect();
[error] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JHiveTestLoaderJob.java:25: cannot find symbol
[error] symbol: class DataFrame
[error] location: class spark.jobserver.JHiveTestLoaderJob
[error] final DataFrame addrRdd = sc.sql("SELECT * FROM default.test_addresses");
[error] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JSqlTestJob.java:13: array required, but java.lang.Object found
[error] Row row = sc.sql("select 1+1").take(1)[0];
[info] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JHiveTestJob.java: Some input files use or override a deprecated API.
[info] /spark-jobserver/job-server-extras/src/main/java/spark/jobserver/JHiveTestJob.java: Recompile with -Xlint:deprecation for details.
[error] (job-server-extras/compile:compileIncremental) javac returned nonzero exit code
Did I forget to add some dependencies?
I had a similar issue. I found that it is a bug caused by changes in the Spark API from 1.x to 2.x. You can find the open issue on GitHub: https://github.com/spark-jobserver/spark-jobserver/issues/760
I introduced a quick fix which solved the issue for me, and I can now deploy the jobserver. I submitted a pull request for it: https://github.com/spark-jobserver/spark-jobserver/pull/762
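If you want to try that fix before it is merged, you can check the pull request out directly using GitHub's standard pull/<id>/head refs (a sketch; the local branch name pr-762 is arbitrary):
# inside your spark-jobserver clone
git fetch origin pull/762/head:pr-762
git checkout pr-762
# then re-run the deployment as described in the deployment guide
bin/server_deploy.sh <environment>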

Scala syntax errors when building Spark

I cloned a fresh copy of the branch-2.0 branch of Spark from GitHub onto a CentOS 7 system. When executing the suggested command to build from source,
./dev/make-distribution.sh --name custom-spark --tgz -Psparkr -Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn
I get the following errors:
[INFO] Total time: 5.368 s (Wall Clock)
[INFO] Finished at: 2016-08-18T11:56:49-05:00
[INFO] Final Memory: 71M/1963M
[INFO] ------------------------------------------------------------------------
[error] /home/rprechelt/ada/spark/common/tags/src/main/scala/org/apache/spark/annotation/Since.scala:30: error writing class Since: /home/rprechelt/ada/spark/common/tags/target/scala-2.11/classes/org/apache/spark/annotation/Since.class: /home/rprechelt/ada/spark/common/tags/target/scala-2.11/classes/org is not a directory
[error] private[spark] class Since(version: String) extends StaticAnnotation
[error] ^
[error] /home/rprechelt/ada/spark/common/tags/src/main/scala/org/apache/spark/annotation/package.scala:25: error writing package object annotation: /home/rprechelt/ada/spark/common/tags/target/scala-2.11/classes/org/apache/spark/annotation/package.class: /home/rprechelt/ada/spark/common/tags/target/scala-2.11/classes/org is not a directory
[error] package object annotation
[error] ^
[error] two errors found
[error] Compile failed at Aug 18, 2016 11:56:49 AM [0.358s]
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-sketch_2.10: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed. CompileFailed -> [Help 1]
I'm at a loss for what could be occurring here - I've tried building it with different versions of Scala, but they all report the same error.
Does anyone have any suggestions about how I might go about fixing this?
I got the same problem; it happens because the zinc server was started by some other user. Try killing zinc and then starting the build again.
It's basically an access error, where a zinc server started by another user is trying to write to a directory it doesn't have permission on. You can find it with:
sudo ps -ef | grep zinc
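Concretely, the sequence looks something like this (a sketch; the PID placeholder is whatever the grep reports):
sudo ps -ef | grep zinc   # find the running zinc server and note its owner and PID
sudo kill <zinc-pid>      # stop the instance started by the other user
# re-run the same make-distribution.sh command as before;
# Spark's build/mvn wrapper will start a fresh zinc under your own user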

Spark + Amazon S3 "s3a://" urls

AFAIK, the newest, best S3 implementation for Hadoop + Spark is invoked by using the "s3a://" url protocol. This works great on pre-configured Amazon EMR.
However, when running on a local dev system using the pre-built spark-2.0.0-bin-hadoop2.7.tgz, I get
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 99 more
Next I tried to launch my Spark job specifying the hadoop-aws addon:
$SPARK_HOME/bin/spark-submit --master local \
--packages org.apache.hadoop:hadoop-aws:2.7.3 \
my_spark_program.py
I get
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.google.code.findbugs#jsr305;3.0.0!jsr305.jar
:: org.apache.avro#avro;1.7.4!avro.jar
:: org.xerial.snappy#snappy-java;1.0.4.1!snappy-java.jar(bundle)
::::::::::::::::::::::::::::::::::::::::::::::
I made a dummy build.sbt project in a temp directory with those three dependencies, to see whether a basic sbt build could successfully download them, and I got:
[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.avro#avro;1.7.4: several problems occurred while resolving dependency: org.apache.avro#avro;1.7.4 {compile=[default(compile)]}:
[error] org.apache.avro#avro;1.7.4!avro.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.pom
[error] org.apache.avro#avro;1.7.4!avro.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/org/apache/avro/avro/1.7.4/avro-1.7.4.pom
[error]
[error] unresolved dependency: com.google.code.findbugs#jsr305;3.0.0: several problems occurred while resolving dependency: com.google.code.findbugs#jsr305;3.0.0 {compile=[default(compile)]}:
[error] com.google.code.findbugs#jsr305;3.0.0!jsr305.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.pom
[error] com.google.code.findbugs#jsr305;3.0.0!jsr305.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.pom
[error]
[error] unresolved dependency: org.xerial.snappy#snappy-java;1.0.4.1: several problems occurred while resolving dependency: org.xerial.snappy#snappy-java;1.0.4.1 {compile=[default(compile)]}:
[error] org.xerial.snappy#snappy-java;1.0.4.1!snappy-java.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.pom
[error] org.xerial.snappy#snappy-java;1.0.4.1!snappy-java.pom(pom.original) origin location must be absolute: file:/Users/username/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.pom
[error] Total time: 2 s, completed Sep 2, 2016 6:47:17 PM
Any ideas on how I can get this working?
It looks like you need additional jars in your submit flag. The Maven repository has a number of AWS packages for Java which you can use to fix your current error: https://mvnrepository.com/search?q=aws
I continuously run into headaches with the S3A filesystem error, but the aws-java-sdk:1.7.4 jar works for Spark 2.0.
Further discussion of the matter can be found here (and there is indeed a suitable package in the Maven repository):
https://sparkour.urizone.net/recipes/using-s3/
Try this:
spark-submit --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 my_spark_program.py
If you are using Apache Spark (that is, I'm ignoring the build Amazon ships in EMR), you need to add a dependency on org.apache.hadoop:hadoop-aws for exactly the same version of Hadoop that the rest of Spark uses. This adds the S3A filesystem and the transitive dependencies. The version of the AWS SDK must be the same as the one used to build the hadoop-aws library, as it's a bit of a moving target.
See: Apache Spark and Object Stores
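A quick way to line the versions up against a pre-built Spark (a sketch; hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4):
# check which Hadoop version your Spark distribution bundles
ls $SPARK_HOME/jars | grep hadoop-common   # e.g. hadoop-common-2.7.x.jar
# then pull in hadoop-aws for that exact version, plus the matching SDK
$SPARK_HOME/bin/spark-submit --master local \
  --packages org.apache.hadoop:hadoop-aws:2.7.3,com.amazonaws:aws-java-sdk:1.7.4 \
  my_spark_program.py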

Error: Invalid or corrupt jarfile occurred while trying to build recommendation engine of PredictionIO on a Linux machine

An error occurred while trying to build the recommendation engine using PredictionIO. Does anyone know how to solve this issue?
root@testing:~/PredictionIO/engines# pio build --verbose
[INFO] [Console$] Using command '/root/PredictionIO/sbt/sbt' at the current working directory to build.
[INFO] [Console$] If the path above is incorrect, this process will fail.
[INFO] [Console$] Uber JAR disabled. Making sure lib/pio-assembly-0.9.4.jar is absent.
[INFO] [Console$] Going to run: /root/PredictionIO/sbt/sbt package assemblyPackageDependency
[ERROR] [Console$] Error: Invalid or corrupt jarfile /root/PredictionIO/sbt/sbt-launch-0.13.7.jar
[ERROR] [Console$] Return code of previous step is 1. Aborting.
What helped for me was to download this file:
https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.7/sbt-launch.jar
Rename the downloaded file to sbt-launch-0.13.7.jar and replace the previous file in PredictionIO/sbt/.
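In commands, the replacement looks like this (a sketch, assuming PredictionIO lives under ~/PredictionIO as in the question's prompt):
cd ~/PredictionIO/sbt
mv sbt-launch-0.13.7.jar sbt-launch-0.13.7.jar.bak   # keep the corrupt jar as a backup
curl -LO https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.7/sbt-launch.jar
mv sbt-launch.jar sbt-launch-0.13.7.jar              # give it the name pio expects
Then run pio build --verbose again.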
