Spark job deployment failure to Cloudera

I am using Guice in the architecture of my Spark Streaming program. It runs in Eclipse without any error. However, after compiling it and deploying it with the spark-submit command, it returns an error:
java.lang.NoClassDefFoundError: com/google/common/base/Preconditions
From googling around, I noticed that this error only appears when using Guice 3.0, but I am using Guice 4.0. My Spark version is 1.5.2 and my Cloudera version is 5.3.2. Is there any workaround for this error?

Unfortunately for you, Spark v1.5.2 depends on com.google.inject:guice:3.0.
So I suspect that what is happening is that your project is pulling in both:
Guice 4.0 (as a direct dependency stated in your dependencies file like pom.xml or build.sbt); and
Guice 3.0 (a transitive dependency pulled by Spark v1.5.2)
Basically your classpath ends up being a mess, and depending on how classes are loaded by the classloader at runtime, you will (or will not) run into this kind of error.
You will have to use the already provided version of Guice (pulled by Spark) or start juggling with classloaders.
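One way to keep Guice 4.0 as a direct dependency, roughly sketched (adjust the plugin version and the relocated package prefix to your own build), is to shade and relocate the newer Guice, together with the Guava it relies on, inside your fat jar with the maven-shade-plugin, so they can no longer clash with the Guice 3.0 that Spark and Hadoop pull in:

<!-- Sketch: relocate Guice 4.0 (and Guava) into a private package inside your assembled jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.inject</pattern>
            <shadedPattern>myapp.shaded.com.google.inject</shadedPattern>
          </relocation>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>myapp.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>

With the relocation in place, Spark keeps loading its own Guice 3.0 internally while your code links against the relocated 4.0 classes.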
UPDATE:
Indeed, org.apache.spark:spark-core_2.10:1.5.2 pulls in com.google.inject:guice:3.0:
+-org.apache.spark:spark-core_2.10:1.5.2 [S]
+ ...
...
+-org.apache.hadoop:hadoop-client:2.2.0
| +-org.apache.hadoop:hadoop-mapreduce-client-app:2.2.0
| | +-com.google.protobuf:protobuf-java:2.5.0
| | +-org.apache.hadoop:hadoop-mapreduce-client-common:2.2.0
| | | +-com.google.protobuf:protobuf-java:2.5.0
| | | +-org.apache.hadoop:hadoop-mapreduce-client-core:2.2.0
| | | | +-com.google.protobuf:protobuf-java:2.5.0
| | | | +-org.apache.hadoop:hadoop-yarn-common:2.2.0 (VIA PARENT org.apache.hadoop:hadoop-yarn:2.2.0 and then VIA ITS PARENT org.apache.hadoop:hadoop-project:2.2.0)
| | | | | +-com.google.inject:guice:3.0
...
The spark-core pom.xml is here.
The hadoop-yarn-common pom.xml is here.
The hadoop-yarn pom.xml is here.
The hadoop-project pom.xml is here.
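To check what your own build pulls in, you can print the dependency tree yourself; with a Maven build, for example, the following filters the tree down to Guice:

mvn dependency:tree -Dincludes=com.google.inject

An sbt build can do the same with the dependencyTree task provided by the sbt-dependency-graph plugin.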

Related

Kitties tutorial part 1 build is failing with an unresolved import for sc_client_api::RemoteBackend, has anyone faced this issue?

I just followed the tutorial and I am at the first
cargo build --release
This is the error I am getting; any idea why this is happening?
error[E0432]: unresolved import `sc_client_api::RemoteBackend`
 --> node/src/service.rs:4:39
  |
4 | use sc_client_api::{ExecutorProvider, RemoteBackend};
  |                                       ^^^^^^^^^^^^^
  |                                       |
  |                                       no `RemoteBackend` in the root
  |                                       help: a similar name exists in the module: `StateBackend`
I solved this by removing RemoteBackend from the import at node/src/service.rs.
Just change the line (for me it was Line 4) at node/src/service.rs from
use sc_client_api::{ExecutorProvider, RemoteBackend};
to
use sc_client_api::ExecutorProvider;
While working with the tutorial cargo build --release resulted in the following error message:
error[E0432]: unresolved import `sc_client_api::RemoteBackend`
 --> node/src/service.rs:4:39
  |
4 | use sc_client_api::{ExecutorProvider, RemoteBackend};
  |                                       ^^^^^^^^^^^^^
  |                                       |
  |                                       no `RemoteBackend` in the root
  |                                       help: a similar name exists in the module: `StateBackend`
The help message provided a clue as to what was happening. Checking the node/src/service.rs file, I found that RemoteBackend was never referenced anywhere else in the file. So I simply removed it from the imports, and the build completed successfully.

Is it possible that aws-sdk/dynamoose causes an SQLite syntax error on DynamoDB local?

Context:
Problem found while upgrading from Node.js 6 to 12 and, along with that, the project's dependencies.
Using dynamoose 2.3
Containerized application using docker-compose: backend and dynamodb instance only
Docker file for dynamodb:
FROM openjdk:latest
# Bundle dynamodb
COPY . .
EXPOSE 8000
CMD [ "java", "-jar", "DynamoDBLocal.jar" ]
Problem: when bringing up the containers, after the backend initializes, the dynamodb instance throws the errors below, causing any subsequent query or call to stall and time out on the backend's side.
Error:
dynamodb_1 | Sep 03, 2020 8:14:36 AM com.almworks.sqlite4java.Internal log
dynamodb_1 | WARNING: [sqlite] SQLiteDBAccess$10#b6f156c: job exception
dynamodb_1 | com.almworks.sqlite4java.SQLiteException: [1] DB[1] prepare() DROP INDEX Foobar*HVI; [near "*": syntax error]
dynamodb_1 | at com.almworks.sqlite4java.SQLiteConnection.throwResult(SQLiteConnection.java:1436)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteConnection.prepare(SQLiteConnection.java:580)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteConnection.prepare(SQLiteConnection.java:635)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteConnection.prepare(SQLiteConnection.java:622)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.AmazonDynamoDBOfflineSQLiteJob.getPreparedStatement(AmazonDynamoDBOfflineSQLiteJob.java:138)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess$10.dropGSISQLiteIndex(SQLiteDBAccess.java:1221)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess$10.dropIndices(SQLiteDBAccess.java:1169)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess$10.doWork(SQLiteDBAccess.java:1155)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.SQLiteDBAccess$10.doWork(SQLiteDBAccess.java:1152)
dynamodb_1 | at com.amazonaws.services.dynamodbv2.local.shared.access.sqlite.AmazonDynamoDBOfflineSQLiteJob.job(AmazonDynamoDBOfflineSQLiteJob.java:97)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteJob.execute(SQLiteJob.java:372)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteQueue.executeJob(SQLiteQueue.java:534)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteQueue.queueFunction(SQLiteQueue.java:667)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteQueue.runQueue(SQLiteQueue.java:623)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteQueue.access$000(SQLiteQueue.java:77)
dynamodb_1 | at com.almworks.sqlite4java.SQLiteQueue$1.run(SQLiteQueue.java:205)
dynamodb_1 | at java.base/java.lang.Thread.run(Thread.java:832)
I suspect this happens when creating the tables through Dynamoose's model(), which under the hood calls the AWS DynamoDB createTable method.
I'm currently just analysing the upgrade to Node.js 12 and Dynamoose 2.3. Locally I would prefer to have it running so I can test other parts of the project, so I don't mind updating indexes and recreating tables, but I want to know where this syntax error is coming from in order to fix it and carry on.
Question: Is it possible that the aws-sdk or dynamoose causes a DynamoDB local instance to attempt to drop an index with an SQLite syntax error?
The problem is that Dynamoose 0.8.7 used to support schemas with attributes that were both the hash key and had an index marked as global. When jumping to the latest version, some breaking changes made it add characters like '*' to its queries, which makes SQLite complain.
Example:
const FoobarSchema = new dynamoose.Schema({
  foo: {
    type: String,
    hashKey: true,
    index: {
      global: true
    }
  }
});
const Foobar = dynamoose.model('Foobar', FoobarSchema, Options); // Blows up
I'm new to the project and DynamoDB. I don't understand what the reasoning was behind having those two conditions together, or whether it should be supported at all. I'll dig deeper into this when I have the chance and update this answer.
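As a rough sketch of the kind of change that avoids the error (assuming the attribute does not actually need its own global secondary index), drop the index block from the hash key:

const dynamoose = require('dynamoose');

// Hypothetical workaround: declare the hash key without also marking it as a global index
const FoobarSchema = new dynamoose.Schema({
  foo: {
    type: String,
    hashKey: true
  }
});

const Foobar = dynamoose.model('Foobar', FoobarSchema); // no '*'-suffixed index name, so no SQLite error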
Deleting the "shared-local-instance.db" file in my "docker/dynamodb" folder resolved the issue for me (the file is generated again on docker-compose up and everything runs fine).

What is the solution for the error, “jblas is not a member of package org.apache”?

I tried to solve it using both of these threads (this and this), and it worked for me on my own virtual machine but didn't work on Cloud Dataproc. I followed the same process in both cases, but in the cloud I still get the same error I previously had on the virtual machine. What should be done on the cloud to solve it?
Did you do the full "git clone" steps in those linked threads? And did you actually need to modify jblas? If not, you should just pull it from Maven Central using --packages org.jblas:jblas:1.2.4, without the git clone or mvn install; the following worked fine for me on a new Dataproc cluster:
$ spark-shell --packages org.jblas:jblas:1.2.4
Ivy Default Cache set to: /home/dhuo/.ivy2/cache
The jars for the packages stored in: /home/dhuo/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.jblas#jblas added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found org.jblas#jblas;1.2.4 in central
downloading https://repo1.maven.org/maven2/org/jblas/jblas/1.2.4/jblas-1.2.4.jar ...
[SUCCESSFUL ] org.jblas#jblas;1.2.4!jblas.jar (605ms)
:: resolution report :: resolve 713ms :: artifacts dl 608ms
:: modules in use:
org.jblas#jblas;1.2.4 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   1   |   1   |   1   |   0   ||   1   |   1   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
1 artifacts copied, 0 already retrieved (10360kB/29ms)
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used
Spark context Web UI available at http://10.240.2.221:4040
Spark context available as 'sc' (master = yarn, app id = application_1501548510890_0005).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.jblas.DoubleMatrix
import org.jblas.DoubleMatrix
scala> :quit
Additionally, if you need to submit jobs that require "packages" via Dataproc's job submission API, then since --packages is actually syntactic sugar in the various Spark launcher scripts rather than being a property of a Spark job, you need to use the equivalent spark.jars.packages instead in such a case, as explained in this StackOverflow answer.
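For example (the cluster name, bucket, jar, and main class below are just placeholders, not values from the question), a Dataproc job submission that needs jblas would look roughly like this:

gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --class=com.example.MyJblasJob \
    --jars=gs://my-bucket/my-jblas-job.jar \
    --properties=spark.jars.packages=org.jblas:jblas:1.2.4

Depending on your gcloud version you may also need to pass --region. Spark then resolves org.jblas:jblas:1.2.4 from Maven Central at job startup, just as --packages does for spark-shell and spark-submit.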

Tanuki upgrade: JVM configuration version

I am currently using the old Tanuki Wrapper version 3.2.3 and am moving to the newest one, 3.5.25.
I followed the upgrade documentation: modified my script, changed the jar and the binary wrapper, etc.
Debugging during the JVM launch, I could see that every additional param defined in my wrapper.conf appears as follows:
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[0] : java
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[1] : -Djava.system.class.loader=myClass
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[2] : -Dcom.sun.management.jmxremote=true
But there are some extra params, and I don't know where they are set up:
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[30] : -Dwrapper.version=3.2.3
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[31] : -Dwrapper.native_library=wrapper
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[32] : -Dwrapper.service=TRUE
DEBUG | wrapper | 2014/07/03 13:41:08 | Command[33] : -Dwrapper.cpu.timeout=10
Especially annoying is the version one: it is still the old one. Does anybody know where I can change these configuration params?
Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
Thanks!
Best
The Tanuki library was pulled in by a dependency. No matter what properties I set in my project, they get overwritten.
The solution: use Maven to exclude those dependencies or override them.
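A rough sketch of the exclusion approach (the coordinates below are placeholders; check mvn dependency:tree to see which dependency really drags in the old 3.2.3 wrapper and what its actual group/artifact IDs are):

<dependency>
  <groupId>com.example</groupId>
  <artifactId>library-that-pulls-old-wrapper</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- keep the transitive 3.2.3 wrapper jar off the classpath so the 3.5.25 one wins -->
    <exclusion>
      <groupId>tanukisoft</groupId>
      <artifactId>wrapper</artifactId>
    </exclusion>
  </exclusions>
</dependency>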

Sun One Web Server 6.1 and Tomahawk

Does anyone know which version of Tomahawk is suitable for use with Sun ONE Web Server 6.1?
Thanks in advance,
Alejo
Well, here is a requirements table for JSF:
JSF       |  1.0  |  1.1  | 1.2 (JEE5) |  2.0
----------------------------------------------
Java      |  1.3  |  1.3  |     5      |   *
JSP       |  1.2  |  1.2  |    2.1     |   *
Servlet   |  2.3  |  2.3  |    2.5     |   *
JavaBeans | 1.0.1 | 1.0.1 |   1.0.1    |   *
JSTL      |  1.0  |  1.0  |    1.2     |   *
*JSF 2.0 Public Review Draft requires JEE5
The Sun ONE Web Server doc says this:
Sun ONE Web Server 6.1 supports the Java Servlet 2.3 specification, including web application and WAR file (Web ARchive file) support, and the JavaServer Pages (JSP) 1.2 specification.
So, I'd use the compatibility matrix to check for likely candidates.
