When deploying my application I receive the following error: Website with given name azuredemo already exists.
I am attempting to deploy a Java Spring Boot web application to Azure. I created a Spring Boot application that works on my local machine against port 80.
I then ran mvn azure-webapp:config and configured the Application and then ran config a second time to configure the Runtime.
I get the following output from maven:
AppName : azuredemo
ResourceGroup : azuredemo-rg
Region : eastus
PricingTier : Basic_B1
OS : Linux
Java : Java 11
Web server stack: Tomcat 9.0
Deploy to slot : false
Then I ran the command mvn azure-webapp:deploy to deploy the application to Azure.
[INFO] Scanning for projects...
[INFO]
[INFO] -----------------< demo:azuredemo >------------------
[INFO] Building azuredemo 0.0.1-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- azure-webapp-maven-plugin:1.12.0:deploy (default-cli) # azuredemo ---
[WARNING] The POM for com.microsoft.azure.applicationinsights.v2015_05_01:azure-mgmt-insights:jar:1.0.0-beta is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.microsoft.applicationinsights.core.dependencies.xstream.core.util.Fields (file:/C:/xxxxx/.m2/repository/com/microsoft/azure/applicationinsights-core/2.6.1/applicationinsights-core-2.6.1.jar) to field java.util.TreeMap.comparator
WARNING: Please consider reporting this to the maintainers of com.microsoft.applicationinsights.core.dependencies.xstream.core.util.Fields
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[INFO] Auth Type : AZURE_CLI, Auth Files : [C:\xxxxx\.azure\azureProfile.json, C:\xxxxx\.azure\accessTokens.json]
[INFO] Subscription : Pay-As-You-Go(yyyyyyyy)
[INFO] Target Web App doesn't exist. Creating a new one...
[INFO] Creating App Service Plan 'ServicePlandaaaa-aaa'...
[INFO] Successfully created App Service Plan.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30.596 s
[INFO] Finished at: 2021-02-03T15:02:13-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal com.microsoft.azure:azure-webapp-maven-plugin:1.12.0:deploy (default-cli) on project azuredemo: Website with given name azuredemo already exists.: OnError while emitting onNext value: retrofit2.Response.class -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
So to troubleshoot the issue, I delete all of the Application Services. I deleted all of the Resource Groups. And attempted the deployment again to the same error.
I understand I am probably missing something, but going through the portal and dashboards, I cannot find what the conflict is.
My Environment
Maven 3.6.3
Java 11
Spring Boot 2.3.8
az cli 2.18.0
Pay-As-You-Go subscription
My Maven POM
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.3.8.RELEASE</version>
<relativePath/>
<!-- lookup parent from repository -->
</parent>
<groupId>demo</groupId>
<artifactId>azuredemo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>azuredemo</name>
<description>Demo project for Spring Boot</description>
<properties>
<java.version>11</java.version>
<azure.version>3.1.0</azure.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>com.azure.spring</groupId>
<artifactId>azure-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>com.azure.spring</groupId>
<artifactId>azure-spring-boot-starter-active-directory</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-devtools</artifactId>
<scope>runtime</scope>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.azure.spring</groupId>
<artifactId>azure-spring-boot-bom</artifactId>
<version>${azure.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-configuration-processor</artifactId>
</exclude>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-webapp-maven-plugin</artifactId>
<version>1.12.0</version>
<configuration>
<authType>azure_cli</authType>
<resourceGroup>azuredemo-rg</resourceGroup>
<appName>azuredemo</appName>
<pricingTier>B1</pricingTier>
<region>eastus</region>
<deployment>
<resources>
<resource>
<directory>${project.basedir}/target</directory>
<includes>
<include>*.jar</include>
</includes>
</resource>
</resources>
</deployment>
<runtime>
<os>Linux</os>
<javaVersion>Java 11</javaVersion>
<webContainer>Tomcat 9.0</webContainer>
</runtime>
</configuration>
</plugin>
</plugins>
</build>
</project>
The name of Azure App Services are required to be globally unique. Presumably, someone else has already created a web app with the name 'azuredemo'. Try making the name more unique and publishing again.
In my case I wanted to update an app with a new deploy. If you get the error in this situation like I did, make sure that the resource group name matches the one that is already added in azure.
Related
I have two simple Flink streaming jobs that read from Kafka do some transformations and put the result into a Cassandra Sink. They read from different Kafka topics and save into different Cassandra tables.
When I run any one of the two jobs alone everything works fine. Checkpoints are triggered and completed and data is saved to Cassandra.
But when ever I run both jobs (or one of them twice) the second job fails at start up with this exception:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [localhost/127.0.0.1] Error writing)).
I could not find much info about this error, it may be caused by any one of the following:
Flink (v 1.10.0-scala_2.12),
Flink Cassandra Connector (flink-connector-cassandra_2.11:jar:1.10.2, also tried with flink-connector-cassandra_2.12:jar:1.10.0),
Datastax underlying driver (v 3.10.2),
Cassandra v4.0 (same with v3.0),
Netty transport (v 4.1.51.Final).
I also use packages that may have collisions with the first ones:
mysql-connector-java (v 8.0.19),
cassandra-driver-extras (v 3.10.2)
Finally this is my code for the cluster builder:
ClusterBuilder builder = new ClusterBuilder() {
#Override
protected Cluster buildCluster(Cluster.Builder builder) {
Cluster cluster = null;
try {
cluster = builder
.addContactPoint("localhost")
.withPort(9042)
.withClusterName("Test Cluster")
.withoutJMXReporting()
.withProtocolVersion(ProtocolVersion.V4)
.withoutMetrics()
.build();
// register codecs from datastax extras.
cluster.getConfiguration().getCodecRegistry()
.register(LocalTimeCodec.instance);
} catch (ConfigurationException e) {
e.printStackTrace();
} catch (NoHostAvailableException nhae) {
nhae.printStackTrace();
}
return cluster;
}
};
I tried with different PoolingOptions and SocketOptions settings but no success.
Cassandra Sink:
CassandraSink.addSink(dataRows)
.setQuery("insert into table_name_(16 columns names) " +
"values (16 placeholders);")
.enableWriteAheadLog()
.setClusterBuilder(builder)
.setFailureHandler(new CassandraFailureHandler() {
#Override
public void onFailure(Throwable throwable) {
LOG.error("A {} occurred.", "Cassandra Failure", throwable);
}
})
.build()
.setParallelism(1)
.name("Cassandra Sink For Unique Count every N minutes.");
The full trace log from flink job manager:
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: localhost/127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [localhost/127.0.0.1] Error writing))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:231)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:77)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1414)
at com.datastax.driver.core.Cluster.init(Cluster.java:162)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:333)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:308)
at com.datastax.driver.core.Cluster.connect(Cluster.java:250)
at org.apache.flink.streaming.connectors.cassandra.CassandraSinkBase.createSession(CassandraSinkBase.java:143)
at org.apache.flink.streaming.connectors.cassandra.CassandraSinkBase.open(CassandraSinkBase.java:87)
at org.apache.flink.streaming.connectors.cassandra.AbstractCassandraTupleSink.open(AbstractCassandraTupleSink.java:49)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1007)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:449)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.base/java.lang.Thread.run(Thread.java:834)
Any help is appreciated.
Edit:
I just tried using two Cassandra separate instances (different machines and different clusters). I then pointed one job to an instance and the other job to the other instance. Nothing has changed, I still get the same error.
Tried to reduce dependencies, here is the new pom file:
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.abcde.ai</groupId>
<artifactId>analytics-etl</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.10.2</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.8.6</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-cassandra_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>1.10</version>
</dependency>
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.1</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.abcde.analytics.etl.KafkaUniqueCountsStreamingJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.1.1,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
Edit:
I managed to narrow down the problem. The error gets fixed when I mark the dependency flink-connector-cassandra as provided and I simply copy the jar file from my local maven repository (~/.m2/repository/org/apache/flink/flink-connector-cassandra_2.11/1.10.2/flink-connector-cassandra_2.11-1.10.2.jar) to Flink lib folder. My problem is solved but the root cause is still a mystery.
I might be wrong, but most likely the issue is caused by netty client version conflict. The error states NoHostAvailableException, however the underlying error is TransportException with Error writing error message. Cassandra s definetely operating well.
There is a kind of similar stackoverflow case - Cassandra - error writing, with a very similar symptoms - a single project running well and AllNodesFailedException with TransportException with Error writing message as a root cause when adding one more. The author was able to solve it by unifying the netty client.
In your case, I'm not sure why there are so many dependencies, so I would try to get rid of all extras and libraries and would just leave Flink (v 1.10.0-scala_2.12) and Flink Cassandra Connector (flink-connector-cassandra_2.12:jar:1.10.0) libraries. They must already include necessary drivers, netty, etc. All other drivers should be skipped (at least for initial iteration to ensure that this solves the issue and it's library conflict).
To fix the error I marked the dependency flink-connector-cassandra as provided and I simply copy the jar file from my local maven repository (~/.m2/repository/org/apache/flink/flink-connector-cassandra_2.11/1.10.2/flink-connector-cassandra_2.11-1.10.2.jar) to Flink lib folder and restarted Flink, here is my new pom.xml file:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-cassandra_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
How I found this? I was about to try to compile the connector from source with a more recent driver version. First I tried to reproduce the error with the unchanged sources. So I compiled it without changing anything, put the jar into Flink lib folder, Hooray it works! I then suspected that the jar from maven had something different. I copied it into the lib folder and at my surprise it also worked.
My problem is solved but the root cause remains a mystery.
My last attempt was to check if any packages are in conflict with Cassandra connector so I run dependency:tree -Dverbose there was one conflict with org.apache.flink:flink-metrics-dropwizard about metrics-core:
[INFO] +- org.apache.flink:flink-connector-cassandra_2.12:jar:1.10.0:provided
[INFO] | +- (io.dropwizard.metrics:metrics-core:jar:3.1.2:provided - omitted for conflict with 3.1.5)
[INFO] | \- (org.apache.flink:force-shading:jar:1.10.0:provided - omitted for duplicate)
I removed this dependency from my project but the error remains if the connector is not marked as provided and also put in the lib folder.
I m new to Karate API automation tool and just try to set up the tool.I'm getting Compilation errors when I try to compile (Clean install ) my maven project.
Appreciate if anyone can help me on this. This is compiled and worked fine with 0.6.0 and I change it to the latest version. but now it's not working even for the previous one.
Find the bellow PoM file I m using :
http://maven.apache.org/xsd/maven-4.0.0.xsd">
4.0.0
<groupId>com.test.karate</groupId>
<artifactId>Examples</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>com.intuit.karate</groupId>
<artifactId>karate-apache</artifactId>
<version>0.8.0.RC3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.intuit.karate</groupId>
<artifactId>karate-junit4</artifactId>
<version>0.8.0.RC3</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<testResources>
<testResource>
<directory>src/test/java</directory>
<excludes>
<exclude>**/*.java</exclude>
</excludes>
</testResource>
</testResources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven.compiler.version}</version>
<configuration>
<encoding>UTF-8</encoding>
<source>${java.version}</source>
<target>${java.version}</target>
<compilerArgument>-Werror</compilerArgument>
</configuration>
**Console log :**
Building Examples 0.0.1-SNAPSHOT
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/XXX/eclipse-workspace/Examples/src/main/java/Ru`enter code here`nnerKarate.java:[3,24] package org.junit.runner does not exist
[ERROR] /Users/XXX/eclipse-workspace/Examples/src/main/java/RunnerKarate.java:[5,1] package com.intuit.karate.junit4 does not exist
[ERROR] /Users/XXX/eclipse-workspace/Examples/src/main/java/RunnerKarate.java:[7,20] package cucumber.api does not exist
[ERROR] /Users/XXX/eclipse-workspace/Examples/src/main/java/RunnerKarate.java:[8,26] package cucumber.api.junit does not exist
[ERROR] /Users/uwickdi/eclipse-workspace/Examples/src/main/java/RunnerKarate.java:[10,2] cannot find symbol
symbol: class RunWith
[ERROR] /Users/XXX/eclipse-workspace/Examples/src/main/java/RunnerKarate.java:[11,2] cannot find symbol
symbol: class CucumberOptions
[INFO] 6 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ----------------------------------------------------------------
Sounds like your local maven repository is corrupted so try to clean and re-download the JAR-s. You can use this command dependency:purge-local-repository
I am trying to connect to Amazon Kinesis Stream from Google Dataproc but am only getting Empty RDDs.
Command: spark-submit --verbose --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.2 demo_kinesis_streaming.py --awsAccessKeyId XXXXX --awsSecretKey XXXX
Detailed Log: https://gist.github.com/sshrestha-datalicious/e3fc8ebb4916f27735a97e9fcc42136c
More Details
Spark 1.6.1
Hadoop 2.7.2
Assembly Used: /usr/lib/spark/lib/spark-assembly-1.6.1-hadoop2.7.2.jar
Surprisingly that works when I download and use the assembly containing SPARK 1.6.1 with Hadoop 2.6.0 with the following command.
Command: SPARK_HOME=/opt/spark-1.6.1-bin-hadoop2.6 spark-submit --verbose --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.2 demo_kinesis_streaming.py --awsAccessKeyId XXXXX --awsSecretKey XXXX
I am not sure if there is any version conflict between the two hadoop versions and Kinesis ASL or it has to do with custom settings with Google Dataproc.
Any help would be appreciated.
Thanks
Suren
Our team was in a similar situation and we managed to solve it:
We are running on the same environment:
DataProc Image Version 1 with Spark 1.6.1 with Hadoop 2.7
A simple SparkStream Kinesis Script that boils down to this:
# Run the script as
# spark-submit \
# --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1\
# demo_kinesis_streaming.py\
# --awsAccessKeyId FOO\
# --awsSecretKey BAR\
# ...
import argparse
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.storagelevel import StorageLevel
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream
ap = argparse.ArgumentParser()
ap.add_argument('--awsAccessKeyId', required=True)
ap.add_argument('--awsSecretKey', required=True)
ap.add_argument('--stream_name')
ap.add_argument('--region')
ap.add_argument('--app_name')
ap = ap.parse_args()
kinesis_application_name = ap.app_name
kinesis_stream_name = ap.stream_name
kinesis_region = ap.region
kinesis_endpoint_url = 'https://kinesis.{}.amazonaws.com'.format(ap.region)
spark_context = SparkContext(appName=kinesis_application_name)
streamingContext = StreamingContext(spark_context, 60)
kinesisStream = KinesisUtils.createStream(
ssc=streamingContext,
kinesisAppName=kinesis_application_name,
streamName=kinesis_stream_name,
endpointUrl=kinesis_endpoint_url,
regionName=kinesis_region,
initialPositionInStream=InitialPositionInStream.TRIM_HORIZON,
checkpointInterval=60,
storageLevel=StorageLevel.MEMORY_AND_DISK_2,
awsAccessKeyId=ap.awsAccessKeyId,
awsSecretKey=ap.awsSecretKey
)
kinesisStream.pprint()
streamingContext.start()
streamingContext.awaitTermination()
The code had been tested working on AWS EMR and on local environment using the same Spark 1.6.1 with Hadoop 2.7 setup.
The script is returning empty RDDs without printing any error while there is data in the Kinesis stream on DataProc.
We've tested it on DataProc with the following envs, and none of them worked.
Submit job via gcloud command;
ssh into Cluster Master Node and run in yarn client mode;
ssh into Cluster Master Node and run as local[*].
Upon enabling verbose logging by updating /etc/spark/conf/log4.properties with the following value:
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
log4j.logger.org.eclipse.jetty=ERROR
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=DEBUG
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=DEBUG
log4j.logger.org.apache.spark=DEBUG
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=DEBUG
log4j.logger.org.spark-project.jetty.server.handler.ContextHandler=DEBUG
log4j.logger.org.apache=DEBUG
log4j.logger.com.amazonaws=DEBUG
We've notice something weird in the log(Note that spark-streaming-kinesis-asl_2.10:1.6.1 uses aws-sdk-java/1.9.37 as dependence while somehow aws-sdk-java/1.7.4 was used [suggested by user-agent]):
16/07/10 06:30:16 DEBUG com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardConsumer: PROCESS task encountered execution exception:
java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: com.amazonaws.services.kinesis.model.GetRecordsResult.getMillisBehindLatest()Ljava/lang/Long;
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardConsumer.checkAndSubmitNextTask(ShardConsumer.java:137)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardConsumer.consumeShard(ShardConsumer.java:126)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker.run(Worker.java:334)
at org.apache.spark.streaming.kinesis.KinesisReceiver$$anon$1.run(KinesisReceiver.scala:174)
Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.kinesis.model.GetRecordsResult.getMillisBehindLatest()Ljava/lang/Long;
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask.call(ProcessTask.java:119)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:48)
at com.amazonaws.services.kinesis.clientlibrary.lib.worker.MetricsCollectingTaskDecorator.call(MetricsCollectingTaskDecorator.java:23)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
content-length:282
content-type:application/x-amz-json-1.1
host:kinesis.ap-southeast-2.amazonaws.com
user-agent:SparkDemo,amazon-kinesis-client-library-java-1.4.0, aws-sdk-java/1.7.4 Linux/3.16.0-4-amd64 OpenJDK_64-Bit_Server_VM/25.91-b14/1.8.0_91
x-amz-date:20160710T063016Z
x-amz-target:Kinesis_20131202.GetRecords
It appears that DataProc had build its own Spark with a much older AWS SDK as dependencies and it will blow up when used in conjunction with codes that requires much newer version of AWS SDK although we are not sure exactly which module had cause this error.
Update:
Base on #DennisHuo's comment, this behaviour is caused by Hadoop's leaky classpath:
https://github.com/apache/hadoop/blob/branch-2.7.2/hadoop-project/pom.xml#L650
To make things worst, the AWS KCL 1.4.0 (used by Spark 1.6.1) will suppress any runtime error silently instead of throwing RuntimeException and causing a lot of headache while debugging.
Eventually Our solution was to build our org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1 with all of its com.amazonaws.* shaded.
Building the JAR with the following pom (update spark/extra/kinesis-asl/pom.xml) and shit the new JAR with --jars flag in spark-submit
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.10</artifactId>
<version>1.6.1</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<!-- Kinesis integration is not included by default due to ASL-licensed code. -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
<packaging>jar</packaging>
<name>Spark Kinesis Integration</name>
<properties>
<sbt.project.name>streaming-kinesis-asl</sbt.project.name>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>amazon-kinesis-client</artifactId>
<version>${aws.kinesis.client.version}</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>amazon-kinesis-producer</artifactId>
<version>${aws.kinesis.producer.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalacheck</groupId>
<artifactId>scalacheck_${scala.binary.version}</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-test-tags_${scala.binary.version}</artifactId>
</dependency>
</dependencies>
<build>
<outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
<testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<shadedArtifactAttached>false</shadedArtifactAttached>
<artifactSet>
<includes>
<!-- At a minimum we must include this to force effective pom generation -->
<include>org.spark-project.spark:unused</include>
<include>com.amazonaws:*</include>
</includes>
</artifactSet>
<relocations>
<relocation>
<pattern>com.amazonaws</pattern>
<shadedPattern>foo.bar.YO.com.amazonaws</shadedPattern>
<includes>
<include>com.amazonaws.**</include>
</includes>
</relocation>
</relocations>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
According to the drools 6.3 documentation, section 4.2.2.3 :
The KIE plugin for Maven ensures that artifact resources are validated and pre-compiled, it is recommended that this is used at all times. To use the plugin simply add it to the build section of the Maven pom.xml
Consequently, my pom.xml is :
<dependencies>
<dependency>
<groupId>org.drools</groupId>
<artifactId>drools-compiler</artifactId>
<version>6.3.0.Final-redhat-9</version>
</dependency>
<dependency>
<groupId>org.drools</groupId>
<artifactId>drools-decisiontables</artifactId>
<version>6.3.0.Final-redhat-9</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.kie</groupId>
<artifactId>kie-maven-plugin</artifactId>
<version>6.3.0.Final-redhat-9</version>
<extensions>true</extensions>
</plugin>
</plugins>
</build>
I placed my decision table file, let's call it rules.xls, in the src/main/resources/my/company/package/. The one and only other file in my Maven project is the kmodule.xml file, which is in src/main/resources/META-INF/ and contains the minimal code :
<?xml version="1.0" encoding="UTF-8"?>
<kmodule xmlns="http://jboss.org/kie/6.0.0/kmodule"/>
Because I removed all the other dependencies (including the one that contains the Java objects used in the decision table), I was expecting drools to launch an error message of type cannot find symbol.
But when I try to clean install the project using Maven, the compilation succeed, a JAR file is created and it does contain the kmodule.xml and rules.xls files at the exact location where I placed them. No additional file generated during compilation, no error, no warning, nothing. Just a plain JAR with my files, a MANIFEST.MF file and a maven directory under META-INF.
In the maven trace, here are the plugins activated (in that order) :
[INFO] --- maven-clean-plugin:2.4.1:clean
[INFO] --- maven-resources-plugin:2.5:resources
[INFO] --- maven-compiler-plugin:2.3.2:compile
[INFO] --- maven-resources-plugin:2.5:testResources
[INFO] --- maven-compiler-plugin:2.3.2:testCompile
[INFO] --- maven-surefire-plugin:2.10:test
[INFO] --- maven-jar-plugin:2.3.2:jar
[INFO] --- maven-install-plugin:2.3.1:install
Nothing about the kie plugin. Any hint to activate the rules compilation at build time ?
In order to activate kie-maven-plugin it is also necessary to specify the correct packaging type in your pom.xml
<packaging>kjar</packaging>
I finally did find a way to activate the plugin. I added this to my pom.xml :
<pluginManagement>
<plugins>
<plugin>
<groupId>org.kie</groupId>
<artifactId>kie-maven-plugin</artifactId>
<version>6.3.0.Final-redhat-9</version>
<extensions>true</extensions>
<executions>
<execution>
<id>brms-rules-compilation</id>
<phase>generate-resources</phase>
<goals>
<goal>build</goal>
</goals>
<inherited>false</inherited>
<configuration>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</pluginManagement>
And now get a Drools compilation trace including :
[INFO] --- kie-maven-plugin:6.3.0.Final-redhat-9:build
[main] INFO org.drools.compiler.kie.builder.impl.KieRepositoryImpl - Adding KieModule from resource: FileResource[...]
I'm no Maven guru, this solution may not be the best (I even do not entirely understand what I wrote as a Maven perspective), so I would be glad to get an easier one.
I'm attempting to compile (with Maven) the simple kafka reference project I found here:
https://github.com/spring-projects/spring-integration-samples/tree/master/basic/kafka
However, I repeatedly get this error when I run mvn install -U
[ERROR] Failed to execute goal on project kafka: Could not resolve dependencies for project org.springframework.integration.samples:kafka:jar:4.1.0.BUILD-SNAPSHOT: Could not find artifact org.springframework.integration:spring-integration-kafka:jar:1.2.2.BUILD-SNAPSHOT in repo.spring.io.milestone (https://repo.spring.io/libs-milestone) -> [Help 1]
When I go looking, I can see what appears to be the jar file in my .m2 folder - right where I think it ought to be -- under org/springframework/integration/spring-integration-kafka/1.2.2.BUILD.SNAPSHOT. It does not end in .jar however, but rather .jar.lastUpdated
I tried loading the project into Spring Tools Suite as a maven project and attempted to clean the project... but in that case I get the error "Archive for Required Library...... in project 'kafka' cannot be read or is not a valid ZIP file.
I'm stumped - not sure where to go from here - I've blown away the directory and replaced it with the original project download several times. Still no joy.
Oh, I also tried renaming the file in the hope it was a valid zip/jar, but apparently it isn't...
Any suggestions would be appreciated.
========== Thanks Gary! ===========
I wiped the 'kafka' directory and re-added it from the zip I downloaded - then I tried adding that entry to the POM and got the following error:
ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] 'repositories.repository.id' must be unique: repo.spring.io.milestone -> https://repo.spring.io/libs-milestone vs https://repo.spring.io/libs-snapshot # line 114, column 8
I then changed the name of the repository so it would be different, thus:
<repository>
<id>repo.spring.io.snapshot</id>
<name>Spring Framework Maven Snapshot Repository</name>
<url>https://repo.spring.io/libs-snapshot</url>
</repository>
running mvn install -U again, I got the following warning and error about a problem with the POM for 1.3.0.M5 being missing...
[INFO] Building Apache Kafka Sample 4.1.0.BUILD-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-maven-plugin/1.3.0.M5/spring-boot-maven-plugin-1.3.0.M5.pom
[WARNING] The POM for org.springframework.boot:spring-boot-maven-plugin:jar:1.3.0.M5 is missing, no dependency information available
Downloading: https://repo.maven.apache.org/maven2/org/springframework/boot/spring-boot-maven-plugin/1.3.0.M5/spring-boot-maven-plugin-1.3.0.M5.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.922 s
[INFO] Finished at: 2015-10-29T11:28:18-06:00
[INFO] Final Memory: 13M/245M
[INFO] ------------------------------------------------------------------------
[ERROR] Plugin org.springframework.boot:spring-boot-maven-plugin:1.3.0.M5 or one of its dependencies could not be resolved: Could not find artifact org.springframework.boot:spring-boot-maven-plugin:jar:1.3.0.M5 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginResolutionException
In case it helps (or in case of fat-fingering on my part) here's a copy of the POM... Thanks again!
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.3.0.M5</version>
</parent>
<groupId>org.springframework.integration.samples</groupId>
<artifactId>kafka</artifactId>
<version>4.1.0.BUILD-SNAPSHOT</version>
<name>Apache Kafka Sample</name>
<description>Apache Kafka Sample</description>
<url>http://projects.spring.io/spring-integration</url>
<organization>
<name>SpringIO</name>
<url>https://spring.io</url>
</organization>
<licenses>
<license>
<name>The Apache Software License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<developers>
<developer>
<id>garyrussell</id>
<name>Gary Russell</name>
<email>grussell#pivotal.io</email>
<roles>
<role>project lead</role>
</roles>
</developer>
<developer>
<id>markfisher</id>
<name>Mark Fisher</name>
<email>mfisher#pivotal.io</email>
<roles>
<role>project founder and lead emeritus</role>
</roles>
</developer>
<developer>
<id>ghillert</id>
<name>Gunnar Hillert</name>
<email>ghillert#pivotal.io</email>
</developer>
<developer>
<id>abilan</id>
<name>Artem Bilan</name>
<email>abilan#pivotal.io</email>
</developer>
</developers>
<scm>
<connection>scm:git:scm:git:git://github.com/spring-projects/spring-integration-samples.git</connection>
<developerConnection>scm:git:scm:git:ssh://git#github.com:spring-projects/spring-integration-samples.git</developerConnection>
<url>https://github.com/spring-projects/spring-integration-samples</url>
</scm>
<dependencies>
<dependency>
<groupId>org.springframework.integration</groupId>
<artifactId>spring-integration-core</artifactId>
<version>4.2.0.RELEASE</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.hamcrest</groupId>
<artifactId>hamcrest-all</artifactId>
<version>1.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-integration</artifactId>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.springframework.integration</groupId>
<artifactId>spring-integration-kafka</artifactId>
<version>1.2.2.BUILD-SNAPSHOT</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>4.2.0.RELEASE</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>1.9.5</version>
<scope>test</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>repo.spring.io.milestone</id>
<name>Spring Framework Maven Milestone Repository</name>
<url>https://repo.spring.io/libs-milestone</url>
</repository>
<repository>
<id>repo.spring.io.snapshot</id>
<name>Spring Framework Maven Snapshot Repository</name>
<url>https://repo.spring.io/libs-snapshot</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
There's a bug in the POM; add
<repository>
<id>repo.spring.io.snapshot</id>
<name>Spring Framework Maven Snapshot Repository</name>
<url>https://repo.spring.io/libs-snapshot</url>
</repository>
to the repositories section.