Unable to create Spark Context (Livy) in KNIME Analytics Platform - apache-spark

I'm sorry if I am writing in the wrong topic, but I have the following issue:
I have created a Create Spark Context (Livy) node and I am trying to connect it to an HDFS cluster managed by Cloudera.
I have configured all the Spark Job Server and Livy URL settings (correctly, I hope). When I execute the node, it creates a Livy session (I checked in YARN) and allocates the resources configured in the node, but after that I get the following error:
“ERROR Create Spark Context (Livy) 3:30 Execute failed: Broken pipe (Write failed) (SocketException)”
Here are some YARN logs:
Yarn Logs
Here are the KNIME logs:
2021-12-22 18:02:49,579 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Creating new remote Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb at https://cm-master1-all-prod.emag.network:8998 with authentication KERBEROS.
2021-12-22 18:02:49,585 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from CONFIGURED to OPEN
2021-12-22 18:03:21,097 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading Kryo version detector job jar.
2021-12-22 18:03:21,801 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Running Kryo version detector job jar.
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Using Kryo serializer version: kryo2
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading job jar: /var/folders/13/8vjx3pj137l2qrh_xwpnqwqc0000gp/T/sparkClasses16857522895332955503.jar
2021-12-22 18:03:22,423 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Destroying Livy Spark context
2021-12-22 18:03:23,150 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from OPEN to CONFIGURED
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,152 : ERROR : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Execute failed: Broken pipe (Write failed) (SocketException)
java.util.concurrent.ExecutionException: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.waitForFuture(LivySparkContext.java:492)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.uploadJobJar(LivySparkContext.java:464)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.open(LivySparkContext.java:327)
at org.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:145)
at org.knime.bigdata.spark.core.livy.node.create.LivySparkContextCreatorNodeModel2.executeInternal(LivySparkContextCreatorNodeModel2.java:85)
at org.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:240)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:549)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1267)
at org.knime.core.node.Node.execute(Node.java:1041)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:365)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.base/java.net.SocketOutputStream.write(Unknown Source)
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(Unknown Source)
at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
at org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
at org.apache.http.entity.mime.content.FileBody.writeTo(FileBody.java:121)
at org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
at org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
at org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.livy.client.http.LivyConnection.executeRequest(LivyConnection.java:292)
at org.apache.livy.client.http.LivyConnection.access$000(LivyConnection.java:68)
at org.apache.livy.client.http.LivyConnection$3.run(LivyConnection.java:277)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.livy.client.http.LivyConnection.sendRequest(LivyConnection.java:274)
at org.apache.livy.client.http.LivyConnection.post(LivyConnection.java:228)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:256)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:253)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2021-12-22 18:03:23,152 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb
2021-12-22 18:03:23,153 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doBeforePostExecution
2021-12-22 18:03:23,154 : INFO : pool-7-thread-1 : : LivySparkContext : : : Destroying Livy Spark context
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: POSTEXECUTE
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doAfterExecute - failure
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Parquet to Spark 3:32 has new state: CONFIGURED
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Spark to Parquet 3:37 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : clean output ports.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowDataRepository : Create Spark Context (Livy) : 3:30 : Removing handler 9c1d8004-908d-43fe-8347-45ba78bd7c58 (Create Spark Context (Livy) 3:30: ) - 5 remaining
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://bf46d062-17db-4ec1-8d06-28b53da7c624
2021-12-22 18:03:23,157 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://f5354b5c-b44d-4460-b270-17f0ea238f4b changed status from NEW to CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Configure succeeded. (Create Spark Context (Livy))
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Parquet to Spark : 3:32 : Configure succeeded. (Parquet to Spark)
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : HDFS 3 has new state: IDLE
2021-12-22 18:03:42,421 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:03:42,424 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:19,830 : INFO : main : : SparkContext : : : Spark context jobserver://cm-master3-all-prod.emag.network:8090/knimeSparkContext changed status from NEW to CONFIGURED
2021-12-22 18:53:23,611 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:23,613 : DEBUG : main : : NodeContainerEditPart : : : Create Spark Context (Livy) 3:30 (CONFIGURED)
Please let me know if any more info is needed.
Thank you in advance,
Andrei
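For context, the stack trace above shows the failure happens while the node uploads its job jar to the Livy session (LivySparkContext.uploadJobJar, a multipart HTTP POST made through the Livy HTTP client). Below is a minimal sketch, using the Apache Livy programmatic client directly, that could help check whether the same upload step fails outside of KNIME; the Livy URL is taken from the log above, the jar path is a placeholder, and a working Kerberos/SPNEGO login for the JVM is assumed.

// Minimal sketch (assumptions: livy-client-http on the classpath, Kerberos login configured for the JVM).
// It mirrors the step that fails in the KNIME node: uploading a jar to a Livy session.
import java.io.File
import java.net.URI
import org.apache.livy.LivyClientBuilder

object LivyJarUploadCheck {
  def main(args: Array[String]): Unit = {
    val client = new LivyClientBuilder()
      .setURI(new URI("https://cm-master1-all-prod.emag.network:8998")) // Livy URL from the log above
      .build()
    try {
      // Multipart POST of the jar file, the same call path as LivySparkContext.uploadJobJar
      client.uploadJar(new File("/path/to/any-test.jar")).get() // placeholder jar path
      println("Jar upload succeeded")
    } finally {
      client.stop(true) // also shuts down the session created by the client
    }
  }
}

If this small upload also dies with a broken pipe, the problem is more likely between the client machine and the Livy/TLS endpoint (proxy, idle timeout, request size limit) than in the KNIME node configuration.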

Related

NoSuchMethodError trying to ingest HDFS data into Elasticsearch

I'm using Spark 3.1.2, Scala 2.12, Hadoop 3.1.1.3.1.2-50, Elasticsearch 7.10.1 (due to license issues), and CentOS 7 to try to ingest JSON data in gzip files located on HDFS into Elasticsearch using Spark structured streaming.
I get a NoSuchMethodError; here are the logical plan and the stack trace:
Logical Plan:
FileStreamSource[hdfs://pct/user/papago-mlops-datalake/raw/mt-log/engine=n2mt/year=2022/date=0430/hour=00]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:356)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(Lorg/apache/spark/sql/SparkSession;Lorg/apache/spark/sql/execution/QueryExecution;Lscala/Function0;)Ljava/lang/Object;
at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.addBatch(EsSparkSqlStreamingSink.scala:62)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:586)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:584)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:584)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:226)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:194)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:188)
at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:334)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:317)
... 1 more
ApplicationMaster host: ac3m8x2183.bdp.bdata.ai
ApplicationMaster RPC port: 39673
queue: batch
start time: 1654588583366
final status: FAILED
tracking URL: https://gemini-rm2.bdp.bdata.ai:9090/proxy/application_1654575947385_29572/
user: papago-mlops-datalake
Exception in thread "main" org.apache.spark.SparkException: Application application_1654575947385_29572 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1269)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1627)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am using the following
implementation("org.elasticsearch:elasticsearch-hadoop:8.2.2")
implementation("com.typesafe:config:1.4.2")
implementation("org.apache.spark:spark-sql_2.12:3.1.2")
testImplementation("org.scalatest:scalatest_2.12:3.2.12")
testRuntimeOnly("com.vladsch.flexmark:flexmark-all:0.61.0")
compileOnly("org.apache.spark:spark-sql_2.12:3.1.2")
compileOnly("org.apache.spark:spark-core_2.12:3.1.2")
compileOnly("org.apache.spark:spark-launcher_2.12:3.1.2")
compileOnly("org.apache.spark:spark-streaming_2.12:3.1.2")
compileOnly("org.elasticsearch:elasticsearch-spark-30_2.12:8.2.2")
libraries. I tried ES-Hadoop version 7.10.1, but the ES-Spark connector for Spark 3.0 is only available from 7.12.0 onward, and I still get the same error.
My code is pretty simple:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.hadoop.cfg.ConfigurationOptions

def main(args: Array[String]): Unit = {
  // Set the log level to only print errors
  Logger.getLogger("org").setLevel(Level.ERROR)

  // elasticsearchUser, elasticsearchPass, appName, master, jsonSchema, etc. are defined elsewhere in the project
  val spark = SparkSession
    .builder()
    .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, elasticsearchUser)
    .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, elasticsearchPass)
    .config(ConfigurationOptions.ES_NODES, elasticsearchHost)
    .config(ConfigurationOptions.ES_PORT, elasticsearchPort)
    .appName(appName)
    .master(master)
    .getOrCreate()

  // Read the gzipped JSON files from HDFS as a streaming DataFrame
  val streamingDF: DataFrame = spark.readStream
    .schema(jsonSchema)
    .format("org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
    .load(pathToJSONResource)

  // Write the stream to Elasticsearch
  streamingDF.writeStream
    .outputMode(outputMode)
    .format(destination)
    .option("checkpointLocation", checkpointLocation)
    .start(indexAndDocType)
    .awaitTermination()

  // Stop the session
  spark.stop()
}
} // closes the enclosing object (its declaration is not shown in the post)
If I can't use the ES-Hadoop libraries, is there another way I can ingest JSON into Elasticsearch from HDFS?
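For reference, here is a minimal sketch of the same pipeline using only public format names ("json" for the source instead of the internal JsonFileFormat class, "es" for the elasticsearch-spark structured-streaming sink). All host, path, schema, and index values are placeholders, and it assumes an elasticsearch-spark-30_2.12 connector built against the same Spark minor version (3.1.x) that runs on the cluster.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object HdfsJsonToEsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hdfs-json-to-es-sketch")
      .config("es.nodes", "es-host.example.com") // placeholder Elasticsearch host
      .config("es.port", "9200")
      .getOrCreate()

    // Placeholder schema; the real one should match the JSON documents
    val schema: StructType = new StructType().add("message", "string")

    val streamingDF = spark.readStream
      .schema(schema)
      .format("json")                        // public JSON source, reads .gz files transparently
      .load("hdfs:///path/to/json")          // placeholder HDFS path

    streamingDF.writeStream
      .outputMode("append")
      .format("es")                          // elasticsearch-spark structured streaming sink
      .option("checkpointLocation", "/tmp/es-checkpoint") // placeholder
      .start("my-index")                     // placeholder index name
      .awaitTermination()
  }
}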

Back up SharePoint Online with Azure Functions

Is there a way to generate a SharePoint Online backup using Azure Functions?
I have already authenticated the tenant using a managed identity.
The idea would be to download all the files in the document library and upload them to an Azure storage account.
I used this code, but I get an error:
using namespace System.Net

param($Request, $TriggerMetadata)

$TenantSiteURL = "https://tenant.sharepoint.com"
$SiteRelativeURL = "/sites/BackupSource"
$LibraryName = "Documenti Condivisi"
$DownloadPath = "\Temp\Docs"

#Connect-PnPOnline -ManagedIdentity
Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity
Write-Warning "Connesso"

#Set-Location -Path SPO:\$SiteRelativeURL
Get-PnPMicrosoft365Group

Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = [HttpStatusCode]::OK
})
2022-01-03T16:58:32.983 [Information] Executing 'Functions.HttpTrigger1' (Reason='This function was programmatically called via the host APIs.', Id=09c1fe95-6b6e-4e85-8c8f-815e37e99d04)
2022-01-03T16:58:40.915 [Information] OUTPUT:
2022-01-03T16:58:41.541 [Information] OUTPUT: Account SubscriptionName TenantId Environment
2022-01-03T16:58:41.542 [Information] OUTPUT: ------- ---------------- -------- -----------
2022-01-03T16:58:41.549 [Information] OUTPUT: MSI#50342 --- AzureCloud
2022-01-03T16:58:41.549 [Information] OUTPUT:
2022-01-03T16:58:43.530 [Error] ERROR: Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.Exception :Type : System.Management.Automation.ParameterBindingExceptionMessage : Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.ErrorId : AmbiguousParameterSetLine : 13Offset : 1CommandInvocation :MyCommand : Connect-PnPOnlineBoundParameters :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys : …Values : …SyncRoot : …ScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1InvocationName : Connect-PnPOnlinePipelineLength : 1PipelinePosition : 1CommandOrigin : InternalErrorRecord :Exception :Type : System.Management.Automation.ParentContainsErrorRecordExceptionMessage : Parameter set cannot be resolved using the specified named parameters. 
One or more parameters issued cannot be used together or an insufficient number of parameters were provided.HResult : -2146233087CategoryInfo : InvalidArgument: (:) [Connect-PnPOnline], ParentContainsErrorRecordExceptionFullyQualifiedErrorId : AmbiguousParameterSet,PnP.PowerShell.Commands.Base.ConnectOnlineInvocationInfo :MyCommand : Connect-PnPOnlineScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1CommandOrigin : InternalScriptStackTrace : at <ScriptBlock>, C:\home\site\wwwroot\HttpTrigger1\run.ps1: line 13TargetSite :Name : ThrowAmbiguousParameterSetExceptionDeclaringType : System.Management.Automation.CmdletParameterBinderController, System.Management.Automation, Version=7.0.7.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35MemberType : MethodModule : System.Management.Automation.dllStackTrace :at System.Management.Automation.CmdletParameterBinderController.ThrowAmbiguousParameterSetException(UInt32 parameterSetFlags, MergedCommandParameterMetadata bindableParameters)at System.Management.Automation.CmdletParameterBinderController.ValidateParameterSets(Boolean prePipelineInput, Boolean setDefault)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParametersNoValidation(Collection`1 arguments)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParameters(Collection`1 arguments)at System.Management.Automation.CommandProcessor.BindCommandLineParameters()at System.Management.Automation.CommandProcessor.Prepare(IDictionary psDefaultParameterValues)at System.Management.Automation.CommandProcessorBase.DoPrepare(IDictionary psDefaultParameterValues)at System.Management.Automation.Internal.PipelineProcessor.Start(Boolean incomingStream)at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)--- End of stack trace from previous location where exception was thrown ---at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)at System.Management.Automation.PipelineOps.InvokePipeline(Object input, Boolean ignoreInput, CommandParameterInternal[][] pipeElements, CommandBaseAst[] pipeElementAsts, CommandRedirection[][] commandRedirections, FunctionContext funcContext)at System.Management.Automation.Interpreter.ActionCallInstruction`6.Run(InterpretedFrame frame)at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)Data : System.Collections.ListDictionaryInternalSource : System.Management.AutomationHResult : -2146233087CategoryInfo : InvalidArgument: (:) [Connect-PnPOnline], ParameterBindingExceptionFullyQualifiedErrorId : AmbiguousParameterSet,PnP.PowerShell.Commands.Base.ConnectOnlineInvocationInfo :MyCommand : Connect-PnPOnlineScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : 
C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1CommandOrigin : InternalScriptStackTrace : at <ScriptBlock>, C:\home\site\wwwroot\HttpTrigger1\run.ps1: line 13Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException : Result: ERROR: Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.Exception :Type : System.Management.Automation.ParameterBindingExceptionMessage : Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.ErrorId : AmbiguousParameterSetLine : 13Offset : 1CommandInvocation :MyCommand : Connect-PnPOnlineBoundParameters :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys :Length : 3Length : 15Values :Length : 30IsPresent : TrueSyncRoot :Comparer : System.OrdinalIgnoreCaseComparerCount : 2Keys : …Values : …SyncRoot : …ScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1InvocationName : Connect-PnPOnlinePipelineLength : 1PipelinePosition : 1CommandOrigin : InternalErrorRecord :Exception :Type : System.Management.Automation.ParentContainsErrorRecordExceptionMessage : Parameter set cannot be resolved using the specified named parameters. 
One or more parameters issued cannot be used together or an insufficient number of parameters were provided.HResult : -2146233087CategoryInfo : InvalidArgument: (:) [Connect-PnPOnline], ParentContainsErrorRecordExceptionFullyQualifiedErrorId : AmbiguousParameterSet,PnP.PowerShell.Commands.Base.ConnectOnlineInvocationInfo :MyCommand : Connect-PnPOnlineScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1CommandOrigin : InternalScriptStackTrace : at <ScriptBlock>, C:\home\site\wwwroot\HttpTrigger1\run.ps1: line 13TargetSite :Name : ThrowAmbiguousParameterSetExceptionDeclaringType : System.Management.Automation.CmdletParameterBinderController, System.Management.Automation, Version=7.0.7.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35MemberType : MethodModule : System.Management.Automation.dllStackTrace :at System.Management.Automation.CmdletParameterBinderController.ThrowAmbiguousParameterSetException(UInt32 parameterSetFlags, MergedCommandParameterMetadata bindableParameters)at System.Management.Automation.CmdletParameterBinderController.ValidateParameterSets(Boolean prePipelineInput, Boolean setDefault)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParametersNoValidation(Collection`1 arguments)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParameters(Collection`1 arguments)at System.Management.Automation.CommandProcessor.BindCommandLineParameters()at System.Management.Automation.CommandProcessor.Prepare(IDictionary psDefaultParameterValues)at System.Management.Automation.CommandProcessorBase.DoPrepare(IDictionary psDefaultParameterValues)at System.Management.Automation.Internal.PipelineProcessor.Start(Boolean incomingStream)at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)--- End of stack trace from previous location where exception was thrown ---at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)at System.Management.Automation.PipelineOps.InvokePipeline(Object input, Boolean ignoreInput, CommandParameterInternal[][] pipeElements, CommandBaseAst[] pipeElementAsts, CommandRedirection[][] commandRedirections, FunctionContext funcContext)at System.Management.Automation.Interpreter.ActionCallInstruction`6.Run(InterpretedFrame frame)at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)Data : System.Collections.ListDictionaryInternalSource : System.Management.AutomationHResult : -2146233087CategoryInfo : InvalidArgument: (:) [Connect-PnPOnline], ParameterBindingExceptionFullyQualifiedErrorId : AmbiguousParameterSet,PnP.PowerShell.Commands.Base.ConnectOnlineInvocationInfo :MyCommand : Connect-PnPOnlineScriptLineNumber : 13OffsetInLine : 1HistoryId : 1ScriptName : C:\home\site\wwwroot\HttpTrigger1\run.ps1Line : Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentityPositionMessage : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1+ Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~PSScriptRoot : 
C:\home\site\wwwroot\HttpTrigger1PSCommandPath : C:\home\site\wwwroot\HttpTrigger1\run.ps1CommandOrigin : InternalScriptStackTrace : at <ScriptBlock>, C:\home\site\wwwroot\HttpTrigger1\run.ps1: line 13Exception: Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.Stack: at System.Management.Automation.CmdletParameterBinderController.ThrowAmbiguousParameterSetException(UInt32 parameterSetFlags, MergedCommandParameterMetadata bindableParameters)at System.Management.Automation.CmdletParameterBinderController.ValidateParameterSets(Boolean prePipelineInput, Boolean setDefault)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParametersNoValidation(Collection`1 arguments)at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParameters(Collection`1 arguments)at System.Management.Automation.CommandProcessor.BindCommandLineParameters()at System.Management.Automation.CommandProcessor.Prepare(IDictionary psDefaultParameterValues)at System.Management.Automation.CommandProcessorBase.DoPrepare(IDictionary psDefaultParameterValues)at System.Management.Automation.Internal.PipelineProcessor.Start(Boolean incomingStream)at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)--- End of stack trace from previous location where exception was thrown ---at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)at System.Management.Automation.PipelineOps.InvokePipeline(Object input, Boolean ignoreInput, CommandParameterInternal[][] pipeElements, CommandBaseAst[] pipeElementAsts, CommandRedirection[][] commandRedirections, FunctionContext funcContext)at System.Management.Automation.Interpreter.ActionCallInstruction`6.Run(InterpretedFrame frame)at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
2022-01-03T16:58:43.540 [Warning] WARNING: Connesso
2022-01-03T16:58:43.657 [Error] ERROR: There is currently no connection yet. Use Connect-PnPOnline to connect.
There is no PowerShell command available for backup and restore in SharePoint Online. Alternatives:
Use the recycle bin and version history.
Use a third-party tool for backup and restore, such as backing up a SharePoint farm to Azure with MABS.
Manually back up sites, lists, and libraries (see the information about manual migration of SharePoint Online content).
Create a Microsoft 365 support request (see the restore options in SharePoint Online).
Backup solutions for SharePoint Online are linked here.
You can back up SharePoint Online to Azure with MABS; refer to the linked MABS documentation.

Problem writing from Spark Structured Streaming to an Oracle table

I read files in a directory with readStream and process them; at the end I have a DataFrame that I want to write to an Oracle table. I use the JDBC driver and the foreachBatch() API for that. Here is my code:
def SaveToOracle(df, epoch_id):
    try:
        df.write.format('jdbc').options(
            url='jdbc:oracle:thin:@192.168.49.8:1521:ORCL',
            driver='oracle.jdbc.driver.OracleDriver',
            dbtable='spark.result_table',
            user='spark',
            password='spark').mode('append').save()
    except Exception as e:
        response = e.__str__()
        print(response)

streamingQuery = (summaryDF4.writeStream
    .outputMode("append")
    .foreachBatch(SaveToOracle)
    .start()
)
The job fails without any error and stops right after the streaming query starts. The console log looks like this:
2021-08-11 10:45:11,003 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2021-08-11 10:45:11,003 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2021-08-11 10:45:11,007 INFO streaming.MicroBatchExecution: Starting new streaming query.
2021-08-11 10:45:11,009 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2021-08-11 10:45:11,011 INFO streaming.MicroBatchExecution: Stream started from {}
2021-08-11 10:45:11,021 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-08-11 10:45:11,034 INFO memory.MemoryStore: MemoryStore cleared
2021-08-11 10:45:11,034 INFO storage.BlockManager: BlockManager stopped
2021-08-11 10:45:11,042 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-08-11 10:45:11,046 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-08-11 10:45:11,053 INFO spark.SparkContext: Successfully stopped SparkContext
2021-08-11 10:45:11,056 INFO util.ShutdownHookManager: Shutdown hook called
2021-08-11 10:45:11,056 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bf5c7539-9d1f-4c9d-af46-0c0874a81a40/pyspark-7416fc8a-18bd-4e79-aa0f-ea673e7c5cd8
2021-08-11 10:45:11,060 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-47c28d1d-236c-4b64-bc66-d07a918abe01
2021-08-11 10:45:11,063 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bf5c7539-9d1f-4c9d-af46-0c0874a81a40
2021-08-11 10:45:11,065 INFO util.ShutdownHookManager: Deleting directory /tmp/temporary-f9420356-164a-4806-abb2-f132b8026b20
What is the problem, and how can I get a proper log?
This is my SparkSession configuration:
conf = SparkConf()
conf.set("spark.jars", "/home/hadoop/ojdbc6.jar")

spark = (SparkSession
    .builder
    .config(conf=conf)
    .master("yarn")
    .appName("Test010")
    .getOrCreate()
)
Update:
I get an error on the JDBC save(); here it is:
An error occurred while calling o379.save.
: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:102)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcOptionsInWrite.<init>(JDBCOptions.scala:217)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcOptionsInWrite.<init>(JDBCOptions.scala:221)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
You need to call the awaitTermination() method on streamingQuery after the start() call, like this:
streamingQuery = (summaryDF4.writeStream
    .outputMode("append")
    .foreachBatch(SaveToOracle)
    .start()
    .awaitTermination())
Thanks for your answer; it keeps the streaming job alive, but it still doesn't work. Here is the streaming progress output:
{
"id" : "3dc2a37f-4a7a-49b9-aa83-bff526aa14c5",
"runId" : "e8864901-2729-41c5-b7e4-a19ac2478f1c",
"name" : null,
"timestamp" : "2021-08-11T06:51:00.000Z",
"batchId" : 2,
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"addBatch" : 240,
"getBatch" : 66,
"latestOffset" : 21,
"queryPlanning" : 143,
"triggerExecution" : 514,
"walCommit" : 23
},
"eventTime" : {
"watermark" : "1970-01-01T00:00:00.000Z"
},
"stateOperators" : [ {
"numRowsTotal" : 0,
"numRowsUpdated" : 0,
"memoryUsedBytes" : -1,
"numRowsDroppedByWatermark" : 0,
"customMetrics" : {
"loadedMapCacheHitCount" : 0,
"loadedMapCacheMissCount" : 0,
"stateOnCurrentVersionSizeBytes" : -1
}
} ],
"sources" : [ {
"description" : "FileStreamSource[hdfs://192.168.49.13:9000/input/lz]",
"startOffset" : {
"logOffset" : 1
},
"endOffset" : {
"logOffset" : 2
},
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "ForeachBatchSink",
"numOutputRows" : -1
}
}
If I use the console sink, the process runs without any problem and the output is correct, but writing to Oracle is where I have the problem.

Though I have setMaster as local, my Spark application gives an error

I have the following application (I start and stop Spark) on Windows, using Scala IDE (Eclipse) and Spark 2.4.4. I get the "A master URL must be set in your configuration" error even though I have set it here.
Can someone please help me fix this issue?
import org.apache.spark._
import org.apache.spark.sql._

object SampleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)
    sc.stop()
  }
}
The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/28 22:58:56 INFO SparkContext: Running Spark version 2.4.4
19/10/28 22:58:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/28 22:58:56 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.SparkContext.postApplicationEnd(SparkContext.scala:2416)
at org.apache.spark.SparkContext.$anonfun$stop$2(SparkContext.scala:1931)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1931)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:585)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
The stack trace shows the error comes from SparkSession$Builder.getOrCreate (SampleApp.scala:8), so the code that actually ran appears to build a SparkSession rather than the SparkContext shown above; the master needs to be set on that builder. If you are using version 2.4.4, try this:
import org.apache.spark.sql.SparkSession

object SampleApp {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("test")
      .getOrCreate()

    println(spark.sparkContext.version)
    spark.stop()
  }
}

ClassCastException when using the Spark Dataset API + case class + Spark Job Server

I'm getting a weird error: whenever I re-create (delete and create) the Spark SQL context and run the job a second time or later, it always throws this exception.
[2016-09-20 13:52:28,743] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/ctx] - Exception from job 23fe1335-55ec-47b2-afd3-07396483eae0:
java.lang.RuntimeException: Error while encoding: java.lang.ClassCastException: org.lala.Country cannot be cast to org.lala.Country
staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true) AS code#10
+- staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true)
+- invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String))
+- input[0, ObjectType(class org.lala.Country)]
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:220)
at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:504)
at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:141)
at org.lala.HelloJob$.runJob(HelloJob.scala:18)
at org.lala.HelloJob$.runJob(HelloJob.scala:13)
at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)
My Spark class:
case class Country(code: String)

object TestJob extends SparkSqlJob {
  override def runJob(sc: SQLContext, jobConfig: Config): Any = {
    import sc.implicits._
    val country = List(Country("A"), Country("B"))
    val countryDS = country.toDS()
    countryDS.collect().foreach(println)
  }

  override def validate(sc: SQLContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }
}
I'm using:
Spark 1.6.1
Spark Job Server 0.6.2 (docker)
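For comparison, here is a minimal standalone sketch (Spark 1.6 API, local mode, no job server) of the same Dataset round-trip. If this runs repeatedly without problems while the job-server job only fails after the context has been deleted and re-created, that would suggest the re-created context sees the Country case class through a different classloader, rather than a problem in the Dataset code itself.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Country(code: String)

object CountryDatasetCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("country-ds-check").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Same round-trip as in the job-server job
    val countryDS = List(Country("A"), Country("B")).toDS()
    countryDS.collect().foreach(println)

    sc.stop()
  }
}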
