I’m sorry if I am writing in the wrong topic, but I have the following issue:
I have created a Create Spark Context (Livy) node and I am trying to connect it to an HDFS cluster managed by Cloudera.
I have configured the Spark Job Server and Livy URL settings (I hope correctly). When I try to execute the node, it creates a Livy session (I checked in YARN) and allocates the resources configured in the node, but after that I get the following error:
“ERROR Create Spark Context (Livy) 3:30 Execute failed: Broken pipe (Write failed) (SocketException)”
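(For reference, a small standalone check of the Livy REST endpoint can be run outside of KNIME to see whether session creation works over the same network path. This is only a sketch: it assumes the Python requests and requests-kerberos packages and a valid Kerberos ticket on the machine that runs KNIME.)

import requests
from requests_kerberos import HTTPKerberosAuth

# same URL as in the Create Spark Context (Livy) node settings
livy_url = "https://cm-master1-all-prod.emag.network:8998"

# authenticate with the Kerberos ticket of the current user (kinit must have been run)
auth = HTTPKerberosAuth()

# ask Livy to create a Spark session, then print the reply
resp = requests.post(livy_url + "/sessions",
                     json={"kind": "spark"},
                     auth=auth,
                     verify=True)  # point verify at a CA bundle if the cluster uses a private CA
print(resp.status_code, resp.json())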
Here are some YARN logs:
Yarn Logs
Here are the KNIME logs:
2021-12-22 18:02:49,579 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Creating new remote Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb at https://cm-master1-all-prod.emag.network:8998 with authentication KERBEROS.
2021-12-22 18:02:49,585 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from CONFIGURED to OPEN
2021-12-22 18:03:21,097 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading Kryo version detector job jar.
2021-12-22 18:03:21,801 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Running Kryo version detector job jar.
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Using Kryo serializer version: kryo2
2021-12-22 18:03:22,400 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Uploading job jar: /var/folders/13/8vjx3pj137l2qrh_xwpnqwqc0000gp/T/sparkClasses16857522895332955503.jar
2021-12-22 18:03:22,423 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : LivySparkContext : Create Spark Context (Livy) : 3:30 : Destroying Livy Spark context
2021-12-22 18:03:23,150 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb changed status from OPEN to CONFIGURED
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,151 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,152 : ERROR : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Execute failed: Broken pipe (Write failed) (SocketException)
java.util.concurrent.ExecutionException: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.waitForFuture(LivySparkContext.java:492)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.uploadJobJar(LivySparkContext.java:464)
at org.knime.bigdata.spark.core.livy.context.LivySparkContext.open(LivySparkContext.java:327)
at org.knime.bigdata.spark.core.context.SparkContext.ensureOpened(SparkContext.java:145)
at org.knime.bigdata.spark.core.livy.node.create.LivySparkContextCreatorNodeModel2.executeInternal(LivySparkContextCreatorNodeModel2.java:85)
at org.knime.bigdata.spark.core.node.SparkNodeModel.execute(SparkNodeModel.java:240)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:549)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1267)
at org.knime.core.node.Node.execute(Node.java:1041)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:201)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:117)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:365)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:219)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.base/java.net.SocketOutputStream.write(Unknown Source)
at java.base/sun.security.ssl.SSLSocketOutputRecord.deliver(Unknown Source)
at java.base/sun.security.ssl.SSLSocketImpl$AppOutputStream.write(Unknown Source)
at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
at org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
at org.apache.http.entity.mime.content.FileBody.writeTo(FileBody.java:121)
at org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
at org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
at org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.livy.client.http.LivyConnection.executeRequest(LivyConnection.java:292)
at org.apache.livy.client.http.LivyConnection.access$000(LivyConnection.java:68)
at org.apache.livy.client.http.LivyConnection$3.run(LivyConnection.java:277)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.livy.client.http.LivyConnection.sendRequest(LivyConnection.java:274)
at org.apache.livy.client.http.LivyConnection.post(LivyConnection.java:228)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:256)
at org.apache.livy.client.http.HttpClient$3.call(HttpClient.java:253)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2021-12-22 18:03:23,152 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://dfe62d7e-a250-41a3-9601-1bac31379ffb
2021-12-22 18:03:23,153 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doBeforePostExecution
2021-12-22 18:03:23,154 : INFO : pool-7-thread-1 : : LivySparkContext : : : Destroying Livy Spark context
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: POSTEXECUTE
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowManager : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 doAfterExecute - failure
2021-12-22 18:03:23,155 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Parquet to Spark 3:32 has new state: CONFIGURED
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Spark to Parquet 3:37 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : reset
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkNodeModel : Create Spark Context (Livy) : 3:30 : In reset() of SparkNodeModel. Calling deleteSparkDataObjects.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : clean output ports.
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : WorkflowDataRepository : Create Spark Context (Livy) : 3:30 : Removing handler 9c1d8004-908d-43fe-8347-45ba78bd7c58 (Create Spark Context (Livy) 3:30: ) - 5 remaining
2021-12-22 18:03:23,156 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: IDLE
2021-12-22 18:03:23,156 : DEBUG : pool-7-thread-1 : : DestroyAndDisposeSparkContextTask : : : Destroying and disposing Spark context: sparkLivy://bf46d062-17db-4ec1-8d06-28b53da7c624
2021-12-22 18:03:23,157 : INFO : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : SparkContext : Create Spark Context (Livy) : 3:30 : Spark context sparkLivy://f5354b5c-b44d-4460-b270-17f0ea238f4b changed status from NEW to CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Create Spark Context (Livy) : 3:30 : Configure succeeded. (Create Spark Context (Livy))
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : Create Spark Context (Livy) 3:30 has new state: CONFIGURED
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : Node : Parquet to Spark : 3:32 : Configure succeeded. (Parquet to Spark)
2021-12-22 18:03:23,157 : DEBUG : KNIME-Worker-78-Create Spark Context (Livy) 3:30 : : NodeContainer : Create Spark Context (Livy) : 3:30 : HDFS 3 has new state: IDLE
2021-12-22 18:03:42,421 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:03:42,424 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Parquet to Spark 3:32 (CONFIGURED)
2021-12-22 18:03:44,073 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:19,830 : INFO : main : : SparkContext : : : Spark context jobserver://cm-master3-all-prod.emag.network:8090/knimeSparkContext changed status from NEW to CONFIGURED
2021-12-22 18:53:23,611 : DEBUG : main : : NodeContainerEditPart : : : Spark to Parquet 3:37 (IDLE)
2021-12-22 18:53:23,613 : DEBUG : main : : NodeContainerEditPart : : : Create Spark Context (Livy) 3:30 (CONFIGURED)
Please let me know if any more info is needed.
Thank you in advance,
Andrei
I'm using Spark 3.1.2, Scala 2.12, Hadoop 3.1.1.3.1.2-50, Elasticsearch 7.10.1 (due to license issues) and CentOS 7
to try to ingest JSON data from gzip files located on HDFS into Elasticsearch using Spark Structured Streaming.
I get the following error:
Logical Plan:
FileStreamSource[hdfs://pct/user/papago-mlops-datalake/raw/mt-log/engine=n2mt/year=2022/date=0430/hour=00]
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:356)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:244)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(Lorg/apache/spark/sql/SparkSession;Lorg/apache/spark/sql/execution/QueryExecution;Lscala/Function0;)Ljava/lang/Object;
at org.elasticsearch.spark.sql.streaming.EsSparkSqlStreamingSink.addBatch(EsSparkSqlStreamingSink.scala:62)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$16(MicroBatchExecution.scala:586)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runBatch$15(MicroBatchExecution.scala:584)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runBatch(MicroBatchExecution.scala:584)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:226)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:357)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:355)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:68)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:194)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:188)
at org.apache.spark.sql.execution.streaming.StreamExecution.$anonfun$runStream$1(StreamExecution.scala:334)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:317)
... 1 more
ApplicationMaster host: ac3m8x2183.bdp.bdata.ai
ApplicationMaster RPC port: 39673
queue: batch
start time: 1654588583366
final status: FAILED
tracking URL: https://gemini-rm2.bdp.bdata.ai:9090/proxy/application_1654575947385_29572/
user: papago-mlops-datalake
Exception in thread "main" org.apache.spark.SparkException: Application application_1654575947385_29572 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1269)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1627)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I am using the following libraries:
implementation("org.elasticsearch:elasticsearch-hadoop:8.2.2")
implementation("com.typesafe:config:1.4.2")
implementation("org.apache.spark:spark-sql_2.12:3.1.2")
testImplementation("org.scalatest:scalatest_2.12:3.2.12")
testRuntimeOnly("com.vladsch.flexmark:flexmark-all:0.61.0")
compileOnly("org.apache.spark:spark-sql_2.12:3.1.2")
compileOnly("org.apache.spark:spark-core_2.12:3.1.2")
compileOnly("org.apache.spark:spark-launcher_2.12:3.1.2")
compileOnly("org.apache.spark:spark-streaming_2.12:3.1.2")
compileOnly("org.elasticsearch:elasticsearch-spark-30_2.12:8.2.2")
I tried using ES-Hadoop version 7.10.1, but the ES-Spark connector for Spark 3.0 only goes down to version 7.12.0, and I still get the same error.
My code is pretty simple
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.elasticsearch.hadoop.cfg.ConfigurationOptions

// elasticsearchUser/Pass/Host/Port, appName, master, jsonSchema, pathToJSONResource,
// outputMode, destination, checkpointLocation and indexAndDocType are defined elsewhere in the project
def main(args: Array[String]): Unit = {
  // Set the log level to only print errors
  Logger.getLogger("org").setLevel(Level.ERROR)

  val spark = SparkSession
    .builder()
    .config(ConfigurationOptions.ES_NET_HTTP_AUTH_USER, elasticsearchUser)
    .config(ConfigurationOptions.ES_NET_HTTP_AUTH_PASS, elasticsearchPass)
    .config(ConfigurationOptions.ES_NODES, elasticsearchHost)
    .config(ConfigurationOptions.ES_PORT, elasticsearchPort)
    .appName(appName)
    .master(master)
    .getOrCreate()

  val streamingDF: DataFrame = spark.readStream
    .schema(jsonSchema)
    .format("org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
    .load(pathToJSONResource)

  streamingDF.writeStream
    .outputMode(outputMode)
    .format(destination)
    .option("checkpointLocation", checkpointLocation)
    .start(indexAndDocType)
    .awaitTermination()

  // Stop the session
  spark.stop()
}
If I can't use the ES-Hadoop libraries is there another way I can go about ingesting JSON into ES from HDFS?
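One thing that may be worth checking (this is an assumption based on the NoSuchMethodError, not something shown in the logs): org.elasticsearch:elasticsearch-hadoop bundles Spark bindings built against older Spark APIs, and here it is declared as implementation while the Spark 3 connector (elasticsearch-spark-30_2.12) is only compileOnly, so at runtime the streaming sink may be resolved from the wrong jar. A sketch of a dependency block that keeps only the Spark 3 connector on the runtime classpath could look like this:

dependencies {
    // ship only the Spark 3 flavour of the connector so the older bindings
    // inside the generic elasticsearch-hadoop jar cannot end up on the classpath
    implementation("org.elasticsearch:elasticsearch-spark-30_2.12:8.2.2")
    implementation("com.typesafe:config:1.4.2")

    // Spark itself is provided by the cluster
    compileOnly("org.apache.spark:spark-core_2.12:3.1.2")
    compileOnly("org.apache.spark:spark-sql_2.12:3.1.2")
    compileOnly("org.apache.spark:spark-streaming_2.12:3.1.2")

    testImplementation("org.scalatest:scalatest_2.12:3.2.12")
    testRuntimeOnly("com.vladsch.flexmark:flexmark-all:0.61.0")
}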
Is there a way to generate a SharePoint Online backup using Azure Functions?
I have already authenticated to the tenant using a managed identity.
The idea would be to download all the files in the document library and upload them to an Azure Storage account.
I used this code but I get an error:
using namespace System.Net

param($Request, $TriggerMetadata)

$TenantSiteURL   = "https://tenant.sharepoint.com"
$SiteRelativeURL = "/sites/BackupSource"
$LibraryName     = "Documenti Condivisi"
$DownloadPath    = "\Temp\Docs"

#Connect-PnPOnline -ManagedIdentity
Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity
Write-Warning "Connesso"

#Set-Location -Path SPO:\$SiteRelativeURL
Get-PnPMicrosoft365Group

Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = [HttpStatusCode]::OK
})
2022-01-03T16:58:32.983 [Information] Executing 'Functions.HttpTrigger1' (Reason='This function was programmatically called via the host APIs.', Id=09c1fe95-6b6e-4e85-8c8f-815e37e99d04)
2022-01-03T16:58:40.915 [Information] OUTPUT:
2022-01-03T16:58:41.541 [Information] OUTPUT: Account SubscriptionName TenantId Environment
2022-01-03T16:58:41.542 [Information] OUTPUT: ------- ---------------- -------- -----------
2022-01-03T16:58:41.549 [Information] OUTPUT: MSI#50342 --- AzureCloud
2022-01-03T16:58:41.549 [Information] OUTPUT:
2022-01-03T16:58:43.530 [Error] ERROR: Parameter set cannot be resolved using the specified named parameters. One or more parameters issued cannot be used together or an insufficient number of parameters were provided.
Exception             : System.Management.Automation.ParameterBindingException
ErrorId               : AmbiguousParameterSet
FullyQualifiedErrorId : AmbiguousParameterSet,PnP.PowerShell.Commands.Base.ConnectOnline
ScriptName            : C:\home\site\wwwroot\HttpTrigger1\run.ps1
PositionMessage       : At C:\home\site\wwwroot\HttpTrigger1\run.ps1:13 char:1
                        + Connect-PnPOnline -Url $TenantSiteURL -ManagedIdentity
                        + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
StackTrace            :
  at System.Management.Automation.CmdletParameterBinderController.ThrowAmbiguousParameterSetException(UInt32 parameterSetFlags, MergedCommandParameterMetadata bindableParameters)
  at System.Management.Automation.CmdletParameterBinderController.ValidateParameterSets(Boolean prePipelineInput, Boolean setDefault)
  at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParametersNoValidation(Collection`1 arguments)
  at System.Management.Automation.CmdletParameterBinderController.BindCommandLineParameters(Collection`1 arguments)
  at System.Management.Automation.CommandProcessor.BindCommandLineParameters()
  at System.Management.Automation.CommandProcessor.Prepare(IDictionary psDefaultParameterValues)
  at System.Management.Automation.CommandProcessorBase.DoPrepare(IDictionary psDefaultParameterValues)
  at System.Management.Automation.Internal.PipelineProcessor.Start(Boolean incomingStream)
  at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)
  --- End of stack trace from previous location where exception was thrown ---
  at System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate(Object input)
  at System.Management.Automation.PipelineOps.InvokePipeline(Object input, Boolean ignoreInput, CommandParameterInternal[][] pipeElements, CommandBaseAst[] pipeElementAsts, CommandRedirection[][] commandRedirections, FunctionContext funcContext)
  at System.Management.Automation.Interpreter.ActionCallInstruction`6.Run(InterpretedFrame frame)
  at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
ScriptStackTrace      : at <ScriptBlock>, C:\home\site\wwwroot\HttpTrigger1\run.ps1: line 13
(The same ParameterBindingException is then repeated verbatim inside the Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException that the Functions host wraps around the script failure.)
2022-01-03T16:58:43.540 [Warning] WARNING: Connesso
2022-01-03T16:58:43.657 [Error] ERROR: There is currently no connection yet. Use Connect-PnPOnline to connect.
There is no PowerShell command available for backup and restore in SharePoint Online. Your options are:
Use the recycle bin and version history.
Use a 3rd-party tool for backup and restore, such as backing up a SharePoint farm to Azure with MABS.
Manually back up sites, lists, and libraries (information about manual migration of SharePoint Online content).
Create a Microsoft 365 support request (restore options in SharePoint Online).
Backup solutions for SharePoint Online: here.
You can back up SharePoint Online to Azure with MABS. Refer here.
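If the goal is only the file copy described in the question (download everything from the library and push it to a storage account), rather than a true backup, a rough sketch with PnP.PowerShell plus Az.Storage might look like the following. This is only an illustration: it assumes Connect-PnPOnline with the managed identity succeeds, that the Az modules are loaded in the Function app, and the storage account name ("mybackupstorage") and container name ("spo-backup") are placeholders.

Connect-PnPOnline -Url ($TenantSiteURL + $SiteRelativeURL) -ManagedIdentity

# make sure the local staging folder exists
New-Item -ItemType Directory -Force -Path $DownloadPath | Out-Null

# enumerate all files in the library (FSObjType 0 = file, 1 = folder)
$items = Get-PnPListItem -List $LibraryName -PageSize 500 |
    Where-Object { $_.FieldValues.FSObjType -eq 0 }

# storage context that reuses the already-authenticated Azure identity
$ctx = New-AzStorageContext -StorageAccountName "mybackupstorage" -UseConnectedAccount

foreach ($item in $items) {
    $name = $item.FieldValues.FileLeafRef
    # download the file to local temp storage, then upload it as a blob
    Get-PnPFile -Url $item.FieldValues.FileRef -Path $DownloadPath -FileName $name -AsFile -Force
    Set-AzStorageBlobContent -File (Join-Path $DownloadPath $name) `
        -Container "spo-backup" -Blob $name -Context $ctx -Force
}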
I read files from a directory with readStream and process them; at the end I have a DataFrame that I want to write to an Oracle table. I use the JDBC driver and the foreachBatch() API to do that. Here is my code:
def SaveToOracle(df, epoch_id):
    try:
        df.write.format('jdbc').options(
            url='jdbc:oracle:thin:@192.168.49.8:1521:ORCL',
            driver='oracle.jdbc.driver.OracleDriver',
            dbtable='spark.result_table',
            user='spark',
            password='spark').mode('append').save()
    except Exception as e:
        response = str(e)
        print(response)

streamingQuery = (summaryDF4.writeStream
                  .outputMode("append")
                  .foreachBatch(SaveToOracle)
                  .start()
                  )
The job fails without any error and stops right after the streaming query starts. The console log looks like this:
2021-08-11 10:45:11,003 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
2021-08-11 10:45:11,003 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2021-08-11 10:45:11,007 INFO streaming.MicroBatchExecution: Starting new streaming query.
2021-08-11 10:45:11,009 INFO cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2021-08-11 10:45:11,011 INFO streaming.MicroBatchExecution: Stream started from {}
2021-08-11 10:45:11,021 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-08-11 10:45:11,034 INFO memory.MemoryStore: MemoryStore cleared
2021-08-11 10:45:11,034 INFO storage.BlockManager: BlockManager stopped
2021-08-11 10:45:11,042 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-08-11 10:45:11,046 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-08-11 10:45:11,053 INFO spark.SparkContext: Successfully stopped SparkContext
2021-08-11 10:45:11,056 INFO util.ShutdownHookManager: Shutdown hook called
2021-08-11 10:45:11,056 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bf5c7539-9d1f-4c9d-af46-0c0874a81a40/pyspark-7416fc8a-18bd-4e79-aa0f-ea673e7c5cd8
2021-08-11 10:45:11,060 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-47c28d1d-236c-4b64-bc66-d07a918abe01
2021-08-11 10:45:11,063 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bf5c7539-9d1f-4c9d-af46-0c0874a81a40
2021-08-11 10:45:11,065 INFO util.ShutdownHookManager: Deleting directory /tmp/temporary-f9420356-164a-4806-abb2-f132b8026b20
What is the problem, and how can I get a proper log?
This is my SparkSession configuration:
conf = SparkConf()
conf.set("spark.jars", "/home/hadoop/ojdbc6.jar")

spark = (SparkSession
         .builder
         .config(conf=conf)
         .master("yarn")
         .appName("Test010")
         .getOrCreate()
         )
Update:
I get an error on the JDBC save(); here it is:
An error occurred while calling o379.save.
: java.lang.ClassNotFoundException: oracle.jdbc.driver.OracleDriver
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:46)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$1$adapted(JDBCOptions.scala:102)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:102)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcOptionsInWrite.<init>(JDBCOptions.scala:217)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcOptionsInWrite.<init>(JDBCOptions.scala:221)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:45)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
You need to call the awaitTermination() method on streamingQuery after the start() call, like this:
streamingQuery = (summaryDF4.writeStream
                  .outputMode("append")
                  .foreachBatch(SaveToOracle)
                  .start()
                  .awaitTermination())
Thanks for your answer. It keeps the streaming job alive, but it still doesn't work:
{
"id" : "3dc2a37f-4a7a-49b9-aa83-bff526aa14c5",
"runId" : "e8864901-2729-41c5-b7e4-a19ac2478f1c",
"name" : null,
"timestamp" : "2021-08-11T06:51:00.000Z",
"batchId" : 2,
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"addBatch" : 240,
"getBatch" : 66,
"latestOffset" : 21,
"queryPlanning" : 143,
"triggerExecution" : 514,
"walCommit" : 23
},
"eventTime" : {
"watermark" : "1970-01-01T00:00:00.000Z"
},
"stateOperators" : [ {
"numRowsTotal" : 0,
"numRowsUpdated" : 0,
"memoryUsedBytes" : -1,
"numRowsDroppedByWatermark" : 0,
"customMetrics" : {
"loadedMapCacheHitCount" : 0,
"loadedMapCacheMissCount" : 0,
"stateOnCurrentVersionSizeBytes" : -1
}
} ],
"sources" : [ {
"description" : "FileStreamSource[hdfs://192.168.49.13:9000/input/lz]",
"startOffset" : {
"logOffset" : 1
},
"endOffset" : {
"logOffset" : 2
},
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "ForeachBatchSink",
"numOutputRows" : -1
}
}
If I use the console sink, my process doesn't have any problem and the output is correct, but when writing to Oracle I have this problem.
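Regarding the ClassNotFoundException on oracle.jdbc.driver.OracleDriver: setting spark.jars from inside the script happens after the driver JVM is already running, so the jar reaches the executors but not necessarily the driver's own classpath, and the JDBC save() registers the driver class on the driver side. A possible fix (only a sketch; the script name is a placeholder) is to hand the jar to spark-submit at launch time instead:

spark-submit \
  --master yarn \
  --jars /home/hadoop/ojdbc6.jar \
  --driver-class-path /home/hadoop/ojdbc6.jar \
  your_script.py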
I have the following application (I am just starting and stopping Spark) on Windows. I use Scala IDE (Eclipse) and Spark 2.4.4. I get the "A master URL must be set in your configuration" error even though I have set it here.
Can someone please help me fix this issue?
import org.apache.spark._
import org.apache.spark.sql._

object SampleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setMaster("local[*]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)
    sc.stop()
  }
}
The error is:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/10/28 22:58:56 INFO SparkContext: Running Spark version 2.4.4
19/10/28 22:58:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/10/28 22:58:56 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 ERROR Utils: Uncaught exception in thread main
java.lang.NullPointerException
at org.apache.spark.SparkContext.postApplicationEnd(SparkContext.scala:2416)
at org.apache.spark.SparkContext.$anonfun$stop$2(SparkContext.scala:1931)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1931)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:585)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
19/10/28 22:58:56 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.spark.renga.SampleApp$.main(SampleApp.scala:8)
at com.spark.renga.SampleApp.main(SampleApp.scala)
If you are using version 2.4.4, try this:
import org.apache.spark.sql.SparkSession

object SampleApp {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("test")
      .getOrCreate()

    println(spark.sparkContext.version)
    spark.stop()
  }
}
(Note that your stack trace shows SparkSession$Builder.getOrCreate being called from SampleApp.scala:8, so the code that actually ran appears to have built a SparkSession without a master set, rather than the SparkConf/SparkContext shown in the question.)
I'm getting a weird error whenever I re-create (delete and create) the Spark SQL context: when I run the job a 2nd time or later, it always throws this exception.
[2016-09-20 13:52:28,743] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/ctx] - Exception from job 23fe1335-55ec-47b2-afd3-07396483eae0:
java.lang.RuntimeException: Error while encoding: java.lang.ClassCastException: org.lala.Country cannot be cast to org.lala.Country
staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true) AS code#10
+- staticinvoke(class org.apache.spark.unsafe.types.UTF8String,StringType,fromString,invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String)),true)
+- invoke(input[0, ObjectType(class org.lala.Country)],code,ObjectType(class java.lang.String))
+- input[0, ObjectType(class org.lala.Country)]
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:220)
at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
at org.apache.spark.sql.SQLContext$$anonfun$8.apply(SQLContext.scala:504)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:504)
at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:141)
at org.lala.HelloJob$.runJob(HelloJob.scala:18)
at org.lala.HelloJob$.runJob(HelloJob.scala:13)
at spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301)
My Spark class:
case class Country(code: String)

object TestJob extends SparkSqlJob {
  override def runJob(sc: SQLContext, jobConfig: Config): Any = {
    import sc.implicits._
    val country = List(Country("A"), Country("B"))
    val countryDS = country.toDS()
    countryDS.collect().foreach(println)
  }

  override def validate(sc: SQLContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }
}
I'm using:
Spark 1.6.1
Spark Job Server 0.6.2 (docker)