I'm trying to install libraries using this api: https:///api/2.0/libraries/install/
Here is the body:
{
"cluster_id": "<CLUSTER_ID>",
"libraries": [
{
"pypi": {
"package": "cryptography==3.4.7"
}
}
]
}
And I'm getting the below response while my clusterId is valid and it's running
{
"error_code": "INVALID_PARAMETER_VALUE",
"message": "Cluster <CLUSTER_ID> is terminated or does not exist"
}
Recently I have encountered same issue. This happened when I have 2 clusters with same name, but one is active, and the other is terminated. For some reason it is pulling the terminated cluster when used.
I have removed the cluster which is terminated and now it is working fine.
Related
I have a spark cluster deployed on windows. I'm trying to submit a simple spark job using the rest api. The job is just python code that does simple hello world sentence as follows :
from pyspark.sql import SparkSession
def main(args):
print('hello world')
return 0
if __name__ == '__main__':
main(None)
The url Im using to submit the job is:
http://:6066/v1/submissions/create
With the following Post Body:
{
"appResource": "file:../../helloworld.py",
"sparkProperties": {
"spark.executor.memory": "2g",
"spark.master": "spark://<Master IP>:7077",
"spark.app.name": "Spark REST API - Hello world",
"spark.driver.memory": "2g",
"spark.eventLog.enabled": "false",
"spark.driver.cores": "2",
"spark.submit.deployMode": "cluster",
"spark.driver.supervise": "true"
},
"clientSparkVersion": "3.3.1",
"mainClass": "org.apache.spark.deploy.SparkSubmit",
"environmentVariables": {
"SPARK_ENV_LOADED": "1"
},
"action": "CreateSubmissionRequest",
"appArgs": [
"../../helloworld.py", "80"
]
}
After I run this post using postmant, I get the following response:
`
{
"action": "CreateSubmissionResponse",
"message": "Driver successfully submitted as driver-20221216112633-0005",
"serverSparkVersion": "3.3.1",
"submissionId": "driver-20221216112633-0005",
"success": true
}
`
However when I try to get the job status using :
http://:6066/v1/submissions/status/driver-20221216112633-0005
I get the driverState: ERROR , NullPointerException as follows:
`
{
"action": "SubmissionStatusResponse",
"driverState": "ERROR",
"message": "Exception from the cluster:\njava.lang.NullPointerException\n\torg.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:158)\n\torg.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)\n\torg.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)",
"serverSparkVersion": "3.3.1",
"submissionId": "driver-20221216112633-0005",
"success": true,
"workerHostPort": "10.9.8.120:56060",
"workerId": "worker-20221216093629-<IP>-56060"
}
`
Not sure why Im getting this error and what it means. Can someone please point me in the right direction or help me at least how I can trouble this farther? Thanks
I tried to submit the API request using Postman and I was expecting a FINISHED or RUNNING state since its just simple hello application
EDIT 2022-01-12:
I was finally able to figure out the problem. To resolve this issue basically it seems like the py\jar file as specified in the "appResource" & ""spark.jars" needs to be accessible by all nodes in the cluster, for example if you have network path you can specify the network path in both attributes as follows:
"appResource": "file:////Servername/somefolder/HelloWorld.jar",
...
"spark.jars": "file:////Servername/someFolder/HelloWorld.jar",
...
Hope that helps and save them the time and agony I had to go through.
If anybody knows why the py\jar file has to be accessible by all nodes even though we are submitting the job to the master please help me understand. Thanks
You should set the property deployMode to client since you are launching the dirver locally:
"spark.submit.deployMode": "client",
i was able to run successfully local Eclipse Ditto version using docker container latest version downloaded from https://github.com/eclipse/ditto/tree/master/deployment/docker.
Following the tutorial I was training first to create a new policy using the following curl:
curl -X PUT 'http://localhost:8080/api/2/policies/my.test:policy' -u 'ditto:ditto' -H 'Content-Type: application/json' -d '{
"entries": {
"owner": {
"subjects": {
"nginx:ditto": {
"type": "nginx basic auth user"
}
},
"resources": {
"thing:/": {
"grant": [
"READ","WRITE"
],
"revoke": []
},
"policy:/": {
"grant": [
"READ","WRITE"
],
"revoke": []
},
"message:/": {
"grant": [
"READ","WRITE"
],
"revoke": []
}
}
}
}
}'
401 - Authentication is possible but has failed or not yet been provided, the same one I'm getting from the local swagger.
Trying to create it on sandbox: https://www.eclipse.org/ditto/http-api-doc.html#/
I'm getting: Undocumented
TypeError: NetworkError when attempting to fetch resource.
What I'm missing? i choose the API version 2 and authorize myself as a ditto user to start working.
Is there any additional configuration required to start working with the local version?
And what I'm doing wrong with the sandbox?
I am able to use the below JSON through POSTMAN to run my Databricks notebook.
I want to be able to give a name to the cluster that is created through the "new_cluster" options.
Is there any such option available?
{
"tasks": [
{
"task_key": "Job_Run_Api",
"description": "To see how the run and trigger api works",
"new_cluster": {
"spark_version": "9.0.x-scala2.12",
"node_type_id": "Standard_E8as_v4",
"num_workers": "1",
"custom_tags": {
"Workload": "Job Run Api"
}
},
"libraries": [
{
"maven": {
"coordinates": "net.sourceforge.jtds:jtds:1.3.1"
}
}
],
"notebook_task": {
"notebook_path": "/Shared/POC/Job_Run_Api_POC",
"base_parameters": {
"name": "Junaid Khan"
}
},
"timeout_seconds": 2100,
"max_retries": 0
}
],
"job_clusters": null,
"run_name": "RUN_API_TEST",
"timeout_seconds": 2100
}
When the above API call is done, the cluster created has a name like "job-5975-run-2" and that is not super explanatory.
I have tried to use the tag "cluster_name" inside the "new_cluster" tag but I got an error that I can't do that, like this:
{
"error_code": "INVALID_PARAMETER_VALUE",
"message": "Cluster name should not be provided for jobs."
}
Appreciate any help here
Cluster name for jobs are automatically generated and can't be changed. If you want somehow track specific jobs, use tags.
P.S. If you want to have more "advanced" tracking capability, look onto Overwatch project.
I using Analytics Engine on IBM Cloud and trying to pass Ambari configuration Like below in Advanced provisioning options.
{
"ambari_config": {
"hardware_config": "default",
"software_package": "ae-1.2-hive-spark",
"num_compute_nodes": 1,
"advanced_options": {
"ambari_config": {
"spark2-defaults": {
"spark.dynamicAllocation.minExecutors": 1,
"spark.shuffle.service.enabled": true,
"spark.dynamicAllocation.maxExecutors": 2,
"spark.dynamicAllocation.enabled": true
}
}
}
}
}
I am following this documentation to pass the above configuration
https://cloud.ibm.com/docs/services/AnalyticsEngine?topic=AnalyticsEngine-advanced-provisioning-options
After multiple retires i see that each time my cluster request is failing.
After reviewing my request, I figured out that I am passing ambari_config attribute twice for my request which i not accepted
Valid json which worked for me looks like this
{
"hardware_config": "default",
"software_package": "ae-1.2-hive-spark",
"num_compute_nodes": 1,
"advanced_options": {
"ambari_config": {
"spark2-defaults": {
"spark.dynamicAllocation.minExecutors": 1,
"spark.shuffle.service.enabled": true,
"spark.dynamicAllocation.maxExecutors": 2,
"spark.dynamicAllocation.enabled": true
}
}
}
}
one more scenario where cluster creation can fail is like InvalidTopologyException: The following config types are not defined in the stack: [spar2-hive-site-override]
Above issue was because of TYPO to define config property file where user want to add or modify properties.
I have connected Eclipse hono with Eclipse ditto using the Connectivity api. When I set it up, this works fine. However, after some time the forwarding connection fails. When I retrieve the metrics, I'm getting following response:
{
"?": {
"?": {
"type": "connectivity.responses:aggregatedResponse",
"status": 200,
"connectionId": "<connectionId>",
"responsesType": "connectivity.responses:retrieveConnectionMetrics",
"responses": {
"connectivity-7cc7b5dc4c-6nn59": {
"type": "connectivity.responses:retrieveConnectionMetrics",
"status": 200,
"connectionId": "<connectionId>",
"connectionMetrics": {
"connectionStatus": "open",
"connectionStatusDetails": "Connected at 2019-03-19T08:28:53.211Z",
"inConnectionStatusSince": "2019-03-19T08:28:53.211Z",
"clientState": "CONNECTED",
"sourcesMetrics": [],
"targetsMetrics": [
{
"addressMetrics": {
"gw/{{ thing:namespace }}/{{ thing:id }}": {
"status": "failed",
"statusDetails": "Producer closed at 2019-03-19T21:00:16.466Z",
"messageCount": 2048,
"lastMessageAt": "2019-03-19T21:00:05.361Z"
}
},
"publishedMessages": 4070
}
]
}
}
}
}
}
}
I've been checking the logs around the time mentioned, but I'm not getting any errors. The logs I'm posting here are the last one before and the first one after the mentioned timestamp (2019-03-19T21:00:16.466Z).
2019-03-19 21:00:11,771 DEBUG [ID:AMQP_NO_PREFIX:TelemetrySenderImpl-42872] o.e.d.s.c.m.a.AmqpPublisherActor akka://ditto-cluster/system/sharding/connection/7/tenant_aloxy_consumer-aloxy-forward/pa/$a/c1/amqpPublisherActor3
- Message JmsTextMessage { org.apache.qpid.jms.provider.amqp.message.AmqpJmsTextMessageFacade#9bc051af } sent successfully.
2019-03-19 21:01:11,733 DEBUG [ID:AMQP_NO_PREFIX:TelemetrySenderImpl-42872] o.e.d.s.c.m.a.AmqpClientActor akka://ditto-cluster/system/sharding/connection/1/tenant_aloxy_consumer-aloxy/pa/$a/c1 - Inbound message: JmsInboundMessageDispatch { sequence = 38885, messageId = TelemetrySenderImpl-42873, consumerId = ID:a4925b59-1bb4-4cd8-9151-96ad422c36df:1:1:1 }
Although the log levels for all ditto services are set to debug, I'm not getting any useful logging.
Does any of you have any idea how I can get the loggging to investigate this problem or, even better, have any idea on what the problem might be and how to fix it?
When I delete the connection and recreate it, everything works as expected again. Maybe ditto can do this under the hood automatically?
UPDATE
When retrieving the connection via the API, I'm getting following response (including the failoverEnabled property which is set to true). This also indicates that the connection uses AMQP 1.0. The broker used is Enmasse.
{
"?": {
"?": {
"type": "connectivity.responses:retrieveConnection",
"status": 200,
"connection": {
"id": "<connectionId>",
"name": null,
"connectionType": "amqp-10",
"connectionStatus": "open",
"uri": "amqp://<consumer>:<password>#<amqp-host>:5672",
"sources": [],
"targets": [
{
"address": "gw/{{ thing:namespace }}/{{ thing:id }}",
"topics": [
"_/_/things/twin/events?filter=exists(features/alp)"
],
"authorizationContext": [
"<auth-context>"
]
}
],
"clientCount": 1,
"failoverEnabled": true,
"validateCertificates": true,
"processorPoolSize": 5,
"tags": []
}
}
}
}
Eclipse Ditto does an automatic failover if configured to so do (see https://www.eclipse.org/ditto/basic-connections.html - "failoverEnabled" property in the model).
It could however be that this was improved since the release 0.8.0 you are using.
The Ditto team is currently working towards a 0.9.0-M1 release which would contain an improved reconnection behavior.
Does the connection to Eclipse Hono automatically reconnect?
You described that the "forwarding connection" fails from time to time. Which technology (broker, etc.) is as endpoint for that gw/{{ thing:namespace }}/{{ thing:id }} address?