Unable to create SparkApplications on Kubernetes cluster using SparkKubernetesOperator from Airflow DAG (Airflow version 2.0.2, MWAA)

I am trying to use SparkKubernetesOperator to run a Spark job on Kubernetes, with the same DAG and YAML files as in the following question:
Unable to create SparkApplications on Kubernetes cluster using SparkKubernetesOperator from Airflow DAG
But Airflow shows the following error:
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e2e1833d-a1a6-40d4-9d05-104a32897deb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 10 Sep 2021 08:38:33 GMT', 'Content-Length': '462'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the object provided is unrecognized (must be of type SparkApplication): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string \"json:\\\"apiVersion,omitempty\\\"\"; Kind string \"json:\\\"kind,omitempty\\\"\" } (222f7573722f6c6f63616c2f616972666c6f772f646167732f636f6e6669 ...)","reason":"BadRequest","code":400}
Any suggestions for resolving this problem?

I think you had the same problem as me.
SparkKubernetesOperator(
    task_id='spark_pi_submit',
    namespace="default",
    application_file=open("/opt/airflow/dags/repo/script/spark-test.yaml").read(),  # officially known bug: pass the YAML contents, not the path
    kubernetes_conn_id="kubeConnTest",  # namespace "default", set in the Airflow connections UI
    do_xcom_push=True,
    dag=dag
)
I wrapped it like this, and it works like a charm.
https://github.com/apache/airflow/issues/17371
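For reference, here is a minimal sketch of a complete DAG built around that workaround. The dag_id, start date, and schedule are hypothetical; the operator arguments follow the snippet above, and the YAML is read at parse time, so the file must exist on the scheduler.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

with DAG(
    dag_id="spark_pi",                # hypothetical
    start_date=datetime(2021, 9, 1),  # hypothetical
    schedule_interval=None,
    catchup=False,
) as dag:
    SparkKubernetesOperator(
        task_id="spark_pi_submit",
        namespace="default",
        # Workaround for the known bug: pass the file *contents*, not the path.
        application_file=open("/opt/airflow/dags/repo/script/spark-test.yaml").read(),
        kubernetes_conn_id="kubeConnTest",
        do_xcom_push=True,
    )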

Related

Boto 3 filter_log_events returns null but describe_log_streams gives correct values

I am trying to retrieve CloudWatch logs from the log group /frontend/lambda/FEservice. The logs are stored in multiple streams with the pattern YYYY/MM/DD/[$LATEST]*.
Example: 2022/04/05/[$LATEST]00a561e2246d41b616d4c3b7e2fb3frt.
There are more than 5000 streams in the log group.
When I try to retrieve log data using filter_log_events:
import boto3

client = boto3.client('logs')
resp = client.filter_log_events(
    logGroupName='/frontend/lambda/FEservice',
    filterPattern='visited the website',
    logStreamNamePrefix='2022/05/01',
    startTime=1648771200000,
    endTime=1651795199000,
    nextToken=currentToken  # token from a previous call
)
I am getting a null result
{'events': [], 'searchedLogStreams': [], 'nextToken': 'Bxkq6kVGFtq2y_MoigeqscPOdhXVbhiVtLoAmXb5jCrI7fXLrCWjfclUd7NavbCh3qEZ3ldX2CKRPPWLt_z0-NByZyCUE5XjMyqJW5ajEEUVoxzFGkADR_7uFQhD0XGgof85Q25xWQQUXocoe3J_UbDW4YZ22sEvL05G9oQsykCfTDJy50efjliqpPRFOBUVIbtQ2Rm_ng4Vrr8yNIzx1jaemLtP2uJT_9rBNO2EwITsMYgUVJ2GblvyNfEMVN-aL4yfsaKjc1cae9smXXb0SRksaBZti8As_G3uOPWyuPU', 'ResponseMetadata': {'RequestId': 'b733e213-da06-4060-a0a8-490252adfc8d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'b733e213-da06-4060-a0a8-490252adfc8d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '439', 'date': 'Sat, 14 May 2022 06:38:15 GMT'}, 'RetryAttempts': 0}}
However, if I use describe_log_streams with a prefix parameter, I get all the log streams prefixed by 2022/05/01/:
resp = client.describe_log_streams(
    logGroupName='/frontend/lambda/FEservice',
    logStreamNamePrefix='2022/05/01/',
    descending=False,
    limit=20
)
I also get results if I remove all the other parameters, like this:
resp = client.filter_log_events(logGroupName='/aws/lambda/CasperFrontendLambda',
                                limit=200)
Can someone help me find the issue?
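One thing worth knowing while debugging this: filter_log_events can legitimately return an empty events list together with a nextToken when the slice of streams it scanned contained no matches, so a single call proves nothing; you have to follow the token until it disappears. A minimal sketch of that loop, reusing the parameters from the question:
import boto3

client = boto3.client('logs')
kwargs = {
    'logGroupName': '/frontend/lambda/FEservice',
    'filterPattern': 'visited the website',
    'logStreamNamePrefix': '2022/05/01',
    'startTime': 1648771200000,
    'endTime': 1651795199000,
}
events = []
while True:
    resp = client.filter_log_events(**kwargs)
    events.extend(resp['events'])  # may be empty on any given page
    token = resp.get('nextToken')
    if not token:
        break  # nothing left to scan
    kwargs['nextToken'] = token
print(f"Matched {len(events)} events")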

InfluxDB 2 Python API: Path not found

I have a working InfluxDB 2 server and, on a Raspberry Pi, the Python client library.
I've generated the tokens in the server UI and copied an all-access one into the Python program. The test bucket is set up in the UI too. In the Python program I have this:
bucket = "test"
org = "test-org"
#
token = "blabla=="
# Store the URL of your InfluxDB instance
url="http://10.0.1.1:8086/api/v2"
client = influxdb_client.InfluxDBClient(
url=url,
token=token,
org=org
)
Followed later by:
p = influxdb_client.Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket='test', org='test-org', record=p)
I've overcome the not-authorized error, but now, whatever I do, I end up with this:
influxdb_client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json; charset=utf-8', 'X-Influxdb-Build': 'OSS', 'X-Influxdb-Version': 'v2.2.0', 'X-Platform-Error-Code': 'not found', 'Date': 'Tue, 26 Apr 2022 14:35:50 GMT', 'Content-Length': '54'})
HTTP response body: {
"code": "not found",
"message": "path not found"
}
I've also gone back to curl, which gives me a not-authorized problem with the same parameters. Any help appreciated; I'm beginning to regret trying to upgrade now.
You don't need the /api/v2 in your url parameter; just use url="http://10.0.1.1:8086".
See https://github.com/influxdata/influxdb-client-python#getting-started
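Putting that together, a minimal sketch of the corrected program, reusing the bucket, org, and token from the question (the client library appends /api/v2 to the base URL itself):
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS

# Base URL only; the client adds the /api/v2 prefix internally.
client = influxdb_client.InfluxDBClient(
    url="http://10.0.1.1:8086",
    token="blabla==",
    org="test-org",
)
write_api = client.write_api(write_options=SYNCHRONOUS)
p = influxdb_client.Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
write_api.write(bucket="test", org="test-org", record=p)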

Swagger produces key behaving differently in OpenAPI 3

I'm currently converting my Swagger file to OpenAPI 3. I have an endpoint that returns a JSON response, which I was outputting as text/plain via produces. I know the produces key has been replaced in OpenAPI 3 by content: text/plain: etc., but my response is no longer being converted. Previously, if I called response.text after calling the endpoint, I would get "This is a test.", but now I get '"This is a test."\n'.
Swagger 2 file:
/api/logs:
  get:
    description: Retrieve logs.
    operationId: controller.get_logs
    responses:
      "200":
        description: Job logs found
        schema:
          type: string
    produces:
      - text/plain
Openapi 3 file:
/api/logs:
  get:
    description: Retrieve logs.
    operationId: controller.get_logs
    responses:
      "200":
        description: Job logs found
        content:
          text/plain:
            schema:
              type: string
Below is a snippet of the application code; we call an external API and just return the response from that call. I don't have a snippet of the API code to share, but I have added some logs to display the response content:
resp = job.get_output()  # API call
try:
    json_resp = resp.json()
    LOGGER.info(f"Response: {resp}")
    LOGGER.info(f"JSON: {json_resp}")
    LOGGER.info(f"Text: {repr(resp.text)}")
    LOGGER.info(f"Headers: {dict(resp.headers)}")
except Exception as error:
    return (
        {
            "message": f"Failed to get job output for job: {job_id}"
        },
        HTTPStatus.NOT_FOUND,
    )
return json_resp, HTTPStatus.OK, dict(resp.headers)
Log output:
LOGGER - Response: <Response [200]>
LOGGER - JSON: This is a test.
LOGGER - Text: '"This is a test."\n'
LOGGER - Headers: {'Date': 'Tue, 28 Jul 2020 15:23:31 GMT', 'Content-Type': 'application/json', 'Content-Length': '18', 'Connection': 'keep-alive'}
What am I doing wrong or am I missing something?
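One observation that may help: the extra quotes and trailing newline are exactly what JSON serialization of a bare string produces, and the forwarded upstream header still says Content-Type: application/json, which suggests the framework is JSON-encoding the returned string rather than honoring the text/plain declaration. A quick illustration of that encoding:
import json

body = "This is a test."
print(repr(json.dumps(body)))         # '"This is a test."'
print(repr(json.dumps(body) + "\n"))  # '"This is a test."\n', matching the log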

Azure CLI hangs when deleting blobs

I'm using the Azure CLI to delete multiple blobs (in this case there are only 3 to delete) by specifying a pattern:
az storage blob delete-batch --connection-string myAzureBlobConnectionString -s my-container --pattern clients/client_name/*
This hangs and seems to get stuck in some kind of loop. I've tried adding --debug onto the end, and it appears to enter a never-ending cycle of requests:
x-ms-client-request-id:16144555-a87c-11e9-bf86-sd391bc3b6f9
x-ms-date:Wed, 17 Jul 2019 10:17:12 GMT
x-ms-version:2018-11-09
/fsonss7393djamxomaa/mycontainer
comp:list
marker:2!152!XJJ4HDHKANnmLWUIWUDCN75DSDS89DXNNAKNK3NNINI4NKLNXLNLA88NSAMOXAyOCE5OTk5LTEyLTMxVDIzOjU5OjU5Ljk5OTk5OTlaIQ--
restype:container
azure.multiapi.storage.v2018_11_09.common.storageclient : Client-Request-ID=446db2f0-d87e-11e9-ac19-jj324kc3b6f9 Outgoing request: Method=GET, Path=/mycontainer, Query={'restype': 'container', 'comp': 'list', 'prefix': None, 'delimiter': None, 'marker': '2!152!MDAwMDY4IWNsaXASADYnJpc3RvbG9sZHZpYyOKD87986xlcy8wYWY3YTllYi02MzUyLTRmMmUtODE3MaSDXXZTdkYmYzOTcuanBnITAwMDAyOCE5DADATEyLTMxVDIzOjUDD8223HKjk5OTk5OTlaIQ--', 'maxresults': None, 'include': None, 'timeout': None}, Headers={'x-ms-version': '2018-11-09', 'User-Agent': 'Azure-Storage/2.0.0-2.0.1 (Python CPython 3.6.6; Windows 2008ServerR2) AZURECLI/2.0.68', 'x-ms-client-request-id': '1664324-a87c-1fsfs-bf86-ee291b5252f9', 'x-ms-date': 'Wed, 17 Jul 2019 10:19:14 GMT', 'Authorization': 'REDACTED'}.
urllib3.connectionpool : https://fsonss7393djamxomaa.blob.core.windows.net:443 "GET /mycontainer?restype=container&comp=list&marker=2%21452%21MDXAXMDY4IWNsaWVudHMvYnJpc3RvbG9sZHZpYySnsns8sWY3YTllYi02MzUyLTRDASXXDE3MS01YzJmZTdkYmYzOTcuanBnFFSFSAyOXASAOTk5LTEyLTMxGSGSOjU4535Ljk5OTk5OTlaIQ-- HTTP/1.1" 200 None
azure.multiapi.storage.v2018_11_09.common.storageclient : Client-Request-ID=544db2f0-a88c-23x9-ac19-jkjd89bc3b6f9 Receiving Response: Server-Timestamp=Wed, 17 Jul 2019 10:19:14 GMT, Server-Request-ID=44fsfs2-701e-004e-2589-3cae723232000, HTTP Status Code=200, Message=OK, Headers={'transfer-encoding': 'chunked', 'content-type': 'application/xml', 'server': 'Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '4a43c59b2-701e-44c-2989-3cdsd70000000', 'x-ms-version': '2018-11-09', 'date': 'Wed, 17 Jul 2019 10:19:14 GMT'}.
azure.multiapi.storage.v2018_11_09.common._auth : String_to_sign=GET
It loops these requests over and over. Running an az storage blob list with a prefix returns the 3 files immediately.
Any ideas?
I think there is a minor error in your CLI command: the container name is incorrect (it does not contain the path clients/client_name).
In your command the container name is my-container, but in the debug info the container name is mycontainer, which is not consistent with the name in your command.
Please make sure you specify the correct container name, one that does contain the path clients/client_name.
I tested this on my side with a container that does not have the path clients/client_name and got the same error as you. But when testing with a container that does have the path clients/client_name, it deletes all the blobs inside it.
Otherwise, check your CLI version with az --version; the latest version is 2.0.69.
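For example, assuming the blobs really live under mycontainer (the name shown in the debug output), the corrected command would be:
az storage blob delete-batch --connection-string myAzureBlobConnectionString -s mycontainer --pattern "clients/client_name/*"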

Nodejs app using Mailjet throwing a confusing error

I'm building an app using Mailjet, and using their connection example.
app.get('/send', function (req, res) {
    ...
    var request = mailjet
        .post("send")
        .request({
            <request stuff, email details>
        });
    request
        .on('success', function (response, body) {
            <handle response>
        })
        .on('error', function (err, response) {
            <handle error>
        });
});
Getting this error:
Unhandled rejection Error: Unsuccessful
at /home/ubuntu/workspace/node_modules/node-mailjet/mailjet-client.js:203:23
When I go to the Mailjet client and ask it to log the error, it tells me:
{ [Error: Unauthorized]
original: null,
...
Anyone have an idea of where I should start troubleshooting?
Update: I saw this in the error output:
header:
  { server: 'nginx',
    date: 'Thu, 02 Mar 2017 14:04:11 GMT',
    'content-type': 'text/html',
    'content-length': '20',
    connection: 'close',
    'www-authenticate': 'Basic realm="Provide an apiKey and secretKey"',
    vary: 'Accept-Encoding',
    'content-encoding': 'gzip' },
So it's not picking up my API key and secret. Can anyone tell me how to set those as environment variables in Cloud9?
You can set environment variables in ~/.profile. Files outside of the workspace directory /home/ubuntu/workspace aren't accessible to read-only users, so people won't be able to see them.
In the terminal, you can do for example:
$> echo "export MAILJET_PUBLIC=foo" >> ~/.profile
$> echo "export MAILJET_SECRET=bar" >> ~/.profile
Then, you'll be able to access those variables in Node when using the connect method:
const mailjet = require('node-mailjet')
    .connect(process.env.MAILJET_PUBLIC, process.env.MAILJET_SECRET)
The runners (from the "run" button) and the terminal will evaluate ~/.profile and make the environment variables available to your app.
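If the error persists, a quick sanity check (purely illustrative, not from the Mailjet docs) is to fail fast when the profile variables haven't reached the process, for example after forgetting to restart the runner:
// Guard: confirm the ~/.profile exports actually reached this process.
if (!process.env.MAILJET_PUBLIC || !process.env.MAILJET_SECRET) {
    throw new Error('MAILJET_PUBLIC / MAILJET_SECRET are not set; check ~/.profile and restart the runner');
}
const mailjet = require('node-mailjet')
    .connect(process.env.MAILJET_PUBLIC, process.env.MAILJET_SECRET);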
