Azure CLI hangs when deleting blobs - azure

I'm using the Azure CLI to delete multiple blobs (in this case there are only 3 to delete) by specifying a pattern:
az storage blob delete-batch --connection-string myAzureBlobConnectionString -s my-container --pattern clients/client_name/*
This hangs and seems to get stuck in some kind of loop. I've tried adding --debug to the end, and it appears to enter a never-ending cycle of requests:
x-ms-client-request-id:16144555-a87c-11e9-bf86-sd391bc3b6f9
x-ms-date:Wed, 17 Jul 2019 10:17:12 GMT
x-ms-version:2018-11-09
/fsonss7393djamxomaa/mycontainer
comp:list
marker:2!152!XJJ4HDHKANnmLWUIWUDCN75DSDS89DXNNAKNK3NNINI4NKLNXLNLA88NSAMOXA
yOCE5OTk5LTEyLTMxVDIzOjU5OjU5Ljk5OTk5OTlaIQ--
restype:container
azure.multiapi.storage.v2018_11_09.common.storageclient : Client-Request-ID=446db2f0-d87e-11e9-ac19-jj324kc3b6f9 Outgoin
g request: Method=GET, Path=/mycontainer, Query={'restype': 'container', 'comp': 'list', 'prefix': None, 'delimiter
': None, 'marker': '2!152!MDAwMDY4IWNsaXASADYnJpc3RvbG9sZHZpYyOKD87986xlcy8wYWY3YTllYi02MzUyLTRmMmUtODE3MaSDXXZTdkYmYzOT
cuanBnITAwMDAyOCE5DADATEyLTMxVDIzOjUDD8223HKjk5OTk5OTlaIQ--', 'maxresults': None, 'include': None, 'timeout': None}, Head
ers={'x-ms-version': '2018-11-09', 'User-Agent': 'Azure-Storage/2.0.0-2.0.1 (Python CPython 3.6.6; Windows 2008ServerR2)
AZURECLI/2.0.68', 'x-ms-client-request-id': '1664324-a87c-1fsfs-bf86-ee291b5252f9', 'x-ms-date': 'Wed, 17 Jul 2019 10:1
9:14 GMT', 'Authorization': 'REDACTED'}.
urllib3.connectionpool : https://fsonss7393djamxomaa.blob.core.windows.net:443 "GET /mycontainer?restype=contain
er&comp=list&marker=2%21452%21MDXAXMDY4IWNsaWVudHMvYnJpc3RvbG9sZHZpYySnsns8sWY3YTllYi02MzUyLTRDASXXDE3MS01YzJmZTdkYm
YzOTcuanBnFFSFSAyOXASAOTk5LTEyLTMxGSGSOjU4535Ljk5OTk5OTlaIQ-- HTTP/1.1" 200 None
azure.multiapi.storage.v2018_11_09.common.storageclient : Client-Request-ID=544db2f0-a88c-23x9-ac19-jkjd89bc3b6f9 Receivi
ng Response: Server-Timestamp=Wed, 17 Jul 2019 10:19:14 GMT, Server-Request-ID=44fsfs2-701e-004e-2589-3cae723232000, HTT
P Status Code=200, Message=OK, Headers={'transfer-encoding': 'chunked', 'content-type': 'application/xml', 'server': 'Wi
ndows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-request-id': '4a43c59b2-701e-44c-2989-3cdsd70000000', 'x-ms-version':
'2018-11-09', 'date': 'Wed, 17 Jul 2019 10:19:14 GMT'}.
azure.multiapi.storage.v2018_11_09.common._auth : String_to_sign=GET
It loops these requests over and over. Running an az storage list with a prefix returns the 3 files immediately.
Any ideas?

I think there is a minor error in your CLI command: the container name looks incorrect, meaning the container you are targeting does not contain the path clients/client_name.
In your command, the container name is my-container. But in the debug output, the container name is mycontainer, which is not consistent with the name in your command.
Please make sure you specify the correct container name, and that the container does contain the path clients/client_name.
I tested the command on my side with a container that does not have the path clients/client_name, and got the same behavior as you. But when I tested with a container that does have the path clients/client_name, it deleted all the blobs inside it.
Otherwise, you should check your CLI version with az --version; the latest version is 2.0.69.
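To check which blob names a given --pattern would select, the matching can be reproduced locally. This is a minimal sketch, assuming the CLI applies shell-style globbing equivalent to Python's fnmatch; the helper select_blobs and the blob names are hypothetical. Note also that the list request in the debug output above carries 'prefix': None, which suggests the whole container is listed and the pattern is applied to the returned names, so a large container could need many list requests even when only a few blobs match.

```python
from fnmatch import fnmatchcase

def select_blobs(blob_names, pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in blob_names if fnmatchcase(name, pattern)]

# Hypothetical blob names, for illustration only.
names = [
    "clients/client_name/a.jpg",
    "clients/client_name/sub/b.jpg",
    "clients/other_client/c.jpg",
]

# In fnmatch, "*" also matches "/" so nested names are selected too.
matched = select_blobs(names, "clients/client_name/*")
print(matched)
```

If nothing matches for the container you are actually targeting, that points back to the container name or path rather than the pattern syntax.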

Related

Boto 3 filter_log_events returns null but describe_log_streams gives correct values

I am trying to retrieve CloudWatch logs from the log group /frontend/lambda/FEservice. The logs are stored in multiple streams with the pattern YYYY/MM/DD/[$LATEST]*
Example: 2022/04/05/[$LATEST]00a561e2246d41b616d4c3b7e2fb3frt.
There are more than 5000 streams in the log group.
When I try to retrieve log data using filter_log_events:
import boto3

client = boto3.client('logs')
resp = client.filter_log_events(
    logGroupName='/frontend/lambda/FEservice',
    filterPattern='visited the website',
    logStreamNamePrefix='2022/05/01',
    startTime=1648771200000,
    endTime=1651795199000,
    nextToken=currentToken  # currentToken is defined elsewhere (not shown)
)
I am getting an empty result:
{'events': [], 'searchedLogStreams': [], 'nextToken': 'Bxkq6kVGFtq2y_MoigeqscPOdhXVbhiVtLoAmXb5jCrI7fXLrCWjfclUd7NavbCh3qEZ3ldX2CKRPPWLt_z0-NByZyCUE5XjMyqJW5ajEEUVoxzFGkADR_7uFQhD0XGgof85Q25xWQQUXocoe3J_UbDW4YZ22sEvL05G9oQsykCfTDJy50efjliqpPRFOBUVIbtQ2Rm_ng4Vrr8yNIzx1jaemLtP2uJT_9rBNO2EwITsMYgUVJ2GblvyNfEMVN-aL4yfsaKjc1cae9smXXb0SRksaBZti8As_G3uOPWyuPU', 'ResponseMetadata': {'RequestId': 'b733e213-da06-4060-a0a8-490252adfc8d', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'b733e213-da06-4060-a0a8-490252adfc8d', 'content-type': 'application/x-amz-json-1.1', 'content-length': '439', 'date': 'Sat, 14 May 2022 06:38:15 GMT'}, 'RetryAttempts': 0}}
However, if I use describe_log_streams with a prefix parameter, I get all the log streams prefixed by 2022/05/01/:
resp = client.describe_log_streams(
    logGroupName='/frontend/lambda/FEservice',
    logStreamNamePrefix='2022/05/01/',
    descending=False,
    limit=20
)
I also get results if I remove all the other parameters, like this:
resp = client.filter_log_events(logGroupName='/aws/lambda/CasperFrontendLambda',
                                limit=200)
Can someone help me find the issue?
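One thing worth checking with a response like the one above: filter_log_events is paginated, and a page can come back with an empty events list while still carrying a nextToken, which means more data may remain to be searched. A minimal sketch, assuming the empty result is just an intermediate page; the helper name collect_filtered_events and the max_pages guard are hypothetical, and client is expected to be a boto3.client('logs') instance:

```python
def collect_filtered_events(client, max_pages=1000, **kwargs):
    """Follow nextToken until it disappears, gathering events from every page."""
    events = []
    token = None
    for _ in range(max_pages):  # guard against looping forever
        params = dict(kwargs)
        if token:
            params["nextToken"] = token
        resp = client.filter_log_events(**params)
        events.extend(resp.get("events", []))
        token = resp.get("nextToken")
        if not token:  # no token means the search is complete
            break
    return events

# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# events = collect_filtered_events(
#     boto3.client("logs"),
#     logGroupName="/frontend/lambda/FEservice",
#     filterPattern="visited the website",
#     logStreamNamePrefix="2022/05/01",
# )
```

boto3 also offers client.get_paginator('filter_log_events'), which handles the token for you; the explicit loop is shown only to make the empty-page case visible.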

Unable to create SparkApplications on Kubernetes cluster using SparkKubernetesOperator from Airflow DAG (Airflow version 2.0.2 MWAA)

I am trying to use SparkKubernetesOperator to run a Spark job on Kubernetes, with the same DAG and YAML files as in the following question:
Unable to create SparkApplications on Kubernetes cluster using SparkKubernetesOperator from Airflow DAG
But Airflow shows the following error:
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'e2e1833d-a1a6-40d4-9d05-104a32897deb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 10 Sep 2021 08:38:33 GMT', 'Content-Length': '462'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the object provided is unrecognized (must be of type SparkApplication): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string \"json:\\\"apiVersion,omitempty\\\"\"; Kind string \"json:\\\"kind,omitempty\\\"\" } (222f7573722f6c6f63616c2f616972666c6f772f646167732f636f6e6669 ...)","reason":"BadRequest","code":400}
Any suggestions for resolving this problem?
I think you had the same problem as me.
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator

SparkKubernetesOperator(
    task_id='spark_pi_submit',
    namespace="default",
    # Pass the file's contents rather than its path: workaround for a known bug
    application_file=open("/opt/airflow/dags/repo/script/spark-test.yaml").read(),
    kubernetes_conn_id="kubeConnTest",  # namespace "default" is set in the Airflow connection UI
    do_xcom_push=True,
    dag=dag
)
I wrapped it like this, and it works like a charm.
https://github.com/apache/airflow/issues/17371
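The hex fragment at the end of the error body can be decoded to see what was actually sent to the Kubernetes API. A quick sketch using only the Python standard library:

```python
# Hex prefix copied from the "HTTP response body" in the error above.
fragment = "222f7573722f6c6f63616c2f616972666c6f772f646167732f636f6e6669"

decoded = bytes.fromhex(fragment).decode("ascii")
print(decoded)  # "/usr/local/airflow/dags/confi
```

The payload starts with a quoted file path rather than YAML content, which is consistent with the operator sending the application_file string itself instead of the file's contents.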

AWS Lambda function times out when reading bucket file

The last two lines of the code below are the issue. I can see the CSV file in the bucket, as the printout below shows: the file is returned as an object with key/value conventions. The problem is the .read() call, which ALWAYS times out. Following the pointers I received when I first posted this question, I changed my AWS settings to allow 3 minutes before the function times out, and I also tried downloading the file, but that returns None. I guess the central questions are: why does .read() take so long, and what is missing in my download_file call? The file is small: 1KB. Any help appreciated, thanks.
import boto3
import csv

s3 = boto3.resource('s3')
bucket = s3.Bucket('polly-partner')
obj = bucket.Object(key='CyclingLog.csv')

def lambda_handler(event, context):
    response = obj.get()
    print(response)
    key = obj.key
    filepath = '/tmp/' + key
    # download_file() writes to filepath and returns None, so this prints None
    print(bucket.download_file(key, filepath))
    # read() loads the entire object body into memory at once
    lines = response['Body'].read()
    print(lines)
Printout is:
Response:
{
"errorType": "Runtime.ExitError",
"errorMessage": "RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539 Error: Runtime exited with error: signal: killed"
}
Request ID:
"541f6cc6-2195-409a-88d3-e98c57fbd539"
Function Logs:
START RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539 Version: $LATEST
{'ResponseMetadata': {'RequestId': '0860AE16F7A96522', 'HostId': 'D6k1kFcCv9Qz70ANXjEnPQEFsKpAntqJND9FRf5diae3WWmDbVDJENkPCd1oOOOfFt8BJ8b8OOY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'D6k1kFcCv9Qz70ANXjEnPQEFsKpAntqJND9FRf5diae3WWmDbVDJENkPCd1oOOOfFt8BJ8b8OOY=', 'x-amz-request-id': '0860AE16F7A96522', 'date': 'Wed, 01 Apr 2020 17:51:49 GMT', 'last-modified': 'Thu, 19 Mar 2020 17:17:37 GMT', 'etag': '"b56479c4073a90943b3d862d5d4ff38d-6"', 'accept-ranges': 'bytes', 'content-type': 'text/csv', 'content-length': '50000056', 'server': 'AmazonS3'}, 'RetryAttempts': 1}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2020, 3, 19, 17, 17, 37, tzinfo=tzutc()), 'ContentLength': 50000056, 'ETag': '"b56479c4073a90943b3d862d5d4ff38d-6"', 'ContentType': 'text/csv', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x7f536df1ddc0>}
None
END RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539
REPORT RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539 Duration: 12923.11 ms Billed Duration: 13000 ms Memory Size: 128 MB Max Memory Used: 129 MB Init Duration: 362.26 ms
RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539 Error: Runtime exited with error: signal: killed
Runtime.ExitError
The error message says: Task timed out after 3.00 seconds
You can increase the Timeout on a Lambda function by opening the function in the console, going to the Basic settings section and clicking Edit.
While you say that you increased this timeout setting, the fact that it is timing out after exactly 3 seconds suggests that the setting has not been changed.
I know this is an old post (and hopefully solved long ago!), but I ended up here, so I'll share my findings.
These generic Runtime error messages:
"Error: Runtime exited with error: signal: killed Runtime.ExitError"
...when accompanied by something like this on the REPORT line:
Memory Size: 128 MB Max Memory Used: 129 MB Init Duration: 362.26 ms
...look like a low-memory issue, especially when "Max Memory Used" is >= "Memory Size".
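That comparison can be automated when scanning logs. A minimal sketch, assuming the REPORT line keeps the "Memory Size: N MB ... Max Memory Used: M MB" layout shown above; the helper name memory_exhausted is hypothetical:

```python
import re

# Matches the two memory fields of a Lambda REPORT line (space or tab separated).
REPORT_RE = re.compile(r"Memory Size:\s*(\d+) MB\s+Max Memory Used:\s*(\d+) MB")

def memory_exhausted(report_line):
    """Return True if max memory used reached the configured size,
    False if it stayed below, or None if the line has no memory fields."""
    match = REPORT_RE.search(report_line)
    if match is None:
        return None
    size_mb, used_mb = (int(value) for value in match.groups())
    return used_mb >= size_mb

line = ("REPORT RequestId: 541f6cc6-2195-409a-88d3-e98c57fbd539 "
        "Duration: 12923.11 ms Billed Duration: 13000 ms "
        "Memory Size: 128 MB Max Memory Used: 129 MB Init Duration: 362.26 ms")
print(memory_exhausted(line))
```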
From what I've seen, Lambda can and often will use up to 100% of its memory without issue (discussed in this post). But when you attempt to load data into memory, or perform memory-intensive processing (copying large data sets stored in variables?), the Python runtime can hit a memory error and exit.
Unfortunately, it isn't very well documented, or logged, or captured with CloudWatch metrics.
I believe the same error in the Node.js runtime looks like:
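If the function really does need the file's contents, one way to keep memory use down is to process the body line by line rather than calling .read() on the whole object (the other option is simply to raise the function's memory setting). A minimal sketch, assuming the body exposes iter_lines() as botocore's StreamingBody does; iter_csv_rows is a hypothetical helper, and it assumes no CSV field contains an embedded newline:

```python
import csv

def iter_csv_rows(body, encoding="utf-8"):
    """Yield parsed CSV rows one at a time from a streaming body."""
    # iter_lines() yields one line of bytes at a time, without line endings.
    lines = (raw.decode(encoding) for raw in body.iter_lines())
    yield from csv.reader(lines)

# Usage inside the handler from the question (left commented out,
# since it needs S3 access):
# for row in iter_csv_rows(obj.get()['Body']):
#     print(row)
```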
"Error: Runtime exited with error: signal: aborted (core dumped)"

Nodejs app using Mailjet throwing a confusing error

I'm building an app using Mailjet, and using their connection example.
app.get('/send', function (req, res) {
    ...
    var request = mailjet
        .post("send")
        .request({
            <request stuff, email details>
        });
    request
        .on('success', function (response, body) {
            <handle response>
        })
        .on('error', function (err, response) {
            <handle error>
        });
Getting this error:
Unhandled rejection Error: Unsuccessful
at /home/ubuntu/workspace/node_modules/node-mailjet/mailjet-client.js:203:23
When I go to the Mailjet client and ask it to log the error, it tells me:
{ [Error: Unauthorized]
original: null,
...
Anyone have an idea of where I should start troubleshooting?
Update: saw this in the error output:
header:
{ server: 'nginx',
date: 'Thu, 02 Mar 2017 14:04:11 GMT',
'content-type': 'text/html',
'content-length': '20',
connection: 'close',
'www-authenticate': 'Basic realm="Provide an apiKey and secretKey"',
vary: 'Accept-Encoding',
'content-encoding': 'gzip' },
So it's not picking up my API key and secret. Can anyone tell me how to set those as environment variables in Cloud9?
You can set environment variables in ~/.profile. Files outside of the workspace directory /home/ubuntu/workspace aren't accessible to read-only users, so other people won't be able to see them.
In the terminal, you can do for example:
$> echo "export MAILJET_PUBLIC=foo" >> ~/.profile
$> echo "export MAILJET_SECRET=bar" >> ~/.profile
Then, you'll be able to access those variables in Node when using the connect method:
const mailjet = require('node-mailjet')
    .connect(process.env.MAILJET_PUBLIC, process.env.MAILJET_SECRET)
The runners (from the "run" button) and the terminal will evaluate ~/.profile and make the environment variables available to your app.

AWS SDK - change autoscaling group update policy

I have an Auto Scaling group on AWS and I'd like to change its update policy to get rolling updates.
I've tried:
var autoScaling = new AWS.AutoScaling(awsConfig);
autoScaling.updateAutoScalingGroup({
    AutoScalingGroupName: <some name>,
    UpdatePolicy: {
        AutoScalingReplacingUpdate: {
            WillReplace: true,
        },
    }
})
But this is failing with:
{ [UnexpectedParameter: Unexpected key 'UpdatePolicy' found in params]
message: 'Unexpected key \'UpdatePolicy\' found in params',
code: 'UnexpectedParameter',
time: Tue Nov 08 2016 22:15:42 GMT-0800 (PST) }
UpdatePolicy is a feature of AWS CloudFormation. It is not part of the AWS API itself, so none of the SDKs support it. This is the documentation from CloudFormation:
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
