cannot upload data to s3 through lambda - python-3.x

I'm trying to extract data from Trusted Advisor through a Lambda function and upload it to S3. Part of the function appends check data to a list, but that block keeps failing. The specific block is:
try:
    check_summary = support_client.describe_trusted_advisor_check_summaries(
        checkIds=[checks['id']])['summaries'][0]
    if check_summary['status'] != 'not_available':
        checks_list[checks['category']].append(
            [checks['name'], check_summary['status'],
             str(check_summary['resourcesSummary']['resourcesProcessed']),
             str(check_summary['resourcesSummary']['resourcesFlagged']),
             str(check_summary['resourcesSummary']['resourcesSuppressed']),
             str(check_summary['resourcesSummary']['resourcesIgnored'])
             ])
    else:
        print("unable to append checks")
except:
    print('Failed to get check: ' + checks['name'])
    traceback.print_exc()
The log output is:
unable to append checks
I'm new to Python, so I'm unsure how to check the traceback from inside the else: branch. Also, am I doing anything wrong in the above? Please help.
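For what it's worth, one way to see why the else: branch fires is to include the returned status in the message. This is just a small debugging sketch, a drop-in replacement for the else: branch above, reusing the names from that snippet:

else:
    # Log which check was skipped and the status the API actually returned,
    # so the reason shows up in the CloudWatch logs.
    print("unable to append check '{}': status is '{}'".format(
        checks['name'], check_summary['status']))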

You are not calling the s3_upload function anywhere, and the code is also invalid because it references a file_name variable that is never initialized.

A few observations about your script:
traceback.print_exc() should be executed before the return statement so that the full stack trace is actually printed before the function exits.
if __name__ == '__main__':
    lambda_handler
This guard only runs its body when the file is executed directly, not when it is imported (and note that lambda_handler here is only referenced, not actually called).
According to the documentation, the signature of the put_object method begins with:
def put_object(self, bucket_name, object_name, data, length,
so fix the arguments you pass to put_object to match it.
You're not using s3_upload anywhere in your Lambda handler.
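For reference, here is a minimal sketch of how the pieces could be wired together, assuming boto3 is used for the upload (if you are using a different client library, match its documented put_object signature instead). The s3_upload helper, the bucket name, the file_name key, and collect_trusted_advisor_checks are placeholders, not the asker's actual code:

import csv
import io

import boto3

s3_client = boto3.client('s3')

def s3_upload(rows, bucket, file_name):
    # Serialize the collected check rows to CSV in memory and upload them.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    s3_client.put_object(Bucket=bucket, Key=file_name, Body=buf.getvalue())

def lambda_handler(event, context):
    rows = collect_trusted_advisor_checks()  # hypothetical helper that builds the check rows
    s3_upload(rows, 'my-bucket', 'trusted-advisor-checks.csv')

if __name__ == '__main__':
    # For local testing only; in Lambda, the service invokes lambda_handler itself.
    lambda_handler({}, None)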

Related

getting scheduling error when forwarding s3 object to flask response

I'm using the following to pull a data file from an S3-compatible server:
try:
    buf = io.BytesIO()
    client.download_fileobj(project_id, path, buf)
    body = buf.getvalue().decode("utf-8")
except botocore.exceptions.ClientError as e:
    if defaultValue is not None:
        return defaultValue
    else:
        raise S3Error(project_id, path, e) from e
else:
    return body
The code generates this error:
RuntimeError: cannot schedule new futures after interpreter shutdown
In general, I'm simply trying to read a file from the S3-compatible store into the body of a response object. The caller of the above snippet is as follows:
data = read_file(project_id, f"{PATH}/data.csv")
response = Response(
    data,
    mimetype="text/csv",
    headers=[
        ("Content-Type", "application/octet-stream; charset=utf-8"),
        ("Content-Disposition", "attachment; filename=data.csv")
    ],
    direct_passthrough=True
)
Playing with the code, when I don't get the runtime error, the request hangs and no response is ever returned.
Thank you to anyone with guidance.
I'm not sure how generic this answer will be; however, the combination of boto and DigitalOcean's S3-compatible implementation does not strictly permit an object key that starts with /. Once I removed the offending leading character, the files downloaded as expected.
I base the boto specificity on the fact that I was able to read the same files using Haskell's amazonka library.
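In code, the fix amounts to stripping the leading slash from the key before the download. A sketch against the snippet above:

# The S3-compatible endpoint did not handle keys starting with "/" as expected,
# so normalize the key before downloading.
key = path.lstrip("/")
buf = io.BytesIO()
client.download_fileobj(project_id, key, buf)
body = buf.getvalue().decode("utf-8")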

How to reuse python variable which was assigned to a s3 return object from AWS

I need to reuse a Python variable that stores the object returned from an S3 get_object call. Below is my code:
def check_csv_format(s3_object):
    try:
        pd.read_csv(s3_object['Body'], header=None)
    except Exception as e:
        raise Exception(e)

obj = s3.get_object(Bucket=bucket, Key=key)
check_csv_format(obj)
df = pd.read_csv(obj['Body'])
But when I run this code, it gives the error below.
pandas.errors.EmptyDataError: No columns to parse from file
I tried using Python's deepcopy to keep a copy of the object, but it didn't work. Please suggest a solution.
The obj['Body'] element in the returned dict is a StreamingBody. It doesn't support seek or re-streaming. If you call read() on it passing no parameters then you read all of the data. So, if you call read() a 2nd time, you will get no more bytes.
Why don't you simply save the streamed object like this:
csv_content = obj['Body'].read().decode('utf-8')
Then you can pass csv_content to Pandas as needed.
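Putting that together, a minimal sketch of the approach, reusing bucket and key from the question and folding the validation inline rather than going through check_csv_format:

import io

import boto3
import pandas as pd

s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=key)

# Read the StreamingBody exactly once; it cannot be rewound or re-read.
csv_content = obj['Body'].read().decode('utf-8')

# Validate the format against one file-like view of the text...
pd.read_csv(io.StringIO(csv_content), header=None)

# ...then build the working DataFrame from a fresh view of the same string.
df = pd.read_csv(io.StringIO(csv_content))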

Can You Retry/Loop inside a Try/Except?

I'm trying to understand if it's possible to set a loop inside of a Try/Except call, or if I'd need to restructure to use functions. Long story short, after spending a few hours learning Python and BeautifulSoup, I managed to frankenstein some code together to scrape a list of URLs, pull that data out to CSV (and now update it to a MySQL db). The code is now working as planned, except that I occasionally run into a 10054, either because my VPN hiccups, or possibly the source host server is occasionally bouncing me (I have a 30 second delay in my loop but it still kicks me on occasion).
I get the general idea of Try/Except structure, but I'm not quite sure how I would (or if I could) loop inside it to try again. My base code to grab the URL, clean it and parse the table I need looks like this:
for url in contents:
    print('Processing record', (num+1), 'of', len(contents))
    if url:
        print('Retrieving data from ', url[0])
        html = requests.get(url[0]).text
        soup = BeautifulSoup(html, 'html.parser')
        for span in soup('span'):
            span.decompose()
        trs = soup.select('div#collapseOne tr')
        if trs:
            print('Processing')
            for t in trs:
                for header, value in zip(t.select('td')[0], t.select('td:nth-child(2)')):
                    if num == 0:
                        headers.append(' '.join(header.split()))
                    values.append(re.sub(' +', ' ', value.get_text(' ', strip=True)))
After that, it's just processing the data to CSV and running an SQL UPDATE statement.
What I'd like to do is: if the HTML request fails, wait 30 seconds and try the request again, then process as normal; if the retry fails X number of times, exit the script (assuming at that point I have a full connection failure).
Is it possible to do something like that inline, or would I need to turn the request statement into a function and set up a loop to call it? I have to admit I'm not familiar with how Python handles function returns yet.
You can add an inner loop for the retries and put your try/except block in that. Here is a sketch of what it would look like. You could put all of this into a function and put that function call in its own try/except block to catch other errors that cause the loop to exit.
Looking at the requests exception hierarchy, Timeout covers multiple recoverable exceptions and is a good starting point for what you may want to catch. Other things like SSLError aren't going to get better just because you retry, so skip them. You can go through the list to see what is reasonable for you.
import itertools
import time  # needed for time.sleep below

# requests exceptions at
# https://requests.readthedocs.io/en/master/_modules/requests/exceptions/
for url in contents:
    print('Processing record', (num+1), 'of', len(contents))
    if url:
        print('Retrieving data from ', url[0])
        retry_count = itertools.count()
        # loop for retries
        while True:
            try:
                # get with timeout and convert http errors to exceptions
                resp = requests.get(url[0], timeout=10)
                resp.raise_for_status()
            # the things you want to recover from
            except requests.Timeout as e:
                if next(retry_count) <= 5:
                    print("timeout, wait and retry:", e)
                    time.sleep(30)
                    continue
                else:
                    print("timeout, exiting")
                    raise  # reraise exception to exit
            except Exception as e:
                print("unrecoverable error", e)
                raise
            break
        html = resp.text
        etc…
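As noted above, the same retry logic can also be factored into a helper function so the scraping loop stays flat. A rough sketch, where the function name, retry count, and delay are placeholders rather than anything from the original code:

import time

import requests

def fetch_with_retries(target_url, retries=5, delay=30, timeout=10):
    """Return the page HTML, retrying on timeouts; re-raise anything else."""
    for attempt in range(retries):
        try:
            resp = requests.get(target_url, timeout=timeout)
            resp.raise_for_status()
            return resp.text
        except requests.Timeout as e:
            print("timeout on attempt", attempt + 1, ":", e)
            time.sleep(delay)
    # All retries used up: give up and let the caller decide what to do.
    raise requests.Timeout(f"no response from {target_url} after {retries} attempts")

# usage inside the loop:
# html = fetch_with_retries(url[0])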
I've put together a small example to illustrate this, and yes, you can put loops inside try/except blocks.
from sys import exit

def example_func():
    try:
        while True:
            num = input("> ")
            try:
                int(num)
                if num == "10":
                    print("Let's go!")
                else:
                    print("Not 10")
            except ValueError:
                exit(0)
    except:
        exit(0)

example_func()
This is a fairly simple program that takes input and if it's 10, then it says "Let's go!", otherwise it tells you it's not 10 (if it's not a valid value, it just kicks you out).
Notice that inside the while loop I put a try/except block, with the necessary indentation. You can take this program as a model and adapt it to your needs.

Asyncio shared object at the same address does not hold same values

Okay, so I created a DataStream object, which is just a wrapper class around asyncio.Queue. I am passing this around all over, and everything works fine up until the following functions. I am calling ensure_future to run two infinite loops: one that replicates the data in one DataStream object, and one that sends data to a websocket. Here is that code:
def start(self):
    # make sure that we set the event loop before we run our async requests
    print("Starting WebsocketProducer on ", self.host, self.port)
    RUNTIME_LOGGER.info(
        "Starting WebsocketProducer on %s:%i", self.host, self.port)
    # Get the event loop and add a task to it.
    asyncio.set_event_loop(self.loop)
    asyncio.get_event_loop().create_task(self._mirror_stream(self.data_stream))
    asyncio.ensure_future(self._serve(self.ssl_context))
And here is the method that is failing with the error 'Task was destroyed but it is pending!'. Keep in mind, if I do not include the lines with 'data_stream.get()', the function runs fine. I made sure the objects in both locations have the same memory address AND the same value for id(). If I print the data that comes from the await self.data_stream.get(), I get the correct data. However, after that it seems to just return and break. Here is the code:
async def _mirror_stream(self):
    while True:
        stream_length = self.data_stream.length
        try:
            if stream_length > 1:
                for _ in range(0, stream_length):
                    data = await self.data_stream.get()
            else:
                data = await self.data_stream.get()
        except Exception as e:
            print(str(e))
        # If the data is null, keep the last known value
        if self._is_json_serializable(data) and data is not None:
            self.payload = json.dumps(data)
        else:
            RUNTIME_LOGGER.warning(
                "Mirroring stream encountered a Null payload in WebsocketProducer!")
        await asyncio.sleep(self.poll_rate)
The issue has been resolved by implementing my own async Queue on top of the normal queue.Queue object. For some reason the application would only work if I awaited queue.get(), even though it wasn't an asyncio.Queue object... Not entirely sure why this behavior occurred; however, the application is running well and still performs as if the Queue were from the asyncio lib. Thanks to those who looked!
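For anyone attempting something similar, here is a minimal sketch of what such a wrapper might look like. The DataStream name and the length property mirror the snippets above, but the implementation itself is an assumption, not the author's actual code:

import asyncio
import queue

class DataStream:
    """Hypothetical stand-in: an awaitable get() on top of a plain queue.Queue."""

    def __init__(self):
        self._queue = queue.Queue()

    @property
    def length(self):
        return self._queue.qsize()

    def put(self, item):
        self._queue.put(item)

    async def get(self):
        # Run the blocking get() in the default executor so the event loop
        # keeps servicing other tasks while we wait for an item.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self._queue.get)

With run_in_executor, the get() call is awaitable even though the underlying queue is synchronous, which matches the behavior described above.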

How to check if boto3 S3.Client.upload_fileobj succeeded?

I want to save the result of a long running job on S3. The job is implemented in Python, so I'm using boto3. The user guide says to use S3.Client.upload_fileobj for this purpose which works fine, except I can't figure out how to check if the upload has succeeded. According to the documentation, the method doesn't return anything and doesn't raise an error. The Callback param seems to be intended for progress tracking instead of error checking. It is also unclear if the method call is synchronous or asynchronous.
If the upload failed for any reason, I would like to save the contents to the disk and log an error. So my question is: How can I check if a boto3 S3.Client.upload_fileobj call succeeded and do some error handling if it failed?
I use a combination of head_object and wait_until_exists.
import boto3
from botocore.exceptions import ClientError, WaiterError

session = boto3.Session()
s3_client = session.client('s3')
s3_resource = session.resource('s3')

def upload_src(src, filename, bucketName):
    success = False
    try:
        bucket = s3_resource.Bucket(bucketName)
    except ClientError as e:
        bucket = None
    try:
        # In case filename already exists, get current etag to check if the
        # contents change after upload
        head = s3_client.head_object(Bucket=bucketName, Key=filename)
    except ClientError:
        etag = ''
    else:
        etag = head['ETag'].strip('"')
    try:
        s3_obj = bucket.Object(filename)
    except (ClientError, AttributeError):
        s3_obj = None
    try:
        s3_obj.upload_fileobj(src)
    except (ClientError, AttributeError):
        pass
    else:
        try:
            s3_obj.wait_until_exists(IfNoneMatch=etag)
        except WaiterError as e:
            pass
        else:
            head = s3_client.head_object(Bucket=bucketName, Key=filename)
            success = head['ContentLength']
    return success
There is a wait_until_exists() helper function that seems to be for this purpose in the boto3.resource object.
This is how we are using it:
s3_client.upload_fileobj(file, BUCKET_NAME, file_path)
s3_resource.Object(BUCKET_NAME, file_path).wait_until_exists()
I would recommend performing the following operations:
try:
    response = upload_fileobj()
except Exception as e:
    # save the contents to the disk and log an error
if response is None:
    # poll every 10 s using head_object() to check whether the file uploaded successfully:
    #   if head_object() returns a success response: break
    #   if you get an error accessing the object: save the contents to the disk and log an error
So, basically, poll using head_object().
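A runnable sketch of that recommendation might look like this. The function name, bucket/key arguments, and retry limits are placeholders; note that upload_fileobj() always returns None, so success is confirmed purely via head_object():

import time

import boto3
from botocore.exceptions import ClientError

s3_client = boto3.client('s3')

def upload_and_confirm(fileobj, bucket, key, attempts=6, delay=10):
    """Upload, then poll head_object() until the object is visible."""
    try:
        s3_client.upload_fileobj(fileobj, bucket, key)
    except Exception:
        return False  # caller can save the contents to disk and log the error
    for _ in range(attempts):
        try:
            s3_client.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError:
            time.sleep(delay)
    return False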
