How to check if boto3 S3.Client.upload_fileobj succeeded? - python-3.x

I want to save the result of a long running job on S3. The job is implemented in Python, so I'm using boto3. The user guide says to use S3.Client.upload_fileobj for this purpose which works fine, except I can't figure out how to check if the upload has succeeded. According to the documentation, the method doesn't return anything and doesn't raise an error. The Callback param seems to be intended for progress tracking instead of error checking. It is also unclear if the method call is synchronous or asynchronous.
If the upload failed for any reason, I would like to save the contents to the disk and log an error. So my question is: How can I check if a boto3 S3.Client.upload_fileobj call succeeded and do some error handling if it failed?

I use a combination of head_object and wait_until_exists.
import boto3
from botocore.exceptions import ClientError, WaiterError
session = boto3.Session()
s3_client = session.client('s3')
s3_resource = session.resource('s3')
def upload_src(src, filename, bucketName):
    success = False
    try:
        bucket = s3_resource.Bucket(bucketName)
    except ClientError as e:
        bucket = None
    try:
        # In case filename already exists, get current etag to check if the
        # contents change after upload
        head = s3_client.head_object(Bucket=bucketName, Key=filename)
    except ClientError:
        etag = ''
    else:
        etag = head['ETag'].strip('"')
    try:
        s3_obj = bucket.Object(filename)
    except (ClientError, AttributeError):  # a tuple catches either exception
        s3_obj = None
    try:
        s3_obj.upload_fileobj(src)
    except (ClientError, AttributeError):
        pass
    else:
        try:
            s3_obj.wait_until_exists(IfNoneMatch=etag)
        except WaiterError as e:
            pass
        else:
            head = s3_client.head_object(Bucket=bucketName, Key=filename)
            success = head['ContentLength']
    return success

The boto3 S3 Object resource has a wait_until_exists() helper that seems to be intended for this purpose.
This is how we are using it:
s3_client.upload_fileobj(file, BUCKET_NAME, file_path)
s3_resource.Object(BUCKET_NAME, file_path).wait_until_exists()
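A minimal sketch combining both steps with the client-level waiter, assuming s3_client is a boto3 S3 client such as boto3.client('s3'); upload_and_confirm is a hypothetical helper name. In practice upload_fileobj blocks until the transfer finishes and raises an exception (for example botocore's ClientError) if it fails, so a try/except around it is the first line of defence. The "object_exists" waiter is the client waiter behind Object.wait_until_exists(); its WaiterConfig sets the polling interval in seconds (Delay) and the number of head_object attempts (MaxAttempts).

```python
import logging


def upload_and_confirm(s3_client, fileobj, bucket, key, delay=5, max_attempts=12):
    """Upload fileobj, then wait until S3 reports that the object exists.

    s3_client is assumed to be a boto3 S3 client. Returns True on success and
    False if either the upload or the wait failed.
    """
    try:
        # Blocks until the transfer completes; raises if the upload fails.
        s3_client.upload_fileobj(fileobj, bucket, key)
        # Polls HeadObject until it succeeds or max_attempts is exhausted.
        waiter = s3_client.get_waiter("object_exists")
        waiter.wait(
            Bucket=bucket,
            Key=key,
            WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
        )
    except Exception:  # ClientError, WaiterError, connection errors, ...
        logging.exception("Upload of s3://%s/%s failed", bucket, key)
        return False
    return True
```

On a False result you can fall back to writing the contents to disk, as the question asks.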

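I would recommend performing the following operations:
import io
import logging
import time
import boto3
from botocore.exceptions import ClientError
s3_client = boto3.client('s3')
def upload_or_save(data, bucket, key, local_path, attempts=3):  # illustrative helper
    try:
        # upload_fileobj returns None; failures surface as exceptions
        s3_client.upload_fileobj(io.BytesIO(data), bucket, key)
        for _ in range(attempts):  # poll every 10 s using head_object()
            try:
                s3_client.head_object(Bucket=bucket, Key=key)
                return True  # success response: the object exists
            except ClientError:
                time.sleep(10)
    except Exception:
        pass  # the upload itself raised; fall through to the fallback
    # Upload failed or the object never appeared: log an error, save to disk
    logging.error("Upload of %s to bucket %s failed", key, bucket)
    with open(local_path, 'wb') as f:
        f.write(data)
    return False
So, basically, poll using head_object().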

Related

getting scheduling error when forwarding s3 object to flask response

I'm using the following to pull a data file from an S3-compatible server:
try:
    buf = io.BytesIO()
    client.download_fileobj(project_id, path, buf)
    body = buf.getvalue().decode("utf-8")
except botocore.exceptions.ClientError as e:
    if defaultValue is not None:
        return defaultValue
    else:
        raise S3Error(project_id, path, e) from e
else:
    return body
The code generates this error:
RuntimeError: cannot schedule new futures after interpreter shutdown
I'm simply trying to read a file from an S3-compatible server into the body of a response object. The caller of the above snippet is as follows:
data = read_file(project_id, f"{PATH}/data.csv")
response = Response(
    data,
    mimetype="text/csv",
    headers=[
        ("Content-Type", "application/octet-stream; charset=utf-8"),
        ("Content-Disposition", "attachment; filename=data.csv")
    ],
    direct_passthrough=True
)
While experimenting with the code, whenever I don't get the runtime error, the request hangs and no response is returned.
Thank you to anyone with guidance.
I'm not sure how general this answer is, but when boto is used to access DigitalOcean's S3 implementation, object keys that start with / are not "strictly" permitted. Once I removed the offending leading character, the files downloaded as expected.
I attribute this to boto specifically because I was able to read the same files using Haskell's amazonka library.
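A minimal sketch of that fix, assuming client is a boto3 S3 client configured for the S3-compatible endpoint; normalize_key and read_text are hypothetical helper names. It strips any leading slashes from the key before downloading into an in-memory buffer, the same way the question's read_file does.

```python
import io


def normalize_key(path):
    """Strip leading slashes: '/reports/data.csv' -> 'reports/data.csv'."""
    return path.lstrip("/")


def read_text(client, bucket, path, encoding="utf-8"):
    """Download an object into memory and decode it as text.

    client is assumed to be a boto3 S3 client; download_fileobj(Bucket, Key,
    Fileobj) writes the object's bytes into the file-like buffer.
    """
    buf = io.BytesIO()
    client.download_fileobj(bucket, normalize_key(path), buf)
    return buf.getvalue().decode(encoding)
```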

cannot upload data to s3 through lambda

I'm trying to extract data from Trusted Advisor with a Lambda function and upload it to S3. Part of the function appends the check data to a list, and that block throws an error. That specific block is:
try:
    check_summary = support_client.describe_trusted_advisor_check_summaries(
        checkIds=[checks['id']])['summaries'][0]
    if check_summary['status'] != 'not_available':
        checks_list[checks['category']].append(
            [checks['name'], check_summary['status'],
             str(check_summary['resourcesSummary']['resourcesProcessed']),
             str(check_summary['resourcesSummary']['resourcesFlagged']),
             str(check_summary['resourcesSummary']['resourcesSuppressed']),
             str(check_summary['resourcesSummary']['resourcesIgnored'])
             ])
    else:
        print("unable to append checks")
except:
    print('Failed to get check: ' + checks['name'])
    traceback.print_exc()
The error log shows:
unable to append checks
I'm new to Python, so I'm unsure how to print the traceback stack under the else: statement. Also, am I doing anything wrong in the code above? Please help.
You are not calling the s3_upload function anywhere. The code is also invalid because it uses a file_name variable that is never initialized.
I've looked at your script:
Code placed after a return statement never runs, so call traceback.print_exc() before the return statement; otherwise the error is never printed.
if __name__ == '__main__':
    lambda_handler({}, None)  # e.g. a local test call; the bare name without () does nothing
This block only runs when the file is executed directly, not when it is imported, so it does not run inside Lambda, where the runtime imports the module and calls lambda_handler itself.
According to the documentation, the signature of the put_object method starts like this:
def put_object(self, bucket_name, object_name, data, length,
Fix the parameters you pass to put_object.
You're not using s3_upload in your Lambda function.
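Note that the log line unable to append checks comes from the else branch, not the except branch: it means check_summary['status'] was 'not_available'. That is not an exception, so traceback.print_exc() would have nothing to print there. The sketch below (append_check is a hypothetical helper; checks and check_summary are assumed to be shaped like the dicts in the question, and checks_list a collections.defaultdict(list)) keeps the two cases apart and names the check that had no data.

```python
import traceback
from collections import defaultdict


def append_check(checks, check_summary, checks_list):
    """Append one Trusted Advisor check row to checks_list; return True if appended."""
    try:
        if check_summary['status'] != 'not_available':
            rs = check_summary['resourcesSummary']
            checks_list[checks['category']].append([
                checks['name'], check_summary['status'],
                str(rs['resourcesProcessed']), str(rs['resourcesFlagged']),
                str(rs['resourcesSuppressed']), str(rs['resourcesIgnored']),
            ])
            return True
        # Not an error: Trusted Advisor has no data for this check, so no
        # exception was raised and there is no traceback to print.
        print("No data for check %r (status: not_available)" % checks['name'])
        return False
    except KeyError:
        # A real failure, e.g. the response is missing an expected field.
        print('Failed to get check: ' + checks['name'])
        traceback.print_exc()
        return False


if __name__ == '__main__':
    rows = defaultdict(list)
    append_check({'name': 'demo', 'category': 'security'},
                 {'status': 'not_available'}, rows)
```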

Can an except block of python have 2 conditions simultaneously?

I was trying to learn stock prediction with the help of this GitHub project, but when I ran the main.py file from the repository via cmd, I encountered an error:
File "/Stock-Predictor/src/tweetstream/streamclasses.py", line 101
except urllib2.HTTPError, exception:
^
SyntaxError: invalid syntax
The code below is part of a PyPI module named tweetstream, i.e. tweetstream/streamclasses.py, which gave the error while I was using it in a Twitter sentiment analysis project:
import time
import urllib
import urllib2
import socket
from platform import python_version_tuple
import anyjson
from . import AuthenticationError, ConnectionError, USER_AGENT
class BaseStream(object):
    """A network connection to Twitters streaming API
    :param username: Twitter username for the account accessing the API.
    :param password: Twitter password for the account accessing the API.
    :keyword count: Number of tweets from the past to get before switching to
      live stream.
    :keyword url: Endpoint URL for the object. Note: you should not
      need to edit this. It's present to make testing easier.
    .. attribute:: connected
        True if the object is currently connected to the stream.
    .. attribute:: url
        The URL to which the object is connected
    .. attribute:: starttime
        The timestamp, in seconds since the epoch, the object connected to the
        streaming api.
    .. attribute:: count
        The number of tweets that have been returned by the object.
    .. attribute:: rate
        The rate at which tweets have been returned from the object as a
        float. see also :attr: `rate_period`.
    .. attribute:: rate_period
        The amount of time to sample tweets to calculate tweet rate. By
        default 10 seconds. Changes to this attribute will not be reflected
        until the next time the rate is calculated. The rate of tweets vary
        with time of day etc. so it's useful to set this to something
        sensible.
    .. attribute:: user_agent
        User agent string that will be included in the request. NOTE: This can
        not be changed after the connection has been made. This property must
        thus be set before accessing the iterator. The default is set in
        :attr: `USER_AGENT`.
    """
    def __init__(self, username, password, catchup=None, url=None):
        self._conn = None
        self._rate_ts = None
        self._rate_cnt = 0
        self._username = username
        self._password = password
        self._catchup_count = catchup
        self._iter = self.__iter__()
        self.rate_period = 10  # in seconds
        self.connected = False
        self.starttime = None
        self.count = 0
        self.rate = 0
        self.user_agent = USER_AGENT
        if url: self.url = url
    def __enter__(self):
        return self
    def __exit__(self, *params):
        self.close()
        return False
    def _init_conn(self):
        """Open the connection to the twitter server"""
        headers = {'User-Agent': self.user_agent}
        postdata = self._get_post_data() or {}
        if self._catchup_count:
            postdata["count"] = self._catchup_count
        poststring = urllib.urlencode(postdata) if postdata else None
        req = urllib2.Request(self.url, poststring, headers)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, self.url, self._username, self._password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        opener = urllib2.build_opener(handler)
        try:
            self._conn = opener.open(req)
        except urllib2.HTTPError, exception: #___________________________problem here
            if exception.code == 401:
                raise AuthenticationError("Access denied")
            elif exception.code == 404:
                raise ConnectionError("URL not found: %s" % self.url)
            else: # re raise. No idea what would cause this, so want to know
                raise
        except urllib2.URLError, exception:
            raise ConnectionError(exception.reason)
The second item in the except is an identifier used in the body of the exception to access the exception information. The try/except syntax changed between Python 2 and Python 3 and your code is the Python 2 syntax.
Python 2 (language reference):
try:
    ...
except <expression>, <identifier>:
    ...
Python 3 (language reference, rationale):
try:
    ...
except <expression> as <identifier>:
    ...
Note that <expression> can be a single exception class or a tuple of exception classes, which catches more than one type in a single except clause. So, to answer your title question, you could use the following to handle more than one possible exception:
try:
    x = array[5]  # NameError if array doesn't exist, IndexError if it is too short
except (IndexError, NameError) as e:
    print(e)  # which was it?
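Applied to the failing tweetstream code, a Python 3 version of the exception handling might look like the sketch below (only the handler is shown, not a full port of the module). In Python 3, urllib2 was split into urllib.request and urllib.error, so the exception classes come from urllib.error. AuthenticationError and StreamConnectionError are stand-ins for tweetstream's own exception classes, and open_stream is a hypothetical helper; HTTPError is caught first because it is a subclass of URLError.

```python
import urllib.error


class AuthenticationError(Exception):
    """Stand-in for tweetstream's AuthenticationError."""


class StreamConnectionError(Exception):
    """Stand-in for tweetstream's ConnectionError (renamed so it does not
    shadow Python 3's built-in ConnectionError)."""


def open_stream(opener, req, url):
    try:
        return opener.open(req)
    except urllib.error.HTTPError as exception:  # was: except urllib2.HTTPError, exception:
        if exception.code == 401:
            raise AuthenticationError("Access denied")
        elif exception.code == 404:
            raise StreamConnectionError("URL not found: %s" % url)
        else:  # re-raise anything unexpected
            raise
    except urllib.error.URLError as exception:  # was: except urllib2.URLError, exception:
        raise StreamConnectionError(exception.reason)
```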
Use multiple except clauses:
try:
    ...  # code here
except MyFirstError:
    ...  # exception handling
except AnotherError:
    ...  # exception handling
You can repeat the except clause as many times as you need.
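A runnable version of that pattern, using built-in exceptions in place of the placeholder names MyFirstError and AnotherError (first_as_int is a hypothetical example function). Python checks the except clauses from top to bottom and runs only the first one that matches.

```python
def first_as_int(items):
    """Return items[0] as an int, handling each failure in its own clause."""
    try:
        return int(items[0])
    except IndexError:  # items is empty
        return "empty list"
    except ValueError:  # items[0] is not numeric text
        return "not a number"
    except TypeError:  # items is not subscriptable, e.g. None
        return "not a list"
```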

How to get the processed results from dramatiq python?

import dramatiq
from dramatiq.brokers.redis import RedisBroker
from dramatiq.results import Results
from dramatiq.results.backends import RedisBackend
broker = RedisBroker(host="127.0.0.1", port=6379)
broker.declare_queue("default")
dramatiq.set_broker(broker)
# backend = RedisBackend()
# broker.add_middleware(Results(backend=backend))
@dramatiq.actor()
def print_words(text):
    print('This is ' + text)
print_words('sync')
a = print_words.send('async')
a.get_results()
I was looking for alternatives to Celery and found Dramatiq. I'm just getting started with Dramatiq and I'm unable to retrieve results. I even tried setting the backend and 'save_results' to True, but I always get this error: AttributeError: 'Message' object has no attribute 'get_results'
Any idea on how to get the result?
You were on the right track with adding a result backend. The way to instruct an actor to store results is store_results=True, not save_results and the method to retrieve results is get_result(), not get_results.
When you call get_result() with block=False, it raises ResultMissing if the worker has not stored the result yet, so you have to wait until the result is ready, for example like this:
import time
import dramatiq.results.errors
while True:
    try:
        res = a.get_result(backend=backend)
        break
    except dramatiq.results.errors.ResultMissing:
        # do something like retry N times.
        time.sleep(1)
print(res)

Flask: delete file from server after send_file() is completed

I have a Flask backend which generates an image based on some user input, and sends this image to the client side using the send_file() function of Flask.
This is the Python server code:
@app.route('/image', methods=['POST'])
def generate_image():
    cont = request.get_json()
    t = cont['text']
    print(cont['text'])
    name = pic.create_image(t)  # A different function which generates the image
    time.sleep(0.5)
    return send_file(f"{name}.png", as_attachment=True, mimetype="image/png")
I want to delete this image from the server after it has been sent to the client.
How do I achieve it?
OK, I solved it. I used @app.after_request with an if condition that checks the endpoint, and then deleted the image:
@app.after_request
def delete_image(response):
    global image_name
    if request.endpoint == "generate_image":  # this is the endpoint at which the image gets generated
        os.remove(image_name)
    return response
Another way is to register the cleanup inside the route with the after_this_request decorator, so you do not need to check the endpoint. Just import after_this_request from flask.
from flask import after_this_request
@app.route('/image', methods=['POST'])
def generate_image():
    @after_this_request
    def delete_image(response):
        try:
            # name is assigned below; it is bound by the time this callback runs
            os.remove(f"{name}.png")
        except Exception as ex:
            print(ex)
        return response
    cont = request.get_json()
    t = cont['text']
    print(cont['text'])
    name = pic.create_image(t)  # A different function which generates the image
    time.sleep(0.5)
    return send_file(f"{name}.png", as_attachment=True, mimetype="image/png")
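A self-contained sketch of the after_this_request approach, assuming Flask 2.x or later. create_image and IMAGE_DIR are hypothetical stand-ins for pic.create_image and wherever it writes its files; the file path is captured in a closure so the callback knows exactly which file to remove. On POSIX systems the file can be deleted while send_file still holds it open, so the client still receives the contents; on Windows os.remove may fail while the file is open, which this sketch only logs.

```python
import os
import tempfile

from flask import Flask, after_this_request, request, send_file

app = Flask(__name__)
IMAGE_DIR = tempfile.mkdtemp()  # hypothetical output directory


def create_image(text):
    """Hypothetical stand-in for pic.create_image(); returns the file path."""
    fd, path = tempfile.mkstemp(suffix=".png", dir=IMAGE_DIR)
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode("utf-8"))  # placeholder bytes, not a real PNG
    return path


@app.route("/image", methods=["POST"])
def generate_image():
    path = create_image(request.get_json()["text"])

    @after_this_request
    def delete_image(response):
        # Runs after this view returns, once the response object exists.
        try:
            os.remove(path)
        except OSError as ex:
            app.logger.warning("Could not delete %s: %s", path, ex)
        return response

    return send_file(path, as_attachment=True, mimetype="image/png")
```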
You could have another function delete_image() and call it at the bottom of the generate_image() function
