Python 3: uploading to an S3 bucket with BytesIO instead of StringIO

I am trying to download a zip file into memory, expand it, and upload its contents to S3.
import boto3
import io
import zipfile
import mimetypes

s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.mydomain.com')
build_bucket = s3.Bucket('servicesbuild.mydomain.com')

build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        service_bucket.upload_fileobj(obj, nm,
                                      ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        service_bucket.Object(nm).Acl().put(ACL='public-read')
Here is the error I get
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-5941e5e45adc> in <module>
18 print(obj)
19 service_bucket.upload_fileobj(obj,nm,
---> 20 ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
21 service_bucket.Object(nm).Acl().put(ACL='public-read')
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in bucket_upload_fileobj(self, Fileobj, Key, ExtraArgs, Callback, Config)
579 return self.meta.client.upload_fileobj(
580 Fileobj=Fileobj, Bucket=self.name, Key=Key, ExtraArgs=ExtraArgs,
--> 581 Callback=Callback, Config=Config)
582
583
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in upload_fileobj(self, Fileobj, Bucket, Key, ExtraArgs, Callback, Config)
537 fileobj=Fileobj, bucket=Bucket, key=Key,
538 extra_args=ExtraArgs, subscribers=subscribers)
--> 539 return future.result()
540
541
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
71 # however if a KeyboardInterrupt is raised we want want to exit
72 # out of this and propogate the exception.
---> 73 return self._coordinator.result()
74 except KeyboardInterrupt as e:
75 self.cancel()
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
231 # final result.
232 if self._exception:
--> 233 raise self._exception
234 return self._result
235
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in _submit(self, client, config, osutil, request_executor, transfer_future, bandwidth_limiter)
547 # Determine the size if it was not provided
548 if transfer_future.meta.size is None:
--> 549 upload_input_manager.provide_transfer_size(transfer_future)
550
551 # Do a multipart upload if needed, otherwise do a regular put object.
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in provide_transfer_size(self, transfer_future)
324 fileobj.seek(0, 2)
325 end_position = fileobj.tell()
--> 326 fileobj.seek(start_position)
327 transfer_future.meta.provide_transfer_size(
328 end_position - start_position)
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
1023 # Position is before the current position. Reset the ZipExtFile
1024
-> 1025 self._fileobj.seek(self._orig_compress_start)
1026 self._running_crc = self._orig_start_crc
1027 self._compress_left = self._orig_compress_size
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
702 def seek(self, offset, whence=0):
703 with self._lock:
--> 704 if self.writing():
705 raise ValueError("Can't reposition in the ZIP file while "
706 "there is an open writing handle on it. "
AttributeError: '_SharedFile' object has no attribute 'writing'
If I comment out the lines after print(obj) to validate the zip file contents,
import boto3
import io
import zipfile
import mimetypes

s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.readspeech.com')
build_bucket = s3.Bucket('servicesbuild.readspeech.com')

build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        # service_bucket.upload_fileobj(obj, nm,
        #                               ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        # service_bucket.Object(nm).Acl().put(ACL='public-read')
I see the following:
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='styles/main.css' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example3.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example1.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example2.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='index.html' mode='r' compress_type=deflate>

The issue appears to be with Python 3.7. I downgraded to Python 3.6 and everything is fine. There is a bug reported against Python 3.7:

The misprint in the file lib/zipfile.py in line 704 leads to AttributeError: '_SharedFile' object has no attribute 'writing'. "self.writing()" should be replaced by "self._writing()". I also think this code should be covered by tests.

So to resolve the issue, use Python 3.6. On macOS you can switch back to Python 3.6 with the following command:
brew switch python 3.6.4_4
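If staying on Python 3.7, a possible workaround (a sketch, assuming each archive member fits comfortably in memory and reusing the buckets from the question's code) is to read every entry into its own BytesIO buffer, so boto3 receives a plain seekable file object instead of a ZipExtFile and the broken seek() is never called:
import io
import mimetypes
import zipfile

# service_zip and service_bucket are the objects created in the question's code.
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        # Read the whole member into memory; BytesIO supports seek(),
        # which s3transfer uses to determine the upload size.
        buf = io.BytesIO(myzip.read(nm))
        service_bucket.upload_fileobj(buf, nm,
                                      ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        service_bucket.Object(nm).Acl().put(ACL='public-read')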

Related

Error when trying to extract zipfile using python (NotImplementedError: That compression method is not supported)

I am trying to unzip several zip files using a python script. However, an error always occurs at this line of code:
import zipfile

with zipfile.ZipFile(FILE_PATH) as zfile:
    zfile.extractall(OUTPUT_FILE_PATH)
The error raised is:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-7-ead33468a516> in <module>
1 with zipfile.ZipFile(FILE_PATH) as zip_ref:
----> 2 zip_ref.extractall(OUTPUT_FILE_PATH)
~\anaconda3\lib\zipfile.py in extractall(self, path, members, pwd)
1645
1646 for zipinfo in members:
-> 1647 self._extract_member(zipinfo, path, pwd)
1648
1649 @classmethod
~\anaconda3\lib\zipfile.py in _extract_member(self, member, targetpath, pwd)
1698 return targetpath
1699
-> 1700 with self.open(member, pwd=pwd) as source, \
1701 open(targetpath, "wb") as target:
1702 shutil.copyfileobj(source, target)
~\anaconda3\lib\zipfile.py in open(self, name, mode, pwd, force_zip64)
1569 pwd = None
1570
-> 1571 return ZipExtFile(zef_file, mode, zinfo, pwd, True)
1572 except:
1573 zef_file.close()
~\anaconda3\lib\zipfile.py in __init__(self, fileobj, mode, zipinfo, pwd, close_fileobj)
817 self._left = zipinfo.file_size
818
--> 819 self._decompressor = _get_decompressor(self._compress_type)
820
821 self._eof = False
~\anaconda3\lib\zipfile.py in _get_decompressor(compress_type)
718
719 def _get_decompressor(compress_type):
--> 720 _check_compression(compress_type)
721 if compress_type == ZIP_STORED:
722 return None
~\anaconda3\lib\zipfile.py in _check_compression(compression)
698 "Compression requires the (missing) lzma module")
699 else:
--> 700 raise NotImplementedError("That compression method is not supported")
701
702
NotImplementedError: That compression method is not supported
I have also noticed that the code works for zip files that are smaller in size, while the error is only raised for larger zip files. I am not sure why this is the case when all the zip files are created using the same method.
What can I do to fix this error?
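One way to narrow this down (a diagnostic sketch reusing the question's FILE_PATH) is to print the compression method of each archive member; the stdlib zipfile module only supports stored (0), deflate (8), bzip2 (12) and lzma (14), and some tools switch to an unsupported method such as Deflate64 (9) for large files, which would explain why only the big archives fail:
import zipfile

# List each member's compression method so unsupported ones stand out.
with zipfile.ZipFile(FILE_PATH) as zfile:
    for info in zfile.infolist():
        print(info.filename, info.compress_type)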

Downloading "Imdb_reviews" from Tensorflow_datasets: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 30 invalid continuation byte

When downloading the "imdb_reviews" dataset I am facing the error below:
'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte
import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
Downloading and preparing dataset imdb_reviews (80.23 MiB) to C:\Users\desig\tensorflow_datasets\imdb_reviews\plain_text\0.1.0...
Dl Completed...:
0/0 [00:00<?, ? url/s]
Dl Size...:
0/0 [00:00<?, ? MiB/s]
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-6-f3ae52bd604b> in <module>
1 import numpy as np
----> 2 datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
3
~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 _check_no_positional(fn, args, ismethod, allowed=allowed)
51 _check_required(fn, kwargs)
---> 52 return fn(*args, **kwargs)
53
54 return disallow_positional_args_dec(wrapped) # pylint: disable=no-value-for-parameter
~\anaconda3\lib\site-packages\tensorflow_datasets\core\registered.py in load(name, split, data_dir, batch_size, in_memory, shuffle_files, download, as_supervised, decoders, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
298 if download:
299 download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 300 dbuilder.download_and_prepare(**download_and_prepare_kwargs)
301
302 if as_dataset_kwargs is None:
~\anaconda3\lib\site-packages\tensorflow_datasets\core\api_utils.py in disallow_positional_args_dec(fn, instance, args, kwargs)
50 _check_no_positional(fn, args, ismethod, allowed=allowed)
51 _check_required(fn, kwargs)
---> 52 return fn(*args, **kwargs)
53
54 return disallow_positional_args_dec(wrapped) # pylint: disable=no-value-for-parameter
~\anaconda3\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
305 self.info.size_in_bytes = dl_manager.downloaded_size
306 # Write DatasetInfo to disk, even if we haven't computed the statistics.
--> 307 self.info.write_to_directory(self._data_dir)
308 self._log_download_done()
309
~\anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
118 if type is None:
119 try:
--> 120 next(self.gen)
121 except StopIteration:
122 return False
~\anaconda3\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py in incomplete_dir(dirname)
198 try:
199 yield tmp_dir
--> 200 tf.io.gfile.rename(tmp_dir, dirname)
201 finally:
202 if tf.io.gfile.exists(tmp_dir):
~\anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py in rename_v2(src, dst, overwrite)
543 errors.OpError: If the operation fails.
544 """
--> 545 _pywrap_file_io.RenameFile(
546 compat.as_bytes(src), compat.as_bytes(dst), overwrite)
547
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 171: invalid continuation byte
Tensorflow version - 2.3.0
numpy version - 1.18.5
python version - 3.8.8
windows10 x64
Does anyone have an idea? Thank you.
My tensorflow version is 2.4.1 and I solved it by updating tfds to 4.5.2, so updating tfds to a newer version may be useful.
(As mentioned by 徐奥博)
Please try again by upgrading the Tensorflow version or tensorflow-datasets as below:
pip install --upgrade tensorflow
pip install --upgrade tensorflow-datasets
import tensorflow_datasets as tfds
datasets, info = tfds.load("imdb_reviews",as_supervised=True, with_info=True)
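After upgrading, a quick check (a sketch, not specific to this dataset) is to confirm which versions the environment actually imports before retrying the download:
import tensorflow as tf
import tensorflow_datasets as tfds

# Print the versions that are actually loaded after the upgrade.
print(tf.__version__)
print(tfds.__version__)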

WinError 126 Error when connecting to HDFS using hdfs3

I am trying to read a file from a work HDFS location using the following code:
import hdfs3
from hdfs3 import HDFileSystem

hdfs = HDFileSystem(host='host', port='port')
with hdfs.open('FILE') as f:
    model_AOB = f.read()
I am getting the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-1-d44f943ebe4e> in <module>()
1 import hdfs3
2 from hdfs3 import HDFileSystem
----> 3 hdfs=HDFileSystem(host='HOST',port=PORT)
4 with hdfs.open('FILE') as f:
5 model_AOB = f.read()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\hdfs3\core.py in __init__(self, host, port, connect, autoconf, pars, **kwargs)
86
87 if connect:
---> 88 self.connect()
89
90 def __getstate__(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\hdfs3\core.py in connect(self)
104 This happens automatically at startup
105 """
--> 106 get_lib()
107 conf = self.conf.copy()
108 if self._handle:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\hdfs3\core.py in get_lib()
668 global _lib
669 if _lib is None:
--> 670 from .lib import _lib as l
671 _lib = l
672
~\AppData\Local\Continuum\anaconda3\lib\site-packages\hdfs3\lib.py in <module>()
15 for name in ['libhdfs3.so', 'libhdfs3.dylib']:
16 try:
---> 17 _lib = ct.cdll.LoadLibrary(name)
18 break
19 except OSError as e:
~\AppData\Local\Continuum\anaconda3\lib\ctypes\__init__.py in LoadLibrary(self, name)
432
433 def LoadLibrary(self, name):
--> 434 return self._dlltype(name)
435
436 cdll = LibraryLoader(CDLL)
~\AppData\Local\Continuum\anaconda3\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
354
355 if handle is None:
--> 356 self._handle = _dlopen(self._name, mode)
357 else:
358 self._handle = handle
OSError: [WinError 126] The specified module could not be found
I have also tried adding the argument pars = {"hadoop.security.authentication": "kerberos"} to the HDFileSystem call, as I believe the Hadoop cluster is kerberized.
Can anyone help with this issue? Apologies for the chunky question; I'm new to Python, so I didn't want to accidentally leave out something relevant in the error.
Thanks
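For reference, here is a diagnostic sketch that mirrors what hdfs3/lib.py is doing in the traceback above: it tries to load the native libhdfs3 library directly with ctypes so the underlying loader error for each candidate name is visible ('libhdfs3.dll' is a hypothetical Windows name added here, not one hdfs3 itself searches for):
import ctypes

# Attempt to load the native libhdfs3 library the same way hdfs3 does
# and print the loader error for each candidate name.
for name in ['libhdfs3.so', 'libhdfs3.dylib', 'libhdfs3.dll']:
    try:
        ctypes.cdll.LoadLibrary(name)
        print('loaded', name)
        break
    except OSError as exc:
        print(name, '->', exc)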

keras model.save() issues RuntimeError: Unable to flush file's cached information

I'm facing some issues with my Databricks cluster configuration, and I'm not able to put a finger on where and why.
I was trying to save a Keras model, and it is not going well.
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

dataset = pd.DataFrame([item.split(',') for item in '''6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1'''.split('\n')])
X = dataset.iloc[:,0:8].astype(float)
y = dataset.iloc[:,8].astype(float)

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=10)
accuracy = model.evaluate(X, y, verbose=0)
print(accuracy)
The issue is with saving the model; can anyone help me understand what the error is all about?
I'm using Python 3.7.3, DBRuntime 6.2 (includes Apache Spark 2.4.4, Scala 2.11).
model.save('/dbfs/FileStore/tables/temp/new_model.h5')
KeyError                                  Traceback (most recent call last)
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_model(model, filepath, overwrite, include_optimizer)
    540         with H5Dict(filepath, mode='w') as h5dict:
--> 541             _serialize_model(model, h5dict, include_optimizer)
    542     elif hasattr(filepath, 'write') and callable(filepath.write):
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in _serialize_model(model, h5dict, include_optimizer)
    160         for name, val in zip(weight_names, weight_values):
--> 161             layer_group[name] = val
    162     if include_optimizer and model.optimizer:
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in __setitem__(self, attr, val)
    230             raise KeyError('Cannot set attribute. '
--> 231                            'Group with name "{}" exists.'.format(attr))
    232         if is_np:
KeyError: 'Cannot set attribute. Group with name "b\'dense_1/kernel:0\'" exists.'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
----> 1 model.save('/dbfs/FileStore/tables/temp/new_model.h5')
/databricks/python/lib/python3.7/site-packages/keras/engine/network.py in save(self, filepath, overwrite, include_optimizer)
   1150             raise NotImplementedError
   1151         from ..models import save_model
-> 1152         save_model(self, filepath, overwrite, include_optimizer)
   1153
   1154     @saving.allow_write_to_gcs
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_wrapper(obj, filepath, overwrite, *args, **kwargs)
    447                 os.remove(tmp_filepath)
    448         else:
--> 449             save_function(obj, filepath, overwrite, *args, **kwargs)
    450
    451     return save_wrapper
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py in save_model(model, filepath, overwrite, include_optimizer)
    539             return
    540         with H5Dict(filepath, mode='w') as h5dict:
--> 541             _serialize_model(model, h5dict, include_optimizer)
    542     elif hasattr(filepath, 'write') and callable(filepath.write):
    543         # write as binary stream
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in __exit__(self, exc_type, exc_val, exc_tb)
    368
    369     def __exit__(self, exc_type, exc_val, exc_tb):
--> 370         self.close()
    371
    372
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py in close(self)
    344     def close(self):
    345         if isinstance(self.data, h5py.Group):
--> 346             self.data.file.flush()
    347         if self._is_file:
    348             self.data.close()
/databricks/python/lib/python3.7/site-packages/h5py/_hl/files.py in flush(self)
    450         """
    451         with phil:
--> 452             h5f.flush(self.id)
    453
    454     @with_phil
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.flush()
RuntimeError: Unable to flush file's cached information (file write failed: time = Fri Jan 31 08:19:53 2020 , filename = '/dbfs/FileStore/tables/temp/new_model.h5', file descriptor = 9, errno = 95, error message = 'Operation not supported', buf = 0x6993c98, total write size = 320, bytes this sub-write = 320, bytes actually written = 18446744073709551615, offset = 800)
I was finally able to save the model, by saving it on the driver only and copying it to S3...
import os
import shutil
import tensorflow as tf
from keras.models import load_model

classification_model.save('news_dedup_model.h5')
shutil.copyfile('/databricks/driver/news_dedup_model.h5',
                '/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5')
classification_model = load_model('/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5',
                                  custom_objects={'tf': tf})
I am still unable to figure out why it wouldn't save normally.
Because keras model.save() doesn't support writing to a FUSE mount; doing so gives the 'Operation not supported' error.
You need to first write the file to the driver node's local disk (where Python's working directory is), then move it to the DBFS FUSE mount using '/dbfs/your/path/on/DBFS'.
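A minimal sketch of that two-step pattern, reusing the model and target path from the question:
import shutil

# 1. Save to the driver's local disk, which h5py can write to normally.
model.save('new_model.h5')

# 2. Move the finished file onto the DBFS FUSE mount.
shutil.move('new_model.h5', '/dbfs/FileStore/tables/temp/new_model.h5')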

Connect to S3 accelerate endpoint with boto3

I want to download a file into a Python file object from an S3 bucket that has acceleration activated. I came across a few resources suggesting either overriding the endpoint_url with "s3-accelerate.amazonaws.com" and/or using the use_accelerate_endpoint attribute.
I have tried both, and several variations, but the same error was returned every time. One of the scripts I tried is:
from botocore.config import Config
import boto3
from io import BytesIO

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id=<MY_KEY_ID>,
    aws_secret_access_key=<MY_KEY>,
    region_name="us-west-2",
    config=Config(s3={"use_accelerate_endpoint": True,
                      "addressing_style": "path"}))

input = BytesIO()
s3.download_fileobj(<MY_BUCKET>, <MY_KEY>, input)
Returns the following error:
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
<ipython-input-61-92b89b45f215> in <module>()
11 "addressing_style": "path"}))
12 input = BytesIO()
---> 13 s3.download_fileobj(bucket, filename, input)
14
15
~/Project/venv/lib/python3.5/site-packages/boto3/s3/inject.py in download_fileobj(self, Bucket, Key, Fileobj, ExtraArgs, Callback, Config)
568 bucket=Bucket, key=Key, fileobj=Fileobj,
569 extra_args=ExtraArgs, subscribers=subscribers)
--> 570 return future.result()
571
572
~/Project//venv/lib/python3.5/site-packages/s3transfer/futures.py in result(self)
71 # however if a KeyboardInterrupt is raised we want want to exit
72 # out of this and propogate the exception.
---> 73 return self._coordinator.result()
74 except KeyboardInterrupt as e:
75 self.cancel()
~/Project/venv/lib/python3.5/site-packages/s3transfer/futures.py in result(self)
231 # final result.
232 if self._exception:
--> 233 raise self._exception
234 return self._result
235
~/Project/venv/lib/python3.5/site-packages/s3transfer/tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task
~/Project/venv/lib/python3.5/site-packages/s3transfer/download.py in _submit(self, client, config, osutil, request_executor, io_executor, transfer_future)
347 Bucket=transfer_future.meta.call_args.bucket,
348 Key=transfer_future.meta.call_args.key,
--> 349 **transfer_future.meta.call_args.extra_args
350 )
351 transfer_future.meta.provide_transfer_size(
~/Project/venv/lib/python3.5/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
310 "%s() only accepts keyword arguments." % py_operation_name)
311 # The "self" in this scope is referring to the BaseClient.
--> 312 return self._make_api_call(operation_name, kwargs)
313
314 _api_call.__name__ = str(py_operation_name)
~/Project/venv/lib/python3.5/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
603 error_code = parsed_response.get("Error", {}).get("Code")
604 error_class = self.exceptions.from_code(error_code)
--> 605 raise error_class(parsed_response, operation_name)
606 else:
607 return parsed_response
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
When I run the same script with "use_accelerate_endpoint": False it works fine.
However, it returned the same error when I overrode the endpoint_url with "s3-accelerate.amazonaws.com" and when I set "addressing_style": "virtual".
When running
s3.get_bucket_accelerate_configuration(Bucket=<MY_BUCKET>)
I get {..., 'Status': 'Enabled'} as expected.
Any idea what is wrong with that code and what I should change to properly query the accelerate endpoint of that bucket?
Using python3.5 with boto3==1.4.7, botocore==1.7.43 on Ubuntu 17.04.
EDIT:
I have also tried a similar script for uploads:
from botocore.config import Config
import boto3
from io import BytesIO

session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id=<MY_KEY_ID>,
    aws_secret_access_key=<MY_KEY>,
    region_name="us-west-2",
    config=Config(s3={"use_accelerate_endpoint": True,
                      "addressing_style": "virtual"}))

output = BytesIO()
output.seek(0)
s3.upload_fileobj(output, <MY_BUCKET>, <MY_KEY>)
Which works without the use_accelerate_endpoint option (so my keys are fine), but returns this error when True:
ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I have tried both addressing_style options here as well (virtual and path).
Using boto3==1.4.7 and botocore==1.7.43.
Here is one way to retrieve an object from a bucket with transfer acceleration enabled.
import boto3
from botocore.config import Config
from io import BytesIO

config = Config(s3={"use_accelerate_endpoint": True})
s3_resource = boto3.resource("s3",
                             aws_access_key_id=<MY_KEY_ID>,
                             aws_secret_access_key=<MY_KEY>,
                             region_name="us-west-2",
                             config=config)
s3_client = s3_resource.meta.client

file_object = BytesIO()
s3_client.download_fileobj(<MY_BUCKET>, <MY_KEY>, file_object)
Note that the client sends a HEAD request to the accelerated endpoint before the GET.
Its canonical request looks somewhat like the following:
CanonicalRequest:
HEAD
/<MY_KEY>
host:<MY_BUCKET>.s3-accelerate.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20200520T204128Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Some reasons why the HEAD request can fail (see the diagnostic sketch after this list) include:
Object with given key doesn't exist or has strict access control enabled
Invalid credentials
Transfer acceleration isn't enabled
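To narrow down which of these applies, one option (a sketch reusing the s3_client built above) is to issue that HEAD request yourself and inspect the error payload:
from botocore.exceptions import ClientError

# Reproduce the HEAD request s3transfer sends before the GET and print
# the error code and message returned by the accelerated endpoint.
try:
    s3_client.head_object(Bucket=<MY_BUCKET>, Key=<MY_KEY>)
except ClientError as exc:
    print(exc.response['Error'])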
