Unable to use equ.traineddata for tesseract (throws error) but hin, ben, eng work well - python-3.x

I have tesseract installed at /usr/share/tesseract-ocr/ and it is working fine with the tessdata directory at /usr/share/tesseract-ocr/4.0/tessdata. Because equ.traineddata is not shipped with the original data, I downloaded it from the official documentation and managed to place it at /usr/share/tesseract-ocr/4.0/tessdata/equ.traineddata. Along with it, I also copied hin, ben and a few more files. When I use -l eng+hin+ben it works fine, but with equ it throws an error. I used pytesseract too with a few configs such as:
# making a copy of tessdata dir in the home
cli_config = '--oem 1 --psm 12 --tessdata-dir ~/tessdata/ -l eng+equ+ben+hin'
ocr.image_to_string(image=img_path,config=cli_config)
and also
cli_config = '--oem 1 --psm 12' # tessdata is at the default location too
ocr.image_to_string(image=img_path,config=cli_config,lang='eng+equ+hin+ben')
but it keeps throwing an error ONLY for equ, like:
TesseractError Traceback (most recent call last)
<ipython-input-30-8529ae8e51e8> in <module>
----> 1 ocr.image_to_string(image=img_path,config=cli_config,lang='equ')
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in image_to_string(image, lang, config, nice, output_type, timeout)
356 Output.DICT: lambda: {'text': run_and_get_output(*args)},
357 Output.STRING: lambda: run_and_get_output(*args),
--> 358 }[output_type]()
359
360
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in <lambda>()
355 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
356 Output.DICT: lambda: {'text': run_and_get_output(*args)},
--> 357 Output.STRING: lambda: run_and_get_output(*args),
358 }[output_type]()
359
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
264 }
265
--> 266 run_tesseract(**kwargs)
267 filename = kwargs['output_filename_base'] + extsep + extension
268 with open(filename, 'rb') as output_file:
~/anaconda3/envs/py36/lib/python3.6/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
240 with timeout_manager(proc, timeout) as error_string:
241 if proc.returncode:
--> 242 raise TesseractError(proc.returncode, get_errors(error_string))
243
244
TesseractError: (1, 'Error opening data file /home/deshwal/anaconda3/envs/py36/share/tessdata/equ.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'equ\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
What could be the reason for this? How can I use equ.traineddata?

equ is legacy language data, so it cannot be loaded by the LSTM-only engine that --oem 1 selects. You'd need to use an OEM value that enables the legacy engine (for example --oem 0). Run the tesseract --help-extra command to see the available OCR engine modes.
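For example, a minimal sketch of what that might look like with pytesseract (reusing img_path from the question; the --oem 0 value and the tessdata path here are assumptions, and the non-equ traineddata files must also contain legacy components for a combined run to work):
# hypothetical illustration, not the answer's own code: pick an OEM that includes the legacy engine,
# since equ.traineddata has no LSTM component
import pytesseract as ocr
cli_config = '--oem 0 --psm 12 --tessdata-dir /usr/share/tesseract-ocr/4.0/tessdata'
text = ocr.image_to_string(image=img_path, config=cli_config, lang='eng+equ')
print(text)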

Related

Huggingface tokenizer not able to load model after upgrading python to 3.10

I just updated Python to version 3.10.8. Note that I use JupyterLab.
I had to re-install a lot of packages, but now I get an error when I try to load the tokenizer of a HuggingFace model.
This is my code:
# Import libraries
from transformers import pipeline, AutoTokenizer
# Define checkpoint
model_checkpoint = 'deepset/xlm-roberta-large-squad2'
# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
Note that the version of transformers is 4.24.0.
This is the error I get:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In [3], line 2
1 # Tokenizer
----> 2 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
File ~/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:637, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
635 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
636 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 637 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
638 else:
639 if tokenizer_class_py is not None:
File ~/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1777, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
1774 else:
1775 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1777 return cls._from_pretrained(
1778 resolved_vocab_files,
1779 pretrained_model_name_or_path,
1780 init_configuration,
1781 *init_inputs,
1782 use_auth_token=use_auth_token,
1783 cache_dir=cache_dir,
1784 local_files_only=local_files_only,
1785 _commit_hash=commit_hash,
1786 **kwargs,
1787 )
File ~/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1932, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, use_auth_token, cache_dir, local_files_only, _commit_hash, *init_inputs, **kwargs)
1930 # Instantiate tokenizer.
1931 try:
-> 1932 tokenizer = cls(*init_inputs, **init_kwargs)
1933 except OSError:
1934 raise OSError(
1935 "Unable to load vocabulary from file. "
1936 "Please check that the provided vocabulary is accessible and not corrupted."
1937 )
File ~/.local/lib/python3.10/site-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta_fast.py:155, in XLMRobertaTokenizerFast.__init__(self, vocab_file, tokenizer_file, bos_token, eos_token, sep_token, cls_token, unk_token, pad_token, mask_token, **kwargs)
139 def __init__(
140 self,
141 vocab_file=None,
(...)
151 ):
152 # Mask token behave like a normal word, i.e. include the space before it
153 mask_token = AddedToken(mask_token, lstrip=True, rstrip=False) if isinstance(mask_token, str) else mask_token
--> 155 super().__init__(
156 vocab_file,
157 tokenizer_file=tokenizer_file,
158 bos_token=bos_token,
159 eos_token=eos_token,
160 sep_token=sep_token,
161 cls_token=cls_token,
162 unk_token=unk_token,
163 pad_token=pad_token,
164 mask_token=mask_token,
165 **kwargs,
166 )
168 self.vocab_file = vocab_file
169 self.can_save_slow_tokenizer = False if not self.vocab_file else True
File ~/.local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:114, in PreTrainedTokenizerFast.__init__(self, *args, **kwargs)
111 fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
112 elif slow_tokenizer is not None:
113 # We need to convert a slow tokenizer to build the backend
--> 114 fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
115 elif self.slow_tokenizer_class is not None:
116 # We need to create and convert a slow tokenizer to build the backend
117 slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
File ~/.local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:1162, in convert_slow_tokenizer(transformer_tokenizer)
1154 raise ValueError(
1155 f"An instance of tokenizer class {tokenizer_class_name} cannot be converted in a Fast tokenizer instance."
1156 " No converter was found. Currently available slow->fast convertors:"
1157 f" {list(SLOW_TO_FAST_CONVERTERS.keys())}"
1158 )
1160 converter_class = SLOW_TO_FAST_CONVERTERS[tokenizer_class_name]
-> 1162 return converter_class(transformer_tokenizer).converted()
File ~/.local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:438, in SpmConverter.__init__(self, *args)
434 requires_backends(self, "protobuf")
436 super().__init__(*args)
--> 438 from .utils import sentencepiece_model_pb2 as model_pb2
440 m = model_pb2.ModelProto()
441 with open(self.original_tokenizer.vocab_file, "rb") as f:
File ~/.local/lib/python3.10/site-packages/transformers/utils/sentencepiece_model_pb2.py:20
18 from google.protobuf import descriptor as _descriptor
19 from google.protobuf import message as _message
---> 20 from google.protobuf import reflection as _reflection
21 from google.protobuf import symbol_database as _symbol_database
24 # ##protoc_insertion_point(imports)
File /usr/lib/python3/dist-packages/google/protobuf/reflection.py:58
56 from google.protobuf.pyext import cpp_message as message_impl
57 else:
---> 58 from google.protobuf.internal import python_message as message_impl
60 # The type of all Message classes.
61 # Part of the public interface, but normally only used by message factories.
62 GeneratedProtocolMessageType = message_impl.GeneratedProtocolMessageType
File /usr/lib/python3/dist-packages/google/protobuf/internal/python_message.py:69
66 import copyreg as copyreg
68 # We use "as" to avoid name collisions with variables.
---> 69 from google.protobuf.internal import containers
70 from google.protobuf.internal import decoder
71 from google.protobuf.internal import encoder
File /usr/lib/python3/dist-packages/google/protobuf/internal/containers.py:182
177 collections.MutableMapping.register(MutableMapping)
179 else:
180 # In Python 3 we can just use MutableMapping directly, because it defines
181 # __slots__.
--> 182 MutableMapping = collections.MutableMapping
185 class BaseContainer(object):
187 """Base container class."""
AttributeError: module 'collections' has no attribute 'MutableMapping'
I tried several solutions (for example, this and this), but none seem to work.
According to this link, I should change collections.Mapping into collections.abc.Mapping, but I wouldn't know where to do it.
Another possible solution is downgrading Python to 3.9, but I would like to keep it as last resort.
How can I fix this?
It turned out to be a problem with the protobuf module. I updated it to the latest version at the time (4.21.9).
This changed the error to:
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
So I downgraded protobuf to version 3.20.0 and that worked.
For further details, look here.
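A quick way to check which protobuf version is active before and after the downgrade (a small sketch; the 3.20.0 pin is simply the version that worked above):
# check the installed protobuf version from Python
import google.protobuf
print(google.protobuf.__version__)
# transformers' bundled sentencepiece_model_pb2 is generated code that is out of date for
# protobuf 4.21.x, which is what triggers the Descriptors error quoted above
# if this prints 4.21.x, downgrade with: pip install "protobuf==3.20.0"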

Cytoscape: How do you import ABC file types with py2cytoscape's cyrest api?

I have a file of the type:
A B 0.123
A C 0.84
B D 0.52
...
Where the data are tab-separated, the first and second columns are the nodes, and the third is the associated edge weight.
When trying to import this file into cytoscape using py2cytoscape, I'm receiving an error:
from py2cytoscape import cyrest
fileName="/Users/96v/Documents/lco/lcoAllAt25/lcoAll25/lcoAll25_top0.041pct_data/lcoAll25_top0.041pct.txt"
cyclient = cyrest.cyclient()
cyclient.network.import_file(dataTypeList='string,string,double',
                             afile=fileName,
                             delimiters='\t',
                             indexColumnSourceInteraction="0",
                             indexColumnTargetInteraction="1",
                             verbose=True)
'http://localhost:1234/v1/commands/network/import file'
TypeError Traceback (most recent call last)
in
----> 1 cyclient.network.import_file(dataTypeList='string,string,double', afile=fileName, delimiters='\t', indexColumnSourceInteraction="0", indexColumnTargetInteraction="1", defaultInteraction="Edge Attribute",verbose=True)
2
~/opt/anaconda3/lib/python3.8/site-packages/py2cytoscape/cyrest/network.py in import_file(self, dataTypeList, defaultInteraction, delimiters, delimitersForDataList, afile, firstRowAsColumnNames, indexColumnSourceInteraction, indexColumnTargetInteraction, indexColumnTypeInteraction, NetworkViewRendererList, RootNetworkList, startLoadRow, TargetColumnList, verbose)
464 afile,firstRowAsColumnNames,indexColumnSourceInteraction,indexColumnTargetInteraction,
465 indexColumnTypeInteraction,NetworkViewRendererList,RootNetworkList,startLoadRow,TargetColumnList])
--> 466 response=api(url=self.__url+"/import file", PARAMS=PARAMS, method="POST", verbose=verbose)
467 return response
468
~/opt/anaconda3/lib/python3.8/site-packages/py2cytoscape/cyrest/base.py in api(namespace, command, PARAMS, body, host, port, version, method, verbose, url, parse_params)
139 sys.stdout.flush()
140 r = requests.post(url = baseurl, json = PARAMS)
--> 141 verbose_=checkresponse(r, verbose=verbose)
142 if (verbose) or (verbose_):
143 verbose=True
~/opt/anaconda3/lib/python3.8/site-packages/py2cytoscape/cyrest/base.py in checkresponse(r, verbose)
43 if 200 <= status < 300:
44 if verbose:
---> 45 print("response status "+status)
46 sys.stdout.flush()
47 res=None
TypeError: can only concatenate str (not "int") to str
The edge weights aren't being recognized, and the documentation isn't very detailed for this function.
Any help would be extremely appreciated!
After looking further at the GUI, I realized:
Columns are not 0-indexed; they start at 1.
The verbose option has a bug in it (illustrated after the working code below).
The below code works fine:
from py2cytoscape import cyrest
fileName="pathToFile"
cyclient = cyrest.cyclient()
collection = cyclient.network.import_file(dataTypeList='string,string,double',
                                          afile=fileName,
                                          delimiters='\t',
                                          indexColumnSourceInteraction="1",
                                          indexColumnTargetInteraction="2",
                                          defaultInteraction="interacts with")
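About the verbose bug: the TypeError in the traceback comes from py2cytoscape's own checkresponse helper, which concatenates the integer HTTP status code to a string when verbose is on. A minimal illustration of that failure, independent of Cytoscape (hypothetical values):
status = 200  # checkresponse receives the HTTP status as an int
# print("response status " + status)      # TypeError: can only concatenate str (not "int") to str
print("response status " + str(status))   # converting to str first avoids the error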

Some mistakes when I reload "torch.nn.functional"?

I want to modify some functions in "torch.nn.functional", and then use "importlib.reload" to reload the new "torch.nn.functional". But I encounter a RuntimeError: function 'avg_pool2d' already has a docstring.
I am trying to do some work on excitationbp (https://github.com/greydanus/excitationbp), and there is an operation named "eb.use_eb()" that modifies "torch.nn.functional". The function eb.use_eb() modifies "torch.nn.Linear", "torch.nn.functional.conv1d" and so on. I am trying to modify the code, which includes many eb.use_eb() calls, in utils.py. When I want to run the original "torch.nn.Linear", I must call eb.use_eb(False). But the module was already loaded at the top of the file (import torch.nn.functional), so I want to reload the module "torch.nn.functional".
When I use "importlib.reload", I encounter an error.
eb.use_eb(False,True)
importlib.reload(torch)
importlib.reload(torch.nn)
importlib.reload(torch.nn.functional)
print("gradient_evidence:"+str(gradient_evidence))
return torch.autograd.grad(contr_h_, target_h_, grad_outputs=gradient_evidence)[0]
RuntimeError Traceback (most recent call last)
<ipython-input-15-29d0fca29882> in <module>()
----> 1 prob_inputs_one = eb.excitation_backprop(model, inputs, prob_outputs_one, contrastive=2)
2 #pdb.set_trace()
3 prob_inputs_true = eb.excitation_backprop(model, inputs, prob_outputs_true, contrastive=2)
~/code/excitationbp/excitationbp-master/excitationbp/utils.py in excitation_backprop(model, inputs, prob_outputs, contrastive, target_layer)
75 importlib.reload(torch)
76 importlib.reload(torch.nn)
---> 77 importlib.reload(torch.nn.functional)
78
79 gradient_evidence = torch.autograd.grad(top_h_, contr_h_, grad_outputs=prob_outputs.clone())[0]
~/anaconda3/envs/pytorch0.3/lib/python3.6/importlib/__init__.py in reload(module)
164 target = module
165 spec = module.__spec__ = _bootstrap._find_spec(name, pkgpath, target)
--> 166 _bootstrap._exec(spec, module)
167 # The module may have replaced itself in sys.modules!
168 return sys.modules[name]
~/anaconda3/envs/pytorch0.3/lib/python3.6/importlib/_bootstrap.py in _exec(spec, module)
~/anaconda3/envs/pytorch0.3/lib/python3.6/importlib/_bootstrap_external.py in exec_module(self, module)
~/anaconda3/envs/pytorch0.3/lib/python3.6/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)
~/anaconda3/envs/pytorch0.3/lib/python3.6/site-packages/torch/nn/functional.py in <module>()
286 count_include_pad: when True, will include the zero-padding in th
287 averaging calculation. Default: ``True``
--> 288 """)
289
290 avg_pool3d = _add_docstr(torch._C._nn.avg_pool3d, r"""
RuntimeError: function 'avg_pool2d' already has a docstring

Python 3 code uploading to S3 bucket with IO instead of StringIO

I am trying to download the zip file into memory, expand it and upload its contents to S3.
import boto3
import io
import zipfile
import mimetypes
s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.mydomain.com')
build_bucket = s3.Bucket('servicesbuild.mydomain.com')
build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        service_bucket.upload_fileobj(obj, nm,
                                      ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        service_bucket.Object(nm).Acl().put(ACL='public-read')
Here is the error I get
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-5941e5e45adc> in <module>
18 print(obj)
19 service_bucket.upload_fileobj(obj,nm,
---> 20 ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
21 service_bucket.Object(nm).Acl().put(ACL='public-read')
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in bucket_upload_fileobj(self, Fileobj, Key, ExtraArgs, Callback, Config)
579 return self.meta.client.upload_fileobj(
580 Fileobj=Fileobj, Bucket=self.name, Key=Key, ExtraArgs=ExtraArgs,
--> 581 Callback=Callback, Config=Config)
582
583
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in upload_fileobj(self, Fileobj, Bucket, Key, ExtraArgs, Callback, Config)
537 fileobj=Fileobj, bucket=Bucket, key=Key,
538 extra_args=ExtraArgs, subscribers=subscribers)
--> 539 return future.result()
540
541
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
71 # however if a KeyboardInterrupt is raised we want want to exit
72 # out of this and propogate the exception.
---> 73 return self._coordinator.result()
74 except KeyboardInterrupt as e:
75 self.cancel()
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
231 # final result.
232 if self._exception:
--> 233 raise self._exception
234 return self._result
235
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in _submit(self, client, config, osutil, request_executor, transfer_future, bandwidth_limiter)
547 # Determine the size if it was not provided
548 if transfer_future.meta.size is None:
--> 549 upload_input_manager.provide_transfer_size(transfer_future)
550
551 # Do a multipart upload if needed, otherwise do a regular put object.
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in provide_transfer_size(self, transfer_future)
324 fileobj.seek(0, 2)
325 end_position = fileobj.tell()
--> 326 fileobj.seek(start_position)
327 transfer_future.meta.provide_transfer_size(
328 end_position - start_position)
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
1023 # Position is before the current position. Reset the ZipExtFile
1024
-> 1025 self._fileobj.seek(self._orig_compress_start)
1026 self._running_crc = self._orig_start_crc
1027 self._compress_left = self._orig_compress_size
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
702 def seek(self, offset, whence=0):
703 with self._lock:
--> 704 if self.writing():
705 raise ValueError("Can't reposition in the ZIP file while "
706 "there is an open writing handle on it. "
AttributeError: '_SharedFile' object has no attribute 'writing'
If I comment out the lines after print(obj) to validate the zip file content,
import boto3
import io
import zipfile
import mimetypes
s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.readspeech.com')
build_bucket = s3.Bucket('servicesbuild.readspeech.com')
build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        # service_bucket.upload_fileobj(obj, nm,
        #                               ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        # service_bucket.Object(nm).Acl().put(ACL='public-read')
I see the following:
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='styles/main.css' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example3.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example1.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example2.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='index.html' mode='r' compress_type=deflate>
It appears the issue is with Python 3.7. I downgraded to Python 3.6 and everything works fine. There is a bug reported against Python 3.7:
The misprint in the file lib/zipfile.py in line 704 leads to AttributeError: '_SharedFile' object has no attribute 'writing'
"self.writing()" should be replaced by "self._writing()". I also think this code should be covered by tests.
So to resolve the issue, use Python 3.6.
On macOS you can go back to Python 3.6 with the following command:
brew switch python 3.6.4_4
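If downgrading Python isn't an option, a possible workaround (an untested sketch, not part of the original answer, reusing the imports and bucket objects from the question) is to read each archive member fully into memory first, so boto3 gets a seekable file object instead of the ZipExtFile that triggers the buggy seek():
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        # myzip.read(nm) returns bytes; BytesIO wraps them in a seekable file object
        body = io.BytesIO(myzip.read(nm))
        service_bucket.upload_fileobj(body, nm,
                                      ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        service_bucket.Object(nm).Acl().put(ACL='public-read')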

Connect to S3 accelerate endpoint with boto3

I want to download a file into a Python file object from an S3 bucket that has transfer acceleration activated. I came across a few resources suggesting either overwriting the endpoint_url to "s3-accelerate.amazonaws.com" and/or using the use_accelerate_endpoint attribute.
I have tried both, and several variations, but the same error was returned every time. One of the scripts I tried is:
from botocore.config import Config
import boto3
from io import BytesIO
session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id=<MY_KEY_ID>,
    aws_secret_access_key=<MY_KEY>,
    region_name="us-west-2",
    config=Config(s3={"use_accelerate_endpoint": True,
                      "addressing_style": "path"}))
input = BytesIO()
s3.download_fileobj(<MY_BUCKET>,<MY_KEY>, input)
Returns the following error:
---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
<ipython-input-61-92b89b45f215> in <module>()
11 "addressing_style": "path"}))
12 input = BytesIO()
---> 13 s3.download_fileobj(bucket, filename, input)
14
15
~/Project/venv/lib/python3.5/site-packages/boto3/s3/inject.py in download_fileobj(self, Bucket, Key, Fileobj, ExtraArgs, Callback, Config)
568 bucket=Bucket, key=Key, fileobj=Fileobj,
569 extra_args=ExtraArgs, subscribers=subscribers)
--> 570 return future.result()
571
572
~/Project//venv/lib/python3.5/site-packages/s3transfer/futures.py in result(self)
71 # however if a KeyboardInterrupt is raised we want want to exit
72 # out of this and propogate the exception.
---> 73 return self._coordinator.result()
74 except KeyboardInterrupt as e:
75 self.cancel()
~/Project/venv/lib/python3.5/site-packages/s3transfer/futures.py in result(self)
231 # final result.
232 if self._exception:
--> 233 raise self._exception
234 return self._result
235
~/Project/venv/lib/python3.5/site-packages/s3transfer/tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task
~/Project/venv/lib/python3.5/site-packages/s3transfer/download.py in _submit(self, client, config, osutil, request_executor, io_executor, transfer_future)
347 Bucket=transfer_future.meta.call_args.bucket,
348 Key=transfer_future.meta.call_args.key,
--> 349 **transfer_future.meta.call_args.extra_args
350 )
351 transfer_future.meta.provide_transfer_size(
~/Project/venv/lib/python3.5/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
310 "%s() only accepts keyword arguments." % py_operation_name)
311 # The "self" in this scope is referring to the BaseClient.
--> 312 return self._make_api_call(operation_name, kwargs)
313
314 _api_call.__name__ = str(py_operation_name)
~/Project/venv/lib/python3.5/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
603 error_code = parsed_response.get("Error", {}).get("Code")
604 error_class = self.exceptions.from_code(error_code)
--> 605 raise error_class(parsed_response, operation_name)
606 else:
607 return parsed_response
ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
When I run the same script with "use_accelerate_endpoint": False it works fine.
However, it returned the same error when:
I overwrote the endpoint_url with "s3-accelerate.amazonaws.com"
I defined "addressing_style": "virtual"
When running
s3.get_bucket_accelerate_configuration(Bucket=<MY_BUCKET>)
I get {..., 'Status': 'Enabled'} as expected.
Any idea what is wrong with that code and what I should change to properly query the accelerate endpoint of that bucket?
Using python3.5 with boto3==1.4.7, botocore==1.7.43 on Ubuntu 17.04.
EDIT:
I have also tried a similar script for uploads:
from botocore.config import Config
import boto3
from io import BytesIO
session = boto3.session.Session()
s3 = session.client(
    service_name='s3',
    aws_access_key_id=<MY_KEY_ID>,
    aws_secret_access_key=<MY_KEY>,
    region_name="us-west-2",
    config=Config(s3={"use_accelerate_endpoint": True,
                      "addressing_style": "virtual"}))
output = BytesIO()
output.seek(0)
s3.upload_fileobj(output, <MY_BUCKET>,<MY_KEY>)
Which works without the use_accelerate_endpoint option (so my keys are fine), but returns this error when True:
ClientError: An error occurred (SignatureDoesNotMatch) when calling the PutObject operation: The request signature we calculated does not match the signature you provided. Check your key and signing method.
I have tried both addressing_style options here as well (virtual and path).
Using boto3==1.4.7 and botocore==1.7.43.
Here is one way to retrieve an object from a bucket with transfer acceleration enabled.
import boto3
from botocore.config import Config
from io import BytesIO
config = Config(s3={"use_accelerate_endpoint": True})
s3_resource = boto3.resource("s3",
                             aws_access_key_id=<MY_KEY_ID>,
                             aws_secret_access_key=<MY_KEY>,
                             region_name="us-west-2",
                             config=config)
s3_client = s3_resource.meta.client
file_object = BytesIO()
s3_client.download_fileobj(<MY_BUCKET>, <MY_KEY>, file_object)
Note that the client sends a HEAD request to the accelerated endpoint before a GET.
Its canonical request looks something like the following:
CanonicalRequest:
HEAD
/<MY_KEY>
host:<MY_BUCKET>.s3-accelerate.amazonaws.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20200520T204128Z
host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Some reasons why the HEAD request can fail include:
Object with given key doesn't exist or has strict access control enabled
Invalid credentials
Transfer acceleration isn't enabled
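To narrow down which of these applies, one option is to issue the HEAD request yourself and inspect the raw error (a small diagnostic sketch, assuming the s3_client and placeholders from the snippet above):
import botocore.exceptions

try:
    # the transfer manager issues this same HEAD request before the GET
    head = s3_client.head_object(Bucket=<MY_BUCKET>, Key=<MY_KEY>)
    print(head["ContentLength"])
except botocore.exceptions.ClientError as err:
    # a 403 here usually points at the credentials or the object's permissions rather than acceleration itself
    print(err.response["Error"])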
