Problem loading trained scikit-learn/imblearn pipeline model

I have built and trained an imblearn.pipeline Pipeline using imblearn and RandomForestClassifier from scikit-learn.
The model is saved using joblib.dump(model, 'model.joblib').
However, when I try to load the model, it throws an error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-26-d3ee277020d2> in <module>
----> 1 model = joblib.load('model.joblib')
~/SageMaker/custom-miniconda/miniconda/envs/datascience/lib/python3.7/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
583 return load_compatibility(fobj)
584
--> 585 obj = _unpickle(fobj, filename, mmap_mode)
586 return obj
~/SageMaker/custom-miniconda/miniconda/envs/datascience/lib/python3.7/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
502 obj = None
503 try:
--> 504 obj = unpickler.load()
505 if unpickler.compat_mode:
506 warnings.warn("The file '%s' has been generated with a "
~/SageMaker/custom-miniconda/miniconda/envs/datascience/lib/python3.7/pickle.py in load(self)
1086 raise EOFError
1087 assert isinstance(key, bytes_types)
-> 1088 dispatch[key[0]](self)
1089 except _Stop as stopinst:
1090 return stopinst.value
~/SageMaker/custom-miniconda/miniconda/envs/datascience/lib/python3.7/pickle.py in load_global(self)
1374 module = self.readline()[:-1].decode("utf-8")
1375 name = self.readline()[:-1].decode("utf-8")
-> 1376 klass = self.find_class(module, name)
1377 self.append(klass)
1378 dispatch[GLOBAL[0]] = load_global
~/SageMaker/custom-miniconda/miniconda/envs/datascience/lib/python3.7/pickle.py in find_class(self, module, name)
1424 elif module in _compat_pickle.IMPORT_MAPPING:
1425 module = _compat_pickle.IMPORT_MAPPING[module]
-> 1426 __import__(module, level=0)
1427 if self.proto >= 4:
1428 return _getattribute(sys.modules[module], name)[0]
ModuleNotFoundError: No module named 'imblearn.over_sampling._smote.base'; 'imblearn.over_sampling._smote' is not a package
I do have imblearn installed in the conda environment. Not sure why it's not finding imblearn. Any tips will be helpful.

Use
python -m pip install <package name>
to make sure the package is installed for the interpreter you are actually running; maybe that will help.
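The "is not a package" part of the error suggests that imblearn is importable but laid out differently from the version that pickled the model, so a version mismatch between the saving and loading environments is worth ruling out. A minimal sketch, assuming Python 3.8+ (for importlib.metadata) and that the distribution names below are the ones installed; installed_versions is a hypothetical helper, not part of joblib or imblearn:

```python
from importlib import metadata  # standard library from Python 3.8 (assumption)


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to your environment.
print(installed_versions(["imbalanced-learn", "scikit-learn", "joblib"]))
```

Running this in both the training and the loading environment and comparing the output shows whether the imbalanced-learn versions differ; if they do, installing the version used for training is the likely fix.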

Related

"RuntimeError: PytorchStreamReader failed reading zip archive" when loading an already tuned .ckpt model

I want to use a model pre-trained by a friend. However, when I load it:
checkpoint_path = "../Models/ckpt_camembert.ckpt"
py_dict = torch.load(checkpoint_path)
model.load_state_dict(py_dict)
I get:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-19-be1591e802b7> in <module>
1 # If you already finetuned a model you can load it by the following rows
2 checkpoint_path = "../Models/ckpt_camembert.ckpt"
----> 3 py_dict = torch.load(checkpoint_path)
4 model.load_state_dict(py_dict)
c:\Programs\Anaconda\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
703 # reset back to the original position.
704 orig_position = opened_file.tell()
--> 705 with _open_zipfile_reader(opened_file) as opened_zipfile:
706 if _is_torchscript_zip(opened_zipfile):
707 warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
c:\Programs\Anaconda\lib\site-packages\torch\serialization.py in __init__(self, name_or_buffer)
240 class _open_zipfile_reader(_opener):
241 def __init__(self, name_or_buffer) -> None:
--> 242 super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
243
244
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Is it because my .ckpt file wasn't downloaded properly or is corrupted?
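One way to test that suspicion, sketched under the assumption that the checkpoint was written in the zip-based format that torch.save uses by default since PyTorch 1.6: the standard library's zipfile.is_zipfile looks for the end-of-archive record that locates the central directory, so a truncated download should fail the check. The function name describe_checkpoint is hypothetical.

```python
import os
import zipfile


def describe_checkpoint(path):
    """Return (size_in_bytes, is_complete_zip) for a checkpoint file."""
    size = os.path.getsize(path)
    # is_zipfile() searches the end of the file for the record that points at
    # the zip central directory; a truncated archive will not contain it.
    return size, zipfile.is_zipfile(path)


# Example with the path from the question (not executed here):
# size, ok = describe_checkpoint("../Models/ckpt_camembert.ckpt")
```

If the second value is False, or the size is much smaller than the file your friend has, re-downloading or re-copying the checkpoint is the first thing to try.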

Can not find the pytorch model when loading BERT model in Python

I am following this article to find the text similarity.
The code I have is this:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd
documents = [
"Vodafone Wins ₹ 20,000 Crore Tax Arbitration Case Against Government",
"Voda Idea shares jump nearly 15% as Vodafone wins retro tax case in Hague",
"Gold prices today fall for 4th time in 5 days, down ₹6500 from last month high",
"Silver futures slip 0.36% to Rs 59,415 per kg, down over 12% this week",
"Amazon unveils drone that films inside your home. What could go wrong?",
"IPHONE 12 MINI PERFORMANCE MAY DISAPPOINT DUE TO THE APPLE B14 CHIP",
"Delhi Capitals vs Chennai Super Kings: Prithvi Shaw shines as DC beat CSK to post second consecutive win in IPL",
"French Open 2020: Rafael Nadal handed tough draw in bid for record-equaling 20th Grand Slam"
]
model = SentenceTransformer('sentence-transformers/bert-base-nli-mean-tokens')
I get an error when running the above code:
Full:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\tarfile.py in nti(s)
188 s = nts(s, "ascii", "strict")
--> 189 n = int(s.strip() or "0", 8)
190 except ValueError:
ValueError: invalid literal for int() with base 8: 'ld_tenso'
During handling of the above exception, another exception occurred:
InvalidHeaderError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\tarfile.py in next(self)
2298 try:
-> 2299 tarinfo = self.tarinfo.fromtarfile(self)
2300 except EOFHeaderError as e:
~\anaconda3\envs\py3_nlp\lib\tarfile.py in fromtarfile(cls, tarfile)
1092 buf = tarfile.fileobj.read(BLOCKSIZE)
-> 1093 obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
1094 obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
~\anaconda3\envs\py3_nlp\lib\tarfile.py in frombuf(cls, buf, encoding, errors)
1034
-> 1035 chksum = nti(buf[148:156])
1036 if chksum not in calc_chksums(buf):
~\anaconda3\envs\py3_nlp\lib\tarfile.py in nti(s)
190 except ValueError:
--> 191 raise InvalidHeaderError("invalid header")
192 return n
InvalidHeaderError: invalid header
During handling of the above exception, another exception occurred:
ReadError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in _load(f, map_location,
pickle_module, **pickle_load_args)
594 try:
--> 595 return legacy_load(f)
596 except tarfile.TarError:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in legacy_load(f)
505
--> 506 with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as
tar, \
507 mkdtemp() as tmpdir:
~\anaconda3\envs\py3_nlp\lib\tarfile.py in open(cls, name, mode, fileobj, bufsize, **kwargs)
1590 raise CompressionError("unknown compression type %r" % comptype)
-> 1591 return func(name, filemode, fileobj, **kwargs)
1592
~\anaconda3\envs\py3_nlp\lib\tarfile.py in taropen(cls, name, mode, fileobj, **kwargs)
1620 raise ValueError("mode must be 'r', 'a', 'w' or 'x'")
-> 1621 return cls(name, mode, fileobj, **kwargs)
1622
~\anaconda3\envs\py3_nlp\lib\tarfile.py in __init__(self, name, mode, fileobj, format, tarinfo, dereference, ignore_zeros, encoding, errors, pax_headers, debug, errorlevel, copybufsize)
1483 self.firstmember = None
-> 1484 self.firstmember = self.next()
1485
~\anaconda3\envs\py3_nlp\lib\tarfile.py in next(self)
2310 elif self.offset == 0:
-> 2311 raise ReadError(str(e))
2312 except EmptyHeaderError:
ReadError: invalid header
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1210 try:
-> 1211 state_dict = torch.load(resolved_archive_file, map_location="cpu")
1212 except Exception:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
425 pickle_load_args['encoding'] = 'utf-8'
--> 426 return _load(f, map_location, pickle_module, **pickle_load_args)
427 finally:
~\anaconda3\envs\py3_nlp\lib\site-packages\torch\serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
598 # .zip is used for torch.jit.save and will throw an un-pickling error here
--> 599 raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
600 # if not a tarfile, reset file offset and proceed
RuntimeError: C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?)
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-3-bba56aac60aa> in <module>
----> 1 model = SentenceTransformer('sentence-transformers/bert-base-nli-mean-tokens')
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\SentenceTransformer.py in __init__(self, model_name_or_path, modules, device, cache_folder)
88
89 if os.path.exists(os.path.join(model_path, 'modules.json')): #Load as SentenceTransformer model
---> 90 modules = self._load_sbert_model(model_path)
91 else: #Load with AutoModel
92 modules = self._load_auto_model(model_path)
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\SentenceTransformer.py in _load_sbert_model(self, model_path)
820 for module_config in modules_config:
821 module_class = import_from_string(module_config['type'])
--> 822 module = module_class.load(os.path.join(model_path, module_config['path']))
823 modules[module_config['name']] = module
824
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\models\Transformer.py in load(input_path)
122 with open(sbert_config_path) as fIn:
123 config = json.load(fIn)
--> 124 return Transformer(model_name_or_path=input_path, **config)
125
126
~\anaconda3\envs\py3_nlp\lib\site-packages\sentence_transformers\models\Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case, tokenizer_name_or_path)
27
28 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
---> 29 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
30 self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path if tokenizer_name_or_path is not None else model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
31
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\models\auto\auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
393 if type(config) in cls._model_mapping.keys():
394 model_class = _get_model_class(config, cls._model_mapping)
--> 395 return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
396 raise ValueError(
397 f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
~\anaconda3\envs\py3_nlp\lib\site-packages\transformers\modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
1212 except Exception:
1213 raise OSError(
-> 1214 f"Unable to load weights from pytorch checkpoint file for '{pretrained_model_name_or_path}' "
1215 f"at '{resolved_archive_file}'"
1216 "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
OSError: Unable to load weights from pytorch checkpoint file for 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\' at 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
Short:
OSError: Unable to load weights from pytorch checkpoint file for 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens' at 'C:\Users\user1/.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens\pytorch_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
I do have the pytorch_model.bin in the '.cache\torch\sentence_transformers\sentence-transformers_bert-base-nli-mean-tokens' folder.
Why am I getting this error?

The error seems to mean that the pre-trained model weight file is either missing or cannot be loaded.
You can try loading the pre-trained weights directly with transformers:
from transformers import AutoModel
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
Reference: https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens
Also, the model's Hugging Face page says:
This model is deprecated. Please don't use it as it produces sentence embeddings of low quality. You can find recommended sentence embedding models here: SBERT.net - Pretrained Models
You might want to take a look at those models instead.
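The traceback in the question also contains "pytorch_model.bin is a zip archive (did you mean to use torch.jit.load()?)", raised after torch's legacy loader failed. torch.save switched to a zip-based file format by default in PyTorch 1.6, so one possibility worth ruling out is that the installed torch predates that release. A minimal sketch of that check; reads_zip_checkpoints is a hypothetical helper, and the 1.6 threshold is the assumption just stated:

```python
def reads_zip_checkpoints(torch_version):
    """Return True if the version string is 1.6 or newer."""
    # Drop local build tags such as "+cu101", then compare (major, minor).
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) >= (1, 6)


# In the failing environment one would pass torch.__version__ instead.
print(reads_zip_checkpoints("1.5.0+cu101"))  # False
print(reads_zip_checkpoints("1.9.0"))        # True
```

If the check returns False for the installed version, upgrading torch would be the thing to try before changing how the model is loaded.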

You may need to use the model without sentence_transformers.
The following code is tweaked from https://www.sbert.net/examples/applications/computing-embeddings/README.html
As I understand it, from the exception you need to pass from_tf=True to AutoModel.
from transformers import AutoTokenizer, AutoModel
import torch
# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask
# Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']
# Load AutoModel from the Hugging Face model repository
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens')
model = AutoModel.from_pretrained('sentence-transformers/bert-base-nli-mean-tokens', from_tf=True)
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

keras model.save() issues RuntimeError: Unable to flush file's cached information

I'm facing some issues with my Databricks cluster configuration, and I can't put a finger on where or why.
I was trying to save a Keras model, and it is not going well:
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
# Parse the inline CSV sample; cast to float because split() yields strings
dataset = pd.DataFrame([item.split(',') for item in '''6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1'''.split('\n')]).astype(float)
X = dataset.iloc[:,0:8]
y = dataset.iloc[:,8]
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=10)
accuracy = model.evaluate(X, y, verbose=0)
print(accuracy)
The problem is with saving the model. Can anyone help me understand what the error is about?
I'm using Python 3.7.3 on Databricks Runtime 6.2 (includes Apache Spark 2.4.4, Scala 2.11).
model.save('/dbfs/FileStore/tables/temp/new_model.h5')
KeyError Traceback (most recent call
last)
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py
in save_model(model, filepath, overwrite, include_optimizer)
540 with H5Dict(filepath, mode='w') as h5dict:
--> 541 _serialize_model(model, h5dict, include_optimizer)
542 elif hasattr(filepath, 'write') and callable(filepath.write):
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py
in _serialize_model(model, h5dict, include_optimizer)
160 for name, val in zip(weight_names, weight_values):
--> 161 layer_group[name] = val
162 if include_optimizer and model.optimizer:
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py
in setitem(self, attr, val)
230 raise KeyError('Cannot set attribute. '
--> 231 'Group with name "{}" exists.'.format(attr))
232 if is_np:
KeyError: 'Cannot set attribute. Group with name
"b\'dense_1/kernel:0\'" exists.'
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call
last) in
----> 1 model.save('/dbfs/FileStore/tables/temp/new_model.h5')
/databricks/python/lib/python3.7/site-packages/keras/engine/network.py
in save(self, filepath, overwrite, include_optimizer) 1150
raise NotImplementedError 1151 from ..models import
save_model
-> 1152 save_model(self, filepath, overwrite, include_optimizer) 1153 1154 #saving.allow_write_to_gcs
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py
in save_wrapper(obj, filepath, overwrite, *args, **kwargs)
447 os.remove(tmp_filepath)
448 else:
--> 449 save_function(obj, filepath, overwrite, *args, **kwargs)
450
451 return save_wrapper
/databricks/python/lib/python3.7/site-packages/keras/engine/saving.py
in save_model(model, filepath, overwrite, include_optimizer)
539 return
540 with H5Dict(filepath, mode='w') as h5dict:
--> 541 _serialize_model(model, h5dict, include_optimizer)
542 elif hasattr(filepath, 'write') and callable(filepath.write):
543 # write as binary stream
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py
in exit(self, exc_type, exc_val, exc_tb)
368
369 def exit(self, exc_type, exc_val, exc_tb):
--> 370 self.close()
371
372
/databricks/python/lib/python3.7/site-packages/keras/utils/io_utils.py
in close(self)
344 def close(self):
345 if isinstance(self.data, h5py.Group):
--> 346 self.data.file.flush()
347 if self._is_file:
348 self.data.close()
/databricks/python/lib/python3.7/site-packages/h5py/_hl/files.py in
flush(self)
450 """
451 with phil:
--> 452 h5f.flush(self.id)
453
454 #with_phil
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
h5py/h5f.pyx in h5py.h5f.flush()
RuntimeError: Unable to flush file's cached information (file write failed: time = Fri Jan 31 08:19:53 2020, filename = '/dbfs/FileStore/tables/temp/new_model.h5', file descriptor = 9, errno = 95, error message = 'Operation not supported', buf = 0x6993c98, total write size = 320, bytes this sub-write = 320, bytes actually written = 18446744073709551615, offset = 800)

I was finally able to save the model by saving it on the driver only and then copying it to S3:
import shutil
import tensorflow as tf
from keras.models import load_model
# Save to the driver's local working directory first
classification_model.save('news_dedup_model.h5')
# Then copy the finished file to the DBFS mount
shutil.copyfile('/databricks/driver/news_dedup_model.h5', '/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5')
classification_model = load_model('/dbfs/FileStore/tables/temp/nemish/news_dedup_model.h5', custom_objects={'tf': tf})
I still can't figure out why it wouldn't save normally.

This happens because Keras model.save() does not support writing directly to a FUSE mount; doing so fails with the 'Operation not supported' error (errno 95) shown in the traceback.
You need to write the file to the driver node's local disk first (where Python's working directory is), then move it to the DBFS FUSE mount using a path such as '/dbfs/your/path/on/DBFS'.
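The write-locally-then-copy pattern can be wrapped in a small helper. This is only a sketch: save_via_local_disk is a hypothetical function, and it assumes the saver accepts a single file path (as model.save does in the question) and that a plain file copy to the mount succeeds, as it did for the asker.

```python
import os
import shutil
import tempfile


def save_via_local_disk(save_fn, final_path):
    """Call save_fn(local_path) on local disk, then copy the file to final_path."""
    # Keep the extension (e.g. ".h5") so the saver picks the same file format.
    suffix = os.path.splitext(final_path)[1]
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "model" + suffix)
        save_fn(local_path)                      # all writes happen on local disk
        shutil.copyfile(local_path, final_path)  # then one plain copy to the target
    return final_path


# Usage with the model from the question (not executed here):
# save_via_local_disk(model.save, '/dbfs/FileStore/tables/temp/new_model.h5')
```

Using a temporary directory instead of the working directory avoids leaving a stray copy of the model on the driver.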

Google Colab - tensorflow object detection api - 'function' object has no attribute 'called'

I get the following error when I try to run the Object Detection API test model_builder_test.py.
!apt-get install -y -qq protobuf-compiler python-pil python-lxml
!git clone --quiet https://github.com/tensorflow/models.git
import os
os.chdir('models/research')
!protoc object_detection/protos/*.proto --python_out=.
import sys
sys.path.append('/content/models/research/slim')
%run object_detection/builders/model_builder_test.py
The following error appears after running the model_builder_test.py
.W0220 03:22:35.097244 140099951081344 deprecation.py:323] From
/content/models/research/object_detection/anchor_generators/grid_anchor_generator.py:59:
to_float (from tensorflow.python.ops.math_ops) is deprecated and will
be removed in a future version. Instructions for updating: Use tf.cast
instead. .. WARNING: The TensorFlow contrib module will not be
included in TensorFlow 2.0. For more information, please see: *
https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons If you depend on functionality not listed there, please file an issue.
..................s
---------------------------------------------------------------------- Ran 22 tests in 0.203s
OK (skipped=1)
--------------------------------------------------------------------------- AttributeError Traceback (most recent call
last) in ()
----> 1 get_ipython().magic('run object_detection/builders/model_builder_test.py')
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py
in magic(self, arg_s) 2158 magic_name, _, magic_arg_s =
arg_s.partition(' ') 2159 magic_name =
magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2160 return self.run_line_magic(magic_name, magic_arg_s) 2161 2162
-------------------------------------------------------------------------
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py
in run_line_magic(self, magic_name, line) 2079
kwargs['local_ns'] = sys._getframe(stack_depth).f_locals 2080
with self.builtin_trap:
-> 2081 result = fn(*args,**kwargs) 2082 return result 2083
in run(self, parameter_s, runner, file_finder)
/usr/local/lib/python3.6/dist-packages/IPython/core/magic.py in
(f, *a, **k)
186 # but it's overkill for just that one bit of state.
187 def magic_deco(arg):
--> 188 call = lambda f, *a, **k: f(*a, **k)
189
190 if callable(arg):
/usr/local/lib/python3.6/dist-packages/IPython/core/magics/execution.py
in run(self, parameter_s, runner, file_finder)
740 else:
741 # regular execution
--> 742 run()
743
744 if 'i' in opts:
/usr/local/lib/python3.6/dist-packages/IPython/core/magics/execution.py
in run()
726 def run():
727 runner(filename, prog_ns, prog_ns,
--> 728 exit_ignore=exit_ignore)
729
730 if 't' in opts:
/usr/local/lib/python3.6/dist-packages/IPython/core/pylabtools.py in
mpl_execfile(fname, *where, **kw)
175 matplotlib.interactive(is_interactive)
176 # make rendering call now, if the user tried to do it
--> 177 if plt.draw_if_interactive.called:
178 plt.draw()
179 plt.draw_if_interactive.called = False
AttributeError: 'function' object has no attribute 'called'

This is how I overcame the issue:
1. Install prompt-toolkit version 1.0.15, as explained in the link below:
https://github.com/jupyter/jupyter_console/issues/158
2. Restart the runtime so that the package change takes effect.
3. Use '!python' instead of '%run' to run the script.
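The last step ('!python' instead of '%run') runs the test in a separate Python process, so IPython's %run machinery (the mpl_execfile frame in the traceback) is never involved. The same idea as a plain-Python sketch using subprocess; run_script is a hypothetical helper, and the script path in the comment is the one from the question:

```python
import subprocess
import sys


def run_script(path, *args):
    """Run a Python script in a child interpreter and return its exit code."""
    # sys.executable is the interpreter that is running the current kernel.
    completed = subprocess.run([sys.executable, path, *args])
    return completed.returncode


# Example (path from the question, relative to models/research; not executed here):
# run_script("object_detection/builders/model_builder_test.py")
```

An exit code of 0 means the test script finished without errors.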

Python3 code Uploading to S3 bucket with IO instead of String IO

I am trying to download the zip file in memory, expand it and upload it to S3.
import boto3
import io
import zipfile
import mimetypes
s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.mydomain.com')
build_bucket = s3.Bucket('servicesbuild.mydomain.com')
build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        service_bucket.upload_fileobj(obj, nm,
            ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        service_bucket.Object(nm).Acl().put(ACL='public-read')
Here is the error I get
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-5941e5e45adc> in <module>
18 print(obj)
19 service_bucket.upload_fileobj(obj,nm,
---> 20 ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
21 service_bucket.Object(nm).Acl().put(ACL='public-read')
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in bucket_upload_fileobj(self, Fileobj, Key, ExtraArgs, Callback, Config)
579 return self.meta.client.upload_fileobj(
580 Fileobj=Fileobj, Bucket=self.name, Key=Key, ExtraArgs=ExtraArgs,
--> 581 Callback=Callback, Config=Config)
582
583
~/bitbucket/clguru/env/lib/python3.7/site-packages/boto3/s3/inject.py in upload_fileobj(self, Fileobj, Bucket, Key, ExtraArgs, Callback, Config)
537 fileobj=Fileobj, bucket=Bucket, key=Key,
538 extra_args=ExtraArgs, subscribers=subscribers)
--> 539 return future.result()
540
541
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
71 # however if a KeyboardInterrupt is raised we want want to exit
72 # out of this and propogate the exception.
---> 73 return self._coordinator.result()
74 except KeyboardInterrupt as e:
75 self.cancel()
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/futures.py in result(self)
231 # final result.
232 if self._exception:
--> 233 raise self._exception
234 return self._result
235
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/tasks.py in _main(self, transfer_future, **kwargs)
253 # Call the submit method to start submitting tasks to execute the
254 # transfer.
--> 255 self._submit(transfer_future=transfer_future, **kwargs)
256 except BaseException as e:
257 # If there was an exception raised during the submission of task
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in _submit(self, client, config, osutil, request_executor, transfer_future, bandwidth_limiter)
547 # Determine the size if it was not provided
548 if transfer_future.meta.size is None:
--> 549 upload_input_manager.provide_transfer_size(transfer_future)
550
551 # Do a multipart upload if needed, otherwise do a regular put object.
~/bitbucket/clguru/env/lib/python3.7/site-packages/s3transfer/upload.py in provide_transfer_size(self, transfer_future)
324 fileobj.seek(0, 2)
325 end_position = fileobj.tell()
--> 326 fileobj.seek(start_position)
327 transfer_future.meta.provide_transfer_size(
328 end_position - start_position)
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
1023 # Position is before the current position. Reset the ZipExtFile
1024
-> 1025 self._fileobj.seek(self._orig_compress_start)
1026 self._running_crc = self._orig_start_crc
1027 self._compress_left = self._orig_compress_size
/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in seek(self, offset, whence)
702 def seek(self, offset, whence=0):
703 with self._lock:
--> 704 if self.writing():
705 raise ValueError("Can't reposition in the ZIP file while "
706 "there is an open writing handle on it. "
AttributeError: '_SharedFile' object has no attribute 'writing'
If I comment out the lines after print(obj) to validate the zip file content,
import boto3
import io
import zipfile
import mimetypes
s3 = boto3.resource('s3')
service_zip = io.BytesIO()
service_bucket = s3.Bucket('services.readspeech.com')
build_bucket = s3.Bucket('servicesbuild.readspeech.com')
build_bucket.download_fileobj('servicesbuild.zip', service_zip)
with zipfile.ZipFile(service_zip) as myzip:
    for nm in myzip.namelist():
        obj = myzip.open(nm)
        print(obj)
        # service_bucket.upload_fileobj(obj, nm,
        #     ExtraArgs={'ContentType': mimetypes.guess_type(nm)[0]})
        # service_bucket.Object(nm).Acl().put(ACL='public-read')
I see the following:
<zipfile.ZipExtFile name='favicon.ico' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='styles/main.css' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example3.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example1.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='images/example2.png' mode='r' compress_type=deflate>
<zipfile.ZipExtFile name='index.html' mode='r' compress_type=deflate>

The issue appears to be with Python 3.7. I downgraded to Python 3.6 and everything works fine. There is a bug reported against Python 3.7:
The misprint in the file lib/zipfile.py in line 704 leads to AttributeError: '_SharedFile' object has no attribute 'writing'
"self.writing()" should be replaced by "self._writing()". I also think this code should be covered by tests.
So to resolve the issue, use Python 3.6.
On macOS you can go back to Python 3.6 with the following command:
brew switch python 3.6.4_4
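If downgrading is not an option, a possible workaround (not from the bug report, and not tested against S3 here) is to avoid seeking on the ZipExtFile at all. The traceback shows the failure happens when the upload code calls seek() on the zip member, whereas an io.BytesIO copy of the member's bytes supports seek() independently of zipfile. This sketch assumes each member fits in memory; iter_members_as_buffers is a hypothetical helper.

```python
import io
import zipfile


def iter_members_as_buffers(zip_buffer):
    """Yield (name, BytesIO) for every file in an in-memory zip archive."""
    with zipfile.ZipFile(zip_buffer) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries, which carry no data
            # read() decompresses the whole member; BytesIO makes it seekable
            yield name, io.BytesIO(archive.read(name))


# Usage with the buckets from the question (not executed here):
# for name, buf in iter_members_as_buffers(service_zip):
#     service_bucket.upload_fileobj(buf, name)
```

The ExtraArgs and ACL calls from the question can stay as they are; only the object passed to upload_fileobj changes.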
