I am writing deep learning code that embeds text with a BERT-based model. I am seeing an unexpected issue in code that was working fine before. Below is the snippet:
sentences = ["person in red riding a motorcycle", "lady cutting cheese with reversed knife"]
# Embed text using a BERT model.
from transformers import DistilBertTokenizer, DistilBertModel

text_tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', cache_dir="cache/")
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
print(text_tokenizer.tokenize(sentences[0]))
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True) # error comes here
The output and error are below:
['person', 'in', 'red', 'riding', 'a', 'motorcycle']
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 92, in <module>
load_data()
File "/Users/amitgh/PycharmProjects/682_image_caption_errors/model/model.py", line 59, in load_data
inputs = text_tokenizer(sentences, return_tensors="pt", padding=True)
TypeError: 'DistilBertTokenizer' object is not callable
As you can see, text_tokenizer.tokenize() works fine. I tried force-downloading the tokenizer and even changing the cache directory, but to no effect.
The code runs fine on another machine (a friend's laptop) and was also working on mine until a while ago, before I installed torchvision and started using the PIL library for the image part. Now it always gives this error.
OS: macOS 11.6, using a Conda environment, python=3.9
This was a rather easy fix. At some point I had removed the pinned transformers version from the environment.yml file, and I ended up on transformers 2.x with python=3.9, which apparently doesn't allow calling the tokenizer directly. I pinned the version again as transformers=4.11.2 and added the conda-forge channel to the yml file. After that I was able to get past this error.
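For reference, here is a quick sanity check; this is a sketch of my own rather than part of the original fix. As far as I recall, tokenizer instances only became callable around transformers 3.0, so on a 2.x install .tokenize() still works while the __call__ form raises the TypeError above.
import transformers
from transformers import DistilBertTokenizer
print(transformers.__version__)  # should report 3.x or later, e.g. the pinned 4.11.2
tok = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
print(tok(["person in red riding a motorcycle"], return_tensors="pt", padding=True))  # callable form, works on 3.x+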
I have a pyproject.toml file from a Proof of Concept that worked beautifully a few months ago, which includes:
[tool.poetry.dependencies]
theano = "^1.0.5"
pymc3 = "^3.9.3"
I copied this over to start extending the Proof of Concept, and when trying to run it I get:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The installed Theano(-PyMC) version (1.0.5) does not match the PyMC3 requirements.
For PyMC3 to work, Theano must be uninstalled and replaced with Theano-PyMC.
See https://github.com/pymc-devs/pymc3/wiki for installation instructions.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
So I followed the instructions: removed theano and added Theano-PyMC. This gave me no big alert at the beginning of the output, but it failed with:
File "mycode.py", line 4, in <module>
import pymc3
File "/home/user/.cache/pypoetry/virtualenvs/project-xNaK0WN7-py3.8/lib/python3.8/site-packages/pymc3/__init__.py", line 34, in <module>
if not semver.match(theano.__version__, ">=1.1.2"):
AttributeError: module 'theano' has no attribute '__version__'
I then tried adding aesara with a just-try-it scattergun mentality and got the same error. I tried adding stock theano again on top of all the above and got the first error again, even with Theano-PyMC still installed; that Traceback ends with:
File "/home/user/.../code.py", line 4, in <module>
import pymc3
File "/home/user/.cache/pypoetry/virtualenvs/project-xNaK0WN7-py3.8/lib/python3.8/site-packages/pymc3/__init__.py", line 50, in <module>
__set_compiler_flags()
File "/home/user/.cache/pypoetry/virtualenvs/project-xNaK0WN7-py3.8/lib/python3.8/site-packages/pymc3/__init__.py", line 46, in __set_compiler_flags
current = theano.config.gcc__cxxflags
AttributeError: 'TheanoConfigParser' object has no attribute 'gcc__cxxflags'
Why would the same pymc3 version that worked fine with theano last time now have a problem with it? The link provided in the alert (https://github.com/pymc-devs/pymc3/wiki) is virtually blank, so not helpful.
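A minimal diagnostic sketch (my own addition, assuming the same virtualenv shown in the tracebacks) is to check the two attributes that the failures above key off; that reveals which Theano flavor is actually being imported:
import theano
print(getattr(theano, "__version__", "no __version__ attribute"))  # pymc3 here expects >= 1.1.2
print(hasattr(theano.config, "gcc__cxxflags"))  # present in newer Theano-PyMC, absent in stock Theano 1.0.x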
I want to use spaCy inside a Python script in the Orange3 tool, but I get the error Can't find model 'en_core_web_sm' when using nlp = spacy.load("en_core_web_sm"). I wonder how I can install this model in Orange3. By the way, I am using Orange3 as a standalone tool, not inside an Anaconda Jupyter notebook.
Error:
Traceback (most recent call last):
File "", line 1, in
File "", line 2, in
File "C:\Users\saif\AppData\Local\Orange\lib\site-packages\spacy\__init__.py", line 30, in load
return util.load_model(name, **overrides)
File "C:\Users\saif\AppData\Local\Orange\lib\site-packages\spacy\util.py", line 169, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Thanks in advance
It appears that spaCy wants you to download its models using its command-line interface; however, if you're working in an environment where all you can do is write scripts and run them (which I assume is the case with Orange3), you can import the function that spaCy uses internally to download and install models and call that instead.
from spacy.cli.download import download
download('en_core_web_sm')
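For example, a short usage sketch (assuming Orange3's Python environment has spaCy available and can reach the internet): after the download call, the model loads normally.
import spacy
from spacy.cli.download import download

download('en_core_web_sm')           # installs the model package into the current environment
nlp = spacy.load('en_core_web_sm')   # now resolves without the E050 error
doc = nlp("Orange3 can now run spaCy pipelines.")
print([token.text for token in doc])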
I'm using the TensorFlow Object Detection API to train my own model, but while running training with this command:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config
I get this error:
WARNING:tensorflow:From C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\platform\app.py:124: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
Traceback (most recent call last):
File "train.py", line 179, in <module>
tf.app.run()
File "C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
_sys.exit(main(argv))
File "C:\Users\MHD\Anaconda3\envs\tf15\lib\site-packages\tensorflow\python\util\deprecation.py", line 136, in new_func
return func(*args, **kwargs)
File "train.py", line 175, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 249, in train
detection_model = create_model_fn()
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 119, in build
return _build_ssd_model(model_config.ssd, is_training, add_summaries)
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 237, in _build_ssd_model
is_training=is_training)
File "C:\tensorflow1\models\research\object_detection\builders\model_builder.py", line 187, in _build_ssd_feature_extractor
if feature_extractor_config.HasField('replace_preprocessor_with_placeholder'):
ValueError: Protocol message SsdFeatureExtractor has no field replace_preprocessor_with_placeholder
please help me guys
Tracing down the cause of this error, I found that the option replace_preprocessor_with_placeholder was added only recently. Here is the commit record. (On that page, if you search for replace_preprocessor_with_placeholder, you will find it was added on March 7th, 2019.)
So the cause of the error is that your proto files' version is not consistent with your code version. If you compare object_detection/protos/ssd.proto on your local machine with the one in the GitHub repo, you will probably find that this line does not exist in your local file (because this field was also added recently!).
The easiest way to fix this error is to reinstall the Object Detection API following this guide.
Since you already have all the packages installed, there are essentially two steps you need to do: install the COCO API and compile the protobufs. A fresh protobuf compilation will fix your error.
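If you want to confirm this before reinstalling, here is a quick check of my own (a sketch, not part of the original answer): inspect the locally generated proto module, and if the field is missing there, your compiled protos are stale.
from object_detection.protos import ssd_pb2
field_names = [f.name for f in ssd_pb2.SsdFeatureExtractor.DESCRIPTOR.fields]
print('replace_preprocessor_with_placeholder' in field_names)  # False means the protos need recompiling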
I also recommend following the latest API tutorial. I see you are running train.py; this file has been moved to the legacy folder and is not recommended, since it may no longer be up to date.
I understand this question has been asked multiple times and answered in different forums, but there are variants of the issue, and mine is probably slightly different.
Let me explain. I am trying to call an Octave function I wrote for plotting, and I don't want to rewrite it in Python. Therefore I did the following:
1. Installed oct2py.
2. Set OCTAVE_EXECUTABLE=c:\Octave\Octave-4.2.1\bin\octave-gui.exe
3. Did the following in the code:
# Import oct2py
from oct2py import octave as oc

oc.addpath("C:\\personal\\learning\\octave-lib")  # folder containing my Octave .m files

# Now call a plotting function written in Octave, displayData.m
oc.displayData(X)  # where X is a NumPy matrix to plot
However, executing the call gives no errors, but it also does not do anything. I see a Windows shell prompt opening and closing, but nothing else.
I also tried replacing octave-gui-4.2.1.exe with octave-cli-4.2.1.exe, following suggestions from some sites, but I got errors that most of the required Windows DLLs were not found.
I started with the advice from the oct2py site, which says to just add the path to the folder containing octave.exe (note that this folder contains all the Octave executables), but that resulted in Windows permission errors. There should be no reason for this, since I am the only user on my Windows laptop and have administrative privileges. I am getting the following errors:
File "mcclassifier.py", line 21, in <module>
from oct2py import octave as oc
File "C:\Users\Sam\Anaconda3\lib\site-packages\oct2py\__init__.py", line
38, in <module>
octave = Oct2Py()
File "C:\Users\Sam\Anaconda3\lib\site-packages\oct2py\core.py", line 73,
in __init__self.restart()
File "C:\Users\Sam\Anaconda3\lib\site-packages\oct2py\core.py", line 508, in restart logger=self.logger)
File "C:\Users\Sam\Anaconda3\lib\site-packages\octave_kernel\kernel.py",
line 144, in __init__
self.repl = self._create_repl()
File "C:\Users\Sam\Anaconda3\lib\site-packages\octave_kernel\kernel.py",
line 338, in _create_repl
version = subprocess.check_output(version_cmd).decode('utf-8')
File "C:\Users\Sam\Anaconda3\lib\subprocess.py", line 336, in check_output
**kwargs).stdout
File "C:\Users\Sam\Anaconda3\lib\subprocess.py", line 403, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\Sam\Anaconda3\lib\subprocess.py", line 707, in __init__
restore_signals, start_new_session)
File "C:\Users\Sam\Anaconda3\lib\subprocess.py", line 990, in
_execute_child startupinfo)
PermissionError: [WinError 5] Access is denied
I tried running it from the Spyder IDE and also from the command line; both show identical behavior. It's been frustrating, so any suggestion to get me past this issue will be a big help!
ADDITIONAL INFORMATION:
Perhaps I was not very clear above: what I am trying to do is execute a plotting function that I implemented in Octave, and which works.
I made some changes to my Python code to see if I could instantiate the Oct2Py class and then call the feval function. I commented out the previous lines of code above and added the following:
import oct2py

octave = oct2py.Oct2Py()
octave.feval('C:\\personal\\learning\\octave-lib\\displayData', Xmat, timeout=80)
I can see from the Windows taskbar that octave-gui.exe is being invoked and is running in the background, but it is still not plotting, and there is no error.
How do I make it run as a foreground process and render the plot? What I want to do is similar to the oct2py demo example at:
http://blink1073.github.io/oct2py/source/demo.html
As you can see there, oc.plot([1,2,3],...) renders the plot.
I will greatly appreciate any help here.
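One thing worth trying (my own sketch, not a confirmed fix): since the Octave process runs non-interactively, the figure may simply be created hidden and never flushed. Making new figures visible by default and flushing the graphics queue through eval() is one way to test that.
import numpy as np
from oct2py import Oct2Py

oc = Oct2Py()
oc.addpath("C:\\personal\\learning\\octave-lib")
X = np.random.rand(100, 400)                        # placeholder data; the real X comes from the model
oc.eval("set(0, 'defaultfigurevisible', 'on');")    # make new figures visible by default
oc.feval("displayData", X)
oc.eval("drawnow; waitforbuttonpress;")             # flush graphics and keep the window open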
I am trying to load a .pkl file (on a Windows machine) using joblib.
My code is:
from sklearn.externals import joblib
output = joblib.load("file.pkl")
I get this error:
File "cleaning.py", line 97, in <module>
output = joblib.load('file.pkl')
File "C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py", line 459, in load
obj = unpickler.load()
File "C:\Users\me\AppData\Local\Continuum\Anaconda3\lib\pickle.py", line 1039, in load
dispatch[key[0]](self)
KeyError: 120
I also tried using pickle, this way:
import pickle
with open('file.pkl', 'r') as input:
output = pickle.load(input)
But I got this other error:
File "cleaning.py", line 94, in <module>
output = pickle.load(input)
_pickle.UnpicklingError: invalid load key, 'x'.
Could anyone help me?
I have already searched on Stack Overflow, but I didn't find any solution that works for me.
Thanks
Try upgrading scikit-learn to 0.18.1:
pip install scikit-learn==0.18.1
This worked for me after upgrading from the default Anaconda version (0.17).
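As a quick check afterwards (a sketch I am adding, assuming the pickle was produced under 0.18.x), confirm the installed version before retrying the load:
import sklearn
from sklearn.externals import joblib

print(sklearn.__version__)        # expect 0.18.1 after the upgrade
output = joblib.load("file.pkl")  # should now unpickle without the KeyError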