upgraded sklearn makes my previous OneHotEncoder fail to transform - scikit-learn

I stored one of my previous ML models as a pickle, planning to use it later in production.
Everything worked fine for quite a while. Months later, I upgraded my sklearn, and now when I load the model I get this warning:
> c:\programdata\miniconda3\lib\site-packages\sklearn\base.py:318:
> UserWarning: Trying to unpickle estimator OneHotEncoder from version
> 0.20.1 when using version 0.22.2.post1. This might lead to breaking code or invalid results. Use at your own risk. UserWarning)
When I use it to transform, I get this error:
model_pipeline["ohe"].transform(df)
Error says:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-72436472fbb4> in <module>
----> 1 model_pipeline["ohe"].transform(df_merge[['CATEGORY']][:])
c:\programdata\miniconda3\lib\site-packages\sklearn\preprocessing\_encoders.py in transform(self, X)
392 n_samples, n_features = X_int.shape
393
--> 394 if self.drop is not None:
395 to_drop = self.drop_idx_.reshape(1, -1)
396
AttributeError: 'OneHotEncoder' object has no attribute 'drop'
This model pipeline was very expensive to train. Is there any way for me to fix it without retraining everything? Thanks!

I've also encountered the same problem. In my case it was caused by loading and using an encoder I had created with a prior version of scikit-learn. When I re-created the encoder and saved it again, the problem after loading disappeared.
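If retraining really is off the table, a workaround some people report is to backfill the attributes the newer version expects on the unpickled estimator. OneHotEncoder gained the drop parameter in scikit-learn 0.21, so an encoder pickled under 0.20.x lacks the attribute entirely. A minimal sketch, assuming the encoder was originally fitted without dropping any categories (the pickle filename here is hypothetical):

import pickle

# Load the pipeline that was trained under scikit-learn 0.20.1.
with open("model_pipeline.pkl", "rb") as f:
    model_pipeline = pickle.load(f)

ohe = model_pipeline["ohe"]

# Backfill the attribute introduced in 0.21: with drop set to None,
# transform() skips the drop_idx_ lookup that raised the AttributeError.
if not hasattr(ohe, "drop"):
    ohe.drop = None

This patches only the attribute named in the traceback; other internals may also have changed between versions, so validate the patched encoder's output on known data before trusting it in production.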

Related

Loading PyTorch model doesn't seem to work

I want to load a pretrained model from PyTorch. Specifically, I want to run the SAT-Speaker-with-emotion-grounding (431MB) model from this repo. However, I don't seem to be able to load it. When I download the model and run the script below, I get a dictionary and not the model.
Loading the model:
model_emo = torch.load('best_model.pt', map_location=torch.device('cpu'))
Running the model:
model_emo(image)
The error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-e80ccbe8b6ed> in <module>
----> 1 model_emo(image)
TypeError: 'dict' object is not callable
Now, the docs say that I should instantiate the model class and then load the checkpoint data. However, I don't know what model class this belongs to, and the documents don't say. Does anyone have any advice on how to proceed with this issue? Thanks.
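What you got back from torch.load is a checkpoint dictionary, not a callable module, which is why calling it raises TypeError. The usual pattern is to instantiate the model class defined in the repo and load the weights into it. A sketch under that assumption; ModelClass and the 'state_dict' key are placeholders for whatever the repo actually defines:

import torch

# Inspect the checkpoint first: its keys usually reveal whether it is a raw
# state_dict or a wrapper dict holding the weights plus config/optimizer state.
checkpoint = torch.load('best_model.pt', map_location=torch.device('cpu'))
print(checkpoint.keys())

# Hypothetical names; replace with the class and key the repo uses.
# model_emo = ModelClass(**model_args)
# model_emo.load_state_dict(checkpoint['state_dict'])
# model_emo.eval()
# output = model_emo(image)

If the class isn't documented, look at the repo's training script that saved best_model.pt to see which class it instantiated.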

UnboundLocalError: local variable 'photoshop' referenced before assignment

I am working on a Dog Breed classifier, and I get the following error when I run the training code for my model.
I tried downgrading the Pillow version, but I am still facing the same issue.
The error is raised at the model_scratch line:
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
<ipython-input-12-360fef19693f> in <module>
1 #training the model
2 model_scratch = train(5, loaders_scratch, model_scratch, optimizer_scratch,
----> 3 criterion_scratch)
<ipython-input-11-c90fddb93f0d> in train(n_epochs, loaders, model, optimizer, criterion)
9 #train model
10 model.train()
---> 11 for batch_idx, (data,target) in enumerate(loaders['train']):
12
13 # zero the parameter (weight) gradients
UnboundLocalError: local variable 'photoshop' referenced before assignment
This is a known issue in Pillow 6.0.0. Based on the line-number information in your linked full stack trace, I think your downgrade didn't succeed and you are still using 6.0.0. Either downgrading to 5.4.1 or building from the latest source should fix this problem, although the latter option is probably a little difficult for the average user.
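To confirm which Pillow is actually active, check from inside the same interpreter that runs the training code; a failed downgrade often means pip and the notebook kernel are looking at different environments:

import PIL

# If this still prints 6.0.0, the downgrade did not take effect here.
# Run pip install "Pillow==5.4.1" (or the conda equivalent) in this same
# environment and restart the kernel before re-checking.
print(PIL.__version__)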

Python 3 keras: UnpicklingError: pickle data was truncated for partly downloaded keras cifar10 dataset

I need some help fixing my errors. I tried to load the cifar10 dataset, and it wasn't able to download completely the first time due to an unstable internet connection; subsequently re-running the code with a stable connection gives this error:
UnpicklingError:
Traceback (most recent call last)
<ipython-input-16-9117078ebdb2> in <module>()
1 from keras.datasets import cifar10
----> 2 (x_train, y_train), (x_test, y_test) = cifar10.load_data()
c:\users\keboc\anaconda3\envs\tensorflow_1.8\lib\site-
packages\keras\datasets\cifar10.py in load_data(label_mode)
32
33 fpath = os.path.join(path, 'test_batch')
---> 34 x_test, y_test = load_batch(fpath)
35
36 y_train = np.reshape(y_train, (len(y_train), 1))
c:\users\keboc\anaconda3\envs\tensorflow_1.8\lib\site-packages\keras\datasets\cifar.py in load_batch(fpath, label_key)
25 d = cPickle.load(f, encoding='bytes')
26 # decode utf8
---> 27 d_decoded = {}
28 for k, v in d.items():
29 d_decoded[k.decode('utf8')] = v
UnpicklingError: pickle data was truncated
I loaded the dataset with the code:
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
Please help me fix this.
Thanks a lot.
Remove the file ~/.keras/datasets/cifar-10-batches-py.tar.gz, and possibly the folder ~/.keras/datasets/cifar-10-batches-py if it exists, then try again; it should redownload the file, hopefully succeeding this time.
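A minimal sketch of that cleanup in Python, assuming the default Keras cache location (adjust if your KERAS_HOME points elsewhere):

import os
import shutil

# Delete the truncated archive and any partially extracted folder so that
# keras.datasets.cifar10.load_data() restarts the download from scratch.
cache = os.path.expanduser(os.path.join("~", ".keras", "datasets"))
archive = os.path.join(cache, "cifar-10-batches-py.tar.gz")
folder = os.path.join(cache, "cifar-10-batches-py")

if os.path.exists(archive):
    os.remove(archive)
if os.path.isdir(folder):
    shutil.rmtree(folder)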
I had the same issue. I was trying to run my code on a large multi-GPU cluster (around 100 GPUs). The issue was that each of my runs was trying to download CIFAR from scratch (the download runs on the CPU, not the GPUs), so with that many concurrent runs it was problematic for all of them to download correctly and completely. Finally, I realized that if I downloaded the dataset only once and shared the folder between all of my experiments, I wouldn't face this issue anymore.
That is because your cifar10 dataset was not downloaded completely; you should delete the incomplete data and redownload. You can step through cifar10.load_data() in a debugger to see the cache path on your own machine, then delete the incomplete files and redownload. Alternatively, you can manually place a fully downloaded copy of the cifar10 data in that path.

How to get the cluster ID in Bisecting K-means method in pyspark

I've tried
from numpy import array
from pyspark.mllib.clustering import BisectingKMeans, BisectingKMeansModel
I'm using the iris dataset:
iris_model.transform(iris)
but I get this error:
AttributeError
Traceback (most recent call last)
<ipython-input-241-59b5e8c1e068> in <module>()
----> 1 iris_model.transform(iris)
AttributeError: 'BisectingKMeansModel' object has no attribute 'transform'
I can get the clusterCenters and the array comes back fine, but I need the cluster that each case belongs to.
Thanks
You are probably mixing up the Spark ML and MLlib APIs.
MLlib came first, but developers then started building a new package, ML, which works with DataFrames.
Change your import to pyspark.ml.clustering and you will get the new version, which has a transform function and works with DataFrames and the new ML Pipelines. I suggest you build a Pipeline once you have the algorithm working :)
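A minimal sketch with the DataFrame-based API; iris is assumed to be a DataFrame, and the feature column names here are illustrative:

from pyspark.ml.clustering import BisectingKMeans
from pyspark.ml.feature import VectorAssembler

# Assemble the numeric columns into the single vector column the ML API expects.
assembler = VectorAssembler(
    inputCols=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    outputCol="features",
)
iris_vec = assembler.transform(iris)

bkm = BisectingKMeans(k=3, featuresCol="features", predictionCol="cluster")
iris_model = bkm.fit(iris_vec)

# transform() appends the cluster ID for every row.
iris_model.transform(iris_vec).select("cluster").show(5)

If switching packages isn't an option, the RDD-based BisectingKMeansModel you already have exposes a predict method that maps a point (or an RDD of points) to its cluster ID.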

Undeprecating tensorflow

When making a DNN regressor and predicting the values by
print(list(estimator.predict({"p": np.array([[0.,0.],[1.,0.],[0.,1.],[1.,1.]])})))
this is the output of the console:
WARNING:tensorflow:From "...\tensorflow\contrib\learn\python\learn\estimators\dnn.py":692: calling BaseEstimator.predict (from tensorflow.contrib.learn.python.learn.estimators.estimator) with x is deprecated and will be removed after 2016-12-01.
Instructions for updating:
Estimator is decoupled from Scikit Learn interface by moving into
separate class SKCompat. Arguments x, y and batch_size are only
available in the SKCompat class, Estimator will only accept input_fn.
Example conversion:
est = Estimator(...) -> est = SKCompat(Estimator(...))
So I headed into line 692 of dnn.py, and this is what I found:
preds = super(DNNRegressor, self).predict(
x=x,
input_fn=input_fn,
batch_size=batch_size,
outputs=[key],
as_iterable=as_iterable)
So, following the advice from the warning, and assuming that super(DNNRegressor, self) is an Estimator, I just did
preds = estimator.SKCompat(super(DNNRegressor, self)).predict(...)
But doing that I get
TypeError: predict() got an unexpected keyword argument 'input_fn'
which looks like it is not a TensorFlow error.
The problem is that I don't know how to get rid of the warning (it's a warning, not an error).
This portion of the GitHub tree is under active development. I expect this warning message to go away once the Estimator class is moved into tf.core, which is scheduled for version r1.1. I found the 2017 TensorFlow Dev Summit video by Martin Wicke to be very informative on the future plans of high-level TensorFlow.
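In the meantime, the notice itself shows how to silence the warning: wrap the estimator instead of editing dnn.py. A sketch against the contrib.learn API of that era, using the estimator from the question:

import numpy as np
import tensorflow as tf

# SKCompat keeps the scikit-learn style x/y/batch_size arguments, so
# predict(x=...) no longer routes through the deprecated code path.
est = tf.contrib.learn.SKCompat(estimator)
preds = est.predict(x={"p": np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])})
print(preds)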
