How to recover keras model weights from bytes? - keras

I have extracted the weights of a specific model from a .pb file. It gives me all the weights packed into a single variable in bytes format, as shown below:
weights = b'\n\x1b\n\t\x08\x01\x12\x05model\n\x0e\x08\x02\x12\nsignatures\n\xe2\x01\n\x18\x08\x03\x12\x14layer_with_weights-0\n\x0b\x08\x03\x12\x07layer-0\n\x0b\x08\x04\x12\x07layer-1\n\x18\x08\x05\x12\x14layer_with_weights-1\n\x0b\x08\x05\x12\x07layer-2\n\r\x08\x06\x12\tvariables\n\x17\x08\x07\x12\x13trainable_variables\n\x19\x08\x08\x12\x15regularization_losses\n\r\x08\t\x12\tkeras_api\n\x0e\x08\n\x12\nsignatures\n#\x08\x0b\x12\x1f_self_saveable_object_factories\n\x00\n\x92R\n\x0b\x08\x0c\x12\x07layer-0\n\x0b\x08\r\x12\x07layer-1\n\x18\x08\x0e\x12\x14layer_with_weights-0\n\x0b\x08\x0e\x12\x07layer-2\n\x0b\x08\x0f\x12\x07layer-3\n\x18\x08\x10\x12\x14layer_with_weights-1\n\x0b\x08\x10\x12\x07layer-4\n\x18\x08\x11\x12\x14layer_with_weights-2\n\x0b\x08\x11\x12\x07layer-5\n\x0b\x08\x12\x12\x07layer-6\n\x18\x08\x13\x12\x14layer_with_weights-3\n\x0b\x08\x13\x12\x07layer-7\n\x18\x08\x14\x12\x14layer_with_weights-4\n\x0b\x08\x14\x12\x07layer-8\n\x0b\x08\x15\x12\x07layer-9\n\x0c\x08\x16\x12\x08layer-10\n\x0c\x08\x17\x12\x08layer-11\n\x18\x08\x18\x12\x14layer_with_weights-5\n\x0c\x08\x18\x12\x08layer-12\n\x18\x08\x19\x12\x14layer_with_weights-6\n\x0c\x08\x19\x12\x08layer-13\n\x0c\x08\x1a\x12\x08layer-14\n\x18\x08\x1b\x12\x14layer_with_weights-7\n\x0c\x08\x1b\x12\x08layer-15\n\x18\x08\x1c\x12\x14layer_with_weights-8\n\x0c\x08\x1c\x12\x08layer-16\n\x18\x08\x1d\x12\x14layer_with_weights-9\n\x0c\x08\x1d\x12\x08layer-17\n\x19\x08\x1e\x12\x15layer_with_weights-10\n\x0c\x08\x1e\x12\x08layer-18\n\x0c\x08\x1f\x12\x08layer-19\n\x0c\x08 \x12\x08layer-20\n\x0c\x08!\x12\x08layer-21\n\x19\x08"\x12\x15layer_with_weights-11\n\x0c\x08"\x12\x08layer-22\n\x19\x08#\x12\x15layer_with_weights-12\n\x0c\x08#\x12\x08layer-23\n\x0c\x08$\x12\x08layer-24\n\x19\x08...
I have tried to convert it to floats using the array module, like this:
import array
arr = array.array('f', weights)
However, I get the following error:
Traceback (most recent call last):
File "/tmp/ipykernel_4441/2375324399.py", line 1, in <module>
arr = array.array('f', value)
ValueError: bytes length not a multiple of item size
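For context: array.array('f', ...) only accepts a byte string whose length is an exact multiple of 4 (one 32-bit float per item), and the readable names in the dump (model, signatures, layer_with_weights-0, ...) suggest this blob is the SavedModel's serialized object graph rather than a flat buffer of weight values. A minimal sketch of both points, assuming the original SavedModel directory is still available (saved_model_dir is a hypothetical path):
import numpy as np
import tensorflow as tf

# Reinterpreting raw bytes as floats only works for a flat float buffer
# whose length is a multiple of 4 bytes.
raw = np.float32([1.0, 2.0, 3.0]).tobytes()      # 12 bytes
floats = np.frombuffer(raw, dtype=np.float32)    # array([1., 2., 3.], dtype=float32)

# For a SavedModel, it is usually easier to load it and ask for the weights
# than to decode the protobuf by hand.
saved_model_dir = 'path/to/saved_model'          # hypothetical path
model = tf.keras.models.load_model(saved_model_dir)
weights = model.get_weights()                    # list of numpy arrays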

Related

OneHotEncoder failing after combining dataframes

I have a model that runs successfully.
When I tried to predict using it, it was failing because, after OneHotEncoding, the test set had more columns than the train set.
After some reading I found that I need to concat the two df's first, OneHotEncode, then split them apart.
Added a 'temp' column to the train data set with value 'train'.
Added a 'temp' column to the test data set with value 'test'.
This is so that I can split the df apart later using boolean indexing like this:
X = temp_df[temp_df['temp'] == 'train']
X2 = temp_df[temp_df['temp'] == 'test']
Vertically concat the two df's.
Verify the shape of the new combined df.
Change all columns to type 'category' except 'temp', which is object:
basin category
region category
lga category
extraction_type_class category
management category
quality_group category
quantity category
source category
waterpoint_type category
cluster category
temp object
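In code, the preparation steps above might look roughly like this (a sketch; train and test are placeholder names for the two original dataframes):
import pandas as pd

train['temp'] = 'train'
test['temp'] = 'test'

# vertically concat the two df's and verify the shape
temp_df = pd.concat([train, test], axis=0, ignore_index=True)
print(temp_df.shape)

# everything except 'temp' becomes a categorical column
cat_cols = [c for c in temp_df.columns if c != 'temp']
temp_df[cat_cols] = temp_df[cat_cols].astype('category')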
Now I am simply trying to OneHotEncode like I did before. I choose only categorical columns:
cat_ix = temp_df.select_dtypes(include=['category']).columns
And I try to apply with:
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)], remainder='passthrough')
temp_df = ct.fit_transform(temp_df)
It fails on the temp_df = ct.fit_transform(temp_df) line.
These identical steps worked perfectly before I added the temp column and concat'd the two df's.
The exact error:
Traceback (most recent call last):
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 778, in _hstack
converted_Xs = [
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 779, in <listcomp>
check_array(X, accept_sparse=True, force_all_finite=False)
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\utils\validation.py", line 738, in check_array
array = np.asarray(array, order=order, dtype=dtype)
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py", line 1993, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'train'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 783, in _hstack
raise ValueError(
ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric.
Why is it complaining about 'train'? That is in the 'temp' column which is being excluded.
Note that the traceback doesn't reference OneHotEncoder; it's all in the ColumnTransformer. You're trying to pass through the temp column, which gets tacked onto the one-hot-encoded sparse matrix in the method _hstack, and the second error message is the more relevant one: it cannot stack a string-type array onto a numeric sparse array (which is what triggers the first error message).
If the sparse matrix isn't too large, you can just force it to be dense by using sparse_threshold=0 in the ColumnTransformer or sparse=False in the OneHotEncoder. If it is too large for memory (or you'd prefer the sparse matrices), you could use a 0/1 indicator for the train/test split instead of the strings "train", "test".
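For example, a sketch of both options, reusing the names from the question:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Option 1: force a dense output so the string 'temp' column can be stacked
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)],
                       remainder='passthrough',
                       sparse_threshold=0)
encoded = ct.fit_transform(temp_df)

# Option 2: keep the output sparse by making the passthrough column numeric
temp_df['temp'] = (temp_df['temp'] == 'train').astype(int)   # 1 = train, 0 = test
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)], remainder='passthrough')
encoded = ct.fit_transform(temp_df)
# the later split then uses the 0/1 indicator instead of the strings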

MemoryError: Unable to allocate GiB for an array with shape and data type float64 - on a sparse matrix

I am working with textual data and have a document-term matrix, represented in a scipy sparse matrix (for memory efficiency).
I have built a class in which I train a topic model (the outcome of the topic model is the matrix prob_word_given_topic).
Currently, I am doing some post analysis on different models, with the following code:
colnames = ['Model', 'Coherence','SVD_values','Min_c0','Max_c0','Min_c1','Max_c1','Min_sv0','Max_sv0','Min_sv1','Max_sv1', 'PWGT']
analysis_two_factors = pd.DataFrame(columns=colnames)
directory = 'C:~/Images/'
#Experiment with: singular values, number of topics, weighting methods
for i, top in enumerate(range(3, 28, 2)):
    for weighting_method in [2, 3, 4, 5, 1]:
        print(type(top))
        one_round = []
        model = FLSA(input_file = data_list,
                     num_topics = top,
                     num_words = 20,
                     word_weighting = weighting_method,
                     svd_factors = 2,
                     cluster_method = 'fcm')
        model.plot_svd_graph_2D(directory)
        model.plot_cluster_datapoints_graph(directory)
        one_round.append(model.setting)
        one_round.append(model.calc_coherence_value)
        one_round.append(model.s)
        one_round.append(min(model.cluster_centers[:,0]))
        one_round.append(max(model.cluster_centers[:,0]))
        one_round.append(min(model.cluster_centers[:,1]))
        one_round.append(min(model.cluster_centers[:,1]))
        one_round.append(min(model.svd_data[:,0]))
        one_round.append(max(model.svd_data[:,0]))
        one_round.append(min(model.svd_data[:,1]))
        one_round.append(min(model.svd_data[:,1]))
        one_round.append(model.prob_word_given_topic)
        analysis_two_factors.loc[i] = one_round
        print('Finished iteration', str(i))
However, at top = 19, I suddenly got the following error:
Traceback (most recent call last):
File "<ipython-input-687-fe7cf1e4ea7a>", line 15, in <module>
cluster_method='fcm')
File "<ipython-input-672-e9c098fb0e45>", line 92, in __init__
prob_word_given_doc = np.asarray(self.sparse_weighted_matrix / self.sparse_weighted_matrix.sum(1))
File "c:~\continuum\anaconda3\lib\site-packages\scipy\sparse\base.py", line 620, in __truediv__
return self._divide(other, true_divide=True)
File "c:~\continuum\anaconda3\lib\site-packages\scipy\sparse\base.py", line 599, in _divide
return np.true_divide(self.todense(), other)
MemoryError: Unable to allocate 2.87 GiB for an array with shape (4280, 90140) and data type float64
This surprises me, as it performed all previous iterations in the loop, and self.sparse_weighted_matrix is a sparse matrix (dok_matrix), so I don't expect such high memory requirements here. Can somebody explain why I get this error? And what can I do to overcome the problem? The offending line is:
prob_word_given_doc = np.asarray(self.sparse_weighted_matrix / self.sparse_weighted_matrix.sum(1))
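The traceback itself shows what happens: scipy's __truediv__ falls back to np.true_divide(self.todense(), other), so the sparse matrix is densified into a (4280, 90140) float64 array before the division, which is the 2.87 GiB allocation. A sketch of a row normalization that stays sparse, under the assumption that this is what the line is meant to compute:
import numpy as np
import scipy.sparse as sp

def normalize_rows_sparse(m):
    # Row-normalize a scipy sparse matrix without ever densifying it.
    m = m.tocsr()
    row_sums = np.asarray(m.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0            # avoid division by zero
    return sp.diags(1.0 / row_sums) @ m      # result is still sparse

# prob_word_given_doc = normalize_rows_sparse(self.sparse_weighted_matrix)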

BERT NER: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

I want to train my BERT NER model on Colab, but the following error occurs.
Code:
tr_logits = tr_logits.detach().cpu().numpy()
tr_label_ids = torch.masked_select(b_labels, (preds_mask == 1))
tr_batch_preds = np.argmax(tr_logits[preds_mask.squeeze()], axis=1)
tr_batch_labels = tr_label_ids.to(device).numpy()
tr_preds.extend(tr_batch_preds)
tr_labels.extend(tr_batch_labels)
Error:
Using TensorFlow backend.
Saved standardized data to ./data/en/combined/train_combined.txt.
Saved standardized data to ./data/en/combined/dev_combined.txt.
Saved standardized data to ./data/en/combined/test_combined.txt.
Constructed SentenceGetter with 25650 examples.
Constructed SentenceGetter with 8934 examples.
Loaded training and validation data into DataLoaders.
Initialized model and moved it to cuda.
Initialized optimizer and set hyperparameters.
Epoch: 0% 0/5 [00:00<?, ?it/s]Starting training loop.
Epoch: 0% 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/content/FYP_Presentation/python/main.py", line 102, in <module>
valid_dataloader,
File "/content/FYP_Presentation/python/utils/main_utils.py", line 431, in train_and_save_model
tr_batch_preds = torch.max(tr_logits[preds_mask.squeeze()], axis=1)
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 412, in __array__
return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
How would I solve this issue?
In the first line of your code, tr_logits = tr_logits.detach().cpu().numpy() already turns tr_logits into a numpy array. In the line that raises the error:
tr_batch_preds = torch.max(tr_logits[preds_mask.squeeze()], axis=1)
the first thing the program does is evaluate tr_logits[preds_mask.squeeze()]. Now that tr_logits is a numpy array, its index preds_mask must also be a numpy array, so numpy calls preds_mask.numpy() to convert it. However, that tensor is still on the GPU, hence the error.
I'd suggest sticking to either numpy arrays or pytorch tensors throughout the program, rather than alternating between them.
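A sketch of one way to untangle it, reusing the variables from the code above and keeping everything as torch tensors until the very end:
# keep logits and mask as CPU torch tensors until the final conversion
tr_logits = tr_logits.detach().cpu()
mask = preds_mask.squeeze().cpu().bool()

tr_batch_preds = torch.argmax(tr_logits[mask], dim=1).numpy()
tr_batch_labels = torch.masked_select(b_labels, preds_mask == 1).cpu().numpy()

tr_preds.extend(tr_batch_preds)
tr_labels.extend(tr_batch_labels)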

cannot reshape array of size 64 into shape (28,28)

Not able to reshape the image in the MNIST dataset using sklearn.
This is the starting portion of my code, just after loading the data:
some_digit = X[880]
some_digit_image = some_digit.reshape(28, 28)
ERROR PART
ValueError Traceback (most recent call last)
<ipython-input-15-4d618bdb57bc> in <module>
1 some_digit = X[880]
----> 2 some_digit_image = some_digit.reshape(28,28)
ValueError: cannot reshape array of size 64 into shape (28,28)
You can only reshape it into an (8, 8) array, since 8 x 8 = 64. (scikit-learn's built-in digits dataset stores 8x8 images, not the 28x28 images of the full MNIST dataset.)
try:
some_digit = X[880]
some_digit_image = some_digit.reshape(8, 8)
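A minimal sketch of the whole round trip, assuming X comes from scikit-learn's load_digits (which stores each image as 64 flattened pixel values):
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()          # 1797 images, each 8x8, flattened to 64 values
X = digits.data

some_digit = X[880]
some_digit_image = some_digit.reshape(8, 8)

plt.imshow(some_digit_image, cmap='gray')
plt.show()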

scikit learn says num samples must be greater than num clusters

Using sklearn.cluster.KMeans. Nearly this exact code worked earlier; all I changed was the way I built my dataset. I just have no idea where to even start... Here's the code:
from sklearn.cluster import KMeans

km = KMeans(n_clusters=20)

for item in dfX:
    if type(item) != type(dfX[0]):
        print(item)

print(len(dfX))
print(dfX[:10])
km.fit(dfX)
print(km.cluster_centers_)
Which outputs the following:
12147
[1.201, 1.237, 1.092, 1.074, 0.979, 0.885, 1.018, 1.083, 1.067, 1.071]
/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Traceback (most recent call last):
File "/home/sbendl/PycharmProjects/MLFP/K-means.py", line 20, in <module>
km.fit(dfX)
File "/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py", line 812, in fit
X = self._check_fit_data(X)
File "/home/sbendl/anaconda3/lib/python3.5/site-packages/sklearn/cluster/k_means_.py", line 789, in _check_fit_data
X.shape[0], self.n_clusters))
ValueError: n_samples=1 should be >= n_clusters=20
Process finished with exit code 1
As you can see from the output, there are definitely 12147 samples, which is greater than 20 in most counting systems ;). Additionally they're all floats, so it couldn't be having a problem with that. Anyone have any ideas?
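The DeprecationWarning in the output is the clue: dfX appears to be a flat 1-D sequence, so scikit-learn interprets it as a single sample with 12147 features rather than 12147 samples with one feature, which is why it reports n_samples=1. A sketch of the reshape it suggests, assuming each value really is a single feature:
import numpy as np
from sklearn.cluster import KMeans

X = np.asarray(dfX).reshape(-1, 1)   # shape (12147, 1): one feature per sample

km = KMeans(n_clusters=20)
km.fit(X)
print(km.cluster_centers_)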
