padding in tf.data.Dataset in tensorflow - python-3.x

Code:
a=training_dataset.map(lambda x,y: (tf.pad(x,tf.constant([[13-int(tf.shape(x)[0]),0],[0,0]])),y))
gives the following error:
TypeError: in user code:
<ipython-input-32-b25101c2110a>:1 None *
a=training_dataset.map(lambda x,y: (tf.pad(tensor=x,paddings=tf.constant([[13-int(tf.shape(x)[0]),0],[0,0]]),mode="CONSTANT"),y))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:264 constant **
allow_broadcast=True)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:282 _constant_impl
allow_broadcast=allow_broadcast))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:456 make_tensor_proto
_AssertCompatible(values, dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:333 _AssertCompatible
raise TypeError("Expected any non-tensor type, got a tensor instead.")
TypeError: Expected any non-tensor type, got a tensor instead.
However, when I use:
a=training_dataset.map(lambda x,y: (tf.pad(x,tf.constant([[1,0],[0,0]])),y))
The above code works fine.
This brings me to the conclusion that something is wrong with 13 - tf.shape(x)[0], but I cannot understand what.
I tried converting tf.shape(x)[0] to int(tf.shape(x)[0]) and still got the same error.
What I want the code to do:
I have a tf.data.Dataset object containing variable-length sequences of shape (None, 128), where the first dimension (None) is always less than 13. I want to pad the sequences so that every element has shape (13, 128).
Is there any alternate way (if the above problem cannot be solved)?

A solution that works:
using:
paddings = tf.concat(([[13-tf.shape(x)[0],0]], [[0,0]]), axis=0)
instead of using:
paddings = tf.constant([[13-tf.shape(x)[0],0],[0,0]])
works for me.
However, I still cannot figure out why the latter one did not work.
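The underlying reason is that tf.constant builds its value at graph-construction time from plain Python or NumPy data, so it rejects the symbolic tensor 13 - tf.shape(x)[0] with "Expected any non-tensor type, got a tensor instead", whereas tf.concat (like tf.stack) accepts tensors and assembles the paddings dynamically. A minimal runnable sketch of the working approach (the toy dataset below is a stand-in for the question's data):

import numpy as np
import tensorflow as tf

# stand-in for the question's dataset: (sequence, label) pairs with
# sequences of shape (n, 128), n < 13
training_dataset = tf.data.Dataset.from_generator(
    lambda: ((np.ones((n, 128), np.float32), 0) for n in (5, 9, 12)),
    output_signature=(
        tf.TensorSpec(shape=(None, 128), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
)

# paddings is assembled with tf.concat, so the symbolic 13 - tf.shape(x)[0] is allowed
a = training_dataset.map(
    lambda x, y: (tf.pad(x, tf.concat(([[13 - tf.shape(x)[0], 0]], [[0, 0]]), axis=0)), y)
)

for x, y in a:
    print(x.shape)  # (13, 128) every time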

Related

ValueError: cannot reshape array of size 14333830 into shape (14130,1,1286), how do I solve this?

Getting this error in python:
ValueError: cannot reshape array of size 14333830 into shape (14130,1,1286),
How do I solve this?
This is the code generating the error:
data_train1=data_train.reshape(14130,1,1286)
When reshaping, the new shape must contain exactly as many elements as the old one. Multiplying 14130 * 1 * 1286 gives 18171180, which is obviously not the same as 14333830, so this reshape is impossible; you must choose a target shape whose product equals the array's size.
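A small sketch (the array here is a placeholder for the real data) of how to verify that a reshape is possible and let NumPy infer the last dimension:

import numpy as np

data_train = np.zeros(14333830)  # placeholder for the real array

n_rows = 14130
if data_train.size % n_rows == 0:
    data_train1 = data_train.reshape(n_rows, 1, -1)  # -1 infers the last axis
else:
    print(f"size {data_train.size} is not divisible by {n_rows}; pick another shape")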

Can you please help? 'tf.vectorized_map' shape output error

I am trying to iterate over two arrays, both with batch size 32, using nested tf.vectorized_map calls, one for each array, but they have different shapes: (32, 12-1, 4) and (32, 1024, 4).
I am doing this inside model.fit/model.train_on_batch, inside training_step, inside a nested tf.function, and I get this error:
From tf.vectorized_map: Input to reshape is a tensor with 98304 values, but the requested shape has 288.
My attempts to fix it, and the original:
IoUs = tf.vectorized_map(lambda batch: tf.vectorized_map(lambda ypredValues: tfytrue(ytrue[batch], ypredValues), ypred[batch]), tf.range(32))
IoUs = tf.vectorized_map(lambda batch: tf.vectorized_map(lambda ypredValues: tfytrue(batch[0], ypredValues), batch[1]), (ytrue, ypred))
It works when batch index is the same:
IoUs = tf.vectorized_map(lambda batch: tf.vectorized_map(lambda ypredValues: tfytrue(batch[1], ypredValues), batch[1]), (ytrue, ypred))
But that is pointless, because then I can't map over the batch dimension in parallel while using both arrays; maybe the problem is inside the other vectorized_maps, but if anyone can help, that would be great. The reason I use tf.vectorized_map is that it is roughly 3000x faster than map_fn, and I could not fall back to NumPy because tf.py_func and tf.numpy_function returned invalid placeholder values.
It does work with elems=(ytrue, ypred), just not inside model.fit or model.train_on_batch.
RegLoss = tfIoU(batch[1], self.RPN(batch[0])[0])
works great, but inside train_on_batch/model.fit, inside my custom train_step, tfIoU(ytrue, ypred) gives:
Input to reshape is a tensor with 393216 values, but the requested shape has 4608
No explicit reshaping happens in my code, and the error traces back to the batch-level vectorized_map.
My question: is it that, because the first call works, the shape gets baked into the tf.vectorized_map trace, so the second call with a different shape fails? (32, 12, 4) works, then (32, 8, 4) fails; reverse the order and the same thing happens. Can anyone help?
It was the tf.function(); it can only be applied to the innermost nested function, because it pins down shapes on the first trace, even with experimental_relax_shapes=True.
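For reference, here is a minimal runnable sketch of the elems=(ytrue, ypred) pattern (tfytrue below is a stand-in for the asker's function, not their actual code): tf.vectorized_map unstacks both tensors along the batch axis and hands each call a matching pair.

import tensorflow as tf

ytrue = tf.random.uniform((32, 12, 4))
ypred = tf.random.uniform((32, 1024, 4))

# stand-in for the question's tfytrue: scores one predicted row against all true rows
def tfytrue(yt, row):
    return tf.reduce_max(tf.reduce_sum(yt * row[tf.newaxis, :], axis=-1))

def per_batch(pair):
    yt, yp = pair  # yt: (12, 4), yp: (1024, 4) for one batch element
    return tf.vectorized_map(lambda row: tfytrue(yt, row), yp)

IoUs = tf.vectorized_map(per_batch, (ytrue, ypred))
print(IoUs.shape)  # (32, 1024)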

Customise train_step in model.fit() "OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function"

I am trying to write a custom train_step to use with tf.keras.Model.fit(). I am following the TensorFlow tutorial. In the train_step function, from what I understand, the input argument data is supposed to be the training dataset that I pass to Model.fit(). My dataset is a TFRecordDataset. It yields three features: the image, the labels, and the box. So in the train_step function I first try to unpack the img, label, and gt_boxes parameters from the data argument.
def train_step(self, data):
    print("printing data fed to train_step")
    print(data)
    img, label, gt_boxes = data
    if self.DEBUG:
        if img is None:
            print("img input in train step is none")
    with tf.GradientTape() as tape:
        rpn_classification, rpn_regression = self(img, training=True)
        self.tf_rpn_target_generation_layer(gt_boxes, rpn_regression)
        loss = self.rpn_loss_function(rpn_classification)
    trainable_vars = self.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))
    loss_tracker.update_state(loss)
    # mae_metric.update_state()
    return [loss_tracker]
The above is the code I use for my custom train_step function. When I run the fit, I get the following error
OperatorNotAllowedInGraphError: iterating over tf.Tensor is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.
I have used shuffle, cache, and repeat operations on my training dataset. Can anyone please help me understand why exactly this error appears?
From my previous experience, I generally create an iterator over the dataset and then call get_next to obtain the features.
Edit:
I have tried the following procedures, but none of them yielded any outcome:
Since the data being sent into train_step is a dataset object, I used the tf.raw_ops.IteratorGetNext method to access the elements of the iterator, which returned an error saying:
"TypeError: Input 'iterator' of 'IteratorGetNext' Op has type string that does not match the expected type of resource."
To fix this, I assumed that TensorFlow was likely returning the iterator graph, and hence I was unable to access the elements, so I added the run_eagerly=True argument to model.compile(), which only led to gibberish being printed along with the same error:
Epoch 1/5
printing data fed to train_step
Tensor("Shape:0", shape=(0,), dtype=int32)
Tensor("IteratorGetNext:0", shape=(), dtype=string)
I have found the solution. The data being passed to my step function is an iterator, so I have to use the tf.raw_ops.IteratorGetNext method to access its contents.
When doing this I initially got another error saying that the iterator type does not match the expected type of resource. Debugging carefully, I understood that the read_tfrecords mapping I applied to the dataset was unsuccessful, which left the dataset still containing unparsed TFRecords of type tf.string, and that is not an expected resource type for train_step.
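In other words, the fix is to make sure the parsing map actually runs before the dataset reaches fit. A hypothetical sketch (the feature spec, read_tfrecords body, and filename below are illustrative, not the asker's actual schema):

import tensorflow as tf

# illustrative schema; adjust keys and types to the real TFRecord layout
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
    "box": tf.io.VarLenFeature(tf.float32),
}

def read_tfrecords(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    img = tf.io.decode_jpeg(parsed["image"], channels=3)
    boxes = tf.sparse.to_dense(parsed["box"])
    return img, parsed["label"], boxes

# without this .map(), the dataset still yields raw tf.string records,
# which is exactly what train_step was receiving
dataset = tf.data.TFRecordDataset("train.tfrecords").map(read_tfrecords)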

shap.force_plot() raises Exception: In v0.20 force_plot now requires the base value as the first parameter

I'm using Catboost and would like to visualize shap_values:
from catboost import CatBoostClassifier, Pool
model = CatBoostClassifier(iterations=300)
model.fit(X, y,cat_features=cat_features)
pool1 = Pool(data=X, label=y, cat_features=cat_features)
shap_values = model.get_feature_importance(data=pool1, fstr_type='ShapValues', verbose=10000)
shap_values.shape
Output: (32769, 10)
X.shape
Output: (32769, 9)
Then I do the following and an exception is raised:
shap.initjs()
shap.force_plot(shap_values[0,:-1], X.iloc[0,:])
Exception: In v0.20 force_plot now requires the base value as the first parameter! Try shap.force_plot(explainer.expected_value, shap_values) or for multi-output models try shap.force_plot(explainer.expected_value[0], shap_values[0]).
The following works, but I would like to make force_plot() work:
shap.initjs()
shap.summary_plot(shap_values[:,:-1], X)
I read the documentation but can't make sense of the explainer. I tried:
explainer = shap.TreeExplainer(model,data=pool1)
#Also tried:
explainer = shap.TreeExplainer(model,data=X)
but I get: TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Can anyone point me in the right direction? THX
I had the same error, shown below:
Exception: In v0.20 force_plot now requires the base value as the first parameter! Try shap.force_plot(explainer.expected_value, shap_values) or for multi-output models try shap.force_plot(explainer.expected_value[0], shap_values[0]).
This helped me resolve the issue:
import shap
explainer = shap.TreeExplainer(model,data=X)
shap.initjs()
shap.force_plot(explainer.expected_value[0],X.iloc[0,:])
Also, for the other issue:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
check whether your data contains any NaNs or missing values.
Hope this helps!
try this:
shap.force_plot(explainer.expected_value, shap_values.values[0, :], X.iloc[0, :])
Building on @Sparsha's answer, since I was still getting errors, what worked for me was:
explainer = shap.TreeExplainer(model, data = X)
shap_values = explainer.shap_values(X_train)
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0], feature_names = explainer.data_feature_names)
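Note that for CatBoost specifically, the ShapValues output has one extra column, hence shape (32769, 10) for 9 features, and that last column holds the expected (base) value of the model, so it can be passed directly as force_plot's first argument. A minimal sketch building on the question's own shap_values:

import shap

shap.initjs()
base_value = shap_values[0, -1]  # CatBoost puts the expected value in the last column
shap.force_plot(base_value, shap_values[0, :-1], X.iloc[0, :])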

Fitting multiple gaussian using **curve_fit** function from scipy using python 3.x

I am trying to fit trimodal Gaussian functions using SciPy and Python 3.x. I think I'm almost there, but I'm scratching my head because I can't quite figure out what is going wrong.
import numpy as np
from scipy.optimize import curve_fit

data = np.loadtxt('mock.txt')
my_x = data[:, 0]
my_y = data[:, 1]

def gauss(x, mu, sigma, A):
    return A*np.exp(-(x - mu)**2/2/sigma**2)

def trimodal_gauss(x, mu1, sigma1, A1, mu2, sigma2, A2, mu3, sigma3, A3):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2) + gauss(x, mu3, sigma3, A3)
"""""
Gaussian fitting parameters recognized in each file
"""""
first_centroid = (10180.4*2 + 9)/9
second_centroid = (10180.4*2 + (58.6934*1) + 7)/9
third_centroid = (10180.4*2 + (58.6934*2) + 5)/9
centroid = []
centroid += (first_centroid, second_centroid, third_centroid)
apparent_resolving_power = 1200
sigma = []
for i in range(len(centroid)):
    sigma.append(centroid[i]/(apparent_resolving_power*2.355))
height = [1, 1, 1]
p = [list(t) for t in zip(centroid, sigma, height)]
for i in range(9):
    popt, pcov = curve_fit(trimodal_gauss, my_x, my_y, p0=p[i])
Using this code, I get the following error.
TypeError: trimodal_gauss() missing 6 required positional arguments: 'mu2', 'sigma2', 'A2', 'mu3', 'sigma3', and 'A3'
I understand what the error message is saying, but I don't understand how I am failing to provide those six arguments.
I appreciate your input!
It looks like you are trying to call curve_fit nine separate times, giving it a different initial parameter guess each time via p0=p[i]. That is probably not what you want: p is a nested list, so each p[i] holds only the three values (mu, sigma, A) of a single peak, and curve_fit then assumes a three-parameter model, which is why trimodal_gauss complains about six missing arguments.
You should make sure that p is a one-dimensional array with 9 elements, and call curve_fit only once. Something like
p = np.array([list(t) for t in zip(centroid, sigma, height)]).flatten()
popt, pcov = curve_fit(trimodal_gauss, my_x, my_y, p0=p)
might work.
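An end-to-end sketch with synthetic data (the peak positions and widths below are hypothetical) showing that a single curve_fit call with a flat nine-element p0 converges:

import numpy as np
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, A):
    return A*np.exp(-(x - mu)**2/2/sigma**2)

def trimodal_gauss(x, mu1, sigma1, A1, mu2, sigma2, A2, mu3, sigma3, A3):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2) + gauss(x, mu3, sigma3, A3)

rng = np.random.default_rng(0)
x = np.linspace(0, 30, 600)
y = trimodal_gauss(x, 8, 1.2, 1.0, 15, 1.0, 0.8, 22, 1.5, 1.2)
y += rng.normal(0, 0.02, x.size)  # a little noise

p0 = [7, 1, 1, 14, 1, 1, 23, 1, 1]  # rough (mu, sigma, A) guesses per peak
popt, pcov = curve_fit(trimodal_gauss, x, y, p0=p0)
print(popt.reshape(3, 3))  # fitted (mu, sigma, A) rows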
