tf.vectorized_map shape output error - python-3.x

I am trying to iterate over two arrays, both with batch size 32, using nested tf.vectorized_map calls, one for each array. The arrays have different shapes: (32, 12-1, 4) and (32, 1024, 4).
I am doing this inside model.fit/model.train_on_batch, in a custom training_step, inside nested tf.function calls, and I get this error:
Input to reshape is a tensor with 98304 values, but the requested shape has 288.
My original code and my attempts to fix it:
IoUs = tf.vectorized_map(
    lambda batch: tf.vectorized_map(
        lambda ypredValues: tfytrue(ytrue[batch], ypredValues), ypred[batch]),
    tf.range(32))

IoUs = tf.vectorized_map(
    lambda batch: tf.vectorized_map(
        lambda ypredValues: tfytrue(batch[0], ypredValues), batch[1]),
    (ytrue, ypred))
It works when the same batch element is used in both places:
IoUs = tf.vectorized_map(
    lambda batch: tf.vectorized_map(
        lambda ypredValues: tfytrue(batch[1], ypredValues), batch[1]),
    (ytrue, ypred))
But that defeats the purpose, because then I can't map over the batch dimension in parallel while using the other array. Maybe the problem is inside the inner vectorized_map, but if anyone can help it would be great. The reason I am using tf.vectorized_map is that the speed increase is about 3000x over map_fn, and the numpy route did not work for me: tf.py_func and tf.numpy_array returned invalid placeholder values.
It does work with elems=(ytrue, ypred) when called directly, but not inside model.fit or model.train_on_batch.
RegLoss = tfIoU(batch[1],self.RPN(batch[0])[0])
Works great, but inside train_on_batch/model.fit, in my custom train_step, tfIoU(ytrue, ypred) fails with:
Input to reshape is a tensor with 393216 values, but the requested shape has 4608
I am not doing any reshaping myself, and the error traces back to the outer (batch) vectorized_map.
My question: is it that, because it works the first time, the shape gets fixed for the tf.vectorized_map, so it then fails the second time?
For example, (32, 12, 4) works and then (32, 8, 4) fails; reverse the order and they behave the same way. Can anyone help?

It was the tf.function() decoration: it can only be applied to the innermost nested function, because it pins the traced shapes regardless of experimental_relax_shapes=True.
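A minimal sketch of what that arrangement might look like (per_value_iou is a made-up name standing in for the question's tfytrue helper, and its body is only a placeholder):

import tensorflow as tf

@tf.function(experimental_relax_shapes=True)  # trace only the innermost function
def per_value_iou(ytrue_single, ypred_value):
    # placeholder body standing in for the question's tfytrue(...) IoU computation
    return tf.reduce_sum(ytrue_single) + tf.reduce_sum(ypred_value)

def batch_ious(ytrue, ypred):
    # no tf.function on the outer mapping code, so shapes are not pinned across calls
    return tf.vectorized_map(
        lambda pair: tf.vectorized_map(
            lambda ypred_value: per_value_iou(pair[0], ypred_value),
            pair[1]),
        (ytrue, ypred))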

Related

Can't perform forward pass twice in DDP

I am trying to forward pass 2 different inputs with the same model as shown below:
for epoch in range(num_epochs):
    dataloader.sampler.set_epoch(epoch)
    for batch_index, (real, _) in enumerate(dataloader):
        disc.zero_grad()
        real = real.to(rank)
        noise = torch.randn((batch_size, z_dim, 1, 1)).to(rank)
        fake_img = gen(noise)
        fake_img_clone = fake_img.detach().clone()
        disc_real = disc(real).reshape(-1)
        lossD_real = critereon(disc_real, torch.ones_like(disc_real))
        disc_fake = disc(fake_img.detach()).reshape(-1)
        lossD_fake = critereon(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_fake + lossD_real) / 2
        lossD.backward()
        opt_disc.step()
However, I keep getting the error "one of the variables needed for gradient computation has been modified by an inplace operation"
Setting torch.autograd.set_detect_anomaly(True, check_nan=True) shows that the error occurs in disc_real = disc(real).reshape(-1), but when I debug manually, the error only appears once I add the second forward pass, disc_fake = disc(fake_img.detach()).reshape(-1).
I am currently using the latest version of PyTorch. Please help me solve this.
The error message "one of the variables needed for gradient computation has been modified by an inplace operation" typically occurs when you modify a tensor in-place, which can break the computation graph and cause issues with backpropagation.
In your code, it seems that you are modifying the fake_img tensor in-place when you detach and clone it with fake_img_clone = fake_img.detach().clone(). The detach() function returns a new tensor with the same data that shares storage with the original, so modifying the detached tensor also modifies the original.
To avoid this issue, you can detach the fake_img tensor without cloning it, like this: fake_img.detach(). This creates a new tensor that is not connected to the computation graph, so subsequent operations on it are not tracked for backpropagation.
Here's the updated code:
for epoch in range(num_epochs):
    dataloader.sampler.set_epoch(epoch)
    for batch_index, (real, _) in enumerate(dataloader):
        disc.zero_grad()
        real = real.to(rank)
        noise = torch.randn((batch_size, z_dim, 1, 1)).to(rank)
        fake_img = gen(noise)
        disc_real = disc(real).reshape(-1)
        lossD_real = critereon(disc_real, torch.ones_like(disc_real))
        with torch.no_grad():
            fake_img_detached = fake_img.detach()
        disc_fake = disc(fake_img_detached).reshape(-1)
        lossD_fake = critereon(disc_fake, torch.zeros_like(disc_fake))
        lossD = (lossD_fake + lossD_real) / 2
        lossD.backward()
        opt_disc.step()
In the updated code, we detach the fake_img tensor without cloning it and store the result in a new tensor fake_img_detached inside a torch.no_grad() context manager, so the detached tensor's gradient is not tracked. We then use fake_img_detached for the forward pass of the discriminator, which should avoid the in-place modification issue.

Tokenizer can add padding without error, but data collator cannot

I'm trying to fine-tune a GPT2-based model on my data using the run_clm.py example script from HuggingFace.
I have a .json data file that looks like this:
...
{"text": "some text"}
{"text": "more text"}
...
I had to change the default behavior of the script that used to concatenate input text, because all my examples are separate demonstrations that should not be concatenated:
def add_labels(example):
    example['labels'] = example['input_ids'].copy()
    return example

with training_args.main_process_first(desc="grouping texts together"):
    lm_datasets = tokenized_datasets.map(
        add_labels,
        batched=False,
        # batch_size=1,
        num_proc=data_args.preprocessing_num_workers,
        load_from_cache_file=not data_args.overwrite_cache,
        desc=f"Grouping texts in chunks of {block_size}",
    )
This essentially only adds the appropriate 'labels' field required by CLM.
However, since GPT2 has a 1024-token context window, the examples should be padded to that length.
I can achieve this by modifying the tokenization procedure like this:
def tokenize_function(examples):
    with CaptureLogger(tok_logger) as cl:
        output = tokenizer(
            examples[text_column_name], padding='max_length')  # added: padding='max_length'
    # ...
The training runs correctly.
However, I believe this should not be done by the tokenizer, but by the data collator instead. When I remove padding='max_length' from the tokenizer, I get the following error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
And also, above that:
Traceback (most recent call last):
  File "/home/jan/repos/text2task/.venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 716, in convert_to_tensors
    tensor = as_tensor(value)
ValueError: expected sequence of length 9 at dim 1 (got 33)
During handling of the above exception, another exception occurred:
To fix this, I have created a data collator that should do the padding:
data_collator = DataCollatorWithPadding(tokenizer, padding='max_length')
This is what is passed to the trainer. However, the above error remains.
What's going on?
I managed to fix the error but I'm really unsure about my solution, details below. Will accept a better answer.
This seems to solve it:
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)
Found in the documentation
It seems like DataCollatorWithPadding doesn't pad the labels?
My problem is about generating an output sequence from an input sequence, so I'm guessing that using DataCollatorForSeq2Seq is what I actually want to do. However, my data does not have separate input and target columns, but a single text column (that contains a string input => target). I'm not really sure that this collator is intended to be used for GPT2...
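For reference, here is a hedged sketch of a hand-rolled collator that pads labels alongside input_ids, which is the part DataCollatorWithPadding appears to skip. The class name is made up for illustration; it assumes each feature dict carries input_ids, attention_mask and labels, and that tokenizer.pad_token has been set (GPT-2 has none by default, e.g. tokenizer.pad_token = tokenizer.eos_token):

import torch

class CausalLMCollatorWithPadding:
    def __init__(self, tokenizer, label_pad_token_id=-100):
        self.tokenizer = tokenizer
        self.label_pad_token_id = label_pad_token_id

    def __call__(self, features):
        max_len = max(len(f["input_ids"]) for f in features)
        batch = {"input_ids": [], "attention_mask": [], "labels": []}
        for f in features:
            pad = max_len - len(f["input_ids"])
            batch["input_ids"].append(f["input_ids"] + [self.tokenizer.pad_token_id] * pad)
            batch["attention_mask"].append(f["attention_mask"] + [0] * pad)
            # pad labels with -100 so the loss ignores the padded positions
            batch["labels"].append(f["labels"] + [self.label_pad_token_id] * pad)
        return {k: torch.tensor(v) for k, v in batch.items()}

# usage: data_collator = CausalLMCollatorWithPadding(tokenizer)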

How to set Keras TimeseriesGenerator to predict the second next value?

Currently I have the following code using TimeseriesGenerator from Keras:
TimeseriesGenerator(train, prediction, length=TIME_STEPS, batch_size=1)
Currently this shifts the prediction one value backwards, so the train data for t will have the output of t+1. That makes sense, but I want to predict t+2, so the train data for t should have the output of t+2.
Is there any way to do it using TimeseriesGenerator?
The quickest solution is to just shift your predictions by one, i.e.:
TimeseriesGenerator(train[:-1], prediction[1:], length=TIME_STEPS, batch_size=1)
Note that you have to trim the train set, so both datasets have equal lengths.
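As a quick sanity check, a toy sketch (1-D data assumed) showing that the first window then pairs with the value two steps after its last timestep:

import numpy as np
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

TIME_STEPS = 3
series = np.arange(10).reshape(-1, 1)        # 0, 1, ..., 9
gen = TimeseriesGenerator(series[:-1], series[1:], length=TIME_STEPS, batch_size=1)
x0, y0 = gen[0]
# x0[0] is the window [0, 1, 2] (its last timestep is t = 2) and y0[0] is [4], i.e. t+2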
You can also use the timeseries_dataset_from_array function, where you can align the data and targets according to your needs, as described in the documentation:
data: Numpy array or eager tensor containing consecutive data points (timesteps). Axis 0 is expected to be the time dimension.
targets: Targets corresponding to timesteps in data. It should have the same length as data. targets[i] should be the target corresponding to the window that starts at index i (see example 2 below). Pass None if you don't have target data (in this case the dataset will only yield the input data).
So in your case it would be something like this:
tf.keras.preprocessing.timeseries_dataset_from_array(
    train[:-TIME_STEPS-2],
    prediction[TIME_STEPS+2:],
    sequence_length=TIME_STEPS,
    batch_size=1
)

ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)

I am working on the housing dataset and, when trying to fit the linear regression model, I get the error mentioned in the title. The complete code is below.
I am not sure where the code is going wrong; I copied it as-is from the reference book.
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing_prepared, housing_labels)
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
print("Predictions:\t", lin_reg.predict(some_data_prepared))
ERROR: ValueError: shapes (5,14) and (16,) not aligned: 14 (dim 1) != 16 (dim 0)
What am I doing wrong here?
Explanation
Hi, I guess you are reading and following the Hands-On Machine Learning with Scikit-Learn and TensorFlow book. The same problem occurred to me.
In the following part of the code you select the first 5 instances from the data set. One of the attributes in the data set, ocean_proximity, is an object, and for the linear regression model to be able to operate on it, it must be translated to numbers, which in the book is done with one-hot encoding.
One-hot encoding works by collecting all the categories that can be assigned to the attribute, in this case 5 ('<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND'), and then creating a vector of that length for each instance, with every element set to zero except the one for that instance's category, which is set to 1 (or another value). For example:
If ocean_proximity equals '<1H OCEAN' the conversion would be [1, 0, 0, 0, 0]
The piece of code below selects the first five instances of the data set, but this does not guarantee that all the categories of ocean_proximity will appear; it could happen that only 3 of them appear, or just 1. Therefore, if you apply one-hot encoding to those five selected rows and only 3 categories appear (for example just 'INLAND', 'ISLAND' and 'NEAR BAY'), the vectors created by the one-hot encoding will be of length 3 (see the small illustration after the .head() check below).
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.transform(some_data)
The error is just telling you that, since the one-hot conversion of some_data created vectors shorter than 5, the total number of columns in some_data_prepared is 14, which is less than the 16 columns in housing_prepared, making the model unable to predict the prices.
If you transform both some_data_prepared and housing_prepared into dataframes and then call .head() you will see the problem.
some_data_prepared.head()
housing_prepared.head()
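A small illustration of that width mismatch on toy data (a sketch, not the book's pipeline):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

full = pd.DataFrame({"ocean_proximity": [
    "<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY", "ISLAND", "INLAND"]})
subset = full.iloc[:3]  # only 3 distinct categories appear in these rows

print(OneHotEncoder().fit_transform(full).shape)    # (6, 5): 5 one-hot columns
print(OneHotEncoder().fit_transform(subset).shape)  # (3, 3): only 3 columns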
Solution
To solve the problem you must add the missing columns to some_data_prepared by creating a zeroed numpy array of shape [5, x] (where 5 is the number of rows and x is the number of missing columns) and concatenating it to some_data_prepared to match the shape of the housing_prepared data set.
some_data = housing.iloc[:5]
some_labels = housing_labels.iloc[:5]
some_data_prepared = full_pipeline.fit_transform(some_data)
dummy_array = np.zeros((5,1))
some_data_prepared = np.c_[some_data_prepared, dummy_array]
predictions = linear_regression.predict(some_data_prepared)
print("Predictions: ", predictions)
print("Labels: ", some_labels.values)
Missing category values (ocean proximity in this case) in some_data compared to housing_prepared is the issue.
housing_prepared.shape gives (16512, 16), but some_data_prepared.shape gives (5,14), so add zeros for the missing columns:
dummy_array = np.zeros((5,2))
some_data_prepared = np.c_[some_data_prepared,dummy_array]
The 2 in np.zeros is the difference in the number of columns.
I at first encountered the same issue on this piece of code. After exploring the issues of the handson-ml repository, I think I have understood the subtlety causing the error here.
My guess is that (as in my case) closing the notebook caused what was in memory (the trained model in particular) to be lost. In my case, I could get the result and avoid the error by rerunning the notebook from the beginning.
From a theoretical viewpoint, you should never call fit() or fit_transform() on data which is not training data (e.g. on some_data). Here, running fit_transform(some_data) and then stacking the dummy array onto some_data_prepared works, but it forces the pipeline (and thus the one-hot encoder) to be fitted again on some_data rather than on housing_prepared, which is not what you want.
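A hedged sketch of that cleaner approach, with a simplified pipeline (the book's full pipeline also includes an imputer and a custom attribute adder, omitted here):

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_attribs = list(housing.drop("ocean_proximity", axis=1))  # numeric column names
cat_attribs = ["ocean_proximity"]

full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), num_attribs),
    # handle_unknown="ignore" keeps the column count stable even if a
    # category is missing from a later sample
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)         # fit once, on the full training set
some_data_prepared = full_pipeline.transform(housing.iloc[:5])  # only transform the sample
# some_data_prepared now has exactly as many columns as housing_prepared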

Number of operation increases with tf.gradient

So I was trying to calculate the gradient wrt input, using a combination of Keras and tensorflow:
the code (in a loop) is like:
import keras.backend as K
import tensorflow as tf

loss = K.categorical_crossentropy(model.output, target)  # model.output is the model's output tensor
gradient = sess.run([tf.gradients(loss, model.input, colocate_gradients_with_ops=True)],
                    feed_dict={model.input: img})  # img is a numpy array that fits the dimension requirements
n_operations = len(tf.get_default_graph().get_operations())
I noticed that "n_operations" increases every iteration, and so as time it costs. Is that normal? Is there any way to prevent this?
Thank you!
No, this is not the desired behavior. Your problem is that you are defining your gradient operation again and again, while you only need to execute it. tf.gradients pushes new operations onto the graph and returns a handle to those gradients, so you only have to run that handle to get the desired results. With multiple calls, multiple operations are generated, and this will eventually ruin your performance. The solution is as follows:
# outside the loop
loss = K.categorical_crossentropy(model.output, target)  # model.output is the model's output tensor
gradients = tf.gradients(loss, model.input, colocate_gradients_with_ops=True)

# inside the loop
gradient_np = sess.run([gradients], feed_dict={model.input: img})  # img is a numpy array that fits the dimension requirements
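As an aside, on TensorFlow 2.x with eager execution the same input gradient is usually computed with tf.GradientTape rather than sess.run; a hedged sketch, with model, img and target assumed from the question:

import tensorflow as tf

@tf.function  # traced once and reused, so the graph does not keep growing
def input_gradient(img_batch, target):
    img_batch = tf.convert_to_tensor(img_batch)
    with tf.GradientTape() as tape:
        tape.watch(img_batch)  # track the input itself, not just trainable variables
        loss = tf.keras.losses.categorical_crossentropy(target, model(img_batch))
    return tape.gradient(loss, img_batch)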
