I'm trying to build a custom TensorFlow layer that takes two "input sources" and computes:
exp(source A) + cos(source B)
However, I have no idea how to even set up such a custom layer.
Note: I'm really looking to learn and understand how this works, so a workaround would be sub-optimal.
This is one possibility:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

class custom_layer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, inputs):
        # inputs is the list [input1, input2] passed when the layer is called
        input1, input2 = inputs
        return tf.exp(input1) + tf.cos(input2)

inp1 = Input((10,))
inp2 = Input((10,))
x = custom_layer()([inp1, inp2])
x = Dense(1)(x)
model = Model([inp1, inp2], x)
model.compile('adam', 'mse')
model.summary()

X1 = np.random.uniform(0, 1, (100, 10))
X2 = np.random.uniform(0, 1, (100, 10))
y = np.random.uniform(0, 1, 100)
model.fit([X1, X2], y, epochs=3)
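To see where a custom layer keeps its own state, here is a minimal sketch of a hypothetical variant (not part of the question) that adds trainable scales a and b, i.e. exp(a·A) + cos(b·B). Weights are created in build(), which Keras calls once with the input shapes (here a list of two shapes, one per input), using add_weight; call() holds the forward computation. The names ScaledExpCos, a and b are illustrative.

```python
import numpy as np
import tensorflow as tf


class ScaledExpCos(tf.keras.layers.Layer):
    """Hypothetical variant of custom_layer: exp(a * A) + cos(b * B), with trainable a and b."""

    def build(self, input_shape):
        # Keras calls build() once, before the first call(), with one shape per input
        shape_a, shape_b = input_shape
        self.a = self.add_weight(name="a", shape=(shape_a[-1],), initializer="ones", trainable=True)
        self.b = self.add_weight(name="b", shape=(shape_b[-1],), initializer="ones", trainable=True)

    def call(self, inputs):
        input1, input2 = inputs
        # a and b broadcast over the batch dimension
        return tf.exp(self.a * input1) + tf.cos(self.b * input2)


# usage: called on a list of two tensors, exactly like custom_layer
layer = ScaledExpCos()
out = layer([tf.zeros((2, 3)), tf.zeros((2, 3))])
```

It can replace custom_layer()([inp1, inp2]) in the model above; with both weights initialised to ones it starts out computing exactly exp(A) + cos(B), and training then adjusts a and b.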
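import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class MixModel(nn.Module):
    def __init__(self, pre_trained='bert-base-uncased'):
        super().__init__()
        config = BertConfig.from_pretrained(pre_trained, output_hidden_states=True)
        self.bert = BertModel.from_pretrained(pre_trained, config=config)
        self.hidden_size = self.bert.config.hidden_size
        # assumption: 3072 = 4 * 768, i.e. the last four hidden states concatenated
        self.conv = nn.Conv1d(in_channels=3072, out_channels=256, kernel_size=5, stride=1)
        self.relu = nn.ReLU()
        # sequence length 64 -> 64 - 5 + 1 = 60 positions after the convolution
        self.pool = nn.MaxPool1d(kernel_size=64 - 5 + 1)
        self.dropout = nn.Dropout(0.3)
        self.flat = nn.Flatten()
        self.clf1 = nn.Linear(256, 256)
        self.clf2 = nn.Linear(256, 6)

    def forward(self, inputs, mask, labels=None):  # labels is unused here
        inputs = torch.as_tensor(inputs)
        mask = torch.as_tensor(mask)
        out = self.bert(input_ids=inputs, attention_mask=mask, return_dict=True)
        # BertModel returns an output object, not a tensor: concatenate the last four
        # hidden states, each (batch, 64, 768), into (batch, 64, 3072)
        x = torch.cat(out.hidden_states[-4:], dim=-1)
        x = x.permute(0, 2, 1)  # Conv1d expects (batch, channels, length)
        x = self.conv(x)
        x = self.relu(x)
        x = self.pool(x)
        x = self.dropout(x)
        x = self.flat(x)
        x = self.clf1(x)
        x = self.clf2(x)
        return x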
I want to save the model, its weights and a config file after training. After searching, I found that model.save_pretrained looked like a good solution, but I got an error saying that MixModel has no attribute save_pretrained.
So how can I save a config file for my MixModel?
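save_pretrained is a method of Hugging Face's PreTrainedModel (which is why self.bert has it), but MixModel subclasses plain nn.Module, so it doesn't inherit it. I think that state_dict is what you need.
There's a good tutorial on this in the PyTorch documentation: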
https://pytorch.org/tutorials/beginner/saving_loading_models.html
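A minimal sketch of the state_dict approach, assuming you also want a small JSON "config" holding the constructor arguments so the model can be rebuilt later. TinyHead, save_model and load_model are hypothetical names, and TinyHead stands in for MixModel so the example runs without downloading BERT.

```python
import json
import os
import tempfile

import torch
import torch.nn as nn


class TinyHead(nn.Module):
    """Hypothetical stand-in for MixModel's classifier head."""

    def __init__(self, in_features=256, num_classes=6, dropout=0.3):
        super().__init__()
        # keep the constructor arguments so they can be written out as a config
        self.config = {"in_features": in_features, "num_classes": num_classes, "dropout": dropout}
        self.dropout = nn.Dropout(dropout)
        self.clf = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.clf(self.dropout(x))


def save_model(model, directory):
    os.makedirs(directory, exist_ok=True)
    # weights: a dict of parameter/buffer tensors
    torch.save(model.state_dict(), os.path.join(directory, "weights.pt"))
    # config: the arguments needed to rebuild the same architecture
    with open(os.path.join(directory, "config.json"), "w") as f:
        json.dump(model.config, f)


def load_model(cls, directory):
    with open(os.path.join(directory, "config.json")) as f:
        config = json.load(f)
    model = cls(**config)  # rebuild the architecture first
    model.load_state_dict(torch.load(os.path.join(directory, "weights.pt")))
    return model
```

For the real MixModel, the state_dict already contains the BERT weights; if you also want BERT's own config.json, self.bert.save_pretrained(some_subfolder) works because BertModel is a PreTrainedModel.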
I created a GAN for text using the vanilla GAN training approach. The generated sequences look good, but when I apply nn.Sigmoid to the discriminator's outputs, it gives [0] for generated data that looks completely real, which doesn't seem correct.
Here is my Discriminator code:
import torch
import torch.nn as nn

class Classifier(nn.Module):
    def __init__(self, hidden_size, hidden_size2, dropout):
        super().__init__()
        self.FC1 = nn.Sequential(
            nn.Linear(512, hidden_size),
            nn.LeakyReLU(0.1),
            nn.Dropout(dropout))
        self.FC2 = nn.Sequential(
            nn.Linear(hidden_size, hidden_size2),
            nn.LeakyReLU(0.1),
            nn.Dropout(dropout))
        self.FC3 = nn.Linear(hidden_size2, 1)  # one raw logit per sample
        self.dropout = nn.Dropout(dropout)
        self.bach = nn.BatchNorm1d(512)
        self.bach2 = nn.BatchNorm1d(hidden_size)
        # must match FC2's output size (the original hard-coded 64)
        self.bach3 = nn.BatchNorm1d(hidden_size2)

    def forward(self, x):
        z = self.dropout(x)
        z = self.bach(z)
        z = self.FC1(z)
        z = self.bach2(z)
        z = self.FC2(z)
        z = self.bach3(z)
        out = self.FC3(z)  # no sigmoid here: BCEWithLogitsLoss applies it internally
        return out
The hidden states of real and fake sequences are fed into this classifier as input.
My loss function is BCEWithLogitsLoss, and this is the training code:
import torch

# device, discriminator and criterion (nn.BCEWithLogitsLoss()) are defined elsewhere

# to create real labels (1s)
def label_real(size):
    data = torch.ones(size, 1)
    return data.to(device)

# to create fake labels (0s)
def label_fake(size):
    data = torch.zeros(size, 1)
    return data.to(device)

# function to train the discriminator network
def train_discriminator(optimizer, data_real, data_fake):
    # the batch size is dimension 0 of the (batch, 512) inputs
    real_label = label_real(data_real.size(0))
    fake_label = label_fake(data_fake.size(0))
    optimizer.zero_grad()
    output_real = discriminator(data_real)
    loss_real = criterion(output_real, real_label)
    # detach so this step does not backpropagate into the generator
    output_fake = discriminator(data_fake.detach())
    loss_fake = criterion(output_fake, fake_label)
    loss_real.backward()
    loss_fake.backward()
    optimizer.step()
    return loss_real + loss_fake
I apply nn.Sigmoid to the outputs only after training, when testing the model. Can you help me understand what is wrong with my network?
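Because the loss is BCEWithLogitsLoss, FC3 outputs raw logits and the sigmoid is applied inside the loss; applying torch.sigmoid at test time is what turns them into probabilities that a sample is real. The sketch below uses random logits (not real discriminator outputs) to check that equivalence. Keep in mind that label_fake trains the discriminator to push generated samples toward 0, so a sigmoid near 0 on generator outputs means the discriminator can still tell them apart, which is what its objective asks of it, regardless of how realistic the text looks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)  # raw discriminator outputs (no sigmoid inside the model)
targets = torch.tensor([[1.0], [1.0], [0.0], [0.0]])  # 1 = real, 0 = fake, as in label_real/label_fake

# BCEWithLogitsLoss(logits) is BCELoss(sigmoid(logits)), computed in a numerically stable way
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)

# at test time: probability that each sample is real, and a hard 0/1 decision
probs_real = torch.sigmoid(logits)
predicted = (probs_real > 0.5).float()
```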
I would like to implement a GRU that encodes a sequence of vectors into one vector (many-to-one), and then another GRU that decodes a vector back into a sequence of vectors (one-to-many). The size of the vectors would not change. I would like an opinion on what I implemented.
Here is the code:
import torch
import torch.nn as nn

class AEGRU(nn.Module):
    def __init__(self, opt):
        super(AEGRU, self).__init__()
        self.length = 256
        self.latent_space = 256
        self.num_layers = 1
        self.GRU_enc = nn.GRU(input_size=3, hidden_size=self.latent_space, num_layers=self.num_layers, batch_first=True)
        self.fc_enc = nn.Linear(self.latent_space, self.latent_space)
        self.GRU_dec = nn.GRU(input_size=self.latent_space, hidden_size=3, num_layers=self.num_layers, batch_first=True)
        self.fc_dec = nn.Linear(3, 3)

    def enc(self, x):
        # x has shape: Batch_size x self.length x 3
        h0 = torch.zeros(self.num_layers, x.shape[0], self.latent_space, device=x.device)
        out, _ = self.GRU_enc(x, h0)
        out = out[:, -1, :]  # keep only the last time step (many-to-one)
        out = self.fc_enc(out)
        return out

    def dec(self, x):
        # x has shape: Batch_size x self.latent_space
        x = x[:, None, :]
        h = torch.zeros(self.num_layers, x.shape[0], 3, device=x.device)
        # method 1 ??
        # outputs = torch.zeros(x.shape[0], self.length, 3, device=x.device)
        # for i in range(self.length):
        #     out, h = self.GRU_dec(x, h)
        #     outputs[:, i, :] = out[:, 0, :]
        # method 2 ??
        x = x.repeat(1, self.length, 1)
        outputs, _ = self.GRU_dec(x, h)
        # linear layer
        outputs = self.fc_dec(outputs)
        return outputs

    def forward(self, x):
        self.indices = []
        latent = self.enc(x)
        output = self.dec(latent)
        return output
I'm not sure whether this is a good way to build a one-to-many GRU. Could I get some opinions on this?
Thanks for reading!
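One observation that may help choose between the two methods: method 1 feeds the same latent vector at every step while carrying h over, which is exactly the recurrence the GRU computes when it runs once over the repeated sequence in method 2. The sketch below checks this numerically on CPU with small, arbitrary sizes (batch 2, length 5, latent 8) and a standalone decoder GRU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, length, latent, out_dim = 2, 5, 8, 3
gru_dec = nn.GRU(input_size=latent, hidden_size=out_dim, batch_first=True)

z = torch.randn(batch, latent)  # encoder output: one vector per sequence
x = z[:, None, :]               # (batch, 1, latent)
h0 = torch.zeros(1, batch, out_dim)

with torch.no_grad():
    # method 1: step the GRU manually, carrying h and feeding z at every step
    h = h0
    steps = []
    for _ in range(length):
        out, h = gru_dec(x, h)
        steps.append(out[:, 0, :])
    method1 = torch.stack(steps, dim=1)  # (batch, length, out_dim)

    # method 2: repeat z along the time axis and run the GRU once
    method2, _ = gru_dec(x.repeat(1, length, 1), h0)
```

If the two agree, method 2 is simply the faster way to write the same decoder. A decoder that conditions each step on its own previous output would need a loop that feeds out back in (with an input size of 3), which is a different design.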
I have two networks. The output of the first network is the input to the second. To calculate the loss for the second network, I use vanilla policy gradient, and I want to backpropagate this loss into the first network as well. However, when I check whether the gradients have changed, they are all None.
I first load the first network (a pre-trained autoencoder) this way:
import torch

def load_checkpoint(filepath, model):
    checkpoint = torch.load(filepath)
    model.load_state_dict(checkpoint['state_dict'])
    for parameter in model.parameters():
        parameter.requires_grad = True
    model.train()
    return model
Then I define the optimizers for both networks this way:
from torch.optim import SGD, Adam

class MultipleOptimizer(object):
    def __init__(self, *op):
        self.optimizers = op

    def zero_grad(self):
        for op in self.optimizers:
            op.zero_grad()

    def step(self):
        for op in self.optimizers:
            op.step()

opt = MultipleOptimizer(SGD(model.parameters(), lr=1, momentum=0.9), Adam(logits_net.parameters(), lr=lr))
The reward function is:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score as sil  # assumption: sil is sklearn's silhouette_score

# Reward function
def reward(x, act):
    # n_jobs was removed from KMeans in scikit-learn 1.0, so it is dropped here
    km = KMeans(act, n_init=20)
    y_pred = km.fit_predict(x.detach().cpu().numpy())  # seems we can only get a centre from batch
    sil_score = sil(x.detach().cpu().numpy(), y_pred)
    return sil_score
The architecture of the second network, plus an alternative (mlp2) I tried in order to avoid logits=logits.mean(0):
import torch
import torch.nn as nn

def mlp(sizes, activation=nn.Tanh, output_activation=nn.Identity):
    # Build a feedforward neural network. outputs are the logits
    layers = []
    for j in range(len(sizes) - 1):
        act = activation if j < len(sizes) - 2 else output_activation
        layers += [nn.Linear(sizes[j], sizes[j + 1]), act()]
    return nn.Sequential(*layers)

class mlp2(torch.nn.Module):
    def __init__(self):
        super(mlp2, self).__init__()
        self.linear1 = nn.Linear(10, 100)
        self.relu1 = nn.ReLU(inplace=True)
        self.linear2 = torch.nn.Linear(100, 100)
        self.linear3 = torch.nn.Linear(100, 20)
        self.linear4 = torch.nn.Linear(2000, 100)
        self.ident = nn.Identity()

    def forward(self, x):
        a = self.linear1(x)
        a = self.relu1(a)
        a = self.linear2(a)
        a = self.relu1(a)
        a = self.linear3(a)
        a = torch.flatten(a)  # (100, 20) -> (2000,): assumes a batch of exactly 100
        a = self.linear4(a)
        a = self.relu1(a)
        a = self.linear3(a)
        out = self.ident(a)
        return out
The loss is calculated as follows:
import torch
from torch.distributions import Categorical

def get_policy(obs):
    logits = logits_net(obs)
    return Categorical(logits=logits.mean(0))

def get_action(obs):
    return get_policy(obs).sample().item()

def Logp(obs, act):
    logp = get_policy(obs).log_prob(act.cuda())
    return logp

def compute_loss(logp, weights):
    return -(logp * weights).mean()

def train_one_epoch():
    # make some empty lists for logging.
    batch_obs = []      # for observations
    batch_acts = []     # for actions
    batch_weights = []  # for R(tau) weighting in policy gradient
    batch_logp = []
    # reset episode-specific variables
    j = 1               # signal from environment that episode is over
    ep_rews = []        # list for rewards accrued throughout ep
    for i, data in enumerate(train_loader):
        # Create the mean image out of those 100 images
        x, label = data
        x = model(x.cuda())  # torch.Size([100, 10])
        obs = x.data.cpu().numpy()  # [100, 10] - a trajectory with only one state
        # Save obs
        batch_obs.append(obs.copy())
        # act in the environment
        #act = get_action(torch.as_tensor(obs, dtype=torch.float32))
        act = get_action(x)
        print('action type', type(act))
        # log probability
        #logp = Logp(torch.as_tensor(obs, dtype=torch.float32), act=torch.as_tensor(act, dtype=torch.int32))
        logp = Logp(x, act=torch.as_tensor(act, dtype=torch.int32))
        #rew = reward(obs, act+2)
        rew = reward(x, act + 2)
        # save action, reward
        batch_acts.append(act)
        batch_weights.append(rew)  # episode rewards
        batch_logp.append(logp)
    opt.zero_grad()
    batch_logp = torch.stack(batch_logp, dim=0)
    batch_loss = compute_loss(logp=torch.as_tensor(batch_logp, dtype=torch.float32),
                              weights=torch.as_tensor(batch_weights, dtype=torch.float32))
    batch_loss.backward()  # does it return anything? gradients? print them!
    opt.step()
    for name, param in logits_net.named_parameters():
        print(name, param.grad)
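For reference, compute_loss is the usual vanilla policy-gradient (REINFORCE) surrogate. Writing s_i for the observation, a_i for the sampled action, R_i for the reward (the silhouette score, which enters only as a constant weight) and φ for the first network's parameters, with s_i = f_φ(images_i), the gradient can reach φ only through log π_θ:

$$
L(\theta, \phi) = -\frac{1}{N}\sum_{i=1}^{N} R_i \log \pi_\theta(a_i \mid s_i), \qquad s_i = f_\phi(\text{images}_i)
$$

$$
\nabla_\phi L = -\frac{1}{N}\sum_{i=1}^{N} R_i \left(\frac{\partial s_i}{\partial \phi}\right)^{\!\top} \nabla_s \log \pi_\theta(a_i \mid s)\Big|_{s = s_i}
$$

So the first network can only receive a gradient if the s_i passed to logits_net is the graph-connected tensor x itself, not a copy made with .data, .detach() or .numpy().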
I made some changes on the assumption that re-creating some of the tensors might be the issue:
Originally, the output of the first network was converted with obs = x.data.cpu().numpy() and then passed to get_action as act = get_action(torch.as_tensor(obs, dtype=torch.float32)). I changed this to act = get_action(x), so x is sent to the function directly. I also changed the arguments of Logp to logp = Logp(x, act=torch.as_tensor(act, dtype=torch.int32)).
After these changes, I still get None for the gradients. Is there any way to backpropagate the gradient when the loss is calculated this way? Are there any changes I can apply?
Any help is appreciated.
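A toy reproduction can help locate where the graph gets cut. The sketch below uses hypothetical stand-ins (a Linear "encoder" for the pre-trained autoencoder and a Linear "policy" for logits_net), mirrors get_policy's logits.mean(0), and compares feeding the encoder output directly with feeding a detached copy.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


def grads_after_reinforce(detach_obs, seed=0):
    torch.manual_seed(seed)
    encoder = nn.Linear(4, 10)  # hypothetical stand-in for the pre-trained autoencoder
    policy = nn.Linear(10, 3)   # hypothetical stand-in for logits_net

    x = encoder(torch.randn(100, 4))           # (100, 10) batch of encodings
    obs = x.detach() if detach_obs else x      # .detach()/.data/.numpy() cut the graph here
    dist = Categorical(logits=policy(obs).mean(0))  # one action per batch, as in get_policy
    act = dist.sample()
    reward = 0.5                               # constant weight, like the silhouette score
    loss = -(dist.log_prob(act) * reward)
    loss.backward()
    return encoder.weight.grad, policy.weight.grad
```

Note that the loop at the end of train_one_epoch prints only logits_net's gradients; checking the first network means iterating over model.named_parameters() instead.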
I have this model:
import torch
import torch.nn as nn
import torch.nn.functional as F

class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=12, out_channels=64, kernel_size=3, stride=1, padding=1)
        # self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(24576, 128)
        self.bn = nn.BatchNorm1d(128)
        # plain Dropout: Dropout2d is meant for (N, C, H, W) feature maps, not (N, 128) vectors
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)
        self.fc3 = nn.Linear(10, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        # x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, (2, 2))
        # print(x.shape)
        x = x.view(-1, 24576)
        x = self.bn(F.relu(self.fc1(x)))
        x = self.dropout1(x)
        embeding_stage = F.relu(self.fc2(x))
        x = self.fc3(embeding_stage)
        return x
I want to save the embeding_stage layer the same way I save the model here:
model = model()
torch.save(model.state_dict(), r'C:\project\count_speakers\model_pytorch.h5')
Thanks,
Ayal
I'm not sure I understand what you mean by "save the embedding_stage layer", but if you want to save fc2, fc3 or some other layer, you can do that with torch.save().
Ex: to save fc3: torch.save(model.fc3, r'C:\...\fc3.pt')
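torch.save(model.fc3, ...) pickles the whole module object. A common alternative is to save just the layer's state_dict and load it into a freshly built layer of the same shape. In this sketch a standalone nn.Linear(10, 3) stands in for model.fc3, and the temporary path is hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

fc3 = nn.Linear(10, 3)  # stand-in for model.fc3 from the question

# save only the parameters of the layer (weight and bias)
path = os.path.join(tempfile.mkdtemp(), "fc3.pt")  # hypothetical location
torch.save(fc3.state_dict(), path)

# later: rebuild a layer with the same shape and load the parameters into it
restored = nn.Linear(10, 3)
restored.load_state_dict(torch.load(path))
```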
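Edit:
The OP wants the output of embeding_stage. You can get it in several ways:
1. Load your model with model.load_state_dict(torch.load(r'C:\...\model_pytorch.h5')), then replace the last layer with an identity: model.fc3 = nn.Identity(). Since forward ends with x = self.fc3(embeding_stage), the output of model(x) is then embeding_stage. (Wrapping list(model.children())[:-1] in nn.Sequential does not work for this model: F.relu, F.max_pool2d and view in forward are not child modules, so the Sequential would skip them.)
2. Make a Model2(nn.Module) exactly like your first model, but replace return x in forward(self, x) with return embeding_stage. Then load the state of your first model into the second one: model2.load_state_dict(torch.load(r'C:\...\model_pytorch.h5')). This way fc3 is loaded but not used, and the output of model2(x) is embeding_stage.
In both cases, call model.eval() before extracting embeddings so that dropout is off and BatchNorm uses its running statistics.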
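A third option, not from the answer above, is a forward hook: register_forward_hook calls a function with (module, input, output) every time that module runs. Since embeding_stage is F.relu(self.fc2(x)), the hook on fc2 applies the same relu. The sketch repeats the question's model so it runs on its own; the 32x48 input size is an assumption chosen so that 64 × 16 × 24 = 24576 matches fc1, and the weights are random rather than loaded from the checkpoint.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class model(nn.Module):
    # same layers and forward pass as in the question (Dropout2d swapped for Dropout)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(12, 64, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(24576, 128)
        self.bn = nn.BatchNorm1d(128)
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(128, 10)
        self.fc3 = nn.Linear(10, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, (2, 2))
        x = x.view(-1, 24576)
        x = self.bn(F.relu(self.fc1(x)))
        x = self.dropout1(x)
        embeding_stage = F.relu(self.fc2(x))
        return self.fc3(embeding_stage)


captured = {}


def save_embedding(module, inputs, output):
    # forward hook on fc2: embeding_stage is relu(fc2(x)), so apply the same relu here
    captured["embeding_stage"] = F.relu(output).detach()


net = model()
# with trained weights: net.load_state_dict(torch.load(path)), where path is your checkpoint file
net.eval()  # dropout off, BatchNorm uses running statistics
handle = net.fc2.register_forward_hook(save_embedding)
with torch.no_grad():
    logits = net(torch.randn(2, 12, 32, 48))  # 32x48 -> 64 * 16 * 24 = 24576 features
handle.remove()
embedding = captured["embeding_stage"]  # shape (2, 10)
```

The captured tensor can then be saved with torch.save like any other tensor.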