Make nn.Transformer work for Text Generation - pytorch

I am trying to make a Transformer work for paraphrase generation, but the generations are not useful (the same every time, full of BOS tokens or "?" tokens).
I followed this tutorial for reference. My implementation is embedded into a framework which requires an Encoder and a Decoder:
The encoder is like this:
class TransformerEncoder(nn.Module):
    def __init__(
        self,
        vocab_size,
        pad_token_id=None,
        embedding_size=256,
        num_heads=8,
        num_layers=3,
        ffnn_size=512,
        dropout=0.1,
    ):
        super(TransformerEncoder, self).__init__()
        self.vocab_size = vocab_size
        self.pad_token_id = pad_token_id
        self.embedding_size = embedding_size
        self.num_heads = num_heads
        self.num_layers = num_layers
        self.ffnn_size = ffnn_size
        self.embed_tokens = TokenEmbedding(vocab_size, embedding_size)
        self.embed_positions = PositionalEmbedding(embedding_size, dropout=dropout)
        encoder_layer = nn.TransformerEncoderLayer(
            embedding_size,
            num_heads,
            ffnn_size,
            dropout,
        )
        encoder_norm = nn.LayerNorm(embedding_size)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers, encoder_norm)

    def forward(
        self,
        input_ids,
    ):
        # seq_len = input_ids.shape[1]
        # device = next(self.parameters()).device
        embedded_tokens = self.embed_positions(self.embed_tokens(input_ids))
        # B x T x C -> T x B x C
        embedded_tokens = embedded_tokens.transpose(0, 1)
        memory = self.encoder(embedded_tokens)
        return (memory,)
The decoder is like this:
class TransformerDecoder(nn.Module):
    def __init__(
        self,
        vocab_size,
        pad_token_id=None,
        embedding_size=256,
        num_heads=8,
        num_layers=3,
        ffnn_size=512,
        dropout=0.1,
    ):
        super(TransformerDecoder, self).__init__()
        self.vocab_size = vocab_size
        self.pad_token_id = pad_token_id
        self.embedding_size = embedding_size
        self.num_heads = num_heads
        self.num_layers = num_layers
        self.ffnn_size = ffnn_size
        self.dropout_module = nn.Dropout(p=dropout)
        self.embed_tokens = TokenEmbedding(vocab_size, embedding_size)
        self.embed_positions = PositionalEmbedding(embedding_size, dropout=dropout)
        decoder_layer = nn.TransformerDecoderLayer(
            embedding_size, num_heads, ffnn_size, dropout
        )
        decoder_norm = nn.LayerNorm(embedding_size)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers, decoder_norm)
        self.fc_out = nn.Linear(embedding_size, vocab_size)

    def forward(
        self,
        input_ids,
        encoder_out,
    ):
        seq_len = input_ids.shape[1]
        device = next(self.parameters()).device
        mask = generate_square_subsequent_mask(seq_len).to(device)
        embedded_tokens = self.embed_positions(self.embed_tokens(input_ids))
        # B x T x C -> T x B x C
        embedded_tokens = embedded_tokens.transpose(0, 1)
        output = self.decoder(embedded_tokens, encoder_out[0], tgt_mask=mask)
        # T x B x C -> B x T x C
        output = output.transpose(1, 0)
        return (self.fc_out(output),)
TokenEmbedding and PositionalEmbedding are as in the tutorial.
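The question doesn't include them, but assuming they follow the referenced tutorial, TokenEmbedding, PositionalEmbedding and generate_square_subsequent_mask look roughly like this (an approximation, with the positional encoding written batch-first to match the forward passes above):
import math
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    # Token embedding scaled by sqrt(d_model), as in the tutorial.
    def __init__(self, vocab_size, embedding_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.embedding_size = embedding_size

    def forward(self, tokens):
        return self.embedding(tokens) * math.sqrt(self.embedding_size)

class PositionalEmbedding(nn.Module):
    # Fixed sinusoidal positional encoding plus dropout, applied to B x T x C inputs.
    def __init__(self, embedding_size, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, embedding_size, 2) * (-math.log(10000.0) / embedding_size)
        )
        pe = torch.zeros(max_len, embedding_size)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x):
        return self.dropout(x + self.pe[: x.size(1)].unsqueeze(0))

def generate_square_subsequent_mask(size):
    # -inf above the diagonal, 0 on and below it: blocks attention to future positions.
    return torch.triu(torch.full((size, size), float("-inf")), diagonal=1)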
The main model just invokes encoder and decoder like:
encoder_outputs = self.encoder(input_ids=input_ids, **kwargs)
decoder_outputs = self.decoder(
    input_ids=decoder_input_ids,
    encoder_out=encoder_outputs,
    **kwargs,
)
The labels are shifted one token to the right to be fed to the decoder using:
def shift_tokens_right(self, input_ids: torch.Tensor, decoder_start_token_id: int):
    shifted_input_ids = input_ids.new_zeros(input_ids.shape)
    shifted_input_ids[:, 1:] = input_ids[:, :-1].clone()
    shifted_input_ids[:, 0] = decoder_start_token_id
    return shifted_input_ids
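As a sanity check of the shift itself (a toy example with made-up token ids, treating shift_tokens_right as a free function with the self parameter dropped, and assuming BOS=0, PAD=1, EOS=2):
labels = torch.tensor([[5, 8, 2, 1]])   # hypothetical ids for "w1 w2 </s> <pad>"
decoder_input_ids = shift_tokens_right(labels, decoder_start_token_id=0)
print(decoder_input_ids)                # tensor([[0, 5, 8, 2]]) i.e. "<s> w1 w2 </s>"
# The decoder is fed "<s> w1 w2 </s>" and trained to predict "w1 w2 </s> <pad>".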
The loss is calculated as:
loss_fct = nn.CrossEntropyLoss(ignore_index=self.pad_token_id)
loss = loss_fct(logits.reshape(-1, logits.shape[-1]), targets.reshape(-1))
The loss is going down, but the generations are really bad. The following is an example of the generations:
Source: < s > Can I jailbreak iOS 10 ? < /s > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad >
Preds: < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s > < s >
Target: < s > Can you jailbreak iOS 10 ? < /s > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad > < pad >
As you can see, the predictions in this case are only BOS tokens. The output of the decoder at each decoding step is always almost the same across iterations; the model does not seem to be learning. I have tried learning rates from 0.1 to 1e-4. For a brief moment around the second or third epoch, intelligible sentences were produced, but soon after that the generations reverted to just BOS or PAD tokens.
Do you have an intuition on what might be wrong? Sorry for the question not being self-contained. Thanks in advance for any help you can provide.

Related

Where should I put the count in this tim sort algorithm, to accurately compare runtime to other algorithms

I've written a Timsort sorting algorithm for a computer science class, and I would like to be able to compare its runtime to other similar algorithms, such as merge sort. However, I am not sure where I should put the count (i.e. count += 1) within the code to get an accurate comparison. Any help would be much appreciated.
RUN = 32

def insertion_sort(arr, left, right):
    for i in range(left + 1, right + 1):
        temp = arr[i]
        j = i - 1
        while (arr[j] > temp and j >= left):
            arr[j + 1] = arr[j]
            arr[j] = temp
            j -= 1

def merge(arr, left, right, count):
    c = 0
    index = count
    length = len(left) + len(right)
    while left and right:
        if left[0] < right[0]:
            arr[index] = left.pop(0)
            c += 1
            index += 1
        else:
            arr[index] = right.pop(0)
            c += 1
            index += 1
    if len(left) == 0:
        while c < length:
            arr[index] = right.pop(0)
            c += 1
            index += 1
    elif len(right) == 0:
        while c < length:
            arr[index] = left.pop(0)
            c += 1
            index += 1

def tim_sort(arr):
    n = len(arr)
    for i in range(0, n, RUN):
        insertion_sort(arr, i, min((i + (RUN - 1)), (n - 1)))
    size = RUN
    while size < n:
        for left in range(0, n, 2 * size):
            if (left + size > n):
                merge(arr, arr[left:n], [], left)
            else:
                left_sub_arr = arr[left:(left + size)]
                right_sub_arr = arr[(left + size):min((left + 2 * size), n)]
                merge(arr, left_sub_arr, right_sub_arr, left)
        size *= 2
    return arr
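There is no single right place, but one straightforward option (a sketch of one possible placement, not the only valid one) is a module-level counter that is incremented once per element-to-element comparison: beside the arr[j] > temp test in insertion_sort and beside the left[0] < right[0] test in merge. The insertion-sort guard is reordered slightly below so the counter sits right next to the comparison:
comparisons = 0  # counts element-to-element comparisons only

def insertion_sort(arr, left, right):
    global comparisons
    for i in range(left + 1, right + 1):
        temp = arr[i]
        j = i - 1
        while j >= left:
            comparisons += 1          # one arr[j] > temp comparison
            if not (arr[j] > temp):
                break
            arr[j + 1] = arr[j]
            arr[j] = temp
            j -= 1

def merge(arr, left, right, count):
    global comparisons
    c = 0
    index = count
    length = len(left) + len(right)
    while left and right:
        comparisons += 1              # one left[0] < right[0] comparison
        if left[0] < right[0]:
            arr[index] = left.pop(0)
        else:
            arr[index] = right.pop(0)
        c += 1
        index += 1
    while left and c < length:        # drain the remaining side; no element comparisons here
        arr[index] = left.pop(0)
        c += 1
        index += 1
    while right and c < length:
        arr[index] = right.pop(0)
        c += 1
        index += 1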

What's wrong with my decimal to binary converter?

I have to create a decimal to binary converter without using lists, yet my code is giving me wrong values and I can't figure out why.
def DecToBin(val):
    bine = 128
    counter = 8
    coded = 10
    binary = 0
    while val > 0 and bine != 0.5:
        if val < bine:
            bine = bine/2
            counter -= 1
        elif val > bine:
            val = val - bine
            binary = binary + (coded ^ counter)
            counter -= 1
            bine = bine/2
        elif val == bine:
            binary = binary + (coded ^ counter)
            counter = 0
            val = 0
    return binary
When the input value is 3, it gives me 19.
You have two issues. First, the Python power operator is **, not ^ (which is bitwise XOR). Second, bine should be 256, not 128, given your counter value. Your code should look like this:
def DecToBin(val):
    bine = 256
    counter = 8
    coded = 10
    binary = 0
    while val > 0 and bine != 0.5:
        if val < bine:
            bine = bine/2
            counter -= 1
        elif val > bine:
            val = val - bine
            binary = binary + (coded ** counter)
            counter -= 1
            bine = bine/2
        elif val == bine:
            binary = binary + (coded ** counter)
            counter = 0
            val = 0
    return binary
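With those two changes, a quick check returns the binary digits packed into a base-10 number, for example:
print(DecToBin(3))    # 11
print(DecToBin(19))   # 10011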

How to check distance between two objects in PyOpenGL?

I am making an RPG in PyOpenGL and I want to check whether the camera is pointing at an object (made of vertices) within a certain distance. How can I do that?
I have tried using range() on the vertices of an object to check if the camera is in range, but it didn't work.
import pygame
from pygame.locals import *
from OpenGL.GL import *
from OpenGL.GLU import *
import math, sys

def touched(tar_x,tar_y,tar_z,tar_w,tar_h,tar_d,tar_x1,tar_y1,tar_z1,tar_w1,tar_h1,tar_d1):
    for i in range(tar_x1, tar_x1 + tar_w1):
        for j in range(tar_y1, tar_y1 + tar_h1):
            for k in range(tar_z1, tar_z1 + tar_d1):
                if (tar_x < i < tar_x + tar_w) and (tar_y < j < tar_y + tar_h) and (tar_z < k < tar_z + tar_d):
                    return True
    return False

#[...]
while True:
    #[...]
    if touched(int(person.x),int(person.y),int(person.z),10,5,5,int(camera_pos[0]),int(camera_pos[1]),int(camera_pos[2]),1,1,1): #
        print("yes!") #
If you want to know if 2 cubes are touching, you have to check if the cubes are "overlapping" in all 3 dimensions.
If you have a range [tar_x, tar_x+tar_w] and a 2nd range [tar_x1, tar_x1+tar_w1], then you can check if the ranges are "overlapping" by:
intersect = tar_x < tar_x1+tar_w1 and tar_x1 < tar_x+tar_w
Do this check for all 3 dimensions:
def touched(tar_x,tar_y,tar_z,tar_w,tar_h,tar_d,tar_x1,tar_y1,tar_z1,tar_w1,tar_h1,tar_d1):
    intersect_x = tar_x < tar_x1+tar_w1 and tar_x1 < tar_x+tar_w
    intersect_y = tar_y < tar_y1+tar_h1 and tar_y1 < tar_y+tar_h
    intersect_z = tar_z < tar_z1+tar_d1 and tar_z1 < tar_z+tar_d
    return intersect_x and intersect_y and intersect_z
If you want to know if a point is inside a cuboid volume, then you have to test, for each dimension, whether the coordinate (e.g. tar_x1) is in the range [tar_x, tar_x+tar_w]:
is_in = tar_x < tar_x1 < tar_x+tar_w
Again, check this for all 3 dimensions:
def isIn(tar_x,tar_y,tar_z,tar_w,tar_h,tar_d,tar_x1,tar_y1,tar_z1):
    is_in_x = tar_x < tar_x1 < tar_x+tar_w
    is_in_y = tar_y < tar_y1 < tar_y+tar_h
    is_in_z = tar_z < tar_z1 < tar_z+tar_d
    return is_in_x and is_in_y and is_in_z
If you want to know the distance from a point to another point, e.g. the center of a cuboid volume, then you can use pygame.math.Vector3 and .distance_to():
centerPt = pygame.math.Vector3(tar_x + tar_w/2, tar_y + tar_h/2, tar_z + tar_d/2)
point2 = pygame.math.Vector3(tar_x1, tar_y1, tar_z1)
distance = centerPt.distance_to(point2)
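That distance can then be compared against whatever range you want; for example (the threshold and the two points below are made-up values, not from the question):
import pygame

max_range = 10.0                               # hypothetical view distance
center = pygame.math.Vector3(5.0, 2.5, 2.5)    # e.g. the cuboid's center
camera = pygame.math.Vector3(0.0, 0.0, 0.0)    # e.g. the camera position
if center.distance_to(camera) <= max_range:
    print("object is within range")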

I'm trying to get the code to sort using the median-of-three method

I'm trying to get the code to sort using the median-of-three method and I'm running into a few problems. The line
alist[first], alist[pivotindex] = alist[pivotindex], alist[first]
is returning an invalid syntax error and I'm not sure why.
def quickSort(alist):
    quickSortHelper(alist, 0, len(alist)-1)

def quickSortHelper(alist, first, last):
    if first < last:
        splitpoint = partition(alist, first, last)
        quickSortHelper(alist, first, splitpoint-1)
        quickSortHelper(alist, splitpoint+1, last)

def partition(alist, first, last):
    pivotindex = median(alist, first, last, (first + last //2)
    alist[first], alist[pivotindex] = alist[pivotindex], alist[first]
    pivotvalue = alist[first]
    leftmark = first+1
    rightmark = last
    done = False
    while not done:
        while leftmark <= rightmark and \
                alist[leftmark] <= pivotvalue:
            leftmark = leftmark + 1
            print(alist)
        while alist[rightmark] >= pivotvalue and \
                rightmark >= leftmark:
            rightmark = rightmark - 1
            print(alist)
        if rightmark < leftmark:
            done = True
        else:
            temp = alist[leftmark]
            alist[leftmark] = alist[rightmark]
            alist[rightmark] = temp
            print(alist)
    temp = alist[first]
    alist[first] = alist[rightmark]
    alist[rightmark] = temp
    return rightmark

def median(a, i, j, k):
    if a[i] < a[j]:
        return j if a[j] < a[k] else k
    else:
        return i if a[i] < a[k] else k

alist = [54,26,93,17,77,31,44,55,20]
quickSort(alist)
print(alist)
Because the line above it is missing a ).
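In other words, the call on that line needs its closing parenthesis; it should look something like this (and first + last //2 is most likely also meant to be (first + last) // 2, since // binds tighter than +):
pivotindex = median(alist, first, last, (first + last) // 2)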

Send character from Arduino to Python: decode error

I am trying to send a string from Arduino to Python via Bluetooth. When I try, it looks like it works, but the received data doesn't look like what I want.
This is my code
[Arduino]
void Send_Joystick(int X, int Y)
{
  if(800 <= X && X < 1023 && 700 <= Y && Y < 1025){ BTSerial.write(byte(10)); }
  else if(600 <= X && X < 800 && 700 <= Y && Y < 1025){ BTSerial.write(byte(11)); }
  else if(400 <= X && X < 600 && 700 <= Y && Y < 1025){ BTSerial.write(byte(12)); }
  else if(200 <= X && X < 400 && 700 <= Y && Y < 1025){ BTSerial.write(byte(13)); }
  else if(0 <= X && X < 200 && 700 <= Y && Y < 1025){ BTSerial.write(byte(14)); }
  else if(800 <= X && X < 1025 && 300 <= Y && Y < 700){ BTSerial.write(byte(15)); }
  else if(600 <= X && X < 800 && 300 <= Y && Y < 700){ BTSerial.write(byte(16)); }
  else if(400 <= X && X < 600 && 300 <= Y && Y < 700){ BTSerial.write(byte(17)); }
  else if(200 <= X && X < 400 && 300 <= Y && Y < 700){ BTSerial.write(byte(18)); }
  else if(0 <= X && X < 200 && 300 <= Y && Y < 700){ BTSerial.write("19>"); }
  else if(800 <= X && X < 1025 && 0 <= Y && Y < 300){ BTSerial.write("20>"); }
  else if(600 <= X && X < 800 && 0 <= Y && Y < 300){ BTSerial.write("21>"); }
  else if(400 <= X && X < 600 && 0 <= Y && Y < 300){ BTSerial.write("22>"); }
  else if(200 <= X && X < 400 && 0 <= Y && Y < 300){ BTSerial.write("23>"); }
  else if(0 <= X && X < 200 && 0 <= Y && Y < 300){ BTSerial.write("24>"); }
}
This is just part of my code, and the BTSerial calls look different from one another because I tried many ways.
[Python3]
import bluetooth

bd_addr = "98:D3:37:00:8D:39"  # The address from the HC-05 sensor
port = 1
sock = bluetooth.BluetoothSocket(bluetooth.RFCOMM)
sock.connect((bd_addr, port))

while True:
    try:
        data = sock.recv(1024)
        print(data)
    except KeyboardInterrupt:
        break
sock.close()

#while True:
#    try:
#        data = sock.recv(1024)
#        print("received [%s]" % data)
#    except KeyboardInterrupt:
#        break
#sock.close()

# below this was in the main code below "data = sock.recv(1024)" phrase
#data_end = data.find('>')
#if data_end != -1:
#    rec = data[:data_end]
#    print(rec)
#    data = data[data_end+1:]
This was my Python code. When I run it, the Python shell shows me something like this:
b'\xc3\xcc\xcf'
b'\xc3'
b'\xec\xcf'
b'\xc3'
b'\xec\xce'
b'\xc3\xec\xcf'
b'\xc3\xcc'
b'\xcf'
b'\xc3'
b'\xec\xcf'
b'\xc3\xec\xcf'
b'\xc3\xcc\xcf'
When I change my Python code to
data = sock.recv(1024).decode
the outcome looks like this
<built-in method decode of bytes object at 0x2d1bd40>
<built-in method decode of bytes object at 0x2d1bd70>
<built-in method decode of bytes object at 0x2d1bda0>
<built-in method decode of bytes object at 0x2d1bdd0>
<built-in method decode of bytes object at 0x2d1be00>
<built-in method decode of bytes object at 0x2d1be30>
<built-in method decode of bytes object at 0x2d1be60>
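Side note on that output: .decode without parentheses is just a reference to the bytes method, so print shows the method object rather than decoded text. Actually calling it would look something like this (the encoding and error handling are assumptions, and the raw bytes shown above may still not be valid UTF-8):
data = sock.recv(1024).decode("utf-8", errors="replace")  # call decode(), don't just reference it
print(data)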
I want to receive the data exactly as I sent it from the Arduino, but nothing I try works. How can I get this to work?
