How do I approach a seq2seq problem with timed events? - keras

I'm looking into building a sequence-to-sequence neural network, where each element of the sequence is a specific event happening in time.
Specifically, the problem revolves around music: there are different notes happening at different times, each has its own length, and the notes can be denser at some points than others. Raw data could look something like this: (<note.eventTime>, <note.pitch>, <note.sustainLength>). The neural network would then convert a sequence of notes (a musical piece, or just part of it) into a simplified sequence that is easier to play (for difficulty purposes).
I've been reading up on RNNs with LSTM or GRU cells for converting natural language, but there's always this element of time in my problem that I can't seem to figure out.
I guess the main question is: how do I transform the data so that I can feed it to an RNN with LSTM and preserve both the time and the pitch information?

As stated in the Keras documentation, the LSTM layer takes 3D input, i.e. [batch, timesteps, features].
So, you will have to convert your data to the following shape before feeding it to the LSTM:
[[[Timestamp('2022-08-01 00:00:00') 0 11]
  [Timestamp('2022-08-02 00:00:00') 1 12]
  [Timestamp('2022-08-03 00:00:00') 2 13]]
 [[Timestamp('2022-08-04 00:00:00') 3 14]
  [Timestamp('2022-08-05 00:00:00') 4 15]
  [Timestamp('2022-08-06 00:00:00') 5 16]]
 [[Timestamp('2022-08-07 00:00:00') 6 17]
  [Timestamp('2022-08-08 00:00:00') 7 18]
  [Timestamp('2022-08-09 00:00:00') 8 19]]]
Here we used a timestep of 3. The following makes up one sequence:
[[Timestamp('2022-08-01 00:00:00') 0 11]
 [Timestamp('2022-08-02 00:00:00') 1 12]
 [Timestamp('2022-08-03 00:00:00') 2 13]]
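As a minimal sketch of how you might get the note data from the question into that shape (the (eventTime, pitch, sustainLength) layout follows the question; the example values and the choice to encode time as a delta between consecutive notes are my own assumptions):
import numpy as np

# notes as (eventTime, pitch, sustainLength) tuples, as in the question
notes = [(0.0, 60, 0.5), (0.5, 62, 0.5), (1.5, 64, 1.0)]

times = np.array([n[0] for n in notes])
deltas = np.diff(times, prepend=times[0])   # time since the previous note
pitches = np.array([n[1] for n in notes])
sustains = np.array([n[2] for n in notes])

# stack into (timesteps, features), then add a batch axis
sequence = np.stack([deltas, pitches, sustains], axis=1)
batch = sequence[np.newaxis, ...]           # shape (1, 3, 3): [batch, timesteps, features]
Encoding time as a delta between consecutive events (rather than as an absolute timestamp) is a common way to let the LSTM see local rhythm without the input values growing unboundedly.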

Related

What is Coef_.T? Purpose of using T [duplicate]

This question already has an answer here:
Numpy .T syntax for Python (1 answer)
Closed 9 months ago.
coef_ is used to find the coefficients of linear models in Python. But a .T, which I could not find an answer for, was put at the end of coef_. What does .T do here?
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# X_train, y_train, X_test, y_test are assumed to be defined
for C, marker in zip([0.001, 1, 100], ['o', '^', 'v']):
    # penalty="l1" requires a solver that supports it, e.g. liblinear
    lr_l1 = LogisticRegression(C=C, penalty="l1", solver="liblinear").fit(X_train, y_train)
    print("Training accuracy of l1 logreg with C={:.3f}: {:.2f}".format(
        C, lr_l1.score(X_train, y_train)))
    print("Test accuracy of l1 logreg with C={:.3f}: {:.2f}".format(
        C, lr_l1.score(X_test, y_test)))
    plt.plot(lr_l1.coef_.T, marker, label="C={:.3f}".format(C))
".T" method means Transpose which switches rows & columns
if you have a matrix m:
[1 2 3
4 5 6
7 8 9]
Then m.T would be:
[1 4 7
2 5 8
3 6 9]
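You can check this quickly with NumPy (a minimal sketch; the array values mirror the example above):
>>> import numpy as np
>>> m = np.arange(1, 10).reshape(3, 3)
>>> m.T
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])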
It looks like it's used in this line:
plt.plot(lr_l1.coef_.T, ...)
to make sure it plots the coefficients in the expected way. If the model was built with sklearn's LogisticRegression, then you can review the docs here.
coef_ has shape (n_classes, n_features), which means
coef_.T has shape (n_features, n_classes).
Here is a notebook that shows how this works.

understanding pytorch conv2d internally [duplicate]

This question already has answers here:
Understanding convolutional layers shapes (3 answers)
Closed 1 year ago.
I'm trying to understand what nn.Conv2d does internally.
So let's assume we are applying Conv2d to a 32*32 RGB image:
torch.nn.Conv2d(3, 49, 4, bias=True)
So:
When we initialize the conv layer, how many weights does it have, and in which shapes (and the same for the biases, counted separately)?
Before applying the conv, the image has shape 3 * 32 * 32, and after applying it, 49 * 29 * 29, so what happens in between?
I define the "slide" operation (I don't know the real name) as multiplying the first element of the kernel with the first element of a kernel-sized box of the image, and so on up to the last element of the kernel, so that one of the 29 * 29 output values is calculated; "slide all" does this horizontally and vertically until all 29 * 29 values are calculated.
So I understand how a single kernel would act, but I don't understand how many kernels would be created by torch.nn.Conv2d(3, 49, 4, bias=True), and which of them would be applied to the R, G, and B channels.
Calling nn.Conv2d(3, 49, 4, bias=True) will initialize 49 4x4 kernels, each having three channels and a single bias parameter. That's a total of 49*(4*4*3 + 1) = 2,401 parameters.
You can check that this is indeed correct with:
>>> import torch
>>> from torch import nn
>>> conv2d = nn.Conv2d(3, 49, 4, bias=True)
The parameters list will contain the weight tensor, shaped (n_filters=49, n_channels=3, kernel_height=4, kernel_width=4), and a bias tensor, shaped (49,):
>>> [p.shape for p in conv2d.parameters()]
[torch.Size([49, 3, 4, 4]), torch.Size([49])]
If we look at the total number of parameters, we indeed find:
>>> nn.utils.parameters_to_vector(conv2d.parameters()).numel()
2401
Concerning how they are applied: each of the 49 kernels is applied 'independently' to the input map. For each filter operation, you are convolving a three-channel input tensor with a three-channel kernel. Each of those 49 convolutions gets its respective bias added. In the end, you are left with 49 single-channel maps, which are concatenated to make up a single 49-channel map. In practice, everything is done in one go using a windowed view of the input.
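To see the shape arithmetic concretely, a quick check using the conv2d layer defined above:
>>> x = torch.randn(1, 3, 32, 32)
>>> conv2d(x).shape
torch.Size([1, 49, 29, 29])
With kernel size 4, stride 1, and no padding, each spatial dimension shrinks from 32 to 32 - 4 + 1 = 29.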
I am certainly biased towards my own posts: here you will find another explanation of shapes in convolutional neural networks.

Out of Sample Forecasting using Neural Network in Keras (Python)

I am doing a time series forecasting exercise using the window method, but I am struggling to understand how to do the forecast out of sample.
Here is the code:
import tensorflow as tf

def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    dataset = tf.data.Dataset.from_tensor_slices(series)
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    dataset = dataset.shuffle(shuffle_buffer).map(lambda window: (window[:-1], window[-1]))
    dataset = dataset.batch(batch_size).prefetch(1)
    return dataset
dataset = windowed_dataset(x_train, window_size, batch_size, shuffle_buffer_size)
The windowed_dataset function splits the univariate time series series into a matrix. Imagine we have a dataset as follows:
dataset = tf.data.Dataset.range(10)
for val in dataset:
    print(val.numpy())
0
1
2
3
4
5
6
7
8
9
The windowed_dataset function converts the series into windows, with x features on the left and y labels on the right (shuffled, hence the order):
[2 3 4 5] [6]
[4 5 6 7] [8]
[3 4 5 6] [7]
[1 2 3 4] [5]
[5 6 7 8] [9]
[0 1 2 3] [4]
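For reference, a hypothetical call that reproduces a printout like the one above (window_size=4, batch_size=1; the shuffled order will vary):
ds = windowed_dataset(tf.range(10), window_size=4, batch_size=1, shuffle_buffer=10)
for x, y in ds:
    print(x.numpy()[0], y.numpy())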
In the next step, we fit the neural network model on the training dataset as follows:
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(10, input_shape=[window_size], activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1)
])
# note: the lr argument is deprecated in newer Keras versions; use learning_rate
model.compile(loss="mse",
              optimizer=tf.keras.optimizers.SGD(learning_rate=1e-6, momentum=0.9))
model.fit(dataset, epochs=100, verbose=0)
Up to here, I am fine with the code. However, I am struggling to understand the out-of-sample forecasting shown below:
forecast = []
for time in range(len(series) - window_size):
    forecast.append(model.predict(series[time:time + window_size][np.newaxis]))
forecast = forecast[split_time - window_size:]
Can someone please explain why we are using a loop here (for time in range(len(series) - window_size))? Why not simply do model.predict(dataset_validation) for the validation part and model.predict(dataset) for the training part?
I don't understand the need for the for loop, because this is not a rolling forecast; we are not re-training the model. Can someone please explain?
While I understand why the data science community structures the dataset this way, I personally find it a lot clearer to split X and y, fit with model.fit(X, y, epochs=100, verbose=0), and predict with model.predict(X).
The for loop returns the predictions in order, whereas if you call model.predict(dataset_validation) you'll get the predictions in a shuffled order (assuming you shuffled the dataset).
As for the point of using datasets: they can help with code organization. There is no need to ever use one if you don't want to.
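If the goal is simply ordered predictions over the in-memory series, a minimal sketch (assuming series is a NumPy array and model, window_size, split_time are as defined above) that avoids calling predict once per step:
import numpy as np

# build every window up front, then predict in one ordered batch call
windows = np.stack([series[t:t + window_size]
                    for t in range(len(series) - window_size)])
forecast = model.predict(windows)            # shape (n_windows, 1), in time order
forecast = forecast[split_time - window_size:]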

Pytorch summary only works for one specific input size for U-Net

I am trying to implement the U-Net architecture in PyTorch. When I print the model using print(model), I get the correct architecture, but when I try to print the summary using the following (or any other input size, for that matter):
from torchsummary import summary
summary(model, input_size=(13, 572, 572))
I get an error:
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 70 and 71 in dimension 2 at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/TH/generic/THTensor.cpp:612
However, it works perfectly if I give the input size as input_size=(3, 224, 224) (like it worked for this person here). I am so baffled.
Can someone help me figure out what's wrong?
Edit: I have used the model architecture from here.
The U-Net architecture you provided doesn't support that shape (unless the depth parameter is <= 3). Ultimately, the reason is that a downsampling operation isn't invertible with respect to size, since multiple input shapes map to the same output shape. For example, consider:
>>> import torch
>>> torch.nn.functional.max_pool2d(torch.zeros(1, 1, 10, 10), 2).shape
torch.Size([1, 1, 5, 5])
>>> torch.nn.functional.max_pool2d(torch.zeros(1, 1, 11, 11), 2).shape
torch.Size([1, 1, 5, 5])
So the question is: given only that the output shape is 5x5, what was the shape of the input? Was it 10x10 or 11x11? The same phenomenon applies to downsampling via strided convolutions.
The problem is that the UNet class tries to combine features from the downsampling half of the network with the features in the upsampling half. If it "guesses wrong" about the original shape during upsampling, then you will receive a dimension mismatch error.
To avoid this issue, you'll need to ensure that the height and width of your input data are multiples of 2**(depth-1). So, for the default depth=5, you need the input image height and width to be multiples of 16 (e.g. 560 or 576). Alternatively, since 572 is divisible by 4, you could also set depth=3 to make it work.
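If you need to keep the 572x572 input and depth=5, one common workaround (a sketch; pad_to_multiple is an illustrative name, not part of the UNet code) is to pad the input up to the next multiple of 2**(depth-1) = 16:
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=16):
    # pad H and W up to the nearest multiple so the skip connections align
    h, w = x.shape[-2:]
    pad_h, pad_w = (-h) % multiple, (-w) % multiple
    return F.pad(x, (0, pad_w, 0, pad_h))    # (left, right, top, bottom)

x = torch.zeros(1, 13, 572, 572)
print(pad_to_multiple(x).shape)              # torch.Size([1, 13, 576, 576])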

Behaviour of train_test_split() from Scikit-learn

I am curious how the train_test_split() method of Scikit-learn will behave in the following scenario:
An imaginary dataset:
id, count, size
1, 4, 8
2, 5, 9
3, 6, 0
Say I divide it into two separate sets like this (keeping 'id' in both):
id, count | id, size
1, 4 | 1, 8
2, 5 | 2, 9
3, 6 | 3, 0
And then split them both with train_test_split(), with the same random_state of 0. Would the order of both be the same, using 'id' as the reference? (Since you are shuffling the same dataset, just with different parts left out.)
I am curious how this works because I have two models. The first one is trained on the dataset and adds its results to the dataset, part of which is then used to train the second model.
When doing this, it is important that no data points used to train the first model are also used when testing the generalization of the second model, because that data was 'seen before' and the model will know what to do with it, so you would no longer be testing generalization to new data.
It would be great if train_test_split() shuffled both the same way, since then one would not need to keep track of which data was used to train the first algorithm to prevent contamination of the test results.
They should produce the same resulting indices if you use the same random_state parameter in each call.
However, you could also just reverse your order of operations: call train_test_split on the parent dataset first, then create the two column subsets from the resulting train and test sets.
Example:
print(df)
   id  count  size
0   1      4     8
1   2      5     9
2   3      6     0

from sklearn.model_selection import train_test_split

dfa = df[['id', 'count']].copy()
dfb = df[['id', 'size']].copy()

rstate = 123
traina, testa = train_test_split(dfa, random_state=rstate)
trainb, testb = train_test_split(dfb, random_state=rstate)

assert traina.index.equals(trainb.index)
# True
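And the reversed order of operations suggested above, which guarantees alignment by construction:
# split the parent frame once, then derive the two column subsets
train, test = train_test_split(df, random_state=rstate)
traina, trainb = train[['id', 'count']], train[['id', 'size']]
testa, testb = test[['id', 'count']], test[['id', 'size']]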
