I am trying to train a CNN on audio data for regression. This is my network architecture:
name: "AudioRegression"
layer{
name: "data"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "c_trainList.txt"
batch_size: 32
shuffle: true
}
include: { phase: TRAIN }
}
layer{
name: "data"
type: "HDF5Data"
top: "data"
top: "label"
hdf5_data_param {
source: "d_testList.txt"
batch_size: 32
}
include: { phase: TEST }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param { lr_mult: 1 }
param { lr_mult: 2 }
convolution_param {
num_output: 32
kernel_h: 1
kernel_w: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "norm1"
type: "LRN"
bottom: "pool1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "norm1"
top: "conv2"
param { lr_mult: 1 }
param { lr_mult: 2 }
convolution_param {
num_output: 64
kernel_h: 1
kernel_w: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "norm2"
type: "LRN"
bottom: "pool2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "norm2"
top: "conv3"
param { lr_mult: 1 }
param { lr_mult: 2 }
convolution_param {
num_output: 128
kernel_h: 1
kernel_w: 3
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "pool3"
type: "Pooling"
bottom: "conv3"
top: "pool3"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer{
name: "fc1"
type: "InnerProduct"
bottom: "pool3"
top: "fc1"
param { lr_mult: 1 decay_mult: 1 }
param { lr_mult: 2 decay_mult: 0 }
inner_product_param {
num_output: 1024
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "dropout1"
type: "Dropout"
bottom: "fc1"
top: "fc1"
dropout_param {
dropout_ratio: 0.5
}
}
layer{
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param { lr_mult: 1 decay_mult: 1 }
param { lr_mult: 2 decay_mult: 0 }
inner_product_param {
num_output: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer{
name: "loss"
type: "EuclideanLoss"
bottom: "fc2"
bottom: "label"
top: "loss"
}
After running for about 6,000 iterations (my total training data was around 3,000 samples, and I tested on the same data to check how well the CNN works), the prediction eventually looks like this: [Red = Actual Label, Blue = Prediction]
I am not sure why my prediction is so bad. I have tried increasing the amount of data and the number of iterations, but the result is similar. I also tried spectrograms, with no luck. My input audio data is 1721 samples long (1D), which is approximately 40 ms at a 44.1 kHz sampling rate.
Can you please suggest what I could do to improve my results? Is there something wrong with the architecture or the way I'm taking in the data? Any help is appreciated. TIA.
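For reference, this is roughly how the clips and labels are packed into HDF5 for the HDF5Data layers above (a simplified sketch with placeholder file names and dummy data, not the exact preprocessing script; the dataset names must match the "data" and "label" tops, and the list files contain one .h5 path per line):
import h5py
import numpy as np

N = 3000                                            # roughly the size of the training set
X = np.random.randn(N, 1721).astype(np.float32)     # placeholder for the real clips
y = np.random.rand(N, 1).astype(np.float32)         # placeholder for the real regression targets
X = X.reshape(-1, 1, 1, 1721)                       # N x C x H x W, so the 1 x 5 kernels slide along time
with h5py.File('c_train0.h5', 'w') as f:            # placeholder file name
    f.create_dataset('data', data=X)                # must match top: "data"
    f.create_dataset('label', data=y)               # must match top: "label"
# c_trainList.txt / d_testList.txt then list the .h5 file paths, one per line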
Here is my training log:
I1209 00:01:47.464259 2189 solver.cpp:47] Solver scaffolding done.
I1209 00:01:47.464296 2189 solver.cpp:363] Solving AudioRegression
I1209 00:01:47.464303 2189 solver.cpp:364] Learning Rate Policy: inv
I1209 00:01:47.499138 2189 solver.cpp:424] Iteration 0, Testing net (#0)
I1209 00:04:07.222735 2189 solver.cpp:481] Test net output #0: loss = 0.0230729 (* 1 = 0.0230729 loss)
I1209 00:04:10.707535 2189 solver.cpp:240] Iteration 0, loss = 0.0259357
I1209 00:04:10.707590 2189 solver.cpp:255] Train net output #0: loss = 0.0259357 (* 1 = 0.0259357 loss)
I1209 00:04:10.707617 2189 solver.cpp:631] Iteration 0, lr = 0.0001
I1209 00:06:51.835078 2189 solver.cpp:240] Iteration 50, loss = 0.0168364
I1209 00:06:51.835170 2189 solver.cpp:255] Train net output #0: loss = 0.0168364 (* 1 = 0.0168364 loss)
I1209 00:06:51.835182 2189 solver.cpp:631] Iteration 50, lr = 9.96266e-05
I1209 00:09:42.754168 2189 solver.cpp:240] Iteration 100, loss = 0.0115463
I1209 00:09:42.754374 2189 solver.cpp:255] Train net output #0: loss = 0.0115463 (* 1 = 0.0115463 loss)
I1209 00:09:42.754402 2189 solver.cpp:631] Iteration 100, lr = 9.92565e-05
I1209 00:13:08.801769 2189 solver.cpp:240] Iteration 150, loss = 0.0113013
I1209 00:16:21.470883 2189 solver.cpp:255] Train net output #0: loss = 0.0136536 (* 1 = 0.0136536 loss)
I1209 00:16:21.470899 2189 solver.cpp:631] Iteration 200, lr = 9.85258e-05
I1209 00:19:31.274965 2189 solver.cpp:240] Iteration 250, loss = 0.014273
I1209 00:19:31.275111 2189 solver.cpp:255] Train net output #0: loss = 0.014273 (* 1 = 0.014273 loss)
I1209 00:19:31.275130 2189 solver.cpp:631] Iteration 250, lr = 9.81651e-05
I1209 00:22:41.283741 2189 solver.cpp:240] Iteration 300, loss = 0.016501
I1209 00:22:41.283836 2189 solver.cpp:255] Train net output #0: loss = 0.016501 (* 1 = 0.016501 loss)
I1209 00:22:41.283850 2189 solver.cpp:631] Iteration 300, lr = 9.78075e-05
I1209 00:25:49.767002 2189 solver.cpp:240] Iteration 350, loss = 0.0229812
I1209 00:25:49.767107 2189 solver.cpp:255] Train net output #0: loss = 0.0229812 (* 1 = 0.0229812 loss)
I1209 00:25:49.767119 2189 solver.cpp:631] Iteration 350, lr = 9.74529e-05
I1209 00:29:11.156260 2189 solver.cpp:240] Iteration 400, loss = 0.0114304
I1209 00:29:11.156347 2189 solver.cpp:255] Train net output #0: loss = 0.0114304 (* 1 = 0.0114304 loss)
I1209 00:29:11.156359 2189 solver.cpp:631] Iteration 400, lr = 9.71013e-05
I1209 00:32:28.614641 2189 solver.cpp:240] Iteration 450, loss = 0.0157431
I1209 00:32:28.614919 2189 solver.cpp:255] Train net output #0: loss = 0.0157431 (* 1 = 0.0157431 loss)
I1209 00:32:28.614943 2189 solver.cpp:631] Iteration 450, lr = 9.67526e-05
I1209 00:35:42.133127 2189 solver.cpp:240] Iteration 500, loss = 0.0166063
I1209 00:35:42.133232 2189 solver.cpp:255] Train net output #0: loss = 0.0166063 (* 1 = 0.0166063 loss)
I1209 00:35:42.133245 2189 solver.cpp:631] Iteration 500, lr = 9.64069e-05
I1209 00:38:54.340562 2189 solver.cpp:240] Iteration 550, loss = 0.0173505
I1209 00:38:54.340664 2189 solver.cpp:255] Train net output #0: loss = 0.0173505 (* 1 = 0.0173505 loss)
I1209 00:48:16.819653 2189 solver.cpp:255] Train net output #0: loss = 0.0121346 (* 1 = 0.0121346 loss)
I1209 00:48:16.819675 2189 solver.cpp:631] Iteration 700, lr = 9.50522e-05
I1209 00:51:06.433248 2189 solver.cpp:240] Iteration 750, loss = 0.0144362
I1209 00:51:06.433331 2189 solver.cpp:255] Train net output #0: loss = 0.0144362 (* 1 = 0.0144362 loss)
I1209 00:51:06.433346 2189 solver.cpp:631] Iteration 750, lr = 9.47204e-05
I1209 00:53:58.998585 2189 solver.cpp:240] Iteration 800, loss = 0.015145
I1209 00:53:58.998919 2189 solver.cpp:255] Train net output #0: loss = 0.015145 (* 1 = 0.015145 loss)
I1209 00:53:58.998991 2189 solver.cpp:631] Iteration 800, lr = 9.43913e-05
I1209 00:56:48.413203 2189 solver.cpp:240] Iteration 850, loss = 0.00943574
I1209 01:07:50.999053 2189 solver.cpp:631] Iteration 1000, lr = 9.31012e-05
I1209 01:10:48.213668 2189 solver.cpp:240] Iteration 1050, loss = 0.00977092
I1209 01:10:48.213749 2189 solver.cpp:255] Train net output #0: loss = 0.00977092 (* 1 = 0.00977092 loss)
I1209 01:10:48.213760 2189 solver.cpp:631] Iteration 1050, lr = 9.27851e-05
I1209 01:13:38.985514 2189 solver.cpp:240] Iteration 1100, loss = 0.0150719
I1209 01:13:38.985610 2189 solver.cpp:255] Train net output #0: loss = 0.0150719 (* 1 = 0.0150719 loss)
I1209 01:13:38.985623 2189 solver.cpp:631] Iteration 1100, lr = 9.24715e-05
I1209 01:16:33.194244 2189 solver.cpp:240] Iteration 1150, loss = 0.0118601
I1209 01:16:33.194334 2189 solver.cpp:255] Train net output #0: loss = 0.0118601 (* 1 = 0.0118601 loss)
I1209 01:16:33.194345 2189 solver.cpp:631] Iteration 1150, lr = 9.21603e-05
I1209 01:19:23.298775 2189 solver.cpp:240] Iteration 1200, loss = 0.0122899
I1209 01:19:23.298854 2189 solver.cpp:255] Train net output #0: loss = 0.0122899 (* 1 = 0.0122899 loss)
I1209 01:19:23.298866 2189 solver.cpp:631] Iteration 1200, lr = 9.18515e-05
I1209 01:22:15.248015 2189 solver.cpp:240] Iteration 1250, loss = 0.00906165
I1209 01:30:49.363504 2189 solver.cpp:255] Train net output #0: loss = 0.012574 (* 1 = 0.012574 loss)
I1209 01:47:09.507314 2189 solver.cpp:240] Iteration 1650, loss = 0.0127013
I1209 01:59:49.763129 2189 solver.cpp:255] Train net output #0: loss = 0.0114391 (* 1 = 0.0114391 loss)
I1209 01:59:49.763147 2189 solver.cpp:631] Iteration 1850, lr = 8.80463e-05
I1209 02:03:10.173984 2189 solver.cpp:240] Iteration 1900, loss = 0.0138886
I1209 02:03:10.174072 2189 solver.cpp:255] Train net output #0: loss = 0.0138886 (* 1 = 0.0138886 loss)
I1209 02:03:10.174087 2189 solver.cpp:631] Iteration 1900, lr = 8.77687e-05
I1209 02:06:20.400573 2189 solver.cpp:240] Iteration 1950, loss = 0.0137628
I1209 02:06:20.400668 2189 solver.cpp:631] Iteration 1950, lr = 8.74932e-05
I1209 02:09:27.890867 2189 solver.cpp:502] Snapshotting to outputModel_iter_2000.caffemodel
I1209 02:09:29.180711 2189 solver.cpp:510] Snapshotting solver state to outputModel_iter_2000.solverstate
I1209 02:09:30.414371 2189 solver.cpp:424] Iteration 2000, Testing net (#0)
I1209 02:12:13.271689 2189 solver.cpp:481] Test net output #0: loss = 0.01289 (* 1 = 0.01289 loss)
I1209 02:12:16.724727 2189 solver.cpp:240] Iteration 2000, loss = 0.0207793
I1209 02:12:16.724778 2189 solver.cpp:255] Train net output #0: loss = 0.0207793 (* 1 = 0.0207793 loss)
I1209 02:12:16.724791 2189 solver.cpp:631] Iteration 2000, lr = 8.72196e-05
I1209 02:15:27.976011 2189 solver.cpp:240] Iteration 2050, loss = 0.0118718
I1209 02:15:27.976099 2189 solver.cpp:255] Train net output #0: loss = 0.0118718 (* 1 = 0.0118718 loss)
I1209 02:15:27.976111 2189 solver.cpp:631] Iteration 2050, lr = 8.6948e-05
I1209 02:18:45.454749 2189 solver.cpp:240] Iteration 2100, loss = 0.00938784
I1209 02:18:45.454834 2189 solver.cpp:255] Train net output #0: loss = 0.00938784 (* 1 = 0.00938784 loss)
I1209 02:18:45.454852 2189 solver.cpp:631] Iteration 2100, lr = 8.66784e-05
I1209 02:22:03.167448 2189 solver.cpp:240] Iteration 2150, loss = 0.0127745
I1209 02:22:03.167531 2189 solver.cpp:255] Train net output #0: loss = 0.0127745 (* 1 = 0.0127745 loss)
I1209 02:22:03.167546 2189 solver.cpp:631] Iteration 2150, lr = 8.64107e-05
I1209 02:25:14.789314 2189 solver.cpp:240] Iteration 2200, loss = 0.00956969
I1209 02:25:14.789397 2189 solver.cpp:255] Train net output #0: loss = 0.00956969 (* 1 = 0.00956969 loss)
I1209 02:25:14.789410 2189 solver.cpp:631] Iteration 2200, lr = 8.6145e-05
I1209 02:28:25.455724 2189 solver.cpp:240] Iteration 2250, loss = 0.0143162
I1209 02:28:25.455807 2189 solver.cpp:255] Train net output #0: loss = 0.0143162 (* 1 = 0.0143162 loss)
I1209 02:28:25.455819 2189 solver.cpp:631] Iteration 2250, lr = 8.58812e-05
I1209 02:31:35.905241 2189 solver.cpp:240] Iteration 2300, loss = 0.00927413
I1209 02:31:35.905323 2189 solver.cpp:255] Train net output #0: loss = 0.00927413 (* 1 = 0.00927413 loss)
I1209 02:31:35.905335 2189 solver.cpp:631] Iteration 2300, lr = 8.56192e-05
I1209 02:34:56.982108 2189 solver.cpp:240] Iteration 2350, loss = 0.0176598
I1209 02:34:56.982213 2189 solver.cpp:255] Train net output #0: loss = 0.0176598 (* 1 = 0.0176598 loss)
I1209 02:34:56.982231 2189 solver.cpp:631] Iteration 2350, lr = 8.53591e-05
I1209 02:38:06.427983 2189 solver.cpp:240] Iteration 2400, loss = 0.0125867
I1209 02:38:06.428068 2189 solver.cpp:255] Train net output #0: loss = 0.0125867 (* 1 = 0.0125867 loss)
I1209 02:38:06.428081 2189 solver.cpp:631] Iteration 2400, lr = 8.51008e-05
I1209 02:41:16.814347 2189 solver.cpp:240] Iteration 2450, loss = 0.01254
I1209 02:41:16.814467 2189 solver.cpp:255] Train net output #0: loss = 0.01254 (* 1 = 0.01254 loss)
I1209 02:41:16.814481 2189 solver.cpp:631] Iteration 2450, lr = 8.48444e-05
I1209 02:44:24.596612 2189 solver.cpp:240] Iteration 2500, loss = 0.0128213
I1209 02:44:24.596699 2189 solver.cpp:255] Train net output #0: loss = 0.0128213 (* 1 = 0.0128213 loss)
I1209 02:44:24.596711 2189 solver.cpp:631] Iteration 2500, lr = 8.45897e-05
I1209 02:47:41.383929 2189 solver.cpp:240] Iteration 2550, loss = 0.0138613
I1209 02:47:41.384017 2189 solver.cpp:255] Train net output #0: loss = 0.0138613 (* 1 = 0.0138613 loss)
I1209 02:47:41.384029 2189 solver.cpp:631] Iteration 2550, lr = 8.43368e-05
I1209 02:51:01.780666 2189 solver.cpp:240] Iteration 2600, loss = 0.0135385
I1209 02:51:01.780769 2189 solver.cpp:255] Train net output #0: loss = 0.0135385 (* 1 = 0.0135385 loss)
I1209 02:51:01.780786 2189 solver.cpp:631] Iteration 2600, lr = 8.40857e-05
I1209 02:54:09.573009 2189 solver.cpp:240] Iteration 2650, loss = 0.0128414
I1209 02:54:09.573094 2189 solver.cpp:255] Train net output #0: loss = 0.0128414 (* 1 = 0.0128414 loss)
I1209 02:54:09.573107 2189 solver.cpp:631] Iteration 2650, lr = 8.38363e-05
I1209 02:57:19.939091 2189 solver.cpp:240] Iteration 2700, loss = 0.0132472
I1209 02:57:19.939178 2189 solver.cpp:255] Train net output #0: loss = 0.0132472 (* 1 = 0.0132472 loss)
I1209 02:57:19.939191 2189 solver.cpp:631] Iteration 2700, lr = 8.35886e-05
I1209 03:00:40.396674 2189 solver.cpp:240] Iteration 2750, loss = 0.0121989
I1209 03:00:40.396757 2189 solver.cpp:255] Train net output #0: loss = 0.0121989 (* 1 = 0.0121989 loss)
I1209 03:00:40.396770 2189 solver.cpp:631] Iteration 2750, lr = 8.33427e-05
I1209 03:03:49.423348 2189 solver.cpp:240] Iteration 2800, loss = 0.0132731
I1209 03:03:49.423435 2189 solver.cpp:255] Train net output #0: loss = 0.0132731 (* 1 = 0.0132731 loss)
I1209 03:03:49.423449 2189 solver.cpp:631] Iteration 2800, lr = 8.30984e-05
I1209 03:06:59.472712 2189 solver.cpp:240] Iteration 2850, loss = 0.00996264
I1209 03:06:59.472822 2189 solver.cpp:255] Train net output #0: loss = 0.00996264 (* 1 = 0.00996264 loss)
I1209 03:06:59.472837 2189 solver.cpp:631] Iteration 2850, lr = 8.28558e-05
I1209 03:10:11.587728 2189 solver.cpp:240] Iteration 2900, loss = 0.00787597
I1209 03:10:11.587813 2189 solver.cpp:255] Train net output #0: loss = 0.00787597 (* 1 = 0.00787597 loss)
I1209 03:10:11.587826 2189 solver.cpp:631] Iteration 2900, lr = 8.26148e-05
I1209 03:13:26.526146 2189 solver.cpp:240] Iteration 2950, loss = 0.0113055
I1209 03:13:26.526232 2189 solver.cpp:255] Train net output #0: loss = 0.0113055 (* 1 = 0.0113055 loss)
I1209 03:13:26.526244 2189 solver.cpp:631] Iteration 2950, lr = 8.23754e-05
I1209 03:16:33.236115 2189 solver.cpp:424] Iteration 3000, Testing net (#0)
I1209 03:19:24.886912 2189 solver.cpp:481] Test net output #0: loss = 0.0127196 (* 1 = 0.0127196 loss)
I1209 03:19:28.444213 2189 solver.cpp:240] Iteration 3000, loss = 0.0217011
I1209 03:26:00.578351 2189 solver.cpp:631] Iteration 3100, lr = 8.1667e-05
I1209 03:29:11.128749 2189 solver.cpp:240] Iteration 3150, loss = 0.0186126
I1209 03:29:11.128832 2189 solver.cpp:255] Train net output #0: loss = 0.0186126 (* 1 = 0.0186126 loss)
I1209 03:29:11.128844 2189 solver.cpp:631] Iteration 3150, lr = 8.1434e-05
I1209 03:32:21.431370 2189 solver.cpp:240] Iteration 3200, loss = 0.013207
I1209 03:32:21.431488 2189 solver.cpp:255] Train net output #0: loss = 0.013207 (* 1 = 0.013207 loss)
I1209 03:38:42.957159 2189 solver.cpp:240] Iteration 3300, loss = 0.0143691
I1209 03:38:42.957262 2189 solver.cpp:255] Train net output #0: loss = 0.0143691 (* 1 = 0.0143691 loss)
I1209 03:38:42.957275 2189 solver.cpp:631] Iteration 3300, lr = 8.07442e-05
I1209 03:41:56.213155 2189 solver.cpp:240] Iteration 3350, loss = 0.0111833
I1209 03:41:56.213243 2189 solver.cpp:255] Train net output #0: loss = 0.0111833 (* 1 = 0.0111833 loss)
I1209 03:41:56.213255 2189 solver.cpp:631] Iteration 3350, lr = 8.05173e-05
I1209 03:45:03.395630 2189 solver.cpp:240] Iteration 3400, loss = 0.0128313
I1209 03:45:03.396034 2189 solver.cpp:255] Train net output #0: loss = 0.0128313 (* 1 = 0.0128313 loss)
I1209 03:45:03.396091 2189 solver.cpp:631] Iteration 3400, lr = 8.02918e-05
I1209 03:48:14.938372 2189 solver.cpp:240] Iteration 3450, loss = 0.0114032
I1209 03:48:14.938454 2189 solver.cpp:255] Train net output #0: loss = 0.0114032 (* 1 = 0.0114032 loss)
I1209 03:48:14.938467 2189 solver.cpp:631] Iteration 3450, lr = 8.00679e-05
I1209 03:51:24.561987 2189 solver.cpp:240] Iteration 3500, loss = 0.0146534
I1209 03:51:24.562098 2189 solver.cpp:255] Train net output #0: loss = 0.0146534 (* 1 = 0.0146534 loss)
I1209 03:51:24.562114 2189 solver.cpp:631] Iteration 3500, lr = 7.98454e-05
I1209 03:54:35.916950 2189 solver.cpp:240] Iteration 3550, loss = 0.0107792
I1209 03:54:35.917058 2189 solver.cpp:255] Train net output #0: loss = 0.0107792 (* 1 = 0.0107792 loss)
I1209 03:54:35.917073 2189 solver.cpp:631] Iteration 3550, lr = 7.96243e-05
I1209 03:57:44.691856 2189 solver.cpp:240] Iteration 3600, loss = 0.0131089
I1209 03:57:44.691939 2189 solver.cpp:255] Train net output #0: loss = 0.0131089 (* 1 = 0.0131089 loss)
I1209 03:57:44.691951 2189 solver.cpp:631] Iteration 3600, lr = 7.94046e-05
I1209 04:00:50.050279 2189 solver.cpp:240] Iteration 3650, loss = 0.00767588
I1209 04:00:50.050366 2189 solver.cpp:255] Train net output #0: loss = 0.00767588 (* 1 = 0.00767588 loss)
I1209 04:00:50.050379 2189 solver.cpp:631] Iteration 3650, lr = 7.91864e-05
I1209 04:03:36.047211 2189 solver.cpp:240] Iteration 3700, loss = 0.0110371
I1209 04:03:36.047299 2189 solver.cpp:255] Train net output #0: loss = 0.0110371 (* 1 = 0.0110371 loss)
I1209 04:03:36.047312 2189 solver.cpp:631] Iteration 3700, lr = 7.89695e-05
I1209 04:06:27.261343 2189 solver.cpp:240] Iteration 3750, loss = 0.0176664
I1209 04:06:27.261425 2189 solver.cpp:255] Train net output #0: loss = 0.0176664 (* 1 = 0.0176664 loss)
I1209 04:06:27.261438 2189 solver.cpp:631] Iteration 3750, lr = 7.87541e-05
I1209 04:09:20.277091 2189 solver.cpp:240] Iteration 3800, loss = 0.0133606
I1209 04:09:20.277186 2189 solver.cpp:255] Train net output #0: loss = 0.0133606 (* 1 = 0.0133606 loss)
I1209 04:09:20.277199 2189 solver.cpp:631] Iteration 3800, lr = 7.854e-05
I1209 04:12:11.541723 2189 solver.cpp:240] Iteration 3850, loss = 0.0144475
I1209 04:12:11.541936 2189 solver.cpp:255] Train net output #0: loss = 0.0144475 (* 1 = 0.0144475 loss)
I1209 04:12:11.541952 2189 solver.cpp:631] Iteration 3850, lr = 7.83272e-05
I1209 04:15:03.661646 2189 solver.cpp:240] Iteration 3900, loss = 0.0142342
I1209 04:15:03.661834 2189 solver.cpp:255] Train net output #0: loss = 0.0142342 (* 1 = 0.0142342 loss)
I1209 04:15:03.661852 2189 solver.cpp:631] Iteration 3900, lr = 7.81158e-05
I1209 04:17:53.149144 2189 solver.cpp:240] Iteration 3950, loss = 0.0121178
I1209 04:17:53.149250 2189 solver.cpp:255] Train net output #0: loss = 0.0121178 (* 1 = 0.0121178 loss)
I1209 04:17:53.149263 2189 solver.cpp:631] Iteration 3950, lr = 7.79057e-05
I1209 04:20:40.580508 2189 solver.cpp:502] Snapshotting to outputModel_iter_4000.caffemodel
I1209 04:20:41.779126 2189 solver.cpp:510] Snapshotting solver state to outputModel_iter_4000.solverstate
I1209 04:20:42.885267 2189 solver.cpp:424] Iteration 4000, Testing net (#0)
I1209 04:23:32.727929 2189 solver.cpp:481] Test net output #0: loss = 0.0126429 (* 1 = 0.0126429 loss)
I1209 04:23:36.443990 2189 solver.cpp:240] Iteration 4000, loss = 0.0137481
I1209 04:23:36.444049 2189 solver.cpp:255] Train net output #0: loss = 0.0137481 (* 1 = 0.0137481 loss)
I1209 04:23:36.444067 2189 solver.cpp:631] Iteration 4000, lr = 7.76969e-05
I1209 04:26:47.281289 2189 solver.cpp:240] Iteration 4050, loss = 0.0153349
I1209 04:26:47.281397 2189 solver.cpp:255] Train net output #0: loss = 0.0153349 (* 1 = 0.0153349 loss)
I1209 04:26:47.281410 2189 solver.cpp:631] Iteration 4050, lr = 7.74895e-05
I1209 04:30:02.951990 2189 solver.cpp:240] Iteration 4100, loss = 0.0117784
I1209 04:30:02.952076 2189 solver.cpp:255] Train net output #0: loss = 0.0117784 (* 1 = 0.0117784 loss)
I1209 04:30:02.952090 2189 solver.cpp:631] Iteration 4100, lr = 7.72833e-05
I1209 04:33:12.287106 2189 solver.cpp:240] Iteration 4150, loss = 0.0150684
I1209 04:33:12.287194 2189 solver.cpp:255] Train net output #0: loss = 0.0150684 (* 1 = 0.0150684 loss)
I1209 04:33:12.287207 2189 solver.cpp:631] Iteration 4150, lr = 7.70784e-05
I1209 04:36:25.405246 2189 solver.cpp:240] Iteration 4200, loss = 0.0121092
I1209 05:08:37.213099 2189 solver.cpp:240] Iteration 4700, loss = 0.00963576
I1209 05:08:37.213321 2189 solver.cpp:255] Train net output #0: loss = 0.00963576 (* 1 = 0.00963576 loss)
I1209 05:08:37.213338 2189 solver.cpp:631] Iteration 4700, lr = 7.49052e-05
I1209 05:11:51.183197 2189 solver.cpp:240] Iteration 4750, loss = 0.0177338
I1209 05:11:51.183284 2189 solver.cpp:255] Train net output #0: loss = 0.0177338 (* 1 = 0.0177338 loss)
I1209 05:11:51.183296 2189 solver.cpp:631] Iteration 4750, lr = 7.47147e-05
I1209 05:15:12.716320 2189 solver.cpp:240] Iteration 4800, loss = 0.0155627
I1209 05:15:12.716401 2189 solver.cpp:255] Train net output #0: loss = 0.0155627 (* 1 = 0.0155627 loss)
I1209 05:15:12.716413 2189 solver.cpp:631] Iteration 4800, lr = 7.45253e-05
I1209 05:18:36.245213 2189 solver.cpp:240] Iteration 4850, loss = 0.0110798
I1209 05:18:36.245301 2189 solver.cpp:255] Train net output #0: loss = 0.0110798 (* 1 = 0.0110798 loss)
I1209 05:18:36.245314 2189 solver.cpp:631] Iteration 4850, lr = 7.4337e-05
I1209 05:21:53.006510 2189 solver.cpp:240] Iteration 4900, loss = 0.0149542
I1209 05:21:53.006613 2189 solver.cpp:255] Train net output #0: loss = 0.0149542 (* 1 = 0.0149542 loss)
I1209 05:21:53.006630 2189 solver.cpp:631] Iteration 4900, lr = 7.41499e-05
I1209 05:25:05.802839 2189 solver.cpp:240] Iteration 4950, loss = 0.0152574
I1209 05:25:05.802940 2189 solver.cpp:255] Train net output #0: loss = 0.0152574 (* 1 = 0.0152574 loss)
I1209 05:25:05.802953 2189 solver.cpp:631] Iteration 4950, lr = 7.39638e-05
I1209 05:28:12.186930 2189 solver.cpp:424] Iteration 5000, Testing net (#0)
I1209 05:30:57.153729 2189 solver.cpp:481] Test net output #0: loss = 0.0127499 (* 1 = 0.0127499 loss)
I1209 05:31:00.795545 2189 solver.cpp:240] Iteration 5000, loss = 0.0118315
I1209 05:31:00.795598 2189 solver.cpp:255] Train net output #0: loss = 0.0118315 (* 1 = 0.0118315 loss)
I1209 05:31:00.795609 2189 solver.cpp:631] Iteration 5000, lr = 7.37788e-05
I1209 05:34:18.038789 2189 solver.cpp:240] Iteration 5050, loss = 0.0147696
I1209 05:34:18.038875 2189 solver.cpp:255] Train net output #0: loss = 0.0147696 (* 1 = 0.0147696 loss)
I1209 05:34:18.038887 2189 solver.cpp:631] Iteration 5050, lr = 7.35949e-05
I1209 05:37:39.033630 2189 solver.cpp:240] Iteration 5100, loss = 0.0114872
I1209 05:37:39.033713 2189 solver.cpp:255] Train net output #0: loss = 0.0114872 (* 1 = 0.0114872 loss)
I1209 05:37:39.033725 2189 solver.cpp:631] Iteration 5100, lr = 7.3412e-05
I1209 05:40:54.287984 2189 solver.cpp:240] Iteration 5150, loss = 0.0112272
I1209 05:40:54.288146 2189 solver.cpp:255] Train net output #0: loss = 0.0112272 (* 1 = 0.0112272 loss)
I1209 05:40:54.288163 2189 solver.cpp:631] Iteration 5150, lr = 7.32303e-05
I1209 05:44:05.552100 2189 solver.cpp:240] Iteration 5200, loss = 0.0123504
I1209 05:44:05.552181 2189 solver.cpp:255] Train net output #0: loss = 0.0123504 (* 1 = 0.0123504 loss)
I1209 05:44:05.552193 2189 solver.cpp:631] Iteration 5200, lr = 7.30495e-05
I1209 05:47:18.410975 2189 solver.cpp:240] Iteration 5250, loss = 0.0172216
I1209 05:47:18.411062 2189 solver.cpp:255] Train net output #0: loss = 0.0172216 (* 1 = 0.0172216 loss)
I1209 05:47:18.411074 2189 solver.cpp:631] Iteration 5250, lr = 7.28698e-05
I1209 05:50:33.027614 2189 solver.cpp:240] Iteration 5300, loss = 0.0126125
I1209 05:50:33.027719 2189 solver.cpp:255] Train net output #0: loss = 0.0126125 (* 1 = 0.0126125 loss)
I1209 05:50:33.027731 2189 solver.cpp:631] Iteration 5300, lr = 7.26911e-05
I1209 05:53:44.704063 2189 solver.cpp:240] Iteration 5350, loss = 0.0117126
I1209 05:53:44.704151 2189 solver.cpp:255] Train net output #0: loss = 0.0117126 (* 1 = 0.0117126 loss)
I1209 05:53:44.704164 2189 solver.cpp:631] Iteration 5350, lr = 7.25135e-05
I1209 05:56:53.016206 2189 solver.cpp:240] Iteration 5400, loss = 0.00555667
I1209 05:56:53.016294 2189 solver.cpp:255] Train net output #0: loss = 0.00555667 (* 1 = 0.00555667 loss)
I1209 05:56:53.016306 2189 solver.cpp:631] Iteration 5400, lr = 7.23368e-05
I1209 06:00:04.420944 2189 solver.cpp:240] Iteration 5450, loss = 0.0111433
I1209 06:00:04.421066 2189 solver.cpp:255] Train net output #0: loss = 0.0111433 (* 1 = 0.0111433 loss)
I1209 06:00:04.421078 2189 solver.cpp:631] Iteration 5450, lr = 7.21612e-05
I1209 06:03:28.763087 2189 solver.cpp:240] Iteration 5500, loss = 0.00962537
I1209 06:03:28.763197 2189 solver.cpp:255] Train net output #0: loss = 0.00962537 (* 1 = 0.00962537 loss)
I1209 06:03:28.763214 2189 solver.cpp:631] Iteration 5500, lr = 7.19865e-05
I1209 06:06:39.725003 2189 solver.cpp:240] Iteration 5550, loss = 0.0111173
I1209 06:06:39.725289 2189 solver.cpp:255] Train net output #0: loss = 0.0111173 (* 1 = 0.0111173 loss)
I1209 06:06:39.725307 2189 solver.cpp:631] Iteration 5550, lr = 7.18129e-05
I1209 06:09:52.560111 2189 solver.cpp:240] Iteration 5600, loss = 0.0151178
I1209 06:09:52.560214 2189 solver.cpp:255] Train net output #0: loss = 0.0151178 (* 1 = 0.0151178 loss)
I1209 06:09:52.560227 2189 solver.cpp:631] Iteration 5600, lr = 7.16402e-05
I1209 06:13:05.462718 2189 solver.cpp:240] Iteration 5650, loss = 0.0125453
I1209 06:13:05.462803 2189 solver.cpp:255] Train net output #0: loss = 0.0125453 (* 1 = 0.0125453 loss)
I1209 06:13:05.462815 2189 solver.cpp:631] Iteration 5650, lr = 7.14684e-05
I1209 06:16:16.187785 2189 solver.cpp:240] Iteration 5700, loss = 0.00663404
I1209 06:16:16.187870 2189 solver.cpp:255] Train net output #0: loss = 0.00663404 (* 1 = 0.00663404 loss)
I1209 06:16:16.187880 2189 solver.cpp:631] Iteration 5700, lr = 7.12977e-05
I1209 06:19:29.129866 2189 solver.cpp:240] Iteration 5750, loss = 0.0136638
I1209 06:19:29.129968 2189 solver.cpp:255] Train net output #0: loss = 0.0136638 (* 1 = 0.0136638 loss)
I1209 06:19:29.129981 2189 solver.cpp:631] Iteration 5750, lr = 7.11278e-05
I1209 06:22:42.243834 2189 solver.cpp:240] Iteration 5800, loss = 0.0119521
I1209 06:29:11.570849 2189 solver.cpp:255] Train net output #0: loss = 0.0123193 (* 1 = 0.0123193 loss)
I1209 06:29:11.570874 2189 solver.cpp:631] Iteration 5900, lr = 7.0624e-05
I1209 06:32:26.092237 2189 solver.cpp:240] Iteration 5950, loss = 0.0102198
I1209 06:32:26.092344 2189 solver.cpp:255] Train net output #0: loss = 0.0102198 (* 1 = 0.0102198 loss)
I1209 06:32:26.092360 2189 solver.cpp:631] Iteration 5950, lr = 7.04579e-05
I1209 06:35:40.060386 2189 solver.cpp:502] Snapshotting to outputModel_iter_6000.caffemodel
I1209 06:35:41.191196 2189 solver.cpp:510] Snapshotting solver state to outputModel_iter_6000.solverstate
I1209 06:35:42.254009 2189 solver.cpp:424] Iteration 6000, Testing net (#0)
I1209 06:38:31.810992 2189 solver.cpp:481] Test net output #0: loss = 0.0125662 (* 1 = 0.0125662 loss)
I1209 06:38:35.926373 2189 solver.cpp:240] Iteration 6000, loss = 0.0130608
I1209 06:38:35.926427 2189 solver.cpp:255] Train net output #0: loss = 0.0130608 (* 1 = 0.0130608 loss)
I1209 06:38:35.926440 2189 solver.cpp:631] Iteration 6000, lr = 7.02927e-05
I don't see any appreciable reduction in your loss, which means your network is not really learning. Since you are testing on the training data itself and still getting poor results, your network is underfitting. There are a couple of things I can suggest, though. First, try running for more iterations and see if the loss decreases. Second, experiment with your learning rate as well; try keeping it higher in the beginning.
There is too little information here to answer your question. How much data do you have? What exactly is the task? What are you predicting from the audio?
It could be any of dozens of reasons why things are going wrong, but two things look strange. First, you use raw audio data, whereas I would strongly recommend using spectral features. Second, this part of your network definition looks suspicious:
top: "fc2"
param { lr_mult: 1 decay_mult: 1 }
param { lr_mult: 2 decay_mult: 0 }
inner_product_param {
I usually use named parameters:
top: "conv2"
param {
name: "conv2_w"
lr_mult: 1
decay_mult: 1
}
param {
name: "conv2_b"
lr_mult: 2
decay_mult: 0
}
convolution_param {
But even if you get the net definition and data preparation right, that does not mean you have chosen the correct method for your task, or that the task itself makes sense at all.
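To illustrate the spectral-features suggestion, here is a minimal NumPy sketch of turning a raw 1721-sample clip into a log-magnitude spectrogram (the frame length, hop size and epsilon are arbitrary choices, not values from the question):
import numpy as np

def log_spectrogram(x, frame_len=256, hop=128, eps=1e-10):
    # split the 1-D signal into overlapping, Hann-windowed frames
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hanning(frame_len)
    # magnitude spectrum per frame, then log compression
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spec + eps)

x = np.random.randn(1721).astype(np.float32)   # stand-in for one raw 40 ms clip
print(log_spectrogram(x).shape)                # (12, 129): n_frames x (frame_len // 2 + 1)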
Related
I'm looking to re-implement in PyTorch the following WGAN-GP model, taken from this paper. The original implementation was in TensorFlow. Apart from minor issues that require me to adjust subtle details, since torch does not seem to support padding='same' for strided convolutions, my implementation is the following:
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=(1, 1), padding='same'),
            self._block(in_channels=32, out_channels=32, kernel_size=3, stride=(2, 1), padding=(1, 1)),
            self._block(in_channels=32, out_channels=64, kernel_size=3, stride=(1, 1), padding='same'),
            self._block(in_channels=64, out_channels=64, kernel_size=3, stride=(2, 1), padding=(1, 1)),
            self._block(in_channels=64, out_channels=128, kernel_size=3, stride=(1, 1), padding='same'),
            self._block(in_channels=128, out_channels=128, kernel_size=3, stride=(2, 1), padding=(1, 1)),
            self._block(in_channels=128, out_channels=256, kernel_size=5, stride=(2, 2), padding=(2, 2))
        )
        self.lin = nn.Linear(256 * 6 * 4, 1)

    # unifies Conv2d, LeakyReLU and BatchNorm
    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding,
                      bias=False),          # bias False as we use batchnorm
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(0.2))

    def forward(self, x):
        x = self.disc(x)
        x = x.view(-1, 256 * 6 * 4)
        return self.lin(x)

class Generator(nn.Module):
    def __init__(self, z_dim):
        super(Generator, self).__init__()
        self.z_dim = z_dim
        self.lin1 = nn.Linear(z_dim, 6 * 4 * 256)
        self.gen = nn.Sequential(
            self._block(in_channels=256, out_channels=128, kernel_size=(5, 4), stride=(2, 2), padding=(2, 1)),
            self._block(in_channels=128, out_channels=128, kernel_size=(4, 3), stride=(2, 1), padding=(1, 1)),
            self._block(in_channels=128, out_channels=64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            self._block(in_channels=64, out_channels=64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            self._block(in_channels=64, out_channels=64, kernel_size=(3, 2), stride=(2, 2), padding=(1, 4)),
            self._block(in_channels=64, out_channels=32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            self._block(in_channels=32, out_channels=32, kernel_size=3, stride=(2, 1), padding=(1, 1)),
            self._block(in_channels=32, out_channels=1, kernel_size=3, stride=(1, 1), padding=(1, 1)),
            nn.Sigmoid()
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding,
                               bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),                       # they use relu in the generator
        )

    def forward(self, x):
        x = x.view(-1, 128)
        x = self.lin1(x)
        x = x.view(-1, 256, 6, 4)
        return self.gen(x)
The inputs (real/fake) have shape (batch_size, 1, 85, 8) and consist of very sparse one-hot matrices.
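As a quick sanity check that the manual paddings reproduce the intended geometry for that input shape, the two models can be run on dummy tensors (a sketch; batch size 4 is arbitrary, and z_dim=128 matches the view(-1, 128) in Generator.forward):
import torch

D = Discriminator()
G = Generator(z_dim=128)
x = torch.randn(4, 1, 85, 8)      # dummy batch with the input shape from the question
z = torch.randn(4, 128)
print(D(x).shape)                  # expected: torch.Size([4, 1])
print(G(z).shape)                  # expected: torch.Size([4, 1, 85, 8])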
Now, with the above models, during the first training batches I get very bad values for both loss G and loss D:
Epoch [0/5] Batch 0/84 Loss D: -34.0230, loss G: 132.8942
Epoch [0/5] Batch 1/84 Loss D: -3080.0264, loss G: 601.3990
Epoch [0/5] Batch 2/84 Loss D: -216907.8125, loss G: 872.5948
Epoch [0/5] Batch 3/84 Loss D: -26314.8633, loss G: 4973.5327
Epoch [0/5] Batch 4/84 Loss D: -1000911.5000, loss G: 6153.7974
Epoch [0/5] Batch 5/84 Loss D: -14484664.0000, loss G: -5013.7808
Epoch [0/5] Batch 6/84 Loss D: -5119665.0000, loss G: -7194.0640
Epoch [0/5] Batch 7/84 Loss D: -25285320.0000, loss G: 20130.0801
Epoch [0/5] Batch 8/84 Loss D: -11411679.0000, loss G: 32655.1016
Epoch [0/5] Batch 9/84 Loss D: -18403266.0000, loss G: 37912.0469
Epoch [0/5] Batch 10/84 Loss D: -6191229.0000, loss G: 33614.3828
Epoch [0/5] Batch 11/84 Loss D: -8119311.0000, loss G: 28472.3496
Epoch [0/5] Batch 12/84 Loss D: -134419216.0000, loss G: 18065.1074
Epoch [0/5] Batch 13/84 Loss D: -123661928.0000, loss G: 71028.8984
Epoch [0/5] Batch 14/84 Loss D: -2723217.0000, loss G: 47931.0195
Epoch [0/5] Batch 15/84 Loss D: -806806.1250, loss G: 41759.3555
Even though these are just the first batches of the first epoch, the losses seem too large to me, and I suspect something is wrong with my implementation. Or can it be normal to obtain such numbers for the WGAN losses in the first batches? I'm asking because I don't have much experience with these architectures.
If the models look OK, should I upload my training loop for further discussion?
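For reference while discussing those loss values, the gradient-penalty term that a WGAN-GP training loop typically adds to the critic loss looks like the following standard sketch (this is not the training loop from the question, which was not posted; the weight 10 is the usual default from the WGAN-GP paper):
import torch

def gradient_penalty(critic, real, fake, device='cpu'):
    # random interpolation between real and fake samples
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    # gradient of the critic scores w.r.t. the interpolated inputs
    grads = torch.autograd.grad(outputs=scores, inputs=interp,
                                grad_outputs=torch.ones_like(scores),
                                create_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# critic loss is then roughly: -(critic(real).mean() - critic(fake).mean()) + 10 * gradient_penalty(critic, real, fake)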
I'm trying to optimize the hyperparameters of my LSTM network using BayesSearchCV for tuning, and I got this error: 'activation is not a legal parameter'. This is part of the error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_5664/3189938758.py in
def create_model1(neurons=10, activation='relu', recurrent_activation='relu',
                  kernel_initializer='uniform',
                  recurrent_initializer='Orthogonal', weight_constraint=0,
                  dropout_rate=0.0, recurrent_dropout=0.0,
                  learning_rate=0.001, rho=0.9, momentum=0.0):
    model = Sequential()
    model.add(LSTM(neurons, input_dim=(train_X.shape[1], train_X.shape[2]),
                   activation=activation,
                   recurrent_activation=recurrent_activation,
                   kernel_initializer=kernel_initializer,
                   recurrent_initializer=recurrent_initializer,
                   kernel_constraint=max_norm(weight_constraint),
                   recurrent_dropout=recurrent_dropout))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, activation='sigmoid'))
    optimizer = RMSprop(learning_rate=learning_rate, rho=rho, momentum=momentum,
                        epsilon=1e-07, centered=False)
    model.compile(loss='mse', optimizer=optimizer, metrics=['accuracy'])
    return model
seed = 7
np.random.seed(seed)
regressor1 = KerasRegressor(build_fn=create_model1, epochs=5, batch_size=400, verbose=0)
neurons = [5, 10]
batch_size = [ 400, 800]
epochs = [5, 10]
learning_rate = [0.001, 0.01]
rho = [0.01, 0.1]
momentum = [0.01, 0.1]
kernel_initializer = ['Orthogonal', 'uniform', 'lecun_uniform', 'normal', 'zero',
'glorot_normal', 'glorot_uniform',
'he_normal', 'he_uniform']
recurrent_initializer = ['Orthogonal', 'uniform', 'lecun_uniform', 'normal', 'zero',
'glorot_normal', 'glorot_uniform',
'he_normal','he_uniform']
activation= ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid',
'hard_sigmoid', 'linear']
recurrent_activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid',
'hard_sigmoid', 'linear']
weight_constraint = [1, 2]
dropout_rate = [0.0, 0.1]
recurrent_dropout = [0.0, 0.1]
params = dict(neurons=neurons, batch_size=batch_size,
epochs=epochs,learning_rate=learning_rate, rho=rho, momentum=momentum,
kernel_initializer=kernel_initializer,
recurrent_initializer=recurrent_initializer, activation=activation,
recurrent_activation=recurrent_activation, dropout_rate=dropout_rate,
recurrent_dropout=recurrent_dropout,
weight_constraint=weight_constraint)
Bayes = BayesSearchCV(estimator=regressor, search_spaces=params, scoring='r2',
n_jobs=-1, cv=5)
Bayes_result = Bayes.fit(train_X, train_y.ravel())
MODEL CNN
# create a list of the target columns
target_cols = [y_toxic, y_severe_toxic, y_obscene, y_threat, y_insult, y_identity_hate]
preds = []
for col in target_cols:
    print('\n')
    # set the value of y
    y = col
    # create a stratified split
    X_train, X_eval, y_train, y_eval = train_test_split(X, y, test_size=0.25, shuffle=True,
                                                        random_state=5, stratify=y)
    # cnn model
    model = Sequential()
    e = Embedding(189722, 100, weights=[embedding_matrix],
                  input_length=500, trainable=False)
    model.add(e)
    model.add(Conv1D(128, 3, activation='relu'))
    model.add(MaxPooling1D(3))
    model.add(Dropout(0.2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(3))
    model.add(Dropout(0.2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    Adam_opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(optimizer=Adam_opt, loss='binary_crossentropy', metrics=['acc'])
    early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min')
    save_best = ModelCheckpoint('toxic.hdf', save_best_only=True,
                                monitor='val_loss', mode='min')
    history = model.fit(X_train, y_train, validation_data=(X_eval, y_eval),
                        epochs=100, verbose=1, callbacks=[early_stopping, save_best])
    # make a prediction on y (target column)
    model.load_weights(filepath='toxic.hdf')
    predictions = model.predict(X_test)
    y_preds = predictions[:, 0]
    # append the prediction to a python list
    preds.append(y_preds)
Please let me know why this is happening. I tried to reshape it, but I still get the same error.
I need to implement a CNN for multi-class classification on a tabular dataset.
My data has X_train.shape = (1534185, 81, 1) and Y_train.shape = (1534185, 11).
Here is a sample from my dataset
[DataSetImage]
I tried to normalize the data, but the values are too big to be summed and stored in float64.
The CNN model I implemented is below:
batchSize = X_train.shape[0]
length = X_train.shape[1]
channel = X_train.shape[2]
n_outputs = y_train.shape[1]
#Initialising the CNN
model = Sequential()
#1.Multiple convolution and max pooling
model.add(Convolution1D(filters=64, kernel_size=3, activation="relu", input_shape=(length, channel)))
model.add(MaxPooling1D(strides=4))
model.add(Dropout(0.1))
model.add(BatchNormalization())
model.add(Convolution1D(filters= 32, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(strides=4))
model.add(Dropout(0.1))
model.add(BatchNormalization())
model.add(Convolution1D(filters= 16, kernel_size=3, activation='relu'))
model.add(MaxPooling1D(strides=4))
model.add(Dropout(0.1))
model.add(BatchNormalization())
#2.Flattening
model.add(Dropout(0.2))
model.add(Flatten())
#3.Full Connection
model.add(Dense(30, activation='relu'))
model.add(Dense(n_outputs, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
If I try to change the kernel size, I get the following error:
ValueError: Negative dimension size caused by subtracting 2 from 1 for 'max_pooling1d_103/MaxPool' (op: 'MaxPool') with input shapes: [?,1,1,16].
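For context, a quick bookkeeping of the sequence length through the 'valid' Conv1D / MaxPooling1D stack above (a sketch; Conv1D with the default padding gives L - k + 1, and MaxPooling1D with the default pool_size=2 and strides=4 gives (L - 2) // 4 + 1):
L = 81                        # input length
for k in (3, 3, 3):           # the three Conv1D kernel sizes in the model
    L = L - k + 1             # after Conv1D (valid padding)
    L = (L - 2) // 4 + 1      # after MaxPooling1D(strides=4)
    print(L)                  # 20, 5, 1: larger kernels push the last stage below 1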
When I try to train my model using the code below, I get no improvement in accuracy and the loss is NaN:
history = model.fit(
    X_train,
    y_train,
    batch_size=1000,
    epochs=2,
    validation_data=(X_test, y_test),
)
Loss: nan
Error:
Train on 1534185 samples, validate on 657509 samples
Epoch 1/2
956000/1534185 [=================>............] - ETA: 1:44 - loss: nan - accuracy: 0.0101
I need your help.
Try checking for inf values, replacing them with NaN and filling those with 0, then retry:
X_train.replace([np.inf, -np.inf], np.nan,inplace=True)
X_train = X_train.fillna(0)
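Note that X_train in the question is a NumPy array of shape (1534185, 81, 1), so the pandas-style replace/fillna calls above assume a DataFrame; a NumPy equivalent would be (a sketch):
import numpy as np

# replace NaN and +/- inf in one pass
X_train = np.nan_to_num(X_train, nan=0.0, posinf=0.0, neginf=0.0)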
I have this CNN code for the MNIST data that splits the dataset into a training set and a test set containing only 2's and 7's. On running it, the code gives about 98% accuracy on the test set.
So, to increase the accuracy, I tried using KerasClassifier from keras.wrappers.scikit_learn. Using the classifier with GridSearchCV, I was hoping to find the optimal parameters, but on running the code the first iteration goes fine and then it throws an error on the next iteration.
Here is the code:
# This is the normal CNN model without GridSearch
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
import numpy as np
batch_size = 128
num_classes = 2
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Only look at 2s and 7s
train_picks = np.logical_or(y_train==2,y_train==7)
test_picks = np.logical_or(y_test==2,y_test==7)
x_train = x_train[train_picks]
x_test = x_test[test_picks]
y_train = np.array(y_train[train_picks]==7,dtype=int)
y_test = np.array(y_test[test_picks]==7,dtype=int)
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(4, kernel_size=(3, 3),activation='relu',input_shape=input_shape))
model.add(Conv2D(8, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adadelta(),
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
# Improving the accuracy using GridSearch
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
def build_model(optimizer):
    print(optimizer, batch_size, epochs)
    model = Sequential()
    model.add(Conv2D(4, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(8, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(16, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn = build_model)
parameters = {'batch_size': [128, 256],
'epochs': [10, 20],
'optimizer': ['rmsprop']}
grid_search = GridSearchCV(estimator = model,
param_grid = parameters,
scoring = 'accuracy',
cv = 10)
grid_search = grid_search.fit(x_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
This is the Output of the code:
rmsprop 128 12
Epoch 1/10
11000/11000 [==============================] - 3s - loss: 0.1654 - acc: 0.9476
Epoch 2/10
11000/11000 [==============================] - 3s - loss: 0.0699 - acc: 0.9786
Epoch 3/10
11000/11000 [==============================] - 2s - loss: 0.0557 - acc: 0.9839
Epoch 4/10
11000/11000 [==============================] - 2s - loss: 0.0510 - acc: 0.9839
Epoch 5/10
11000/11000 [==============================] - 2s - loss: 0.0471 - acc: 0.9853
Epoch 6/10
11000/11000 [==============================] - 2s - loss: 0.0417 - acc: 0.9875
Epoch 7/10
11000/11000 [==============================] - 2s - loss: 0.0399 - acc: 0.9870
Epoch 8/10
11000/11000 [==============================] - 2s - loss: 0.0365 - acc: 0.9885
Epoch 9/10
11000/11000 [==============================] - 2s - loss: 0.0342 - acc: 0.9899
Epoch 10/10
11000/11000 [==============================] - 2s - loss: 0.0321 - acc: 0.9903
768/1223 [=================>............] - ETA: 0s
Traceback (most recent call last):
File "<ipython-input-4-975b20661114>", line 30, in <module>
grid_search = grid_search.fit(x_train, y_train)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 945, in fit
return self._fit(X, y, groups, ParameterGrid(self.param_grid))
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 564, in _fit
for parameters in parameter_iterable
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
while self.dispatch_one_batch(iterator):
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
self._dispatch(tasks)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
result = ImmediateResult(func)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
self.results = batch()
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 131, in <listcomp>
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 260, in _fit_and_score
test_score = _score(estimator, X_test, y_test, scorer)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 288, in _score
score = scorer(estimator, X_test, y_test)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/metrics/scorer.py", line 98, in __call__
**self._kwargs)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/home/thakkar_/anaconda3/lib/python3.6/site-packages/sklearn/metrics/classification.py", line 82, in _check_targets
"".format(type_true, type_pred))
ValueError: Can't handle mix of multilabel-indicator and binary
Please help!
The error seems to be in the way you are passing the dictionary of parameters.
An example from here:
import numpy
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.01, momentum=0):
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
print("%f (%f) with: %r" % (mean, stdev, param))
GridSearchCV basically takes the elements from the dictionary that match the build function's input parameters and trains with them. You are passing the complete dictionary, but batch_size and epochs aren't parameters of that function...
# Improving the accuracy using GridSearch
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
def build_model(optimizer='adam'):
    model = Sequential()
    model.add(Conv2D(4, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model.add(Conv2D(8, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(16, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=optimizer,
                  metrics=['accuracy'])
    return model
model = KerasClassifier(build_fn = build_model)
parameters = {'batch_size': [128, 256],
'epochs': [10, 20],
'optimizer': ['rmsprop']}
grid_search = GridSearchCV(estimator = model,
param_grid = parameters,
scoring = 'accuracy',
cv = 10)
grid_search = grid_search.fit(x_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_
Maybe something like this would work; I have not tested it.