How to visualize error surface in keras?

We see pretty pictures of an error surface with a global minimum, and a neural network converging to it, in many books. How can I visualize something similar in keras, i.e. the error surface and how my model converges toward the globally minimal error? Below is an example image of such illustrations, and this link has an animated illustration of different optimizers. I explored the TensorBoard logging callback for this purpose but could not find any such thing. A little guidance will be appreciated.

Those pictures and animations are made for didactic purposes, but the real error surface is unknown (or far too complex to understand or visualize). That's the whole idea behind using gradient descent.
We only know, at a single point, the direction in which the function increases, by computing the current gradient.
You could try to plot the path you're following by recording the weight values and the error at each iteration, but then you'd face another problem: it's a massively multidimensional function, not actually a surface. The number of variables is the number of weights in the model (often thousands or even millions), which is impossible to visualize or even conceive as a visual thing.
To plot such a surface, you'd have to sweep all those thousands of weights and evaluate the error for every arrangement. Besides the "impossible to visualize" problem, this would be excessively time consuming.
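If you just want an illustrative picture anyway, a common trick is to evaluate the loss over a 2D slice of weight space: fix all weights except two, sweep those two over a grid, and record the loss. Below is a minimal sketch of that idea, assuming a tiny Keras model and synthetic data (every name here is illustrative, not from the question):

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Toy model and data; in practice you'd use your own trained model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(2,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")
x = np.random.rand(64, 2).astype("float32")
y = (x.sum(axis=1, keepdims=True) > 1).astype("float32")
model.fit(x, y, epochs=5, verbose=0)

kernel = model.layers[0].kernel          # we perturb two entries of this matrix
base = kernel.numpy().copy()
span = np.linspace(-1.0, 1.0, 21)        # 21x21 grid = 441 loss evaluations
loss_grid = np.zeros((len(span), len(span)))

for i, a in enumerate(span):             # sweep two weight directions
    for j, b in enumerate(span):
        w = base.copy()
        w[0, 0] += a
        w[1, 0] += b
        kernel.assign(w)
        loss_grid[i, j] = model.evaluate(x, y, verbose=0)

kernel.assign(base)                      # restore the trained weights
plt.contourf(span, span, loss_grid, levels=30)
plt.xlabel("offset in w[1,0]")           # columns vary with b
plt.ylabel("offset in w[0,0]")           # rows vary with a
plt.title("Loss over a 2D slice of weight space")
plt.show()

Keep in mind this shows only one 2D slice of a possibly million-dimensional function; different slices can look completely different.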

Related

How do I analyze the change in the relationship between two variables?

I'm working on a simple project in which I'm trying to describe the relationship between two positively correlated variables and determine whether that relationship is changing over time, and if so, to what degree. I feel like this is something people probably do pretty often, but maybe I'm just not using the correct terminology, because Google isn't helping me very much.
I've plotted the variables on a scatter plot and know how to determine the correlation coefficient and plot a linear regression. I thought this may be a good first step because the linear regression tells me what I can expect y to be for a given x value. This means I can quantify how "far away" each data point is from the regression line (I think this is called the squared error?). Now I'd like to see what the error looks like for each data point over time. For example, if I have 100 data points and the most recent 20 are much farther from where the regression line says they should be, maybe I could say that the relationship between the variables is showing signs of changing? Does that make any sense at all, or am I way off base?
I have a suspicion that there is a much simpler way to do this and/or that I'm going about it in the wrong way. I'd appreciate any guidance you can offer!
I can suggest two strands of literature that study changing relationships over time. Typing these names into Google should provide you with a large number of references, so I'll stick to concise descriptions.
(1) Structural break modelling. As the name suggests, this assumes that there has been a sudden change in parameters (e.g. a correlation coefficient). This is applicable if there has been a policy change, a change in measurement device, etc. The estimation approach is indeed very close to the procedure you suggest. Namely, you would estimate the squared error (or some other measure of fit) on the full sample and on the two sub-samples (before and after the break). If the gains in fit from dividing the sample are large, you would favour the model with the break and use different coefficients before and after the structural change.
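As a concrete illustration of (1), here is a hedged sketch of a Chow-type test: fit OLS on the full sample and on the two sub-samples around a candidate break point, then compare the fits. The data and break point are synthetic placeholders:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k = 200, 100                          # sample size and candidate break point
x = rng.normal(size=n)
y = np.where(np.arange(n) < k, 1.0 * x, 2.5 * x) + rng.normal(size=n)

def sse(xs, ys):
    # Sum of squared residuals from an OLS fit with intercept.
    return sm.OLS(ys, sm.add_constant(xs)).fit().ssr

sse_full = sse(x, y)
sse_split = sse(x[:k], y[:k]) + sse(x[k:], y[k:])

p = 2                                    # parameters per regression (intercept, slope)
f_stat = ((sse_full - sse_split) / p) / (sse_split / (n - 2 * p))
print(f"Chow F-statistic: {f_stat:.2f}")  # large values favour the break model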
(2) Time-varying coefficient models. This approach is more subtle as coefficients will now evolve more slowly over time. These changes can originate from the time evolution of some observed variables or they can be modeled through some unobserved latent process. In the latter case the estimation typically involves the use of state-space models (and thus the Kalman filter or some more advanced filtering techniques).
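A lightweight way to see coefficients drift, short of a full state-space model, is a rolling regression. Here is a sketch using statsmodels' RollingOLS on synthetic data (the drifting slope is made up for illustration):

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.regression.rolling import RollingOLS

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
beta_t = np.linspace(1.0, 3.0, n)        # slowly drifting "true" slope
y = beta_t * x + rng.normal(size=n)

X = sm.add_constant(pd.Series(x, name="x"))
rolling = RollingOLS(pd.Series(y), X, window=60).fit()
print(rolling.params.tail())             # watch the slope estimate evolve over time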
I hope this helps!

The bounding box's position and size are incorrect; how to improve its accuracy?

I'm using detectron2 to solve a segmentation task,
trying to classify an object into 4 classes,
so I have used COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml.
I have applied 4 kinds of augmentation transforms, and after training I get about 0.1
total loss.
But for some reason the accuracy of the bbox is not great on some images in the test set:
the bbox is drawn either larger or smaller than the object, or doesn't cover it entirely.
Moreover, the predictor sometimes draws several bboxes, assuming there are several different objects although there is only a single one.
Are there any suggestions on how to improve its accuracy?
Are there any good-practice approaches to resolving this issue?
Any suggestion or reference material will be helpful.
I would suggest the following:
(1) Ensure that your training set contains the object you want to detect at all sizes: this way, the network learns that the size of the object can vary and is less prone to overfitting (otherwise the detector could assume, for example, that your object is always large).
(2) Add data. Rather than applying all types of augmentations, try adding much more data. Detecting several objects where there is only one leads me to believe that your network does not generalize well. Personally, I would opt for at least 500 annotations per class.
The biggest step towards improvement will be achieved by means of (2).
Once you have a decent baseline, you could also experiment with augmentations.
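Not a substitute for more data, but two inference-time knobs in detectron2 often reduce duplicate or low-quality boxes: the score threshold and the NMS threshold of the ROI heads. A hedged sketch (the weights path is an assumed placeholder; the config keys are standard detectron2 settings):

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "output/model_final.pth"   # your trained weights (assumed path)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7    # drop low-confidence boxes
cfg.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.4      # suppress overlapping duplicates harder

predictor = DefaultPredictor(cfg)              # then run predictor(image) as usual

Raising the score threshold and lowering the NMS threshold trades recall for fewer spurious boxes; tune both on a validation set.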

White spot on generated image CycleGAN

I am trying to implement CycleGAN. However, it looks like I always get white spots on my generated images, even after 10 or 25 epochs. What could be wrong? Should I continue training and expect the problem to go away, or is there any hint on how to solve it?
[Image: generated sample showing white spots]
White spots result from clipping: your model's output values are too large and get clipped when the image is plotted.
The documentation of matplotlib.imshow() says:
(...) an image with RGB values (0-1 float or 0-255 int)
(...) Out-of-range RGB(A) values are clipped.
Without knowing your architecture, I would:
review the activation functions within the model and the final activation of your generator,
check your loss function; perhaps high output values are favoured by your loss objective,
try training for more epochs; perhaps the model learns to avoid the clipping by itself.
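To make the first point concrete: the usual CycleGAN convention is a tanh final activation, which bounds the generator output to [-1, 1], followed by rescaling to [0, 1] before plotting so matplotlib has nothing to clip. A minimal sketch, with a stand-in one-layer "generator" rather than the asker's actual architecture:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Stand-in generator; only the final tanh activation matters for this point.
generator = tf.keras.Sequential([
    tf.keras.layers.Conv2D(3, 3, padding="same", activation="tanh",
                           input_shape=(256, 256, 3)),
])

noise = np.random.uniform(-1, 1, (1, 256, 256, 3)).astype("float32")
fake = generator(noise)
img = (fake[0].numpy() + 1.0) / 2.0      # map [-1, 1] -> [0, 1] for imshow
plt.imshow(np.clip(img, 0.0, 1.0))       # clip only guards against float fuzz
plt.show()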

Choosing the learning_rate using fastai's learn.lr_find()

I am going over this Heroes Recognition ResNet34 notebook published on Kaggle.
The author uses fastai's learn.lr_find() method to find the optimal learning rate.
Plotting the loss against the learning rate yields the following figure (not reproduced here):
It seems that the loss reaches a minimum for 1e-1, yet in the next step the author passes 1e-2 as the max_lr in fit_one_cycle in order to train his model:
learn.fit_one_cycle(6,1e-2)
Why use 1e-2 over 1e-1 in this example? Wouldn't this only make the training slower?
The idea for a learning-rate range test as done in lr_find comes from this paper by Leslie Smith: https://arxiv.org/abs/1803.09820. It has a lot of other useful tuning tips; it's worth studying closely.
In lr_find, the learning rate is slowly ramped up (in a log-linear way). You don't want to pick the point at which the loss is lowest; you want to pick the point at which it is dropping fastest per step (= the net is learning as fast as possible). That happens somewhere around the middle of the downward slope, near 1e-2, so the notebook author has it about right. Anything between 0.5e-2 and 3e-2 has roughly the same slope and would be a reasonable choice; the smaller values correspond to slightly slower learning (= more epochs needed, also less regularization) but with less risk of reaching a plateau too early.
I'll try to add a bit of intuition about what is happening when loss is the lowest in this test, say learning rate=1e-1. At this point, the gradient descent algorithm is taking large steps in the direction of the gradient, but loss is not decreasing. How can this happen? Well, it would happen if the steps are consistently too large. Think of trying to get into a well (or canyon) in the loss landscape. If your step size is larger than the size of the well, you can consistently step over it every time and end up on the other side.
This picture from a nice blog post by Jeremy Jordan shows it visually (image not reproduced here):
It shows gradient descent climbing out of a well by taking steps that are too large (perhaps lr=1e+0 in your test). I think this rarely happens exactly like that unless lr is truly excessive; more likely, the well sits in a relatively flat landscape, and gradient descent steps over it, never getting into the well in the first place. High-dimensional loss landscapes are hard to visualize and may be very irregular, but in a sense the lr_find test is looking for the scale of the typical features in the landscape and then picking a learning rate that gives you a step of similar size but a bit smaller.
You can find the suggested learning rate as follows (in some fastai versions lr_find returns a namedtuple of suggested rates, which this unpacks):
_, lr = learner.lr_find()
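For context, a minimal end-to-end sketch of the workflow (the dataset here is fastai's sample Pets data standing in for the Kaggle one, and the exact return type of lr_find varies between fastai versions):

from fastai.vision.all import *

path = untar_data(URLs.PETS) / "images"

def is_cat(name):
    return name[0].isupper()             # Pets convention: cat breeds are capitalized

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)
suggested = learn.lr_find()              # inspect the plot, not just the number
learn.fit_one_cycle(6, 1e-2)             # a point on the steep downward slope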

Geometric/Shape Recognition (Odd Shape)

I would like to recognize some odd geometric shapes, but I'm not sure how to do it.
Here's what I have so far:
Convert RGB image to Monochrome.
Otsu Threshold
Hough Transform.
I'm not sure what to do next.
For geometric information, you could do a raster-to-vector conversion to turn your image into coordinate vectors (lines and points), and then a finite element analysis to look for known shapes. Not easy, but libraries should be available for both.
Edit: Note that there are sometimes easier practical solutions, but they depend on the image and the types of errors: for example, removing perspective, identifying a 3D object from a 2D image, the significance of colour, etc. You often see registration markers added to the real-world object to overcome this and allow much easier identification. Looking up articles on feature extraction techniques might help.
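One practical continuation of the asker's pipeline (my suggestion, not part of the answer above) is contour extraction and polygon approximation after the Otsu threshold, which gives you vertex counts and areas to classify shapes by. A sketch using OpenCV; "shape.png" is a placeholder filename:

import cv2

img = cv2.imread("shape.png")            # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    peri = cv2.arcLength(c, True)
    approx = cv2.approxPolyDP(c, 0.02 * peri, True)   # simplified outline
    print(f"contour with {len(approx)} vertices, area {cv2.contourArea(c):.0f}")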
