How to copy `grad_fn` in PyTorch?

>>> print(foo.grad_fn)
<AddBackward0 object at 0x7f7f9f450710>
I want to copy from foo.grad_fn to bar.grad_fn. For reference, I don't need foo.data; I only want to copy the gradient.
Is this possible? I tried the following and it failed.
>>> bar.grad_fn = foo.grad_fn
AttributeError: attribute 'grad_fn' of 'torch._C._TensorBase' objects is not writable
thank you.

Actually it is quite easy. You can access the gradient stored in a leaf tensor simply by doing foo.grad.data. So, if you want to copy the gradient from one leaf to another, just do bar.grad.data.copy_(foo.grad.data) after calling backward. Note that .data is used to avoid keeping track of this operation in the computation graph.
If it is not a leaf, you have to specify the optional argument retain_graph=True in backward.
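A minimal sketch of the copy, assuming foo and bar are illustrative leaf tensors whose gradients have already been populated by backward:

```python
import torch

# Illustrative leaf tensors (names taken from the question).
foo = torch.ones(3, requires_grad=True)
bar = torch.zeros(3, requires_grad=True)

(foo * 2).sum().backward()  # foo.grad becomes [2., 2., 2.]
(bar * 5).sum().backward()  # bar.grad becomes [5., 5., 5.]

# Copy foo's gradient into bar's, without the copy being recorded
# in the computation graph (hence .data).
bar.grad.data.copy_(foo.grad.data)
print(bar.grad)  # tensor([2., 2., 2.])
```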
Yet, I'm not sure I understand what you are trying to achieve here. I don't get why copying the grad from one tensor to another could be useful. If you tell us more about what you are actually trying to achieve, maybe someone can give you a more helpful answer.

How would I construct an integer optimization model corresponding to a graph

Suppose we're given some sort of graph where the feasible region of our optimization problem is given. For example: here is an image
How would I go on about constructing these constraints in an integer optimization problem? Anyone got any tips? Thanks!
Mate, I agree with the others that you should be a little more specific than that paint-ish picture ;). In particular, you are neither specifying any objective/objective direction nor giving any context on what about this graph should involve integer variables, except for the existence of disjunctive feasible sets, which may be modelled by MIP techniques. It seems like your problem is the formalization of what you conceptualized. However, in case you are just being lazy and are simply interested in modelling disjunctive regions, you should look into disjunctive programming techniques, such as "big-M" (note: big-M reformulations can be numerically problematic). You should aim for a convex-hull reformulation if you can attain one (fairly easily).
Back to your picture, it is quite clear that you have a problem in two real dimensions (let's say in R^2), where the constraints bounding the feasible set are linear (the lines making up the feasible polygons).
So you know that you have two dimensions and need two real continuous variables, say x[1] and x[2], to formulate each of your linear constraints (a[i,1]*x[1]+a[i,2]*x[2]<=rhs[i] for some index i corresponding to the lines in your graph). Additionally, your variables seem to be constrained to the first orthant, so x[1]>=0 and x[2]>=0 should hold. Now, to add disjunctions you want some constraints that only hold when a certain condition is true. Therefore, you can add two binary decision variables, say y[1] and y[2], and an additional constraint y[1]+y[2]=1, to enforce that only one set of constraints can be active at a time. You should be able to implement this with the help of big-M by reformulating the constraints as follows:
If you bound things from above with your line:
a[i,1]*x[1]+a[i,2]*x[2]-rhs[i]<=M*(1-y[1]) if i corresponds to the one polygon,
a[i,1]*x[1]+a[i,2]*x[2]-rhs[i]<=M*(1-y[2]) if i corresponds to the other polygon,
and if your line bounds things from below:
-M*(1-y[1])<=-a[i,1]*x[1]-a[i,2]*x[2]+rhs[i] if i corresponds to the one polygon,
-M*(1-y[2])<=-a[i,1]*x[1]-a[i,2]*x[2]+rhs[i] if i corresponds to the other polygon.
It is important that M is sufficiently large, but not too large to cause numerical issues.
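To make the big-M mechanics concrete, here is a minimal sketch in plain Python. The two polygons, their coefficients, and the value of M are all made up for illustration; a real model would hand these constraints to a MIP solver rather than enumerate the binaries by hand.

```python
M = 1000.0  # must exceed the largest possible constraint violation

# Each polygon as a list of constraints a1*x1 + a2*x2 <= rhs (made-up data):
polygon_1 = [(1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]            # 0<=x1<=2, 0<=x2<=2
polygon_2 = [(1.0, 0.0, 6.0), (-1.0, 0.0, -4.0),
             (0.0, 1.0, 2.0)]                              # 4<=x1<=6, 0<=x2<=2

def feasible(x1, x2):
    """Is (x1, x2) in the disjunctive region, for some y with y1+y2=1?"""
    if x1 < 0 or x2 < 0:          # first-orthant constraints
        return False
    for y1, y2 in ((1, 0), (0, 1)):   # enumerate the binary choices
        ok = all(a1*x1 + a2*x2 - rhs <= M*(1 - y1)
                 for a1, a2, rhs in polygon_1)
        ok = ok and all(a1*x1 + a2*x2 - rhs <= M*(1 - y2)
                        for a1, a2, rhs in polygon_2)
        if ok:
            return True
    return False

print(feasible(1.0, 1.0))   # inside the first polygon -> True
print(feasible(5.0, 1.0))   # inside the second polygon -> True
print(feasible(3.0, 1.0))   # between the polygons -> False
```

When y[1]=1, the first polygon's constraints are enforced and the second polygon's are relaxed by M, and vice versa; this is exactly what the reformulated inequalities above do.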
That being said, I am by no means an expert on these disjunctive programming techniques, so feel free to chime in, add corrections or make things clearer.
Also, a more elaborate question typically yields more elaborate and satisfying answers ;) If you had gone to the effort of making up a true small example problem you likely would have gotten a full formulation of your problem or even an executable piece of code in no time.

TypeError: src data type = 23 is not supported

hope you're all having a great day so far.
I have a Python 3.6 script that applies random (well, in fact it's more like exhaustively all existing) sequences of image transformations with OpenCV to an image, compares the result to a sought image, and tries the next one.
Image transformations include:
Thresholding
Morphological transformation
Smoothing
Playing with colors (with cvtColor, but also hand-made algorithm)
Playing with gradients
I won't show the code for this since it's basically just a heavy set of loops and arrays, which I think couldn't be linked to my issue.
Obviously enough, most of the tried combinations aren't valid ones since, for example, converting BGR to GRAY won't work if done after a previous conversion from BGR to GRAY. I know it's not very Pythonic of me, since it goes against EAFP thinking, but since exception catching costs a lot and happened pretty often, sometimes after some heavy processing, I wanted to add a few conditions to prevent most of them.
To do so, I sorted my array of functions and, by checking if I'm within a certain range, I can check the validity of the transformation to come and abort if bad.
if steps[it] >= THREE_CHANNELS_LIMIT:
    if len(cur_img.shape) == 3:
        if steps[it] >= SINGLE_CHANNEL_LIMIT:
            break
        elif cur_img.dtype not in BGR_DEPTHS:
            break
Where steps[it] is the index pointing to the next function to execute, and THREE_CHANNELS_LIMIT or SINGLE_CHANNEL_LIMIT are the specific index at the border of each function range.
The above code prevents single-channel transformations from being applied to a multi-channel NumPy image.
Now with my issue : from logged exceptions, I can see a few functions, the OpenCV morphological functions, are still throwing errors.
TypeError: src data type = 23 is not supported
I think this is probably an issue with pixel depth. However, I have no idea what type 23 is or means, and I would like to know, in order to estimate how often the issue occurs and determine whether I should add another condition or let the try-except statement deal with it.
I searched the web and found many type = 17, type = 18 or type = 0 issues, but I can't seem to find this one.
Is there a file somewhere listing all of OpenCV types used for Error message? Or maybe one of you know about this specific one, which would do the trick for my present case?
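For what it's worth, OpenCV packs a Mat type as CV_MAKETYPE(depth, channels) = depth + ((channels - 1) << 3), and the standard depths are numbered 0 through 6 (CV_8U, CV_8S, CV_16U, CV_16S, CV_32S, CV_32F, CV_64F). A small sketch decoding 23 under that scheme:

```python
# Standard OpenCV depth codes, in order 0..6.
CV_DEPTH_NAMES = ["CV_8U", "CV_8S", "CV_16U", "CV_16S",
                  "CV_32S", "CV_32F", "CV_64F"]

def decode_cv_type(t):
    """Split an OpenCV Mat type code into (depth, channels, depth name)."""
    depth = t & 7             # low 3 bits: element depth
    channels = (t >> 3) + 1   # remaining bits: channel count minus one
    name = (CV_DEPTH_NAMES[depth] if depth < len(CV_DEPTH_NAMES)
            else "non-standard depth")
    return depth, channels, name

print(decode_cv_type(23))  # (7, 3, 'non-standard depth')
print(decode_cv_type(16))  # (0, 3, 'CV_8U')  i.e. CV_8UC3
```

Decoded this way, 23 means a 3-channel array with depth 7, which is outside the standard depth range. That usually suggests the NumPy array's dtype isn't one OpenCV supports, so converting it first, e.g. with img.astype(np.uint8), may be the real fix rather than another range check.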
And sorry for my inaccurate English. My current spellchecker just underlines everything, so I might have left many typos, too.

Missing values for nominal attribute in Weka

I have a data set and I am doing classification using Weka NaiveBayes classifier. I have 14 attributes, some of which are nominals.
In only one of these attributes, I have some missing values. What I have done so far is leave them as missing values, and I know that Weka replaces those values automatically (a question about that was asked here).
I mean, the values for this attribute are empty in my feature file, and when I create the ARFF file, I see "?" between the two commas.
Now, I have two possibilities:
1) Let them be filled by Weka automatically.
2) Replace them by "NULL".
The problem is that in the first case, the classifier works better. Now, I am wondering whether it is acceptable to let Weka replace them, or whether I should use the second approach even though I get worse results.
I mean, when should we let Weka replace the missing values, and when not?
Meanwhile, the feature which has missing values represents the WordNet supersense of the words and when it is empty, it means that the instance is, for example, a preposition, or a WH question.
Thanks in advance,
Well, about missing values: Weka doesn't replace them by default, you have to use a filter (exactly as in the post you linked first in your question). Some classifiers can handle missing values; I think Naive Bayes can, simply by not counting them in the probability calculation. So basically you have three options:
1) Use the ReplaceMissingValues filter to replace missing values with the mode (or, for numeric attributes, the mean).
2) Don't use a filter and keep the dataset with missing values (in this case I recommend you have a look at how Naive Bayes works, to understand how your missing values will be treated and whether that is good for you).
3) Replace your missing values with your own label, like "other values" or so.
The key to the correct choice is probably in your last paragraph, which suggests that your missing values actually mean something. If that is so, I would use the third approach: your own label. On the other hand, if the missing values don't mean anything and are just the result of some fault in data collection, I would consider the first two approaches. Good luck.
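If you go with the explicit-label route, preprocessing the data rows is straightforward. A minimal sketch in Python (the toy rows and the label name no_supersense are made up; note that in a real ARFF file the new label must also appear in the nominal attribute's declared value list in the header):

```python
# Toy data rows in ARFF order: word, supersense, class (made-up example).
ARFF_ROWS = [
    "word1,noun.person,classA",
    "word2,?,classB",           # missing supersense
    "word3,verb.motion,classA",
]
SUPERSENSE_COL = 1

def relabel_missing(rows, col, label="no_supersense"):
    """Replace '?' in the given column with an explicit nominal label."""
    out = []
    for row in rows:
        fields = row.split(",")
        if fields[col] == "?":
            fields[col] = label
        out.append(",".join(fields))
    return out

print(relabel_missing(ARFF_ROWS, SUPERSENSE_COL))
```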

Options for representing string input as an object

I am receiving as input a "map" represented by strings, where certain nodes of the map have significance (s). For example:
---s--
--s---
s---s-
s---s-
-----s
My question is, what reasonable options are there for representing this input as an object.
The only option that really comes to mind is:
(1) Each position is translated to a node with up/down/left/right pointers. The whole object contains a pointer to the top-right node.
This seems like just a graph representation specific to this problem.
Thanks for the help.
Additionally, if there are common terms for this type of input, please let me know
Well, it depends a lot on what you need to delegate to those objects. OOP is basically about asking objects to perform things in order to solve a given problem, so it is hard to tell without knowing what you need to accomplish.
The solution you mention can be a valid one, as can also be having a matrix (in this case of 6x5) where you store in each matrix cell an object representing the node (just as an example, I used both approaches once to model the Conway's game of life). If you could give some more information on what you need to do with the object representation of your map then a better design can be discussed.
HTH
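As a concrete illustration of the matrix approach, here is a minimal sketch (in Python, as one possible language) parsing the example map into a boolean grid:

```python
# The map exactly as given in the question.
RAW_MAP = """\
---s--
--s---
s---s-
s---s-
-----s"""

# A cell is True when the node is significant ('s').
grid = [[ch == "s" for ch in line] for line in RAW_MAP.splitlines()]

# Significant nodes as (row, col) coordinates:
significant = [(r, c) for r, row in enumerate(grid)
               for c, cell in enumerate(row) if cell]
print(significant)
```

From a grid like this, neighbor lookups are just index arithmetic (grid[r-1][c] for "up", and so on), which gives you the same connectivity as the pointer-based node representation without building it explicitly.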

Update the quantile for a dataset when a new datapoint is added

Suppose I have a list of numbers and I've computed the q-quantile (using Quantile).
Now a new datapoint comes along and I want to update my q-quantile, without having stored the whole list of previous datapoints.
What would you recommend?
Perhaps it can't be done exactly without, in the worst case, storing all previous datapoints.
In that case, can you think of something that would work well enough?
One idea I had, if you can assume normality, is to use the inverse CDF instead of the q-quantile.
Keep track of the sample variance as you go and then you can compute InverseCDF[NormalDistribution[sampleMean,sampleVariance], q] which should be the value such that a fraction q of the values are smaller, which is what the q-quantile is.
(I see belisarius was thinking along the same lines.
Here's the link he pointed to: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm )
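For illustration, the same idea can be sketched in Python instead of Mathematica: Welford's online algorithm tracks the running mean and variance, and the inverse normal CDF turns them into a quantile estimate (the sample data below is made up):

```python
from statistics import NormalDist

class StreamingNormalQuantile:
    """Quantile estimate under a normality assumption, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations (Welford)

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def quantile(self, q):
        var = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        if var == 0.0:
            return self.mean
        return NormalDist(self.mean, var ** 0.5).inv_cdf(q)

est = StreamingNormalQuantile()
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    est.add(x)
print(est.quantile(0.5))  # 3.0 (the mean, for the median of a normal fit)
```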
Unless you know that your underlying data comes from some distribution, it is not possible to update arbitrary quantiles without retaining the original data. You can, as others suggested, assume that the data has some sort of distribution and store the quantiles this way, but this is a rather restrictive approach.
Alternately, have you thought of programming this somewhere besides Mathematica? For example, you could create a class for your datapoints that contains (1) the Double value and (2) a timestamp for when the data came in. With a SortedList of these datapoint objects (which compares based on value), you could get the quantile very fast by simply indexing into the list. Want a historical quantile? Simply filter on the timestamps in your sorted list.
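A minimal sketch of that sorted-list idea (shown in Python with the stdlib bisect module, using the nearest-rank quantile definition and omitting the timestamp part):

```python
import bisect
import math

values = []
for x in [5.0, 1.0, 4.0, 2.0, 3.0]:
    bisect.insort(values, x)   # O(n) insert; the list stays sorted

def quantile(sorted_vals, q):
    """Nearest-rank quantile: the ceil(q*n)-th smallest value."""
    idx = max(0, math.ceil(q * len(sorted_vals)) - 1)
    return sorted_vals[idx]

print(quantile(values, 0.5))  # 3.0
```

Lookup is an index into the list; the cost is moved to insertion, which is O(n) per datapoint here (a balanced tree or skip list would make both operations logarithmic).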
