How to evaluate MoveNet.SinglePose.Lightning using COCOEval API? - tensorflow-hub

I’m trying to evaluate MoveNet.SinglePose.Lightning using the COCOEval API.
However, I do not know how to calculate the score to use for evaluation.
According to the Results Format page, each inference result must carry one score for the evaluation of keypoint detection.
This score is the instance-level confidence of the object.
[{
"image_id": int,
"category_id": int,
"keypoints": [x1,y1,v1,...,xk,yk,vk],
"score": float,
}]
However, MoveNet.SinglePose.Lightning outputs a confidence for each keypoint, not an instance-level confidence.
ref. The model card
Outputs
A float32 tensor of shape [1, 1, 17, 3].
The first two channels of the last dimension represents the yx coordinates (normalized to image frame, i.e. range in [0.0, 1.0]) of the 17 keypoints (in the order of: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]).
The third channel of the last dimension represents the prediction confidence scores of each keypoint, also in the range [0.0, 1.0].
How can I calculate the score or do the evaluation?
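One possible workaround, offered as a sketch rather than an official recipe: average the 17 per-keypoint confidences and use the mean as the instance-level score. Neither the model card nor the Results Format page prescribes this; it is an assumption, and other aggregates (for example the mean over confident keypoints only) could be used instead. The function name movenet_to_coco is hypothetical. The sketch also assumes the normalized coordinates refer to the original image without padding, and it sets every visibility flag to 1 as a placeholder.

```python
import numpy as np

def movenet_to_coco(output, image_id, width, height, category_id=1):
    """Convert one MoveNet output of shape [1, 1, 17, 3] into a COCO keypoint result.

    Assumption: the instance score is the mean of the keypoint confidences.
    category_id=1 is the COCO "person" category.
    """
    kpts = np.asarray(output, dtype=np.float32).reshape(17, 3)
    y, x, conf = kpts[:, 0], kpts[:, 1], kpts[:, 2]  # model outputs y first, then x
    keypoints = []
    for xi, yi in zip(x, y):
        # COCO expects x, y in pixels followed by a visibility flag (placeholder 1).
        keypoints.extend([float(xi * width), float(yi * height), 1])
    return {
        "image_id": image_id,
        "category_id": category_id,
        "keypoints": keypoints,
        "score": float(conf.mean()),  # assumed instance-level score
    }

# Dummy output: every keypoint has confidence 0.5, the nose sits at (y=0.5, x=0.25).
dummy = np.zeros((1, 1, 17, 3), dtype=np.float32)
dummy[0, 0, :, 2] = 0.5
dummy[0, 0, 0, :2] = [0.5, 0.25]
result = movenet_to_coco(dummy, image_id=7, width=200, height=100)
```

A list of such dictionaries, one per image, can then be written to JSON and loaded as the detection results for evaluation.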

Related

Is there any "better way" to train a quadruped to walk using reinforcement Learning

I have been chasing the problem of using RL to train a quadruped to walk, but have had no noteworthy success. The details of the Gym environment I am using are as follows.
Sim: pybullet
env.action_space = box(shape=(12,), upper = 1, lower = -1)
The selected actions are scaled by the max_actions value specified for each joint.
The action space consists of the 3 motor positions (hip_joint_y, hip_joint_x, knee_joint) x 4 legs of the robot.
env.observation_space = box(shape=(12,), upper = np.inf, lower = -np.inf)
The observation_space includes:
roll, pitch of the body [r, p]
angular vel [x, y, z]
linear acc [x, y, z]
Binary contact forces for each leg [1 if in contact else 0]. [1, 1, 1, 1]
reward = (
+ distance_reward
- body_rotation_reward
- energy_usage_reward
- body_drift_from x-axis reward
- body_shake_reward)
I have tried the following approaches.
Using PPO from stable-baselines3 for 20 million timesteps [no distinct improvement].
Using DDPG, TD3, SAC, A2C, and PPO with 5 million timesteps on each algorithm, increasing the policy network up to 4 layers of 1024 neurons each [1024, 1024, 1024, 1024] for qf and vf, or actor and critic.
Using the Discrete Delta concept to scale action limits, changing action_space from box to MultiDiscrete with each action ranging from 0 to 6. discrete_delta_vals = [-0.3, -0.1, -0.03, 0, 0.03, 0.1, 0.3]. Each joint value is determined by choosing one value from the discrete_delta_vals list and adding it to the previous action.
Keeping hip_joint_y of all legs at zero and changing the action space from box(shape=(12,)) to box(shape=(8,)). I trained this agent for another 6M timesteps; there seemed to be a small improvement at first, but then eps_length and mean_reward settled with no significant improvement afterwards.
I have generated half-ellipsoid trajectories with IK, and that works, but it is an explicitly robotics-based approach to the problem. I am currently looking into DeepMimic to use those trajectories to guide RL towards a stable walking gait. No significant breakthrough so far.
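For readers unfamiliar with the Discrete Delta idea from the list above, here is a minimal sketch of how a MultiDiscrete choice could be turned into joint targets. The function name apply_discrete_deltas is hypothetical, and clipping the result to [-1, 1] is an assumption made to match the bounds of the original box action space; the post does not say how limits were enforced.

```python
import numpy as np

# Delta values quoted in the post; index 3 means "keep the previous value".
DISCRETE_DELTA_VALS = np.array([-0.3, -0.1, -0.03, 0.0, 0.03, 0.1, 0.3])

def apply_discrete_deltas(prev_action, choice):
    """Add the delta selected for each joint to the previous action.

    prev_action: previous joint targets, one value per joint.
    choice: one integer index in [0, 6] per joint (a MultiDiscrete sample).
    """
    prev_action = np.asarray(prev_action, dtype=np.float64)
    deltas = DISCRETE_DELTA_VALS[np.asarray(choice, dtype=int)]
    # Assumption: keep the result inside the original Box bounds of [-1, 1].
    return np.clip(prev_action + deltas, -1.0, 1.0)

prev = np.zeros(12)
choice = np.array([6, 0, 3] * 4)  # +0.3, -0.3, unchanged, repeated for 4 legs
new_action = apply_discrete_deltas(prev, choice)
```

Because each step changes a joint by at most 0.3, consecutive actions stay close to each other, which is the smoothing effect this scheme is meant to provide.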
Here is the Repo Link
Check the scripts folder and go through the start_training_v(x).py scripts. Thanks in Advance. If you feel like discussing the entire topic to sort this please drop your email in the comment and I'll reach out to you.
Hi, try using NVIDIA Isaac Gym. It runs PPO end to end on the GPU with PyTorch. I was able to train a custom URDF to walk in about 10 minutes of training.

De-Skewing image

I am unable to figure out how this deskew function works:
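import cv2
import numpy as np
# SZ (image side length) and affine_flags (warpAffine flags) are defined elsewhere in the tutorial code.
def deskew(img):
    m = cv2.moments(img)
    # Almost no vertical spread: nothing to deskew.
    if abs(m['mu02']) < 1e-2:
        return img.copy()
    skew = m['mu11']/m['mu02']
    # Shear along x by skew, shifted so that the middle row stays in place.
    M = np.float32([[1, skew, -0.5*SZ*skew], [0, 1, 0]])
    img = cv2.warpAffine(img,M,(SZ, SZ),flags=affine_flags)
    return img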
I know that a moment is a quantitative measure of a shape. In image processing, the moments give information about the total area or intensity, the centroid of the shape, and the orientation of the shape.
Area or total mass: The zeroth moment M(0,0) gives the total mass or area. In image processing, M(0,0) is the sum of all the pixels, and if it is a binary image then the sum of the pixels gives the area.
Center of mass or centroid: When the first moment is divided by the total mass, it gives the centroid. The centroid is the point at which the shape would balance perfectly on the tip of a pin.
M(0,1)/M(0,0), M(1,0)/M(0,0)
I think the image from the tutorial you got the code from gives the intuitive idea pretty well:
To deskew the image, they estimate the skew as the ratio of mu11 (the mixed second-order central moment, i.e. the covariance of x and y) to mu02 (the second-order central moment along y, i.e. the vertical variance), which is skew = m['mu11']/m['mu02']. They then use a shear matrix built from this value to undo the slant. So that the shear is applied relative to the middle row of the image (y = SZ/2), rather than the (0,0) point, they also use a translation, which is where you get M[0, 2] = -0.5*SZ*skew
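To make the moment ratio concrete, here is a small NumPy-only sketch that reproduces the idea without OpenCV. It rests on two assumptions that are not stated in the snippet above: SZ = 20, and affine_flags including cv2.WARP_INVERSE_MAP, so that each output pixel is pulled from the source at x + skew*y - 0.5*SZ*skew. The helper names central_moments and deskew_numpy are hypothetical, and nearest-neighbour sampling stands in for OpenCV's interpolation.

```python
import numpy as np

SZ = 20  # assumed image side length; SZ is not defined in the snippet above

def central_moments(img):
    """Return (mu11, mu02) of a 2-D intensity array, with x = column and y = row."""
    rows, cols = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xc = (cols * img).sum() / m00
    yc = (rows * img).sum() / m00
    mu11 = ((cols - xc) * (rows - yc) * img).sum()
    mu02 = ((rows - yc) ** 2 * img).sum()
    return mu11, mu02

def deskew_numpy(img):
    """Shear the image along x, assuming inverse-map semantics for the matrix M."""
    mu11, mu02 = central_moments(img)
    if abs(mu02) < 1e-2:
        return img.copy()
    skew = mu11 / mu02
    out = np.zeros_like(img)
    for y in range(SZ):
        for x in range(SZ):
            # Inverse mapping: each destination pixel pulls from the source image.
            src_x = int(round(x + skew * y - 0.5 * SZ * skew))
            if 0 <= src_x < SZ:
                out[y, x] = img[y, src_x]
    return out

# A diagonal stroke: one pixel per row, shifted right by one column per row.
img = np.zeros((SZ, SZ), dtype=np.float32)
for y in range(5, 15):
    img[y, y] = 1.0

mu11, mu02 = central_moments(img)
skew = mu11 / mu02            # slope of x against y for this stroke
straight = deskew_numpy(img)  # the stroke becomes a vertical bar
```

For this stroke x grows by one pixel per row, so the estimated skew is 1, and after the shear all ten pixels land in column SZ/2 = 10.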

Move object along curve with custom velocity

I have a catmull-rom curve defined with a couple of control points as shown here:
I would like to animate an object moving along the curve, but be able to define the velocity of the object.
When iterating over the curve's points using the getPoint method, the object moves chordally (in the image, at u=0, we are at p1, at u=0.25, we are at p2 etc). Using the getPointAt method, the object moves with uniform speed along the curve.
However, what I would like to do is to have greater control over the animation, so that I can specify that the movement from p1 to p2 should take 0.5, from p2 to p3, 0.3, and from p3 to p4 0.2. Is this possible?
Thanks for the suggestions. The way I finally implemented this was to create a custom mapping between my time variable and the u variable for the three.js getPoint function.
I created a piecewise linear function using a JavaScript library called everpolate. This way I could map t to u such that:
At t = 0, u = 0, resulting in p1
At t = 0.5, u = 1/3, resulting in p2
At t = 0.8, u = 2/3, resulting in p3
At t = 1, u = 1, resulting in p4
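The same piecewise linear mapping can be sketched in a few lines of Python with numpy.interp. This only illustrates the t-to-u table above and is not the everpolate code actually used; the name time_to_u is hypothetical.

```python
import numpy as np

# Knots taken from the table above: time t on the left, curve parameter u on the right.
T_KNOTS = [0.0, 0.5, 0.8, 1.0]
U_KNOTS = [0.0, 1 / 3, 2 / 3, 1.0]

def time_to_u(t):
    """Map animation time t in [0, 1] to curve parameter u by linear interpolation."""
    return float(np.interp(t, T_KNOTS, U_KNOTS))

u_quarter = time_to_u(0.25)  # halfway between p1 and p2
u_late = time_to_u(0.9)      # halfway between p3 and p4
```

The value returned for each frame's t would then be passed to getPoint to obtain the position on the curve.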
However, what I would like to do is to have greater control over the animation, so that I can specify that the movement from p1 to p2 should take 0.5, from p2 to p3, 0.3, and from p3 to p4 0.2. Is this possible?
You can achieve this by using an animation library like tween.js. In this way, you can specify the start and end position of your object and the desired duration. It's also possible to customize the type of transition by using easing functions.
You have multiple options. I will describe the theory and then one possible implementation.
Theory
You want to arc-length parametrize your curve, which means that an increment of 1 in the parameter results in a movement of distance 1 along the curve.
This parametrization will allow you to fully control the movement of your object at any speed you want, be it constant, linear, non-linear, piecewise...
Possible application
There are many numerical integration techniques that will allow you to arclength parametrize the curve.
One possibility is to precompute the values and put them in a table. Pick a small epsilon and, starting at the first parameter value x_0, evaluate the function at x_0, x_0 + epsilon, x_0 + 2*epsilon...
As you do this, take the linear distance between each pair of consecutive samples and add it to an accumulator, i.e. travelled_distance += length(sample[x], sample[x+1]).
Store each (parameter, travelled_distance) pair in a table.
Now when you are at x and want to move y units, you can round x to the nearest x_n, then search linearly for the first table entry whose stored distance exceeds the distance stored for x_n by at least y, and return that entry's parameter.
This algorithm is not the most efficient, but it is easy to understand and to code, so at least it can get you started.
If you need a more optimized version, look for arc length parametrization algorithms.
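The table-based approach described above can be sketched as follows. This is a minimal illustration under stated assumptions: the parabola used as the curve is a stand-in for the Catmull-Rom curve, the function names are hypothetical, and the lookup uses a binary search (bisect) in place of the linear search mentioned in the answer.

```python
import bisect
import math

def build_arc_length_table(curve, samples=1000):
    """Sample curve(u) for u in [0, 1] and accumulate the chord lengths.

    Returns (params, distances), where distances[i] approximates the arc
    length from curve(0) to curve(params[i]).
    """
    params = [i / samples for i in range(samples + 1)]
    distances = [0.0]
    prev = curve(params[0])
    for u in params[1:]:
        point = curve(u)
        distances.append(distances[-1] + math.dist(prev, point))
        prev = point
    return params, distances

def param_at_distance(params, distances, target):
    """Return the first sampled parameter whose travelled distance reaches target."""
    i = bisect.bisect_left(distances, target)
    return params[min(i, len(params) - 1)]

def parabola(u):
    """Stand-in curve: points move faster along it as u grows."""
    return (u, u * u)

params, distances = build_arc_length_table(parabola)
total_length = distances[-1]
halfway_u = param_at_distance(params, distances, total_length / 2)
```

Because the stand-in curve gets steeper as u grows, the halfway point by distance lies at a parameter value above 0.5, which is exactly the kind of distortion that arc-length parametrization corrects.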

Spark : regression model threshold and precision

I have a logistic regression model, where I explicitly set the threshold to 0.5.
model.setThreshold(0.5)
I train the model and then I want to get basic stats -- precision, recall etc.
This is what I do when I evaluate the model:
val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold
precision.foreach { case (t, p) =>
println(s"Threshold is: $t, Precision is: $p")
}
I get results with only 0.0 and 1.0 as values of threshold and 0.5 is completely ignored.
Here is the output of the above loop:
Threshold is: 1.0, Precision is: 0.8571428571428571
Threshold is: 0.0, Precision is: 0.3005181347150259
When I call metrics.thresholds() it also returns only two values, 0.0 and 1.0.
How do I get the precision and recall values with threshold as 0.5?
You need to clear the model threshold before you make predictions. Clearing the threshold makes your predictions return a score rather than the classified label. Otherwise you will only have two thresholds, i.e. your labels 0.0 and 1.0.
model.clearThreshold()
A tuple from predictionAndLabels should look like (0.6753421,1.0) and not (1.0,1.0).
Take a look at https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
You probably still want to set numBins to control the number of points if the input is large.
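To see why hard labels collapse the threshold list, here is a plain-Python imitation of a precision-by-threshold computation. It is not Spark's implementation; it assumes that each distinct score serves as a threshold and that a score greater than or equal to the threshold counts as a positive prediction. The data is made up for illustration.

```python
def precision_by_threshold(score_and_labels):
    """Return (threshold, precision) pairs, one per distinct score, highest first."""
    thresholds = sorted({score for score, _ in score_and_labels}, reverse=True)
    result = []
    for t in thresholds:
        # Everything scored at or above the threshold is predicted positive.
        predicted_pos = [label for score, label in score_and_labels if score >= t]
        true_pos = sum(1 for label in predicted_pos if label == 1.0)
        result.append((t, true_pos / len(predicted_pos)))
    return result

# Made-up (score, label) pairs, as produced after clearing the threshold.
raw_scores = [(0.9, 1.0), (0.7, 1.0), (0.6, 0.0), (0.4, 1.0), (0.2, 0.0)]
# The same predictions after a 0.5 threshold has turned scores into labels.
hard_labels = [(1.0 if s >= 0.5 else 0.0, y) for s, y in raw_scores]

with_scores = precision_by_threshold(raw_scores)   # five thresholds
with_labels = precision_by_threshold(hard_labels)  # only 1.0 and 0.0
```

With raw scores there is one threshold per distinct score, while the thresholded predictions leave only 1.0 and 0.0, which matches the output shown in the question.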
I think what happens is that all the predictions are 0.0 or 1.0. Then the intermediate threshold values make no difference.
Consider the numBins argument of BinaryClassificationMetrics:
numBins:
if greater than 0, then the curves (ROC curve, PR curve) computed internally will be down-sampled to this many "bins". If 0, no down-sampling will occur. This is useful because the curve contains a point for each distinct score in the input, and this could be as large as the input itself -- millions of points or more, when thousands may be entirely sufficient to summarize the curve. After down-sampling, the curves will instead be made of approximately numBins points instead. Points are made from bins of equal numbers of consecutive points. The size of each bin is floor(scoreAndLabels.count() / numBins), which means the resulting number of bins may not exactly equal numBins. The last bin in each partition may be smaller as a result, meaning there may be an extra sample at partition boundaries.
So if you don't set numBins, then precision will be calculated at all the different prediction values. In your case this seems to be just 0.0 and 1.0.
First, try adding more bins like this (here numBins is 10):
val metrics = new BinaryClassificationMetrics(probabilitiesAndLabels,10);
If you still only have two thresholds of 0 and 1, then check how you have defined your predictionAndLabels. You may be having this problem if you have accidentally provided (label, prediction) instead of (prediction, label).

What does `sample_weight` do to the way a `DecisionTreeClassifier` works in sklearn?

I've read from the relevant documentation that:
Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value.
But, it is still unclear to me how this works. If I set sample_weight with an array of only two possible values, 1's and 2's, does this mean that the samples with 2's will get sampled twice as often as the samples with 1's when doing the bagging? I cannot think of a practical example for this.
Some quick preliminaries:
Let's say we have a classification problem with K classes. In a region of feature space represented by the node of a decision tree, recall that the "impurity" of the region is measured by quantifying the inhomogeneity, using the probability of the class in that region. Normally, we estimate:
Pr(Class=k) = #(examples of class k in region) / #(total examples in region)
The impurity measure takes as input, the array of class probabilities:
[Pr(Class=1), Pr(Class=2), ..., Pr(Class=K)]
and spits out a number, which tells you how "impure" or how inhomogeneous-by-class the region of feature space is. For example, the gini measure for a two class problem is 2*p*(1-p), where p = Pr(Class=1) and 1-p=Pr(Class=2).
Now, basically the short answer to your question is:
sample_weight augments the probability estimates in the probability array ... which augments the impurity measure ... which augments how nodes are split ... which augments how the tree is built ... which augments how feature space is diced up for classification.
I believe this is best illustrated through example.
First consider the following 2-class problem where the inputs are 1 dimensional:
from sklearn.tree import DecisionTreeClassifier as DTC
X = [[0],[1],[2]] # 3 simple training examples
Y = [ 1, 2, 1 ] # class labels
dtc = DTC(max_depth=1)
So, we'll look at trees with just a root node and two children. Note that the default impurity measure is the gini measure.
Case 1: no sample_weight
dtc.fit(X,Y)
print(dtc.tree_.threshold)
# [0.5, -2, -2]
print(dtc.tree_.impurity)
# [0.44444444, 0, 0.5]
The first value in the threshold array tells us that the 1st training example is sent to the left child node, and the 2nd and 3rd training examples are sent to the right child node. The last two values in threshold are placeholders and are to be ignored. The impurity array tells us the computed impurity values in the parent, left, and right nodes respectively.
In the parent node, p = Pr(Class=1) = 2. / 3., so that gini = 2*(2.0/3.0)*(1.0/3.0) = 0.444..... You can confirm the child node impurities as well.
Case 2: with sample_weight
Now, let's try:
dtc.fit(X,Y,sample_weight=[1,2,3])
print(dtc.tree_.threshold)
# [1.5, -2, -2]
print(dtc.tree_.impurity)
# [0.44444444, 0.44444444, 0.]
You can see the feature threshold is different. sample_weight also affects the impurity measure in each node. Specifically, in the probability estimates, the first training example is counted the same, the second is counted double, and the third is counted triple, due to the sample weights we've provided.
The impurity in the parent node region is the same. This is just a coincidence. We can compute it directly:
p = Pr(Class=1) = (1+3) / (1+2+3) = 2.0/3.0
The gini measure of 4/9 follows.
Now, you can see from the chosen threshold that the first and second training examples are sent to the left child node, while the third is sent to the right. We see that impurity is calculated to be 4/9 also in the left child node because:
p = Pr(Class=1) = 1 / (1+2) = 1/3.
The impurity of zero in the right child is due to only one training example lying in that region.
You can extend this to non-integer sample weights similarly. I recommend trying something like sample_weight = [1,2,2.5], and confirming the computed impurities.
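If you want to check the impurity values by hand, the following sketch computes the weighted gini measure directly from the definition above, without sklearn. The helper name weighted_gini is hypothetical. For the [1, 2, 2.5] weights it only evaluates the impurities of the two candidate splits; which split the tree actually picks is left for you to confirm.

```python
def weighted_gini(labels, weights=None):
    """Gini impurity where each example contributes its weight to the class probability."""
    if weights is None:
        weights = [1.0] * len(labels)
    total = sum(weights)
    impurity = 1.0
    for k in set(labels):
        p = sum(w for y, w in zip(labels, weights) if y == k) / total
        impurity -= p * p  # for two classes this equals 2*p*(1-p)
    return impurity

Y = [1, 2, 1]  # class labels of the three training examples

# Case 1: no sample_weight, split at 0.5 (first example goes left).
case1 = (weighted_gini(Y), weighted_gini(Y[:1]), weighted_gini(Y[1:]))

# Case 2: sample_weight=[1, 2, 3], split at 1.5 (first two examples go left).
W = [1, 2, 3]
case2 = (weighted_gini(Y, W), weighted_gini(Y[:2], W[:2]), weighted_gini(Y[2:], W[2:]))

# Suggested exercise: sample_weight=[1, 2, 2.5], impurities for the split at 1.5.
W2 = [1, 2, 2.5]
exercise = (weighted_gini(Y, W2), weighted_gini(Y[:2], W2[:2]), weighted_gini(Y[2:], W2[2:]))
```

Here case1 and case2 hold the parent, left, and right impurities and agree with the two impurity arrays printed above.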
