MDP Policy Iteration example calculations - dynamic-programming

I am new to RL and am following the lectures from UWaterloo. In lecture 3a on Policy Iteration, the professor gave an example of an MDP in which a company must choose between the actions Advertise (A) and Save (S) in the states Poor Unknown (PU), Poor Famous (PF), Rich Famous (RF) and Rich Unknown (RU), as shown in the MDP transition diagram below.
For the second iteration, n=1, the state value of "Rich and Famous" is shown as 54.2. I am not able to reproduce this value with the Policy Iteration algorithm.
My calculation goes as follows:
V_2(RF) = V_1(RF) + gamma * Sum_s'[ p(s'|s,a) * V(s') ]
For Save action,
V_2(RF) = 10 + 0.9 * [0.5*10 + 0.5 * 10] = 19
What am I missing here?

I think I found the answer. V here is not the result of a single value update per iteration (as in value iteration) but the value of following the current policy. Hence, we need to solve the linear system
V = (I - gamma*P)^-1 * R ; matrix inverse method
In Octave, for the second iteration with the policy actions "ASSS", the values are:
octave:32> A=eye(4) - 0.9*[0.5 0.5 0 0; 0.5 0 0.5 0;0 0 0.5 0.5;0.5 0 0 0.5]
A =
0.5500 -0.4500 0 0
-0.4500 1.0000 -0.4500 0
0 0 0.5500 -0.4500
-0.4500 0 0 0.5500
octave:35> B=[0;0;10;10]
B =
0
0
10
10
octave:36> A\B
ans =
31.585
38.604
54.202
44.024
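The difference between one backup and a full policy evaluation can also be seen by repeating the backup until it stops changing. The following is a minimal Python sketch, not the lecture's code: it swaps the matrix inverse for iterative policy evaluation, assumes the state order PU, PF, RF, RU, and reuses the transition matrix and rewards from the Octave session above. The function name evaluate_policy is made up for illustration.

```python
def evaluate_policy(P, R, gamma, tol=1e-10, max_sweeps=10000):
    """Iterative policy evaluation: repeat V <- R + gamma * P V until it converges."""
    n = len(R)
    V = [0.0] * n
    for _ in range(max_sweeps):
        # One backup for every state under the fixed policy.
        V_new = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V

# Transition matrix under the policy "ASSS" (rows: PU, PF, RF, RU; assumed order).
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.5],
]
R = [0.0, 0.0, 10.0, 10.0]

V = evaluate_policy(P, R, gamma=0.9)
print([round(v, 3) for v in V])
```

A single backup from V = R gives 19 for RF, as in the question; only the converged values should agree with the Octave result (about 54.2 for RF).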


Different output on Function on assessment (Hackerrank) (FIXED)

So, for this assessment on HackerRank I need to calculate the ratios for a given input. I wrote the code, and whenever I run it in my own shell it gives me the right output, but in the HackerRank compiler it appears to round the results up to a full integer.
My own code is below. The test input =
6
-4 3 -9 0 4 1
My Code =
def plusMinus(arr):
    array = []
    positive_int = 0
    negative_int = 0
    zero_int = 0
    total_int = 0
    # removing the first int from input & appending the rest of the input to the empty array
    # Error was in the next line: removing the "[1:]" slice gave me the right output
    for i in arr[1:]:
        array.append(i)
    total_int = len(array)
    # checking every integer in array for positive or negative and counting them
    for i in array:
        if i >= 1:
            positive_int += 1
        elif i < 0:
            negative_int += 1
        elif i == 0:
            zero_int += 1
    # calculate the ratio of the negative and positive integers
    pos_ratio = positive_int / total_int
    negative_ratio = negative_int / total_int
    zero_ratio = zero_int / total_int
    # format to six decimal places
    formatted_pos_ratio = "{:.6f}".format(pos_ratio)
    formatted_negative_ratio = "{:.6f}".format(negative_ratio)
    formatted_zero_ratio = "{:.6f}".format(zero_ratio)
    # printing out the ratios
    print(str(formatted_pos_ratio))
    print(str(formatted_negative_ratio))
    print(str(formatted_zero_ratio))
Output in Hackerrank Compiler =
0.600000
0.200000
0.200000
and in my own shell it's:
0.500000
0.333333
0.166667
What am I doing wrong? (Bear in mind, I just started coding, so my code is most probably not the best way to solve this.)
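For comparison, here is a shorter sketch of the same calculation without the slice. It assumes, as the fix noted in the code suggests, that the function receives only the list of numbers and that the leading count (6) is consumed elsewhere. The name plus_minus_ratios and the tuple return value are illustrative choices, not part of the original task.

```python
def plus_minus_ratios(arr):
    """Return the positive, negative and zero fractions of arr as 6-decimal strings."""
    total = len(arr)
    positives = sum(1 for x in arr if x > 0)
    negatives = sum(1 for x in arr if x < 0)
    zeros = total - positives - negatives
    return tuple("{:.6f}".format(count / total)
                 for count in (positives, negatives, zeros))

# The six numbers from the test input above.
for line in plus_minus_ratios([-4, 3, -9, 0, 4, 1]):
    print(line)
```

With all six numbers counted this prints 0.500000, 0.333333 and 0.166667. Dropping the first element leaves five numbers (3 positive, 1 negative, 1 zero), which is where 0.600000, 0.200000 and 0.200000 come from.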

What does it mean when ECOS can't solve my SOCP with ~25,000 optimization variables?

tl;dr On a convex optimization problem with about 25,000 variables, ECOS runs to max_iters and terminates with the following error:
SolverError: Solver 'ECOS' failed. Try another solver, or solve with verbose=True for more information.
What does this mean?
I am trying to solve a convex optimization problem in cvxpy, where the setup is as follows:
# <table> is a contingency table with 3 columns where the first two columns are unique item ids, and the third column describes the frequency of co-occurrence
import numpy as np
import cvxpy as cp
theta = cp.Variable([196, 10], nonneg=True)
phi = cp.Variable([10], nonneg=True)
Q = cp.Parameter([2548, 10], nonneg=True)
Q.value = np.ones([2548, 10])/10
obj_func = 0
for m, row in enumerate(table):
    i, j, freq = row
    obj_func += freq * Q[m,:] * (cp.log(theta[i,:]) + cp.log(theta[j,:]) + cp.log(phi)- cp.log(Q[m,:]))
objective = cp.Maximize(obj_func)
constraints = [
    cp.sum(phi) == 1,
    cp.sum(theta, axis=0) == 1,
]
problem = cp.Problem(objective, constraints)
opt_val = problem.solve()
When run with verbose=True and max_iters=500, the output looks like:
ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS
It pcost dcost gap pres dres k/t mu step sigma IR | BT
0 +0.000e+00 -1.108e+05 +1e+06 1e+00 1e+00 1e+00 1e+00 --- --- 0 0 - | - -
1 -1.571e+04 -1.265e+05 +1e+06 7e-01 1e+00 1e+00 9e-01 0.2387 5e-01 2 2 2 | 0 2
2 -7.070e+04 -1.814e+05 +8e+05 8e-01 1e+00 2e+00 7e-01 0.3791 3e-01 1 2 2 | 1 0
3 -1.869e+05 -2.975e+05 +5e+05 9e-01 1e+00 2e+00 4e-01 0.6988 5e-01 2 3 2 | 4 1
...
497 +4.782e+08 +4.782e+08 +4e-07 2e-03 5e-12 3e-04 3e-13 0.3208 9e-01 1 1 0 | 16 5
498 +4.782e+08 +4.782e+08 +4e-07 2e-03 5e-12 3e-04 3e-13 0.9791 1e+00 2 1 0 | 27 0
499 +4.782e+08 +4.782e+08 +4e-07 2e-03 5e-12 3e-04 3e-13 0.5013 1e+00 1 1 0 | 21 3
500 +4.782e+08 +4.782e+08 +4e-07 2e-03 5e-12 3e-04 3e-13 0.9791 1e+00 1 1 0 | 30 0
Maximum number of iterations reached, recovering best iterate (497) and stopping.
RAN OUT OF ITERATIONS (reached feastol=1.6e-03, reltol=8.3e-16, abstol=4.0e-07).
Runtime: 314.146930 seconds.
As far as I can tell this is a perfectly standard convex optimization problem. However, when I run ECOS on it, I reach max_iters without it converging. Repeating with max_iters = 500 (as compared to the default of 67) did not solve the issue.
My question is, why does this happen? What is ECOS trying to tell me? Is my problem infeasible? Is it just that there are too many variables to handle?
Hazarding a guess, I suspect this comes down to scaling. The primal and dual costs are very close together and the gap is small. Maybe the solver tolerances are set too tight for this instance?
Things to try:
Try rescaling your formulation, so that all involved constants have similar magnitudes.
Have a look at the solution computed by ECOS. It might very well be ok - in that case you'd just have to adjust the termination criteria of the solver.
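One way to "have a look at the solution" is to measure how far the returned iterate is from satisfying the constraints stated in the question. The sketch below is only an illustration: the helper name max_constraint_violation is made up, it takes plain nested lists, and it assumes the solver left values on the variables (for example via theta.value.tolist() and phi.value.tolist()), which may not be the case when a SolverError is raised. The small arrays are toy data, not results from the actual problem.

```python
def max_constraint_violation(theta, phi):
    """Largest violation of theta >= 0, phi >= 0, sum(phi) == 1, column sums of theta == 1."""
    n_cols = len(theta[0])
    col_sums = [sum(row[k] for row in theta) for k in range(n_cols)]
    violations = [abs(sum(phi) - 1.0)]                          # sum(phi) == 1
    violations += [abs(s - 1.0) for s in col_sums]              # sum(theta, axis=0) == 1
    violations += [max(0.0, -x) for row in theta for x in row]  # theta >= 0
    violations += [max(0.0, -x) for x in phi]                   # phi >= 0
    return max(violations)

# Toy data standing in for the solver output (hypothetical values).
theta = [[0.5, 0.25], [0.5, 0.75]]
phi = [0.4, 0.6]
print(max_constraint_violation(theta, phi))
```

If the largest violation is around the feastol reported in the log (1.6e-03) and that is acceptable for the application, relaxing the termination criteria may be enough; if not, rescaling is the more promising route.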

special case of collision detection

I am studying collision detection for two balls in one dimension.
Suppose the first one is at position 0 with a velocity of 5 pixels per frame,
and the second one is at position 3 with a velocity of -5 pixels per frame.
Then in the next frame the first ball will move to position 5 and the second one will jump to position -2.
In that frame the balls do not overlap, so an overlap-based collision check will miss the collision.
How can I handle this case? Here is a picture for explanation:
You can estimate the trajectories and check for a collision:
Trajectory intersection in python
But in your case the movement is 1D and linear, so you can compute the time of collision directly:
pos0 + vel0*t = pos1 + vel1*t
0 + 5*t = 3 - 5*t
10*t = 3
t = 3/10
t = 0.3
So the collision occurs after 0.3 of a frame. You can also take the radii of your objects into account to improve the accuracy of the time. If you also want the collision position, it is:
pos0 + vel0*t = 0 + 5*0.3 = 1.5
pos1 + vel1*t = 3 - 5*0.3 = 1.5
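The same calculation, extended with the radii mentioned above, can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the question: both velocities are constant within a frame, a result of None means "no contact during this frame", and the function name and parameters are made up.

```python
def time_of_collision_1d(pos0, vel0, r0, pos1, vel1, r1):
    """Return the fraction of the frame (0..1) at which the balls first touch, or None."""
    gap = abs(pos1 - pos0) - (r0 + r1)         # free space between the two surfaces
    if gap <= 0:
        return 0.0                             # already touching or overlapping
    direction = 1.0 if pos1 > pos0 else -1.0
    closing_speed = (vel0 - vel1) * direction  # positive when the balls approach
    if closing_speed <= 0:
        return None                            # moving apart or in parallel
    t = gap / closing_speed
    return t if t <= 1.0 else None             # ignore contacts after this frame

# Point-like balls from the question: positions 0 and 3, velocities 5 and -5.
t = time_of_collision_1d(0, 5, 0, 3, -5, 0)
print(t)          # 0.3
print(0 + 5 * t)  # 1.5, the collision position
```

With non-zero radii the contact happens earlier, because the surfaces meet before the centres do.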

search for specific word in text file in Matlab

I want to search for a specific word in a text file and return its position. This code reads the text fine...
fid = fopen('jojo-1 .txt','r');
while 1
    tline = fgetl(fid);
    if ~ischar(tline)
        break
    end
end
but when I add this code
U = strfind(tline, 'Term');
it returns [] although the string 'Term' exists in the file.
Can you please help me?
For me, it works fine:
strfind(' ertret Term ewrwerewr', 'Term')
ans =
9
Are you sure that 'Term' is really in your line?
I believe the ~ischar(tline) check is what causes the trouble: the loop only breaks once tline is no longer a char array (fgetl returns -1 at the end of the file), so a strfind called after that point has nothing to search.
So the major change I made is to search for the string on each line that was identified as actually containing characters.
I tried a slightly modified version of your code on my text file:
yyyy/mmdd(or -ddd)/hh.h):2011/-201/10.0UT geog Lat/Long/Alt= 50.0/ 210.0/2000.0
NeQuick is used for topside Ne profile
URSI maps are used for the F2 peak density (NmF2)
CCIR maps are used for the F2 peak height (hmF2)
IRI-95 option is used for D-region
ABT-2009 option is used for the bottomside thickness parameter B0
The foF2 STORM model is turned on
Scotto-97 no L option is used for the F1 occurrence probability
TBT-2011 option is used for the electron temperature
RBY10+TTS03 option is used for ion composition
Peak Densities/cm-3: NmF2= 281323.9 NmF1= 0.0 NmE= 2403.3
Peak Heights/km: hmF2= 312.47 hmF1= 0.00 hmE= 110.00
Solar Zenith Angle/degree 109.6
Dip (Magnetic Inclination)/degree 65.76
Modip (Modified Dip)/degree 55.06
Solar Sunspot Number (12-months running mean) Rz12 57.5
Ionospheric-Effective Solar Index IG12 63.3
TEC [1.E16 m-2] is obtained by numerical integration in 1km steps
from 50 to 2000.0 km. t is the percentage of TEC above the F peak.
-
H ELECTRON DENSITY TEMPERATURES ION PERCENTAGES/% 1E16m-2
km Ne/cm-3 Ne/NmF2 Tn/K Ti/K Te/K O+ N+ H+ He+ O2+ NO+ Clust TEC t/%
0.0 -1 -1.000 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 7.7 75
5.0 -1 -1.000 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 7.7 75
10.0 -1 -1.000 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 7.7 75
It is the output of an ionospheric model, but that is not important :)
So I used the following Matlab code to find where the string TEMPERATURES is:
out = fopen('fort.7'); % open the file
counter = 0; % line counter (sloppy but works)
U = []; % stays empty if the string is never found
while 1 % loop until found or end of file
    tline = fgetl(out); % read a line
    if ~ischar(tline) % fgetl returns -1 at the end of the file
        break % nothing left to read
    end
    counter = counter + 1; % we are one line further
    U = strfind(tline, 'TEMPERATURES'); % where the string starts (if at all)
    if ~isempty(U) % there is a match on this line
        break % we found it, let's go home
    end
end
fclose(out); % release the file handle
results:
counter = 26
U = 27
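For readers who want the same search outside Matlab, here is a rough Python equivalent. It is only a sketch: the function name find_word is made up, and both indices are returned 1-based to match the Matlab convention used above.

```python
def find_word(lines, word):
    """Return (line_number, column) of the first occurrence, both 1-based, or None."""
    for line_number, line in enumerate(lines, start=1):
        column = line.find(word)  # -1 when the word is not on this line
        if column != -1:
            return line_number, column + 1
    return None

# Any iterable of strings works, including an open file object.
sample = ["first line", " ertret Term ewrwerewr", "last line"]
print(find_word(sample, "Term"))  # (2, 9)
```

Passing an open file (for example inside a with open(...) block) instead of the list searches the file line by line without loading it all at once.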

Metric 3d reconstruction

I'm trying to reconstruct 3D points from 2D image correspondences. My camera is calibrated. The test images are of a checkered cube and the correspondences are hand picked. Radial distortion is removed. After triangulation, however, the reconstruction seems to be wrong. The X and Y values seem to be correct, but the Z values are all about the same and do not vary along the cube. The 3D points look as if they were flattened along the Z-axis.
What is going wrong in the Z values? Do the points need to be normalized or converted from image coordinates at any point, say before the fundamental matrix is computed? (If this is too vague I can explain my general process or elaborate on parts.)
Update
Given:
x1 = P1 * X and x2 = P2 * X
x1, x2 being the first and second image points and X being the 3d point.
However, I have found that x1 is not close to the actual hand picked value but x2 is in fact close.
How I compute projection matrices:
P1 = [eye(3), zeros(3,1)];
P2 = K * [R, t];
Update II
Calibration results after optimization (with uncertainties)
% Focal Length: fc = [ 699.13458 701.11196 ] ± [ 1.05092 1.08272 ]
% Principal point: cc = [ 393.51797 304.05914 ] ± [ 1.61832 1.27604 ]
% Skew: alpha_c = [ 0.00180 ] ± [ 0.00042 ] => angle of pixel axes = 89.89661 ± 0.02379 degrees
% Distortion: kc = [ 0.05867 -0.28214 0.00131 0.00244 0.35651 ] ± [ 0.01228 0.09805 0.00060 0.00083 0.22340 ]
% Pixel error: err = [ 0.19975 0.23023 ]
%
% Note: The numerical errors are approximately three times the standard
% deviations (for reference).
-
K =
699.1346 1.2584 393.5180
0 701.1120 304.0591
0 0 1.0000
E =
0.3692 -0.8351 -4.0017
0.3881 -1.6743 -6.5774
4.5508 6.3663 0.2764
R =
-0.9852 0.0712 -0.1561
-0.0967 -0.9820 0.1624
0.1417 -0.1751 -0.9743
t =
0.7942
-0.5761
0.1935
P1 =
1 0 0 0
0 1 0 0
0 0 1 0
P2 =
-633.1409 -20.3941 -492.3047 630.6410
-24.6964 -741.7198 -182.3506 -345.0670
0.1417 -0.1751 -0.9743 0.1935
C1 =
0
0
0
1
C2 =
0.6993
-0.5883
0.4060
1.0000
% new points using cpselect
%x1
input_points =
422.7500 260.2500
384.2500 238.7500
339.7500 211.7500
298.7500 186.7500
452.7500 236.2500
412.2500 214.2500
368.7500 191.2500
329.7500 165.2500
482.7500 210.2500
443.2500 189.2500
402.2500 166.2500
362.7500 143.2500
510.7500 186.7500
466.7500 165.7500
425.7500 144.2500
392.2500 125.7500
403.2500 369.7500
367.7500 345.2500
330.2500 319.7500
296.2500 297.7500
406.7500 341.2500
365.7500 316.2500
331.2500 293.2500
295.2500 270.2500
414.2500 306.7500
370.2500 281.2500
333.2500 257.7500
296.7500 232.7500
434.7500 341.2500
441.7500 312.7500
446.2500 282.2500
462.7500 311.2500
466.7500 286.2500
475.2500 252.2500
481.7500 292.7500
490.2500 262.7500
498.2500 232.7500
%x2
base_points =
393.2500 311.7500
358.7500 282.7500
319.7500 249.2500
284.2500 216.2500
431.7500 285.2500
395.7500 256.2500
356.7500 223.7500
320.2500 194.2500
474.7500 254.7500
437.7500 226.2500
398.7500 197.2500
362.7500 168.7500
511.2500 227.7500
471.2500 196.7500
432.7500 169.7500
400.2500 145.7500
388.2500 404.2500
357.2500 373.2500
326.7500 343.2500
297.2500 318.7500
387.7500 381.7500
356.2500 351.7500
323.2500 321.7500
291.7500 292.7500
390.7500 352.7500
357.2500 323.2500
320.2500 291.2500
287.2500 258.7500
427.7500 376.7500
429.7500 351.7500
431.7500 324.2500
462.7500 345.7500
463.7500 325.2500
470.7500 295.2500
491.7500 325.2500
497.7500 298.2500
504.7500 270.2500
Update III
See answer for corrections. Answers computed above were using the wrong variables/values.
Note: all references are to Multiple View Geometry in Computer Vision by Hartley and Zisserman.
OK, so there were a couple of bugs:
When computing the essential matrix (p. 257-259) the author mentions that the correct R,t pair from the set of four R,t (Result 9.19) is the one where the 3D points lie in front of both cameras (Fig. 9.12, a) but doesn't mention how one computes this. By chance I was re-reading chapter 6 and discovered that 6.2.3 (p. 162) discusses the depth of points, and Result 6.1 is the equation that needs to be applied to get the correct R and t.
In my implementation of the optimal triangulation method (Algorithm 12.1, p. 318), in step 2 I had T2^-1' * F * T1^-1 where I needed to have (T2^-1)' * F * T1^-1. The former transposes the -1, whereas I wanted, as in the latter, to transpose the inverted T2 matrix (foiled again by MATLAB!).
Finally, I wasn't computing P1 correctly; it should have been P1 = K * [eye(3),zeros(3,1)];. I forgot to multiply by the calibration matrix K.
Hope this helps future passersby!
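The effect of the missing K in P1 can be illustrated with a small Python sketch. It uses the K printed in Update II; the 3D point X is a made-up example, and the helper names are chosen only for illustration. Without K the first camera returns normalized coordinates, which would explain why the reprojected x1 was far from the hand-picked pixel values while x2 was close.

```python
def mat_mul(A, B):
    """Multiply two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def project(P, X):
    """Project a homogeneous 3D point X (length 4) with a 3x4 matrix P; return (u, v)."""
    x = [sum(P[i][j] * X[j] for j in range(4)) for i in range(3)]
    return x[0] / x[2], x[1] / x[2]

# Calibration matrix from Update II.
K = [[699.1346, 1.2584, 393.5180],
     [0.0, 701.1120, 304.0591],
     [0.0, 0.0, 1.0]]
I0 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0]]

P1_without_K = I0           # what the question used
P1_with_K = mat_mul(K, I0)  # the corrected P1 = K * [I | 0]

X = [0.1, -0.05, 2.0, 1.0]  # hypothetical 3D point in front of camera 1
print(project(P1_without_K, X))  # normalized coordinates, not pixels
print(project(P1_with_K, X))     # pixel coordinates near the principal point
```

Only the second projection is on the same scale as the hand-picked points in input_points.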
It may be that your points are in a degenerate configuration. Try to add a couple of points from the scene that don't belong to the cube and see how it goes.
More information required:
What is t? The baseline might be too small for parallax.
What is the disparity between x1 and x2?
Are you confident about the accuracy of the calibration (I'm assuming you used the Stereo part of the Bouguet Toolbox)?
When you say the correspondences are hand-picked, do you mean you selected the corresponding points on the images, or did you use an interest point detector on the two images and then set the correspondences?
I'm sure we can resolve this problem :)
