Gerrit prolog rule - setting accumulative voting adequately - linux

I want to create the following rule:
A patch should become submittable only when it has three or more +1 votes. A +2 vote must not make it submittable: only +1 votes count toward this criterion.
The rule that I have is:
% rule : 1+1+1=2 Code-Review
% rationale : introduce accumulative voting to determine if a change
% is submittable or not and make the change submittable
% if the total score is 3 or higher.
sum_list([], 0).
sum_list([H | Rest], Sum) :- sum_list(Rest,Tmp), Sum is H + Tmp.
add_category_min_score(In, Category, Min, P) :-
findall(X, gerrit:commit_label(label(Category,X),R),Z),
sum_list(Z, Sum),
Sum >= Min, !,
gerrit:commit_label(label(Category, V), U),
V >= 1,
!,
P = [label(Category,ok(U)) | In].
add_category_min_score(In, Category,Min,P) :-
P = [label(Category,need(Min)) | In].
submit_rule(S) :-
gerrit:default_submit(X),
X =.. [submit | Ls],
gerrit:remove_label(Ls,label('Code-Review',_),NoCR),
add_category_min_score(NoCR,'Code-Review', 3, Labels),
S =.. [submit | Labels].
This rule does not work at all; the problem is with the +2 vote.
How can I rework this rule so that it behaves as I want?

So you want at least three reviewers to give +1, and +2 should not be allowed.
What if you remove the developers' right to give +2 in the project config and use example 13 from the Prolog cookbook with small modifications?
submit_rule(submit(CR)) :-
sum(3, 'Code-Review', CR).
% Verified check from the cookbook example, left disabled here:
% gerrit:max_with_block(-1, 1, 'Verified', V).
% Sum the votes in a category. Uses a helper function score/2
% to select out only the score values the given category.
sum(VotesNeeded, Category, label(Category, ok(_))) :-
findall(Score, score(Category, Score), All),
sum_list(All, Sum),
Sum >= VotesNeeded,
!.
sum(VotesNeeded, Category, label(Category, need(VotesNeeded))).
score(Category, Score) :-
gerrit:commit_label(label(Category, Score), _User).
% Simple Prolog routine to sum a list of integers.
sum_list(List, Sum) :- sum_list(List, 0, Sum).
sum_list([X|T], Y, S) :- Z is X + Y, sum_list(T, Z, S).
sum_list([], S, S).

Related

How to build a Gray-code generator in Picat?

Encouraged by the knowledge I've gained from the answer to my previous post, I aim to generate Gray-codes of given length. The procedure hamming seems to work correctly, however, the Picat system finds no solution. Where's the mistake here?
import cp.
main => gray(2).
gray(CodeLen) =>
CodeNr is 2**CodeLen,
Codes = new_array(CodeNr, CodeLen),
Codes :: 0..1,
foreach(CodeNr1 in 1..CodeNr)
CodeNr2 = cond(CodeNr1 == CodeNr, 1, CodeNr1 + 1),
hamming(Codes[CodeNr1], Codes[CodeNr2], 0, H),
H #= 1
% the Hamming distance between 2 consecutive codes is 1
end,
solve(Codes),
printf("%w\n", Codes).
hamming([], [], A, H) ?=> H #= A.
hamming([H1|T1], [H2|T2], A, H) ?=>
H1 #!= H2,
A1 #= A + 1,
hamming(T1, T2, A1, H).
hamming([H1|T1], [H2|T2], A, H) ?=>
H1 #= H2,
A1 #= A + 0,
hamming(T1, T2, A1, H).
The reason that the model doesn't print anything is that you are using list constructs ([H|T]) on the array matrix Codes, which is not allowed. You have to convert the rows of the matrix (which are arrays) to lists. This can be done in two ways:
Convert the array matrix Codes to a list matrix with array_matrix_to_list_matrix() (requires that the util module is loaded):
import util.
% ....
gray(CodeLen) =>
CodeNr is 2**CodeLen,
Codes = new_array(CodeNr, CodeLen).array_matrix_to_list_matrix, % <--
Codes :: 0..1,
% ....
Convert the array parameters in the call to hamming/4 to lists with the to_list() function. E.g.:
% ...
foreach(CodeNr1 in 1..CodeNr)
CodeNr2 = cond(CodeNr1 == CodeNr, 1, CodeNr1 + 1),
% hamming(Codes[CodeNr1], Codes[CodeNr2], 0, H), % Original
hamming(Codes[CodeNr1].to_list, Codes[CodeNr2].to_list, 0, H), % <---
H #= 1
% the Hamming distance between 2 consecutive codes is 1
end,
% ...
Update.
Here's a constraint model that solves the problem of generating different rows that was indicated in the comment. It uses a simpler version of hamming_distance that just counts the number of differing bits with sum. Also, for symmetry, I require that the first and last rows also have a Hamming distance of 1. (This was in the original code.)
To require different rows, the constraint to_num/3 is used to relate each row of digits to the number it encodes (given a base, here 2). These numbers (which must be distinct) are in the CodesNum list.
import cp,util.
main =>
go.
go ?=>
gray(5),
nl,
% fail,
nl.
go => true.
% First solution for N=2..10
go2 ?=>
foreach(N in 2..10)
println(n=N),
if time(gray(N)) then
true
else
println(nope)
end,
nl
end,
nl.
go2 => true.
gray(CodeLen) =>
CodeNr is 2**CodeLen,
println(codeNr=CodeNr),
Codes = new_array(CodeNr, CodeLen).array_matrix_to_list_matrix,
Codes :: 0..1,
CodesNum = new_list(CodeNr), % array -> integer
CodesNum :: 0..CodeNr,
foreach(CodeNr1 in 1..CodeNr)
to_num(Codes[CodeNr1],2,CodesNum[CodeNr1]),
CodeNr2 = cond(CodeNr1 == CodeNr, 1, CodeNr1 + 1),
hamming_distance(Codes[CodeNr1], Codes[CodeNr2], 1),
end,
% around the corner
% hamming_distance(Codes[1], Codes[CodeNr],1),
all_different(CodesNum),
CodesNum[1] #= 0, % symmetry breaking
Vars = CodesNum ++ Codes.vars,
solve($[ff,updown],Vars),
printf("%w\n", Codes),
println(codesNum=CodesNum),nl.
% Hamming distance of As and Bs
hamming_distance(As, Bs,Diff) =>
Diff #= sum([(A #!= B) : {A,B} in zip(As,Bs)]).
% Convert Num to/from a list of digits in List (base Base)
to_num(List, Base, Num) =>
Len = length(List),
Num #= sum([List[I]*Base**(Len-I) : I in 1..Len]).
to_num(List, Num) =>
to_num(List, 10, Num).
It solves N=4 in 0s:
n = 4
codeNr = 16
[[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0],[1,1,1,1],[1,1,0,1],[1,0,0,1],[1,0,1,1],[1,0,1,0],[0,0,1,0],[0,1,1,0],[0,1,1,1],[0,0,1,1],[0,0,0,1],[0,1,0,1],[0,1,0,0]]
codesNum = [0,8,12,14,15,13,9,11,10,2,6,7,3,1,5,4]
CPU time 0.0 seconds.
The model solves N=2..7 (first solution) quite fast, but it struggles with N=8, and I don't have the time to test different search heuristics to make it faster.
Here's another approach for solving the Gray code problem, without constraint modelling, and it's much faster: http://hakank.org/picat/gray_code.pi
Update2 Here's a much faster version of hamming/4. It uses a reification (boolean) variable B to check whether H1 and H2 are different; B can then be used as the value to add to A0.
hamming2([], [], A, A).
hamming2([H1|T1], [H2|T2], A0, H) :-
B :: 0..1,
H1 #!= H2 #<=> B #= 1,
A1 #= A0 + B,
hamming2(T1, T2, A1, H).
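For comparison with the constraint model, here is a minimal Python sketch of one well-known direct construction, the reflected binary Gray code, where the i-th code is i XOR (i >> 1). This is an illustration only: it is not taken from the linked Picat program, and the names gray_codes and hamming_distance are chosen here for the example.

```python
def gray_codes(code_len):
    """Return the reflected binary Gray code of the given length as bit lists."""
    codes = []
    for i in range(2 ** code_len):
        g = i ^ (i >> 1)  # standard reflected Gray code of i
        # Most significant bit first, to match the row layout used above.
        codes.append([(g >> (code_len - 1 - bit)) & 1 for bit in range(code_len)])
    return codes


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    codes = gray_codes(4)
    for row in codes:
        print(row)
    # Consecutive rows, including last -> first, should differ in one bit.
    n = len(codes)
    print(all(hamming_distance(codes[k], codes[(k + 1) % n]) == 1 for k in range(n)))
```

Unlike the constraint model, this produces exactly one particular Gray code rather than searching for any valid one, which is why it needs no solver.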

Calculating a custom probability distribution in python (numerically)

I have a custom (discrete) probability distribution defined somewhat in the form: f(x)/(sum(f(x')) for x' in a given discrete set X). Also, 0<=x<=1.
So I have been trying to implement it in python 3.8.2, and the problem is that the numerator and denominator both come out to be really small and python's floating point representation just takes them as 0.0.
After calculating these probabilities, I need to sample a random element from an array, whose each index may be selected with the corresponding probability in the distribution. So if my distribution is [p1,p2,p3,p4], and my array is [a1,a2,a3,a4], then probability of selecting a2 is p2 and so on.
So how can I implement this in an elegant and efficient way?
Is there any way I could use the np.random.beta() in this case? Since the difference between the beta distribution and my actual distribution is only that the normalization constant differs and the domain is restricted to a few points.
Note: The Probability Mass function defined above is actually in the form given by the Bayes theorem and f(x)=x^s*(1-x)^f, where s and f are fixed numbers for a given iteration. So the exact problem is that, when s or f become really large, this thing goes to 0.
You could well compute things by working with logs. The point is that while both the numerator and denominator might underflow to 0, their logs won't unless your numbers are really astonishingly small.
You say
f(x) = x^s*(1-x)^t
so
logf (x) = s*log(x) + t*log(1-x)
and you want to compute, say
p = f(x) / Sum{ y in X | f(y)}
so
p = exp( logf(x) - log sum { y in X | f(y)})
= exp( logf(x) - log sum { y in X | exp( logf( y))})
The only difficulty is in computing the second term, but this is a common problem, usually called logsumexp.
On the other hand, computing logsumexp is easy enough to do by hand.
We want
S = log( sum{ i | exp(l[i])})
if L is the maximum of the l[i] then
S = log( exp(L)*sum{ i | exp(l[i]-L)})
= L + log( sum{ i | exp( l[i]-L)})
The last sum can be computed as written, because each term is now between 0 and 1 so there is no danger of overflow, and one of the terms (the one for which l[i]==L) is 1, and so if other terms underflow, that is harmless.
This may however lose a little accuracy. A refinement would be to recognize the set A of indices where
l[i]>=L-eps (eps a user set parameter, eg 1)
And then compute
N = Sum{ i in A | exp(l[i]-L)}
B = log1p( Sum{ i not in A | exp(l[i]-L)}/N)
S = L + log( N) + B
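The basic version of the recipe above (without the eps refinement) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: every x lies strictly between 0 and 1 so that both logs are finite, and the function names log_f, logsumexp and probabilities are chosen here for illustration. Sampling uses random.choices from the standard library, which accepts a weights argument.

```python
import math
import random


def log_f(x, s, t):
    """log of f(x) = x**s * (1 - x)**t, assuming 0 < x < 1."""
    return s * math.log(x) + t * math.log(1.0 - x)


def logsumexp(logs):
    """log(sum(exp(l) for l in logs)), computed by factoring out the maximum."""
    big = max(logs)
    # Every exponent is <= 0, so no overflow; the largest term is exp(0) = 1.
    return big + math.log(sum(math.exp(l - big) for l in logs))


def probabilities(xs, s, t):
    """Normalised probabilities f(x) / sum(f(y)) computed in log space."""
    logs = [log_f(x, s, t) for x in xs]
    total = logsumexp(logs)
    return [math.exp(l - total) for l in logs]


if __name__ == "__main__":
    xs = [0.1, 0.5, 0.9]
    items = ["a1", "a2", "a3"]
    probs = probabilities(xs, s=5000, t=3000)
    print(probs)
    # Draw one element of items with the corresponding probability.
    print(random.choices(items, weights=probs, k=1)[0])
```

With s=5000 and t=3000 the direct products underflow to 0.0 in floating point, while the log-space version still returns probabilities that sum to 1.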

Rank elements based off two variables

I want to rank all the entities in a list based on two variables (both percentages). One of the variables is 'the bigger the better' (x) and the other is 'the smaller the better' (y). What is the best way to give each entity a score in order to rank them?
I tried doing x*(1-y) but as some of the y values are over 1, the negatives it created caused some errors.
Below is the data:
x y
a 0.953882755 0.926422663
b 0.757267676 0.926967001
c 1 1.01607838
d 0.89805254 1.008814817
e 0.672989727 0.932579014
f 0.643306278 0.924523932
g 0.621091809 0.935122957
h 0.56891321 0.918181342
i 0.563662125 0.924102288
j 0.579410248 0.946421415
k 0.781299906 1.040418561
l 0.490013047 0.920900829
m 0.475050754 0.932586282
n 0.505211144 0.972570665
o 0.566582462 1.009732948
p 0.610994363 1.031047605
q 0.686065983 1.060742126
r 0.47642017 0.983301498
s 0.463552006 0.976645044
t 0.551532341 1.025816246
u 0.478092524 1.012675037
v 0.645790431 1.084143812
w 0.390365014 1.189518019
Two ways: averaged ranking, or sorting by distance from the min and max.
Average ranking:
Use =RANK.AVG() on x and y separately (x in descending order, y in ascending order, so that rank 1 is the best in both). Take the average of the two ranks, then rank again based on that average.
Sort by distance from min and max:
Enter '=(B2-MIN(B:B)) + (MAX(C:C)-C2)' and drag it downwards. Then use =RANK.AVG() on the results; the larger the value (the distance of x from its minimum plus the distance of y from its maximum), the better.
Hope that helps.
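The averaged-ranking approach can also be sketched outside a spreadsheet. The following minimal Python sketch mimics the tie handling of RANK.AVG (tied values share the average of their positions) and uses the first four rows of the data above; the helper name rank_avg and the choice to rank x in descending and y in ascending order are assumptions made for this example.

```python
def rank_avg(values, descending=False):
    """Rank values so that 1 is best; ties get the average of their positions."""
    ranks = []
    for v in values:
        if descending:
            better = sum(1 for w in values if w > v)
        else:
            better = sum(1 for w in values if w < v)
        ties = sum(1 for w in values if w == v)
        ranks.append(better + (ties + 1) / 2)
    return ranks


def averaged_ranking(data):
    """data maps name -> (x, y); x is bigger-is-better, y is smaller-is-better."""
    names = list(data)
    x_rank = rank_avg([data[n][0] for n in names], descending=True)
    y_rank = rank_avg([data[n][1] for n in names], descending=False)
    mean_rank = [(rx + ry) / 2 for rx, ry in zip(x_rank, y_rank)]
    # Rank the averages again; a lower average rank is better.
    return dict(zip(names, rank_avg(mean_rank)))


if __name__ == "__main__":
    sample = {
        "a": (0.953882755, 0.926422663),
        "b": (0.757267676, 0.926967001),
        "c": (1.0, 1.01607838),
        "d": (0.89805254, 1.008814817),
    }
    print(averaged_ranking(sample))
```

Because only ranks are combined, y values above 1 cause no trouble here, unlike the x*(1-y) score from the question.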

How to generate a list of available steps on a grid?

I have a 5x5 grid which is described by map_size(5, 5). I need to generate a list of all cells from that description using DCG.
Here's the code I have so far:
:- use_module(library(clpfd)).
map_size(5, 5).
natnum(0).
natnum(X) :-
X #= X0 + 1,
natnum(X0).
list_all_cells(Visited) -->
{ length(Visited, 25) },
[].
list_all_cells(Visited) -->
[X-Y],
{ map_size(X_max, Y_max),
natnum(X), natnum(Y),
X #< X_max, Y #< Y_max,
maplist(dif(X-Y), Visited) },
list_all_cells([X-Y|Visited]).
However, it doesn't generate a list and outputs only 4 pairs.
A possible query to the DCG looks like list_all_cells([]) which is supposed to list all cells on the grid. For example, it's gonna be [0-0, 1-0, 1-1, 0-1] for a 2x2 grid (order doesn't matter).
In fact, I need this predicate to build another one called available_steps/2 that would generate a list of all possible moves for a given position. Having available_steps(CurrentPos, Visited), I will be able to brute-force Hunt the Wumpus game and find all possible routes to gold.
list_all_cells(Cells) :-
bagof(C,cell(C),Cells).
cell(X-Y) :-
between(0,4,X),
between(0,4,Y).
Example run:
?- list_all_cells(Cells); true.
Cells= [0-0, 0-1, 0-2, 0-3, 0-4, 1-0, 1-1, 1-2, ... - ...|...] [write] % The letter w was pressed.
Cells= [0-0, 0-1, 0-2, 0-3, 0-4, 1-0, 1-1, 1-2, 1-3, 1-4, 2-0, 2-1, 2-2, 2-3, 2-4, 3-0, 3-1, 3-2, 3-3, 3-4, 4-0, 4-1, 4-2, 4-3, 4-4] ;
true.

Understanding BK Trees: How do we derive the (d-n, d+n) range from the triangle inequality?

Reading this post about BK Trees, I found the following snippet a bit confusing:
"Assume for a moment we have two parameters, query, the string we are using in our search, and n the maximum distance a string can be from query and still be returned. Say we take an arbitary string, test and compare it to query. Call the resultant distance d. Because we know the triangle inequality holds, all our results must have at most distance d+n and at least distance d-n from test."
I can intuitively see that if something is d away from the word I'm searching with and I have a tolerance of n error, then I will need at least d-n distance from the word I'm at to "reverse" the differences. Similarly I can have at most d+n because after "reversing" the differences, I can introduce n more differences.
I'm confused how the triangle inequality was used to get this. If we let d(test, query) = d and d(query, found) <= n then by the inequality:
d(test, query) + d(test, nextWordToSearch) >= d(query, found)
d + d(test, nextWordToSearch) >= n
How can we find
d - n <= d(test, nextWordToSearch) <= d + n
Using @templatetypedef's answer, I was able to use the triangle inequality to find both the upper and the lower bound.
d(query, desiredNode) = n
d(query, test) = d1
d(query, test) + d(test, desiredNode) >= d(query, desiredNode)
d1 + d(test, desiredNode) >= n
d(test, desiredNode) >= |n - d1|
d(test, query) + d(query, desiredNode) >= d(test, desiredNode)
|d1 + n| >= d(test, desiredNode)
Hence:
|d1 + n| >= d(test, desiredNode) >= |d1 - n|
Absolute values used because of property of non-negative measure.
In what follows, let d be the distance from the query word to the test word (the word at the current node) and n be the maximum distance you're willing to search. You're interested in proving that
n - d ≤ d(test, anyResultingWord) ≤ n + d.
The math you used in your question involving the triangle inequality is sufficient to establish the lower bound. I think the reason you're having trouble with the upper bound is that you don't actually want to use the triangle inequality here.
You actually don't need to use the triangle inequality to get the upper bound, and in fact probably shouldn't!
Remember that d(x, y) is defined as the Levenshtein distance between x and y, which is the minimum number of insertions, deletions, or substitutions necessary to turn x into y. We want to upper bound d(test, anyResultingWord) at n + d. To do that, notice the following. Starting with the test word, you can convert it to any resulting word as follows:
Start by converting it back to the query word, which takes d edits.
Convert the query word to the resulting word, which takes at most n edits.
Overall, this gives a series of at most n + d total edits that convert the test word to the result word. This might be the best way to do it, but it might not be. What we can say is that d(test, anyResultingWord) must be at most n + d, since we know we can convert the test word to a resulting word in at most n + d edits. This is where the upper bound comes from: it's not a consequence of the triangle inequality as much as a consequence of how the distance metric is defined.
First of all you have to understand that d obeys the triangle inequality. Let's prove this by contradiction:
Suppose that for some three strings a, b and c we have d(a,c)>d(a,b)+d(b,c). But we can transform a into c in d(a,b)+d(b,c) steps by first turning a into b and then b into c, and d(a,c) is the minimum number of steps, so it cannot be larger than that: a contradiction. This is why d obeys the triangle inequality, d(a,c)<=d(a,b)+d(b,c).
Let's now imagine how search through that tree goes. We have a search function f that takes as input Q -- a query and N -- max distance.
Question: Why does that function need to look at edges that are in the segment [d-n,d+n]?
Let's now introduce a couple of other strings. Let x be a string, such that d(Q,x)<=n, let t be the current node we are examining. Clearly, in the above notation d meant d(Q,t).
So, to reformulate the above question, we can ask:
Why d(Q,t)-n<=d(t,x)<=d(Q,t)+n?
For simplicity, let's denote d(Q,t) as a, d(t,x) as b, and d(Q,x) as c.
It is clear from the triangle inequality that
1. a+b>=c => b>=c-a
2. a+c>=b
3. b+c>=a => b>=a-c
From 1. and 3. we can see that b>=|a-c|, and 2. gives b<=a+c. So, to put everything together, we get |a-c|<=b<=a+c.
Now, this is not the end of the proof: we still have to use the fact that 0<=c<=N.
This can be easily done like this:
a-N<=a-c<=|a-c|<=b<=a+c<=a+N => a-N<=b<=a+N and since b>=0, we have max(a-N,0)<=b<=a+N.
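To see how the bound is used in practice, here is a minimal Python sketch of a BK-tree search that only follows child edges whose label k satisfies d-n <= k <= d+n, where d is the distance from the query to the current node. The class and function names (BKTree, levenshtein) and the tuple-based node layout are choices made for this illustration, not taken from the post being discussed.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children), where children maps distance -> node."""

    def __init__(self, words):
        words = iter(words)
        self.root = (next(words), {})
        for word in words:
            self.add(word)

    def add(self, word):
        node = self.root
        while True:
            dist = levenshtein(word, node[0])
            if dist == 0:
                return  # word is already stored
            child = node[1].get(dist)
            if child is None:
                node[1][dist] = (word, {})
                return
            node = child

    def search(self, query, n):
        """Return all stored words within distance n of query."""
        results, stack = [], [self.root]
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d <= n:
                results.append(word)
            # Any match x under the edge labelled k has d(word, x) = k,
            # and the bounds above force d - n <= k <= d + n.
            for k, child in children.items():
                if d - n <= k <= d + n:
                    stack.append(child)
        return results


if __name__ == "__main__":
    tree = BKTree(["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"])
    print(sorted(tree.search("book", 1)))
```

Subtrees whose edge label falls outside the range are skipped entirely, which is the saving over comparing the query against every stored word.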
