The intuition behind the GlobalSpanCoefficient - python-3.x

I am looking at the vehicle routing problem which minimizes the cost of "the slowest truck" in a fleet.
So now the objective function should involve two quantities:
the sum of all transitions of all vehicles (total distance), and
the cost of the most expensive route
How are these values combined? I am assuming that the global span coefficient
distance_dimension.SetGlobalSpanCostCoefficient(100)
is involved? Is that the coefficient of a weighted sum
cost = w*A + (100-w)*B
where A is the cost of the slowest truck and B is the total distance of all trucks?

No it's simply: cost = B + A
with B = sum of all edge cost in the routes (usually set by using routing.SetArcCostEvaluatorOfAllVehicles(arc_cost_callback))
and A = w * (max{end} - min{start})
note: B is needed to help solver to find a first good solution (otherwise strategy like CHEAPEST_PATH behave strangely since there is no cost on edge to choose the cheapest...), While A helps to "distribute" jobs by minimizing the Max cumul Var. but it's still not a real dispersion incentive
e.g. supposing a dimension with cumul_start = 0 and 4 routes with cost 0,0,6,6 it is as good as 2,2,2,6 (or 6,6,6,6 but B is higher here).
i.e. max(cumul_end) == 6 in both cases.
I added a section on GlobalSpan here in the doc.
ps: take a look at https://github.com/google/or-tools/issues/685#issuecomment-388503464
pps: in the doc example try to change maximum_distance = 3000 by 1800 or 3500 if I remember well ;)
ppps: Note than you can have several GlobalSpan on several dimensions and objective is just the sum of all this costs multiply by their respective coefficient...

Related

How do I get the Gross Value Added (GVA) of countries/industries from MRIO models?

I want to calculate the Gross Value Added for different countries and industries using multi-regional input-output (MRIO) tables. However, I struggle to find a good explanation of how this is done based on the data available. The definition of the GVA (Gross Value Added) is the output of a country/industry less the intermediate consumption, and it is related to the GDP by:
GVA = GDP + subsidies - taxes
So far, I have used the "extensions" or "satellite accounts" that provide the Value Added (VA) disaggregated across different flows, i.e. example from Exiobase in the picture. The VA is the sum of all 12 to my understanding. However, based on the definition of the GVA, I have subtracted 1-3 since these are taxes (so GVA = sum of line 4-12). To me, this seems like the correct approach, but I have not succeeded in finding an explanation that could confirm/disprove. I also become uncertain due to the naming of the extension, i.e. "value added" sounding like "gross value added". Does anyone know the correct way of doing this?
Finally, in MRIO x is termed "gross output" being the total output to final demand + intermediate consumption:
x = Ax + y (Ax = intermediate, y = demand)
or
x = (I-A)^-1 * y = L*y (L = Leontif inverse/requirement matrix)
Does this mean that I can also derive the GVAs from x by subtracting the intermediate consumption? In my mind, this will just leave me with "y", but there might be a another smart way?
Thanks in advance!
From what I understand, yes you can !
You have to differentiate Z = Ax summed along its rows or along its columns
x - rowSum(Z) is the GVA.
x - colSum(Z) is the total final demand.
Regarding Exiobase, I don't have a real answer.
I found that summing all lines of the VA (keeping lines 1-3), I get "quasi" the same results as subtracting the row sum of Z to x.
Which is stange...

Finding the optimal selections of x number per column and y numbers per row of an NxM array

Given an NxM array of positive integers, how would one go about selecting integers so that the maximum sum of values is achieved where there is a maximum of x selections in each row and y selections in each column. This is an abstraction of a problem I am trying to face in making NCAA swimming lineups. Each swimmer has a time in every event that can be converted to an integer using the USA Swimming Power Points Calculator the higher the better. Once you convert those times, I want to assign no more than 3 swimmers per event, and no more than 3 races per swimmer such that the total sum of power scores is maximized. I think this is similar to the Weapon-targeting assignment problem but that problem allows a weapon type to attack the same target more than once (in my case allowing a single swimmer to race the same event twice) and that does not work for my use case. Does anybody know what this variation on the wta problem is called, and if so do you know of any solutions or resources I could look to?
Here is a mathematical model:
Data
Let a[i,j] be the data matrix
and
x: max number of selected cells in each row
y: max number of selected cells in each column
(Note: this is a bit unusual: we normally reserve the names x and y for variables. These conventions can help with readability).
Variables
δ[i,j] ∈ {0,1} are binary variables indicating if cell (i,j) is selected.
Optimization Model
max sum((i,j), a[i,j]*δ[i,j])
sum(j,δ[i,j]) ≤ x ∀i
sum(i,δ[i,j]) ≤ y ∀j
δ[i,j] ∈ {0,1}
This can be fed into any MIP solver.

Question(s) regarding computational intensity, prediction of time required to produce a result

Introduction
I have written code to give me a set of numbers in '36 by q' format ( 1<= q <= 36), subject to following conditions:
Each row must use numbers from 1 to 36.
No number must repeat itself in a column.
Method
The first row is generated randomly. Each number in the coming row is checked for the above conditions. If a number fails to satisfy one of the given conditions, it doesn't get picked again fot that specific place in that specific row. If it runs out of acceptable values, it starts over again.
Problem
Unlike for low q values (say 15 which takes less than a second to compute), the main objective is q=36. It has been more than 24hrs since it started to run for q=36 on my PC.
Questions
Can I predict the time required by it using the data I have from lower q values? How?
Is there any better algorithm to perform this in less time?
How can I calculate the average number of cycles it requires? (using combinatorics or otherwise).
Can I predict the time required by it using the data I have from lower q values? How?
Usually, you should be able to determine the running time of your algorithm in terms of input. Refer to big O notation.
If I understood your question correctly, you shouldn't spend hours computing a 36x36 matrix satisfying your conditions. Most probably you are stuck in the infinite loop or something. It would be more clear of you could share code snippet.
Is there any better algorithm to perform this in less time?
Well, I tried to do what you described and it works in O(q) (assuming that number of rows is constant).
import random
def rotate(arr):
return arr[-1:] + arr[:-1]
y = set([i for i in range(1, 37)])
n = 36
q = 36
res = []
i = 0
while i < n:
x = []
for j in range(q):
if y:
el = random.choice(list(y))
y.remove(el)
x.append(el)
res.append(x)
for j in range(q-1):
x = rotate(x)
res.append(x)
i += 1
i += 1
Basically, I choose random numbers from the set of {1..36} for the i+q th row, then rotate the row q times and assigned these rotated rows to the next q rows.
This guarantees both conditions you have mentioned.
How can I calculate the average number of cycles it requires?( Using combinatorics or otherwise).
I you cannot calculate the computation time in terms of input (code is too complex), then fitting to curve seems to be right.
Or you could create an ML model with iterations as data and time for each iteration as label and perform linear regression. But that seems to be overkill in your example.
Graph q vs time
Fit a curve,
Extrapolate to q = 36.
You might want to also graph q vs log(time) as that may give an easier fitted curve.

what is the formula of sentiment calculation

what is the actual formula to compute sentiments using sentiment rated lexicon. the lexicon that I am using contains rating between the range -5 to 5. I want to compute sentiment for individual sentences. Either i have to compute average of all sentiment ranked words in sentence or only sum up them.
There are several methods for computing an index from scored sentiment components of sentences. Each is based on comparing positive and negative words, and each has advantages and disadvantages.
For your scale, a measure of the central tendency of the words would be a fair measure, where the denominator is the number of scored words. This is a form of the "relative proportional difference" measure employed below. You would probably not want to divide the total sentiment words' scores by all words, since this makes each sentence's measure strongly affected by non-sentiment terms.
If you do not believe that the 11 point rating you describe is accurate, you could just classify it as positive or negative depending on its sign. Then you could apply the following methods where you have transformed
where each P and N refer to the counts of the Positive and Negative coded sentiment words, and O is the count of all other words (so that the total number of words = P + N + O).
Absolute Proportional Difference. Bounds: [0,1]
Sentiment = (P − N) / (P + N + O)
Disadvantage: A sentence's score is affected by non-sentiment-related content.
Relative Proportional Difference. Bounds: [-1, 1]
Sentiment = (P − N) / (P + N)
Disadvantage: A sentence's score may tend to cluster very strongly near the scale endpoints (because they may contain content primarily or exclusively of either positive or negative).
Logit scale. Bounds: [-infinity, +infinity]
Sentiment = log(P + 0.5) - log(N + 0.5)
This tends to have the smoothest properties and is symmetric around zero. The 0.5 is a smoother to prevent log(0).
For details, please see William Lowe, Kenneth Benoit, Slava Mikhaylov, and Michael Laver. (2011) "Scaling Policy Preferences From Coded Political Texts." Legislative Studies Quarterly 26(1, Feb): 123-155. where we compare their properties for measuring right-left ideology, but everything we discuss also applies to positive-negative sentiment.
you can use R tool for sentiment computation. here is the link you can refer to:
https://sites.google.com/site/miningtwitter/questions/sentiment/analysis

CodeJam 2014: Solution for The Repeater

I participated in code jam, I successfully solved small input of The Repeater Challenge but can't seem to figure out approach for multiple strings.
Can any one give the algorithm used for multiple strings. For 2 strings ( small input ) I am comparing strings character by character and doing operations to make them equal. However this approach would time out for large input.
Can some one explain their algorithm they used. I can see solutions of other users but can't figure out what have they done.
I can tell you my solution which worked fine for both small and large inputs.
First, we have to see if there is a solution, you do that by bringing all strings to their "simplest" form. If any of them does not match, there there is no solution.
e.g.
aaabbbc => abc
abbbbbcc => abc
abbcca => abca
If only the first two were given, then a solution would be possible. As soon as the third is thrown into the mix, then it's impossible. The algorithm to do the "simplification" is to parse the string and eliminate any double character you see. As soon as a string does not equal the simplified form of the batch, bail out.
As for actual solution to the problem, i simply converted the strings to a [letter, repeat] format. So for example
qwerty => 1q,1w,1e,1r,1t,1y
qqqwweeertttyy => 3q,2w,3e,1r,3t,2y
(mind you the outputs are internal structures, not actual strings)
Imagine now you have 100 strings, you have already passed the test that there is a solution and you have all strings into the [letter, repeat] representation. Now go through every letter and find the least 'difference' of repetitions you have to do, to reach the same number. So for example
1a, 1a, 1a => 0 diff
1a, 2a, 2a => 1 diff
1a, 3a, 10a => 9 diff (to bring everything to 3)
the way to do this (i'm pretty sure there is a more efficient way) is to go from the min number to the max number and calculate the sum of all diffs. You are not guaranteed that the number will be one of the numbers in the set. For the last example, you would calculate the diff to bring everything to 1 (0,2,9 =11) then for 2 (1,1,8 =10), the for 3 (2,0,7 =9) and so on up to 10 and choose the min again. Strings are limited to 1000 characters so this is an easy calculation. On my moderate laptop, the results were instant.
Repeat the same for every letter of the strings and sum everything up and that is your solution.
This answer gives an example to explain why finding the median number of repeats produces the lowest cost.
Suppose we have values:
1 20 30 40 100
And we are trying to find the value which has shortest total distance to all these values.
We might guess the best answer is 50, with cost |50-1|+|50-20|+|50-30|+|50-40|+|50-100| = 159.
Split this into two sums, left and right, where left is the cost of all numbers to the left of our target, and right is the cost of all numbers to the right.
left = |50-1|+|50-20|+|50-30|+|50-40| = 50-1+50-20+50-30+50-40 = 109
right = |50-100| = 100-50 = 50
cost = left + right = 159
Now consider changing the value by x. Providing x is small enough such that the same numbers are on the left, then the values will change to:
left(x) = |50+x-1|+|50+x-20|+|50+x-30|+|50+x-40| = 109 + 4x
right(x) = |50+x-100| = 50 - x
cost(x) = left(x)+right(x) = 159+3x
So if we set x=-1 we will decrease our cost by 3, therefore the best answer is not 50.
The amount our cost will change if we move is given by difference between the number to our left (4) and the number to our right (1).
Therefore, as long as these are different we can always decrease our cost by moving towards the median.
Therefore the median gives the lowest cost.
If there are an even number of points, such as 1,100 then all numbers between the two middle points will give identical costs, so any of these values can be chosen.
Since Thanasis already explained the solution, I'm providing here my source code in Ruby. It's really short (only 400B) and following his algorithm exactly.
def solve(strs)
form = strs.first.squeeze
strs.map { |str|
return 'Fegla Won' if form != str.squeeze
str.chars.chunk { |c| c }.map { |arr|
arr.last.size
}
}.transpose.map { |row|
Range.new(*row.minmax).map { |n|
row.map { |r|
(r - n).abs
}.reduce :+
}.min
}.reduce :+
end
gets.to_i.times { |i|
result = solve gets.to_i.times.map { gets.chomp }
puts "Case ##{i+1}: #{result}"
}
It uses a method squeeze on strings, which removes all the duplicate characters. This way, you just compare every squeezed line to the reference (variable form). If there's an inconsistency, you just return that Fegla Won.
Next you use a chunk method on char array, which collects all consecutive characters. This way you can count them easily.

Resources