Question(s) regarding computational intensity, prediction of time required to produce a result

Introduction
I have written code that generates a set of numbers in a '36 by q' format (1 <= q <= 36), subject to the following conditions:
Each row must use numbers from 1 to 36.
No number may repeat within a column.
Method
The first row is generated randomly. Each number in a following row is checked against the above conditions. If a number fails to satisfy one of the conditions, it is not picked again for that specific place in that specific row. If the code runs out of acceptable values, it starts over again.
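The method above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: rows have length `size`, the grid has `row_count` rows, and "starts over" is taken to mean restarting the current row. The names `fill_row` and `build_grid` are hypothetical and not taken from the actual code.

```python
import random

def fill_row(previous_rows, size, rng):
    """Try to build one row; return None if a position has no valid number left."""
    row = []
    for col in range(size):
        used_in_column = {r[col] for r in previous_rows}
        # Acceptable values: not yet used in this row or in this column.
        candidates = [v for v in range(1, size + 1)
                      if v not in row and v not in used_in_column]
        if not candidates:
            return None  # ran out of acceptable values
        row.append(rng.choice(candidates))
    return row

def build_grid(size, row_count, rng=None):
    """Return (rows, restarts); assumes a failed row is simply retried."""
    rng = rng or random.Random()
    rows, restarts = [], 0
    while len(rows) < row_count:
        row = fill_row(rows, size, rng)
        if row is None:
            restarts += 1
        else:
            rows.append(row)
    return rows, restarts
```

Returning the restart count makes it possible to measure empirically how many retries small sizes need before attempting the full size.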
Problem
Low q values are fast (q = 15, for example, takes less than a second to compute), but my main objective is q = 36. That run has been going for more than 24 hours on my PC.
Questions
Can I predict the time required by it using the data I have from lower q values? How?
Is there any better algorithm to perform this in less time?
How can I calculate the average number of cycles it requires? (using combinatorics or otherwise).

Can I predict the time required by it using the data I have from lower q values? How?
Usually, you should be able to determine the running time of your algorithm in terms of the input size. Refer to big O notation.
If I understood your question correctly, you shouldn't need hours to compute a 36x36 matrix satisfying your conditions. Most probably you are stuck in an infinite loop or something similar. It would be clearer if you could share a code snippet.
Is there any better algorithm to perform this in less time?
Well, I tried to do what you described, and it works in O(q) (assuming that the number of rows is constant).
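import random

def rotate(arr):
    # Cyclic shift: move the last element to the front.
    return arr[-1:] + arr[:-1]

y = set(range(1, 37))  # numbers not yet placed in a base row
n = 36  # number of rows
q = 36  # numbers per row
res = []
i = 0
while i < n:
    # Build a base row from q distinct random numbers.
    x = []
    for j in range(q):
        if y:
            el = random.choice(list(y))
            y.remove(el)
            x.append(el)
    res.append(x)
    # Each rotation moves every number into a different column.
    for j in range(q - 1):
        x = rotate(x)
        res.append(x)
        i += 1
    i += 1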
Basically, I choose random numbers from the set {1..36} for the i-th row, then rotate that row q-1 times and assign the rotated rows to the next q-1 rows.
This guarantees both conditions you have mentioned.
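A quick way to convince yourself is to check the result programmatically. The sketch below rebuilds the rotation construction in compact form and tests both conditions; the helper names are hypothetical, and it assumes `row_count` does not exceed the number of values (otherwise rotations would start repeating).

```python
import random

def rotated_rows(numbers, row_count, rng=None):
    """Shuffle the numbers once, then emit successive cyclic rotations."""
    rng = rng or random.Random()
    row = list(numbers)
    rng.shuffle(row)
    rows = []
    for _ in range(row_count):
        rows.append(row)
        row = row[-1:] + row[:-1]  # same rotation as in the answer
    return rows

def rows_use_all_numbers(rows, numbers):
    """Condition 1: every row is a permutation of the given numbers."""
    target = sorted(numbers)
    return all(sorted(r) == target for r in rows)

def columns_have_no_repeats(rows):
    """Condition 2: no value appears twice in any column."""
    return all(len(set(col)) == len(col) for col in zip(*rows))
```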
How can I calculate the average number of cycles it requires? (using combinatorics or otherwise).
If you cannot calculate the computation time in terms of the input (because the code is too complex), then fitting a curve to measured times seems to be the right approach.
Or you could create an ML model with iterations as data and the time for each iteration as the label, and perform linear regression. But that seems to be overkill in your example.

Graph q vs. time.
Fit a curve.
Extrapolate to q = 36.
You might want to also graph q vs log(time) as that may give an easier fitted curve.
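As a rough sketch of that idea, the following fits a straight line to log(time) against q with ordinary least squares and extrapolates. It assumes the running time grows roughly exponentially in q, which may not hold for your algorithm; the function names are hypothetical, and you would pass in your own measured timings.

```python
import math

def fit_log_linear(qs, times):
    """Least-squares fit of log(time) = a + b * q; returns (a, b)."""
    logs = [math.log(t) for t in times]
    n = len(qs)
    mean_q = sum(qs) / n
    mean_log = sum(logs) / n
    sxx = sum((q - mean_q) ** 2 for q in qs)
    sxy = sum((q - mean_q) * (l - mean_log) for q, l in zip(qs, logs))
    b = sxy / sxx
    a = mean_log - b * mean_q
    return a, b

def predict_time(a, b, q):
    """Extrapolate the fitted curve back to the time scale."""
    return math.exp(a + b * q)
```

If the points on the q vs. log(time) plot are clearly not on a line, a different model (for example a polynomial in q) would be a better choice, and any extrapolation far beyond the measured range should be treated with caution.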

Related

Finding the optimal selections of x number per column and y numbers per row of an NxM array

Given an NxM array of positive integers, how would one select integers so that the sum of the selected values is maximized, with at most x selections in each row and at most y selections in each column? This is an abstraction of a problem I am facing in making NCAA swimming lineups. Each swimmer has a time in every event, which can be converted to an integer using the USA Swimming Power Points Calculator (the higher the better). Once those times are converted, I want to assign no more than 3 swimmers per event and no more than 3 races per swimmer such that the total sum of power scores is maximized. I think this is similar to the weapon-target assignment (WTA) problem, but that problem allows a weapon type to attack the same target more than once (in my case, allowing a single swimmer to race the same event twice), which does not work for my use case. Does anybody know what this variation on the WTA problem is called, and if so, do you know of any solutions or resources I could look to?

Here is a mathematical model:
Data
Let a[i,j] be the data matrix
and
x: max number of selected cells in each row
y: max number of selected cells in each column
(Note: this is a bit unusual: we normally reserve the names x and y for variables. These conventions can help with readability).
Variables
δ[i,j] ∈ {0,1} are binary variables indicating if cell (i,j) is selected.
Optimization Model
max sum((i,j), a[i,j]*δ[i,j])
sum(j,δ[i,j]) ≤ x ∀i
sum(i,δ[i,j]) ≤ y ∀j
δ[i,j] ∈ {0,1}
This can be fed into any MIP solver.
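To make the model concrete, here is a tiny brute-force sketch in Python that enumerates every 0/1 choice of δ and keeps the best feasible one. It is not a MIP solver and only works for very small arrays (the search space is 2^(N*M)); the function name `best_selection` and the sample matrix are made up for illustration.

```python
from itertools import product

def best_selection(a, x, y):
    """Maximize the sum of selected cells with at most x per row, y per column."""
    n_rows, n_cols = len(a), len(a[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    best_value, best_cells = 0, []
    for bits in product((0, 1), repeat=len(cells)):
        chosen = [cell for cell, bit in zip(cells, bits) if bit]
        row_counts = [0] * n_rows
        col_counts = [0] * n_cols
        for i, j in chosen:
            row_counts[i] += 1
            col_counts[j] += 1
        # Row and column limits from the model.
        if max(row_counts) > x or max(col_counts) > y:
            continue
        value = sum(a[i][j] for i, j in chosen)
        if value > best_value:
            best_value, best_cells = value, chosen
    return best_value, best_cells

# Hypothetical 3x3 data, one selection per row and per column.
scores = [[5, 1, 3],
          [2, 8, 4],
          [7, 6, 9]]
value, cells = best_selection(scores, x=1, y=1)
```

For realistic sizes, the same objective and constraints would be passed to a MIP solver instead of being enumerated.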

Creating a dynamic array with given probabilities in Excel

I want to create a dynamic array that returns me X values based on given probabilities. For instance:
Imagine this is a gift box and you can open the box N times. What I want is to have N random results. For example, I want to get randomly 5 of these two rarities but based on their chances.
I have this following formula for now:
=index(A2:A3,randarray(5,1,1,rows(A2:A3),1))
And this is the output I get:
The problem here is that I have a dynamic array with the 5 results BUT NOT BASED ON THE PROBABILITIES.
How can I add probabilities to the array?

Here is how you could generate a random outcome with defined probabilities for the entries (Google Sheets solution, not sure about Excel):
=ARRAYFORMULA(
  VLOOKUP(
    RANDARRAY(H1, 1),
    {
      {0; OFFSET(C2:C,,, COUNTA(C2:C) - 1)},
      OFFSET(A2:A,,, COUNTA(C2:C))
    },
    2
  )
)
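The same idea (draw a uniform random number and look it up against cumulative probabilities) can be sketched in Python for readers who want to see the logic outside a spreadsheet. The function name `weighted_draws` and the example probabilities are assumptions for illustration only.

```python
import bisect
import itertools
import random

def weighted_draws(items, probabilities, count, rng=random):
    """Draw `count` items independently, each with its given probability."""
    cumulative = list(itertools.accumulate(probabilities))
    total = cumulative[-1]
    draws = []
    for _ in range(count):
        u = rng.random() * total  # uniform in [0, total)
        index = bisect.bisect_right(cumulative, u)
        draws.append(items[min(index, len(items) - 1)])
    return draws

# Hypothetical chances: 90% "Normal", 10% "Rare", box opened 5 times.
results = weighted_draws(["Normal", "Rare"], [0.9, 0.1], 5)
```

Python's standard library also offers `random.choices(items, weights=probabilities, k=count)`, which does the same kind of independent weighted draw.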

This whole subject of random selection was treated very thoroughly in Donald Knuth's series of books, The Art of Computer Programming, Vol. 2, "Seminumerical Algorithms". In that book he presents an algorithm for selecting exactly X out of N items in a list using pseudo-random numbers. What you may not have considered is that after you have chosen your first item, the probability has changed to (X-1)/(N-1) if your first outcome was "Normal" or X/(N-1) if your first outcome was "Rare". This means you'll want to keep track of some running totals based on your prior outcomes to ensure your probabilities are updated with each pick. You can do this with formulas, but I'm not certain how the back-reference will perform inside an array formula. Microsoft's dynamic array documentation indicates that such internal array references are considered "circular" and are prohibited.
In any case, trying to extend this to 3+ outcomes is very problematic. In order to implement that algorithm with 3 choices (X + Y + Z = N picks) you would need to break this up into one random number for an X or not X choice and then a second random number for a Y or not Y choice. This becomes a recursive algorithm, beyond Excel's ability to cope in formulas.
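For the two-outcome case, the sequential selection described in this answer (choose exactly X of N items, updating the probability after every decision) can be sketched in Python. This follows the usual selection-sampling technique as I understand it and is not a transcription from the book; the function name `selection_sample` is hypothetical.

```python
import random

def selection_sample(items, k, rng=random):
    """Select exactly k of the items, scanning the list once in order."""
    chosen = []
    still_needed = k
    still_unseen = len(items)
    for item in items:
        # Select with probability still_needed / still_unseen.
        if rng.random() * still_unseen < still_needed:
            chosen.append(item)
            still_needed -= 1
        still_unseen -= 1
    return chosen
```

The running totals `still_needed` and `still_unseen` play the role of the updated probabilities mentioned above: once everything remaining must be selected the test always passes, and once nothing more is needed it always fails.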

Count Waveform Periods and Calculate Frequency

I have some oscillating time vs. displacement Excel data from an actuator that I need to analyze. My goal is to count the cycles using the number of times the displacement value crosses 0, almost like counting the periods of a sine wave. The problem is that the frequency changes several times throughout the data set, and the displacement may rarely or never be exactly 0. I think if I had a way to count the number of times the displacement value "crossed" zero, I could use every 3 of those points to calculate the waveform's cycles and frequency. I have written a lot of simple things in VBA, but I am by no means an expert, so any help would be appreciated.

I think an easy way to get it done would be to compare the sign of the previous value to the sign of the current value.
For example, value1 = -0.1236, value2 = 0.5482. Between those two values, you know it must have crossed 0. You can have the program check each value in the data and count all the times the sign changes; that should be the number you're looking for.
Example code of how to compare the values (the parentheses matter, because VBA evaluates comparison operators left to right):
If (Current_Value < 0) <> (Previous_Value < 0) Then Counter = Counter + 1
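The same sign-comparison idea, extended to estimate a frequency from every three consecutive crossings as the question suggests, might look like this in Python. It is a sketch under assumptions: samples that are exactly 0 are skipped, a crossing is attributed to the sample after the sign change, and the function names are hypothetical.

```python
def zero_crossing_indices(values):
    """Indices of the first sample after each sign change."""
    indices = []
    previous_sign = 0
    for index, value in enumerate(values):
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # ignore exact zeros; wait for the next signed sample
        if previous_sign != 0 and sign != previous_sign:
            indices.append(index)
        previous_sign = sign
    return indices

def cycle_frequencies(times, values):
    """One frequency estimate per full cycle (three consecutive crossings)."""
    idx = zero_crossing_indices(values)
    return [1.0 / (times[idx[k + 2]] - times[idx[k]])
            for k in range(len(idx) - 2)]
```

Because each estimate uses only its own three crossings, the result is a list of local frequencies, which suits data whose frequency changes over time.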

How to calculate with the Poisson-Distribution in Matlab?

I’ve used Excel in the past, but the calculations involving the Poisson distribution took a while, which is why I switched to SQL. I soon realized that SQL might not be a proper tool for statistical problems. Finally I decided to switch to Matlab, but I’m not used to it at all. My problem is the following:
I’ve imported a .csv-table and have two columns with values, let’s say A and B (110 x 1 double)
These values both are the input values for my Poisson-calculations. Since I wanna calculate for at least the first 20 events, I’ve created a variable z=1:20.
When I now calculated let’s say
New = poisspdf(z,A),
it says something like non-scalar arguments must match in size.
z only has 20 records but A and B both have 110 records. So I’ve expanded Z = 1:110 and transposed it:
Znew = Z.'
When I now try to execute the actual calculation:
Results = poisspdf(Znew,A).*poisspdf(Znew,B)
I always get only a 100x1 vector, but what I want is a 20x20 matrix for each record of A and B (based on my actual choice of z=1:20; I only changed to z=1:110 because Matlab said the sizes need to match).
So in each cell of this 20x20 matrix there should be the result of a slightly different calculation (poisspdf(Znew,A).*poisspdf(Znew,B)).
For example in the first cell (1,1) I want to have the result of
poisspdf(0,value of A).*poisspdf(0,value of B),
in cell(1,2): poisspdf(0,value of A).*poisspdf(1,value of B),
in cell(2,1): poisspdf(1,value of A).*poisspdf(0,value of B),
and so on...assuming that it’s in the Format cell(row, column)
Finally I want to sum up certain parts of each 20x20 matrix and show the result of the summed up parts in new columns.
Is there anybody able to help? Many thanks!
EDIT:
Poisson Matrix in Excel
In Excel there is Poisson-function: POISSON(x, μ, FALSE) = probability density function value f(x) at the value x for the Poisson distribution with mean μ.
In e.g. cell AD313 in the table above there is the following calculation:
=POISSON(0;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
, in cell AD314
=POISSON(1;first value of A;FALSE)*POISSON(0;first value of B;FALSE)
, in cell AE313
=POISSON(0;first value of A;FALSE)*POISSON(1;first value of B;FALSE)
, and so on.

I am not sure if I completely understand your question. I wrote this code that might help you:
clear; clc
% These are the lambdas parameters for the Poisson distribution
lambdaA = 100;
lambdaB = 200;
% Generating Poisson data here
A = poissrnd(lambdaA,110,1);
B = poissrnd(lambdaB,110,1);
% Get the first 20 samples
zA = A(1:20);
zB = B(1:20);
% Perform the calculation
results = repmat(poisspdf(zA,lambdaA),1,20) .* repmat(poisspdf(zB,lambdaB)',20,1);
% Sum
sumFinal = sum(results,2);
Let me know if this is what you were trying to do.
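For comparison, here is a small Python sketch of the matrix as the question describes it: cell (row, column) holds the Poisson probability of `row` events with mean a times the probability of `column` events with mean b, counting from 0. This is my reading of the question and not a translation of the MATLAB code above; the function names and the example means are hypothetical.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def joint_poisson_matrix(mu_a, mu_b, size=20):
    """size x size matrix whose (i, j) entry is pmf(i; mu_a) * pmf(j; mu_b)."""
    pa = [poisson_pmf(k, mu_a) for k in range(size)]
    pb = [poisson_pmf(k, mu_b) for k in range(size)]
    return [[p_i * p_j for p_j in pb] for p_i in pa]

# One matrix per record; 1.5 and 2.0 stand in for one pair of values from A and B.
matrix = joint_poisson_matrix(1.5, 2.0)
upper_triangle = sum(matrix[i][j] for i in range(20) for j in range(i + 1, 20))
```

Summing a region of the matrix, as in the last line, corresponds to the "sum up certain parts" step; repeating this for each pair of values gives one summary number per record.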

CodeJam 2014: Solution for The Repeater

I participated in Code Jam and successfully solved the small input of The Repeater challenge, but I can't figure out an approach for multiple strings.
Can anyone give the algorithm used for multiple strings? For 2 strings (small input) I am comparing the strings character by character and doing operations to make them equal. However, this approach would time out for the large input.
Can someone explain the algorithm they used? I can see the solutions of other users but can't figure out what they have done.

I can tell you my solution which worked fine for both small and large inputs.
First, we have to see if there is a solution. You do that by bringing all strings to their "simplest" form. If any of them does not match, then there is no solution.
e.g.
aaabbbc => abc
abbbbbcc => abc
abbcca => abca
If only the first two were given, then a solution would be possible. As soon as the third is thrown into the mix, then it's impossible. The algorithm to do the "simplification" is to parse the string and eliminate any double character you see. As soon as a string does not equal the simplified form of the batch, bail out.
As for the actual solution to the problem, I simply converted the strings to a [letter, repeat] format. So for example
qwerty => 1q,1w,1e,1r,1t,1y
qqqwweeertttyy => 3q,2w,3e,1r,3t,2y
(mind you the outputs are internal structures, not actual strings)
Imagine now you have 100 strings, you have already passed the test that there is a solution and you have all strings into the [letter, repeat] representation. Now go through every letter and find the least 'difference' of repetitions you have to do, to reach the same number. So for example
1a, 1a, 1a => 0 diff
1a, 2a, 2a => 1 diff
1a, 3a, 10a => 9 diff (to bring everything to 3)
The way to do this (I'm pretty sure there is a more efficient way) is to go from the min number to the max number and calculate the sum of all diffs for each candidate. You are not guaranteed that the best number will be one of the numbers in the set. For the last example, you would calculate the diff to bring everything to 1 (0,2,9 = 11), then to 2 (1,1,8 = 10), then to 3 (2,0,7 = 9), and so on up to 10, and choose the min. Strings are limited to 1000 characters so this is an easy calculation. On my moderate laptop, the results were instant.
Repeat the same for every letter of the strings and sum everything up and that is your solution.
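The steps above can be sketched in Python as follows. This is an illustrative version of the described approach rather than the answerer's actual code; the function names are hypothetical, and the 'Fegla Won' message for the impossible case is taken from the Ruby solution further down this thread.

```python
from itertools import groupby

def run_lengths(s):
    """Split a string into its simplified form and run counts: 'aab' -> ('ab', [2, 1])."""
    letters, counts = [], []
    for letter, group in groupby(s):
        letters.append(letter)
        counts.append(len(list(group)))
    return "".join(letters), counts

def min_moves(strings):
    encoded = [run_lengths(s) for s in strings]
    form = encoded[0][0]
    # No solution if any string simplifies to a different form.
    if any(letters != form for letters, _ in encoded):
        return "Fegla Won"
    total = 0
    # One column per letter position: the repeat counts across all strings.
    for column in zip(*(counts for _, counts in encoded)):
        total += min(sum(abs(c - target) for c in column)
                     for target in range(min(column), max(column) + 1))
    return total
```

For instance, the repeat counts 1, 3 and 10 from the example above contribute 9 moves, reached by bringing everything to 3.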

This answer gives an example to explain why finding the median number of repeats produces the lowest cost.
Suppose we have values:
1 20 30 40 100
And we are trying to find the value which has shortest total distance to all these values.
We might guess the best answer is 50, with cost |50-1|+|50-20|+|50-30|+|50-40|+|50-100| = 159.
Split this into two sums, left and right, where left is the cost of all numbers to the left of our target, and right is the cost of all numbers to the right.
left = |50-1|+|50-20|+|50-30|+|50-40| = 50-1+50-20+50-30+50-40 = 109
right = |50-100| = 100-50 = 50
cost = left + right = 159
Now consider changing the value by x. Providing x is small enough such that the same numbers are on the left, then the values will change to:
left(x) = |50+x-1|+|50+x-20|+|50+x-30|+|50+x-40| = 109 + 4x
right(x) = |50+x-100| = 50 - x
cost(x) = left(x)+right(x) = 159+3x
So if we set x=-1 we will decrease our cost by 3, therefore the best answer is not 50.
The amount our cost changes per unit we move is given by the difference between the number of points to our left (4) and the number of points to our right (1).
Therefore, as long as these are different we can always decrease our cost by moving towards the median.
Therefore the median gives the lowest cost.
If there are an even number of points, such as 1,100 then all numbers between the two middle points will give identical costs, so any of these values can be chosen.
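The argument can be checked numerically with a few lines of Python, using the same values as the example. The helper name `total_distance` is made up for this sketch.

```python
import statistics

def total_distance(values, target):
    """Sum of absolute differences between each value and the target."""
    return sum(abs(v - target) for v in values)

values = [1, 20, 30, 40, 100]
median = statistics.median(values)

# Brute-force search over every integer candidate, for comparison.
best_by_search = min(range(min(values), max(values) + 1),
                     key=lambda target: total_distance(values, target))

cost_at_guess = total_distance(values, 50)      # the initial guess of 50
cost_at_median = total_distance(values, median)
```

Here the brute-force search and the median agree, and moving from 50 to 49 lowers the cost by exactly 3, as derived above.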

Since Thanasis already explained the solution, I'm providing my source code in Ruby here. It's really short (only 400B) and follows his algorithm exactly.
def solve(strs)
  form = strs.first.squeeze
  strs.map { |str|
    return 'Fegla Won' if form != str.squeeze
    str.chars.chunk { |c| c }.map { |arr|
      arr.last.size
    }
  }.transpose.map { |row|
    Range.new(*row.minmax).map { |n|
      row.map { |r|
        (r - n).abs
      }.reduce :+
    }.min
  }.reduce :+
end
gets.to_i.times { |i|
  result = solve gets.to_i.times.map { gets.chomp }
  puts "Case ##{i+1}: #{result}"
}
It uses the squeeze method on strings, which collapses each run of repeated consecutive characters into a single character. This way, you just compare every squeezed line to the reference (variable form). If there's an inconsistency, you return 'Fegla Won'.
Next it uses the chunk method on the char array, which groups consecutive identical characters, so the length of each run can be counted easily.
