Find largest set of lines which are parallel but not colinear - geometry

I have the 5 lines below, with each line representing a, b, c in ax + by + c = 0:
1 0 0
1 2 3
3 4 5
30 40 0
30 40 50
I want to find the largest set of non-colinear parallel lines among these. The result in the above case will be:
set of 2 lines
3 4 5
30 40 0
The brute-force approach would be to go through all possible pairs, which is O(n^2) (n(n-1)/2 comparisons), updating the largest possible set after each iteration.
Is there any way to find the set size faster?

A solution is to transform each line into (angle, distance from origin) coordinates. Finding the largest set is then O(n log n).
Assuming that a and b are never both 0, find the distance to the origin d using d = c/|(a,b)| (where |(a,b)| = sqrt(a^2 + b^2)). Then find the angle θ using θ = atan2(b, a). You then have a list of coordinates that looks like this:
[[θ0,d0],
[θ1,d1],
...
]
Sort this list using θ as the key.
Remove all elements that you consider colinear, given a threshold. Simply scan the list and check whether pairs of consecutive elements have approximately the same values. Do not forget to test the last element against the first element to account for 360° = 0°. If you encounter a colinear pair, remove one of the elements.
Using a minimum and a maximum index, both starting at 0, increase the maximum index until the difference between the first angle (at the min index) and the last angle (at the max index) exceeds the angle tolerance you accept as "parallel" (again, remember that 359.999° is close to 0°). If the size of the set (max index - min index) is bigger than your current best set, record it as the current best. Then increase the minimum index by one and keep increasing the maximum index until the angle-difference test fails again. Continue until the minimum index reaches the end of the list, and do not forget to let the maximum index wrap back to 0 to handle the cases close to 0° and 360°.
To make it easier to find the elements in the user provided list, you can add the original index to the transformed list, e.g., [[θ0,d0,0],[θ1,d1,1],...].
Some implementation details to consider to avoid making it accidentally O(n^2): removing an element from a contiguous array is O(n), so instead of removing colinear elements every time you encounter one, note the index in a separate list and recreate the array in a second pass. If you instead use a linked list, the min/max indices should be replaced by iterators to avoid the O(n) random access of indexing into the list.
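A minimal Python sketch of this approach (the function name and tolerances are illustrative; it skips the 0°/360° wrap-around described above and assumes the (a,b) normals are given with a consistent sign):
import math

def largest_parallel_set(lines, angle_tol=1e-9, dist_tol=1e-9):
    # Transform each line (a, b, c) into (normal angle, signed distance, original index).
    pts = []
    for idx, (a, b, c) in enumerate(lines):
        norm = math.hypot(a, b)
        pts.append((math.atan2(b, a), c / norm, idx))  # idx kept to map back to the input
    pts.sort()
    # Drop colinear duplicates: consecutive entries with the same angle and distance.
    filtered = []
    for p in pts:
        if filtered and abs(p[0] - filtered[-1][0]) < angle_tol and abs(p[1] - filtered[-1][1]) < dist_tol:
            continue
        filtered.append(p)
    # Slide a min/max index over the sorted angles to find the largest parallel group.
    best, lo = 0, 0
    for hi in range(len(filtered)):
        while filtered[hi][0] - filtered[lo][0] > angle_tol:
            lo += 1
        best = max(best, hi - lo + 1)
    return best

lines = [(1, 0, 0), (1, 2, 3), (3, 4, 5), (30, 40, 0), (30, 40, 50)]
print(largest_parallel_set(lines))  # 2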

Time complexity of my backtracking to find the optimal solution of the maximum sum of non-adjacent elements

I'm trying to backtrack through the dynamic-programming memoization for the maximum sum of non-adjacent elements, in order to construct the optimal solution that gives the max sum.
Background:
Say the input list is [1,2,3,4,5]
The memoization should be [1,2,4,6,9]
And my maximum sum is 9, right?
My solution:
I find the first occurrence of the max sum in the memo (as we may not choose the last item) [this is O(N)]
Then I find the previous item chosen by using this formula:
max_sum -= a_list[index]
In this example, 9 - 5 = 4; 4 is at index 2 in the memo, so the previous item chosen is 3, which is also at index 2 in the input list.
I find the first occurrence of 4, which is at index 2 (I look for the first occurrence for the same reason as in step 1: we may not have chosen that item in cases where several equal values appear together) [also O(N), but...]
The issue:
The third step of my solution is done in a while loop. Say the non-adjacency constraint is 1; the maximum number of times we have to backtrack when the list length is 5 is 3, approximately N//2 times.
But the 3rd step uses Python's index function, memo.index(that_previous_sum), to find the first occurrence of the previous sum [which is O(N)].
So the total time complexity is about O(N//2 * N)
Which is O(N^2) !!!
Am I correct on the time complexity? Or am I wrong? Is there a more efficient way to backtrack the memoization list?
P.S. Sorry for the formatting if I've done it wrong, thanks!
Solved:
I looped from the back, checking whether the item in front is the same or not.
If it's the same, it's not the first occurrence; if not, it is the first occurrence.
Tada! No Python index() call searching from the front; we now find it from the back.
So the total time complexity is now O(N//2 + 1), which is O(N).
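For reference, a small Python sketch of an O(N) backtracking over such a memo table; it also walks from the back, though by comparing memo[i] with memo[i-1] rather than searching for values (names are illustrative):
def max_nonadjacent_sum_with_items(a):
    # Build the memo table: memo[i] = best sum using a[0..i] with no two adjacent picks.
    n = len(a)
    memo = [0] * n
    memo[0] = a[0]
    if n > 1:
        memo[1] = max(a[0], a[1])
    for i in range(2, n):
        memo[i] = max(memo[i - 1], memo[i - 2] + a[i])

    # Backtrack from the end in O(N): if memo[i] == memo[i-1], a[i] was skipped;
    # otherwise a[i] was taken and we jump two positions back.
    chosen, i = [], n - 1
    while i >= 0:
        if i >= 1 and memo[i] == memo[i - 1]:
            i -= 1
        else:
            chosen.append(a[i])
            i -= 2
    return memo[-1], chosen[::-1]

print(max_nonadjacent_sum_with_items([1, 2, 3, 4, 5]))  # (9, [1, 3, 5])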

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
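A throwaway Python sketch of this counting argument (not part of the original answer):
def halvings_until_one(n):
    # Count how many times n can be halved before only one element remains.
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

for n in (1_000_000, 1_000_000_000):
    print(n, halvings_until_one(n))  # 19 and 29, i.e. roughly log2(n) decisions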
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
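If you want to convince yourself empirically, here is a small (illustrative) Python simulation that averages the comparison count over every possible successful search:
import math

def comparisons(v, val):
    # Count how many loop iterations binary search needs to find val in the sorted list v.
    lo, hi, count = 0, len(v) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if v[mid] == val:
            return count
        if val > v[mid]:
            lo = mid + 1
        else:
            hi = mid - 1
    return count

n = 1 << 16                      # 65536 elements
v = list(range(n))
avg = sum(comparisons(v, x) for x in v) / n
print(avg, math.log2(n))         # average ~15 vs log2(n) = 16: equal up to a constant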
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater, then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So, binary search is a dichotomic divide-and-conquer search algorithm. It therefore takes logarithmic time to perform the search, since the number of candidate elements is halved in each iteration.
For sorted lists, on which we can do a binary search, each "decision" compares your key to the middle element: if it is greater, the search takes the right half of the list; if it is less, it takes the left half (and if it's a match, the element at that position is returned). You effectively reduce your list by half with every decision, yielding O(log n).
Binary search, however, only works for sorted lists. For unsorted lists you can do a straight (linear) search starting with the first element, yielding a complexity of O(n).
O(logn) < O(n)
Which approach is best, though, depends entirely on how many searches you'll be doing, your inputs, etc.
For Binary search the prerequisite is a sorted array as input.
• As the list is sorted:
• Certainly we don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 numbers below: v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5<8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle number again, narrow down to 3 5
• Then we stop when we're down to one number: 5
How many comparisons are needed? 4 = floor(log2(9)) + 1, which is O(log2 n)
#include <vector>
using std::vector;

// Returns the index of val in the sorted vector v, or -1 if it is not present.
int binary_search (const vector<int>& v, int val) {
    int from = 0;
    int to = (int)v.size() - 1;
    while (from <= to) {
        int mid = from + (to - from) / 2;   // avoids overflow of from + to
        if (val == v[mid])
            return mid;                     // found: return its position
        else if (val > v[mid])
            from = mid + 1;                 // search the right half
        else
            to = mid - 1;                   // search the left half
    }
    return -1;                              // not found
}

dynamic programming - topcoder

I've been trying out the dp tutorials on Topcoder. One of the problems given for practice was MiniPaint. I think I've got the solution partly: find the minimum no. of mispaints for a given no. of strokes for each row, and then compute for the entire picture (again using dp, similar to the knapsack problem). However, I'm not sure how to compute the min. no. for each row.
P.S. I later found the match editorial, but the code for finding the min. no. of mispaintings for each row seems wrong. Could someone explain exactly what they've done in the code?
The stripScore() function returns the minimum number of mispaintings for a row given the number of strokes available to paint it. Although I'm not sure whether the rowid argument is correct, the idea is that it starts at a particular region of a row, with a given number of strokes still available to use and the colour of the region directly before it.
The key to this algorithm is that the best score for the area to the right of the k-th region is uniquely determined by the number of strokes needed and the color used to paint the (k-1)-th region.
Intuition
I have been bashing my head against this problem for 3 days straight, not realising that it requires two consecutive uses of dynamic programming. My approaches, in contrast to the ones available from Topcoder, are bottom-up.
To start with, instead of calculating the minimum number of mispaints I can achieve, I will instead calculate the maximum number of cells I can paint with maxStrokes strokes. The result can easily be calculated by subtracting my findings from the total cells of my matrix. But how can I really do that? The initial observation has to be the fact that each row can yield me some painted cells in exchange for a number of strokes. This does not depend on the rest of the rows. That means that, for each row, I can calculate the maximum number of cells I can paint on that specific row, with a certain number of strokes.
Example
Input=['BBWWB','WBWWW'], maxStrokes=3
Let's now look at the first row, BBWWB, and denote by C the maximum number of cells I can paint with Q strokes:
Q C
0 0 (I can't paint with 0 strokes)
1 3 (BBWWB)
2 4 (BBWWB)
3 5 (BBWWB)
We could easily represent the above results with an array of length 4 that stores for each index (stroke) the maximum number of cells that can be painted, namely [0,3,4,5]
It's easy to see that the second row in the same manner would have an array [0,4,4,5].
The result can now easily be calculated from these two arrays alone, as what we're looking for is a combination of two choices, one from each calculated array, that yields the highest number of cells I can paint with 3 strokes. What are my choices, though? Each item of an array represents the maximum number of cells I can paint with index strokes. So, for the first array, one choice would be to paint 4 cells with 2 strokes.
I could then combine that choice with the second array's 1st item, 4, which means I can paint 4 cells with 1 stroke. My final result would be 4+4=8 cells with 2+1=3 strokes, which happens to be the best I can get. The output would then trivially be 2*5-8=2 minimum mispaints. However, we need an efficient way to calculate the different combinations of items from each row and the sums they can yield.
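A tiny brute-force check of this combination step in Python (purely illustrative; the actual algorithm below avoids enumerating all combinations):
from itertools import product

row0 = [0, 3, 4, 5]   # max cells paintable in row 0 ('BBWWB') with 0..3 strokes
row1 = [0, 4, 4, 5]   # max cells paintable in row 1 ('WBWWW') with 0..3 strokes
max_strokes = 3

best = max(row0[s0] + row1[s1]
           for s0, s1 in product(range(4), repeat=2)
           if s0 + s1 <= max_strokes)
print(best, 2 * 5 - best)  # 8 cells painted, 2 mispaints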
The Process
The first part of my algorithm populates two very important tables. Let us denote with N, M the dimensions of the matrix I'm given. The first table, dp, is an N*M*maxStrokes matrix. dp[i][j][k] represents the maximum number of cells I can paint from the 0-th cell up until the j-th cell of the i-th row with k strokes. As for the maxPainted table, that is an N*maxStrokes matrix. maxPainted[i][k] stores the maximum number of cells I can paint in the i-th row with k strokes and is identical to the arrays calculated in the above example. In order to calculate the latter, I need to calculate dp first. The formula is the following:
dp[i][j][k]= MAX (1,dp[i][r][k]+1 (if A[i][j]==A[i][r]) ,dp[i][r][k-1]+1 (if A[i][j]!=A[i][r])), for every 0<=r<j
Which can be translated as: The maximum number of cells I can paint up to the j-th cell of the i-th row with k strokes is the maximum of:
1, because I can just ignore all the previous cells, and paint this cell alone
dp[i][r][k]+1, because when A[i][j]==A[i][r], I can extend that color with no extra strokes
dp[i][r][k-1]+1, because when A[i][j]!=A[i][r], I have to use a new stroke to paint A[i][j]
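To make the recurrence concrete, here is a small Python transcription of it run on the example row 'BBWWB' (illustrative only; the full JavaScript solution appears further below):
def max_cells_per_stroke(row, max_strokes):
    m = len(row)
    # dp[j][k]: max cells paintable among cells 0..j (cell j included) using k strokes
    dp = [[0] * (max_strokes + 1) for _ in range(m)]
    best = [0] * (max_strokes + 1)   # best[k] = maxPainted for this row with k strokes
    for k in range(1, max_strokes + 1):
        for j in range(m):
            dp[j][k] = 1             # I can always paint cell j alone
            for r in range(j):
                if row[j] == row[r]:
                    dp[j][k] = max(dp[j][k], dp[r][k] + 1)      # same colour: no new stroke
                else:
                    dp[j][k] = max(dp[j][k], dp[r][k - 1] + 1)  # new colour: one more stroke
            best[k] = max(best[k], dp[j][k])
    return best

print(max_cells_per_stroke('BBWWB', 3))  # [0, 3, 4, 5]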
It is now evident, that the dp table needs to be calculated in order to acquire the best possible scenarios for each row, that is the maximum number of cells I can paint with every possible number of strokes available. But how can I utilize the maxPainted table once I have calculated it in order to get to my result?
The second part of my approach uses a variation of the 0-1 Knapsack problem in order to calculate the biggest number of cells I can paint with maxStrokes strokes available. What really made this challenging is that, in contrast to the classical Knapsack, I am only allowed to pick 1 item out of every row, and then calculate all the possible combinations that do not surpass the required stroke constraint. In order to achieve that, I will first create a new array of length N*M + 1, called possSums. Let us denote with possSums[S] the MINIMUM number of strokes needed to reach sum S. My goal is to calculate each row's contribution to this array. Let us demonstrate with our previous example.
So I had a 2*5 input, therefore the possSums array would consist of 10+1 elements, which we set to Infinity, as we're trying to minimize the strokes needed to reach said sums.
So, possSums=[0,∞,∞,∞,∞,∞,∞,∞,∞,∞,∞], with the first item being 0 because I can paint 0 cells with 0 strokes. What we're now looking to do is calculate each row's contribution to possSums. That means that for every row of my maxPainted array, each element needs to make a specific sum available, which simulates it being chosen. As we previously demonstrated, maxPainted[0]=[0,3,4,5]. This row's contribution would have to allow 0, 3, 4 and 5 as achievable sums in my possSums array with 0, 1, 2, 3 used strokes respectively. possSums would then be transformed to possSums=[0,∞,∞,1,2,3,∞,∞,∞,∞,∞].
The next row was maxPainted[1]=[0,4,4,5], which once again has to alter possSums to allow the combinations made possible by selecting each of its items. Notice that each alteration needs to be independent of the others in the same row. For example, if we first allow sum=4 by picking the 1st item of maxPainted[1], sum=9 cannot then be allowed by additionally picking the 3rd item of that same array; combinations of items within the same row must not be considered. To ensure that no such cases are considered, for each row I create a clone of my possSums array and make the modifications there instead of in the original array. After considering all of the items within maxPainted[1], possSums looks like this: possSums=[0,∞,∞,1,1,3,∞,2,3,4,6], giving me a maximum number of cells that can be painted with up to 3 strokes at index 8 (sum=8). Therefore my output would be 2*5-8=2.
var minipaint = (A, maxStrokes) => {
  let n = A.length, m = A[0].length,
      maxPainted = [...Array(n)].map(d => [...Array(maxStrokes + 1)].map(d => 0)),
      dp = [...Array(n)].map(d => [...Array(m)].map(d => [...Array(maxStrokes + 1)].map(d => 0)))
  for (let k = 1; k <= maxStrokes; k++)
    for (let i = 0; i < n; i++)
      for (let j = 0; j < m; j++) {
        dp[i][j][k] = 1 // I can always just paint this cell alone
        // For every previous cell of this row,
        // consider painting it and then painting my current cell j.
        for (let p = 0; p < j; p++)
          if (A[i][p] === A[i][j]) // if the cells are the same, I don't need an extra stroke
            dp[i][j][k] = Math.max(dp[i][p][k] + 1, dp[i][j][k])
          else // however, if they differ, I'm using an extra stroke (going from k-1 to k)
            dp[i][j][k] = Math.max(dp[i][p][k - 1] + 1, dp[i][j][k])
        maxPainted[i][k] = Math.max(maxPainted[i][k], dp[i][j][k]) // max cells I can paint with k strokes
      }
  // This is where the knapsack VARIANT happens:
  // Essentially I want to maximize the sum of my selection of strokes.
  // For each row I can pick at most 1 item. Thing is, I have a constraint on my total
  // strokes used, so I create an array possSums whose indices represent the sum I want
  // to reach, and whose values represent the MINIMUM strokes needed to reach that very sum.
  // So possSums[k] = min number of strokes needed to reach sum k.
  let result = 0, possSums = [...Array(n * m + 1)].map(d => Infinity)
  // Base case: I can paint 0 cells with 0 strokes.
  possSums[0] = 0
  for (let i = 0; i < n; i++) {
    let curr = maxPainted[i],
        temp = [...possSums] // I create a clone of my possSums,
    // which, for each row, I alter instead of the original array,
    // in order to avoid cases where two items from the same row contribute to
    // the same sum, which of course would be incorrect.
    for (let stroke = 0; stroke <= maxStrokes; stroke++) {
      let maxCells = curr[stroke]
      // So the way this happens is:
      for (let sum = 0; sum <= n * m - maxCells; sum++) {
        let oldWeight = possSums[sum] // check whether, up until now, this sum was possible
        if (oldWeight == Infinity) // if it wasn't possible, I can't extend it with my maxCells
          continue;
        // GAME CHANGER THAT ALLOWS 1 PICK PER ROW:
        let minWeight = temp[sum + maxCells] // now, consider extending it to sum + maxCells,
        // altering the temp array instead, so my potential results are not affected by the
        // sums that were allowed during the same row
        temp[sum + maxCells] = Math.min(minWeight, oldWeight + stroke)
        if (temp[sum + maxCells] <= maxStrokes)
          result = Math.max(result, sum + maxCells)
      }
    }
    possSums = temp
  }
  return n * m - result // the total number of cells minus the maximum I can paint with maxStrokes
}

Extend value to arithmetic mean

This might be a quite stupid question, and I'm not sure whether it belongs here or on the math site.
My problem:
I have several elements of type X which have a boolean attribute Y.
To calculate the percentage of elements where Y is true, I count all X where Y is true and divide it by the number of elements.
But I don't want to iterate all the time above all elements to update that percentage-value.
My idea was:
If I had 33% for 3 elements, and am adding a fourth one where Y is true:
(0.33 * 3 + 1) / 4 = 0.4975
Obviously that does not work well because of the 0.33.
Is there any way to get an accurate result without iterating or saving the number of items where Y is true?
Keep a count of the total number of elements and of the "true" ones. Global vars, object member variables, whatever. I assume that sometime back when the program is starting, you have zero elements. Every time an element is added, removed, or its boolean attribute changes, increment or decrement those counts as appropriate. You'll never have to iterate over the list (except maybe for testing) but at the cost of every change to the list having to include fiddling with those variables.
Your idea doesn't work because 0.33 does not equal 1/3. It's an approximation. If you take the exact value, you get the right answer:
(1/3 * 3 + 1) / 4 = (1 + 1) / 4 = 1/2
My question is, if you can store the value of 33% without iterating, why not just store the values of 1 and 3 and calculate from them? That is, just keep a running total of the number of true values and the number of objects. Increment them when you get new ones. Calculate on demand. It's not necessary to iterate every time this way.
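A minimal Python sketch of this running-counts idea (the class and method names are just illustrative):
class TrueRatio:
    """Keeps the percentage of 'true' elements up to date without re-iterating."""

    def __init__(self):
        self.total = 0
        self.true_count = 0

    def add(self, y: bool):
        self.total += 1
        self.true_count += y

    def remove(self, y: bool):
        self.total -= 1
        self.true_count -= y

    def toggle(self, new_y: bool):
        # Call when an existing element's attribute flips to new_y.
        self.true_count += 1 if new_y else -1

    def percentage(self) -> float:
        return self.true_count / self.total if self.total else 0.0

r = TrueRatio()
for y in (True, False, True, True):
    r.add(y)
print(r.percentage())  # 0.75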

Probability question: Estimating the number of attempts needed to exhaustively try all possible placements in a word search

Would it be reasonable to systematically try all possible placements in a word search?
Grids commonly have dimensions of 15*15 (15 cells wide, 15 cells tall) and contain about 15 words to be placed, each of which can be placed in 8 possible directions. So in general it seems like you can calculate all possible placements by the following:
width*height*8_directions_to_place_word*number of words
So for such a grid it seems like we only need to try 15*15*8*15 = 27,000 which doesn't seem that bad at all. I am expecting some huge number so either the grid size and number of words is really small or there is something fishy with my math.
Formally speaking, assuming that x and y are the grid dimensions, you should sum the number of possible placements in every possible direction for every word.
Inputs are: x, y, l (average length of a word), n (total words)
so you have
Horizontally, a word can start anywhere from 0 to x-l going right, or from l to x going left, for each row: 2x(x-l).
The same approach is used for vertical words: they can start from 0 to y-l going down, or from l to y going up. So it's 2y(y-l).
For diagonal words you should consider all possible start positions x*y and subtract l^2, since a rectangle of the field can't be used. As before, you multiply by 4 since there are 4 possible (diagonal) directions: 4*(x*y - l^2).
Then you multiply the whole result by the number of words included:
total = n*(2*x*(x-l) + 2*y*(y-l) + 4*(x*y - l^2))
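A small Python sketch that just evaluates this formula (the average word length l=5 used for the 15x15 example is an assumption):
def total_placements(x, y, l, n):
    # x, y: grid dimensions, l: average word length, n: number of words
    horizontal = 2 * x * (x - l)        # left-to-right and right-to-left starts, per the estimate above
    vertical   = 2 * y * (y - l)        # downward and upward starts
    diagonal   = 4 * (x * y - l ** 2)   # four diagonal directions, minus the unusable corner region
    return n * (horizontal + vertical + diagonal)

print(total_placements(15, 15, 5, 15))  # 21000 for a 15x15 grid with 15 words of average length 5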
