Binary sequence subsum combinations

Given a sequence a_1 a_2 ... a_{m+n} with n terms equal to +1 and m terms equal to -1 (n >= m), suppose that for every 1 <= i <= m+n the prefix sum is nonnegative, i.e.
a_1 >= 0
a_1 + a_2 >= 0
a_1 + a_2 + a_3 >= 0
...
a_1 + a_2 + ... + a_{m+n} >= 0
Then the number of sequences that meet the requirement is C(m+n, n) - C(m+n, n+1), where the first term is the total number of sequences and the second term, via the reflection principle, counts those with some prefix sum < 0.
I was wondering whether there is a similar formula for the two-sided version, where both the prefix sums and the suffix sums must be nonnegative:
a_1 >= 0
a_1 + a_2 >= 0
a_1 + a_2 + a_3 >= 0
...
a_1 + a_2 + ... + a_{m+n} >= 0
a_{m+n} >= 0
a_{m+n-1} + a_{m+n} >= 0
...
a_1 + a_2 + ... + a_{m+n} >= 0
I feel like it can be derived similarly to the single-sided subsum problem, but the analogous guess C(m+n, n) - 2*C(m+n, n+1) is definitely incorrect. Any ideas?

A clue: the first case is the number of paths (with ±1 steps) from (0,0) to (n+m, n-m) that never fall below the zero line (like the Catalan-number count for parenthesis pairs, but without the balance requirement n = m).
The desired count is the number of ±1 paths that additionally never rise above the line y = n-m (suffix sums >= 0 are equivalent to prefix sums <= n-m, since the total sum is n-m). It is possible to get recursive formulas; I hope a compact formula exists for it.
If we consider lattice paths on an n x m grid, with a horizontal step for +1 and a vertical step for -1, then we need the number of paths confined to a parallelogram with base n-m.
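Pending a compact formula, a small dynamic-programming sketch (my illustration, not part of the original answer) makes the recursion concrete: it counts ±1 sequences whose prefix sums stay within [0, hi], so hi = n-m gives the two-sided numbers, while a non-binding hi = n+m reproduces the one-sided count C(m+n, n) - C(m+n, n+1).
// Count sequences of n (+1)s and m (-1)s (n >= m) whose prefix sums
// all stay within [0, hi]; dp[h] = number of prefixes with sum h.
#include <cstdint>
#include <cstdio>
#include <vector>

std::uint64_t countPaths(int n, int m, int hi) {
    std::vector<std::uint64_t> dp(hi + 1, 0);
    dp[0] = 1;
    for (int step = 0; step < n + m; ++step) {
        std::vector<std::uint64_t> next(hi + 1, 0);
        for (int h = 0; h <= hi; ++h) {
            if (dp[h] == 0) continue;
            if (h + 1 <= hi) next[h + 1] += dp[h]; // append a +1
            if (h >= 1)      next[h - 1] += dp[h]; // append a -1
        }
        dp.swap(next);
    }
    // With n+m steps and final sum n-m, the sequence necessarily
    // contains exactly n (+1)s and m (-1)s.
    return dp[n - m];
}

int main() {
    for (int n = 1; n <= 5; ++n)
        for (int m = 0; m <= n; ++m)
            std::printf("n=%d m=%d one-sided=%llu two-sided=%llu\n", n, m,
                        (unsigned long long)countPaths(n, m, n + m),
                        (unsigned long long)countPaths(n, m, n - m));
    return 0;
}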


What's the logic behind PERCENTILE.INC Excel function?

I would like to know how Excel calculates the values in the PERCENTILE.INC function. I'm doing some studies on percentiles and quartiles, and I got the results below:
How does Excel calculate the values in column F?
Here are the formulas I'm using:
=PERCENTILE.INC(B2:B21; 0,75) ==> F2
=PERCENTILE.INC(B2:B21; 0,50) ==> F3
=PERCENTILE.INC(B2:B21; 0,25) ==> F4
=PERCENTILE.INC(B2:B21; 0,00) ==> F5
Short answer - the position of a given percentile when the data is sorted in ascending order, using percentile.inc, is given by
(N-1)p+1
where p is the required percentile as a fraction from 0 to 1 and N is the number of points.
If this expression gives a whole number, you take the value at that position (e.g. percentile zero gives position 1, so its value is exactly 22). If it's not a whole number, you interpolate between the value at the position given by the whole-number part and the value at the position one higher. For p = 0.25 the position is (20-1)*0.25+1 = 5.75: the whole-number part is 5, the value at position 5 is 52, and the value at position 6 is 55. Multiply the difference of the two values (3) by the fractional part (0.75), giving 2.25, and add this to the lower of the two values, giving 54.25. A shorter way of saying this is that you go three quarters of the way between the two nearest values.
If you wished to show the logic as an Excel formula, you could implement the interpolation lowerValue + fraction*(upperValue - lowerValue), where the position comes from the formula above, like this:
=LET(p,J3,
range,I$2:I$21,
N,COUNT(range),
position,(N-1)*p+1,
lower,FLOOR(position,1),
fraction,MOD(position,1),
upper,CEILING(position,1),
lowerValue,INDEX(range,lower),
upperValue,INDEX(range,upper),
difference,upperValue-lowerValue,
lowerValue+fraction*difference)
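If you want to cross-check the rule outside Excel, here is a minimal C++ sketch of the same inclusive-percentile calculation (the data set below is a made-up example, not the one from the question):
// Sketch of the PERCENTILE.INC rule: sort, compute the 0-based
// position h = (N-1)p, then interpolate between the neighbours.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

double percentileInc(std::vector<double> v, double p) { // copy: caller's data stays unsorted
    std::sort(v.begin(), v.end());
    double h = (v.size() - 1) * p;                 // 0-based position (N-1)p
    std::size_t lo = (std::size_t)std::floor(h);
    std::size_t hi = (std::size_t)std::ceil(h);
    return v[lo] + (h - lo) * (v[hi] - v[lo]);     // linear interpolation
}

int main() {
    std::vector<double> v = {15, 20, 35, 40, 50};  // hypothetical data
    std::printf("p=0.25 -> %g\n", percentileInc(v, 0.25)); // exactly 20
    std::printf("p=0.40 -> %g\n", percentileInc(v, 0.40)); // 20 + 0.6*15 = 29
    return 0;
}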

Excel: How to find closest number in table, many times

In Excel, I need to find the nearest float in a table, for each integer 0..99.
https://www.excel-easy.com/examples/closest-match.html explains a great technique for finding the CLOSEST number from an array to a constant cell.
I need to perform this for many values (specifically, find nearest to a vertical list of integers 0..99 from within a list of floats).
Array formulas don't allow the compare-to value (the integer) to change as we move down the list of integers; it is treated as a fixed location.
I tried Tables, referring to the integers (which works), but the formula from the web site above requires an array operation (F2, Ctrl+Shift+Enter), which is not permitted in Tables. Correction: you can enter the formula, Ctrl+Enter the array formula for one cell, copy the formula down, then insert the table. Don't change the search cell reference!
Update:
I can still use array operations, but I have to manually copy the desired formula into each of the 100 target cells. No biggie.
Fixed typo in formula. See end of question for details about "perfection".
Example code:
AI4=some integer
AJ4=MATCH(MIN(ABS(Table[float_column]-AI4)), ABS(Table[float_column]-AI4), 0)
repeat for subsequent integers in AI5...AI103
Example data:
0.1 <= matches 0
0.5
0.95 <= matches 1
1.51 <= matches 2
2.89
Consider the case where target = 5, and both 4.5 and 5.5 exist in the list. One gives -0.5 and the other +0.5, so after ABS() they tie and the search returns the first one. Either is decent, unless your data is non-monotonic.
This still needs a better solution.
Thanks in advance!
I had another problem, which pushed me to a better solution.
Specifically, since the Y values for the X I am interested in can be at varying distances in X, I will interpolate X between the X points before and after: search for the value less than or equal, and also the one greater than or equal, interpolate the desired X, then interpolate the Y values.
I could go a step further and interpolate from N-1 to N+1, which would give cleaner results for noisy data.
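For reference, here is a small C++ sketch of the same nearest-match logic as the MATCH(MIN(ABS(...)), ABS(...), 0) formula, using the example data above (my illustration, not part of the question):
// For each integer target, find the index of the closest float,
// mirroring MATCH(MIN(ABS(range-target)), ABS(range-target), 0).
#include <cmath>
#include <cstdio>
#include <vector>

std::size_t closestIndex(const std::vector<double>& floats, double target) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < floats.size(); ++i)
        // On a tie (e.g. 4.5 vs 5.5 for target 5) the first entry wins,
        // matching MATCH's behaviour of returning the first minimum.
        if (std::fabs(floats[i] - target) < std::fabs(floats[best] - target))
            best = i;
    return best;
}

int main() {
    std::vector<double> floats = {0.1, 0.5, 0.95, 1.51, 2.89}; // example data above
    for (int t = 0; t <= 2; ++t) {
        std::size_t i = closestIndex(floats, t);
        std::printf("%d -> floats[%zu] = %g\n", t, i, floats[i]);
    }
    return 0;
}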

Inversions in a binary string

How many inversions are there in total over all binary strings of length n? (An inversion is a pair of positions i < j holding a 1 and a 0 respectively.)
For example , n = 3
000->0
001->0
010->1
011->0
100->2
101->1
110->2
111->0
So the total number of inversions is 6.
The question looks like homework, so let me omit the details. You can:
Set up the problem as a recurrence (see Толя's answer)
Write down and solve the characteristic equation, getting the solution as a closed-form formula with some arbitrary constants (c1, c2, ..., cn); as a matter of fact you'll get just one unknown constant
Plug some known values (e.g. f(1) = 0, f(3) = 6) into the formula and find all the unknown coefficients
The final answer you should get is
f(n) = n*(n-1)*2**(n-3)
where ** means raising to a power (2**(n-3) is 2 to the power n-3). If you don't want to deal with the recurrence and the like, you can just prove the formula by induction.
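For completeness, here is one way to fill in the omitted steps, using the recurrence from the next answer (my sketch, not part of the original answer):
f(n) = 2 f(n-1) + (n-1) \cdot 2^{n-2}, \qquad f(1) = 0
\text{Homogeneous solution (characteristic root } r = 2\text{): } f_h(n) = c \cdot 2^n
\text{Particular solution: try } f_p(n) = n(n-1) \, 2^{n-3} \text{ and check by substitution:}
2 f_p(n-1) + (n-1) 2^{n-2} = (n-1)(n-2) 2^{n-3} + (n-1) 2^{n-2} = (n-1) 2^{n-3} \bigl[(n-2) + 2\bigr] = n(n-1) 2^{n-3}
\text{General solution: } f(n) = c \cdot 2^n + n(n-1) 2^{n-3}; \quad f(1) = 0 \Rightarrow 2c = 0 \Rightarrow c = 0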
It is an easy recurrence.
Assume we know the answer for n-1, and we form the sequences of length n by prepending 0 or 1 to every previous sequence.
If we prepend 0, the count of inversions does not change, so the answer is the same as for n-1.
If we prepend 1, each sequence keeps its old inversions and gains one extra inversion for every 0 it contains.
The total count of characters over all sequences of length n-1 is
(n-1)*2^(n-1)
Half of them are zeros, which gives
(n-1)*2^(n-2)
This means we have the following recurrence:
f(1) = 0
f(n) = 2*f(n-1) + (n-1)*2^(n-2)
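A brute-force check of both the recurrence's closed form and the answer above (my sketch): count inversions directly over all 2^n strings for small n.
// Count inversions (a 1 before a 0) summed over all 2^n binary strings
// and compare with f(n) = n(n-1)2^(n-3).
#include <cmath>
#include <cstdint>
#include <cstdio>

std::uint64_t bruteForce(int n) {
    std::uint64_t total = 0;
    for (std::uint32_t s = 0; s < (1u << n); ++s)
        for (int i = 0; i < n; ++i)     // i, j are bit positions; higher
            for (int j = 0; j < i; ++j) // bit i is an earlier character
                if ((s >> i & 1) == 1 && (s >> j & 1) == 0)
                    ++total;            // earlier 1 before later 0
    return total;
}

int main() {
    for (int n = 1; n <= 12; ++n)
        std::printf("n=%2d brute=%llu formula=%g\n", n,
                    (unsigned long long)bruteForce(n),
                    n * (n - 1) * std::pow(2.0, n - 3));
    return 0;
}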

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average cases for searching for a key in an array/list using binary search are O(log(n)).
log(1,000,000) is only 6; log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. Without actually testing it, how do we know that the average/worst case is really log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
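As a quick empirical illustration of both cases (my sketch, not part of the answer), the program below searches for every key in a sorted array of 2^20 elements and tallies decisions; the worst case lands near log2(n) and the average comes out close to it, as argued above.
// Tally the number of decisions binary search makes for each key in a
// sorted array, to compare the worst case with the average case.
#include <cstdio>
#include <vector>

int decisions(const std::vector<int>& v, int val) {
    int from = 0, to = (int)v.size() - 1, count = 0;
    while (from <= to) {
        int mid = from + (to - from) / 2;
        ++count;                           // one decision per probe
        if (val == v[mid]) return count;
        if (val > v[mid]) from = mid + 1;
        else              to = mid - 1;
    }
    return count;                          // key not present
}

int main() {
    const int n = 1 << 20;                 // about a million elements
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i;  // sorted keys 0..n-1
    long long total = 0;
    int worst = 0;
    for (int key = 0; key < n; ++key) {
        int c = decisions(v, key);
        total += c;
        if (c > worst) worst = c;
    }
    std::printf("worst = %d, average = %.2f (log2 n = 20)\n",
                worst, (double)total / n);
    return 0;
}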
For binary search, the array must be arranged in ascending or descending order.
In each step, the algorithm compares the search key with the key of the middle element of the array.
If the keys match, a matching element has been found and its index (position) is returned.
Otherwise, if the search key is less than the middle element's key, the algorithm repeats its action on the sub-array to the left of the middle element; if the search key is greater, it repeats on the sub-array to the right.
If the remaining array to be searched is empty, the key cannot be found in the array and a special "not found" indication is returned.
So binary search is a dichotomic divide-and-conquer search algorithm: it takes logarithmic time because the number of candidate elements is halved in each iteration.
For sorted lists, on which we can do a binary search, each "decision" compares your key to the middle element: if greater, it takes the right half of the list; if less, the left half (and if it's a match it returns the element at that position). You effectively halve the list with every decision, yielding O(log n).
Binary search, however, only works on sorted lists. For unsorted lists you can do a straight (linear) search starting with the first element, which has complexity O(n).
O(log n) < O(n)
Which approach is best, though, depends entirely on how many searches you'll be doing, your inputs, etc.
For binary search, the prerequisite is a sorted array as input.
• As the list is sorted, we certainly don't have to check every word in the dictionary to look up a word.
• The basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 numbers below: v = 1 1 3 5 8 10 18 33 42
• We first look in the middle: 8.
• Since 5 < 8, we know we can look at just the first half: 1 1 3 5.
• Looking at the middle number again, we narrow down to 3 5.
• We stop when we're down to one number: 5.
How many comparisons are needed? 4 = floor(log2(9)) + 1, i.e. O(log2 n).
// Binary search over a sorted vector; returns the index of val or -1.
int binary_search (const vector<int>& v, int val) {
    int from = 0;
    int to = (int)v.size() - 1;
    int mid;
    while (from <= to) {
        mid = from + (to - from) / 2;  // avoids overflow of from + to
        if (val == v[mid])
            return mid;                // found: return its position
        else if (val > v[mid])
            from = mid + 1;            // search the right half
        else
            to = mid - 1;              // search the left half
    }
    return -1;                         // not found
}

Probability question: Estimating the number of attempts needed to exhaustively try all possible placements in a word search

Would it be reasonable to systematically try all possible placements in a word search?
Grids commonly have dimensions of 15*15 (15 cells wide, 15 cells tall) and contain about 15 words to be placed, each of which can be placed in 8 possible directions. So in general it seems like you can calculate all possible placements by the following:
width*height*8_directions_to_place_word*number of words
So for such a grid it seems we only need to try 15*15*8*15 = 27,000 placements, which doesn't seem bad at all. I was expecting some huge number, so either the grid size and number of words are really small, or there is something fishy with my math.
Formally speaking, assuming x is the number of rows and y is the number of columns, you should sum the counts over every possible direction for every possible word.
Inputs are: x, y, l (average length of a word), n (total words).
So you have:
Horizontally, a word can start anywhere from column 0 to column y-l going right, or from column l to column y going left; over all x rows that gives 2x(y-l).
The same approach works for vertical words: they can go from row 0 to row x-l going down, or from row l to row x going up, giving 2y(x-l).
For diagonal words, consider all x*y possible start positions and subtract roughly l^2, since an l-by-l corner region cannot start a word; multiplying by the 4 diagonal directions gives 4*(x*y - l^2).
Then you multiply the whole result by the number of words included:
total = n*(2*x*(y-l) + 2*y*(x-l) + 4*(x*y - l^2))
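Plugging in the numbers from the question (the average word length l = 5 is my assumption; the question doesn't state one), a quick sketch:
// Evaluate the placement estimate total = n*(2x(y-l) + 2y(x-l) + 4(xy - l^2))
// for a 15x15 grid with 15 words, assuming an average word length of 5.
#include <cstdio>

int main() {
    int x = 15, y = 15;  // rows, columns
    int l = 5;           // assumed average word length
    int n = 15;          // number of words
    long long total = (long long)n *
        (2LL * x * (y - l) + 2LL * y * (x - l) + 4LL * (x * y - l * l));
    std::printf("estimated placements: %lld\n", total); // 15*(300+300+800) = 21000
    return 0;
}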
