Inversions in a binary string - string

How many inversions are there in a binary string of length n ?
For example , n = 3
000->0
001->0
010->1
011->0
100->2
101->1
110->2
111->0
So total inversions are 6

The question looks like homework, so let me omit the details. You can:
Solve the problem as a recurrence (see Толя's answer)
Write down and solve the characteristic equation to get a closed-form solution with some arbitrary constants (c1, c2, ..., cn); as a matter of fact, you'll get just one unknown constant.
Plug some known values (e.g. f(1) = 0, f(3) = 6) into the formula to determine the unknown coefficients
The final answer you should get is
f(n) = n*(n-1)*2**(n-3)
where ** means raising to a power (2**(n-3) is 2 to the power n-3). If you don't want to deal with recurrences and the like, you can just prove the formula by induction.

It is an easy recurrence.
Assume that we know the answer for n-1.
Now prepend 0 or 1 as the first character to every sequence of length n-1.
If we prepend 0, the number of inversions does not change, so this half contributes the same total as for n-1.
If we prepend 1, each sequence keeps its inversions and gains one extra inversion for every 0 it contains, so this half contributes f(n-1) plus the total number of zeros over all sequences of length n-1.
The total number of characters (zeros and ones) in all sequences of length n-1 is:
(n-1)*2^(n-1)
Half of them are zeros, which gives
(n-1)*2^(n-2)
So we have the following formula:
f(1) = 0
f(n) = 2*f(n-1) + (n-1)*2^(n-2)
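As a sanity check, the recurrence and the closed formula can be compared against a brute-force count for small n. This is only a sketch: the function names are made up for illustration, and an inversion is taken to mean a 1 that appears somewhere before a 0, which matches the n = 3 table in the question.

```python
from itertools import product


def brute_force_total(n):
    """Sum the inversions over all binary strings of length n."""
    total = 0
    for bits in product("01", repeat=n):
        ones_seen = 0
        for bit in bits:
            if bit == "1":
                ones_seen += 1
            else:
                # This 0 forms one inversion with every 1 before it.
                total += ones_seen
    return total


def by_recurrence(n):
    """f(1) = 0, f(k) = 2*f(k-1) + (k-1)*2^(k-2)."""
    f = 0
    for k in range(2, n + 1):
        f = 2 * f + (k - 1) * 2 ** (k - 2)
    return f


def closed_form(n):
    """n*(n-1)*2^(n-3), written with integer arithmetic only."""
    return n * (n - 1) * 2 ** n // 8


for n in range(1, 11):
    print(n, brute_force_total(n), by_recurrence(n), closed_form(n))
```

All three columns should agree for every n printed.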

Related

Find if two strings are anagrams

Faced this question in an interview, which basically stated
Find if the given two strings are anagrams of each other in O(N) time without any extra space
I tried the usual solutions:
Using a character frequency count (O(N) time, O(26) space) (as a variation, iterating 26 times to calculate frequency of each character as well)
Sorting both strings and comparing (O(NlogN) time, constant space)
But the interviewer wanted a "better" approach. At the extreme end of the interview, he hinted at "XOR" for the question. Not sure how that works, since "aa" XOR "bb" should also be zero without being anagrams.
Long story short, are the given constraints possible? If so, what would be the algorithm?
Given word_a and word_b of the same length, I would try the following:
1. Define a variable counter and initialise it to 0.
2. For each letter ii in the alphabet do the following:
2.1. for jj in length(word_a):
2.1.1. if word_a[jj] == ii, increase the counter by 1: counter += 1
2.1.2. if word_b[jj] == ii, decrease the counter by 1: counter -= 1
2.2. if, after passing over all the characters in the words, counter is different from 0, the two words contain a different number of ii characters, so they are not anagrams: break out of the loop and return False
3. Return True
Explanation
If the words are anagrams, they contain the same number of each character, so a histogram is the natural tool, but histograms require space. With this method, you run over the n characters of the words exactly 26 times for the English alphabet, or c times for any other alphabet with a constant number c of letters. Therefore, the runtime of the process is O(c*n) = O(n) since c is constant, and you use no space besides the one variable.
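The steps above could be sketched in Python as follows. The function name and the default alphabet are assumptions for illustration; characters outside the chosen alphabet are simply ignored by this sketch.

```python
import string


def are_anagrams(word_a, word_b, alphabet=string.ascii_lowercase):
    """Check anagrams with one counter, scanning both words once per letter."""
    if len(word_a) != len(word_b):
        return False
    for letter in alphabet:
        counter = 0
        for ch_a, ch_b in zip(word_a, word_b):
            if ch_a == letter:
                counter += 1
            if ch_b == letter:
                counter -= 1
        if counter != 0:
            # The two words contain a different number of this letter.
            return False
    return True


print(are_anagrams("listen", "silent"))
print(are_anagrams("aa", "bb"))
```

The outer loop runs a constant number of times, so the total work stays linear in the word length.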
I haven't proven to myself that this is infallible yet, but it's a possible solution.
Go through both strings and calculate 3 values: the sum, the accumulated xor, and the count. If all 3 are equal then the strings should be anagrams.
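A quick experiment suggests the three values are not sufficient on their own. The sketch below (the helper name fingerprint is made up) computes them from the character codes; "ef" and "dg" end up with the same sum, xor and count even though they are not anagrams.

```python
from functools import reduce
from operator import xor


def fingerprint(word):
    """Return (sum, accumulated xor, count) of the character codes."""
    codes = [ord(ch) for ch in word]
    return (sum(codes), reduce(xor, codes, 0), len(codes))


print(fingerprint("ef"), fingerprint("dg"))
print(sorted("ef") == sorted("dg"))
```

So equal fingerprints are a necessary condition for anagrams, but apparently not a sufficient one.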

what will be the dp and transitions in this problem

Vasya has a string s of length n consisting only of digits 0 and 1. Also he has an array a of length n.
Vasya performs the following operation until the string becomes empty: choose some consecutive substring of equal characters, erase it from the string and glue together the remaining parts (any of them can be empty). For example, if he erases substring 111 from string 111110 he will get the string 110. Vasya gets a_x points for erasing a substring of length x.
Vasya wants to maximize his total points, so help him with this!
https://codeforces.com/problemset/problem/1107/E
I was trying to get my head around the editorial, but couldn't understand it. Can anyone suggest an easier way to do it?
input:
7
1101001
3 4 9 100 1 2 3
output:
109
Explanation
the optimal sequence of erasings is: 1101001 → 111001 → 11101 → 1111 → ∅.
Here, we consider removing prefixes instead of substrings. Why?
We try to remove a consecutive prefix of a particular state, which is actually a substring of the main string. So our DP state will be: start index, end index, prefix length.
Let's consider an example str = "1010110". Initially start=0, end=7, and prefix=1 (the first '1' is the only character in the prefix so far). We iterate over all indices in the current state except the starting index and check whether str[i]==str[start]. Here, for example, str[4]==str[0]. We then split the string into "010" with prefix=1 and "110" with prefix=2 (the leading '1' of the original string is glued onto the '1' at index 4). These are two independent subproblems. When a string of length 1 remains, we return a_prefix.
Here is my code.
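The poster's code is not reproduced in this thread. As a stand-in, here is a minimal Python sketch of the DP described above; the names max_points and solve are my own, end is exclusive, and prefix counts the characters equal to s[start] that are glued together at the front of the current state, including s[start] itself.

```python
from functools import lru_cache


def max_points(s, a):
    """Maximum score for erasing s, where a[x-1] is the score for length x."""

    @lru_cache(maxsize=None)
    def solve(start, end, prefix):
        if start >= end:
            return 0
        # Option 1: erase the current prefix block right away.
        best = a[prefix - 1] + solve(start + 1, end, 1)
        # Option 2: first clear s[start+1:i] so that the prefix block
        # becomes adjacent to an equal character at index i.
        for i in range(start + 1, end):
            if s[i] == s[start]:
                best = max(best,
                           solve(start + 1, i, 1) + solve(i, end, prefix + 1))
        return best

    return solve(0, len(s), 1)


print(max_points("1101001", [3, 4, 9, 100, 1, 2, 3]))
```

For the sample input from the problem statement this should print the expected 109.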

Understanding the maths

I am trying to understand the maths in this code that converts binary to decimal. I was wondering if anyone could break it down so that I can see the working of a conversion. Sorry if this is too newb, but I've been searching for an explanation for hours and can't find one that explains it sufficently.
I know the conversion is decimal*2 + int(digit) but I still can't break it down to understand exaclty how it's converting to decimal
binary = input('enter a number: ')
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
print(decimal)
Here's an example with the small binary number 10 (which is 2 in decimal)
binary = '10'
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
The for loop first takes the 1, which is in the first position of the binary number.
digit = 1 for 1st iteration.
It will overwrite the value of decimal which is initially 0.
decimal = 0*2 + 1 = 1
For the 2nd iteration digit= 0.
It will again calculate the value of decimal like below:
decimal = 1*2 + 0 = 2
So your decimal number is 2.
You can refer this for binary to decimal conversion
The for loop and syntax are hiding a larger pattern. First, consider the same base-10 numbers we use in everyday life. One way of representing the number 237 is 200 + 30 + 7. Breaking it down further, we get 2*10^2 + 3*10^1 + 7*10^0 (note that ** is the exponent operator in Python, but ^ is used nearly everywhere else in the world).
There's this pattern of exponents and coefficients with respect to the base 10. The exponents are 2, 1, and 0 for our example, and we can represent fractions with negative exponents. The coefficients 2, 3, and 7 are the same as from the number 237 that we started with.
It winds up being the case that you can do this for any base. I.e., every non-negative whole number has a unique representation (ignoring leading zeros) in base 10, base 2, and any other base you want to work in. In base 2, the exact same pattern emerges, but all the 10s are replaced with 2s. E.g., in binary consider 101. This is the same as 1*2^2 + 0*2^1 + 1*2^0, or just 5 in base-10.
What the algorithm you have does is make that a little more efficient. It's pretty wasteful to compute 2^20, 2^19, 2^18, and so on when you're basically doing the same operations in each of those cases. With our same binary example of 101, they've re-written it as (1*2+0)*2+1. Notice that if you distribute the second 2 into the parentheses, you get the same representation we started with.
What if we had a larger binary number, say 11001? Well, the same trick still works: (((1*2+1)*2+0)*2+0)*2+1.
With that last example, what is your algorithm doing? It's first computing (1*2+1). On the next loop, it takes that number, multiplies it by 2 and adds the next digit to get ((1*2+1)*2+0), and so on. After just two more iterations your entire decimal number has been computed.
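To see that the nested form and the sum of powers really agree, the two can be computed side by side. This is just an illustrative sketch; the function names are invented here, and Python's built-in int(text, 2) is used as a reference.

```python
def positional_sum(binary):
    """Sum of digit * 2^place, with places counted from the right."""
    n = len(binary)
    return sum(int(digit) * 2 ** (n - 1 - i) for i, digit in enumerate(binary))


def nested(binary):
    """The loop from the question: double the total, then add the digit."""
    decimal = 0
    for digit in binary:
        decimal = decimal * 2 + int(digit)
    return decimal


for text in ["101", "11001", "11100"]:
    print(text, positional_sum(text), nested(text), int(text, 2))
```

Each row should show the same value three times, for example 25 for 11001.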
Effectively, this takes each binary digit, multiplies it by 2^n where n is the place of that digit, and sums the results. The confusion comes from this being done almost in reverse. Let's step through an example:
binary = "11100"
First it takes the digit '1' and adds it to 0*2 = 0, so we have decimal = '1'.
Next it takes the second digit '1' and adds it to 1*2 = 2, so decimal = '1' + '1'*2.
Same again for the third digit, giving decimal = '1' + '1'*2 + '1'*2^2.
Then the two zeros add nothing but double the result twice, so finally decimal = '0' + '0'*2 + '1'*2^2 + '1'*2^3 + '1'*2^4 = 28
(I've left quotes around digits to show where they are)
As you can see, the end result in this format is a pretty simple binary to decimal conversion.
I hope this helped you understand a bit :)
I will try to explain the logic:
Consider the binary number 11001010. When looping in Python, the leftmost digit 1 comes first, and so on.
To convert it to decimal, we multiply the first digit by 2^7, the next by 2^6, and so on down to the last digit 0 multiplied by 2^0.
Then we add them all up.
Here, each digit is added as it is read, and the running total is then multiplied by 2 on every remaining iteration of the loop. For example, 1*(2^7) comes about because the first digit is added as decimal = 0*2 + 1 and is then multiplied by 2 seven times. When the next digit (1) comes in the second iteration, it is added as decimal = 1(decimal)*2 + 1(digit). During the third iteration of the loop, decimal = 3(decimal)*2 + 0(digit), where
3*2 = (2+1)*2 = (first_digit) 1*2*2 + (second_digit) 1*2.
It continues in the same way for all the digits.

Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).
log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?
I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.
Worst case
Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.
How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:
n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]
or, equivalently,
n / 2^x = 1
or, equivalently,
n = 2^x
So log base 2 of n gives you x, which is the number of decisions being made.
Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.
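If you would rather test the claim than take it on faith, a small experiment can count the decisions directly. The sketch below is an assumption-laden illustration (the helper names are made up): it searches for every key in a sorted list of n items and compares the largest step count with floor(log2(n)) + 1, which Python exposes as n.bit_length().

```python
def search_steps(sorted_values, key):
    """Run a binary search and return how many elements were inspected."""
    lo, hi = 0, len(sorted_values) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_values[mid] == key:
            return steps
        if sorted_values[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


def worst_case_steps(n):
    """Largest step count over all keys present in a list of n items."""
    values = list(range(n))
    return max(search_steps(values, key) for key in values)


for n in [1, 10, 100, 1000]:
    print(n, worst_case_steps(n), n.bit_length())
```

For n = 1000 both numbers come out as 10, in line with the halving argument.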
Average case
I see that you also asked about the average case. Consider:
There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.
You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:
Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.
Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.
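The 1, 2, 4, 8 pattern can also be observed directly. The following sketch (names are illustrative only) tallies how many keys are found after each number of tries and then compares the average with the worst case.

```python
from collections import Counter


def tries_to_find(n, target):
    """Number of probes a binary search needs to find index `target` in range(n)."""
    lo, hi = 0, n - 1
    tries = 0
    while lo <= hi:
        tries += 1
        mid = (lo + hi) // 2
        if mid == target:
            return tries
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return tries


def tries_histogram(n):
    """Map each number of tries to how many of the n keys need exactly that many."""
    return Counter(tries_to_find(n, target) for target in range(n))


hist = tries_histogram(15)
print(sorted(hist.items()))
average = sum(t * count for t, count in hist.items()) / 15
print(average, max(hist))
```

With 15 elements, 8 of them need the maximum of 4 tries, so the average stays well above half of the worst case.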
For binary search, the array should be arranged in ascending or descending order.
In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater,then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
So, a binary search is a dichotomic divide and conquer search algorithm. Thereby it takes logarithmic time for performing the search operation as the elements are reduced by half in each of the iteration.
For sorted lists, on which we can do a binary search, each "decision" made by the binary search compares your key to the middle element. If the key is greater, it takes the right half of the list; if it is less, it takes the left half (and if it's a match, it returns the element at that position). You effectively reduce your list by half with every decision, yielding O(logn).
Binary search however, only works for sorted lists. For un-sorted lists you can do a straight search starting with the first element yielding a complexity of O(n).
O(logn) < O(n)
Although it entirely depends on how many searches you'll be doing, your inputs, etc what your best approach would be.
For Binary search the prerequisite is a sorted array as input.
• As the list is sorted:
• Certainly we don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 #s below. v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5<8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle # again, narrow down to 3 5
• Then we stop when we're down to one #: 5
How many comparisons are needed: about 4 = floor(log(base 2)(9)) + 1, i.e. O(log(base 2) n)
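#include <vector>

// Returns the index of val in the sorted (ascending) vector v, or -1 if absent.
int binary_search(const std::vector<int>& v, int val) {
    int from = 0;
    int to = static_cast<int>(v.size()) - 1;
    while (from <= to) {
        int mid = from + (to - from) / 2;  // avoids overflow of from + to
        if (val == v[mid])
            return mid;        // found
        else if (val > v[mid])
            from = mid + 1;    // continue in the right half
        else
            to = mid - 1;      // continue in the left half
    }
    return -1;  // not found
}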

binary sequence subsum combinations

Given a sequence a1a2....a_{m+n} with n +1s and m -1s, suppose that every prefix sum is non-negative, i.e., for every 1 <= i <= m+n we have
a1+a2+...+ai >= 0, i.e.,
a1 >= 0
a1+a2>=0
a1+a2+a3>=0
...
a1+a2+...+a_{m+n}>=0
then the number of sequences that meet the requirement is C(m+n,n) - C(m+n,n+1) (for n >= m), where the first term is the total number of sequences and the second term counts those with some prefix sum < 0.
I was wondering whether there is a similar formula for the two-sided version, where both prefix sums and suffix sums must be non-negative:
a1 >= 0
a1+a2>=0
a1+a2+a3>=0
...
a1+a2+...+a_{m+n}>=0
a_{m+n}>=0
a_{m+n-1}+a_{m+n}>=0
...
a1+a2+...+a_{m+n}>=0
I feel like it can be derived similarly to the single-sided problem, but the number C(m+n,n) - 2 * C(m+n,n+1) is definitely incorrect. Any ideas?
A clue: the first case is the number of paths (with +-1 steps) from (0,0) to the point (n+m, n-m) that never fall below the zero line. (Like Catalan numbers for parenthesis pairs, but without the balance requirement n=m.)
Since every suffix sum being non-negative is the same as every prefix sum being at most n-m, the desired quantity is the number of (+-1) paths that never fall below zero and never rise above the (n-m) line. It is possible to get recursive formulas. I hope that a compact formula exists for it.
If we consider lattice paths on an n x m grid, with a horizontal step for +1 and a vertical step for -1, then we need the number of paths restricted to a parallelogram with base (n-m).
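Until a compact formula turns up, small cases can be counted by brute force to test any candidate. This is only a sketch with made-up names: it enumerates every placement of the m minus-ones and checks the prefix sums, treating the two-sided condition as "prefix sums stay between 0 and n-m". The one-sided count is compared with the ballot-style formula C(m+n, n) - C(m+n, n+1).

```python
from itertools import combinations
from math import comb


def count_sequences(n, m, two_sided=False):
    """Count +-1 sequences (n plus-ones, m minus-ones) with valid prefix sums."""
    length = n + m
    upper = n - m
    count = 0
    for minus_positions in combinations(range(length), m):
        minus = set(minus_positions)
        total = 0
        valid = True
        for i in range(length):
            total += -1 if i in minus else 1
            # One-sided: never below 0. Two-sided: also never above n-m.
            if total < 0 or (two_sided and total > upper):
                valid = False
                break
        if valid:
            count += 1
    return count


for n, m in [(2, 1), (3, 2), (4, 2), (5, 3)]:
    one_sided = count_sequences(n, m)
    formula = comb(n + m, n) - comb(n + m, n + 1)
    print(n, m, one_sided, formula, count_sequences(n, m, two_sided=True))
```

The last column gives reference values for the two-sided problem that a proposed closed form would have to reproduce.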

Resources