Minimizing FSM (Mealy) - state-machine

My exam is tomorrow and I need your help to understand what I'm doing wrong. My book had the following question:
Design a minimal FSM (Mealy) that outputs 1 iff u(t)-v(t)=3n for some natural n, where:
u(t) is the number of 1 digits until now, and v(t) is the number of 0 digits received.
The book shows an answer using only 3 states. I didn't understand how this is possible, so I wrote my own machine and tried to minimize it to see if I get the same thing.
I declared 5 states:
T - number of zeros is equal to number of ones.
0+ - number of zeros is bigger than number of ones by 1.
1+ - number of ones is bigger than number of zeroes by 1.
0++ - number of zeros is bigger than number of ones by 2.
1++ - number of ones is bigger than number of zeroes by 2.
So my table looks like:
Current State | x=0,z | x=1,z
T             | 0+,0  | 1+,0
1+            | T,0   | 1++,0
1++           | 1+,0  | T,1
0+            | 0++,0 | T,0
0++           | T,1   | 0+,0
But when I minimize it I get:
P0 = (T,1+,1++,0+,0++)
P1 = (0++)(1++)(T,1+,0+)
P2 = (0++)(1++)(T)(1+)(0+)
which means the states can't be minimized.
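The refinement steps P0, P1, P2 can be reproduced mechanically. The following is a minimal sketch, assuming the table is read as "state, input -> (next state, output z)"; the names TABLE and equivalence_classes are made up for this illustration. It only checks the partition refinement of the table exactly as written above, not whether that table models the book's problem.

```python
# Transition table copied from the question: state -> {input: (next state, output)}
TABLE = {
    "T":   {0: ("0+", 0),  1: ("1+", 0)},
    "1+":  {0: ("T", 0),   1: ("1++", 0)},
    "1++": {0: ("1+", 0),  1: ("T", 1)},
    "0+":  {0: ("0++", 0), 1: ("T", 0)},
    "0++": {0: ("T", 1),   1: ("0+", 0)},
}


def equivalence_classes(table, inputs=(0, 1)):
    """Refine a partition of the states until it stops changing."""
    # First split: states with the same output for every input share a block.
    block = {s: tuple(table[s][x][1] for x in inputs) for s in table}
    while True:
        # Two states stay together only if they were together before and
        # their successors fall into the same blocks for every input.
        refined = {
            s: (block[s],) + tuple(block[table[s][x][0]] for x in inputs)
            for s in table
        }
        # Renumber the signatures so they stay small between rounds.
        ids = {}
        for signature in refined.values():
            ids.setdefault(signature, len(ids))
        refined = {s: ids[sig] for s, sig in refined.items()}
        if len(set(refined.values())) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined
    classes = {}
    for state, b in block.items():
        classes.setdefault(b, []).append(state)
    return sorted(sorted(c) for c in classes.values())


print(equivalence_classes(TABLE))
```

For this table the sketch ends with five singleton classes, which agrees with the P2 line above.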


How to code rule number 4 from the Western Electric rules for quality control charts

I am new to statistics. I have a problem at hand for which I need to code all 4 of the Western Electric rules for quality control. I have been able to code the first and second with the help of a peer. Could anyone help me write rule number 4: "NINE consecutive points fall on the same side of the centerline"?
I plotted rule 1 by collecting the data above and below the threshold and then running the matplotlib plot in a single cell.
I am not able to get the data for rule number 4.
Even though no one answered, it gave me enough motivation to work out a solution.
I know my code is not optimal, so please suggest any changes I should make.
from itertools import groupby
from operator import itemgetter

temp_up, temp_down = [], []  # index values of points above / below the mean
for i in range(len(data)):
    if arr_data[i] > y_mean:
        temp_up.append(i)
    elif arr_data[i] < y_mean:
        temp_down.append(i)
# now we have index values for both data above and below mean,
# we will now get the sequences of consecutive indexes to know if there's any run of length 9 or greater
d_up, d_down = [], []
for k, g in groupby(enumerate(temp_up), lambda ix: ix[0] - ix[1]):
    t_up = list(map(itemgetter(1), g))
    if len(t_up) >= 9:  # check if the length of the sequence is greater than or equal to 9
        # get index to mark red for the data
        for i in range(8, len(t_up)):
            d_up.append(t_up[i])  # index number of data points violating rule number 4 (above mean)
for k, g in groupby(enumerate(temp_down), lambda ix: ix[0] - ix[1]):
    t_down = list(map(itemgetter(1), g))
    if len(t_down) >= 9:  # check if the length of the sequence is greater than or equal to 9
        # get index to mark red for the data
        for i in range(8, len(t_down)):
            d_down.append(t_down[i])  # index number of data points violating rule number 4 (below mean)
data_above_r4 = pd.DataFrame(data.iloc[d_up])
Really late followup, but you could do something like "if minimum of previous 9 points and maximum of previous 9 points are both same side (> or <) mean, failure."
This shows the rolling min; the rolling max should work similarly.
Generate a "rolling min" and "rolling max" column for the last 9 rows, if anywhere they're both + or both -, flag it as a failure.
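The rolling min / rolling max idea can be sketched without any library. This is only an illustration in plain Python, not the pandas rolling API; the function name rule4_flags and its parameters are assumptions made up for this example. A window of 9 points lies entirely above the centerline exactly when its minimum is above it, and entirely below exactly when its maximum is below it.

```python
def rule4_flags(values, center, window=9):
    """Return the indexes that end a run of `window` points on one side of `center`."""
    flagged = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]  # the last `window` points up to i
        # rolling min above the centerline -> all points above;
        # rolling max below the centerline -> all points below
        if min(chunk) > center or max(chunk) < center:
            flagged.append(i)
    return flagged


# Ten points above the centerline, then one below, then three above.
sample = [1] * 10 + [-1] + [1] * 3
print(rule4_flags(sample, center=0))
```

Like the groupby version, this flags the 9th point of a run and every later point of the same run.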

Understanding the maths

I am trying to understand the maths in this code that converts binary to decimal. I was wondering if anyone could break it down so that I can see the working of a conversion. Sorry if this is too newb, but I've been searching for an explanation for hours and can't find one that explains it sufficiently.
I know the conversion is decimal*2 + int(digit), but I still can't break it down to understand exactly how it's converting to decimal.
binary = input('enter a number: ')
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
Here's an example with the small binary number 10 (which is 2 in decimal):
binary = '10'
decimal = 0
for digit in binary:
    decimal = decimal*2 + int(digit)
The for loop takes 1 first, because it is the first digit of the binary number.
digit = 1 for the 1st iteration.
It overwrites the value of decimal, which is initially 0.
decimal = 0*2 + 1 = 1
For the 2nd iteration digit = 0.
It calculates the value of decimal again, like below:
decimal = 1*2 + 0 = 2
So your decimal number is 2.
You can refer to this for binary-to-decimal conversion.
The for loop and syntax are hiding a larger pattern. First, consider the same base-10 numbers we use in everyday life. One way of representing the number 237 is 200 + 30 + 7. Breaking it down further, we get 2*10^2 + 3*10^1 + 7*10^0 (note that ** is the exponent operator in Python, but ^ is used nearly everywhere else in the world).
There's this pattern of exponents and coefficients with respect to the base 10. The exponents are 2, 1, and 0 for our example, and we can represent fractions with negative exponents. The coefficients 2, 3, and 7 are the same as from the number 237 that we started with.
It winds up being the case that you can do this uniquely for any base. I.e., every whole number has a unique representation (ignoring leading zeros) in base 10, base 2, and any other base you want to work in. In base 2, the exact same pattern emerges, but all the 10s are replaced with 2s. E.g., in binary consider 101. This is the same as 1*2^2 + 0*2^1 + 1*2^0, or just 5 in base-10.
What the algorithm you have does is make that a little more efficient. It's pretty wasteful to compute 2^20, 2^19, 2^18, and so on when you're basically doing the same operations in each of those cases. With our same binary example of 101, they've re-written it as (1*2+0)*2+1. Notice that if you distribute the second 2 into the parentheses, you get the same representation we started with.
What if we had a larger binary number, say 11001? Well, the same trick still works: (((1*2+1)*2+0)*2+0)*2+1.
With that last example, what is your algorithm doing? It's first computing (1*2+1). On the next loop, it takes that number, multiplies it by 2 and adds the next digit to get ((1*2+1)*2+0), and so on. After just two more iterations your entire decimal number has been computed.
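To see that the nested form and the sum of powers really give the same number, here is a small sketch comparing the two. The function names positional and nested are made up for this illustration; nested is the same loop as in the question.

```python
def positional(bits):
    """Sum digit * 2**place, with places counted from the right starting at 0."""
    n = len(bits)
    return sum(int(d) * 2 ** (n - 1 - i) for i, d in enumerate(bits))


def nested(bits):
    """The loop from the question: double the running total, then add the digit."""
    decimal = 0
    for digit in bits:
        decimal = decimal * 2 + int(digit)
    return decimal


for bits in ("101", "11001"):
    print(bits, positional(bits), nested(bits))
```

Both functions give 5 for 101 and 25 for 11001; the nested version just never computes a power of 2 explicitly.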
Effectively, what this is doing is taking each binary digit, multiplying it by 2^n where n is the place of that digit (counted from the right, starting at 0), and then summing them up. The confusion comes from this being done almost in reverse. Let's step through an example:
binary = "11100"
So first it takes the digit '1' and adds it on to 0 * 2 = 0, so we have decimal = '1'.
Next take the second digit '1' and add it to 1 * 2 = 2, so decimal = '1' + '1'*2.
Same again, with decimal = '1' + '1'*2 + '1'*2^2.
Then the 2 zeros add nothing, but double the result twice,
so finally, decimal = '0' + '0'*2 + '1'*2^2 + '1'*2^3 + '1'*2^4 = 28
(I've left quotes around digits to show where they are)
As you can see, the end result in this format is a pretty simple binary to decimal conversion.
I hope this helped you understand a bit :)
I will try to explain the logic:
Consider the binary number 11001010. When looping in Python, the first digit 1 comes in first, and so on.
To convert it to decimal, we multiply the first digit by 2^7 and continue like this until the last digit, 0, is multiplied by 2^0.
Then we add (sum) them.
Here we add each digit as it is taken, and it then gets multiplied by 2 on every later iteration until the end of the loop. For example, 1*(2^7) is performed here as decimal = 0(decimal) + 1, which is then multiplied by 2, 7 times. When the next digit (1) comes in the second iteration, it is added as decimal = 1(decimal)*2 + 1(digit). During the third iteration of the loop, decimal = 3(decimal)*2 + 0(digit), where
3*2 = (2+1)*2 = (first_digit) 1*2*2 + (second_digit) 1*2.
It continues like this for all the digits.
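The iterations described above can be listed by recording the running value after each digit. A minimal sketch, where the helper name trace is an assumption made up for this example:

```python
def trace(bits):
    """Return the value of `decimal` after each iteration of the loop."""
    decimal = 0
    steps = []
    for digit in bits:
        decimal = decimal * 2 + int(digit)
        steps.append(decimal)
    return steps


print(trace("11001010"))
```

The first three entries are 1, 3 and 6, matching the iterations walked through above; the last entry is the converted number.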

Inversions in a binary string

How many inversions are there, in total, over all binary strings of length n?
For example, for n = 3 the total number of inversions is 6.
The question looks like homework, so let me omit the details. You can:
Solve the problem as a recurrence (see Толя's answer)
Make up and solve the characteristic equation, getting the solution as a closed-form formula with some arbitrary constants (c1, c2, ..., cn); as a matter of fact you'll get just one unknown constant.
Put some known solutions (e.g. f(1) = 0, f(3) = 6) into the formula and find all the unknown coefficients
The final answer you should get is
f(n) = n*(n-1)*2**(n-3)
where ** means raising to a power (2**(n-3) is 2 to the power n-3). In case you don't want to deal with recurrences and the like, you can just prove the formula by induction.
It is an easy recurrence.
Assume that we know the answer for n-1.
To every sequence of length n-1 we add 0 or 1 as the first character.
If we add 0 as the first character, the count of inversions does not change; hence the answer is the same as for n-1.
If we add 1 as the first character, the count of inversions is the same as before, plus extra inversions equal to the count of 0s in all the previous sequences.
The total count of zeros and ones in all sequences of length n-1 is (n-1)*2^(n-1).
Half of them are zeros, which gives (n-1)*2^(n-2) extra inversions.
This means we have the following formula:
f(1) = 0
f(n) = 2*f(n-1) + (n-1)*2^(n-2)
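The recurrence and the closed formula can be checked against a brute-force count for small n. This is a sketch under the assumption that an inversion is a 1 that appears somewhere before a 0 in the string; the function names are made up for this example. The closed form is written as n*(n-1)*2**n // 8 so that it stays an integer for n = 1 and n = 2.

```python
from itertools import product


def inversions(bits):
    """Count pairs (i, j) with i < j, bits[i] == 1 and bits[j] == 0."""
    ones_seen = 0
    total = 0
    for b in bits:
        if b == 1:
            ones_seen += 1
        else:
            total += ones_seen  # every earlier 1 forms an inversion with this 0
    return total


def brute_force(n):
    return sum(inversions(s) for s in product((0, 1), repeat=n))


def by_recurrence(n):
    f = 0  # f(1) = 0
    for k in range(2, n + 1):
        f = 2 * f + (k - 1) * 2 ** (k - 2)
    return f


def closed_form(n):
    return n * (n - 1) * 2 ** n // 8  # same value as n*(n-1)*2**(n-3)


for n in range(1, 8):
    print(n, brute_force(n), by_recurrence(n), closed_form(n))
```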

CodeJam 2014: Solution for The Repeater

I participated in Code Jam and successfully solved the small input of The Repeater challenge, but I can't figure out an approach for multiple strings.
Can anyone give the algorithm used for multiple strings? For 2 strings (small input) I am comparing the strings character by character and doing operations to make them equal. However, this approach would time out for the large input.
Can someone explain the algorithm they used? I can see the solutions of other users but can't figure out what they have done.
I can tell you my solution which worked fine for both small and large inputs.
First, we have to see if there is a solution. You do that by bringing all strings to their "simplest" form. If any of them does not match, then there is no solution.
aaabbbc => abc
abbbbbcc => abc
abbcca => abca
If only the first two were given, then a solution would be possible. As soon as the third is thrown into the mix, it's impossible. The algorithm to do the "simplification" is to parse the string and collapse any repeated character you see into one. As soon as a string does not equal the simplified form of the batch, bail out.
As for the actual solution to the problem, I simply converted the strings to a [letter, repeat] format. So for example
qwerty => 1q,1w,1e,1r,1t,1y
qqqwweeertttyy => 3q,2w,3e,1r,3t,2y
(mind you the outputs are internal structures, not actual strings)
Imagine now you have 100 strings, you have already passed the test that there is a solution, and you have all strings in the [letter, repeat] representation. Now go through every letter and find the least 'difference' of repetitions you have to make so that all strings reach the same number. So for example
1a, 1a, 1a => 0 diff
1a, 2a, 2a => 1 diff
1a, 3a, 10a => 9 diff (to bring everything to 3)
The way to do this (I'm pretty sure there is a more efficient way) is to go from the min number to the max number and calculate the sum of all diffs. You are not guaranteed that the number will be one of the numbers in the set. For the last example, you would calculate the diff to bring everything to 1 (0,2,9 = 11), then to 2 (1,1,8 = 10), then to 3 (2,0,7 = 9), and so on up to 10, and choose the min. Strings are limited to 1000 characters so this is an easy calculation. On my moderate laptop, the results were instant.
Repeat the same for every letter of the strings, sum everything up, and that is your solution.
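The steps above can be sketched in Python roughly as follows. This is an illustration of the described approach, not the answerer's actual code; the names run_lengths and min_moves are made up, and returning None for the "no solution" case is an assumption of this sketch.

```python
from itertools import groupby


def run_lengths(s):
    """'aaabbbc' -> [('a', 3), ('b', 3), ('c', 1)]"""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def min_moves(strings):
    encoded = [run_lengths(s) for s in strings]
    # Every string must simplify to the same letter sequence.
    skeleton = [ch for ch, _ in encoded[0]]
    if any([ch for ch, _ in e] != skeleton for e in encoded):
        return None  # no solution
    total = 0
    # zip(*...) groups the repeat counts of the same letter position together.
    for counts in zip(*[[n for _, n in e] for e in encoded]):
        # Try every target between min and max and keep the cheapest one.
        total += min(
            sum(abs(c - target) for c in counts)
            for target in range(min(counts), max(counts) + 1)
        )
    return total


print(min_moves(["aaabbbc", "abbbbbcc"]))
print(min_moves(["aaabbbc", "abbbbbcc", "abbcca"]))
```

With the strings from the example, the first call has a solution and the second returns None because abbcca simplifies to abca instead of abc.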
This answer gives an example to explain why finding the median number of repeats produces the lowest cost.
Suppose we have values:
1 20 30 40 100
And we are trying to find the value which has shortest total distance to all these values.
We might guess the best answer is 50, with cost |50-1|+|50-20|+|50-30|+|50-40|+|50-100| = 159.
Split this into two sums, left and right, where left is the cost of all numbers to the left of our target, and right is the cost of all numbers to the right.
left = |50-1|+|50-20|+|50-30|+|50-40| = 50-1+50-20+50-30+50-40 = 109
right = |50-100| = 100-50 = 50
cost = left + right = 159
Now consider changing the value by x. Providing x is small enough such that the same numbers are on the left, then the values will change to:
left(x) = |50+x-1|+|50+x-20|+|50+x-30|+|50+x-40| = 109 + 4x
right(x) = |50+x-100| = 50 - x
cost(x) = left(x)+right(x) = 159+3x
So if we set x=-1 we will decrease our cost by 3, therefore the best answer is not 50.
The amount our cost will change if we move is given by the difference between the count of values to our left (4) and the count of values to our right (1).
Therefore, as long as these counts are different we can always decrease our cost by moving towards the median.
Therefore the median gives the lowest cost.
If there are an even number of points, such as 1,100 then all numbers between the two middle points will give identical costs, so any of these values can be chosen.
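The argument can be checked numerically on the same values. A small sketch, where cost and best_by_scan are names made up for this example and median_low comes from the standard statistics module:

```python
from statistics import median_low


def cost(values, target):
    """Total distance from target to every value."""
    return sum(abs(v - target) for v in values)


def best_by_scan(values):
    """Try every integer target between min and max, keep the cheapest."""
    return min(range(min(values), max(values) + 1), key=lambda t: cost(values, t))


values = [1, 20, 30, 40, 100]
print(cost(values, 50), cost(values, 49))        # moving left from 50 saves 3
print(best_by_scan(values), median_low(values))  # the scan lands on the median
```

For an even number of points such as 1 and 100, every target between the two middle values gives the same cost, so the scan simply returns the first of them.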
Since Thanasis already explained the solution, I'm providing here my source code in Ruby. It's really short (only 400B) and follows his algorithm exactly.
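def solve(strs)
  form = strs.first.squeeze
  strs.map { |str|
    return 'Fegla Won' if form != str.squeeze
    # lengths of the runs of consecutive equal characters
    str.chars.chunk { |c| c }.map { |arr|
      arr.last.size
    }
  }.transpose.map { |row|
    # try every target count between min and max, keep the cheapest
    Range.new(*row.minmax).map { |n|
      row.map { |r|
        (r - n).abs
      }.reduce :+
    }.min
  }.reduce :+
end

gets.to_i.times { |i|
  result = solve gets.to_i.times.map { gets.chomp }
  puts "Case ##{i+1}: #{result}"
}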
It uses the squeeze method on strings, which collapses each run of repeated characters into a single character. This way, you just compare every squeezed line to the reference (variable form). If there's an inconsistency, you just return that Fegla Won.
Next you use the chunk method on the char array, which groups consecutive identical characters. This way you can count them easily.

binary sequence subsum combinations

Given a sequence a1a2....a_{m+n} with n +1s and m -1s, if for any 1 <= i <= m+n we have
sum_{k=1..i}(ak) >= 0, i.e.,
a1 >= 0, a1 + a2 >= 0, ..., a1 + a2 + ... + a_{m+n} >= 0
then the number of sequences that meet the requirement is C(m+n,n) - C(m+n,n+1), where the first term is the total number of sequences, and the second term counts those with some sub-sum < 0.
I was wondering whether there is a similar formula for the number of two-sided sequences:
a1 >= 0
I feel like it can be derived similarly to the one-sided sub-sum problem, but the number C(m+n,n) - 2 * C(m+n,n+1) is definitely incorrect. Any ideas?
A clue: the first case is the number of paths (with +-1 steps) from (0,0) to the point (n+m, n-m), where the path never falls below the zero line (like Catalan numbers for parenthesis pairs, but without the balance requirement n=m).
The desired formula is the number of (+-1) paths which also never rise above the (n-m) line. It is possible to get recursive formulas. I hope that a compact formula exists for it.
If we consider lattice paths on an n x m grid, with a horizontal step for +1 and a vertical step for -1, then we need the number of paths restricted to a parallelogram with base (n-m).
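One way to write such a recursive formula is a small memoized count over (number of +1s used, number of -1s used). This is a sketch only; the names count_bounded and ballot are made up for this example, and reading the two-sided case as "prefix sums stay between 0 and n-m" is the interpretation given in the clue above, not something stated in the question.

```python
from functools import lru_cache
from math import comb


def count_bounded(n, m, low=0, high=None):
    """Count orderings of n (+1)s and m (-1)s whose prefix sums stay in [low, high]."""

    @lru_cache(maxsize=None)
    def ways(p, q):
        height = p - q  # prefix sum after p (+1)s and q (-1)s
        if height < low or (high is not None and height > high):
            return 0
        if p == 0 and q == 0:
            return 1
        total = 0
        if p > 0:
            total += ways(p - 1, q)  # the last step was +1
        if q > 0:
            total += ways(p, q - 1)  # the last step was -1
        return total

    return ways(n, m)


def ballot(n, m):
    """One-sided closed form, written in terms of m: C(m+n, m) - C(m+n, m-1)."""
    return comb(m + n, m) - (comb(m + n, m - 1) if m > 0 else 0)


print(count_bounded(3, 1), ballot(3, 1))  # one-sided count, two ways
print(count_bounded(3, 1, high=3 - 1))    # two-sided: also never above n-m
```

The one-sided count agrees with the closed form for n >= m, which gives a sanity check before trying to guess a compact two-sided formula from the numbers.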
