Regular expression over the language C={a,b} - regular-language

Good evening everyone, I'm stuck on the following regular expression, and I think there is a much easier approach to it than mine.
I have to write down the regular expression and the DFA that, over the alphabet {a,b}, accept all the strings that start and end with b and have an even number of a.
My attempt was to split into cases, but the outcome wasn't too great. I tried something like this:
b b* (aba)* (aab)* (aa)* (aab)* (aba)* b*b
But I think this is not complete.
Should I follow some general rule to achieve this task, or do I just need to practice regular expressions?
Thanks, any tip or help would be appreciated.

A DFA seems easier to make here, so we can start there and derive the regular expression from there.
We will need at least one, initial, state. This state cannot be accepting because the empty string does not start and end with b. We'll call this q0.
If we see an a in this state, we are looking at a string that does not start with b, so we cannot accept it no matter what comes next. We can represent this with a new, dead, state. We'll call this q1.
If we see a b in q0, we need a new state to represent the fact that we are well on the way to seeing a string that meets the criteria. Indeed, the string b starts and ends with a b and has an even number of a (zero is even); so this state must be accepting. Call this q2.
If we see an a in q2 then we have an odd number of as and did not last see a b, so we cannot accept the string. However, it's still possible to accept a string from this state by seeing an odd number of as followed by at least one b. Call the state to represent this q3.
If we see a b in q2, we're in the same situation as before (even number of a and last saw a b, so we can accept). Stay in q2.
If in q3 and we see an a, we now have an even number of a again and just need a b. Call this new state q4. If we see a b, we still need an a so we might as well stay in q3.
If in q4 and we see an a, we again need more as and can return to q3. If, on the other hand, we get a b, we can return to q2 since the string is in our language.
The DFA looks like this:
q    s    q'
--   --   --
q0   a    q1      q0: initial state
q0   b    q2      q1: dead state, did not begin with b
q1   a    q1      q2: accepting state, even #a and start/stop with b
q1   b    q1      q3: start with b, odd #a
q2   a    q3      q4: start with b, even #a, stop with a
q2   b    q2
q3   a    q4
q3   b    q3
q4   a    q3
q4   b    q2
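To experiment with the automaton, the transition table can be simulated directly. This is a minimal Python sketch, not part of the original answer; the names DELTA, ACCEPTING and accepts are illustrative, and the dead state q1 is assumed to loop on both symbols, as its description implies.

```python
# Transition table of the DFA described above.
# Assumption: q1 is a true dead state, so it loops on both a and b.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q1", ("q1", "b"): "q1",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q4", ("q3", "b"): "q3",
    ("q4", "a"): "q3", ("q4", "b"): "q2",
}
ACCEPTING = {"q2"}


def accepts(word: str) -> bool:
    """Run the DFA on a word over {a, b}; accept if it stops in q2."""
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


if __name__ == "__main__":
    for w in ["", "b", "bab", "baab", "babab", "abb", "bba"]:
        print(repr(w), accepts(w))
```

Feeding it a handful of short words is a quick way to spot a wrong transition before deriving the regular expression.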
To get the regular expression we can find regular expressions leading to each state, iteratively, and then take the union of regular expressions for accepting states. In this case, only q2 is accepting so all we need is a regular expression for that state. We proceed iteratively, substituting at each stage.
round 0
(q0): e
(q1): (q0)a + (q1)(a+b)
(q2): (q0)b + (q2)b + (q4)b
(q3): (q2)a + (q3)b + (q4)a
(q4): (q3)a
round 1
(q0): e
(q1): a + (q1)(a+b) = a(a+b)*
(q2): b + (q2)b + (q4)b = (b+(q4)b)b*
(q3): (q2)a + (q3)b + (q4)a = ((q2)+(q4))ab*
(q4): (q3)a
round 2
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): ((q2)+(q3)a)ab* = (q2)ab* + (q3)aab* = (q2)ab*(aab*)*
(q4): (q3)a
round 3
(q0): e
(q1): a(a+b)*
(q2): (b+(q3)ab)b*
(q3): (b+(q3)ab)b*ab*(aab*)* = bb*ab*(aab*)*+(q3)abb*ab*(aab*)* = bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): (q3)a
round 4
(q0): e
(q1): a(a+b)*
(q2): (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
(q3): bb*ab*(aab*)*(abb*ab*(aab*)*)*
(q4): bb*ab*(aab*)*(abb*ab*(aab*)*)*a
Therefore, a regular expression is this:
r = (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
= bb* + bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*
The bb* part encodes the fact that any nonempty string of b is a string in the language.
The other part begins and ends with bb*, which encodes the fact that any string must begin and end with b.
The outermost as encode the fact that any string in the language containing an a must have both a first and a last a.
The aab* parts allow contiguous pairs of a.
The abb*ab* part allows non-contiguous pairs of a.
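One way to gain confidence in the derived expression is to compare it with a direct statement of the language on every short string. The sketch below is an illustration, not part of the original answer: it writes the union + as | for Python's re module, and the names REGEX, in_language and mismatches are assumptions made for the example.

```python
import re
from itertools import product

# The derived expression, with union '+' written as '|' for Python's re module.
REGEX = re.compile(r"bb*|bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*")


def in_language(word: str) -> bool:
    """Direct definition: starts and ends with b, even number of a."""
    return (
        word.startswith("b")
        and word.endswith("b")
        and word.count("a") % 2 == 0
    )


def mismatches(max_len: int) -> list:
    """Return every word over {a, b} up to max_len where the two tests disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if (REGEX.fullmatch(word) is not None) != in_language(word):
                bad.append(word)
    return bad


if __name__ == "__main__":
    print(mismatches(10))
```

An exhaustive check up to a fixed length is not a proof, but a mistake in the substitution steps would normally show up on very short strings.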
As a final note, the rules for replacing expressions as above are the following:
If
A: r          (r is an expression)
B: As         (s is an expression)
then
A: r
B: rs
If
A: r + As     (r, s are expressions)
then
A: rs*

Good evening! Check this out:
b(aa)*b
This generates strings that start and end with b and contain, if any, a block of a whose length is a multiple of 2, i.e. an even number of a.

Related

How to classify my binary classified data in excel in pair and unpair rows

I want my data arranged so that one class falls on the even rows and the other class on the odd rows, alternating through all the data. Here is a sample input and the expected output:
Text  Gender  Sorted Input
BB    M       BB    M
AA    F       AA    F
CC    F       DD    M
DD    M       CC    F
AB    F       CD    M
BA    F       AB    F
CD    M       DC    M
DC    M       BA    F
where the last two columns are the expected result, alternating evenly between M and F. A valid solution could also start from F instead of M. The Text column is irrelevant to the sorting algorithm. It is only required as part of the output to show the input in sorted order.
This adapts the idea from my answer (for Microsoft Office 365) to the question: Is there a way to sort a list so that rows with the same value in one column are evenly distributed? In cell E2, enter the following formula:
=LET(groupSize, 2,
  sorted, SORT(HSTACK(A2:B9, XMATCH(B2:B9,{"M","F"})), 3),
  sInput, FILTER(sorted, {1,1,0}),
  sGenderNum, INDEX(sorted,,3),
  seq0, SEQUENCE(ROWS(sGenderNum),,0),
  mapResult, MAP(sGenderNum, seq0, LAMBDA(a,b, IF(b=0, "SAME",
    IF(a=INDEX(sGenderNum,b), "SAME", "NEW")))),
  factor, SCAN(-1, mapResult, LAMBDA(aa,c, IF(c="SAME", aa+1, 0))),
  pos, MAP(sGenderNum, factor, LAMBDA(m,n, m + groupSize*n)),
  SORTBY(sInput, pos)
)
Simplification 1: There is no need to add a numeric column on the fly to represent the gender, but the formula from the previous question needs to change slightly. It is enough to add a SWITCH statement in the last MAP function. If you want to start with F, interchange the letters in the SWITCH statement or the associated numbers.
=LET(groupSize, 2, sorted, SORT(A2:B9,2), sGender,INDEX(sorted,,2),
seq0, SEQUENCE(ROWS(sGender),,0),
mapResult,MAP(sGender, seq0, LAMBDA(a,b, IF(b=0, "SAME",
IF(a=INDEX(sGender,b), "SAME", "NEW")))),
factor, SCAN(-1,mapResult, LAMBDA(aa,c,IF(c="SAME", aa+1,0))),
pos,MAP(sGender, factor, LAMBDA(m,n, SWITCH(m, "M",1, "F",2) + groupSize*n)),
SORTBY(sorted,pos)
)
Simplification 2: In this particular case, after sorting the gender in ascending order, there is only one change, from F to M. So we can remove the first MAP of the previous solution and instead find where this change happens (changeIdx). It can be simplified as follows:
=LET(groupSize, 2, sorted, SORT(A2:B9,2), sGender,INDEX(sorted,,2),
changeIdx, XMATCH("M", sGender), seq, SEQUENCE(ROWS(sGender)),
factor, SCAN(-1,seq, LAMBDA(aa,c, IF(c= changeIdx,0, aa+1))),
pos, MAP(sGender, factor, LAMBDA(m,n, SWITCH(m, "M",1, "F",2) + groupSize*n)),
SORTBY(sorted,pos))
The previous approach works only for the binary case (two values for the gender). See the Disclaimer note at the end.
Explanation
First Formula
Please check the referred answer to understand the logic. If you want to start with Female (F), interchange the letters M, F in XMATCH. The mentioned solution requires numbers instead of letters in the column to sort, so I adapted the input by adding a column on the fly via HSTACK, holding the equivalent numbers representing the gender (1, 2). The column representing the numbers is the following:
XMATCH(B2:B9,{"M","F"})
Disclaimer: This is a simpler case than the referred question, so there may be easier ways to do it. Because this is just a particular case, it is easier to adapt the existing solution than to start from scratch, and we can guarantee it works. If I have time I will try to simplify it, but so far it is good enough.
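The position key used in the formulas (gender number plus groupSize times the running count within that gender) can also be seen outside Excel. The following Python sketch is only an illustration of that idea under stated assumptions: the function name interleave and the parameter order are hypothetical, and it counts occurrences in input order instead of sorting first.

```python
def interleave(rows, order=("M", "F")):
    """Reorder (text, gender) rows so the genders alternate in `order`.

    Each row gets the key rank + group_size * n, where rank is the
    1-based position of its gender in `order` (like XMATCH) and n is
    how many rows of that gender came before it (like `factor`).
    """
    group_size = len(order)
    rank = {gender: i + 1 for i, gender in enumerate(order)}
    seen = {gender: 0 for gender in order}
    keyed = []
    for text, gender in rows:
        pos = rank[gender] + group_size * seen[gender]
        seen[gender] += 1
        keyed.append((pos, text, gender))
    keyed.sort(key=lambda item: item[0])
    return [(text, gender) for _, text, gender in keyed]


if __name__ == "__main__":
    data = [("BB", "M"), ("AA", "F"), ("CC", "F"), ("DD", "M"),
            ("AB", "F"), ("BA", "F"), ("CD", "M"), ("DC", "M")]
    for row in interleave(data):
        print(row)
```

Passing order=("F", "M") starts with F instead, which mirrors interchanging the letters in XMATCH or SWITCH.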

Unique numbers with missing digits

I have this problem that I must solve in time that is polynomial in N, K and D given below:
Let N, K be some natural numbers.
Then a, b, c ... are N numbers of exactly K digits each.
a, b, c ... contain only the digits 1 and 2 in some order given by the input.
However, only D digits of each number are visible, the rest being hidden (the hidden digits are denoted by the character "?").
A number may have one or more of a, b, c ... as its generalization:
e.g.
2?122 is a generalization for 21122 and 22122
2?122 is not a generalization for 11111
12??? and ?21?? are both generalizations for 12112
???22 and ???11 cannot be generalizations of the same number
Basically, some number is a generalization of the other if the latter can be one of the "unhidden" versions of the former.
Question:
How many different numbers are there such that at least one of a, b, c ... is their generalization?
Quick Reminder:
N = nº of numbers
K = nº of digits in each number
D = nº of visible digits in each number
Conditions & Limitations:
N, K, D are natural numbers
1 ≤ N
1 ≤ D < K
Input / Output snippets for verification of the algorithm:
Input:
N = 3, K = 5, D = 3
112??
?122?
1?2?1
Output:
8
Explanation:
The numbers are 11211, 11212, 11221, 11222, 12211, 12221, 21221, 21222, which are 8 numbers.
11211, 11212, 11221, 11222 are generalized by 112??
11221, 11222, 21221, 21222 are generalized by ?122?
11211, 11221, 12211, 12221 are generalized by 1?2?1
Input:
N = 2, K = 3, D = 1
1??
?2?
Output:
6
Explanation:
The numbers are 111, 112, 121, 122, 221, 222, which are 6 numbers.
From my calculations, I found out that there are 2^(K-D) possible numbers in total that have a as their generalization, 2^(K-D) possible numbers in total that have b as their generalization etc., leaving me with N*2^(K-D) numbers.
My big problem is that I found cases where a number has multiple generalizations and therefore it repeats inside N*2^(K-D), so the real nº of different numbers will be, in this case, something smaller than N*2^(K-D).
I don't know how to find only the different numbers and I need your help.
Thank you very much!
EDIT: Answer to «n. 1.8e9-where's-my-share m.»'s question from the comments:
If you have two generalisations, can you find out how many numbers they both generalise?
For two given general numbers a and b (general meaning that they both contain "?"), it is possible to find the nº of numbers generalised by both a and b in polynomial time by using the following logic:
1 - we declare some variable q = 1
2 - we start "scanning" the digits of the two numbers simultaneously from left to right:
2.1 - if we find two unhidden digits and they are different, then no numbers are generalized by both a and b and we return 0
2.2 - if we find two hidden digits, then we multiply q by 2: a number generalized by both a and b can have either 1 or 2 in that position, so each such position doubles the count, as long as step 2.1 is never true
2.3 - if exactly one of the two digits is hidden, the visible digit fixes that position and q is unchanged
3 - if we scanned all the digits and step 2.1 was never true, then we return q
Therefore, the nº of numbers both a and b generalize is 0 or q (a power of 2), according to the cases presented above.
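The scan can be written out in a few lines. This is a sketch under the assumption that patterns are strings over "1", "2" and "?"; the function name common_count is illustrative. It keeps a running product that doubles for every position hidden in both patterns and returns that product.

```python
def common_count(a: str, b: str) -> int:
    """Count the numbers generalized by both patterns a and b."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of digits")
    count = 1
    for x, y in zip(a, b):
        if x == "?" and y == "?":
            # Free position: either digit works, so the count doubles.
            count *= 2
        elif x != "?" and y != "?" and x != y:
            # Two visible digits disagree: nothing is generalized by both.
            return 0
        # Otherwise at least one digit is visible and fixes the position.
    return count


if __name__ == "__main__":
    print(common_count("112??", "?122?"))
    print(common_count("???22", "???11"))
```

The loop touches each of the K positions once, so a single pair costs O(K).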
Unfortunately, this is impossible to do in polynomial time (unless P=NP, and maybe not even then). Your problem is equivalent to counting the satisfying assignments of a formula in Disjunctive Normal Form, known as the DNF counting problem. DNF counting is #P-hard, so a polynomial-time solution could be used to solve all problems in NP in polynomial time too (and more).
To see the relationship, note that each pattern is equivalent to an AND of several literals. If you take '1' in a position to be a literal and '2' in that position to be that literal negated, you can convert each pattern to a conjunctive clause, and the whole set of patterns to the OR of those clauses.
For example:
1 1 2 ? ?
becomes
(x_1 ∧ x_2 ∧ ¬x_3)
? 1 2 2 ?
becomes
(x_2 ∧ ¬x_3 ∧ ¬x_4)
1 ? 2 ? 1
becomes
(x_1 ∧ ¬x_3 ∧ x_5)
The question of how many numbers satisfy at least one of these patterns is exactly the question of how many assignments satisfy at least one of the equivalent clauses.
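For small inputs the count can still be checked by brute force, which is handy for verifying the sample cases. The sketch below enumerates all 2^K candidate numbers, so it is exponential in K and is not a polynomial algorithm; the names matches and count_generalized are illustrative.

```python
from itertools import product


def matches(pattern: str, digits) -> bool:
    """True if the pattern generalizes the number given as a digit sequence."""
    return all(p == "?" or p == d for p, d in zip(pattern, digits))


def count_generalized(patterns) -> int:
    """Count K-digit numbers over {1, 2} generalized by at least one pattern.

    Equivalent to counting assignments that satisfy an OR of AND-clauses.
    Exponential in K: for checking small examples only.
    """
    k = len(patterns[0])
    return sum(
        1
        for digits in product("12", repeat=k)
        if any(matches(p, digits) for p in patterns)
    )


if __name__ == "__main__":
    print(count_generalized(["112??", "?122?", "1?2?1"]))
    print(count_generalized(["1??", "?2?"]))
```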

How to produce a table of three inputs to reach a given output? (Excel model)

I have a very detailed Excel model to calculate the profitability of a project, which we can call P.
The model has been simplified to compute from 3 unrelated variables. I would like to automatically create a table that shows how inputs A, B and C might vary in order to produce a pre-defined level of profitability, P. For instance, if A = 4 & B = 30, then C must = 2 in order for P to equal 20%. Likewise, if A = 5 & B = 25, then C must = 3 in order for P to equal 20%. A and B should be tested at sensible increments, perhaps 8 intervals each.
A laborious (not scalable) equivalent would be to manually define A and B, then goal-seek C to our pre-defined level of P - we'd then repeat for each combination of A and B at the given intervals and record in a two-way table.
I believe a conventional two-way data table would be practical if the model sitting behind the inputs were greatly simplified; unfortunately, this isn't possible.
Thanks to anyone that can lend a hand. Kind regards.
I think the best way to approach this is a VBA macro using the built-in GoalSeek method, something like this (P is in cell D1 and C in cell C1):
' Goal:=20 assumes D1 holds P as the number 20; use 0.2 if it is stored as a fraction
Range("D1").GoalSeek Goal:=20, _
    ChangingCell:=Range("C1")
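The overall loop (fix A and B, solve for C, record the result, repeat) can be sketched outside Excel as well. The Python below is not VBA and does not call Excel; toy_profit is a made-up stand-in for the real model, chosen only so that it reproduces the two examples from the question, and goal_seek is a plain bisection that assumes the target is bracketed by the search interval.

```python
def toy_profit(a: float, b: float, c: float) -> float:
    """Hypothetical stand-in for the spreadsheet model (not the real one)."""
    return 0.06 * a + 0.002 * b - 0.05 * c


def goal_seek(f, target, lo, hi, tol=1e-9, max_iter=200):
    """Bisection: find x in [lo, hi] with f(x) close to target.

    Assumes f is continuous; returns None if the target is not bracketed.
    """
    g_lo = f(lo) - target
    g_hi = f(hi) - target
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        return None
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


def solve_table(a_values, b_values, target, c_lo=0.0, c_hi=10.0):
    """Two-way table: for each (A, B), the C that brings P to the target."""
    return {
        (a, b): goal_seek(lambda c: toy_profit(a, b, c), target, c_lo, c_hi)
        for a in a_values
        for b in b_values
    }


if __name__ == "__main__":
    for key, c in solve_table([4, 5], [25, 30], 0.20).items():
        print(key, c)
```

In a macro, the same nested loop would write A and B into their input cells, call GoalSeek on the P cell, and copy the resulting C into the output table.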

Difference between three cells

I have three cells A, B and C (A=80, B=79.1, C=79.1).
I require cell D to display the 0.9 difference that occurs, and to cover the case where none match:
`If A=B=C = 0
If A=B<>C = difference between A and C
If A=C<>B = difference between A and B
`
OK. I agree very similar to your other question, but rather than letter references I have taken the differences and wrapped them in ABS as I can't see any sign convention making much sense:
=IF(AND(A1=B1,A1=C1),"match",IF(A1=B1,ABS(A1-C1),IF(A1=C1,ABS(B1-A1),IF(B1=C1,ABS(A1-B1),"none"))))
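To make the order of the tests in the nested IF easier to follow, here is the same decision chain as a small Python sketch. It is only a mirror of the formula for reading purposes; the function name difference is illustrative.

```python
def difference(a: float, b: float, c: float):
    """Mirror of the nested IF: report the gap to the odd value out."""
    if a == b and a == c:
        return "match"
    if a == b:
        return abs(a - c)
    if a == c:
        return abs(b - a)
    if b == c:
        return abs(a - b)
    return "none"


if __name__ == "__main__":
    # Floating-point subtraction may show extra digits, so round for display.
    print(round(difference(80, 79.1, 79.1), 10))
```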

String matching on two columns in [R]

I am looking to match multiple string criteria and then subset the rows in R, using grepl to find the match. I have found a nice solution in another post (the code is specific to that post, but you get the idea): subset(GEMA_EO5, grepl(paste(l, collapse="|"),GEMA_EO5$RefSeq_ID))
I am wondering if it is possible to grepl in two columns, instead of just RefSeq_ID as in the example above, whether with grepl or via any other method. In other words, I would like to look for the options in l not just in one column, but in two (or however many). Is this possible?
E.g.: 3 columns, a, b and c. I would like a criterion such that T (rows 3 and 4) is selected, despite the format "T I" in (3,b). It should identify both (4,a) and (3,b), hence the link to the previous question. I want it to look in column a AND column b, not just one or the other.
a b c
A A C P L
V V B W E E
W T I P J G
T W P J
Here's some demo data to show how this works:
set.seed(1234)
dat <- data.frame(A = sample(letters[1:3],10,TRUE),
B = sample(letters[1:3],10,TRUE))
Using [ to subset makes this a lot more clear in my opinion - we can use grepl to give a logical vector based on a match, and use | to combine two tests (on multiple columns). If you wanted a subset of all the rows that contained an 'a' in either column:
dat.a <- dat[with(dat, grepl("a", A)|grepl("a", B)),]
A B
1 b a
2 b a
3 a c
5 a a
9 a a

Resources