Regular expressions and automata - regular-language

I'm studying Regular Expressions by reading Aho's book. I don't understand two of the statements in the book:
Question A:
1(0+1)*1 + 1 : denotes the set of all strings beginning and ending with a 1.
My question: why is + 1 added at the end of the regular expression? Shouldn't 1(0+1)*1 be sufficient?
I'm also having trouble with the following:
Question B:
The set of strings containing only 0's and 1's that have at most one 1 is given as:
0*+0*10*
Can you explain how the solution 0*+0*10* is arrived at, step by step?

As to question a: 1(0+1)*1 does not match the one-character string 1, which begins and ends with 1. One needs a special case for it, which the example does.
As to question b: I cannot speak for the author. However... Any string that contains at most one 1 is a string that either has no 1s or has exactly one 1. Assuming that the alphabet is {0,1}, the former means any string that consists of zero or more 0s, that is, 0*. The latter, with the same assumption, means any string that consists of zero or more 0s followed by one 1 followed by zero or more 0s, that is, 0*10*. Combining these with + (union) yields the example.
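Both claims can be checked by brute force. The following is a minimal sketch, assuming Python's re module, where the textbook union + is written as |; the helper names (all_strings, missed_by_short_form) are made up for this illustration.

```python
import re
from itertools import product


def all_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)


# The textbook union "+" is written "|" in Python's re syntax.
WITHOUT_SPECIAL_CASE = re.compile(r"1(0|1)*1")
BEGINS_AND_ENDS_WITH_1 = re.compile(r"1(0|1)*1|1")
AT_MOST_ONE_1 = re.compile(r"0*|0*10*")


def missed_by_short_form(max_len=6):
    """Strings accepted by 1(0+1)*1 + 1 but rejected by 1(0+1)*1 alone."""
    return [
        s
        for s in all_strings(max_len)
        if BEGINS_AND_ENDS_WITH_1.fullmatch(s)
        and not WITHOUT_SPECIAL_CASE.fullmatch(s)
    ]


print(missed_by_short_form())
```

The only string the shorter expression misses is the single character 1, which is exactly the special case the + 1 term covers.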

For Question a: 1(0+1)*1 denotes strings that begin and end with 1, but it does not include the string 1, which has length one and also begins and ends with 1.
For Question b:
Set of strings containing at most one 1 = A + B where
A is set of all strings containing zero 1s and
B is the set of all strings containing exactly one 1
So A is 0* and B is 0*10*
Hence we get the answer as 0* + 0*10*

For the first example, the string that is accepted by the + 1 but not by the rest is 1. The rest of the expression can handle 11, but not a string whose first and last character are one and the same 1.
It's similar reasoning for the second expression: 0* handles strings of all zeroes, 0*10* handles strings with exactly one 1.

Related

Efficient way to check if string A is contained in string B with at most k errors

Given a string A and a string B (A shorter or the same length as B), I would like to check whether B contains a substring A' such that the Hamming distance between A and A' is at most k.
Does anyone know of an efficient algorithm to do this? Obviously I can just run a sliding window, but this is not feasible for the amount of data I'm working with. The Knuth-Morris-Pratt algorithm (https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm) would work when k=0, but I don't know whether it's modifiable to account for k>0.
Thanks!
Edit: I apparently forgot to clarify that I am looking for a consecutive substring, for example the substring from position 3 to position 7, without skipping characters. So Levenshtein distance is not applicable.
This is what you are looking for : https://en.wikipedia.org/wiki/Levenshtein_distance
If you use the Levenshtein distance and k=1, then you can use the fact that if the length of A is 2n+1 or 2n+2, then either the first n or the last n characters of A must appear unchanged in B.
So you can use strstr to find all places in B where the first or last n characters match exactly, and then check the Levenshtein distance only at those places.
Special case where A is 1 character long: it matches everywhere with one error. Special case where A is the 2 characters ab: call strchr(a); if it fails, call strchr(b).
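The same "some piece must match exactly" idea carries over to the Hamming distance the question asks about: if A is cut into k+1 blocks, at most k of them can contain a mismatch, so at least one block appears unchanged at its own offset. A minimal sketch of that filter-then-verify approach follows; the function names are made up for this illustration, and the worst case is still no better than a sliding window when the blocks are short or very common.

```python
def within_hamming(a, b, start, k):
    """True if a and b[start:start+len(a)] differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b[start:start + len(a)]):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def find_with_mismatches(a, b, k):
    """Start positions in b where a occurs with Hamming distance <= k."""
    m, n = len(a), len(b)
    if m > n:
        return []
    if k >= m:
        # Every window qualifies, so there is nothing to filter.
        return list(range(n - m + 1))
    # Cut a into k+1 non-empty blocks; one of them must match exactly.
    bounds = [i * m // (k + 1) for i in range(k + 2)]
    candidates = set()
    for lo, hi in zip(bounds, bounds[1:]):
        block = a[lo:hi]
        pos = b.find(block)
        while pos != -1:
            start = pos - lo  # where a would begin if this block is aligned
            if 0 <= start <= n - m:
                candidates.add(start)
            pos = b.find(block, pos + 1)
    # Verify only the candidate windows.
    return sorted(s for s in candidates if within_hamming(a, b, s, k))


print(find_with_mismatches("abc", "xabdxabc", 1))
```

Exact block search is done here with str.find; any exact matcher (KMP included) could be swapped in for that step.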

what will be the dp and transitions in this problem

Vasya has a string s of length n consisting only of digits 0 and 1. Also he has an array a of length n.
Vasya performs the following operation until the string becomes empty: choose some consecutive substring of equal characters, erase it from the string and glue together the remaining parts (any of them can be empty). For example, if he erases substring 111 from string 111110 he will get the string 110. Vasya gets a_x points for erasing a substring of length x.
Vasya wants to maximize his total points, so help him with this!
https://codeforces.com/problemset/problem/1107/E
I was trying to get my head around the editorial, but couldn't understand it. Can anyone explain an easy way to do it?
input:
7
1101001
3 4 9 100 1 2 3
output:
109
Explanation
the optimal sequence of erasings is: 1101001 → 111001 → 11101 → 1111 → ∅.
Here, we consider removing prefixes instead of substrings. Why?
We try to remove a consecutive prefix of a particular state which is actually a substring in the main string. So, our DP states will be start index, end index, prefix length.
Let's consider an example, str = "1010110". Initially start=0, end=7 and prefix=1 (the first '1' is the only character in the prefix so far). We iterate over all indices in the current state except the starting index and check whether str[i]==str[start]. Here, for example, str[4]==str[0]. We then split the string into "010" with prefix=1 (indices 1 to 3) and "110" with prefix=2 (indices 4 to 6, with the 1 from index 0 glued onto the 1 at index 4). These are now two independent subproblems. When a string of length 1 remains, we return a[prefix].
Here is my code.
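The poster's own code is not shown here. As a stand-in, this is a minimal sketch of the DP described above, under these assumptions: end is exclusive, prefix counts the block of equal characters that currently ends at str[start] (so it starts at 1), and the array a is passed as a 0-indexed Python list, so a_x is a[x - 1]. The names max_points and best are illustrative.

```python
from functools import lru_cache


def max_points(s, a):
    """Maximum score for erasing all of s, where a[x - 1] is the score a_x."""

    @lru_cache(maxsize=None)
    def best(start, end, prefix):
        # State: substring s[start:end], where s[start] is the last of
        # `prefix` equal characters that have been glued together.
        if start >= end:
            return 0
        # Option 1: erase the glued block of length `prefix` right now.
        result = a[prefix - 1] + best(start + 1, end, 1)
        # Option 2: first clear s[start+1:i], so that s[start] becomes
        # adjacent to an equal character s[i] and the block grows by one.
        for i in range(start + 1, end):
            if s[i] == s[start]:
                result = max(result, best(start + 1, i, 1) + best(i, end, prefix + 1))
        return result

    return best(0, len(s), 1)


print(max_points("1101001", [3, 4, 9, 100, 1, 2, 3]))
```

There are O(n^3) states with O(n) transitions each, which is workable for the small n this kind of problem uses.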

Why is count() method giving strange answer?

s = "ANANAS"
print(s.count("ANA"))
print(s.count("AN"))
print(s.count("A"))
"ANA" occurs two times in "ANANAS" but python prints 1 whereas
"AN" occurs two times and python prints 2. "A" occurs three times and python prints 3 as output. Why is this strange behaviour?
Straight from the documentation:
str.count(sub[, start[, end]])
Return the number of non-overlapping
occurrences of substring sub in the range [start, end]. Optional
arguments start and end are interpreted as in slice notation.
The two occurrences of "ANA" in "ANANAS" overlap, hence s.count("ANA") only returns 1.
This is because the substring ANA is only counted twice if the string is something like "testANAANAAN", i.e. it contains two full, separate occurrences of ANA.
In your case, once the first full occurrence has been counted, that part of the string is not used again; the search for the next match continues in the rest of the string.
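If overlapping occurrences are what you want to count, one option is to restart the search one character after each match instead of after the end of the match. A minimal sketch (count_overlapping is a made-up helper name, not a built-in):

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, allowing matches to overlap."""
    count = 0
    pos = s.find(sub)
    while pos != -1:
        count += 1
        # Resume one character later, not after the whole match.
        pos = s.find(sub, pos + 1)
    return count


s = "ANANAS"
print(s.count("ANA"))               # non-overlapping count
print(count_overlapping(s, "ANA"))  # overlapping count
```

For substrings that cannot overlap with themselves, such as "AN", both counts agree.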

Deterministic automata to find number of subsequence in string of another string

How can I construct a DFA to count the number of times one string occurs as a subsequence of another string?
E.g. in "ssstttrrriiinnngggg" we have 3 subsequences which form the string "string".
Both the string to be found and the string to be searched contain only characters from a specific character set.
I have some idea about storing characters in a stack and popping them as we match, pushing them again if they don't match.
Can you describe a DFA solution?
OVERLAPPING MATCHES
If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.
1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7
and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.
DP[a][b] = number of ways of being in state b after seeing the first a characters
= DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=1.
Then the total number of overlapping strings is DP[len(string)][7]
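A minimal sketch of this recurrence, assuming 0-based states (state b means the first b pattern characters have been matched) and a single row updated in place instead of the full DP table; count_subsequences is an illustrative name.

```python
def count_subsequences(text, pattern):
    """Number of ways to pick pattern out of text as a subsequence."""
    m = len(pattern)
    # ways[b] = number of ways of having matched the first b pattern
    # characters using the text seen so far.
    ways = [0] * (m + 1)
    ways[0] = 1
    for ch in text:
        # Go from high states to low so each text character is used
        # at most once per way.
        for b in range(m, 0, -1):
            if pattern[b - 1] == ch:
                ways[b] += ways[b - 1]
    return ways[m]


print(count_subsequences("ssstttrrriiinnngggg", "string"))
```

For the example string this counts every choice of one s, one t, one r, one i, one n and one g, so the result is much larger than 3.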
NON-OVERLAPPING MATCHES
If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:
DP[a][b] = number of strings being in state b after seeing the first a characters
= DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
= DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.
Then the total number of non-overlapping strings is DP[len(string)][7]
This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').
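A minimal sketch of the non-overlapping variant, assuming 0-based states and distinct pattern characters as stated above. It keeps one counter per state and moves a single "token" forward whenever the needed character is seen; the function name is made up for this illustration.

```python
import math


def count_disjoint_subsequences(text, pattern):
    """Count non-overlapping copies of pattern found as subsequences of text.

    Only intended for patterns whose characters are all distinct.
    """
    if len(set(pattern)) != len(pattern):
        raise ValueError("pattern characters must be distinct")
    # state_of[ch] = the state that ch advances from (0-based).
    state_of = {ch: i for i, ch in enumerate(pattern)}
    # tokens[b] = partial matches currently waiting in state b.
    tokens = [0] * (len(pattern) + 1)
    tokens[0] = math.inf  # unlimited supply of fresh matches
    for ch in text:
        b = state_of.get(ch)
        if b is not None and tokens[b] > 0:
            tokens[b] -= 1      # one partial match leaves state b
            tokens[b + 1] += 1  # and arrives in state b + 1
    return tokens[-1]


print(count_disjoint_subsequences("ssstttrrriiinnngggg", "string"))
```

On the example from the question this yields 3, matching the count the asker expected.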

RE: Odd length string over { 0, 1} that contains exactly two 0's [closed]

I just started learning about formal lang and automata theory, and recently learned about regex, so I don't know any complicated symbols, so please stick with basic symbols.
The question is: Write a regex for the following language over {0, 1} that is a set of all odd length strings that contain exactly two 0s.
I've got the first part finished (the odd part), it should be:
(0+1)[(0+1)(0+1)]* ( + is the same as | (or) I believe, we learnt it as +)
However, when I think about having exactly two 0s it gets really messed up. I can only see that I can use * with 1 only since # of 0s are limited to 2.
But if I do (11)* , I can't get the permutation of 0s inside the 1s. (e.g. can't get 10101 with (11)*).
What I know:
Only 1s can use *
In the regex only two 0s will be used
The way to make odd length is to add an odd length to an even length (even length needs to have the empty string within its set)
Odd length should not use * since 2 odd = even, so only even length can use *.
For possible hints or answer, please use 0, 1, +/|, *, (, ) only. Some other expressions I will not be able to understand.
Regular language over {0, 1} that is the set of all odd-length strings that contain exactly two 0s.
What does this language mean?
Note that a string in the language consists of two 0s and any number of 1s such that the total length of the string is odd. There is no other restriction: 1s and 0s can appear in any order and in any pattern.
As we know, even + odd = odd. Since the number of 0s in the string is two (even), the string must contain an odd number of 1s, and so has length at least three.
So the regular expression should be something like A 0 B 0 C, where A, B and C are substrings consisting only of 1s and the total number of 1s in A, B and C is odd; hence they can't all be the empty string.
Now, because the total number of 1s in A, B and C is odd, either (1) two of them have even length and one has odd length, or (2) all three have odd length.
Note: an odd-length string can't be empty.
Regular Expression:
1(11)*01(11)*01(11)* + 1(11)*0(11)*0(11)* + (11)*01(11)*0(11)* + (11)*0(11)*01(11)*
// terms in order: (1) A, B, C all odd; (2) A odd, B and C even; (3) B odd, A and C even; (4) C odd, A and B even
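The expression can be checked against the definition of the language by brute force. A minimal sketch, assuming Python's re module with the textbook + rewritten as |; the names TEXTBOOK, in_language and disagreements are illustrative.

```python
import re
from itertools import product

TEXTBOOK = "1(11)*01(11)*01(11)* + 1(11)*0(11)*0(11)* + (11)*01(11)*0(11)* + (11)*0(11)*01(11)*"
# Drop the spaces and turn the textbook union "+" into re's "|".
PATTERN = re.compile(TEXTBOOK.replace(" ", "").replace("+", "|"))


def in_language(s):
    """Direct definition: odd length and exactly two 0s."""
    return len(s) % 2 == 1 and s.count("0") == 2


def disagreements(max_len=9):
    """Strings up to max_len on which the regex and the definition differ."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            s = "".join(chars)
            if bool(PATTERN.fullmatch(s)) != in_language(s):
                bad.append(s)
    return bad


print(disagreements())
```

An empty list means the two agree on every string up to the tested length; it is a sanity check, not a proof.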

Resources