If the strings of L consist of 0's only, prove that L* is regular

This is question 4.2.10 from Introduction to Automata Theory by Hopcroft and Ullman. Note that the original language L may itself be non-regular.
Let's say L = {0^(2^n+5) | n >= 0}. How would you prove that L* is regular? And what about the more general case, where the exponent is given by an arbitrary function f(n)?

Suppose that L contains two strings 0^n and 0^m and that n and m share no common factors: they are relatively prime. Then, by concatenating some number of instances of 0^n with some number of instances of 0^m, any string of length at least (n - 1)(m - 1) can be formed. Since L* therefore excludes only a finite number of words, the complement (L*)' is finite, hence regular; because regular languages are closed under complement, L* must be regular too.
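Where did (n - 1)(m - 1) come from? Well, it comes from the coin problem, in the special case of two coin denominations, for which we have a closed-form solution: the largest amount that cannot be formed from relatively prime n and m is nm - n - m, so every amount from nm - n - m + 1 = (n - 1)(m - 1) upward can be formed. You should be able to research this and find some proofs.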
What about the case where the lengths of all strings in L are divisible by some GCD, say g? Well, the proof of regularity is quite similar; consider a modified alphabet where 0 is replaced by the symbol (0^g) and then prove the analogous language over this alphabet is regular as above. In other words, you can show that L* contains only strings whose lengths are divisible by g, and that it contains every string whose length is divisible by g and at least g(n/g - 1)(m/g - 1), where n and m have GCD g. The language is regular because it differs from the regular language (0^g)* by only finitely many words.
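As a rough sanity check of the argument above, the following Python sketch computes which lengths occur in L* for the language from the question, using only the first few lengths 2^n + 5. The function name `star_lengths` and the cutoff of 100 are illustrative choices, not part of the original answer.

```python
from functools import reduce
from math import gcd


def star_lengths(lengths, limit):
    """Return the set of k <= limit such that 0^k is in L*,
    where L = {0^n : n in lengths} and every n is positive."""
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty word is always in L*
    for k in range(1, limit + 1):
        # 0^k is in L* if removing one word of L leaves a word of L*
        reachable[k] = any(n <= k and reachable[k - n] for n in lengths)
    return {k for k, ok in enumerate(reachable) if ok}


# First few lengths 2^n + 5 for n = 0..4: 6, 7, 9, 13, 21
lengths = [2**n + 5 for n in range(5)]
g = reduce(gcd, lengths)
in_star = star_lengths(lengths, 100)
# Multiples of g (up to the cutoff) that are NOT lengths of words in L*
missing = [k for k in range(101) if k % g == 0 and k not in in_star]
print(g, missing)
```

Since 6 and 7 are relatively prime, the bound (6 - 1)(7 - 1) = 30 already guarantees that every length from 30 upward is present, so the list of missing lengths is finite.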

Related

Regular or context-free or other

I have a problem that asks whether the following language is regular, context-free, or neither.
{a^(2i+3j) | i>0, j>0}
I hesitate to call it a regular or context-free language because it seems to have no pattern.
The strings in the language are the following:
a^5 = aaaaa is in the language, choosing i = j = 1
a^7 = aaaaaaa is in the language, choosing i = 2, j = 1
a^8 = aaaaaaaa is in the language, choosing i = 1 and j = 2
a^k is in the language for all k > 8
assume k is even. then choose j = 2 and choose i = (k - 6) / 2
assume k is odd. then choose j = 1 and choose i = (k - 3) / 2
The only strings not in the language are the following:
L' = {e, a, aa, aaa, aaaa, aaaaaa}
The complement of our language is finite, thus regular; and by closure properties we know our language must be regular, since the complement of a regular language is regular.
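The case analysis above can be checked by brute force. This is a minimal sketch; the helper name `in_language` and the search range of 50 are assumptions made for illustration.

```python
def in_language(k):
    """True if k = 2*i + 3*j for some integers i > 0 and j > 0."""
    # Try every j >= 1; the remainder k - 3*j must be a positive even number.
    return any(
        (k - 3 * j) > 0 and (k - 3 * j) % 2 == 0
        for j in range(1, k // 3 + 1)
    )


# Exponents k < 50 for which a^k is NOT in the language
excluded = [k for k in range(50) if not in_language(k)]
print(excluded)
```

The excluded exponents correspond to the strings listed in L'.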
We can solve this by constructing an automaton. Doing this with a non-deterministic automaton is easier, so that will be the more general solution.
NFA
Non-deterministic finite automata are equivalent to DFAs, and both recognize exactly the regular languages.
To construct such an automaton, you simply concatenate an automaton that has to see an even, non-zero number of as to an automaton that has to see a non-zero multiple of 3 as, keeping only the last state of the latter as accepting.
Basically, the non-determinism of moving between those two automata (or staying in the first one) checks whether the number of as can be expressed as 2i+3j.
Specific solution
A more specific solution uses the fact that gcd(2,3)=1, which means that there is some N (here N=6) such that the automaton accepts every n>N.
We also note that n=5 is accepted. We can now construct a DFA that counts the as it reads, accepts after exactly 5 as, and after more than 6 as stays in an accepting "sink" state.
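The DFA described above can be simulated in a few lines. This is only a sketch: the state numbering (states 0 to 7, where state 7 plays the role of the accepting sink) is an assumption chosen for illustration.

```python
def dfa_accepts(word):
    """Simulate a DFA over {a} whose state counts the as read so far,
    capped at 7. State 7 is the accepting sink ("more than 6 as")."""
    state = 0
    for ch in word:
        if ch != "a":
            return False  # symbols outside the alphabet are rejected
        state = min(state + 1, 7)
    # Accept exactly 5 as, or anything that reached the sink.
    return state in {5, 7}


rejected = [k for k in range(20) if not dfa_accepts("a" * k)]
print(rejected)
```

The rejected lengths should coincide with the finite complement L' found earlier.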

Proof of reverse binary strings?

If w : {1...L} → {0,1} is a binary string, the complement of w, denoted wC, is the string of length L defined by wC(i) = 1 - w(i). The reverse of w, denoted wR, is the string of length L defined by wR(i) = w(L + 1 - i). Use these definitions to give a careful proof that, for every binary string x, (xC)R = (xR)C.
I have no idea how to start this question. I don't really want a direct answer; I'd like to learn how to do this question by induction for future questions.
If the solution I see is the simplest one, then it's a quite comprehensive exercise.
I suggest you start by proving the following lemmas:
Lemma 1: (w0)C=(wC)1
Lemma 2: (w0)R=0(wR)
Both Lemma 1 and 2 can be proven by induction on the length of w. Doing it strictly by the given rules is tedious, but not very hard.
Argue that the following lemmas hold as well, by the same argument:
Lemma 1b: (w1)C=wC0
Lemma 2b: (w1)R=1(wR)
With those lemmas in place, you should be able to tackle the original problem of showing (xC)R=(xR)C.
Do an induction over L (i.e. the length of the word). The base case should be trivial. In the inductive step you'll end up with something like
Induction hypothesis
(uC)R=(uR)C.
Left to show:
((u0)C)R=((u0)R)C
and (by analogy)
((u1)C)R=((u1)R)C
Solving this step will involve the lemmas above.
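Before writing out the induction, it can help to test the lemmas and the main claim on all short strings. The sketch below is not a proof; it only checks strings up to a chosen length, and the function names are illustrative.

```python
from itertools import product


def complement(w):
    """Flip every bit of the binary string w."""
    return "".join("1" if c == "0" else "0" for c in w)


def reverse(w):
    """Reverse the binary string w."""
    return w[::-1]


def check_up_to(max_len):
    """Check Lemmas 1, 1b, 2, 2b and (xC)R = (xR)C for every
    binary string of length at most max_len."""
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            w = "".join(bits)
            assert complement(w + "0") == complement(w) + "1"  # Lemma 1
            assert complement(w + "1") == complement(w) + "0"  # Lemma 1b
            assert reverse(w + "0") == "0" + reverse(w)        # Lemma 2
            assert reverse(w + "1") == "1" + reverse(w)        # Lemma 2b
            assert reverse(complement(w)) == complement(reverse(w))
    return True


print(check_up_to(10))
```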

Stuck with Apostolico-Crochemore algorithm

I am trying to understand the Apostolico-Crochemore algorithm.
The only English description I have found is http://www-igm.univ-mlv.fr/~lecroq/string/node12.html#SECTION00120, but I am stuck with the second line of the description where it says
x is a power of a single character
What does that mean?
m in this case is the length of the pattern, c is a character from the alphabet in use. I can't understand how x == c^m.
This is then followed by x = (a^l)bu for a, b in Sigma, u in Sigma* and a neq b, which also uses the ^ operation that I cannot understand.
Algorithms on strings are sometimes described in the jargon of formal languages, where concatenation (joining) of strings is written as multiplication: x * y, usually written just xy, means "the string x followed by the string y". So x^n (i.e. "raising the string x to the nth power") naturally means "n copies of the string x, joined together".
This is mostly just a notational device, though multiplication (of ordinary real numbers) and string concatenation do share some abstract mathematical properties. E.g. they are both associative: (xy)z = x(yz), whether we're talking about multiplying numbers or joining strings. (OTOH, xy = yx for real numbers but not for strings, in general. But then matrix multiplication is not commutative either.)
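In Python, string repetition with `*` happens to match this "power" notation, so the two forms from the question can be illustrated directly. The helper names below are made up for this sketch, and treating the empty string as not being of the form c^m is an assumption.

```python
def power(x, n):
    """x^n in formal-language notation: n copies of x joined together."""
    return x * n


def is_power_of_single_char(x):
    """True if x = c^m for some character c and m = len(x) >= 1."""
    return len(x) > 0 and x == x[0] * len(x)


def split_leading_run(x):
    """Write x as a^l b u with a != b and return (a, l, b, u).
    Returns None when x is empty or a power of a single character."""
    if not x or is_power_of_single_char(x):
        return None
    a = x[0]
    run = len(x) - len(x.lstrip(a))  # length l of the leading run of a
    return a, run, x[run], x[run + 1:]


print(power("ab", 3))               # three copies of "ab"
print(is_power_of_single_char("aaaa"))
print(split_leading_run("aaabab"))

# Concatenation is associative but not commutative.
x, y, z = "ab", "c", "de"
assert (x + y) + z == x + (y + z)
assert x + y != y + x
```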

Finding the minimum number of swaps to convert one string to another, where the strings may have repeated characters

I was looking through a programming question, when the following question suddenly seemed related.
How do you convert one string to another using as few swaps as possible, where a swap is defined as below? The strings are guaranteed to be interconvertible (they contain the same characters, this is given), but the characters can be repeated. I saw web results on the same question, but without repeated characters.
Any two characters in the string can be swapped.
For instance : "aabbccdd" can be converted to "ddbbccaa" in two swaps, and "abcc" can be converted to "accb" in one swap.
Thanks!
This is an expanded and corrected version of Subhasis's answer.
Formally, the problem is: given an n-letter alphabet V and two m-letter words, x and y, for which there exists a permutation p such that p(x) = y, determine the least number of swaps (permutations that fix all but two elements) whose composition q satisfies q(x) = y. Assuming that m-letter words are maps from the set {1, ..., m} to V and that p and q are permutations on {1, ..., m}, the action p(x) is defined as the composition p followed by x.
The least number of swaps whose composition is p can be expressed in terms of the cycle decomposition of p. When j_1, ..., j_k are pairwise distinct in {1, ..., m}, the cycle (j_1 ... j_k) is a permutation that maps j_i to j_(i+1) for i in {1, ..., k - 1}, maps j_k to j_1, and maps every other element to itself. The permutation p is the composition of every distinct cycle (j p(j) p(p(j)) ... j'), where j is arbitrary and p(j') = j. The order of composition does not matter, since each element appears in exactly one of the composed cycles. A k-element cycle (j_1 ... j_k) can be written as the product (j_1 j_k) (j_1 j_(k-1)) ... (j_1 j_2) of k - 1 swaps. In general, every permutation can be written as a composition of m - c swaps, where c is the number of cycles in its cycle decomposition (counting fixed points as cycles of length 1). A straightforward induction proof shows that this is optimal.
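The count "m minus the number of cycles" for a single, fixed permutation can be computed as follows. This sketch uses 0-based indices and represents p as a list, both of which are conventions chosen here rather than taken from the answer.

```python
def min_swaps_for_permutation(p):
    """Least number of swaps whose composition is the permutation p,
    given as a list where p[i] is the image of i (0-based).
    Equals len(p) minus the number of cycles, fixed points included."""
    seen = [False] * len(p)
    cycles = 0
    for start in range(len(p)):
        if seen[start]:
            continue
        cycles += 1
        j = start
        while not seen[j]:  # walk the cycle containing start
            seen[j] = True
            j = p[j]
    return len(p) - cycles


print(min_swaps_for_permutation([1, 2, 0]))     # one 3-cycle
print(min_swaps_for_permutation([1, 0, 3, 2]))  # two 2-cycles
```

Note that this handles one given permutation only; with repeated characters, the difficulty discussed in this answer is choosing which permutation q to use.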
Now we get to the heart of Subhasis's answer. Instances of the asker's problem correspond one-to-one with Eulerian (for every vertex, in-degree equals out-degree) digraphs G with vertices V and m arcs labeled 1, ..., m. For j in {1, ..., m}, the arc labeled j goes from y(j) to x(j). The problem in terms of G is to determine the maximum number of parts that a partition of the arcs of G into directed cycles can have. (Since G is Eulerian, such a partition always exists.) This is because the permutations q such that q(x) = y are in one-to-one correspondence with the partitions, as follows. For each cycle (j_1 ... j_k) of q, there is a part whose directed cycle is comprised of the arcs labeled j_1, ..., j_k.
The problem with Subhasis's NP-hardness reduction is that arc-disjoint cycle packing on Eulerian digraphs is a special case of arc-disjoint cycle packing on general digraphs, so an NP-hardness result for the latter has no direct implications for the complexity status of the former. In very recent work (see the citation below), however, it has been shown that, indeed, even the Eulerian special case is NP-hard. Thus, by the correspondence above, the asker's problem is as well.
As Subhasis hints, this problem can be solved in polynomial time when n, the size of the alphabet, is fixed (fixed-parameter tractable). Since there are O(n!) distinguishable cycles when the arcs are unlabeled, we can use dynamic programming on a state space of size O(m^n), the number of distinguishable subgraphs. In practice, that might be sufficient for (let's say) a binary alphabet, but if I were to try to solve this problem exactly on instances with large alphabets, then I likely would try branch and bound, obtaining bounds by using linear programming with column generation to pack cycles fractionally.
@article{DBLP:journals/corr/GutinJSW14,
author = {Gregory Gutin and
Mark Jones and
Bin Sheng and
Magnus Wahlstr{\"o}m},
title = {Parameterized Directed $k$-Chinese Postman Problem and $k$
Arc-Disjoint Cycles Problem on Euler Digraphs},
journal = {CoRR},
volume = {abs/1402.2137},
year = {2014},
ee = {http://arxiv.org/abs/1402.2137},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
You can construct the "difference" strings S and S', i.e. a string which contains the characters at the differing positions of the two strings, e.g. for acbacb and abcabc it will be cbcb and bcbc. Let us say this contains n characters.
You can now construct a "permutation graph" G which will have n nodes and an edge from i to j if S[i] == S'[j]. In the case of all unique characters, it is easy to see that the required number of swaps will be (n - number of cycles in G), which can be found out in O(n) time.
However, in the case where there are any number of duplicate characters, this reduces to the problem of finding out the largest number of cycles in a directed graph, which, I think, is NP-hard, (e.g. check out: http://www.math.ucsd.edu/~jverstra/dcig.pdf ).
In that paper a few greedy algorithms are pointed out, one of which is particularly simple:
At each step, find the minimum-length cycle in the graph (e.g. see "Find cycle of shortest length in a directed graph with positive weights")
Delete it
Repeat until all vertices have been covered.
However, there may be efficient algorithms utilizing the properties of your case (the only one I can think of is that your graphs will be K-partite, where K is the number of unique characters in S). Good luck!
Edit:
Please refer to David's answer for a fuller and correct explanation of the problem.
Do an A* search (see http://en.wikipedia.org/wiki/A-star_search_algorithm for an explanation) for the shortest path through the graph of equivalent strings from one string to the other. Use the Levenshtein distance / 2 as your cost heuristic.
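For small inputs, the search idea above can be tried without any heuristic at all. The following sketch uses plain breadth-first search instead of A* (a deliberate simplification, not the method proposed in the answer); it is exact but exponential in general, so it is only suitable for short strings. The function name is illustrative.

```python
from collections import deque
from itertools import combinations


def min_swaps_bfs(src, dst):
    """Exact minimum number of swaps turning src into dst, found by
    breadth-first search over all strings reachable by swaps."""
    if sorted(src) != sorted(dst):
        raise ValueError("strings are not interconvertible")
    if src == dst:
        return 0
    pairs = list(combinations(range(len(src)), 2))
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        cur, dist = queue.popleft()
        for i, j in pairs:
            if cur[i] == cur[j]:
                continue  # swapping equal characters changes nothing
            chars = list(cur)
            chars[i], chars[j] = chars[j], chars[i]
            nxt = "".join(chars)
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise AssertionError("unreachable: dst is a rearrangement of src")


print(min_swaps_bfs("aabbccdd", "ddbbccaa"))
print(min_swaps_bfs("abcc", "accb"))
```

The two calls use the examples from the question, which states that they need two swaps and one swap respectively.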

Using Closure Properties to prove Regularity

Here's a homework problem:
Is L_4 Regular?
Let L_4 = L*, where L={0^i1^i | i>=1}.
I know L is non-regular and I know that Kleene Star is a closed operation, so my assumption is that L_4 is non-regular.
However, my professor provided an example of the above in which L = {0^p | p is prime}, for which he said L* was regular, proving that L* was equal to L(000* + e) by showing that each was a subset of the other (e in this case means the empty word).
So his method involved forming a regex for L*, but how can I do that when I essentially have one already?
Regular languages are closed under Kleene star. That is, if language R is regular, so is R*.
But the reasoning doesn't work in the other direction: there are nonregular languages P for which P* is actually regular.
You mentioned one such P in your question: the set of strings 0^p where p is prime.
It is easy to use the pumping lemmas for regular and context-free languages to show that P is neither regular nor context-free.
However, P* is equivalent to the language 0^q, where q is the sum of zero or more primes.
But this is true for q=0 (the empty string) and any q>=2, so P* can be recognized with a 3-state DFA, even though P itself is not regular.
So L being non-regular has no bearing on whether your L_4 = L* is regular or not. If you can construct a DFA that recognizes L_4, as I did for P* above, then clearly it's regular.
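The 3-state DFA for P* mentioned above can be checked against a direct computation of which numbers are sums of zero or more primes. This is a sketch with illustrative names; the state numbering is an assumption.

```python
from math import isqrt


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def sums_of_primes(limit):
    """reachable[q] is True iff q is a sum of zero or more primes."""
    primes = [p for p in range(2, limit + 1) if is_prime(p)]
    reachable = [False] * (limit + 1)
    reachable[0] = True  # the empty sum
    for q in range(1, limit + 1):
        reachable[q] = any(p <= q and reachable[q - p] for p in primes)
    return reachable


def dfa_accepts(word):
    """3-state DFA: state 0 = no 0s read, 1 = one 0, 2 = two or more."""
    state = 0
    for ch in word:
        if ch != "0":
            return False
        state = min(state + 1, 2)
    return state != 1  # only the single-0 state rejects


reachable = sums_of_primes(200)
print(all(dfa_accepts("0" * q) == reachable[q] for q in range(201)))
```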
In the process of trying to find a DFA that works, you'll probably see some pattern emerge that can be used as the basis for a pumping argument. The Myhill-Nerode theorem is another approach to proving a language non-regular, and is useful if the language lends itself to analysis of prefixes and distinguishing extensions. If the set of all strings falls into a finite number of equivalence classes under the corresponding relation, then the language can be recognized with a DFA containing that many states.
Edit: For anyone wondering whether OP's example L_4 is regular or not...it's not, which can be proved using the pumping lemma for regular languages.
Assume L_4 is regular, with "pumping length" P. Consider the string w = 0^P 1^P, which is an element of L_4. We need to decompose it into the form w = xyz, with |y| >= 1 and |xy| <= P. For any decomposition fulfilling these conditions, xy consists entirely of zeroes. But then any string w' = x y^n z with n != 1 will have mismatched counts of 0s and 1s, and therefore cannot be an element of L_4. This contradicts the pumping lemma, so L_4 cannot be regular.
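The pumping argument can be illustrated (not proved) for one concrete value of P by enumerating every allowed decomposition of 0^P 1^P and checking that pumping always leaves L_4. The membership test and the choice of exponents 0, 2 and 3 are assumptions made for this sketch.

```python
import re


def in_L4(w):
    """Membership in L_4 = {0^i 1^i : i >= 1}*."""
    # w must be a sequence of blocks 0+1+ ...
    if re.fullmatch(r"(?:0+1+)*", w) is None:
        return False
    # ... and each block must have as many 0s as 1s.
    return all(len(zeros) == len(ones)
               for zeros, ones in re.findall(r"(0+)(1+)", w))


def pumped_strings_still_in_L4(P, exponents=(0, 2, 3)):
    """List every (x, y, z, n) with w = xyz = 0^P 1^P, |xy| <= P,
    |y| >= 1 and n in exponents such that x y^n z is still in L_4."""
    w = "0" * P + "1" * P
    survivors = []
    for xy_len in range(1, P + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            for n in exponents:
                if in_L4(x + y * n + z):
                    survivors.append((x, y, z, n))
    return survivors


print(pumped_strings_still_in_L4(5))
```

An empty result means no decomposition survives pumping for that P, which is what the argument above predicts.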
