L = { w | w contains any number of substrings 00 and 11, with one 1 occurring anywhere in w }
My guess is that, because the 1 can be anywhere, the regular expression should be Σ*001Σ*1Σ*11Σ*. Any thoughts or corrections?
Break the language definition down into parts:
L := { w |
w contains any number of substrings 00 and 11
w contains one "1"
}
The first part doesn't actually mean anything. "Any number of substrings 00 and 11" can include no substrings. It's not saying that the string must contain at least one of them. This is equivalent to Σ*.
The second part says the string must contain 1 somewhere in it: Σ*1Σ*. Both parts must hold, so L = Σ* ∩ Σ*1Σ* = Σ*1Σ*.
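As a quick check of this reading, here is a minimal Python sketch, assuming Σ = {0, 1} and that "one 1" means "at least one 1". The names `in_language` and `matches_guess` are made up for this illustration. It compares Σ*1Σ* with the guess from the question.

```python
import re

# Sigma*1Sigma* over the alphabet {0, 1}: any binary string containing a 1.
AT_LEAST_ONE_1 = re.compile(r"[01]*1[01]*")

# The guess from the question: Sigma*001Sigma*1Sigma*11Sigma*.
ASKER_GUESS = re.compile(r"[01]*001[01]*1[01]*11[01]*")


def in_language(w: str) -> bool:
    """True if the whole string w matches Sigma*1Sigma*."""
    return AT_LEAST_ONE_1.fullmatch(w) is not None


def matches_guess(w: str) -> bool:
    """True if the whole string w matches the asker's expression."""
    return ASKER_GUESS.fullmatch(w) is not None
```

The string "1" satisfies the language definition but is rejected by the guess, because the guess forces 001 and 11 to appear as well.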
Vasya has a string s of length n consisting only of digits 0 and 1. Also he has an array a of length n.
Vasya performs the following operation until the string becomes empty: choose some consecutive substring of equal characters, erase it from the string and glue together the remaining parts (any of them can be empty). For example, if he erases substring 111 from string 111110 he will get the string 110. Vasya gets a_x points for erasing a substring of length x.
Vasya wants to maximize his total points, so help him with this!
https://codeforces.com/problemset/problem/1107/E
I was trying to get my head around the editorial but couldn't understand it. Can anyone explain an easier way to do it?
input:
7
1101001
3 4 9 100 1 2 3
output:
109
Explanation
the optimal sequence of erasings is: 1101001 → 111001 → 11101 → 1111 → ∅.
Here, we consider removing prefixes instead of substrings. Why?
We try to remove a consecutive prefix of a particular state which is actually a substring in the main string. So, our DP states will be start index, end index, prefix length.
Let's consider an example str = "1010110". Initially start=0, end=7, and prefix=1 (the first '1' is the only prefix character so far). We iterate over all the indices in the current state except the starting index and check whether str[i]==str[start]. Here, for example, str[4]==str[0]. We then divide the string into "010" with prefix=1 (its first '0') and "110" with prefix=2 (the leading '1' of the whole string joins the first '1' of "110"). These are now two independent subproblems. When only a string of length 1 remains, we return a[prefix].
Here is my code.
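A minimal Python sketch of the DP described above, written for illustration and not taken from the answer. The function name `max_points` and the half-open range convention (`end` is exclusive) are assumptions. `prefix` counts the characters glued to the front of the current range, including `s[start]` itself.

```python
from functools import lru_cache


def max_points(s: str, a: list[int]) -> int:
    """Best total score for erasing s, where a[x - 1] is the score for length x."""

    @lru_cache(maxsize=None)
    def solve(start: int, end: int, prefix: int) -> int:
        # State: substring s[start:end], whose first character currently
        # heads a run of `prefix` equal characters.
        if start >= end:
            return 0
        # Option 1: erase the prefix run now, then solve the rest from scratch.
        best = a[prefix - 1] + solve(start + 1, end, 1)
        # Option 2: first clear everything between start and a later equal
        # character at index i, so that character joins the prefix run.
        for i in range(start + 1, end):
            if s[i] == s[start]:
                middle = solve(start + 1, i, 1)
                merged = solve(i, end, prefix + 1)
                best = max(best, middle + merged)
        return best

    return solve(0, len(s), 1)


if __name__ == "__main__":
    print(max_points("1101001", [3, 4, 9, 100, 1, 2, 3]))  # sample from the question
```

Memoization keeps this to O(n^3) states with O(n) work each. The recursion depth grows with n, so an iterative table may be preferable for the largest inputs.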
I would like to remove certain characters from a string in COBOL.
For example, '****This is*a test** string.' will become 'This isa test string.', '"Second one"' will become 'Second one'.
While INSPECT ... REPLACING cannot change the position of characters within a data item, INSPECT ... CONVERTING may be used to prepare the data item for subsequent operations.
In the following, the procedure strip-string first converts all characters to be removed to a single common character, in this case LOW-VALUES. This fragments the string so that the common character may be used to delimit the fragments easily. The PERFORM loops over the fragmented string. The UNSTRING statement moves one fragment to the output and provides a COUNT of the number of characters moved. The ADD advances the output starting position so that the fragments are positioned in sequence.
Code:
data division.
working-storage section.
1 binary.
2 p pic 9(4).
2 o pic 9(4).
2 o-count pic 9(4).
1 i-string pic x(40).
88 test-1 value '****This is*a test** string.'.
88 test-2 value '"Second one"'.
1 o-string pic x(40).
1 r-chars pic x(2) value '*"'. *> characters to be removed
procedure division.
begin.
set test-1 to true
perform test-prep
set test-2 to true
perform test-prep
stop run
.
test-prep.
display i-string
perform strip-string
display o-string
display space
.
strip-string.
inspect i-string converting r-chars to low-values
move 1 to p o
perform until p > function length (i-string)
unstring i-string
delimited all low-values
into o-string (o:)
count in o-count
with pointer p
add o-count to o
end-perform
.
Output:
****This is*a test** string.
This isa test string.
"Second one"
Second one
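For readers who don't know COBOL, the same convert-then-split idea can be sketched in Python. This is only an analogy to the INSPECT ... CONVERTING and UNSTRING steps, not a translation of the program. The name `strip_chars` is made up for this illustration, and the input is assumed not to contain a NUL character already.

```python
def strip_chars(text: str, remove: str) -> str:
    """Remove every character of `remove` from `text` in two steps."""
    # Step 1 (like INSPECT ... CONVERTING): map each unwanted character
    # to one common marker, here NUL, which plays the role of LOW-VALUES.
    table = str.maketrans(remove, "\0" * len(remove))
    marked = text.translate(table)
    # Step 2 (like the UNSTRING loop): split on the marker and lay the
    # fragments end to end in the output.
    return "".join(marked.split("\0"))


if __name__ == "__main__":
    print(strip_chars('****This is*a test** string.', '*"'))
    print(strip_chars('"Second one"', '*"'))
```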
Try the following code snippet.
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO-WORLD.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-STR PIC X(20) VALUE '****This is*a test**'.
01 WS-CNT PIC 99 VALUE 0.
01 WS-I PIC 99 VALUE 0.
01 WS-J PIC 99 VALUE 1.
01 WS-CHAR.
05 WS-LETTER OCCURS 1 TO 20 TIMES DEPENDING ON WS-CNT PIC X.
PROCEDURE DIVISION.
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > FUNCTION LENGTH(WS-STR)
IF WS-STR(WS-I:1) = '*' THEN
CONTINUE
ELSE
*> Grow the table first so that WS-LETTER(WS-J) is inside its current size
ADD 1 TO WS-CNT
MOVE WS-STR(WS-I:1) TO WS-LETTER(WS-J)
ADD 1 TO WS-J
END-IF
END-PERFORM
DISPLAY WS-CHAR
STOP RUN.
Output:
This isa test
Note: I used Tutorials Point's COBOL Coding Ground to run the above snippet. COBOL code doesn't need to be indented there.
Deterministic automaton to find the number of subsequences in a string?
How can I construct a DFA to count the occurrences of a string as a subsequence of another string?
E.g., in "ssstttrrriiinnngggg" we have 3 subsequences that form the string "string".
Both the string to be found and the string to be searched contain only characters from a specific character set.
I have some idea about storing characters in a stack and popping them until we match, pushing them again if they don't match.
Could you describe a DFA solution?
OVERLAPPING MATCHES
If you wish to count the number of overlapping sequences then you simply construct a DFA that matches the string, e.g.
1 -(if see s)-> 2 -(if see t)-> 3 -(if see r)-> 4 -(if see i)-> 5 -(if see n)-> 6 -(if see g)-> 7
and then compute the number of ways of being in each state after seeing each character using dynamic programming. See the answers to this question for more details.
DP[a][b] = number of ways of being in state b after seeing the first a characters
= DP[a-1][b] + DP[a-1][b-1] if character at position a is the one needed to take state b-1 to b
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=1.
Then the total number of overlapping strings is DP[len(string)][7]
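The recurrence above can be sketched in Python as follows. The function name `count_subsequences` is made up for this illustration, and the table is collapsed to a single row that is updated from the last state backwards, so each text character is used at most once per match.

```python
def count_subsequences(text: str, pattern: str) -> int:
    """Count the ways `pattern` occurs in `text` as a subsequence."""
    # ways[b] = number of ways to have matched the first b pattern characters.
    # ways[0] corresponds to the start state (state 1 in the answer).
    ways = [1] + [0] * len(pattern)
    for ch in text:
        # Go from the last state down so ways[b - 1] is still the value
        # from before this character was read.
        for b in range(len(pattern), 0, -1):
            if pattern[b - 1] == ch:
                ways[b] += ways[b - 1]
    return ways[len(pattern)]


if __name__ == "__main__":
    print(count_subsequences("ssstttrrriiinnngggg", "string"))
```

For the example text this counts every choice of one s, one t, one r, one i, one n and one g in order, which is 3 * 3 * 3 * 3 * 3 * 4 = 972 overlapping matches.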
NON-OVERLAPPING MATCHES
If you are counting the number of non-overlapping sequences, then if we assume that the characters in the pattern to be matched are distinct, we can use a slight modification:
DP[a][b] = number of strings being in state b after seeing the first a characters
= DP[a-1][b] + 1 if character at position a is the one needed to take state b-1 to b and DP[a-1][b-1]>0
= DP[a-1][b] - 1 if character at position a is the one needed to take state b to b+1 and DP[a-1][b]>0
= DP[a-1][b] otherwise
Start with DP[0][b]=0 for b>1 and DP[0][1]=infinity.
Then the total number of non-overlapping strings is DP[len(string)][7]
This approach will not necessarily give the correct answer if the pattern to be matched contains repeated characters (e.g. 'strings').
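A minimal Python sketch of the non-overlapping variant, under the same assumption that the pattern's characters are distinct. The name `count_disjoint` is made up for this illustration. Each text character moves at most one partial match forward by one state, and the start state holds an unlimited supply.

```python
def count_disjoint(text: str, pattern: str) -> int:
    """Count non-overlapping subsequence matches (pattern chars distinct)."""
    if len(set(pattern)) != len(pattern):
        raise ValueError("this sketch assumes distinct pattern characters")
    # in_state[b] = partial matches that have consumed b pattern characters.
    in_state = [0] * (len(pattern) + 1)
    in_state[0] = float("inf")  # unlimited fresh matches can start
    for ch in text:
        b = pattern.find(ch)  # the only transition this character can take
        if b != -1 and in_state[b] > 0:
            in_state[b] -= 1
            in_state[b + 1] += 1
    return in_state[len(pattern)]


if __name__ == "__main__":
    print(count_disjoint("ssstttrrriiinnngggg", "string"))
```

On the example text the fourth g finds no partial match waiting for it, so the result is 3.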
I've been reading the Wikipedia article about the Knuth-Morris-Pratt algorithm and I'm confused about how the values are found in the jump/partial match table.
i | 0 1 2 3 4 5 6
W[i] | A B C D A B D
T[i] | -1 0 0 0 0 1 2
Could someone explain the shortcut rule more clearly? The sentence
"let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)"
is confusing. If the proper suffix ends at W[2] wouldn't it be size of 3?
Also I'm wondering why T[4] isn't 1 when there is a prefix and suffix of size 1: The A.
Thanks for any help that can be offered.
Notice that the failure function T[i] does not use i as an index, but rather as a length. Therefore, T[i] represents the length of the longest proper border (a string that is both a proper prefix and a proper suffix) of the string formed from the first i characters of W, rather than of the string ending at character i. A suffix ending at W[2] belongs to the three-character prefix W[0..2], and because a proper border cannot be the whole string, its length can be at most 2 rather than 3.
Using this interpretation, it's also easier to see why T[4] is 0 rather than 1. The substring of W formed from the first four characters of W is ABCD, which has no nonempty proper prefix that is also a proper suffix.
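To make the "length, not index" reading concrete, here is a deliberately brute-force Python sketch that rebuilds the table from the question straight from the definition. It is not the linear-time construction KMP actually uses, and the names `longest_proper_border` and `partial_match_table` are made up for this illustration.

```python
def longest_proper_border(s: str) -> int:
    """Length of the longest string that is both a proper prefix and suffix of s."""
    # Try the longest candidate first; k < len(s) keeps the border proper.
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


def partial_match_table(w: str) -> list[int]:
    """T[0] = -1 by convention; T[i] is the border length of the first i characters."""
    return [-1] + [longest_proper_border(w[:i]) for i in range(1, len(w))]


if __name__ == "__main__":
    print(partial_match_table("ABCDABD"))
```

For W = ABCDABD the entry T[4] looks at ABCD only, while the second A is first taken into account at T[5], which looks at ABCDA.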
Hope this helps!
"let us say that we discovered a proper suffix which is a proper prefix and ending at W[2] with length 2 (the maximum possible)"
The length can be at most 2; that is correct, and here is why.
One fact: a "proper" prefix can't be the whole string, and the same goes for a "proper" suffix (like a proper subset).
Let W[0]=A, W[1]=A, W[2]=A, i.e. the pattern is "AAA". Then the longest proper prefix is "AA" (reading left to right) and the longest proper suffix is "AA" (reading right to left).
// Yes, the prefix and suffix overlap (the middle "A").
So the value would be 2 rather than 3; it would have been 3 only if the prefix did not have to be proper.
I'm looking for the positions in a string where a specified substring occurs.
E.g., looking for substring "green" in the string "green eggs and ham" should return 1, but "green eggs and green ham" should return 1 and 16.
How should I do this?
Edit 1: Changed the wording so position starts at 1, not 0.
Edit 2: I can find the first instance as WS-POINTER in the following snippet:
MOVE 1 TO WS-POINTER
UNSTRING WS-STRING(1:WS-STRING-LEN)
DELIMITED BY LT-MY-DELIMITER
INTO WS-STRING-GARBAGE
WITH POINTER WS-POINTER
END-UNSTRING
AFAIK COBOL does not have a statement to find the position of a string within a string, so that needs to be done manually. However, COBOL does have a statement that counts the occurrences of a string within a string:
INSPECT string TALLYING counter FOR ALL search-string
Here is an example program that works in OpenCOBOL (see OpenCobol.org):
IDENTIFICATION DIVISION.
PROGRAM-ID. OCCURRENCES.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
DATA DIVISION.
FILE SECTION.
WORKING-STORAGE SECTION.
01 TEST-STRING-1 PIC X(30)
VALUE 'green eggs and ham'.
01 TEST-STRING-2 PIC X(30)
VALUE 'green eggs and green ham'.
01 TEST-STRING PIC X(30).
01 SEARCH-STRING PIC X(05)
VALUE 'green'.
01 MATCH-COUNT PIC 9.
01 SEARCH-INDEX PIC 99.
01 MATCH-POSITIONS.
05 MATCH-POS PIC 99 OCCURS 9 TIMES.
PROCEDURE DIVISION.
MAIN.
MOVE TEST-STRING-1 TO TEST-STRING
PERFORM FIND-MATCHES
MOVE TEST-STRING-2 TO TEST-STRING
PERFORM FIND-MATCHES
STOP RUN
.
FIND-MATCHES.
MOVE ZERO TO MATCH-COUNT
INSPECT TEST-STRING TALLYING MATCH-COUNT
FOR ALL SEARCH-STRING.
DISPLAY 'FOUND ' MATCH-COUNT ' OCCURRENCE(S) OF '
SEARCH-STRING ' IN:'
DISPLAY TEST-STRING
DISPLAY 'MATCHES FOUND AT POSITIONS: ' WITH NO ADVANCING
*> Stop at 26 so that (SEARCH-INDEX:5) stays inside the 30-character field
PERFORM VARYING SEARCH-INDEX FROM 1 BY 1
UNTIL SEARCH-INDEX > 26
IF TEST-STRING (SEARCH-INDEX:5) = SEARCH-STRING
DISPLAY SEARCH-INDEX ' ' WITH NO ADVANCING
END-IF
END-PERFORM
DISPLAY ' '
DISPLAY ' '
.
You could use QCLSCAN on IBM i
77 QCLSCAN-SRCHLEN PIC S9(3) COMP-3.
77 QCLSCAN-STARTPOS PIC S9(3) COMP-3.
77 QCLSCAN-PATLEN PIC S9(3) COMP-3.
77 QCLSCAN-XLATE PIC X(01) VALUE "0".
77 QCLSCAN-TRIM PIC X(01) VALUE "0".
77 QCLSCAN-WILDCARD PIC X(01) VALUE LOW-VALUES.
77 QCLSCAN-FOUNDPOS PIC S9(3) COMP-3.
...
...
MOVE LENGTH OF WRK-ACCT-NBR TO QCLSCAN-SRCHLEN
MOVE 1 TO QCLSCAN-STARTPOS
MOVE 9 TO QCLSCAN-PATLEN
MOVE "0" TO QCLSCAN-XLATE
MOVE "0" TO QCLSCAN-TRIM
MOVE "?" TO QCLSCAN-WILDCARD
CALL "QCLSCAN" USING WRK-ACCT-NBR
QCLSCAN-SRCHLEN
QCLSCAN-STARTPOS
EMPLOYEE-SSN-9X
QCLSCAN-PATLEN
QCLSCAN-XLATE
QCLSCAN-TRIM
QCLSCAN-WILDCARD
QCLSCAN-FOUNDPOS
IF QCLSCAN-FOUNDPOS > ZERO
* Found data in position QCLSCAN-FOUNDPOS
ELSE
* Found no match
END-IF
MOVE 1 TO WS-POINTER
UNSTRING WS-STRING(1:WS-STRING-LEN)
DELIMITED BY LT-MY-DELIMITER
INTO WS-STRING-GARBAGE
WITH POINTER WS-POINTER
END-UNSTRING
You ask about how to use the above for subsequent strings.
It is possible to use UNSTRING in two ways to get the counts you want. Either by having multiple receiving fields and COUNT-IN or by using multiple executions of UNSTRING using the POINTER value from the previous UNSTRING each time.
You need to account for the length of the delimiter. However, you will end up with "non-intuitive" code which will have to be "understood" each time someone picks up the program with it in.
Instead, it is a simple task with "substring" processing with either OCCURS DEPENDING ON or reference-modification (the method in the accepted answer).
You must make sure you don't "go beyond the end of the field" by ending the search when count + length-of-delimiter = max-length-of-string-to-search.
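The bounded scan described here can be sketched in Python for comparison. The name `find_positions` is made up for this illustration, and positions are 1-based as in the question. The loop stops at the last start position where the whole pattern still fits inside the text.

```python
def find_positions(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions of every occurrence of pattern."""
    width = len(pattern)
    # Last 0-based start at which pattern still fits inside text.
    last_start = len(text) - width
    positions = []
    for start in range(last_start + 1):
        # Same idea as comparing TEST-STRING (SEARCH-INDEX:5) in COBOL.
        if text[start:start + width] == pattern:
            positions.append(start + 1)
    return positions


if __name__ == "__main__":
    print(find_positions("green eggs and ham", "green"))
    print(find_positions("green eggs and green ham", "green"))
```

When the pattern is longer than the text the range is empty, so the function returns an empty list without reading past the end.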