Below is an Alloy model that I created of unfolding the lines in iCalendar documents. I am wondering if there is a better -- simpler -- model?
Here is a brief description of iCalendar and unfolding, followed by my Alloy model.
iCalendar is a computer file format which allows Internet users to send meeting requests and tasks to other Internet users by sharing or sending files in this format. (Source: Wikipedia)
An iCalendar file consists of a series of lines. Each line should be no longer than 75 octets, excluding the line break (the difference between octets and characters is irrelevant for this discussion). A line that exceeds that length is “folded” onto the next line. A space character at the start of a line indicates that it is a continuation of the previous line (RFC 5545 also allows a tab there, which this model ignores).
The process of moving from the folded representation to its single-line representation is called "unfolding". Unfolding is accomplished by removing the CRLF and the space character that immediately follows … When parsing a content line, folded lines must first be unfolded. (Source: RFC 5545)
Here is a sample iCalendar file:
BEGIN:VCALENDAR
VERSION:2.0
CALSCALE:GREGORIAN
BEGIN:VEVENT
DESCRIPTION;ALTREP="cid:part1.0001#example.org":The Fall'98 Wild
 Wizards Conference - - Las Vegas\, NV\, USA
END:VEVENT
END:VCALENDAR
The value of the DESCRIPTION property is folded to the next line, as indicated by the space at the start of Wizards Conference ...
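Outside of Alloy, the unfolding rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `unfold` and the `sample` string are made up for this example, only a leading space (not a tab) is treated as a continuation marker, and `splitlines()` is used as a convenient stand-in for splitting on CRLF.

```python
def unfold(text: str) -> list[str]:
    """Return the unfolded content lines of an iCalendar-style document."""
    unfolded: list[str] = []
    # Assumption: splitlines() is good enough here; real files use CRLF,
    # and splitlines() also splits on a few other line-break characters.
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            # Continuation line: drop the single leading space and append
            # the remainder to the previous line.
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)
    return unfolded


# Hypothetical sample, adapted from the file shown above.
sample = (
    "BEGIN:VEVENT\r\n"
    'DESCRIPTION;ALTREP="cid:part1.0001#example.org":The Fall\'98 Wild\r\n'
    " Wizards Conference - - Las Vegas\\, NV\\, USA\r\n"
    "END:VEVENT\r\n"
)

for line in unfold(sample):
    print(line)
```

Only the line break and the one leading space are removed, so the two halves are joined exactly where they were split.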
The following is an Alloy model of unfolding lines in iCalendar documents.
These are the key abstractions that I identified in this problem: documents, lines, space, data, and unfolding. Specifically, there is a document containing lines. Each line has data. If a line starts with a space, then it is a continuation line and must be merged with the previous line. An unfolding operation must be applied repeatedly to the document. The resulting document must have no folded lines.
An ordered set of documents represents the states of the document during the unfolding process:
open util/ordering[Document]
The document contains a set of lines. Each line has a next line (except the last line, of course). Each line has data. After unfolding, a line will have its data plus the data from the next line. Thus, a line has a set of data. One of the lines, obviously, is the first.
sig Document {
lines: Line -> lone Line,
content: Line -> set Data,
firstLine: Line
} {
firstLine in lines.Line
// No Line maps to firstLine, i.e., firstLine is the first line
no line: Line | lines[line] = firstLine
// All lines are reachable from firstLine
(lines.Line + lines[Line] - firstLine) in firstLine.^lines
// The lines are sequential, i.e., lines are acyclic
no ^lines & iden
// No space at start of first line
no firstLine.spaceAtStart
}
A Line may have a space at its start.
sig Line {
spaceAtStart: lone Space
}
Space is represented by a singleton set. Data is a set of values.
one sig Space {}
sig Data {}
Each time the unfold operation is called, it unfolds one line in the document.
Example of an Unfold Operation: Suppose the document contains this sequence of lines:
Line0, Line1, Line2
Suppose Line1 has a space at the start. That means Line0 is continued on Line1; unfolding merges the two lines into one. After unfolding, the document has this:
Line0', Line2
where Line0' contains the data from Line0 plus Line1.
pred unfold (doc, doc': Document, line: doc.lines) {
// precondition: “line” has a next line (i.e., it's not the last line)
some line[Line]
// precondition: the next line starts with a space
some line[Line].spaceAtStart
let line1 = line.Line, line2 = line[Line], line3 = doc.lines[line2] {
// Merge line1 and line2. Set line3 to follow line1.
// Add the content of line2 to line1. Delete line2 -> line3.
// Unless ... no line follows line2. Then the unfold
// results in no line following line1. Remove line1 -> line2.
some doc.lines[line2] => doc'.lines = doc.lines ++ (line1 -> line3) - (line2 -> line3)
else doc'.lines = doc.lines - (line1 -> line2)
doc'.content = doc.content ++ (line1 -> (doc.content[line1] + doc.content[line2]))
doc'.firstLine = doc.firstLine
}
}
Initialize the document with some lines. Give each line unique data. Make some lines that will require unfolding.
pred init (doc: Document) {
some doc.lines
let allLines = doc.lines[Line] + doc.lines.Line {
all line: allLines | let data = doc.content[line] | one data and
all otherLine: allLines - line | doc.content[otherLine] != data
some allLines.spaceAtStart
}
}
At each step of the execution trace, execute the unfold operation if there is at least one line that needs unfolding. Otherwise, do nothing to the document (i.e., do a skip).
fact traces {
init [first]
all doc: Document - last | let doc' = doc.next {
// If spaces exist in the document, do an unfold operation.
// Otherwise, do nothing (i.e., skip).
let allLines = doc.lines[Line] + doc.lines.Line {
some allLines.spaceAtStart => not skip [doc, doc']
some line: doc.lines | unfold [doc, doc', line] or skip [doc, doc']
}
}
}
"skip" means no change to the document, i.e., the next state of the document is the same as the last state.
pred skip (doc, doc': Document) {
doc'.lines = doc.lines
doc'.content = doc.content
doc'.firstLine = doc.firstLine
}
Assert: The final state of the Document has no lines folded.
assert no_lines_are_folded {
let allLines = last.lines[Line] + last.lines.Line |
no allLines.spaceAtStart
}
check no_lines_are_folded for 6
The Alloy Analyzer finds no counterexamples. Yea!
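For comparison, the same unfold/skip trace can be simulated operationally. The following Python sketch is not a translation of the Alloy model, only an analogue under stated assumptions: a document is a plain list of lines, the names `Line`, `unfold_step`, and `trace` are invented for this example, and the first line is assumed not to start with a space (as the model requires).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    space_at_start: bool
    data: frozenset


def unfold_step(doc: list) -> list:
    """Merge the first continuation line into its predecessor, or skip."""
    for i in range(1, len(doc)):
        if doc[i].space_at_start:
            prev = doc[i - 1]
            # The merged line keeps the predecessor's flag and gains the
            # continuation line's data.
            merged = Line(prev.space_at_start, prev.data | doc[i].data)
            return doc[: i - 1] + [merged] + doc[i + 1 :]
    return doc  # skip: no line needs unfolding


def trace(doc: list) -> list:
    """Return the successive document states until a step changes nothing."""
    states = [doc]
    while True:
        nxt = unfold_step(states[-1])
        if nxt == states[-1]:
            return states
        states.append(nxt)


# Line0, Line1 (continuation), Line2, as in the example above.
initial = [
    Line(False, frozenset({"d0"})),
    Line(True, frozenset({"d1"})),
    Line(False, frozenset({"d2"})),
]
states = trace(initial)
final = states[-1]

# Analogue of the assertion: the final state has no folded lines.
assert not any(line.space_at_start for line in final)
```

Each step shortens the list by one line, so the loop terminates; that is the operational counterpart of checking the assertion within a bounded scope.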
I am using a Flutter TextFormField Component to capture some multiline text. I would like to process that text string by breaking it down into a list of words using the following.
List<String> splitText = text.split(' ');
However, I realised that if the user presses Return to start a new line, splitting on spaces no longer works: the last word on one line and the first word on the next line are treated as a single word.
If I could detect the last word on a line and the first word on the next line and pad them with a space, my issue would be solved, but I don't know how to detect the end and start of lines in a TextFormField.
I came up with one option which is probably not ideal, but maybe someone can comment.
Convert the String to runes, then look for the return rune (10) and add the space rune (32) before and after for good measure. Problem is I still need to figure out how to convert it back.
// Runes is an Iterable, not a List, so convert it explicitly.
List<int> inputTextRunes = text.runes.toList();
List<int> outputTextInRunes = [];
for (var i = 0; i < inputTextRunes.length; i++) {
  if (inputTextRunes[i] == 10) {
    // Surround each newline (rune 10) with a space (rune 32) on both sides.
    outputTextInRunes.addAll([32, 10, 32]);
  } else {
    outputTextInRunes.add(inputTextRunes[i]);
  }
}
// Convert the runes back into a String.
String outputText = String.fromCharCodes(outputTextInRunes);
Question: How to indent and outdent on line beginning and line end within Flutter TextFormField
You should perform the original split on more than just the space character. In this instance, you could use a regular expression that matches any run of whitespace, including tabs and newlines. The + keeps consecutive whitespace characters from producing empty strings in the result.
List<String> splitText = text.split(RegExp(r'\s+'));
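The effect is easy to see outside Flutter. Here is a minimal sketch in Python rather than Dart, with a made-up input string standing in for the text field's value; it contrasts splitting on a single space with splitting on runs of whitespace.

```python
import re

# Hypothetical input: two lines, as a multiline text field might return them.
text = "hello world\nfoo bar"

# Splitting on a single space glues "world" and "foo" together,
# because the newline between them is not a space.
by_space = text.split(" ")

# Splitting on runs of whitespace treats the newline as a separator too.
# strip() avoids empty strings from leading or trailing whitespace.
by_whitespace = re.split(r"\s+", text.strip())

print(by_space)       # ['hello', 'world\nfoo', 'bar']
print(by_whitespace)  # ['hello', 'world', 'foo', 'bar']
```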
So I'm playing vim adventures and I got stuck. I need a Vim command that will delete the keys in red. I thought dd would do it, but that only deletes the current line.
Use das or dis to delete a sentence. Use dap or dip to delete a paragraph. See :help text-objects for details. Unrelated to your question, see this wiki page for plugins that provide other, highly useful text objects.
) jumps to the beginning of the next sentence, so d) deletes from the cursor up to the beginning of the next sentence. Vim detects the end of a sentence by ". " (a period followed by a space). This means d) misbehaves if your cursor is on either the period or the space delimiting two sentences: it deletes only up to the first character of the next sentence (that is, just the space, or the period and the space), which is almost never what is desired. das works as you probably expect, deleting the sentence and its delimiter (period + space).
If you specifically want to move (and delete to) the last character in a sentence it is more complicated according to this vi.SE answer:
The solution was either dk which deletes the line and the line above it or dj which deletes the line and the line below it.
My original question was actually not the right question (there are multiple sentences).
To delete from the cursor to the end of the sentence, use "d)". The "d" is the delete operator, and the motion ")" that follows advances the cursor (and so the deletion) to the start of the next sentence.
To delete "around" a sentence, including all the extra whitespace, use "das" (delete around sentence). Or to delete inside the sentence, and not all the whitespace, then use "dis" (delete inside sentence).
Once you understand Vim's language, you can easily memorize a plethora of operations. Use this table to understand Vim's vocabulary:
COUNT (numeral) + OPERATOR (command) + MOTION (or TEXT OBJECT)
"3das" will perform "delete around sentence 3 times"
So, if practical, you could place a numeral followed by...
a command:
d=delete
y=yank (into memory buffer to "put" later)
c=change (delete then insert new text)
and then a motion:
) = move cursor to the beginning of the next sentence
( = move cursor to beginning of prior sentence
} = move cursor to the next paragraph
{ = move cursor to the beginning of the prior paragraph
w = move cursor to next word
b = move cursor back a word
$ = move cursor to the end of the logical line
0 = (zero) move cursor to the beginning of the logical line
G = move cursor to the end of the file
gg = move cursor to the beginning of the file
h, j, k, or l (you might have to look those up)
OR, instead of a motion, select a text object using:
a = around
i = inside
followed by the name of the area around the cursor:
s = sentence
p = paragraph
w = word
t = xml-tag <tag example> lots of text between tags </tag example>
< or > = inside or around an angle-bracket <...> block, frequently found in xml documents
{ [ ( ) ] } = inside or around any type of bracket ...
... {a large area [some more (a little stuff) things] of a great many things }
I actually find this table from the help file the best overview for block commands:
"dl" delete character (alias: "x")
"diw" delete inner word
"daw" delete a word
"diW" delete inner WORD (see |WORD|)
"daW" delete a WORD (see |WORD|)
"dgn" delete the next search pattern match
"dd" delete one line
"dis" delete inner sentence
"das" delete a sentence
"dib" delete inner '(' ')' block
"dab" delete a '(' ')' block
"dip" delete inner paragraph
"dap" delete a paragraph
"diB" delete inner '{' '}' block
"daB" delete a '{' '}' block
So deleting a sentence is das or deleting a paragraph is dap.
If you want to delete from J up to and including the period, start at J and type df followed by a period (df.).
If you want to delete both lines then 2dd
Another option (not sure if it works in the game) is to delete up to and including the period:
d/\./e
You have to escape the period when using a search pattern like this after the delete command.
If you were limited to a single line, it is much simpler:
df.
You can use the command: d2d, but I do not know whether it works in the game.
Vim grammar is [Number] [Operator/Command] [Motion or Text Object]
So in this case, you can use: 2dd
How would I find duplicate lines by matching only one part of each line and not the whole line itself?
Take for example the following text:
uid=154163(j154163) gid=10003(pemcln) groups=10003(pemcln) j154163
uid=152084(k152084) gid=10003(pemcln) groups=10003(pemcln) k152084
uid=154163(b153999) gid=10003(pemcln) groups=10003(pemcln) b153999
uid=154226(u154226) gid=10003(pemcln) groups=10003(pemcln) u154226
I would like to show only the 1st and 3rd lines, as they have the same duplicate UID value "154163".
The only ways I know how would match the whole line and not the subset of each one.
This code looks for the ID from each line. If any ID appears more than once, its lines are printed:
$ awk -F'[=(]' '{cnt[$2]++;lines[$2]=lines[$2]"\n"$0} END{for (k in cnt){if (cnt[k]>1)print lines[k]}}' file
uid=154163(j154163) gid=10003(pemcln) groups=10003(pemcln) j154163
uid=154163(b153999) gid=10003(pemcln) groups=10003(pemcln) b153999
How it works:
-F'[=(]'
awk separates input files into records (lines) and separates the records into fields. Here, we tell awk to use either = or ( as the field separator. This is done so that the second field is the ID.
cnt[$2]++; lines[$2]=lines[$2]"\n"$0
For every line that is read in, we keep a count, cnt, of how many times that ID has appeared. Also, we save all the lines associated with that ID in the array lines.
END{for (k in cnt){if (cnt[k]>1)print lines[k]}}
After we reach the end of the file, we go through each observed ID and, if it appeared more than once, its lines are printed.
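For readers who prefer to see the same idea outside awk, here is a rough Python sketch of the approach just described: group the lines by their second field, then emit only the groups with more than one line. The function name `duplicate_uid_lines` is invented for this example, and, like the awk one-liner, it keeps every line in memory.

```python
import re
from collections import defaultdict


def duplicate_uid_lines(lines):
    """Return the lines whose uid (second field) occurs more than once."""
    groups = defaultdict(list)
    for line in lines:
        # Same split as awk -F'[=(]': the second field is the numeric uid.
        fields = re.split(r"[=(]", line)
        if len(fields) > 1:
            groups[fields[1]].append(line)
    # Keep only the groups that contain at least two lines.
    return [line for group in groups.values() if len(group) > 1 for line in group]


sample = [
    "uid=154163(j154163) gid=10003(pemcln) groups=10003(pemcln) j154163",
    "uid=152084(k152084) gid=10003(pemcln) groups=10003(pemcln) k152084",
    "uid=154163(b153999) gid=10003(pemcln) groups=10003(pemcln) b153999",
    "uid=154226(u154226) gid=10003(pemcln) groups=10003(pemcln) u154226",
]

for line in duplicate_uid_lines(sample):
    print(line)
```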
Someone has already provided an awk script that will do what you need, assuming the files are small enough to fit into memory (they store all the lines until the end then decide what to output). There's nothing wrong with it, indeed it could be considered the canonical awk solution to this problem. I provide this answer really for those cases where awk may struggle with the storage requirements.
Specifically, if you have larger files that cause problems with that approach, the following awk script, myawkscript.awk, will handle it, provided you first sort the file so that the script can rely on related lines being adjacent. To ensure the input is sorted and that you can easily get at the relevant key (using = and ( as field separators), you call it with:
sort <inputfile | awk -F'[=(]' -f myawkscript.awk
The script is:
state == 0 {
if (lastkey == $2) {
printf "%s", lastrec;
print;
state = 1;
};
lastkey = $2;
lastrec = $0"\n";
next;
}
state == 1 {
if (lastkey == $2) {
print;
} else {
lastkey = $2;
lastrec = $0"\n";
state = 0;
}
}
It's basically a state machine where state zero is scanning for duplicates and state one is outputting the duplicates.
In state zero, the relevant part of the current line is checked against the previous and, if there's a match, it outputs both and switches to state one. If there's no match, it simply moves on to the next line.
In state one, it checks each line against the original in the set and outputs it as long as it matches. When it finds one that doesn't match, it stores it and reverts to state zero.
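The same streaming idea can be sketched in Python using `itertools.groupby`, which groups adjacent items with equal keys. This is a swapped-in technique rather than a translation of the state machine, and it rests on the same assumptions: the input is already sorted so that equal uids are adjacent, and every line contains the `uid=NNN(` prefix. The names `uid` and `duplicates_sorted` are made up for this example.

```python
import re
from itertools import groupby


def uid(line):
    """Extract the second field when splitting on '=' or '('."""
    return re.split(r"[=(]", line)[1]


def duplicates_sorted(lines):
    """Yield lines whose uid repeats; input must be grouped by uid."""
    for _, group in groupby(lines, key=uid):
        first = next(group)
        second = next(group, None)
        if second is not None:
            # At least two lines share this uid: emit the whole group.
            yield first
            yield second
            yield from group


sample = sorted([
    "uid=154163(j154163) gid=10003(pemcln) groups=10003(pemcln) j154163",
    "uid=152084(k152084) gid=10003(pemcln) groups=10003(pemcln) k152084",
    "uid=154163(b153999) gid=10003(pemcln) groups=10003(pemcln) b153999",
    "uid=154226(u154226) gid=10003(pemcln) groups=10003(pemcln) u154226",
])

for line in duplicates_sorted(sample):
    print(line)
```

Because it is a generator over an iterable, it can be fed a file object directly and never holds more than the current line and its predecessor.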
I am reading in a text file and parsing the words into a map to count numbers of occurrences of each word on each line. I am required to ignore all non-alphabetic chars (punctuation, digits, white space, etc) except for apostrophes. I can figure out how to delete all of these characters using the following code, but that causes incorrect words, like "one-two" comes out as "onetwo", which should be two words, "one" and "two".
Instead, I am trying to now replace all of these values with spaces instead of simply deleting, but can't figure out how to do this. I figured the replace-if algorithm would be a good algorithm to use, but can't figure out the proper syntax to accomplish this. C++11 is fine. Any suggestions?
Sample output would be the following:
"first second" = "first" and "second"
"one-two" = "one" and "two"
"last.First" = "last" and "first"
"you're" = "you're"
"great! A" = "great" and "A"
// What I initially used to delete non-alpha and white space (apostrophe's not working currently, though)
// Read file one line at a time
while (getline(text, line)){
istringstream iss(line);
// Parse line on white space, storing values into tokens map
while (iss >> word){
word.erase(remove_if(word.begin(), word.end(), my_predicate), word.end());
++tokens[word][linenum];
}
++linenum;
}
bool my_predicate(char c){
return c == '\'' || !isalpha(c); // This line's not working properly for apostrophe's yet
}
bool my_predicate(char c){
return c == '\'' || !isalpha(c);
}
Here you're writing that you want to remove the char if it is an apostrophe or if it is not an alphabetical character.
Since you want to replace these, you should use std::replace_if():
std::replace_if(std::begin(word), std::end(word), my_predicate, ' ');
And you should correct your predicate too:
return !isalpha(c) && c != '\'';
You could use std::replace_if to pre-process the input line before sending it to the istringstream. This will also simplify your inner loop.
while (getline(text, line)){
replace_if(line.begin(), line.end(), my_predicate, ' ');
istringstream iss(line);
// Parse line on white space, storing values into tokens map
while (iss >> word){
++tokens[word][linenum];
}
++linenum;
}
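To check the replace-then-split logic against the sample inputs from the question, here is a small Python sketch of the same pipeline. It is only an analogue of the C++ above: `is_separator` and `count_words` are names invented for this example, and Python's `str.isalpha` is Unicode-aware, so it accepts more letters than C's `isalpha` does in the default locale.

```python
from collections import defaultdict


def is_separator(ch: str) -> bool:
    """True for anything that is neither a letter nor an apostrophe."""
    return not (ch.isalpha() or ch == "'")


def count_words(lines):
    """Map each word to a per-line occurrence count."""
    tokens = defaultdict(lambda: defaultdict(int))
    for linenum, line in enumerate(lines):
        # Replace separators with spaces first, then split on whitespace.
        cleaned = "".join(" " if is_separator(ch) else ch for ch in line)
        for word in cleaned.split():
            tokens[word][linenum] += 1
    return tokens


tokens = count_words(["first second one-two", "you're great! A"])
print(sorted(tokens))
```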
Suppose I have a file test.c containing the following:
// line 1
// line 2
If I open this file in Vim and navigate to the first line in normal mode, then type o, I get the following:
// line 1
//
// line 2
Now suppose I have a file test.lhs (literate Haskell) containing
> data X = A | B
> data Y = C | D
If I open this file and navigate to the first line in normal mode, then type o, I get
> data X = A | B
> data Y = C | D
Question: How can I make Vim automatically insert > at the start of the line for the .lhs file, similar to how // is automatically inserted for the .c file?
Got it! To .vimrc, add
set formatoptions+=o
This automatically inserts the "comment leader" (character sequence indicating a comment, or, in the case of literate Haskell, the Haskell code) at the start of the line.
For more information on the options accepted by formatoptions, type :help fo-table.