Meaning of square brackets in Alloy grammar spec - alloy

In the Alloy grammar spec on the Alloy Web site, I find myself confused by the use of square brackets.
In a production like the following, things seem clear.
specification ::= [module] open* paragraph*
I guess the square brackets indicate optionality and that the asterisks are Kleene closures, so that the rule just quoted means a specification consists of at most one module statement, zero or more open clauses, and zero or more paragraphs. This makes sense to me (though I am gradually coming to use Wirth's EBNF notation wherever possible, so my notes show this as [module] {open} {paragraph}).
In the following production, though, the brackets are confusing me.
cmdDecl ::= [name ":"] ["run"|"check"] [name|block] scope
It would surprise me a great deal if the keywords run and check were optional in commands, and ditto for the name of the predicate to be run, the name of the assertion to be checked, or the anonymous block to be run or checked. But that's what it looks as if this rule is saying.
So question 1: what do square brackets indicate in the grammar?
Question 2: is the use of square brackets where some readers might expect parentheses a typo? I.e. should this rule instead take the following form?
cmdDecl ::= [name ":"] ("run"|"check") (name|block) scope
Maybe I'm just not familiar enough with the variety of grammatical notations to be found in the wild; perhaps it would be helpful to indicate the tool, or point to a description of the notation.
Question 3: is this notation used by some parser generation tool? Which?

Question 1: what do square brackets indicate in the grammar?
You rightly pointed out that the use of square brackets is inconsistent in the grammar you referred to. I think that grammar was copied from the first edition of the "Software Abstractions" book; I'm not sure if the second edition of the book contains the same grammar.
Question 2: is the use of square brackets where some readers might expect parentheses a typo?
Exactly right.
Question 3: is this notation used by some parser generation tool? Which?
It is not. The Alloy Analyzer uses a grammar written in Cup. The .lex and .cup files (Alloy.lex and Alloy.cup) are included in the Alloy distribution jar file (located in "edu/mit/csail/sdg/alloy4compiler/parser/").
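To make the difference between the two readings concrete, here is a minimal Python sketch of the corrected rule from the question, `[name ":"] ("run"|"check") (name|block) scope`, written as a regular expression. It is only an illustration: `NAME`, `BLOCK`, and `parse_cmd` are hypothetical helpers, the lexical rules are simplified assumptions (a block is taken to be braces with no nesting), and the scope is left unparsed. It is not derived from the CUP grammar the Analyzer actually uses.

```python
import re

# Simplified stand-ins for the grammar's nonterminals.
# These are assumptions for illustration, not Alloy's real lexical rules.
NAME = r"[A-Za-z][A-Za-z0-9_]*"
BLOCK = r"\{[^{}]*\}"  # a block with no nested braces

CMD_DECL = re.compile(
    rf"(?:(?P<label>{NAME})\s*:\s*)?"  # [name ":"]       optional label
    rf"(?P<kind>run|check)\s+"         # ("run"|"check")  required keyword
    rf"(?P<target>{NAME}|{BLOCK})"     # (name|block)     required target
    rf"(?P<scope>.*)"                  # scope, left unparsed in this sketch
)


def parse_cmd(text):
    """Return the parts of a command as a dict, or None if it does not match."""
    match = CMD_DECL.fullmatch(text.strip())
    return match.groupdict() if match else None


print(parse_cmd("c1: check noCycles for 5"))
print(parse_cmd("run {some x} for 3"))
print(parse_cmd("show for 3"))  # no keyword, so this is rejected
```

Under the square-bracket reading of the published rule, the last input would have been accepted, which is what made that rule look wrong.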

Thanks, Michael. The production for cmdDecl was indeed wrong in the book, so I've posted an erratum. Aleks has also updated the grammar on the Alloy website, which had a couple of other errors.


Why can't an identifier start with a number?

I have a file named 1_add.rs, and I tried to add it into the lib.rs. Yet, I got the following error during compilation.
error: expected identifier, found `1_add`
 --> src/lib.rs:1:5
  |
1 | mod 1_add;
  |     ^^^^^ expected identifier
It seems that an identifier starting with a digit is invalid. But why would Rust have this restriction? Is there any workaround if I want to indicate the sequence of different Rust files (for managing the exercise files)?
In your case (you want to name the files like 1_foo.rs) you can write
#[path="1_foo.rs"]
mod mod_1_foo;
Allowing identifiers to start with digits can conflict with type annotations. E.g.
let foo = 1_u32;
sets the type to u32. It would be confusing if 1_u256 then referred to a variable.
But why would Rust have this restriction?
Not only Rust, but almost every language I've written a line of code in has this restriction as well.
Food for thought:
let a = 1_2;
Is 1_2 a variable name or is it a literal for value 12? What if variable 1_2 does not exist now, but you add it later, does this token stop being a number literal?
While the Rust compiler probably could make it work, it's not worth all the confusion, IMHO.
Allowing identifiers to start with a digit would cause conflicts with many other token types. Here are a few examples:
1e1 is a floating point number.
0x0 is a hexadecimal integer.
8u8 is an integer with explicit type annotation.
Most importantly, though, I believe allowing identifiers to start with a digit would hurt readability. Currently everything starting with a digit is some kind of number, which in my opinion helps when reading code.
An incomplete list of programming languages not allowing identifiers to start with a digit: Python, Java, JavaScript, C#, Ruby, C, C++, Pascal. I can't think of a language that does allow this (though one most likely exists).
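Python is on that list, and its standard `tokenize` module makes the "starts with a digit means number" convention easy to observe. The sketch below is only an analogy for the Rust case (Rust's lexer is a different program), and `first_token` is a hypothetical helper name.

```python
import io
import tokenize


def first_token(source):
    """Return (token type name, token text) for the first token in source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    tok = next(tokens)
    return tokenize.tok_name[tok.type], tok.string


# Everything that starts with a digit is lexed as a NUMBER;
# a leading underscore turns the same characters into a NAME.
for text in ["1e1", "0x0", "1_2", "_1_add"]:
    print(text, first_token(text))
```

Because the tokenizer decides on the first character, it never has to look up whether a variable called `1_2` happens to exist before classifying the token.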
Rust identifiers are based on Unicode® Standard Annex #31 (see The Rust RFC Book), which standardizes some common rules for identifiers in programming languages. It might make it easier to parse text that could otherwise be ambiguous, like 1e10?
The "why" can only be answered with historical anecdotes; the rules are what they are, and you cannot work against them.
If you really want your identifiers to start with a digit, at least for human readers, prepend an underscore like this: _1_add.
Note: To make sure that sorting works well, also use as many leading zeroes as appropriate (_001_add if you expect more than 99 files).
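The effect of the leading zeroes on sorting can be checked with a few lines of Python. This is a small sketch; `module_name` and the `ex` suffix are made up for the illustration.

```python
def module_name(index, title, width=3):
    """Build a name such as _001_add, zero-padding the index to `width` digits."""
    return f"_{index:0{width}d}_{title}"


indices = (1, 2, 10)
plain = [f"_{i}_ex" for i in indices]
padded = [module_name(i, "ex") for i in indices]

print(sorted(plain))   # ['_10_ex', '_1_ex', '_2_ex']
print(sorted(padded))  # ['_001_ex', '_002_ex', '_010_ex']
```

Without padding, plain string ordering puts 10 before 1, because the character `0` sorts before `_`.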

Repeating Pattern Matching in antlr4

I'm trying to write a lexer rule that would match following strings
a
aa
aaa
bbbb
the requirement here is all characters must be the same
I tried to use this rule:
REPEAT_CHARS: ([a-z])(\1)*
But \1 is not valid in antlr4. Is it possible to come up with a pattern for this?
You can’t do that in an ANTLR lexer. At least, not without target-specific code inside your grammar. And placing code in your grammar is something you should not do (it makes it hard to read, and the grammar is tied to that language). It is better to do those kinds of checks/validations inside a listener or visitor.
Things like back-references and look-arounds are features that crept into the regex engines of programming languages. The regular expression syntax available in ANTLR (and all parser generators I know of) does not support those features; it describes true regular languages.
Many features found in virtually all modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory.
-- https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
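For comparison, here is a minimal Python sketch of both approaches: the back-reference pattern from the question, which Python's `re` engine accepts, and a plain check of the kind that could run after the lexer has matched an ordinary `[a-z]+` token. The function names are hypothetical, and the second function only stands in for the validation step; it does not use the ANTLR listener or visitor API.

```python
import re

# Python's re engine supports the back-reference that ANTLR lexer rules lack.
REPEAT_CHARS = re.compile(r"([a-z])\1*")


def is_repeat_backref(text):
    """True if text is one lowercase letter repeated one or more times."""
    return REPEAT_CHARS.fullmatch(text) is not None


def is_repeat_checked(text):
    """Same result without a back-reference: match a regular [a-z]+ token,
    then validate afterwards that it contains a single distinct character."""
    return re.fullmatch(r"[a-z]+", text) is not None and len(set(text)) == 1


for sample in ["a", "aa", "aaa", "bbbb", "ab"]:
    print(sample, is_repeat_backref(sample), is_repeat_checked(sample))
```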

What is the typed hole exploration development style?

While doing homework 10 of CIS194 (Spring 2013), I got stuck on the Applicative instance of a Parser type. I sought help from Google and came across this Reddit post. The user ephrion gave an answer, which was also an example of the typed hole exploration method, which I didn't quite understand. In the comments section of his answer he also said this:
It's extremely useful and one of the things that makes Haskell development so nice.
So the question is: what exactly is this method, and is there an explicit order of steps to it?
I still consider myself a beginner when it comes to Haskell, and by googling the subject I didn't find a very clear explanation of how this kind of development style could be used.
Almost anywhere on the right hand side of an assignment in Haskell, you can write an underscore (optionally followed by other characters) instead of a value (constant or function). Instead of compiling, GHC will then tell you which type of value you might want to replace the underscore with, and list which identifiers in scope are of that type.
Matthías Páll Gissurarson is expanding the list of hints from GHC to include compound expressions.

Why don't popular programming languages use some other character to delimit strings? [closed]

Every programming language I know (Perl, JavaScript, PHP, Python, ASP, ActionScript, Commodore Basic) uses single and double quotes to delimit strings.
This creates the ongoing situation of having to go to great lengths to treat quotes correctly, since the quote is extremely common in the contents of strings.
Why do programming languages not use some other character to delimit strings, one that is not used in normal conversation (\, | or { } for example) so we can just get on with our lives?
Is this true, or am I overlooking something? Is there an easy way to stop using quotes for strings in a modern programming language?
print <<<END
I know about here document syntax, but for minor string manipulation it's overly complicated and it complicates formatting.
END;
[UPDATE] Many of you made a good point about the importance of using only ASCII characters. I have updated the examples to reflect that (the backslash, the pipe and braces).
Perl lets you use whatever characters you like
"foo $bar" eq
qq(foo $bar) eq
qq[foo $bar] eq
qq!foo $bar! eq
qq#foo $bar# etc
Meanwhile
'foo $bar' eq
q(foo $bar) eq
q[foo $bar] eq
q!foo $bar! eq
q#foo $bar# etc
The syntax extends to other features, including regular expressions, which is handy if you are dealing with URIs.
"http://www.example.com/foo/bar/baz/" =~ /\/foo\/[^\/]+\/baz\//;
"http://www.example.com/foo/bar/baz/" =~ m!/foo/[^/]+/baz/!;
Current: "Typewriter" 'quotation' marks
There are many good reasons for using the quotation marks we are currently using:
Quotes are easily found on keyboards - so they are easy to type, and they have to be easy, because strings are needed so often.
Quotes are in ASCII - most programming tools handle only ASCII well. You can use ASCII in almost any environment imaginable. And that's important when you are fixing your program over a telnet connection on some far-far-away server.
Quotes come in many versions - single quotes, double quotes, back quotes. So a language can assign different meanings for differently quoted strings. These different quotes can also solve the 'quotes "inside" quotes' problem.
Quotes are natural - English used quotes for marking up text passages long before programming languages followed. In linguistics quotes are used in quite the same way as in programming languages. Quotes are natural the same way + and - are natural for addition and subtraction.
Alternative: “typographically” ‘correct’ quotes
Technically they are superior. One great advantage is that you can easily differentiate between opening and closing quotes. But they are hard to type and they are not in ASCII. (I had to put them into a headline to make them visible in this StackOverflow font at all.)
Hopefully, one day when ASCII is something that only historians care about and keyboards have changed into something totally different (if we are even going to have keyboards at all), there will be a programming language that uses better quotes...
Python does have an alternative string delimiter with the triple-double quote """Some String""".
Single quotes and double quotes are used in the majority of languages since that is the standard delimiter in most written languages.
Languages (should) try to be as simple to understand as possible, and using something different from quotes to deal with strings introduces unnecessary complexity.
Python has an additional string type, using triple double-quotes,
"""like this"""
In addition to this, Perl allows you to use any delimiter you want,
q^ like this ^
I think for the most part, the regular string delimiters are used because they make sense. A string is wrapped in quotes. In addition to this, most developers are so used to relying on their common sense when it comes to strings that drastically altering the way strings are presented could mean a difficult learning curve.
Using quotation marks to define a set of characters as separate from the enclosing text is more natural to us, and thus easier to read. Also, " and ' are on the keyboard, while those other characters you mentioned are not, so it's easier to type. It may be possible to use a character that is widely available on keyboards, but I can't think of one that won't have the same kind of problem.
E: I missed the pipe character, which may actually be a viable alternative. Except that it's currently widely used as the OR operator, and the readability issue still stands.
Ah, so you want old-fashioned FORTRAN, where you'd quote by counting the number of characters in the string and embedding it in an H format, such as: 13HHello, World!. As somebody who did a few things with FORTRAN back in the days when the language name was all caps, quotation marks and escaping them are a Good Thing. (For example, you aren't totally screwed if you are off by one in your manual character count.)
Seriously, there is no ideal solution. It will always be necessary, at some point, to have a string containing whatever quote character you like. For practical purposes, the quote delimiters need to be on the keyboard and easily accessible, since they're heavily used. Perl's q#...# syntax will fail if a string contains an example of each possible character. FORTRAN's Hollerith constants are even worse.
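The count-prefixed idea behind the Hollerith constant mentioned above can be sketched in a few lines of Python. This is only a model of the `nH...` notation with hypothetical function names, not an implementation of FORTRAN's rules.

```python
def to_hollerith(text):
    """Encode text as a count-prefixed constant, e.g. 13HHello, World!"""
    return f"{len(text)}H{text}"


def from_hollerith(source):
    """Read one constant from the start of source; return (text, remainder)."""
    count, sep, tail = source.partition("H")
    if not sep or not (count.isascii() and count.isdigit()):
        raise ValueError("expected <count>H<characters>")
    length = int(count)
    if len(tail) < length:
        raise ValueError("constant is shorter than its declared count")
    return tail[:length], tail[length:]


print(to_hollerith("Hello, World!"))       # 13HHello, World!
print(from_hollerith('8Hsay "hi"'))        # quotes need no escaping
print(from_hollerith("12HHello, World!"))  # off by one: the "!" is left over
```

No character ever needs escaping, but the last call shows the price: a count that is off by one silently splits the text in the wrong place.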
Because those other characters you listed aren't ASCII. I'm not sure that we are ready for, or need, a programming language in Unicode...
EDIT: As to why not use {}, | or \, well those symbols all already have meanings in most languages. Imagine C or Perl with two different meanings for '{' and '}'!
| already means "or", and in some languages it concatenates strings. And how would you get \n if \ were the delimiter?
Fundamentally, I really don't see why this is a problem. Is \" really THAT hard? I mean, in C, you often have to use %%, \\ and several other two-character sequences, so... Meh.
Because no one has created a language using some other character that has gotten popular.
I think that is largely because the demand for changing the character is just not there; most programmers are used to the standard quote and see no compelling reason to change the status quo.
Compare the following.
print "This is a simple string."
print "This \"is not\" a simple string."
print ¤This is a simple string.¤
print ¤This "is not" a simple string.¤
I for one don't really feel like the second is any easier or more readable.
You say "having to go to great lengths to treat quotes correctly"; but it's only in the text representation. All modern languages treat strings as binary blocks, so they really don't care about the content. Remember that the text representation is only a simple way for the programmer to tell the system what to do. Once the string is interned, it doesn't have any trouble managing the quotes.
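A quick Python check illustrates the point that the quoting only exists in the source text. The three literals below are spelled differently but produce the same string value; the variable names are arbitrary.

```python
# Three spellings of the same string literal.
escaped = "This \"is not\" a simple string."
single = 'This "is not" a simple string.'
triple = """This "is not" a simple string."""

# Once parsed, they are indistinguishable: same characters, no backslashes.
print(escaped == single == triple)
print("\\" in escaped)
print(escaped.count('"'))
```

The backslashes are consumed when the literal is read, so nothing downstream ever has to "manage" them.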
One good reason would probably be that if this is the only thing you want to improve on an existing language, you're not really creating a new language.
And if you're creating a new language, picking the right character for the string quotes is probably way way WAY down on the todo list of things to actually implement.
You would probably be best off picking a delimiter that exists on all common keyboards and terminal representation sets, so most of the ones you suggest are right out...
And in any case, a quoting mechanism will still be necessary... you gain a reduction in the number of times you use quoting at the cost of making the language harder for non-specialists to read.
So it is not entirely clear that this is a win, and then there is force of habit.
Ada doesn't use single quotes for strings. Those are only for chars, and don't have to be escaped inside strings.
I find it very rare that the double-quote character comes up in a normal text string that I enter into a computer program. When it does, it is almost always because I am passing that string to a command interpreter, and need to embed another string in it.
I would imagine the main reason none of those other characters are used for string delimiters is that they aren't in the original 7-bit ASCII code table. Perhaps that's not a good excuse these days, but in a world where most language designers are afraid to buck the insanely crappy C syntax, you aren't going to get a lot of takers for an unusual string delimiter choice.
Python allows you to mix single and double quotes to put quotation marks in strings.
>>> print("Please welcome Mr Jim 'Beaner' Wilson.")
Please welcome Mr Jim 'Beaner' Wilson.
>>> print('Please welcome Mr Jim "Beaner" Wilson.')
Please welcome Mr Jim "Beaner" Wilson.
You can also use the previously mentioned triple quotes. These can also span multiple lines, so you do not have to write explicit newline escapes.
>>> print("""Please welcome Mr Jim "Beaner" Wilson.""")
Please welcome Mr Jim "Beaner" Wilson.
Finally, you can escape the quotes with backslashes, the same way as in most other languages.
>>> print("Please welcome Mr Jim \"Beaner\" Wilson.")
Please welcome Mr Jim "Beaner" Wilson.

Is there any advantage of being a case-sensitive programming language? [duplicate]

I personally do not like programming languages being case sensitive.
(I know that the disadvantages of case sensitivity are nowadays offset by good IDEs)
Still I would like to know whether there are any advantages for a programming language if it is case sensitive. Is there any reason why designers of many popular languages chose to make them case sensitive?
EDIT: duplicate of Why are many languages case sensitive?
EDIT: (I cannot believe I asked this question a few years ago)
This is a preference. I prefer case sensitivity; I find code easier to read this way. For instance, the variable name "myVariable" has a different word shape than "MyVariable," "MYVARIABLE," and "myvariable." This makes it easier to tell the identifiers apart at a glance. Of course, you should rarely, if ever, create identifiers that differ only in case. This is more about consistency than the obvious "benefit" of increasing the number of possible identifiers. Some people think this is a disadvantage. I can't think of any time in which case sensitivity gave me any problems. But again, this is a preference.
Case-sensitivity is inherently faster to parse (albeit only slightly) since it can compare character sequences directly without having to figure out which characters are equivalent to each other.
It allows the implementer of a class/library to control how casing is used in the code. Case may also be used to convey meaning.
The code looks more uniform. In the days of BASIC these were equivalent:
PRINT MYVAR
Print MyVar
print myvar
With type checking, case sensitivity prevents a misspelling from silently becoming a new, unrecognized variable. I have fixed bugs in code written in a case-insensitive, untyped language (FORTRAN77), where the zero (0) and capital letter O looked the same in the editor. The language created a new object and so the output was flawed. With a case-sensitive, typed language, this would not have happened.
In the compiler or interpreter, a case-insensitive language is going to have to make everything upper or lowercase to test for matches, or otherwise use a case insensitive matching tool, but that's only a small amount of extra work for the compiler.
Plus case-sensitive code allows certain patterns of declarations such as
MyClassName myClassName = new MyClassName()
and other situations where case sensitivity is nice.
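The extra work mentioned above, folding every name to one case before comparing, can be sketched as a tiny symbol table in Python. `CaseInsensitiveScope` is a hypothetical class for illustration, not how any particular compiler does it.

```python
class CaseInsensitiveScope:
    """Symbol table that folds names to one case before storing or looking up."""

    def __init__(self):
        self._values = {}

    def define(self, name, value):
        self._values[name.casefold()] = value

    def lookup(self, name):
        return self._values[name.casefold()]


insensitive = CaseInsensitiveScope()
insensitive.define("MyVar", 1)
print(insensitive.lookup("MYVAR"), insensitive.lookup("myvar"))

# A case-sensitive table is just a plain dict: no folding step, and names
# that differ only in case stay distinct, as in the declaration pattern above.
sensitive = {"MyClassName": "the type", "myClassName": "the instance"}
print(len(sensitive))
```

The folding is a single cheap call per lookup, which matches the observation that the cost to the compiler is small.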
