allow(uncommon_codepoints) is ignored unless specified at crate level - rust

I wrote this the other day:
let µ = ... some expression ...
(As it happens, the µ sign is easily typeable on my keyboard, just AltGr+m. This is why I have a habit to use this letter quite often especially when it is about small values.)
Now I got this:
identifier contains uncommon Unicode codepoints
`#[warn(uncommon_codepoints)]` on by default
No problem, I'll just allow it, I thought, and put this at the front:
#![allow(uncommon_codepoints)]
But no, it's utterly hesitant against greek:
allow(uncommon_codepoints) is ignored unless specified at crate level
`#[warn(unused_attributes)]` on by default
I would think it is at least debatable what "uncommon" exactly is. But I'm not really interested in that discussion, as long as I can turn it off.
So please ... how exactly do I specify something at the crate level? I tried it in main.rs and libs.rs but it wont accept it.
Edit
This really starts to become interesting:
I put the line
#![allow(uncommon_codepoints)]
as line 1 in main.rs and it now stops complaining about the unused_attribute. However, the "uncommon codepoint" warning still appears when compiling the file that contains it (i.e. with cargo build). I am at rustc 1.58.1 (stable, AFAIK)
I also found out that what my keyboard produces is not U+03BC GREEK SMALL LETTER MU but U+00B5 MICRO SIGN. It's still a letter, lowercase. Now, the interesting thing is: the uncommon Unicode warning does not appear for a genuine greek Mu, but for the micro sign it does!
Is there any other place I can turn off annoying and (from my point of view utterly useless) warnings? In general, I highly appreciate rust's detailed and often helpful warnings (though lately I found myself making an unused HashSet just to avoid the warnings about unused imports --- hey I know I will need this later, so please stop nagging), but this unicode thingy is a bit overdone. Its a valid variable name according to rust lexical syntax, and I really do want to use it. Period.

Related

Why can't an identifier start with a number?

I have a file named 1_add.rs, and I tried to add it into the lib.rs. Yet, I got the following error during compilation.
error: expected identifier, found `1_add`
--> src/lib.rs:1:5
|
1 | mod 1_add;
| ^^^^^ expected identifier
It seems the identifier that starts with a digit is invalid. But why would Rust has this restriction? Is there any workaround if I want to indicate the sequence of different rust files (for managing the exercise files).
In your case (you want to name the files like 1_foo.rs) you can write
#[path="1_foo.rs"]
mod mod_1_foo;
Allowing identifies to start with digits can conflict with type annotations. E.g.
let foo = 1_u32;
sets to type to u32. It would be confusing when 1_u256 means another variable.
But why would Rust has this restriction?
Not only rust, but most every language I've written a line of code in has this restriction as well.
Food for thought:
let a = 1_2;
Is 1_2 a variable name or is it a literal for value 12? What if variable 1_2 does not exist now, but you add it later, does this token stop being a number literal?
While rust compiler probably could make it work, it's not worth all the confusion, IMHO.
Allowing identifiers to start with a digit would caus conflicts with many other token types. Here are a few examples:
1e1 is a floating point number.
0x0 is a hexadecimal integer.
8u8 is an integer with explicit type annotation.
Most importantly, though, I believe allowing identifiers starting with digit would hurt readability. Currently everything starting with a digit is some kind of number, which in my opinion helps when reading code.
An incomplete list of programming languages not allowing identifiers to start with a digit: Python, Java, JavaScript, C#, Ruby, C, C++, Pascal. I can't think of a language that does allow this (which most likely does exist).
Rust identifiers are based on Unicode® Standard Annex #31
(see The Rust RFC Book), which standardizes some common rules for identifiers in programming languages. It might make it easier to parse text that could otherwise be ambiguous, like 1e10?
"Why?" cannot be reasoned here but by historical tales, the rules are as such. You cannot play against them.
If you urgently want to start your identifiers with a digit, at least for human readers, prepend an underscore like this: _1_add.
Note: To make sure that sorting works well, use also leading zeroes as many as appropriate (_001_add if you expect more than 99 files).

Is there some sort of sequence in which the compiler operates?

I am rewriting a C++ program to Rust, and there is one thing that gives me an emotional rollercoaster. At first iteration it gives me, say 50 errors, then I solve them one by one, and just when I solve the last one, the compiler gives me 60 fresh errors, then I solve them and get another few tens of errors.
The last (yet) set of errors seems to be generated exclusively by the borrow checker. So why does this happen? Are there some layers or stages of the compiling process, and if yes what are they?
I want to know that, because I like predictability and dislike emotional rollercoasters (also I want to know when this adventure is going to end).
Yes, there is an order:
Syntax errors, from parsing the source code
Non-lifetime type errors, from the type checker
Lifetime errors, from the borrow checker
The first two are common to most typed languages. You need to build some kind of model of the type relationships before checking them, which will fail fast if the syntax is incorrect. In Rust, a subsequent step is to verify that all borrows are valid, once the basic type check has passed.
You can read more in the blog post, Introducing MIR
.

identifying left recursion in antlr4

A grammar (copied from its manual) reports the following left recursions when
I un-commented the following production: casting_type->constant_primary
error(119): The following sets of rules are mutually left-recursive [primary, method_call_root, method_call, cast]
and [casting_type, constant_cast, cast, constant_primary, constant_function_call, function_subroutine_call, primary]
and [subroutine_call, function_subroutine_call, constant_function_call, constant_primary, method_call, method_call_root, casting_type, primary, constant_cast, cast]
The above error report has 3 sets of rules. The third set has 2 left-recursions in it:
casting_type,constant_primary,constant_cast,casting_type
casting_type,constant_primary,constant_function_call,function_subroutine_call,subroutine_call,method_call,method_call_root,primary,cast,casting_type
Since this error was reported after I un-commented one production, I
think it is reasonable to expect to see at least its names in each set (casting_type,constant_primary). Clearly the first set
lacks both these names, so it cannot contain a recursion. And the second set (I cannot give the full
grammar here because it is too long) has recursion-1 and some extra names
which seem not relevant.
My question is: why is Antlr printing the first and the second sets of rules?
Is this a bug in antlr (I tried 4.6 and 4.7, same result), or is this hinting at a problem that I am missing something in these sets?
I saw a similar post elsewhere, where the reported names did not indicate a recursion, but on deeper analysis recursion was found somewhere else.
Probably nobody can really answer your question, not even the authors of ANTLR. To me it looks like you get follow-up errors which make not much sense, because a real error made analysis impossible (or at least can lead to wrong conclusions). Of course there can also be a bug in ANTLR, but I recommend to focus on one of the sets and fix that (if you can see what makes them mutually left recursive). Maybe the other errors disappear then or you have to analyze again.

Haskell Char and unicode surrogates

I was playing around with strings and discovered that Haskell (correctly) disallows characters above Unicode code point 0x10ffff (ie one gets something like a sequence out of range error if one attempts to use something above this limit). Out of curiosity, i played around with the Unicode surrogate halves (0xd800 to 0xdfff) - invalid Unicode codepoints, and discovered that they seem to be permitted. I am curious as to why this is. Is it simply because being a bounded item means only defining a maximum and a minimum?
Disallowing the surrogate code units would indeed make Char a more correct type for Unicode code points. The Report says that Char is "an enumeration whose values represent Unicode characters", so probably this should be considered a GHC bug.
There's no specific notion of "a bounded item", but it would require extra checks in various places (right now chr just needs to make one comparison to check if its argument is valid, for instance) and possibly make some things behave more strangely (if people indirectly expect code points to be contiguous).
I don't know that there's an especially good rationale for it, though, or that the trade-off was even considered originally. In Haskell 1.4, Char was just a 16-bit type, so it would have been natural to extend it to 17*2^16 values without adding extra checks. This issue is occasionally brought up -- I've brought it up before -- but most people don't seem to worry about it very much. It's probably reasonable to file a GHC bug about it, though, to get a proper discussion going.
Note that Data.Text (which uses UTF-16 as its internal representations) does disallow the invalid code units (it has to).

Mandatory use of braces

As part of a code standards document I wrote awhile back, I enforce "you must always use braces for loops and/or conditional code blocks, even (especially) if they're only one line."
Example:
// this is wrong
if (foo)
//bar
else
//baz
while (stuff)
//things
// This is right.
if (foo) {
// bar
} else {
// baz
}
while (things) {
// stuff
}
When you don't brace a single-line, and then someone comments it out, you're in trouble. If you don't brace a single-line, and the indentation doesn't display the same on someone else's machine... you're in trouble.
So, question: are there good reasons why this would be a mistaken or otherwise unreasonable standard? There's been some discussion on it, but no one can offer me a better counterargument than "it feels ugly".
The best counter argument I can offer is that the extra line(s) taken up by the space reduce the amount of code you can see at one time, and the amount of code you can see at one time is a big factor in how easy it is to spot errors. I agree with the reasons you've given for including braces, but in many years of C++ I can only think of one occasion when I made a mistake as a result and it was in a place where I had no good reason for skipping the braces anyway. Unfortunately I couldn't tell you if seeing those extra lines of code ever helped in practice or not.
I'm perhaps more biased because I like the symmetry of matching braces at the same indentation level (and the implied grouping of the contained statements as one block of execution) - which means that adding braces all the time adds a lot of lines to the project.
I enforce this to a point, with minor exceptions for if statements which evaluate to either return or to continue a loop.
So, this is correct by my standard:
if(true) continue;
As is this
if(true) return;
But the rule is that it is either a return or continue, and it is all on the same line. Otherwise, braces for everything.
The reasoning is both for the sake of having a standard way of doing it, and to avoid the commenting problem you mentioned.
I see this rule as overkill. Draconian standards don't make good programmers, they just decrease the odds that a slob is going to make a mess.
The examples you give are valid, but they have better solutions than forcing braces:
When you don't brace a single-line, and then someone comments it out, you're in trouble.
Two practices solve this better, pick one:
1) Comment out the if, while, etc. before the one-liner with the one-liner. I.e. treat
if(foo)
bar();
like any other multi-line statement (e.g. an assignment with multiple lines, or a multiple-line function call):
//if(foo)
// bar();
2) Prefix the // with a ;:
if(foo)
;// bar();
If you don't brace a single-line, and the indentation doesn't display the same on someone else's machine... you're in trouble.
No, you're not; the code works the same but it's harder to read. Fix your indentation. Pick tabs or spaces and stick with them. Do not mix tabs and spaces for indentation. Many text editors will automatically fix this for you.
Write some Python code. That will fix at least some bad indentation habits.
Also, structures like } else { look like a nethack version of a TIE fighter to me.
are there good reasons why this would be a mistaken or otherwise unreasonable standard? There's been some discussion on it, but no one can offer me a better counterargument than "it feels ugly".
Redundant braces (and parentheses) are visual clutter. Visual clutter makes code harder to read. The harder code is to read, the easier it is to hide bugs.
int x = 0;
while(x < 10);
{
printf("Count: %d\n", ++x);
}
Forcing braces doesn't help find the bug in the above code.
P.S. I'm a subscriber to the "every rule should say why" school, or as the Dalai Lama put it, "Know the rules so that you may properly break them."
I have yet to have anyone come up with a good reason not to always use curly braces.
The benefits far exceed any "it feels ugly" reason I've heard.
Coding standard exist to make code easier to read and reduce errors.
This is one standard that truly pays off.
I find it hard to argue with coding standards that reduce errors and make the code more readable. It may feel ugly to some people at first, but I think it's a perfectly valid rule to implement.
I stand on the ground that braces should match according to indentation.
// This is right.
if (foo)
{
// bar
}
else
{
// baz
}
while (things)
{
// stuff
}
As far as your two examples, I'd consider yours slightly less readable since finding the matching closing parentheses can be hard, but more readable in cases where indentation is incorrect, while allowing logic to be inserted easier. It's not a huge difference.
Even if indentation is incorrect, the if statement will execute the next command, regardless of whether it's on the next line or not. The only reason for not putting both commands on the same line is for debugger support.
The one big advantage I see is that it's easier to add more statements to conditionals and loops that are braced, and it doesn't take many additional keystrokes to create the braces at the start.
My personal rule is if it's a very short 'if', then put it all on one line:
if(!something) doSomethingElse();
Generally I use this only when there are a bunch of ifs like this in succession.
if(something == a) doSomething(a);
if(something == b) doSomething(b);
if(something == b) doSomething(c);
That situation doesn't arise very often though, so otherwise, I always use braces.
At present, I work with a team that lives by this standard, and, while I'm resistant to it, I comply for uniformity's sake.
I object for the same reason I object to teams that forbid use of exceptions or templates or macros: If you choose to use a language, use the whole language. If the braces are optional in C and C++ and Java, mandating them by convention shows some fear of the language itself.
I understand the hazards described in other answers here, and I understand the yearning for uniformity, but I'm not sympathetic to language subsetting barring some strict technical reason, such as the only compiler for some environment not accommodating templates, or interaction with C code precluding broad use of exceptions.
Much of my day consists of reviewing changes submitted by junior programmers, and the common problems that arise have nothing to do with brace placement or statements winding up in the wrong place. The hazard is overstated. I'd rather spend my time focusing on more material problems than looking for violations of what the compiler would happily accept.
The only way coding standards can be followed well by a group of programmers is to keep the number of rules to a minimum.
Balance the benefit against the cost (every extra rule confounds and confuses programmers, and after a certain threshold, actually reduces the chance that programmers will follow any of the rules)
So, to make a coding standard:
Make sure you can justify every rule with clear evidence that it is better than the alternatives.
Look at alternatives to the rule - is it really needed? If all your programmers use whitespace (blank lines and indentation) well, an if statement is very easy to read, and there is no way that even a novice programmer can mistake a statement inside an "if" for a statement that is standalone. If you are getting lots of bugs relating to if-scoping, the root cause is probably that you have a poor whitepsace/indentation style that makes code unnecessarily difficult to read.
Prioritise your rules by their measurable effect on code quality. How many bugs can you avoid by enforcing a rule (e.g. "always check for null", "always validate parameters with an assert", "always write a unit test" versus "always add some braces even if they aren't needed"). The former rules will save you thousands of bugs a year. The brace rule might save you one. Maybe.
Keep the most effective rules, and discard the chaff. Chaff is, at a minimum, any rule that will cost you more to implement than any bugs that might occur by ignoring the rule. But probably if you have more than about 30 key rules, your programmers will ignore many of them, and your good intentions will be as dust.
Fire any programmer who comments out random bits of code without reading it :-)
P.S. My stance on bracing is: If the "if" statement or the contents of it are both a single line, then you may omit the braces. That means that if you have an if containing a one-line comment and a single line of code, the contents take two lines, and therefore braces are required. If the if condition spans two lines (even if the contents are a single line), then you need braces. This means you only omit braces in trivial, simple, easily read cases where mistakes are never made in practice. (When a statement is empty, I don't use braces, but I always put a comment clearly stating that it is empty, and intentionally so. But that's bordering on a different topic: being explicit in code so that readers know that you meant a scope to be empty rather than the phone rang and you forgot to finish the code)
Many languanges have a syntax for one liners like this (I'm thinking of perl in particular) to deal with such "ugliness". So something like:
if (foo)
//bar
else
//baz
can be written as a ternary using the ternary operator:
foo ? bar : baz
and
while (something is true)
{
blah
}
can be written as:
blah while(something is true)
However in languages that don't have this "sugar" I would definitely include the braces. Like you said it prevents needless bugs from creeping in and makes the intention of the programmer clearer.
I am not saying it is unreasonable, but in 15+ years of coding with C-like languages, I have not had a single problem with omitting the braces. Commenting out a branch sounds like a real problem in theory - I've just never seen it happening in practice.
Another advantage of always using braces is that it makes search-and-replace and similar automated operations easier.
For example: Suppose I notice that functionB is usually called immediately after functionA, with a similar pattern of arguments, and so I want to refactor that duplicated code into a new combined_function. A regex could easily handle this refactoring if you don't have a powerful enough refactoring tool (^\s+functionA.*?;\n\s+functionB.*?;) but, without braces, a simple regex approach could fail:
if (x)
functionA(x);
else
functionA(y);
functionB();
would become
if (x)
functionA(x);
else
combined_function(y);
More complicated regexes would work in this particular case, but I've found it very handy to be able to use regex-based search-and-replace, one-off Perl scripts, and similar automated code maintenance, so I prefer a coding style that doesn't make that needlessly complicated.
I don't buy into your argument. Personally, I don't know anyone who's ever "accidentally" added a second line under an if. I would understand saying that nested if statements should have braces to avoid a dangling else, but as I see it you're enforcing a style due to a fear that, IMO, is misplaced.
Here are the unwritten (until now I suppose) rules I go by. I believe it provides readability without sacrificing correctness. It's based on a belief that the short form is in quite a few cases more readable than the long form.
Always use braces if any block of the if/else if/else statement has more than one line. Comments count, which means a comment anywhere in the conditional means all sections of the conditional get braces.
Optionally use braces when all blocks of the statement are exactly one line.
Never place the conditional statement on the same line as the condition. The line after the if statement is always conditionally executed.
If the conditional statement itself performs the necessary action, the form will be:
for (init; term; totalCount++)
{
// Intentionally left blank
}
No need to standardize this in a verbose manner, when you can just say the following:
Never leave braces out at the expense of readability. When in doubt, choose to use braces.
I think the important thing about braces is that they very definitely express the intent of the programmer. You should not have to infer intent from indentation.
That said, I like the single-line returns and continues suggested by Gus. The intent is obvious, and it is cleaner and easier to read.
If you have the time to read through all of this, then you have the time to add extra braces.
I prefer adding braces to single-line conditionals for maintainability, but I can see how doing it without braces looks cleaner. It doesn't bother me, but some people could be turned off by the extra visual noise.
I can't offer a better counterargument either. Sorry! ;)
For things like this, I would recommend just coming up with a configuration template for your IDE's autoformatter. Then, whenever your users hit alt-shift-F (or whatever the keystroke is in your IDE of choice), the braces will be automatically added. Then just say to everyone: "go ahead and change your font coloring, PMD settings or whatever. Please don't change the indenting or auto-brace rules, though."
This takes advantage of the tools available to us to avoid arguing about something that really isn't worth the oxygen that's normally spent on it.
Depending on the language, having braces for a single lined conditional statement or loop statement is not mandatory. In fact, I would remove them to have fewer lines of code.
C++:
Version 1:
class InvalidOperation{};
//...
Divide(10, 0);
//...
Divide(int a, in b)
{
if(b == 0 ) throw InvalidOperation();
return a/b;
}
Version 2:
class InvalidOperation{};
//...
Divide(10, 0);
//...
Divide(int a, in b)
{
if(b == 0 )
{
throw InvalidOperation();
}
return a/b;
}
C#:
Version 1:
foreach(string s in myList)
Console.WriteLine(s);
Version2:
foreach(string s in myList)
{
Console.WriteLine(s);
}
Depending on your perspective, version 1 or version 2 will be more readable. The answer is rather subjective.
Wow. NO ONE is aware of the dangling else problem? This is essentially THE reason for always using braces.
In a nutshell, you can have nasty ambiguous logic with else statements, especially when they're nested. Different compilers resolve the ambiguity in their own ways. It can be a huge problem to leave off braces if you don't know what you're doing.
Aesthetics and readability has nothing to do it.

Resources