Are operations like `2*7` considered literals? - literals

I just had a small question.
Are operations considered literals? Would 2*7, for example, be a literal? Is "hello, " + "world!" a literal?
I know the operands are literals, but the expression is not explicitly 14 or "hello, world!".
The question Is 2+3 considered as a literal?
asks basically what I am asking but most answers weren't even helpful, all they do is break the variable declaration down or talk about what compilers do with them, but I'm not looking for that, so I would like a more in depth explanation.
Thank you

It will depend on the language and the compiler, sorry. But just using the concept that a literal is a kind of token, then no, the result is a compile-time constant, not a token.
In C/C++ 2*7 will be optimised by the compiler to make a new constant but it isn't actually legally defined as a literal, though it can be treated as a compile-time constant.
Concatenating "hello" "world" (note no plus) is actually described as a preprocessing step in c++, so does generate a new literal constant string, but then in original C this didn't work.
But note that in C, a macro will treat the parameter phrase 2+7 as separate tokens, and #define STUPIDMUL3(val) 3 * val for 2+7 will give the answer 13, not 18. If you could find a way to force macros to treat the two halves of the string differently, I think it would.
I would expect an interpreter to take longer to process 2*7 than it would 14 because it might interpret and solve it every time.

Related

Why can't an identifier start with a number?

I have a file named 1_add.rs, and I tried to add it into the lib.rs. Yet, I got the following error during compilation.
error: expected identifier, found `1_add`
--> src/lib.rs:1:5
|
1 | mod 1_add;
| ^^^^^ expected identifier
It seems the identifier that starts with a digit is invalid. But why would Rust has this restriction? Is there any workaround if I want to indicate the sequence of different rust files (for managing the exercise files).
In your case (you want to name the files like 1_foo.rs) you can write
#[path="1_foo.rs"]
mod mod_1_foo;
Allowing identifies to start with digits can conflict with type annotations. E.g.
let foo = 1_u32;
sets to type to u32. It would be confusing when 1_u256 means another variable.
But why would Rust has this restriction?
Not only rust, but most every language I've written a line of code in has this restriction as well.
Food for thought:
let a = 1_2;
Is 1_2 a variable name or is it a literal for value 12? What if variable 1_2 does not exist now, but you add it later, does this token stop being a number literal?
While rust compiler probably could make it work, it's not worth all the confusion, IMHO.
Allowing identifiers to start with a digit would caus conflicts with many other token types. Here are a few examples:
1e1 is a floating point number.
0x0 is a hexadecimal integer.
8u8 is an integer with explicit type annotation.
Most importantly, though, I believe allowing identifiers starting with digit would hurt readability. Currently everything starting with a digit is some kind of number, which in my opinion helps when reading code.
An incomplete list of programming languages not allowing identifiers to start with a digit: Python, Java, JavaScript, C#, Ruby, C, C++, Pascal. I can't think of a language that does allow this (which most likely does exist).
Rust identifiers are based on Unicode® Standard Annex #31
(see The Rust RFC Book), which standardizes some common rules for identifiers in programming languages. It might make it easier to parse text that could otherwise be ambiguous, like 1e10?
"Why?" cannot be reasoned here but by historical tales, the rules are as such. You cannot play against them.
If you urgently want to start your identifiers with a digit, at least for human readers, prepend an underscore like this: _1_add.
Note: To make sure that sorting works well, use also leading zeroes as many as appropriate (_001_add if you expect more than 99 files).

Read substrings from a string containing multiplication [duplicate]

This question already has answers here:
'*' and '/' not recognized on input by a read statement
(2 answers)
Closed 4 years ago.
I am a scientist programming in Fortran, and I came up with a strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all wods, but instead, the READ function repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
IMPLICIT NONE
CHARACTER(LEN=1024):: string
CHARACTER(LEN=16):: v1, v2, v3
string="3*a b c"
READ(string,*) v1, v2, v3
PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
Following the answer of #SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case, for Intel Fortran, but you could find it for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this you change the * to an actual format specifier. If you know the length of the strings at advance, this would be as easy as '(a3,a2,a2)'. Or, you could come with a format specifier that matches your data, but this generally demands you knowing the length or format of stuff.
Formatted Sequential List-Directed:
You are currently using this option (the * format descriptor). As we already showed you, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*cte thing, that is interpreted as n repetitions of cte literal.
As said by Steve Lionel, you could put quotation marks around the problematic word, so it will be parsed as one-piece. Or, as proposed by #evets, you could split or break your string using the intrinsics index or scan. Another option could be changing your wildcard from asterisk to anything else.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of a I/O operation. There is no intrinsic for this, but you could come with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.

Turn a number into a byte string literal, similar to stringify!()

I'm trying to write a macro that would turn a number into a byte string literal, similar to how the stringify! macro can turn its argument into a &str.
More concretely, how would I write this:
byte_stringify!(10) -> b"10"
I will be using this to create a large number of const structs, so I can't really rely on calling a method on str.
More ambitiously, I'm actually trying to prepend and append some text before turning the argument into a byte string:
make_arg!(10) -> b"x10y"
Update:
Where did the old bytes! macro go? I think I want:
bytes!(stringify!(10))
You can't; at least, not without writing a compiler plugin, which is far beyond the scope of a simple Stack Overflow response.
There's some basic documentation on the subject in the Compiler Plugins chapter of the Rust Book, though do keep in mind that compiler plugins only work on nightly Rust; they do not work in any stable or beta release, thus also locking any crate that uses them to nightly Rust.
Sorry about that.

Using a regular string as a code for compiling

I need to make a function that receives a string such as:
int *ptr[20], *p, p2, p3[3];
and the function need to print:
ptr requires 80 bytes.
p requires 4 bytes.
p2 requires 4 bytes.
p3 requires 12 bytes.
to simplify to task, I would like to use the "fake" code in the string as a "real" code, and then just print the function sizeof(variable) to answer the question. I think it is the most simple way.
But how to do it?
What you describe is the ability to "evaluate" dynamically generated code.
Some languages -- usually they are evaluated (non-compiled) ones -- have such features, but C++ does not.
Even if it did, it wouldn't be a good solution here. You need a parser. For a formal approach, you may research lexers and context-free parsers. For an ad hoc approach...well...do whatever string manipulation you would like.

Why do a lot of programming languages put the type *after* the variable name?

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.
It seems like almost every non C-like language puts the type after the variable name, like so:
var : int
Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?
There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.
Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.
Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.
C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
Because C syntax is context-sensitive, it very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc, my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.
In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.
Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.
the 'most other' languages you speak of are those that are more declarative. They aim to allow you to program more along the lines you think in (assuming you aren't boxed into imperative thinking).
type last reads as 'create a variable called NAME of type TYPE'
this is the opposite of course to saying 'create a TYPE called NAME', but when you think about it, what the value is for is more important than the type, the type is merely a programmatic constraint on the data
If the name of the variable starts at column 0, it's easier to find the name of the variable.
Compare
QHash<QString, QPair<int, QString> > hash;
and
hash : QHash<QString, QPair<int, QString> >;
Now imagine how much more readable your typical C++ header could be.
In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:
x : A y : B
-------------
\x.y : A->B
I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ε A".
Some of this stuff pre-dates the modern languages you're thinking of.
An increasing trend is to not state the type at all, or to optionally state the type. This could be a dynamically typed langauge where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.
If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.
There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.
"Those who cannot remember the past are condemned to repeat it."
Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as
int (*p)[10];
or
void (*signal(int x, void (*f)(int)))(int)
together with a utility (cdecl) whose purpose is to decrypt such gibberish.
In Pascal, the type comes after the variable, so the first examples becomes
p: pointer to array[10] of int
Contrast with
q: array[10] of pointer to int
which, in C, is
int *q[10]
In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
The signal function would be
signal: function(x: int, f: function(int) to void) to (function(int) to void)
Still a mouthful, but at least within the realm of human comprehension.
In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.
But if you try to put everything before the name, the order is still unintuitive:
int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a
So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.
I'm not sure, but I think it's got to do with the "name vs. noun" concept.
Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.
It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).
It's just how the language was designed. Visual Basic has always been this way.
Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parenthesis, and the output goes out the back of the method name.
I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).
As a weak form of pattern-matching, it is intelligable to some extent, but it doesn't seem to provide any particular advantages, either. And, trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligability. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.
Putting the type first helps in parsing. For instance, in C, if you declared variables like
x int;
When you parse just the x, then you don't know whether x is a declaration or an expression. In contrast, with
int x;
When you parse the int, you know you're in a declaration (types always start a declaration of some sort).
Given progress in parsing languages, this slight help isn't terribly useful nowadays.
Fortran puts the type first:
REAL*4 I,J,K
INTEGER*4 A,B,C
And yes, there's a (very feeble) joke there for those familiar with Fortran.
There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).
What about dynamically (cheers #wcoenen) typed languages? You just use the variable.

Resources