As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
In some languages, single quotes are used to define characters and double quotes are used to define strings. In other languages, both single and double quotes are used to define strings.
Do languages that use single and double quotes to define strings often offer an explicit way to define a single character?
Are there any implications to not being able to specifically define a character? Is it acceptable - or desirable - to automatically optimize single character strings into characters?
If the language has a character data type, then there is usually a way to define a character literal.
In VB.NET for example, a character literal looks like a single character string, but with the C suffix:
Dim space As Char = " "C
(The reason that apostrophes were not used for character literals in VB.NET, as they are in C# for example, is that they are used as shorthand for the REM command.)
In JavaScript, for example, there is no character data type, so there is no way to specify a character literal. You would represent a character either as a single-character string or as the numerical character code.
Automatically optimising a single-character string to a character would not likely be a good solution, unless you also make the automatic conversion back to a string if needed. In practice, however, that would be the same as automatically converting a single-character string to a character when needed.
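As a small illustration of a language without a character type, here is a sketch in Python, which (like JavaScript) represents a character either as a one-character string or as its numeric code. The variable names are only for illustration.

```python
# Python has no separate character type: indexing a string yields
# another string of length 1.
ch = "hello"[0]
print(type(ch).__name__, len(ch))  # str 1

# The numeric character code is available through ord(), and chr()
# converts a code back into a one-character string.
code = ord("A")
back = chr(code)
print(code, back)  # 65 A
```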
This question already has answers here:
'*' and '/' not recognized on input by a read statement
(2 answers)
Closed 4 years ago.
I am a scientist programming in Fortran, and I came across a strange behaviour. In one of my programs I have a string containing several "words", and I want to read all words as substrings. The first word starts with an integer and a wildcard, like "2*something".
When I perform an internal read on that string, I expect to read all words, but instead, the READ statement repeatedly reads the first substring. I do not understand why, nor how to avoid this behaviour.
Below is a minimalist sample program that reproduces this behaviour. I would expect it to read the three substrings and to print "3*a b c" on the screen. Instead, I get "a a a".
What am I doing wrong? Can you please help me and explain what is going on?
I am compiling my programs under GNU/Linux x64 with Gfortran 7.3 (7.3.0-27ubuntu1~18.04).
PROGRAM testread
IMPLICIT NONE
CHARACTER(LEN=1024):: string
CHARACTER(LEN=16):: v1, v2, v3
string="3*a b c"
READ(string,*) v1, v2, v3
PRINT*, v1, v2, v3
END PROGRAM testread
You are using list-directed input (the * format specifier). In list-directed input, a number (n) followed by an asterisk means "repeat this item n times", so it is processed as if the input was a a a b c. You would need to have as input '3*a' b c to get what you want.
I will use this as another opportunity to point out that list-directed I/O is sometimes the wrong choice as its inherent flexibility may not be what you want. That it has rules for things like repeat counts, null values, and undelimited strings is often a surprise to programmers. I also often see programmers complaining that list-directed input did not give an error when expected, because the compiler had an extension or the programmer didn't understand just how liberal the feature can be.
I suggest you pick up a Fortran language reference and carefully read the section on list-directed I/O. You may find you need to use an explicit format or change your program's expectations.
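To make the repeat-count rule described above concrete, here is a rough sketch in Python that mimics how list-directed input expands a record into items. It is a simplified model and not the full rules of the Fortran standard: the function name expand_list_directed is hypothetical, only blanks and commas are treated as separators, and slashes, record boundaries, and doubled quotes inside delimited strings are ignored.

```python
import re

# Simplified model of Fortran list-directed input (an illustrative sketch,
# not the complete rules): quoted items are taken literally, and an
# unquoted "n*value" expands to n copies of value.
_TOKEN = re.compile(r"""'([^']*)'|"([^"]*)"|([^\s,]+)""")
_REPEAT = re.compile(r"(\d+)\*(.*)")


def expand_list_directed(record):
    items = []
    for match in _TOKEN.finditer(record):
        single, double, bare = match.groups()
        if bare is None:
            # Delimited string: no repeat-count processing.
            items.append(single if single is not None else double)
            continue
        repeat = _REPEAT.fullmatch(bare)
        if repeat:
            count, value = int(repeat.group(1)), repeat.group(2)
            # "n*" with nothing after it stands for n null values.
            items.extend([value if value else None] * count)
        else:
            items.append(bare)
    return items


print(expand_list_directed("3*a b c")[:3])    # ['a', 'a', 'a']
print(expand_list_directed("'3*a' b c")[:3])  # ['3*a', 'b', 'c']
```

The first call shows why v1, v2 and v3 all receive "a": the three items requested by the READ are already satisfied by the expanded repeat, so b and c are never reached.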
Following the answer of @SteveLionel, here is the relevant part of the reference on list-directed sequential READ statements (in this case, for Intel Fortran, but you could find it for your specific compiler and it won't be much different).
A character string does not need delimiting apostrophes or quotation marks if the corresponding I/O list item is of type default character, and the following is true:
The character string does not contain a blank, comma (,), or slash ( / ).
The character string is not continued across a record boundary.
The first nonblank character in the string is not an apostrophe or a quotation mark.
The leading character is not a string of digits followed by an asterisk.
A nondelimited character string is terminated by the first blank, comma, slash, or end-of-record encountered. Apostrophes and quotation marks within nondelimited character strings are transferred as is.
In total, there are 4 forms of sequential read statements in Fortran, and you may choose the option that best fits your need:
Formatted Sequential Read:
To use this you change the * to an actual format specifier. If you know the length of the strings in advance, this would be as easy as '(a3,a2,a2)'. Or, you could come up with a format specifier that matches your data, but this generally requires knowing the length or format of the input.
Formatted Sequential List-Directed:
You are currently using this option (the * format descriptor). As we already showed you, this kind of I/O comes with a lot of magic and surprising behavior. What is hitting you is the n*c form, which is interpreted as n repetitions of the constant c.
As said by Steve Lionel, you could put quotation marks around the problematic word, so it will be parsed as one piece. Or, as proposed by @evets, you could split your string using the intrinsics index or scan. Another option could be changing your wildcard from an asterisk to anything else.
Formatted Namelist:
Well, that could be an option if your data was (or could be) presented in the namelist format, but I really think it's not your case.
Unformatted:
This may not apply to your case because you are reading from a character variable, and an internal READ statement can only be formatted.
Otherwise, you could split your string by means of a function instead of an I/O operation. There is no intrinsic for this, but you could come up with one without much trouble (see this thread for reference). As you may have noted already, manipulating strings in Fortran is... awkward, at least. There are some libraries out there (like this) that may be useful if you are doing lots of string stuff in Fortran.
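The hand-written splitting approach suggested above can be sketched as follows. This is Python rather than Fortran, purely to show the scanning logic; split_words is a hypothetical helper, and a Fortran version would perform the same walk using intrinsics such as index or scan.

```python
def split_words(text, separators=" "):
    """Split text on any separator character, skipping empty pieces."""
    words = []
    start = None  # index where the current word began, or None between words
    for pos, ch in enumerate(text):
        if ch in separators:
            if start is not None:
                words.append(text[start:pos])
                start = None
        elif start is None:
            start = pos
    if start is not None:
        # The last word runs to the end of the string.
        words.append(text[start:])
    return words


print(split_words("3*a b c"))  # ['3*a', 'b', 'c']
```

Because no I/O rules are involved, the asterisk has no special meaning here and "3*a" survives as a single word.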
Closed 10 years ago.
There are some things that I never see in any programming language, and I would like to know why. I believe these things could be useful. Well, maybe the explanation will be obvious once you point it out. But here goes.
Why isn't 10² valid syntax?
Sometimes we want to express a value using such notation (just as on paper) instead of writing a precomputed value (which can be a big number that is hard to read at first glance; I believe that is the purpose of _ in numeric literals in the D and Java programming languages) or calling math functions. Of course, I mean that the compiler should replace the expression with the computed value rather than leave the computation to run time.
The - in an identifier. Why is - not acceptable the way _ is? (Only Lisp dialects allow it.) To me, int name-size = 14; is not unreadable. Or is this "limitation" due to the computer's character set?
I will be very happy if someone answers my questions. Also, if you have another point to raise, just edit my question and leave a note about the edit, or post it as a comment.
Okay, so the two specific questions you've given:
10² - how would you expect to type this? Programming languages tend to stick to ASCII for all but identifiers. Note that you can use double x = 1e2; in Java and C#... but the e form is only valid for floating point literals, not integers.
As noted in comments, exponentiation is supported in some languages - but I suspect it just wasn't deemed sufficiently useful to be worth the extra complexity in most.
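For comparison, here is a short sketch in Python, one of the languages that does have an exponentiation operator. It also shows the e form for floating-point literals and underscores as digit separators (available in Python 3.6 and later). The variable names are only for illustration.

```python
# Exponentiation operator: the result of int ** int (non-negative
# exponent) is still an integer.
hundred = 10 ** 2
print(hundred, type(hundred).__name__)  # 100 int

# The "e" form always produces a floating-point value.
as_float = 1e2
print(as_float, type(as_float).__name__)  # 100.0 float

# Underscores make large literals easier to read without changing the value.
big = 1_000_000
print(big)  # 1000000
```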
An identifier with a - in leads to obvious ambiguity in languages with infix operators:
int x = 10;
int y = 4;
int x-y = 3;
int z = x-y;
Is z equal to 3 (the value of the x-y variable) or is it equal to 6 (the value of subtracting y from x)? Obviously you could come up with rules about what would happen, but by removing - from the list of valid characters in an identifier, this ambiguity is removed. Using _ or just casing (nameSize) is simpler than providing extra rules in the language. Where would you stop, anyway? What about . as part of an identifier, or +?
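The ambiguity can be observed directly in a language with infix operators. The sketch below uses Python's standard ast module to show that x-y is parsed as a subtraction, and str.isidentifier to show that a hyphenated name is not accepted as an identifier.

```python
import ast

# An underscore is allowed in an identifier; a hyphen is not.
print("name_size".isidentifier())  # True
print("name-size".isidentifier())  # False

# The parser resolves "x-y" as the binary operation x minus y,
# never as a single name.
tree = ast.parse("x-y", mode="eval")
print(type(tree.body).__name__, type(tree.body.op).__name__)  # BinOp Sub
```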
In general, you should be aware that languages can easily suffer from too many features. The C# team in particular have been quite open about how high the bar is for a new feature to make it into the language. Every new feature must be designed, specified, implemented, tested, documented, and then developers have to learn about it if they're going to understand code using it. This is not cheap, so good language designers are naturally conservative.
Can it be done?
2.⁷
1.617 * 10.ⁿ(13)
Apparently yes. You can modify languages such as Ruby (define UTF-8 named functions and monkey-patch numeric classes) or create user-defined literals in C++ to achieve additional expressiveness.
Should it be done?
How would you type those characters?
Which Unicode character would you use for, say, Euler's constant? U+2107?
I'd say we stick to code we can type and agree on.
Closed 10 years ago.
Why does Forth use IF statement THEN ... instead of ENDIF?
I'm implementing a (non-conforming) Forth compiler thing. Basically, Forth's syntax appears very counter-intuitive to me regarding IF statements.
IF ." Statement is true"
ELSE ." Statement is not true"
THEN ." Printed no matter what"
Why is the ending statement a THEN? This makes the language read extremely weird to me. For my compiler, I'm considering changing it to something like ENDIF, which reads more naturally. But what was the rationale behind having backwards IF-THEN statements in the first place?
Just think of it as, "IF that's the case, do this, ELSE do that ... and THEN continue with ..."
Or better yet, use quotations (as in Factor, RetroForth, ...) in which case it's completely postfix without special compile-time words; just regular words taking addresses from the stack: [ do this ] [ do that ] if or [ do this ] when or [ do that ] unless. I personally much prefer this.
Aside RE: quotations
Here is how quotations are compiled in RetroForth. In my own Forth (which compiles to my own VM), I simply added a QUOTE instruction that pushes the next address to the stack and jumps over n-bytes. The n-bytes are expected to be terminated by a RETURN instruction and the if, when, unless words consume a predicate along with the address(es) left by preceding quotations; calling as appropriate. Very simple indeed, and quotations generally open the door for all kinds of beautiful abstractions away from thinking about the stack.
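As a rough model of the quotation-based words described above, here is a sketch in Python. It is not how RetroForth or any real Forth implements them: quotations are modeled as Python callables pushed on a list that stands in for the data stack, and the names if_, when and unless are illustrative stand-ins for the Forth words.

```python
stack = []   # stands in for the data stack
output = []  # collects what the quotations "print"


def if_():
    # Stack effect: ( flag true-quote false-quote -- )
    false_quote = stack.pop()
    true_quote = stack.pop()
    flag = stack.pop()
    (true_quote if flag else false_quote)()


def when():
    # Stack effect: ( flag quote -- ); runs the quote if flag is true.
    quote = stack.pop()
    if stack.pop():
        quote()


def unless():
    # Stack effect: ( flag quote -- ); runs the quote if flag is false.
    quote = stack.pop()
    if not stack.pop():
        quote()


# Equivalent in spirit to:  flag [ do this ] [ do that ] if
stack.append(True)
stack.append(lambda: output.append("do this"))
stack.append(lambda: output.append("do that"))
if_()
print(output)  # ['do this']
```

Everything is postfix here: the words are ordinary functions that consume a flag and one or two quotations, with no special compile-time handling.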
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I'm thinking of mapping the single quotation mark to the double quotation mark, i.e. ' -> ", in my vimrc.
Besides declaring chars in C, where is ' used in programming?
Should I map it?
I'll reverse map them to access both.
Single quotes are used all over the place in programming.
In the Bourne shell (and derivatives, and csh and derivatives, and Perl and other languages) it is used to inhibit string expansion, so you can do this:
$ echo '$VARIABLE'
$VARIABLE
In C, the single quote is used to denote a character constant, rather than a string. So you can do this:
char c = 'c';
But this is an error:
char c = "c";
And of course if you are programming in a language called "English", the single quote is used to denote important things like possessives ("snihalani's question seemed sort of odd") as well as contractions ("I can't believe anyone would want to do this.").
These are just a few examples. There are, of course, more.
I use single quotes almost exclusively. They're useful when you're using double quotes inside of strings:
print 'Foo said, "Bar"'
It's easier than escaping them:
print "Foo said, \"Bar\""
Also, you won't be able to type normal sentences with possessives either:
# Attaches foo's signal to a slot
self.foo.bar.connect(self.baz)
PHP, for instance, doesn't perform variable substitution when strings are quoted with single quotes:
$var = 1;
echo('I will literally print $var');
In many languages (e.g., C, C++, Ada), ' delimits character literals and " delimits string literals.
In others (e.g., Perl, Bourne shell), either ' or " can be used for string literals, but with different semantics; " is handy when the string contains ' characters, and vice versa, and " causes references to variables to be expanded to the value of the variable, while ' prevents this.
Ada uses ' to delimit the name of an attribute.
And in all languages, you'll need ' in comments and string literals -- for example if you want to write "you'll need ' in comments and string literals".
They're distinct characters. Removing your ability to type one of them is Not A Good Idea.
You might consider mapping ' to " and vice versa, if that makes typing easier for you. But once you get into the habit of using your mappings, it could be awkward to use somebody else's setup, or to type text into something other than vim.
Closed 13 years ago.
Every programming language I know (Perl, Javascript, PHP, Python, ASP, ActionScript, Commodore Basic) uses single and double quotes to delimit strings.
This creates the ongoing situation of having to go to great lengths to treat quotes correctly, since the quote is extremely common in the contents of strings.
Why do programming languages not use some other character to delimit strings, one that is not used in normal conversation (\, | or { } for example), so we can just get on with our lives?
Is this true, or am I overlooking something? Is there an easy way to stop using quotes for strings in a modern programming language?
print <<<END
I know about here document syntax, but for minor string manipulation it's overly complicated and it complicates formatting.
END;
[UPDATE] Many of you made a good point about the importance of using only ASCII characters. I have updated the examples to reflect that (the backslash, the pipe and braces).
Perl lets you use whatever characters you like
"foo $bar" eq
qq(foo $bar) eq
qq[foo $bar] eq
qq!foo $bar! eq
qq#foo $bar# etc
Meanwhile
'foo $bar' eq
q(foo $bar) eq
q[foo $bar] eq
q!foo $bar! eq
q#foo $bar# etc
The syntax extends to other features, including regular expressions, which is handy if you are dealing with URIs.
"http://www.example.com/foo/bar/baz/" =~ /\/foo\/[^\/]+\/baz\//;
"http://www.example.com/foo/bar/baz/" =~ m!/foo/[^/]+/baz/!;
Current: "Typewriter" 'quotation' marks
There are many good reasons for using the quotation marks we are currently using:
Quotes are easily found on keyboards - so they are easy to type, and they have to be easy, because strings are needed so often.
Quotes are in ASCII - most programming tools handle only ASCII well. You can use ASCII in almost any environment imaginable. And that's important when you are fixing your program over a telnet connection on some far-far-away server.
Quotes come in many versions - single quotes, double quotes, back quotes. So a language can assign different meanings for differently quoted strings. These different quotes can also solve the 'quotes "inside" quotes' problem.
Quotes are natural - English used quotes for marking up text passages long before programming languages followed. In linguistics, quotes are used in quite the same way as in programming languages. Quotes are natural the same way + and - are natural for addition and subtraction.
Alternative: “typographically” ‘correct’ quotes
Technically they are superior. One great advantage is that you can easily differentiate between opening and closing quotes. But they are hard to type and they are not in ASCII. (I had to put them into a headline to make them visible in this Stack Overflow font at all.)
Hopefully one day, when ASCII is something that only historians care about and keyboards have changed into something totally different (if we are even going to have keyboards at all), there will come a programming language that uses better quotes...
Python does have an alternative string delimiter with the triple-double quote """Some String""".
Single quotes and double quotes are used in the majority of languages since they are the standard delimiters in most written languages.
Languages (should) try to be as simple to understand as possible, and using something different from quotes to deal with strings introduces unnecessary complexity.
Python has an additional string type, using triple double-quotes,
"""like this"""
In addition to this, Perl allows you to use any delimiter you want,
q^ like this ^
I think for the most part, the regular string delimiters are used because they make sense. A string is wrapped in quotes. In addition, most developers are so used to relying on their common sense when it comes to strings that drastically altering the way strings are presented could mean a difficult learning curve.
Using quotation marks to define a set of characters as separate from the enclosing text is more natural to us, and thus easier to read. Also, " and ' are on the keyboard, while those other characters you mentioned are not, so it's easier to type. It may be possible to use a character that is widely available on keyboards, but I can't think of one that won't have the same kind of problem.
E: I missed the pipe character, which may actually be a viable alternative. Except that it's currently widely used as the OR operator, and the readability issue still stands.
Ah, so you want old-fashioned FORTRAN, where you'd quote by counting the number of characters in the string and embedding it in an H format, such as: 13HHello, World!. As somebody who did a few things with FORTRAN back in the days when the language name was all caps, quotation marks and escaping them are a Good Thing. (For example, you aren't totally screwed if you are off by one in your manual character count.)
Seriously, there is no ideal solution. It will always be necessary, at some point, to have a string containing whatever quote character you like. For practical purposes, the quote delimiters need to be on the keyboard and easily accessible, since they're heavily used. Perl's q#...# syntax will fail if a string contains an example of each possible character. FORTRAN's Hollerith constants are even worse.
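To show why the count-prefixed Hollerith form mentioned above is fragile, here is a small sketch in Python. The helpers to_hollerith and from_hollerith are hypothetical and only model the nHtext idea; they are not part of any Fortran tooling.

```python
import re


def to_hollerith(text):
    """Encode text as a count-prefixed constant such as 13HHello, World!"""
    return f"{len(text)}H{text}"


def from_hollerith(constant):
    """Decode a count-prefixed constant by taking exactly `count` characters."""
    match = re.match(r"(\d+)H", constant)
    if match is None:
        raise ValueError("not a Hollerith constant")
    count = int(match.group(1))
    start = match.end()
    return constant[start:start + count]


print(to_hollerith("Hello, World!"))       # 13HHello, World!
print(from_hollerith("13HHello, World!"))  # Hello, World!
# A miscounted prefix silently drops the last character.
print(from_hollerith("12HHello, World!"))  # Hello, World
```

No character needs escaping in this scheme, but every edit to the text requires recounting, which is the trade-off the answer describes.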
Because those other characters you listed aren't ASCII. I'm not sure that we are ready for, or need, a programming language in Unicode...
EDIT: As to why not use {}, | or \, well those symbols all already have meanings in most languages. Imagine C or Perl with two different meanings for '{' and '}'!
| means or, and in some languages it already concatenates strings. And how would you get \n if \ was the delimiter?
Fundamentally, I really don't see why this is a problem. Is \" really THAT hard? I mean, in C, you often have to use %%, and \\ and several other two-character sequences so... Meh.
Because no one has created a language using some other character that has gotten popular.
I think that is largely because the demand for changing the character is just not there; most programmers are used to the standard quotes and see no compelling reason to change the status quo.
Compare the following.
print "This is a simple string."
print "This \"is not\" a simple string."
print ¤This is a simple string.¤
print ¤This "is not" a simple string.¤
I for one don't really feel like the second is any easier or more readable.
You say "having to go to great lengths to treat quotes correctly"; but it's only in the text representation. All modern languages treat strings as binary blocks, so they really don't care about the content. Remember that the text representation is only a simple way for the programmer to tell the system what to do. Once the string is interned, it doesn't have any trouble managing the quotes.
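The point that escaping exists only in the source text can be checked with a short sketch in Python. The variable names are only for illustration.

```python
# Two different spellings in source code...
escaped = "This \"is not\" a simple string."
alternative = 'This "is not" a simple string.'

# ...produce exactly the same string value.
print(escaped == alternative)  # True

# The backslashes were consumed by the parser; none are stored.
print("\\" in escaped)  # False

# A quote character inside a string is a single character like any other.
print(len("\""))  # 1
```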
One good reason would probably be that if this is the only thing you want to improve on an existing language, you're not really creating a new language.
And if you're creating a new language, picking the right character for the string quotes is probably way way WAY down on the todo list of things to actually implement.
You would probably be best off picking a delimiter that exists on all common keyboards and terminal representation sets, so most of the ones you suggest are right out...
And in any case, a quoting mechanism will still be necessary... you gain a reduction in the number of times you use quoting at the cost of making the language harder for non-specialists to read.
So it is not entirely clear that this is a win, and then there is force of habit.
Ada doesn't use single quotes for strings. Those are only for chars, and don't have to be escaped inside strings.
I find it very rare that the double-quote character comes up in a normal text string that I enter into a computer program. When it does, it is almost always because I am passing that string to a command interpreter, and need to embed another string in it.
I would imagine the main reason none of those other characters are used for string delimiters is that they aren't in the original 7-bit ASCII code table. Perhaps that's not a good excuse these days, but in a world where most language designers are afraid to buck the insanely crappy C syntax, you aren't going to get a lot of takers for an unusual string delimiter choice.
Python allows you to mix single and double quotes to put quotation marks in strings.
print "Please welcome Mr Jim 'Beaner' Wilson."
>>> Please welcome Mr Jim 'Beaner' Wilson.
print 'Please welcome Mr Jim "Beaner" Wilson.'
>>> Please welcome Mr Jim "Beaner" Wilson.
You can also use the previously mentioned triple quotes. These can also extend across multiple lines, so you don't have to embed newline escapes.
print """Please welcome Mr Jim "Beaner" Wilson."""
>>> Please welcome Mr Jim "Beaner" Wilson.
Finally, you can print strings the same way as everyone else.
print "Please welcome Mr Jim \"Beaner\" Wilson."
>>> Please welcome Mr Jim "Beaner" Wilson.
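The multi-line behaviour of triple quotes mentioned above can be sketched like this. Note that the sketch uses Python 3 syntax, where print is a function, unlike the print statements in this answer; the name banner is only for illustration.

```python
# A triple-quoted string may span several source lines; the line break
# becomes a newline character in the value, and embedded double quotes
# need no escaping.
banner = """Please welcome
Mr Jim "Beaner" Wilson."""

print(banner.splitlines())  # ['Please welcome', 'Mr Jim "Beaner" Wilson.']
print(banner.count("\n"))   # 1
```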