Why don't popular programming languages use some other character to delimit strings? [closed] - string

Every programming language I know (Perl, Javascript, PHP, Python, ASP, ActionScript, Commodore Basic) uses single and double quotes to delimit strings.
This creates the ongoing situation of having to go to great lengths to treat quotes correctly, since the quote is extremely common in the contents of strings.
Why do programming languages not use some other character to delimit strings, one that is not used in normal conversation (\, | or { }, for example) so we can just get on with our lives?
Is this true, or am I overlooking something? Is there an easy way to stop using quotes for strings in a modern programming language?
print <<<END
I know about here document syntax, but for minor string manipulation it's overly complicated and it complicates formatting.
END;
[UPDATE] Many of you made a good point about the importance of using only ASCII characters. I have updated the examples to reflect that (the backslash, the pipe and braces).

Perl lets you use whatever characters you like
"foo $bar" eq
qq(foo $bar) eq
qq[foo $bar] eq
qq!foo $bar! eq
qq#foo $bar# etc
Meanwhile
'foo $bar' eq
q(foo $bar) eq
q[foo $bar] eq
q!foo $bar! eq
q#foo $bar# etc
The syntax extends to other features, including regular expressions, which is handy if you are dealing with URIs.
"http://www.example.com/foo/bar/baz/" =~ /\/foo/[^\/]+\/baz\//;
"http://www.example.com/foo/bar/baz/" =~ m!/foo/[^/]+/baz/!;

Current: "Typewriter" 'quotation' marks
There are many good reasons for using the quotation marks we are currently using:
Quotes are easily found on keyboards - so they are easy to type, and they have to be easy, because strings are needed so often.
Quotes are in ASCII - most programming tools handle only ASCII well. You can use ASCII in almost any environment imaginable, and that's important when you are fixing your program over a telnet connection to some far-far-away server.
Quotes come in many versions - single quotes, double quotes, back quotes. So a language can assign different meanings to differently quoted strings, and these different quotes can also solve the 'quotes "inside" quotes' problem (see the sketch after this list).
Quotes are natural - English used quotes for marking up text passages long before programming languages followed. In linguistics quotes are used in much the same way as in programming languages. Quotes are natural the same way + and - are natural for addition and subtraction.
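For example, here is the 'quotes inside quotes' trick in action (Python shown, but most of the languages listed in the question behave the same way): pick the other quote style and nothing needs escaping.
print('He said "hello" to me')   # double quotes inside a single-quoted string
print("It's easier this way")    # an apostrophe inside a double-quoted string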
Alternative: “typographically” ‘correct’ quotes
Technically they are superior. One great advantage is that you can easily differentiate between opening and closing quotes. But they are hard to type and they are not in ASCII. (I had to put them into a headline to make them visible in this StackOverflow font at all.)
Hopefully one day, when ASCII is something that only historians care about and keyboards have changed into something totally different (if we are even going to have keyboards at all), there will be a programming language that uses better quotes...

Python does have an alternative string delimiter with the triple-double quote """Some String""".
Single quotes and double quotes are used in the majority of languages since those are the standard delimiters in most written languages.

Languages (should) try to be as simple to understand as possible, and using something different from quotes to deal with strings introduces unnecessary complexity.

Python has an additional string type, using triple double-quotes,
"""like this"""
In addition to this, Perl allows you to use any delimiter you want,
q^ like this ^
I think for the most part, the regular string delimiters are used because they make sense. A string is wrapped in quotes. In addition, most developers are so used to applying their common sense to strings that drastically altering the way strings are written would mean a difficult learning curve.

Using quotation marks to define a set of characters as separate from the enclosing text is more natural to us, and thus easier to read. Also, " and ' are on the keyboard, while those other characters you mentioned are not, so it's easier to type. It may be possible to use a character that is widely available on keyboards, but I can't think of one that won't have the same kind of problem.
Edit: I missed the pipe character, which may actually be a viable alternative, except that it's currently widely used as the OR operator, and the readability issue still stands.

Ah, so you want old-fashioned FORTRAN, where you'd quote by counting the number of characters in the string and embedding it in an H format, such as: 13HHello, World!. As somebody who did a few things with FORTRAN back in the days when the language name was all caps, quotation marks and escaping them are a Good Thing. (For example, you aren't totally screwed if you are off by one in your manual character count.)
Seriously, there is no ideal solution. It will always be necessary, at some point, to have a string containing whatever quote character you like. For practical purposes, the quote delimiters need to be on the keyboard and easily accessible, since they're heavily used. Perl's q#...# syntax will fail if a string contains an example of each possible character. FORTRAN's Hollerith constants are even worse.

Because those other characters you listed aren't ASCII. I'm not sure that we are ready for, or need, a programming language in Unicode...
EDIT: As to why not use {}, | or \: those symbols all already have meanings in most languages. Imagine C or Perl with two different meanings for '{' and '}'!
| means OR, and in some languages it already concatenates strings. And how would you get \n if \ were the delimiter?
Fundamentally, I really don't see why this is a problem. Is \" really THAT hard? I mean, in C you often have to use \\ and \n and several other two-character escapes, so... Meh.

Because no one has created a language using some other character that has gotten popular.
I think that is largely because the demand for changing the character is just not there, most programmers are used to the standard quote and see no compelling reason to change the status quo.
Compare the following.
print "This is a simple string."
print "This \"is not\" a simple string."
print ¤This is a simple string.¤
print ¤This "is not" a simple string.¤
I for one don't really feel that the second pair is any easier or more readable.

You say "having to go to great lengths to treat quotes correctly"; but it's only in the text representation. All modern languages treat strings as binary blocks, so they really don't care about the content. Remember that the text representation is only a simple way for the programmer to tell the system what to do. Once the string is interned, it doesn't have any trouble managing the quotes.

One good reason would probably be that if this is the only thing you want to improve on an existing language, you're not really creating a new language.
And if you're creating a new language, picking the right character for the string quotes is probably way way WAY down on the todo list of things to actually implement.

You would probably be best off picking a delimiter that exists on all common keyboards and terminal representation sets, so most of the ones you suggest are right out...
And in any case, a quoting mechanism will still be necessary... you gain a reduction in the number of times you use quoting at the cost of making the language harder for non-specialists to read.
So it is not entirely clear that this is a win, and then there is force of habit.

Ada doesn't use single quotes for strings. Those are only for chars, and don't have to be escaped inside strings.
I find it very rare that the double-quote character comes up in a normal text string that I enter into a computer program. When it does, it is almost always because I am passing that string to a command interpreter, and need to embed another string in it.
I would imagine the main reason none of those other characters are used for string delimiters is that they aren't in the original 7-bit ASCII code table. Perhaps that's not a good excuse these days, but in a world where most language designers are afraid to buck the insanely crappy C syntax, you aren't going to get a lot of takers for an unusual string delimiter choice.

Python allows you to mix single and double quotes to put quotation marks in strings.
print "Please welcome Mr Jim 'Beaner' Wilson."
>>> Please welcome Mr Jim 'Beaner' Wilson.
print 'Please welcome Mr Jim "Beaner" Wilson.'
>>> Please welcome Mr Jim "Beaner" Wilson.
You can also use the previously mentioned triple quotes. These also extend across multiple lines, which also lets you avoid having to print newlines explicitly.
print """Please welcome Mr Jim "Beaner" Wilson."""
>>> Please welcome Mr Jim "Beaner" Wilson.
Finally, you can print strings the same way as everyone else.
print "Please welcome Mr Jim \"Beaner\" Wilson."
>>> Please welcome Mr Jim "Beaner" Wilson.

Related

Shell Input Independent of Spaces Implementation

I recently had a programming class where we implemented a shell in Java. One of the requirements was to ensure that parameters could be read from terminal regardless of spaces and tabs etc between them unless within quotes where everything within quotes is taken as is.
To solve this I wrote a regex and used streams to get the results in an array for further processing.
But now, while preparing for a systems programming course, I wondered: is there a simpler way to do this? How is it implemented in a typical shell like bash?
Is it just reading the stream character by character and skipping when it runs into quotes until it finds a matching one?
For complex grammars (and bash tokens can be complex), it's better to use parser/generator tools instead of implementing the logic from scratch. With regular expressions it's possible to cover some grammars, but they are unlikely to cover a complex set of rules.
Depending on the constraints (programming language, etc.), consider two options:
Using flex/bison for token parsing and grammar parsing, OR
Using a scripting engine (Python, Perl, JavaScript), which has both RE and strong string processing capabilities.
For bash (and probably any other modern shell), it is much more complex than that. See this function in bash's source code that is used to parse a matching pair of characters (quotes, curly braces etc.). It is pretty complex, as there are many different types of quotes, parentheses and braces (', {, (, ", ', ...) and a lot of edge cases are involved. For example, in this case, skipping characters until you see another quote wouldn't work because things can be nested:
echo "`echo "hello"`"
I don't know about the requirements of the shell program you implemented in your class, but if it didn't include such nested constructs, then I believe a simple approach like you mentioned could be used.
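For the simple, non-nested case, the character-by-character idea looks roughly like this minimal Python sketch (not bash's actual algorithm; it assumes no escape sequences or nested substitutions -- Python's shlex module is a fuller version of the same idea):
def split_words(line):
    words, current, quote = [], [], None
    for ch in line:
        if quote:                        # inside quotes: copy characters verbatim
            if ch == quote:
                quote = None             # the matching quote closes the region
            else:
                current.append(ch)
        elif ch in ("'", '"'):
            quote = ch                   # an opening quote starts a verbatim region
        elif ch.isspace():
            if current:                  # whitespace ends the current word
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words

print(split_words('echo   "hello   world"  arg'))   # ['echo', 'hello   world', 'arg']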

Is there a perfect function in the standard library or some recommended methodology to expand escape sequences in Rust?

I am taking user input in an application I am writing and would like to expand escape sequences that the user enters.
For example, if the user enters \n it will be interpreted as the str \\n. I want to, in a general way, interpret that string (into a newline) and similar ones.
I could of course use String::replace() on the most essential ones and live without the rest, but I would prefer a general solution that also handles hex escapes (\x61 is a).
Escapes are usually handled by the lexer/parser (basically they're part of the language grammar); I don't think there's a stdlib function which would manage them, as that would be done at a much lower level.
Furthermore, escapes tend to be highly language-specific, in possibly unexpected ways. Rust has a singularly small list of escape sequences, which is probably desirable compared to the garbage available in C, but I still don't know that you'd want to allow e.g. arbitrary hex or unicode escape sequences.
I would therefore recommend setting up your own explicitly supported list of escapes, though if you really do not want to there are probably third-party packages which can help you.
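For what it's worth, the hand-rolled approach with an explicit list of supported escapes looks roughly like this (sketched in Python for brevity; the same match-on-the-next-character loop translates directly to Rust, and the set of escapes handled here is just an illustrative assumption):
def expand_escapes(s):
    out, i = [], 0
    while i < len(s):
        ch = s[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        esc = s[i + 1]                   # assumes no lone trailing backslash
        if esc == "n":
            out.append("\n")
        elif esc == "t":
            out.append("\t")
        elif esc == "\\":
            out.append("\\")
        elif esc == "x":                 # \x61 -> 'a'
            out.append(chr(int(s[i + 2:i + 4], 16)))
            i += 2
        else:
            out.append(esc)              # unknown escapes pass through unchanged
        i += 2
    return "".join(out)

print(expand_escapes(r"a\x61\n"))        # prints "aa" followed by a newline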

Case-insensitive string comparison in Julia

I'm sure this has a simple answer, but how does one compare two string and ignore case in Julia? I've hacked together a rather inelegant solution:
function case_insensitive_match{S<:AbstractString}(a::S,b::S)
lowercase(a) == lowercase(b)
end
There must be a better way!
Efficiency Issues
The method that you have selected will indeed work well in most settings. If you are looking for something more efficient, you're not apt to find it. The reason is that capital vs. lowercase letters are stored with different bit encoding. Thus it isn't as if there is just some capitalization field of a character object that you can ignore when comparing characters in strings. Fortunately, the difference in bits between capital vs. lowercase is very small, and thus the conversions are simple and efficient. See this SO post for background on this:
How do uppercase and lowercase letters differ by only one bit?
Accuracy Issues
In most settings, the method that you have will work accurately. But if you encounter characters such as capital vs. lowercase Greek letters, it could fail. For that, you would be better off with the normalize function (see docs for details) with the casefold option:
normalize("ad", casefold=true)
See this SO post in the context of Python which addresses the pertinent issues here and thus need not be repeated:
How do I do a case-insensitive string comparison?
Since it's talking about the underlying issues with UTF encoding, it is applicable to Julia as well as Python.
See also this Julia Github discussion for additional background and specific examples of places where lowercase() can fail:
https://github.com/JuliaLang/julia/issues/7848
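As a concrete example of where plain lowercasing falls short (shown in Python, which exposes the same Unicode case-folding distinction; Julia's casefold option addresses the same cases):
print("STRASSE".lower() == "straße")                  # False -- lower() leaves ß alone
print("STRASSE".casefold() == "straße".casefold())    # True  -- both sides fold to "strasse"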

Why do programming languages use commas to separate function parameters?

It seems like all programming languages use commas (,) to separate function parameters.
Why don't they use just spaces instead?
Absolutely not. What about this function call:
function(a, b - c);
How would that look with a space instead of the comma?
function(a b - c);
Does that mean function(a, b - c); or function(a, b, -c);? The use of the comma presumably comes from mathematics, where commas have been used to separate function parameters for centuries.
First of all, your premise is false. There are languages that use space as a separator (Lisp, ML, Haskell, possibly others).
The reason that most languages don't is probably that a) f(x,y) is the notation most people are used to from mathematics and b) using spaces leads to lots of nested parentheses (also called "the lisp effect").
Lisp-like languages use: (f arg1 arg2 arg3) which is essentially what you're asking for.
ML-like languages use concatenation to apply curried arguments, so you would write f arg1 arg2 arg3.
Tcl uses space as a separator between words passed to commands. Where it has a composite argument, that has to be bracketed or otherwise quoted. Mind you, even there you will find the use of commas as separators – in expression syntax only – but that's because the notation is in common use outside of programming. Mathematics has written n-ary function applications that way for a very long time; computing (notably Fortran) just borrowed.
You don't have to look further than most of our natural languages to see that the comma is used for separating items in lists. So, using anything other than a comma for enumerating parameters would be unexpected for anyone learning a programming language for the first time.
There are a number of historical reasons already pointed out.
Also, in most languages where , serves as the separator, whitespace sequences are largely ignored, or to be more exact, although they may separate tokens, they do not act as tokens themselves. This is more or less true for all languages deriving their syntax from C. A sequence of whitespace is much like the empty word, and having the empty word delimit anything is probably not the best of ideas.
Also, I think it is clearer and easier to read. Why have whitespace, an invisible character that essentially serves nothing but formatting, as a really meaningful delimiter? It only introduces ambiguity. One example is the one provided by Carl.
A second would be f(a (b + c)). Now is that f(a(b+c)) or f(a, b+c)?
The creators of JavaScript had a very useful idea, similar to yours, which yields just the same problems. The idea was, that ENTER could also serve as ;, if the statement was complete. Observe:
function a() {
return "some really long string or expression or whatsoever";
}
function b() {
return
"some really long string or expression or whatsoever";
}
alert(a());//"some really long string or expression or whatsoever"
alert(b());//"undefined" or "null" or whatever, because 'return;' is a valid statement
As a matter of fact, I sometimes tend to use the latter notation in languages that do not have this 'feature'. JavaScript forces a way of formatting my code on me, because someone had the cool idea of using ENTER instead of ;.
I think there are a number of good reasons why some languages are the way they are. Especially in dynamic languages (such as PHP), where there's no compile-time check, the compiler can't warn you that the way it resolved an ambiguity like the one above doesn't match the signature of the call you want to make. You'd have a lot of weird runtime errors and a really hard life.
There are languages which allow this, but there are a number of reasons why they do so. First and foremost, because a bunch of very clever people sat down, spent quite some time designing a language, and then discovered that its syntax makes the , obsolete most of the time, and thus took the decision to eliminate it.
This may sound a bit glib, but I gather it's for the same reason most Earth languages use it (English, French, and those few others ;-). Also, it is intuitive to most.
Haskell doesn't use commas.
Example
multList :: [Int] -> Int -> [Int]
multList (x : xs) y = (x * y) : (multList xs y)
multList [] _ = []
The reason for using commas in C/C++ is that reading a long argument list can be difficult without a separator.
Try reading this
void foo(void * ptr point & * big list<pointers<point> > * t)
Commas are useful like spaces are. In Latin nothing was written with spaces, periods, or lower-case letters.
Try reading this
IAMTHEVERYMODELOFAWHATDOYOUWANTNOTHATSMYBUCKET
It's primarily to help you read things.
This is not true; some languages don't use commas. Functions were maths concepts before they were programming constructs, so some languages keep the old notation. Most of the newer ones have been inspired by C (JavaScript, Java, C#, and PHP too; they share some formal rules, like the comma).
While some languages do use spaces, using a comma avoids ambiguous situations without the need for parentheses. A more interesting question might be why C uses the same character as a separator as is used for the "a then b" operator; the latter question is in some ways more interesting given that the C character set has at least three other characters that do not appear in any context (the dollar sign, commercial-at, and grave), and I know at least one of those (the dollar sign) dates back to the 40-character punch-card set.
It seems like all programming languages use commas (,) to separate function parameters.
In natural languages that include the comma in their script, that character is used to separate things. For instance, if you were to enumerate fruits, you'd write: "lemon, orange, strawberry, grape". That is, using the comma.
Hence, using the comma to separate parameters in a function is more natural than using some other character (| for instance).
Consider:
someFunction( name, age, location )
vs.
someFunction( name|age|location )
Why don't they use just spaces instead?
That's possible. Lisp does it.
The main reason is that space is already used to separate tokens, and it's easier not to assign it an extra function.
I have programmed in quite a few languages, and while the comma does not rule supreme it is certainly in front. The comma is good because it is a visible character, so script can be compressed by removing spaces without breaking things. If you allow spaces then you can have tabs, and that can be a pain in the ... There are issues with newlines and spaces at the end of a line. Give me a comma any day; you can see it and you know what it does. Spaces are for readability (generally) and commas are part of syntax. Mind you, there are plenty of exceptions where a space is required or de rigueur. I also like curly brackets.
It is probably tradition. If they used space, they could not pass an expression as a parameter, e.g.
f(a-b c)
would be very different from
f(a -b c)
Some languages, like Boo, allow you to specify the type of parameters or leave it out, like so:
def MyFunction(obj1, obj2, title as String, count as Int):
...do stuff...
Meaning: obj1 and obj2 can be of any type (inherited from object), whereas title and count must be of type String and Int respectively. This would be hard to do using spaces as separators.

Is there any advantage of being a case-sensitive programming language? [duplicate]

This question already has answers here:
Closed 14 years ago.
I personally do not like programming languages being case sensitive.
(I know that the disadvantages of case sensitivity are nowadays compensated for by good IDEs)
Still I would like to know whether there are any advantages for a programming language if it is case sensitive. Is there any reason why designers of many popular languages chose to make them case sensitive?
EDIT: duplicate of Why are many languages case sensitive?
EDIT: (I cannot believe I asked this question a few years ago)
This is a preference. I prefer case sensitivity; I find it easier to read code this way. For instance, the variable name "myVariable" has a different word shape than "MyVariable", "MYVARIABLE", and "myvariable". This makes it more straightforward to tell such identifiers apart at a glance. Of course, you should not, or only very rarely, create identifiers that differ only in case. This is more about consistency than the obvious "benefit" of increasing the number of possible identifiers. Some people think this is a disadvantage. I can't think of any time in which case sensitivity gave me any problems. But again, this is a preference.
Case-sensitivity is inherently faster to parse (albeit only slightly) since it can compare character sequences directly without having to figure out which characters are equivalent to each other.
It allows the implementer of a class/library to control how casing is used in the code. Case may also be used to convey meaning.
The code looks more the same. In the days of BASIC these were equivalent:
PRINT MYVAR
Print MyVar
print myvar
With type checking, case sensitivity prevents you from having a misspelled and therefore unrecognized variable. I have fixed bugs in code written in a case-insensitive, implicitly typed language (FORTRAN 77), where the zero (0) and the capital letter O looked the same in the editor. The language silently created a new object, and so the output was flawed. With a case-sensitive, typed language, this would not have happened.
In the compiler or interpreter, a case-insensitive language is going to have to fold everything to upper or lower case to test for matches, or otherwise use a case-insensitive matching tool, but that's only a small amount of extra work for the compiler.
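A rough sketch of that extra folding step (a hypothetical symbol table in Python; real compilers differ, but a case-insensitive lookup has to normalize every name before comparing):
symbols = {}

def define(name, value, case_sensitive=True):
    key = name if case_sensitive else name.lower()    # case-insensitive: fold the name first
    symbols[key] = value

def lookup(name, case_sensitive=True):
    key = name if case_sensitive else name.lower()
    return symbols[key]

define("MyVar", 42, case_sensitive=False)
print(lookup("MYVAR", case_sensitive=False))          # 42 -- found only because both names were folded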
Plus case-sensitive code allows certain patterns of declarations such as
MyClassName myClassName = new MyClassName()
and other situations where case sensitivity is nice.
