Is there any language where names can include space characters? - programming-languages

Is there any programming language that allows names to include whitespace? (By names, I mean variables, methods, fields, etc.)

Scala does allow whitespace characters in identifier names, but you need to surround such identifiers with a pair of backticks.
Example (run in the Scala REPL):
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) Client VM, Java 1.6.0_22).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val `lol! this works! :-D` = 4
lol! this works! :-D: Int = 4
scala> val `omg!!!` = 4
omg!!!: Int = 4
scala> `omg!!!` + `lol! this works! :-D`
res0: Int = 8

In SQL you can have spaces and other non-identifier characters in field names and such. You just have to quote them like [field name] or "field name".

Common Lisp can do it with variables, if you surround the variable name with pipes (|):
CL-USER> (setf |hello world| 42)
42
CL-USER> |hello world|
42
Worth noting is that "piped" variable names are also case sensitive (unescaped names normally are not, because the reader upcases them by default):
CL-USER> |Hello World|
The variable |Hello World| is unbound.
[Condition of type UNBOUND-VARIABLE]
CL-USER> (setf hello-world 99)
99
CL-USER> hello-world
99
CL-USER> HeLlO-WoRlD
99

PHP can: http://blog.riff.org/2008_05_11_spaces_php_variable_names
Perl can too, using symbolic references (which are disallowed under use strict 'refs'):
${'some var'} = 42;
print ${'some var'}, "\n";
${'my method'} = sub {
print "method called\n";
};
&${'my method'};

A more recent, experimental language that compiles to JavaScript is PogoScript: https://github.com/featurist/pogoscript/wiki
wind speed = 25
average temperature = 32
becomes
windSpeed = 25
averageTemperature = 32
behind the scenes. It also has flexible rules about where arguments can be placed within a name, so you can write:
y = compute some value from (z) and return it
md5 hash (read all text from file "sample.txt")
Becomes:
var y;
y = computeSomeValueFromAndReturnIt(z);
md5Hash(readAllTextFromFile("sample.txt"));
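The renaming shown above amounts to joining the words of a name in camel case. Here is a minimal JavaScript sketch of that idea. `toCamelCase` is a hypothetical helper written for illustration, not PogoScript's actual implementation, and it assumes the words are separated by plain whitespace.

```javascript
// Hypothetical helper: join the words of a spaced name into one
// camelCase identifier, leaving the first word exactly as written.
function toCamelCase(name) {
  const words = name.trim().split(/\s+/);
  return words
    .map((word, i) => (i === 0 ? word : word[0].toUpperCase() + word.slice(1)))
    .join("");
}

console.log(toCamelCase("wind speed"));          // windSpeed
console.log(toCamelCase("average temperature")); // averageTemperature
console.log(toCamelCase("md5 hash"));            // md5Hash
```

Because the mapping is one-way, two spaced names that differ only in spacing or capitalisation of later words could collide after conversion. A real compiler would need a rule for that case.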

In Ruby you can have symbols that are named as :"this has a space" but it is enclosed in double-quotes so I'm not sure if you count that.
If other languages allowed whitespace as a valid character in symbol names, then you would have to use some other character to separate them.

The problem with spaces in variable names is that it's subject to interpretation since whitespace normally means "ok, end of the current token, starting another." Exceptions to this rule must have some special indicator such as quotation marks in a string ("This is a test").

Our PARLANSE parallel programming language is one such. In fact, it allows any character in identifiers, although many of them, including spaces, have to be escaped (preceded by ~) to be included in the name. Here's an example:
~'Buffer~ Marker~'
This is used to let PARLANSE easily refer to arbitrary symbols from other languages (in particular, from EBNFs taken from arbitrary reference documents, where we can't control the punctuation used).
We don't use this feature a lot, but when it is needed it means we can stay true to tokens from other documents.

You might be able to find esoteric languages that don't separate expression elements with whitespace on this website: http://99-bottles-of-beer.net
For example... whitespace :D

Some dialects of SQL allow databases, tables, and fields to have spaces in their names.
For example, in SQL Server, you can refer to a table with a space in its name, either by putting the table name in [square brackets] or (depending on connection options) in "double quotes".

There shouldn't be many problems in creating a language that supports whitespace in identifiers, as long as there are enough separating tokens to tell the parser where identifiers end (such as operators, braces, commas and the infamous semicolon). It just doesn't improve the readability of the source code much.
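As a rough illustration of that point, here is a toy tokenizer in JavaScript that treats everything between two separator tokens as a single name. The separator set and the `tokenize` function are assumptions made for this sketch and do not come from any real language.

```javascript
// Assumed separator set for a toy language; a real grammar would differ.
const SEPARATORS = /([=+\-*\/(),;])/;

// Split on separators (kept as tokens, thanks to the capture group) and
// treat everything between two separators as a single name, collapsing
// any run of whitespace inside it to one space.
function tokenize(source) {
  return source
    .split(SEPARATORS)
    .map((piece) => piece.trim().replace(/\s+/g, " "))
    .filter((piece) => piece.length > 0);
}

console.log(tokenize("body = fetch article(current index);").join(" | "));
// body | = | fetch article | ( | current index | ) | ;
```

Note that a declaration such as `int current index = 5` would come out with the single name `int current index`, so keywords and type names would need extra rules. That is exactly where the ambiguity starts.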

Related

Determine when an SQL alias can be an open name

What would be the highest-performing implementation to determine if a string that represents an SQL alias needs to be wrapped in double-quotes?
Presently, in pg-promise I am wrapping every alias in double-quotes, to play it safe. I am looking to make the output SQL neater and shorter, if possible.
I am undecided about which approach is best:
to use a regular expression, somehow
to do a direct algorithm with strings
not to change it at all, if there are reasons for that
Basically, I am looking to improve function as.alias, if possible, not to wrap aliases into double quotes when it is not needed.
What have I tried so far...
I thought at first to do it only for the 99% of all cases - not to add double-quotes when your alias is the most typical one, just a simple word:
function skipQuotes(alias) {
    const m = alias.match(/[A-Z]+|[a-z]+/);
    return m && m[0] === alias;
}
This only checks it is a single word that uses either upper or lower case, but not the combination.
SOLUTION
Following the answer, I ended up with an implementation that should cover 99% of all practical use cases, which is what I was trying to achieve:
const m = alias.match(/[a-z_][a-z0-9_$]*|[A-Z_][A-Z0-9_$]*/);
if (m && m[0] === alias) {
    // double quotes will be skipped
} else {
    // double quotes will be added
}
i.e. the surrounding double quotes are not added when the alias uses a simple syntax:
it is a same-case single word, without spaces
it can contain underscores, and can start with one
it can contain digits and $, but cannot start with those
Removing double quotes is admirable -- it definitely makes queries easier to read. The rules are pretty simple. A "valid" identifier consists of:
Letters (including diacritical marks), numbers, underscore, and dollar sign.
Starts with a letter (including diacriticals) or underscore.
Is not a reserved word.
(I think I have this summarized correctly. The real rules are in the documentation.)
The first two are readily implemented using regular expressions. The last probably wants a reference table for lookup (and the list varies by Postgres release -- although less than you might imagine).
Otherwise, the identifier needs to be surrounded by escape characters. Postgres uses double quotes (which is ANSI standard).
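The three rules can be combined into one check. The JavaScript sketch below is only an illustration: `canSkipQuotes`, `formatAlias` and the tiny `RESERVED` set are hypothetical names, the reserved-word list is a deliberately incomplete sample, and the pattern is ASCII-only, so letters with diacritics are quoted conservatively. It also accepts only lower-case names, on the assumption that an unquoted upper-case alias would be folded to lower case and so would no longer name the same column as its quoted form.

```javascript
// Illustrative subset only; the real reserved-word list is much longer
// and varies between Postgres releases.
const RESERVED = new Set(["select", "from", "where", "table", "user", "order", "group"]);

// True when the alias can be written without double quotes and still
// refer to the same name.
function canSkipQuotes(alias) {
  const simple = /^[a-z_][a-z0-9_$]*$/.test(alias);
  return simple && !RESERVED.has(alias);
}

// Quote only when needed; embedded double quotes are escaped by doubling.
function formatAlias(alias) {
  return canSkipQuotes(alias) ? alias : '"' + alias.replace(/"/g, '""') + '"';
}

console.log(formatAlias("total_count")); // total_count
console.log(formatAlias("Total"));       // "Total"
console.log(formatAlias("user"));        // "user"
```

A `Set` lookup keeps the reserved-word test cheap, so the cost of the whole check is dominated by one anchored regular expression.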
One reason you may want to quote an identifier is that Postgres folds unquoted identifiers to lower case for comparison. So, the following works fine:
select xa, Xa, xA, "xa"
from (select 1 as Xa) y
However, this does not work:
select Xa
from (select 1 as "Xa") y
Nor does:
select "Xa"
from (select 1 as Xa) y
In fact, there is no way to refer to "Xa" without using quotes (at least none that I can readily think of).
Enforcing the discipline of exact matches can be a good thing or a bad thing. I find that one discipline too many: I admit to often ignoring case when writing "casual" code; it is just simpler to type without capitalization (or using double quotes). For more formal code, I try to be consistent.
On the other hand, the rules do allow:
select "Xa", "aX", ax
from (select 1 as "Xa", 2 as "aX", 3 as AX) y
(This returns 1, 2, 3.)
I would be happy if this naming convention were not allowed.
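The folding behaviour in the queries above can be modelled in a few lines of JavaScript. This is a toy model under stated assumptions, not Postgres code: `resolveName` and `sameColumn` are hypothetical helpers, and the only rule they encode is "unquoted names fold to lower case, quoted names are taken verbatim".

```javascript
// Toy model (an assumption for illustration, not Postgres code):
// unquoted names fold to lower case, quoted names are taken verbatim.
function resolveName(identifier) {
  const quoted =
    identifier.length >= 2 &&
    identifier.startsWith('"') &&
    identifier.endsWith('"');
  return quoted
    ? identifier.slice(1, -1).replace(/""/g, '"')
    : identifier.toLowerCase();
}

// Two spellings refer to the same column when they resolve to the same name.
function sameColumn(a, b) {
  return resolveName(a) === resolveName(b);
}

console.log(sameColumn("Xa", "xA"));   // true
console.log(sameColumn('"xa"', "Xa")); // true
console.log(sameColumn("Xa", '"Xa"')); // false
console.log(sameColumn('"Xa"', '"aX"')); // false
```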

Single-quote notation for characters in Coq?

In most programming languages, 'c' is a character and "c" is a string of length 1. But Coq (according to its standard ascii and string library) uses "c" as the notation for both, which requires constant use of Open Scope to clarify which one is being referred to. How can you avoid this and designate characters in the usual way, with single quotes? It would be nice if there is a solution that only partially overrides the standard library, changing the notation but recycling the rest.
Require Import Ascii.
Require Import String.
Check "a"%char.
Check "b"%string.
or this
Program Definition c (s:string) : ascii :=
  match s with "" => " "%char | String a _ => a end.
Check (c"A").
Check ("A").
I am quite confident that there is no smart way of doing this, but there is a somewhat annoying one: simply declare one notation for each character.
Notation "''c''" := "c" : char_scope.
Notation "''a''" := "a" : char_scope.
Check 'a'.
Check 'c'.
It shouldn't be too hard to write a script for automatically generating those declarations. I don't know if this has any negative side-effects on Coq's parser, though.

Adding space in a specific position in a string of uppercase and lowercase letters

Dear stackoverflow users,
Many people encounter situations in which they need to modify strings. I have seen many
posts related to string modification. But, I have not come across solutions I am looking
for. I believe my post would be useful for some other R users who will face similar
challenges. I would like to seek some help from R users who are familiar with string
modification.
I have been trying to modify a string like the following.
x <- "Marcus HELLNERJohan OLSSONAnders SOEDERGRENDaniel RICHARDSSON"
There are four individuals in this string. Family names are in capital letters.
Three out of four family names stay in chunks with first names (e.g., HELLNERJohan).
I want to separate family names and first names adding space (e.g., HELLNER Johan).
I think I need to state something like "Select sequences of uppercase letters, and
add space between the last and second last uppercase letters, if there are lowercase
letters following."
The following post is probably somewhat relevant, but I have not been successful in writing the code yet.
Splitting String based on letters case
Thank you very much for your generous support.
This works by finding and capturing two consecutive sub-patterns, the first consisting of one upper case letter (the end of a family name), and the next consisting of an upper then a lower-case letter (taken to indicate the start of a first name). Everywhere these two groups are found, they are captured and replaced by themselves with a space inserted between (the "\\1 \\2" in the call below).
x <- "Marcus HELLNERJohan OLSSONAnders SOEDERGRENDaniel RICHARDSSON"
gsub("([[:upper:]])([[:upper:]][[:lower:]])", "\\1 \\2", x)
# "Marcus HELLNER Johan OLSSON Anders SOEDERGREN Daniel RICHARDSSON"
If you want to split the string into a vector of names, this does so using a regular expression with zero-width lookbehind and lookahead assertions.
strsplit(x, split = "(?<=[[:upper:]])(?=[[:upper:]][[:lower:]])",
         perl = TRUE)[[1]]
# [1] "Marcus HELLNER" "Johan OLSSON" "Anders SOEDERGREN"
# [4] "Daniel RICHARDSSON"

What's the point of nesting brackets in Lua?

I'm currently teaching myself Lua for iOS game development, since I've heard lots of very good things about it. I'm really impressed by the level of documentation there is for the language, which makes learning it that much easier.
My problem is that I've found a Lua concept that nobody seems to have a "beginner's" explanation for: nested brackets for quotes. For example, I was taught that long strings with escaped single and double quotes like the following:
string_1 = "This is an \"escaped\" word and \"here\'s\" another."
could also be written without the overall surrounding quotes. Instead one would simply replace them with double brackets, like the following:
string_2 = [[This is an "escaped" word and "here's" another.]]
Those both make complete sense to me. But I can also write the string_2 line with "nested brackets," which include equal signs between both sets of the double brackets, as follows:
string_3 = [===[This is an "escaped" word and "here's" another.]===]
My question is simple. What is the point of the syntax used in string_3? It gives the same result as string_1 and string_2 when given as an input for print(), so I don't understand why nested brackets even exist. Can somebody please help a noob (me) gain some perspective?
It would be used if your string contains a substring that is equal to the delimiter. For example, the following would be invalid:
string_2 = [[This is an "escaped" word, the characters ]].]]
Therefore, in order for it to work as expected, you would need to use a different string delimiter, like in the following:
string_3 = [===[This is an "escaped" word, the characters ]].]===]
I think it's safe to say that not a lot of string literals contain the substring ]], in which case there may never be a reason to use the above syntax.
It helps to, well, nest them:
print [==[malucart[[bbbb]]]bbbb]==]
Will print:
malucart[[bbbb]]]bbbb
But if that's not useful enough, you can use them to put whole programs in a string:
loadstring([===[print "o m g"]===])()
Will print:
o m g
I personally use them for my static/dynamic library implementation. If you don't know whether the program contains a closing bracket with the same number of = signs, you should determine a safe level with something like this:
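-- prog holds the program text that will be wrapped in a long string
local c = 0
-- raise the level until no closing bracket of that level occurs in prog
-- (the final "true" makes string.find do a plain, non-pattern search)
while string.find(prog, "]" .. string.rep("=", c) .. "]", 1, true) do
    c = c + 1
end
-- do stuff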
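The same level search can be sketched outside Lua, for example in a JavaScript tool that generates Lua source. `longBracketLevel` and `toLuaLongString` are hypothetical names for this sketch. It tests the finished literal rather than the content alone, so it also covers content that ends in `]`, and it relies on Lua skipping a newline that immediately follows the opening bracket.

```javascript
// Smallest level n such that content can be wrapped in a long bracket
// with n equals signs without the closing bracket appearing too early.
function longBracketLevel(content) {
  for (let level = 0; ; level++) {
    const close = "]" + "=".repeat(level) + "]";
    // The first occurrence of the closing bracket in the finished
    // literal must be the one appended after the content.
    if ((content + close).indexOf(close) === content.length) {
      return level;
    }
  }
}

// Wrap content in a Lua long string of a safe level.
function toLuaLongString(content) {
  const eq = "=".repeat(longBracketLevel(content));
  // Lua skips a newline right after the opening bracket, so adding one
  // keeps content that itself starts with a newline intact.
  return "[" + eq + "[\n" + content + "]" + eq + "]";
}

console.log(longBracketLevel('print "o m g"')); // 0
console.log(longBracketLevel("a]]b"));          // 1
console.log(longBracketLevel("a]"));            // 1
```

The loop always terminates, because once the level exceeds the content length no early match is possible.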

Which languages allow whitespace in identifiers?

Which languages allow whitespace in identifiers?
Example:
int current index = 5
string body = fetch article(current index)
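FORTRAN, and it was a bad design decision.
For example, replacing a , by a . can transform a DO loop into an assignment: DO 10 I = 1,10 starts a loop, whereas DO 10 I = 1.10 assigns 1.10 to a variable named DO10I, because blanks are not significant.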
MSSQL, MSAccess, and Oracle, if you quote identifiers correctly (using [] or "" respectively)
Whitespace!
http://compsoc.dur.ac.uk/whitespace/
The problem with whitespace is that it's often used as a separator between tokens. So if you allow whitespace in identifiers, you have to combine several tokens into one.
But it is not impossible. Two identifiers in a row with no other token between them are rare, so you can adapt the compiler to accept this.
On the other hand, you can get hard to read code:
int current index = 5
int current /* in between comment */ index = 5
int current
index = 5
So I don't think the advantages beat the disadvantages.

Resources