Are there programming languages that rely on non-Latin alphabets?

Every programming language I have ever seen has been based on the Latin alphabet; this is not surprising, considering I live in Canada...
But it only makes sense that there would be programming languages based on other alphabets, or else bright computer scientists across the world would have to learn a new alphabet to go on in the field. I know for a fact that people in countries dominated by other alphabets develop languages based on the Latin alphabet (e.g. Ruby, from Japan), but just how common is it for programming languages to be based on other alphabets like Arabic or Cyrillic, or even on writing systems that are not alphabetic but logographic in nature, such as Japanese kanji?
Also are any of these languages in active widespread use, or are they mainly used as teaching tools?
This is something that has bugged me since I started programming, and I have never run across someone who could think of a real answer.

Have you seen Perl?

APL is probably the most widely known. It even has a cool keyboard overlay (or was it a special keyboard you had to buy?).
In the non-alphabetic category, we also have programming languages like LabVIEW, which is mostly graphical. (You can label objects, and you can still do string manipulation, so there's some textual content.) LabVIEW has been used in data acquisition and automation for years, but gained a bit of popularity when it became the default platform for Lego Mindstorms.

There's a list on Wikipedia. I don't think any of them is really prevalent, though. Many programmers can learn to write programs with English keywords even if they don't understand English. Ruby is a good example: you'll still see Japanese identifiers and comments in some Ruby code.

Well, Brainf* uses no Latin characters at all, if you'll pardon the language... and the pun.
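To make the point concrete, here is a minimal Brainfuck interpreter sketch in Python (not part of the original answer). The whole language is eight punctuation characters and every other character is ignored as a comment. The 30,000-cell tape, byte wrap-around and "end of input reads as 0" are common conventions assumed here, not a fixed standard.

```python
def run_bf(program: str, stdin: str = "") -> str:
    """Interpret a Brainfuck program and return everything it printed."""
    # Pre-match brackets so loops can jump directly.
    jumps: dict[int, int] = {}
    stack: list[int] = []
    for pos, cmd in enumerate(program):
        if cmd == "[":
            stack.append(pos)
        elif cmd == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at {pos}")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise ValueError(f"unmatched '[' at {stack[-1]}")

    tape = [0] * 30_000  # assumed tape size; the pointer is not bounds-checked
    ptr = pc = 0
    chars = iter(stdin)
    out: list[str] = []
    while pc < len(program):
        cmd = program[pc]
        if cmd == ">":
            ptr += 1
        elif cmd == "<":
            ptr -= 1
        elif cmd == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # cells wrap like bytes
        elif cmd == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif cmd == ".":
            out.append(chr(tape[ptr]))
        elif cmd == ",":
            tape[ptr] = ord(next(chars, "\0"))  # end of input reads as 0
        elif cmd == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif cmd == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the loop start
        pc += 1
    return "".join(out)
```

For example, `run_bf("++++++++[>++++++++<-]>+.")` computes 8 × 8 + 1 = 65 in the second cell and prints `A`.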

Many languages allow Unicode identifiers. It's part of standard Java, and both g++ (older versions require \uNNNN escapes; since GCC 10, UTF-8 identifiers can also be written directly) and MSVC++ allow them (see also this question). Some also let you rename control structures with #define (or something better).
But in reality, people mostly don't do this. See past questions such as "Language of variable names?" and "Should all code be written in English?".
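Python 3, which this answer does not mention, is a handy illustration: PEP 3131 allows non-ASCII letters in identifiers, while keywords and built-ins stay English. A minimal sketch; the Russian and Greek names are just examples:

```python
import math


# Identifiers may use non-Latin letters (PEP 3131); keywords such as def/return/class stay English.
def площадь_круга(радиус: float) -> float:
    """Area of a circle ("площадь круга" is Russian for "circle area")."""
    return math.pi * радиус ** 2


class Τρίγωνο:
    """A triangle ("τρίγωνο" is Greek for "triangle")."""

    def __init__(self, βάση: float, ύψος: float) -> None:
        self.βάση = βάση  # base
        self.ύψος = ύψος  # height

    def εμβαδόν(self) -> float:
        """Area: base * height / 2."""
        return self.βάση * self.ύψος / 2
```

Note that the language itself still relies on English keywords; only the names chosen by the programmer change.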

Agda.
Sample snippet:
mutual
  data ωChain : Set where
    _∷_,_ : ∀ (x : carrier) (xω : ∞ ωChain) (p : x ≼ xω) → ωChain

  head : ωChain → carrier
  head (x ∷ _ , _) = x

  _≼_ : carrier → ∞ ωChain → Set
  x ≼ xω = x ≤ head (♭ xω)

Well, there's always APL. It has its own Unicode characters, and I believe it used to require a special keyboard too.

There is one language, used in a Russian ERP system, that is named after the company that developed it: 1C. But its identifiers and operators have English equivalents.
Also, I know that Haskell supports Unicode identifiers, so you can write programs in any alphabet. But this is not useful (my native language is Russian). It's quite enough that you have to type program messages and helpful comments in your native alphabet.

Other people are answering with languages that use punctuation marks in addition to Latin letters. I wonder why no one mentioned digits 0 to 9 as well.
In some languages, and in some implementations of some languages, programmers can use a wide range of characters in identifiers, such as Arabic or Chinese characters. This doesn't mean that the language relies on them though.
In most languages, programmers can use a wide range of characters in string literals (in quotation marks) and in comments. Again this doesn't mean that the language relies on them.
In every programming language that I've seen, the language does rely on punctuation marks and digits. So this answers your question but not in the way you expect.
Now let's try to find something meaningful. Is there a programming language where keywords are chosen from non-Latin alphabets? I would guess not, except maybe for joke languages. What would be the point of inventing a programming language that makes it impossible for some programmers to even input a program?
EDIT: My guess is wrong. Besides APL's usage of various invented punctuation marks, it does depend on a few Greek keywords, where each keyword is one letter long, such as the letter rho.
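As a rough illustration of such one-letter Greek primitives, here is a Python sketch of monadic ⍳ (iota) and dyadic ⍴ (rho), written with the Greek letters ι and ρ as function names, since Python cannot use the APL glyphs themselves as identifiers. It assumes APL's default index origin of 1 and covers only these two cases, with a non-empty shape and data:

```python
from itertools import cycle, islice
from math import prod


def ι(n: int) -> list[int]:
    """Like monadic APL ⍳n with index origin 1: the integers 1..n."""
    return list(range(1, n + 1))


def ρ(shape: list[int], data: list) -> list:
    """Like dyadic APL shape ⍴ data: reshape, recycling data as needed."""
    flat = list(islice(cycle(data), prod(shape)))

    def build(dims: list[int], items: list) -> list:
        # Split the flat list into dims[0] blocks, recursing over the remaining axes.
        if len(dims) == 1:
            return items
        step = prod(dims[1:])
        return [build(dims[1:], items[i * step:(i + 1) * step]) for i in range(dims[0])]

    return build(shape, flat)
```

For example, `ρ([2, 3], ι(6))` gives `[[1, 2, 3], [4, 5, 6]]`, which APL writes as `2 3 ⍴ ⍳6`.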

I just found an interesting wiki for "esoteric programming languages".

Related

How does flexibility affect a language's syntax?

I am currently working on writing my own language (shameless plug), which is centered around flexibility. I am trying to make almost any part of the language syntax exchangeable through things like extensions/plugins. Writing it has got me thinking about how that sort of flexibility could affect the language.
I know that Lisp is often referred to as one of the most extensible languages due to its extensive macro system. I do understand the concept of macros, but I have yet to find a language that allows someone to change the way it is parsed. To my knowledge, almost every language has an extremely concrete syntax, as defined by some long specification.
My question is how could having a flexible syntax affect the intuitiveness and usability of the language? I know the basic "people might be confused when the syntax changes" and "semantic analysis will be hard". Those are things that I am already starting to compensate for. I am looking for a more conceptual answer on the pros and cons of having a flexible syntax.
The topic of language design is still quite foreign to me, so I apologize if I am asking an obvious or otherwise stupid question!
Edit:
I want to clarify the question I am asking. Where exactly does flexibility in a language's syntax stand, in terms of language theory? I don't really need examples of projects or languages with flexible syntax; I want to understand how it can affect the language's readability, functionality, and other things like that.
Perl is the most flexible language I know. Take a look at Moose, a postmodern object system for Perl 5. Its syntax is very different from Perl's, but it is still very Perl-ish.
IMO, the biggest problem with flexibility is precedence in infix notation. None of the languages I know of allow a datatype to have its own infix syntax. Take sets, for example: it would be nice to use ⊂ and ⊇ in their syntax. But not only would a compiler have to recognize these symbols, it would also have to be told their order of precedence.
Common Lisp lets you change the way it's parsed; see reader macros. Racket lets you modify its parser; see Racket languages.
And of course you can have flexible, dynamically extensible parsing alongside powerful macros if you use the right parsing techniques (e.g., PEG). Have a look at an example here: mostly a C syntax, but extensible with both syntax and semantic macros.
As for precedence, PEG goes really well together with Pratt.
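As a rough sketch of the precedence problem raised above, here is a tiny Pratt parser in Python whose infix operators, including set symbols like ⊂ and ∪, are registered at runtime together with their binding powers. It covers only the Pratt half (no PEG), assumes whitespace-separated tokens and has no parentheses; the class and method names are made up for illustration.

```python
class PrattParser:
    """Pratt (top-down operator precedence) parser with runtime-registered infix operators."""

    def __init__(self) -> None:
        self.infix: dict[str, tuple[int, bool]] = {}  # symbol -> (binding power, right-assoc?)
        self.tokens: list[str] = []
        self.pos = 0

    def add_infix(self, symbol: str, power: int, right_assoc: bool = False) -> None:
        self.infix[symbol] = (power, right_assoc)

    def parse(self, source: str):
        self.tokens = source.split()  # assumption: tokens are separated by whitespace
        self.pos = 0
        tree = self.expression(0)
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected token {self.tokens[self.pos]!r}")
        return tree

    def expression(self, min_power: int):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        left = self.tokens[self.pos]
        if left in self.infix:
            raise SyntaxError(f"expected an operand, got {left!r}")
        self.pos += 1
        # Keep absorbing operators that bind more tightly than the caller's operator.
        while self.pos < len(self.tokens):
            op = self.tokens[self.pos]
            if op not in self.infix:
                raise SyntaxError(f"expected an operator, got {op!r}")
            power, right_assoc = self.infix[op]
            if power <= min_power:
                break
            self.pos += 1
            right = self.expression(power - 1 if right_assoc else power)
            left = (op, left, right)
        return left
```

Registering ⊂ with a lower binding power than ∪ is exactly the "told their order of precedence" step: after `add_infix("⊂", 10)` and `add_infix("∪", 20)`, `A ⊂ B ∪ C` parses as `("⊂", "A", ("∪", "B", "C"))`.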
To answer your updated question: there is surprisingly little research on programming-language readability anyway. You may want to have a look at what Dr. Blackwell's group has been doing, but it's still far from conclusive.
So I can only share my hand-wavy anecdotes: languages with flexible syntax facilitate eDSL construction, and, in my opinion, eDSLs are the only way to eliminate unnecessary complexity from code and to make code actually maintainable in the long term. I believe that non-flexible languages are one of the biggest mistakes made by this industry, and that it must be corrected at all costs, ASAP.
Flexibility allows you to manipulate the syntax of the language. For example, Lisp macros let you write programs that write programs: at compile time, a macro rewrites your custom syntax into valid Lisp expressions. Take the loop macro:
(loop for x from 1 to 5
      do (format t "~A~%" x))
1
2
3
4
5
NIL
We can see how the code is translated with macroexpand-1:
(pprint (macroexpand-1 '(loop for x from 1 to 5
                               do (format t "~a~%" x))))
In SBCL, this prints the expansion of that macro call:
(BLOCK NIL
  (LET ((X 1))
    (DECLARE (TYPE (AND REAL NUMBER) X))
    (TAGBODY
     SB-LOOP::NEXT-LOOP
      (WHEN (> X '5) (GO SB-LOOP::END-LOOP))
      (FORMAT T "~a~%" X)
      (SB-LOOP::LOOP-DESETQ X (1+ X))
      (GO SB-LOOP::NEXT-LOOP)
     SB-LOOP::END-LOOP)))
Language flexibility just lets you create your own embedded language within a language and shorten your program in terms of characters used. In theory, this can also make code very unreadable, since we can manipulate the syntax. For example, we can write invalid code that is translated into valid code:
CL-USER> (defmacro backwards (expr)
           (reverse expr))
BACKWARDS
CL-USER> (backwards ("hello world" nil format))
"hello world"
Code like this can quickly become hard to follow, because
("hello world" nil format)
is not a valid Lisp expression on its own.
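The core idea, that a macro receives code as unevaluated data and returns new code, can be sketched outside Lisp too. Below is a hypothetical Python toy (not how Common Lisp is implemented) that represents s-expressions as tuples and expands a `backwards` macro before anything is evaluated:

```python
# Toy macro table: a macro maps its *unevaluated* argument form to a new form.
MACROS = {
    "backwards": lambda form: tuple(reversed(form)),
}


def macroexpand(form):
    """Expand macro calls (outermost first), then expand every sub-form."""
    if isinstance(form, tuple) and form and isinstance(form[0], str) and form[0] in MACROS:
        name, *args = form
        (arg,) = args  # this toy only supports one-argument macros
        return macroexpand(MACROS[name](arg))
    if isinstance(form, tuple):
        return tuple(macroexpand(sub) for sub in form)
    return form
```

`macroexpand(("backwards", ("hello world", None, "format")))` turns the "invalid" form into `("format", None, "hello world")`, the tuple analogue of `(format nil "hello world")`.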
Thanks to SK-logic's answer for pointing me in the direction of Alan Blackwell. I sent him an email asking his stance on the matter, and he responded with an absolutely wonderful explanation. Here it is:
So the person who responded to your StackOverflow question, saying
that flexible syntax could be useful for DSLs, is certainly correct.
It actually used to be fairly common to use the C preprocessor to
create alternative syntax (that would be turned into regular syntax in
an initial compile phase). A lot of the early esolangs were built this
way.
In practice, I think we would have to say that a lot of DSLs are
implemented as libraries within regular programming languages, and
that the library design is far more significant than the syntax. There
may be more purpose for having variety in visual languages, but making
customisable general purpose compilers for arbitrary graphical syntax
is really hard - much worse than changing text syntax features.
There may well be interesting things that your design could enable, so
I wouldn’t discourage experimentation. However, I think there is one reason why
customisable syntax is not so common. This is related to the famous
programmer’s editor EMACS. In EMACS, everything is customisable - all
key bindings, and all editor functions. It’s fun to play with, and
back in the day, many of us made our own personalised version that
only we knew how to operate. But it turned out that it was a real
hassle that everyone’s editor worked completely differently. You could
never lean over and make suggestions on another person’s session, and
teams always had to know who was logged in, in order to know whether the
editor would work. So it turned out that, over the years, we all just
started to use the default distribution and key bindings, which made
things easier for everyone.
For now, that is just about the explanation I was looking for. If anyone feels they have a better explanation or something to add, feel free to contact me.

Where is it specified whether Unicode identifiers should be allowed in a Haskell implementation?

I wanted to write some educational code in Haskell with Unicode (non-Latin) characters in the identifiers, so that the identifiers look nice and natural to speakers of a natural language other than English that does not use the Latin script. So I set out to find an appropriate Haskell implementation that would allow this.
But where is this feature specified in the language specification? How would I refer to this feature when looking for a conforming implementation? (And which Haskell implementations are known to actually support Unicode identifiers?)
It turned out that one Haskell implementation accepted my code with Unicode identifiers, whereas another rejected it. I would like a way to formalize this requirement of my code, perhaps as a language feature switch, so that if I or someone else tries to run my code, it would be immediately clear whether their implementation is missing the required feature and they should look for another one. (There could also be a wiki page for this feature, "Unicode identifiers", listing which of the existing implementations support it, so that one would know where to go if one needs it.)
(BTW, I have put a "syntax" tag on this question, but I actually perceive it to be an issue of the level of lexing, a lower level than the syntax of a language. Is there a tag here for features of the lexing level of a language, rather than for features of the syntax specification of a language?)
The Online Report documents this under Lexemes. It also notes early on that "Haskell uses the Unicode character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell.".
Actual compilers may or may not support Unicode identifiers. GHC does, but you need to keep in mind that Unicode codepoints must obey the same rules as ASCII characters: types must start with a codepoint which is classed as uppercase or titlecase, variables as lowercase (although de facto this is relaxed to alphabetic and not uppercase/titlecase; this might be worth asking for a clarification from the language committee), operators must be punctuation or symbol. (This means that you can't declare types in Arabic, for example, unless you prefix them with a character in some other script that is uppercase/titlecase.)
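To see why, for example, an Arabic type name fails under these rules, here is a rough Python approximation of the classification described above, based on the Unicode general category of a name's first character. It is a sketch of the rules as stated in this answer (including the relaxed "alphabetic but not uppercase/titlecase" rule for variables), not GHC's actual lexer; the function name is made up.

```python
import unicodedata

SPECIAL = set("(),;[]`{}\"'")  # characters Haskell reserves; they never start a name


def haskell_name_kind(name: str) -> str:
    """Approximate lexical class of a Haskell name from its first character."""
    first = name[0]
    category = unicodedata.category(first)
    if category in ("Lu", "Lt"):  # uppercase/titlecase: types and constructors
        return "conid"
    if first == "_" or first.isalpha():  # any other letter counts as lowercase (relaxed rule)
        return "varid"
    if first in SPECIAL:
        return "invalid"
    if first == ":":  # operators starting with ':' are constructor operators
        return "consym"
    if category[0] in ("P", "S"):  # other punctuation or symbol: an ordinary operator
        return "varsym"
    return "invalid"
```

Arabic letters have category Lo (no case), so `haskell_name_kind("كتاب")` returns `"varid"`: such a name can only be a variable, never a type, unless it is prefixed with an uppercase character from another script.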
As to collecting Unicode support information: while I don't know of a single page that provides it, searching for "unicode" on the Haskell Wiki finds information about Unicode support in a number of Haskell compilers.

Programming Language with Inflection

Is there a programming language that uses inflections (suffixing a word to add a certain meaning) instead of operators to express instructions? Just wondering.
What I am talking about is using inflections to add a meaning to an identifier such as a variable or type name.
For example:
native type integer
var x : integer = 12
var location : integers = 12, 5, 42
say 0th locationte to_string (( -te replaces "." operator. prints 12 ))
I think Perligata (Perl in Latin) is what you're looking for. :) From the article:
"There is no reason why programming languages could not also use inflexions, rather than position, to denote lexical roles."
Here's an example program (Sieve of Eratosthenes):
#! /usr/local/bin/perl -w
use Lingua::Romana::Perligata;
maximum inquementum tum biguttam egresso scribe.
meo maximo vestibulo perlegamentum da.
da duo tum maximum conscribementa meis listis.
dum listis decapitamentum damentum nexto
    fac sic
        nextum tum novumversum scribe egresso.
        lista sic hoc recidementum nextum cis vannementa da listis.
    cis.
This is partially facetious, but... assembly language? Things like conditional jump instructions are often variations on a root ("J" for jump or whatnot) with suffixes added to denote the associated condition ("JNZ" for jump-if-not-zero, et cetera).
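The "root plus suffix" structure is regular enough to decode mechanically. Below is a small Python sketch covering only the single-flag x86 jumps (JZ/JE, JC, JS, JO and their N-prefixed negations). Composite conditions such as JA or JG test several flags and are deliberately left out; the table and function are illustrative, not an assembler's API.

```python
# Single-flag x86 condition suffixes -> (flag, value that makes the jump taken).
CONDITIONS = {
    "Z": ("ZF", 1),  # jump if zero
    "E": ("ZF", 1),  # jump if equal: same test as Z
    "C": ("CF", 1),  # jump if carry
    "S": ("SF", 1),  # jump if sign (negative)
    "O": ("OF", 1),  # jump if overflow
}


def decode_jump(mnemonic: str) -> tuple[str, int]:
    """Split e.g. 'JNZ' into root 'J', negation 'N' and condition 'Z'."""
    m = mnemonic.upper()
    if not m.startswith("J"):
        raise ValueError(f"not a jump: {mnemonic}")
    rest = m[1:]
    negated = rest.startswith("N")
    if negated:
        rest = rest[1:]
    if rest not in CONDITIONS:
        raise ValueError(f"unsupported condition in {mnemonic}")
    flag, value = CONDITIONS[rest]
    return flag, (1 - value) if negated else value
```

So `decode_jump("JNZ")` yields `("ZF", 0)`: jump when the zero flag is clear.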
The excellent (dare I say fascinating) game-design language Inform 7 is inflected like English. But it's so closely integrated with a host of other design decisions that it's hard to peel away as a separate feature.
Anyone who is interested in language designs that are unusual but successful should check out Inform 7.
Presumably any programming language that uses natural language explicitly or closely as a basis, e.g., Natural-Language Programming. There was some research done at MIT into using English to produce high-level skeletons of programs, which is more in the realm of natural-language processing; the tool they created is called Metafor.
As far as I know, no existing language has support for, say, modifying or extending keywords with inflection. Now you've got me interested, though, so I'm sure I'll come up with something soon!
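Here is a toy Python sketch of how a lexer could treat the asker's hypothetical "-te" suffix (member access) and a plural "-s" as inflections. The suffix table and function names are invented for illustration. Matching only known stems keeps an ordinary word like "note" from being read as "no" plus "-te".

```python
from typing import Optional

# Hypothetical suffix table for the asker's made-up syntax.
SUFFIXES = {
    "te": "member-of",  # "locationte to_string" ~ location.to_string
    "s": "plural",  # "integers" ~ several integer values
}


def analyze(word: str, known_stems: set[str]) -> tuple[str, Optional[str]]:
    """Return (stem, role) if the word is a known stem plus a suffix, else (word, None)."""
    for suffix, role in SUFFIXES.items():
        if word.endswith(suffix) and word[: -len(suffix)] in known_stems:
            return word[: -len(suffix)], role
    return word, None


def desugar(words: list[str], known_stems: set[str]) -> str:
    """Rewrite member-access inflections into ordinary dot syntax."""
    parts: list[str] = []
    glue_next = False
    for word in words:
        stem, role = analyze(word, known_stems)
        token = stem + "." if role == "member-of" else word
        if glue_next:
            parts[-1] += token  # attach the member name to "stem."
        else:
            parts.append(token)
        glue_next = role == "member-of"
    return " ".join(parts)
```

With `location` declared, `desugar("say 0th locationte to_string".split(), {"location"})` produces `"say 0th location.to_string"`.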
Of the 40 or so languages I know, the only thing that comes to mind is some rare SQL implementations that include friendly aliases. For example, to select a default database after connecting, the standard is USE <some database name>, but one implementation I used also allowed USING <some database name>.
FORTRAN uses the first letter of the name to determine the type of an implicitly declared variable: by default, names beginning with I through N are INTEGER and all others are REAL.
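The default rule is simple enough to state as code. A minimal Python sketch, assuming no IMPLICIT statement overrides the defaults (and no IMPLICIT NONE):

```python
def implicit_fortran_type(name: str) -> str:
    """Default FORTRAN implicit typing: first letter I-N gives INTEGER, anything else REAL."""
    first = name[0].upper()
    return "INTEGER" if "I" <= first <= "N" else "REAL"
```

Hence the old joke "GOD is REAL unless declared INTEGER".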
COBOL has singular and plural versions of its "figurative constants", e.g. SPACE and SPACES.
Python 3.7's standard module contextvars provides context variables, which could arguably be used for something like inflection.

Where did string escape codes (\n, \t...) originate?

Purely wondering... since they're still around and in use in C# today...
Where did the pattern of using string escape codes come from? What language did it first appear in? What languages, if any, have solved the problem in a different way?
I suspect that these escape codes originated in B, a high-level assembly programming language for the Honeywell 6000 GCOS operating system. This language was developed at Bell Labs based on a British language called BCPL. Because BCPL was rather wordy, the B developers simplified the syntax and added things like braces to replace BEGIN and END. That's where the name B came from, because it was an abbreviated form of BCPL.
Later on, some people at Bell Labs created a successor to B, mainly by adding types and a standard I/O library. Because it was B's successor, they named it C, the next letter in the name BCPL.
I do not recall seeing the backslash notation before B, and since C and UNIX inherited it from B, I think that B is the origin of this notation, or more specifically, that Bell Labs was. It's entirely possible that this notation was used in other Bell Labs software before B, since they were a prolific producer of software, much of which was distributed freely to universities such as the one I attended in the mid-1970s.
By the way, the idea of an escape sequence existed long before that, dating back to the 19th-century Baudot code, a fixed-length 5-bit binary code intended to replace variable-length Morse code. Baudot had shift codes that switched between a letters set and a figures (digits and punctuation) set, much like the Shift key on a typewriter.
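For readers curious what processing such escape codes involves, here is a minimal Python decoder for a handful of C-style backslash escapes. It is a simplified sketch: the escape table is a deliberately small subset, and octal/hex escapes, \a, \b and the rest are not handled.

```python
# A deliberately small subset of C-style escapes.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "'": "'"}


def decode_escapes(text: str) -> str:
    """Replace backslash escape sequences in `text` with the characters they stand for."""
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(text):
            raise ValueError("dangling backslash at end of string")
        code = text[i + 1]
        if code not in ESCAPES:
            raise ValueError(f"unknown escape sequence \\{code}")
        out.append(ESCAPES[code])
        i += 2  # skip the backslash and the escape letter
    return "".join(out)
```

The backslash acts as a one-character "shift": it changes the meaning of only the next character, whereas Baudot's shift codes stayed in effect until the next shift.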

Truly multi-lingual programming languages?

I realize most languages support multiple languages, but every language I've seen has been more or less US-centric. By that, I mean the keywords, standard library functions, etc. all have English names. So, as a programmer, you still really need to know at least some English to make sense of it.
Are there any truly "multi-lingual" languages out there with support for language keywords and such in multiple languages?
This is generally a horrible idea, as anyone who's worked in a localized IDE can attest to. Programmers rely heavily on having one common vocabulary. When the compiler gives me the error "missing type specifier - int assumed", I can share this exact error message with others, for example here on SO, and it will be familiar to those others so they can tell me what it means. If the compiler instead generated error messages in Danish, I'd be limited to getting help from the relatively few programmers who speak Danish.
Suddenly my vocabulary is no longer the same as someone in the same position in Germany, France or Japan. We can no longer exchange code, bugs, bug fixes or ideas.
A developer in Spain wouldn't be able to use my code because it was literally written in another language. And if I had trouble with my code, others would be helpless to debug it, because it wouldn't even compile under their localization settings (and if it did, it'd still be unreadable to them).
Ultimately, a programming language is a language. It may have borrowed some words from English, but it is not English, and you do not need to understand English to program in it, any more than I need to understand Latin in order to speak English (English borrows Latin words as well).
You might as well ask for a multi-lingual English. What would be the point? Yes, it would in theory allow people who didn't speak English to... speak English. It just wouldn't be the same English as every other English-speaker speaks, so it wouldn't actually enable communication between them.
The keyword if in a programming language is not the same as if in the English language. They mean different things, even though one was obviously inspired by the other.
The delegate keyword in C# does not mean the same thing as "delegate" in English. Nor do while, return or "constructor". They are not English words; they are keywords or concepts in C++, Java, C#, Python or any other programming language.
Sounds like a bad idea to me. If I'm writing a program, how am I to know that the variable name I'm typing is actually a keyword in Bulgarian or Korean as transliterated? Do I have to deal with thousands of keywords, or do I have problems combining two routines written by my Swedish and Egyptian colleagues?
Just realize that programming keywords are in English, just like music keywords are in Italian.
This seems like a good place to start: Non-English-based programming languages.
There's a few interesting ones on there, like Python translated to Chinese.
You can use the C/C++ preprocessor to redefine all the keywords, and some people have done this. I came across it when working as a trainer/mentor for a Norwegian company. Some bright spark had implemented a header that translated all the C keywords into Norwegian and enforced its use. The Norwegian staff, all of whom spoke excellent English (or I couldn't have earned my crust with them), all hated it, and it died a death.
I've also worked fairly extensively in the Netherlands, and most of the programmers there seem to program in English. The only people I've come across who are resistant to the English hegemony in programming languages are (needless to say) the French.
There is one area where a localized language may be useful and helpful: DSLs (domain-specific languages) designed to be used by non-programmers. Those languages can surely benefit from being localized, since business users from non-English-speaking countries often don't know English as well as programmers do.
Such localized DSLs can prove advantageous to programmers as well if they deal with a lot of non-translatable terms. One rather successful system I've encountered was used to calculate salaries for personnel in the Israeli military. It used a Hebrew-based syntax together with hundreds of terms that can only be properly expressed in Hebrew. In that particular case the standard logic keywords if, then, else, etc. were translated to Hebrew and the entire code editor was right-to-left. A very large body of business logic is maintained in this manner to this day and, IMHO, rightly so.
It seems like it would be notoriously difficult unless it were a community effort, but for some languages I don't see why you can't make an existing language multilingual by creating custom libraries that localize the standard libraries.
For example, in Java you could create a subclass with a localized name, and then add wrapper methods with names in your target language that simply call the existing standard methods (the Spanish names below are just an illustration):
public class Lista<E> extends java.util.ArrayList<E> {
    // Exceptions thrown by the standard methods propagate unchanged.
    public boolean agregar(E elemento) {   // "agregar" = "add"
        return this.add(elemento);
    }

    public int tamaño() {                  // "tamaño" = "size"
        return this.size();
    }
}
If you do error handling properly, the stack traces and errors will all reference errors in the standard libraries, unless it was an error specifically in the implementation of your localization library, in which case that should be pointed out by proper error handling in the localization library itself.
The disadvantage would be, as other people have mentioned, sharing source code. But if we all used the same IDE for a given language, I don't see why you couldn't build a truly internationalized IDE in which the source code you see on the screen isn't the real source code per se, but a local rendering of the real source code via some form of mapping.
I'm going to go ahead and say that everything I just said is probably at best an okay idea, because function names aren't nearly as important as localized documentation for libraries and APIs, something which in my experience is done terribly or not at all for common programming languages and contexts.
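The "local rendering via some form of mapping" idea can be prototyped for Python with the standard tokenize module: only NAME tokens are translated through a keyword table, so string literals and comments are left alone. A minimal sketch; the Norwegian-style keyword table is hypothetical, and untokenize's compatibility mode changes the spacing but keeps the code valid.

```python
import io
import tokenize

# Hypothetical localized keyword table (Norwegian-style words, for illustration only).
NORWEGIAN_KEYWORDS = {
    "hvis": "if",
    "ellers": "else",
    "mens": "while",
    "returner": "return",
}


def translate(source: str, table: dict[str, str]) -> str:
    """Map localized keywords to Python keywords, touching only NAME tokens."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME and text in table:
            text = table[text]
        tokens.append((tok.type, text))
    # With 2-tuples, untokenize regenerates spacing itself (valid, but not pretty).
    return tokenize.untokenize(tokens)
```

This also exposes the collision problem raised earlier in the thread: any identifier that happens to match a localized keyword (say, a variable called `hvis`) would be rewritten too.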
You can program Perl in Latin.
Don't try to code in your natural language; that's useless. Learn the "programming" language instead.
For instance, the word "switch" didn't mean anything to me as an English word, but it was the instruction for choosing among several cases.
Later (when I learned English) I thought: hey, this is funny, English has a "switch" word too, just like C. (Doh!)
:)
No matter how good or bad your English is, you can't write this in Java:
import java.util.* into my CD-ROM;
because it is not valid syntax.
What about languages like APL and J? The keywords in APL are all single symbols; unfortunately, most of these are not on your keyboard, so J came along and replaced most of them with ASCII representations (made up of more than one character in many cases).
Sorted!
Sorted! is bilingual: it can understand both English and German code. To my knowledge, Sorted! is the only programming language in the world that can do this.
Any useful ones? That's a better question.
To a significant extent, you can program in Prolog in any Unicode script (because it is a symbolic language). There's a (tiny, weeny) catch: variables are signified by an initial capital Roman letter in all the Prolog compilers you are likely to come across, and you'll have to redefine the built-ins (but Prolog makes this relatively easy).
I think an example will illustrate what I mean best:
% an algorithm for finding Easter dates, given the year (as the first argument)
復活節(V1, V2, V3) :-
    A 是 (V1 mod 19),
    B 是 V1 // 100,
    C 是 V1 mod 100,
    D 是 B // 4,
    E 是 B mod 4,
    F 是 (B + 8) // 25,
    G 是 (B - F + 1) // 3,
    H 是 (19*A + B - D - G + 15) mod 30,
    I 是 C // 4,
    J 是 C mod 4,
    K 是 (32 + 2*E + 2*I - H - J) mod 7,
    L 是 (A + 11*H + 22*K) // 451,
    M 是 (H + K - 7*L + 114) // 31,
    N 是 (H + K - 7*L + 114) mod 31,
    V2 是 M,
    V3 是 N + 1.
/*
Example test:
?- 復活節(2013, V2, V3).
V2 = 3,
V3 = 31.
i.e. Easter in 2013 was on 31 March
*/
This is what I used to redefine the built-in is operator (don't shoot me if it's imperfect):
:- op(700, xfx, 是).   % same priority and type as the built-in is/2
是(X, Y) :- X is Y.
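Python 3 accepts CJK identifiers as well, so the same calculation (the anonymous Gregorian Easter algorithm, step for step as in the Prolog clause) can be written with Chinese function, parameter and result names. A sketch; only the names are translated, and the keywords stay English:

```python
def 復活節(年: int) -> tuple[int, int]:
    """Gregorian Easter (month, day) for the given year, mirroring the Prolog clause step by step."""
    a = 年 % 19
    b = 年 // 100
    c = 年 % 100
    d = b // 4
    e = b % 4
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i = c // 4
    j = c % 4
    k = (32 + 2 * e + 2 * i - h - j) % 7
    l = (a + 11 * h + 22 * k) // 451
    月 = (h + k - 7 * l + 114) // 31  # month
    日 = (h + k - 7 * l + 114) % 31 + 1  # day
    return 月, 日
```

`復活節(2013)` returns `(3, 31)`, matching the Prolog test above.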
