Truly multi-lingual programming languages? - programming-languages

I realize most languages support multiple languages, but every language I've seen has always been more-or-less US-centric. By that, I mean the keywords, standard library functions, etc. all have english names. So, as a programmer, you still really need to know at least some english to make sense of it.
Are there any truly "multi-lingual" languages out there with support for language keywords and such in multiple languages?

This is generally a horrible idea, as anyone who's worked in a localized IDE can attest to. Programmers rely heavily on having one common vocabulary. When the compiler gives me the error "missing type specifier - int assumed", I can share this exact error message with others, for example here on SO, and it will be familiar to those others so they can tell me what it means. If the compiler instead generated error messages in Danish, I'd be limited to getting help from the relatively few programmers who speak Danish.
Suddenly my vocabulary is no longer the same as someone in the same position in Germany, France or Japan. We can no longer exchange code, bugs, bug fixes or ideas.
A developer in Spain wouldn't be able to use my code because it was literally written in another language. And if I had trouble with my code, others would be helpless to debug it, because it wouldn't even compile under their localization settings (and if it did, it'd still be unreadable to them).
Ultimately, a programming language is a language. It may have borrowed some words from English, but it is not English, and you do not need to understand English to program in it, any more than I need to understand latin in order to speak English (English borrows latin words as well).
You might as well ask for a multi-lingual English. What would be the point? Yes, it would in theory allow people who didn't speak English to... speak English. It just wouldn't be the same English as every other English-speaker speaks, so it wouldn't actually enable communication between them.
The keyword if in a programming language is not the same as if in the English language. They mean different things, even though one was obviously inspired by the other.
The delegate keyword in C# does not mean the same thing as "delegate" in English. Nor does while, return or "constructor". They are not english words, they are keywords or concepts in C++, Java, C#, Python or any other programming language.

Sounds like a bad idea to me. If I'm writing a program, how am I to know that the variable name I'm typing is actually a keyword in Bulgarian or Korean as transliterated? Do I have to deal with thousands of keywords, or do I have problems combining two routines written by my Swedish and Egyptian colleagues?
Just realize that programming keywords are in English, just like music keywords are in Italian.

This seems like a good place to start: Non-English-based programming languages.
There's a few interesting ones on there, like Python translated to Chinese.

You can make use of the C/C++ preprocessor to redefine all the keywords - and some people have done this. I came across it when working as a trainer/mentor for a Norwegian company. Some bright spark had implemented aheader that translated all the C keywords into Norwegian and enforced its use. The Norwegian staff, all of whom spoke excellent English (or I couldn't have earned my crust with them) all hated it and it died a death.
I've also worked fairly extensively in the Netherlands, and most of the programmers there seem to program in English. The only people I've come across who are resistant to the English hegemony in programming languages are (needless to say) the French.

There is one area where a localized language may be useful and helpful and these are DLSs (Domain Specific Languages) that were designed to be used by non-programmers. Those languages can surely benefit from being localized since business users from non-English speaking countries often don't know English as well as programmers do.
Such localized DSLs can prove advantageous to programmers as well if they deal with a lot of non-translatable terms. One rather successful system I've encountered was used to calculate salaries for personnel in the Israeli military. It used a Hebrew-based syntax together with hundreds of terms that can only be properly expressed in Hebrew. In that particular case the standard logic keywords if, then, else, etc. were translated to Hebrew and the entire code editor was right-to-left. A very large body of business logic is maintained in this manner to this day and, IMHO, rightly so.

It seems like it would be notoriously difficult, unless it was a community effort, but for some languages I don't see why you can't make an existing language multi-lingual, but creating custom libraries that localize standard libraries.
For example, in Java, you can create
public class HoweverYouSayExampleObjectInYourLanguage extends ExampleObjectName {
}
and then create wrapper functions / methods with names in your target language, but which basically call existing standard methods
private void HoweverYouSayExampleMethodInYourLanguage(parameters) {
this.ExampleMethod(parameters);
// Some error handling code
}
If you do error handling properly, the stack trace / errors will all reference errors in the standard libraries, unless it was an error speficially with the implementation of your localization library - in which case that should be pointed out via proper error handling in the localization library itself.
The disadvantage would be, as other people have mentioned, sharing source code. If we were all on the same page with an IDE for a given language - I don't see a reason why you couldn't build a really internationalized IDE in which the source code you see on the screen isn't the REAL source code per se, but a local rendering of the real source code via some form of mapping.
I'm going to go ahead and say that everything I just said above is probably at best an okay idea because function names aren't nearly as important as localized documentation for libraries and APIs. something which in my experience is done terribly or not at all for common programming languages / contexts.

You can program Perl in Latin.

Don't try to code in the natural language, that's useless. Learn the "programming" language instead.
For instance, the "switch" word didn't mean anything to me in English, but it was an instruction to decide over several choices.
Later ( when I learn english ) I thought.. Hey, this is funny, English do have a "switch" word too, just like C. ( doh! )
:)
No matter how good or bad your English is, you can't say to java
import java.util.* into my CD-ROM;
Because it is not a valid syntax.

What about languages like APL and J? The keywords in APL are all single symbols; unfortunately, most of these are not on your keyboard, so J came along and replaced most of them with ASCII representations (made up of more than one character in many cases).

Sorted!
Sorted! is bilingual. It can understand both english and german code. To my knowledge, Sorted! is the only programming language that can do this, in the world.
Any useful ones? That's a better question.

To a significant extent, you can program in prolog in any unicode script (because it is a symbolic language). There's a (tiny, weeny) catch - variables are signified by an initial capital Roman letter in all the prolog compilers you are likely to come across and you'll have to redefine the built-ins (but prolog makes this relatively easy*).
I think an example will illustrate what I mean best:
% an algorithm for finding easter dates, given year (as first argument)
復活節( V1, V2, V3) :-
A 是 (V1 mod 19),
B 是 V1// 100,
C 是 V1 mod 100,
D 是 B // 4,
E 是 B mod 4,
F 是 (B + 8) // 25,
G 是 (B - F + 1) // 3,
H 是 (19*A + B - D - G + 15 )mod 30,
I 是 C // 4,
J 是 C mod 4,
K 是 (32 + 2*E + 2*I - H - J) mod 7 ,
L 是 (A + 11*H + 22*K) // 451,
M 是 (H + K - 7*L + 114) // 31,
N 是 (H + K - 7*L + 114)mod 31,
V2 是 M,
V3 是 N + 1.
/*
Example test:
?- 復活節( 2013, V2 , V3).
V2 = 3 ,
V3 = 31
i.e. Easter this year will be on 31st March
*/
this is what I used to redefine the build in 'is' operator (don't shoot me if it's imperfect):
:-op(500,xfy,是).
是(X,Y):-is(X,Y).

Related

How does flexibility affect a language's syntax?

I am currently working on writing my own language(shameless plug), which is centered around flexibility. I am trying to make almost any part of the language syntax exchangeable through things like extensions/plugins. While writing the whole thing, it has got me thinking. I am wondering how that sort of flexibility could affect the language.
I know that Lisp is often referred to as one of the most extensible languages due to its extensive macro system. I do understand that concept of macros, but I am yet to find a language that allows someone to change the way it is parsed. To my knowledge, almost every language has an extremely concrete syntax as defined by some long specification.
My question is how could having a flexible syntax affect the intuitiveness and usability of the language? I know the basic "people might be confused when the syntax changes" and "semantic analysis will be hard". Those are things that I am already starting to compensate for. I am looking for a more conceptual answer on the pros and cons of having a flexible syntax.
The topic of language design is still quite foreign to me, so I apologize if I am asking an obvious or otherwise stupid question!
Edit:
I was just wanting to clarify the question I was asking. Where exactly does flexibility in a language's syntax stand, in terms of language theory? I don't really need examples or projects/languages with flexibility, I want to understand how it can affect the language's readability, functionality, and other things like that.
Perl is the most flexible language I know. That a look at Moose, a postmodern object system for Perl 5. It's syntax is very different than Perl's but it is still very Perl-ish.
IMO, the biggest problem with flexibility is precedence in infix notation. But none I know of allow a datatype to have its own infix syntax. For example, take sets. It would be nice to use ⊂ and ⊇ in their syntax. But not only would a compiler have to recognize these symbols, it would have to be told their order of precedence.
Common Lisp allows to change the way it's parsed - see reader macros. Racket allows to modify its parser, see racket languages.
And of course you can have a flexible, dynamically extensible parsing alongside with powerful macros if you use the right parsing techniques (e.g., PEG). Have a look at an example here - mostly a C syntax, but extensible with both syntax and semantic macros.
As for precedence, PEG goes really well together with Pratt.
To answer your updated question - there is surprisingly little research done on programming languages readability anyway. You may want to have a look at what Dr. Blackwell group was up to, but it's still far from conclusive.
So I can only share my hand-wavy anecdotes - flexible syntax languages facilitates eDSL construction, and, in my opinion, eDSLs is the only way to eliminate unnecessary complexity from code, to make code actually maintainable in a long term. I believe that non-flexible languages are one of the biggest mistakes made by this industry, and it must be corrected at all costs, ASAP.
Flexibility allows you to manipulate the syntax of the language. For example, Lisp Macros can enable you to write programs that write programs and manipulate your syntax at compile-time to valid Lisp expressions. For example the Loop Macro:
(loop for x from 1 to 5
do(format t "~A~%" x))
1
2
3
4
5
NIL
And we can see how the code was translated with macroexpand-1:
(pprint(macroexpand-1 '(loop for x from 1 to 5
do (format t "~a~%" x))))
We can then see how a call to that macro is translated:
(LET ((X 1))
(DECLARE (TYPE (AND REAL NUMBER) X))
(TAGBODY
SB-LOOP::NEXT-LOOP
(WHEN (> X '5) (GO SB-LOOP::END-LOOP))
(FORMAT T "~a~%" X)
(SB-LOOP::LOOP-DESETQ X (1+ X))
(GO SB-LOOP::NEXT-LOOP)
SB-LOOP::END-LOOP)))
Language Flexibility just allows you to create your own embedded language within a language and reduce the length of your program in terms of characters used. So in theory, this can make a language very unreadable since we can manipulate the syntax. For example we can create invalid code that's translated to valid code:
(defmacro backwards (expr)
(reverse expr))
BACKWARDS
CL-USER> (backwards ("hello world" nil format))
"hello world"
CL-USER>
Clearly the above code can become complex since:
("hello world" nil format)
is not a valid Lisp expression.
Thanks to SK-logic's answer for pointing me in the direction of Alan Blackwell. I sent him an email asking his stance on the matter, and he responded with an absolutely wonderful explanation. Here it is:
So the person who responded to your StackOverflow question, saying
that flexible syntax could be useful for DSLs, is certainly correct.
It actually used to be fairly common to use the C preprocessor to
create alternative syntax (that would be turned into regular syntax in
an initial compile phase). A lot of the early esolangs were built this
way.
In practice, I think we would have to say that a lot of DSLs are
implemented as libraries within regular programming languages, and
that the library design is far more significant than the syntax. There
may be more purpose for having variety in visual languages, but making
customisable general purpose compilers for arbitrary graphical syntax
is really hard - much worse than changing text syntax features.
There may well be interesting things that your design could enable, so
I wouldn’t discourage experimentation. However, I think there is one reason why
customisable syntax is not so common. This is related to the famous
programmer’s editor EMACS. In EMACS, everything is customisable - all
key bindings, and all editor functions. It’s fun to play with, and
back in the day, many of us made our own personalised version that
only we knew how to operate. But it turned out that it was a real
hassle that everyone’s editor worked completely differently. You could
never lean over and make suggestions on another person’s session, and
teams always had to know who was logged in order to know whether the
editor would work. So it turned out that, over the years, we all just
started to use the default distribution and key bindings, which made
things easier for everyone.
At this point in time, that is just about enough of an explanation that I was looking for. If anyone feels as though they have a better explanation or something to add, feel free to contact me.

A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data.
For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.
Basically, I need the random code to compile (or be interpreted) as much as possible - Meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could be ran).
Object Oriented-ness is not a must, but is a very strong advantage.
I thought of ASM, but it's very messy, and I'd probably need a more readable code
Thanks!
It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:
Then, instead of randomly changing the syntax, one can randomly change nodes in the tree instead. And if x should randomly change to, say, +, you can statically know that this means you need to insert two children (or not, depending on how you define +).
A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
I should also mention, though I don't know anything about it (not that I know much about ordinary genetic programming) the existence of linear genetic programming. This is sort of like the assembly model that you mentioned—a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language Palways as follows:
If p is a valid program in P, then p is a valid program in Palways whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in Palways whose meaning is the same as a program that immediately terminates.
For example, I could make the language C++always so that this program:
#include <iostream>
using namespace std;
int main() {
cout << "Hello, world!" << endl;
}
would compile as "Hello, world!", while this program:
Hahaha! This isn't legal C++ code!
Would be a legal program that just does absolutely nothing.
To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Javaalways, Smalltalkalways, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the Palways programming language for that language to eliminate syntactically but not semantically valid programs.
Divide the ASCII byte values into 9 classes (division modulo 9 would help). Then assign then to Brainfuck codewords (see http://en.wikipedia.org/wiki/Brainfuck). Then interpret as Brainfuck.
There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... This approach has a much better chance, compared to templatetypedef's answer, to get a nontrivial program from a random byte sequence.
Text Editors
You could try feeding random character strings to an editor like Emacs or VI. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not too readable.
Mathematica
In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].
I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things, redefine functions in run-time. It is actually very good for prototyping.
A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.
Here I describe my development environment:
Install lisp on my linux machine
PS: I want to give an example, where Common Lisp was useful for me:
Up to maybe 2004 I used to write small programs in C (the keep it simple Unix way).
The last 3 years I had to get lots of different hardware running. Motorized stages, scientific cameras, IO cards.
The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degree celsius or so and (in some SDKs) they don't like it when you close them. But this
is exactly how my C development cycle worked: write (30s), compile (1s), run (0.1s), repeat.
Eventually I decided to just use Common Lisp. Often it is straight forward to define the foreign function interfaces to talk to the SDKs and I can do this without ever leaving the running Lisp image. I start the editor in the morning define the open-device function, to talk to the device and after 3 hours I have enough of the functions implemented to set gain, temperature, region of interest and obtain the video.
Then I can often put the SDK manual away and just use the camera.
I used the same interactive programming approach when I have to parse some webpage or some weird XML.

Programming Language with Inflection

Is there a programming language that uses inflections (suffixing a word to add a certain meaning) instead of operators to express instructions? Just wondering.
What I am talking about is using inflections to add a meaning to an identifier such as a variable or type name.
For example:
native type integer
var x : integer = 12
var location : integers = 12, 5, 42
say 0th locationte to_string (( -te replaces "." operator. prints 12 ))
I think Perligata (Perl in Latin) is what you're looking for. :) From the article
There is no reason why programming
languages could not also use
inflexions, rather than position, to
denote lexical roles.
Here's an example program (Sieve of Eratosthenes):
#! /usr/local/bin/perl -w
use Lingua::Romana::Perligata;
maximum inquementum tum biguttam egresso scribe.
meo maximo vestibulo perlegamentum da.
da duo tum maximum conscribementa meis listis.
dum listis decapitamentum damentum nexto
fac sic
nextum tum novumversum scribe egresso.
lista sic hoc recidementum nextum cis vannementa da listis.
cis.
This is partially facetious, but... assembly language? Things like conditional jump instructions are often variations on a root ("J" for jump or whatnot) with suffixes added to denote the associated condition ("JNZ" for jump-if-not-zero, et cetera).
The excellent (dare I say fascinating) game-design language Inform 7 is inflected like English. But it's so closely integrated with a host of other design decisions that it's hard to peel away as a separate feature.
Anyone who is interested in language designs that are unusual but successful should check out Inform 7.
Presumably any programming language that uses natural language explicitly or closely as a basis, e.g., Natural-Language Programming. There was some research done at MIT into using English to produce high-level skeletons of programs, which is more in the realm of natural-language processing; the tool they created is called Metafor.
As far as I know, no existing language has support for, say, modifying or extending keywords with inflection. Now you've got me interested, though, so I'm sure I'll come up with something soon!
Of the 40 or so languages I know, the only thing that comes to mind is some rare SQL implementations which include friendly aliases. For example to select a default database after connecting, the standard is USE <some database name> but one I used somewhere which also allowed USING <some database name>.
FORTRAN uses the first letter of the name to determine the type of an implicitly-declared variable.
COBOL has singular and plural versions of its "figurative constants", e.g. SPACE and SPACES.
Python3.7 standard module contextvars has Context Variables, which can be used for inflection..

Are there programming languages that rely on non-latin alphabets?

Every programming language I have ever seen has been based on the Latin alphabet, this is not surprising considering I live in Canada...
But it only really makes sense that there would be programming languages based on other alphabets, or else bright computer scientists across the world would have to learn a new alphabet to go on in the field. I know for a fact that people in countries dominated by other alphabets develop languages based off the Latin alphabet (eg. Ruby from Japan), but just how common is it for programming languages to be based off of other alphabets like Arabic, or Cyrillic, or even writing systems which are not alphabetic but rather logographic in nature such as Japanese Kanji?
Also are any of these languages in active widespread use, or are they mainly used as teaching tools?
This is something that has bugged me since I started programming, and I have never run across someone who could think of a real answer.
Have you seen Perl?
APL is probably the most widely known. It even has a cool keyboard overlay (or was it a special keyboard you had to buy?):
In the non-alphabetic category, we also have programming languages like LabVIEW, which is mostly graphical. (You can label objects, and you can still do string manipulation, so there's some textual content.) LabVIEW has been used in data acquisition and automation for years, but gained a bit of popularity when it became the default platform for Lego Mindstorms.
There's a list on Wikipedia. I don't think any of them is really prevalent though. Many programmers can learn to write programs with english keywords even if they didn't understand the language. Ruby is a good example, you'll still see Japanese identifiers and comments in some Ruby code.
Well, Brainf* uses no latin characters, if you'll pardon the language...and the pun.
Many languages allow Unicode identifiers. It's part of standard Java, and both g++ (though you have to use \uNNNN escapes) and MSVC++ allow them (see also this question) And some allow using #define (or maybe better) to rename control structures.
But in reality, people don't do this for the most part. See past questions such as Language of variable names?, Should all code be written in English?, etc.
Agda.
Sample Snippet:
mutual
data ωChain : Set where
_∷_,_ : ∀ (x : carrier) (xω : ∞ ωChain) (p : x ≼ xω) → ωChain
head : ωChain → carrier
head (x ∷ _ , _) = x
_≼_ : carrier → ∞ ωChain → Set
x ≼ xω = x ≤ head (♭ xω)
Well, there's always APL. That has its own UNICODE characters, and I believe it used to require a special keyboard too.
There'is one langauge used in russian ERP system called after company, which developed it 1C. But it's identifiers and operators has english analogs.
Also, I know that haskell has unicode identifiers support, so you can write programs in any alphabet. But this is not useful (My native language is russian). It's quite enough that you have to type program messages and helpful comments in native alphabet.
Other people are answering with languages that use punctuation marks in addition to Latin letters. I wonder why no one mentioned digits 0 to 9 as well.
In some languages, and in some implementations of some languages, programmers can use a wide range of characters in identifiers, such as Arabic or Chinese characters. This doesn't mean that the language relies on them though.
In most languages, programmers can use a wide range of characters in string literals (in quotation marks) and in comments. Again this doesn't mean that the language relies on them.
In every programming language that I've seen, the language does rely on punctuation marks and digits. So this answers your question but not in the way you expect.
Now let's try to find something meaningful. Is there a programming language where keywords are chosen from non-Latin alphabets? I would guess not, except maybe for joke languages. What would be the point of inventing a programming language that makes it impossible for some programmers to even input a program?
EDIT: My guess is wrong. Besides APL's usage of various invented punctuation marks, it does depend on a few Greek keywords, where each keyword is one letter long, such as the letter rho.
I just found an interesting wiki for "esoteric programming languages".

Where did string escape codes (\n, \t...) originate?

Purely wondering... since they're still around and in use in C# today...
Where did the pattern of using string escape codes come from? What language did it first appear in? What languages, if any, have solved the problem in a different way?
I suspect that these escape codes originated in B, a high-level assembly programming language for the Honeywell 6000 GCOS operating system. This language was developed at Bell Labs based on a British language called BCPL. Because BCPL was rather wordy, the B developers simplified the syntax and added things like braces to replace BEGIN and END. That's where the name B came from, because it was an abbreviated form of BCPL.
Later on some people at Bell Labs created a language that was the successor to B, mainly by adding typing and a standard I/O library. Because it was B's successor, they chose the next letter in the name BCPL.
I do not recall seeing the backslash notation before B, and since C and UNIX inherited it from B, I thing that B is the origin of this notation, or more specifically, that Bell Labs was the origin. It's entirely possible that this notation was used in other Bell Labs software before B, since they were a prolific producer of software, much of which was distributed freely to universities such as the one which I attended in the mid 1970's.
By the way, the idea of an escape sequence existed long before that, dating back to the 19th century Baudot code which was a fixed length 5 bit binary code intended to replace variable length Morse code. Baudot had SI (Shift In) and SO (Shift Out) codes that escaped letters into their capital variation, just like the Shift key on a typewriter.

Resources