In my application there is language detection. Languages have some identification code like en for English, pt for Portuguese, fr for French, etc.
Is there a language code for the case when a language cannot be identified (unknown language)?
If your system allows for three-letter codes, the ISO 639-2 standard says:
If there is language content, but the specific language cannot be
determined, a special identifier is provided by ISO 639-2:
und (Undetermined)
The language codes you're using look like ISO 639-1. There is no ISO 639-1 code for an indeterminate or unknown language; however, "xx" could be used as as a reasonable placeholder.
You probably know it already, but at least for the benefit of others who may look your question up later:
In C, C++ and Unix/Linux generally, and sometimes elsewhere, there is something that goes by the cryptic name of "the C locale." This is really the absence of a locale, which is to say, the absence of a set of language- and country-related conventions. In the C locale, text is sorted in straight character-code (usually ASCII) order, which means that "ABC" precedes "XYZ" precedes "abc" precedes "xyz."
One needn't specify any code at all to get the C locale. It has no code -- a significant fact which is at least tangentially related to your question.
Related
I wanted to write some educational code in Haskell with Unicode characters (non-Latin) in the identifiers. (So that the identifiers look nice and natural for speakers of a natural language other than English which is not using the Latin characters in its writing.) So, I set out for finding an appropriate Haskell implementation that would allow this.
But where is this feature specified in the language specification? How would I refer to this feature when looking for a conforming implementation? (And which Haskell implemenations are known to actually support Unicode identifiers?)
It turned out that one Haskell implementation did accept my code with Unicode identifiers, whereas another one failed to accept it. I would like it if there were a way to formalize this requirement of my code, in a form of a language feature switch perhaps, so that if I or someone else tries to run my code, it would be immediately clear whether his implementation is missing the required feature and hence he should look for another one. (There could be also a wiki page for this feature--"Unicode identifiers", which would list which of the existing implementations support it, so that one would know where to go if one needs it.)
(BTW, I have put a "syntax" tag on this question, but I actually perceive it to be an issue of the level of lexing, a lower level than the syntax of a language. Is there a tag here for features of the lexing level of a language, rather than for features of the syntax specification of a language?)
The Online Report documents this under Lexemes. It also notes early on that "Haskell uses the Unicode character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell.".
Actual compilers may or may not support Unicode identifiers. GHC does, but you need to keep in mind that Unicode codepoints must obey the same rules as ASCII characters: types must start with a codepoint which is classed as uppercase or titlecase, variables as lowercase (although de facto this is relaxed to alphabetic and not uppercase/titlecase; this might be worth asking for a clarification from the language committee), operators must be punctuation or symbol. (This means that you can't declare types in Arabic, for example, unless you prefix them with a character in some other script that is uppercase/titlecase.)
As to collecting Unicode support information: while I don't know of a single page that provides it, searching for "unicode" on the Haskell Wiki finds information about Unicode support in a number of Haskell compilers.
Is there a programming language that uses inflections (suffixing a word to add a certain meaning) instead of operators to express instructions? Just wondering.
What I am talking about is using inflections to add a meaning to an identifier such as a variable or type name.
For example:
native type integer
var x : integer = 12
var location : integers = 12, 5, 42
say 0th locationte to_string (( -te replaces "." operator. prints 12 ))
I think Perligata (Perl in Latin) is what you're looking for. :) From the article
There is no reason why programming
languages could not also use
inflexions, rather than position, to
denote lexical roles.
Here's an example program (Sieve of Eratosthenes):
#! /usr/local/bin/perl -w
use Lingua::Romana::Perligata;
maximum inquementum tum biguttam egresso scribe.
meo maximo vestibulo perlegamentum da.
da duo tum maximum conscribementa meis listis.
dum listis decapitamentum damentum nexto
fac sic
nextum tum novumversum scribe egresso.
lista sic hoc recidementum nextum cis vannementa da listis.
cis.
This is partially facetious, but... assembly language? Things like conditional jump instructions are often variations on a root ("J" for jump or whatnot) with suffixes added to denote the associated condition ("JNZ" for jump-if-not-zero, et cetera).
The excellent (dare I say fascinating) game-design language Inform 7 is inflected like English. But it's so closely integrated with a host of other design decisions that it's hard to peel away as a separate feature.
Anyone who is interested in language designs that are unusual but successful should check out Inform 7.
Presumably any programming language that uses natural language explicitly or closely as a basis, e.g., Natural-Language Programming. There was some research done at MIT into using English to produce high-level skeletons of programs, which is more in the realm of natural-language processing; the tool they created is called Metafor.
As far as I know, no existing language has support for, say, modifying or extending keywords with inflection. Now you've got me interested, though, so I'm sure I'll come up with something soon!
Of the 40 or so languages I know, the only thing that comes to mind is some rare SQL implementations which include friendly aliases. For example to select a default database after connecting, the standard is USE <some database name> but one I used somewhere which also allowed USING <some database name>.
FORTRAN uses the first letter of the name to determine the type of an implicitly-declared variable.
COBOL has singular and plural versions of its "figurative constants", e.g. SPACE and SPACES.
Python3.7 standard module contextvars has Context Variables, which can be used for inflection..
Every programming language I have ever seen has been based on the Latin alphabet, this is not surprising considering I live in Canada...
But it only really makes sense that there would be programming languages based on other alphabets, or else bright computer scientists across the world would have to learn a new alphabet to go on in the field. I know for a fact that people in countries dominated by other alphabets develop languages based off the Latin alphabet (eg. Ruby from Japan), but just how common is it for programming languages to be based off of other alphabets like Arabic, or Cyrillic, or even writing systems which are not alphabetic but rather logographic in nature such as Japanese Kanji?
Also are any of these languages in active widespread use, or are they mainly used as teaching tools?
This is something that has bugged me since I started programming, and I have never run across someone who could think of a real answer.
Have you seen Perl?
APL is probably the most widely known. It even has a cool keyboard overlay (or was it a special keyboard you had to buy?):
In the non-alphabetic category, we also have programming languages like LabVIEW, which is mostly graphical. (You can label objects, and you can still do string manipulation, so there's some textual content.) LabVIEW has been used in data acquisition and automation for years, but gained a bit of popularity when it became the default platform for Lego Mindstorms.
There's a list on Wikipedia. I don't think any of them is really prevalent though. Many programmers can learn to write programs with english keywords even if they didn't understand the language. Ruby is a good example, you'll still see Japanese identifiers and comments in some Ruby code.
Well, Brainf* uses no latin characters, if you'll pardon the language...and the pun.
Many languages allow Unicode identifiers. It's part of standard Java, and both g++ (though you have to use \uNNNN escapes) and MSVC++ allow them (see also this question) And some allow using #define (or maybe better) to rename control structures.
But in reality, people don't do this for the most part. See past questions such as Language of variable names?, Should all code be written in English?, etc.
Agda.
Sample Snippet:
mutual
data ωChain : Set where
_∷_,_ : ∀ (x : carrier) (xω : ∞ ωChain) (p : x ≼ xω) → ωChain
head : ωChain → carrier
head (x ∷ _ , _) = x
_≼_ : carrier → ∞ ωChain → Set
x ≼ xω = x ≤ head (♭ xω)
Well, there's always APL. That has its own UNICODE characters, and I believe it used to require a special keyboard too.
There'is one langauge used in russian ERP system called after company, which developed it 1C. But it's identifiers and operators has english analogs.
Also, I know that haskell has unicode identifiers support, so you can write programs in any alphabet. But this is not useful (My native language is russian). It's quite enough that you have to type program messages and helpful comments in native alphabet.
Other people are answering with languages that use punctuation marks in addition to Latin letters. I wonder why no one mentioned digits 0 to 9 as well.
In some languages, and in some implementations of some languages, programmers can use a wide range of characters in identifiers, such as Arabic or Chinese characters. This doesn't mean that the language relies on them though.
In most languages, programmers can use a wide range of characters in string literals (in quotation marks) and in comments. Again this doesn't mean that the language relies on them.
In every programming language that I've seen, the language does rely on punctuation marks and digits. So this answers your question but not in the way you expect.
Now let's try to find something meaningful. Is there a programming language where keywords are chosen from non-Latin alphabets? I would guess not, except maybe for joke languages. What would be the point of inventing a programming language that makes it impossible for some programmers to even input a program?
EDIT: My guess is wrong. Besides APL's usage of various invented punctuation marks, it does depend on a few Greek keywords, where each keyword is one letter long, such as the letter rho.
I just found an interesting wiki for "esoteric programming languages".
Purely wondering... since they're still around and in use in C# today...
Where did the pattern of using string escape codes come from? What language did it first appear in? What languages, if any, have solved the problem in a different way?
I suspect that these escape codes originated in B, a high-level assembly programming language for the Honeywell 6000 GCOS operating system. This language was developed at Bell Labs based on a British language called BCPL. Because BCPL was rather wordy, the B developers simplified the syntax and added things like braces to replace BEGIN and END. That's where the name B came from, because it was an abbreviated form of BCPL.
Later on some people at Bell Labs created a language that was the successor to B, mainly by adding typing and a standard I/O library. Because it was B's successor, they chose the next letter in the name BCPL.
I do not recall seeing the backslash notation before B, and since C and UNIX inherited it from B, I thing that B is the origin of this notation, or more specifically, that Bell Labs was the origin. It's entirely possible that this notation was used in other Bell Labs software before B, since they were a prolific producer of software, much of which was distributed freely to universities such as the one which I attended in the mid 1970's.
By the way, the idea of an escape sequence existed long before that, dating back to the 19th century Baudot code which was a fixed length 5 bit binary code intended to replace variable length Morse code. Baudot had SI (Shift In) and SO (Shift Out) codes that escaped letters into their capital variation, just like the Shift key on a typewriter.
I realize most languages support multiple languages, but every language I've seen has always been more-or-less US-centric. By that, I mean the keywords, standard library functions, etc. all have english names. So, as a programmer, you still really need to know at least some english to make sense of it.
Are there any truly "multi-lingual" languages out there with support for language keywords and such in multiple languages?
This is generally a horrible idea, as anyone who's worked in a localized IDE can attest to. Programmers rely heavily on having one common vocabulary. When the compiler gives me the error "missing type specifier - int assumed", I can share this exact error message with others, for example here on SO, and it will be familiar to those others so they can tell me what it means. If the compiler instead generated error messages in Danish, I'd be limited to getting help from the relatively few programmers who speak Danish.
Suddenly my vocabulary is no longer the same as someone in the same position in Germany, France or Japan. We can no longer exchange code, bugs, bug fixes or ideas.
A developer in Spain wouldn't be able to use my code because it was literally written in another language. And if I had trouble with my code, others would be helpless to debug it, because it wouldn't even compile under their localization settings (and if it did, it'd still be unreadable to them).
Ultimately, a programming language is a language. It may have borrowed some words from English, but it is not English, and you do not need to understand English to program in it, any more than I need to understand latin in order to speak English (English borrows latin words as well).
You might as well ask for a multi-lingual English. What would be the point? Yes, it would in theory allow people who didn't speak English to... speak English. It just wouldn't be the same English as every other English-speaker speaks, so it wouldn't actually enable communication between them.
The keyword if in a programming language is not the same as if in the English language. They mean different things, even though one was obviously inspired by the other.
The delegate keyword in C# does not mean the same thing as "delegate" in English. Nor does while, return or "constructor". They are not english words, they are keywords or concepts in C++, Java, C#, Python or any other programming language.
Sounds like a bad idea to me. If I'm writing a program, how am I to know that the variable name I'm typing is actually a keyword in Bulgarian or Korean as transliterated? Do I have to deal with thousands of keywords, or do I have problems combining two routines written by my Swedish and Egyptian colleagues?
Just realize that programming keywords are in English, just like music keywords are in Italian.
This seems like a good place to start: Non-English-based programming languages.
There's a few interesting ones on there, like Python translated to Chinese.
You can make use of the C/C++ preprocessor to redefine all the keywords - and some people have done this. I came across it when working as a trainer/mentor for a Norwegian company. Some bright spark had implemented aheader that translated all the C keywords into Norwegian and enforced its use. The Norwegian staff, all of whom spoke excellent English (or I couldn't have earned my crust with them) all hated it and it died a death.
I've also worked fairly extensively in the Netherlands, and most of the programmers there seem to program in English. The only people I've come across who are resistant to the English hegemony in programming languages are (needless to say) the French.
There is one area where a localized language may be useful and helpful and these are DLSs (Domain Specific Languages) that were designed to be used by non-programmers. Those languages can surely benefit from being localized since business users from non-English speaking countries often don't know English as well as programmers do.
Such localized DSLs can prove advantageous to programmers as well if they deal with a lot of non-translatable terms. One rather successful system I've encountered was used to calculate salaries for personnel in the Israeli military. It used a Hebrew-based syntax together with hundreds of terms that can only be properly expressed in Hebrew. In that particular case the standard logic keywords if, then, else, etc. were translated to Hebrew and the entire code editor was right-to-left. A very large body of business logic is maintained in this manner to this day and, IMHO, rightly so.
It seems like it would be notoriously difficult, unless it was a community effort, but for some languages I don't see why you can't make an existing language multi-lingual, but creating custom libraries that localize standard libraries.
For example, in Java, you can create
public class HoweverYouSayExampleObjectInYourLanguage extends ExampleObjectName {
}
and then create wrapper functions / methods with names in your target language, but which basically call existing standard methods
private void HoweverYouSayExampleMethodInYourLanguage(parameters) {
this.ExampleMethod(parameters);
// Some error handling code
}
If you do error handling properly, the stack trace / errors will all reference errors in the standard libraries, unless it was an error speficially with the implementation of your localization library - in which case that should be pointed out via proper error handling in the localization library itself.
The disadvantage would be, as other people have mentioned, sharing source code. If we were all on the same page with an IDE for a given language - I don't see a reason why you couldn't build a really internationalized IDE in which the source code you see on the screen isn't the REAL source code per se, but a local rendering of the real source code via some form of mapping.
I'm going to go ahead and say that everything I just said above is probably at best an okay idea because function names aren't nearly as important as localized documentation for libraries and APIs. something which in my experience is done terribly or not at all for common programming languages / contexts.
You can program Perl in Latin.
Don't try to code in the natural language, that's useless. Learn the "programming" language instead.
For instance, the "switch" word didn't mean anything to me in English, but it was an instruction to decide over several choices.
Later ( when I learn english ) I thought.. Hey, this is funny, English do have a "switch" word too, just like C. ( doh! )
:)
No matter how good or bad your English is, you can't say to java
import java.util.* into my CD-ROM;
Because it is not a valid syntax.
What about languages like APL and J? The keywords in APL are all single symbols; unfortunately, most of these are not on your keyboard, so J came along and replaced most of them with ASCII representations (made up of more than one character in many cases).
Sorted!
Sorted! is bilingual. It can understand both english and german code. To my knowledge, Sorted! is the only programming language that can do this, in the world.
Any useful ones? That's a better question.
To a significant extent, you can program in prolog in any unicode script (because it is a symbolic language). There's a (tiny, weeny) catch - variables are signified by an initial capital Roman letter in all the prolog compilers you are likely to come across and you'll have to redefine the built-ins (but prolog makes this relatively easy*).
I think an example will illustrate what I mean best:
% an algorithm for finding easter dates, given year (as first argument)
復活節( V1, V2, V3) :-
A 是 (V1 mod 19),
B 是 V1// 100,
C 是 V1 mod 100,
D 是 B // 4,
E 是 B mod 4,
F 是 (B + 8) // 25,
G 是 (B - F + 1) // 3,
H 是 (19*A + B - D - G + 15 )mod 30,
I 是 C // 4,
J 是 C mod 4,
K 是 (32 + 2*E + 2*I - H - J) mod 7 ,
L 是 (A + 11*H + 22*K) // 451,
M 是 (H + K - 7*L + 114) // 31,
N 是 (H + K - 7*L + 114)mod 31,
V2 是 M,
V3 是 N + 1.
/*
Example test:
?- 復活節( 2013, V2 , V3).
V2 = 3 ,
V3 = 31
i.e. Easter this year will be on 31st March
*/
this is what I used to redefine the build in 'is' operator (don't shoot me if it's imperfect):
:-op(500,xfy,是).
是(X,Y):-is(X,Y).