Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Disclaimer: Yes I know this will take 3 years, at least.
I am looking forward to writing a new interpreted programming language. I have a quite solid idea of what I want in terms of dynamicness, syntax, object model, etc, etc.
Now that I have the idea, I have a few questions before I start:
Should I begin writing the full specification and then implement, or write them both all along?
I'm still undecided between C and C++. C++ would allow for a cleaner design and faster development, while C would (maybe) ensure portability to more platforms (microprocessors?). Performance is a must.
Should I try to interest people in the project before the first working prototype so they can cooperate (the end product will be under a liberal license anyway), or keep working alone until I have something that runs?
How modular should it be? I am sure that I won't immediately start working on a bytecode interpreter, but on something easier to implement though slower, so modularity is a must in order to be able to extend it later. But I guess overdoing it will hamper performance and clarity.
The answers to your questions depend largely on why you're doing this, that is, on your primary reason. Are you trying to create the next Ruby, or is this a learning exercise?
1. Specification: If this is a personal project, this is not as important. PHP gets a bad rap for having been developed "on the fly," yet many people use it every day. A more complete spec will probably help get people involved if/when you want help.
2. If you want cross-platform and performance, C is the way to go.
3. If you want people to join in, prove something first. Write a killer-cool application with your language and blog/talk about why your language is different/special/better.
4. Modularity of what, the language itself or the compiler? If you want to extend the language, a good spec will help (see #1). The compiler should be designed with all the best practices in mind, which should help make it extensible.
I hear the Dragon Book is good for learning to develop compilers.
Your specification will be broken unless you write it hand-in-hand with the implementation.
If you think C++ would give you cleaner design and faster development, you should probably use it.
You will have difficulty getting anyone interested in a project unless there is something that runs and demonstrates what is unique about your language.
If you think your language will ever require a byte-code interpreter (and you do say "Performance is a must") you should investigate the capabilities of existing byte-code interpreters before you finalize your language design.
I think you have set yourself too many goals. You say "performance is a must" but in a comment reply you say your goal is "to learn a lot about language design" and that it is "pretty unlikely" that you'll use it in a real project. New programming languages are created to solve problems; more precisely, they're created to help people express solutions to problems in better ways. Designing a language without using it seriously, intensely, continually is like writing software without any test cases: you're likely to wind up with something unusable.
If you want to try your hand at language design, then find a problem---one that you care about---that existing languages won't let you solve the way you want. Then do whatever you can to get a working implementation and start writing and running programs using it. You don't need a hand-crafted JIT compiler with a runtime written in highly bummed assembly code. If you target the JVM or .NET, you get a very high-performance GC, scalable threading system, libraries, and lots of other good stuff for free, even if it interferes with that awesome idea you had for ______.
On the other hand, if you just want to make something run fast, don't try to design a language at the same time. Just find one that you like, learn about implementation strategies, and see if you can do better.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently working on the topic of programming languages and interpreter design. I have already created several programming languages but couldn't reach my goal so far:
Create a programming language which focuses on giving the programmer a good feeling when writing code in it. It should just be fun and/or interesting and in no case annoying to write something in it.
I get this feeling when writing code in Python. I sometimes get the opposite with PHP and in rare cases when having to reinvent some wheel in C++.
So I've tried to figure out some syntactical features to make programming in my new language fun, but I just can't find any.
Which concrete features, maybe mainly in terms of syntax, do/could make programming in a language fun?
Examples:
I find it enjoyable to program in Ruby because of its use of code blocks.
It would be nice if you could include exactly one example in your answer
Those features do not have to already exist in any language!
I'm doing this because I have experienced extreme rises in (my own) productivity when programming in languages I love (because of particular features).
You mentioned Ruby in your question. AFAIK, Ruby is the only programming language for which Joy is an actual, stated, explicit design goal. (In fact, it is the only design goal.)
The reason that Yukihiro Matsumoto was able to design Ruby this way is that he already knew and used tons of programming languages before he started designing Ruby, and learned tons more in order to design Ruby. (Interestingly, he didn't know Python, and has said that he probably wouldn't have created Ruby if he had.)
Here's just a tiny fraction of the languages that matz has either used himself or looked at for inspiration (or in some cases for inspiration on what not to do):
CLU
Sather
Lisp
Scheme
Smalltalk
Perl
Python
Haskell
Scala
PHP
C
C++
Java
C#
Objective-C
Erlang
And I believe that this is one way that good programming languages can be designed (what Larry Wall calls postmodernist language design): Throw away everything that didn't work in the past, take everything that worked and combine that tastefully.
Of course, this requires that you actually know all those languages from which you want to "steal" and in particular, it requires that you know lots of very different languages with different paradigms, different concepts and different "feels", otherwise the idea pool from which you steal is rather small and inbred.
Consistency.
It's the feeling that you already know something when you use an API or feature you've never used before. It also makes you more productive, as you don't have to learn something new for the sake of it.
I think this is also one of the Ruby 'likes', in that if you follow the naming convention, things start to 'just work' without bindings and glue and suchlike.
For example, using the STL in C++, many of the algorithms are the same for all containers, even strings. That makes it nice to use... except for those parts that do not follow the same API (e.g. vector of bools), where the difference is more noticeable.
Two things to keep in mind are orthogonality and the principle of least surprise.
A programming language should make it easy to write correct programs and difficult (if not impossible) to write incorrect programs. For instance, in Java
long x = 2000000000 + 2000000000;
overflows, while
long x = 2000000000L + 2000000000;
doesn't. Is this obvious? I don't think so. Does anyone ever want something to overflow? I don't think so.
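To see the difference for yourself, here is a small Java sketch. The class and method names are made up for illustration; `Math.addExact` is a standard library method (Java 8 and later) that throws instead of wrapping, which is one way to get the "never silently overflow" behaviour today.

```java
// Illustrative sketch: OverflowDemo and its method names are hypothetical.
final class OverflowDemo {
    // Both literals are ints, so the addition is done in 32-bit arithmetic
    // and wraps around before the result is widened to long.
    static long intSum() {
        return 2000000000 + 2000000000;
    }

    // One long operand promotes the whole addition to 64-bit arithmetic.
    static long longSum() {
        return 2000000000L + 2000000000;
    }

    // Math.addExact throws ArithmeticException instead of silently wrapping.
    static boolean checkedSumOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(intSum());   // -294967296
        System.out.println(longSum());  // 4000000000
        System.out.println(checkedSumOverflows(2000000000, 2000000000)); // true
    }
}
```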
Hilarity.
http://lolcode.com/
Follow common practices (like using + for addition, & for bitwise/logical and)
Group logically similar code in namespaces
Have an extensive string processing library
Incorporate debugging facilities
For a cross-platform language, try to minimize platform differences as much as possible
A language feature that appears simple and easy to learn surprises and delights the programmer with its unexpected power. I nominate Haskell type classes :-)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
I've been writing software in Java for many years now, but it was always for internal applications that would be deployed to a server. I'd like to get into writing desktop applications now but I don't know where to start. I've written a few Java/Swing applications but again they were for internal use.
My understanding is that Java and other semi-compiled and interpreted languages are too easy to reverse engineer, making them unsuitable for commercial software. I am aware that there are compilers for Java and some other interpreted languages, but I've also heard that they are pricey and/or unreliable.
Assuming I start a microISV and wish to develop and sell applications to a broad audience, what's my best bet? I would prefer something that can be written close to once, and compiled for different operating systems but I am not opposed to .NET and a Windows-only audience if other languages would compromise the experience (installation ease & user experience) in Windows. My only issue there is that I don't have a large starting budget and paying out the wazoo for the required development tools is not really in the cards.
Why would people want to reverse engineer your software? They might pirate it, but you can't prevent pirating no matter what language you use. I doubt you have a top-secret algorithm that you're trying to hide either, in which case reverse-engineering might be an issue.
You should go with whatever you know best, and Java can work just fine.
If you are intent on switching to another language, I recommend taking a look at Qt. Qt is a free and open-source cross-platform toolkit for C++ that allows you to write applications that will compile and run on Windows, Mac, and Linux with minimal effort. You CAN write commercial software for free with Qt under its LGPL license.
Edit: GCJ compiles Java to native code, but only supports Java 1.4.
Well, if you're trying to be an Independent Service Vendor -- and not a Software Vendor -- then in a sense it doesn't matter if you use a language like Java which can be decompiled. Because you'd be selling yourself as the best person to integrate and customize the software for your clients. The software is the delivery mechanism for the thing that will actually make you the money: you and your skills. Plenty of companies make a profit by giving away their software for free and contracting their services to set it up for their clients. You can mitigate the Java decompiling issue somewhat by using an obfuscator, but it's kind of fighting the wrong battle.
If you intend to make your money selling software and not service, then Java would be a relatively risky route to take.
It all depends on your business plan.
If you are starting a one-man company, then you are selling your personal expertise. So the language you use must be the one (or maybe two) that you are most familiar with and expert in. I'm surprised you felt it necessary to ask this.
Any code can be decompiled to some degree. I think you can obfuscate Java to a degree that will deter the casual user... but I think the other people hit the nail on the head. Of all the reasons not to use Java, the ease of decompiling should be very, very low on your list. If that is all that is stopping you, go for it! Google "Java obfuscator" and you will find something.
I'm skeptical about the risk of reverse engineering a complex piece of software written in Java, but for purposes of your question I'm willing to stipulate it. I assume the same issues rule out any other language that is implemented only on the JVM.
The most salient aspects of Java are
Static type system
Class-based object system
Automatic memory management
No freestanding functions or modules outside the class/interface system
Generics
This combination could be replicated in a language like C#, but I assume the same objections you have about distributing JVM bytecode also apply to MSIL bytecodes.
I'm having a hard time coming up with a language that has all these features. Here are some nearby languages:
C++ has everything except automatic memory management, plus it allows freestanding functions. However the C++ generic mechanism (templates) is not for the faint of heart, and it doesn't (yet) support modular typechecking. Lots more flexibility than Java but also lots more ways to shoot your foot off. Use with caution.
Modula-3 has all of the above but it's essentially a dead language, plus like C++ there's no modular type checking for the generics.
I'm not familiar enough with Eiffel to be able to make good comparisons, but I think it's worth looking into.
Delphi may also be worth looking into. It seems to have everything above except generics. It's primarily a proprietary Windows environment (formerly known as Object Pascal), but there seems to exist an open-source 'Free Pascal' compiler that supports Delphi.
There are many object-oriented languages with automatic memory management and dynamic typing, among which one might highlight Ruby, Python, and Smalltalk. None of these really compiles well and reliably to standalone native machine code, although all push toward some form of experimental compilation. And they are all dynamically typed, which is quite different from what you're used to.
If I were in your position I would probably go ahead and use Java and accept some risk of reverse engineering. Decompilers aren't as wonderful as you might think, and they don't produce wildly maintainable code, either. But if you really want to be able to produce native machine code, I would investigate Delphi and Eiffel. (I myself would use Modula-3, but that's because I once invested substantial effort in learning it. It's a very well designed language for its niche, but the user community is about gone and I think it's a dead letter. Pity.)
I was wondering if programming languages and frameworks get updated in small increments or are they just x.0 releases? And if they do how do you keep up on all the changes in every update? I am specifically interested in Objective-C and Cocoa and CocoaTouch.
I'm learning from books, online PDFs, etc., but often they are at best a few years old. I would just like to know if there have been any changes that should concern me, and even if not, inevitably there will be, so where can I look out for them?
The lifecycle of languages (as distinct from compilers) depends a lot on the language in question.
Some examples:
C and C++ go many years between standards updates. Each update is significant, but is expected to be complete and self-consistent.
The Perl and Python cycles have been running faster, with new features accumulating in the languages without a change of major version number as long as they don't break backward compatibility.
Individual compilers and interpreters can undergo a steady stream of feature/performance/stability updates while consuming the same input language.
Yes. Programming languages are updated quite often, though it depends a bit on the language. For instance, a major new version of Java with new language features is released every 18 months, and in between there are minor releases that do not add new features.
I think if you are learning a language the basic concepts of the language will not change over a very long time so you are ok with material that is a few years old.
Different languages have different release schedules. Generally, at the beginner level, a book that is a release or two behind isn't all that bad. Just check in with the various websites and maintainer organizations to keep up to date on versions. For Objective-C, the maintainer is Apple.
Many languages are extended every year or two. To preserve backwards compatibility, there are usually few 'breaking changes' (changes that can break existing code). So when starting out with a language, it is not a disaster to use an older reference book, as its code will still be valid.
To keep up to date with Objective-C, subscribe to the appropriate RSS feed or join the mailing list.
Good luck!
I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every language and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible, even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there is a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful, and no one wants to implement all that in yet another language. Try to borrow, make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM, as itowlson suggested, is probably a good way to get started. If that turns out to be a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly-like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From there I designed a simple bytecode format and assembly syntax and proceeded to write my interpreter + assembler.
Like I said, it was a very simple language.
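To give an idea of how small such a thing can be, here is a minimal sketch of a stack-based bytecode interpreter plus a one-pass assembler. It is written in Java rather than C#, and it is not the project described above: the opcode set (PUSH, ADD, MUL, HALT), the mnemonics, and the `TinyVm` name are all invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy VM: opcodes and mnemonics are assumptions for this sketch.
final class TinyVm {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Assembler: turns one mnemonic per line into a flat int[] of bytecode.
    static int[] assemble(String source) {
        List<Integer> out = new ArrayList<>();
        for (String line : source.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts[0].isEmpty()) continue; // skip blank lines
            switch (parts[0]) {
                case "push": out.add(PUSH); out.add(Integer.parseInt(parts[1])); break;
                case "add":  out.add(ADD);  break;
                case "mul":  out.add(MUL);  break;
                case "halt": out.add(HALT); break;
                default: throw new IllegalArgumentException("unknown mnemonic: " + parts[0]);
            }
        }
        int[] code = new int[out.size()];
        for (int i = 0; i < code.length; i++) code[i] = out.get(i);
        return code;
    }

    // Interpreter: a fetch-decode-execute loop over an operand stack.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next instruction
        while (true) {
            int opcode = code[pc++];
            switch (opcode) {
                case PUSH: stack.push(code[pc++]); break; // operand follows the opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop(); // result is the top of the stack
                default: throw new IllegalStateException("bad opcode " + opcode + " at " + (pc - 1));
            }
        }
    }

    public static void main(String[] args) {
        int[] code = assemble("push 2\npush 3\nadd\npush 4\nmul\nhalt");
        System.out.println(run(code)); // 20
    }
}
```

Only commutative operations are included, so operand order on the stack does not matter here; subtraction or division would need the two pops to be ordered explicitly.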
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation issue as a design decision which affects implementation: if you make every statement of your language have a unique parse tree without context, you'll get something for which it's easy to hand-code a parser and which doesn't require large amounts of work to provide syntax highlighting for. Similarly, simple things like using a different symbol for module namespaces and object namespaces (unlike Java, which uses . for both package and class namespaces) mean you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language as possible in the language itself is an option, but projects which start out doing so either give up (Rubinius moved to using C++ for parts of its standard library) or are only for research purposes (Mozilla Narcissus).
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
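For a feel of what hand-written recursive descent looks like, here is a minimal sketch in Java (not C++, and not taken from GCC or any real compiler). It assumes a toy grammar of integer arithmetic with parentheses, given in the comments; each grammar rule becomes one method, and the class name `RecursiveDescent` is hypothetical.

```java
// Hypothetical toy parser: evaluates while parsing instead of building a tree.
final class RecursiveDescent {
    private final String src;
    private int pos = 0; // index of the next unread character

    private RecursiveDescent(String src) { this.src = src; }

    static long parse(String src) {
        RecursiveDescent p = new RecursiveDescent(src);
        long value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return value;
    }

    // expr := term (('+' | '-') term)*
    private long expr() {
        long value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*
    private long term() {
        long value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor := NUMBER | '(' expr ')'
    private long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw error("expected a number");
        return Long.parseLong(src.substring(start, pos));
    }

    // Consumes the character c if it is next (ignoring whitespace).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }
}
```

With this sketch, `RecursiveDescent.parse("1 + 2 * 3")` yields 7 and `RecursiveDescent.parse("(1 + 2) * 3")` yields 9, because operator precedence falls out of which rule calls which.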
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
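"Walking the syntax tree" can be sketched roughly like this in Java. The node types (`Num`, `BinOp`) and the `TreeWalker` name are made up for illustration, and the tree is built by hand here where a real front end would have the parser produce it.

```java
// Hypothetical tree-walking interpreter for integer arithmetic.
final class TreeWalker {
    // Every syntax-tree node knows how to evaluate itself.
    interface Node {
        long eval();
    }

    // Leaf node: a literal number.
    static final class Num implements Node {
        final long value;
        Num(long value) { this.value = value; }
        public long eval() { return value; }
    }

    // Interior node: evaluates both children, then applies the operator.
    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public long eval() {
            long l = left.eval();
            long r = right.eval();
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                case '/': return l / r;
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }
    }

    // Hand-built tree for (1 + 2) * 3.
    static Node sample() {
        return new BinOp('*', new BinOp('+', new Num(1), new Num(2)), new Num(3));
    }

    public static void main(String[] args) {
        System.out.println(sample().eval()); // 9
    }
}
```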
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ helps to make them portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: Look at some quite different languages first, before designing a new language (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunately I looked at those other languages first (or you could also say unfortunately, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (i.e. Forward Polish or Reverse Polish) you don't need a parser at all! With Reverse Polish, the dependencies go right-to-left, so you simply execute each token as it's scanned. With Forward Polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i.e. a parser) has to perform. Only the in-order notation "requires" a recursive descent to reconstruct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking" and less "implementing".
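As a rough illustration of the Reverse Polish case, here is a minimal Java sketch. It assumes whitespace-separated tokens, integer operands, and the four basic operators; the `RpnEvaluator` name is made up. There is no parser: each token is executed the moment it is scanned, using only a stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical evaluator for whitespace-separated Reverse Polish programs.
final class RpnEvaluator {
    static long evaluate(String program) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String token : program.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    if (stack.size() < 2) {
                        throw new IllegalArgumentException("missing operand before '" + token + "'");
                    }
                    long right = stack.pop(); // pushed last, so it is the right operand
                    long left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Long.parseLong(token)); // anything else must be a number
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("program left " + stack.size() + " values on the stack");
        }
        return stack.pop();
    }

    private static long apply(String op, long left, long right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right; // only "/" remains
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3 4 + 2 *"));  // 14
        System.out.println(evaluate("10 2 3 * -")); // 4
    }
}
```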
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a little sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this. The above article talks about using two tools (Flex and Bison) that can be used to turn text into code you can feed to a compiler.
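To make the "tokens" half of that concrete without Flex, here is a rough hand-rolled equivalent in Java using the standard `java.util.regex` package. The three token kinds (NUMBER, IDENT, OP) and the `TinyLexer` name are assumptions for this sketch; a Flex specification would express the same rules declaratively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lexer: splits source text into "KIND:text" tokens.
final class TinyLexer {
    // One alternative per token rule; leading whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<NUMBER>\\d+)|(?<IDENT>[A-Za-z_]\\w*)|(?<OP>[-+*/=()]))");

    static List<String> tokenize(String src) {
        String input = src.trim(); // so trailing whitespace is not an error
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) { // no rule matches at the current position
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            if (m.group("NUMBER") != null) tokens.add("NUMBER:" + m.group("NUMBER"));
            else if (m.group("IDENT") != null) tokens.add("IDENT:" + m.group("IDENT"));
            else tokens.add("OP:" + m.group("OP"));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [IDENT:x, OP:=, NUMBER:3, OP:+, NUMBER:42]
        System.out.println(tokenize("x = 3 + 42"));
    }
}
```

The parser (the Bison half) would then consume this token list and check it against the grammar rules.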
First I spent a year or so actually thinking about what the language should look like. At the same time I helped develop Ioke (www.ioke.org) to learn language internals.
I chose Objective-C as the implementation platform as it's a fast (enough), simple and rich language. It also provides a test framework, so an agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on the syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building the scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run simple programs, as long as they fit in one file, that is :)
Of course older techniques are still common (e.g. using Flex and Bison), but many newer language implementations combine the lexing and parsing phases by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are also built using the ANTLR framework.
Use Bison/Flex, which are the GNU versions of yacc/lex. This book is extremely helpful.
The reason to use Bison is that it catches any conflicts in the language. I used it and it made my life many years easier (OK, so I'm on my 2nd year, but the first 6 months were a few years ago, writing it in C++, and the parsing/conflicts/results were terrible! :( ).
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com