What is Wasabi? - programming-languages

On Joel Spolsky's blog, I often read something about "Wasabi" and if I check Wikipedia, it mentions that it's an in-house programming language.
What is it? Why do they use it? And why isn't it public?

I would start with The Origin of Wasabi:
As the primary developer and
maintainer of Wasabi, I've wanted to
write a series of articles on Wasabi
for awhile, and last week, I decided
to talk to Joel about it. Today, I
will start off with a short history of
the language, and later, I will talk
about some of its cooler features and
where we want to take it in the
future.

Here's a recent update from 2013:
In the beginning, there was Thistle. Thistle was, at best, a glorified
regex that converted ASP to PHP. It was written by an intern, and it
showed.
Later, Thistle was broadened out into an ASP to PHP compiler. Compiler
was still a loose term; there was still a lot of regex magic that
relied on you following Hungarian apps notation. That said, I'm fairly
confident that this version of Thistle did build an AST for code
generation, which does mean it qualified as a real compiler.
That matters because this version of Thistle was extended out with two
additional features: it could compile VBScript to JavaScript, and it
added some conveniences to VBScript, such as macros (called picture
functions (don't ask)), lambdas, and simplifications to the
declaration system. Don't laugh too much at the former; the motivation
was the same as for RJS or Seaside's JavaScript support. All three
technologies are dead now, and for good reason, but it was modish at
the time.
Later, when .NET came out, and VBScript was end-of-lifed, that left us
with the option of rewriting the whole thing...or making a "real"
compiler that could compile VBScript to .NET. Wasabi was born. Wasabi
was written as a proper compiler that could translate VBScript to C#
and (for legacy reasons; see above) JavaScript. Wasabi, unlike
Thistle, was a real, full-blown compiler, in a CS sense, so it was
feasible to add type inference, lambda expressions, and several other
niceties, without spending too much effort. That said, the goal here
was to allow transition: new components, with restrictions, could be
written in C#. The idea was that, at least hopefully, Wasabi would
gradually be deprecated.
So no. It was never meant to be a new language. It was always intended
to be a stepping stone, a translator between languages, not a real
language in itself. While it gained some additional features, that was
to make working in the damn thing palatable--not to be a real language
in its own right. Emitting C# and .NET IL are actually about equally
easy if you have a real compiler, but Wasabi always emitted C#,
specifically so that we could one day ditch the whole thing.
https://news.ycombinator.com/item?id=5281930
See also https://news.ycombinator.com/item?id=5281322

Its the language Fog Creek made and uses. Its based on VBScript.

Related

Does this language have its niche | future?

I am working on a new language, targeted for web development, embeding into applications, distributed applications, high-reliability software (but this is for distant future).
Also, it's target to reduce development expenses in long term - more time to write safer code and less support later. And finally, it enforces many things that real teams have to enforce - like one crossplatform IDE, one codestyle, one web framework.
In short, the key syntax/language features are:
Open source, non-restrictive licensing. Surely crossplatform.
Tastes like C++ but simpler, Pythonic syntax with strict & static type checking. Easier to learn, no multiple inheritance and other things which nobody know anyway :-)
LLVM bytecode/compilation backend gives near-C speed.
Is has both garbage collection & explicit object destruction.
Real OS threads, native support of multicore computers. Multithreading is part of language, not a library.
Types have the same width on any platform. int(32), long(64) e.t.c
Built in post and preconditions, asserts, tiny unit tests. You write a method - you can write all these things in 1 place, so you have related things in one place. If you worry that your class sourcecode will be bloated with this - it's IDEs work to hide what you don't need now.
Java-like exception handling (i.e. you have to handle all exceptions)
I guess I'll leave web & cluster features for now...
What you think? Are there any existing similar languages which I missed?
To summarize: You language has no real selling points. It just does what a dozen other languages already did, with syntax and semantics just slightly off, depending on where the programmer comes from. This may be a good thing, as it makes the language easier to adapt, but you also have to convince people to trouble to switch. All this stuff has to be built and debugged and documented again, tools have to be programmed, people have to learn it and convince their pointy-haired bosses to use it, etc. "So it's language X with a few features from Y and nicer syntax? But it won't make my application's code 15% shorter and cleaner, it won't free me from boilerplate X, etc - and it won't work with my IDE." The last one is important. Tools matter. If there are no good tools for a language, few people will shy away, rightfully so.
And finally, it enforces many things that real teams have to enforce - like one crossplatform IDE, one codestyle, one web framework.
Sounds like a downside! How does the language "enforce one X"? How do you convince programmers this coding style is the one true style? Why shouldn't somebody go and replace the dog slow, hardly maintained, severly limited IDE you "enforce" with something better? How could one web framework possibly fit all applications? Programmers rarely like to be forced into X, and they are sometimes right.
Also, you language will have to talk to others. So you have ready-made standard solutions for multithreading and web development in mind? Maybe you should start with a FFI instead. Python can use extensions written in C or C++, use dynamic libraries through ctypes, and with Cython it's amazingly simple to wrap any given C library with a Python interface. Do you have any idea how many important libraries are written in C? Unless your language can use these, people can hardly get (real-world) stuff done with it. Just think of GUI. Most mayor GUI toolkits are C or C++. And Java has hundreds of libraries (the other JVM languages profit much from Java interop) for many many purposes.
Finally, on performance: LLVM can give you native code generation, which is a huge plus (performance-wise, but also because the result is standalone), but the LLVM optimizers are limited, too. Don't expect it to beat C. Especially not hand-tuned C compiled via icc on Intel CPUs ;)
"Are there any existing similar
languages which I missed?"
D? Compared to your features:
The compiler has a dual license - GPL and Artistic
See example code here.
LDC targets LLVM. Support for D version 2 is under development.
Built-in garbage collection or explicit memory management.
core.thread
Types
Unit tests / Pre and Post Contracts
try/catch/finally exception handling plus scope guarantees
Responding to a few of your points individually (I've omitted what I consider either unimportant or good):
targeted for web development
Most people use php. Not because it's the best language available, that's for sure.
embeding into applications
Lua.
distributed applications, high-reliability software (but this is for distant future).
Have you carefully studied Erlang, both its design and its reference implementation?
it enforces many things that real teams have to enforce - like one crossplatform IDE, one codestyle, one web framework.
If your language becomes successful, people will make other IDEs, other code styles, other web frameworks.
Multithreading is part of language, not a library.
Really good languages for multithreading forbid side effects inside threads. Yes, in practice that pretty much means Erlang only.
Types have the same width on any platform. int(32), long(64) e.t.c
Sigh... There's only one reasonable width for integers outside of machine-level languages like C: infinite.
Designing your own language will undoubtedly teach you someting. But designing a good language is like designing a good cryptosystem: lots of amateurs try, but it takes an expert to do it well.
I suggest you read some of Norman Ramsey's answers here on programming language design, starting with this thread.
Given your interest in distributed applications, knowing Erlang is a must. As for sequential programming, the minimum is one imperative language and one functional language (ideally both Lisp/Scheme and Haskell, but F# is a good start). I also recommend knowing at least one high-level language that doesn't have objects, just so you understand that not having objects can often make the programmer's life easier (because objects are complex).
As for what could drive other people to learn your language... Good tools/libraries/frameworks can't hurt (FORTRAN, php), and a big company setting the example can't hurt (Java, C#). Good design doesn't seem to be much of a factor (a ha-ha-only-serious joke has it that what makes a language successful is using {braces} to delimit blocks: C, C++, Java, C#, php)...
What you've given us is a list of features, with no coherent philosophy, or explanation as to how they will work together. None of the features are unique. At best, you're offering incremental improvements over what's already there. I'd expect there's already languages kicking around with what you've said, it's just that they're still fairly obscure, because they didn't make it.
Languages have inertia. People have to learn new languages, and sometimes new tools. They need incentive to do so, and 20% improvement in a few features doesn't cut it.
What you need, at a minimum, is a killer app and a form of elevator pitch. (The "elevator pitch" is what you tell the higher-ups about your project when you're in the elevator with them, in current US business parlance.) You need to have your language be obviously worth learning for some purpose, and you need to be able to tell people why it's worth learning before they think "just another language by somebody who wanted to write a language" and go away.
You need to form a language community. That community needs to have some localization at first: people who work in X big company, people who want to do Y, whatever. Decide on what that community is likely to be, and come up with one big reason to switch and some reasons to believe that your language can deliver what it promises.
No.
Every buzzword you have included in your feature list is an enormous amount of work to be spec'd, implemented, documented, and tested.
How many people will be actively developing the language? I guess the web is full of failed programming language projects. (Same is true for non-mainstream OSes)
Have a look at what .Net/Visual Studio or Java/Eclipse have accomplished. That's 1000s of years of specification, development, tests, documentation, feedback, bug fixes, service packs.
During my last job I heard about somebody who wrote his own programming framework, because it was "better". The resulting program code (both in the framework and in the applications) is certainly unmaintainable once the original programmer quits, or is "hit by a bus", as the saying goes.
As the list sounds like Java++ or Mono++, you'd probably be more successful in engaging in an existing project, even if it won't have your name tag on it.
Perhaps you missed one key term. Performance.
In any case, unless this new language has some really out-of-this-world features(ex: 100% increase in performance over other web development languages), I think it will be yet another fish in the pond.
Currently I'm responsible for maintaining a framework developed/owned by my company. It's a nightmare. Unless there is a mainstream community, working on this full time, it's really an elephant. I do not appreciate my company's decision to develop its own framework(because it's supposed to be "faster") day 'n night.
The language tastes good in my opinion, I don't want use java for a simple website but I would like to have types and things like that. ASP .NET is a problem because of licensing and I can't afford those licenses for a single website... Also features looks good
Remember a lot of operator overloading: I think is the biggest thing that PHP is actually missing. It allows classes to behave much more like basic types :)
When you have something to test I'll love to help you with it! Thanks
Well, if you have to reinvent the wheel, you can go for it :)
I am not going to give you any examples of languages or language features, but I will give you one advice instead:
Supporting framework is what is the most important thing. People will tend to love it or hate it, depending on how easy is to write good code that get job done. Therefore, please do usability test before releasing it. I mean ask several people how they will do certain task and create API accordingly. Then test beta API on other coders and listen carefully to their comments.
Regards and good luck :)
There's always space for another programming language. Apart from getting the design right, I think the biggest problem is coming across as just another wannabe language. So you may want to look at your marketing, you need a big sponsor who can integrate your language into their products, or you need to generate a buzz around it, easiest way is astroturfing. Good luck.
http://en.wikipedia.org/wiki/List_of_programming_languages
NB the names G and G++ aren't taken. Oh and watch out for the patent trolls.
Edit
Oops G / G++ are taken... still there are plenty more letters left.
This sounds more like a "systems" language rather than a "web development language". The major languages in this category (other than C++/C) are D and Go.
My advice to you would be to not start from scratch but examine the possibility of creating tools or libraries for those languages, and seeing just how far you can push them.

Right Language for the Job

Using the right language for the job is the key - this is the comment I read in SO and I also belive thats the right thing to do. Because of this we ended up using different languages for different parts of the project - like perl, VBA(Excel Macros), C# etc. We have three to four languages currently in use inside the project. Using the right language for the job has made it immensly more easy to do automate a job, but of late people are complaining that any new person who has to take over the project will have to learn so many different languages to get started. Also it is difficult to find such kind of person. Please note that this is a one to two person working on the project maximum at a given point of time. I would like to know if the method we are following is right or should we converge to single language and try to use it across all the job even though another language might be better suited for it. Your experenece related to this would also help.
Languages used and their purpose:
Perl - Processing large text file(log files)
C# with Silverlight for web based reporting.
LabVIEW for automation
Excel macros for processing data in excel sheets, generating graphs and exporting to powerpoint.
I'd say you are doing it right. Almost all the projects I've ever worked on have used multiple languages, without any problems. I think you may be overestimating the difficulty people have picking up new languages, particularly if they all use the same paradigm. If your project were using Haskell, Smalltalk, C++ and assembler, you might have difficulties, I must admit :-)
Using a variety languages vs maintainability is simply another design decision with cost-vs-benefits trade-offs that, like any design decision, need to considered carefully. While I like using the "absolute best" tool for the task (whatever "absolute best" means), I wouldn't necessarily use it without considering other factors such as:
do we have sufficient skill and experience to implement successfully
will we be able to find the necessary resources to maintain it
do we already use the language/tech in our system
does the increase in complexity to the overall system (e.g. integration issues, the impact on build automation) outweigh the benefits of using the "absolute best" language
is there another language that we already use and have experience in that is usable in lieu of the "absolute best" language
I worked a system with around a dozen engineers that used C++, Java, SQL, TCL, C, shell scripts, and just a touch of Perl. I was proud that we used the "best language" where they made sense, but, in one case, using the "best language" (TCL) was a mistake - not because it was TCL - but rather because we failed to observe the full costs-vs-benefits of the choice:*
we had only 1 engineer deeply familiar with TCL - the original engineer who refused to use anything but TCL for a particular target component - and then that engineer left the project
this target component was the only part of the system to use TCL and it was small relative to the other components in the system
the component could have been also implemented in another language we already used that we had plenty of experience in (C or C++) with some extra effort
the component appeared deceptively simple, but in reality had subtle corner cases that bit us in production (not something we could have known then, but always something to consider as a possibility later)
we had to implement special changes to the nightly build because the TCL compiler (yes, we compiled the TCL code into an executable) wouldn't work unless it could first throw its logo up on the screen - a screen which wasn't available during a cron-initiated automated nightly build. (We resorted to using xvfb to satisfy it.)
when bugs turned up, we had difficult time finding engineers who wanted to work on it
when problems with this component cropped up only after sustained load in the field, we lacked the experience and deep understanding of the TCL execution engine to easily debug it in the field
finally, the maintenance and sustainment team, which is a much smaller team with fewer resources than the main development team, had one more language that they needed training and experience in just to support this relatively small component
Although there were a number of things we could have done to head-off some of the issues we hit down the road (e.g. put more emphasis on getting TCL experience earlier, running better tests to detect the issues earlier, etc), my point is that the overall cost-vs-benefit didn't justify using TCL to code that single component. Again, it was not a mistake to use TCL because it was TCL (TCL is a fine language), but rather it was a mistake because we failed to give full consideration to the cost-vs-benefits.
As a Software Engineer it's your job to learn new languages if you need to. I would say you should go with the right tool for the job.
It's like sweeping the floor with an octopus. Yeah, it gets the job done... kind of... but it's probably not the best tool for the job. You're better off using a mop.
Another option is to create positions geared towards working in specific languages. So you can have a C# developer, a Perl developer, and a VBA developer who will only work with that language. It's a bit more overhead, but it is a workable solution.
Any modern software project of any scope -- even if it's a one-person job -- requires more than one language. For instance, a web project usually requires Javascript, a backend language, and a DB query language (though any of these might be created by the backend language). That said, there's a threshold that is easily reached, and then it would be very hard to find new developers to take over projects. What's the limit? Three languages? Four? Let's say: five is too many, but one would be too few for any reasonably complex project.
Using the right language for the right job is definitely appropriate - I am mainly a web programmer, and I need to know server-side programming (Rails, PHP + others), SQL, Javascript, jQuery, HTML & CSS (not programming languages strictly, but complex things I need to know) - it would be impossible for me to do all of that in a single language or technology.
If you have a team of smart developers, picking up new languages will not be a problem for them. In fact they probably will be eager to do so. Just make sure they are given adequate time (and mentoring) to learn the new language.
Sure, there will be a learning curve to implementing production code if there is a new language to learn, but you will have a stronger team member for it.
If you have developers in your teams who strongly resist learning new languages, unless there is a very good reason (e.g they are justifiably adamant that they are being asked to used a different language when it is not appropriate to do so) - then they are not the sort of developers I'd want in my teams.
And don't bother trying to hire people who know all the languages you use. Hire smart programmers (who probably know at least one language you use) - they should pick up the other languages just fine.
I would be of the opinion, that if I a programmer on my team wanted to introduce a second (or third) language into a project, that there better be VERY VERY good reason to do so; as a project manager I would need to be convinced that the cost of doing so, more than offset the problems. And it would take a lot of convincing.
Splitting up project into multiple languages makes it very expensive to hire the right person(s) to take over that project when it needs maintenance. For small and medium shops it could be a huge obstacle.
Edit: I am not talking about using javascript and c# on the same project, I am talking about using C# for most of the code, F# for a few parts and then VB or C++ for others - there would need to be a compelling reason.
KISS: 'Keep it simple stupid' is a good axiom to follow in most cases.
EDIT: I am not completely opposed to adding languages, but the burden of proof is on the person to who wants to do it. KISS (to me) applies to not only getting the project/product done and shipped, but also must take into account the lifetime maint. and support requirements. Lots of languages come and go, and programmers are attracted to new languages like a moth to a light. Most projects I have worked on, I still oversee and/or support 5 or 10 years later - last thing I want to see is some long forgotten and/or orphaned language responsible for some key part of an application I need to support.
My experience using C++ and Lua was that I wrote more glue than actual operational code and for dubious benefit.
I'd start by saying the issue is whether you are using the right paradigm for the job?
Suppose you know how to do object oriented programming in C#. I don't think the leap to Java is all that great . Although you'd have to familiarize yourself with libraries and syntax, the idea is pretty similar.
If you have procedural parts to your project, such as parsing files and various data transformations, your Perl/Excel Macros seem pretty similar.
But to address your issues, I'd say above all, you'll need clarity in code. Since your staff are using several languages, they won't be familiar with all languages to an equal degree. So make sure of this:
1) Syntactic sugar is explained in comments. Sugars are pretty language specific and may not be obvious to a new reader. For instance, in VBA I seem to remember there are default properties, so that TextBox1 = "Hello" is what you'd write instead of TextBox1.Text = "Hello". This can be confusing. Also, things like LINQ statements can have un-obvious meanings. So make sure people have comments to read.
2) Where two components from different languages have to work together, make excruciatingly specific details about how that happens. For instance, I once had to write a C component to be called from Excel VBA. There's quote a few steps and potential errors in doing this, especially as far as compiler flags. Make sure both sides know how their side of the interaction occurs.
Finally, as far as hiring people goes, I think you have to find people who aren't married to a specific language. To put it vaguely, hire an intelligent person who sees business issues, not code. He'll learn the lingo soon enough.

Where can I find a quick reference for standard Basic?

The reason? Pure nostalgia.
Anyway, there was a standard for Basic that was published in the late 80s or early 90s. It was probably ISO/IEC 10279:1991, but I don't have access to that and cannot be sure.
Whatever this standard was, some of the syntax made its way into Borlands Turbo Basic and Microsofts Visual Basic. I never learned any significant amount of VB, but Turbo Basic is one of those things I played with in my mis-spent youth.
At one time, my main reference was an article published in one of the main computing periodicals - maybe Personal Computer World, maybe Byte.
A scan of that article (if anyone can even identify it) would be great, but all I really want is a few pages quick reference of that standard syntax. Must be free (I'm not that nostalgic), but it must describe the standard syntax - the whole point is to sort out what is standard as opposed to VB or whatever.
EDIT The more I think about this, the more convinced I am that this standard was available around 1987 or 1988. Maybe it was the earlier non-full version of the standard above, or maybe it was pre-acceptance of the standard.
Take a look at this PDF of a Dartmouth College BASIC manual from October 1964 if you're awash in nostalgia. Dartmouth's then-president John Kemeny invented the language (as you may know) and the 1960s and 70s-era Dartmouth Time Sharing System offered the language for its users to run programs on ancient GE mainframes in one of the first shared environments out there. (Find out more about DTSS here.)
The ANSI standardization process was underway in the mid-80's and Kemeny & Kurtz (the original inventors of BASIC) founded "True Basic" to market a standard BASIC to the PC market. Around this same time the 8th edition of BASIC on DTSS was renamed "Dartmouth Standard Basic", to emphasize its standard compliance.
True Basic hewed pretty closely to the ANSI standard, but I haven't been able to find a copy for their reference manual online. This list of TB functions and commands may give you some idea though.

How to create a language these days?

I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every languages and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there are a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful, noone wants to implement all that in yet another language. Try to borrow, make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From here i designed a simple bytecode format and assembly syntax and proceeeded to write my interpreter + assembler.
Like i said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision which effects implementation - if you make every statement of your language have a unique parse tree without context, you'll get something that it's easy to hand-code a parser, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly simple things like using a different symbol for module namespaces and object namespaces ( unlike Java which uses . for both package and class namespaces ) means you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language is an option, but projects which start out doing either give up (rubinius moved to using C++ for parts of its standard library), or is only for research purposes (Mozilla Narcissus)
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative, is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: Look at some quite different languages first, before designing a new languge (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunatly I looked at those other languages first (or you could also say unfortunatly, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (ie. Forward Polish or Reverse Polish) you don't need a parser at all! With reverse polish, the dependencies go right-to-left so you simply execute each token as it's scanned. With forward polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i. parser) has to perform. Only the in-order notation "requires" a recursive decent to re-construct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking' and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a litte sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this, The above article talks about using two tools (Flex and Bison) That can be used to turn text into code you can feed to a compiler.
First I spent a year or so to actually think how the language should look like. At the same time I helped in developing Ioke (www.ioke.org) to learn language internals.
I have chosen Objective-C as implementation platform as it's fast (enough), simple and rich language. It also provides test framework so agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run a simple programs - as long as they fit in one file that is :)
Of course older techniques are still common (e.g. using Flex and Bison) many newer language implementations combine the lexing and parsing phase, by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are built using the Antlr framework also.
Use bison/flex which is the gnu version of yacc/lex. This book is extremely helpful.
The reason to use bison is it catches any conflicts in the language. I used it and it made my life many years easier (ok so i'm on my 2nd year but the first 6months was a few years ago writing it in C++ and the parsing/conflicts/results were terrible! :(.)
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com

Why do you or do you not implement using polyglot solutions?

Polyglot, or multiple language, solutions allow you to apply languages to problems which they are best suited for. Yet, at least in my experience, software shops tend to want to apply a "super" language to all aspects of the problem they are trying to solve. Sticking with that language come "hell or high water" even if another language is available which solves the problem simply and naturally. Why do you or do you not implement using polyglot solutions?
I almost always advocate more than 1 language in a solution space (actually, more than 2 since SQL is part of so many projects). Even if the client likes a language with explicit typing and a large pool of talent, I advocate the use of scripting languages for administrative, testing, data scrubbing, etc.
The advantages of many-language boil down to "right tool for the job."
There are legitimate disadvantages, though:
Harder to have collective code ownership (not everyone is versed in all languages)
Integration problems (diminished in managed platforms)
Increased runtime overhead from infrastructure libraries (this is often significant)
Increased tooling costs (IDEs, analysis tools, etc.)
Cognitive "bumps" when switching from one to another. This is a double-edged sword: for those well-versed, different paradigms are complementary and when a problem arises in one there is often a "but in X I would solve this with Z!" and problems are solved rapidly. However, for those who don't quite grok the paradigms, there can be a real slow-down when trying to comprehend "What is this?"
I also think it should be said that if you're going to go with many languages, in my opinion you should go for languages with significantly different approaches. I don't think you gain much in terms of problem-solving by having, say, both C# and VB on a project. I think in addition to your mainstream language, you want to have a scripting language (high productivity for smaller and one-off tasks) and a language with a seriously different cognitive style (Haskell, Prolog, Lisp, etc.).
I've been lucky to work in small projects with the possibility to suggest a suitable language for my task. For example C as a low-level language, extending Lua for the high-level/prototyping has served very well, getting up to speed quickly on a new embedded platform. I'd always prefer two languages for any bigger project, one domain-specific fit to that particular project. It adds a lot of expressiveness for quickly trying out new features.
However probably this serves you best for agile development methods, whereas for a more traditional project the first hurdle to overcome would be choosing which language to use, when scripting languages tend to immediately seem "newcomers" with less marketing push or "seriousness" in their image.
The biggest issue with polyglot solutions is that the more languages involved, the harder it is to find programmers with the proper skill set. Particularly if any of the languages are even slightly esoteric, or hail from entirely different schools of design (e.g. - functional vs procedural vs object oriented). Yes, any good programmer should be able to learn what they need, but management often wants someone who can "hit the ground running", no matter how unrealistic that is.
Other reasons include code reuse, increased complexity interfacing between the different languages, and the inevitable turf wars over which language a particular bit of code should belong in.
All of that said, realize that many systems are polyglot by design -- anything using databases will have SQL in addition to some other language. And there's often scripting involved as well, either for actual code or for the build system.
Pretty much all of my professional programming experience has been in the above category. Generally there's a core language (C or C++), SQL of varying degrees, shell scripting, and possibly some perl or python code on the periphery.
My employer's attitude has always been to use what works.
This has meant that when we found some useful Perl modules (like the one that implements "Benford's Law", Statistics::Benford), I had to learn how to use ActiveState's PDK.
When we decided to add interval maths to our project, I had to learn Ada and how to use both GNAT and ObjectAda.
When a high-speed string library was requested, I had to relearn assembler and get used to MASM32 and WinAsm.
When we wanted to have a COM DLL of libiconv (based on Delphi Inspiration's code), I got reacquainted with Delphi.
When we wanted to use Dr. Bill Poser's libuninum, I had to relearn C, and how to use Visual C++ 6's IDE.
We still prototype things in VB6 and VBScript, because they're good at it.
Maybe sometime down the line I'll end up doing stuff in Forth, or Eiffel, or D, or, heaven help me, Haskell (I don't have anything against the language per se, it's just a very different paradigm.)
One issue that I've run into is that Visual Studio doesn't allow multiple languages to be mixed in a single project, forcing you to abstract things out into separate DLLs for each language, which isn't necessarily ideal.
I suspect the main reason, however, is the perception that switching back and forth between many different languages leads to programmer inefficiency. There is some truth to this, I switch constantly between JavaScript, C#, VBScript, and VB.NET and there is a bit of lost time as I switch from one language to another, as I mix my syntax a bit.
Still, there is definitely room for more "polyglot" solutions particularly that extend beyond using JavaScript and whatever back-end programming language.
Well, all the web is polyglot now with Java/PHP/Ruby in the back and JavaScript in the front...
Other examples that come to mind -- a flexible complex system written in a low level language (C or C++) with an embedded high level language (Python, Lua, Scheme) to provide customization and scripting interface. Microsoft Office and VBA, Blender and Python.
A project which can be done in a scripting language such as Python with performance critical or OS-dependent pieces done in C.
Both JVM and CLR are getting lots of new interesting scripting languages compatible. Java + Groovy, C# + IRonPython etc.

Resources