For statically typed languages, member access is easy: you just calculate the offset of the member at compile time. But how do Ruby or Python do it? There may be several structures with the same member names, and you are not even sure what kind of object a variable holds. Do they use some kind of dictionary to look up the member at runtime?
If you were implementing a dynamically typed language from scratch, that's probably where you would start: something like a hashtable-based dictionary. It's a perfectly reasonable solution.
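To make that concrete, here's a minimal sketch of the dictionary approach (written in Scala purely for illustration; the names are made up): every object carries its own name-to-value table, and member access is a hash lookup at runtime rather than a compile-time offset.

    // Each "object" is just a mutable name -> value table.
    class DynObject {
      private val slots = scala.collection.mutable.HashMap[String, Any]()
      def set(name: String, value: Any): Unit = slots(name) = value
      def get(name: String): Any =
        slots.getOrElse(name, throw new NoSuchElementException(s"no member '$name'"))
    }

    val point = new DynObject
    point.set("x", 3)
    point.get("x")   // a runtime hash lookup, not a compile-time offset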
Some dynamic language runtimes optimized for size rather than performance (such as JerryScript, a highly size-optimized JavaScript interpreter) use this sort of approach exclusively, and it works fine.
However, most modern JIT-based dynamic language runtimes, such as the V8 JavaScript engine (used in Chrome and Node.js) or the JavaScriptCore engine used in Safari, only use this sort of dictionary as a fallback when they can't do anything better.
Here's an answer to another question where I described how V8's maps work and how they make property access very efficient. It contains a link to a much more detailed description, and a video by Lars Bak, one of the lead engineers on V8, which is worth watching if you're interested in how this stuff works.
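Very roughly, the trick V8's maps (also called hidden classes or shapes) exploit is that objects created the same way share the same layout, so the name-to-offset table can be stored once and most accesses become an array index. A grossly simplified sketch in Scala (illustrative names, not V8's actual design):

    // Objects with the same property set share one Shape mapping names to offsets.
    class Shape(val offsets: Map[String, Int])

    class Obj(val shape: Shape, val slots: Array[Any]) {
      def get(name: String): Any = slots(shape.offsets(name))
    }

    val pointShape = new Shape(Map("x" -> 0, "y" -> 1))
    val p1 = new Obj(pointShape, Array[Any](1, 2))  // p1 and p2 share one Shape,
    val p2 = new Obj(pointShape, Array[Any](3, 4))  // stored once, not per object
    p1.get("y")                                     // -> 2

A real VM goes further and caches the resolved offset at each property-access site (an inline cache), so even the name-to-offset lookup usually happens only once.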
With Ruby (as with JavaScript), there are a number of different implementations (JRuby, MRI, Rubinius, etc.), so answering "how does Ruby do it" is difficult: each implementation does it in its own way (although there will be many similarities, imposed by the language design).
Since you seemed to be asking about the concepts in dynamic languages in general, hopefully you'll find the link above gives you some useful information about possible implementations.
I want to know what the strategies are for creating a source-to-source translator, i.e., for translating one high-level language into another. Two approaches come to mind:
1- Transforming the syntax tree of one language into the syntax tree of the other
2- Translating to an intermediate language and then converting that into the other high-level language
My question is: is it possible to do the conversion using both strategies, and which is more feasible? Can anyone give references to theory, or to an implementation of a converter that uses either of the above methods? Also, is there any standard XML-based intermediate language? I know that XMLVM uses XML as an intermediate language, but it does not provide a proper specification of that intermediate language.
Any compiler is, roughly, a source-to-source translator. The target language can be an assembly language (or directly a binary machine-code language), or C, or whatever high-level language you fancy. So the general theory of compilers is applicable.
And just as a word of advice: one intermediate language is normally not nearly enough. Use more. Use dozens of intermediate languages, each differing from the previous one in just one tiny aspect. That way, any language-to-language translation becomes nothing but trivial.
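To illustrate what one of those tiny steps might look like, here is a minimal sketch (in Scala, with made-up Lang1/Lang2 names): a single pass whose only job is to eliminate one construct, subtraction, by rewriting it into addition and negation.

    sealed trait Lang1
    case class Num1(n: Int) extends Lang1
    case class Add1(l: Lang1, r: Lang1) extends Lang1
    case class Neg1(e: Lang1) extends Lang1
    case class Sub1(l: Lang1, r: Lang1) extends Lang1  // Lang1 has subtraction...

    sealed trait Lang2                                  // ...Lang2 does not.
    case class Num2(n: Int) extends Lang2
    case class Add2(l: Lang2, r: Lang2) extends Lang2
    case class Neg2(e: Lang2) extends Lang2

    // One tiny pass: eliminate Sub by rewriting a - b as a + (-b).
    def lowerSub(e: Lang1): Lang2 = e match {
      case Num1(n)    => Num2(n)
      case Add1(l, r) => Add2(lowerSub(l), lowerSub(r))
      case Neg1(x)    => Neg2(lowerSub(x))
      case Sub1(l, r) => Add2(lowerSub(l), Neg2(lowerSub(r)))
    }

A real translator chains dozens of such passes, each one trivial on its own.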
Another word of advice (anticipating downvotes here) - stay away from XML, especially as a representation for ASTs.
I would look at LLVM, which can do source to source. Although the output isn't pretty, it might provide some good ideas.
Converters are usually based on constructing the semantic tree of one program and then reshaping it into the target language. As an example, take a look at a C# to Java converter.
The second approach is also possible, but the organization of your code may change completely after conversion. So it is better to keep the intermediate common structure (IL, syntax tree, etc.) as high-level as possible.
Try Clang! It is powerful for source-to-source translation. As of now it fully supports C, C++, Objective-C, and Objective-C++.
You may also want to look at ROSE compiler infrastructure.
Say I found a good open-source software/library written in Python. I want to wrap some of the functions or methods that I have created in an easy-to-understand language of my own.
For example, do porter_stemm(DOC) (the DSL) would be equivalent to a function or series of methods written in Python.
I want to create a DSL that is easy to learn, but this DSL needs to be translated into the original open-source software's language.
I'm not sure if I am being clear here, but my intention is to:
create an easy-to-learn language that users can use to solve problems in a certain niche;
have this simple language translated, compiled, or interpreted via some middleware into the original open-source software's language (Python).
Write your DSL's syntax and a parser for it, e.g., in Python with pyparsing (simpler than the traditional lex/yacc approach). The parser can produce a tree of semantically meaningful nodes, and then you can (simpler) walk that tree and interpret it, or (a bit less simply) walk that tree and generate equivalent Python code. That's the approach I'd suggest for most host languages. Some host languages (Lisp and Scheme being the main ones) have powerful macros, so building a DSL out of macros is more common in those languages.
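To make the "tree of semantically meaningful nodes, then walk it" idea concrete, here is a minimal sketch of the interpretation step (in Scala rather than Python, but the shape is the same in any host language; all names are invented):

    // A toy DSL command tree and a tree-walking interpreter.
    sealed trait Cmd
    case class Stem(doc: String)           extends Cmd  // e.g. "do porter_stemm(DOC)"
    case class Both(first: Cmd, next: Cmd) extends Cmd  // sequencing

    def run(cmd: Cmd): Unit = cmd match {
      case Stem(doc)  => println(s"stemming $doc")  // here you'd call the wrapped library
      case Both(a, b) => run(a); run(b)
    }

    run(Both(Stem("doc1.txt"), Stem("doc2.txt")))

The parser's only job is to turn the user's text into such a tree; everything after that is ordinary code.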
Embedding your DSL in the host language basically means you're not really doing a DSL but a more traditional framework, so that's really a different approach (possibly more powerful, but maybe not quite as easy to learn for non-programmers ;-).
Typically a DSL would still have the same basic syntax as its host language, and simply extend the language's capabilities. So everything for you is still written in Python, it's just Python with some domain-specific functions, variables, etc.
My experience with DSLs is mostly with the Lisp family of languages. There, a typical DSL might be implemented with Macros, which generate code-patterns that are specific to the DSL. (Plus, functions relating to the DSL). The point is, although the macros and functions may extend the capabilities of the language, they are still implemented in that language.
I suppose you could go a step further and write an interpreter in Python - but there you still get Python runnability for free, as your interpreter translates your custom language back into Python.
I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched-card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java bytecode or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facilities. (An example is that every language and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, this enables incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible, even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there is a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect IntelliSense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them, and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is not the thing to focus on first, but it affects willingness to adopt.
Also, libraries have become much more powerful; no one wants to reimplement all of that in yet another language. Try to borrow: make it easy to call existing code, and make it easy for your code to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly-like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# plus the backing of the entire .NET community when attempting to write most things. From there I designed a simple bytecode format and assembly syntax and proceeded to write my interpreter and assembler.
Like I said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision that affects implementation: if you make every statement of your language have a unique parse tree without context, you'll get something that is easy to hand-code a parser for, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly, simple things like using different symbols for module namespaces and object namespaces (unlike Java, which uses . for both package and class namespaces) mean you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language itself is an option, but projects that start out that way either give up (Rubinius moved to using C++ for parts of its standard library) or remain research-only (Mozilla's Narcissus).
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's a bytecode or native-code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative is to target an existing VM, like Parrot, the JVM, or the CLR. Parrot is the VM being designed for Perl 6. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: look at some quite different languages first, before designing a new language (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind, though you should also know Prolog and Scheme. A year ago I too thought "hey, let's design a language that behaves exactly as I want", but fortunately I looked at those other languages first (or you could also say unfortunately, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Łukasiewicz's "unparenthesized" forms (i.e. forward Polish or reverse Polish) you don't need a parser at all! With reverse Polish, the dependencies go right-to-left, so you simply execute each token as it's scanned. With forward Polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until you reach the starting token.
To understand why this works, you should investigate the three primary tree-traversal algorithms: pre-order, in-order, and post-order. These three traversals are the inverse of the parsing task that a language reader (i.e. a parser) has to perform. Only the in-order notation "requires" recursive descent to reconstruct the expression tree. With the other two, you can get away with just a stack.
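A minimal sketch of the reverse Polish case (in Scala; only + and * are handled, and every other token is treated as a number):

    // Evaluate reverse Polish notation with nothing but a stack - no parser.
    def evalRPN(tokens: Seq[String]): Int = {
      val stack = scala.collection.mutable.Stack[Int]()
      for (t <- tokens) t match {
        case "+" => val b = stack.pop(); val a = stack.pop(); stack.push(a + b)
        case "*" => val b = stack.pop(); val a = stack.pop(); stack.push(a * b)
        case n   => stack.push(n.toInt)  // any other token is a literal
      }
      stack.pop()
    }

    evalRPN("3 4 + 2 *".split(" "))  // ((3 + 4) * 2) == 14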
This may require more "thinking" and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. It's just that it's a little sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this. The above article talks about using two tools (Flex and Bison) that can be used to turn text into code you can feed to a compiler.
First I spent a year or so actually thinking about how the language should look. At the same time I helped develop Ioke (www.ioke.org) to learn about language internals.
I chose Objective-C as the implementation platform as it's a fast (enough), simple, and rich language. It also provides a test framework, so an agile approach is a go, and it has a rich standard library I can build upon.
Since my language is simple on the syntactic level (no keywords, only literals, operators and messages), I could go with Ragel (http://www.complang.org/ragel/) for building the scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner, and simple operator shuffling, plus standard library bootstrap code. I can even run simple programs - as long as they fit in one file, that is :)
Of course older techniques are still common (e.g. using Flex and Bison), but many newer language implementations combine the lexing and parsing phases by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or for memoizing packrat parsers. Many compilers are also built using the ANTLR framework.
Use Bison/Flex, the GNU versions of yacc/lex. This book is extremely helpful.
The reason to use Bison is that it catches any conflicts in the language. I used it and it made my life much easier (OK, so I'm in my 2nd year, but the first 6 months were a few years ago, writing it in C++, and the parsing/conflicts/results were terrible! :( )
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a WYSIWYG page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com
What makes Scala such a wonderful language, other than the type system? Almost everything I read about the language brings out 'strong typing' as a big reason to use Scala, but there has to be more than that. What are some of the other compelling and/or cool language features that make Scala a really useful tool?
Here are some of the things that made me favour Scala (over, say, usual Java):
a) Type inference. The Java way of doing it:
Map<Something, List<SomethingElse>> list = new HashMap<Something, List<SomethingElse>>()
.. is rather verbose compared to Scala, where the compiler can figure the type out for you from the right-hand side (see the sketch below).
b) First-class functions. Again, this functionality can be emulated with classes, but it's ugly.
c) Collections that have map and fold. These tie in with (b), and they are something I wish for every time I have to write Java.
d) Pattern matching and case classes.
e) Variance annotations, which mean that if S extends T, then List[S] extends List[T] as well (covariance).
Throw in some static-typing goodness as well, and I was sold on the language quite fast.
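A small sketch pulling (a), (c), (d) and (e) together:

    // (d) a case class gives you equals, hashCode, toString and a constructor for free
    case class Point(x: Int, y: Int)

    // (a) the type of ps (List[Point]) is inferred, no annotation needed
    val ps = List(Point(1, 2), Point(3, 4))

    // (c) map and fold over a collection
    val sumX = ps.map(_.x).foldLeft(0)(_ + _)

    // (d) pattern matching deconstructs a case class
    ps.head match { case Point(x, y) => println(s"($x, $y)") }

    // (e) covariance: a List[Point] can be used where a List[AnyRef] is expected
    val objs: List[AnyRef] = ps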
It's a mash-up of the best bits from a bunch of languages - what's not to love:
Ruby's terse syntax
Java's performance
Erlang's Actor Support
Closures/Blocks
Convenient shorthand for maps & arrays
Scala is often paraded for having closures and implicits. Not surprising really, as lack of closures and explicit typing are perhaps the two biggest sources of Java boilerplate!
But once you examine it a little deeper, it goes far beyond Java-without-the-annoying-bits. Perhaps the greatest strength of Scala is not one specific named feature, but how successful it is in unifying all of the features mentioned in other answers.
Post Functional
The union of object orientation and functional programming for example: Because functions are objects, Scala was able to make Maps implement the Function interface, so when you use a map to look up a value, it's no different syntactically from using a function to calculate a value. In unifying these paradigms so well, Scala truly is a post-functional language.
Or operator overloading, which is achieved by not actually having operators: they're just methods used in infix notation. So 1 + 2 is just calling the + method on an integer. If the method were named plus instead, you'd use it as 1 plus 2, which is no different from 1.plus(2). This is made possible by another combination of features: everything in Scala is an object, there are no primitives, so integers can have methods.
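Both points in a quick REPL-style sketch:

    val squares = Map(1 -> 1, 2 -> 4, 3 -> 9)
    squares(3)                  // 9 - using the map looks exactly like calling a function
    List(1, 2, 3).map(squares)  // List(1, 4, 9) - a Map passed where a function is expected

    (1).+(2)                    // 3 - the explicit method-call form of 1 + 2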
Other Feature Fusion
Type classes were also mentioned, achieved by a combination of higher-kinded types, singleton objects, and implicits.
Other features that work well together are case classes and pattern matching, allowing you to easily build and deconstruct algebraic data types, without having to manually write all the tedious equality, hashcode, constructor and getter/setter logic that Java demands.
Specifying immutability by default, offering lazy values, and providing first class functions all combine to give you a language that's very suited to building efficient functional data structures.
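For instance, a minimal type class built from nothing but a trait, a singleton object, and an implicit (names are illustrative):

    // The type class itself
    trait Show[A] { def show(a: A): String }

    // Instances are implicit values, here grouped in a singleton object
    object ShowInstances {
      implicit val intShow: Show[Int] = new Show[Int] {
        def show(a: Int) = s"Int($a)"
      }
    }

    // A function that demands evidence via an implicit parameter
    def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)

    import ShowInstances._
    describe(42)  // "Int(42)"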
The list goes on, but I've been using Scala for over 3 years now, and I'm still amazed almost daily at how well everything just works together.
Efficient and Versatile
Scala is also a small language, with a spec that (surprisingly!) only needs to be around 1/3 the size of Java's. This is partly because Java has a lot of special cases in the spec that Scala simplifies away, partly because of removing features such as primitives and operators, and partly because a lot of functionality has been moved from the language and into the libraries.
As a benefit of this, all the techniques available to the Scala library authors are also available to any Scala user, which makes it a great language for defining your own control-flow constructs and for building DSLs. This has been used to great effect in projects like Akka - a 3rd-party Actor framework.
Deep
Finally, it scales across the full range of programming styles.
The runtime interpreter (known as the REPL) allows you to very quickly explore ideas in an interactive session, and Scala files can also be run as scripts without needing explicit compilation. When coupled with type inference, this gives Scala the feel of a dynamic language such as Ruby, Perl, or a bash script.
At the other end of the spectrum, traits, classes, objects and self-types allow you to build a full-scale enterprise system based on distinct components and using dependency injection, without the need for 3rd-party tools. Scala also integrates with Java libraries at a level almost on par with native Java and, by running on the JVM, can take advantage of all the speed benefits offered on that platform, as well as being perfectly usable in containers such as Tomcat, or with OSGi.
I'm new to Scala, but my impression is:
Really good JVM integration will be the driving factor. JRuby can call Java and Java can call JRuby code, but it's explicitly calling into another language, not the clean integration of Scala and Java. So you can use Java libraries, and even mix and match in the same project.
I started looking at Scala when I realized that the thing that will drive the next great language is easy concurrency. The JVM has good concurrency from a performance standpoint. I'm sure someone will say that Erlang is better, but Scala is actually usable by normal programmers.
Where Java falls down is that it's just so painfully verbose. It takes way too many characters to create and pass a Functor. Scala allows passing functions as arguments.
It isn't possible in Java to create a union type, or to apply an interface to an existing class. These are both easy in Scala.
Static typing usually has a big penalty of verboseness. Scala eliminates this downside while still giving the upside of static typing, which is compile time type checking, and it makes code assist in editors easier.
The ability to extend the language. This has been the thing that has kept Lisp going for decades, and that allowed Ruby on Rails.
The type system really is Scala's most distinguishing feature. It also has a lot of syntactic conveniences over, say, Java.
But for me, the most compelling features of Scala are:
First-class modules.
Higher-kinded types (type constructor polymorphism).
Implicits.
In effect, these features let you approximate (and in some ways surpass) Haskell's type classes. Combined, they let you write exceptionally modular code.
Briefly:
You get the power and platform-independency of the Java libraries, but without the boilerplate and verbosity.
You get the simplicity and productivity of Ruby, but with static typing and compiled bytecode.
You get the functional goodnesses and concurrency support of Haskell, but without complete paradigm shift and with the benefits of object-orientation.
What I find especially attractive among all of its magnificent features, among others:
Most of the object-oriented design patterns that require loads of boilerplate code in Java are supported natively, e.g. Singleton (via objects), Adapter, Decorator (via traits and implicits), Visitor (via pattern matching), Strategy (via closures), etc.
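For example, a singleton and a decorator need no pattern scaffolding at all (a sketch with invented names):

    // Singleton: a one-line object instead of Java's private-constructor dance
    object Registry { def lookup(key: String): Option[String] = None }

    // Decorator: mix a trait in at instantiation time
    trait Logging { def log(msg: String): Unit = println(s"LOG: $msg") }
    class Service { def run(): Unit = () }

    val s = new Service with Logging
    s.log("started")
    s.run()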
You can define your domain models and DSLs very concisely, then you can extend them with the necessary features (notification, association handling; parsing, serialization), without the need of code generation or frameworks.
And finally, there is full interoperability with the well-supported Java platform. You can mix Java and Scala in both directions. There is not much of a penalty, nor are there compatibility problems, when switching to Scala after having experienced the annoyances of Java that make code hard to maintain.
Functional programming brought to the JVM
Supposedly it's very easy to make Scala code run concurrently on multiple processors.
Expressiveness of control flow. For example, it's very common to have a collection of data which you need to process in some way. This might be a list of trades in which the processing involves grouping by some properties (the currencies of the investment instruments) and then doing a summation (to get totals-per-currency perhaps).
In Java this involves separating out a piece of code to do the grouping (a few lines of for-loop) and then another piece of code to do the summation (another for-loop). In Scala, this type of thing is typically achievable in one line of code using functional programming and folding, which reads very expressively left-to-right.
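Something like this (field names assumed for illustration):

    case class Trade(currency: String, amount: BigDecimal)
    val trades = List(Trade("EUR", 10), Trade("USD", 7), Trade("EUR", 5))

    // group by currency, then sum each group - totals-per-currency in one expression
    val totalsPerCurrency =
      trades.groupBy(_.currency).map { case (ccy, ts) => ccy -> ts.map(_.amount).sum }
    // Map(EUR -> 15, USD -> 7)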
Of course, this is just an argument for a functional language over Java.
The great features of Scala have already been mentioned. One thing that shines through all the features, though, is how tastefully everything is integrated.
Scala manages to be one of the most powerful languages around without feeling like a pile of features bolted on in haste. Nor is the language an academic exercise in proving a point. Innovation and really advanced concepts are brought into the language with uncanny practicality and elegance.
In short: Martin Odersky is a pure design genius. That is what's so great about Scala!
I want to add that the multi-paradigm (OO and FP) nature gives Scala an edge over other languages.
Every day you code Java you will become more and more miserable, every day you code Scala you will become happier.
Here are a few fairly in-depth explanations of the appeal of functional languages:
How/why do functional languages (specifically Erlang) scale well?
If we set aside feature discussion and talk about style, I would say it's the pipeline style of coding. You start from some object or collection, type a dot and a property, or a dot and a transformation, and keep going until you've formed the desired result. This makes it easy to write a chain of transformations that is also easy to read. Traits, to some extent, let you apply the same approach to constructing types.
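For instance (an illustrative chain):

    val result = (1 to 100)
      .filter(_ % 3 == 0)  // keep multiples of three
      .map(_ * 2)          // double them
      .take(5)             // first five
      .sum                 // and total them up: 90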