Custom language reformatter for notepad++ - programming-languages

I'm a developer for a database platform that uses a specific programming language very similar to vbscript to carry out some of the functionality of the user interface. The current developing editor used is notepad++ which already has a custom language highlighter. The need for a code tidier has emerged and I have already been looking into customizing an already existing formatter/beautifier but have failed to find any proper documentation as how to carry it out properly. As far as i understand, i will probably need to write an interpreter and use nppexec plugin, I have written a lexer in bison before so i guess i am not a total beginner.
Can anyone perhaps point out a guide or give a few pointers to make things a bit easier to carry out?

Related

What is the easiest way to query some input values in Haskell

What is the easiest way to get some values (yes/no, numbers) into a Haskell program. The values should be bound to some variables and other questions should be asked based on previous inputs.
I am trying to solve a little problem for which I think Haskell is best suited. Especially for extending the functionality afterwards. An addition I am also trying to learn this language (I am new to Haskell but have some experience with Prolog, so have some idea about functional programming).
I was checking all he stuff relatd to GUI development, but this is actually an overkill to what I need. The input should be in response to some questions which are dependent on the state of the execution.
I hope this is clear enough.
EDIT:
I would like to have some "po-ups" like these. Not all at once, but just as a pop-up when needed.
It feels a little bit like your assumption is that Haskell is like Javascript here.
That is, it's very simple in Javascript to get a "popup" to display in a browser such as Chrome by using prompt("Are you hungry or thirsty?"), but that's only because the prompt function is built on top of the window object which the browser provides to allow developers to hook into the windowing stack of the operating system that the browser is built in.
Haskell, by default, provides far less functionality "for free". That is, if you want to display a pop up, you'll have to use some library that allows you to display some pop-up.
This is a much bigger question than it possibly seems. It's very similar to the same question in any other batch-style programming language. How would you do this in Java, or in Ruby? Well, you need to find a library that supports it.
One such library for many languages and that is cross platform across operating systems is wxWidgets. It's built in C++, but there are bindings/libraries for Haskell and many other languages. The Haskell library is called wxhaskell: https://wiki.haskell.org/WxHaskell
Good luck, and don't expect it to be an easy path necessarily.
If you have interest in learning Haskell basics, feel free to take a look at this tutorial I helped author: http://happylearnhaskelltutorial.com

Is there a program which can help understand another program?

I need to document the software I'm currently working on. The software consists of several programming languages and scripts which got me thinking. If a new developers comes along and needs to fix something, they might know Java but maybe not bash scripting. It would be nice if there was a program which would help to understand what
for f in "$#" ; do
means. I was thinking of something that creates a static HTML page with the code plus syntax highlighting and if you hover over something (like the "for"), it would display a pop-up with an explanation:
for starts a loop which iterates over all values that follow in. In the loop, you can access each value via the variable $f. The loop body is between do and done
Does something like that already exist?
[EDIT] This is just an example. You'll get another help for f, in, "$#", ; and do, i.e. each and every element of the line should be explained. Unknown elements (like command names) should link to Google. So you can understand what it does even if you're missing some detail.
[EDIT2] I'm aware that you can't write a program which understands what another program does. What I'm looking for is a simple tool which will do "extended syntax highlighting" in the sense that it will color an expression and give a short explanation what it means (plus maybe a link to some in-depth reference).
This is meant for someone who knows how to program but maybe hasn't seen some obscure construct before. Say
echo "Error" 1>&2
Every bash programmer knows what this means but a Java developer might be puzzled by the 1>&2 despite the fact that they can guess that echo == System.out.println. A simple "Redirects stdout to stderr" will clear things up and give that instant "AHA!" which allows them to stay in their current train of thought.
A tool like this could be built using ANTLR, i.e. parse the code into an abstract syntax tree using an ANTLR grammar for that language, and write an HTML generator which produced the annotated code.
It sounds like a useful tool to have for language learning, or exploring source code of projects you're not maintaining -- but is it appropriate for documentation?
Why is it important to help the programmers of other languages understand the code at this level of implementation detail? Anyone maintaining the implementation at this level will obviously have to know the language and will probably have an IDE to do most of this.
That said, I'd definitely consider a tool like this as a learning aid.
IMO it would be simpler and more effective to just collect links to good language-specific references and tutorials on a Wiki page.
For all mainstream languages, such sources exist and are maintained regularly. If you try to create your own reference, you need to maintain it too. Fair enough, bash syntax is not going to change very often, but other languages do develop faster, so it is going to be a burden.
If you think about it, it's not that useful to have a tool that explains the syntax. Developers could just google for keywords instead of browsing a website in a similar fashion to http://www.codeweblog.com/source/ .
I believe that good comments will be by far more useful, plus there are tools to extract the documentation by using the comments (for example, HappyDoc does that for Python).
It is a very tricky thing. First of all by definition it can be proven that program that will "understand" any program down't exist. However, you can still use existing documentation. Maybe using tools like Doxygen can help you. You would need to document your code through comments and the documentation will be generated from them.
A language cannot be explained only through its syntax. The runtime environment plays a great part, together with the underlying philosophy of the language and libraies.
Moreover, syntax is not that complex for most common languages (given that code has been written with maintainability in mind).
Going on with bash example, you cannot deeply understand bash if you know nothing about processes & job control, environment variables, a big list of unix commands (tr, sort, cut, paste, sed, awk, find, ...) and many other features that don't appear in syntax.
If the tool produced
for starts a loop which iterates over
all values that follow in. In the
loop, you can access each value via
the variable $f. The loop body is
between do and done
it would be pretty worthless. This is exactly the kind of comment that trainee (human) programmers are told nver to write.

How to create a language these days?

I need to get around to writing that programming language I've been meaning to write. How do you kids do it these days? I've been out of the loop for over a decade; are you doing it any differently now than we did back in the pre-internet, pre-windows days? You know, back when "real" coders coded in C, used the command line, and quibbled over which shell was superior?
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily) but how do you build the compiler and standard libraries and so forth? What tools do you kids use these days?
One consideration that's new since the punched card era is the existence of virtual machines already bountifully provided with "standard libraries." Targeting the JVM or the .NET CLR instead of ye olde "language walled garden" saves you a lot of bootstrapping. If you're creating a compiled language, you may also find Java byte code or MSIL an easier compile target than machine code (of course, if you're in this for the fun of creating a tight optimising compiler then you'll see this as a bug rather than a feature).
On the negative side, the idioms of the JVM or CLR may not be what you want for your language. So you may still end up building "standard libraries" just to provide idiomatic interfaces over the platform facility. (An example is that every languages and its dog seems to provide its own method for writing to the console, rather than leaving users to manually call System.out.println or Console.WriteLine.) Nevertheless, it enables an incremental development of the idiomatic libraries, and means that the more obscure libraries for which you never get round to building idiomatic interfaces are still accessible even if in an ugly way.
If you're considering an interpreted language, .NET also has support for efficient interpretation via the Dynamic Language Runtime (DLR). (I don't know if there's an equivalent for the JVM.) This should help free you up to focus on the language design without having to worry so much about the optimisation of the interpreter.
I've written two compilers now in Haskell for small domain-specific languages, and have found it to be an incredibly productive experience. The parsec library makes playing with syntax easy, and interpreters are very simple to write over a Haskell data structure. There is a description of writing a Lisp interpreter in Haskell that I found helpful.
If you are interested in a high-performance backend, I recommend LLVM. It has a concise and elegant byte-code and the best x86/amd64 generating backend you can find. There is an optional garbage collector, and some experimental backends that target the JVM and CLR.
You can write a compiler in any language that produces LLVM bytecode. If you are adventurous enough to learn Haskell but want LLVM, there are a set of Haskell-LLVM bindings.
What has changed considerably but hasn't been mentioned yet is IDE support and interoperability:
Nowadays we pretty much expect Intellisense, step-by-step execution and state inspection "right in the editor window", new types that tell the debugger how to treat them and rather helpful diagnostic messages. The old "compile .x -> .y" executable is not enough to create a language anymore. The environment is nothing to focus on first, but affects willingness to adopt.
Also, libraries have become much more powerful, noone wants to implement all that in yet another language. Try to borrow, make it easy to call existing code, and make it easy to be called by other code.
Targeting a VM - as itowlson suggested - is probably a good way to get started. If that turns out a problem, it can still be replaced by native compilers.
I'm pretty sure you do what's always been done.
Write some code, and show your results to the world.
As compared to the olden times, there are some tools to make your job easier though. Might I suggest ANTLR for parsing your language grammar?
Speaking as someone who just built a very simple assembly like language and interpreter, I'd start out with the .NET framework or similar. Nothing can beat the powerful syntax of C# + the backing of the entire .NET community when attempting to write most things. From here i designed a simple bytecode format and assembly syntax and proceeeded to write my interpreter + assembler.
Like i said, it was a very simple language.
You should not accept wimpy solutions like using the latest tools. You should bootstrap the language by writing a minimal compiler in Visual Basic for Applications or a similar language, then write all the compilation tools in your new language and then self-compile it using only the language itself.
Also, what is the proposed name of the language?
I think recently there have not been languages with ALL CAPITAL LETTER names like COBOL and FORTRAN, so I hope you will call it something like MIKELANG with all capital letters.
Not so much an implementation but a design decision which effects implementation - if you make every statement of your language have a unique parse tree without context, you'll get something that it's easy to hand-code a parser, and that doesn't require large amounts of work to provide syntax highlighting for. Similarly simple things like using a different symbol for module namespaces and object namespaces ( unlike Java which uses . for both package and class namespaces ) means you can parse the code without loading every module that it refers to.
Standard libraries - include the equivalent of everything in C99 standard libraries other than setjmp. Add whatever else you need for your domain. Work out an easy way to do this, either something like SWIG or an in-line FFI such as Ruby's [can't remember module name] and Python's ctypes.
Building as much of the language in the language is an option, but projects which start out doing either give up (rubinius moved to using C++ for parts of its standard library), or is only for research purposes (Mozilla Narcissus)
I am actually a kid, haha. I've never written an actual compiler before or designed a language, but I have finished The Red Dragon Book, so I suppose I have somewhat of an idea (I hope).
It would depend firstly on the grammar. If it's LR or LALR I suppose tools like Bison/Flex would work well. If it's more LL, I'd use Spirit, which is a component of Boost. It allows you to write the language's grammar in C++ in an EBNF-like syntax, so no muddling around with code generators; the C++ compiler compiles the grammar for you. If any of these fail, I'd write an EBNF grammar on paper, and then proceed to do some heavy recursive descent parsing, which seems to work; if C++ can be parsed pretty well using RDP (as GCC does it), then I suppose with enough unit tests and patience you could write entire compilers using RDP.
Once I have a parser running and some sort of intermediate representation, it then depends on how it runs. If it's some bytecode or native code compiler, I'll use LLVM or libJIT to process it. LLVM is more suited for general compilation, but I like the libJIT API and documentation better. Alternatively, if I'm really lazy, I'll generate C code and let GCC do the actual compilation. Another alternative, is to target an existing VM, like Parrot or the JVM or the CLR. Parrot is the VM being designed for Perl. If it's just an interpreter, I'll walk the syntax tree.
A radical alternative is to use Prolog, which has syntax features which remarkably simulate EBNF. I have no experience with it though, and if I am not wrong (which I am almost certainly going to be), Prolog would be quite slow if used to parse heavy duty programming languages with a lot of syntactical constructs and quirks (read: C++ and Perl).
All this I'll do in C++, if only because I am more used to writing in it than C. I'd stay away from Java/Python or anything of that sort for the actual production code (writing compilers in C/C++ help to make it portable), but I could see myself using them as a prototyping language, especially Python, which I am partial towards. Of course, I've never actually done any of this before, so I'm not one to say.
On lambda-the-ultimate there's a link to Create Your Own Programming Language by Marc-André Cournoyer, which appears to describe how to leverage some modern tools for creating little languages.
Just to clarify, I mean, not how do you DESIGN a language (that I can figure out fairly easily)
Just a hint: Look at some quite different languages first, before designing a new languge (i.e. languages with a very different evaluation strategy). Haskell and Oz come to mind. Though you should also know Prolog and Scheme. A year ago I also was like "hey, let's design a language that behaves exactly as I want", but fortunatly I looked at those other languages first (or you could also say unfortunatly, because now I don't know how I want a language to behave anymore...).
Before you start creating a language you should read this:
Hanspeter Moessenboeck, The Art of Niklaus Wirth
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf
There's a big shortcut to implementing a language that I don't see in the other answers here. If you use one of Lukasiewicz's "unparenthesized" forms (ie. Forward Polish or Reverse Polish) you don't need a parser at all! With reverse polish, the dependencies go right-to-left so you simply execute each token as it's scanned. With forward polish, it's the reverse of that, so you actually execute the program "backwards", simplifying subexpressions until reaching the starting token.
To understand why this works, you should investigate the 3 primary tree-traversal algorithms: pre-order, in-order, post-order. These three traversals are the inverse of the parsing task that a language reader (i. parser) has to perform. Only the in-order notation "requires" a recursive decent to re-construct the expression tree. With the other two, you can get away with just a stack.
This may require more "thinking' and less "implementing".
BTW, if you've already found an answer (this question is a year old), you can post that and accept it.
Real coders still code in C. Just that it's a litte sharper.
Hmmm... language design? or writing a compiler?
If you want to write a compiler, you'd use Flex + Bison. (google)
Not an easy answer, but..
You essentially want to define a set of rules written in text (tokens) and then some parser that checks these rules and assembles them into fragments.
http://www.mactech.com/articles/mactech/Vol.16/16.07/UsingFlexandBison/
People can spend years on this, The above article talks about using two tools (Flex and Bison) That can be used to turn text into code you can feed to a compiler.
First I spent a year or so to actually think how the language should look like. At the same time I helped in developing Ioke (www.ioke.org) to learn language internals.
I have chosen Objective-C as implementation platform as it's fast (enough), simple and rich language. It also provides test framework so agile approach is a go. It also has a rich standard library I can build upon.
Since my language is simple on syntactic level (no keywords, only literals, operators and messages) I could go with Ragel (http://www.complang.org/ragel/) for building scanner. It's fast as hell and simple to use.
Now I have a working object model, scanner and simple operator shuffling plus standard library bootstrap code. I can even run a simple programs - as long as they fit in one file that is :)
Of course older techniques are still common (e.g. using Flex and Bison) many newer language implementations combine the lexing and parsing phase, by using a parser based on a parsing expression grammar (PEG). This works for recursive descent parsers created using combinators, or memoizing Packrat parsers. Many compilers are built using the Antlr framework also.
Use bison/flex which is the gnu version of yacc/lex. This book is extremely helpful.
The reason to use bison is it catches any conflicts in the language. I used it and it made my life many years easier (ok so i'm on my 2nd year but the first 6months was a few years ago writing it in C++ and the parsing/conflicts/results were terrible! :(.)
If you want to write a compiler obviously you need to read the Dragon Book ;)
Here is another good book that I have just read. It is practical and easier to understand than the Dragon Book:
http://www.amazon.co.uk/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=language+implementation+patterns&x=0&y=0
Mike --
If you're interested in an efficient native-code-generating compiler for Windows so you can get your bearings -- without wading through all the unnecessary widgets, gadgets, and other nonsense that clutter today's machines -- I recommend the Osmosian Order's Plain English development system. It includes a unique interface, a simplified file manager, a friendly text editor, a handy hexadecimal dumper, the compiler/linker (of course), and a wysiwyg page-layout application for documentation. Written entirely in Plain English, it is a quick download (less than a megabyte), small enough to understand in short order (about 25,000 lines of Plain English code, with just 4,000 in the compiler/linker), yet powerful enough to reproduce itself on a bottom-of-the-line Dell in less than three seconds. Really: three seconds. And it's free to all who write and ask for a copy, including the source code and and a rather humorous tongue-in-cheek 100-page manual. See www.osmosian.com for details on how to get a copy, or write to me directly with questions or comments: Gerry.Rzeppa#pobox.com

How do you create a computer or scripting language for an application?

Duplicate of:
Learning to write a compiler
Documentation on creating a programming language
Learning Resources on Parsers, Interpreters, and Compilers
Suggestions for writing a programming language?
Compiler-Programming: What are the most fundamental ingredients?
Are there some online resources about compiler principle?
and others I'm too lazy to find right now.
I'm not asking how to make an incredibly complex language. I just wanted to understand the basics. I would use c# as the underlying language. I know it's vague. I was hoping for something very basic to direct me.
I think I'm mostly interested in creating scripting languages. For example, I see people that write programs but then they have a scripting language for their application. I do not want to rewrite a windows scripting language. Say I had a text file reader and for some reason wanted a scripting language to automate something. I'm not sure how to ask.
Thank you.
EDIT - Thank you for the answers. I was looking at it more for the learning not the doing at the moment. I would probably use LUA, but I am trying to learn more about the concept in general.
You could take a look at LUA - I've used it to great success each time I asked myself the question "How would I automate insert task here in insert one of my apps here?"
Edit: Here are some examples (taken from the links page, admittedly, unwieldy Lua Wiki) on how you could embed Lua in your app:
Embedding Lua in C: Using Lua from inside C
Embedding a scripting language inside your C/C++ code
Embeddable scripting with Lua
You can use an existing language like Python or Javascript. For example, for Javascript, there is http://www.mozilla.org/rhino/ for Java apps. So typically you don't need to actually invent a new language, you would just provide a custom API for a language that already exists.
first you need a lexical parser like lex, then a syntax parser like bison.
then you can work with the syntax parser to create an interpreter to 'execute' the syntax results.
that's how the most scripting languages do.
p.s: another way is to practice by writing shells - shell scripts (bash, csh, or sh) are highly simplified scripting languages.
Some terminology is in order. You may be talking about a domain-specific language.
The two basic ways to transform a text file into an "executable": a compiler or an interpreter. An interpreter fits the scripting concept better, as it is easier to build and executes lines one at a time. Note that beyond a very simple language both writing a decent parser or a decent interpreter are non-trivial. The classic work on interpreters is SICP, but this is quite a hard book for beginners.
Scott Hanselman mentioned in his latest hanselminutes podcast that integrating IronPython to allow scripting of an existing application was very easy to do.
If you're interested in the end target of having your application be scriptable, then you should definitely consider using an existing language rather than attempting to write your own.
If you are more interested in the educational experience of writing your own scripting language, then you should go for it!
There's no need to create a new scripting language there are several eg. Rhino which is a widely used embeddable javascript (http://www.mozilla.org/rhino/) or Jscript from MS, that you can use directly in your product.
I've gone the way that you are asking - I once created my own scheme interpreter. This worked really well, but we re-invented a lot of technology and didn't really get a lot of additional benefit. We would have been far better off just using one of the scheme's that were available. I would not make that decision again even though it was fun and successful.

JetBrains Meta Programming System

Does anyone have any experience with the JetBrains Meta Programming System? Is MPS better than, say, developing a DSL in Ruby?
I don't have any personal experience with MPS, but it was mentioned on the recent episode of Herding Code with Markus Völter. Here's my understanding. MPS is a projection editor which means, instead of parsing and editing text, you are directly editing the underlining language data structure. As Markus mentions, MPS allows you to define your own language but you can also introduce new language concepts into existing languages. For example, you can add a new keyword to Java in a matter of minutes. MPS blurs the lines between internal and external DSLs and, with this, you get static typing and tool support which you wouldn't get when developing a DSL with a dynamic language like Ruby.
I work for JetBrains. I led the MPS project for several years, and now I am working on another project which is also completely written in MPS. According to my experience, MPS is worth using :-)
The answer to your question depends on many things. If you have Ruby based system, or want to create a language quickly, Ruby based internal DSL might be the best choice. If you want to generate Java, and have time to learn MPS, MPS might be the best case. You might also consider systems like XText, etc, which are middle ground between Ruby based DSLs and MPS.
MPS is an interesting beast and has a very huge potential. The idea is simply fantastic:
Inside an IDE (MPS) the user defines more or less visually his DSL(s)
the IDE allows to generate not just the language itself (the runtime or what it does), but also the "tool" aka a more or less full blown IDE, that he or other users can use to edit that new language.
That being said, unfortunately at least for the actual available MPS versions, Jetbrains failed to deliver the above(at least for me) because:
- it is very very hard and complicated to use - like it would not have been made by the authors of the easy to use IntelliJ.
- there are just too many concepts and "ways" the user needs to learn before it's able to do something useful, and still one gets the feeling of tapping into the dark.
- the IDE won't generate an IDE for you but something inside MPS too, a "Cell Based Editor" only (as of this version).
I tried MPS several times (cause the concept is so wonderful and promising), but unfortunately as of this moment I wasn't able to do something useful with it.
I might be to stupid for MPS, but in the time I was just figuring out basic about MPS, I was able to deliver fully blown usable Groovy based DSL.
I'm still following MPS's evolution, and hope that one day will deliver what did initially promised, since it's such a fantastic idea.
Macros in common lisp object system CLOS can alter the syntax quite dramatically, MPS is pretty similar to ANTLR, but it comes with a graphical editor. However MPS does not appreciate code fragmenting beyond compile and runtime and hence both MPS and ANTLR convolve around static metaprogramming problems. You still can't create constructs that will accept an arbitrary number of sub-construct arguments like Monadics, eg; a list comprehension builder that takes an arbitrary number of filters and list generators. To make that possible you need to programmatically alter the raw AST. More experienced Lispers can probably point out other transformations that can't be done.
I agree that documentation has been an issue for beginners when learning MPS. This was certainly true when the previous post was written (2010). Having experienced this first-hand, and finally having succeeded in understanding the system, I wrote The MPS Language Workbench (volumes I and II) to help smooth the learning curve. Feedback I get from readers is that the books are sufficient to help you get started (Volume I) and learn more advanced aspects of the MPS platform (Volume II).
Regarding the answer to the original question. Yes, I believe MPS has key advantages compared to developing a DSL in Ruby or Groovy. The reason is that as a language designer you
Have much better control over all aspects of the language,
The languages you build with MPS can include graphical notations and user interface elements, which make them a hybrid between a user interface and a text DSL script/program,
MPS helps migrate programs as you evolve your language (e.g., refactoring or other changes to the language can propagate to the end-users of the languages, using automatic DSL script/program migrations).
You can see a good example of DSL built with MPS in the MetaR project.

Resources