Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am trying to develop a simple C style scripting language for educational purposes.
Things I have done so far:
defined syntax of the language
written code for tokenizing the language.
The features that I want to include at the moment:
Arithmetic
Conditions
while loop (only)
At the moment I don't want to add other features to the language, as that would make development quite complex.
However, I don't know what the next steps in developing a language are. I have gone through many questions on SO, but they weren't very specific. Kindly guide me through this process.
I think these answers are helpful starting out:
How to go about making your own programming language
How to develop Programming language like Coffee script?
If you have defined an EBNF grammar, then you can use a tool like Bison to create a parser. Once that parser generates an abstract syntax tree, you can proceed to write an interpreter for your language that walks the tree.
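To make that last step a bit more concrete, here is a minimal sketch of a tree-walking interpreter in Python. It assumes a parser (generated or hand-written) has already produced the AST. The tuple-shaped nodes ("num", "var", "binop", "assign", "if", "while", "block") and the function names evaluate and execute are invented for this illustration and are not what Bison or any other tool emits.

```python
import operator

# Binary operators of the toy language (an assumption made for this sketch).
OPS = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.floordiv,
    "<": operator.lt, ">": operator.gt, "==": operator.eq,
}

def evaluate(node, env):
    """Evaluate an expression node and return its value."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "binop":
        _, op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    raise ValueError(f"unknown expression node: {kind}")

def execute(node, env):
    """Execute a statement node, updating the variable environment in place."""
    kind = node[0]
    if kind == "assign":
        _, name, expr = node
        env[name] = evaluate(expr, env)
    elif kind == "if":
        _, cond, then_branch, else_branch = node
        if evaluate(cond, env):
            execute(then_branch, env)
        elif else_branch is not None:
            execute(else_branch, env)
    elif kind == "while":
        _, cond, body = node
        while evaluate(cond, env):
            execute(body, env)
    elif kind == "block":
        for stmt in node[1]:
            execute(stmt, env)
    else:
        raise ValueError(f"unknown statement node: {kind}")

# Hand-built AST for:  i = 0; total = 0; while (i < 5) { total = total + i; i = i + 1; }
program = ("block", [
    ("assign", "i", ("num", 0)),
    ("assign", "total", ("num", 0)),
    ("while", ("binop", "<", ("var", "i"), ("num", 5)), ("block", [
        ("assign", "total", ("binop", "+", ("var", "total"), ("var", "i"))),
        ("assign", "i", ("binop", "+", ("var", "i"), ("num", 1))),
    ])),
])

env = {}
execute(program, env)
print(env)  # {'i': 5, 'total': 10}
```

Arithmetic, conditions and a while loop fit in two small functions, which is roughly the feature set the question asks for.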
A few years back I developed my own language too (it was an interpreted language), and once it was ready for others to try, I found that there were a few things I should have done earlier, or better:
Solve tons of simple programming problems in that language
Solve just a few of what I would call "hard core" programming problems with it (for example, Project Euler)
Write a complete language specification, a few examples, and a wiki or FAQ: anything that will spare you from answering the same questions all the time
Hope that helps.
Yes, having done this several times I know it's hard to know where to start. And you really don't need Lex or Yacc or Bison.
Make sure you have the definitions for your lexical elements (tokens) and grammar (in EBNF) nailed down.
Write a lexer for your tokens. It should be able to read a sample program emitting tokens and ending gracefully.
Write a symbol table. This is where you will put symbols as you recognise them. You can put reserved words and literals in here too, or not. It's a design choice.
Write a recursive descent parser, with a function for recognising each production in your grammar. You may need to modify your grammar to let you do this.
Write a tree/node manager for your AST (Abstract Syntax Tree). The parser adds nodes to the tree with links into the symbol table as it recognises productions.
Assuming you get this far, the final two steps are:
Walk the AST performing type and reference resolution, some kinds of optimisation, etc.
Walk the AST to emit code.
The last two steps turn out to be where most of the hard work is.
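The steps above can be sketched in miniature. This is only an illustration for arithmetic expressions, not a full language: the grammar (expr, term, factor), the names TOKEN_RE, tokenize, Parser and emit, and the PUSH/LOAD/ADD instruction set are all assumptions made up for the example.

```python
import re

# One regex alternative per token kind; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<num>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<op>[-+*/()]))")

def tokenize(source):
    """Lexer: turn source text into (kind, text) tokens, ending with ('eof', '')."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    tokens.append(("eof", ""))
    return tokens

class Parser:
    """Recursive descent parser: one method per grammar production.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUMBER | IDENT | '(' expr ')'
    """
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.symbols = {}  # symbol table: identifier -> slot number

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            node = ("binop", op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            node = ("binop", op, node, self.factor())
        return node

    def factor(self):
        kind, text = self.advance()
        if kind == "num":
            return ("num", int(text))
        if kind == "ident":
            self.symbols.setdefault(text, len(self.symbols))
            return ("var", text)
        if (kind, text) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

def emit(node, out):
    """Post-order AST walk that appends stack-machine instructions to out."""
    if node[0] == "num":
        out.append(("PUSH", node[1]))
    elif node[0] == "var":
        out.append(("LOAD", node[1]))
    else:
        _, op, left, right = node
        emit(left, out)
        emit(right, out)
        out.append(({"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}[op],))
    return out

parser = Parser(tokenize("a + 2 * (b - 3)"))
tree = parser.expr()
print(tree)
print(parser.symbols)   # {'a': 0, 'b': 1}
print(emit(tree, []))
```

Statements, conditions and while loops follow the same pattern: one more parser method and one more AST node kind per production.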
You will need some specific references, and what you choose depends on your level, what you like to read, and what language you like to write in. The Dragon Book is an obvious choice, but there are many others. I suggest you look here: Learning to write a compiler.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 11 years ago.
Disclaimer: Yes I know this will take 3 years, at least.
I am looking forward to writing a new interpreted programming language. I have a quite solid idea of what I want in terms of dynamicness, syntax, object model, etc, etc.
Now that I have the idea, I have a few questions before I start:
1. Should I begin by writing the full specification and then implement it, or write both side by side?
2. I'm still undecided between C and C++. C++ would allow for a cleaner design and faster development, while C would (maybe) ensure portability to more platforms (microprocessors?). Performance is a must.
3. Should I try to interest people in the project before the first working prototype so they can cooperate (the end product will be under a liberal license anyway), or keep working alone until I have something that runs?
4. How modular should it be? I am sure that I won't start with a bytecode interpreter, but with something slower and easier to implement first, so modularity is a must in order to be able to extend it later. I guess overdoing it will hamper performance and clarity, though.
The answers to your questions depend largely on why you're doing this, i.e. your primary reason. Are you trying to create the next Ruby, or is this a learning exercise?
1. Specification: If this is a personal project, this is not as important. PHP gets a bad rap for having been developed "on the fly," yet many people use it every day. A more complete spec will probably help get people involved if/when you want help.
2. If you want cross-platform and performance, C is the way to go.
3. If you want people to join in, prove something first. Write a killer-cool application with your language and blog/talk about why your language is different/special/better.
4. Modularity of what, the language itself or the compiler? If you want to extend the language, a good spec will help (see #1). The compiler should be designed with all the best practices in mind, which should help make it extensible.
I hear the Dragon Book is good for learning to develop compilers.
Your specification will be broken unless you write it hand-in-hand with the implementation.
If you think C++ would give you cleaner design and faster development, you should probably use it.
You will have difficulty getting anyone interested in a project unless there is something that runs and demonstrates what is unique about your language.
If you think your language will ever require a byte-code interpreter (and you do say "Performance is a must") you should investigate the capabilities of existing byte-code interpreters before you finalize your language design.
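For a feel of what a byte-code interpreter involves, here is a toy stack machine in Python. It is only a sketch: the opcode names (PUSH, LOAD, STORE, ADD, LT, JUMP, JUMP_IF_FALSE) and the run function are invented for this example and do not correspond to the JVM, .NET or any other existing interpreter.

```python
def run(code, variables=None):
    """Execute a list of (opcode, argument) pairs on a simple stack machine."""
    variables = {} if variables is None else variables
    stack, pc = [], 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "LT":
            right, left = stack.pop(), stack.pop()
            stack.append(left < right)
        elif op == "JUMP":
            pc = arg
        elif op == "JUMP_IF_FALSE":
            if not stack.pop():
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return variables

# Hypothetical bytecode for:  i = 0; while (i < 3) { i = i + 1; }
code = [
    ("PUSH", 0),             # 0
    ("STORE", "i"),          # 1
    ("LOAD", "i"),           # 2  loop condition starts here
    ("PUSH", 3),             # 3
    ("LT", None),            # 4
    ("JUMP_IF_FALSE", 11),   # 5  leave the loop (index 11 is past the end)
    ("LOAD", "i"),           # 6
    ("PUSH", 1),             # 7
    ("ADD", None),           # 8
    ("STORE", "i"),          # 9
    ("JUMP", 2),             # 10 back to the condition
]
print(run(code))  # {'i': 3}
```

Real byte-code interpreters fix details such as operand types, calling conventions and object layout, which is why it pays to look at them before the language design is final.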
I think you have set yourself too many goals. You say "performance is a must" but in a comment reply you say your goal is "to learn a lot about language design" and that it is "pretty unlikely" that you'll use it in a real project. New programming languages are created to solve problems; more precisely, they're created to help people express solutions to problems in better ways. Designing a language without using it seriously, intensely, continually is like writing software without any test cases: you're likely to wind up with something unusable.
If you want to try your hand at language design, then find a problem---one that you care about---that existing languages won't let you solve the way you want. Then do whatever you can to get a working implementation and start writing and running programs using it. You don't need a hand-crafted JIT compiler with a runtime written in highly bummed assembly code. If you target the JVM or .NET, you get a very high-performance GC, scalable threading system, libraries, and lots of other good stuff for free, even if it interferes with that awesome idea you had for ______.
On the other hand, if you just want to make something run fast, don't try to design a language at the same time. Just find one that you like, learn about implementation strategies, and see if you can do better.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm currently working on the topic of programming-languages and interpreter-design. I have already created several programming languages but couldn't reach my goal so far:
Create a programming-language which focuses on giving the programmer a good feeling when writing code in it. It should just be fun and/or interesting and in no case annoying to write something in it.
I get this feeling when writing code in Python. I sometimes get the opposite with PHP and in rare cases when having to reinvent some wheel in C++.
So I've tried to figure out some syntactical features to make programming in my new language fun, but I just can't find any.
Which concrete features, maybe mainly in terms of syntax, do/could make programming in a language fun?
Examples:
I find it enjoyable to program in Ruby because of its use of code blocks.
It would be nice if you could include exactly one example in your answer
Those features do not have to already exist in any language!
I'm doing this because I have experienced extreme rises in (my own) productivity when programming in languages I love (because of particular features).
You mentioned Ruby in your question. AFAIK, Ruby is the only programming language, for which Joy is an actual, stated, explicit design goal. (In fact, it is the only design goal.)
The reason that Yukihiro Matsumoto was able to design Ruby this way, is that he already knew and used tons of programming languages before he started designing Ruby and learned tons more in order to design Ruby. (Interestingly, he didn't know Python, and has said that he probably wouldn't have created Ruby if he did.)
Here's just a tiny fraction of the languages that matz has either used himself or looked at for inspiration (or, in some cases, for inspiration on what not to do):
CLU
Sather
Lisp
Scheme
Smalltalk
Perl
Python
Haskell
Scala
PHP
C
C++
Java
C#
Objective-C
Erlang
And I believe that this is one way that good programming languages can be designed (what Larry Wall calls postmodernist language design): Throw away everything that didn't work in the past, take everything that worked and combine that tastefully.
Of course, this requires that you actually know all those languages from which you want to "steal" and in particular, it requires that you know lots of very different languages with different paradigms, different concepts and different "feels", otherwise the idea pool from which you steal is rather small and inbred.
Consistency.
It's the feeling that you already know something when you use an API or feature you've never used before. It also makes you more productive, as you don't have to learn something new for the sake of it.
I think this is also one of the Ruby 'likes', in that if you follow the naming convention, things start to 'just work' without bindings and glue and suchlike.
For example, using the STL in C++, many of the algorithms are the same for all containers, even strings. That makes it nice to use, except for those parts that do not follow the same API (e.g. vector<bool>), where the difference is all the more noticeable.
Two things to keep in mind are orthogonality and the principle of least surprise.
A programming language should make it easy to write correct programs and difficult (if not impossible) to write incorrect programs. For instance, in Java
long x = 2000000000 + 2000000000;
overflows, while
long x = 2000000000L + 2000000000;
doesn't. Is this obvious? I don't think so. Does anyone ever want something to overflow? I don't think so.
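To see where the overflowed value comes from, the arithmetic can be replayed in Python. Python integers do not overflow, so this sketch simulates the wraparound of a signed 32-bit int by hand; to_int32 is a helper written for this illustration, not a library function.

```python
def to_int32(value):
    """Wrap an arbitrary integer into the signed 32-bit range (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

# Both operands are 32-bit ints, so the sum is wrapped before it is widened to long.
int_sum = to_int32(2000000000 + 2000000000)

# With one long operand the addition is done in 64 bits, which has room for the result.
long_sum = 2000000000 + 2000000000

print(int_sum)   # -294967296
print(long_sum)  # 4000000000
```

The true sum, 4000000000, is larger than the biggest 32-bit int (2147483647), so the first version ends up 2^32 too low.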
Hilarity.
http://lolcode.com/
Follow common practices (like using + for addition, & for bitwise/logical and)
Group logically similar code in namespaces
Have an extensive string processing library
Incorporate debugging facilities
For a cross-platform language, try to minimize platform differences as much as possible
A language feature that appears simple and easy to learn surprises and delights the programmer with its unexpected power. I nominate Haskell type classes :-)
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
So is a decompiler really a thing that gives the source of a compiled/interpreted piece of code? Because to me that sounds impossible. How would you get the names of the functions, variables, classes, etc. if it is compiled? Or am I misinterpreting the definition? How does it work? And what is the general principle behind making one?
You're right about your definition of a decompiler: it takes a compiled application and produces source code to match. However, it does not in most cases know the name and structure of variables/functions/classes--it just guesses. It analyzes the flow of the program and tries to find a way to represent that flow through a certain programming language, typically C. However, because the programming language of choice (C, in this example) is often at a higher level than the state of the underlying program (a binary executable), some parts of the program might be impossible to represent accurately; in this case, the decompiler would fail and you would need to use a disassembler. This is why many people like to obfuscate their code: it makes it much harder for decompilers to open it.
Building a decompiler is not a simple task. Basically, you have to take the application that you are decompiling (be it an executable or some other form of compiled application) and parse it into some kind of tree you can work with in memory. You would then analyze the flow of the program and try to find patterns that might suggest that an if statement/variable/function/etc. was used in a certain location in the code. It's all really just a guessing game: you'd have to know the patterns that the compiler produces in compiled code, then search for those patterns and replace them with equivalent human-readable source code.
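As a toy illustration of that pattern search, the sketch below works on a made-up instruction list rather than real machine code. The instruction names (MOV, CMP, JGE, LABEL), the assumed way a compiler lowers an if statement, and the decompile function are all hypothetical. Note that only register names survive, which is why the output has no meaningful variable names.

```python
def decompile(instructions):
    """Rebuild C-like source lines from a flat list of toy instructions."""
    lines, i = [], 0
    while i < len(instructions):
        ins = instructions[i]
        # Pattern: a compare followed by a jump-if-greater-or-equal to a later label
        # is how a (hypothetical) compiler might lower `if (a < b) { ... }`.
        if ins[0] == "CMP" and i + 1 < len(instructions) and instructions[i + 1][0] == "JGE":
            _, left, right = ins
            label = instructions[i + 1][1]
            end = instructions.index(("LABEL", label), i + 2)
            body = decompile(instructions[i + 2:end])
            lines.append(f"if ({left} < {right}) {{")
            lines.extend("    " + line for line in body)
            lines.append("}")
            i = end + 1
        elif ins[0] == "MOV":
            _, dest, src = ins
            lines.append(f"{dest} = {src};")
            i += 1
        else:
            raise ValueError(f"no pattern matches {ins}")
    return lines

program = [
    ("MOV", "r1", "0"),
    ("CMP", "r1", "r2"),
    ("JGE", "L1"),
    ("MOV", "r3", "1"),
    ("LABEL", "L1"),
    ("MOV", "r4", "r3"),
]
print("\n".join(decompile(program)))
```

A real decompiler has to cope with many compilers, optimisation levels and overlapping patterns, so the matching is far less clear-cut than here.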
This is all much simpler for higher-level platforms like Java or .NET, where you don't have to deal with assembly instructions, and things like variables are mostly taken care of for you. There, you mostly translate directly rather than guess. You might not have the exact variable/method names, but you can at least deduce the program structure fairly easily.
Disclaimer: I have never written a decompiler and thus don't know every detail of what I'm talking about. If you are really interested in writing a decompiler, you should get a book on the topic.
A decompiler basically takes the machine code and turns it back into the language it was written in. If I'm not mistaken, the decompiler needs to know what language the code was compiled from, otherwise it won't work.
The basic purpose of the decompiler is to get back to your source code; for example, one time my Java file got corrupted and the only thing I could do to bring it back was to use a decompiler (since the class file wasn't corrupted).
It works by deducing a "reasonable" (based on some heuristics) representation of what's in the object code. The degree of resemblance between what it produces and what was originally there tends to depend heavily upon how much information is contained in the binary it starts from. If you start with basically a "pure" binary, it's generally stuck with just making up "reasonable" names for the variables, such as using things like i, j and k for loop indexes, and longer names for most others.
On the other hand, a language that supports introspection needs to embed a great deal more information about variable names, types, etc., into the executable. In a case like this, decompiling can produce something much closer to the original, such as typically retaining the original names for functions, variables, etc. In such a case, the decompiler can often produce something quite similar to the original -- possibly losing little more than formatting and comments.
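CPython is one example of a runtime that keeps this kind of information around: a compiled function's code object still carries the function name and its local variable names. A small sketch follows; rectangle_area is just a sample function made up for the demonstration.

```python
def rectangle_area(width, height):
    area = width * height
    return area

# The compiled code object still knows the original names.
code = rectangle_area.__code__
print(code.co_name)      # rectangle_area
print(code.co_varnames)  # ('width', 'height', 'area')
```

With names like these available, a decompiler does not need to invent identifiers at all.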
That depends on what language you are decompiling. If you are decompiling something like C or C++, then the only information provided to you is function names and arguments (in DLLs). If you are dealing with Java, then the compiler usually inserts line numbers, variable names, field and method names, and so on. If there are no variable names, then you would get names like localInt1, localInt2, localException1, or whatever naming scheme the decompiler uses. It can also work out the spacing between lines, because of the line numbers.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've had a very odd learning experience in programming. I was sort of taught C++, but I didn't get a lot out of it. Here's what I did get out of it: headers and variable declaration. And I tried to teach myself PHP, from which I learned a lot. The problem is, a lot of my knowledge is scattered, random, and designed for specific situations.
So, my question is: What basics are there to programming in most languages?
The term "basics" implies a short list, but to be an effective programmer you have to learn a LOT of concepts. Once you do learn them, though, you'll be able to apply many of the same concepts across languages.
I've compiled a (long!) list of concepts that are important in several, if not most, programming languages.
Language syntax
  Keywords
  Naming conventions
  Operators
    Assignment
    Arithmetic
    String
    Other
  Literals
  Conditionals
    If/else
    Switch/case
    What is considered true or false (0? Empty String? Null?)
  Looping constructs
    for
    foreach/iteration
    while
    do-while
  Exception handling
  Importing/including code from other files
Type system
  Strong/weak
  Static/dynamic
Memory management
Scoping
  What scopes are available
  How overlapping scopes are handled
Language constructs/program organization
  Variables
  Methods
  Functions
  Classes
  Closures
  Packages/Modules/Namespaces
Data types and data structures
  Primitives
  Objects
  Arrays/Lists
  Maps/Hash/Associative Array
  Sets
  Enum
  Strings
    String concatenation
    String comparison and equality
    Substring
    Replacement
    Mutability
    Syntax for creating literal strings
Functions, Methods, Closures
  Method/function overloading
  Method/function overriding
  Parameter passing (pass-by-value/pass-by-reference)
  Returning values (single return/multiple return)
Language type (not mutually exclusive)
  Scripting
  Procedural
  Functional
  Object-oriented
Object-oriented principles
  Inheritance
    Classical vs Prototypical
    Single, Multiple, or something else
  Classes
  Static variables/global variables
  Access modifiers (private, public, protected)
API (or how to do basic stuff)
  Basic I/O
    Print to Standard Out
    Read from Standard In
  File I/O
    Read a file
    Write a file
    Check file attributes
  Use of regular expressions
  Referencing environment variables
  Executing system commands
  Threading model
    Create threads
    Thread-safety
    Synchronization primitives
  Templating
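As one small example of why the items in the list are worth checking per language, take "what is considered true or false". The sketch below shows Python's answer. Other languages differ: PHP, for instance, treats the string "0" as false, while Python treats every non-empty string as true. The candidates list is just a sample chosen for the demonstration.

```python
# Sample values to test in a boolean context (chosen for illustration only).
candidates = [0, 0.0, "", None, [], {}, "0", [0], -1]

for value in candidates:
    # bool() applies the same truth test that `if value:` would use.
    print(f"{value!r:>6} -> {bool(value)}")
```

The first six values come out false and the last three true, so an empty container and a container holding a zero behave differently.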
Another important thing not mentioned here yet is Object Oriented Programming: the ideas revolving around classes, inheritance, interfaces, etc.
A very important basic programming skill is the ability to think at many different levels of abstraction and to know when and which level of abstraction is the most appropriate for a particular programming task.
Pointers. Because so few people actually understand them.
Recursion and iteration, plus what the difference is, and when you use them.
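A minimal sketch of the difference, using factorial as the usual example (the function names are just for illustration):

```python
def factorial_recursive(n):
    """n! defined in terms of a smaller instance of the same problem."""
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_iterative(n):
    """The same result built up in a loop with an explicit accumulator."""
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result

print(factorial_recursive(10), factorial_iterative(10))  # 3628800 3628800
```

The recursive version mirrors the mathematical definition but uses one stack frame per call; the loop uses constant stack space. Recursion tends to pay off on naturally nested data such as trees.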
Get an algorithms book and work through the exercises -- you won't be disappointed.
Testing! (unit testing, integration testing, fixtures, mock objects, ...)
And not a programming skill, but surely a development skill: using revision control, and learning to commit sets of changes that handle one (or a few related) requirement, or bugfix, and will always result in a source tree that compiles without errors. This will teach you to organize your work :-)
And last but not least: English... :-) Again, this is not a programming skill, and I know some may disagree, but I feel that any programming language that uses English keywords, should also be programmed in English. So: use English variable names, and so on. I'd even say that the code comments should be in English, but I am sure even more people would disagree about that... So: learn how others describe their code, and adhere to that.
If I were you, I'd go back and learn the C programming language from the classic K&R book.
Find out what sort of thing you want to program for first - e.g. web, PC applications, Java based applications, mobile devices, reports, system interfaces, business to business interfaces, etc. then go from there.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I'm about to start an LFS-based Linux distro just as a hobby project. I plan on doing some very non-standard tasks, and most of it will involve changing almost all scripts in the distro (mainly init scripts, but I'll also be writing a simple set of package manager scripts). Since I'm going this far off the norm, and since I have never been a fan of dynamically typed languages (Perl, Python, Bash and the rest are good, but just not my forte), I was wondering if anyone knew of an interpreted language that actually has declared variables.
Typically, statically typed languages are compiled languages. I guess the reason is that static analysis of types is rather expensive and requires an in-depth look at all the code you're processing. After you've done that, it feels like a waste not to write all that information into a file so that you don't have to do it again next time. So you quickly end up with a compiled language.
On the other hand, turning a compiled language into a "not-compiled" one is rather easy. You just don't store the results of the compilation anywhere but execute them directly. One compiler I know that provides such a wrapper is GHC, the standard Haskell compiler. You can add #!/usr/bin/runhaskell to your source files and then directly execute them. And since you're planning to be far off the norm, Haskell seems like a perfect fit ;). But expect some rather large startup time for your scripts, because all the "compile time" analysis and optimization isn't free.
Haskell isn't made for shell scripting and it's a functional language, so if you've never seen it before, it might take some time to get used to. But it has very little syntactical overhead and the strength of functional languages is abstraction, so I don't see why you couldn't create a library that makes shell scripting fun. There is even some experimental Haskell shell, but it does seem to be more a proof-of-concept than a real solution.
Generally I would say the overhead of all the type analysis is significant, but I would suggest you pick your favorite statically typed compiled language and look for a wrapper like runhaskell to execute scripts written in it.
A quick Google search turns up F3, JavaFX Script, and Linden Scripting Language (scripting for Second Life). Contrary to the comment on the first answer, F# can be used as a scripting language: http://blogs.msdn.com/chrsmith/archive/2008/09/12/scripting-in-f.aspx
There are also Felix, Tuga, CFGScript, Talc, and AngelScript, and I'm guessing there are more than that quick search turned up.
Douglas
Groovy. By default it's dynamic, duck-typed. But also supports static typing.
F# provides a combination of "type safety, succinctness, performance, expressivity and scripting".
Look into the "typeset" command in your favorite shell. bash and ksh93 can both enforce integers and strings, use references (variable variables), etc. With ksh93, you can also do floating-point math and use objects with attributes. Static typing doesn't really buy you anything useful in init scripts and similar. You're primarily going to be reading files and running system commands, which is what the shell is really good at. Take some time with the O'Reilly "Learning the Korn Shell" book before deciding that all those other Unixes are stupidly designed... ;)