Our users can enter questions that get answered by students. Our users need an extensible, flexible way to define the correct answers to these questions (which are stored as a simple string).
I would like to expose a library of domain-specific functions that users can call on to describe the correct answer. E.g.:
exact_match("puppy") // means the correct answer is the string 'puppy'
or
contains("yesterday") // means any answer with the word 'yesterday' is correct
The naive implementation would involve eval'ing user-supplied strings in a sandboxed runtime (like a JavaScript VM or Ruby VM). But I'd like to go further and only allow specific functions to be called. Any other scripting would be discarded, so that:
puts("foo"); contains("yesterday")
would be illegal, since we don't expose or allow puts().
How can I constrain the execution environment to only run a whitelist of functions? Or is there a different approach to build this kind of external-facing DSL instead of trying to constrain an existing language to a subset of functions?
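One way to sketch the whitelist idea is to parse the rule instead of evaluating it, and accept only a single call to a known function with a string-literal argument. This is a minimal sketch in Python, not a hardened implementation; the names ALLOWED and compile_rule are made up for illustration, and the two checkers are assumptions about what exact_match and contains should mean.

```python
import ast

# Hypothetical whitelist: DSL name -> checker taking (expected, student_answer).
ALLOWED = {
    "exact_match": lambda expected, answer: answer == expected,
    "contains": lambda expected, answer: expected in answer,
}


def compile_rule(source):
    """Turn e.g. 'contains("yesterday")' into a predicate, or raise ValueError."""
    try:
        tree = ast.parse(source, mode="eval")  # accepts a single expression only
    except SyntaxError as exc:
        raise ValueError(f"not a single expression: {source!r}") from exc
    call = tree.body
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
        raise ValueError("rule must be one function call")
    if call.func.id not in ALLOWED:
        raise ValueError(f"function not allowed: {call.func.id}")
    if call.keywords or len(call.args) != 1:
        raise ValueError("rule takes exactly one positional argument")
    arg = call.args[0]
    if not (isinstance(arg, ast.Constant) and isinstance(arg.value, str)):
        raise ValueError("argument must be a string literal")
    check, expected = ALLOWED[call.func.id], arg.value
    # The user's text is never executed; only our own checker runs.
    return lambda answer: check(expected, answer)
```

With this sketch, compile_rule('contains("yesterday")') returns a predicate over student answers, while 'puts("foo"); contains("yesterday")' is rejected because it is not a single expression and puts is not in the whitelist.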
I would check out MPS by JetBrains if I were you; it's an open-source DSL creation tool. I have never used it myself, but from everything I have seen of it, it's very intuitive, and all of their other products are incredibly powerful.
Just because you're creating a DSL, that doesn't necessarily mean that you have to give the user the ability to enter the code in text.
The key to this is providing a list of method names and your special keyword for each of them (the "funCode" tag in the example below).
Create a mapping from keyword to code, let them define everything they need, and then let them use it. I would actually build my own XML parser so that it's not hackable, or at least not vulnerable to anything on a list of known zero-day exploits.
<strDefs>
<strDef><strNam>sickStr</strNam>
<strText>sick</strText><strNum>01</strNum></strDef>
<strDef><strNam>pupStr</strNam>
<strText>puppy</strText><strNum>02</strNum></strDef>
</strDefs>
<funDefs>
<funDef><funCode>pfContainsStr</funCode><funLabel>contains</funLabel>
<funNum>01</funNum></funDef>
<funDef><funCode>pfXact</funCode><funLabel>exact_match</funLabel>
<funNum>02</funNum></funDef>
</funDefs>
<queries>
<query><fun>01</fun><str>02</str>
</query>
</queries>
The XML above only illustrates the idea and the structure. In practice this would be presented through a user interface, so that the user is constrained. The user interface code that handles entry of the above data should run on your server, and users should only interact with it. Any code that runs in their browser is hackable, because they can just save the page, edit the HTML (and/or JavaScript), and run that, which is their code now, not yours anymore.
You can't really open the door (pandora's box) and allow just anyone to write just any code and have it evaluated / interpreted by the language parser, because some hacker is going to exploit it. You must lock down the strings, probably by having them enter them into your database in an earlier step, and each string gets its own token that YOU generate (a SQL Server primary key is very simple, usable, and secure), but give them a display representation so it's readable to them.
Then give them a list of methods / functions they can use, along with a token (a primary key can also serve here, perhaps with a kind of table prefix) and also a display representation (label).
If you have them put all of their labels into yet another table, you can have SQL make sure that all of their labels are unique to each other in the whole "language", and then you can allow them to try to define their expressions in the language they want to use. This has the advantage that foreign languages can be used, but you don't have to do anything terribly special.
An important piece would be the verify button, that would translate their expression into unique tokens and back again, checking that the round-trip was successful. If it wasn't successful, there's some kind of ambiguity, and you might be able to allow them an option to use the list of tokens as the source in that case.
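The token tables and the round-trip check can be sketched as follows. This is a minimal illustration in Python that uses plain dictionaries as stand-ins for the database tables; the token values mirror the XML sample above, and the function names are hypothetical.

```python
# Stand-ins for the database tables; tokens mirror the XML sample above.
FUNCTIONS = {"01": "contains", "02": "exact_match"}  # token -> label
STRINGS = {"01": "sick", "02": "puppy"}              # token -> text


def to_tokens(fun_label, text):
    """Translate a (label, text) pair into (fun_token, str_token)."""
    fun_tokens = [t for t, label in FUNCTIONS.items() if label == fun_label]
    str_tokens = [t for t, s in STRINGS.items() if s == text]
    if len(fun_tokens) != 1 or len(str_tokens) != 1:
        raise ValueError("unknown or ambiguous label")
    return fun_tokens[0], str_tokens[0]


def from_tokens(fun_token, str_token):
    """Translate tokens back into their display representation."""
    return FUNCTIONS[fun_token], STRINGS[str_token]


def verify(fun_label, text):
    """The 'verify button': succeed only if the round trip is lossless."""
    return from_tokens(*to_tokens(fun_label, text)) == (fun_label, text)
```

In a real system the uniqueness of labels would be enforced by the database (for example with a unique constraint) rather than by the list comprehensions used here.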
If you heavily rely on set-based logic for the underlying foundation of the language and your tables, you should be able to produce a coherent DSL that works. Many DSL creation problems are ones of integrity, where there are underlying assumptions that are contradictory, unintentionally mutually exclusive, or nonsensical. Truth is an unshakeable foundation. Anything else has a lie somewhere -- that you're trying to build on.
Sudoku is illustrative here. When you screw up a Sudoku, you often don't know that you have done so, and you keep building on that false foundation, until you get to the completion of the puzzle, and one whole string of assumptions disagrees with a different string of assumptions. They can't both be true. But you can't tell where you went wrong because you're too far away from the mistake and can not work backwards (easily). All steps taken look correct. A DSL, a database schema, and code, are all this way. Baby steps, that are double- and even triple-checked, and hopefully "correct by inspection", are the best way to "grow" a DSL, slowly, piece-by-piece. The best way to not have flaws is to not add them in the first place.
You don't want bugs in your DSL. Keep it spartan. KISS - Keep it simple, Spartacus! And I have personally found that keeping it set-based, if not overtly then under the covers, accomplishes this very well.
Finally, to be able to think this way, I've studied languages for a long time, and have cultivated a curiosity about how languages have come to be. Books are a good quality source of information, as they have a higher quality level than the internet, which is nevertheless also an indispensable source. Some of my favorite languages: Forth, Factor, SETL, F#, C#, Visual FoxPro (especially for its embedded SQL), T-SQL, Common LISP, Clojure, and probably my favorite, Dylan, an INFIX Lisp without parentheses that Apple experimented with and abandoned, with a syntax that seems to me reminiscent of Pascal, which I sort of liked. The language list is actually much longer than that (and I haven't written code for many of them -- just studied them or their genesis), but that's enough for now.
One of my favorite books, and immensely interesting for the "people" side of it, is "Masterminds of Programming: Conversations with the Creators of Major Programming Languages" (Theory in Practice, O'Reilly) by Federico Biancuzzi and Chromatic.
By the way, don't let them compromise the integrity of your DSL -- require that it is expressible set-based, and things should go well (IMHO). I hope it works out well for you. Add a comment to my answer telling me how it worked out, if you think of it. And don't forget to choose my answer if you think it's the best! We work hard for the money! ;-)
I've got a business app in C#, with unit tests. Can I increase the reliability and cut down on my testing time and expense by using NModel or Spec Explorer? Alternately, if I were to rewrite it in F# (or even Haskell), what kinds (if any) of reliability increase might I see?
Code Contracts? AsmL?
I realize this is subjective, and possibly argumentative, so please back up your answers with data, if possible. :) Or maybe a worked example, such as Eric Evans's Cargo Shipping System?
If we consider
Unit tests to be specific and strong theorems, checked quasi-statically on particular “interesting instances” and Types to be general but weak theorems (usually checked statically), and contracts to be general and strong theorems, checked dynamically for particular instances that occur during regular program operation.
(from B. Pierce's Types Considered Harmful),
where do these other tools fit?
We could pose the analogous question for Java, using Java PathFinder, Scala, etc.
Reliability is a function of several variables, including the general architecture of the software, the capability of the programmers, the quality of the requirements and the maturity of your configuration management and general QA processes. All these will affect the reliability of a rewrite.
Having said that, language certainly has a significant impact. All other things being equal:
Defects are roughly proportional to SLOC count. Languages that are terser see fewer coding errors. Haskell seems to require about 10% of the SLOC required by C++, Erlang about 14%, Java around 50%. I guess C# probably fits alongside Java on this scale.
Type systems are not created equal. Languages with type inference (e.g. Haskell and to a lesser extent OCaml) will have fewer defects. Haskell in particular will allow you to encode invariants in the type system so that a program will only compile if they can be proven true. Doing so requires extra work, so consider the trade-off on a case-by-case basis.
Managing state is a source of many defects. Functional languages, and especially pure functional languages, avoid this problem.
QuickCheck and its relatives allow you to write unit and system tests that verify general properties rather than individual test cases. This can greatly reduce the work required to test the code, especially if you are aiming for high test coverage metrics. A set of QuickCheck properties resembles a formal specification, and this concept fits nicely with Test Driven Development (write your tests first, and when the code passes them you are done).
Put all of these things together and you should have a powerful toolkit for driving quality through the development lifecycle. Unfortunately I'm not aware of any robust studies that actually prove this. All the factors I listed at the start would confound any real study, and you would need a lot of data before an unambiguous pattern showed itself.
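To make the QuickCheck idea above concrete, here is a hand-rolled stand-in written in Python. It is not QuickCheck and has none of its shrinking or generator machinery; the helper names (for_all, int_lists and the two properties) are invented for this sketch. It only shows the shape of the technique: state a general property and let random inputs hunt for a counterexample.

```python
import random


def for_all(gen, prop, trials=200, seed=0):
    """Check prop on `trials` random inputs; return a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def int_lists(rng):
    """Generate a list of 0 to 10 small integers."""
    return [rng.randint(-50, 50) for _ in range(rng.randint(0, 10))]


def sort_is_sane(xs):
    # Property: sorting is idempotent and preserves length.
    once = sorted(xs)
    return sorted(once) == once and len(once) == len(xs)


def claims_all_sorted(xs):
    # A deliberately false property, to show a counterexample being found.
    return xs == sorted(xs)
```

Calling for_all(int_lists, sort_is_sane) should find no counterexample, while for_all(int_lists, claims_all_sorted) returns some unsorted list that refutes the claim.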
Some comments on the quote, in the context of C# which is my "first" language:
Unit tests to be specific and strong theorems,
Yes, but they might not give you first order logic checks, like "for all x there exists a y where f(y)", more like "there exists a y, here it is (!), f(y)", aka setup, act, assert. ;)*
checked quasi-statically on particular “interesting instances” and Types to be general but weak theorems (usually checked statically),
Types are not necessarily that weak**.
and contracts to be general and strong theorems, checked dynamically for particular instances that occur during regular program operation. (from B. Pierce's Types Considered Harmful),
Unit Testing
Pex + Moles, I think, is getting closer to the first-order-logic type of checking, as it generates the edge cases and uses the Z3 solver for integer constraint solving. I would really like to see more Moles tutorials (Moles is for replacing implementations), specifically together with some sort of inversion-of-control container that can leverage whatever stub and real implementations of abstract classes and interfaces already exist.
Weak Types
In C# they are fairly weak, sure: generic typing/types allows you to add protocol semantics for one operation -- i.e. constraining types to be on interfaces, which are in some sense protocols which implementing classes agree to. However, the static typing of the protocol is just for one operation.
Example: Reactive Extensions API
Let's take Reactive Extensions as a discussion topic.
The contract required by the consumer, implemented by the observable.
// Matches the shape of System.IObserver<T> in the base class library.
interface IObserver<in T> {
    void OnNext(T value);
    void OnCompleted();
    void OnError(System.Exception error);
}
There is more to the protocol than this interface shows: methods called on an IObserver<in T> instance must follow this protocol:
Ordering:
OnNext{0,n} (OnCompleted | OnError){0, 1}
Furthermore, on another axis; time-dimension:
Time:
for all t|-> t:(method -> time). t(OnNext) < t(OnCompleted)
for all t|-> t:(method -> time). t(OnNext) < t(OnError)
i.e. no invocation to OnNext may be done after one to OnCompleted xor OnError.
Furthermore, the axis of parallelism:
Parallelism:
no invocation to OnNext may be done in parallel
i.e. there's a scheduling constraint that needs to be followed from implementers of IObservable. No IObservable may push from multiple threads at the same time, without first synchronizing the invocation around a context.
How do you test that this contract holds in an easy way? With C#, I don't know.
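One partial approach is a runtime wrapper that records the calls and checks them against the ordering grammar. The sketch below is in Python rather than C#, purely to illustrate the idea; CheckedObserver and Collector are hypothetical names, and it only covers the ordering rule OnNext{0,n} (OnCompleted | OnError){0,1}, not the parallelism constraint.

```python
import re

# One letter per call: N = OnNext, C = OnCompleted, E = OnError.
# Grammar from the text: OnNext{0,n} (OnCompleted | OnError){0,1}
PROTOCOL = re.compile(r"N*[CE]?")


class Collector:
    """A trivial observer that just remembers what it was given."""

    def __init__(self):
        self.items = []

    def on_next(self, value):
        self.items.append(value)

    def on_completed(self):
        pass

    def on_error(self, error):
        pass


class CheckedObserver:
    """Wraps an observer and raises if the call order breaks the grammar."""

    def __init__(self, inner):
        self._inner = inner
        self._trace = ""

    def _record(self, letter):
        if not PROTOCOL.fullmatch(self._trace + letter):
            raise AssertionError(
                f"protocol violation after {self._trace!r}: {letter}"
            )
        self._trace += letter

    def on_next(self, value):
        self._record("N")
        self._inner.on_next(value)

    def on_completed(self):
        self._record("C")
        self._inner.on_completed()

    def on_error(self, error):
        self._record("E")
        self._inner.on_error(error)
```

Wrapping every observer in tests would catch an observable that pushes a value after completing, but it is still a dynamic check on the instances that happen to occur, not a static guarantee.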
Consumer of API
From the consuming side of the application, there might be interactions between different contexts, such as Dispatcher, Background/other threads, and preferably we'd like to give guarantees that we don't end up in a deadlock.
Further, there is the requirement to handle deterministic disposing of the observables. It might not be clear all the time when an extension method's returned IObservable instance takes care of the method's arguments' IObservable instances and dispose those, so there's a requirement to know about the inner workings of the black box (alternatively you can let the references go in a "reasonable way" and the GC will take them at some point)
<<< Without Reactive Extensions, it's not necessarily easier:
There is the task pool on top of which the TPL is implemented. In the task pool we have a work-stealing queue of delegates to invoke on the worker threads.
Using the APM/begin/end or the async pattern (which queues to the task pool) could leave us open to callback-ordering bugs if we are mutating state. Also, the protocol of begin-invocations and their callbacks might be too convoluted and hence impossible to follow. I read a post-mortem the other day about a Silverlight project having problems seeing the business-logic forest for all the callback trees. Then there's the possibility of implementing the poor man's async monad: an IEnumerable with an async 'manager' iterating through it and calling MoveNext() every time a yielded IAsyncResult completes.
...and don't get me started on the nuuuumerous hidden protocols in IAsyncResult.
Another problem, without using Reactive Extensions, is the turtles problem: once you decide that you want an IO-blocking operation to be async, there need to be turtles all the way down to the p/invoke call that places the associated Win32 thread on an IO-completion port! If you have three layers and then some logic as well inside of your topmost layer, you need to make all three layers implement the APM pattern and fulfil the numerous contract obligations of IAsyncResult (or leave it partially broken) -- and there's no default public AsyncResult implementation in the base class library.
>>>
Working with exceptions from the interface
Even with the above memory-management + parallelism + contract + protocol items covered, there are still exceptions to be handled (not just received and forgotten about) in a good, reliable application. Let me give an example.
Context
Let's say that we find ourselves catching an exception from the contract/interface (not necessarily from reactive extensions' IObservable implementations here which have monadic exception handling rather than stack-frame based).
Hopefully the programmer was diligent and documented the possible exceptions, but there might be exception possibilities all the way down. If everything is correctly defined with code contracts at least we can be sure we are capable of catching a few of the exceptions, but many different causes may be lumped together inside of one exception type, and once an exception is thrown, how do we ensure that the work of the least possible size is rectified?
Aim
Say that we are pushing some data-record from a message-bus-consumer in our application, and receiving them on the background thread which decides what to do with them.
Example
A real-life example here could be Spotify, which I'm using every day.
My $100 router/access point throws in the towel at random times. I guess it has a cache-bug or some sort of stack overflow bug, as it happens every time I push more than 2 MB/s LAN/WAN data through it.
I have two NICs up: the wifi and the ethernet card. Ethernet's connection goes down. The sockets of Spotify's event-handler loop return an invalid code (I think it's C or C++) or throw exceptions. Spotify has to handle it, but it doesn't know what my network topology looks like (and there is no code to try all routes/update the routing table and hence the interface to be used); I still have a route to the internet, just not on the same interface. Spotify crashes.
A thesis
Exceptions are simply not semantic enough. I believe one can look at exceptions from the perspective of the Error monad in Haskell. We either continue or break: unwinding the stack, executing the catches, executing the finallys and praying we don't end up with race conditions on either other exception handlers or the GC, or async exceptions for outstanding IO-completion ports.
But when one of my interfaces' connection/route goes down, Spotify crashes or freezes.
Now we have SEH (Structured Exception Handling), but I think we will have SEH2 in the future, where each source of exception gives, along with the actual exception, a discriminated union (i.e. it should be statically typed to the linked library/assembly) of possible compensating actions. In this example, I could imagine Windows' network API telling the application to execute a compensating action to open the same socket on another interface, or to handle it on its own (like now), or to retry the socket with some kernel-managed retry policy. Each of these options is a case of a discriminated union type, so the implementer must use one of them.
I think that, when we have SEH2, it won't be called exceptions anymore.
^^
Anyway, I have digressed too much already.
Instead of reading my thoughts, listen to some of Erik Meijer's -- this is a very good round-table discussion between him and Joe Duffy. They discuss handling side-effects of calls. Or have a look at this search listing.
I'm finding myself in a position, today, as a consultant, of maintaining a system where stronger static semantics could be good, and I'm looking at tools which can give me the speed of programming + the correctness verification on a level which is accurate and precise. I haven't found it yet.
I simply think we are another 20 years, if not more, away from developer-oriented reliable computing. There are just too many languages, frameworks, marketing BS and concepts in the air right now for the ordinary developer to stay on top of things.
Why is this under the heading of "weak types"?
Because I find that the type system will be part of the solution; types need not be weak! Terse code and strong type systems (think Haskell) help programmers build reliable software.
Jon Skeet posted this blog post, in which he states that he is going to be asking why the dynamic parts of languages are so good. So I thought I'd preemptively ask on his behalf: What makes them so good?
The two fundamentally different approaches to types in programming languages are static types and dynamic types. They enable very different programming paradigms and they each have their own benefits and drawbacks.
I'd highly recommend Chris Smith's excellent article What to Know Before Debating Type Systems for more background on the subject.
From that article:
A static type system is a mechanism by which a compiler examines source code and assigns labels (called "types") to pieces of the syntax, and then uses them to infer something about the program's behavior. A dynamic type system is a mechanism by which a compiler generates code to keep track of the sort of data (coincidentally, also called its "type") used by the program. The use of the same word "type" in each of these two systems is, of course, not really entirely coincidental; yet it is best understood as having a sort of weak historical significance. Great confusion results from trying to find a world view in which "type" really means the same thing in both systems. It doesn't. The better way to approach the issue is to recognize that:
Much of the time, programmers are trying to solve the same problem with static and dynamic types.
Nevertheless, static types are not limited to problems solved by dynamic types.
Nor are dynamic types limited to problems that can be solved with static types.
At their core, these two techniques are not the same thing at all.
The main thing is that you avoid a lot of redundancy that comes from making the programmer "declare" this, that, and the other. A similar advantage could be obtained through type inference (Boo does that, for example) but not quite as cheaply and flexibly. As I wrote in the past...:
complete type checking or inference requires analysis of the whole program, which may be quite impractical -- and stops what Van Roy and Haridi, in their masterpiece "Concepts, Techniques and Models of Computer Programming", call "totally open programming". Quoting a post of mine from 2004: """ I love the explanations of Van Roy and Haridi, p. 104-106 of their book, though I may or may not agree with their conclusions (which are basically that the intrinsic difference is tiny -- they point to Oz and Alice as interoperable languages without and with static typing, respectively), all the points they make are good. Most importantly, I believe, the way dynamic typing allows real modularity (harder with static typing, since type discipline must be enforced across module boundaries), and "exploratory computing in a computation model that integrates several programming paradigms".
"Dynamic typing is recommended", they conclude, "when programs must be as flexible as possible". I recommend reading the Agile Manifesto to understand why maximal flexibility is crucial in most real-world application programming -- and therefore why, in said real world rather than in the more academic circles Dr. Van Roy and Dr. Haridi move in, dynamic typing is generally preferable, and not such a tiny issue as they make the difference to be. Still, they at least show more awareness of the issues, in devoting 3 excellent pages of discussion about it, pros and cons, than almost any other book I've seen -- most books have clearly delineated and preformed precedence one way or the other, so the discussion is rarely as balanced as that;).
I'd start with recommending reading Steve Yegge's post on Is Weak Typing Strong Enough, then his post on Dynamic Languages Strike Back. That ought to at least get you started!
Let's do a few advantage/disadvantage comparisons:
Dynamic Languages:
Type decisions can be changed with minimal code impact.
Code can be written/compiled in isolation. I don't need an implementation or even formal description of the type to write code.
Have to rely on unit tests to find any type errors.
Language is more terse. Less typing.
Types can be modified at runtime.
Edit and continue is much easier to implement.
Static Languages:
Compiler tells of all type errors.
Editors can offer prompts like Intellisense much more richly.
More strict syntax which can be frustrating.
More typing is (usually) required.
Compiler can do better optimization if it knows the types ahead of time.
To complicate things a little more, consider that languages such as C# are going partially dynamic (in feel anyway) with the var construct or languages like Haskell that are statically typed but feel dynamic because of type inference.
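A few of the dynamic-language points in the lists above can be seen in a short Python sketch. The class and function names are made up for illustration: greet is written without any declared type, works with any object that has a speak() method, and a mismatch only shows up when the offending line actually runs.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def greet(thing):
    # No declared type: anything with a speak() method will do.
    return thing.speak().upper()


def try_greet(thing):
    # The mismatch is only discovered when the call actually executes.
    try:
        return greet(thing)
    except AttributeError as exc:
        return f"runtime error: {exc}"
```

Here greet(Duck()) and greet(Robot()) both work although the two classes share no declared interface, while try_greet(42) reports the error at runtime; a statically typed language would typically reject that call at compile time.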
Dynamic programming languages basically do things at runtime that other languages do at Compile time. This includes extension of the program, by adding new code, by extending objects and definitions, or by modifying the type system, all during program execution rather than compilation.
http://en.wikipedia.org/wiki/Dynamic_programming_language
Here are some common examples
http://en.wikipedia.org/wiki/Category:Dynamic_programming_languages
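As a small illustration of "extending objects and definitions" during execution, the following Python sketch adds a method to a class after an instance already exists and then builds a new type at runtime. The Account and Savings names are invented for the example.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance


acct = Account(100)


def deposit(self, amount):
    self.balance += amount
    return self.balance


# Extend the class after instances already exist;
# existing instances pick the new method up immediately.
Account.deposit = deposit

# Create a brand-new type while the program is running.
Savings = type("Savings", (Account,), {"rate": 0.02})
```

After these lines run, acct.deposit(50) works even though acct was created before deposit existed, and Savings behaves like any class declared in source code.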
And to answer your original question:
They're slow, you need to use a basic text editor to write them (no IntelliSense or code prompts), and they tend to be a big pain in the ass to write and maintain. BUT the most famous one (JavaScript) runs on practically every browser in the world - that's a good thing, I guess. Let's call it 'broad compatibility'. I think you could probably get a dynamic language interpreter for most operating systems, but you certainly couldn't get a compiler for non-dynamic languages for most operating systems.
So I'm currently working on a new programming language. Inspired by ideas from concurrent programming and Haskell, one of the primary goals of the language is management of side effects. More or less, each module will be required to specify which side effects it allows. So, if I were making a game, the graphics module would have no ability to do IO. The input module would have no ability to draw to the screen. The AI module would be required to be totally pure. Scripts and plugins for the game would have access to a very restricted subset of IO for reading configuration files. Et cetera.
However, what constitutes a side effect isn't clear cut. I'm looking for any thoughts or suggestions on the subject that I might want to consider in my language. Here are my current thoughts.
Some side effects are blatant. Whether it's printing to the user's console or launching your missiles, any action that reads or writes a user-owned file or interacts with external hardware is a side effect.
Others are more subtle and these are the ones I'm really interested in. These would be things like getting a random number, getting the system time, sleeping a thread, implementing software transactional memory, or even something very fundamental such as allocating memory.
Unlike other languages built to control side effects (looking at you Haskell), I want to design my language to be pragmatic and practical. The restrictions on side effects should serve two purposes:
To aid in the separations of concerns. (No one module can do everything).
To sandbox each module in the application. (Any module could be used as a plugin)
With that in mind, how should I handle "pseudo"-side effects, like random numbers and sleeping, as I mention above? What else might I have missed? In what ways might I manage memory usage and time as resources?
The problem of how to describe and control effects is currently occupying some of the best scientific minds in programming languages, including people like Greg Morrisett of Harvard University. To my knowledge, the most ambitious pioneering work in this area was done by David Gifford and Pierre Jouvelot in the FX programming language started in 1987. The language definition is online, but you may get more insight into the ideas by reading their 1991 POPL paper.
This is a really interesting question, and it represents one of the stages I've gone through and, frankly, moved beyond.
I remember seminars in which Carl Hewitt, in talking about his Actors formalism, discussed this. He defined it in terms of a method giving a response that was solely a function of its arguments, or that could give different answers at different times.
I say I moved beyond this because it makes the language itself (or the computational model) the main subject, as opposed to the problem(s) it is supposed to solve. It is based on the idea that the language should have a formal underlying model so that its properties are easy to verify. That is fine, but still remains a distant goal, because there is still no language (to my knowledge) in which the correctness of something as simple as bubble sort is easy to prove, let alone more complex systems.
The above is a fine goal, but the direction I went was to look at information systems in terms of information theory. Specifically, assuming a system starts with a corpus of requirements (on paper or in somebody's head), those requirements can be transmitted to a program-writing machine (whether automatic or human) to generate source code for a working implementation. THEN, as changes occur to the requirements, the changes are processed through as delta changes to the implementation source code.
Then the question is: What properties of the source code (and the language it is encoded in) facilitate this process? Clearly it depends on the type of problem being solved, what kinds of information go in and out (and when), how long the information has to be retained, and what kind of processing needs to be done on it. From this one can determine the formal level of the language needed for that problem.
I realized the process of cranking through delta changes of requirements to source code is made easier as the format of the code comes more to resemble the requirements, and there is a nice quantitative way to measure this resemblance, not in terms of superficial resemblance, but in terms of editing actions. The well-known technology that best expresses this is domain specific languages (DSL). So I came to realize that what I look for most in a general-purpose language is the ability to create special-purpose languages.
Depending on the application, such special-purpose languages may or may not need specific formal features like functional notation, side-effect control, parallelism, etc. In fact, there are many ways to make a special-purpose language, from parsing, interpreting, compiling, down to just macros in an existing language, down to simply defining classes, variables, and methods in an existing language. As soon as you declare a variable or subroutine you've created new vocabulary and thus a new language in which to solve your problem. In fact, in this broad sense, I don't think you can solve any programming problem without being, at some level, a language designer.
So best of luck, and I hope it opens up new vistas for you.
A side effect is having any effect on anything in the world other than returning a value, i.e. mutating something that could be visible in some way outside the function.
A pure function neither depends on nor affects any mutable state outside the scope of that invocation of the function, which means that the function's output depends only on constants and its inputs. This implies that if you call a function twice with the same arguments, you are guaranteed to get the same result both times, regardless of how the function is written.
If you have a function that modifies a variable that it has been passed, that modification is a side effect because it's visible output from the function other than the return value. A void function that is not a no-op must have side effects, because it has no other way of affecting the world.
The function could have a private variable only visible to that function that it reads and modifies, and calling it would still have the side effect of changing the way the function behaves in the future. Being pure means having exactly one channel for output of any kind: the return value.
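The difference between a pure function and one with hidden private state can be shown in a few lines of Python. The names add and make_counter are illustrative only.

```python
def add(a, b):
    # Pure: the result depends only on the arguments.
    return a + b


def make_counter():
    count = 0  # private state, invisible to callers

    def next_id():
        nonlocal count
        count += 1  # side effect: changes what the next call returns
        return count

    return next_id
```

Calling add(2, 3) gives the same answer every time, whereas two calls to the same next_id function return different values even though no argument changed, which is exactly the hidden channel described above.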
It is possible to generate random numbers purely, but you have to pass around the random seed manually. Most random functions keep a private seed value that is updated each time it's called so that you get a different random number each time. Here's a Haskell snippet using System.Random:
import System.Random (StdGen, random, randomR)

-- Assumes Color has a Random instance.
randomColor :: StdGen -> (Color, Int, StdGen)
randomColor gen1 = (color, intensity, gen3)
  where
    (color, gen2)     = random gen1
    (intensity, gen3) = randomR (1, 100) gen2
The random functions each return the randomized value and a new generator with a new seed (based on the previous one). To get a new value each time, the chain of new generators (gen1, gen2, gen3) has to be passed along, and the last one must be returned to the caller. Implicit generators just use an internal variable to store the successive generator values in the background.
Doing this manually is a pain, and in Haskell you can use a state monad to make it a lot easier. You'll want to implement something less pure or use a facility like monads, arrows or uniqueness values to abstract it away.
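The same seed-threading idea can be sketched outside Haskell. The Python version below uses a small linear congruential generator with commonly quoted textbook constants; the generator, the colour list and the function names are assumptions made for illustration, not part of any library.

```python
# Constants of a commonly quoted linear congruential generator; illustrative only.
A, C, M = 1103515245, 12345, 2**31


def next_random(seed):
    """Pure: returns (value, new_seed) and mutates nothing."""
    new_seed = (A * seed + C) % M
    return new_seed >> 16, new_seed


def random_color(seed):
    """Mirrors the Haskell snippet: thread the seed through each draw."""
    raw1, seed2 = next_random(seed)
    raw2, seed3 = next_random(seed2)
    color = ("red", "green", "blue")[raw1 % 3]
    intensity = 1 + raw2 % 100  # always in 1..100
    return color, intensity, seed3
```

Because no state is hidden, random_color gives the same triple for the same seed every time; to get a fresh value the caller has to pass the returned seed into the next call, which is the bookkeeping a state monad would hide.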
Getting the system time is impure because the time could be different each time you ask.
Sleeping is fuzzier because sleep doesn't affect the result of the function, and you could always delay execution with a busy loop, and that wouldn't affect purity. The thing is that sleeping is done for the sake of something else, which IS a side effect.
Memory allocation in pure languages has to happen implicitly, because explicitly allocating and freeing memory are side effects if you can do any kind of pointer comparisons. Otherwise, creating two new objects with the same parameters would still produce different values because they would have different identities (e.g. not be equal by Java's == operator).
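A small Python illustration of that identity problem, where Python's `is` plays roughly the role of Java's == on objects:

```python
a = [1, 2, 3]
b = [1, 2, 3]

same_value = (a == b)     # True: built from the same parameters
same_identity = (a is b)  # False: two separate allocations
```

As soon as identity can be observed like this, "create a list" stops being a pure operation, since two calls with the same arguments are distinguishable.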
I know I've rambled on a bit, but hopefully that explains what side effects are.
Give a serious look to Clojure, and their use of software transactional memory, agents, and atoms to keep side effects under control.
It seems that everybody is jumping on the dynamic, non-compiled bandwagon lately. I've mostly only worked in compiled, statically typed languages (C, Java, .Net). The experience I have with dynamic languages is stuff like ASP (VBScript), JavaScript, and PHP. Using these technologies has left a bad taste in my mouth when thinking about dynamic languages. Things that usually would have been caught by the compiler such as misspelled variable names and assigning a value of the wrong type to a variable don't occur until runtime. And even then, you may not notice an error, as it just creates a new variable, and assigns some default value. I've also never seen intellisense work well in a dynamic language, since, well, variables don't have any explicit type.
What I want to know is: what do people find so appealing about dynamic languages? What are the main advantages in terms of things that dynamic languages allow you to do that can't be done, or are difficult to do, in compiled languages? It seems to me that we decided a long time ago that things like uncompiled ASP pages throwing runtime exceptions were a bad idea. Why is there a resurgence of this type of code? And why does it seem, to me at least, that Ruby on Rails doesn't really look like anything you couldn't have done with ASP 10 years ago?
I think the reason is that people are used to statically typed languages that have very limited and inexpressive type systems. These are languages like Java, C++, Pascal, etc. Instead of going in the direction of more expressive type systems and better type inference, (as in Haskell, for example, and even SQL to some extent), some people like to just keep all the "type" information in their head (and in their tests) and do away with static typechecking altogether.
What this buys you in the end is unclear. There are many misconceived notions about typechecking; the ones I most commonly come across are these two.
Fallacy: Dynamic languages are less verbose. The misconception is that type information equals type annotation. This is totally untrue. We all know that type annotation is annoying. The machine should be able to figure that stuff out. And in fact, it does in modern compilers. Here is a statically typed QuickSort in two lines of Haskell (from haskell.org):
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
And here is a dynamically typed QuickSort in LISP (from swisspig.net):
(defun quicksort (lis) (if (null lis) nil
(let* ((x (car lis)) (r (cdr lis)) (fn (lambda (a) (< a x))))
(append (quicksort (remove-if-not fn r)) (list x)
(quicksort (remove-if fn r))))))
The Haskell example falsifies the hypothesis "statically typed, therefore verbose". The LISP example falsifies the hypothesis "verbose, therefore statically typed". There is no implication in either direction between typing and verbosity. You can safely put that out of your mind.
Fallacy: Statically typed languages have to be compiled, not interpreted. Again, not true. Many statically typed languages have interpreters. There's the Scala interpreter, the GHCi and Hugs interpreters for Haskell, and of course SQL has been both statically typed and interpreted for longer than I've been alive.
You know, maybe the dynamic crowd just wants freedom to not have to think as carefully about what they're doing. The software might not be correct or robust, but maybe it doesn't have to be.
Personally, I think that those who would give up type safety to purchase a little temporary liberty, deserve neither liberty nor type safety.
Don't forget that you need to write 10x code coverage in unit tests to replace what your compiler does :D
I've been there, done that with dynamic languages, and I see absolutely no advantage.
When reading other people's responses, it seems that there are more or less three arguments for dynamic languages:
1) The code is less verbose.
I don't find this valid. Some dynamic languages are less verbose than some static ones, but F# is statically typed and the static typing there does not add much, if any, code. It is implicitly typed, though, but that is a different thing.
2) "My favorite dynamic language X has my favorite functional feature Y, so therefore dynamic is better". Don't mix up functional and dynamic (I can't understand why this has to be said).
3) In dynamic languages you can see your results immediately. News: You can do that with C# in Visual Studio (since 2005) too. Just set a breakpoint, run the program in the debugger and modify the program while debugging. I do this all the time and it works perfectly.
Myself, I'm a strong advocate for static typing, for one primary reason: maintainability. I have a system with a couple 10k lines of JavaScript in it, and any refactoring I want to do will take like half a day since the (non-existent) compiler will not tell me what that variable renaming messed up. And that's code I wrote myself, IMO well structured, too. I wouldn't want the task of being put in charge of an equivalent dynamic system that someone else wrote.
I guess I will be massively downvoted for this, but I'll take the chance.
VBScript sucks, unless you're comparing it to another flavor of VB.
PHP is ok, so long as you keep in mind that it's an overgrown templating language.
Modern JavaScript is great. Really. Tons of fun. Just stay away from any scripts tagged "DHTML".
I've never used a language that didn't allow runtime errors. IMHO, that's largely a red herring: compilers don't catch all typos, nor do they validate intent. Explicit typing is great when you need explicit types, but most of the time, you don't. Search for the questions here on generics or the one about whether or not using unsigned types was a good choice for index variables - much of the time, this stuff just gets in the way, and gives folks knobs to twiddle when they have time on their hands.
But I haven't really answered your question. Why are dynamic languages appealing? Because after a while, writing code gets dull and you just want to implement the algorithm. You've already sat and worked it all out in pen, diagrammed potential problem scenarios and proved them solvable, and the only thing left to do is code up the twenty lines of implementation... and two hundred lines of boilerplate to make it compile. Then you realize that the type system you work with doesn't reflect what you're actually doing, but someone else's ultra-abstract idea of what you might be doing, and you've long ago abandoned programming for a life of knickknack tweaking so obsessive-compulsive that it would shame even fictional detective Adrian Monk.
That's when you start looking seriously at dynamic languages.
I am a full-time .Net programmer fully entrenched in the throes of statically-typed C#. However, I love modern JavaScript.
Generally speaking, I think dynamic languages allow you to express your intent more succinctly than statically typed languages as you spend less time and space defining what the building blocks are of what you are trying to express when in many cases they are self evident.
I think there are multiple classes of dynamic languages, too. I have no desire to go back to writing classic ASP pages in VBScript. To be useful, I think a dynamic language needs to support some sort of collection, list or associative construct at its core so that objects (or what pass for objects) can be expressed and allow you to build more complex constructs. (Maybe we should all just code in LISP ... it's a joke ...)
I think in .Net circles, dynamic languages get a bad rap because they are associated with VBScript and/or JavaScript. VBScript is just recalled as a nightmare for many of the reasons Kibbee stated -- anybody remember enforcing type in VBScript using CLng to make sure you got enough bits for a 32-bit integer? Also, I think JavaScript is still viewed as the browser language for drop-down menus that is written a different way for all browsers. In that case, the issue is not the language, but the various browser object models. What's interesting is that the more C# matures, the more dynamic it starts to look. I love lambda expressions, anonymous objects and type inference. It feels more like JavaScript every day.
Here is a statically typed QuickSort in two lines of Haskell (from haskell.org):
qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
And here is a dynamically typed QuickSort in LISP (from swisspig.net):
(defun quicksort (lis) (if (null lis) nil
(let* ((x (car lis)) (r (cdr lis)) (fn (lambda (a) (< a x))))
(append (quicksort (remove-if-not fn r)) (list x)
(quicksort (remove-if fn r))))))
I think you're biasing things with your choice of language here. Lisp is notoriously paren-heavy. A closer equivalent to Haskell would be Python.
def qsort(L):
    if len(L) <= 1: return L
    return qsort([lt for lt in L[1:] if lt < L[0]]) + [L[0]] + qsort([ge for ge in L[1:] if ge >= L[0]])
Python code from here
For me, the advantage of dynamic languages is how much more readable the code becomes due to less code and functional techniques like Ruby's blocks and Python's list comprehensions.
But then I kind of miss the compile-time checking (typos do happen) and IDE auto-complete. Overall, the smaller amount of code and the readability pay off for me.
Another advantage is the usually interpreted/non compiled nature of the language. Change some code and see the result immediately. It's really a time saver during development.
Last but not least, I like the fact that you can fire up a console and try out something you're not sure of, like a class or method that you've never used before and see how it behaves. There are many uses for the console and I'll just leave that for you to figure out.
Your arguments against dynamic languages are perfectly valid. However, consider the following:
Dynamic languages don't need to be compiled: just run them. You can even reload the files at run time without restarting the application in most cases.
Dynamic languages are generally less verbose and more readable: have you ever looked at a given algorithm or program implemented in a static language, then compared it to the Ruby or Python equivalent? In general, you're looking at a reduction in lines of code by a factor of 3. A lot of scaffolding code is unnecessary in dynamic languages, and that means the end result is more readable and more focused on the actual problem at hand.
Don't worry about typing issues: the general approach when programming in dynamic languages is not to worry about typing: most of the time, the right kind of argument will be passed to your methods. And once in a while, someone may use a different kind of argument that just happens to work as well. When things go wrong, your program may be stopped, but this rarely happens if you've done a few tests.
I too found it a bit scary to step away from the safe world of static typing at first, but for me the advantages by far outweigh the disadvantages, and I've never looked back.
I believe that the "new found love" for dynamically-typed languages has less to do with whether statically-typed languages are better or worse - in the absolute sense - than with the rise in popularity of certain dynamic languages. Ruby on Rails was obviously a big phenomenon that caused the resurgence of dynamic languages. The thing that made Rails so popular and created so many converts from the static camp was mainly very terse and DRY code and configuration. This is especially true when compared to Java web frameworks which required mountains of XML configuration. Many Java programmers - smart ones too - converted over, and some even evangelized Ruby and other dynamic languages. For me, three distinct features allow dynamic languages like Ruby or Python to be more terse:
1) Minimalist syntax - the big one is that type annotations are not required, but also the language designer designed the language from the start to be terse.
2) Inline function syntax (or the lambda) - the ability to write inline functions and pass them around as variables makes many kinds of code more brief. In particular this is true for list/array operations. The roots of this idea are obviously in LISP.
3) Metaprogramming - metaprogramming is a big part of what makes Rails tick. It gave rise to a new way of refactoring code that allowed the client code of your library to be much more succinct. This also originates from LISP.
None of these three features is exclusive to dynamic languages, but they certainly are not present in the popular static languages of today: Java and C#. You might argue C# has #2 in delegates, but I would argue that it's not widely used at all - such as with list operations.
As for more advanced static languages... Haskell is a wonderful language, it has #1 and #2, and although it doesn't have #3, its type system is so flexible that you will probably not find the lack of meta to be limiting. I believe you can do metaprogramming in OCaml at compile time with a language extension. Scala is a very recent addition and is very promising. F# for the .NET camp. But users of these languages are in the minority, and so they didn't really contribute to this change in the programming languages landscape. In fact, I very much believe the popularity of Ruby affected the popularity of languages like Haskell, OCaml, Scala, and F# in a positive way, in addition to the other dynamic languages.
Personally, I think it's just that most of the "dynamic" languages you have used just happen to be poor examples of languages in general.
I am way more productive in Python than in C or Java, and not just because C and Java make you do the edit-compile-link-run dance. I'm getting more productive in Objective-C, but that's probably more due to the framework.
Needless to say, I am more productive in any of these languages than PHP. Hell, I'd rather code in Scheme or Prolog than PHP. (But lately I've actually been doing more Prolog than anything else, so take that with a grain of salt!)
My appreciation for dynamic languages is very much tied to how functional they are. Python's list comprehensions, Ruby's closures, and JavaScript's prototyped objects are all very appealing facets of those languages. All also feature first-class functions--something I can't see living without ever again.
I wouldn't categorize PHP and VB (script) in the same way. To me, those are mostly imperative languages with all of the dynamic-typing drawbacks that you suggest.
Sure, you don't get the same level of compile-time checks (since there ain't a compile time), but I would expect static syntax-checking tools to evolve over time to at least partially address that issue.
One of the advantages pointed out for dynamic languages is to just be able to change the code and continue running. No need to recompile. In VS.Net 2008, when debugging, you can actually change the code, and continue running, without a recompile. With advances in compilers and IDEs, is it possible that this and other advantages of using dynamic languages will go away?
Ah, I didn't see this topic when I posted a similar question.
Aside from the good features the rest of the folks mentioned here about dynamic languages, I think everybody forgot one, the most basic thing: metaprogramming.
Programming the program.
It's pretty hard to do in compiled languages, generally; take for example .Net. To make it work you have to do all kinds of mumbo jumbo, and it usually ends with code that runs around 100 times slower.
Most dynamic languages have a way to do metaprogramming and that is something that keeps me there - the ability to create any kind of code in memory and perfectly integrate it into my application.
For instance to create calculator in Lua, all I have to do is:
print( loadstring( "return " .. io.read() )() )
Now, try to do that in .Net.
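For comparison, a rough Python equivalent of that Lua one-liner might look like the sketch below. The name calculate is made up for illustration, and emptying __builtins__ only hides names; it is not a real sandbox.

```python
def calculate(expression):
    # eval compiles and runs code built from a string at runtime.
    # Passing empty globals only removes the built-in names; it does not
    # make untrusted input safe.
    return eval(expression, {"__builtins__": {}}, {})


# Interactive use, mirroring the Lua version:
#     print(calculate(input()))
```

Like loadstring, eval turns text into running code in one call, which is exactly the convenience (and the risk) being described.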
My main reason for liking dynamic (typed, since that seems to be the focus of the thread) languages is that the ones I've used (in a work environment) are far superior to the non-dynamic languages I've used. C, C++, Java, etc... they're all horrible languages for getting actual work done in. I'd love to see an implicitly typed language that's as natural to program in as many of the dynamically typed ones.
That being said, there are certain constructs that are just amazing in dynamically typed languages. For example, in Tcl:
lindex $mylist end-2
The fact that you pass in "end-2" to indicate the index you want is incredibly concise and obvious to the reader. I have yet to see a statically typed language that accomplishes such.
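For readers who don't know Tcl: end is the last element, so end-2 is the third from last. A roughly comparable construct in another dynamically typed language, Python, is negative indexing (the list contents here are made up):

```python
mylist = ["a", "b", "c", "d", "e"]

# Same element Tcl's `lindex $mylist end-2` would return.
third_from_last = mylist[-3]
```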
I think this kind of argument is a bit stupid: "Things that usually would have been caught by the compiler such as misspelled variable names and assigning a value of the wrong type to a variable don't occur until runtime". Yes, that's right: as a PHP developer I don't see things like mistyped variables until runtime, BUT runtime is step 2 for me. In C++ (which is the only compiled language I have any experience with) it is step 3, after compiling and linking.
Not to mention that it takes all of a few seconds after I hit save to when my code is ready to run, unlike in compiled languages where it can take literally hours. I'm sorry if this sounds a bit angry, but I'm kind of tired of people treating me as a second rate programmer because I don't have to compile my code.
The argument is more complex than this (read Yegge's article "Is Weak Typing Strong Enough" for an interesting overview).
Dynamic languages don't necessarily lack error checking either - C#'s type inference is possibly one example. In the same way, C and C++ have terrible compile checks and they are statically typed.
The main advantages of dynamic languages are a) capability (which doesn't necessarily have to be used all the time) and b) Boyd's Law of Iteration.
The latter reason is massive.
Although I'm not a big fan of Ruby yet, I find dynamic languages to be really wonderful and powerful tools.
The idea that there is no type checking and variable declaration is not too big an issue really. Admittedly, you can't catch these errors until run time, but for experienced developers this is not really an issue, and when you do make mistakes, they're usually easily fixed.
It also forces novices to read what they're writing more carefully. I know learning PHP taught me to be more attentive to what I was actually typing, which has improved my programming even in compiled languages.
Good IDEs will give enough intellisense for you to know whether a variable has been "declared" and they also try to do some type inference for you so that you can tell what a variable is.
The power of what can be done with dynamic languages is really what makes them so much fun to work with in my opinion. Sure, you could do the same things in a compiled language, but it would take more code. Languages like Python and PHP let you develop in less time and get a functional codebase faster most of the time.
And for the record, I'm a full-time .NET developer, and I love compiled languages. I only use dynamic languages in my free time to learn more about them and better myself as a developer.
I think that we need the different types of languages depending on what we are trying to achieve, or solve with them. If we want an application that creates, retrieves, updates and deletes records from the database over the internet, we are better off doing it with one line of ROR code (using the scaffold) than writing it from scratch in a statically typed language. Using dynamic languages frees up the minds from wondering about
which variable has which type
how to grow a string dynamically as needs be
how to write code so that if I change the type of one variable, I don't have to rewrite all the functions that interact with it
to problems that are closer to business needs like
data is being saved/updated etc. in the database; how do I use it to drive traffic to my site
Anyway, one advantage of loosely typed languages is that we don't really care what type it is, as long as it behaves the way it is supposed to. That is the reason we have duck typing in dynamically typed languages. It is a great feature and I can use the same variable names to store different types of data as the need arises. Also, statically typed languages force you to think like a machine (how does the compiler interact with your code, etc.) whereas dynamically typed languages, especially Ruby/RoR, force the machine to think like a human.
These are some of the arguments I use to justify my job and experience in dynamic languages!
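A minimal duck-typing sketch in Python, to show what "we don't care what type it is, as long as it behaves" looks like in practice. shout and FakeFile are made-up names for illustration.

```python
import io


def shout(stream):
    # No declared parameter type: anything with a read() method will do.
    return stream.read().upper()


class FakeFile:
    """Not related to any file class; it just happens to have read()."""

    def read(self):
        return "quack"


from_stringio = shout(io.StringIO("hello"))
from_fake = shout(FakeFile())
```

Passing something without a read() method fails only when the call actually runs, which is the trade-off the question complains about.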
I think both styles have their strengths. This either/or thinking is kind of crippling to our community in my opinion. I've worked in architectures that were statically-typed from top to bottom and it was fine. My favorite architecture is dynamically-typed at the UI level and statically-typed at the functional level. This also encourages a language barrier that enforces the separation of UI and function.
To be a cynic, it may be simply that dynamic languages allow the developer to be lazier and to get things done knowing less about the fundamentals of computing. Whether this is a good or bad thing is up to the reader :)
FWIW, Compiling on most applications shouldn't take hours. I have worked with applications that are between 200-500k lines that take minutes to compile. Certainly not hours.
I prefer compiled languages myself. I feel as though the debugging tools (in my experience, which might not be true for everything) are better and the IDE tools are better.
I like being able to attach my Visual Studio to a running process. Can other IDEs do that? Maybe, but I don't know about them. I have been doing some PHP development work lately and to be honest it isn't all that bad. However, I much prefer C# and the VS IDE. I feel like I work faster and debug problems faster.
So maybe it is more a toolset thing for me than the dynamic/static language issue?
One last comment... if you are developing with a local server, saving is faster than compiling, but oftentimes I don't have access to everything on my local machine. Databases and fileshares live elsewhere. It is easier to FTP to the web server and then run my PHP code, only to find the error and have to fix and re-FTP.
Productivity in a certain context. But that is just one environment I know, compared to some others I know or have seen used.
Smalltalk on Squeak/Pharo with Seaside is a much more effective and efficient web platform than ASP.Net(/MVC), RoR or Wicket, for complex applications. Until you need to interface with something that has libraries in one of those but not Smalltalk.
Misspelled variable names are red in the IDE, IntelliSense works but is not as specific. Run-time errors on webpages are not an issue but a feature, one click to bring up the debugger, one click to my IDE, fix the bug in the debugger, save, continue. For simple bugs, the round-trip time for this cycle is less than 20 seconds.
Dynamic Languages Strike Back
http://www.youtube.com/watch?v=tz-Bb-D6teE
A talk discussing Dynamic Languages, what some of the positives are, and how many of the negatives aren't really true.
Because I consider it stupid to have to declare the type of the box.
The type stays with the entity, not with the container. Static typing had a sense when the type of the box had a direct consequence on how the bits in memory were interpreted.
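In Python terms, as a tiny sketch (the variable name box is only for illustration), the name is the box and the type travels with the value put into it:

```python
box = 42
first_type = type(box).__name__   # the int carries its own type

box = "forty-two"                 # same box, different kind of entity
second_type = type(box).__name__
```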
If you take a look at the design patterns in the GoF, you will realize that a good part of them are there just to fight with the static nature of the language, and they have no reason whatsoever to exist in a dynamic language.
Also, I'm tired of having to write stuff like MyFancyObjectInterface f = new MyFancyObject(). DRY principle, anyone?
Put yourself in the place of a brand new programmer selecting a language to start out with, who doesn't care about dynamic versus static versus lambdas versus this versus that etc.; which language would YOU choose?
C#:
using System;
class MyProgram
{
    public static void Main(string[] args)
    {
        foreach (string s in args)
        {
            Console.WriteLine(s);
        }
    }
}
Lua:
function printStuff(args)
  for key,value in pairs(args) do
    print(value .. " ")
  end
end
strings = {
  "hello",
  "world",
  "from lua"
}
printStuff(strings)
This all comes down to partially what's appropriate for the particular goals and what's a common personal preference. (E.G. Is this going to be a huge code base maintained by more people than can conduct a reasonable meeting together? You want type checking.)
The personal part is about trading off some checks and other steps for development and testing speed (while likely giving up some cpu performance). There's some people for which this is liberating and a performance boost, and there's some for which this is quite the opposite, and yes it does sort of depend on the particular flavor of your language too. I mean no one here is saying Java rocks for speedy, terse development, or that PHP is a solid language where you'll rarely make a hard to spot typo.
I have love for both static and dynamic languages. Every project that I've been involved in since about 2002 has been a C/C++ application with an embedded Python interpreter. This gives me the best of both worlds:
The components and frameworks that make up the application are, for a given release of an application, immutable. They must also be very stable, and hence, well tested. A statically typed language is the right choice for building these parts.
The wiring up of components, loading of component DLLs, artwork, most of the GUI, etc... can vary greatly (say, to customise the application for a client) with no need to change any framework or component code. A dynamic language is perfect for this.
I find that the mix of a statically typed language to build the system and a dynamically typed language to configure it gives me flexibility, stability and productivity.
To answer the question of "What's with the love of dynamic languages?" For me it's the ability to completely re-wire a system at runtime in any way imaginable. I see the scripting language as "running the show", therefore the executing application may do anything you desire.
I don't have much experience with dynamic languages in general, but the one dynamic language I do know, JavaScript (aka ECMAScript), I absolutely love.
Well, wait, what's the discussion here? Dynamic compilation? Or dynamic typing? JavaScript covers both bases so I guess I'll talk about both:
Dynamic compilation:
To begin, dynamic languages are compiled, the compilation is simply put off until later. And Java and .NET really are compiled twice. Once to their respective intermediate languages, and again, dynamically, to machine code.
But when compilation is put off you can see results faster. That's one advantage. I do enjoy simply saving the file and seeing my program in action fairly quickly.
Another advantage is that you can write and compile code at runtime. Whether this is possible in statically compiled code, I don't know. I imagine it must be, since whatever compiles JavaScript is ultimately machine code and statically compiled. But in a dynamic language this is a trivial thing to do. Code can write and run itself. (And I'm pretty sure .NET can do this, but the CIL that .NET compiles to is dynamically compiled on the fly anyways, and it's not so trivial in C#)
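As an illustration of how little ceremony "code that writes and runs itself" needs in a dynamic language, here is a Python sketch (Python rather than JavaScript, and make_power_function is a made-up name) that builds a function from source text at runtime:

```python
def make_power_function(exponent):
    # Write the source code of a new function as a string...
    source = f"def power(x):\n    return x ** {exponent}\n"
    namespace = {}
    # ...then compile and run it while the program is already executing.
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["power"]


cube = make_power_function(3)
```

cube is an ordinary function from then on; the rest of the program cannot tell that it was generated from text.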
Dynamic typing:
I think dynamic typing is more expressive than static typing. Note that I'm using the term expressive informally to say that dynamic typing can say more with less. Here's some JavaScript code:
var Person = {};
Do you know what Person is now? It's a generic dictionary. I can do this:
Person["First_Name"] = "John";
Person["Last_Name"] = "Smith";
But it's also an object. I could refer to any of those "keys" like this:
Person.First_Name
And add any methods I deem necessary:
Person.changeFirstName = function(newName) {
    this.First_Name = newName;
};
Sure, there might be problems if newName isn't a string. It won't be caught right away, if ever, but you can check yourself. It's a matter of trading expressive power and flexibility for safety. I don't mind adding code to check types, etc, myself, and I've yet to run into a type bug that gave me much grief (and I know that isn't saying much. It could be a matter of time :) ). I very much enjoy, however, that ability to adapt on the fly.
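The "check yourself" part might look like the following sketch. It is written in Python rather than JavaScript, uses SimpleNamespace as a stand-in for the open-ended Person object, and change_first_name is a made-up helper.

```python
from types import SimpleNamespace

person = SimpleNamespace()
person.first_name = "John"   # attributes added on the fly, as with Person above
person.last_name = "Smith"


def change_first_name(target, new_name):
    # The hand-written check that a static type system would do for you.
    if not isinstance(new_name, str):
        raise TypeError("new_name must be a string")
    target.first_name = new_name


change_first_name(person, "Jane")
```

The check runs only when the function is called, so a bad call site that is never exercised still goes unnoticed.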
Nice blog post on the same topic: Python Makes Me Nervous
"Method signatures are virtually useless in Python. In Java, static typing makes the method signature into a recipe: it's all the shit you need to make this method work. Not so in Python. Here, a method signature will only tell you one thing: how many arguments you need to make it work. Sometimes, it won't even do that, if you start fucking around with **kwargs."
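The **kwargs complaint is easy to reproduce. In this sketch connect is a hypothetical function; inspect.signature is the standard-library way to ask Python what a signature says.

```python
import inspect


def connect(**kwargs):
    # Hypothetical function: which keys are required can only be learned
    # by reading the body (or the docs, if there are any).
    return kwargs["host"], kwargs.get("port", 80)


# All the signature reveals is that keyword arguments are accepted.
signature_text = str(inspect.signature(connect))
```

Here signature_text is just "(**kwargs)": no argument count, no names, no types.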
Because it's fun fun fun. It's fun to not worry about memory allocation, for one. It's fun not waiting for compilation. etc etc etc
Weakly typed languages allow flexibility in how you manage your data.
I used VHDL last spring for several classes, and I like their method of representing bits/bytes, and how the compiler catches errors if you try to assign a 6-bit bus to a 9-bit bus. I tried to recreate it in C++, and I'm having a fair struggle to neatly get the typing to work smoothly with existing types. Steve Yegge does a very nice job of describing the issues involved with strong type systems, I think.
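To show the kind of width check meant here, a rough Python sketch follows. Bus is a made-up class, and unlike VHDL the mismatch is only caught at runtime, when the assignment executes, not by a compiler.

```python
class Bus:
    """A fixed-width bundle of bits (illustrative only)."""

    def __init__(self, width, value=0):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
        self.width = width
        self.value = value

    def assign(self, other):
        # Refuse to connect buses of different widths.
        if other.width != self.width:
            raise TypeError(
                f"cannot assign a {other.width}-bit bus to a {self.width}-bit bus"
            )
        self.value = other.value


nine_bit = Bus(9)
nine_bit.assign(Bus(9, 300))  # widths match, so this is accepted
```

Calling nine_bit.assign(Bus(6, 5)) raises TypeError, which is the error VHDL would report at compile time.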
Regarding verbosity: I find Java and C# to be quite verbose in the large (let's not cherry-pick small algorithms to "prove" a point). And, yes, I've written in both. C++ struggles in the same area as well; VHDL succumbs here.
Parsimony appears to be a virtue of the dynamic languages in general (I present Perl and F# as examples).