I am looking to create a DSL and I'm looking for a language where you can define your own bracket-style operators for things like the floor and ceiling functions. I'd rather not go the route of defining my own Antlr parser for a custom syntax.
As far as I know, the only languages that allow you to define custom operators restrict them to binary infix operators.
tl;dr: Which languages allow for defined paired symbol (like opening bracket/closed bracket) operators?
Also, I don't see how this question can be "too broad" when no one has named a single language that has this, the criteria are very specific, and they're definitely in the programming domain.
Since Fortress is dead, the only languages I know of where something like this would be imaginable are those of FORTH heritage.
In all others that I know of, braces, brackets and parentheses are already heavily used, and can't be overloaded further.
I suggest giving up the quest for such stuff and getting comfortable with writing
floor x
ceiling y
or however function application is expressed in the language of your choice.
However, the article you cite notes that Unicode contains code points for these symbols at U+2308–U+230B: ⌈x⌉, ⌊x⌋.
Thus you can at least define these as operators in a Haskell-like language and use them like:
infix 5 ⌈⌉
(foo ⌈⌉)
The best I could come up with is like the following:
--- how to make brackets
module Test where
import Prelude.Math
infix 7 `«`           -- the "opening bracket" binds tighter
infix 6 `»`           -- than the "closing bracket"
_ « x = Math.ceil x   -- ignore the left operand, take the ceiling of the right
(») = const           -- keep the left operand, ignore the right
x = () «2.345» ()     -- dummy () values fill the outer operand slots
main _ = println x
The output was: 3.0
(This is not actually Haskell, but Frege, a Haskell-like JVM language.)
Note that I used «» instead of ⌈⌉, because no font in my IDE correctly shows the bracket symbols. (This is another reason not to do this.)
The way it works is that with the infix directives, we get this parsed as
(») ((«) () 2.345) ()
(One could insert any expression in place of the ())
Maybe if you ask in the Haskell group, someone finds a better solution.
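For what it's worth, GHC should also accept the actual bracket code points as operator characters (Unicode math symbols are legal in Haskell operator names), so a rough sketch of the same trick in plain Haskell might look like the following. (Untested here for the font reasons above; the module and binding names are made up.)

module Bracket where

infix 7 ⌈   -- the "opening bracket" binds tighter
infix 6 ⌉   -- than the "closing bracket"

(⌈) :: () -> Double -> Double
_ ⌈ x = fromIntegral (ceiling x :: Integer)  -- ignore the left operand, take the ceiling

(⌉) :: Double -> () -> Double
(⌉) = const                                  -- keep the left operand, ignore the right

y :: Double
y = () ⌈ 2.345 ⌉ ()   -- parses as ((() ⌈ 2.345) ⌉ ()), giving 3.0

main :: IO ()
main = print y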
Related
Basically, I need to define an infix operator for function composition in flipped order, i.e. a flipped g . f.
(#) :: (a -> b) -> (b -> c) -> a -> c
(#) = flip (.)
infixl 9 #
Here I have temporarily chosen the symbol #, but I'm not certain this choice isn't problematic in terms of collisions with other definitions.
The weird thing is that somehow I could not find a list of predefined infix operators in Haskell.
I want the composition operator to be as concise as possible, ideally a single character like ., and if possible I want to make it &. Is that OK?
Is there any guidance or best practice tutorial? Please advise.
Related Q&A:
What characters are permitted for Haskell operators?
You don't see a list of Haskell built-in operators for the same reason you don't see a list of all built-in functions. They're everywhere. Some are in Prelude, some are in Control.Monad, etc., etc. Operators aren't special in Haskell; they're ordinary functions with neat syntax. And Haskellers are pretty operator-happy in general. Spend any time inside your favorite lens library and you'll find plenty of amusing-looking operators.
In terms of (#), it might be best to avoid it. I don't know of any built-in operators called that, but # can be a bit special when it comes to parsing. Specifically, the MagicHash compiler extension enables # at the end of ordinary identifiers, and GHC defines a lot of built-in (primitive) types following this practice. For instance, Int is the usual (boxed) integer type, whereas Int# is a primitive integer. Most people don't need to interface with this directly, but it is a use that # has in Haskell that would be confused slightly by the addition of your operator.
Your (#) operator is called (>>>) in Haskell, and it works on all Category instances, including functions. Its companion is (<<<), the generalization of (.) to all Category instances. If three characters is too long for you, I've seen it called (|>) in some other languages, but Haskell already uses that operator in Data.Sequence, so you could only take it if you're not using Data.Sequence. Personally, I'd just go with (>>>). Any Haskeller will recognize it pretty quickly.
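For instance (a small sketch; addOne, double, and pipeline are made-up names):

import Control.Category ((>>>))

addOne, double :: Int -> Int
addOne = (+ 1)
double = (* 2)

pipeline :: Int -> Int
pipeline = addOne >>> double   -- same as double . addOne, read left to right

main :: IO ()
main = print (pipeline 3)      -- prints 8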
One big reason prefix operators are nice is that they can avoid the need for parentheses, so that + - 10 1 2 unambiguously means (10 - 1) + 2. The infix expression becomes ambiguous if the parens are dropped, which you can work around with precedence rules, but that's messy, blah, blah, blah.
I'd like to make Haskell use prefix operations but the only way I've seen to do that sort of trades away the gains made by getting rid of parentheses.
Prelude> (-) 10 1
uses two parens.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding it in--so now you need even more parens!
Is there a way to make Haskell intelligently evaluate prefix notation? I think if I made functions like
Prelude> let p x y = x+y
Prelude> let m x y = x-y
I would recover the initial gains on fewer parens but function composition would still be a problem. If there's a clever way to join this with $ notation to make it behave at least close to how I want, I'm not seeing it. If there's a totally different strategy available I'd appreciate hearing it.
I tried reproducing what the accepted answer did here:
Haskell: get rid of parentheses in liftM2
but in both a Prelude console and a Haskell script, the import command didn't work. And besides, this is more advanced Haskell than I'm able to understand, so I was hoping there might be some other simpler solution anyway before I do the heavy lifting to investigate whatever this is doing.
It gets even worse when you try to compose functions because

Prelude> (+) (-) 10 1 2

yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding it in--so now you need even more parens!
Here you raise exactly the key issue that's a blocker for getting what you want in Haskell.
The prefix notation you're talking about is unambiguous for basic arithmetic operations (more generally, for any set of functions of statically known arity). But you have to know that + and - each accept 2 arguments for + - 10 1 2 to be unambiguously resolved as +(-(10, 1), 2) (where I've written explicit argument lists to denote every call).
But ignoring the specific meaning of + and -, the first function taking the second function as an argument is a perfectly reasonable interpretation! For Haskell, rather than just arithmetic, we need to support higher-order functions like map. You would want not not x to turn into not(not(x)), but map not x to turn into map(not, x).
And what if I had f g x? How is that supposed to work? Do I need to know what f and g are bound to so that I know whether it's a case like not not x or a case like map not x, just to know how to parse the call structure? Even assuming I have all the code available to inspect, how am I supposed to figure out what things are bound to if I can't know what the call structure of any expression is?
You'd end up needing to invent disambiguation syntax like map (not) x, wrapping not in parentheses to disable its ability to act like an arity-1 function (much like Haskell's actual syntax lets you wrap operators in parentheses to disable their ability to act like an infix operator). Or use the fact that all Haskell functions are arity-1, but then you have to write (map not) x and your arithmetic example has to look like (+ ((- 10) 1)) 2. Back to the parentheses!
The truth is that the prefix notation you're proposing isn't unambiguous. Haskell's normal function syntax (without operators) is; the rule is you always interpret a sequence of terms like foo bar baz qux etc as ((((foo) bar) baz) qux) etc (where each of foo, bar, etc can be an identifier or a sub-term in parentheses). You use parentheses not to disambiguate that rule, but to group terms to impose a different call structure than that hard rule would give you.
Infix operators do complicate that rule, and they are ambiguous without knowing something about the operators involved (their precedence and associativity, which unlike arity is associated with the name not the actual value referred to). Those complications were added to help make the code easier to understand; particularly for the arithmetic conventions most programmers are already familiar with (that + is lower precedence than *, etc).
If you don't like the additional burden of having to memorise the precedence and associativity of operators (not an unreasonable position), you are free to use a notation that is unambiguous without needing precedence rules, but it has to be Haskell's prefix notation, not Polish prefix notation. And whatever syntactic convention you're using, in any language, you'll always have to use something like parentheses to indicate grouping where the call structure you need is different from what the standard convention would indicate. So:
(+) ((-) 10 1) 2
Or:
plus (minus 10 1) 2
if you define non-operator function names.
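For example, both forms evaluate to 11 (a quick sketch, with made-up names plus and minus):

plus, minus :: Int -> Int -> Int
plus  = (+)
minus = (-)

main :: IO ()
main = do
  print ((+) ((-) 10 1) 2)     -- the operators used in prefix form
  print (plus (minus 10 1) 2)  -- named functions, no operator parens needed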
Does any programming language use =/= for not-equal?
Are there any lexical difficulties for scanners in recognizing such an operator? Or were there historically?
[Note: this is NOT a homework question. I'm just curious.]
Erlang uses it to denote exactly not equal to.
Also generally there shouldn't be any difficulties for scanners to recognize such a token (proof by example: Erlang ;-)
In Erlang, =/=, as Bytecode Ninja noted, means "exactly not equal to". Erlang's notation is strongly influenced by Prolog, so it should come as no surprise that Prolog has a near-identical operator (=\=, for arithmetic inequality). There are several languages which make defining operators trivial. Haskell would be one such. =/= isn't defined in the Haskell standard, but defining it would be trivial:
(=/=) x y = ....
This could then be used in function call-like syntax:
(=/=) 5 6
Or as an inline operator:
5 =/= 6
The semantics would depend on the implementation, of course.
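A minimal sketch of such a definition, assuming the obvious Eq-based semantics:

infix 4 =/=

(=/=) :: Eq a => a -> a -> Bool
x =/= y = not (x == y)

main :: IO ()
main = do
  print (5 =/= 6)    -- True, infix use
  print ((=/=) 5 5)  -- False, function call-like use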
I think that Common Lisp users could write some kind of reader macro that used that sequence too, but I'm not positive.
Not one of the mainstream ones. One could easily create such a language, however.
(As others have mentioned, Erlang and a few other languages do have it already)
Nope. Unless you have a really weird language, there's nothing special about this operator in terms of lexical analysis.
By the way, Java has:
> (greater than)
>> (signed right shift)
>>= (signed right shift compound assignment)
>>> (unsigned right shift)
>>>= (unsigned right shift compound assignment)
> (closing generic type parameter, nestable)
>>, >>>, >>>>, ...
and they all work just fine.
Related question
What trick does Java use to avoid spaces in >> ?
Yes, Erlang uses this symbol as one of its representations for "not equal".
Erlang is a language with strong support for concurrency, originally designed within Ericsson and used for writing software for telephone exchanges, but now gaining significant popularity outside.
You may want to check out the Fortress Introductory Slides. Fortress uses =/= for checking inequality. I suppose you're looking for readability in languages; if that's the case, I can tell you that Fortress code can be rendered into very neat-looking TeX.
Project Fortress Old Site (moved to java.net)
None that I know of
Not much harder than any other operator like +=, ??, etc.
However, such an operator is very cumbersome to type. Even != is simpler.
A Google Code search for =/= doesn't turn up anything obvious, so I would say nothing mainstream.
There wouldn't be any issues with any operator you want, the computer would simply look for =/= instead of != or <> or whatever your language uses.
There are some really weird languages out there, like the Brainfuck language (link):
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
That is code for "Hello World".
Could someone provide a link to a good coding standard for Haskell? I've found this and this, but they are far from comprehensive. Not to mention that the HaskellWiki one includes such "gems" as "use classes with care" and "defining symbolic infix identifiers should be left to library writers only."
Really hard question. I hope your answers turn up something good. Meanwhile, here is a catalog of mistakes or other annoying things that I have found in beginners' code. There is some overlap with the Cal Tech style page that Kornel Kisielewicz points to. Some of my advice is every bit as vague and useless as the HaskellWiki "gems", but I hope at least it is better advice :-)
Format your code so it fits in 80 columns. (Advanced users may prefer 87 or 88; beyond that is pushing it.)
Don't forget that let bindings and where clauses create a mutually recursive nest of definitions, not a sequence of definitions.
Take advantage of where clauses, especially their ability to see function parameters that are already in scope (nice vague advice). If you are really grokking Haskell, your code should have a lot more where-bindings than let-bindings. Too many let-bindings is a sign of an unreconstructed ML programmer or Lisp programmer.
Avoid redundant parentheses. Some places where redundant parentheses are particularly offensive are
Around the condition in an if expression (brands you as an unreconstructed C programmer)
Around a function application which is itself the argument of an infix operator (Function application binds tighter than any infix operator. This fact should be burned into every Haskeller's brain, in much the same way that us dinosaurs had APL's right-to-left scan rule burned in.)
Put spaces around infix operators. Put a space following each comma in a tuple literal.
Prefer a space between a function and its argument, even if the argument is parenthesized.
Use the $ operator judiciously to cut down on parentheses. Be aware of the close relationship between $ and infix .:
f $ g $ h x == (f . g . h) x == f . g . h $ x
Don't overlook the built-in Maybe and Either types.
Never write if <expression> then True else False; the correct phrase is simply <expression>.
Don't use head or tail when you could use pattern matching (see the sketch after this list).
Don't overlook function composition with the infix dot operator.
Use line breaks carefully. Line breaks can increase readability, but there is a tradeoff: Your editor may display only 40–50 lines at once. If you need to read and understand a large function all at once, you mustn't overuse line breaks.
Almost always prefer the -- comments which run to end of line over the {- ... -} comments. The braced comments may be appropriate for large headers—that's it.
Give each top-level function an explicit type signature.
When possible, align -- lines, = signs, and even parentheses and commas that occur in adjacent lines.
Influenced as I am by GHC central, I have a very mild preference to use camelCase for exported identifiers and short_name with underscores for local where-bound or let-bound variables.
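To illustrate the pattern-matching rule and the if-then-True-else-False rule above, a small sketch (the function names are made up):

-- instead of: safeHead xs = if null xs then Nothing else Just (head xs)
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x

-- instead of: isLong xs = if length xs > 3 then True else False
isLong :: [a] -> Bool
isLong xs = length xs > 3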
Some good rules of thumb, imho:
Consult with HLint to make sure you don't have redundant braces and that your code isn't pointlessly point-full.
Avoid recreating existing library functions. Hoogle can help you find them.
Oftentimes existing library functions are more general than what one was going to make. For example, if you want Maybe (Maybe a) -> Maybe a, then join does that, among other things (see the sketch after this list).
Argument naming and documentation is important sometimes.
For a function like replicate :: Int -> a -> [a], it's pretty obvious what each of the arguments does, from their types alone.
For a function that takes several arguments of the same type, like isPrefixOf :: (Eq a) => [a] -> [a] -> Bool, naming/documentation of arguments is more important.
If one function exists only to serve another function, and isn't otherwise useful, and/or it's hard to think of a good name for it, then it probably should live in its caller's where clause instead of in the module's scope.
DRY
Use Template-Haskell when appropriate.
Bundles of functions like zip3, zipWith3, zip4, zipWith4, etc are very meh. Use Applicative style with ZipLists instead. You probably never really need functions like those.
Derive instances automatically. The derive package can help you derive instances for type-classes such as Functor (there is only one correct way to make a type an instance of Functor).
Code that is more general has several benefits:
It's more useful and reusable.
It is less prone to bugs because there are more constraints.
For example, if you want to program concat :: [[a]] -> [a], notice how it generalizes to join :: Monad m => m (m a) -> m a. There is less room for error when programming join: when programming concat you can reverse the lists by mistake, whereas in join there are very few things you can do wrong (see the sketch below).
When using the same stack of monad transformers in many places in your code, make a type synonym for it. This will make the types shorter, more concise, and easier to modify in bulk.
Beware of "lazy IO". For example readFile doesn't really read the file's contents at the moment the file is read.
Avoid indenting so much that I can't find the code.
If your type is logically an instance of a type-class, make it an instance.
The instance can replace other interface functions you may have considered with familiar ones.
Note: If there is more than one logical instance, create newtype-wrappers for the instances.
Make the different instances consistent. It would have been very confusing/bad if the list Applicative behaved like ZipList.
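A tiny illustration of the join examples above (a sketch):

import Control.Monad (join)

main :: IO ()
main = do
  print (concat [[1, 2], [3]])  -- [1,2,3], the list-specific version
  print (join [[1, 2], [3]])    -- same result, via the general join
  print (join (Just (Just 5)))  -- Just 5: here join is Maybe (Maybe a) -> Maybe a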
I like to try to organize functions as point-free style compositions as much as possible by doing things like:
func = boo . boppity . bippity . snd
where boo = ...
boppity = ...
bippity = ...
I like using ($) only to avoid nested parens or long parenthesized expressions
... I thought I had a few more in me, oh well
I'd suggest taking a look at this style checker.
I found a good markdown file covering almost every aspect of Haskell code style. It can be used as a cheat sheet. You can find it here: link
I've read the Wikipedia article on concatenative languages, and I am now more confused than I was when I started. :-)
What is a concatenative language in stupid people terms?
In normal programming languages, you have variables which can be defined freely and you call methods using these variables as arguments. These are simple to understand but somewhat limited. Often, it is hard to reuse an existing method because you simply can't map the existing variables into the parameters the method needs, or because method A calls another method B when A would be perfect for you if you could only replace the call to B with a call to C.
Concatenative languages use a fixed data structure to save values (usually a stack or a list). There are no variables. This means that many methods and functions have the same "API": they work on whatever someone else left on the stack. Plus, code itself is treated as "data", i.e. it is common to write code which modifies itself or which accepts other code as a "parameter" (i.e. as an element on the stack).
These attributes make these languages perfect for chaining existing code to create something new. Reuse is built in. You can write a function which accepts a list and a piece of code and calls the code for each item in the list. This will then work on any kind of data as long as it behaves like a list: results from a database, a row of pixels from an image, characters in a string, etc.
The biggest problem is that you have no hint of what's going on. There are only a couple of data types (list, string, number), so everything gets mapped to those. When you get a piece of data, you usually don't care what it is or where it comes from. But that makes it hard to follow data through the code to see what is happening to it.
I believe it takes a certain mindset to use these languages successfully. They are not for everyone.
[EDIT] Forth has some penetration but not that much. You can find PostScript in any modern laser printer. So they are niche languages.
From a functional level, they are on par with LISP, C-like languages, and SQL: all of them are Turing-complete, so you can compute anything. It's just a matter of how much code you have to write. Some things are simpler in LISP, some are simpler in C, some are simpler in query languages. The question of which is "better" is futile without a context.
First I'm going to make a rebuttal to Norman Ramsey's assertion that there is no theory.
Theory of Concatenative Languages
A concatenative language is a functional programming language, where the default operation (what happens when two terms are side by side) is function composition instead of function application. It is as simple as that.
So for example in the SKI Combinator Calculus (one of the simplest functional languages) two terms side by side are equivalent to applying the first term to the second term. For example: S K K is equivalent to S(K)(K).
In a concatenative language S K K would be equivalent to S . K . K in Haskell.
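To make the applicative reading concrete, here are S and K written out in Haskell (a sketch; the names s, k, i are made up). Note that S K K applied to anything gives that thing back, i.e. it behaves as the identity combinator I:

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x = f x (g x)

k :: a -> b -> a
k = const

i :: a -> a
i = s k k             -- S K K behaves as the identity

main :: IO ()
main = print (i 42)   -- 42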
So what's the big deal
A pure concatenative language has the interesting property that the order of evaluation of terms does not matter. In a concatenative language (S K) K is the same as S (K K). This does not apply to the SKI Calculus or any other functional programming language based on function application.
One reason this observation is interesting is that it reveals opportunities for parallelization in the evaluation of code expressed in terms of function composition instead of application.
Now for the real world
The semantics of stack-based languages which support higher-order functions can be explained using a concatenative calculus. You simply map each term (command/expression/sub-program) to be a function that takes a function as input and returns a function as output. The entire program is effectively a single stack transformation function.
The reality is that things are always distorted in the real world (e.g. FORTH has a global dictionary, PostScript does weird things where the evaluation order matters). Most practical programming languages don't adhere perfectly to a theoretical model.
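To see the "program as a single stack transformation" idea in miniature, here is a sketch in Haskell where each word is a function from stacks to stacks and a program is just their composition (all names here are made up):

type Stack = [Int]

push :: Int -> Stack -> Stack
push n s = n : s

add, mul :: Stack -> Stack
add (x : y : s) = (x + y) : s
add _           = error "stack underflow"
mul (x : y : s) = (x * y) : s
mul _           = error "stack underflow"

-- the concatenative program "1 2 + 3 *" as a composition of words;
-- (.) composes right to left, so the Forth order reads right to left here
program :: Stack -> Stack
program = mul . push 3 . add . push 2 . push 1

main :: IO ()
main = print (program [])   -- [9]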
Final Words
I don't think a typical programmer or 8 year old should ever worry about what a concatenative language is. I also don't find it particularly useful to pigeon-hole programming languages as being type X or type Y.
After reading http://concatenative.org/wiki/view/Concatenative%20language and drawing on what little I remember of fiddling around with Forth as a teenager, I believe that the key thing about concatenative programming has to do with:
viewing data in terms of values on a specific data stack
and functions manipulating stuff in terms of popping/pushing values on the same data stack
Check out these quotes from the above webpage:
There are two terms that get thrown around, stack language and concatenative language. Both define similar but not equal classes of languages. For the most part though, they are identical.

Most languages in widespread use today are applicative languages: the central construct in the language is some form of function call, where a function is applied to a set of parameters, where each parameter is itself the result of a function call, the name of a variable, or a constant. In stack languages, a function call is made by simply writing the name of the function; the parameters are implicit, and they have to already be on the stack when the call is made. The result of the function call (if any) is then left on the stack after the function returns, for the next function to consume, and so on. Because functions are invoked simply by mentioning their name without any additional syntax, Forth and Factor refer to functions as "words", because in the syntax they really are just words.
This is in contrast to applicative languages that apply their functions directly to specific variables.
Example: adding two numbers.
Applicative language:
int foo(int a, int b)
{
return a + b;
}
var c = 4;
var d = 3;
var g = foo(c,d);
Concatenative language (I made it up, supposed to be similar to Forth... ;) )
push 4
push 3
+
pop
While I don't think concatenative language = stack language, as the authors point out above, it seems similar.
I reckon the main idea is 1. We can create new programs simply by joining other programs together.
Also, 2. Any random chunk of the program is a valid function (or sub-program).
Good old pure RPN Forth has those properties, excluding any random non-RPN syntax.
In the program 1 2 + 3 *, the sub-program + 3 * takes 2 args, and gives 1 result. The sub-program 2 takes 0 args and returns 1 result. Any chunk is a function, and that is nice!
You can create new functions by lumping two or more others together, optionally with a little glue. It will work best if the types match!
These ideas are really good, we value simplicity.
It is not limited to RPN Forth-style serial language, nor imperative or functional programming. The two ideas also work for a graphical language, where program units might be for example functions, procedures, relations, or processes.
In a network of communicating processes, every sub-network can act like a process.
In a graph of mathematical relations, every sub-graph is a valid relation.
These structures are 'concatenative', we can break them apart in any way (draw circles), and join them together in many ways (draw lines).
Well, that's how I see it. I'm sure I've missed many other good ideas from the concatenative camp. While I'm keen on graphical programming, I'm new to this focus on concatenation.
My pragmatic (and subjective) definition of concatenative programming (now you can skip reading the rest of it):
-> function composition in extreme ways (with Reverse Polish notation (RPN) syntax):
( Forth code )
: fib ( n -- fib[n] )
  dup 2 <= if
    drop 1                \ base case: fib(1) = fib(2) = 1
  else
    dup 1 - recurse       \ compute fib(n-1)
    swap 2 - recurse +    \ add fib(n-2)
  then ;
-> everything is a function, or at least, can be a function:
( Forth code )
: 1 1 ; \ define a function 1 to push the literal number 1 on stack
-> arguments are passed implicitly to functions (OK, this sounds like a definition of tacit programming), but this in Forth:
a b c
may be in Lisp:
(c a b)
(c (b a))
(c (b (a)))
so, it's easy to generate ambiguous code...
you can write definitions that push the xt (execution token) on the stack and define a small alias for execute:
( Forth code )
: <- execute ; \ apply function
so, you'll get:
a b c <- \ Lisp: (c a b)
a b <- c <- \ Lisp: (c (b a))
a <- b <- c <- \ Lisp: (c (b (a)))
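Incidentally, Haskell has a similar "apply what came before" flavor in the reverse-application operator (&) from Data.Function (a small sketch for comparison):

import Data.Function ((&))

main :: IO ()
main = print ([1, 2, 3] & map (* 2) & sum)   -- sum (map (* 2) [1, 2, 3]) == 12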
To your simple question, here's a subjective and argumentative answer.
I looked at the article and several related web pages. The web pages say themselves that there isn't a real theory, so it's no wonder that people are having a hard time coming up with a precise and understandable definition. I would say that at present, it is not useful to classify languages as "concatenative" or "not concatenative".
To me it looks like a term that gives Manfred von Thun a place to hang his hat but may not be useful for other programmers.
While PostScript and Forth are worth studying, I don't see anything terribly new or interesting in Manfred von Thun's Joy programming language. Indeed, if you read Chris Okasaki's paper on Techniques for Embedding Postfix Languages in Haskell you can try out all this stuff in a setting that, relative to Joy, is totally mainstream.
So my answer is that there's no simple explanation because there's no mature theory underlying the idea of a concatenative language. (As Einstein and Feynman said, if you can't explain your idea to a college freshman, you don't really understand it.) I'll go further and say that although studying some of these languages, like Forth and PostScript, is an excellent use of time, trying to figure out exactly what people mean when they say "concatenative" is probably a waste of your time.
You can't really explain a language in a few paragraphs; just get one (Factor, preferably) and try some tutorials on it. Tutorials are better than Stack Overflow answers.