How should I make my parser concurrent? - multithreading

I'm working on implementing a music programming language parser in Clojure. The idea is that you run the parser program with a text file as a command-line argument; the text file contains code in this music language I'm developing; the parser interprets the code and figures out what "instrument instances" have been declared, and for each instrument instance, it parses the code and returns a sequence of musical "events" (notes, chords, rests, etc.) that the instrument does. So before that last step, we have multiple strings of "music code," one string per instrument instance.
I'm somewhat new to Clojure and still learning the nuances of how to use reference types and threads/concurrency. My parser is going to be doing some complex parsing, so I figured it would benefit from using concurrency to boost performance. Here are my questions:
The simplest way to do this, it seems, would be to save the concurrency for after the instruments are "split up" by the initial parse (a single-thread operation), then parse each instrument's code on a different thread at the same time (rather than wait for each instrument to finish parsing before moving onto the next). Am I on the right track, or is there a more efficient and/or logical way to structure my "concurrency plan"?
What options do I have for how to implement this concurrent parsing, and which one might work the best, either from a performance or a code maintenance standpoint? It seems like it could be as simple as: (map #(future (process-music-code %)) instrument-instances), but I'm not sure if there is a better way to do it like with an agent, or manual threads via Java interop, or what. I'm new to concurrent programming, so any input on different ways to do this would be great.
From what I've read, it seems that Clojure's reference types play an important role in concurrent programming, and I can see why, but is it always necessary to use them when working with multiple threads? Should I worry about making some of my data mutable? If so, what in particular should be mutable in the code for the parser I'm writing? and what reference type(s) would be best suited for what I'm doing? The nature of the way my program will work (user runs the program with a text file as an argument -- program processes it and turns it into audio) makes it seem like I don't need anything to be mutable, since the input data never changes, so my gut tells me I won't need to use any reference types, but then again, I might not fully understand the relationship between reference types and concurrency in Clojure.

I would suggest that you might be distracting yourself from more important things (like working out the details of your music language) by premature optimization. It would be better to write the simplest, easiest-to-code parser which you can first, to get up and running. If you find it too slow, then you can look at how to optimize for better performance.
The parser should be fairly self-contained, and will probably not take a whole lot of code anyways, so even if you later throw it out and rewrite it, it will not be a big loss. And the experience of writing the first parser will help if and when you write the second one.
Other points:
You are absolutely right about reference types -- you probably won't need any. Your program is a compiler -- it takes input, transforms it, writes output, then exits. That is the ideal situation for pure functional programming, with nothing mutable and all flow of data going purely through function arguments and return values.
Using a parser generator is usually the quickest way to get a working parser, but I haven't found a really good parser generator for Clojure. Parsley has a really nice API, but it generates LR(0) parsers, which are almost useless for anything which does not have clear, unambiguous markers for the beginning/end of each "section". (Like the way S-expressions open and close with parens.) There are a couple parser combinator libraries out there, like squarepeg, but I don't like their APIs and prefer to write my own hand-coded, recursive-descent parsers using my own implementation of something like parser combinators. (They're not fast, but the code reads really well.)

I can only support Alex Ds point that writing parsers is an excellent exercise. You should definitely do it in C one time. From my own experience, it's a lot of debugging training at least.
Aside from that, given that you are in the beautiful world of Clojure notice the following:
Your parser will transform ordinary strings to data structures, like
{:command :declare,
:args {:name "bazooka-violin",
...},
...}
In Clojure you can read such data structures easily from EDN files. Possibly it would be a more valuable approach to play around with finding suitable structures directly before you constrain the syntax of your language too much for it to be flexible for later changes in the way your language works.
Don't ever think about writing for performance. Unless your user describes the collected works of Bach in a file, it's unlikely that it will take more than a second to parse.
If you write your interpreter in a functional, modular and concise way, it should be easy to decompose it into steps that can be parallelized using various techniques from pmap to core.reducers. The same of course goes for all other code and your parser as well (if multi-threading is a necessity there).
Even Clojure is not compiled in parallel. However it supports recompilation (on the JVM) which in contrast is a way more valuable feature to think about.

As an aside, I've been reading The Joy of Clojure, and I just learned that there is a nifty clojure.core function called pmap (parallel map) that provides a nice, easy way to perform an operation in parallel on a sequence of data. It's syntax is just like map, but the difference is that it performs the function on each item of the sequence in parallel and returns a lazy sequence of the results! This can generally give a performance boost, but it depends on the inherent performance cost of coordinating the sequence result, so whether or not pmap gives a performance boost will depend on the situation.
At this stage in my MPL parser, my plan is to map a function over a sequence of instruments/music data, transforming each instrument's music data from a parse tree into audio. I have no idea how costly this transformation will be, but if it turns out that it takes a while to generate the audio for each instrument individually, I suppose I could try changing my map to pmap and see if that improves performance.

Related

What do I need to learn to build an interpreter?

For my AQA A2-level Computing project, I've decided to create a basic interpreted programming language, outputting to Console. I don't know how to build an interpreter. I have a copy of the purple dragon book, which is all about compiler design, as user166390 said on an answer to this question that the initial steps to building a compiler are the same to build an interpreter. My question is: is this true?
Can I use the techniques described in the dragon book to write an interpreter? And if so, which steps do I need to use and learn how to use?
Do I need to write a lexical analyser, a syntax analyser, a semantic analyser and an intermediate code generator, for example?
Could I get away with writing a basic parser that reads each line of the source code, parses it, and executes the instruction straight away, or is that a notoriously bad idea?
Yes, you can use the techniques described in the dragon book to write an interpreter.
You need a lexical analyzer and a parser regardless.
As others have pointed out, you do need to write the code to do actual execution -- but for a simple interpreter, this can be essentially the same as the syntax-directed translation described in the dragon book.
Everything else is optional.
If you want to skip straight from the parser to execution, you can. That will leave you with a very simple language, which can be both good and bad -- look at Tcl for an example of such a language.
If you want to interpret each line as you parse it, you can do that, too; this is what most command-line interpreters (Unix shell scripts, Microsoft's cmd.com and PowerShell) do, as well as interactive "REPL's" (Read-Eval-Print-Loops) for languages like Python and Ruby.
"Semantic analyzer" seems vague to me, but sounds like it should include most kinds of load-time consistency checks. This is also optional, but there are advantages in an interpreter that won't take any old garbage and try to execute it as a program...
"Intermediate code" is also kind of vague, but it is arguably optional. If you aren't executing directly from the program string (as in Tcl), you need some kind of internal representation to store your code once you've read it in. One popular option is to execute from an internal tree structure, based more or less closely on your parse tree, which is arguably distinct from producing "intermediate code". On the other hand, if your "intermediate code" could be written out more or less directly from your internal tree structure, you might as well count the internal structure as your "intermediate code".
There are important issues that you haven't addressed; one that stands out is: how do you want to handle names? Presumably you will want the programmer to be able to define and use his own names (e.g., for variables, functions, and so forth), so you will need to implement some kind of mechanism for that.
Exactly how names are handled is a big design decision, with major implications for the usability and implementability of your language. The simplest option for implementation is to use a single, global hash map to implement a single, global namespace -- but note that this choice has well-known usability problems...
Could I get away with writing a basic parser that reads source code and executes the steps straight away?
You could but you'd be doing it the hard way.
Do I need to write a lexical analyser, a syntax analyser, a semantic analyser and an intermediate code generator, for example?
You can skip intermediate code generation except if you want to write a VM-based interpreter. Perl for example, used to execute its parse graph directly; this is in contrast with Java or Python, which produces intermediate byte code.
The interpreter part of a VM-based language is generally simpler than the interpreter that have to understand a parse graph (so each component in the system is simpler), however the complexity of the whole interpreter stack is generally simpler when you don't need to define an intermediate bytecode language. So pick your poison.

APL readability

I have to code in APL. Since the code is going to be maintained for a long time, I am wondering if there are some papers/books which contain heuristics/tips/samples to help in designing clean and readable APL programs.
It is a different experience than coding in other programming language. Making a function, for example. Small will not help: such a function can contain one line of code, which is completely incomprehensible.
First, welcome to the wonderful world of APL.
Writing readable and maintainable APL code is not much different than writing readable and maintainable code in any language. Any good book on writing clean code is as applicable to APL as any other language, perhaps even more so. I recommend Clean Code by Robert C. Martin.
Consider the guideline in this book that all code in a function should be at the same level of abstraction. This applies to APL 100 times over. For example, if you have a function named DoThisBigTask it should have very few APL primitive symbols in it, and certainly no long complex one-liners. It should just be series of calls to other, lower level functions. If these higher-level functions are all well-named and well-defined, the general drift should be easily determined by someone who does not even know APL. The lowest level functions will be nothing but primitives and will be inscrutable to the non-APLer. Depending on how they are written they may even initially appear inscrutable to a seasoned APLer. However, these low level functions should be short, have no side effects, and can easily be re-written rather than modified if the maintaining programmer is unable to understand the original coding technique.
In general, keep your functions short, well-named, well-defined, and to the point. And keep the lines of code even shorter. It is much more important to have well-defined and well-documented functions than it is to have well-written or well document lines of code.
Since you asked for books and other references, I can suggest:
APL2 in Depth by Norman D. Thomson and Raymond P. Polivka. I worked with Ray Polivka for years and he was one of the best APL teachers I
have ever known.
The classic A. P. L.: An Interactive Approach by
Leonard Gilman and Allen J. Rose is good for the core language, but
is rather outdated and doesn't contain much that is truly relevant on
readability.
APL 2 at a Glance by James A. Brown and Sandra Pakin serves in some ways as an update to Gilman and Rose. It covers nested operations and other updates to APL, but has not much specifically directed at readability. Still, if you follow the examples here you will be writing readable code.
APL is Easy by STSC and Jerry R. Turner is an intro directed specifically at the APL*Plus line. Again, not much specifically on readability, but the models are generally well-designed readable code.
Mastering Dyalog APL: A Complete Introduction to Dyalog APL by Bernard Legrand is quite good if you are specifically workign in Dyalog APL, not so much if you are working in one of the other versions such as APL*Plus (from APL2000)
It is my view that the reputation of APL as a "write-only language" is much overstated. One does need to get used to the primitives and the symbols used to represent them. But then one needs to get used to the syntax and the various library functions in many other language environments. I have seen convoluted code in C, C++, and Java as hard to follow as any APL. Of course, it isn't good C, C++, or Java, even if it is clever.
Some advice:
Writing 'one-liners' is a way to test one's mastery of the language,
but is very poor practice for production code.
Comment to make the algorithm and especially the data structure being used clear. As with any code, comments should add something
that cannot be easily read from the code itself, or call attention to
complex or obscure code.
If possible avoid obscure code so there is no need to explain it. It is usually possible.
Make each function do one and only one job, with a clear interface.
Avoid global variables for the most part, and document any that are needed.
Document the interface, purpose, and efect of any function at the
top. Make utilities black boxes without side-effects if possible. If
side-effects are essential, document those as part of the interface.
Develop a standard header comment structure.
Dynamic code built on-the-fly can add flexabiliy to a solution, but
is often much harder to debug if problems occur. Make such code
bullet-proof to the extent you can, and build in optional logging to
help when it turns out to have problems anyway.
You can use an OOP-like style if you wish. But there is no need to do so. If you do, it should IMO be used fairly pervasively through an application, except perhaps for low-level utilities. But OOP-style code can be at least as convoluted as non-OOP code, and APL doesn't have built-in inheritance or other OOP-supporting syntax.
(I'll use here "A" instead of comment, "'" instead of symbol sign.)
Well, I was developing APL for a year, I have only used Aplusdev.org.
You don't even need more. The trick is to try to think OOP-like. You should have -- if I remember well -- structured fields used as class data, sth like {'attribute1 'attribute2, {value,value2}}, so you can easily pick them out like obj.attribute1 in c++.
(here 'attribute Pick object, use only in class functions :) )
Moreover, use namespaced functions:
namespace_classname.method(this, arg1)
namespace_classname._private_method(this, arg1, arg2)
and lots of simple tool functions instead of nifty, long lines. The performance drop is not substantial, you can optimize later for say arrays once you see something could be faster.
And before anything: think matlab and mathematica without for loops! :) It helps a lot.
My suggestions for robust, maintainable code:
use extensive set of utility functions instead of trickery with those unreadable symbols to make your code always to the point.
try-catch blocks there is a built in exception handling, which can be utilized here,
try_begin();
A tried code, maybe in extra brackets not to forget try_end() at the end.
try_end();
catch(sth, function_here);
can be nicely implemented. (You'll see, catching errors is very important)
crude type checking : implement a standard and use for not-so-many times called functions... (you can put a function with flexible parameters right after a function definition)
Syntax:
function(point2i, ch):
{
typecheck({{'int, [1 2]}, 'char}); A do some assertions in typecheck...
// your function goes here
}
lambda functions can be very effective, you can do some reflections to achieve lambdas.
always declare returns with saying "return"!
Unit tests based on try-catch testing each and every function you write.
I also used a lot of 'apply' and 'map' from mathematica, implementing my own version, they are very-very effective here.
I wrote matlab thinking since you can here have a list of structured fields (=class data) in a variable. You will write lots of those if you wanna keep things for-loop-less (and you wanna, trust me). For that you need to have a standard naming convention say indicate with plurals:
namespace_class.method(objects, arg1, arg2)
To the end: also, I wrote inputBox and messageBox like the ones in Javascript or VisualBasic, they will make very easy hacking together simple tools or checking states. The only catch of messageBox, that it can't put the function-flow on hold,
so you need
AA documentation of f1
f1():
{
A do sth
msgbox.call("Hi there",{'Ok, {'f2}});
}
f2():
{
A continue doing stuff
}
You can write auto-docs in bash with a gawk/sed combination to put it into a webpage.
Also creating HTML formatted code helps in printing. ;)
I hope this was good outline for a proper build-up. Before writing own tools, try to dig up the available tools from the legacy codebase... functions are often even 4 times implemented with different names due to the mess that time.

yacc/lex or hand-coding?

I am working on new programming language, but I was always puzzled by the fact that everyone is using yaxx/lex to parse the code, but I am not.
My compiler (which is already working) is handcoded in C++/STL, and I cannot say it's complex or took too much time. It has both some kind of lexer and parser, but they are not autogenerated.
Earlier, I wrote a C compiler(not full spec) the same way - it was able to compile the program in 1 pass, with all these back references resolving & preprocessing - this is definitely impossible with yacc/lex.
I just cannot convince myself to scrap all this, and start diving into yaxx/lex - which might need quite an effort to implement and might possibly introduce some grammar limitations.
Is there something I miss when not using yacc/lex? Do I do an evil thing?
The main advantages of using any kind of lexer/parser generator is that it gives you a lot more flexibility if your language evolves. In a hand-coded lexer/parser (especially if you've mixed in a lot of functionality in a single pass!), changes to the language get nasty fairly quickly, whereas with a parser generator you make the change, re-run the generator, and move on with your life. There are certainly no inherent technical limitations to always just writing everything by hand, but I think the evolvability and maintainability of automating away the boring bits is worth it!
Yacc is inflexible in some ways:
good error handling is hard (basically, its algorithm is only defined to parse a correct string correctly, otherwise, all bets are off; this is one of the reasons that GCC moved to a hand-written parser)
context-dependency is hard to express, whereas with a hand-written recursive descent parser you can simply add a parameter to the functions
Furthermore, I have noticed that lex/yacc object code is often bigger than a hand-written recursive descent parser (source code tends to be the other way round).
I have not used ANTLR so I cannot say if that is better at these points.
The other huge advantage of using generators is that they are guaranteed to process exactly and only the language you specified in the grammar. You can't say that of any hand-written code. The LR/LALR variants are also guaranteed to be O(N), which again you can't assert about any hand coding, at least not without a lot of effort in constructing the proof.
I've written both and lived with both and I would never hand-code again. I only did that one because I didn't have yacc on the platform at the time.
Maybe you are missing out on ANTLR, which is good for languages that can be defined with a recursive-descent parsing strategy.
There are potentially some advantages to using Yacc/Lex, but it is not mandatory to use them. There are some downsides to using Yacc/Lex too, but the advantages usually outweigh the disadvantages. In particular, it is often easier to maintain a Yacc-driven grammar than a hand-coded one, and you benefit from the automation that Yacc provides.
However, writing your own parser from scratch is not an evil thing to do. It may make it harder to maintain in future, but it may make it easier, too.
It certainly depends on the complexity of your language grammar. An easy grammar means that there is an easy implementation and you can just do it yourself.
Take a look at maybe the worst possible example at all: C++ :) (Does anybody knows another language, besides natural languages, which are more difficult to parse correctly?) Even with tools like Antlr, is it quite difficult to get it right, though it is manageable. Whereby on the other side, even while being much harder, it seems that some of the best C++ parsers, e.g. GCC and LLVM, are also mostly handwritten.
If you don't need too much flexibility and your language is not too trivial, you will certainly safe some work/time by using Antlr.

Why are functional languages considered a boon for multi threaded environments?

I hear a lot about functional languages, and how they scale well because there is no state around a function; and therefore that function can be massively parallelized.
However, this makes little sense to me because almost all real-world practical programs need/have state to take care of. I also find it interesting that most major scaling libraries, i.e. MapReduce, are typically written in imperative languages like C or C++.
I'd like to hear from the functional camp where this hype I'm hearing is coming from..
It's important to add one word: "there's no shared state".
Any meaningful program (in any language) changes the state of the world. But (some) functional languages make it impossible to access the same resource from multiple threads simultaneously. The absence of shared state makes multithreading safe.
Functional languages such as Haskell, Scheme and others have what are called "pure functions". A pure function is a function with no side effects. It doesn't modify any other state in the program. This is by definition threadsafe.
Of course you can write pure functions in imperative languages. You also find multi-paradigm languages like Python, Ruby and even C# where you can do imperative programming, functional programming or both.
But the point of Haskell (etc) is that you can't write a non-pure function. Well that's not strictly true but it's mostly true.
Similarly, many imperative languages have immutable objects for much the same reason. An immutable object is one whose state doesn't change once created. Again by definition an immutable object is threadsafe.
You're talking about two different things and don't realize it.
Yes, most real-world programs have state somewhere, but if you want to do multithreading, that state should not be everywhere, and in fact, the fewer places it's in, the better. In functional programs, the default is not to have state, and you can introduce state exactly where you need it and nowhere else. Those parts that are dealing with state will not be as easily multithreaded, but since all the rest of your program is free of side-effects and thus it doesn't matter what order those parts are executed in, it removes a huge barrier to parallelization.
However, this makes little sense to me because almost all real-world
practical programs need/have state to take care of.
You'd be surprised! Yes, all programs need some state (I/O in particular) but often you don't need much more. Just because most programs have heaps of state doesn't mean they need it.
Programming in a functional language encourages you to use less state, and thus your programs become easier to parallelise.
Many functional languages are "impure" which means they allow some state. Haskell doesn't, but Haskell has monads which basically let you get something from nothing: you get state using stateless constructs. Monads are a bit fiddly to work with which is why Haskell gives you a strong incentive to restrict state to as small a part of your program as possible.
I also find it interesting that most major scaling libraries, i.e.
MapReduce, are typically written in imperative languages like C or C++.
Programming concurrent applications is "hard" in C/C++. That's why it's best to do all the dangerous stuff in a library which is heavily tested and inspected. But you still get the flexibility and performance of C/C++.
Higher order functions. Consider a simple reduction operation, summing the elements of an array. In an imperative language, programmers typically write themselves a loop and perform reductions one element at a time.
But that code isn't easy to make multi-threaded. When you write a loop you're assuming an order of operations and you have to spell out how to get from one element to the next. You'd really like to just say "sum the array" and have the compiler, or runtime, or whatever, make the decision about how to work through the array, dividing up the task as necessary between multiple cores, and combining those results together. So instead of writing a loop, with some addition code embedded inside it, an alternative is to pass something representing "addition" into a function that can do the divvying. As soon as you do that, you're writing functionally. You're passing a function (addition) into another function (the reducer). If you write this way then it not only makes more readable code, but when you change architecture, or want to write for heterogeneous architecture, you don't have to change the summer, just the reducer. In practice you might have many different algorithms that all share one reducer so this is a big payoff.
This is just a simple example. You may want to build on this. Functions to apply other functions on 2D arrays, functions to apply functions to tree structures, functions to combine functions to apply functions (eg. if you have a hierarchical structure with trees above and arrays below) and so on.

How should I manage side effects in a new language design?

So I'm currently working on a new programming language. Inspired by ideas from concurrent programming and Haskell, one of the primary goals of the language is management of side effects. More or less, each module will be required to specify which side effects it allows. So, if I were making a game, the graphics module would have no ability to do IO. The input module would have no ability to draw to the screen. The AI module would be required to be totally pure. Scripts and plugins for the game would have access to a very restricted subset of IO for reading configuration files. Et cetera.
However, what constitutes a side effect isn't clear cut. I'm looking for any thoughts or suggestions on the subject that I might want to consider in my language. Here are my current thoughts.
Some side effects are blatant. Whether its printing to the user's console or launching your missiles, anything action that reads or write to a user-owned file or interacts with external hardware is a side effect.
Others are more subtle and these are the ones I'm really interested in. These would be things like getting a random number, getting the system time, sleeping a thread, implementing software transactional memory, or even something very fundamental such as allocating memory.
Unlike other languages built to control side effects (looking at you Haskell), I want to design my language to be pragmatic and practical. The restrictions on side effects should serve two purposes:
To aid in the separations of concerns. (No one module can do everything).
To sandbox each module in the application. (Any module could be used as a plugin)
With that in mind, how should I handle "pseudo"-side effects, like random numbers and sleeping, as I mention above? What else might I have missed? In what ways might I manage memory usage and time as resources?
The problem of how to describe and control effects is currently occupying some of the best scientific minds in programming languages, including people like Greg Morrisett of Harvard University. To my knowledge, the most ambitious pioneering work in this area was done by David Gifford and Pierre Jouvelot in the FX programming language started in 1987. The language definition is online, but you may get more insight into the ideas by reading their 1991 POPL paper.
This is a really interesting question, and it represents one of the stages I've gone through and, frankly, moved beyond.
I remember seminars in which Carl Hewitt, in talking about his Actors formalism, discussed this. He defined it in terms of a method giving a response that was solely a function of its arguments, or that could give different answers at different times.
I say I moved beyond this because it makes the language itself (or the computational model) the main subject, as opposed to the problem(s) it is supposed to solve. It is based on the idea that the language should have a formal underlying model so that its properties are easy to verify. That is fine, but still remains a distant goal, because there is still no language (to my knowledge) in which the correctness of something as simple as bubble sort is easy to prove, let alone more complex systems.
The above is a fine goal, but the direction I went was to look at information systems in terms of information theory. Specifically, assuming a system starts with a corpus of requirements (on paper or in somebody's head), those requirements can be transmitted to a program-writing machine (whether automatic or human) to generate source code for a working implementation. THEN, as changes occur to the requirements, the changes are processed through as delta changes to the implementation source code.
Then the question is: What properties of the source code (and the language it is encoded in) facilitate this process? Clearly it depends on the type of problem being solved, what kinds of information go in and out (and when), how long the information has to be retained, and what kind of processing needs to be done on it. From this one can determine the formal level of the language needed for that problem.
I realized the process of cranking through delta changes of requirements to source code is made easier as the format of the code comes more to resemble the requirements, and there is a nice quantitative way to measure this resemblence, not in terms of superficial resemblence, but in terms of editing actions. The well-known technology that best expresses this is domain specific languages (DSL). So I came to realize that what I look for most in a general-purpose language is the ability to create special-purpose languages.
Depending on the application, such special-purpose languages may or may not need specific formal features like functional notation, side-effect control, paralellism, etc. In fact, there are many ways to make a special-purpose language, from parsing, interpreting, compiling, down to just macros in an existing language, down to simply defining classes, variables, and methods in an existing language. As soon as you declare a variable or subroutine you're created new vocabulary and thus, a new language in which to solve your problem. In fact, in this broad sense, I don't think you can solve any programming problem without being, at some level, a language designer.
So best of luck, and I hope it opens up new vistas for you.
A side effect is having any effect on anything in the world other than returning a value, i.e. mutating something that could be visible in some way outside the function.
A pure function neither depends on or affects any mutable state outside the scope of that invocation of the function, which means that the function's output depends only on constants and its inputs. This implies that if you call a function twice with the same arguments, you are guaranteed to get the same result both times, regardless of how the function is written.
If you have a function that modifies a variable that it has been passed, that modification is a side effect because it's visible output from the function other than the return value. A void function that is not a no-op must have side effects, because it has no other way of affecting the world.
The function could have a private variable only visible to that function that it reads and modifies, and calling it would still have the side effect of changing the way the function behaves in the future. Being pure means having exactly one channel for output of any kind: the return value.
It is possible to generate random numbers purely, but you have to pass around the random seed manually. Most random functions keep a private seed value that is updated each time its called so that you get a different random each time. Here's a Haskell snippet using System.Random:
randomColor :: StdGen -> (Color, Int, StdGen)
randomColor gen1 = (color, intensity, gen2)
where (color, gen2) = random gen1
(intensity, gen3) = randomR (1, 100) gen2
The random functions each return the randomized value and a new generator with a new seed (based on the previous one). To get a new value each time, the chain of new generators (gen1,gen2,gen3) have to be passed along. Implicit generators just use an internal variable to store the gen1.. values in the background.
Doing this manually is a pain, and in Haskell you can use a state monad to make it a lot easier. You'll want to implement something less pure or use a facility like monads, arrows or uniqueness values to abstract it away.
Getting the system time is impure because the time could be different each time you ask.
Sleeping is fuzzier because sleep doesn't affect the result of the function, and you could always delay execution with a busy loop, and that wouldn't affect purity. The thing is that sleeping is done for the sake of something else, which IS a side effect.
Memory allocation in pure languages has to happen implicitly, because explicitly allocating and freeing memory are side effects if you can do any kind of pointer comparisons. Otherwise, creating two new objects with the same parameters would still produce different values because they would have different identities (e.g. not be equal by Java's == operator).
I know I've rambled on a bit, but hopefully that explains what side effects are.
Give a serious look to Clojure, and their use of software transactional memory, agents, and atoms to keep side effects under control.

Resources