In a post called What is this thing you call "thread safe"?, Eric Lippert says:
Thread safety of immutable data structures is all about ensuring that use of the data across all operations is logically consistent, at the expense of the fact that you're looking at an immutable snapshot that might be out-of-date.
I thought the whole point of immutable data structures is that they do not change and therefore cannot be out of date, and that therefore they are intrinsically thread-safe.
What does Lippert mean here?
What does Lippert mean here?
I agree that the way I wrote that particular bit was not as clear as it could be.
Back in 2009 we were designing the data structures for Roslyn -- the "C# and VB compiler as a service" -- and therefore were considering how to do analysis in the IDE, a world where the code is almost never correct -- if the code were correct, why are you editing it? -- and where it can be changing several times a second as you type.
I thought the whole point of immutable data structures is that they do not change and therefore cannot be out of date, and that therefore they are intrinsically thread-safe.
It is the fact that they do not change that makes them possibly out of date. Consider a common scenario in the IDE:
using System;
class P
{
    static void Main()
    {
        Console.E
    }
}
We have immutable data structures which represent the world at the moment before you typed "E", and we have an immutable data structure which represents the edit you've just made -- striking the letter E -- and now a whole bunch of stuff happens.
The lexer, knowing that the previous lex state is immutable and matches the world before the "E", re-lexes the world just around the E, rather than re-lexing the entire token stream. Similarly, the parser works out what the new (ill-formed!) parse tree is for this edit. That creates a new immutable parse tree that is an edit of the old immutable parse tree, and then the real fun starts. The semantic analyzer tries to figure out what Console means and then what you could possibly mean by E, so that it can do an IntelliSense dropdown centered on the members of System.Console that begin with E, if there are any. (And we're also starting an error-reporting workflow, since there are now many semantic and syntactic errors in the program.)
Now what happens if while we are working all that out on a background thread, you hit "backspace" and then "W"?
All that analysis, which is still in flight, will be correct, but it will be correct for Console.E and not Console.W. The analysis is out-of-date. It belongs to a world that is no longer relevant, and we have to start over again, analyzing the backspace and the W.
In short, it is perfectly safe to do analysis of immutable data structures on another thread, but stuff can keep happening on the UI thread that invalidates that work; this is one of the prices you pay for farming work on immutable data out to worker threads.
Remember, these invalidations can happen extremely quickly; we budgeted 30ms for a re-lex, re-parse and IntelliSense update because a fast typist can get in well over ten keystrokes per second. Having a lexer and parser that re-used the immutable state of past lexes and parses was a key part of this strategy, but you then have to plan for an invalidation that throws away your current analysis arriving just as quickly.
Incidentally, the mechanisms we needed to invent to efficiently track those invalidations were themselves quite interesting, and led to some insights into cancellation-based workflows -- but that's a subject for another day.
He means that you might be looking at a different snapshot than someone else. Consider how cons lists work: after adding another element to the head of a list, there are effectively two lists (snapshots). Both of them are immutable but not the same.
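To make that concrete, here is a minimal Java sketch of such a cons list (the names are mine, purely for illustration). Prepending never mutates anything; it returns a new head that shares the old tail, so a thread still holding the old head sees a perfectly consistent -- but out-of-date -- snapshot:

// Minimal immutable cons list; all names are illustrative.
final class Cons<T> {
    final T head;
    final Cons<T> tail; // null marks the empty list

    Cons(T head, Cons<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // "Adding" never mutates: it returns a new snapshot that shares this one.
    Cons<T> prepend(T value) {
        return new Cons<T>(value, this);
    }
}

class Snapshots {
    public static void main(String[] args) {
        Cons<String> v1 = new Cons<String>("b", null); // snapshot 1: [b]
        Cons<String> v2 = v1.prepend("a");             // snapshot 2: [a, b]
        // v1 is still [b]: immutable, thread-safe, and now out-of-date.
        System.out.println(v1.head + " / " + v2.head); // prints: b / a
    }
}

Both snapshots share the [b] cell, which is only safe because neither of them can ever change.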
It is said that atoms are not garbage collected. Once you've created an atom, it stays in the atom table, which might cause a memory leak in the long run!
I'm fairly new to Erlang, and my question is: can atoms be garbage collected? And if not, how can I minimize the effect?
Atoms aren't an issue unless you are creating them dynamically. If you do that, you are on your way to crashing the Erlang system.
How do you create atoms dynamically?
For example, by calling the list_to_atom function inside a loop.
If you are interested in Erlang garbage collection, then read this paper by Joe Armstrong: One Pass Real-Time Generational Mark-Sweep Garbage Collection (1995).
Always keep in mind: Don't create Atoms dynamically!
Well, sometimes you might need to create an atom dynamically, BUT don't overuse it!
While I'm not sure whether atoms are garbage collected, you can easily do without worrying about blowing up the system's memory. As @Chiron said, as long as all your atoms are known at compile time, you should be OK.
What if I really need to use list_to_atom/1 somehow? Well, you may be able to work around the issue with this kind of function:
atom("apple") -> apple;
atom("orange") -> orange;
atom("banana") -> banana.
One other workaround is list_to_existing_atom/1, which fails with badarg instead of creating a new atom when the atom does not already exist.
But the VM can still eat more and more RAM: other connected Erlang nodes may register atoms globally, that is, allocate atoms at run time.
From Learn You Some Erlang:
Atoms are really nice and a great way to send messages or represent constants. However there are pitfalls to using atoms for too many things: an atom is referred to in an "atom table" which consumes memory (4 bytes/atom in a 32-bit system, 8 bytes/atom in a 64-bit system). The atom table is not garbage collected, and so atoms will accumulate until the system tips over, either from memory usage or because 1048577 atoms were declared.

This means atoms should not be generated dynamically for whatever reason; if your system has to be reliable and user input lets someone crash it at will by telling it to create atoms, you're in serious trouble. Atoms should be seen as tools for the developer because honestly, it's what they are.
I've written a very simple compiler that translates my source language to bytecode; this code gets processed by the VM (a simple stack machine), so 3 + 3 will get translated into:
push 3
push 3
add
Right now I'm struggling with the garbage collection (I want to use reference counting).
I know the basic concept: when a reference gets assigned, the reference counter of that object is incremented, and when it leaves scope, it gets decremented. But what's not clear to me is how the GC can free objects that get passed to functions.
Here are some more concrete examples of what I mean:
string a = "im a string" // ok, assignment: refcount + 1 at declare time, - 1 when it leaves scope
print(new Object()) // how is a parameter handled? is the reference incremented before calling the function?
string b = "a" + "b" + "c" // unclear: two strings get pushed, then concatenated, then the last gets pushed and concatenated again -- should the push operation also increase the refcount, and where would I decrease it?
I would be glad if anyone could give me links to tutorials on implementing reference counting, or help me with this very specific problem if you've run into it before. (My problem is that I don't understand when to increment and decrement the references, or where the count is stored.)
I think a couple of things can happen with literals. You can treat them like literal numbers -- constants that are there forever -- or you can have an implicit variable that holds a retain count of 1 before print, and releases it afterwards.
In response to your edit:
You can use the implicit-variable solution, or you can use the "autorelease" concept from Objective-C: you have an object that is placed in an autorelease pool and will be released after a short time, during which the receiver of the object can retain it.
First, what types of objects does your language allow to be put on the heap? Strings? Do you have mutable or immutable strings?
Check out this post about strings in Java. In a Java-like language, strings get copied every time you concatenate them, because they are immutable. Also, "this is a string" is actually a call to the constructor of the string class.
If the argument to print() is a call to a constructor (new Object()), there is no reference to the object in the scope calling the function, thus the object lives in the scope of the function and the counters should be incremented and decremented accordingly to entering and leaving the scope of the print() function. If the constructor is called in the calling scope and assigned to a variable, it lives in the calling scope.
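To make the ownership rules concrete, here is a hedged sketch in Java of one common discipline -- every stack slot owns one reference -- with all names invented for illustration (a real VM would manage raw memory rather than Java objects). Under this discipline, the argument to print() is owned by its stack slot for the whole call and released when the call's operands are dropped, and the temporaries in "a" + "b" + "c" die as soon as each concatenation releases its operands:

import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch of stack-owned reference counting; all names are invented.
class Obj {
    int refCount = 0;
    String payload;
    Obj(String payload) { this.payload = payload; }
}

class VmStack {
    private final Deque<Obj> stack = new ArrayDeque<Obj>();

    // Pushing makes the stack slot an owner of one reference.
    void push(Obj o) {
        o.refCount++;
        stack.push(o);
    }

    // Dropping a slot releases its reference (end of statement, end of call).
    // Function arguments live in their slots for the whole call, so no extra
    // increment is needed unless the callee stores the value somewhere.
    void drop() {
        release(stack.pop());
    }

    // "a" + "b": build the result while both operands are still owned by
    // their slots, then release them; the fresh result's only owner is the
    // slot it gets pushed into.
    void concat() {
        Obj b = stack.pop();
        Obj a = stack.pop();
        push(new Obj(a.payload + b.payload));
        release(a);
        release(b);
    }

    private void release(Obj o) {
        if (--o.refCount == 0) {
            // return o's storage to the allocator, and release
            // every reference o itself holds
        }
    }
}

As for where the count lives: it is typically stored in a header word on the object itself, next to things like its type tag.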
For reading up on the subject, Wikipedia is a good start, but Andrew Appel's compiler book would be handy to have (there should be a 2nd edition out there, and there are C and ML versions of the book available too). Lambda the Ultimate is the place where many programming language researchers discuss things, so it's definitely worth looking at.
I started writing my own scripting language over the most recent weekend, both for the learning experience and for my resume when I graduate high school. So far things have gone great: I can parse variables with basic types (null, boolean, number, and string) and mathematical expressions with operator precedence, and I have a rudimentary mark-and-sweep garbage collector in place (after completing the mark/sweep collector I will implement a generational garbage collector; I know naive mark/sweep isn't very fast).

I am unsure how to store the referenced objects for the garbage collector, though. As of now I have a class GCObject that stores a pointer to its memory and whether it is marked or not. Should I store a linked list of its referenced objects in the class? I have looked at garbage collectors from other languages, but I see no linked lists of references per GCObject, so it is confusing me.
TLDR: How do I store objects that are referenced by other objects in a mark and sweep garbage collector? Do I just store linked lists of objects in all my GCObjects?
Thanks guys.
You generally don't store the references to an object in anything but the locations at which those references naturally occur. During the mark operation, you don't need to know which references point to an object; rather, you need to know which references an object (or root) contains, so you can recursively mark those objects.
You also need, for the sweep phase, a way to iterate through all objects so you can finalise any unreferenced objects and return their storage to the allocation pool. How you would do this exactly depends on your general purpose allocator - you probably want to write a custom one.
(I'm assuming you don't want to do compaction - that's a whole lot more complicated).
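For illustration, here is a minimal Java sketch of that shape (all names invented; the children list stands in for walking an object's actual fields). Note there is no list of referrers anywhere: marking recurses through what each object contains, and the one flat list of all allocations exists only so the sweep can visit everything:

import java.util.ArrayList;
import java.util.List;

// Minimal mark-sweep skeleton; all names are illustrative.
class GCObject {
    boolean marked = false;
    // The references this object CONTAINS (its fields/elements) --
    // not a list of who points at it.
    List<GCObject> children = new ArrayList<GCObject>();
}

class Heap {
    // One flat list of every allocation, solely so sweep can visit them all.
    private final List<GCObject> allObjects = new ArrayList<GCObject>();
    final List<GCObject> roots = new ArrayList<GCObject>(); // stack slots, globals

    GCObject allocate() {
        GCObject o = new GCObject();
        allObjects.add(o);
        return o;
    }

    void collect() {
        for (GCObject root : roots) mark(root);
        sweep();
    }

    private void mark(GCObject o) {
        if (o == null || o.marked) return;
        o.marked = true;
        for (GCObject child : o.children) mark(child); // recurse via contents
    }

    private void sweep() {
        List<GCObject> live = new ArrayList<GCObject>();
        for (GCObject o : allObjects) {
            if (o.marked) { o.marked = false; live.add(o); }
            // else: unreachable -- finalise it and return its storage to the pool
        }
        allObjects.clear();
        allObjects.addAll(live);
    }
}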
I'm researching and experimenting more with Groovy, and I'm trying to wrap my mind around the pros and cons of implementing things in Groovy that I can't/don't do in Java. Dynamic programming is still just a concept to me, since I've been deeply steeped in statically and strongly typed languages.
Groovy gives me the ability to duck-type, but I can't really see the value. How is duck-typing more productive than static typing? What kind of things can I do in my code practice to help me grasp the benefits of it?
I ask this question with Groovy in mind but I understand it isn't necessarily a Groovy question so I welcome answers from every code camp.
A lot of the comments for duck typing don't really substantiate the claims. Not "having to worry" about a type is not sustainable for maintenance or for making an application extendable. I've had a really good opportunity to see Grails in action over my last contract, and it's quite funny to watch, really. Everyone is happy about the gains in being able to "create-app" and get going -- sadly, it all catches up with you on the back end.
Groovy seems the same way to me. Sure, you can write very succinct code, and there is definitely some nice sugar in how we get to work with properties, collections, etc. But the cost of not knowing what the heck is being passed back and forth just gets worse and worse. At some point you're scratching your head wondering why the project has become 80% testing and 20% work. The lesson here is that "smaller" does not make for "more readable" code. Sorry folks, it's simple logic: the more you have to know intuitively, the more complex the process of understanding that code becomes. It's why GUIs have backed off from becoming overly iconic over the years -- sure, it looks pretty, but WTH is going on is not always obvious.
People on that project seemed to have trouble "nailing down" the lessons learned, but when you have methods returning either a single element of type T, an array of T, an ErrorResult, or a null ... it becomes rather apparent.
One thing working with Groovy has done for me however - awesome billable hours woot!
Duck typing cripples most modern IDEs' static checking, which can point out errors as you type. Some consider this an advantage. I want the IDE/compiler to tell me I've made a stupid programmer trick as soon as possible.
My most recent favorite argument against duck typing comes from a Grails project DTO:
class SimpleResults {
    def results
    def total
    def categories
}
where results turns out to be something like Map<String, List<ComplexType>>, which can be discovered only by following a trail of method calls in different classes until you find where it was created. For the terminally curious, total is the sum of the sizes of the List<ComplexType>s, and categories is the size of the Map.
It may have been clear to the original developer, but the poor maintenance guy (ME) lost a lot of hair tracking this one down.
It's a little bit difficult to see the value of duck typing until you've used it for a little while. Once you get used to it, you'll realize how much of a load off your mind it is to not have to deal with interfaces or having to worry about exactly what type something is.
Next, which is better: EMACS or vi? This is one of the running religious wars.
Think of it this way: any program that is correct will still be correct if the language is statically typed. What static typing does is give the compiler enough information to detect type mismatches at compile time instead of run time. This can be an annoyance if you're doing incremental sorts of programming, although (I maintain) if you're thinking clearly about your program it doesn't much matter; on the other hand, if you're building a really big program, like an operating system or a telephone switch, with dozens or hundreds or thousands of people working on it, or with really high reliability requirements, then having the compiler detect a large class of problems for you, without needing a test case to exercise just the right code path, is a big win.
It's not as if dynamic typing is a new and different thing: C, for example, is effectively dynamically typed, since I can always cast a foo* to a bar*. It just means it's then my responsibility as a C programmer never to use code that is appropriate for a bar* when the address is really pointing to a foo*. But as a result of the issues with large programs, C grew tools like lint(1), strengthened its type system with typedef, and eventually developed a strongly typed variant in C++. (And, of course, C++ in turn developed ways around the strong typing, with all the varieties of casts, generics/templates, and RTTI.)
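The same escape hatch exists in Java, for what it's worth -- a tiny sketch of my own to illustrate the trade-off: the downcast compiles cleanly, and the mismatch only surfaces at run time, which is exactly the class of error static checking would otherwise catch for free:

// The mistake compiles fine and only surfaces at run time.
class CastDemo {
    public static void main(String[] args) {
        Object boxed = "really a String";
        Integer n = (Integer) boxed; // compiles; throws ClassCastException here
        System.out.println(n);
    }
}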
One other thing, though -- don't confuse "agile programming" with "dynamic languages". Agile programming is about the way people work together in a project: can the project adapt to changing requirements to meet the customers' needs while maintaining a humane environment for the programmers? It can be done with dynamically typed languages, and often is, because they can be more productive (e.g., Ruby, Smalltalk), but it can be done, and has been done successfully, in C and even assembler. In fact, Rally Development even uses agile methods (Scrum in particular) to do marketing and documentation.
There is nothing wrong with static typing if you are using Haskell, which has an incredible static type system. However, if you are using languages like Java and C++ that have terribly crippling type systems, duck typing is definitely an improvement.
Imagine trying to use something so simple as "map" in Java (and no, I don't mean the data structure). Even generics are rather poorly supported.
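To illustrate (my own sketch, written against the pre-lambda Java of the era, with invented names): even a bare-bones map needs its own hand-rolled function interface and an anonymous class at every call site, where Groovy gets by with ["a", "bb", "ccc"].collect { it.length() }:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A hand-rolled function type, since the Java of the time had none.
interface Function<A, B> {
    B apply(A input);
}

class MapExample {
    static <A, B> List<B> map(List<A> input, Function<A, B> f) {
        List<B> result = new ArrayList<B>();
        for (A a : input) {
            result.add(f.apply(a));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> lengths = map(
            Arrays.asList("a", "bb", "ccc"),
            new Function<String, Integer>() { // anonymous-class boilerplate
                public Integer apply(String s) {
                    return s.length();
                }
            });
        System.out.println(lengths); // prints: [1, 2, 3]
    }
}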
With TDD + 100% code coverage + IDE tools to constantly run my tests, I no longer feel a need for static typing. With no strong types, my unit testing has become so easy (simply use Maps for creating mock objects). Especially when you are using generics, you can see the difference:
//Static typing
Map<String,List<Class1<Class2>>> someMap = [:] as HashMap<String,List<Class1<Class2>>>
vs
//Dynamic typing
def someMap = [:]
IMHO, the advantage of duck typing becomes magnified when you adhere to some conventions, such as naming your variables and methods in a consistent way. Taking the example from Ken G, I think it would read best as:
class SimpleResults {
    def mapOfListResults
    def total
    def categories
}
Let's say you define a contract on some operation named 'calculateRating(A,B)' where A and B adhere to another contract. In pseudocode, it would read:
Long calculateRating(A someObj, B otherObj) {
    // some fake algorithm here:
    if (someObj.doStuff('foo') > otherObj.doStuff('bar')) return someObj.calcRating();
    else return otherObj.calcRating();
}
If you want to implement this in Java, both A and B must implement some kind of interface that reads something like this:
public interface MyService {
    public int doStuff(String input);
}
Besides, if you want to generalize your contract for calculating ratings (let's say you have another algorithm for rating calculations), you also have to create an interface:
public long calculateRating(MyService a, MyService b);
With duck typing, you can ditch your interfaces and just rely on the fact that at runtime, both A and B will respond correctly to your doStuff() calls. There is no need for a specific contract definition. This can work for you, but it can also work against you.
The downside is that you have to be extra careful to guarantee that your code does not break when some other person changes it (i.e., the other person must be aware of the implicit contract on the method name and arguments).
Note that this is especially aggravated in Java, where the syntax is not as terse as it could be (compared to Scala, for example). A counter-example of this is the Lift framework, where they say that the SLOC count of the framework is similar to Rails, but the test code has fewer lines because they don't need to implement type checks within the tests.
Here's one scenario where duck typing saves work.
Here's a very trivial class
class BookFinder {
    def searchEngine

    def findBookByTitle(String title) {
        return searchEngine.find( [ "Title" : title ] )
    }
}
Now for the unit test:
void bookFinderTest() {
    // with Expando we can 'fake' any object at runtime.
    // alternatively you could write a MockSearchEngine class.
    def mockSearchEngine = new Expando()
    mockSearchEngine.find = {
        return new Book("Heart of Darkness", "Joseph Conrad")
    }

    def bf = new BookFinder()
    bf.searchEngine = mockSearchEngine

    def book = bf.findBookByTitle("Heart of Darkness")
    assert(book.author == "Joseph Conrad")
}
We were able to substitute an Expando for the SearchEngine because of the absence of static type checking. With static type checking, we would have had to ensure that SearchEngine was an interface, or at least an abstract class, and create a full mock implementation of it. That's labour intensive -- or you can use a sophisticated single-purpose mocking framework. But duck typing is general-purpose, and it has helped us here.
Because of duck typing, our unit test can provide any old object in place of the dependency, just as long as it implements the methods that get called.
To emphasise: you can do this in a statically typed language, with careful use of interfaces and class hierarchies. But with duck typing you can do it with less thinking and fewer keystrokes, as the sketch below suggests.
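For comparison, here is roughly what the same test costs in plain Java (a sketch of my own with invented names, no mocking framework): the dependency must be pinned down as an interface, and the mock becomes a full class of its own:

import java.util.Collections;
import java.util.Map;

// The contract must be made explicit before anything can be substituted.
interface SearchEngine {
    Book find(Map<String, String> query);
}

class Book {
    final String title;
    final String author;
    Book(String title, String author) {
        this.title = title;
        this.author = author;
    }
}

class BookFinder {
    private final SearchEngine searchEngine;
    BookFinder(SearchEngine searchEngine) { this.searchEngine = searchEngine; }

    Book findBookByTitle(String title) {
        return searchEngine.find(Collections.singletonMap("Title", title));
    }
}

class BookFinderTest {
    public static void main(String[] args) {
        // The hand-written mock: a whole implementation where Groovy
        // needed a single Expando.
        SearchEngine mockSearchEngine = new SearchEngine() {
            public Book find(Map<String, String> query) {
                return new Book("Heart of Darkness", "Joseph Conrad");
            }
        };
        Book book = new BookFinder(mockSearchEngine).findBookByTitle("Heart of Darkness");
        if (!book.author.equals("Joseph Conrad")) throw new AssertionError();
    }
}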
That's an advantage of duck typing. It doesn't mean that dynamic typing is the right paradigm to use in all situations. In my Groovy projects, I like to switch back to Java in circumstances where I feel that compiler warnings about types are going to help me.
To me, they aren't horribly different if you see dynamically typed languages as simply a form of static typing where everything inherits from a sufficiently abstract base class.
Problems arise when, as many have pointed out, you start getting strange with this. Someone pointed out a function that returns a single object, a collection, or null. Have the function return a specific type, not multiple: use separate functions for the single and collection cases.
What it boils down to is that anyone can write bad code. Static typing is a great safety device, but sometimes the helmet gets in the way when you want to feel the wind in your hair.
It's not that duck typing is more productive than static typing so much as it is simply different. With static typing you always have to worry that your data is the correct type, and in Java this shows up in casting to the right type. With duck typing, the type doesn't matter as long as it has the right method, so it really just eliminates a lot of the hassle of casting and conversion between types.