ANTLR4 parser re-use and warm-up

In my use case I have to parse several thousand small and independent expressions into a tree representation using a Visitor on the generated parse trees.
Currently, new stream, lexer, and parser instances are created for each parse operation.
I assume this might not be optimal. Which object instances could be re-used in such a setup to make use of the warm-up property of ANTLR4? How about thread safety - which of these instances should be thread local? Is a reset of some kind required to re-use a lexer or parser instance?

In the early days of ANTLR 4 (many months before its initial release), the adaptive DFA cache was created on a per-instance basis, so the use of Lexer.setInputStream or Parser.setInputStream was essential for achieving good performance.
This is no longer the case. The background cache is now shared among all parser instances and is thread safe. The methods of the Lexer and Parser classes are not thread safe, so if you want to parse on multiple threads, you will need to create multiple instances of your lexer and parser.
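To make that concrete, here is one way to keep a single lexer/parser pair per thread and re-point it at each new expression; setInputStream/setTokenStream reset the instances, so no separate reset call is needed. This is only a sketch: it assumes a recent ANTLR Java runtime (4.7+, for CharStreams.fromString) and a grammar whose generated classes are named ExprLexer/ExprParser with a top-level rule called expression, so adapt the names to your grammar.

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public final class ExprParsing {
    // One lexer/parser pair per thread; the DFA cache behind them is shared and thread safe.
    private static final ThreadLocal<ExprLexer> LEXER =
            ThreadLocal.withInitial(() -> new ExprLexer(null));
    private static final ThreadLocal<ExprParser> PARSER =
            ThreadLocal.withInitial(() -> new ExprParser(null));

    public static ExprParser.ExpressionContext parse(String expression) {
        ExprLexer lexer = LEXER.get();
        ExprParser parser = PARSER.get();

        // Re-point the existing instances at the new input; both setters also reset internal state.
        lexer.setInputStream(CharStreams.fromString(expression));
        parser.setTokenStream(new CommonTokenStream(lexer));

        return parser.expression(); // hypothetical top-level rule name
    }
}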

Related

What constructs are not possible using Ponylang's lock-free model?

Ponylang is a new language that is lock-free and data-race-free. My impression is that to accomplish this, Ponylang takes the statement "if two threads can see the same object, then writes must prohibit any other operation by another thread" and uses a type system to enforce the various special cases. For example, there's a type descriptor that says "no other thread can see this object", one that says "this reference is read-only", and various others. Admittedly my understanding of this is quite poor, and Ponylang's documentation is short on examples.
My question is: are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all? Also, are there such operations that are not translatable into efficient constructs in ponylang?
[...] are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all?
The whole point of reference capabilities in Pony is to prevent you from doing things that are possible, and even trivial, in other languages, like sharing a list between two threads and adding elements to it concurrently. So, yes, in languages like Java you can share data between threads in a way that is impossible in Pony.
Also, are there such operations that are not translatable into efficient constructs in ponylang?
If you're asking whether lock-based languages can be more efficient than Pony in some situations, then I think so. You can always construct a situation that benefits from N threads and one lock, and that gets worse when you use the actor model, which forces you to pass information around in messages.
The point is not to see the actor model as superior in all cases. It's a different model of concurrency, and problems are solved differently. For example, to compute N values and accumulate the results in a list:
In a thread model you would:
create a thread pool,
create a thread-safe list,
create N tasks sharing the list, and
wait for the N tasks to finish.
In an actor model you would:
create an actor A waiting for N values,
create N actors B sharing the actor A, and
wait for A to produce a list.
Obviously, each task would add a value to the list and each actor B would send its value to actor A. Depending on how messages are passed between actors, it can be slower to send N values than to lock N times. Typically it will be slower, but on the other hand you will never get a list with an unexpected size.
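For concreteness, a minimal Java sketch of the thread-model version described above; the squaring is just a placeholder for the real computation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class AccumulateWithThreads {
    public static List<Integer> computeAll(int n) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);                   // the thread pool
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());  // the thread-safe list
        CountDownLatch done = new CountDownLatch(n);

        for (int i = 0; i < n; i++) {
            final int value = i;
            pool.submit(() -> {               // one of the N tasks sharing the list
                results.add(value * value);   // placeholder computation
                done.countDown();
            });
        }

        done.await();    // wait for the N tasks to finish
        pool.shutdown();
        return results;
    }
}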
I believe Pony can do anything that a shared-everything-plus-locks model can do. With just iso objects and consume, it is basically a pure message-passing system, which can do anything that a lock-based system does, in the same way that Mach 3 can do anything Linux can.

How to achieve data polymorphism for multiple external formats in Haskell?

I need to process multiple formats and versions for semantically equivalent data.
I can generate Haskell data types for each schema (XSD, for example); they will be technically different, but semantically and structurally identical in many cases.
The data is complex and includes references, and service components must process the whole graph and produce a similarly structured response (a component might just update a field, but it might need to analyze the whole graph to collect all required information, and it might call other services as well).
How can I represent ns1:address and ns2:adress as one polymorphic type that has country and street elements, so the application can process them as identical, while keeping the serialization context needed to write the response in the correct format (one representation might encode them in a single string while another might also carry superfluous complex data)?
How close can I get to writing mostly code that defines the semantic equivalence of the data plus the business logic, and generating everything else? What features of the Haskell language or its libraries should I evaluate as building blocks for a potential solution?
An option is to create a data type for each schema and create a function to map them to a common data type. Process it as you wish. You don't need to create polymorphic types.
This approach is similar to Pandoc's: you get a bunch of readers to parse documents to a common document structure, then use writers to convert that common structure to a particular format.
You just need the libraries to read your complex input data (and write it back, if necessary). The rest is functions and data types.
If you are really handling graphs, you can look at the Data.Graph module.
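The shape of that approach is language-neutral; sketched in Java for concreteness (the Ns1Address/Ns2Address types and their fields are hypothetical stand-ins for the generated schema types, and records assume Java 16+):

// Hypothetical generated schema types (normally produced from each XSD).
record Ns1Address(String country, String street) {}
record Ns2Address(String countryCode, String streetLine) {}

// The common, schema-independent representation the business logic works with.
record Address(String country, String street) {}

final class AddressMapping {
    // One "reader" per schema maps into the common type...
    static Address fromNs1(Ns1Address a) { return new Address(a.country(), a.street()); }
    static Address fromNs2(Ns2Address a) { return new Address(a.countryCode(), a.streetLine()); }

    // ...and one "writer" per schema maps the common type back out for the response.
    static Ns1Address toNs1(Address a) { return new Ns1Address(a.country(), a.street()); }
    static Ns2Address toNs2(Address a) { return new Ns2Address(a.country(), a.street()); }
}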
It sounds like this is a problem that is well served by the type class infrastructure and the Lens library.
Use a Type Class to define a standard and consistent high-level interface to the various implementations. Make sure that you focus on the operations you wish to perform, not on the underlying implementation or process.
Use Lenses and Prisms to reach into the underlying datatypes and return answers to queries, and modify values "in-place", without resorting to full serialisation/de-serialisation.

Uniqueness Types Instead of STM

A forum post suggests using uniqueness types instead of STM. I don't understand what it is saying. How are uniqueness types supposed to deal with the problem that STM is trying to deal with, where multiple threads are updating the same variable, for example?
I've looked at Wikipedia's articles on uniqueness types and linear types and it's still not clear to me what the forum post meant.
Designing systems where data is shared and mutated concurrently by multiple threads is hard.
Approaches to make concurrency easier include:
STM -- With STM, data can still be shared and mutated by multiple threads, but concurrent mutations are detected thanks to the use of transactions.
Uniqueness types -- With uniqueness types, at most one reference to an object exists. So, by definition, it is impossible to mutate the same data concurrently (you would need two references at least, one per thread).
Immutability -- Avoid the problem of concurrent mutations altogether and share only immutable data.
Actors -- Actors rely on asynchronous messages, and serialize the messages they receive, thus avoiding concurrent modifications.

Is it possible, in any language, to implement rules that will affect every instance of an object?

For example, could I implement a rule that would change every string that followed the pattern '1..4' into the array [1,2,3,4]? In JavaScript:
// here you create a rule that rewrites every string matching /([0-9]+)\.\.([0-9]+)/
// ever created into range($1, $2) ($1 and $2 being the capture groups of the regexp)
var a = '1..4';
console.log(a);
>> output: [1,2,3,4];
Of course, I'm pretty confident that would be impossible in most languages. My question is: is there any language in which that would be possible? Or has anyone ever proposed something like that? Does this thing have a name I can google to read more about?
Modifying the language from within itself falls under the umbrella of reflection and metaprogramming. It is referred to as behavioral reflection. It differs from structural reflection, which operates at the level of the application (e.g. classes, methods) rather than the language level. Support for behavioral reflection varies greatly across languages.
We can broadly categorize language changes in two categories:
changes that modify the semantics (i.e. the rules) of the language itself (e.g. redefine the method lookup algorithm),
changes that modify the syntax (e.g. your syntax '1..4' to create arrays).
For case 1, certain languages expose the structure of the application (structural reflection) and the inner workings of their implementation (behavioral reflection) to the application itself via special objects called meta-objects. Meta-objects are reifications of otherwise implicit aspects, which then become explicitly manipulable: the application can modify the meta-objects to redefine part of its structure, or part of the language. When it comes to language changes, the focus is usually on modifying message sending / method invocation, since it is the core mechanism of object-oriented languages. But the same idea could be applied to expose other aspects of the language, e.g. field accesses, synchronization primitives, foreach enumeration, etc., depending on the language.
For case 2, the program must be represented in a suitable data structure to be modified. For languages of the Lisp family, the program manipulates lists, and the program can itself be represented as lists. This is called homoiconicity and is handy for metaprogramming, hence the flexibility of Lisp-like languages. For other languages, the representation is usually an AST. Transforming the representation of the program, or rewriting it, is possible with macros, preprocessors, or hooks during compilation or class loading.
The line between 1 and 2 is, however, blurry. Syntactical changes can appear to modify the semantics of the language. For instance, I can rewrite all field accesses with proper getters and setters and perform additional logic there, say to implement transactional memory. Did I perform a semantic change of what a field access is, or merely a syntactic change?
Also, there are other constructs that fall between the lines. For instance, proxies and the #doesNotUnderstand trap are popular techniques to simulate the reification of message sends to some extent.
Lisp and Smalltalk have been very influential in the field of metaprogramming, and I think the two following projects/platforms are interesting to look at as representatives of each:
Racket, a Lisp-like language focused on growing languages from within the language, and
Helvetia, a Smalltalk extension to embed new languages into the host language by leveraging the AST of the host environment.
I hope you enjoyed this even if I did not really address your question ;)
Your desired change requires modifying the way literals are created. This is, AFAIK, not usually exposed to the application. The closest work that I can think of is Virtual Values for Language Extension, which tackled JavaScript.
Yes. Common Lisp (and certain other Lisps) has "reader macros", which allow the user to reprogram (incrementally) the mapping between the input stream and the actual language construct as parsed.
See http://dorophone.blogspot.com/2008/03/common-lisp-reader-macros-simple.html
If you want to operate on the level of objects, you will want to use a debugging/memory-management framework that keeps track of all objects and processes the rules on each evaluation step (nasty). This seems like the kind of thing you might be able to shoehorn into Smalltalk.
CLOS (Common Lisp Object System) allows redefinition of live objects.
Ultimately you need two things to implement this:
Access to the running system's AST (Abstract Syntax Tree), and
Access to the running system's objects.
You'll want to study meta-object protocols and the languages that use them, then the implementations of both the MOPs and the environment within which these programs are executed.
Image-based systems will be the easiest to modify (e.g., Lisp, potentially Smalltalk).
(Image-based systems store a snapshot of a running system, allowing complete shutdown and restarts, redefinitions, etc. of a complete environment, including existing objects, and their definitions.)
Ruby allows you to extend classes. For instance, this example adds functionality to the String class. But you can do more than add methods to classes. You can also overwrite methods, by defining a method that's already been defined. You may want to preserve access to the original method using alias_method.
Putting all this together, you can overload a constructor in Ruby, but in your case, there's a catch: It sounds like you want the constructor to return a different type. Constructors by definition return instances of their class. If you just want it to return the string "[1,2,3,4]", that's simple enough:
class String
  alias_method :old_constructor, :initialize

  def initialize(*args)
    old_constructor(*args)
    # code that applies your transformation
  end
end
But there's no way to make it return an Array if that's what you want.

What does it mean for something to "compose well"?

Many a time, I've come across statements of the form
X does/doesn't compose well.
I can remember a few instances that I've read recently:
Macros don't compose well (context: clojure)
Locks don't compose well (context: clojure)
Imperative programming doesn't compose well... etc.
I want to understand the implications of composability in terms of designing/reading/writing code. Examples would be nice.
"Composing" functions basically just means sticking two or more functions together to make a big function that combines their functionality in a useful way. Essentially, you define a sequence of functions and pipe the results of each one into the next, finally giving the result of the whole process. Clojure provides the comp function to do this for you, you could do it by hand too.
Functions that you can chain with other functions in creative ways are more useful in general than functions that you can only call in certain conditions. For example, if we didn't have the last function and only had the traditional Lisp list functions, we could easily define last as (def last (comp first reverse)). Look at that — we didn't even need to defn or mention any arguments, because we're just piping the result of one function into another. This would not work if, for example, reverse took the imperative route of modifying the sequence in-place. Macros are problematic as well because you can't pass them to functions like comp or apply.
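The same idea carries over to other languages. As a plain-Java illustration of the last = (comp first reverse) example above (not Clojure, just java.util.function.Function):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

public final class ComposeDemo {
    public static void main(String[] args) {
        // "reverse" and "first" as plain functions over lists (copying, not mutating in place).
        Function<List<Integer>, List<Integer>> reverse = xs -> {
            List<Integer> copy = new ArrayList<>(xs);
            Collections.reverse(copy);
            return copy;
        };
        Function<List<Integer>, Integer> first = xs -> xs.get(0);

        // Composition: last = first . reverse
        Function<List<Integer>, Integer> last = reverse.andThen(first);

        System.out.println(last.apply(Arrays.asList(1, 2, 3, 4))); // prints 4
    }
}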
Composition in programming means assembling bigger pieces out of smaller ones.
Composition of unary functions creates a more complicated unary function by chaining simpler ones.
Composition of control flow constructs places control flow constructs inside other control flow constructs.
Composition of data structures combines multiple simpler data structures into a more complicated one.
Ideally, a composed unit works like a basic unit and you as a programmer do not need to be aware of the difference. If things fall short of the ideal, if something doesn't compose well, your composed program may not have the (intended) combined behavior of its individual pieces.
Suppose I have some simple C code.
void run_with_resource(void) {
    Resource *r = create_resource();
    do_some_work(r);
    destroy_resource(r);
}
C facilitates compositional reasoning about control flow at the level of functions. I don't have to care about what actually happens inside do_some_work(); I know just by looking at this small function that every time a resource is created on line 2 with create_resource(), it will eventually be destroyed on line 4 by destroy_resource().
Well, not quite. What if create_resource() acquires a lock and destroy_resource() frees it? Then I have to worry about whether do_some_work acquires the same lock, which would prevent the function from finishing. What if do_some_work() calls longjmp(), and skips the end of my function entirely? Until I know what goes on in do_some_work(), I won't be able to predict the control flow of my function. We no longer have compositionality: we can no longer decompose the program into parts, reason about the parts independently, and carry our conclusions back to the whole program. This makes designing and debugging much harder and it's why people care whether something composes well.
"Bang for the Buck" - composing well implies a high ratio of expressiveness per rule-of-composition. Each macro introduces its own rules of composition. Each custom data structure does the same. Functions, especially those using general data structures have far fewer rules.
Assignment and other side effects, especially wrt concurrency have even more rules.
Think about when you write functions or methods. You create a group of functionality to do a specific task. When working in an object-oriented language, you cluster your behavior around the actions you think a distinct entity in the system will perform. Functional programs break away from this by encouraging authors to group functionality according to an abstraction. For example, the Clojure Ring library comprises a group of abstractions that cover routing in web applications.
Ring is composable in that functions that describe paths in the system (routes) can be grouped into higher-order functions (middleware). In fact, Clojure is so dynamic that it is possible (and you are encouraged) to come up with patterns of routes that can be dynamically created at runtime. This is the essence of composability: instead of coming up with patterns that solve a certain problem, you focus on patterns that generate solutions to a certain class of problems. Builders and code generators are just two of the common patterns used in functional programming. Functional programming is the art of patterns that generate other patterns (and so on and so on).
The idea is to solve a problem at its most basic level then come up with patterns or groups of the lowest level functions that solve the problem. Once you start to see patterns in the lowest level you've discovered composition. As folks discover second order patterns in groups of functions they may start to see a third level. And so on...
Composition (in the context you describe, at a functional level) is typically the ability to feed one function into another cleanly and without intermediate processing. An example of such composition is std::cout in C++:
cout << each << item << links << on;
That is a simple example of composition which doesn't really "look" like composition.
Another example with a form more visibly compositional:
foo(bar(baz()));
Composition is useful for readability and compactness; however, chaining large collections of interlocking functions which can potentially return error codes or junk data can be hazardous (this is why it is best to minimize error-code or null return values).
Provided your functions use exceptions, or alternatively return null objects, you can minimize the requirement for branching (if) on errors and maximize the compositional potential of your code at no extra risk.
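As an aside, in Java a similar effect is often achieved with Optional rather than exceptions or null objects: an empty result short-circuits the chain instead of forcing an if at every step. A sketch with hypothetical findUser/getAddress/getCity lookups:

import java.util.Optional;

final class OptionalChaining {
    // Hypothetical lookups that may or may not produce a value.
    static Optional<String> findUser(int id)        { return id == 42 ? Optional.of("alice") : Optional.empty(); }
    static Optional<String> getAddress(String user) { return Optional.of(user + "'s address"); }
    static Optional<String> getCity(String address) { return Optional.of("Springfield"); }

    static String cityOf(int id) {
        // Each step composes onto the previous one; an empty result anywhere skips the rest.
        return findUser(id)
                .flatMap(OptionalChaining::getAddress)
                .flatMap(OptionalChaining::getCity)
                .orElse("unknown");
    }
}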
Object composition (vs. inheritance) is a separate issue (and not what you are asking about, but it shares the name). It is a matter of using containment, rather than direct inheritance, to derive an object hierarchy.
Within the context of Clojure, this comment addresses certain aspects of composability. In general, composability seems to emerge when units of a system do one thing well, do not require other units to understand their internals, eschew side effects, and accept and return the system's pervasive data structures. All of the above can be seen in M2tM's C++ example.
Composability, applied to functions, means that the functions are smaller and well defined, and thus easy to integrate into other functions (I have seen this idea in the book "The Joy of Clojure").
The concept can apply to other things that are supposed to be composed into something else.
The purpose of composability is reuse. For example, a well-built (composable) function is easier to reuse.
Macros aren't that composable because you can't pass them as parameters.
Locks are crap because you can't really give them names (define them well) or reuse them; you just take them in place.
Imperative languages aren't that composable because some of them, at least, don't have closures. If you want functionality passed as a parameter, you're stuck: you have to build an object and pass that. Disclaimer: this last idea I'm not entirely convinced is true, so research more before taking it for granted.
Another idea on imperative languages is that they don't compose well because they imply state (from Wikipedia: "Imperative programming describes computation in terms of statements that change a program state").
State does not compose well because, although you have given a specific "something" as input, that "something" generates an output according to its state. Different internal state, different behaviour. And thus you can say goodbye to what you were expecting to happen.
With state, you depend too much on knowing what the current state of an object is if you want to predict its behavior. More stuff to keep in the back of your mind, less composable (remember well defined? Or "small and simple", as in "easy to use"?).
PS: thinking of learning Clojure, huh? Investigating...? Good for you! :P
