Is there an XML pull parser (similar to Java StAX) for Haskell?
I am envisioning using it with a pure function that accepts a parser. My function will call something like nextItem parser and pattern-match on the result (StartElement, EndElement, Text, EntityRef, etc.). My function can then call itself recursively to process child elements, and so on, constructing a private data structure as it traverses the XML "tree".
As I understand it, pull-parsing should have better performance than constructing an internal representation of the DOM, then traversing it, although I don't know if this is true in a lazy language.
You can use xml-conduit, which provides both streaming and full-document modules. The streaming parsing module Text.XML.Stream.Parse also provides a number of helper combinators.
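For example, a small streaming parser might look like the following sketch, modeled on the synopsis in the xml-conduit documentation. The people.xml file and the Person type are made up, and combinator names such as tag', tagNoAttr, requireAttr and force are from recent versions of the library, so they may differ slightly in older ones:
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadThrow, runResourceT)
import Data.Conduit (ConduitT, runConduit, (.|))
import Data.Text (Text, unpack)
import Data.XML.Types (Event)
import Text.XML.Stream.Parse

data Person = Person Int Text deriving Show

-- Parse a single <person age="..."> element into a Person.
parsePerson :: MonadThrow m => ConduitT Event o m (Maybe Person)
parsePerson = tag' "person" (requireAttr "age") $ \age -> do
    name <- content
    return $ Person (read $ unpack age) name

-- Parse a <people> element containing any number of <person> children.
parsePeople :: MonadThrow m => ConduitT Event o m (Maybe [Person])
parsePeople = tagNoAttr "people" $ many parsePerson

main :: IO ()
main = do
    people <- runResourceT $ runConduit $
        parseFile def "people.xml" .| force "people element required" parsePeople
    mapM_ print people
The recursive descent over child elements you describe is exactly what combinators like tagNoAttr and many express, without materializing a DOM first.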
It's true that if you had a truly lazy data source, there would be no (significant) performance difference between a pull parser and dealing with the lazy list. However, XML parsing usually involves I/O. conduit is designed to give you a high-level approach to these kinds of parsing issues.
I am trying to develop an HTTP client using the http-simple library. Some of the library's implementation seems confusing to me.
This library makes heavy use of Conduit; however, there is also this 'setRequestBodyLBS' function and, interestingly, the function 'setRequestBodyBS' is missing here. It is documented that Conduit and lazy I/O do not work well together. So my question is: why not the other way around, i.e., implement the BS (strict) version of the function instead of the LBS (lazy) version? What is the idea behind the choice made here?
Internally, a lazy bytestring is like a linked list of strict bytestrings. Moving from a strict bytestring to a lazy one is cheap (you build a linked list of one element) but going in the reverse direction is costlier (you need to allocate a contiguous chunk of memory for the combined bytes, and then copy each chunk from the list).
Lazy IO uses lazy bytestrings, but they're also useful in other contexts, for example when you have strict chunks arriving from an external source and you want an easy way of accumulating them without having to preallocate a big area of memory or perform frequent reallocations/copies. Instead, you just keep a list of chunks that you later present as a lazy bytestring. (When list concatenations start getting expensive or the granularity is too small, you can use a Builder as a further optimization.)
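To make the cost model concrete, here's a small sketch. strictBody is a made-up name; fromStrict, toStrict and fromChunks are the conversion functions in Data.ByteString.Lazy, and setRequestBodyLBS is the Network.HTTP.Simple function the question mentions:
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Simple (Request, setRequestBodyLBS)

-- Cheap: wrap one strict chunk as a one-element lazy bytestring.
cheap :: BS.ByteString -> BL.ByteString
cheap = BL.fromStrict

-- Costlier: allocate one contiguous buffer and copy every chunk into it.
costly :: BL.ByteString -> BS.ByteString
costly = BL.toStrict

-- Accumulate strict chunks arriving from outside, without copying them.
accumulate :: [BS.ByteString] -> BL.ByteString
accumulate = BL.fromChunks

-- Because strict-to-lazy is cheap, having only the LBS setter is not a real limitation:
withStrictBody :: BS.ByteString -> Request -> Request
withStrictBody strictBody = setRequestBodyLBS (BL.fromStrict strictBody)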
Another frequent use case is serialization of some composite data structure (say, aeson's Value). If all you are going to do is dump the generated bytes into a file or a network request, it doesn't make much sense to perform a relatively costly consolidation of the serialized bytes of each sub-component. If needed, you can always perform it later with toStrict anyway.
I have a simple task which requires finding where variables are modified in a given piece of code. This will be a static analysis. For instance, given a variable (e.g., age), I would like to build a list or tree (a data structure) that tells me what modifies this variable and preferably returns the name of the function that makes the modification, or any other auxiliary information. I started writing my own script, but I see that it's very error-prone, as I need to consider many cases such as nested loops, etc.
Could you suggest where to start?
If the code to be analyzed happens to be Groovy code then you could write an AST transformation (probably a global one) that walks the code and obtains the information you seek.
The Groovy documentation site has a section on AST transformations; have a look at http://groovy-lang.org/metaprogramming.html#_compile_time_metaprogramming
This page describes existing AST xforms and how you can develop your own. I'd recommend browsing the code that implements the standard AST xforms such as @Immutable, @Canonical, and others.
CodeNarc (http://codenarc.sourceforge.net/) is a static code analyzer for Groovy code, inspired by PMD. It also relies on AST xforms.
GContracts (https://github.com/andresteingress/gcontracts) is another tool implemented using AST xforms. These two can serve as a basis for understanding more about AST transformations.
OTOH if the analyzed code happens to be Java then AST transformations will not help you.
I need to process multiple formats and versions for semantically equivalent data.
I can generate Haskell data types for each schema (XSD, for example); they will be technically different, but semantically and structurally identical in many cases.
The data is complex and includes references, and the service components must process the whole graph and produce a similarly structured response (a component might just update a field, but it might need to analyze the whole graph to collect all required information, and it might call other services as well).
How can I represent ns1:address and ns2:adress as one polymorphic type with country and street elements, so that the application can process them as identical while keeping the serialization context needed to write the response in the correct format (one representation might encode them in a single string while another might also carry superfluous complex data)?
How close can I get to writing mostly the code that defines the semantic equivalence of the data and the business logic, and generating everything else? What features of the Haskell language or its libraries should I evaluate as building blocks for a potential solution?
An option is to create a data type for each schema and create a function to map them to a common data type. Process it as you wish. You don't need to create polymorphic types.
This approach is similar to Pandoc's: you get a bunch of readers to parse documents to a common document structure, then use writers to convert that common structure to a particular format.
You just need the libraries to read your complex input data (and write it back, if necessary). The rest is functions and data types.
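A minimal sketch of that idea, with made-up stand-ins (NS1Address, NS2Address) for whatever your schema tooling generates:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- Made-up stand-ins for the schema-generated types.
data NS1Address = NS1Address { ns1Country :: Text, ns1Street :: Text }
newtype NS2Address = NS2Address { ns2Line :: Text }   -- "street, country" in one string

-- The common, schema-independent representation the business logic works with.
data Address = Address { country :: Text, street :: Text } deriving Show

fromNS1 :: NS1Address -> Address
fromNS1 a = Address (ns1Country a) (ns1Street a)

fromNS2 :: NS2Address -> Address
fromNS2 (NS2Address line) =
  case T.splitOn ", " line of
    [s, c] -> Address c s
    _      -> Address "" line   -- fall back if the combined string is not in the expected form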
If you are really handling graphs, you can look at the Data.Graph module.
It sounds like this is a problem that is well served by the Type Class infrastructure and the Lens library.
Use a Type Class to define a standard and consistent high-level interface to the various implementations. Make sure that you focus on the operations you wish to perform, not on the underlying implementation or process.
Use Lenses and Prisms to reach into the underlying datatypes and return answers to queries, and modify values "in-place", without resorting to full serialisation/de-serialisation.
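Here's a rough sketch of that shape. The HasAddress class and NS1Address type are invented for illustration, and the Lens' alias is spelled out so the snippet stands alone; it is the same van Laarhoven shape that the lens library's Control.Lens exports:
{-# LANGUAGE RankNTypes #-}
import Data.Text (Text)

-- The van Laarhoven lens shape, identical to Lens' from the lens library.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

-- Invented stand-in for one of the schema-generated types.
data NS1Address = NS1Address { ns1Country :: Text, ns1Street :: Text }

-- The high-level interface: phrased in terms of the operations you need,
-- not the underlying representation.
class HasAddress s where
  country :: Lens' s Text
  street  :: Lens' s Text

instance HasAddress NS1Address where
  country f a = (\c -> a { ns1Country = c }) <$> f (ns1Country a)
  street  f a = (\s -> a { ns1Street  = s }) <$> f (ns1Street  a)
Business logic written against HasAddress (using view, set and over from Control.Lens) then works uniformly over every schema's representation.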
In Scala, when you write out the string "Hello World" to a file, it writes
Hello World
(note: no double quotes).
Lisp has a concept of print and write. One writes without the double quotes, the other includes them to make it easy to write out data structures and read them back later using the standard reader.
Is there any way to do this in Scala?
With one string it is easy enough to format it - but with many deeply nested structures, it is nearly impossible.
For example, say I have
sealed trait PathSegment
case class P(x:String) extends PathSegment
case class V(x:Int) extends PathSegment
To create one, you do:
P("X")
or
V(0)
A list of these PathSegments prints as:
List(P(paths), P(/pets), P(get), P(responses), V(200))
I want this to print out as:
List(P("paths"), P("/pets"), P("get"), P("responses"), V(200))
In other words, I want strings (and characters), no matter where they occur in a structure, to print out as "foo" or 'c'.
That's what serialization is about. It's also why JSON is popular.
Check out lift-json ( https://github.com/lift/lift/tree/master/framework/lift-base/lift-json/ ) for writing data out that will be parsed and read by another language. JSON is pretty standard in the web services world for request/response serialization and there are JSON libraries in just about every language.
To literally write out a string including double quotes, you can also do something like this:
"""
The word "apple" is in double quotes.
"""
I find a slightly more structured format like JSON more useful, and a library like lift-json does the right thing in terms of quoting Strings and not quoting Ints, etc.
I think you are looking for something like JavaScript's eval() + JSON, and Python's eval(), str() and repr(). Essentially, you want Lispy symmetric meta-circular evaluation: you can transform data into source code, and evaluating that source code will give you back the same data, right?
AFAIK, there's no equivalent of eval() in Scala. Daniel Spiewak has talked about this here before. However, if you reeeeeealy want to, I suggest the following:
Every collection object has three methods that will allow you to transform its data to a string representation any way you want: mkString, addString and stringPrefix. Do something clever with them (think "decompiling" your in-memory ADTs back to source-code form) and you will arrive at step 2. Essentially, you can transform a list of integers created by List(1,2,3) back to the string "List(1,2,3)". For more basic literals like a simple string or integer, you'll need to pimp the built-in types using implicits to provide them with these toString (I'm overloading the term here) helper methods.
Now that you have your string representation, you can think about how to "interpret" or "evaluate" it. You will need an eval() function that creates a parser combinator that understands Scala's literals and reassembles the data structure for you.
Implementing this actually sounds fun. Don't forget to post back here if you successfully implement it. :)
One of the huge benefits of languages that have some sort of reflection/introspection is that objects can be automatically constructed from a variety of sources.
For example, in Java I can use the same objects for persisting to a DB (with Hibernate), serializing to XML (with JAXB), and serializing to JSON (json-lib). You can do the same in Ruby and Python as well, usually by following some simple rules for properties, or with annotations in Java.
Thus I don't need lots of "Domain Transfer Objects". I can concentrate on the domain I am working in.
It seems that in very strict FP languages like Haskell and OCaml this is not possible.
Particularly Haskell. The only thing I have seen is doing some sort of preprocessing or meta-programming (OCaml). Is it just accepted that you have to do all the transformations from the bottom upwards?
In other words, you have to do lots of boring work to turn a data type in Haskell into a JSON/XML/DB Row object and back again into a data object.
I can't speak to OCaml, but I'd say that the main difficulty in Haskell is that deserialization requires knowing the type in advance--there's no universal way to mechanically deserialize from a format, figure out what the resulting value is, and go from there, as is possible in languages with unsound or dynamic type systems.
Setting aside the type issue, there are various approaches to serializing data in Haskell:
The built-in type classes Read and Show (de)serialize algebraic data types and most built-in types as strings. Well-behaved instances should generally be such that read . show is equivalent to id, and the result of show can be parsed as Haskell source code constructing the serialized value (there's a short round-trip example at the end of this answer).
Various serialization packages can be found on Hackage; typically these require that the type to be serialized be an instance of some type class, with the package providing instances for most built-in types. Sometimes they merely require an automatically derivable instance of the type-reifying, reflective metaprogramming Data class (the charming fully qualified name for which is Data.Data.Data), or provide Template Haskell code to auto-generate instances.
For truly unusual serialization formats--or to create your own package like the previously mentioned ones--one can reach for the biggest hammer available, sort of a "big brother" to Read and Show: parsing and pretty-printing. Numerous packages are available for both, and while it may sound intimidating at first, parsing and pretty-printing are in fact amazingly painless in Haskell.
A glance at Hackage indicates that serialization packages already exist for various formats, including binary data, JSON, YAML, and XML, though I've not used any of them so I can't personally attest to how well they work. Here's a non-exhaustive list to get you started:
binary: Performance-oriented serialization to lazy ByteStrings
cereal: Similar to binary, but a slightly different interface and uses strict ByteStrings
genericserialize: Serialization via built-in metaprogramming, output format is extensible, includes R5RS sexp output.
json: Lightweight serialization of JSON data
RJson: Serialization to JSON via built-in metaprogramming
hexpat-pickle: Combinators for serialization to XML, using the "hexpat" package
regular-xmlpickler: Serialization to XML of recursive data structures using the "regular" package
The only other problem is that, inevitably, not all types will be serializable--if nothing else, I suspect you're going to have a hard time serializing polymorphic types, existential types, and functions.
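To make the first approach above (Read/Show) concrete, here is a minimal round trip; the Color type is invented for illustration:
data Color = RGB Int Int Int | Named String
  deriving (Show, Read, Eq)

-- show produces valid Haskell source for the value, and read parses it back.
roundTrips :: Bool
roundTrips = read (show (RGB 12 34 56)) == RGB 12 34 56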
For what it's worth, I think the pre-processor solution found in OCaml (as exemplified by sexplib, binprot and json-wheel among others) is pretty great (and I think people do very similar things with Template Haskell). It's far more efficient than reflection, and can also be tuned to individual types in a natural way. If you don't like the auto-generated serializer for a given type foo, you can always just write your own, and it fits beautifully into the auto-generated serializers for types that include foo as a component.
The only downside is that you need to learn camlp4 to write one of these for yourself. But using them is quite easy, once you get your build-system set up to use the preprocessor. It's as simple as adding with sexp to the end of a type definition:
type t = { foo: int; bar: float }
with sexp
and now you have your serializer.
You wanted
to do lots of boring work to turn a data type in Haskell into a JSON/XML/DB Row object and back again into a data object.
There are many ways to serialize and deserialize data types in Haskell. You can use, for example:
Data.Binary
Text.JSON
as well as other common formats (protocol buffers, Thrift, XML).
Each package usually comes with a macro or deriving mechanism to allow you to, e.g., derive JSON. For Data.Binary, for example, see this previous answer: Erlang's term_to_binary in Haskell?
The general answer is: we have many great packages for serialization in Haskell, and we tend to use the existing class 'deriving' infrastructure (with either generics or Template Haskell macros to do the actual deriving).
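For instance, with a reasonably recent binary package, a Data.Binary instance can be derived through GHC generics; the Person type here is made up:
{-# LANGUAGE DeriveGeneric #-}
import Data.Binary (Binary, encode, decode)
import GHC.Generics (Generic)

data Person = Person { name :: String, age :: Int }
  deriving (Show, Generic)

-- The Generic instance supplies default put/get implementations.
instance Binary Person

main :: IO ()
main = do
  let bytes = encode (Person "Ada" 36)   -- a lazy ByteString
  print (decode bytes :: Person)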
My understanding is that the simplest way to serialize and deserialize in Haskell is to derive from Read and Show. This is simple, but it doesn't fulfill your requirements.
However, there are HXT and Text.JSON, which seem to provide what you need.
The usual approach is to employ Data.Binary. This provides the basic serialisation capability. Binary instances for data types are easy to write and can easily be built out of smaller units.
If you want to generate the instances automatically then you can use Template Haskell. I don't know of any package to do this, but I wouldn't be surprised if one already exists.
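As a sketch of how such an instance is built by hand out of smaller units (the Shape type is invented; putWord8 and getWord8 come from Data.Binary.Put and Data.Binary.Get):
import Data.Binary (Binary (..))
import Data.Binary.Get (getWord8)
import Data.Binary.Put (putWord8)

data Shape = Circle Double | Rect Double Double

instance Binary Shape where
  -- Tag each constructor with a byte, then reuse the Double instance for the fields.
  put (Circle r) = putWord8 0 >> put r
  put (Rect w h) = putWord8 1 >> put w >> put h
  get = do
    tag <- getWord8
    case tag of
      0 -> Circle <$> get
      1 -> Rect <$> get <*> get
      _ -> fail "unknown Shape tag"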