Is it ever possible to detect sharing in Haskell?

Is it ever possible to detect sharing in Haskell? - haskell

In Scheme, the primitive eq? tests whether its arguments are the same object. For example, in the following list
(define lst
(let (x (list 'a 'b))
(cons x x)))
The result of
(eq? (car x) (cdr x))
is true, and moreover it is true without having to peer into (car x) and (cdr x). This allows you to write efficient equality tests for data structures that have a lot of sharing.
Is the same thing ever possible in Haskell? For example, consider the following binary tree implementation
data Tree a = Tip | Bin a (Tree a) (Tree a)
left (Bin _ l _) = l
right (Bin _ _ r) = r
mkTree n :: Int -> Tree Int
mkTree 0 = Tip
mkTree n = let t = mkTree (n-1) in Bin n t t
which has sharing at every level. If I create a tree with let tree = mkTree 30 and I want to see if left tree and right tree are equal, naively I have to traverse over a billion nodes to discover that they are the same tree, which should be obvious because of data sharing.
I don't expect there is a simple way to discover data sharing in Haskell, but I wondered what the typical approaches to dealing with issues like this are, when it would be good to detect sharing for efficiency purposes (or e.g. to detect cyclic data structures).
Are there unsafe primitives that can detect sharing? Is there a well-known way to build data structures with explicit pointers, so that you can compare pointer equality?

There's lots of approaches.
Generate unique IDs and stick everything in a finite map (e.g. IntMap).
The refined version of the last choice is to make an explicit graph, e.g. using fgl.
Use stable names.
Use IORefs (see also), which have both Eq and Ord instances regardless of the contained type.
There are libraries for observable sharing.
As mentioned above, there is reallyUnsafePtrEquality# but you should understand what's really unsafe about it before you use it!
See also this answer about avoiding equality checks altogether.

It is not possible in Haskell, the pure language.
But in its implementation in GHC, there are loopholes, such as
the use of reallyUnsafePtrEquality# or
introspection libraries like ghc-heap-view.
In any case, using this in regular code would be very unidiomatic; at most I could imagine that building a highly specialized library for something (memoizatoin, hash tables, whatever) that then provides a sane, pure API, might be acceptable.

There is reallyUnsafePtrEquality#. Also see here

Related

Monotonic sequence type in haskell

I would like to define a type for infinite number sequence in haskell. My idea is:
type MySeq = Natural -> Ratio Integer
However, I would also like to be able to define some properties of the sequence on the type level. A simple example would be a non-decreasing sequence like this. Is this possible to do this with current dependent-type capabilities of GHC?
EDIT: I came up with the following idea:
type PositiveSeq = Natural -> Ratio Natural
data IncreasingSeq = IncreasingSeq {
start :: Ratio Natural,
diff :: PositiveSeq}
type IKnowItsIncreasing = [Ratio Natural]
getSeq :: IncreasingSeq -> IKnowItsIncreasing
getSeq s = scanl (+) (start s) [diff s i | i <- [1..]]
Of course, it's basically a hack and not actually type safe at all.

This isn't doing anything very fancy with types, but you could change how you interpret a sequence of naturals to get essentially the same guarantee.
I think you are thinking along the right lines in your edit to the question. Consider
data IncreasingSeq = IncreasingSeq (Integer -> Ratio Natural)
where each ratio represents how much it has increased from the previous number (starting with 0).
Then you can provide a single function
applyToIncreasing :: ([Ratio Natural] -> r) -> IncreasingSeq -> r
applyToIncreasing f (IncreasingSeq s) = f . drop 1 $ scanl (+) 0 (map (s $) [0..])
This should let you deconstruct it in any way, without allowing the function to inspect the real structure.
You just need a way to construct it: probably a fromList that just sorts it and an insert that performs a standard ordered insertion.
It pains part of me to say this, but I don't think you'd gain anything over this using fancy type tricks: there are only three functions that could ever possibly go wrong, and they are fairly simple to correctly implement. The implementation is hidden so anything that uses those is correct as a result of those functions being correct. Just don't export the data constructor for IncreasingSeq.
I would also suggest considering making [Ratio Natural] be the underlying representation. It simplifies things and guarantees that there are no "gaps" in the sequence (so it is guaranteed to be a sequence).
If you want more safety and can take the performance hit, you can use data Nat = Z | S Nat instead of Natural.
I will say that if this was Coq, or a similar language, instead of Haskell I would be more likely to suggest doing some fancier type-level stuff (depending on what you are trying to accomplish) for a couple reasons:
In systems like Coq, you are usually proving theorems about the code. Because of this, it can be useful to have a type-level proof that a certain property holds. Since Haskell doesn't really have a builtin way to prove those sorts of theorems, the utility diminishes.
On the other hand, we can (sometimes) construct data types that essentially must have the properties we want using a small number of trusted functions and a hidden implementation. In the context of a system with more theorem proving capability, like Coq, this might be harder to convince theorem prover of the property than if we used a dependent type (possibly, at least). In Haskell, however, we don't have that issue in the first place.

Gather data about existing tree-like data

Let's say we have existing tree-like data and we would like to add information about depth of each node. How can we easily achieve that?
Data Tree = Node Tree Tree | Leaf
For each node we would like to know in constant complexity how deep it is. We have the data from external module, so we have information as it is shown above. Real-life example would be external HTML parser which just provides the XML tree and we would like to gather data e.g. how many hyperlinks every node contains.
Functional languages are created for traversing trees and gathering data, there should be an easy solution.
Obvious solution would be creating parallel structure. Can we do better?

The standard trick, which I learned from Chris Okasaki's wonderful Purely Functional Data Structures is to cache the results of expensive operations at each node. (Perhaps this trick was known before Okasaki's thesis; I don't know.) You can provide smart constructors to manage this information for you so that constructing the tree need not be painful. For example, when the expensive operation is depth, you might write:
module SizedTree (SizedTree, sizedTree, node, leaf, depth) where
data SizedTree = Node !Int SizedTree SizedTree | Leaf
node l r = Node (max (depth l) (depth r) + 1) l r
leaf = Leaf
depth (Node d _ _) = d
depth Leaf = 0
-- since we don't expose the constructors, we should
-- provide a replacement for pattern matching
sizedTree f v (Node _ l r) = f l r
sizedTree f v Leaf = v
Constructing SizedTrees costs O(1) extra work at each node (hence it is O(n) work to convert an n-node Tree to a SizedTree), but the payoff is that checking the depth of a SizedTree -- or of any subtree -- is an O(1) operation.

You do need some another data where you can store these Ints. Define Tree as
data Tree a = Node Tree a Tree | Leaf a
and then write a function
annDepth :: Tree a -> Tree (Int, a)
Your original Tree is Tree () and with pattern synonyms you can recover nice constructors.
If you want to preserve the original tree for some reason, you can define a view:
{-# LANGUAGE GADTs, DataKinds #-}
data Shape = SNode Shape Shape | SLeaf
data Tree a sh where
Leaf :: a -> Tree a SLeaf
Node :: Tree a lsh -> a -> Tree a rsh -> Tree a (SNode lsh rsh)
With this you have a guarantee that an annotated tree has the same shape as the unannotated. But this doesn't work good without proper dependent types.
Also, have a look at the question Boilerplate-free annotation of ASTs in Haskell?

The standard solution is what #DanielWagner suggested, just extend the data structure. This can be somewhat inconvenient, but can be solved: Smart constructors for creating instances and using records for pattern matching.
Perhaps Data types a la carte could help, although I haven't used this approach myself. There is a library compdata based on that.
A completely different approach would be to efficiently memoize the values you need. I was trying to solve a similar problem and one of the solutions is provided by the library stable-memo. Note that this isn't a purely functional approach, as the library is internally based on object identity, but the interface is pure and works perfectly for the purpose.

Data.Sequence.Seq lazy parallel Functor instance

I need parallel (but lazy) version of fmap for Seq from Data.Sequence package. But package doesn't export any Seq data constructors. So I can't just wrap it in newtype and implement Functor directly for the newtype.
Can I do it without rewriting the whole package?

The best you can do is probably to splitAt the sequence into chunks, fmap over each chunk, and then append the pieces back together. Seq is represented as a finger tree, so its underlying structure isn't particularly well suited to parallel algorithms—if you split it up by its natural structure, successive threads will get larger and larger pieces. If you want to give it a go, you can copy the definition of the FingerTree type from the Data.Sequence source, and use unsafeCoerce to convert between it and a Seq. You'll probably want to send the first few Deep nodes to one thread, but then you'll have to think pretty carefully about the rest. Finger trees can be very far from weight-balanced, primarily because 3^n grows asymptotically faster than 2^n; you'll need to take that into account to balance work among threads.
There are at least two sensible ways to split up the sequence, assuming you use splitAt:
Split it all before breaking the computation into threads. If you do this, you should split it from left to right or right to left, because splitting off small pieces is cheaper than splitting off large ones and then splitting those. You should append the results in a similar fashion.
Split it recursively in multiple threads. This might make sense if you want a lot of pieces or more potential laziness. Split the list near the middle and send each piece to a thread for further splitting and processing.
There's another splitting approach that might be nicer, using the machinery currently used to implement zipWith (see the GitHub ticket I filed requesting chunksOf), but I don't know that you'd get a huge benefit in this application.
The non-strict behavior you seek seems unlikely to work in general. You can probably make it work in many or most specific cases, but I'm not too optimistic that you'll find a totally general approach.

I found a solution, but it's actually not so efficient.
-- | A combination of 'parTraversable' and 'fmap', encapsulating a common pattern:
--
-- > parFmap strat f = withStrategy (parTraversable strat) . fmap f
--
parFmap :: Traversable t => Strategy b -> (a -> b) -> t a -> t b
parFmap strat f = (`using` parTraversable strat) . fmap f
-- | Parallel version of '<$>'
(<$|>) :: Traversable t => (a -> b) -> t a -> t b
(<$|>) = parFmap rpar

Facilities for generating Haskell types in Haskell ("second order Haskell")?

Apologies in advance if this question is a bit vague. It's the result of some weekend daydreaming.
With Haskell's wonderful type system, it's delightfully pleasing to express mathematical (especially algebraic) structure as typeclasses. I mean, just have a look at numeric-prelude! But taking advantage of such wonderful type structure in practice has always seemed difficult to me.
You have a nice, type-system way of expressing that v1 and v2 are elements of a vector space V and that w is a an element of a vector space W. The type system lets you write a program adding v1 and v2, but not v1 and w. Great! But in practice you might want to play with potentially hundreds of vector spaces, and you certainly don't want to create types V1, V2, ..., V100 and declare them instances of the vector space typeclass! Or maybe you read some data from the real world resulting in symbols a, b and c - you may want to express that the free vector space over these symbols really is a vector space!
So you're stuck, right? In order to do many of the things you'd like to do with vector spaces in a scientific computing setting, you have to give up your typesystem by foregoing a vector space typeclass and having functions do run-time compatibility checks instead. Should you have to? Shouldn't it be possible to use the fact that Haskell is purely functional to write a program that generates all the types you need and inserts them into the real program? Does such a technique exist? By all means do point out if I'm simply overlooking something basic here (I probably am) :-)
Edit: Just now did I discover fundeps. I'll have to think a bit about how they relate to my question (enlightening comments with regards to this are appreciated).

Template Haskell allows this. The wiki page has some useful links; particularly Bulat's tutorials.
The top-level declaration syntax is the one you want. By typing:
mkFoo = [d| data Foo = Foo Int |]
you generate a Template Haskell splice (like a compile-time function) that will create a declaration for data Foo = Foo Int just by inserting the line $(mkFoo).
While this small example isn't too useful, you could provide an argument to mkFoo to control how many different declarations you want. Now a $(mkFoo 100) will produce 100 new data declarations for you. You can also use TH to generate type class instances. My adaptive-tuple package is a very small project that uses Template Haskell to do something similar.
An alternative approach would be to use Derive, which will automatically derive type class instances. This might be simpler if you only need the instances.

Also there are some simple type-level programming techniques in Haskell. A canonical example follows:
-- A family of types for the natural numbers
data Zero
data Succ n
-- A family of vectors parameterized over the naturals (using GADTs extension)
data Vector :: * -> * -> * where
-- empty is a vector with length zero
Empty :: Vector Zero a
-- given a vector of length n and an a, produce a vector of length n+1
Cons :: a -> Vector n a -> Vector (Succ n) a
-- A type-level adder for natural numbers (using TypeFamilies extension)
type family Plus n m :: *
type instance Plus Zero n = n
type instance Plus (Succ m) n = Succ (Plus m n)
-- Typesafe concatenation of vectors:
concatV :: Vector n a -> Vector m a -> Vector (Plus n m) a
concatV Empty ys = ys
concatV (Cons x xs) ys = Cons x (concatV xs ys)
Take a moment to take that in. I think it is pretty magical that it works.
However, type-level programming in Haskell is in the feature-uncanny-valley -- just enough to draw attention to how much you can't do. Dependently-typed languages like Agda, Coq, and Epigram take this style to its limit and full power.
Template Haskell is much more like the usual LISP-macro style of code generation. You write some code to write some code, then you say "ok insert that generated code here". Unlike the above technique, you can write any computably-specified code that way, but you don't get the very general typechecking as is seen in concatV above.
So you have a few options to do what you want. I think metaprogramming is a really interesting space, and in some ways still quite young. Have fun exploring. :-)

O(1) circular buffer in haskell?

I'm working on a small concept project in Haskell which requires a circular buffer. I've managed to create a buffer using arrays which has O(1) rotation, but of course requires O(N) for insertion/deletion. I've found an implementation using lists which appears to take O(1) for insertion and deletion, but since it maintains a left and right list, crossing a certain border when rotating will take O(N) time. In an imperative language, I could implement a doubly linked circular buffer with O(1) insertion, deletion, and rotation. I'm thinking this isn't possible in a purely functional language like Haskell, but I'd love to know if I'm wrong.

If you can deal with amortized O(1) operations, you could probably use either Data.Sequence from the containers package, or Data.Dequeue from the dequeue package. The former uses finger trees, while the latter uses the "Banker's Dequeue" from Okasaki's Purely Functional Data Structures (a prior version online here).

The ST monad allows to describe and execute imperative algorithms in Haskell. You can use STRefs for the mutable pointers of your doubly linked list.
Self-contained algorithms described using ST are executed using runST. Different runST executions may not share ST data structures (STRef, STArray, ..).
If the algorithm is not "self contained" and the data structure is required to be maintained with IO operations performed in between its uses, you can use stToIO to access it in the IO monad.
Regarding whether this is purely functional or not - I guess it's not?

It sounds like you might need something a bit more complicated than this (since you mentioned doubly-linked lists), but maybe this will help. This function acts like map over a mutable cyclic list:
mapOnCycling f = concat . tail . iterate (map f)
Use like:
*Main> (+1) `mapOnCycling` [3,2,1]
[4,3,2,5,4,3,6,5,4,7,6,5,8,7,6,9,8,7,10,9...]
And here's one that acts like mapAccumL:
mapAccumLOnCycling f acc xs =
let (acc', xs') = mapAccumL f acc xs
in xs' ++ mapAccumLOnCycling f acc' xs'
Anyway, if you care to elaborate even more on what exactly your data structure needs to be able to "do" I would be really interested in hearing about it.
EDIT: as camccann mentioned, you can use Data.Sequence for this, which according to the docs should give you O1 time complexity (is there such a thing as O1 amortized time?) for viewing or adding elements both to the left and right sides of the sequence, as well as modifying the ends along the way. Whether this will have the performance you need, I'm not sure.
You can treat the "current location" as the left end of the Sequence. Here we shuttle back and forth along a sequence, producing an infinite list of values. Sorry if it doesn't compile, I don't have GHC at the moment:
shuttle (viewl-> a <: as) = a : shuttle $ rotate (a+1 <| as)
where rotate | even a = rotateForward
| otherwise = rotateBack
rotateBack (viewr-> as' :> a') = a' <| as'
rotateForward (viewl-> a' <: as') = as' |> a'

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string