What is a Sample in Helm?

There doesn't seem to be much documentation for Sample a in the Haskell FRP library Helm. I am trying to write a function similar to Elm's sampleOn, and I think update could help. However, I am confused about how update works because, from the source code here, it seems that the variable p is not used at all.
What should this function be doing, and why is the input p included if it isn't used? Is there a better way to do this? I think seq could work, but I tried implementing my animation with seq and it doesn't do what I am looking for.

The first argument probably exists for historical reasons or for consistency with other functions offered by Helm, but I don't know enough about either to say for sure.
The intended use of the update function seems to be to wrap the appropriate constructor around its argument: update p a s will result in either Changed a or Unchanged a depending on whether a matches the value stored in s. One might use this, for example, as an argument to foldp:
foldp (update undefined) :: Eq a => Sample a -> Signal a -> Signal (Sample a)
Downstream signals could then ignore Unchanged values easily.
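To make that concrete, here is a minimal sketch of what such a type and update function could look like. This is not Helm's actual source, just an illustration of the semantics described above:

data Sample a = Changed a | Unchanged a

value :: Sample a -> a
value (Changed x)   = x
value (Unchanged x) = x

-- The first argument is accepted but ignored, matching the behaviour
-- discussed above.
update :: Eq a => p -> a -> Sample a -> Sample a
update _ new old
  | new == value old = Unchanged new
  | otherwise        = Changed new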

Related

Strict map with custom reading method or type with custom read instance

So, it is not really a problem, but I would like an opinion on which would be the better way. I need to read data from an outside source (TCP) that comes basically in this format:
key: value
okey: enum
stuff: 0.12240
amazin: 1020
And I need to parse it into a Haskell-accessible format. The two solutions I thought about were either to parse that into a strict String-to-String map, or to use record-syntax type declarations.
Initially I thought to make a type synonym for my String-to-String map and write extractor functions like amazin :: NiceSynonym -> Int, doing the necessary treatment and parsing within the function, but that felt sketchy at the time. Then I tried an actual type declaration with record syntax and a custom Read instance. That was a nightmare, because there are a lot of enums and keys with different types, and it felt... disappointing. It simply wraps the arguments and creates reader functions, not much different from the original: amazin :: TypeDeclaration -> Int.
Now I'm kind of regretting not going with the reader functions as I initially envisioned. So, is there anything else I'm forgetting to consider? Any pros and cons of either side to take note of? Is one objectively better than the other?
P.S.: Some considerations that may make one or the other better:
Once read, I won't need to change it at all; it's basically a status report
No need to compare, add, etc.; again, it's just a status report, so there's no point
Not really a need for performance; I won't be reading hundreds of these a second or anything
TL;DR: Given that input example, what's the best way to turn it into a Haskell-readable format? A map, a data constructor, a dependent map...?
Both ways are very valid in their own respects, but since I was making an API to interact with such a protocol too, I preferred the record syntax so I could cover all the properties more easily. Also, I wasn't really going to do any checking or treatment in the getter functions, and no matter how boring writing the Read instance for my type might have seemed, I bet writing all the getter functions manually would be worse. Parsing stuff manually is inherently boring; I guess I was just looking for a magical functional one-liner to do all the work for me.
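For reference, here is a rough sketch of how the two ideas can be combined: parse the "key: value" lines into a strict map first, then build a record from it. The Report type and its fields are made up to mirror the example input above, not taken from any real code:

import qualified Data.Map.Strict as M
import Text.Read (readMaybe)

-- Field names mirror the example input; this type is illustrative only.
data Report = Report
  { key    :: String
  , okey   :: String
  , stuff  :: Double
  , amazin :: Int
  } deriving Show

-- Flatten the "key: value" lines into a strict String-to-String map...
toMap :: String -> M.Map String String
toMap = M.fromList . map (fmap (drop 2) . break (== ':')) . lines

-- ...then build the record from it, converting field by field.
parseReport :: String -> Maybe Report
parseReport input =
  Report <$> M.lookup "key" m
         <*> M.lookup "okey" m
         <*> (readMaybe =<< M.lookup "stuff" m)
         <*> (readMaybe =<< M.lookup "amazin" m)
  where
    m = toMap input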

Why does NPM's policy of duplicated dependencies work?

When I use NPM to manage a package depending on foo and bar, both of which depend on corelib, NPM will by default install corelib twice (once for foo, and once for bar). They might even be different versions.
Now, let's suppose that corelib defined some data structure (e.g. a URL object) which is passed between foo, bar and the main application. What I would expect is that if there were ever a backwards-incompatible change to this object (e.g. one of the field names changed), and foo depended on corelib-1.0 while bar depended on corelib-2.0, I'd be a very sad panda: bar's version of corelib-2.0 might see a data structure created by the old version of corelib-1.0 and things would not work very well.
I was really surprised to discover that this situation basically never happens (I trawled Google, Stack Overflow, etc, looking for examples of people whose applications had stopped working, but who could have fixed it by running dedupe.) So my question is, why is this the case? Is it because node.js libraries never define data structures that are shared outside of the programmers? Is it because node.js developers never break backwards compatibility of their data structures? I'd really like to know!
this situation basically never happens
Yes, my experience is indeed that that is not a problem in the Node/JS ecosystem. And I think it is, in part, thanks to the robustness principle.
Below is my view on why and how.
Primitives, the early days
I think the first and foremost reason is that the language provides a common basis for primitive types (Number, String, Bool, Null, Undefined) and some basic compound types (Object, Array, RegExp, etc...).
So if I receive a String from one of the libs' APIs I use, and pass it to another, it cannot go wrong because there is just a single String type.
This is what used to happen, and still happens to some extent to this day: Library authors try to rely on the built-ins as much as possible and only diverge when there is sufficient reason to, and with sufficient care and thought.
Not so in Haskell. Before I started using stack, I ran into the following situation quite a few times with Text and ByteString:
Couldn't match type ‘T.Text’ with ‘Text’
NB: ‘T.Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
    ‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
Expected type: String -> Text
  Actual type: String -> T.Text
This is quite frustrating, because in the above example only the patch version is different. The two data types may only be different nominally, and the ADT definition and the underlying memory representation may be completely identical.
As an example, it could have been a minor bugfix to the intersperse function that warranted the release of 1.2.2.1. Which is completely irrelevant to me if all I care about, in this hypothetical example, is concatenating some Texts and comparing their lengths.
Compound types, objects
Sometimes there is sufficient reason to diverge in JS from the built in data types: Take Promises as an example. It's such a useful abstraction over async computations compared to callbacks that many APIs started using them. What now? How come we don't run into many incompatibilities when different versions of these {then(), fail(), ...} objects are being passed up, down and around the dependency tree?
I think it's thanks to the robustness principle.
Be conservative in what you send, be liberal in what you accept.
So if I am authoring a JS library which I know returns promises and takes promises as part of its API, I'll be very careful how I interact with the received objects. E.g. I won't be calling fancy .success(), .finally(), ['catch']() methods on it, since I want to be as compatible as possible with different users, with different implementations of Promises. So, very conservatively, I may just use .then(done, fail), and nothing more. At this point, it doesn't matter if the user uses the promises that my lib returns, or Bluebirds' or even if they hand-write their own, so long as those adhere to the most basic Promise 'laws' -- the most basic API contracts.
Can this still lead to breakage at runtime? Yes, it can. If even the most basic API contract is not fulfilled, you may get an exception saying "Uncaught TypeError: promise.then is not a function". I think the trick here is that library authors are explicit about what their API needs: e.g. a .then method on the supplied object. And then it's up to whoever is building on top of that API to make damn sure that that method is available on the object they pass in.
I'd like to also point out here that this is also the case for Haskell, isn't it? Should I be so foolish as to write an instance for a typeclass that still type-checks without following its laws, I'll get runtime errors, won't I?
Where do we go from here?
Having thought through all this just now, I think we might be able to have the benefits of the robustness principle even in Haskell, with much less (or even no?) risk of runtime exceptions/errors compared to JavaScript: we just need the type system to be granular enough that it can distinguish what we want to do with the data we manipulate, and determine whether that is still safe or not. The hypothetical Text example above, I would wager, is still safe. The compiler should only complain if I try to use intersperse, and ask me to qualify it, e.g. as T.intersperse, so it can be sure which one I want to use.
How do we do this in practice? Do we need extra support, e.g. language extension flags from GHC? We might not.
Just recently I found bookkeeper, which is a compile-time type-checked anonymous records implementation.
Please note: The following is conjecture on my part, I haven't taken much time to try and experiment with Bookkeeper. But I intend to in my Haskell projects to see if what I write about below could really be achieved with an approach such as this.
With Bookkeeper I could define an API like so:
emptyBook & #then =: id & #fail =: const
  :: Bookkeeper.Internal.Book'
       '["fail" 'Data.Type.Map.:-> (a -> b -> a),
         "then" 'Data.Type.Map.:-> (a1 -> a1)]
This works because functions are also first-class values. Whichever API takes this Book as an argument can be very specific about what it demands from it: namely the #then function, and that it has to match a certain type signature. It cares not for any other function that may or may not be present, with whatever signature. All of this is checked at compile time.
Prelude Bookkeeper> let f o = (o ?: #foo) "a" "b" in f $ emptyBook & #foo =: (++)
"ab"
Conclusion
Maybe Bookkeeper or something similar will turn out to be useful in my experiments. Maybe Backpack will rush to the rescue with its common interface definitions. Or some other solution comes along. But either way, I hope we can move towards being able to take advantage of the robustness principle. And that Haskell's dependency management can also "just work" most of the time and fail with type errors only when it is truly warranted.
Does the above make sense? Anything unclear? Does it answer your question? I'd be curious to hear.
Further possibly relevant discussion may be found in this /r/haskell reddit thread, where this topic came up not long ago and I thought to post this answer in both places.
If I understand correctly, the supposed problem might be:
Module A
exports = require("c") //v0.1
Module B
console.log(require("a"))
console.log(require("c")) //v0.2
Module C
V0.1
exports = "hello";
V0.2
exports = "world";
By copying C_0.2 into node_modules and C_0.1 into node_modules/a/node_modules and creating dummy package.json files, I think I created the case you're talking about.
Will B have 2 different conflicting versions of C_data?
Short answer: it does. So Node does not handle conflicting versions.
The reason you don't see it on the internet is, as gustavohenke explained, that Node naturally does not encourage you to pollute the global scope or to pass structures along a chain of modules.
In other words, it's not often that you'll see a module export another module's structure.
I don't have first-hand experience with this kind of situation in a large JS program, but I would guess that it has to do with the OO style of bundling data together with the functions that act on that data into a single object. Effectively the "ABI" of an object is to pull public methods by name out of a dictionary, and then invoke them by passing the object as the first argument. (Or perhaps the dictionary contains closures that are already partially applied to the object itself; it doesn't really matter.)
In Haskell we do encapsulation at a module level. For example, take a module that defines a type T and a bunch of functions, and exports the type constructor T (but not its definition) and some of the functions. The normal way to use such a module (and the only way that the type system will permit) is to use one exported function create to create a value of type T, and another exported function consume to consume the value of type T: consume (create a b c) x y z.
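For concreteness, a made-up module in that style might look like the following (every parameter type is collapsed to Int for brevity; the names are illustrative, and the point is that only the abstract T escapes the module):

-- Illustrative module only: T is exported abstractly, without its constructor.
module M (T, create, consume) where

data T = T Int   -- the definition stays private

create :: Int -> Int -> Int -> T
create a b c = T (a + b + c)

consume :: T -> Int -> Int -> Int -> Int
consume (T n) x y z = n + x + y + z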
If I had two different versions of the module with different definitions of T and I was able to use the create from version 1 together with the consume from version 2 then I'd likely get a crash or wrong answer. Note that this is possible even if the public API and externally observable behavior of the two versions is identical; perhaps version 2 has a different representation of T that allows for a more efficient implementation of consume. Of course, GHC's type system stops you from doing this, but there are no such safeguards in a dynamic language.
You can translate this style of programming directly into a language like JavaScript or Python:
import M
result = M.consume(M.create(a, b, c), x, y, z)
and it would have exactly the same kind of problem that you are talking about.
However, it's far more common to use the OO style:
import M
result = M.create(a, b, c).consume(x, y, z)
Note that only create is imported from the module. consume is in a sense imported from the object we got back from create. In your foo/bar/corelib example, let's say that foo (which depends on corelib-1.0) calls create and passes the result to bar (which depends on corelib-2.0) which will call consume on it. Actually, while foo needs a dependency on corelib to call create, bar does not need a dependency on corelib to call consume at all. It's only using the base language notions to invoke consume (what we could spell getattr in Python). In this situation, bar will end up invoking the version of consume from corelib-1.0 regardless of what version of corelib bar "depends on".
Of course, for this to work the public API of corelib must not have changed too much between corelib-1.0 and corelib-2.0. If bar wants to use a method fancyconsume which is new in corelib-2.0, then it won't be present on an object created by corelib-1.0. Still, this situation is much better than the one we had in the original Haskell version, where even changes that do not affect the public API at all can cause breakage. And perhaps bar depends on corelib-2.0 features for the objects it creates and consumes itself, but only uses the corelib-1.0 API to consume objects it receives externally.
To achieve something similar in Haskell, you could use this translation. Rather than directly using the underlying implementation
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
we wrap up the consumer interface with an existential in an API package corelib-api:
module TInterface where
data T = forall a. T { impl :: a,
                       _consume :: a -> X -> Y -> Z -> R,
                       ... } -- Or use a type class if preferred.
consume :: T -> X -> Y -> Z -> R
consume t = (_consume t) (impl t)
and then the implementation in a separate package corelib:
module T where
import TInterface
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
create :: A -> B -> C -> T
create a b c = T { impl = create_ a b c,
                   _consume = consume_ }
Now foo uses corelib-1.0 to call create, but bar only needs corelib-api to call consume. The type T lives in corelib-api, so if the public API version does not change, then foo and bar can interoperate even if bar is linked against a different version of corelib.
(I know Backpack has a lot to say about this kind of thing; I'm offering this translation as a way to explain what is happening in the OO programs, not as a style one should seriously adopt.)
Here is a question that mostly answers the same thing: https://stackoverflow.com/a/15948590/2083599
Node.js modules don't pollute the global scope, so when they're required, they'll be private to the module that required them - and this is a great feature.
When 2 or more packages require different versions of the same lib, NPM will install them for each package, so no conflicts will ever happen.
When they don't, NPM will install that lib only once.
On the other hand, Bower, which is a package manager for the browser, installs only flat dependencies because the libs go into the global scope, so you can't install jquery 1.x.x and 2.x.x together. They would both export the same jQuery and $ vars.
About the backwards compatibility problems:
All developers do break backwards compatibility at least once! The only difference between Node developers and developers of other platforms is that we have been taught to always use semver.
Considering that most packages out there have not reached v2.0.0 yet, I believe that they have kept the same API in the switch from v0.x.x to v1.0.0.

How to create a diff of two complex data structures?

Problem specification:
I am currently searching for an elegant and/but efficient solution to a problem that I guess is quite common. Consider the following situation:
I defined a file format based on a B-tree that is defined (in a simplified way) like this:
data FileTree = FileNode [Key] [FileOffset]
              | FileLeaf [Key] [Data]
Reading and writing this from a file to a lazy data structure is implemented and works just fine. This will result in an instance of:
data MemTree = MemNode [Key] [MemTree]
             | MemLeaf [Key] [Data]
Now my goal is to have a generic function updateFile :: FilePath -> (MemTree -> MemTree) -> IO () that will read in the FileTree and convert it into a MemTree, apply the MemTree -> MemTree function and write back the changes to the tree structure. The problem is that the FileOffsets have to be conserved somehow.
I have two approaches to this problem. Both of them lack in elegance and/or efficiency:
Approach 1: Extend MemTree to contain the offsets
This approach extends the MemTree to contain the offsets:
data MemTree = MemNode [Key] [(MemTree, Maybe FileOffset)]
             | MemLeaf [Key] [Data]
The read function would then read in the FileTree and store the FileOffset alongside the MemTree reference. Writing would check whether a reference already has an associated offset and, if it does, just use it.
Pros: easy to implement, no overhead to find the offset
Cons: exposes internals to the user, who is responsible for setting the offset to Nothing
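A rough sketch of the write step for this approach, using the extended MemTree above and assuming a hypothetical writeNode that appends a serialized node to the file and returns its offset:

import System.IO (Handle)

-- writeNode :: Handle -> FileTree -> IO FileOffset
-- (hypothetical helper: appends a node and returns its offset)

writeTree :: Handle -> MemTree -> IO FileOffset
writeTree h (MemNode keys children) = do
    offs <- mapM writeChild children
    writeNode h (FileNode keys offs)
  where
    -- A clean subtree keeps its cached offset; a dirty one is written out first.
    writeChild (_,     Just off) = pure off
    writeChild (child, Nothing)  = writeTree h child
writeTree h (MemLeaf keys vals) =
    writeNode h (FileLeaf keys vals)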
Approach 2: Store offsets in a secondary structure
Another way to attack this problem is to read in the FileTree and create a StableName.Map that holds onto the FileOffsets. That way (and if I understand the semantics of StableName correctly) it should be possible to take the final MemTree and look up the StableName of each node in the StableName.Map. If there is an entry, the node is clean and doesn't have to be written again.
Pros: doesn't expose the internals to the user
Cons: involves overhead for lookups in the map
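A rough sketch of the bookkeeping for this approach, reusing the MemTree and FileOffset types from the question. Offsets are remembered under the hash of each node's StableName while reading, and looked up again before writing; the helper names are mine, not from any library:

import System.Mem.StableName (StableName, makeStableName, hashStableName)
import qualified Data.IntMap.Strict as IM

type OffsetMap = IM.IntMap [(StableName MemTree, FileOffset)]

-- Record the offset of every node materialised while reading.
remember :: MemTree -> FileOffset -> OffsetMap -> IO OffsetMap
remember node off m = do
  sn <- makeStableName node
  pure (IM.insertWith (++) (hashStableName sn) [(sn, off)] m)

-- After the update, a node still found in the map is clean and can keep
-- its old offset; anything else has to be written out again.
lookupOffset :: MemTree -> OffsetMap -> IO (Maybe FileOffset)
lookupOffset node m = do
  sn <- makeStableName node
  pure (lookup sn =<< IM.lookup (hashStableName sn) m)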
Conclusion
These are the two approaches I can think of. The first one should be more efficient, the second one is more pleasant to the eye. I'd like your comments on my ideas; maybe someone even has a better approach in mind?
[Edit] Rationale
There are two reasons I am searching for a solution like this:
On the one hand, you should try to handle errors before they arise by using the type system. The aforementioned user is of course the designer of the next layer in the system (i.e. me). By working on the pure tree representation, some kinds of bugs cannot happen. All changes to the tree in the file should be in one place. That should make reasoning easier.
On the other hand, I could just implement something like insert :: FilePath -> Key -> Value -> IO () and be done with it. But then I'd lose a very nice trait that comes for free when I keep (a kind of) a log by updating the tree in place. Transactions (i.e. merging of several inserts) are just a matter of working on the same tree in memory and writing just the differences back to the file.
I think that the package Data.Generic.Diff may do exactly what you wanted. It references somebody's thesis for the idea of how it works.
I am very new at Haskell so I won't be showing code, but hopefully my explanation may help for a solution.
First, why not expose only the MemTree to the user, since that is what they will update, and keep the FileTree completely hidden? That way, if you later want to change this to go to a database, for example, the user doesn't see any difference.
So, since the FileTree is hidden, why not just read it in when you are going to update: then you have the offsets, so do the update, and close the file again.
One problem with keeping the offsets is that it prevents another program from making any changes to the file, and in your case that may be fine, but I think as a general rule it is a bad design.
The main change, that I see, is that the MemTree shouldn't be lazy, since the file won't be staying open.

Access the configuration parameters through a monad?

Quote from here: http://www.haskell.org/haskellwiki/Global_variables
If you have a global environment, which various functions read from (and you might, for example, initialise from a configuration file) then you should thread that as a parameter to your functions (after having, very likely, set it up in your 'main' action). If the explicit parameter passing annoys you, then you can 'hide' it with a Monad.
Now I'm writing something that needs access to configuration parameters and I wonder if someone could point me to a tutorial or any other resource that describes how monads can be used for this purpose. Sorry if this question is stupid, I'm just starting to grok monads. Reading Mike Vainer's tutorial on them now.
The basic idea is that you write code like this:
main = do
    parameters <- readConfigurationParametersSomehow
    forever $ do
        myData <- readUserInput
        putStrLn $ bigComplicatedFunction myData parameters

bigComplicatedFunction d params = someFunction params x y z
    where x = function1 params d
          y = function2 params x d
          z = function3 params y
You read the parameters in the "main" function with an IO action, and then pass those parameters to your worker function(s) as an extra argument.
The trouble with this style is that the parameter block has to be passed down to every little function that needs to access it. This is a nuisance. You find that some function ten levels down in the call tree now needs some run-time parameter, and you have to add that run-time parameter as an argument to all the functions in between. This is known as tramp data.
The monad "solution" is to embed the run-time parameter in the Reader Monad, and make all your functions into monadic actions. This gets rid of the explicit tramp data parameter, but replaces it with a monadic type, and under the hood this monad is actually doing the data tramping for you.
The imperative world solves this problem with a global variable. In Haskell you can sort-of do the same thing like this:
parameters = unsafePerformIO readConfigurationParametersSomehow
The first time you use "parameters" the "readConfigurationParametersSomehow" gets executed, and from then on it behaves like a constant value, at least as long as your program is running. This is one of the few righteous uses for unsafePerformIO.
However if you find yourself needing such a solution then you really need to have a think about your design. Odds are you are not thinking hard enough about generalising your functions lower down; if some previously pure function suddenly needs a run-time parameter then look at the reason and see if you can exploit higher order functions in some way. For instance:
Pass down a function built using the parameter, rather than the parameter itself.
Have the worker function at the bottom return a function as a result, which gets passed up to be composed with a parameter-based function at the higher level.
Refactor your call stack so that fundamental operations are done by lower-level primitives at the bottom, which are composed in a parameter-dependent way at the top.
Either way is going to involve

Ordering of parameters to make use of currying

I have twice recently refactored code in order to change the order of parameters because there was too much code where hacks like flip or \x -> foo bar x 42 were happening.
When designing a function signature what principles will help me to make the best use of currying?
For languages that support currying and partial-application easily, there is one compelling series of arguments, originally from Chris Okasaki:
Put the data structure as the last argument
Why? You can then compose operations on the data nicely. E.g. insert 1 $ insert 2 $ insert 3 $ s. This also helps for functions on state.
Standard libraries such as "containers" follow this convention.
Alternate arguments are sometimes given to put the data structure first, so it can be closed over, yielding functions on a static structure (e.g. lookup) that are a bit more concise. However, the broad consensus seems to be that this is less of a win, especially since it pushes you towards heavily parenthesized code.
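As a small illustration with Data.Set from containers: because the set is the last argument, partially applied inserts compose directly.

import qualified Data.Set as Set

-- Each Set.insert is partially applied; the set flows through the composition.
addDefaults :: Set.Set Int -> Set.Set Int
addDefaults = Set.insert 1 . Set.insert 2 . Set.insert 3

-- addDefaults Set.empty == Set.fromList [1,2,3]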
Put the most varying argument last
For recursive functions, it is common to put the argument that varies the most (e.g. an accumulator) as the last argument, while the argument that varies the least (e.g. a function argument) goes at the start. This composes well with the data-structure-last style.
A summary of the Okasaki view is given in his Edison library (again, another data structure library):
Partial application: arguments more likely to be static usually appear before other arguments in order to facilitate partial application.
Collection appears last: in all cases where an operation queries a single collection or modifies an existing collection, the collection argument will appear last. This is something of a de facto standard for Haskell datastructure libraries and lends a degree of consistency to the API.
Most usual order: where an operation represents a well-known mathematical function on more than one datastructure, the arguments are chosen to match the most usual argument order for the function.
Place the arguments that you are most likely to reuse first. Function arguments are a great example of this. You are much more likely to want to map f over two different lists, than you are to want to map many different functions over the same list.
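For instance, with the function as map's first argument, reusing the same function over different lists is just a partial application:

-- Partially applying map to the function is the common case.
doubleAll :: [Int] -> [Int]
doubleAll = map (* 2)

-- doubleAll [1,2,3] == [2,4,6]
-- doubleAll [10,20] == [20,40]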
I tend to do what you did, pick some order that seems good and then refactor if it turns out that another order is better. The order depends a lot on how you are going to use the function (naturally).
