Building a dependency graph in real time

Building a dependency graph in real time - dependency-management

A component of my system basically needs to build a dependency graph based on user input; there's gotta be a common way / library for doing this, I'm just unable to determine what it is. Example below (names / classes changed to protect the innocent):
User asks if a recipe has a cherry flavor, a potato flavor, and / or a mint flavor. We have various paths for the dependencies of each flavor (cherry goes recipeID -> A -> B -> C -> true/false, mint goes recipeID -> A -> D -> true/false etc). The idea is that whatever data source is called in A should only be called once. For simple cases, it's easy to come up with a solution, but as things get added like multiple inputs for a step (e.g. a diamond dependency) or needing to do batch calls (recipeID 1 for cherry, recipeID 2 for mint, should make a batch call to A(1,2) ) things get a lot more complicated.

Related

Can I build an Event based on a Behavior change in reactive-banana?

The goal here is to build an Event that triggers whenever a Behavior changes under some observation.
To make it a bit more concrete, let's say I have:
a bFoo :: Behavior a, which I like to think it as a time-varied state.
a function diffA :: a -> a -> Maybe b to tell whether certain part of value of bFoo has been changed (together with a bit more info indicating what has changed, or leaving me some room for avoiding boolean blindness)
and now I want to build an Event eChanged :: Event b that triggers if and only if we have a Just _ coming out from diffA fooBefore fooAfter.
Here comes the problem: it seems the framework only allows you to look at one Behavior value at the Moment. So I don't have access two values to compare against each other. I've been looking at Reactive.Banana.Combinators but still doesn't have much idea about whether it's possible.
Few more bits and pieces in case those are helpful or spark any new ideas:
Event-based rather than Behavior-based?
The first thing I can think of is simply to allow myself to have access to more details:
bFoo is accumB-ed from an Event eUpdate, but eUpdate doesn't necessarily update anything. So yes, we do have access to eFoo :: Event a if we want to, but I want to avoid using it as I think it as an implementation detail of bFoo.
A Failed Attempt
My current attempt looks like:
networkDesc :: MomemtIO ()
networkDesc = do
...
foo <- valueB bFoo
let eChanged = filterJust $ fmap (diffA foo) eUpdate
...
This doesn't work as foo is not changed over time.
Last resort
My last resort is to reactimate on eFoo, do the comparison outside of the framework and feed result back into the network - sounds doable but that defeats the purpose of having this network to take care of all the logic.

Why does NPM's policy of duplicated dependencies work?

By default, when I use NPM to manage a package depending on foo and bar, both of which depend on corelib, by default, NPM will install corelib twice (once for foo, and once for bar). They might even be different versions.
Now, let's suppose that corelib defined some data structure (e.g. a URL object) which is passed between foo, bar and the main application. Now, what I would expect, is if there was ever a backwards incompatible change to this object (e.g. one of the field names changed), and foo depended on corelib-1.0 and bar depended on corelib-2.0, I'd be a very sad panda: bar's version of corelib-2.0 might see a data structure created by the old version of corelib-1.0 and things would not work very well.
I was really surprised to discover that this situation basically never happens (I trawled Google, Stack Overflow, etc, looking for examples of people whose applications had stopped working, but who could have fixed it by running dedupe.) So my question is, why is this the case? Is it because node.js libraries never define data structures that are shared outside of the programmers? Is it because node.js developers never break backwards compatibility of their data structures? I'd really like to know!

this situation basically never happens
Yes, my experience is indeed that that is not a problem in the Node/JS ecosystem. And I think it is, in part, thanks to the robustness principle.
Below is my view on why and how.
Primitives, the early days
I think the first and foremost reason is that the language provides a common basis for primitive types (Number, String, Bool, Null, Undefined) and some basic compound types (Object, Array, RegExp, etc...).
So if I receive a String from one of the libs' APIs I use, and pass it to another, it cannot go wrong because there is just a single String type.
This is what used to happen, and still happens to some extent to this day: Library authors try to rely on the built-ins as much as possible and only diverge when there is sufficient reason to, and with sufficient care and thought.
Not so in Haskell. Before I started using stack, I've run into the following situation quite a few times with Text and ByteString:
Couldn't match type ‘T.Text’
with ‘Text’
NB: ‘T.Text’
is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
Expected type: String -> Text
Actual type: String -> T.Text
This is quite frustrating, because in the above example only the patch version is different. The two data types may only be different nominally, and the ADT definition and the underlying memory representation may be completely identical.
As an example, it could have been a minor bugfix to the intersperse function that warranted the release of 1.2.2.1. Which is completely irrelevant to me if all I care about, in this hypothetical example, is concatenating some Texts and comparing their lengths.
Compound types, objects
Sometimes there is sufficient reason to diverge in JS from the built in data types: Take Promises as an example. It's such a useful abstraction over async computations compared to callbacks that many APIs started using them. What now? How come we don't run into many incompatibilities when different versions of these {then(), fail(), ...} objects are being passed up, down and around the dependency tree?
I think it's thanks to the robustness principle.
Be conservative in what you send, be liberal in what you accept.
So if I am authoring a JS library which I know returns promises and takes promises as part of its API, I'll be very careful how I interact with the received objects. E.g. I won't be calling fancy .success(), .finally(), ['catch']() methods on it, since I want to be as compatible as possible with different users, with different implementations of Promises. So, very conservatively, I may just use .then(done, fail), and nothing more. At this point, it doesn't matter if the user uses the promises that my lib returns, or Bluebirds' or even if they hand-write their own, so long as those adhere to the most basic Promise 'laws' -- the most basic API contracts.
Can this still lead to breakage at runtime? Yes, it can. If even the most basic API contract is not fulfilled, you may get an exception saying "Uncaught TypeError: promise.then is not a function". I think the trick here is that library authors are explicit about what their API needs: e.g. a .then method on the supplied object. And then its up to whoever is building on top of that API to make it damn sure that that method is available on the object they pass in.
I'd like to also point out here that this is also the case for Haskell, isn't it? Should I be so foolish as to write an instance for a typeclass that still type-checks without following its laws, I'll get runtime errors, won't I?
Where do we go from here?
Having thought through all this just now, I think we might be able to have the benefits of the robustness principle even in Haskell with much less (or even no(?)) risk for runtime exceptions/errors compared to JavaScript: We just need the typesystem be granular enough so it can distinguish what we want to do with the data we manipulate, and determine if that is still safe or not. E.g. The hypothetical Text example above, I would wager is still safe. And the compiler should only complain if I'm trying to use intersperse, and asks me to qualify it. E.g. with T.intersperse so it can be sure which one I want to use.
How do we do this in practice? Do we need extra support, e.g. language extension flags from GHC? We might not.
Just recently I found bookkeeper, which is a compile-time type-checked anonymous records implementation.
Please note: The following is conjecture on my part, I haven't taken much time to try and experiment with Bookkeeper. But I intend to in my Haskell projects to see if what I write about below could really be achieved with an approach such as this.
With Bookkeeper I could define an API like so:
emptyBook & #then =: id & #fail =: const
:: Bookkeeper.Internal.Book'
'["fail" 'Data.Type.Map.:-> (a -> b -> a),
"then" 'Data.Type.Map.:-> (a1 -> a1)]
Since functions are also first-class values. And whichever API takes this Book as an argument can be very specific what it demands from it: Namely the #then function, and that it has to match a certain type signature. And it cares not for any other function that may or may not be present with whatever signature. All this checked at compile time.
Prelude Bookkeeper
> let f o = (o ?: #foo) "a" "b" in f $ emptyBook & #foo =: (++)
"ab"
Conclusion
Maybe Bookkeeper or something similar will turn out to be useful in my experiments. Maybe Backpack will rush to the rescue with its common interface definitions. Or some other solution comes along. But either way, I hope we can move towards being able to take advantage of the robustness principle. And that Haskell's dependency management can also "just work" most of the time and fail with type errors only when it is truly warranted.
Does the above make sense? Anything unclear? Does it answer your question? I'd be curious to hear.
Further possibly relevant discussion may be found in this /r/haskell reddit thread, where this topic came up just not long ago, and I thought to post this answer to both places.

If I understand well, the supposed problem might be :
Module A
exports = require("c") //v0.1
Module B
console.log(require("a"))
console.log(require("c")) //v0.2
Module C
V0.1
exports = "hello";
V0.2
exports = "world";
By copying C_0.2 in node_modules and C0.1 in node_modules/a/node_modules and creating dummy packages.json, I think I created the case you're talking about.
will B have 2 different conflicting versions of C_data ?
Short answer :
it does. So node does not handle conflicting versions.
The reason you don't see it on the internet is as gustavohenke explained that node naturally does not encourage you to pollute the global scope or chain pass structures between modules.
In other words, it's not often that you'll see a module export another module's structure.

I don't have first-hand experience with this kind of situation in a large JS program, but I would guess that it has to do with the OO style of bundling data together with the functions that act on that data into a single object. Effectively the "ABI" of an object is to pull public methods by name out of a dictionary, and then invoke them by passing the object as the first argument. (Or perhaps the dictionary contains closures that are already partially applied to the object itself; it doesn't really matter.)
In Haskell we do encapsulation at a module level. For example, take a module that defines a type T and a bunch of functions, and exports the type constructor T (but not its definition) and some of the functions. The normal way to use such a module (and the only way that the type system will permit) is to use one exported function create to create a value of type T, and another exported function consume to consume the value of type T: consume (create a b c) x y z.
If I had two different versions of the module with different definitions of T and I was able to use the create from version 1 together with the consume from version 2 then I'd likely get a crash or wrong answer. Note that this is possible even if the public API and externally observable behavior of the two versions is identical; perhaps version 2 has a different representation of T that allows for a more efficient implementation of consume. Of course, GHC's type system stops you from doing this, but there are no such safeguards in a dynamic language.
You can translate this style of programming directly into a language like JavaScript or Python:
import M
result = M.consume(M.create(a, b, c), x, y, z)
and it would have exactly the same kind of problem that you are talking about.
However, it's far more common to use the OO style:
import M
result = M.create(a, b, c).consume(x, y, z)
Note that only create is imported from the module. consume is in a sense imported from the object we got back from create. In your foo/bar/corelib example, let's say that foo (which depends on corelib-1.0) calls create and passes the result to bar (which depends on corelib-2.0) which will call consume on it. Actually, while foo needs a dependency on corelib to call create, bar does not need a dependency on corelib to call consume at all. It's only using the base language notions to invoke consume (what we could spell getattr in Python). In this situation, bar will end up invoking the version of consume from corelib-1.0 regardless of what version of corelib bar "depends on".
Of course for this to work the public API of corelib must not have changed too much between corelib-1.0 and corelib-2.0. If bar wants to use a method fancyconsume which is new in corelib-2.0 then it won't be present on an object created by corelib-1.0. Still, this situation is much better than we had in original Haskell version, where even changes that do not affect the public API at all can cause breakage. And perhaps bar depends on corelib-2.0 features for the objects it creates and consumes itself, but only uses the API of corelib-1.0 to consume objects it receives externally.
To achieve something similar in Haskell, you could use this translation. Rather than directly using the underlying implementation
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
we wrap up the consumer interface with an existential in an API package corelib-api:
module TInterface where
data T = forall a. T { impl :: a,
_consume :: a -> X -> Y -> Z -> R,
... } -- Or use a type class if preferred.
consume :: T -> X -> Y -> Z -> R
consume t = (_consume t) (impl t)
and then the implementation in a separate package corelib:
module T where
import TInterface
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
create :: A -> B -> C -> T
create a b c = T { impl = create_ a b c,
_consume = consume_ }
Now foo uses corelib-1.0 to call create, but bar only needs corelib-api to call consume. The type T lives in corelib-api, so if the public API version does not change, then foo and bar can interoperate even if bar is linked against a different version of corelib.
(I know Backpack has a lot to say about this kind of thing; I'm offering this translation as a way to explain what is happening in the OO programs, not as a style one should seriously adopt.)

Here is a question that mostly answers the same thing: https://stackoverflow.com/a/15948590/2083599
Node.js modules don't pollute the global scope, so when they're required, they'll be private to the module that required them - and this is a great functionality.
When 2 or more packages require different versions of the same lib, NPM will install them for each package, so no conflicts will ever happen.
When they don't, NPM will install only once that lib.
In the other hand, Bower, which is a package manager for the browser, does install only flat dependencies because the libs will go to the global scope, so you can't install jquery 1.x.x and 2.x.x. They'll only export the same jQuery and $ vars.
About the backwards compatibility problems:
All developers do break backwards compatibility at least once! The only difference between Node developers and developers of other platforms is that we have been teached to always use semver.
Considering that most packages out there have not reached v2.0.0 yet, I believe that they have kept the same API in the switch from v0.x.x to v1.0.0.

How to check the availability of, and load, dynamic libraries in FORTRAN

I have written a FORTRAN library "B" which, depending on the way it is called, may or may not call routines in another library "C". The intent is for "B" to be used in applications "A".
So far, B and C are compiled as static libraries (.a files).
This means that C.a must be available and linked to when compiling B.a, which is okay.
This also means that C.a must be available when compiling the application A, even if A has no intention of using the functionality in B which depends on C. This is annoying and seems unnecessary, as one has to distribute C.a to users who will never use it.
Ideally I would want to have C as a dynamic/shared library, and in B do some run-time availability-check like this (pseudo-code):
if (requested feature from C)
if (is_available(libC.so))
call routine_from_C()
(Go on...)
else
call Error("You need to install C")
else
(We don't need C. Go on...)
Is something like this possible with FORTRAN on Linux?

saving intermediate steps in gremlin

I am writing a query which should detect certain loops within a graph, which means that I need I need to assign names to certain nodes within the path so that I can compare nodes later in the path with the saved ones. for example A -> B -> C -> A. Is this possible within gremlin?

It sounds like you're looking for something like this:
https://github.com/tinkerpop/gremlin/wiki/Except-Retain-Pattern
where you keep a list of previously traversed vertices and then utilize that list later in the traversal.

How to create a diff of two complex data structures?

Problem specification:
I am currently searching for a elegant and/but efficient solution to a problem that i guess is quite common. Consider the following situation:
I defined a fileformat based on a BTree that is defined (in a simplified way) like this:
data FileTree = FileNode [Key] [FileOffset]
| FileLeaf [Key] [Data]
Reading and writing this from a file to a lazy data structure is implemented and works just fine. This will result in a instance of:
data MemTree = MemNode [Key] [MemTree]
| MemLeaf [Key] [Data]
Now my goal is to have a generic function updateFile :: FilePath -> (MemTree -> MemTree) -> IO () that will read in the FileTree and convert it into a MemTree, apply the MemTree -> MemTree function and write back the changes to the tree structure. The problem is that the FileOffsets have to be conserved somehow.
I have two approaches to this problem. Both of them lack in elegance and/or efficiency:
Approach 1: Extend MemTree to contain the offsets
This approach extends the MemTree to contain the offsets:
data MemTree = MemNode [Key] [(MemTree, Maybe FileOffset)]
| MemNode [Key] [Data]
The read function would then read in the FileTree and stores the FileOffset alongside the MemTree reference. Writing will checks if a reference already has an associated offset and if it does it just uses it.
Pros: easy to implement, no overhead to find the offset
Cons: exposes internal to the user who is responsible to set the offset to Nothing
Approach 2: Store offsets in a secondary structure
Another way to attack this problem is to read in the FileTree and create a StableName.Map that holds onto the FileOffsets. That way (and if i understand the semantics of StableName correctly) it should be possible to take the final MemTree and lookup the StableName of each node in the the StableName.Map. If there is an entry the node is clean and doesn't have to be written again.
Pros: doesn't expose the internals to the user
Cons: involves overhead for lookups in the map
Conclusion
These are the two approaches i can think of. The first one should be more efficient, the second one is more pleasant to the eye. I'd like your comments on my ideas, maybe someone even has a better approach in mind?
[Edit] Reasonal
There are two reasons i am searching for a solution like this:
On the one hand you should try to handle errors before they arise by using the type system. The aforementioned user is of course the designer of the next layer in the system (ie me). By working on the pure tree representation some kinds of bugs won't be able to happen. All changes to the tree in the file should be in one place. That should make reasoning easier.
On the other hand i could just implement something like insert :: FilePath -> Key -> Value -> IO () and be done with it. But then i'll lose a very nice trait that comes free when i keep a (kind of a) log by updating the tree in place. Transactions (ie merging of several inserts) are just a matter of working on the same tree in memory and writing just the differences back to the file.

I think that the package Data.Generic.Diff may do exactly what you wanted. It references somebody's thesis for the idea of how it works.

I am very new at Haskell so I won't be showing code, but hopefully my explanation may help for a solution.
First, why not just expose only the MemTree to the user, since that is what they will update, and the FileTree can be kept completely hidden. That way, later, if you want to change this to be going to a database, for example, the user doesn't see any difference.
So, since the FileTree is hidden, why not just read it in when you are going to update, then you have the offsets, so do the update, and close the file again.
One problem with keeping the offsets is that it prevents another program from making any changes to the file, and in your case that may be fine, but I think as a general rule it is a bad design.
The main change, that I see, is that the MemTree shouldn't be lazy, since the file won't be staying open.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string