First an explanation: I'm implementing a package manager emulator in Java, i.e. a package manager that just outputs the right things but doesn't actually install anything. The thing is, I need to search through every package and all its dependencies to see whether they are all installed correctly. I've been told that it is possible to get a worst-case running time of log(N) · (N + M), where N is the number of packages and M is the number of dependencies. I believe some kind of BFS is needed, maybe just a BFS from each package over its dependencies, so that if I reach a package that has already been checked through some other package's dependencies, it is simply skipped. In my head, however, that does not seem to have the right worst-case running time.
Does anyone have an idea of how to achieve the right running time?
I think this can be done with a topological sort, which is usually implemented by piggybacking on DFS rather than BFS. That runs in linear time, O(N + M).
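A minimal sketch of that idea in Java (the class and field names here are placeholders, not from your emulator): a DFS over the dependency graph that caches the result for each package, so every package and every dependency edge is examined once.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // "deps" maps each package to its direct dependencies;
    // "installed" is the set of packages currently marked as installed.
    class DependencyChecker {
        private final Map<String, List<String>> deps;
        private final Set<String> installed;
        private final Map<String, Boolean> memo = new HashMap<>();

        DependencyChecker(Map<String, List<String>> deps, Set<String> installed) {
            this.deps = deps;
            this.installed = installed;
        }

        // True iff pkg and all of its transitive dependencies are installed.
        // Each package is resolved once and then cached, so checking every
        // package touches each node and each edge only once: O(N + M) total.
        boolean allInstalled(String pkg) {
            Boolean cached = memo.get(pkg);
            if (cached != null) {
                return cached;
            }
            memo.put(pkg, true); // provisional entry, only to stop accidental cycles
            boolean ok = installed.contains(pkg);
            for (String dep : deps.getOrDefault(pkg, Collections.emptyList())) {
                ok = allInstalled(dep) && ok;
            }
            memo.put(pkg, ok);
            return ok;
        }
    }

Calling allInstalled once per package reports the status of every package in O(N + M) overall; the extra log N factor in the bound you were given presumably comes from keeping the packages in an ordered structure (a tree map) rather than a hash map.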
Every time I start a new Poetry project, I have to go through a tedious process of listing dependencies. This includes:
Running poetry add for every dependency, one by one, even though each is already listed in my import block
Source diving to figure out the actual minimum version of each package, given the minimal functionality I use
Going down the rabbit hole of CPython code to figure out the minimum version of Python
I don't really like the Poetry approach of just requiring whatever version I have installed. As a developer, I tend to install bleeding-edge versions of packages and of Python, which many of my users don't have. I then get annoying bug reports that come down to "the Python version is wrong", where the user is often very confused by the error messages. The process of finding minimum dependency versions is typically not very complicated; it's just tedious and doesn't scale.
Surely there is a tool out there that can do some static analysis and get me started with a basic dependency list? I understand that a perfect solution would likely be a lot of work, but a partial solution would be good enough for me. So long as it takes care of the bulk of the tedious work, I don't mind dealing with the handful of remaining corner cases by hand.
PyCharm seems able to at least compare the package names in requirements.txt to my imports. Unfortunately this doesn't work for the Poetry dependencies, not even with the Poetry PyCharm Plugin installed.
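To illustrate the kind of partial solution I mean, something as simple as walking the source tree with the standard ast module already produces a first cut of the import list (this is just a rough sketch, not an existing tool, and it says nothing about minimum versions):

    import ast
    import pathlib
    import sys

    def imported_modules(root):
        # Collect top-level module names imported anywhere under root.
        modules = set()
        for path in pathlib.Path(root).rglob("*.py"):
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
            for node in ast.walk(tree):
                if isinstance(node, ast.Import):
                    modules.update(alias.name.split(".")[0] for alias in node.names)
                elif isinstance(node, ast.ImportFrom):
                    if node.module and node.level == 0:  # skip relative imports
                        modules.add(node.module.split(".")[0])
        return modules

    if __name__ == "__main__":
        # e.g. python list_imports.py src/
        print("\n".join(sorted(imported_modules(sys.argv[1]))))

The remaining manual work is mapping import names to PyPI package names (yaml vs PyYAML, for example) and filtering out the standard library, which is exactly the kind of corner case I'm happy to handle by hand.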
How does one find and understand excess data dependencies in a Haskell program so that one is able to eliminate them?
I once used ghc-vis to investigate data dependencies in a Haskell program, but Stack has since moved on and ghc-vis no longer installs alongside most current development setups, so it's no longer an option. I wonder what people use these days instead.
Try to fix ghc-vis (or actually, its dependencies).
From the logs you reported on the ghc-vis issue tracker (https://github.com/def-/ghc-vis/issues/24), the errors all fall into the two categories below. Neither requires expertise specific to the broken packages, so you should be able to fix them yourself; that's the beauty of open source:
Failed to load interface... There are files missing: this might be related to your Haskell distribution. How did you install Haskell? For example, Haskell packages on Arch are dynamically linked: https://wiki.archlinux.org/index.php/Haskell
Ambiguous occurrence: at least one package you depend on exports a name that clashes with the actually intended name. Look at the broken package and fix its version bounds or fix its imports.
At this point, the problems you are encountering have little to do with ghc-vis itself; they lie with wl-pprint-text, polyparse, and cairo.
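For the second category, the fix on the import side is a standard one. A self-contained example using two real modules that clash (Data.Map and Data.Set from containers both export empty):

    module Main where

    -- Importing both modules unqualified would make any bare use of 'empty'
    -- an "Ambiguous occurrence" error. Two standard fixes: give one module an
    -- explicit import list, or import it qualified and disambiguate at the
    -- call site.
    import Data.Map (Map, empty)      -- explicit list: 'empty' now means Data.Map.empty
    import qualified Data.Set as Set  -- qualified: write Set.empty where that one is meant

    emptyMap :: Map Int String
    emptyMap = empty

    emptySet :: Set.Set Int
    emptySet = Set.empty

    main :: IO ()
    main = print (emptyMap, emptySet)

The same pattern applies inside wl-pprint-text, polyparse, or cairo: find the module that triggers the ambiguity and make one of its imports explicit or qualified.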
What is the best Python-based MultiNest package that is optimized for parallel processing with concurrent.futures?
I've had issues getting anything but multiprocessing.Pool to use all of my CPUs, but the Python MultiNest packages don't seem to be able to use that.
On the GitHub issues page for dynesty (one of the two most common pure-Python MultiNest implementations), we discussed this as well:
https://github.com/joshspeagle/dynesty/issues/100
There was no settled, final explanation, but the thinking was that:
(1) The cost function is not large enough to require all of the cores at once
(2) The bootstrap flag should be set to 0 to avoid bootstrapping; it's a trick implemented for speed that seems to be interfering.
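For reference, the configuration we were converging on for point (2) looks roughly like this; it is only a sketch with a toy likelihood, using multiprocessing.Pool (rather than concurrent.futures) together with dynesty's pool, queue_size, and bootstrap arguments:

    import multiprocessing as mp
    import numpy as np
    import dynesty

    ndim = 3

    def loglike(x):
        # toy likelihood; the real cost function goes here
        return -0.5 * np.sum(x**2)

    def prior_transform(u):
        # map the unit cube to the prior, here uniform on [-10, 10]
        return 20.0 * u - 10.0

    if __name__ == "__main__":
        ncpu = mp.cpu_count()
        with mp.Pool(ncpu) as pool:
            sampler = dynesty.NestedSampler(
                loglike, prior_transform, ndim,
                bootstrap=0,      # point (2): disable the bootstrap expansion trick
                pool=pool,        # farm likelihood evaluations out to the pool
                queue_size=ncpu,  # how many evaluations are queued at once
            )
            sampler.run_nested()
        results = sampler.results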
I've used Nestle (github.com/kbarbary/nestle) and Dynesty (github.com/joshspeagle/dynesty); they both seem to have this problem no matter the complexity of the cost function.
I have had great success using PyMultiNest (github.com/JohannesBuchner/PyMultiNest), but it requires the Fortran version of MultiNest (github.com/JohannesBuchner/MultiNest), which is very difficult to install correctly; for example, you need to install OpenMPI manually. Both MultiNest and OpenMPI can have compiler issues depending on the OS, system, and configuration thereof.
I would suggest using PyMultiNest, except that it's so hard to install. Dynesty and Nestle are trivial to install, but they have had this issue with full parallelization.
I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work.
Basically my code is a translation of an old paper's code, written in Matlab and C. I have not measured it, but that code runs at several iterations per second. Mine is on the order of minutes per iteration ...
The code is available in these repositories:
https://github.com/fhaust/aer
https://github.com/fhaust/aer-utils
The code in question can be run by following these commands:
$ cd aer-utils
$ cabal sandbox init
$ cabal sandbox add-source ../aer
$ cabal run learngabors
Using GHC's profiling facilities I have confirmed that the descent is in fact the part that is taking most of the time:
(interactive version here: https://dl.dropboxusercontent.com/u/2359191/learngabors.svg)
-s is telling me that productivity is quite low:
Productivity 33.6% of total user, 33.6% of total elapsed
From what I have gathered there are two things that might lead to higher performance:
Unboxing: currently I use a custom matrix implementation (in src/Data/SimpleMat.hs). This was the only way I could get ad to work with matrices (see: How to do automatic differentiation on hmatrix?). My guess is that using a matrix type like newtype Mat w h a = Mat (Unboxed.Vector a) would achieve better performance due to unboxing and fusion; a sketch of the type I have in mind follows this list. I found some code that has ad instances for unboxed vectors, but so far I haven't been able to use these with conjugateGradientDescent.
Matrix derivatives: In an email I can't find at the moment, Edward mentions that it would be better to use Forward instances for matrix types instead of having matrices filled with Forward instances. I have a faint idea of how to achieve that, but have yet to figure out how I'd implement it in terms of ad's type classes.
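For reference, the kind of matrix type I have in mind for the unboxing point is sketched below (the module and helper names are made up, and as said above I haven't managed to get this to work with ad yet):

    {-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
    module Data.UnboxedMat where

    import qualified Data.Vector.Unboxed as Unboxed
    import GHC.TypeLits (Nat, KnownNat, natVal)
    import Data.Proxy (Proxy (..))

    -- Width and height live at the type level; the payload is one flat
    -- unboxed vector, so elements are stored without per-element boxing.
    newtype Mat (w :: Nat) (h :: Nat) a = Mat (Unboxed.Vector a)

    -- Made-up helpers, just to show how the phantom dimensions would be used.
    generateMat :: forall w h a. (KnownNat w, KnownNat h, Unboxed.Unbox a)
                => (Int -> Int -> a)   -- row -> column -> element
                -> Mat w h a
    generateMat f = Mat (Unboxed.generate (w * h) (\i -> f (i `div` w) (i `mod` w)))
      where
        w = fromIntegral (natVal (Proxy :: Proxy w))
        h = fromIntegral (natVal (Proxy :: Proxy h))

    index :: forall w h a. (KnownNat w, Unboxed.Unbox a) => Mat w h a -> Int -> Int -> a
    index (Mat v) row col = v Unboxed.! (row * w + col)
      where
        w = fromIntegral (natVal (Proxy :: Proxy w))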
This is probably a question that is too broad to be answered on SO, so if you are willing to help me out here, feel free to contact me on GitHub.
You are running into pretty much the worst-case scenario for the current ad library here.
FWIW, you won't be able to use the existing ad classes/types with "matrix/vector ad". It'd be a fairly large engineering effort; see https://github.com/ekmett/ad/issues/2
As for why you can't unbox: conjugateGradient requires the ability to use Kahn mode or two levels of forward mode on your functions. The former precludes it from working with unboxed vectors, as the data types carry syntax trees and can't be unboxed. For various technical reasons I haven't figured out how to make it work with a fixed-size 'tape' like the standard Reverse mode.
I think the "right" answer here is for us to sit down and figure out how to get matrix/vector AD right and integrated into the package, but I confess I'm timesliced a bit too thinly right now to give it the attention it deserves.
If you get a chance to swing by #haskell-lens on irc.freenode.net I'd be happy to talk about designs in this space and offer advice. Alex Lang has also been working on ad a lot and is often present there and may have ideas.
I have the following instance of cabal hell:
(with ghc-7.8.3 built from source on x86_64 GNU/Linux,
and user-install: True in .cabal/config)
1) at some time, transformers-0.4.0.0 was installed (in user space, shadowing (?) transformers-0.3 from the global space)
2) later, several libraries pick transformers-0.4
3) then, I install hint, which depends on ghc, which depends on transformers-0.3, and which cannot be changed, since ghc is hard-wired.
result: I cannot use libraries from 2) and hint in one project.
As a work-around, I am putting constraint: transformers installed in .cabal/config and rebuilding. Is there a better way to handle this situation, or to avoid it in the first place?
Is there a better way to handle this situation?
No, your approach is sensible.
or to avoid it in the first place?
Tricky. Most people do not build stuff depending on ghc, so for them it makes sense to upgrade transformers etc. Therefore, your constraint is not a suitable default.
As Zeta writes: sandboxes can help. If you had used sandboxes for your installations in (2), and used another sandbox for whatever tries to use both hint and the libraries from (2), then cabal would simply build dedicated copies of these dependencies for whatever you are building.
This comes at the expense of not sharing any disk space or build time between the various things you are doing.
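Concretely, the sandbox route for the hint-using project would look something like this (the directory name is made up):

$ cd my-hint-project
$ cabal sandbox init
$ cabal install --only-dependencies
$ cabal build

The sandbox gets its own package database, so the transformers-0.3 that ghc (and therefore hint) needs no longer clashes with the transformers-0.4 your other projects use.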