I'm writing a numerical optimisation library in Haskell, with the aim of making functions like a gradient descent algorithm available for users of the library. In writing these relatively complex functions, I write intermediary functions, such as a function that performs just one step of gradient descent. Some of these intermediary functions perform tasks that no user of the library could ever have need for. Some are even quite cryptic, but make sense when used by a bigger function.
Is it common practice to leave these intermediary functions available to library users? I have considered moving these to an "Internal" library, but moving small functions into a whole different library from the main functions using them seems like a bad idea for code legibility. I'd also quite like to test these smaller functions as well as the main functions for debugging purposes down the line - and ideally would like to test both in the same place, so that complicates things even more.
I'm unsurprisingly using Cabal for the library, so answers in that context would also be helpful, if that's easier.
You should definitely not just throw such internal functions in the export of your package's trunk module, together with the high-level ones. It makes the interface/haddocks hard to understand, and also poses problems if users come to depend on low-level details that may easily change in future releases.
So I would keep these functions in an “internal” module, which the “public” module imports and from which it re-exports only those functions that are intended to be used:
Public
module Numeric.Hegash.Optimization (optimize) where
import Numeric.Hegash.Optimization.Internal
Private
module Numeric.Hegash.Optimization.Internal where
gradientDesc :: ...
gradientDesc = ...
optimize :: ...
optimize = ... gradientDesc ...
A more debatable matter is whether you should still allow users to load the Internal module, i.e. whether you should put it in the exposed-modules or other-modules section of your .cabal file. IMO it's best to err on the “exposed” side, because there could always be valid use cases that you didn't foresee. It also makes testing easier. Just ensure you clearly document that the module is unstable. Only functions that are so deeply in the implementation details that they are basically impossible to use outside of the module should not be exposed at all.
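For concreteness, a minimal sketch of the corresponding .cabal stanza (the source directory, version bounds, and language flag here are placeholders):

library
  exposed-modules:
    Numeric.Hegash.Optimization
    Numeric.Hegash.Optimization.Internal
  -- To hide the Internal module from users entirely, list it under
  -- other-modules instead of exposed-modules:
  -- other-modules:
  --   Numeric.Hegash.Optimization.Internal
  build-depends:    base >=4 && <5
  hs-source-dirs:   src
  default-language: Haskell2010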
You can selectively export functions from a module by listing them in the header. For example, if you have functions gradient and gradient1 and only want to export the former, you can write:
module Gradient (gradient) where
You can also incorporate the intermediary functions into their parent functions using where to limit the scope to just the parent function. This will also prevent the inner function from being exported:
gradient ... =
  ...
  where
    gradient1 ... = ...
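For example, a self-contained sketch (the concrete functions, names, and step size here are made up, not taken from your library): a single descent step and a numerical gradient live in the where clause, so only the driver function is ever visible to importers:

module Numeric.Descent (gradientDescent) where

-- Run n fixed-step-size gradient descent steps on a one-dimensional
-- function, starting from x0. Only gradientDescent is exported; step
-- and grad exist solely inside its where clause.
gradientDescent :: (Double -> Double) -> Double -> Double -> Int -> Double
gradientDescent f rate x0 n = iterate step x0 !! n
  where
    step x = x - rate * grad x                     -- one descent step
    grad x = (f (x + h) - f (x - h)) / (2 * h)     -- central difference
    h      = 1e-6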
Related
I see a lot of Rust code where the use statements look like this:
use std::io::net::ip::{SocketAddr, Ipv4Addr};
As I understand it, this restricts the use statement to importing only SocketAddr and Ipv4Addr.
Coming from languages such as Java or C#, this feels odd, since in those languages an import statement always imports all public types.
I figured one can get the same behaviour in Rust using this statement:
use std::io::net::ip::*;
The only reason I could see for the explicit naming would be to avoid conflicts where two different imports contain public APIs with the same names. However, this could be worked around with aliasing, so I wonder whether there's another advantage to the stricter "import only what is needed" approach?
In this, Rust is inspired by Python, which has a similar principle: importing is all explicit, and though glob imports (use x::* in Rust, from x import * in Python) are supported, they are not generally recommended.
This philosophy does have some practical impacts; calling a trait method, for example, can only be done if the trait is in scope, and so calling a trait method when there are name collisions in the imported traits is rather difficult (this will be improved in the future with Uniform Function Call Syntax, where you can call Trait::function(self) rather than just self.function()).
Mostly, though, it is something that is well expressed in the Zen of Python: "explicit is better than implicit". When vast swathes of things are in scope, it can be difficult to see what came from where, and intimate knowledge of the module structure and/or tooling becomes quite important; if it is all explicit, tooling is largely unnecessary and working with files by hand in a simple text editor is entirely feasible. Sure, tooling will still be able to assist, but it is not as necessary.
This is why Rust has adopted Python's explicit importing philosophy.
Mailing list thread asking why glob imports are not preferred
in which Huon writes
Certain aspects of them dramatically complicate the name resolution
algorithm (as I understand it), and, anyway, they have various downsides
for the actual code, e.g. the equivalent in Python is frowned upon:
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#importing
Maybe they aren't so bad in a compiled & statically typed language? I
don't know; either way, I personally find code without glob imports
easier to read, because I can work out which function is being called
very easily, whereas glob imports require more effort.
Issue to remove glob imports
where nikomatsakis writes
I think glob imports have their place. For example, in the borrow
checker, there is a shallow tree of submodules that all make use of
data types defined in borrowck/mod.rs. It seems rather tedious and
silly to require manual importing of all these names, vs just writing
use rustc::middle::borrowck::*. I know that there are complications
in the resolution algorithm, though, and I would be more amenable to
an argument based on fundamental challenges there.
This then moved to RFC 305 which was rejected by steveklabnik without comment on whether they're good style:
Glob imports are now stable, and so I'm going to give this a close.
Glob imports are a convenience for developers during the development stage: you import everything at once because importing each item one by one can be cumbersome. Items intended to be imported this way are often exported under a prelude module specifically designed for that purpose.
Once you have finished your logic, replace glob imports with explicit ones, because glob imports are hard to trace and require extra effort to find what comes from where.
They are referred to as wildcard imports in Rust.
They are frowned upon because:
can pollute the namespace
can cause clashes and lead to confusing errors
macro-related issues around tooling
I don't think it has anything to do with Python.
Arguments for their deprecation have strong merits, but deprecating wildcard imports may not be feasible and could impact developer productivity; instead, one can set Clippy lints.
warn-on-all-wildcard-imports: bool: Whether to allow certain wildcard imports (prelude, super in tests). (defaults to false).
Read more: https://rust-lang.github.io/rust-clippy/master/index.html#wildcard_imports
Requiring explicit imports enables code readers to see quickly and precisely which thing a local symbol corresponds to. This often helps them to navigate directly to that code.
use somelib::*;
use otherlib::*;

// Where does SomeType come from? Good luck finding it in a large
// project if you're new.
fn is_brassy(input: SomeType) -> bool {
    input.has_brass
}
Also, wildcard imports can hide undeclared dependencies. If some code doesn't declare its dependencies, but it is always imported along with its dependencies in a wildcard import, the code will run fine. But readers looking at that code file in isolation may be confused about where the symbol comes from.
(The following code tries to demonstrate the issue in Python-like pseudo-code because I’m more proficient in expressing my thoughts using Python, even though the actual time I’ve seen this was with Rust macros.)
# helpers
def helpful_helper(x):
    return x + ' world'

# helpless
def needs_help(x):
    # helpful_helper not explicitly imported!
    return helpful_helper(x)

# lib
from helpers import *
from helpless import *

# app
from lib import *
needs_help('hello')
A different dependency might even be used if different imports were used together.
# other_app
from other_helpers import *
from helpless import *
# what does it do? Better look at other_helpers.py...
needs_help('hello')
Apart from potential name clashes -- which can be worked around by other means -- is there any benefit to importing only the parts of a module that you need:
import SomeModule (x, y, z)
...versus just importing all of it, which is terser and easier to maintain:
import SomeModule
Would it make the binary smaller, for instance?
Name clashes and binary size optimization are just two of the benefits you can get. Indeed, it is good practice to always identify exactly what you want from the outside world of your code. That way, whenever people look at your code, they know exactly what your code is requesting.
This also gives you a very good starting point for creating mocking solutions for tests, since you can work through the list of imports and write mocks for them.
Unfortunately, type class instances in Haskell are not that easy: they are imported implicitly and so can create conflicts, and they can also make mocking harder, since there is no way to import only specific class instances. Hopefully this can be fixed in future versions of Haskell.
UPDATE
The benefits I listed above (code maintenance and test mocking) are not limited to Haskell. As far as I know, it is also common practice in Java, where you can import a single class, or even a single static variable/method. Unfortunately, again, you still cannot selectively import member functions.
No, it's only for the purpose of preventing name clashes. The other mechanism for preventing name clashes - namely import qualified - results in more verbose (less readable) code.
It wouldn't make the binary smaller - consider that functions in a given module all reference each other, usually, so they need to be compiled together.
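To make the contrast concrete, a small sketch using standard library modules (Data.Map comes from the containers package; the module and function names are made up):

module ImportStyles where

-- Explicit import list: names stay short, and the header records
-- exactly which functions this module uses.
import Data.Char (toUpper)

-- Qualified import: clashes are impossible, but every use site
-- carries the Map. prefix, which is the verbosity mentioned above.
import qualified Data.Map as Map

shout :: String -> String
shout = map toUpper

countWords :: [String] -> Map.Map String Int
countWords ws = Map.fromListWith (+) [(w, 1) | w <- ws]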
I've seen a couple of packages on Hackage which contain module names with .Internal as their last name component (e.g. Data.ByteString.Internal).
Those modules are usually not properly browsable in Haddock (though they may show up nevertheless) and should not be used by client code, but they contain definitions which are either re-exported from exposed modules or just used internally.
Now my question(s) to this library organization pattern are:
What problem(s) do those .Internal modules solve?
Are there other preferable ways to workaround those problems?
Which definitions should be moved to those .Internal modules?
What's the current recommended practice with respect to organizing libraries with the help of such .Internal modules?
Internal modules are generally modules that expose the internals of a package, that break package encapsulation.
To take ByteString as an example: When you normally use ByteStrings, they are used as opaque data types; a ByteString value is atomic, and its representation is uninteresting. All of the functions in Data.ByteString take values of ByteString, and never raw Ptr CChars or something.
This is a good thing; it means that the ByteString authors managed to make the representation abstract enough that all the details about the ByteString can be hidden completely from the user. Such a design leads to encapsulation of functionality.
The Internal modules are for people that wish to work with the internals of an encapsulated concept, to widen the encapsulation.
For example, you might want to make a new BitString data type, and you want users to be able to convert a ByteString into a BitString without copying any memory. In order to do this, you can't use opaque ByteStrings, because that doesn't give you access to the memory that represents the ByteString. You need access to the raw memory pointer to the byte data. This is what the Internal module for ByteStrings provides.
You should then make your BitString data type encapsulated as well, thus widening the encapsulation without breaking it. You are then free to provide your own BitString.Internal module, exposing the innards of your data type, for users that might want to inspect its representation in turn.
If someone does not provide an Internal module (or similar), you can't gain access to the module's internal representation, and the user writing e.g. BitString is forced to (ab)use things like unsafeCoerce to cast memory pointers, and things get ugly.
The definitions that should be put in an Internal module are the actual data declarations for your data types:
module Bla.Internal where
data Bla = Blu Int | Bli String
-- ...
module Bla (Bla, makeBla) where -- ONLY export the Bla type, not the constructors
import Bla.Internal
makeBla :: String -> Bla -- Some function only dealing with the opaque type
makeBla = undefined
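To make the encapsulation visible, here is a sketch of two clients (the module names are hypothetical): one sees only the opaque type, the other deliberately opts in to the internals:

module Client where

import Bla (Bla, makeBla)

bla :: Bla
bla = makeBla "hello"        -- fine: goes through the public smart constructor

-- This would NOT compile here, because Blu and Bli are not in scope:
-- describe (Blu n) = show n

module TrustedClient where

import Bla.Internal          -- opts in to the unstable internals

describe :: Bla -> String
describe (Blu n) = "Blu " ++ show n
describe (Bli s) = "Bli " ++ s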
#dflemstr is right, but not explicit about the following point. Some authors put internals of a package in a .Internal module and then don't expose that module via cabal, thereby making it inaccessible to client code. This is a bad thing.^1
Exposed .Internal modules help to communicate different levels of abstraction implemented by a module. The alternatives are:
Expose implementation details in the same module as the abstraction.
Hide implementation details by not exposing them in module exports or via cabal.
(1) makes the documentation confusing, and makes it hard for the user to tell the transition between his code respecting a module's abstraction and breaking it. This transition is important: it is analogous to removing a parameter to a function and replacing its occurrences with a constant, a loss of generality.
(2) makes the above transition impossible and hinders the reuse of code. We would like to make our code as abstract as possible, but (cf. Einstein) no more so, and the module author does not have as much information as the module user, so is not in a position to decide what code should be inaccessible. See the link for more on this argument, as it is somewhat peculiar and controversial.
Exposing .Internal modules provides a happy medium which communicates the abstraction barrier without enforcing it, allowing users to easily restrict themselves to abstract code, but allowing them to "beta expand" the module's use if the abstraction breaks down or is incomplete.
^1: There are, of course, complications to this puristic judgement. An internal change can now break client code, and authors now have a larger obligation to stabilize their implementation as well as their interface. Even if it is properly disclaimed, users is users and gotsta be supported, so there is some appeal to hiding the internals. It begs for a custom version policy which differentiates between .Internal and interface changes, but fortunately this is consistent with (but not explicit in) the versioning policy. "Real code" is also notoriously lazy, so exposing an .Internal module can provide an easy out when there was an abstract way to define code that was just "harder" (but ultimately supports the community's reuse). It can also discourage reporting an omission in the abstract interface that really should be pushed to the author to fix.
The idea is that you can have the "proper", stable API which you export from MyModule, and this is the preferred and documented way to use the library.
In addition to the public API, your module probably has private data constructors and internal helper functions etc. The MyModule.Internal submodule can be used to export those internal functions instead of keeping them completely locked inside the module.
It lets the users of your library access the internals if they have needs that you didn't foresee, but with the understanding that they are accessing an internal API that doesn't have the same implicit guarantees as the public one.
It lets you access the internal functions and constructors for e.g. unit-testing purposes.
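As a concrete sketch of the last point (all of the names below are hypothetical): the test executable can import the Internal module to reach a constructor that the public module hides behind a smart constructor:

module Main where

import MyModule          (mkThing)      -- public API: only the smart constructor
import MyModule.Internal (Thing (..))   -- internal API: the real constructor

main :: IO ()
main = do
  let Thing n = mkThing 42
  -- With the constructor in scope, the test can check an invariant
  -- that the public API alone does not let us observe.
  if n >= 0
    then putStrLn "ok"
    else error "mkThing produced a negative payload"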
One extension (or possibly clarification) to what shang and dflemstr said: if you have internal definitions (data types whose constructors aren't exported, etc.) that you want to access from multiple modules which are exported, then you typically create such an .Internal module which isn't exposed at all (i.e. listed in Other-Modules in the .cabal file).
However, this sometimes does leak out when querying types in GHCi (e.g. when using a function where some of the types it refers to aren't in scope; I can't think of an instance where this happens off the top of my head, but it does).
I want to make a function called 'load' which imports definitions of functions from another file. I know how to import modules, but in my program I want the definitions of the functions to change depending on which module is 'loaded' with this new function. Is there a way to do this? Is there a better way to write my program so that this is not necessary?
I think its type signature would look something like:
load :: String -> IO ()
where the string is the name of the module to be loaded (and the module is in the same directory).
Edit: Thanks for all the replies. Most people agree that this is not the best way to do what I want. Instead, is there a way to declare a global variable from within an I/O program. That is, I want it so that if I type (function "thing") into a function of type String -> IO(), I can still type 'thing' into GHCi to get the value assigned to it... Any suggestions?
There is almost certainly a better way to write your program so that this is not necessary. It's hard to say what without knowing more details about your situation, though. You could, for instance, represent the generic interface each module implements as a data-type, and have each module export a value of that type with the implementation.
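For instance, a sketch with made-up names: the generic interface becomes an ordinary record of functions, each candidate "module" is just a value of that record, and "loading" is a run-time choice between values:

import Data.Char (toUpper)

-- The generic interface, represented as a plain record of functions.
data Backend = Backend
  { backendName :: String
  , transform   :: String -> String
  }

-- Each "loadable" implementation is just a value of that type,
-- exported from its own module if you like.
upperBackend, reverseBackend :: Backend
upperBackend   = Backend "upper"   (map toUpper)
reverseBackend = Backend "reverse" reverse

-- "Loading" becomes an ordinary run-time decision between values.
load :: String -> Backend
load "upper" = upperBackend
load _       = reverseBackend

main :: IO ()
main = putStrLn (transform (load "upper") "hello")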
Basically, the set of loaded modules is a static, compile-time property, so it makes no sense to want your program's behaviour to change based on its contents. Are you trying to write a library? Your users probably won't appreciate it doing such evil magic to their import lists :) (And it probably isn't possible without Template Haskell in that case, anyway.)
The exception is if you're trying to implement a Haskell tool (e.g. REPL, IDE, etc.) or trying to do plugins; i.e. dynamically-loaded modules of Haskell source code to integrate into your Haskell program. The first thing to try for those should be hint, but you may find you need something more advanced; in that case, the GHC API is probably your best bet. plugins used to be the de-facto standard in this area, but it doesn't seem to compile with GHC 7; you might want to check out direct-plugins, a simplified implementation of a similar interface that does.
mueval might be relevant; it's designed for executing short (one-line) snippets of Haskell code in a safe sandbox, as used by lambdabot.
Unless you're building a Haskell IDE or something like that, you most likely don't need this (^1).
But, in case you do, there is always the hint package, which allows you to embed a Haskell interpreter into your program. This allows you both to load Haskell modules and to convert strings into Haskell values at runtime. There is a nice example of how to use it here
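As a rough sketch of what that looks like (the file Thing.hs and the definition thing are hypothetical; the interpreter functions are from Language.Haskell.Interpreter in the hint package, so check the exact API against its documentation):

import Language.Haskell.Interpreter

-- Load Thing.hs at run time and evaluate the definition named "thing"
-- from it as an Int; compile errors come back as Left.
loadThing :: IO (Either InterpreterError Int)
loadThing = runInterpreter $ do
  loadModules ["Thing.hs"]       -- compile the source file
  setTopLevelModules ["Thing"]   -- bring its definitions into scope
  setImports ["Prelude"]
  interpret "thing" (as :: Int)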
^1: If you're looking for a way to make things polymorphic, i.e. changing some, but not all, definitions in your code, you're probably looking for typeclasses.
With regards to your edit, perhaps you might be interested in IORef.
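A minimal sketch of that suggestion (the variable name is made up): an IORef created in IO acts as a mutable cell that later IO code can read back:

import Data.IORef

main :: IO ()
main = do
  thing <- newIORef (0 :: Int)   -- create the mutable "global"
  writeIORef thing 42            -- assign to it from IO code
  v <- readIORef thing           -- read it back later
  print v                        -- prints 42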
I'm writing a snake game in Haskell. These are some of the things I have:
A Coord data type
A Line data type
A Rect data type
A Polygon type class, which allows me to get a Rect as a series of lines ([Line]).
An Impassable type class that allows me to get a Line as a series of Coords ([Coord]) so that I can detect collisions between other Impassables.
A Draw type class for anything that I want to draw to the screen (HSCurses).
Finally I'm using QuickCheck so I want to declare Arbitrary instances for a lot of these things.
Currently I have a lot of these in separate modules, so I have lots of small modules. I've noticed that I have to import a lot of them into each other, so I'm kind of wondering what the point was.
I'm particularly confused about Arbitrary instances. When using -Wall I get warnings about orphaned instances when I put those instances together in one test file. My understanding is that I can avoid that warning by putting those instances in the same module where the data type is defined, but then I'd need to import Test.QuickCheck in all those modules, which seems silly because QuickCheck should only be required when building the test executable.
Any advice on the specific problem with QuickCheck would be appreciated as would guidance on the more general problem of how/where programs should be divided into modules.
You can have your cake and eat it too. You can re-export modules.
module Geometry
  ( module Coord, module Line, module Rect, module Polygon, module Impassable )
  where

-- A module can only re-export modules that it also imports:
import Coord; import Line; import Rect; import Polygon; import Impassable
I usually use a module whenever I have a complete abstraction -- i.e. when a data type's meaning differs from its implementation. Knowing little about your code, I would probably group Polygon and Impassable together, perhaps making a Collision data type to represent what they return. But Coord, Line, and Rect seem like good abstractions and they probably deserve their own modules.
For testing purposes, I use separate modules for the Arbitrary instances. Although I generally avoid orphan instances, these modules only get built when building the test executable so I don't mind the orphans or that it's not -Wall clean. You can also use -fno-warn-orphans to disable just this warning message.
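Concretely, such a test-only module of instances might look like this (a sketch that assumes Coord has a two-argument constructor, which may not match your actual definition):

{-# OPTIONS_GHC -fno-warn-orphans #-}
module ArbitraryInstances where

import Test.QuickCheck (Arbitrary (..))

import Coord (Coord (..))

-- An orphan instance, but it only exists in the test build, so the
-- library itself never depends on QuickCheck.
instance Arbitrary Coord where
  arbitrary = Coord <$> arbitrary <*> arbitrary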
I generally put more emphasis on the module interface as defined by the functions it exposes rather than the data types it exposes. Do some of your types share a common set of functions? Then I would put them in the same module.
But my practise is probably not the best since I usually write small programs. I would advise looking at some code from Hackage to see what package maintainers do.
If there were a way to sort packages by community rating or number of downloads, that would be a good place to start. (I thought there was, but now that I look for it, I can't find it.) Failing that, look at packages that you already use.
One solution with QuickCheck is to use the C preprocessor to selectively enable the Arbitrary instances when you are testing. You put the Arbitrary instances straight into your main modules but wrap them with preprocessor macros, then put a "test" flag into your Cabal file.
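A sketch of that pattern (the flag name, module, and type are placeholders, not taken from your code). In the .cabal file, a flag turns on a CPP define:

flag test
  description: Build test-only code such as Arbitrary instances
  default:     False

library
  ...
  if flag(test)
    cpp-options:   -DTEST
    build-depends: QuickCheck

and the instance in the main module is only compiled when that define is set:

{-# LANGUAGE CPP #-}
module Coord (Coord (..)) where

#ifdef TEST
import Test.QuickCheck (Arbitrary (..))
#endif

data Coord = Coord Int Int
  deriving (Show, Eq)

#ifdef TEST
instance Arbitrary Coord where
  arbitrary = Coord <$> arbitrary <*> arbitrary
#endif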