I know qualified imports have the benefit of avoiding name conflicts; I'm asking purely from a readability point of view.
Not being familiar with the Haskell standard libraries, one thing I find annoying when reading Haskell code (mostly from books and tutorials online) is that when I come across a function, I don't know whether it belongs to an imported module or will be defined by the user later.
Coming from a C++ background, where it's usually seen as good practice to call standard library functions with their namespace (for example, std::find): is it the same for Haskell? If not, how do you overcome the problem I mentioned above?
From the Haskell style guide:
Always use explicit import lists or qualified imports for standard and
third party libraries. This makes the code more robust against changes
in these libraries. Exception: The Prelude.
So, the answer is yes: qualified imports are considered good practice for standard and third-party libraries, with the exception of the Prelude. But for infix operators (something like <|*|>), you may want to import them explicitly instead, as qualification doesn't look nice on an operator.
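As a sketch of both styles (the modules come from base and containers; sortedKeys is a made-up helper):

```haskell
import qualified Data.Map as Map   -- qualified: Data.Map clashes with Prelude (lookup, filter, ...)
import Data.List (sortBy)          -- explicit import list for a single function
import Data.Ord (comparing)

-- Hypothetical helper using both imports: keys in descending order.
sortedKeys :: Map.Map Int String -> [Int]
sortedKeys m = sortBy (comparing negate) (Map.keys m)
```

Note that a qualified operator like Map.! is legal but noisy to read, which is why operators are usually pulled in through an explicit unqualified import list instead.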
I'm not too fond of qualified names; IMO they rather clutter the code. The only modules that should always be imported qualified are those that use names clashing with Prelude functions, and these normally come with an explicit recommendation to do so in their documentation.
For widely used modules such as Control.Applicative, there's not much reason not to import unqualified; most programmers should know everything that's in there. For modules from less well-known packages that do something very specific, or to avoid a clash on a single name, you can use an explicit import list, e.g. import Data.List (sortBy) or import System.Random.Shuffle (shuffleM). This way you don't have to litter your code with qualifiers, yet looking up an identifier in the imports section tells you immediately where it comes from (this is analogous to a using std::cout; declaration in C++). But honestly, I find it even more convenient to just load the module into GHCi and use
*ClunkyModule> :i strangeFunction
to see where it's defined.
There's one good point to be made about qualified imports or explicit import lists that I tend to neglect: they make your packages more future-proof. If a new version of some module stops exporting an item you need, or another module introduces a clashing name, then an explicit import will immediately point you to the problem.
I feel the same as you. If I see functionName in a module I'm unfamiliar with then I have no idea which one of the many imports it comes from. "Unfamiliar module" here can also mean one I myself wrote in the past! My current style is like the following, but it's by no means universally accepted. Users of this style are probably in a very small minority.
import qualified Long.Path.To.Module as M
... use M.functionName ...
or if I want more clarity
import qualified Long.Path.To.Module as Module
... use Module.functionName ...
Very rarely I will fully qualify
import qualified Long.Path.To.Module
... use Long.Path.To.Module.functionName ...
However, I almost never qualify infix operators.
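For example, a sketch using Data.Map, whose (!) operator would otherwise have to be written M.!:

```haskell
import qualified Data.Map as M
import Data.Map ((!))   -- just the operator, imported unqualified

-- `ages ! "alice"` reads better than `ages M.! "alice"`.
ageOf :: M.Map String Int -> Int
ageOf ages = ages ! "alice"
```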
My own set of rules:
1) Try not to import anything qualified without renaming it. B.ByteString is much more readable than Data.ByteString.ByteString. An exception can be made for truly ubiquitous modules, such as Control.Monad.
2) Don't import the whole module; import specific functions/types/classes, unless there are too many of them. That way, if someone wants to find out where some function came from, they can just search for its name at the beginning of the file.
3) Import closely related modules under the same renamed name, unless the imported functions conflict, or two of them are imported whole, without an import list.
4) If possible, avoid using functions with the same name from different modules, even if those modules are renamed differently. If someone knows what X.foo does, they are likely to be confused by Y.foo. If it's unavoidable, consider creating a separate, very small module that imports both functions and exports them under different names.
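A sketch of rules 1, 2 and 4 together (the JSON/YAML modules and their decode functions in the shim are hypothetical):

```haskell
-- Rule 1: rename qualified imports to something short.
import qualified Data.ByteString as B   -- B.ByteString, not Data.ByteString.ByteString
-- Rule 2: import specific names so readers can find them at the top of the file.
import Data.List (sortBy, nub)

-- Rule 4: a tiny shim module re-exporting two clashing names
-- under distinct ones (module names invented):
--
--   module Codec.Shim (decodeJson, decodeYaml) where
--   import qualified Hypothetical.Json as J
--   import qualified Hypothetical.Yaml as Y
--   decodeJson = J.decode
--   decodeYaml = Y.decode
```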
Related
I'm writing a numerical optimisation library in Haskell, with the aim of making functions like a gradient descent algorithm available for users of the library. In writing these relatively complex functions, I write intermediary functions, such as a function that performs just one step of gradient descent. Some of these intermediary functions perform tasks that no user of the library could ever have need for. Some are even quite cryptic, but make sense when used by a bigger function.
Is it common practice to leave these intermediary functions available to library users? I have considered moving these to an "Internal" library, but moving small functions into a whole different library from the main functions using them seems like a bad idea for code legibility. I'd also quite like to test these smaller functions as well as the main functions for debugging purposes down the line - and ideally would like to test both in the same place, so that complicates things even more.
I'm unsurprisingly using Cabal for the library so answers in that context as well would be helpful if that's easier.
You should definitely not just throw such internal functions into the export list of your package's top-level module, together with the high-level ones. It makes the interface/haddocks hard to understand, and it also poses problems if users come to depend on low-level details that may easily change in future releases.
So I would keep these functions in an “internal” module, which the “public” module imports but only re-exports those that are intended to be used:
Public
module Numeric.Hegash.Optimization (optimize) where
import Numeric.Hegash.Optimization.Internal
Private
module Numeric.Hegash.Optimization.Internal where
gradientDesc :: ...
gradientDesc = ...
optimize :: ...
optimize = ... gradientDesc ...
A more debatable matter is whether you should still allow users to load the Internal module, i.e. whether you should put it in the exposed-modules or other-modules section of your .cabal file. IMO it's best to err on the “exposed” side, because there could always be valid use cases that you didn't foresee. It also makes testing easier. Just ensure you clearly document that the module is unstable. Only functions that are so deeply in the implementation details that they are basically impossible to use outside of the module should not be exposed at all.
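A hypothetical .cabal fragment for the “exposed” variant (package name and version bounds invented for illustration):

```cabal
library
  exposed-modules:
    Numeric.Hegash.Optimization
    -- exposed, but documented as unstable:
    Numeric.Hegash.Optimization.Internal
  build-depends: base >=4 && <5
```

Moving the Internal module to other-modules instead would hide it from users entirely, including your test suite unless the tests live in the same package stanza.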
You can selectively export functions from a module by listing them in the header. For example, if you have functions gradient and gradient1 and only want to export the former, you can write:
module Gradient (gradient) where
You can also incorporate the intermediary functions into their parent functions using where to limit the scope to just the parent function. This will also prevent the inner function from being exported:
gradient ... =
    ...
  where
    gradient1 ... = ...
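As a concrete sketch of that pattern (the step size, iteration count, and helper names are all invented for illustration): a toy one-dimensional gradient descent whose helpers live in a where clause and therefore can never be exported:

```haskell
-- Minimize f near x0 by repeated gradient steps.
-- `step`, `slope`, and `h` are local to `gradient` and invisible to importers.
gradient :: (Double -> Double) -> Double -> Double
gradient f x0 = go (100 :: Int) x0
  where
    go 0 x = x
    go n x = go (n - 1) (step x)
    step x  = x - 0.1 * slope x               -- one descent step
    slope x = (f (x + h) - f (x - h)) / (2 * h) -- numeric derivative
    h = 1e-6
```

The trade-off mentioned in the question applies here: step cannot be unit-tested on its own, only through gradient.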
What's the difference in Python 3.x between importing a whole module, importing specific functions from a module, and importing all functions from a module? I know how each works, but I'd like to understand the advantage of importing a specific function: what harm is there in importing all the functions in a module?
The reason to import specific functions, rather than the entire module, is to avoid a situation where you unintentionally have 2 or more functions with the same name.
This could lead to the program not functioning the way you intend.
You run the risk of this happening particularly with larger, more complex projects.
You can avoid this by importing specific functions. You can also give an alias to the function as you import it:
from module_name import function_name as fn
Then use fn (or whatever alias you have chosen) to call the function.
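As a sketch of the clash scenario: math and cmath both export a sqrt, and aliased specific imports keep the two distinct instead of one silently shadowing the other.

```python
# Both stdlib modules define `sqrt`; with `from math import *` followed by
# `from cmath import *`, the second would silently replace the first.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(4.0))   # 2.0 (a float)
print(complex_sqrt(-1)) # a complex result, approximately 1j
```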
I see a lot of Rust code where the use statements look like this:
use std::io::net::ip::{SocketAddr, Ipv4Addr};
The way I get it, this restricts the use statement to only import SocketAddr and Ipv4Addr.
Looking from the perspective of languages such as Java or C# this feels odd as with such languages an import statement always imports all public types.
I figured one can achieve the same in Rust using this statement:
use std::io::net::ip::*;
The only reason I could see for the explicit naming would be to avoid conflicts where two different imports would contain public APIs with the same names. However, this could be worked around with aliasing so I wonder if there's another advantage of the more strict "import only what is needed" approach?
In this, Rust is inspired by Python, which has a similar principle: importing is all explicit, and though glob imports (use x::* in Rust, from x import * in Python) are supported, they are not generally recommended.
This philosophy does have some practical impacts; calling a trait method, for example, can only be done if the trait is in scope, and so calling a trait method when there are name collisions in the imported traits is rather difficult (this will be improved in the future with Uniform Function Call Syntax, where you can call Trait::function(self) rather than just self.function()).
Mostly, though, it is something that is well expressed in the Zen of Python: "explicit is better than implicit". When vast swathes of things are in scope, it can be difficult to see what came from where, and intimate knowledge of the module structure and/or tooling becomes quite important; if it is all explicit, tooling is largely unnecessary and working with files by hand in a simple text editor is entirely feasible. Sure, tooling will still be able to assist, but it is not as necessary.
This is why Rust has adopted Python's explicit importing philosophy.
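One concrete consequence of the trait-in-scope rule mentioned above: in the sketch below, read_to_string comes from the std::io::Read trait, and the method call only compiles because the trait is imported by name.

```rust
use std::io::Read; // delete this line and `read_to_string` no longer resolves

fn main() {
    // &[u8] implements std::io::Read, but the method is only callable
    // because the Read trait itself is in scope.
    let mut data: &[u8] = b"hello";
    let mut buf = String::new();
    data.read_to_string(&mut buf).unwrap();
    assert_eq!(buf, "hello");
}
```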
Mailing list thread asking why glob imports are not preferred
in which Huon writes
Certain aspects of them dramatically complicate the name resolution
algorithm (as I understand it), and, anyway, they have various downsides
for the actual code, e.g. the equivalent in Python is frowned upon:
http://python.net/~goodger/projects/pycon/2007/idiomatic/handout.html#importing
Maybe they aren't so bad in a compiled & statically typed language? I
don't know; either way, I personally find code without glob imports
easier to read, because I can work out which function is being called
very easily, whereas glob imports require more effort.
Issue to remove glob imports
where nikomatsakis writes
I think glob imports have their place. For example, in the borrow
checker, there is a shallow tree of submodules that all make use of
data types defined in borrowck/mod.rs. It seems rather tedious and
silly to require manual importing of all these names, vs just writing
use rustc::middle::borrowck::*. I know that there are complications
in the resolution algorithm, though, and I would be more amenable to
an argument based on fundamental challenges there.
This then moved to RFC 305, which steveklabnik closed without comment on whether they're good style:
Glob imports are now stable, and so I'm going to give this a close.
Glob imports are a convenience for developers during the development stage: you import everything at once, because importing each item one by one at that point may be cumbersome. Items meant for this are often exported under a prelude module specifically designed for the purpose.
Once you have finished your logic, replace glob imports with explicit ones, because glob imports are hard to trace and require extra effort to find what comes from where.
They are referred to as wildcard imports in Rust.
They are frowned upon because they:
can pollute the namespace
can cause clashes and lead to confusing errors
cause macro-related issues around tooling
I don't think it has anything to do with Python.
The arguments for their deprecation have strong merits, but deprecating wildcard imports may not be feasible and could impact developer productivity; instead, one can set Clippy lints.
warn-on-all-wildcard-imports: bool: Whether to allow certain wildcard imports (prelude, super in tests). (defaults to false).
Read more https://rust-lang.github.io/rust-clippy/master/index.html#wildcard_imports
Requiring explicit imports enables code readers to see quickly and precisely which thing a local symbol corresponds to. This often helps them to navigate directly to that code.
use somelib::*;
use otherlib::*;

// Where does SomeType come from? Good luck finding it in a large
// project if you're new.
fn is_brassy(input: SomeType) -> bool {
    input.has_brass
}
Also, wildcard imports can hide undeclared dependencies. If some code doesn't declare its dependencies, but it is always imported along with its dependencies in a wildcard import, the code will run fine. But readers looking at that code file in isolation may be confused about where the symbol comes from.
(The following code tries to demonstrate the issue in Python-like pseudo-code because I’m more proficient in expressing my thoughts using Python, even though the actual time I’ve seen this was with Rust macros.)
# helpers.py
def helpful_helper(x):
    return x + ' world'

# helpless.py
def needs_help(x):
    # helpful_helper not explicitly imported!
    return helpful_helper(x)

# lib.py
from helpers import *
from helpless import *

# app.py
from lib import *
needs_help('hello')
A different dependency might even be used if different imports were used together.
# other_app.py
from other_helpers import *
from helpless import *

# what does it do? Better look at other_helpers.py...
needs_help('hello')
Apart from potential name clashes -- which can be avoided by other means -- is there any benefit to importing only the parts of a module that you need:
import SomeModule (x, y, z)
...versus just importing all of it, which is terser and easier to maintain:
import SomeModule
Would it make the binary smaller, for instance?
Name clashes and binary size optimization are just two of the benefits you can get. Indeed, it is good practice to always identify what you want from the outside world in your code, so whenever people look at your code they know exactly what it requests.
This also gives you a good opportunity to create mocking solutions for tests, since you can work through the list of imports and write mocks for them.
Unfortunately, type class instances are not that easy in Haskell: they are imported implicitly, which can create conflicts, and they can make mocking harder, since there is no way to import specific instances only. Hopefully this can be fixed in a future version of Haskell.
UPDATE
The benefits I listed above (code maintenance and test mocking) are not limited to Haskell; as far as I know, the same practice is common in Java, where you can import a single class, or even a single static variable/method. Unfortunately, you still cannot selectively import member functions.
No, it's only for the purpose of preventing name clashes. The other mechanism for preventing name clashes - namely import qualified - results in more verbose (less readable) code.
It wouldn't make the binary smaller: the functions in a given module usually all reference each other, so they need to be compiled together.
I've seen a couple of packages on Hackage which contain module names with .Internal as their last component (e.g. Data.ByteString.Internal).
Those modules are usually not properly browsable in Haddock (though they may show up nevertheless) and should not be used by client code, but contain definitions which are either re-exported from exposed modules or just used internally.
Now my questions about this library organization pattern are:
What problem(s) do those .Internal modules solve?
Are there other preferable ways to workaround those problems?
Which definitions should be moved to those .Internal modules?
What's the current recommended practice with respect to organizing libraries with the help of such .Internal modules?
Internal modules are generally modules that expose the internals of a package, that break package encapsulation.
To take ByteString as an example: When you normally use ByteStrings, they are used as opaque data types; a ByteString value is atomic, and its representation is uninteresting. All of the functions in Data.ByteString take values of ByteString, and never raw Ptr CChars or something.
This is a good thing; it means that the ByteString authors managed to make the representation abstract enough that all the details about the ByteString can be hidden completely from the user. Such a design leads to encapsulation of functionality.
The Internal modules are for people that wish to work with the internals of an encapsulated concept, to widen the encapsulation.
For example, you might want to make a new BitString data type, and you want users to be able to convert a ByteString into a BitString without copying any memory. In order to do this, you can't use opaque ByteStrings, because that doesn't give you access to the memory that represents the ByteString. You need access to the raw memory pointer to the byte data. This is what the Internal module for ByteStrings provides.
You should then make your BitString data type encapsulated as well, thus widening the encapsulation without breaking it. You are then free to provide your own BitString.Internal module, exposing the innards of your data type, for users that might want to inspect its representation in turn.
If someone does not provide an Internal module (or similar), you can't gain access to the module's internal representation, and the user writing e.g. BitString is forced to (ab)use things like unsafeCoerce to cast memory pointers, and things get ugly.
The definitions that should be put in an Internal module are the actual data declarations for your data types:
module Bla.Internal where
data Bla = Blu Int | Bli String
-- ...
module Bla (Bla, makeBla) where -- ONLY export the Bla type, not the constructors
import Bla.Internal
makeBla :: String -> Bla -- Some function only dealing with the opaque type
makeBla = undefined
#dflemstr is right, but not explicit about the following point. Some authors put internals of a package in a .Internal module and then don't expose that module via cabal, thereby making it inaccessible to client code. This is a bad thing1.
Exposed .Internal modules help to communicate different levels of abstraction implemented by a module. The alternatives are:
Expose implementation details in the same module as the abstraction.
Hide implementation details by not exposing them in module exports or via cabal.
(1) makes the documentation confusing, and makes it hard for the user to tell the transition between his code respecting a module's abstraction and breaking it. This transition is important: it is analogous to removing a parameter to a function and replacing its occurrences with a constant, a loss of generality.
(2) makes the above transition impossible and hinders the reuse of code. We would like to make our code as abstract as possible, but (cf. Einstein) no more so, and the module author does not have as much information as the module user, so is not in a position to decide what code should be inaccessible. See the link for more on this argument, as it is somewhat peculiar and controversial.
Exposing .Internal modules provides a happy medium which communicates the abstraction barrier without enforcing it, allowing users to easily restrict themselves to abstract code, but allowing them to "beta expand" the module's use if the abstraction breaks down or is incomplete.
1 There are, of course, complications to this puristic judgement. An internal change can now break client code, and authors now have a larger obligation to stabilize their implementation as well as their interface. Even if it is properly disclaimed, users is users and gotsta be supported, so there is some appeal to hiding the internals. It begs for a custom version policy which differentiates between .Internal and interface changes, but fortunately this is consistent with (but not explicit in) the versioning policy. "Real code" is also notoriously lazy, so exposing an .Internal module can provide an easy out when there was an abstract way to define code that was just "harder" (but ultimately supports the community's reuse). It can also discourage reporting an omission in the abstract interface that really should be pushed to the author to fix.
The idea is that you can have the "proper", stable API which you export from MyModule; this is the preferred and documented way to use the library.
In addition to the public API, your module probably has private data constructors and internal helper functions etc. The MyModule.Internal submodule can be used to export those internal functions instead of keeping them completely locked inside the module.
It lets the users of your library access the internals if they have needs that you didn't foresee, but with the understanding that they are accessing an internal API that doesn't have the same implicit guarantees as the public one.
It lets you access the internal functions and constructors for e.g. unit-testing purposes.
One extension (or possibly clarification) to what shang and dflemstr said: if you have internal definitions (data types whose constructors aren't exported, etc.) that you want to access from multiple modules which are exported, then you typically create such an .Internal module which isn't exposed at all (i.e. listed in Other-Modules in the .cabal file).
However, this sometimes leaks out when inspecting types in ghci (e.g. when using a function where some of the types it refers to aren't in scope; I can't think of an instance where this happens off the top of my head, but it does).