Haskell, Hackage, GHC and productivity. How to solve a real example? - haskell

I don't know the best way to solve a simple (probably) problems (hackage related).
I asked for help about it (http://stackoverflow.com/questions/12841599/haskell-hackage-ghc-and-productivity-what-to-do) but I knew not explain well.
Today, I'm with a this kin problem.
The concrete problem isn't relevant, but is it:
`Write a function that, given a string, remove diacritics.`
Example:
`simpleWord "Cigüeñal" <-> "Ciguenal"
The correct way (I think) is to use the standard Unicode normalization. In some languages/frameworks (.Net, PHP, Python, ...) exist some related function.
In Haskell, thanks to hackage community exist too:
`Text.Unicode.Normalization.normalize`
But, I couldn't install with (eg) ghc-7.4 but compact-string (that depends of) fail.
A fix for compact-string exists (compact-string-fix) then: can't I use cabal to install (directly)?, should I download and patch it?, should I look for another alternative to function about?
I explained a concrete real case (simple or complex, don't care), the question (that I ask help for) is how can, a novice haskeller, know the best way to select correct libraries, ghc correct (balanced) version, without hit a wall.
I'm really lost about it.
Really, thank you very much for any suggestion.
Best regards.

The documentation for compact-string says, "This package is obsolete. Use text instead.".
The documentation for text says, "To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.".
The documentation for text-icu shows that it successfully builds on GHC 7.4 and has support for Unicode normalization.

Here's the general process I follow when deciding which packages to use. First, I try to identify multiple packages that meet my needs. Then I look more closely at each package to try to determine which ones are the best for me, according to the criteria listed below.
It's usually better to use packages that are currently maintained. To determine if a package is currently maintained, I check the "Upload date" link on the package description page. (Of course, there are some old tried-and-true packages that haven't been modified in ages because they don't need modification.)
It's usually better to use packages that are mature, so I check the version number on the package description page. A package with a version number of 7.3.5 is probably more mature than a version 0.1 package.
It's usually better to use packages that are well documented. Sometimes there's a nice example of how to use the package in the Haddock documentation (yay!). I'll also check the "Home page" link on the package description page, because often there will be more documentation there.
It's usually better to use packages that are popular, because any problems will probably be addressed quickly, and other users can answer questions. I'll usually do a Google search and see whch packages are mentioned most often on Haskell mailing lists and StackOverflow.
It's usually better to use packages that don't require a lot of packages I don't already have, so I check the "Dependencies" section on the package description page.
I tend to follow this procedure when choosing a package for any programming language, not just Haskell.

Related

How do indicate that a Haskell package is in either an alpha/beta/release candidate stage?

Let us say that I have worked on a haskell library and am now ready to release a beta version of the software to hackage/make repo public on github etc.
Possible Solutions and why they do not work for me
Use packagename-0.0.0.1-alpha or similar.
The problem here is quite simple: The Haskell PVP Specification does not allow it: (bold is me)
The components of the version number MUST be numbers! Historically Cabal supported version numbers with string tags at the end, e.g. 1.0-beta This proved not to work well because the ordering for tags was not well defined. Version tags are no longer supported and mostly ignored, however some tools will fail in some circumstances if they encounter them.
Just use packagename-0.* until it is out of alpha/beta (and then use packagename-1.*).
The problem here is twofold:
This method would not work for describing relase candidates which are post version 1.
Programmers from other ecosystems, such as that of rust, where it is quite common to have a stable library in 0.*, might wrongly assume that this library is stable. (Of course, it could be mitigated somewhat with a warning in the README, but I would prefer a better solution still.)
So, what is the best (and most conventional in haskell) way to indicate that the library version is in alpha/beta stage of development or is a release candidate?
As far as I know, there is not a package-wide way to say this. However, you can add a module description that describes the stability of the module to each of your modules' documentation.
{-|
Stability: experimental
-}
module PackageName.ModuleName where

Type hints for lxml?

New to Python and come from a statically typed language background. I want type hints for https://lxml.de just for ease of development (mypy flagging issues and suggesting methods would be nice!)
To my knowledge, this is a python 2.0 module and doesn’t have types. Currently I’ve used https://mypy.readthedocs.io/en/stable/stubgen.html to create stub type definitions and filling in “any”-types I’m using with more information, but it’s really hacky. Are there any safer ways to get type hints?
There is an official stubs package for lxml now called lxml-stubs:
$ pip install lxml-stubs
Note, however, that the stubs are still in development and are not 100% complete yet (although very much usable from my experience). These stubs were once part of typeshed, then curated by Jelle Zijlstra after removal and now are developed as part of the lxml project.
If you want the development version of the stubs, install via
$ pip install git+https://github.com/lxml/lxml-stubs.git
(the project's readme installation command is missing the git+ prefix in URL's scheme and won't work).
Recently I have done much more gap filling based on lxml-stubs with some good progress.
Welcome to check out types-lxml if any late comer are still interested. For most people I think lxml.objectify is the only missing piece lacking from the stubs, which is planned immediately after current release.
I’ve used stubgen to create stub type definitions and filling in “any”-types
This is actually the correct approach if it's not lxml; creating template from mypy stubgen is the starting point for many stub files. But lxml is mostly written in Cython, for which stubgen do not have perfect support yet. Besides as OP noted, this is a python 2.0 era module, and author uses function arguments in a quite polymorphous way. There are lots of unique challenges annotating lxml, as lxml is essentially a python interface for libxml and libxslt in its core.
As an example, the support of both unicode and bytes input complicates matter too; this is the same difficulty found when annotating xml.etree bundled with python, but in a much greater magnitude.
I would not call this "hacky", rather it is gradual typing.
You can take a closer look at lxml-stubs repository. From about:
This repository contains external type annotations (see PEP 484) for the lxml package. Such type annotations are normally included in typeshed, but lxml's annotations were frequently problematic and have therefore been deleted from typeshed. In particular, the stubs are incomplete and it has been difficult to provide complete stubs.
Perhaps it will be useful to you

Non-maintainer uploads to Hackage

I have a package on Hackage which depends on third-party package, which doesn't build on newer versions of GHC (>= 7.2). The problem with the other package can be solved with just a one-line patch (a LANGUAGE pragma). I sent the patch to the upstream twice, but didn't receive any feedback. The problem is that my package is not installable neither until the dependency is fixed.
I could have uploaded the fixed version of depenency package (with a minor version bump), but I'd like to hear what is the attitude of the community about such non-maintainer uploads. Again, I don't want to change the library interface, I only add a new compilation flag to make it buildable again.
Are non-maintainer uploads to Hackage allowed and tolerated?
When a fork of the package on Hackage is a better approach?
Package uploads by non-maintainers are allowed (there may be license issues, but most packages if not all on hackage have licenses permitting this), but of course they are not usually done. They are tolerated if done in good faith and with reasonable procedure. If you contact the maintainer and don't get any response within n weeks (where I'm not sure what the appropriate value of n is, not less than 3, I'd say), uploading a new version yourself becomes an option, discussing that on the mailing lists seems however more prudent. If the package looks like it is abandoned, even taking over maintainership - of course after again contacting the maintainer, giving her/him time to respond - may be the appropriate action, but that should definitely be discussed with the community (haskell-cafe or mailing list, for example). Whether to prefer a non-maintainer upload or a fork must be left to your judgment, personally I tend to believe forks step on fewer people's toes.
But a better founded reply would be possible if we knew which package is concerned and could look at the concrete situation.
A forking is intrusive for a package that you suspect is still maintained but the author is temporarily missing. By intrusive, I mean that other programmers might pick up your fork then not go back to the mainline once the original author has resumed work on the mainline.
For packages where the original author has left the Haskell community, my personal opinion is that its better to fork the package if you are going to develop it further. Forking prevents succession problems, such as those that happened with Parsec where many developers didn't want to update because the successor was slower and less well documented than the original for some time.
In all cases asking on the Cafe is best, regardless of whether people have chosen not to follow it, it is still the center of the Haskell community.
For the particular case in the question, while it is nice if things on Hackage compile, there is no rule that says they have to. A package that depends on a broken package could simply put change instructions for the broken dependency on its front page, i.e. "This package depends on LambdaThing-0.2.0 which is broken, to fix LambdaThing add ... to the file Lambda.hs"
I would say, it's a very good idea to consult the mailing lists regarding the specific package and the specific person who is missing. I took control of the haskell-src-meta package from its original owner, but only after consulting with the lists and IRC, who assured me that Matt Morrow had been missing for months and no-one knew why.
In my opinion, package ownership should probably only be changed where there is a consensus to do so, or at the very least there should be efforts made to find one. In the development version of the Hackage software, it's my understanding there are access controls so that only administrators can make this kind of intervention.

How are the Haddock module fields Portability, Stability and Maintainer used?

In lots of Haddock-generated module documentation (e.g. Prelude), a small box in the top-right can be seen, containing portability, stability and maintainer information:
From looking at the source code to such modules and experimentation, I confirmed that this information is generated from lines like the following in the module description:
-- Maintainer : libraries#haskell.org
-- Stability : stable
-- Portability : portable
There are several strange things about this:
The fields only seem to work in this order — any fields put out of order are simply treat as part of the module description itself. This is despite the fact that the order in the source file is the opposite of the order in the generated documentation!
I have been unable to find any official documentation of these fields. There is a Cabal package property named stability, the example values of which match the values I've seen in the equivalent Haddock fields, but beyond that, I've found nothing.
So: How are these fields intended to be used, and are they documented anywhere?
In particular, I'd like to know:
The full list of commonly-used values for Portability and Stability. This HaskellWiki page has a list, but I'd like to know where this list originated from.
The criteria for deciding whether a module is portable or non-portable. In particular, the package I would like the answers to these questions for, acme-strfry, is an FFI binding to strfry, a function only available in glibc. Is the package non-portable, because it only works on glibc systems, or portable, because it does not use any Haskell language extensions? The common usage seems to imply the latter.
Why a specific order of fields is required in the source file, and why it's the opposite of the ordering in the generated documentation.
Oh, I thought those fields were from the cabal package description. They don't seem to be documented at all on Haddock's docs. I've found this, which doesn't really answer your question but:
http://trac.haskell.org/haddock/ticket/71
So if it's freeform anyway, why not just write "non-portable (depends on glibc)"? I've seen even "portable (depends on ghc)", which is odd. I also wonder what happens with modules that were non-portable due to non-Haskell98 extension Foo, after Foo was added to Haskell 2010.
Note that the Cabal documenation you link to also says stability is freeform. Of course, even if haddock or cabal were to define what are the acceptable values, it'd still be up to the maintainer to subjectively select one.
About the specific order, you should probably just ask at the haddock mailing list, or check the source and file a bug.
PS: strfry is an invaluable contribution to the Haskell community, but it should be pure and portable, don't you think?
Ah yes, one of the more obscure and crufty features of Haddock.
As best as I can tell, it's just an undocumented hack. There's no sane reason why the order of the fields should matter, but it does. The specific choice of formatting (i.e., as a special form inside the module comment rather than as a separate block of some kind) isn't the best either. My guess is that somebody wanted to quickly add this feature one day, so they hacked up something minimal but functioning, and left it at that. (Without bothering to document it.)
Personally, I just don't bother with these fields at all. The information is available from Cabal, so I don't bother duplicating it in Haddock as well. Perhaps some day Cabal will pass this information to Haddock automatically...

Are there tools that would be suitable for maintaining a changelog for a Cabal Haskell package?

I'm working fast and furiously on a new Haskell package for compiler writers. I'm going through many minor version numbers daily, and the Haskell packaging system, Cabal, doesn't seem to offer any tools for updating version numbers or for maintaining a change log. (Logs are going into git but that's not visible to anyone using the package.) I would kill for something equivalent to Debian's uupdate or dch/debchange tools.
Does anyone know of general-purpose tools that could be used to increment version numbers automatically and add an entry to a change log?
I use a very simple scheme to generate my CHANGELOG. I just ask darcs for it and include it in the extra-files section of my package's .cabal file. Though, this seems too simplistic for what you are asking. =)
That said, you can go quite a bit farther and use a custom cabal Setup.(hs|lhs) that builds the CHANGELOG during cabal sdist out of your darcs or git repository's commit info (or out of whatever system you decide to use to track it)
The Setup.lhs used by darcs does something very similar to include information on version numbers and number of applied patches since the last version. Look at the sdistHook and generateVersionModule machinery in Setup.lhs to get an idea of how this can be done.
To non-answer your question, I'm not aware of anything. This sounds like a good match for posting in the Haskell Proposals subreddit, since it seems like a pretty useful idea.

Resources