Efficient Haskell equivalent to NumPy's argsort

Is there a standard Haskell equivalent to NumPy's argsort function?
I'm using HMatrix, so I would like a function compatible with Vector R, which is an alias for Data.Vector.Storable.Vector Double. The argSort function below is the implementation I'm currently using:
{-# LANGUAGE NoImplicitPrelude #-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Storable as VS
import Prelude (($), Double, IO, Int, compare, print, snd)
a :: VS.Vector Double
a = VS.fromList [40.0, 20.0, 10.0, 11.0]
argSort :: VS.Vector Double -> V.Vector Int
argSort xs = V.fromList (L.map snd $ L.sortBy (\(x0, _) (x1, _) -> compare x0 x1) (L.zip (VS.toList xs) [0..]))
main :: IO ()
main = print $ argSort a -- yields [2,3,1,0]
I'm using explicit qualified imports just to make it clear where every type and function is coming from.
This implementation is not terribly efficient since it converts the input vector to a list and the result back to a vector. Does something like this (but more efficient) exist somewhere?
Update
@leftaroundabout had a good solution. This is the solution I ended up with:
module LAUtil.Sorting
( IndexVector
, argSort
)
where
import Control.Monad
import Control.Monad.ST
import Data.Ord
import qualified Data.Vector.Algorithms.Intro as VAI
import qualified Data.Vector.Storable as VS
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import Numeric.LinearAlgebra
type IndexVector = VU.Vector Int
argSort :: Vector R -> IndexVector
argSort xs = runST $ do
  let l = VS.length xs
  t0 <- VUM.new l
  forM_ [0..l - 1] $
    \i -> VUM.unsafeWrite t0 i (i, (VS.!) xs i)
  VAI.sortBy (comparing snd) t0
  t1 <- VUM.new l
  forM_ [0..l - 1] $
    \i -> VUM.unsafeRead t0 i >>= \(x, _) -> VUM.unsafeWrite t1 i x
  VU.freeze t1
This is more directly usable with Numeric.LinearAlgebra since the data vector is a Storable. This uses an unboxed vector for the indices.

Use vector-algorithms:
import Data.Ord (comparing)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Algorithms.Intro as VAlgo
argSort :: (Ord a, VU.Unbox a) => VU.Vector a -> VU.Vector Int
argSort xs = VU.map fst $ VU.create $ do
  xsi <- VU.thaw $ VU.indexed xs
  VAlgo.sortBy (comparing snd) xsi
  return xsi
Note these are Unboxed rather than Storable vectors. The latter need to make some tradeoffs to allow impure C FFI operations and can't properly handle heterogeneous tuples. You can of course always convert to and from storable vectors.
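A further sketch, assuming vector-algorithms is available as in the answer above: rather than building a vector of (index, value) pairs, it sorts the index vector 0..n-1 directly and compares two indices by the values they point to. The Storable input (for example hmatrix's Vector R) is only read, never converted. The name argSortIdx is illustrative, not from any library.

```haskell
import Data.Ord (comparing)
import qualified Data.Vector.Algorithms.Intro as VAI
import qualified Data.Vector.Storable as VS
import qualified Data.Vector.Unboxed as VU

-- Argsort by sorting indices: VU.modify runs the in-place introsort on the
-- index vector 0..n-1 (working on a copy if needed) and returns the result.
-- Two indices are compared by looking up the values they point to in xs.
argSortIdx :: (Ord a, VS.Storable a) => VS.Vector a -> VU.Vector Int
argSortIdx xs =
  VU.modify
    (VAI.sortBy (comparing (VS.unsafeIndex xs)))
    (VU.enumFromN 0 (VS.length xs))
```

With the question's input, argSortIdx (VS.fromList [40.0, 20.0, 10.0, 11.0]) evaluates to [2,3,1,0]. Introsort is not stable, so indices of equal values may come out in any order; Data.Vector.Algorithms.Merge.sortBy is a stable alternative with the same shape.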

What worked better for me was using Data.Map; it is subject to list fusion, and I got a speed-up. Here n = length xs.
{-# LANGUAGE BangPatterns #-}
import qualified Data.List as L
import qualified Data.Map as M
out :: Int -> [Double] -> [Int]
out n !xs = let !a   = M.toAscList (M.fromList $! zip xs [0..n])
                !res = a `seq` L.map snd a
            in res
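However, this only works for lists without repeated values: a Map keeps a single entry per key (for a repeated value, the last index wins), so different lists can produce the same result, for example: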
out 12 [1,2,3,4,1,2,3,4,1,2,3,4] == out 12 [1,2,3,4,1,3,2,4,1,2,3,4]
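One way around the lost-duplicates problem, sketched here as an assumption rather than as the answerer's method, is to make every key unique by pairing each value with its index and storing those pairs in a Data.Set. Pairs are ordered by value first and index second, so repeated values keep all of their indices, in their original order. The name argSortSet is illustrative.

```haskell
import qualified Data.Set as Set

-- Pair each value with its position; the pairs are all distinct, so the Set
-- drops nothing. toAscList yields them sorted by value, then by index,
-- and we keep only the indices.
argSortSet :: Ord a => [a] -> [Int]
argSortSet xs = map snd (Set.toAscList (Set.fromList (zip xs [0 ..])))
```

For example, argSortSet [1,2,3,4,1,2,3,4] is [0,4,1,5,2,6,3,7], and the two lists compared above now give different results.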


catching state that is changed by execStateT

I am very new to Haskell and recently had to start working with it for a project.
I have some code that evaluates some state using execStateT, and I want to capture each state change and return it.
I have tried to understand what execStateT does and how the code flows, but in places I can't see how to get what I actually want.
Perhaps because my understanding of monads and related concepts is still raw, it feels as if I need to change the whole structure of the code.
In the code below, I tried to use par to create a file and write the state of a variable into it, so that this doesn't affect the actual work of the code. But no file was created and nothing was written.
Here is the code I'm working with:
campaign u v w ts d = let d' = fromMaybe defaultDict d in fmap (fromMaybe mempty) (view (hasLens . to knownCoverage)) >>= \c -> do
  g <- view (hasLens . to seed)
  let g' = mkStdGen $ fromMaybe (d' ^. defSeed) g
  execStateT (evalRandT runCampaign g') (Campaign ((,Open (-1)) <$> ts) c d') where
    step        = runUpdate (updateTest v Nothing) >> lift u >> runCampaign
    runCampaign = use (hasLens . tests . to (fmap snd)) >>= update
    update c    = view hasLens >>= \(CampaignConf tl q sl _ _) ->
      if | any (\case Open n -> n < tl; _ -> False) c -> callseq v w q >> step
         | any (\case Large n _ -> n < sl; _ -> False) c -> step
         | otherwise -> lift u
What I want is some way to observe the changes in the variable v so that I can work with them further, either by writing v to a file or by printing it to the console.
Thanks for your help!
[Edit 1]
Here are the imports I am making:
import Control.Lens
import Control.Monad (liftM2, replicateM, when)
import Control.Monad.Catch (MonadCatch(..), MonadThrow(..))
import Control.Monad.Random.Strict (MonadRandom, RandT, evalRandT)
import Control.Monad.Reader.Class (MonadReader)
import Control.Monad.State.Strict (MonadState(..), StateT, evalStateT, execStateT)
import Control.Monad.Trans (lift)
import Control.Monad.Trans.Random.Strict (liftCatch)
import Data.Aeson (ToJSON(..), object)
import Data.Bool (bool)
import Data.Either (lefts)
import Data.Foldable (toList)
import Data.Map (Map, mapKeys, unionWith)
import Data.Maybe (fromMaybe, isNothing, maybeToList)
import Data.Ord (comparing)
import Data.Has (Has(..))
import Data.Set (Set, union)
import Data.Text (unpack)
import EVM
import EVM.Types (W256)
import Numeric (showHex)
import System.Random (mkStdGen)
Here's one approach. Suppose you have a class for monads supporting logging of messages with a certain type (MonadLogger is one, but I don't know enough about it to use it here). I'll just use a hypothetical CanLog class. Now you can write
newtype LStateT s m a = LStateT
  {runLStateT :: StateT s m a}
  deriving (Functor, Applicative, Monad)
instance CanLog s m => MonadState s (LStateT s m) where
  get = LStateT get
  put x = LStateT $ do
    lift $ logState x -- log the state transition (logState: a hypothetical CanLog method)
    put x
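To see the idea end to end, here is a self-contained sketch that fills in the hypothetical pieces with assumptions of my own: CanLog gets a single method, logState, and its one instance simply records every state in a Writer log. Any code written against MonadState, like counter below, then has each put recorded without being changed. logState, counter and runLogged are illustrative names, not part of any library.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}
import Control.Monad.State.Strict (MonadState (..), StateT, execStateT, modify)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Writer.Strict (Writer, runWriter, tell)

-- Hypothetical logging class: monads that can record a value of type s.
class Monad m => CanLog s m where
  logState :: s -> m ()

-- One concrete (assumed) choice: append each logged state to a Writer log.
instance CanLog s (Writer [s]) where
  logState s = tell [s]

newtype LStateT s m a = LStateT {runLStateT :: StateT s m a}
  deriving (Functor, Applicative, Monad)

instance CanLog s m => MonadState s (LStateT s m) where
  get = LStateT get
  put x = LStateT $ do
    lift (logState x) -- record the new state, then store it
    put x

-- Ordinary MonadState code; it knows nothing about logging.
counter :: MonadState Int m => m ()
counter = mapM_ (\k -> modify (+ k)) [1, 2, 3]

-- Final state together with every state that was put along the way.
runLogged :: Int -> (Int, [Int])
runLogged s0 = runWriter (execStateT (runLStateT counter) s0)
```

runLogged 0 returns (6, [1,3,6]). modify is logged too, because this instance leaves state to its default definition, which goes through get and put. For the question, an instance for a base monad that can do IO (printing, or appending to a file) would follow the same pattern.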

Creating a random permutation of 1..N with Data.Vector.Unboxed.Mutable

I want to create a list containing a random permutation of the numbers 1 through N. As I understand it, I could use VUM.swap inside runST, but since I also need random numbers, I figured I might as well do both in the IO monad.
The code below yields the following error for the return statement:
  Expected type: IO (VU.Vector Int)
    Actual type: IO (VU.Vector (VU.Vector a0))
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import System.Random
randVector :: Int -> IO (VU.Vector Int)
randVector n = do
  vector <- VU.unsafeThaw $ VU.enumFromN 1 n
  VU.forM_ (VU.fromList [2..VUM.length vector]) $ \i -> do
    j <- randomRIO(0, i) :: IO Int
    VUM.swap vector i j
  return $ VU.unsafeFreeze vector
I'm not quite sure why the return vector is nested. Do I have to use VU.fold1M_ instead?
unsafeFreeze vector already returns IO (VU.Vector Int). Just change the last line to VU.unsafeFreeze vector.
On another note, you should iterate until VUM.length vector - 1, since both [x .. y] and randomRIO use inclusive ranges. Also, you can use plain forM_ here for iteration, since you only care about side effects.
import Control.Monad
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import System.Random
randVector :: Int -> IO (VU.Vector Int)
randVector n = do
  vector <- VU.unsafeThaw $ VU.enumFromN 1 n
  -- start at 1, not 2: skipping i = 1 makes the shuffle non-uniform (for n = 2 it never swaps)
  forM_ [1..VUM.length vector - 1] $ \i -> do
    j <- randomRIO (0, i) :: IO Int
    VUM.swap vector i j
  VU.unsafeFreeze vector
I looked at the generated code, and it seems that with GHC 7.10.3 forM_ compiles to an efficient loop while VU.forM_ retains the intermediate list and is surely significantly slower (which was my expected outcome for forM_, but I was unsure about VU.forM_).
I would try (note update at end):
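import Control.Monad
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import System.Random
randVector :: Int -> IO (VU.Vector Int)
randVector n = do
  vector <- VU.unsafeThaw $ VU.enumFromN 1 n
  -- indices run from 1 to length - 1: both [x .. y] and randomRIO are inclusive
  forM_ [1..VUM.length vector - 1] $ \i -> do
    j <- randomRIO(0, i) :: IO Int
    VUM.swap vector i j
  return $ VU.unsafeFreeze vector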
Edit: as @András Kovács pointed out, you don't want the return at the end, so the last line should be:
VU.unsafeFreeze vector
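Since the question mentions runST, here is a sketch of that route, assuming System.Random's pure randomR and StdGen are acceptable: the generator is threaded through the loop by hand, so no IO is needed and the same seed always gives the same permutation. It uses the forward Fisher-Yates order (i from 1 to n-1, j drawn from the inclusive range [0, i]). randPermST is an illustrative name.

```haskell
import Control.Monad (foldM)
import Control.Monad.ST (ST, runST)
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM
import System.Random (StdGen, mkStdGen, randomR)

-- Shuffle 1..n inside ST, passing the generator from one step to the next.
randPermST :: Int -> StdGen -> (VU.Vector Int, StdGen)
randPermST n g0 = runST $ do
  v <- VU.thaw (VU.enumFromN 1 n)
  g1 <- foldM (step v) g0 [1 .. n - 1]
  p <- VU.unsafeFreeze v
  return (p, g1)
  where
    -- One Fisher-Yates step: pick j in [0, i] and swap positions i and j.
    step :: VUM.MVector s Int -> StdGen -> Int -> ST s StdGen
    step v g i = do
      let (j, g') = randomR (0, i) g
      VUM.swap v i j
      return g'
```

fst (randPermST 10 (mkStdGen 42)) is one fixed permutation of 1..10; passing the returned generator to the next call gives a fresh one.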

Haskell Hashtable Performance

I am trying to use hash tables in Haskell with the hashtables package, and finding that I cannot get anywhere near Python's performance. How can I achieve similar performance? Is it possible given current Haskell libraries and compilers? If not, what's the underlying issue?
Here is my Python code:
y = {}
for x in xrange(10000000):
    y[x] = x
print y[100]
Here's my corresponding Haskell code:
import qualified Data.HashTable.IO as H
import Control.Monad
main = do
  y <- H.new :: IO (H.CuckooHashTable Int Int)
  forM_ [1..10000000] $ \x -> H.insert y x x
  H.lookup y 100 >>= print
Here is another version using Data.Map, which is slower than both for me:
import qualified Data.Map as Map
import Data.List
import Control.Monad
main = do
  let m = foldl' (\m x -> Map.insert x x m) Map.empty [1..10000000]
  print $ Map.lookup 100 m
Interestingly enough, Data.HashMap performs very badly:
import qualified Data.HashMap.Strict as Map
import Data.List
main = do
  let m = foldl' (\m x -> Map.insert x x m) Map.empty [1..10000000]
  print $ Map.lookup 100 m
My suspicion is that Data.HashMap performs badly because unlike Data.Map, it is not spine-strict (I think), so foldl' is just a foldl, with the associated thunk buildup problems.
Note that I have used -prof and verified that the majority of the time is spent in the hashtables or Data.Map code, not in forM or anything like that. All code is compiled with -O2 and no other parameters.
As reddit.com/u/cheecheeo suggested here, using Data.Judy, you'll get similar performance for your particular microbenchmark:
module Main where
import qualified Data.Judy as J
import Control.Monad (forM_)
main = do
  h <- J.new :: IO (J.JudyL Int)
  forM_ [0..10000000] $ \i -> J.insert (fromIntegral i) i h
  v <- J.lookup 100 h
  putStrLn $ show v
Timing the above:
$ time ./Main
Just 100
real 0m0.958s
user 0m0.924s
sys 0m0.032s
Timing the OP's Python code:
$ time ./main.py
100
real 0m1.067s
user 0m0.886s
sys 0m0.180s
The documentation for hashtables notes that "Cuckoo hashing, like the basic hash table implementation using linear probing, can suffer from long delays when the table is resized." You use new, which creates a new table of the default size. From looking at the source, it appears that the default size is 2. Inserting 10000000 items likely entails numerous resizings.
Try using newSized.
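A minimal sketch of that suggestion, assuming the same Data.HashTable.IO module as the question; the helper name buildSized is hypothetical. The table is created with room for the expected number of entries, so the insert loop should not have to grow it from the small default size.

```haskell
import Control.Monad (forM_)
import qualified Data.HashTable.IO as H

-- Same insert loop as the question, but the table starts with room for n
-- entries (H.newSized) instead of the default size chosen by H.new.
buildSized :: Int -> IO (H.CuckooHashTable Int Int)
buildSized n = do
  y <- H.newSized n
  forM_ [1 .. n] $ \x -> H.insert y x x
  return y
```

buildSized 10000000 >>= \t -> H.lookup t 100 >>= print mirrors the question's program; I have not timed it, so how much the pre-sizing helps is worth measuring on your machine.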
Given the times above, I thought I would throw in the Data.Map solution, which seems to be comparable to using newSized.
import qualified Data.Map as M
main = do
  print $ M.lookup 100 $ M.fromList $ map (\x -> (x,x)) [1..10000000]

Haskell: importing (infix) Data constructors from user added libraries

This is a simple question, but I cannot figure out how to use the PSQ library.
The code below is messy; it seems to find PSQ and fromList but fails to find Binding (Error: Not in scope: data constructor 'Data.PSQueue.Binding'). Learn You a Haskell does not cover how to use non-standard libraries, and I can't find any simple examples that just show PSQ in use.
import qualified Data.PSQueue (Binding, PSQ, fromList)
{-
data Binding k p
k :-> p binds the key k with the priority p.
Constructors
k :-> p
data PSQ k p
A mapping from keys k to priorities p.
-}
type VertHeap = Data.PSQueue.PSQ Int Int
main = do
  --fromList :: (Ord k, Ord p) => [Binding k p] -> PSQ k p
  return $ Data.PSQueue.fromList $ map (\k -> Data.PSQueue.Binding k 1000000) [2..10]
It can be easy to miss, but the data constructor for the Binding type is :->.
So this import should work:
import qualified Data.PSQueue (PSQ,Binding(..),fromList)
and later:
return $ Data.PSQueue.fromList $ map (\k -> k Data.PSQueue.:-> 1000000) [2..10]
Using Binding(..) will import all of the data constructors for the Binding data type.
Edit: :-> is an infix data constructor defined by Data.PSQueue, and Data.PSQueue.:-> is its fully qualified name.
Once I understood how to refer to Binding, I could use a more familiar pattern:
import qualified Data.PSQueue as PSQ
type VertHeap = PSQ.PSQ Int Int
main = do
  return $ PSQ.fromList $ map (\k -> k PSQ.:-> 1000000) [2..10]
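Building on the qualified import above, a small sketch of using the queue once it is built, assuming PSQueue's adjust and minView functions: lower one vertex's priority (the decrease-key step of Dijkstra's algorithm), then pop the minimum and take the Binding apart with the infix constructor in a pattern. popAfterDecrease is an illustrative name.

```haskell
import qualified Data.PSQueue as PSQ

type VertHeap = PSQ.PSQ Int Int

-- Every vertex 2..10 starts at a large priority, as in the question.
initialHeap :: VertHeap
initialHeap = PSQ.fromList $ map (\k -> k PSQ.:-> 1000000) [2 .. 10]

-- Set vertex v's priority to newPrio, then pop the minimum binding.
-- The qualified constructor PSQ.:-> also works as a pattern.
popAfterDecrease :: Int -> Int -> Maybe (Int, Int)
popAfterDecrease v newPrio =
  case PSQ.minView (PSQ.adjust (const newPrio) v initialHeap) of
    Just (k PSQ.:-> p, _rest) -> Just (k, p)
    Nothing -> Nothing
```

popAfterDecrease 5 0 gives Just (5, 0), since vertex 5 now has the smallest priority.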

Creating a mutable Data.Vector in Haskell

I wish to create a mutable vector using Data.Vector.Generic.Mutable.new. I have found examples that create a mutable vector by thawing a pure vector, but that's not what I wish to do.
Here is one of many failed attempts:
import Control.Monad.Primitive
import qualified Data.Vector.Generic.Mutable as GM
main = do
  v <- (GM.new 10) :: (GM.MVector v a) => IO (v RealWorld a)
  GM.write v 0 (3::Int)
  x <- GM.read v 0
  putStrLn $ show x
giving me the error
No instance for (GM.MVector v0 Int)
arising from an expression type signature
Possible fix: add an instance declaration for (GM.MVector v0 Int)
I tried variations based on the Haskell Vector tutorial with no luck.
I would also welcome suggestions on cleaner ways to construct the vector. The reference to RealWorld seems ugly to me.
The GM.MVector v a constraint is ambiguous in v. In other words, from the type information you've given GHC, it still can't figure out which specific instance of GM.MVector you want it to use. For a mutable vector of Int, use Data.Vector.Unboxed.Mutable.
import qualified Data.Vector.Unboxed.Mutable as M
main = do
  v <- M.new 10
  M.write v 0 (3 :: Int)
  x <- M.read v 0
  print x
I think the problem is that you have to give v a concrete type, like this:
import Control.Monad.Primitive
import qualified Data.Vector.Mutable as V
import qualified Data.Vector.Generic.Mutable as GM
main = do
  v <- GM.new 10 :: IO (V.MVector RealWorld Int)
  GM.write v 0 (3::Int)
  x <- GM.read v 0
  putStrLn $ show x
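On the request for a cleaner way to construct the vector, a sketch using the unboxed mutable API: in IO, the IOVector synonym (MVector RealWorld) hides RealWorld, and in pure code VU.create hands the ST state token to a type variable that never needs to be named. newZeros and squares are illustrative names.

```haskell
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VUM

-- IOVector is a synonym for MVector RealWorld, so RealWorld never appears here.
newZeros :: Int -> IO (VUM.IOVector Int)
newZeros n = VUM.replicate n 0

-- VU.create runs an ST action that builds a mutable vector and freezes it;
-- the state token stays an anonymous type variable.
squares :: Int -> VU.Vector Int
squares n = VU.create $ do
  v <- VUM.new n
  mapM_ (\i -> VUM.write v i (i * i)) [0 .. n - 1]
  return v
```

squares 4 is [0,1,4,9]; newZeros 3 gives a fresh IO vector that can be used with VUM.read and VUM.write exactly as in the answers above.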
