group by until changed sequence - excel

I have a big Excel file, which I read with the Excel type provider in F#.
The rows should be grouped by some column. Processing crashes with an OutOfMemoryException. I am not sure whether the Seq.groupBy call or the Excel type provider is to blame.
To simplify it, I use a 3D Point as a row here.
type Point = { x : float; y: float; z: float; }
let points = seq {
    for x in 1 .. 1000 do
        for y in 1 .. 1000 do
            for z in 1 .. 1000 ->
                {x = float x; y = float y; z = float z}
}
let groups = points |> Seq.groupBy (fun point -> point.x)
The rows are already ordered by the grouping column, e.g. 10 points with x = 10, then 20 points with x = 20, and so on. Instead of grouping them, I just need to split the rows into chunks whenever the value changes. Is there some way to enumerate the sequence just once and get a sequence of rows split, not grouped, by some column value or some f(row) value?

If the rows are already ordered then this chunkify function will return a seq<'a list>. Each list will contain all the points with the same x value.
let chunkify pred s = seq {
    let values = ref []
    for x in s do
        match !values with
        | h::t ->
            if pred h x then
                values := x::!values
            else
                yield !values
                values := [x]
        | [] -> values := [x]
    yield !values
}
let chunked = points |> chunkify (fun x y -> x.x = y.x)
Here chunked has a type of
seq<Point list>
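As a quick check of the behaviour described above, here is a minimal sketch on a small, already-ordered input. The name chunkAdjacent and the sample data are assumptions made for illustration and are not part of the answer. Unlike chunkify, this variant reverses each accumulated list before yielding it, so elements keep their input order, and it yields nothing for an empty input.

```fsharp
// Hypothetical helper: splits an already-ordered sequence into runs of
// adjacent elements for which `sameGroup previous current` holds.
let chunkAdjacent (sameGroup : 'a -> 'a -> bool) (source : seq<'a>) : seq<'a list> =
    seq {
        // Current run, stored newest-first.
        let mutable run = []
        for x in source do
            match run with
            | last :: _ when not (sameGroup last x) ->
                // The key changed: emit the finished run in input order.
                yield List.rev run
                run <- [ x ]
            | _ -> run <- x :: run
        // Flush the final run, if there is one.
        if not (List.isEmpty run) then
            yield List.rev run
    }

// Illustrative data: (key, payload) pairs, already ordered by key.
let sample = [ 1, "a"; 1, "b"; 2, "c"; 3, "d"; 3, "e" ]

let chunks =
    sample
    |> chunkAdjacent (fun (k1, _) (k2, _) -> k1 = k2)
    |> Seq.toList
// chunks = [[(1, "a"); (1, "b")]; [(2, "c")]; [(3, "d"); (3, "e")]]
```

Only one run is held in memory at a time, which is the property the question asks for.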

Another solution, along the same lines as Kevin's
module Seq =
    let chunkBy f src =
        seq {
            let chunk = ResizeArray()
            let mutable key = Unchecked.defaultof<_>
            for x in src do
                let newKey = f x
                if (chunk.Count <> 0) && (newKey <> key) then
                    yield chunk.ToArray()
                    chunk.Clear()
                key <- newKey
                chunk.Add(x)
            // Emit the last chunk; without this the final run is dropped.
            if chunk.Count <> 0 then
                yield chunk.ToArray()
        }
// returns 2 arrays, each with 1000 elements
points |> Seq.chunkBy (fun pt -> pt.y) |> Seq.take 2
Here's a purely functional approach, which is surely slower, and much harder to understand.
module Seq =
    // Note: a chunk is only emitted when the key changes,
    // so the final chunk of the input is not returned.
    let chunkByFold f src =
        src
        |> Seq.scan (fun (chunk, (key, carry)) x ->
            let chunk = defaultArg carry chunk
            let newKey = f x
            if List.isEmpty chunk then [x], (newKey, None)
            elif newKey = key then x :: chunk, (key, None)
            else chunk, (newKey, Some([x]))) ([], (Unchecked.defaultof<_>, None))
        |> Seq.filter (snd >> snd >> Option.isSome)
        |> Seq.map fst

Let's start with the input:
let count = 1000
type Point = { x : float; y: float; z: float; }
let points = seq {
    for x in 1 .. count do
        for y in 1 .. count do
            for z in 1 .. count ->
                {x = float x; y = float y; z = float z}
}
val count : int = 1000
type Point =
  {x: float;
   y: float;
   z: float;}
val points : seq<Point>
If we try to evaluate points, we get an OutOfMemoryException:
points |> Seq.toList
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.FSharp.Collections.FSharpList`1.Cons(T head, FSharpList`1 tail)
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source)
at <StartupCode$FSI_0011>.$FSI_0011.main#()
Stopped due to error
This might be the same reason that groupBy fails, but I'm not sure. It does tell us that we have to use seq and yield to return the groups lazily. So we get this implementation:
let group groupBy points =
    let mutable lst = [ ]
    seq {
        for p in points do
            match lst with
            | [] -> lst <- [p]
            | p'::lst' when groupBy p' p -> lst <- p::lst
            | lst' ->
                lst <- [p]
                yield lst'
    }
val group : groupBy:('a -> 'a -> bool) -> points:seq<'a> -> seq<'a list>
It is not the most easily read code. It takes each point from the points sequence and prepends it to an accumulator list while the groupBy function is satisfied. If the groupBy function is not satisfied then a new accumulator list is generated and the old one is yielded. Note that the order of the accumulator list is reversed.
Testing the function:
for g in group (fun p' p -> p'.x = p.x ) points do
    printfn "%f %i" g.[0].x g.Length
Terminates nicely (after some time).
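Another implementation, with a bug fix (the last group is now yielded as well) and better formatting. Here groupBy is a key function rather than a comparison: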
let group (groupBy : 'a -> 'b when 'b : equality) points =
    let mutable lst = []
    seq {
        yield! seq {
            for p in points do
                match lst with
                | [] -> lst <- [ p ]
                | p' :: lst' when (groupBy p') = (groupBy p) -> lst <- p :: lst
                | lst' ->
                    lst <- [ p ]
                    yield (groupBy lst'.Head, lst')
        }
        yield (groupBy lst.Head, lst)
    }
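On why Seq.groupBy is a plausible culprit: the returned sequence is lazy, but as soon as it is iterated it consumes the whole input before the first group is available. The following is a minimal sketch of that behaviour, assuming a small stand-in source of 100 items; the names consumed, source and consumedForFirstGroup are illustrative and not taken from the answers.

```fsharp
// Counts how many elements have been pulled from the source.
let consumed = ref 0

// Hypothetical stand-in for the big input: 100 items whose key is i / 10.
let source =
    seq {
        for i in 1 .. 100 do
            consumed.Value <- consumed.Value + 1
            yield i / 10
    }

// Ask only for the first group.
let firstKey, firstItems = source |> Seq.groupBy id |> Seq.head

// By now every element of the source has been read, not just the first run.
let consumedForFirstGroup = consumed.Value
```

Because every group has to be held until the input ends, memory use grows with the size of the whole input, whereas a chunk-on-change function only needs the current run.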

It seems there is no one-line, purely functional solution, nor an already defined Seq function that I have overlooked.
Therefore, as an alternative, here is my own imperative solution. It is comparable to @Kevin's answer but fits my needs better. The ref cell contains:
The group key, which is calculated just once for each row
The current chunk list (it could be a seq to conform to Seq.groupBy), which contains the elements, in input order, for which f(x) equals the stored group key (requires equality).
let splitByChanged f xs =
    let acc = ref (None,[])
    seq {
        for x in xs do
            match !acc with
            | None,_ ->
                acc := Some (f x),[x]
            | Some key, chunk when key = f x ->
                acc := Some key, x::chunk
            | Some key, chunk ->
                let group = chunk |> Seq.toList |> List.rev
                yield key, group
                acc := Some (f x),[x]
        match !acc with
        | None,_ -> ()
        | Some key,chunk ->
            let group = chunk |> Seq.toList |> List.rev
            yield key, group
    }
points |> splitByChanged (fun point -> point.x)
The function has the following signature:
val splitByChanged :
f:('a -> 'b) -> xs:seq<'a> -> seq<'b * 'a list> when 'b : equality
Corrections and even better solutions are welcome.

Related

How do I create a dictionary in OCaml that associates to each element of the first list the number of its occurrences in the second list?

I have two lists, ["0","1"] and ["0","1","0"], and I want to get a list, [(0,2),(1,1)], - which associates to each element of the first list the number of its occurrences in the second list. I tried this:
let partialSolution =
List.map(
fun x ->
let subsubList = List.for_all (fun y -> (String.compare x y)=0) ) ("0"::("1"::("0"::[]))) in
(x, List.length ( List.filter ( fun z -> z = true) subsubList ) )
) ;;
, but it's not good: it gives me these errors:
# let partialSolution =
# List.map(
# fun x ->
# let subsubList = List.for_all (fun y -> (String.compare x y)=0) )
File "", line 4, characters 73-74:
Error: Syntax error: operator expected.
# ("0"::("1"::("0"::[]))) in
File "", line 4, characters 99-101:
Error: Syntax error
# (x, List.length ( List.filter ( fun z -> z = true) subsubList ) )
# )
File "", line 6, characters 0-1:
Error: Syntax error
# ;;
File "", line 6, characters 2-4:
Error: Syntax error
I would like to understand how I can fix this - I am a total newbie to OCaml.
You're a bit overzealous with the parentheses. The syntax error is caused by an extra closing parenthesis after (fun y -> ...).
But you'll still have a type error, since List.for_all returns a bool, true if all items satisfy the predicate and false otherwise. It seems you want to use List.map here instead.
You also don't need to surround every use of :: with parentheses. ("0"::"1"::"0"::[]) is fine, but you can also just reduce this to a simple list literal: ["0"; "1"; "0"]. Additionally, z = true is equivalent to z, though perhaps slightly less readable.
This compiles. I haven't checked whether it actually does what you want though:
let partialSolution =
  List.map
    begin fun x ->
      let subsubList =
        List.map
          (fun y -> String.compare x y = 0)
          ["0"; "1"; "0"]
      in
      (x, List.length (List.filter (fun z -> z) subsubList))
    end
Also, if you're using 4.03 or higher you can use String.equal, and if you're using 4.08 you can use Fun.id instead of the ad hoc lambda functions:
let partialSolution =
  List.map
    begin fun x ->
      let subsubList =
        List.map (String.equal x) ["0"; "1"; "0"]
      in
      (x, List.length (List.filter Fun.id subsubList))
    end
Or instead of dealing with an intermediate bool list, you could use List.fold_left to do a count directly:
let partialSolution =
  List.map
    begin fun x ->
      let count =
        List.fold_left
          (fun count y ->
            if String.compare x y = 0 then
              count + 1
            else
              count)
          0 ["0"; "1"; "0"]
      in
      (x, count)
    end

Counting the occurrences of characters in a string

I got the task to count the number of occurrences of each (lower case) character in a string. I am not allowed to use any library function. I came up with the following working solution.
occur :: String -> [(Char,Int)]
occur y = [ (x,count x y) | x<-['a'..'z'], count x y > 0]
I was trying at first:
occur2 :: String -> [(Char,Int)]
occur2 y = [ (x,z) | x<-['a'..'z'], z<- count x y, count x y > 0]
I defined the helper function count like this:
count :: Char -> String -> Int
count k str = length [n | n <- str, n == k]
Two questions:
Why is occur2 not working?
Is there any way to define occur without my aux function count?
occur2 isn't working because count x y is not a list, so it can't be used for a generator expression like in z <- count x y. Instead, use a let expression.
You can remove the count definition by inlining it.
occur :: String -> [(Char,Int)]
occur y = [ (x,z) | x <- ['a'..'z'], let z = length [n | n <- y, n == x], z > 0]
If you were to use libraries, a simple and efficient implementation would be to use a MultiSet.
import qualified Data.MultiSet as MS
occur :: String -> [(Char,Int)]
occur = MS.toAscOccurList . MS.fromList . filter (\c -> c >= 'a' && c <= 'z')

F# - Remove duplicate characters after first in string

What I am trying to do is to remove duplicates of a specific given char in a string but letting the first char to remain. I.e:
let myStr = "hi. my .name."
//a function that gets a string and the element to be removed in the string
someFunc myStr "."
where someFunc returns the string shown below:
"hi. my name"
It is easy to remove duplicates from a string, but is there a way to remove the duplicates but letting the first duplicated element remain in the string?
Here's one approach:
let keepFirst c s =
    Seq.mapFold (fun k c' -> (c', k||c<>c'), k&&c<>c') true s
    |> fst
    |> Seq.filter snd
    |> Seq.map fst
    |> Array.ofSeq
    |> System.String
let example = keepFirst '.' "hi. my .name."
let someFunc (str : string) c =
    let parts = str.Split([| c |])
    if Array.length parts > 1 then
        seq {
            yield Array.head parts
            yield string c
            yield! Array.tail parts
        }
        |> String.concat ""
    else
        str
Note that the character is given as char instead of a string.
let someFunc chr (str:string) =
    let rec loop (a: char list) b = function
        | [] -> a |> List.rev |> System.String.Concat
        | h::t when h = chr ->
            if b then loop a b t
            else loop (h::a) true t
        | h::t -> loop (h::a) b t
    loop [] false (str.ToCharArray() |> Array.toList)
Note that the character is given as char instead of a string.
Edit: Another way would be using regular expressions
open System.Text.RegularExpressions
let someOtherFunc c s =
    let pat = Regex.Escape(c)
    Regex.Replace(s, sprintf "(?<=%s.*)%s" pat pat, "")
Note that, in this case the character is given as string.
Edit 2:
let oneMoreFunc (c:char) (s:string) =
    let pred = (<>) c
    [ s |> Seq.takeWhile pred
      seq [c]
      s |> Seq.skipWhile pred |> Seq.filter pred ]
    |> Seq.concat
    |> System.String.Concat
When devising a function, think about gains from making its arguments generic. To pass state through the iteration, barring mutable variables, Seq.scan could be a weapon of choice. It folds into a tuple of new state and an option, then Seq.choose strips out the state and the unwanted elements.
In terms of functional building blocks, make it accept a predicate function 'a -> bool and let it return a function seq<'a> -> seq<'a>.
let filterDuplicates predicate =
    Seq.scan (fun (flag, _) x ->
        let p = predicate x
        flag || p, (if flag && p then None else Some x)) (false, None)
    >> Seq.choose snd
This can then easily be reused to do other things as well, such as keeping 0 together with the odd numbers.
filterDuplicates (fun i -> i % 2 = 0) [0..10]
// val it : seq<int> = seq [0; 1; 3; 5; ...]
Supplied with a call to the equality operator and fed into the constructor of System.String, you'll get near the signature you want, char -> seq<char> -> System.String.
let filterDuplicatesOfChar what s =
    System.String(Array.ofSeq <| filterDuplicates ((=) what) s)
filterDuplicatesOfChar '.' "hi. my .name."
// val it : string = "hi. my name"

Getting parse error while doing list comprehensions in haskell

I'm writing a function like this:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
    let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ]
    let t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v]
    head t : t1
What the first let does is: return a list like this: [2,0,0,0,1,0], from the second let and the following line, I want the output to be like this: [2,2,2,2,1,0]. But, it's not working and giving parse error!!
What am I doing wrong?
There are two kinds of lets: the "let/in" kind, which can appear anywhere an expression can, and the "let with no in" kind, which must appear in a comprehension or do block. Since your function definition isn't in either, its lets must use an in, for example:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
    let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ] in
    let t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v] in
    head t : t1
Alternately, since you can define multiple things in each let, you could consider:
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
    let t = [ r | (x,y) <- zip lst1 lst2, let r = if y == 0 && x == 2 then 2 else y ]
        t1 = [ w | (u,v) <- zip t (tail t), let w = if (u == 2) && (v == 0) then 2 else v]
    in head t : t1
The code has other problems, but this should get you to the point where it parses, at least.
With an expression formed by a let-binding, you generally need
let bindings
in
    expressions
(there are exceptions when monads are involved).
So, your code can be rewritten as follows (with simplification of r and w, which were not really necessary):
testing :: [Int] -> [Int] -> [Int]
testing lst1 lst2 =
    let t = [ if y == 0 && x == 2 then 2 else y | (x,y) <- zip lst1 lst2]
        t1 = [ if (v == 0) && (u == 2) then 2 else v | (u,v) <- zip t (tail t)]
    in
    head t : t1
(Note: I also switched u and v so that t1 and t have similar forms.)
Now given a list like [2,0,0,0,1,0], it appears that your code is trying to replace 0 with 2 if the previous element is 2 (from the pattern of your code), so that eventually, the desired output is [2,2,2,2,1,0].
To achieve this, it is not enough to use two list comprehensions or any fixed number of comprehensions. You need to somehow apply this process recursively (again and again). So instead of only doing 2 steps, we can write out one step, (and apply it repeatedly). Taking your t1 = ... line, the one step function can be:
testing' lst =
    let
        t1 = [ if (u == 2) && (v == 0) then 2 else v | (u,v) <- zip lst (tail lst)]
    in
    head lst : t1
Now this gives:
*Main> testing' [2,0,0,0,1,0]
[2,2,0,0,1,0]
, as expected.
The rest of the job is to apply testing' as many times as necessary. Here applying it (length lst) times should suffice. So, we can first write a helper function to apply another function n times on a parameter, as follows:
apply_n 0 f x = x
apply_n n f x = f $ apply_n (n - 1) f x
This gives you what you expected:
*Main> apply_n (length [2,0,0,0,1,0]) testing' [2,0,0,0,1,0]
[2,2,2,2,1,0]
Of course, you can wrap the above in one function like:
testing'' lst = apply_n (length lst) testing' lst
and in the end:
*Main> testing'' [2,0,0,0,1,0]
[2,2,2,2,1,0]
NOTE: this is not the only way to do the filling, see the fill2 function in my answer to another question for an example of achieving the same thing using a finite state machine.

How To Change List of Chars To String?

In F# I want to transform a list of chars into a string. Consider the following code:
let lChars = ['a';'b';'c']
If I simply do lChars.ToString, I get "['a';'b';'c']". I'm trying to get "abc". I realize I could probably do a List.reduce to get the effect I'm looking for but it seems like there should be some primitive built into the library to do this.
To give a little context to this, I'm doing some manipulation on individual characters in a string and when I'm done, I want to display the resulting string.
I've tried googling this and no joy that way. Do I need to just bite the bullet and build a List.reduce expression to do this transformation or is there some more elegant way to do this?
Have you tried
System.String.Concat(Array.ofList(lChars))
How many ways can you build a string in F#?
Here's another handful:
let chars = ['H';'e';'l';'l';'o';',';' ';'w';'o';'r';'l';'d';'!']
//Using an array builder
let hw1 = new string [|for c in chars -> c|]
//StringBuilder-Lisp-like approach
open System.Text
let hw2 =
    string (List.fold (fun (sb:StringBuilder) (c:char) -> sb.Append(c))
                      (new StringBuilder())
                      chars)
//Continuation passing style
let hw3 =
    let rec aux L k =
        match L with
        | [] -> k ""
        | h::t -> aux t (fun rest -> k (string h + rest) )
    aux chars id
Edit: timings may be interesting? I turned hw1..3 into functions and fed them a list of 500000 random characters:
hw1: 51ms
hw2: 16ms
hw3: er... long enough to grow a beard? I think it just ate all of my memory.
Didn't see this one here, so:
let stringFromCharList (cl : char list) =
    String.concat "" <| List.map string cl
"" is just an empty string.
FSI output:
> stringFromCharList ['a'..'d'];;
val it : string = "abcd"
EDIT:
Didn't like this syntax coming back to this so here's a more canonically functional one:
['a'..'z'] |> List.map string |> List.reduce (+)
// List.fold_left is the old ML-compatibility name; current FSharp.Core calls it List.fold
['a';'b';'c'] |> List.fold_left (fun acc c -> acc ^ (string c)) ""
Edited:
Here is yet another funny way to do your task:
type t =
    | N
    | S of string
    static member Zero
        with get() = N
    static member (+) (a: t, b: t) =
        match a,b with
        | S a, S b -> S (a+b)
        | N, _ -> b
        | _, N -> a
let string_of_t = function
    |N -> ""
    |S s -> s
let t_of_char c = S (string c)
['a'; 'b'; 'c'] |> List.map t_of_char |> List.sum |> string_of_t
Sadly, just extending System.String with 'Zero' member does not allow to use List.sum with strings.
Edited (answer to Juliet):
Yes, you are right, a left fold is slow. But I know an even slower right fold :) :
#r "FSharp.PowerPack"
List.fold_right (String.make 1 >> (^)) ['a';'b';'c'] ""
and of course there is fast and simple:
new System.String(List.to_array ['1';'2';'3'])
I used sprintf, which seems easier to me:
let t = "Not what you might expect"
let r = [ for i in "aeiou" -> i]
let q = [for a in t do if not (List.exists (fun x -> x=a) r) then yield a]
let rec m = function [] -> "" | h::t -> (sprintf "%c" h) + (m t)
printfn "%A" (m q)
The following solution works for me:
// Note: this is a list of one-character strings, not chars.
let charList = ["H";"E";"L";"L";"O"]
let rec buildString list =
    match list with
    | [] -> ""
    | head::tail -> head + (buildString tail)
let resultBuildString = buildString charList
[|'w'; 'i'; 'l'; 'l'|]
|> Array.map string
|> Array.reduce (+)
or as someone else posted:
System.String.Concat([|'w'; 'i'; 'l'; 'l'|])
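To compare a few of the approaches from this thread side by side, here is a minimal sketch that builds the same string three ways. The binding names (viaConstructor, viaConcat, viaBuilder) are made up for illustration. No timings are claimed here; which variant is fastest depends on the input size and the runtime.

```fsharp
let chars = [ 'a'; 'b'; 'c' ]

// System.String constructor that takes a char array.
let viaConstructor = System.String(Array.ofList chars)

// Convert each char to a string, then join with an empty separator.
let viaConcat = chars |> List.map string |> String.concat ""

// Append each char to a StringBuilder and read the result at the end.
let viaBuilder =
    let sb = System.Text.StringBuilder()
    for c in chars do
        sb.Append(c) |> ignore
    sb.ToString()
```

All three bindings should hold the same string; the constructor form is the shortest when a char array is cheap to obtain.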
