Code as follows:
DOMAINS
name = valentin; leonid; valery; andrew; michael
faculty = physics; history; biology; geography
instrument = sax; piano; bass; drums
student = student(name, instrument, faculty)
PREDICATES
nondeterm student(name, instrument, faculty)
CLAUSES
student(michael, sax, _).
student(_, piano, physics).
student(_, drums, _) :- student(leonid, _, _); student(michael, _, _); student(valentin, _, _); student(andrew, _, _).
student(_, _, geography) :- student(michael, _, _); student(valentin, _, _); student(valery, _, _); student(andrew, _, _).
student(michael, _, _) :- student(_, _, physics); student(_, _, geography); student(_, _, biology).
student(andrew, _, _) :- student(_, _, physics); student(_, _, history); student(_, _, geography).
student(andrew, _, _) :- student(_, sax, _); student(_, bass, _); student(_, drums, _).
student(valery, _, _) :- student(_, _, geography); student(_, _, biology); student(_, _, history).
student(_, drums, _) :- student(_, _, physics); student(_, _, geography); student(_, _, biology).
student(leonid, _, _) :- student(_, sax, _); student(_, piano, _); student(_, drums, _).
GOAL
student(valentin, Instrument, Faculty).
I use Visual Prolog 5.
This code compiles, but the compiler reports unbound variables. They are the _ variables in these clauses:
student(michael, sax, _).
student(_, piano, physics).
and the second _ variable, the one before the geography atom, in this clause:
student(_, _, geography) :- student(michael, _, _); student(valentin, _, _); student(valery, _, _); student(andrew, _, _).
Is it possible to make this code work?
I am trying to group an RDD using groupBy. Most of the docs advise against groupBy because of how it groups keys internally. Is there another way to achieve this? I cannot use reduceByKey because I am not doing a reduction here.
Example:
Entry - long id, string name;
JavaRDD<Entry> entries = rdd.groupBy(Entry::getId)
.flatmap(x -> someOp(x))
.values()
.filter()
aggregateByKey [Pair]
Works like the aggregate function except the aggregation is applied to the values with the same key. Also unlike the aggregate function the initial value is not applied to the second reduce.
Listing Variants
def aggregateByKey[U](zeroValue: U)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
def aggregateByKey[U](zeroValue: U, numPartitions: Int)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
def aggregateByKey[U](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): RDD[(K, U)]
Example :
val pairRDD = sc.parallelize(List( ("cat",2), ("cat", 5), ("mouse", 4),("cat", 12), ("dog", 12), ("mouse", 2)), 2)
// lets have a look at what is in the partitions
def myfunc(index: Int, iter: Iterator[(String, Int)]) : Iterator[String] = {
iter.map(x => "[partID:" + index + ", val: " + x + "]")
}
pairRDD.mapPartitionsWithIndex(myfunc).collect
res2: Array[String] = Array([partID:0, val: (cat,2)], [partID:0, val: (cat,5)], [partID:0, val: (mouse,4)], [partID:1, val: (cat,12)], [partID:1, val: (dog,12)], [partID:1, val: (mouse,2)])
pairRDD.aggregateByKey(0)(math.max(_, _), _ + _).collect
res3: Array[(String, Int)] = Array((dog,12), (cat,17), (mouse,6))
pairRDD.aggregateByKey(100)(math.max(_, _), _ + _).collect
res4: Array[(String, Int)] = Array((dog,100), (cat,200), (mouse,200))
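The two results above follow from the zero value being applied once per key per partition, and not again when the partials are combined. Here is a minimal plain-Rust sketch that simulates this, with no Spark involved. The function name aggregate_by_key and the fixed max/sum operators are assumptions made for illustration, and the partition layout is copied from the res2 output.

```rust
use std::collections::BTreeMap;

/// Simulates aggregateByKey(zero)(math.max(_, _), _ + _) over explicit partitions.
/// Hypothetical helper for illustration only; not part of any Spark API.
fn aggregate_by_key(partitions: &[Vec<(&str, i32)>], zero: i32) -> BTreeMap<String, i32> {
    let mut combined: BTreeMap<String, i32> = BTreeMap::new();
    for partition in partitions {
        // seqOp step: inside one partition, each key starts from `zero`
        // and keeps the maximum of `zero` and its values.
        let mut local: BTreeMap<&str, i32> = BTreeMap::new();
        for &(key, value) in partition {
            let acc = local.entry(key).or_insert(zero);
            *acc = (*acc).max(value);
        }
        // combOp step: per-partition partials are summed; `zero` is not applied again.
        for (key, partial) in local {
            combined
                .entry(key.to_string())
                .and_modify(|total| *total += partial)
                .or_insert(partial);
        }
    }
    combined
}

fn main() {
    // Same layout as the two partitions shown in res2.
    let partitions = vec![
        vec![("cat", 2), ("cat", 5), ("mouse", 4)],
        vec![("cat", 12), ("dog", 12), ("mouse", 2)],
    ];
    // zero = 0:   cat = max(2,5) + 12 = 17, dog = 12, mouse = 4 + 2 = 6
    println!("{:?}", aggregate_by_key(&partitions, 0));
    // zero = 100: every partial is 100, so cat = 200, dog = 100, mouse = 200
    println!("{:?}", aggregate_by_key(&partitions, 100));
}
```

With a zero value of 100, every per-partition maximum is 100, so keys present in both partitions end up at 200 and dog, which appears in only one, stays at 100.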
The following sequential Merge Sort returns the result very quickly :-
def mergeSort(xs: List[Int]): List[Int] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) xs
else {
val (xs1, ys1) = xs.splitAt(mid)
merge(mergeSort(xs1), mergeSort(ys1))
}
}
val newList = (1 to 10000).toList.reverse
mergeSort(newList)
However, when I try to parallelize it using Futures, it times out :-
def mergeSort(xs: List[Int]): List[Int] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) xs
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = Future{mergeSort(xs1)}
val sortedList2 = Future{mergeSort(ys1)}
merge(Await.result(sortedList1,5 seconds), Await.result(sortedList2,5 seconds))
}
}
val newList = (1 to 10000).toList.reverse
mergeSort(newList)
I get a Timeout exception. I understand that this is probably because this code spawns log2 10000 threads which adds a lot of delay as the Execution context Threadpool may not have that many threads.
1.) How do I exploit the inherent parallelism in merge sort and parallelize this code ?
2.) For what use cases are Futures useful and when should they be avoided ?
Edit 1 : Refactored code based on the feedback I've gotten so far :-
def mergeSort(xs: List[Int]): Future[List[Int]] = {
@tailrec
def merge(xs: List[Int], ys: List[Int], acc: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => acc.reverse ::: ys
case (_, Nil) => acc.reverse ::: xs
case (x :: xs1, y :: ys1) => if (x <= y) merge(xs1, ys, x :: acc) else merge(xs, ys1, y :: acc)
}
val mid = xs.length / 2
if (mid <= 0) Future {
xs
}
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = mergeSort(xs1)
val sortedList2 = mergeSort(ys1)
for (s1 <- sortedList1; s2 <- sortedList2) yield merge(s1, s2, List())
}
}
Usually when using Futures, you should a) await as little as possible and prefer to work within Futures, and b) pay attention to which execution context you are using.
As an example of a), here's how you could change this:
def mergeSort(xs: List[Int]): Future[List[Int]] = {
def merge(xs: List[Int], ys: List[Int]): List[Int] = (xs, ys) match {
case (Nil, _) => ys
case (_, Nil) => xs
case (x :: xs1, y :: ys1) => if (x <= y) x :: merge(xs1, ys) else y :: merge(xs, ys1)
}
val mid = xs.length / 2
if (mid <= 0) Future(xs)
else {
val (xs1, ys1) = xs.splitAt(mid)
val sortedList1 = mergeSort(xs1)
val sortedList2 = mergeSort(ys1)
for (s1 <- sortedList1; s2 <- sortedList2) yield merge(s1, s2)
}
}
val newList = (1 to 10000).toList.reverse
Await.result(mergeSort(newList), 5 seconds)
However there's still a ton of overhead here. Typically you would only parallelize significantly-sized chunks of work to avoid being dominated by overhead, which in this case would probably mean falling back to a single-threaded version when recursion reaches a list below some constant size.
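To illustrate the cutoff idea, here is a rough sketch in Rust rather than Scala, using scoped threads from the standard library instead of Futures. The threshold SEQUENTIAL_CUTOFF is an arbitrary assumption and would need tuning; all function names are made up for this example.

```rust
use std::thread;

// Hypothetical threshold: below this size, spawning a thread costs more than it saves.
const SEQUENTIAL_CUTOFF: usize = 1_000;

/// Merges two sorted slices into one sorted vector (iterative, so no deep recursion).
fn merge(left: &[i32], right: &[i32]) -> Vec<i32> {
    let mut out = Vec::with_capacity(left.len() + right.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            out.push(left[i]);
            i += 1;
        } else {
            out.push(right[j]);
            j += 1;
        }
    }
    out.extend_from_slice(&left[i..]);
    out.extend_from_slice(&right[j..]);
    out
}

/// Plain single-threaded merge sort, used for small inputs.
fn merge_sort(xs: &[i32]) -> Vec<i32> {
    if xs.len() <= 1 {
        return xs.to_vec();
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    merge(&merge_sort(left), &merge_sort(right))
}

/// Sorts the two halves in parallel, but only while the input is large enough.
fn par_merge_sort(xs: &[i32]) -> Vec<i32> {
    if xs.len() <= SEQUENTIAL_CUTOFF {
        return merge_sort(xs);
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    let (sorted_left, sorted_right) = thread::scope(|scope| {
        // One half goes to a new thread; the current thread sorts the other half.
        let handle = scope.spawn(|| par_merge_sort(left));
        let sorted_right = par_merge_sort(right);
        (handle.join().expect("sorting thread panicked"), sorted_right)
    });
    merge(&sorted_left, &sorted_right)
}

fn main() {
    let input: Vec<i32> = (1..=10_000).rev().collect();
    let sorted = par_merge_sort(&input);
    println!("first five: {:?}", &sorted[..5]);
}
```

With 10000 elements and this cutoff, only a handful of threads are spawned instead of one task per sublist, which is the point of the fallback.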
I have a big Excel file, which I read with the Excel type provider in F#.
The rows should be grouped by some column. Processing crashes with an OutOfMemoryException. I am not sure whether the Seq.groupBy call or the Excel type provider is to blame.
To simplify it I use 3D Point here as a row.
type Point = { x : float; y: float; z: float; }
let points = seq {
for x in 1 .. 1000 do
for y in 1 .. 1000 do
for z in 1 .. 1000 ->
{x = float x; y = float y; z = float z}
}
let groups = points |> Seq.groupBy (fun point -> point.x)
The rows are already ordered by the grouping column, e.g. 10 points with x = 10, then 20 points with x = 20, and so on. Instead of grouping them, I just need to split the rows into chunks whenever the column value changes. Is there a way to enumerate the sequence just once and get a sequence of rows split, not grouped, by some column value or some f(row) value?
If the rows are already ordered then this chunkify function will return a seq<'a list>. Each list will contain all the points with the same x value.
let chunkify pred s = seq {
    let values = ref []
    for x in s do
        match !values with
        | h::t ->
            if pred h x then
                values := x::!values
            else
                yield !values
                values := [x]
        | [] -> values := [x]
    yield !values
}
let chunked = points |> chunkify (fun x y -> x.x = y.x)
Here chunked has a type of
seq<Point list>
Another solution, along the same lines as Kevin's
module Seq =
    let chunkBy f src =
        seq {
            let chunk = ResizeArray()
            let mutable key = Unchecked.defaultof<_>
            for x in src do
                let newKey = f x
                if (chunk.Count <> 0) && (newKey <> key) then
                    yield chunk.ToArray()
                    chunk.Clear()
                key <- newKey
                chunk.Add(x)
            // emit the final chunk, which the loop alone never yields
            if chunk.Count <> 0 then
                yield chunk.ToArray()
        }
// returns 2 arrays, each with 1000 elements
points |> Seq.chunkBy (fun pt -> pt.y) |> Seq.take 2
Here's a purely functional approach, which is surely slower, and much harder to understand.
module Seq =
let chunkByFold f src =
src
|> Seq.scan (fun (chunk, (key, carry)) x ->
let chunk = defaultArg carry chunk
let newKey = f x
if List.isEmpty chunk then [x], (newKey, None)
elif newKey = key then x :: chunk, (key, None)
else chunk, (newKey, Some([x]))) ([], (Unchecked.defaultof<_>, None))
|> Seq.filter (snd >> snd >> Option.isSome)
|> Seq.map fst
Let's start with the input:
let count = 1000
type Point = { x : float; y: float; z: float; }
let points = seq {
for x in 1 .. count do
for y in 1 .. count do
for z in 1 .. count ->
{x = float x; y = float y; z = float z}
}
val count : int = 1000
type Point =
{x: float;
y: float;
z: float;}
val points : seq<Point>
If we try to evaluate points, we get an OutOfMemoryException:
points |> Seq.toList
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.FSharp.Collections.FSharpList`1.Cons(T head, FSharpList`1 tail)
at Microsoft.FSharp.Collections.SeqModule.ToList[T](IEnumerable`1 source)
at <StartupCode$FSI_0011>.$FSI_0011.main@()
Stopped due to error
This might be the same reason that groupBy fails, but I'm not sure. Either way, it tells us that we have to use seq and yield to return the groups lazily. So we get this implementation:
let group groupBy points =
    let mutable lst = [ ]
    seq { for p in points do match lst with | [] -> lst <- [p] | p'::lst' when groupBy p' p -> lst <- p::lst | lst' -> lst <- [p]; yield lst' }
val group : groupBy:('a -> 'a -> bool) -> points:seq<'a> -> seq<'a list>
It is not the most easily read code. It takes each point from the points sequence and prepends it to an accumulator list while the groupBy function is satisfied. If the groupBy function is not satisfied then a new accumulator list is generated and the old one is yielded. Note that the order of the accumulator list is reversed.
Testing the function:
for g in group (fun p' p -> p'.x = p.x ) points do
printfn "%f %i" g.[0].x g.Length
Terminates nicely (after some time).
Another implementation, with a bug fix (the last group is now yielded too) and better formatting:
let group (groupBy : 'a -> 'b when 'b : equality) points =
    let mutable lst = []
    seq {
        yield! seq {
            for p in points do
                match lst with
                | [] -> lst <- [ p ]
                | p' :: lst' when (groupBy p') = (groupBy p) -> lst <- p :: lst
                | lst' ->
                    lst <- [ p ]
                    yield (groupBy lst'.Head, lst')
        }
        yield (groupBy lst.Head, lst)
    }
It seems there is no one-line, purely functional solution, nor an already defined Seq function that I have overlooked.
Therefore, as an alternative, here is my own imperative solution. It is comparable to @Kevin's answer but fits my needs better. The ref cell contains:
The group key, which is stored when a chunk starts rather than recomputed for every element of the chunk.
The current chunk list (it could be a seq to conform to Seq.groupBy), which holds the elements for which f(x) equals the stored group key (requires equality); each chunk is yielded in input order.
let splitByChanged f xs =
    let acc = ref (None,[])
    seq {
        for x in xs do
            match !acc with
            | None,_ ->
                acc := Some (f x),[x]
            | Some key, chunk when key = f x ->
                acc := Some key, x::chunk
            | Some key, chunk ->
                let group = chunk |> Seq.toList |> List.rev
                yield key, group
                acc := Some (f x),[x]
        match !acc with
        | None,_ -> ()
        | Some key,chunk ->
            let group = chunk |> Seq.toList |> List.rev
            yield key, group
    }
points |> splitByChanged (fun point -> point.x)
The function has the following signature:
val splitByChanged :
f:('a -> 'b) -> xs:seq<'a> -> seq<'b * 'a list> when 'b : equality
Corrections and even better solutions are welcome.
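For comparison, the same single-pass idea can be sketched outside F#. The Rust version below is only an illustration, with invented names (ChunkBy, split_by_changed, key_of). It reads the source once, keeps only the current chunk in memory, and yields each (key, chunk) pair as soon as the key changes.

```rust
use std::iter::Peekable;

/// Lazy adapter that splits an already ordered source into runs of equal keys.
/// Hypothetical type for illustration only.
struct ChunkBy<I: Iterator, F> {
    source: Peekable<I>,
    key_of: F,
}

impl<I, K, F> Iterator for ChunkBy<I, F>
where
    I: Iterator,
    K: PartialEq,
    F: Fn(&I::Item) -> K,
{
    type Item = (K, Vec<I::Item>);

    fn next(&mut self) -> Option<Self::Item> {
        // The first element of a run fixes the key for the whole chunk.
        let first = self.source.next()?;
        let key_of = &self.key_of;
        let key = key_of(&first);
        let mut chunk = vec![first];
        // Consume following elements only while their key matches.
        while let Some(item) = self.source.next_if(|candidate| key_of(candidate) == key) {
            chunk.push(item);
        }
        Some((key, chunk))
    }
}

/// Wraps any iterable in the adapter above.
fn split_by_changed<I, K, F>(source: I, key_of: F) -> ChunkBy<I::IntoIter, F>
where
    I: IntoIterator,
    K: PartialEq,
    F: Fn(&I::Item) -> K,
{
    ChunkBy { source: source.into_iter().peekable(), key_of }
}

fn main() {
    // Rows are (x, label) pairs, already ordered by x.
    let rows = vec![(10, "a"), (10, "b"), (20, "c"), (30, "d"), (30, "e")];
    for (key, chunk) in split_by_changed(rows, |row| row.0) {
        println!("{} -> {} rows", key, chunk.len());
    }
}
```

As in the F# versions, equal keys that are not adjacent end up in separate chunks, so this only matches groupBy when the input is ordered by the key.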
I'm trying to write a function that uses references and destructively updates a sorted linked list while inserting a value.
My code is as follows:
Control.Print.printDepth := 100;
datatype 'a rlist = Empty | RCons of 'a * (('a rlist) ref);
fun insert(comp: (('a * 'a) -> bool), toInsert: 'a, lst: (('a rlist) ref)): unit =
let
val r = ref Empty;
fun insert' comp toInsert lst =
case !lst of
Empty => r := (RCons(toInsert, ref Empty))
| RCons(x, L) => if comp(toInsert, x) then r := (RCons(toInsert, lst))
else ((insert(comp,toInsert,L)) ; (r := (RCons(x, L))));
in
insert' comp toInsert lst ; lst := !r
end;
val nodes = ref (RCons(1, ref (RCons(2, ref (RCons(3, ref (RCons(5, ref Empty))))))));
val _ = insert((fn (x,y) => if x <= y then true else false), 4, nodes);
!nodes;
!nodes returns
val it = RCons (1,ref (RCons (2,ref (RCons (3,ref (RCons (4,%1)) as %1)))))
: int rlist
when it should return
val it = RCons (1,ref (RCons (2,ref (RCons (3,ref (RCons (4, ref (RCons(5, ref Empty))))))))
: int rlist
It means that your code is buggy, and has returned a cyclic list, where the tail of ref(RCons(4, ...)) is actually the same ref(RCons(4, ...)) again.
Remarks:
You don't need to pass comp and toInsert to the inner function, they are already in scope.
if C then true else false is the same as writing just C.
In SML, you typically use comparison functions of type t * t -> order, and they are predefined in the library, see e.g. Int.compare.
About 70% of the parentheses in your code are redundant.
You don't normally want to use such a data structure in ML.
If you absolutely have to, here is how I would write the code:
datatype 'a rlist' = Empty | Cons of 'a * 'a rlist
withtype 'a rlist = 'a rlist' ref
fun insert compare x l =
case !l of
Empty => l := Cons(x, ref Empty)
| Cons(y, l') =>
case compare(x, y) of
LESS => l := Cons(x, ref(!l))
| EQUAL => () (* or whatever else you want to do in this case *)
| GREATER => insert compare x l'
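For comparison only, here is a rough sketch of the same destructive sorted insert in Rust, where each tail has exactly one owner, so the aliasing that produced the cycle cannot occur. The type and function names are invented for this example. It follows the <= rule from the question, so duplicates are inserted rather than skipped.

```rust
/// A singly linked list of ints; each node owns its tail.
type List = Option<Box<Node>>;

struct Node {
    value: i32,
    next: List,
}

/// Inserts `value` in place, before the first element that is >= `value`.
fn insert(list: &mut List, value: i32) {
    let mut cursor = list;
    // Walk forward while the current node is strictly smaller than the new value.
    while cursor.as_ref().map_or(false, |node| node.value < value) {
        cursor = &mut cursor.as_mut().unwrap().next;
    }
    // Detach the remaining tail and re-attach it behind the new node.
    let rest = cursor.take();
    *cursor = Some(Box::new(Node { value, next: rest }));
}

/// Builds a list from a slice that is assumed to be sorted already.
fn from_sorted(values: &[i32]) -> List {
    let mut list = None;
    for &v in values.iter().rev() {
        list = Some(Box::new(Node { value: v, next: list }));
    }
    list
}

/// Copies the list contents into a vector for inspection.
fn to_vec(list: &List) -> Vec<i32> {
    let mut out = Vec::new();
    let mut cursor = list;
    while let Some(node) = cursor {
        out.push(node.value);
        cursor = &node.next;
    }
    out
}

fn main() {
    let mut nodes = from_sorted(&[1, 2, 3, 5]);
    insert(&mut nodes, 4);
    println!("{:?}", to_vec(&nodes)); // [1, 2, 3, 4, 5]
}
```

The take call plays the role of ref(!l) in the SML version: the new node gets its own cell holding the old contents, instead of pointing back at the cell being overwritten.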
Changed
| RCons(x, L) => if comp(toInsert, x) then r := (RCons(toInsert, lst))
to
| RCons(x, L) => if comp(toInsert, x) then r := (RCons(toInsert, ref(!lst)))
I am trying to implement a tree fold in Rust. My first attempt compiles and runs as expected.
pub enum Tree<T> {
Leaf,
Node(Box<Tree<T>>, T, Box<Tree<T>>)
}
impl<T, U: Copy> Tree<T> {
fn fold(self, f: |l: U, x: T, r: U| -> U, acc: U) -> U {
match self {
Leaf => acc,
Node(box l, x, box r) => {
let l = l.fold(|l,x,r| {f(l,x,r)}, acc);
let r = r.fold(|l,x,r| {f(l,x,r)}, acc);
f(l, x, r)
}
}
}
}
fn main() {
let tl = Node(box Leaf, 1i, box Leaf);
let tr = Node(box Leaf, 2i, box Leaf);
let t = Node(box tl, 3i, box tr);
println!("size(t) == {}", t.fold(|l,_,r|{l + 1i + r}, 0))
}
However, when I try to move the implementation of size into the impl block to make it a method:
pub enum Tree<T> {
Leaf,
Node(Box<Tree<T>>, T, Box<Tree<T>>)
}
impl<T, U: Copy> Tree<T> {
fn fold(self, f: |l: U, x: T, r: U| -> U, acc: U) -> U {
match self {
Leaf => acc,
Node(box l, x, box r) => {
let l = l.fold(|l,x,r| {f(l,x,r)}, acc);
let r = r.fold(|l,x,r| {f(l,x,r)}, acc);
f(l, x, r)
}
}
}
fn size(self) -> uint {
self.fold(|l, _, r| {l + 1u + r}, 0u)
}
}
fn main() {
let tl = Node(box Leaf, 1i, box Leaf);
let tr = Node(box Leaf, 2i, box Leaf);
let t = Node(box tl, 3i, box tr);
println!("size(t) == {}", t.size())
}
I get the following error in the rust playpen.
<anon>:28:31: 28:39 error: cannot determine a type for this expression: unconstrained type
<anon>:28 println!("size(t) == {}", t.size())
^~~~~~~~
note: in expansion of format_args!
<std macros>:2:23: 2:77 note: expansion site
<std macros>:1:1: 3:2 note: in expansion of println!
<anon>:28:5: 29:2 note: expansion site
error: aborting due to previous error
playpen: application terminated with error code 101
Program ended.
I was hoping someone could shed some light on what I'm doing wrong and how to fix it.
There is a crucial difference between your two versions.
In the first, you had this:
t.fold(|l,x,r|{l + x + r}, 0)
In the second, you have this (shown with self changed to t):
t.fold(|l, x, r| {l + 1 + r}, 0)
See the difference? l + 1 + r is not l + x + r.
(Since then, all cases have become l + 1 + r, for size, rather than l + x + r, for sum.)
After you’ve done that, you’ll run into issues because uint is not int. You’ll need to sort out your Ts and Us. Basically, you want l, x, r and 0 all to be of the same type, the T of earlier. This requires further constraints on T:
It must be Copy, to satisfy U.
You must be able to add a T to a T and get a T. This is the Add<T, T> trait (it lives in std::ops and is available without an import).
You must be able to get a zero of type T. That is the std::num::Zero trait and the Zero::zero() method.
You must be able to get a one of type T. That is the std::num::One trait and the One::one() method.
While we’re at it, U should probably be a generic on the fold function specifically rather than the impl block, though either will do.
In the end, we end up with this functioning code:
use std::num::Zero;
pub enum Tree<T> {
Leaf,
Node(Box<Tree<T>>, T, Box<Tree<T>>)
}
impl<T> Tree<T> {
fn fold<U: Copy>(self, f: |l: U, x: T, r: U| -> U, acc: U) -> U {
match self {
Leaf => acc,
Node(box l, x, box r) => {
let l = l.fold(|l, x, r| f(l, x, r), acc);
let r = r.fold(|l, x, r| f(l, x, r), acc);
f(l, x, r)
}
}
}
}
impl<T: Copy + Add<T, T> + Zero + One> Tree<T> {
fn size(self) -> T {
self.fold(|l: T, _: T, r: T| l + One::one() + r, Zero::zero())
}
}
fn main() {
let tl = Node(box Leaf, 1i, box Leaf);
let tr = Node(box Leaf, 2i, box Leaf);
let t = Node(box tl, 3i, box tr);
println!("size(t) == {}", t.size())
}
(Note how the curly braces around the contents of a closure aren’t necessary, too.)
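The code in this thread uses pre-1.0 syntax (box patterns, |..| closure types, uint, 1i literals) that current compilers reject. As a rough sketch of how the same fold and size might look in today's Rust: the choices below (borrowing the tree, passing the closure by reference, requiring Clone instead of Copy, returning usize from size) are my own assumptions, not part of the original answer.

```rust
pub enum Tree<T> {
    Leaf,
    Node(Box<Tree<T>>, T, Box<Tree<T>>),
}

impl<T> Tree<T> {
    /// Folds the tree bottom-up; `acc` is the value used for every leaf.
    /// U is generic on the method, as suggested in the answer.
    pub fn fold<U, F>(&self, f: &F, acc: U) -> U
    where
        U: Clone,
        F: Fn(U, &T, U) -> U,
    {
        match self {
            Tree::Leaf => acc,
            Tree::Node(left, x, right) => {
                let l = left.fold(f, acc.clone());
                let r = right.fold(f, acc);
                f(l, x, r)
            }
        }
    }

    /// Counts the nodes; the element type T is ignored, so no numeric bounds are needed.
    pub fn size(&self) -> usize {
        self.fold(&|l: usize, _: &T, r: usize| l + 1 + r, 0usize)
    }
}

fn main() {
    let tl = Tree::Node(Box::new(Tree::Leaf), 1, Box::new(Tree::Leaf));
    let tr = Tree::Node(Box::new(Tree::Leaf), 2, Box::new(Tree::Leaf));
    let t = Tree::Node(Box::new(tl), 3, Box::new(tr));
    println!("size(t) == {}", t.size());
    println!("sum(t) == {}", t.fold(&|l: i32, x: &i32, r: i32| l + *x + r, 0));
}
```

Because size fixes U to usize itself, the compiler has nothing left to infer at the call site, which sidesteps the "unconstrained type" error from the question.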