How can I convert this SQL statement to Slick in the most precise and optimal way?
SELECT t.*, COUNT(v.userId) AS vote FROM Talk t INNER JOIN Vote v ON t.id = v.talkId GROUP BY t.id
The v.talkId column is a foreign key to the id column of Talk.
Talk Model:
id
description
speaker_id
pledged_date
create_date_time
locked_date
is_approved
Vote Model:
user_id
talk_id
I tried this, but it throws the exception SlickException: Cannot select Path s2 in Ref s3:
val x = for {
  t <- models.slick.Talks
  v <- models.slick.Votes if t.id === v.talkId
} yield (t, Query(models.slick.Votes).filter(_.talkId === t.id).length)
val y = x.groupBy(_._1.id)
A formulation that matches the SQL is to yield the joined rows and group by the Talk row itself:
val x = (for {
  t <- models.slick.Talks
  v <- models.slick.Votes if t.id === v.talkId
} yield (t, v)).groupBy(_._1).map { case (t, tvs) => (t, tvs.map(_._2).length) }
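The error in the first attempt most likely arises because the nested Query in the yield, combined with the later groupBy(_._1.id), leaves the Talk columns unselectable. An alternative shape, sketched below assuming Slick 2/3-style lifted-embedding syntax (the Query(...) call above suggests an older API, so adapt as needed), aggregates the votes by their foreign key first and then joins the counts back:
// Sketch (assumes Slick 2/3 lifted embedding; table queries as in the question):
// aggregate votes per talkId, then join the counts back to Talks.
val votesPerTalk = models.slick.Votes
  .groupBy(_.talkId)
  .map { case (talkId, votes) => (talkId, votes.length) }

val talksWithVotes = models.slick.Talks
  .join(votesPerTalk).on(_.id === _._1)
  .map { case (talk, (_, count)) => (talk, count) }
Grouping only on the key keeps every selected column either grouped or aggregated, which is what SQL's GROUP BY requires.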
I'm trying to count adjacent edges by their collection names.
For example, I have a vertex collection 'User' with outbound edges in the collections ['visited', 'add_to_cart', 'purchased'].
For each user vertex, I'd like to count its adjacent edges by collection name.
So the final result would look like:
{
  user_id: "user_1",
  visit_count: 3,
  add_to_cart_count: 5,
  purchase_cnt: 1
}
I've tried the following query, but I doubt it performs well: it relies on per-edge conditional expressions, which I suspect hinder overall performance.
The query I tried:
FOR user IN User
  FOR v, e, p IN OUTBOUND user visited, add_to_cart, purchased
    COLLECT user_id = user.user_id
    AGGREGATE
      visit_count = SUM(SPLIT(e._id, '/')[0] == 'visited' ? 1 : 0),
      add_to_cart_count = SUM(SPLIT(e._id, '/')[0] == 'add_to_cart' ? 1 : 0),
      purchase_cnt = SUM(SPLIT(e._id, '/')[0] == 'purchased' ? 1 : 0)
    RETURN {
      user_id, visit_count, add_to_cart_count, purchase_cnt
    }
If it IS the best way, are there any index-related gains I could make use of?
Looking forward to your help :)
Thanks.
Thanks to Tobias from the ArangoDB community, I was able to make it about 30% faster:
LET vis = (FOR e IN visited COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, visit_count: n})
LET cart = (FOR e IN add_to_cart COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, add_to_cart_count: n})
LET purc = (FOR e IN purchased COLLECT user_id = e._from WITH COUNT INTO n RETURN {user_id, purchase_cnt: n})
FOR x IN UNION(vis, cart, purc)
  COLLECT user_id = x.user_id
  AGGREGATE visit_count = SUM(x.visit_count),
            add_to_cart_count = SUM(x.add_to_cart_count),
            purchase_cnt = SUM(x.purchase_cnt)
  RETURN {user_id, visit_count, add_to_cart_count, purchase_cnt}
The point he mentioned was to COLLECT over the edge collections directly!
I want to create unique edges between the document collections C1 and C3 (the edges live in C2).
The unique constraint is on id and kid.
I use the following AQL to create them, but I end up with more than one edge having the same id and kid.
How can I achieve this?
Sorry for my poor English :)
FOR i IN C1
  FILTER i.id != null AND i.id != ''
  LET exist = FIRST(
    FOR c IN C2
      FILTER i.id == c.id AND i.kid == c.kid
      LIMIT 1
      RETURN c
  )
  FILTER exist == null
  LET result = FIRST(
    FOR h IN C3
      FILTER i.kid == h.kid
      LIMIT 1
      RETURN h
  )
  INSERT {_from: i._id, _to: result._id, id: i.id, kid: i.kid} INTO C2
My English is not very good either ^)!
But I think I see where you made a mistake.
First, for your two collections, you can use this code:
LET data = [
  {"parent": {"ID": "YOU_MUST_WRITE_HERE_ID_C1"}, "child": {"KID": "YOU_MUST_WRITE_HERE_KID_C3"}},
  {"parent": {"ID": "YOU_MUST_WRITE_HERE_NEXT_ID_C1"}, "child": {"KID": "YOU_MUST_WRITE_HERE_NEXT_KID_C3"}}
]
FOR rel IN data
  LET parentId = FIRST(
    FOR c IN C1
      FILTER c.GUID == rel.parent.ID
      LIMIT 1
      RETURN c._id
  )
  LET childId = FIRST(
    FOR c IN C3
      FILTER c.GUID == rel.child.KID
      LIMIT 1
      RETURN c._id
  )
  FILTER parentId != null AND childId != null
  INSERT { _from: childId, _to: parentId } INTO C2
  RETURN NEW
I hope this helps you.
Second, why do you use the C2 collection in this fragment?
let exist = first(
for c in C2
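One thing worth adding to this thread: a read-then-insert AQL query like the one above is not an atomic uniqueness check, so two concurrent runs can both observe exist == null and both insert. The usual way to actually enforce the constraint is to create a unique index on the attributes ["id", "kid"] of the edge collection C2 (for example a persistent index with the unique option set), so the database itself rejects duplicate edges regardless of query interleaving.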
I am trying to understand treeAggregate, but there aren't enough examples online.
So does the following code merge the elements of each partition and then call makeSummary, doing the same for every partition in parallel (summing the results and summarizing them again)? And with depth set to, say, 5, is this repeated 5 times?
The result I want is to keep summarizing the arrays until I am left with a single one.
val summary = input.transform(rdd => {
  rdd.treeAggregate(initialSet)(addToSet, mergePartitionSets, 5)
  // this returns Array[Double], not an RDD, but still
})

val initialSet = Array.empty[Double]

def addToSet = (s: Array[Double], v: (Int, Array[Double])) => {
  val p = s ++ v._2
  val ret = makeSummary(p, 10000)
  ret
}

val mergePartitionSets = (p1: Array[Double], p2: Array[Double]) => {
  val p = p1 ++ p2
  val ret = makeSummary(p, 10000)
  ret
}

// makeSummary selects half of the points of p randomly
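For what it's worth, depth is not a repetition count: the first function (addToSet here) folds elements within each partition, and the second (mergePartitionSets) merges the per-partition results up a tree whose height is at most depth (the default is 2), which spreads the merge work across executors instead of doing it all on the driver. A minimal, self-contained sketch of those semantics with a plain sum over made-up data (the app name and data are hypothetical):
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical local setup, just to make the sketch runnable.
val sc = new SparkContext(new SparkConf().setAppName("treeAggregateDemo").setMaster("local[4]"))
val data = sc.parallelize(1 to 1000, 8) // 8 partitions

val total = data.treeAggregate(0L)(
  (acc, x) => acc + x, // seqOp: folds elements within a single partition
  (a, b) => a + b,     // combOp: merges partial results, tree-wise, across executors
  2                    // depth: maximum height of the merge tree, not a repeat count
)
// total == 500500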
I'm sure I'm simply facing a mental block with the functional model of Slick 3, but I cannot discern how to transactionally sequence an optional dependent database step. Specifically, I have a table with an optional (nullable) foreign key, and I want it set to the ID of the inserted dependent record (if any, else null). That is, roughly:
if (x is non-null)
  start transaction
  id = insert x
  insert y(x = id)
  commit
else
  start transaction
  insert y(x = null)
  commit
Of course, I'd rather not have the big if around the choice. Dependencies without the Option[] seem (relatively) straightforward, but the option is throwing me.
Precise example code (sans imports) follows. In this example, the question is how to save both x (an A) and y (a B) in the same transaction, whether or not y is None. Saving Y itself seems straightforward enough, since every related C has a non-optional B reference, but handling the optional reference in A is unclear (to me).
object test {
  implicit val db = Database.forURL("jdbc:h2:mem:DataTableTypesTest;DB_CLOSE_DELAY=-1", driver = "org.h2.Driver")

  /* Data model */
  case class A(id: Long, b: Option[Long], s: String)
  class As(tag: Tag) extends Table[A](tag, "As") {
    def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
    def b = column[Option[Long]]("B")
    def s = column[String]("S")
    def * = (id, b, s) <> (A.tupled, A.unapply)
  }
  val as = TableQuery[As]

  case class B(id: Long, s: String)
  class Bs(tag: Tag) extends Table[B](tag, "Bs") {
    def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
    def s = column[String]("S")
    def * = (id, s) <> (B.tupled, B.unapply)
  }
  val bs = TableQuery[Bs]

  case class C(id: Long, b: Long, s: String)
  class Cs(tag: Tag) extends Table[C](tag, "Cs") {
    def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
    def b = column[Long]("B")
    def s = column[String]("S")
    def * = (id, b, s) <> (C.tupled, C.unapply)
  }
  val cs = TableQuery[Cs]

  /* Object model */
  case class X(id: Long, s: String, y: Option[Y])
  case class Y(id: Long, s: String, z: Set[Z])
  case class Z(id: Long, s: String)

  /* Mappers */
  def xToA(x: X, bId: Option[Long]): A = A(x.id, bId, x.s)
  def yToB(y: Y): B = B(y.id, y.s)
  def zToC(z: Z, bId: Long): C = C(z.id, bId, z.s)

  /* Given */
  val example1 = X(0, "X1", Some(Y(0, "Y1", Set(Z(0, "Z11"), Z(0, "Z12")))))
  val example2 = X(0, "X2", Some(Y(0, "Y2", Set())))
  val example3 = X(0, "X3", None)

  Await.result(db.run((as.schema ++ bs.schema ++ cs.schema).create), 10.seconds)

  val examples = Seq(example1, example2, example3)
  for (example <- examples) {
    val saveY = for { y <- example.y } yield (for {
      id <- (bs returning bs.map(_.id)) += yToB(y)
      _ <- cs ++= y.z.map(zToC(_, id))
    } yield id).transactionally
    if (saveY.isDefined) Await.result(db.run(saveY.get), 10.seconds)
  }

  println(Await.result(db.run(as.result), 10.seconds))
  println(Await.result(db.run(bs.result), 10.seconds))
  println(Await.result(db.run(cs.result), 10.seconds))
}
This is fairly straightforward; just use the monadic nature of DBIO:
// Input B value; this is your `x` in the question.
val x: Option[B] = ???

// Assume `y` is fully initialized with a `None` `b` value.
val y: A = ???

// DBIO wrapping the newly inserted ID, if `x` is set.
val maybeInsertX: DBIO[Option[Long]] = x match {
  case Some(xToInsert) =>
    // Insert and return the new ID.
    val newId: DBIO[Long] = bs.returning(bs.map(_.id)) += xToInsert
    // Map to the expected Option.
    newId.map(Some(_))
  case None =>
    // No x means no ID.
    DBIO.successful(None)
}

// Now perform your insert, copying in the newly generated ID.
val insertA: DBIO[Int] = maybeInsertX.flatMap(bIdOption =>
  as += y.copy(b = bIdOption)
)

// Run transactionally.
db.run(insertA.transactionally)
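As a side note, the match can be collapsed if your Slick version provides DBIO.sequenceOption (present in recent Slick 3.x releases), which lifts an Option[DBIO[T]] into a DBIO[Option[T]]; a sketch under that assumption, reusing the names above:
// Sketch, assuming DBIO.sequenceOption is available in your Slick version.
val maybeInsertX2: DBIO[Option[Long]] =
  DBIO.sequenceOption(x.map(b => bs.returning(bs.map(_.id)) += b))

val insertA2: DBIO[Int] =
  maybeInsertX2.flatMap(bIdOption => as += y.copy(b = bIdOption))

db.run(insertA2.transactionally)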
I want to know whether, in an RDD such as RDD = {"0", "1", "2", ... "99999"}, I can find out which machine in the cluster stores a given element (e.g. 100)?
And then, in a shuffle, can I aggregate some data and send it to a particular machine? I know that the partitioning of an RDD is transparent to users, but could I use some mechanism, such as key/value pairs, to achieve that?
Generally speaking the answer is no, or at least not with the RDD API. If you can express your logic using graphs, then you can try the message-based APIs in GraphX or Giraph. If not, then using Akka directly instead of Spark could be a better choice.
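For a sense of what that message-based style looks like, here is a minimal, hypothetical GraphX sketch using aggregateMessages (the graph data is made up for illustration): each edge sends a message toward its destination vertex and messages are merged per vertex, so data movement is expressed explicitly rather than by targeting machines.
import org.apache.spark.graphx.{Edge, Graph}

// Toy graph: three vertices, three directed edges (hypothetical data).
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(2L, 3L, 1)))
val graph = Graph(vertices, edges)

// Count incoming edges per vertex by sending "1" along every edge
// and summing the messages at each destination.
val inDegrees = graph.aggregateMessages[Int](
  ctx => ctx.sendToDst(1), // sendMsg: runs once per edge
  _ + _                    // mergeMsg: combines messages per vertex
)
inDegrees.collect.foreach(println) // e.g. (2,1), (3,2)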
Still, there are some workarounds, but I wouldn't expect high performance. Let's start with some dummy data:
import org.apache.spark.rdd.RDD

val toPairs = (s: Range) => s.map(_.toChar.toString)

val rdd: RDD[(Int, String)] = sc.parallelize(Seq(
  (0, toPairs(97 to 100)),  // a-d
  (1, toPairs(101 to 107)), // e-k
  (2, toPairs(108 to 115))  // l-s
)).flatMap { case (i, vs) => vs.map(v => (i, v)) }
and partition it using a custom partitioner:
import org.apache.spark.Partitioner

class IdentityPartitioner(n: Int) extends Partitioner {
  def numPartitions: Int = n
  def getPartition(key: Any): Int = key.asInstanceOf[Int]
}

val partitioner = new IdentityPartitioner(4)
val parts = rdd.partitionBy(partitioner)
Now we have an RDD with 4 partitions, including one empty:
parts.mapPartitionsWithIndex((i, iter) => Iterator((i, iter.size))).collect
// Array[(Int, Int)] = Array((0,4), (1,7), (2,8), (3,0))
The simplest thing you can do is to leverage the partitioning itself. First, a dummy function and a helper:
// Dummy map function
def transform(s: String) =
  Map("e" -> "x", "k" -> "y", "l" -> "z").withDefault(identity)(s)

// Map String to partition
def address(curr: Int, s: String) = {
  val m = Map("x" -> 3, "y" -> 3, "z" -> 3).withDefault(x => curr)
  (m(s), s)
}
and "send" data:
val transformed: RDD[(Int, String)] = parts
  // Emit pairs (partition, string)
  .map { case (i, s) => address(i, transform(s)) }
  // Repartition
  .partitionBy(partitioner)

transformed
  .mapPartitionsWithIndex((i, iter) => Iterator((i, iter.size)))
  .collect
// Array[(Int, Int)] = Array((0,4), (1,5), (2,7), (3,3))
Another approach is to collect "messages":
val tmp = parts.mapValues(s => transform(s))

val messages: collection.Map[Int, Iterable[String]] = tmp
  .flatMap { case (i, s) =>
    val target = address(i, s)
    if (target != (i, s)) Seq(target) else Seq()
  }
  .groupByKey
  .collectAsMap
create a broadcast variable:
val messagesBD = sc.broadcast(messages)
and use it to send messages:
val transformed = tmp
  .filter { case (i, s) => address(i, s) == (i, s) }
  .mapPartitionsWithIndex((i, iter) => {
    val combined = iter ++ messagesBD.value.getOrElse(i, Seq())
    combined.map((i, _))
  }, true)

transformed
  .mapPartitionsWithIndex((i, iter) => Iterator((i, iter.size)))
  .collect
// Array[(Int, Int)] = Array((0,4), (1,5), (2,7), (3,3))
Note the following line:
val combined = iter ++ messagesBD.value.getOrElse(i, Seq())
messagesBD.value is the entire broadcast data, which is actually a Map[Int, Iterable[String]], but the getOrElse call then returns only the data that was mapped to partition i (if available).
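One caveat to this second approach: collectAsMap materializes all of the messages on the driver, and the broadcast then ships the entire map to every executor. So it only works when the message volume is small enough to fit comfortably in driver and executor memory; for larger volumes, the repartition-based variant above scales better.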