Is there a way to LIMIT per sub-iteration (not total)? - arangodb

I have a graph where type "petal" vertices "connect" to type "flower" vertices with edges.
Now, for every "flower" I only want to pull one "petal". They are all in one collection.
How exactly can I do that? It seems that the LIMIT statement works per transaction, not per iteration.
What I am trying is
FOR f in Botany
FILTER type=="flower"
FOR p in 1 INBOUND f GRAPH "BotanyGraph"
LIMIT 1
RETURN p
But all I am getting is 1 petal, total.
How can I achieve one petal off every flower?

Do you mean something like this? Wrapping the traversal in a subquery makes LIMIT apply to each flower separately:
FOR f IN Botany
  FILTER f.type == "flower"
  LET pp = (
    FOR p IN 1 INBOUND f GRAPH "BotanyGraph"
      LIMIT 1
      RETURN p
  )
  FOR p IN pp
    RETURN p
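To illustrate why the subquery changes the result, here is a minimal Python analogy (not AQL). The dictionary `petals_by_flower` and both function names are made up for this sketch. The first function limits the flattened stream, which is what the original query does; the second limits inside each flower, which is what the subquery does.

```python
from itertools import islice

# Hypothetical data: each flower maps to the petals connected to it.
petals_by_flower = {
    "rose": ["r1", "r2"],
    "tulip": ["t1"],
    "daisy": ["d1", "d2", "d3"],
}

def limit_total(graph, n=1):
    """Limit applied to the flattened stream: at most n petals overall."""
    flat = (p for petals in graph.values() for p in petals)
    return list(islice(flat, n))

def limit_per_flower(graph, n=1):
    """Limit applied inside each flower: at most n petals per flower."""
    return [p for petals in graph.values() for p in islice(petals, n)]

print(limit_total(petals_by_flower))       # one petal in total
print(limit_per_flower(petals_by_flower))  # one petal per flower
```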

Related

Rank elements based off two variables

I want to rank all the entities in a list based on two variables (both percentages). One of the variables is 'the bigger the better' (x) and the other is 'the smaller the better' (y). What is the best way to give each entity a score in order to rank them?
I tried doing x*(1-y) but as some of the y values are over 1, the negatives it created caused some errors.
Below is the data:
entity x y
a 0.953882755 0.926422663
b 0.757267676 0.926967001
c 1 1.01607838
d 0.89805254 1.008814817
e 0.672989727 0.932579014
f 0.643306278 0.924523932
g 0.621091809 0.935122957
h 0.56891321 0.918181342
i 0.563662125 0.924102288
j 0.579410248 0.946421415
k 0.781299906 1.040418561
l 0.490013047 0.920900829
m 0.475050754 0.932586282
n 0.505211144 0.972570665
o 0.566582462 1.009732948
p 0.610994363 1.031047605
q 0.686065983 1.060742126
r 0.47642017 0.983301498
s 0.463552006 0.976645044
t 0.551532341 1.025816246
u 0.478092524 1.012675037
v 0.645790431 1.084143812
w 0.390365014 1.189518019
Two ways: averaged ranking, or sorting by distance from the min and max.
Averaged ranking:
Use =RANK.AVG() on X and Y separately (X in descending order, Y in ascending order, so that rank 1 is best in both). Take the average of the two ranks, then rank again based on that average.
Sort by distance from min and max:
Enter '=(B2-MIN(B:B)) + (MAX(C:C)-C2)' and drag downwards. Then use =RANK.AVG() on the results. The larger the sum, the better, since it grows as X rises above its minimum and as Y falls below its maximum.
Hope that helps.
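For anyone doing this outside a spreadsheet, the two approaches can be sketched in plain Python. The function names are made up for this example, and `average_ranks` is only meant to mimic RANK.AVG's tie handling (tied values share the mean of their positions). The data is the first four rows from the question.

```python
def average_ranks(values, larger_is_better=True):
    """Rank 1 is best; tied values share the mean of the positions they occupy."""
    order = sorted(values, reverse=larger_is_better)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

def averaged_ranking(xs, ys):
    """Rank x (bigger is better) and y (smaller is better), average, rank again."""
    rx = average_ranks(xs, larger_is_better=True)
    ry = average_ranks(ys, larger_is_better=False)
    combined = [(a + b) / 2 for a, b in zip(rx, ry)]
    return average_ranks(combined, larger_is_better=False)

def distance_score(xs, ys):
    """(x - min x) + (max y - y): a larger score is better."""
    return [(x - min(xs)) + (max(ys) - y) for x, y in zip(xs, ys)]

# Rows a, b, c, d from the question.
xs = [0.953882755, 0.757267676, 1.0, 0.89805254]
ys = [0.926422663, 0.926967001, 1.01607838, 1.008814817]

print(averaged_ranking(xs, ys))
print(average_ranks(distance_score(xs, ys), larger_is_better=True))
```

The two methods do not have to agree: the first only uses the ordering of the values, while the second also uses how far apart they are.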

How to delete the element in a list which is duplicated for linear representation in python?

https://imgur.com/a/0lFwssy
I want to draw an evolution diagram like this, [1,2,3,4] is an annotation to the point:
1 :(x=1,y=2)
2 :(x=2,y=3)
3 :(x=3,y=5)
4 :(x=4,y=6)
The connection is like:
a = [1,1,2,3] *Starting Point
b = [2,4,4,4] *Ending Point
Point1 and point2 both connect to point4, but I don't want the connection from point1 to point4, because point1 evolved to point2 first.
So I want to get
https://imgur.com/a/asAUlHQ
c = [1,2,3]
d = [2,4,4]
I tried to use zip to write a for loop but it failed.
How to get c and d in python?
From what I understand it looks like you are looking for a minimum spanning tree for the graph where the edges are (a_i,b_i). You can do this as follows:
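import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
n = max(a + b) + 1  # labels start at 1, so the matrix needs max label + 1 rows
A = csr_matrix((np.ones(len(a)), (a, b)), shape=(n, n))  # unit weight per edge
c, d = minimum_spanning_tree(A).nonzero()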
Note that the minimum spanning tree is not unique.
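Because the spanning tree is not unique, it may keep the 1 to 4 edge and drop a different one. If the rule is really "drop a direct connection whenever a longer route between the same two points exists", that is a transitive reduction rather than a spanning tree. Here is a minimal pure-Python sketch of that idea, assuming the connections contain no cycles; the function name `drop_shortcut_edges` is made up for this example.

```python
def drop_shortcut_edges(a, b):
    """Keep edge (u, v) only if v cannot be reached from u by another route."""
    edges = list(zip(a, b))
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    def reachable_without_direct_edge(u, v):
        # Depth-first search from u that is not allowed to use the edge u -> v.
        stack = [w for w in successors.get(u, ()) if w != v]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == v:
                return True
            for w in successors.get(node, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    kept = [(u, v) for u, v in edges if not reachable_without_direct_edge(u, v)]
    return [u for u, _ in kept], [v for _, v in kept]

a = [1, 1, 2, 3]
b = [2, 4, 4, 4]
c, d = drop_shortcut_edges(a, b)
print(c, d)
```

With the lists from the question, the edge from 1 to 4 is dropped because 4 is also reachable via 2.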

Independence of random variables

let U = the number of trials needed to get the first head
and
let V = number of trials needed to get two heads in repeated tosses of a fair coin.
Are U and V independent random variables?
I would say they are dependent if
u = number of trials before first head appears
v = number of trials to get 2nd head after the event u has occurred
Please help me understand it!
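One way to check this is to compare P(U=u, V=v) with P(U=u)P(V=v) for a single pair of values. Below is a small sketch, assuming V counts the total number of tosses up to and including the second head. The helper names are made up. Four tosses are enough here because both events are decided within the first two tosses.

```python
from fractions import Fraction
from itertools import product

def head_positions(seq):
    """1-based positions of the heads in a toss sequence such as ('T', 'H', 'H')."""
    return [i + 1 for i, s in enumerate(seq) if s == "H"]

def prob(event, n_tosses=4):
    """Exact probability of an event that is decided within n_tosses fair tosses."""
    seqs = list(product("HT", repeat=n_tosses))
    hits = sum(1 for s in seqs if event(head_positions(s)))
    return Fraction(hits, len(seqs))

p_u2 = prob(lambda h: len(h) >= 1 and h[0] == 2)                 # P(U = 2)
p_v2 = prob(lambda h: len(h) >= 2 and h[1] == 2)                 # P(V = 2)
p_both = prob(lambda h: len(h) >= 2 and h[0] == 2 and h[1] == 2)  # P(U = 2, V = 2)

print(p_u2, p_v2, p_both, p_u2 * p_v2)
```

Under this reading V is always larger than U, so the joint probability of U = 2 and V = 2 is zero while the product of the marginals is not. One such pair is enough to show dependence.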

Coefficient of Variations?

I have a list of values that increase exponentially. I was asked to compute multiple coefficients of variation (CVs) from them. You might agree with me that a CV applies to the whole set of numbers, and that dividing the set into subgroups and calculating a CV for each subgroup seems unreasonable. Is there any statistical idea behind multiple CVs? If there is, how can a histogram be made from the CVs, that is, what would the bins of the histogram be? I appreciate the answers in advance.
I agree with you - it does not make sense to me to calculate multiple CVs for one dataset unless there's some inferential reason for doing so.
That being said, there might actually be a reason for considering sub-groups of a dataset. In the field of Statistics, context is everything. My first thought is to ask your colleague why they want you to proceed that way. Maybe there's a good reason, maybe they don't have as full a grasp of stats as you do; regardless, it should be an enlightening conversation to have.
If you do decide to go this route, here's some R code that might help (R is great - flexible, powerful, and free)
# first, simulating some fake data (100 values of measurement & group for 10 groups)
x <- rnorm(100, mean=10, sd=1)
group <- sample(LETTERS[1:10], 100, replace=TRUE)
# first few values of each
head(data.frame(x, group))
          x group
1 10.778480     F
2  9.274193     B
3  9.639143     G
4  9.080369     I
5 10.727895     D
6 10.850306     G
# this is the part you'd actually need...
# calculating the sd & avgs for each group
sds <- tapply(x, group, sd)
avgs <- tapply(x, group, mean)
# then the cv
cvs <- sds/avgs
cvs
         A          B          C          D          E          F          G          H          I          J
0.07859528 0.07570556 0.09370247 0.12552468 0.08897856 0.11044543 0.10947615 0.10323379 0.08908262 0.09729945
# and if you want a histogram, R makes it pretty easy
hist(cvs)

Find the cross node for number of nodes in ArangoDB?

I have a number of nodes connected through an intermediate node of another type, as in the picture. There can be multiple middle nodes. I need to find all the middle nodes for a given set of nodes and sort them by the number of links to my initial nodes. In my example, given A, B, C, D, it should return node E (4 links) followed by node F (3 links). Is this possible? If not, maybe it can be done using multiple requests? I was thinking about using the SHORTEST_PATH function, but it seems it can only find paths between nodes from the same collection?
Very nice question, it challenged the AQL part of my brain ;)
Good news: it is totally possible with only one query, using GRAPH_COMMON_NEIGHBORS and a bit of math.
Common neighbors reports, for each ordered pair of your selected vertices, the vertices that connect them (ordering matters: A-E-B is different from B-E-A). By combinatorics, a cross that links a of the selected vertices therefore shows up c = a*(a-1) times, where c is what the query counts as LENGTH(counter). Solving a^2 - a - c = 0 with the p/q formula gives a = 0.5 + SQRT(0.25 + c), the number of vertices in your set that are connected to that cross.
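As a quick sanity check of that arithmetic, here is a small Python sketch. The function name `connected_count` is made up for this example. It inverts c = a*(a-1) the same way the `connections` expression in the queries does.

```python
import math

def connected_count(c):
    """Recover a from c = a * (a - 1) using the positive root of a^2 - a - c = 0."""
    return 0.5 + math.sqrt(0.25 + c)

# A cross linked to 4 selected vertices appears in 4 * 3 = 12 ordered pairs,
# one linked to 3 selected vertices appears in 3 * 2 = 6 ordered pairs.
print(connected_count(12))
print(connected_count(6))
```

This matches the expected result for the example: 4 connections for E and 3 for F.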
If the type of vertex is encoded in an attribute of the vertex object
the resulting AQL looks like this:
FOR x IN (
  (
    LET nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    FOR n IN GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes)
      FOR f IN VALUES(n)
        FOR s IN VALUES(f)
          FOR candidate IN s
            FILTER candidate.type == "cross"
            COLLECT crosses = candidate._key INTO counter
            RETURN {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
SORT x.connections DESC
RETURN x
If you put the crosses in a different collection and filter by collection name, the query becomes even more efficient, because it does not need to open any vertices that are not of type cross at all.
FOR x IN (
  (
    LET nodes = ["nodes/A","nodes/B","nodes/C","nodes/D"]
    FOR n IN GRAPH_COMMON_NEIGHBORS("myGraph", nodes, nodes,
        {"vertexCollectionRestriction": "crosses"}, {"vertexCollectionRestriction": "crosses"})
      FOR f IN VALUES(n)
        FOR s IN VALUES(f)
          FOR candidate IN s
            COLLECT crosses = candidate._key INTO counter
            RETURN {crosses: crosses, connections: 0.5 + SQRT(0.25 + LENGTH(counter))}
  )
)
SORT x.connections DESC
RETURN x
Both queries will yield the result on your dataset:
[
{
"crosses": "E",
"connections": 4
},
{
"crosses": "F",
"connections": 3
}
]
