How to represent a flat graph in Java Spark using Dataset

How to represent a flat graph in Java Spark using Dataset - apache-spark

Given a Dataset with two columns such as:
FatherNode, ChildNode
A, B
A, C
B, D
B, F
C, G
H, I
H, L
M, I
M, B
F, J
Z, K
I want to build a new Dataset that represents a Flat Graph, i.e. a representation of the all possible destinations of each node, so, in this example, the new Dataset should be:
Node, PossibleDestinations
A, [B, C, D, F, J]
B, [D, F, J]
C, [G]
H, [I, L]
M, [I, B, D, F, J]
F, [J]
Z, [K]
For example, A could reach B, C, D, F and J: doesn't matter the path.
My simple code (implemented) is:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.array;
import static org.apache.spark.sql.functions.col;
public class FlatGraph {
public static void main(String[] args) {
SparkSession spark = SparkSession.builder().appName("Flat Graph").getOrCreate();
Dataset<Row> csvData = spark.read().option("header", "true").csv("data.csv").cache();
csvData.show();
Dataset<Row> flatGraph = [...]; // What operations could be performed here?
flatGraph.show();
spark.stop();
}
}

Related

Mixing non-overlapping everygrams in order

I'm looking for a method to generate sequences from every-grams up to length n that match an input sentence:
Given a sentence: "Break this into sequences" and n = 3
I want to create the sequences:
("Break", "this", "into", "sequences")
("Break", "this", "into sequences")
("Break", "this into", "sequences")
("Break this", "into", "sequences")
("Break this", "into sequences")
("Break", "this into sequences")
("Break this into", "sequences")
nltk has the everygram package, but I'm not quite sure how I'd use it toward my goal.
I've tried adapting the problem to focus on characters for simplicity, i.e.,
It may be helpful to consider these as character-grams (and, as rici suggested, spacing out characters [with and without spacing shown for clarity]):
abcd goes to:
(a, b, c, d) (a, b, c, d)
(a, b, c d) (a, b, cd)
(a, b c, d) (a, bc, d)
(a b, c, d) (ab, c, d)
(a b, c d) (ab, cd)
(a, b c d) (a, bcd)
(a b c, d) (abc, d)
For clarity, this should generalize for any length, given a n as the maximum-sized n-gram; so, for abcde with n=3 we'd have:
(a, b, c, d, e) (a, b, c, d, e)
(a, b, c, d e) (a, b, c, de)
(a, b, c d, e) (a, b, cd, e)
(a, b c, d e) (a, bc, d, e)
(a b, c, d, e) (ab, c, d, e)
(a, b c, d e) (a, bc, de)
(a b, c, d e) (ab, c, de)
(a b, c d, e) (ab, cd, e)
(a, b, c d e) (a, b, cde)
(a, b c d, e) (a, bcd, e)
(a b c, d, e) (abc, d, e)
(a b, c d e) (ab, cde)
(a b c, d e) (abc, de)
I'm thinking I may need to generate a grammar, something like:
exp ::= ABC, d | a, BCD
ABC ::= AB, c | A, BC
BCD ::= BC, d | b, CD
AB ::= A, b | a, B
BC ::= B, c | b, C
CD ::= C, d | c, D
A ::= a
B ::= b
C ::= c
D ::= d
and find all parses of the sentence, but certainly there must be a procedural way to go about this?

Maybe it would be helpful to space your example out a bit:
(a , b , c , d)
(a , b , c d)
(a , b c , d)
(a b , c , d)
(a b , c d)
(a , b c d)
(a b c , d)
(a b c d) # added for completeness
Looking at that, it's evident that what differentiates the rows is the presence or absence of commas, a typical binary choice. There are three places a comma could go, so there are eight possibilities, corresponding to the eight binary numbers of three digits.
The easiest way to list these possibilities is to count from 0 0 0 to 1 1 1.
For your modified question, in which there is a maximum length of a part, one simple recursive solution in Python is:
def kgram(k, v):
'Generate all partitions of v with parts no larger than k'
def helper(sfx, m):
if m == 0: yield sfx
else:
for i in range(1, min(k, m)+1):
yield from helper([v[m-i:m]]+sfx, m-i)
yield from helper([], len(v))
Here's a quick test:
>>> for p in gram(3, 'one two three four five'.split()): print(p)
...
[['one'], ['two'], ['three'], ['four'], ['five']]
[['one', 'two'], ['three'], ['four'], ['five']]
[['one'], ['two', 'three'], ['four'], ['five']]
[['one', 'two', 'three'], ['four'], ['five']]
[['one'], ['two'], ['three', 'four'], ['five']]
[['one', 'two'], ['three', 'four'], ['five']]
[['one'], ['two', 'three', 'four'], ['five']]
[['one'], ['two'], ['three'], ['four', 'five']]
[['one', 'two'], ['three'], ['four', 'five']]
[['one'], ['two', 'three'], ['four', 'five']]
[['one', 'two', 'three'], ['four', 'five']]
[['one'], ['two'], ['three', 'four', 'five']]
[['one', 'two'], ['three', 'four', 'five']]

how to plott several curves after while loop in function

def a(t, A, B, C, At, Bt):
while:
calculations
return t, A, B, C, At, Bt
print(def(t, A, B, C, At, Bt))
I return several numpy.arrays. and wanna plot them in the form
B, = plt.plot(t, B)
C, = plt.plot(t, C)
plt.legend(handles=[ B, C, A],
labels=[ 'B', 'C', 'A'])

Use the object-oriented interface to matplotlib:
from matplotlib import pyplot
t, A, B, C, At, Bt = a(t, A, B, C, At, Bt)
fig, ax = pyplot.subplots()
for array, label in zip([A, B, C], ['A', 'B', 'C']):
ax.plot(t, array, label=label)
ax.legend()

How to make X number of random sets of 3 from pandas column?

I have a dataframe column that looks like this (roughly 200 rows):
col1
a
b
c
d
e
f
I want to create a new dataframe with one column and 15 sets of 3 random combinations of the items in the pandas column. for example:
new_df
combinations:
(a,b,c)
(a,c,d)
(a,d,c)
(b,a,d)
(d,a,c)
(a,d,f)
(e,a,f)
(a,f,e)
(b,e,f)
(f,b,e)
(c,b,e)
(b,e,a)
(a,e,f)
(e,f,a)
Currently the code I have creates a combination of every possible combination and runs out of memory when I try to append the results to another dataframe:
import pandas as pd
from itertools import permutations
df = pd.read_csv('')
combo = df['col1'].tolist()
perm = permutations(combo,3)
combinations = pd.DataFrame(columns=['combinations'])
list_ = []
for i in list(perm):
combinations['combinations'] = i
list_.append(i)
How do I stop the sets of random combinations to stop at any X number of set or in this case 15 combinations of 3?

The reason your code runs out of memory is specifically because of the part where you call list(perm). doing this will generate EVERY permutation possible. So when you do
for i in list(perm):
...
You're telling python to create a list of all permutations, then try to iterate over that list. Instead, if you iterate over the generator that calling permutations creates (e.g. for i in perm: instead of for i in list(perm):), you can simply iterate over each permutation without storing them all into memory at once. So if you break your for loop after it loops 15 times, you can achieve your desired result.
However, since we're using itertools, we can vastly simplify that logic using islice to do the work of getting the first 15 without explicitly writing a for-loop and breaking at the 15th iteration:
import pandas as pd
from itertools import permutations, islice
# df = pd.read_csv('')
# combo = df['col1'].tolist()
combo = list("abcefg")
perm_generator = permutations(combo,3)
# get first 15 permutations without running the generator
first_15_perms = islice(perm_generator, 15)
# Store the first 15 permutations into a Series object
series_perms = pd.Series(list(first_15_perms), name="permutations")
print(series_perms)
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)
Name: permutations, dtype: object
If you want this as a single column in a DataFrame you can use the to_frame() method:
df_perms = series_perms.to_frame()
print(df_perms)
permutations
0 (a, b, c)
1 (a, b, e)
2 (a, b, f)
3 (a, b, g)
4 (a, c, b)
5 (a, c, e)
6 (a, c, f)
7 (a, c, g)
8 (a, e, b)
9 (a, e, c)
10 (a, e, f)
11 (a, e, g)
12 (a, f, b)
13 (a, f, c)
14 (a, f, e)

While not quite as elegant as the previous answers, If you truly want to create a random sampling of values, not just the first you could also do something along the lines of the following:
def newFrame(df: pd.DataFrame, srccol: int, cmbs: int, rows: int) -> pd.DataFrame:
il = df[srccol].values.tolist()
nw_df = pd.DataFrame()
data = []
for r in range(rows):
rd =[]
for ri in range(cmbs):
rd.append(rnd.choice(il))
data.append(tuple(rd))
nw_df['Combinations'] = data
return nw_df
Which when passed a a df as shown in your example in the form of:
new_df = newFrame(df, 0, 3, 15)
Produces:
Combinations
0 (a, f, e)
1 (a, d, f)
2 (b, c, d)
3 (a, a, d)
4 (f, b, c)
5 (e, b, b)
6 (e, e, d)
7 (c, f, f)
8 (f, e, b)
9 (d, c, e)

cannot output accumilated list of tau prolog answers

I am using the Tau Prolog library in my project and it has been working fine till I tried this.
I am attempting to output multiple schedules and their corresponding extra hours (the context is not relevent here).
Here is the output in the console when I use session.answers( x => console.log( pl.format_answer(x) ) );
// Query the goal
session.query("getSchedules(123,Schedule,ExtraHours).");
// Show answers
session.answers(x => console.log(pl.format_answer(x)));
CONSOLE:
Schedule = [a, b, c, d, e, f, m, g], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, k], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, x], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, j, g], ExtraHours = -2 ;
Schedule = [a, b, c, d, e, f, k, x], ExtraHours = 0 ;
Schedule = [a, b, c, d, e, f, k, g], ExtraHours = -2 ;
Schedule = [a, b, c, d, e, f, x, g], ExtraHours = -2 ;
false.
I am trying to get all these schedules in 1 list but whenever I try to accumulate the results in a global variable, the resulting list only has the 1st schedule. Does anyone know how I can get a list with all the schedules?

No 3-element counterexample found

I am learning Alloy and was trying to make it find two binary relations r and s, such that s equals the transitive closure of r, and such that s is not equal to r. I suppose I can ask Alloy to do this by executing the following:
sig V
{
r : V,
s : V
}
assert F { not ( some (s-r) and s=^r ) }
check F
Now Alloy 4.2 cannot find a counterexample, although an easy 3-element structure would be the one where r = {(V0,V1), (V1,V2)} and s = r + {(V0,V2)} obviously.
Can someone explain what is going on?

Translating your requirement directly:
// find two binary relations r and s such that
// s equals the transitive closure of r ands is not equal to r
run {some r, s: univ -> univ | s != r and s = ^r}
This gives an instance as expected. The mistake in your spec is that your declarations restrict the relations to be functions; should instead have
sig V {
r: set V,
s: set V
}
or
sig V {r, s: set V}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to represent a flat graph in Java Spark using Dataset - apache-spark

Related

Mixing non-overlapping everygrams in order

how to plott several curves after while loop in function

How to make X number of random sets of 3 from pandas column?

cannot output accumilated list of tau prolog answers

No 3-element counterexample found

Categories

Resources