D-separation Python implementation

I am completely new to the field of Bayesian networks. For my project, I need to check all the possible d-separation conditions existing in a 7-node DAG, and for that I am looking for some good Python code.
My knowledge of programming is limited (a bit of numerical analysis and data structures), but I understand d-separation, e-separation and the other DAG concepts quite well.
It would be very helpful if someone could point out where to look for such specific code. Please note that I want Python code that checks for all the conditional independences following from d-separation in a 7-node DAG.
I would prefer an algorithm that checks whether each path is blocked, rather than one built on the semi-graphoid axioms.
I don't know exactly where I should look or whom I should ask, so any help would be greatly appreciated.

I guess you understand that what you are asking for is a very large list, even if we only consider d-separation between two variables (conditioned on a set of nodes): for 7 nodes there are 7·6 = 42 ordered pairs, and for each pair 2^5 = 32 possible conditioning sets among the remaining 5 nodes, i.e. 1344 tests.
Anyway, you can do that quite easily with pyAgrum (https://agrum.org):
import itertools
import pyAgrum as gum

# create a BN
bn = gum.fastBN("A->B<-C->D->E->F;B->E<-G")

# helper: iterate over every subset of an iterable
def powerset(iterable):
    """
    powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)
    """
    xs = list(iterable)
    # note: we return an iterator rather than a list
    return itertools.chain.from_iterable(
        itertools.combinations(xs, n) for n in range(len(xs) + 1))

# print the independence model by testing every d-separation
for i in bn.names():
    for j in bn.names() - {i}:
        for k in powerset(bn.names() - {i, j}):
            if bn.isIndependent(i, j, k):
                print(f"{i} indep {j} given {k}")
And the result (in a notebook) is the list of all such independence statements.
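Since you asked for an algorithm based on blocked paths rather than on the semi-graphoid axioms: networkx also ships a d-separation test, so here is a minimal sketch of the same enumeration with it (assuming networkx 2.4+, where the function is called d_separated; in networkx 3.3+ it was renamed is_d_separator):

import itertools
import networkx as nx

# the same 7-node DAG as in the pyAgrum example
G = nx.DiGraph([("A", "B"), ("C", "B"), ("C", "D"), ("D", "E"),
                ("E", "F"), ("B", "E"), ("G", "E")])

nodes = set(G.nodes)
for x, y in itertools.combinations(sorted(nodes), 2):
    rest = sorted(nodes - {x, y})
    for r in range(len(rest) + 1):
        for z in itertools.combinations(rest, r):
            if nx.d_separated(G, {x}, {y}, set(z)):
                print(f"{x} indep {y} given {set(z)}")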

Random Index from a Tensor (Sampling with Replacement from a Tensor)

I'm trying to manipulate individual weights of different neural nets to see how their performance degrades. As part of these experiments, I'm required to sample randomly from their weight tensors, which I've come to understand as sampling with replacement (in the statistical sense). However, since it's high-dimensional, I've been stumped by how to do this in a fair manner. Here are the approaches and research I've put into considering this problem:
This was previously implemented by selecting a random layer and then selecting a random weight in that layer (ignore the implementation of picking a random weight). Since layers are different sizes, we discovered that weights were being sampled unevenly.
I considered what would happen if we sampled according to the numpy.shape of the tensor; however, I realize now that this encounters the same problem as above.
Consider what happens to a rank 2 tensor like this:
[[*, *, *],
 [*, *, *, *]]
Selecting a row randomly and then a value from that row results in an unfair selection. This method could work if you're able to assert that this scenario never occurs, but it's far from a general solution.
Note that this possible duplicate actually implements it in this fashion.
I found people suggesting flattening the tensor and using numpy.random.choice to select randomly from a 1D array. That's a simple solution, except I have no idea how to invert the flattened index back into a position in the original shape. Further, flattening millions of weights would be a somewhat slow implementation.
I found someone discussing tf.random.multinomial here, but I don't understand enough of it to know whether it's applicable or not.
I ran into this paper about reservoir sampling, but again, it went over my head.
I found another paper which specifically discusses tensors and sampling techniques, but it went even further over my head.
A teammate found this other paper which talks about random sampling from a tensor, but it's only for rank 3 tensors.
Any help understanding how to do this? I'm working in Python with Keras, but I'll take an algorithm in any form that it exists. Thank you in advance.
Before I forget to document the solution we arrived at, I'll talk about the two different paths I see for implementing this:
Use a total ordering on scalar elements of the tensor. This is effectively enumerating your elements, i.e. flattening them. However, you can do this while maintaining the original shape. Consider this sketch (in Python):
from typing import Tuple

def count_scalars(tensor) -> int:
    """Count the scalar leaves of a (possibly ragged) nested list."""
    if not isinstance(tensor, list):
        return 1
    return sum(count_scalars(t) for t in tensor)

def sample_tensor(tensor, chosen_index: int) -> Tuple[int, ...]:
    """Maps a chosen random number to its index in the given tensor.

    Args:
        tensor: A ragged-array n-tensor (nested lists of scalars).
        chosen_index: An integer in [0, count_scalars(tensor)).

    Returns:
        The tuple of indices that accesses this element in the tensor.

    NOTE: treat this as a sketch, not production code.
    """
    remaining = chosen_index
    for i, sub in enumerate(tensor):
        if isinstance(sub, list):
            size = count_scalars(sub)
            if remaining >= size:
                remaining -= size              # skip this whole sub-list
            else:
                return (i,) + sample_tensor(sub, remaining)
        else:
            if remaining == 0:
                return (i,)                    # this scalar is the chosen one
            remaining -= 1
The idea is to count down until you reach your element, with bookkeeping for indices.
We need to make crucial assumptions here. 1) Every list eventually bottoms out in scalars. 2) We additionally assume that a list containing lists does not also contain scalars at the same level. (Stop and convince yourself of (2).)
One critical note: we cannot know the number of scalars in a given list without walking it, unless the list consists homogeneously of scalars (where len suffices). That is what the count_scalars helper above does, and it is the expensive part: the counting is repeated at every level of the descent.
This algorithm has some consequences:
It's the fastest within its style of approach. If you want to write a function f: [0, total_elems) -> Tuple[int], you must know the number of scalar elements preceding your target along the total ordering of the tensor. This is effectively bound at Theta(l), where l is the number of lists in the tensor (since we can call len on a list of scalars).
It's still slow: far slower than sampling tensors that have a regular, defined shape.
It begs the question: can we do better? See the next solution.
Use a probability distribution in conjunction with numpy.random.choice. The idea here is that if we know ahead of time how the scalars are distributed, we can sample fairly at each level while descending the tensor. The hard problem is building this distribution.
I won't write full pseudocode for this, but I'll lay out some objectives (a small sketch follows this list):
This needs to be called only once to build the data structure.
The algorithm needs to combine iterative and recursive techniques to a) build distributions for sibling lists and b) build distributions for descendants, respectively.
The algorithm will need to map indices to a probability distribution respective to sibling lists (note the assumptions discussed above). This does require knowing the number of elements in an arbitrary sub-tensor.
At lower levels where lists contain only scalars, we can simplify by just storing the number of elements in said list (as opposed to storing probabilities of selecting scalars randomly from a 1D array).
You will likely need 2-3 functions: one that utilizes the probability distribution to return an index, a function that builds the distribution object, and possibly a function that just counts elements to help build the distribution.
Sampling is also fast: O(n), where n is the rank (depth) of the tensor. I'm convinced this is the fastest possible algorithm, but I lack the time to try to prove it.
You might choose to store the distribution as an ordered dictionary that maps a probability to either another dictionary or the number of elements in a 1D array. I think this might be the most sensible structure.
Note that (2) is truly the same as (1), but we pre-compute knowledge about the densities of the tensor.
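Here is a minimal sketch of that second approach, under the same nested-list assumptions as above (the names build_counts and sample_index are mine, not from any library); it stores raw element counts rather than probabilities, which induces the same distribution:

import random

def build_counts(tensor):
    """Precompute the number of scalar leaves under every sub-list.

    Leaf lists (containing only scalars) are stored as a plain int,
    as suggested above; interior nodes keep their children plus a total.
    """
    if all(not isinstance(t, list) for t in tensor):
        return len(tensor)
    children = [build_counts(t) for t in tensor]
    total = sum(c if isinstance(c, int) else c["total"] for c in children)
    return {"children": children, "total": total}

def sample_index(node):
    """Draw a uniformly random multi-index using the precomputed counts."""
    if isinstance(node, int):                  # leaf list of scalars
        return (random.randrange(node),)
    r = random.randrange(node["total"])
    for i, child in enumerate(node["children"]):
        size = child if isinstance(child, int) else child["total"]
        if r < size:
            return (i,) + sample_index(child)
        r -= size

ragged = [[1, 2, 3], [4, 5, 6, 7]]
counts = build_counts(ragged)                  # built once
print(sample_index(counts))                    # e.g. (1, 2)

The build_counts cost is paid once; each subsequent sample_index call costs only the depth of the tensor, matching the O(n)-in-rank claim above.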
I hope this helps.
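One closing aside: the weight tensors a Keras model actually exposes (via get_weights()) are regular numpy arrays rather than ragged nested lists, and for that common case the flatten-and-invert step the question worries about is built into numpy (np.unravel_index inverts the flattening without copying the data). A sketch, weighting each layer by its element count so sampling stays fair across layers:

import numpy as np

# hypothetical stand-ins for model.get_weights()
layers = [np.random.randn(3, 4), np.random.randn(10, 10), np.random.randn(5)]

sizes = np.array([w.size for w in layers])
layer = np.random.choice(len(layers), p=sizes / sizes.sum())  # fair across layers
flat = np.random.randint(layers[layer].size)                  # fair within the layer
idx = np.unravel_index(flat, layers[layer].shape)             # invert the flattening
print(layer, idx, layers[layer][idx])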

Association Rules using Python with data in sentence form

I would like to use Python to calculate association rules from a text field in a dataset such as the one below:
ID fav_breakfast
1 I like to eat eggs and bacon for breakfast.
2 Bacon, bacon, bacon!
3 I love pancakes, but only if they have extra syrup!
4 Waffles and bacon. Eggs too!
5 Eggs, potatoes, and pancakes. No meat for me!
Please note that Orange 2.7 is not an option, since I am using the current version of Python (3.6), so Orange 3 is fair game; however, I cannot seem to figure out how this module works with data in this format.
The first step, in my mind, would be to convert the above into a sparse matrix, such as the (truncated) one shown below:
ID  bacon  breakfast  eggs  pancakes  syrup  waffles  ...
1   1      1          1     0         0      0
2   1      0          0     0         0      0
3   0      0          0     1         1      0
4   1      0          1     0         0      1
5   0      0          1     1         0      0
Next, we would want to remove stop words (i.e. I, to, and, for, etc.), fix upper/lower-case issues, strip numbers and punctuation, and account for words such as potato, potatoes, potatos, etc. (with lemmatization).
Once this sparse matrix is in place, the next step would be to calculate association rules amongst all of the words/strings in the sparse matrix. I have done this in R using the arules package; however, I haven't been able to identify an "arules equivalent" for Python.
The final solution that I envision would include a list of left-hand- and right-hand-side arguments along with the support, confidence, and lift of the rules, sorted in descending order with the highest-lift rules at the top and the lowest-lift rules at the bottom (again, easy enough to obtain in R with arules).
In addition, I would like the ability to restrict the right-hand side to "bacon", again with the support, confidence, and lift of those rules sorted in descending order of lift.
Using Orange3-Associate will likely be the route to go here; however, I cannot find any good examples on the web. Thanks for your help in advance!
Is this what you had in mind? Orange should be able to pass outputs from one add-on and use them as inputs in another.
[EDIT]
I was able to reconstruct the case in code, but it is far less sexy:
import numpy as np
from orangecontrib.text import Corpus, preprocess, vectorization
from orangecontrib.associate.fpgrowth import frequent_itemsets, association_rules, rules_stats

# load a sample corpus and preprocess it into a bag-of-words representation
data = Corpus.from_file("deerwester")
p = preprocess.Preprocessor()
preproc_corpus = p(data)
v = vectorization.bagofwords.BoWPreprocessTransform(p, "Count", preproc_corpus)

# stand-in boolean item matrix (30 transactions x 50 items)
N = 30
X = np.random.random((N, 50)) > .9

itemsets = dict(frequent_itemsets(X, .1))   # minimum support 10%
rules = association_rules(itemsets, .6)     # minimum confidence 60%
list(rules_stats(rules, itemsets, N))       # support, confidence, lift, ...
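If you specifically want arules-style output in Python (antecedents and consequents with support, confidence, and lift, sortable and filterable on the right-hand side), the mlxtend package is probably the closest equivalent I know of. A minimal sketch against the sample data, with naive tokenization and a hand-rolled stop-word list standing in for proper preprocessing:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

sentences = [
    "I like to eat eggs and bacon for breakfast.",
    "Bacon, bacon, bacon!",
    "I love pancakes, but only if they have extra syrup!",
    "Waffles and bacon. Eggs too!",
    "Eggs, potatoes, and pancakes. No meat for me!",
]
stop = {"i", "to", "and", "for", "but", "only", "if", "they",
        "have", "no", "me", "too"}
# one "transaction" (set of distinct words) per row
transactions = [sorted({w.strip(".,!").lower() for w in s.split()} - stop)
                for s in sentences]

te = TransactionEncoder()
X = te.fit(transactions).transform(transactions)
df = pd.DataFrame(X, columns=te.columns_)      # the 0/1 document-term matrix

itemsets = apriori(df, min_support=0.2, use_colnames=True)
rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
rules = rules.sort_values("lift", ascending=False)

# restrict the right-hand side to "bacon"
bacon = rules[rules["consequents"].apply(lambda c: c == frozenset({"bacon"}))]
print(bacon[["antecedents", "consequents", "support", "confidence", "lift"]])

Lemmatization (potato/potatoes) would slot in where the transactions are built, e.g. with NLTK's WordNetLemmatizer or spacy.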

How does Monoid assist in parallel training?

The Readme of HLearn states that the Monoid typeclass is used for parallel batch training. I've seen trainMonoid mentioned in several files, but I'm having difficulty dissecting this huge codebase. Could someone explain in beginner-friendly terms how it works? I guess it's somehow related to the associativity property.
It's explained in the article linked from the page you linked in the question. Since you want a beginner-friendly description, I'll give a very high-level account of what I understood after reading it. Consider this a rough overview of the idea; to understand everything exactly you will have to study the articles.
The basic idea is to use algebraic properties to avoid re-doing the same work over and over. They do it by using the associativity of the monoidal operation and homomorphisms.
Given two sets A and B with two binary operations + and *, a homomorphism is a function f: A -> B such that f(x + y) = f(x) * f(y), i.e. a function that preserves the structure between the two sets.
In the case of that article the function f is basically the function that maps the sets of inputs to the trained models.
So the idea is that you can divide the input data into two portions x and y, and instead of computing the model of the whole thing, as in T(x + y), you can train on just x and on y and then combine the results: T(x) * T(y).
Now this by itself doesn't really help that much, but in training you often repeat work. For example, in cross-validation you sample, k times, the data into a set of inputs for the trainer and a set of data used to test it, which means that across these k iterations you are executing T over the same portions of the inputs multiple times.
Here monoids come into play: you can first split the domain into subsets and compute T over these subsets, and then, to compute the result of the cross-validation, just combine the results for the corresponding subsets.
To give an idea: if the data is {1,2,3,4} and k = 3, instead of doing
T({1,2}) plus test on {3,4}
T({1,3}) plus test on {2,4}
T({1,4}) plus test on {2,3}
you can see that we trained over 1 three times. Using the homomorphism we can compute T({1}) once and then combine it with the other partial results to obtain each final trained model (e.g. T({1,2}) = T({1}) * T({2})).
The correctness of the final result is assured by the associativity of the operations and the homomorphism.
The same idea applies to parallelization: divide the inputs into k groups, perform the training in parallel, and then combine the results: T(x_1 + x_2 + ... + x_k) = T(x_1) * T(x_2) * ... * T(x_k), where the T(x_i) calls can be executed completely in parallel and only at the end do you combine the results.
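Since HLearn itself is Haskell, here is a language-agnostic toy of the same idea in Python (my own illustrative names, not HLearn's API): the "model" is a running mean, the monoid operation merges two partial models, and T is a homomorphism from data batches to models:

from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class MeanModel:
    n: int = 0           # number of points seen (identity element: n = 0)
    total: float = 0.0   # their sum

    def combine(self, other):  # the associative monoid operation (*)
        return MeanModel(self.n + other.n, self.total + other.total)

    @property
    def mean(self):
        return self.total / self.n

def train(batch):  # the homomorphism T: T(x + y) == T(x) * T(y)
    return MeanModel(len(batch), sum(batch))

chunks = [[1, 2], [3, 4], [5, 6, 7]]
parts = [train(c) for c in chunks]   # each call is independent: run in parallel
model = reduce(MeanModel.combine, parts, MeanModel())
assert model.mean == train([1, 2, 3, 4, 5, 6, 7]).mean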
Regarding online training algorithms, the idea is that given a "batch" training algorithm T, you can turn it into an online one by doing:
T_O(m, d) = m * T(d)
where m is an already-trained model (generally, the model trained on the data seen so far) and d is the new data point you add for training.
Again, the exactness of the result is due to the homomorphism, which tells you that if m = T(x) then m * T(d) = T(x + d), i.e. the online algorithm gives the same result as the batch algorithm run on all the data points.
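Continuing the toy MeanModel example above, the online update is just one merge:

m = train([1, 2, 3])         # model so far
m = m.combine(train([4]))    # T_O(m, d) = m * T(d)
assert m.mean == train([1, 2, 3, 4]).mean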
The more interesting (and complex) part of all of this is how you can see a training task as such a homomorphism, etc. I'll leave that to your personal study.

Problems with a function and odeint in python

A few months ago I started working with Python, given the great advantages it has. But recently I used odeint from scipy to solve a system of differential equations, and during the integration process the implemented function doesn't work as expected.
In this case, I want to solve a system of differential equations where one of the initial conditions (x[0]) varies (between 4 and 5) depending on the value the variable reaches during the integration process (this is programmed inside the function by means of an if statement).
# Control of oxygen
SO2_lower = 4
SO2_upper = 5
if x[0] <= SO2_lower:
    x[0] = SO2_upper
When the function is used by odeint, some lines of code inside the function seem to be skipped, even though the function changes the value of x[0]. Here is all my code:
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

plt.ion()

# Stoichiometric parameters
YSB_OHO_Ox = 0.67     # Yield for XOHO growth per SB (Aerobic)
YSB_Stor_Ox = 0.85    # Yield for XOHO,Stor formation per SB (Aerobic)
YStor_OHO_Ox = 0.63   # Yield for XOHO growth per XOHO,Stor (Aerobic)
fXU_Bio_lys = 0.2     # Fraction of XU generated in biomass decay
iN_XU = 0.02          # N content of XU
iN_XBio = 0.07        # N content of XBio
iN_SB = 0.03          # N content of SB
fSTO = 0.67           # Stored fraction of SB

# Kinetic parameters
qSB_Stor = 5          # Rate constant for XOHO,Stor storage of SB
uOHO_Max = 2          # Maximum growth rate of XOHO
KSB_OHO = 2           # Half-saturation coefficient for SB
KStor_OHO = 1         # Half-saturation coefficient for XOHO,Stor/XOHO
mOHO_Ox = 0.2         # Endogenous respiration rate of XOHO (Aerobic)
mStor_Ox = 0.2        # Endogenous respiration rate of XOHO,Stor (Aerobic)
KO2_OHO = 0.2         # Half-saturation coefficient for SO2
KNHx_OHO = 0.01       # Half-saturation coefficient for SNHx

# Other parameters
DT = 1 / 86400.0

def f(x, t):
    # Control of oxygen
    SO2_lower = 4
    SO2_upper = 5
    if x[0] <= SO2_lower:
        x[0] = SO2_upper
    M = np.matrix([[-(1.0 - YSB_Stor_Ox), -1, iN_SB, 0, 0, YSB_Stor_Ox],
                   [-(1.0 - YSB_OHO_Ox) / YSB_OHO_Ox, -1 / YSB_OHO_Ox, iN_SB / YSB_OHO_Ox - iN_XBio, 0, 1, 0],
                   [-(1.0 - YStor_OHO_Ox) / YStor_OHO_Ox, 0, -iN_XBio, 0, 1, -1 / YStor_OHO_Ox],
                   [-(1.0 - fXU_Bio_lys), 0, iN_XBio - fXU_Bio_lys * iN_XU, fXU_Bio_lys, -1, 0],
                   [-1, 0, 0, 0, 0, -1]])
    R = np.matrix([[DT * fSTO * qSB_Stor * (x[0] / (KO2_OHO + x[0])) * (x[1] / (KSB_OHO + x[1])) * x[4]],
                   [DT * (1 - fSTO) * uOHO_Max * (x[0] / (KO2_OHO + x[0])) * (x[1] / (KSB_OHO + x[1])) * (x[2] / (KNHx_OHO + x[2])) * x[4]],
                   [DT * uOHO_Max * (x[0] / (KO2_OHO + x[0])) * (x[2] / (KNHx_OHO + x[2])) * ((x[5] / x[4]) / (KStor_OHO + (x[5] / x[4]))) * (KSB_OHO / (KSB_OHO + x[1])) * x[4]],
                   [DT * mOHO_Ox * (x[0] / (KO2_OHO + x[0])) * x[4]],
                   [DT * mStor_Ox * (x[0] / (KO2_OHO + x[0])) * x[5]]])
    Mt = M.transpose()
    MxRm = Mt * R
    MxR = MxRm.tolist()
    return [MxR[0][0], MxR[1][0], MxR[2][0], MxR[3][0], MxR[4][0], MxR[5][0]]

# ODE solution
t = np.linspace(0.0, 3600, 3600)
# Initial conditions
y0 = np.array([5, 176, 5, 30, 100, 5])
Var = odeint(f, y0, t, args=(), h0=1, hmin=1, hmax=1, atol=1e-5, rtol=1e-5)
Sol = Var.tolist()
plt.plot(t, Var[:, 0])
Thanks very much in advance!
Short answer:
You should not modify the input state vector inside your ODE function. Instead, try the following and verify your results:
x0 = x[0]
if x0 <= SO2_lower:
    x0 = SO2_upper
# use x0 instead of x[0] in the rest of this function body
I suppose that this is your problem, but I am not sure, since you did not explain what exactly was wrong with your results. Moreover, you are not changing the "initial condition". The initial condition is
y0 = np.array([5, 176, 5, 30, 100, 5])
You are just changing the input state vector.
Detailed answer:
Your odeint integrator is probably using one of the higher-order adaptive Runge-Kutta methods. Such an algorithm requires multiple evaluations of the ODE function to compute a single integration step, so mutating the input state vector mid-step may lead to undefined results. In C++ boost::odeint this is not even possible, because the input variable is const. Python, however, is not as strict as C++, and I suppose it is possible to introduce this kind of bug unintentionally (I did not try it, though).
EDIT:
OK, I understand what you want to achieve.
Your variable x[0] is constrained by a reset condition, and that cannot be expressed in the form
x' = f(x, t)
which is the form of Ordinary Differential Equation that odeint is meant to solve. However, a few possible "hacks" can be used here to bypass this limitation.
One possibility is to use a fixed-step, low-order solver (for higher-order solvers you would need to know which part of the algorithm you are currently in; see RK4, for example) and change your dx[0] equation (in your code, the MxR[0][0] element) to:
# at the beginning of your system
if x[0] > SO2_lower:  # everything is normal here
    x0 = x[0]
    dx0 = ...  # normal equation for dx0
else:  # x[0] is too low, we must force it to become SO2_upper again
    # assuming x0_{n+1} = x0_n + dx0*step_size
    dx0 = (SO2_upper - x[0]) / step_size
    x0 = SO2_upper
# remember to use x0 in the rest of your code, and to return dx0
However, I do not recommend this technique, because it makes you strongly dependent on the algorithm and you must know the exact step size (although I may recommend it for saturation constraints). Another possibility is to perform a single integration step at a time and correct your x0 each time it is necessary:
// 1 do_step(sys, in, dxdtin, t, out, dt);
// 2 do something with the output
// 3 in = out
// 4 return to 1, or finish
Sorry for the C++ syntax; here is the exhaustive documentation (C++ odeint steppers), and here is its equivalent in Python (the Python ode class). The C++ odeint interface is better suited to your task, but you can achieve exactly the same in Python. Just look for
integrate(t[, step, relax])
set_initial_value(y[, t])
in the docs.
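To make that concrete, here is a minimal, untested sketch of the stepping approach with scipy's ode class, reusing f and y0 from the question (SO2_lower and SO2_upper would need to be moved to module scope):

from scipy.integrate import ode

r = ode(lambda t, x: f(x, t))      # scipy's ode wants f(t, x); odeint wants f(x, t)
r.set_integrator("dopri5")
r.set_initial_value(y0, 0.0)

dt = 1.0
history = [y0]
while r.successful() and r.t < 3600:
    r.integrate(r.t + dt)          # advance a single step
    x = r.y.copy()
    if x[0] <= SO2_lower:          # apply the correction *between* steps,
        x[0] = SO2_upper           # outside the solver's internals
        r.set_initial_value(x, r.t)
    history.append(x)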

Distinguishing well formed English sentences from "word salad"

I'm looking for a library easily usable from C++, Python or F# which can distinguish well-formed English sentences from "word salad". I tried the Stanford Parser and, unfortunately, it parsed this:
Some plants have with done stems animals with exercise that to predict?
without complaint. I'm not looking for something very sophisticated that can handle all possible corner cases; I only need to filter out obvious nonsense.
Here is something I just stumbled upon:
A general-purpose sentence-level nonsense detector, by a Stanford student named Ian Tenney.
Here is the code from the project, undocumented but available on GitHub.
If you want to develop your own solution based on this, I think you should pay attention to the 4th group of features used, i.e. the language model, under section 3, "Features and preprocessing".
It might not suffice, but I think getting a probability score for each subsequence of length n is a good start. 3-grams like "plants have with", "have with done", "done stems animals", "stems animals with" and "that to predict" seem rather improbable, which could lead to a "nonsense" label for the whole sentence.
This method has the advantage of relying on a learned model rather than on a set of hand-made rules, which AFAIK is your other option. Many people would point you to Chapter 8 of NLTK's manual, but I think that developing your own context-free grammar for general English is asking a bit much.
The paper was useful, but goes into too much depth for solving this problem. Here is the author's basic approach, heuristically:
Baseline sentence heuristic: first letter is capitalized, and line ends with one of .?! (1 feature).
Number of characters, words, punctuation, digits, and named entities (from the Stanford CoreNLP NER tagger), plus versions normalized by text length (10 features).
Part-of-speech distribution: (# tags / # words) for each Penn Treebank tag (45 features).
Indicators for the part-of-speech tag of the first and last token in the text (45x2 = 90 features).
Language model raw score (s_lm = log p(text)) and normalized score (s_lm / # words) (2 features).
However, after a lot of searching, it turns out the GitHub repo only includes the tests and visualizations; the raw training and test data are not there. Here is his function for calculating these features (note: this uses pandas dataframes as df):
import re

def make_basic_features(df):
    """Compute basic features."""
    df['f_nchars'] = df['__TEXT__'].map(len)
    df['f_nwords'] = df['word'].map(len)

    punct_counter = lambda s: sum(1 for c in s
                                  if not c.isalnum() and c not in [" ", "\t"])
    df['f_npunct'] = df['__TEXT__'].map(punct_counter)
    df['f_rpunct'] = df['f_npunct'] / df['f_nchars']

    df['f_ndigit'] = df['__TEXT__'].map(lambda s: sum(1 for c in s
                                                      if c.isdigit()))
    df['f_rdigit'] = df['f_ndigit'] / df['f_nchars']

    upper_counter = lambda s: sum(1 for c in s if c.isupper())
    df['f_nupper'] = df['__TEXT__'].map(upper_counter)
    df['f_rupper'] = df['f_nupper'] / df['f_nchars']

    df['f_nner'] = df['ner'].map(lambda ts: sum(1 for t in ts
                                                if t != 'O'))
    df['f_rner'] = df['f_nner'] / df['f_nwords']

    # Check standard sentence pattern:
    # starts with a capital, ends with .?!
    def check_sentence_pattern(s):
        ss = s.strip(r"""`"'""").strip()
        return ss[0].isupper() and (ss[-1] in '.?!')
    df['f_sentence_pattern'] = df['__TEXT__'].map(check_sentence_pattern)

    # Normalize any LM features
    # by dividing the log score by the number of words
    lm_cols = {c: re.sub("_lmscore_", "_lmscore_norm_", c)
               for c in df.columns if c.startswith("f_lmscore")}
    for c, cnew in lm_cols.items():
        df[cnew] = df[c] / df['f_nwords']
    return df
So I guess that's a function you can use in this case. For the minimalist version:
import pandas as pd

raw = ["This is is a well-formed sentence",
       "but this ain't a good sent",
       "just a fragment"]
df = pd.DataFrame([{"__TEXT__": i, "word": i.split(), "ner": []} for i in raw])
The function wants a list of the words and the named entities recognized (NER) using the Stanford CoreNLP library, which is written in Java. You can pass in an empty list ([]) for the NER column and the function will calculate everything else. You'll get back a dataframe (like a matrix) with all the features of the sentences, which you can then use to decide what to call "well formed" by the rules given.
Also, you don't HAVE to use pandas here; a list of dictionaries will also work, but the original code used pandas.
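Putting the two snippets together (assuming the make_basic_features definition above; the selected columns are just examples):

feats = make_basic_features(df)
print(feats[["__TEXT__", "f_nwords", "f_rpunct", "f_sentence_pattern"]])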
Because this example involves a lot of steps, I've created a gist where I run through an example up to the point of producing a clean list of well-formed sentences and a dirty list of not-well-formed ones:
my gist: https://gist.github.com/marcmaxson/4ccca7bacc72eb6bb6479caf4081cefb
It replaces the Stanford CoreNLP Java library with spacy, a newer and easier-to-use Python library that fills in the missing metadata, such as sentiment, named entities, and the parts of speech used to determine whether a sentence is well formed. It runs under Python 3.6 but could work under 2.7; all the libraries are backwards compatible.
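For instance, filling the ner column with spacy instead of CoreNLP could look like this (a sketch, assuming the small English model is installed via python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")

def ner_tags(text):
    # IOB-style tags: non-entity tokens come back as "O",
    # which is exactly what make_basic_features expects
    return [tok.ent_iob_ for tok in nlp(text)]

df["ner"] = df["__TEXT__"].map(ner_tags)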
