Behave: Writing a Scenario Outline with dynamic examples

Gherkin / Behave Examples
Gherkin syntax supports test automation driven by examples:
Feature: Scenario Outline (tutorial04)

  Scenario Outline: Use Blender with <thing>
    Given I put "<thing>" in a blender
    When I switch the blender on
    Then it should transform into "<other thing>"

    Examples: Amphibians
      | thing         | other thing |
      | Red Tree Frog | mush        |
      | apples        | apple juice |

    Examples: Consumer Electronics
      | thing        | other thing |
      | iPhone       | toxic waste |
      | Galaxy Nexus | toxic waste |
The test suite would run four times, once for each example row.
My problem
How can I test using confidential data in the Examples section? For example, I would like to test an internal API with user IDs or SSNs, without keeping the data hard-coded in the feature file.
Is there a way to load the Examples dynamically from an external source?
Update: Opened a github issue on the behave project.

I've come up with another solution (behave-1.2.6):
I managed to dynamically create examples for a Scenario Outline by using before_feature.
Given a feature file (x.feature):
Feature: Verify squared numbers

  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Static
      | number | result |
      | 1      | 1      |
      | 2      | 4      |
      | 3      | 9      |
      | 4      | 16     |

  # Use the tag to mark this outline
  @dynamic
  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Dynamic
      | number | result |
      | .      | .      |
And the steps file (steps/x.py):
from behave import step

@step('the {number:d} squared is {result:d}')
def step_impl(context, number, result):
    assert number * number == result
The trick is to use before_feature in environment.py: at that point behave has already parsed the examples tables of the scenario outlines, but has not yet generated the scenarios from the outlines.
import copy

import behave.model

def before_feature(context, feature):
    outlines = (s for s in feature.scenarios
                if type(s) == behave.model.ScenarioOutline and 'dynamic' in s.tags)
    for s in outlines:
        for e in s.examples:
            orig = copy.deepcopy(e.table.rows[0])
            e.table.rows = []
            for num in range(1, 5):
                n = copy.deepcopy(orig)
                # This relies on knowing that the table has two columns.
                n.cells = ['{}'.format(num), '{}'.format(num * num)]
                e.table.rows.append(n)
This will only operate on Scenario Outlines that are tagged with @dynamic.
The result is:
behave -k --no-capture

Feature: Verify squared numbers # features/x.feature:1

  Scenario Outline: Verify square for 1 -- @1.1 Static # features/x.feature:8
    Then the 1 squared is 1 # features/steps/x.py:3

  Scenario Outline: Verify square for 2 -- @1.2 Static # features/x.feature:9
    Then the 2 squared is 4 # features/steps/x.py:3

  Scenario Outline: Verify square for 3 -- @1.3 Static # features/x.feature:10
    Then the 3 squared is 9 # features/steps/x.py:3

  Scenario Outline: Verify square for 4 -- @1.4 Static # features/x.feature:11
    Then the 4 squared is 16 # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 1 -- @1.1 Dynamic # features/x.feature:19
    Then the 1 squared is 1 # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 2 -- @1.2 Dynamic # features/x.feature:19
    Then the 2 squared is 4 # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 3 -- @1.3 Dynamic # features/x.feature:19
    Then the 3 squared is 9 # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 4 -- @1.4 Dynamic # features/x.feature:19
    Then the 4 squared is 16 # features/steps/x.py:3

1 feature passed, 0 failed, 0 skipped
8 scenarios passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.005s
This relies on the Examples table having the same shape as the final table; in my example, two columns. I also don't fuss with creating new behave.model.Row objects, I just copy the one from the table and update it. For extra ugliness, if you're reading from a file, you can put the file name in the Examples table (see the sketch below).
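To make that concrete, here is a minimal, untested sketch of the same before_feature trick that fills the table from an external CSV file instead of computing squares. The file name examples.csv and its layout (a header row matching the outline's columns) are assumptions for illustration, not part of behave's API:

import copy
import csv

import behave.model

def before_feature(context, feature):
    outlines = (s for s in feature.scenarios
                if isinstance(s, behave.model.ScenarioOutline) and 'dynamic' in s.tags)
    for outline in outlines:
        for example in outline.examples:
            template = copy.deepcopy(example.table.rows[0])
            example.table.rows = []
            # Hypothetical external source; could equally be a database or a vault.
            with open('examples.csv', newline='') as f:
                for record in csv.DictReader(f):
                    row = copy.deepcopy(template)
                    # Order the cells to match the outline's table headings.
                    row.cells = [record[h] for h in example.table.headings]
                    example.table.rows.append(row)

Since the confidential values now live only in examples.csv, the feature file can stay in version control while the data file is kept out of it.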

I got here looking for something else, but since I've been in a similar situation with Cucumber before, maybe someone else will also end up at this question looking for a possible solution. My approach to this problem is to use BDD variables that I can handle at runtime in my step definitions. In my Python code I can check the value of the Gherkin variable and map it to whatever is needed.
For this example:
Scenario Outline: Use Blender with <thing>
  Given I put "<thing>" in a blender
  When I switch the blender on
  Then it should transform into "<other thing>"

  Examples: Amphibians
    | thing         | other thing            |
    | Red Tree Frog | mush                   |
    | iPhone        | data.iPhone.secret_key | # can use .yaml syntax here as well
This would translate to step-definition code like this:
@then('it should transform into "{other_thing}"')
def step_then_should_transform_into(context, other_thing):
    if other_thing == BddVariablesEnum.SECRET_KEY:
        basic_actions.load_secrets(context, other_thing)
So all you have to do is have a well-defined DSL layer.
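As a minimal, hypothetical sketch of such a layer (the enum value, the basic_actions module, and the blender state on context are all illustrative, not a real API):

from behave import then

import basic_actions  # hypothetical module that knows how to fetch secrets

class BddVariablesEnum:
    # Placeholder values exactly as they appear in the Examples table
    SECRET_KEY = "data.iPhone.secret_key"

@then('it should transform into "{other_thing}"')
def step_then_should_transform_into(context, other_thing):
    if other_thing == BddVariablesEnum.SECRET_KEY:
        # Resolve the placeholder to the real confidential value at runtime.
        other_thing = basic_actions.load_secrets(context, other_thing)
    assert context.blender.result == other_thing  # blender state set by earlier steps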

Regarding the issue of using SSNs in testing, I'd just use fake SSNs and not worry that I'm leaking people's private information.
OK, but what about the larger issue? You want to use a scenario outline with examples that you cannot put in your feature file. Whenever I've run into this problem, what I did was give a description of the data I need and let the step implementation either create the actual data set used for testing or fetch the data set from an existing test database.
Scenario Outline: Accessing the admin interface
  Given a user who <status> an admin has logged in
  Then the user <ability> see the admin interface

  Examples: Users
    | status | ability |
    | is     | can     |
    | is not | cannot  |
There's no need to show any details about the user in the feature file. The step implementation is responsible for either creating or fetching the appropriate type of user depending on the value of status, as in the sketch below.
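A minimal sketch of that step, where create_test_user and log_in stand in for whatever fixtures or test-database access you already have:

from behave import given

@given('a user who {status} an admin has logged in')
def step_admin_user_logs_in(context, status):
    is_admin = (status == 'is')
    # Create or fetch a user of the appropriate type; no real identifiers
    # ever appear in the feature file.
    context.user = create_test_user(is_admin=is_admin)
    context.session = log_in(context.user)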


How to handle wide markdown tables and line-length checks in pre-commit?

Context
After applying a line-length limit of 80 characters in the markdownlint pre-commit check, I had difficulties including a markdown table that is wider than 80 characters.
Note
I see value in applying the linter to the README.md because I quite often forget about the line length while typing it. (In essence, the trivial solutions of disabling the linter or disabling MD013 everywhere are considered sub-optimal.)
Pre-commit of MarkdownLint
- repo: https://github.com/markdownlint/markdownlint
  rev: v0.11.0
  hooks:
    - id: markdownlint
Markdown table example
| Algorithm | Encoding | Adaptation | Radiation | Backend |
| ------------------------------------ | -------- | ---------- | ------------ | ---------------------------- |
| Minimum Dominating Set Approximation | Sparse | Redundancy | Neuron Death | - networkx LIF<br>- Lava LIF |
| Some Algorithm Approximation | Sparse | Redundancy | Neuron Death | - networkx LIF<br>- Lava LIF |
| | | | | |
Approach I
First I tried to include an ignore directive for MD013 (the line-length check) in the relevant section around the markdown table; however, markdownlint does not support such an option.
Approach II
I tried to manually apply line breaks within the table; however, that results in additional rows in the table.
Question
How can I stay within the 80-character limit while including a wide markdown table (without generating new table rows)?
You could try changing your hook to another similar project: igorshubovych/markdownlint-cli
repos:
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.32.2
    hooks:
      - id: markdownlint
        args: ["--fix"]
You may include a .markdownlint.yaml file in the same directory as your .pre-commit-config.yaml. Set the line length rule but ignore it for tables. Like so:
# Default state for all rules
default: true

# MD013/line-length - Line length
MD013:
  line_length: 80
  tables: false
Check the .markdownlint.yaml schema for other configuration options.

Bad value during floating point read

I'm running some simulation code on Ubuntu and I keep running into the same error while trying to read data from a .dat file, and I could not find the cause.
This is the error message:
At line 1939 of file CompoundMPIBSC20200823.f90 (unit = 11, file = 'C-340120b.dat')
Fortran runtime error: Bad value during floating point read
And the C-340120b.dat file looks like this:
6 10 1.531581196563372e-15
0.0014553174 0.0055615333 0.0119703978 0.0203850084 0.0305528957 0.0422600997 0.0553257997 0.0695976542 0.0849475255 0.1012676622
0.1184670631 0.1364683308 0.1552047081 0.1746171362 0.1946516651 0.2152575030 0.2363847713 0.2579826431 0.2799978445 0.3023733948
0.3250475021 0.3479528735 0.3710162901 0.3941586119 0.4172949742 0.4403351781 0.4631846022 0.4857450840 0.5079162267 0.5295967855
0.5506862041 0.5710862773 0.5907027347 0.6094469646 0.6272378474 0.6440030470 0.6596808652 0.6742212377 0.6875870601 0.6997550335
0.7107163254 0.7204769965 0.7290581527 0.7364958286 0.7428406051 0.7481569620 0.7525223874 0.7560262383 0.7587683705 0.7608575997
0.7624099924 0.7635469966 0.7643935146 0.7650758792 0.7657198055 0.7664483527 0.7673799150 0.7686263022 0.7702908977 0.7724669898
0.7752362202 0.7786672785 0.7828147402 0.7877181912 0.7934015601 0.7998727305 0.8071233638 0.8151290525 0.8238496666 0.8332299862
0.8432005800 0.8536788860 0.8645704950 0.8757706624 0.8871659719 0.8986361128 0.9100557539 0.9212965992 0.9322292652 0.9427254733
0.9526599682 0.9619125838 0.9703700425 0.9779277777 0.9844915839 0.9899791031 0.9943211058 0.9974625834 0.9993636185 1.0000000000
6 11 1.475893077189510e-15
0.0016525844 0.0062956494 0.0135059282 0.0229208176 0.0342303569 0.0471704147 0.0615165460 0.0770787515 0.0936966304 0.1112350678
0.1295802432 0.1486359799 0.1683205560 0.1885633307 0.2093018003 0.2304793480 0.2520427451 0.2739399970 0.2961185348 0.3185236648
0.3410972443 0.3637767806 0.3864948806 0.4091790904 0.4317517598 0.4541305840 0.4762291287 0.4979578467 0.5192252758 0.5399391683
0.5600082344 0.5793434896 0.5978599757 0.6154784463 0.6321271026 0.6477428598 0.6622731054 0.6756767045 0.6879251243 0.6990032865
0.7089101737 0.7176591771 0.7252781911 0.7318094543 0.7373091004 0.7418464594 0.7455031054 0.7483717148 0.7505546359 0.7521623630
0.7533118077 0.7541244615 0.7547244701 0.7552366540 0.7557845046 0.7564881945 0.7574626361 0.7588156121 0.7606460387 0.7630423235
0.7660809513 0.7698252039 0.7743241221 0.7796116798 0.7857062078 0.7926100427 0.8003094738 0.8087748795 0.8179612002 0.8278085409
0.8382431405 0.8491784407 0.8605164079 0.8721490905 0.8839601234 0.8958267320 0.9076214149 0.9192140226 0.9304737122 0.9412709632
0.9514795610 0.9609785827 0.9696542131 0.9774014787 0.9841259295 0.9897450178 0.9941894123 0.9974040480 0.9993489837 1.0000000000
...
...
...
6 30500 2.203435261320421e-18
0.5647132406 0.8435296561 0.9197993603 0.9501219587 0.9657979424 0.9751478483 0.9812026747 0.9853454006 0.9882967561 0.9904674356
0.9921063590 0.9933715119 0.9943670960 0.9951635956 0.9958099778 0.9963412928 0.9967830203 0.9971540150 0.9974684566 0.9977371653
0.9979685074 0.9981690312 0.9983439172 0.9984973073 0.9986325426 0.9987523421 0.9988589360 0.9989541675 0.9990395697 0.9991164266
0.9991858197 0.9992486651 0.9993057428 0.9993577205 0.9994051720 0.9994485928 0.9994884125 0.9995250049 0.9995586968 0.9995897744
0.9996184895 0.9996450644 0.9996696956 0.9996925575 0.9997138054 0.9997335777 0.9997519982 0.9997691778 0.9997852163 0.9998002034
0.9998142198 0.9998273389 0.9998396267 0.9998511432 0.9998619429 0.9998720754 0.9998815859 0.9998905154 0.9998989017 0.9999067791
0.9999141790 0.9999211303 0.9999276594 0.9999337906 0.9999395461 0.9999449463 0.9999500100 0.9999547546 0.9999591958 0.9999633483
0.9999672254 0.9999708395 0.9999742019 0.9999773228 0.9999802118 0.9999828775 0.9999853278 0.9999875699 0.9999896101 0.9999914544
0.9999931080 0.9999945754 0.9999958609 0.9999969680 0.9999978997 0.9999986585 0.9999992466 0.9999996655 0.9999999164 1.0000000000
The 3 dots in the above data file are just to tell you that there are many more entries in the file. These dots are not there in the original file.
And the program:
file CompoundMPIBSC20200823.f90
open(11,file=fname(mel),status='old',form='formatted')
open(12,file=fname1(mel),status='old',form='formatted')
do men=1,nen   ! nen=75: energy intervals (energy split number cycle)
  !!write(iw,*) 'mel,men:',mel,men
  read (11,'(i2,I7,d22.15/(10f13.10))') na,nenerg,tcrpc,(rpw(i),i=1,ith)
  read (12,'(i2,I7,d22.15/(10f15.10))') na,nenerg,tcrpc,(rpw1(i),i=1,ith)
  !!write(iw,*) 'mel,men,na,nenerg:',mel,men,na,nenerg,tcrpc
  ftcs(men)=tcrpc   ! the total elastic scattering cross section at this energy
  penergy(men)=nenerg/1000.
  rpw(ith)=1.
  !---------------------------
Line 1939 is:
read (11,'(i2,I7,d22.15/(10f13.10))') na,nenerg,tcrpc,(rpw(i),i=1,ith)
I've tried different modifications of the code but didn't get any results.
Any help would be greatly appreciated!
You appear to have spaces padding the fields in your data file, but not in your read format. The first line of your file (with its column positions labelled above it) is
000000000111111111122222222223333
123456789012345678901234567890123
 6      10  1.531581196563372e-15
so splitting this into i2,I7,d22.15 gives
i2 | I7      | d22.15                 |
00 | 0000000 | 1111111111222222222233 | 33
12 | 3456789 | 0123456789012345678901 | 23
 6 |       1 | 0  1.531581196563372e- | 15
which is clearly not as intended.
There are two ways around this problem:
As Ian Bush points out, you can forego the read format entirely and use list-directed input, as
read (11,*) na,nenerg,tcrpc,(rpw(i),i=1,ith)
This will parse your file token by token rather than relying on column widths, and is usually a much better option for parsing data files.
If you must use a read format, you need to add space padding to it, e.g.
'(i2,1X,I7,1X,d22.15/10(1X,f13.10))', which will then split the input string as
i2 | X | I7      | X | d22.15
00 | 0 | 0000001 | 1 | 1111111122222222223333
12 | 3 | 4567890 | 1 | 2345678901234567890123
 6 |   |      10 |   | 1.531581196563372e-15

How to get parent and grandparent tags given a specific attribute in XML in Python?

I have an XML file with a structure like this one:
<cat>
  <foo>
    <fooID>1</fooID>
    <fooName>One</fooName>
    <bar>
      <barID>a</barID>
      <barName>small_a</barName>
      <barClass>
        <baz>
          <qux>
            <corge>
              <corgeName>...</corgeName>
              <corgeType>
                <corgeReport>
                  <corgeReportRes Reference="x" Channel="High">
                    <Pos>1</Pos>
                  </corgeReportRes>
                </corgeReport>
              </corgeType>
            </corge>
          </qux>
        </baz>
      </barClass>
    </bar>
    <bar>
      <barID>b</barID>
      <barName>small_b</barName>
      <barClass>
        <baz>
          <qux>
            <corge>
              <corgeName>...</corgeName>
              <corgeType>
                <corgeReport>
                  <corgeReportRes Reference="y" Channel="High">
                    <Pos>1</Pos>
                  </corgeReportRes>
                </corgeReport>
              </corgeType>
            </corge>
          </qux>
        </baz>
      </barClass>
    </bar>
  </foo>
  <foo>
    <fooID>2</fooID>
    <fooName>Two</fooName>
    <bar>
      <barID>c</barID>
      <barName>small_c</barName>
      <barClass>
        <baz>
          <qux>
            <corge>
              <corgeName>...</corgeName>
              <corgeType>
                <corgeReport>
                  <corgeReportRes Reference="z" Channel="High">
                    <Pos>1</Pos>
                  </corgeReportRes>
                </corgeReport>
              </corgeType>
            </corge>
          </qux>
        </baz>
      </barClass>
    </bar>
  </foo>
</cat>
And I would like to obtain the values of specific parent/grandparent/great-grandparent tags of the nodes that have the attribute Channel="High". I would like to obtain only the fooID value, fooName value, barID value, and barName value.
I have the following code in Python 3:
import xml.etree.ElementTree as xmlET
root = xmlET.parse('file.xml').getroot()
test = root.findall(".//*[@Channel='High']")
This actually gives me a list of elements that match; however, I still need the information from the specific parents/grandparents/great-grandparents.
How could I do that?
fooID | fooName | barID | barName
- - - - - - - - - - - - - - - - -
1     | One     | a     | small_a   <-- This is the information I'm interested in
1     | One     | b     | small_b   <-- Also this
2     | Two     | c     | small_c   <-- And this
Edit: The fooID and fooName nodes are siblings of the ancestor bar, the one that contains the Channel="High" node. It's almost the same for barID and barName: they are siblings of the ancestor barClass. Also, what I want to obtain are the values 1, One, a and small_a, not to filter by them, since there will be multiple foo blocks.
If I understand you correctly, you are probably looking for something like this (using python):
from lxml import etree

foos = """[your xml above]"""
doc = etree.fromstring(foos)

items = []
for entry in doc.xpath('//foo[.//corgeReportRes[@Channel="High"]]'):
    items.append(entry.xpath('./fooID/text()')[0])
    items.append(entry.xpath('./fooName/text()')[0])
    items.append(entry.xpath('./bar/barID/text()')[0])
    items.append(entry.xpath('./bar/barName/text()')[0])

print('fooID | fooName | barID | barName')
print(' | '.join(items))
Output:
fooID | fooName | barID | barName
1 | One | a | small_a
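If you would rather stay with the standard library's xml.etree.ElementTree, which has no parent pointers, a common workaround is to build a child-to-parent map once and walk upwards from each match. A sketch along those lines (file name as in the question):

import xml.etree.ElementTree as xmlET

root = xmlET.parse('file.xml').getroot()
# ElementTree elements don't know their parents, so build the map once.
parent = {child: p for p in root.iter() for child in p}

for res in root.iter('corgeReportRes'):
    if res.get('Channel') != 'High':
        continue
    # Walk up until we reach the enclosing <bar>.
    node = res
    while node is not None and node.tag != 'bar':
        node = parent.get(node)
    if node is not None:
        bar = node
        foo = parent[bar]  # <bar> is a direct child of <foo> in this layout
        print(foo.findtext('fooID'), foo.findtext('fooName'),
              bar.findtext('barID'), bar.findtext('barName'))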

How to remove '/5' from CSV file

I am cleaning a restaurant data set using Pandas' read_csv.
I have columns like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5/5, 705
I expect them to be like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5, 705
You basically need to split each item of dataframe["rate"] on "/" and take the part you need, then apply it to your dataframe using lambda x: getRate(x):
def getRate(x):
    return str(x).split("/")[0]
To use it with the column named rate, we can write:
dataframe["rate"] = dataframe["rate"].apply(lambda x: getRate(x))
You can use Python's .split() function to remove specific text, given that the text is consistently going to be "/5" and there are no instances of "/5" that you want to keep in the string. You can use it like this:
num = "4.5/5"
num.split("/5")[0]
output: '4.5'
If this isn't exactly what you need, there are more regex-based Python functions here.
You can use DataFrame.apply() to perform your replacement operation on the rate column:
def clean(x):
    if "/" not in x:
        return x
    else:
        return x[0:x.index('/')]

df.rate = df.rate.apply(lambda x: clean(x))
print(df)
Output
+----+-------+---------------+-------------+-------+-------+
| | name | online_order | book_table | rate | votes |
+----+-------+---------------+-------------+-------+-------+
| 0 | xxxx | Yes | Yes | 4.5 | 705 |
+----+-------+---------------+-------------+-------+-------+
EDIT
Edited to handle situations in which there could be multiple / characters, or in which the denominator could be something other than /5 (i.e. /4 or /1/3 ...).
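For completeness, the same cleanup can also be done without a Python-level function, using pandas' vectorized string methods. A small sketch with made-up data:

import pandas as pd

df = pd.DataFrame({"rate": ["4.5/5", "3.8/5", "NEW"]})
# Split on the first "/" and keep what precedes it; values without "/" pass through unchanged.
df["rate"] = df["rate"].str.split("/", n=1).str[0]
print(df["rate"].tolist())  # ['4.5', '3.8', 'NEW']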

Create Bayesian Network and learn parameters with Python3.x [closed]

I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform the inference.
The network structure is one I want to define myself; it is taken from this paper.
All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as Mixture of Gaussians.
Authors use Expectation-Maximization algorithm to learn the parameters for conditional probability tables and Junction-Tree algorithm to compute the exact inference.
As I understand it, all of this is realised in MATLAB with the Bayes Net Toolbox by Murphy.
I tried to search for something similar in Python, and here are my results:
Python Bayesian Network Toolbox http://sourceforge.net/projects/pbnt.berlios/ (http://pbnt.berlios.de/). Web-site doesn't work, project doesn't seem to be supported.
BayesPy https://github.com/bayespy/bayespy
I think this is what I actually need, but I fail to find some examples similar to my case, to understand how to approach construction of the network structure.
PyMC seems to be a powerful module, but I have problems importing it on Windows 64, Python 3.3. I get this warning when I install the development version:
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
UPDATE:
libpgm (http://pythonhosted.org/libpgm/). Exactly what I need, unfortunately not supported by python 3.x
Very interesting, actively developed library: PGMPY. Unfortunately, continuous variables and learning from data are not supported yet. https://github.com/pgmpy/pgmpy/
Any advice and concrete examples will be highly appreciated.
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Try the bnlearn library; it contains many functions to learn parameters from data and perform the inference.
pip install bnlearn
Your use-case would be like this:
# Import the library
import bnlearn

# Define the network structure
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar'),
         ]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# DAG is stored in adjacency matrix
print(DAG['adjmat'])
# target task size lat var ... side graspable GrasPose latvar
# source ...
# task False True False ... True True False
# size False False False ... False True False
# lat var False True False ... False False False
# fill level False False False ... False True False
# object shape False False False ... False True False
# side graspable False False False ... False True False
# GrasPose False False False ... False False True
# latvar False False False ... False False False
#
# [8 rows x 8 columns]
# No CPDs are in the DAG. Lets see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot DAG. Note that it can be differently orientated if you re-make the plot.
bnlearn.plot(DAG)
Now we need data to learn the parameters. Suppose the data are stored in your df. The variable names in the data file must be present in the DAG.
import pandas as pd

# Read data
df = pd.read_csv('path_to_your_data.csv')

# Learn the parameters and store the CPDs in the DAG. Use the methodtype you desire. Options are maximumlikelihood or bayes.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')

# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)

# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})
Below is a working example with a demo dataset (sprinkler). You can play around with this.
# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
# Cloudy Sprinkler Rain Wet_Grass
# 0 0 0 0 0
# 1 1 0 1 1
# 2 0 1 0 1
# 3 1 1 1 1
# 4 1 1 1 1
# .. ... ... ... ...
# 995 1 0 1 1
# 996 1 0 1 1
# 997 1 0 1 1
# 998 0 0 0 0
# 999 0 1 1 1
# [1000 rows x 4 columns]
# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)
# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)
# Print the learned CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695 |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain | Rain(0) | Rain(0) | Rain(1) | Rain(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler | Sprinkler(0) | Sprinkler(1) | Sprinkler(0) | Sprinkler(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663 | 0.7441176470588236 | 0.6208955223880597 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
# +--------------+------------------+
# | Wet_Grass | phi(Wet_Grass) |
# +==============+==================+
# | Wet_Grass(0) | 0.2559 |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441 |
# +--------------+------------------+
print(q1.values)
# array([0.25588235, 0.74411765])
More examples can be found on the documentation pages of bnlearn, or read the blog.
I was looking for a similar library, and I found that pomegranate is a good one. Thanks, James Atwood!
Here is an example of how to use it:
import numpy as np
from pomegranate import *

mydb = np.array([[1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 2, 6], [1, 3, 8], [2, 3, 8], [1, 2, 4]])

bnet = BayesianNetwork.from_samples(mydb)
print(bnet.node_count())
print(bnet.probability([[1, 2, 3]]))
print(bnet.probability([[1, 2, 8]]))
For pymc's g++ problem, I highly recommend getting g++ installed; it hugely boosts the sampling process. Otherwise you will have to live with this warning and sit there for an hour for a 2000-sample run.
The way to get the warning fixed is:
1. Get g++ installed: download Cygwin and install g++ (you can google that). To check it, go to "cmd" and type "g++"; if it says "require input file", great, you have g++ installed.
2. Install the Python packages mingw and libpython.
3. Install the Python package theano.
This should get the problem fixed.
I am currently working on the same problem as you; good luck!
Late to the party, as always, but I've wrapped up the BayesServer Java API using JPype; it might not have all the functionality you need, but you would create the above network using something like:
import bayesianpy.network
from bayesianpy.network import Builder as builder

nt = bayesianpy.network.create_network()

# where df is your dataframe
task = builder.create_discrete_variable(nt, df, 'task')

size = builder.create_continuous_variable(nt, 'size')
grasp_pose = builder.create_continuous_variable(nt, 'GraspPose')
builder.create_link(nt, size, grasp_pose)
builder.create_link(nt, task, grasp_pose)

for v in ['fill level', 'object shape', 'side graspable']:
    va = builder.create_discrete_variable(nt, df, v)
    builder.create_link(nt, va, grasp_pose)
    builder.create_link(nt, task, va)

# write df to data store
with bayesianpy.data.DataSet(df, bayesianpy.utils.get_path_to_parent_dir(__file__), logger) as dataset:
    model = bayesianpy.model.NetworkModel(nt, logger)
    model.train(dataset)

    # to query the model multi-threaded
    results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()], append_to_df=False)
I'm not affiliated with Bayes Server - and the Python wrapper is not 'official' (you can use the Java API via Python directly). My wrapper makes some assumptions and places limitations on functions that I don't use very much. The repo is here: github.com/morganics/bayesianpy
