Export p-values from reghdfe to Excel

I am trying to export p-values to Excel from the Stata community-contributed command reghdfe:
// get data
use "http://www.stata-press.com/data/r9/nlswork.dta", clear
xtset idcode year
// run regression
reghdfe ttl_exp age not_smsa msp nev_mar, abs(idcode year) cluster(idcode)
// store numbers
local rmse = `e(rmse)'
// export
putexcel A1 = (`rmse') using "export.xlsx" , modify // this works
How can I get the p-value of a given variable (say msp) exported to the same Excel document?
Running ereturn list does not reveal anything helpful; e(V) contains only the covariance matrix.
Cross-posted on Statalist.

The following works for me:
use "http://www.stata-press.com/data/r9/nlswork.dta", clear
xtset idcode year
reghdfe ttl_exp age not_smsa msp nev_mar, abs(idcode year) cluster(idcode)
------------------------------------------------------------------------------
             |               Robust
     ttl_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.008044   .0618821    -0.13   0.897     -.129366    .1132781
    not_smsa |  -.1616256   .0812134    -1.99   0.047    -.3208472   -.0024039
         msp |   -.004893   .0634814    -0.08   0.939    -.1293505    .1195645
     nev_mar |    -.36351   .0955207    -3.81   0.000    -.5507816   -.1762384
------------------------------------------------------------------------------
local pval = 2 * ttail(e(df_r), abs(_b[msp] / _se[msp]))
display `pval'
.93856498
putexcel A1 = (`pval') using "export.xlsx", modify
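For anyone cross-checking the arithmetic, the same two-sided p-value can be reproduced outside Stata (a minimal sketch assuming scipy; the coefficient and standard error are copied from the msp row of the output above, and the df_r value is a hypothetical placeholder for the actual e(df_r)):
from scipy import stats

b, se = -0.004893, 0.0634814   # _b[msp] and _se[msp] from the output above
df_r = 23000                   # hypothetical; substitute the actual e(df_r)
p = 2 * stats.t.sf(abs(b / se), df_r)
print(p)  # approximately 0.9386, matching 2 * ttail(e(df_r), abs(t))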


Let the user create a custom format using a string

I'd like a user to be able to create a custom format in QtWidgets.QPlainTextEdit(), and it would format the string and print the result out in another QtWidgets.QPlainTextEdit().
For example:
movie = {
    "Title": "The Shawshank Redemption",
    "Year": "1994",
    "Rated": "R",
    "Released": "14 Oct 1994",
    "Runtime": "142 min",
    "Genre": "Drama",
    "Director": "Frank Darabont",
    "Writer": "Stephen King (short story \"Rita Hayworth and Shawshank Redemption\"), Frank Darabont (screenplay)",
    "Actors": "Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler",
    "Plot": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.",
    "Language": "English",
    "Country": "USA",
    "Awards": "Nominated for 7 Oscars. Another 21 wins & 36 nominations.",
    "Poster": "https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_SX300.jpg",
    "Ratings": [
        {
            "Source": "Internet Movie Database",
            "Value": "9.3/10"
        },
        {
            "Source": "Rotten Tomatoes",
            "Value": "91%"
        },
        {
            "Source": "Metacritic",
            "Value": "80/100"
        }
    ],
    "Metascore": "80",
    "imdbRating": "9.3",
    "imdbVotes": "2,367,380",
    "imdbID": "tt0111161",
    "Type": "movie",
    "DVD": "15 Aug 2008",
    "BoxOffice": "$28,699,976",
    "Production": "Columbia Pictures, Castle Rock Entertainment",
    "Website": "N/A"
}
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | πŸ“… {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
print(custom_format)
The code above prints [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | πŸ“… 14 Oct 1994 | R ].
However, if I change this code from:
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | πŸ“… {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
To:
custom_format = "'[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | πŸ“… {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])"
Notice that the whole thing is wrapped in "", so it is now a string, and it will no longer print the format that I want.
The reason I wrapped it in "" is that when I put my original custom_format into a QtWidgets.QPlainTextEdit(), it gets converted into a plain string and is not formatted later on.
So my original idea was: the user creates a custom format for themselves in a QtWidgets.QPlainTextEdit(). Then I copy that format, open a new window where the movie json variable is contained, and paste the format into another QtWidgets.QPlainTextEdit(), where it would hopefully show up correctly formatted.
Any help on this would be appreciated.
ADDITIONAL INFORMATION:
User creates their format inside QtWidgets.QPlainTextEdit().
Then the user clicks Test Format, which should display [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | πŸ“… 14 Oct 1994 | R ] but instead displays the raw, unformatted string.
Trying to use the full format command would require an eval(), which is normally considered not only bad practice but also a serious security issue, especially when the input is completely controlled by the user.
Since the fields are known, I see little point in providing the whole format line; it is better to parse the format string looking for keywords, then use keyword lookup to create the output.
import re
from PyQt5 import QtWidgets

# assumes the movie dict from the question is defined at module level
class Formatter(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()
        layout = QtWidgets.QVBoxLayout(self)
        self.formatBase = QtWidgets.QPlainTextEdit(
            '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | πŸ“… {Released} | {Rated} ]')
        self.formatOutput = QtWidgets.QPlainTextEdit()
        layout.addWidget(self.formatBase)
        layout.addWidget(self.formatOutput)
        self.formatBase.textChanged.connect(self.processFormat)
        self.processFormat()

    def processFormat(self):
        format_str = self.formatBase.toPlainText()
        # strip escaped (doubled) braces so they are not picked up as keywords
        clean = re.sub('{{', '', re.sub('}}', '', format_str))
        # capture the keyword arguments between single braces
        tokens = re.split(r'\{(.*?)\}', clean)
        keywords = tokens[1::2]
        try:
            # build the dictionary with the given arguments; unrecognized
            # keywords are printed back in the {key} form, to let the
            # user know that the key wasn't valid
            values = {k: movie.get(k, '{{{}}}'.format(k)) for k in keywords}
            self.formatOutput.setPlainText(format_str.format(**values))
        except (ValueError, KeyError):
            # unmatched braces raise while the user is still typing; ignore
            pass
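A minimal harness to try the widget (a sketch assuming PyQt5, with the Formatter class and the movie dict defined in the same module):
import sys

if __name__ == '__main__':
    app = QtWidgets.QApplication(sys.argv)
    formatter = Formatter()
    formatter.show()
    sys.exit(app.exec_())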

How to remove '/5' from CSV file

I am cleaning a restaurant data set using Pandas' read_csv.
I have columns like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5/5, 705
I expect them to be like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5, 705
You basically need to split each item of dataframe["rate"] on "/" and keep the part you need. Apply this to your dataframe with .apply(lambda x: getRate(x)):
def getRate(x):
    return str(x).split("/")[0]
To use it with column name rate, we can use:
dataframe["rate"] = dataframe["rate"].apply(lambda x: getRate(x))
You can use Python's .split() function to remove specific text, given that the text is consistently going to be "/5" and there are no instances of "/5" that you want to keep in that string. You can use it like this:
num = "4.5/5"
num.split("/5")[0]
output: '4.5'
If this isn't exactly what you need, Python's re module offers more general regex functions.
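For example, a regex that strips only a trailing "/5" (a small sketch; re is in the standard library):
import re

num = "4.5/5"
print(re.sub(r"/5$", "", num))  # '4.5'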
You can use DataFrame.apply() to make your replacement operation on the rate column:
def clean(x):
    if "/" not in x:
        return x
    else:
        return x[0:x.index('/')]
df.rate = df.rate.apply(lambda x : clean(x))
print(df)
Output
+----+-------+---------------+-------------+-------+-------+
| | name | online_order | book_table | rate | votes |
+----+-------+---------------+-------------+-------+-------+
| 0 | xxxx | Yes | Yes | 4.5 | 705 |
+----+-------+---------------+-------------+-------+-------+
EDIT
Edited to handle situations in which there could be multiple "/", or the suffix could be a number other than /5 (e.g. /4 or /1/3 ...).
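A vectorized alternative for larger datasets (a sketch assuming pandas; like the function above, it keeps everything before the first "/"):
import pandas as pd

df = pd.DataFrame({"name": ["xxxx"], "online_order": ["Yes"],
                   "book_table": ["Yes"], "rate": ["4.5/5"], "votes": [705]})
# the .str accessor splits once on "/" and keeps the left-hand part
df["rate"] = df["rate"].astype(str).str.split("/", n=1).str[0]
print(df)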

svm train output file has less lines than that of the input file

I am currently building a binary classification model and have created an input file for svm-train (svm_input.txt). This input file has 453 lines, 4 features, and 2 classes [0, 1], i.e.:
0 1:15.0 2:40.0 3:30.0 4:15.0
1 1:22.73 2:40.91 3:36.36 4:0.0
1 1:31.82 2:27.27 3:22.73 4:18.18
0 1:22.73 2:13.64 3:36.36 4:27.27
1 1:30.43 2:39.13 3:13.04 4:17.39
...
My problem is that when I count the number of lines in the output model generated by svm-train (svm_train_model.txt), it has 12 fewer data lines than the input file: the line count shows 450, but 9 of those lines are the header at the beginning showing the various parameters generated,
i.e.
svm_type c_svc
kernel_type rbf
gamma 1
nr_class 2
total_sv 441
rho -0.156449
label 0 1
nr_sv 228 213
SV
Therefore 12 lines in total from the original input of 453 have gone. I am new to SVMs and was hoping that someone could shed some light on why this might have happened.
Thanks in advance
Update:
I now believe that, in generating the model, it has removed lines where the labels and all the feature values are exactly the same.
To explain: my input is a set of miRNAs which have been classified as 1 or 0 depending on their involvement in a particular process (i.e. 1 = Yes & 0 = No). The input file looks something like this:
0 1:22 2:30 3:14 4:16
1 1:26 2:15 3:17 4:25
0 1:22 2:30 3:14 4:16
Here, lines one and three are exactly the same and as a result are removed from the output model. My question is both why the output model does this and how I can get around it (whilst using the same features)?
Whilst some of the labels and their corresponding feature values are identical within the input file, these are still different miRNAs.
NOTE: the input file does not have a feature for the miRNA name (which would clearly show the differences between the lines). In terms of the features used (i.e. nucleotide percentage content), some of the miRNAs do have exactly the same percentage content of A, U, G & C, and as a result are treated as duplicates and removed from the output model, even though they are not duplicates (hence the output model has fewer lines).
The format of the input file is as follows, where:
Column 0 - label (i.e. 1 or 0): 1 = Yes & 0 = No
Column 1 - Feature 1 = Percentage Content "A"
Column 2 - Feature 2 = Percentage Content "U"
Column 3 - Feature 3 = Percentage Content "G"
Column 4 - Feature 4 = Percentage Content "C"
The input file actually looks something like this (see the very first two lines below: they appear identical, yet each represents a different miRNA):
1 1:23 2:36 3:23 4:18
1 1:23 2:36 3:23 4:18
0 1:36 2:32 3:5 4:27
1 1:14 2:41 3:36 4:9
1 1:18 2:50 3:18 4:14
0 1:36 2:23 3:23 4:18
0 1:15 2:40 3:30 4:15
In terms of software, I am using libsvm-3.22 and Python 2.7.5.
Align your input file properly is my first observation. The libsvm code does not look for exactly 4 features; it identifies them by the string literals you have provided separating the features from the labels. I suggest manually converting your input file to create the desired input format.
Try the following Python code. Requirement: h5py, if your input comes from MATLAB (.mat files):
pip install h5py
import h5py
import numpy as np

# give the label .mat file for training
f = h5py.File('traininglabel.mat', 'r')
variables = f.items()

trainlabels = []
for var in variables:
    data = var[1]
    trainlabels = []
    for i in data.value[0]:
        trainlabels.append(str(i))

trainlabels = np.array(trainlabels)
for i in range(0, len(trainlabels)):
    if trainlabels[i] == '0.0':
        trainlabels[i] = '0'
    if trainlabels[i] == '1.0':
        trainlabels[i] = '1'
    print trainlabels[i]

# give the features .mat file here
f = h5py.File('training_features.mat', 'r')
variables = f.items()

features = None
for var in variables:
    data = var[1]
    features = data.value

out = open('traindata.txt', 'w+')
for i in range(0, 1000):  # number of training samples in features.mat
    out.write(trainlabels[i])
    out.write(' ')
    for j in range(0, 49):
        # svm-train expects 1-based "index:value" pairs for each feature
        out.write(str(j + 1) + ':' + str(features[j][i]))
        out.write(' ')
    out.write('\n')
out.close()
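If the raw data lives in a plain CSV rather than a .mat file, scikit-learn can write the libsvm format directly (a sketch; 'raw_data.csv' is a hypothetical file with the label in the first column and the four percentage features after it):
import pandas as pd
from sklearn.datasets import dump_svmlight_file

df = pd.read_csv('raw_data.csv', header=None)
y = df.iloc[:, 0].values   # labels: 1 = Yes, 0 = No
X = df.iloc[:, 1:].values  # percentage content features
# svm-train expects 1-based "index:value" pairs, hence zero_based=False
dump_svmlight_file(X, y, 'svm_input.txt', zero_based=False)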

Behave: Writing a Scenario Outline with dynamic examples

Gherkin / Behave Examples
Gherkin syntax supports test automation using examples:
Feature: Scenario Outline (tutorial04)

  Scenario Outline: Use Blender with <thing>
    Given I put "<thing>" in a blender
    When I switch the blender on
    Then it should transform into "<other thing>"

    Examples: Amphibians
      | thing         | other thing |
      | Red Tree Frog | mush        |
      | apples        | apple juice |

    Examples: Consumer Electronics
      | thing         | other thing |
      | iPhone        | toxic waste |
      | Galaxy Nexus  | toxic waste |
The test suite would run four times, once for each example row.
My problem
How can I test using confidential data in the Examples section? For example, I would like to test an internal API with user ids or SSN numbers, without keeping the data hard coded in the feature file.
Is there a way to load the Examples dynamically from an external source?
Update: I opened a GitHub issue on the behave project.
I've come up with another solution (behave-1.2.6):
I managed to dynamically create examples for a Scenario Outline by using before_feature.
Given a feature file (x.feature):
Feature: Verify squared numbers

  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Static
      | number | result |
      | 1      | 1      |
      | 2      | 4      |
      | 3      | 9      |
      | 4      | 16     |

  # Use the tag to mark this outline
  @dynamic
  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Dynamic
      | number | result |
      | .      | .      |
And the steps file (features/steps/x.py):
from behave import step

@step('the {number:d} squared is {result:d}')
def step_impl(context, number, result):
    assert number * number == result
The trick is to use before_feature in environment.py: at that point behave has already parsed the examples tables into the scenario outlines, but hasn't generated the scenarios from the outlines yet.
import behave
import copy

def before_feature(context, feature):
    features = (s for s in feature.scenarios
                if type(s) == behave.model.ScenarioOutline
                and 'dynamic' in s.tags)
    for s in features:
        for e in s.examples:
            orig = copy.deepcopy(e.table.rows[0])
            e.table.rows = []
            for num in range(1, 5):
                n = copy.deepcopy(orig)
                # This relies on knowing that the table has two columns.
                n.cells = ['{}'.format(num), '{}'.format(num * num)]
                e.table.rows.append(n)
This will only operate on Scenario Outlines that are tagged with @dynamic.
The result is:
behave -k --no-capture
Feature: Verify squared numbers  # features/x.feature:1

  Scenario Outline: Verify square for 1 -- @1.1 Static  # features/x.feature:8
    Then the 1 squared is 1  # features/steps/x.py:3
  Scenario Outline: Verify square for 2 -- @1.2 Static  # features/x.feature:9
    Then the 2 squared is 4  # features/steps/x.py:3
  Scenario Outline: Verify square for 3 -- @1.3 Static  # features/x.feature:10
    Then the 3 squared is 9  # features/steps/x.py:3
  Scenario Outline: Verify square for 4 -- @1.4 Static  # features/x.feature:11
    Then the 4 squared is 16  # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 1 -- @1.1 Dynamic  # features/x.feature:19
    Then the 1 squared is 1  # features/steps/x.py:3
  @dynamic
  Scenario Outline: Verify square for 2 -- @1.2 Dynamic  # features/x.feature:19
    Then the 2 squared is 4  # features/steps/x.py:3
  @dynamic
  Scenario Outline: Verify square for 3 -- @1.3 Dynamic  # features/x.feature:19
    Then the 3 squared is 9  # features/steps/x.py:3
  @dynamic
  Scenario Outline: Verify square for 4 -- @1.4 Dynamic  # features/x.feature:19
    Then the 4 squared is 16  # features/steps/x.py:3

1 feature passed, 0 failed, 0 skipped
8 scenarios passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.005s
This relies on having an Examples table with the same shape as the final table; in my example, two columns. I also don't fuss with creating new behave.model.Row objects; I just copy the one from the table and update it. For extra ugliness, if you're using a file, you can put the file name in the Examples table, as sketched below.
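For instance, one way to fill the rows from an external CSV instead of computing them, following the same row-copying trick (a sketch; 'examples.csv' is a hypothetical file whose columns match the table):
import copy
import csv

def fill_rows_from_csv(example, path):
    # keep a copy of the placeholder row so behave's Row metadata stays intact
    template = copy.deepcopy(example.table.rows[0])
    example.table.rows = []
    with open(path) as f:
        for cells in csv.reader(f):
            row = copy.deepcopy(template)
            row.cells = [c.strip() for c in cells]
            example.table.rows.append(row)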
Got here looking for something else, but since I've been in a similar situation with Cucumber before, maybe someone else will also end up at this question looking for a possible solution. My approach to this problem is to use BDD variables that I can later handle at runtime in my step definitions. In my Python code I can check the value of the Gherkin variable and map it to whatever is needed.
For this example:
Scenario Outline: Use Blender with <thing>
  Given I put "<thing>" in a blender
  When I switch the blender on
  Then it should transform into "<other thing>"

  Examples: Amphibians
    | thing         | other thing            |
    | Red Tree Frog | mush                   |
    | iPhone        | data.iPhone.secret_key | # can use .yaml syntax here as well
This would translate to step definition code like:
@then('it should transform into "{other_thing}"')
def step_then_should_transform_into(context, other_thing):
    if other_thing == BddVariablesEnum.SECRET_KEY:
        basic_actions.load_secrets(context, other_thing)
So all you have to do is have a well-defined DSL layer.
Regarding the issue of using SSN numbers in testing, I'd just use fake SSNs and not worry that I'm leaking people's private information.
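For instance, a throwaway generator for clearly fake SSNs (a sketch; area numbers 900-999 are not issued for real SSNs, so these cannot collide with anyone's data):
import random

def fake_ssn():
    # 900-999 area numbers are never assigned to real SSNs
    return '{:03d}-{:02d}-{:04d}'.format(random.randint(900, 999),
                                         random.randint(1, 99),
                                         random.randint(1, 9999))

print(fake_ssn())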
Ok, but what about the larger issue? You want to use a scenario outline with examples that you cannot put in your feature file. Whenever I've run into this problem what I did was to give a description of the data I need and let the step implementation either create the actual data set used for testing or fetch the data set from an existing test database.
Scenario Outline: Accessing the admin interface
  Given a user who <status> an admin has logged in
  Then the user <ability> see the admin interface

  Examples: Users
    | status | ability |
    | is     | can     |
    | is not | cannot  |
There's no need to show any details about the user in the feature file. The step implementation is responsible for either creating or fetching the appropriate type of user depending on the value of status.
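A sketch of matching step implementations (create_user and can_access_admin are hypothetical helpers; the point is that the user's details never appear in the feature file):
from behave import given, then

@given('a user who {status} an admin has logged in')
def step_login(context, status):
    # create or fetch an appropriate test user; create_user is hypothetical
    context.user = create_user(is_admin=(status == 'is'))

@then('the user {ability} see the admin interface')
def step_check_admin(context, ability):
    # can_access_admin is a hypothetical accessor on the test user
    assert context.user.can_access_admin() == (ability == 'can')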

Create Bayesian Network and learn parameters with Python3.x [closed]

I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform the inference.
The network structure, which I want to define myself, is taken from this paper (the figure is not reproduced here).
All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as Mixture of Gaussians.
Authors use Expectation-Maximization algorithm to learn the parameters for conditional probability tables and Junction-Tree algorithm to compute the exact inference.
As I understand all is realised in MatLab with Bayes Net Toolbox by Murphy.
I tried to search for something similar in Python, and here are my results:
Python Bayesian Network Toolbox http://sourceforge.net/projects/pbnt.berlios/ (http://pbnt.berlios.de/). The website doesn't work and the project doesn't seem to be supported.
BayesPy https://github.com/bayespy/bayespy
I think this is what I actually need, but I can't find any examples similar to my case, to understand how to approach construction of the network structure.
PyMC seems to be a powerful module, but I have problems importing it on Windows 64 with Python 3.3. I get this error when I install the development version:
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
UPDATE:
libpgm (http://pythonhosted.org/libpgm/). Exactly what I need, but unfortunately it is not supported by Python 3.x.
A very interesting, actively developed library: pgmpy (https://github.com/pgmpy/pgmpy/). Unfortunately, continuous variables and learning from data are not supported yet.
Any advice and concrete examples would be highly appreciated.
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Try the bnlearn library; it contains many functions to learn parameters from data and perform inference.
pip install bnlearn
Your use-case would be like this:
# Import the library
import bnlearn

# Define the network structure
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar'),
         ]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# The DAG is stored as an adjacency matrix
print(DAG['adjmat'])
# target            task   size  lat var  ...  side graspable  GrasPose  latvar
# source                                  ...
# task             False   True    False  ...            True      True   False
# size             False  False    False  ...           False      True   False
# lat var          False   True    False  ...           False     False   False
# fill level       False  False    False  ...           False      True   False
# object shape     False  False    False  ...           False      True   False
# side graspable   False  False    False  ...           False      True   False
# GrasPose         False  False    False  ...           False     False    True
# latvar           False  False    False  ...           False     False   False
#
# [8 rows x 8 columns]

# No CPDs are in the DAG yet. Let's see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

# Plot the DAG. Note that it can be oriented differently when you re-make the plot.
bnlearn.plot(DAG)
Now we need data to learn the parameters. Suppose these are stored in your df. The variable names in the data file must be present in the DAG.
# Read data
import pandas as pd
df = pd.read_csv('path_to_your_data.csv')

# Learn the parameters and store the CPDs in the DAG. Use the methodtype you
# desire; options are maximumlikelihood or bayes.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')

# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)

# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level': 1, 'size': 0, 'task': 1})
Below is a working example with a demo dataset (sprinkler). You can play around with this.
# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
#      Cloudy  Sprinkler  Rain  Wet_Grass
# 0         0          0     0          0
# 1         1          0     1          1
# 2         0          1     0          1
# 3         1          1     1          1
# 4         1          1     1          1
# ..      ...        ...   ...        ...
# 995       1          0     1          1
# 996       1          0     1          1
# 997       1          0     1          1
# 998       0          0     0          0
# 999       0          1     1          1
#
# [1000 rows x 4 columns]
# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)
# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)
# Print the learned CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695 |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain | Rain(0) | Rain(0) | Rain(1) | Rain(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler | Sprinkler(0) | Sprinkler(1) | Sprinkler(0) | Sprinkler(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663 | 0.7441176470588236 | 0.6208955223880597 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
# +--------------+------------------+
# | Wet_Grass | phi(Wet_Grass) |
# +==============+==================+
# | Wet_Grass(0) | 0.2559 |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441 |
# +--------------+------------------+
print(q1.values)
# array([0.25588235, 0.74411765])
More examples can be found on the documentation pages of bnlearn or in the blog.
I was looking for a similar library, and I found that pomegranate is a good one. Thanks, James Atwood.
Here is an example of how to use it:
from pomegranate import BayesianNetwork
import numpy as np

mydb = np.array([[1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 2, 6],
                 [1, 3, 8], [2, 3, 8], [1, 2, 4]])
bnet = BayesianNetwork.from_samples(mydb)
print(bnet.node_count())
print(bnet.probability([[1, 2, 3]]))
print(bnet.probability([[1, 2, 8]]))
For PyMC's g++ problem, I highly recommend getting a g++ installation done; it hugely speeds up the sampling process. Otherwise you will have to live with the warning and sit there for an hour for a 2000-iteration sampling run.
The way to get the warning fixed is:
1. Get g++ installed: download Cygwin and install g++ from it (you can google that). To check, go to "cmd" and type "g++"; if it says it requires an input file, great, you have g++ installed.
2. Install the Python packages mingw and libpython.
3. Install the Python package theano.
This should get the problem fixed.
I am currently working on the same problem as you, good luck!
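To verify afterwards that the compiler is actually picked up, a quick check (a sketch; assumes theano imports cleanly):
import theano

# if g++ was found, this prints the compiler command instead of an empty string
print(theano.config.cxx)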
Late to the party, as always, but I've wrapped up the Bayes Server Java API using JPype; it might not have all the functionality you need, but you would create the above network using something like this:
from bayesianpy.network import Builder as builder
import bayesianpy.network

nt = bayesianpy.network.create_network()

# where df is your dataframe
task = builder.create_discrete_variable(nt, df, 'task')
size = builder.create_continuous_variable(nt, 'size')
grasp_pose = builder.create_continuous_variable(nt, 'GraspPose')
builder.create_link(nt, size, grasp_pose)
builder.create_link(nt, task, grasp_pose)

for v in ['fill level', 'object shape', 'side graspable']:
    va = builder.create_discrete_variable(nt, df, v)
    builder.create_link(nt, va, grasp_pose)
    builder.create_link(nt, task, va)

# write df to the data store
with bayesianpy.data.DataSet(df, bayesianpy.utils.get_path_to_parent_dir(__file__), logger) as dataset:
    model = bayesianpy.model.NetworkModel(nt, logger)
    model.train(dataset)

    # to query the model multi-threaded
    results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()], append_to_df=False)
I'm not affiliated with Bayes Server - and the Python wrapper is not 'official' (you can use the Java API via Python directly). My wrapper makes some assumptions and places limitations on functions that I don't use very much. The repo is here: github.com/morganics/bayesianpy
