spaCy Alternatives in Java
I currently use spaCy to traverse the dependency tree and generate entities:
def extract_entities(unicode_text):
    nlp = get_spacy_model(detect_lang(unicode_text))
    doc = nlp(unicode_text)
    entities = set()
    for sentence in doc.sents:
        # traverse tree picking up entities
        for token in sentence.subtree:
            # pick entities using some pre-defined rules
            ...
    entities.discard('')
    return entities
Are there any good Java alternatives to spaCy?
I am looking for libraries that generate the dependency tree, as spaCy does.
EDIT:
I looked into the Stanford Parser. However, it generated the following constituency parse tree:
ROOT
|
NP
_______________|_________
| NP
| _________|___
| | PP
| | ________|___
NP NP | NP
____|__________ | | _______|____
DT JJ JJ NN NNS IN DT JJ NN
| | | | | | | | |
the quick brown fox jumps over the lazy dog
However, I am looking for a dependency tree structure like the one spaCy produces:
jumps_VBZ
__________________________|___________________
| | | | | over_IN
| | | | | |
| | | | | dog_NN
| | | | | _______|_______
The_DT quick_JJ brown_JJ fox_NN ._. the_DT lazy_JJ
You're looking for the Stanford Dependency Parser. Like most of the Stanford tools, it is also bundled with Stanford CoreNLP under the depparse annotator. Other parsers include the Malt parser (a feature-based shift-reduce parser) and Ryan McDonald's MST parser (an accurate but slower maximum-spanning-tree parser).
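For instance, a minimal CoreNLP sketch (using the CoreDocument wrapper available in recent CoreNLP releases; the English model jar must be on the classpath) could look like this:

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.CoreSentence;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import java.util.Properties;

public class DepParseDemo {
    public static void main(String[] args) {
        // tokenize -> split sentences -> POS-tag -> dependency-parse
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,depparse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument("The quick brown fox jumps over the lazy dog.");
        pipeline.annotate(doc);

        for (CoreSentence sentence : doc.sentences()) {
            // SemanticGraph is the dependency tree -- the CoreNLP analogue
            // of walking token.head / token.children in spaCy
            SemanticGraph deps = sentence.dependencyParse();
            System.out.println(deps);
        }
    }
}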
Another way to integrate with Java and other languages is through a spaCy REST API. For example, https://github.com/jgontrum/spacy-api-docker provides a Dockerized spaCy REST API.
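The Java side is then just an HTTP call. As a rough sketch using Java 11's built-in HttpClient (the /dep endpoint and the JSON payload below are assumptions based on that project's README; verify them against the image you actually run):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SpacyRestClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Assumed endpoint and payload -- check the container's docs.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/dep"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"text\": \"The quick brown fox jumps over the lazy dog.\", \"model\": \"en\"}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON describing tokens and dependency arcs
    }
}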
I recently released spaCy4j, which mimics Token container objects from spaCy and integrates with spaCy server or CoreNLP.
Once you have a running docker of spacy-server (very easy to set up), it's as easy as:
// Create a new spacy-server adapter with host and port matching a running instance of spacy-server.
SpaCyAdapter adapter = SpaCyServerAdapter.create("localhost", 8080);

// Create a new SpaCy object. It is thread safe and should be reused across our app.
SpaCy spacy = SpaCy.create(adapter);

// Parse a doc
Doc doc = spacy.nlp("My head feels like a frisbee, twice its normal size.");

// Inspect tokens
for (Token token : doc.tokens()) {
    System.out.printf("Token: %s, Tag: %s, Pos: %s, Dependency: %s%n",
            token.text(), token.tag(), token.pos(), token.dependency());
}
Feel free to contact me via GitHub with any questions.
Alternatively, spaCy can be run from a Java program.
First, create the virtual environment from a command prompt by executing the following commands:
python3 -m venv env
source ./env/bin/activate
pip install -U spacy
python -m spacy download en
python -m spacy download de

(Note: spaCy 3.x removed the en/de shortcut names; on recent versions download the full model names instead, e.g. python -m spacy download en_core_web_sm.)
Create a bash file spacyt.sh with the following commands, parallel to the env folder:
#!/bin/bash
python3 -m venv env
source ./env/bin/activate
python test1.py
Place the spaCy code in a Python script, test1.py:
import spacy

print('This is a test script of spacy')
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence")
print([(w.text, w.pos_) for w in doc])
# instead of print we can write to a file for further processing
Then, in the Java program, run the bash file:
String cmd = "./spacyt.sh";
try {
    Process p = Runtime.getRuntime().exec(cmd);
    p.waitFor();
    System.out.println("cmdT executed!");
} catch (Exception e) {
    e.printStackTrace();
}
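If you want to consume the script's output directly in Java rather than going through an intermediate file, a sketch using ProcessBuilder (standard library only) could look like this:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SpacyRunner {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("./spacyt.sh");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line printed by test1.py
            }
        }
        p.waitFor();
    }
}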
Related
How to handle wide markdown tables and line-length checks in pre-commit?
Context

After applying a line-length limit of 80 characters in the pre-commit check of markdown-lint, I was experiencing some difficulties including a markdown table that is wider than 80 characters.

Note: I see value in applying the linter to the README.md because I quite often forget about the line length while typing it. (In essence, the trivial solution of disabling the linter or disabling MD013 everywhere is considered sub-optimal.)

Pre-commit config for MarkdownLint:

- repo: https://github.com/markdownlint/markdownlint
  rev: v0.11.0
  hooks:
    - id: markdownlint

Markdown table example:

| Algorithm                             | Encoding | Adaptation | Radiation    | Backend                      |
| ------------------------------------- | -------- | ---------- | ------------ | ---------------------------- |
| Minimum Dominating Set Approximation  | Sparse   | Redundancy | Neuron Death | - networkx LIF<br>- Lava LIF |
| Some Algorithm Approximation          | Sparse   | Redundancy | Neuron Death | - networkx LIF<br>- Lava LIF |
|                                       |          |            |              |                              |

Approach I

First I tried to include an ignore-MD013 (line-length check) directive in the relevant section around the markdown table; however, MarkdownLint does not support such an option.

Approach II

I tried to manually apply new line breaks inside the table; however, that results in additional rows in the table.

Question

How can I stay within the 80-character limit while including a wide markdown table (without generating new horizontal lines)?
You could try changing your hook to another similar project: igorshubovych/markdownlint-cli

repos:
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.32.2
    hooks:
      - id: markdownlint
        args: ["--fix"]

You may include a .markdownlint.yaml file in the same directory as your .pre-commit-config.yaml. Set the line-length rule but ignore it for tables, like so:

# Default state for all rules
default: true

# MD013/line-length - Line length
MD013:
  line_length: 80
  tables: false

Check the .markdownlint.yaml schema for other configuration options.
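The Node-based markdownlint that this CLI wraps also supports inline configuration comments, so another option is to switch off MD013 only around the wide table:

<!-- markdownlint-disable MD013 -->
| Algorithm                             | Encoding | Adaptation | Radiation    | Backend                      |
| ------------------------------------- | -------- | ---------- | ------------ | ---------------------------- |
| Minimum Dominating Set Approximation  | Sparse   | Redundancy | Neuron Death | - networkx LIF<br>- Lava LIF |
<!-- markdownlint-enable MD013 -->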
SyntaxError when using print(""" with a list of numbers to populate a file with GROMACS patched with PLUMED
I am using GROMACS with PLUMED to run MD simulations. In setting up my PLUMED file for collecting the S2/SH CV from Omar (https://www.plumed.org/doc-v2.8/user-doc/html/_s2_c_m.html), I am having difficulties with this line:

  File "makingplumed.py", line 25
    """ % (x,i)file=f)
               ^
SyntaxError: invalid syntax

Here is the code I am trying to run:

from __future__ import print_function

# here we create the PLUMED input file with python
with open("plumed.dat","w") as f:
    # print initial stuff

    # Define Atoms which are Oxygen hydrogen bond acceptors
    ATOMS=[21,35,45,62,76,97,109,133,152,174,188,202,213,227,239,253,269,280,292,311,323,339,353,377,401,416,426,447,466,477,488,503,518,538,560,575,597,617,624,641,655,677,692,702,722,743,765,784,798,820,844,866,883,897,919,939,961,978,988,1004,1021,1040]

    # Define heavy atoms for S2CM CV (protein and backbone and not hydrogen)
    heavy_atoms_nh: GROUP ATOMS=1,5,7,10,12,16,20,21,22,23,26,29,32,34,35,36,38,40,44,45,46,48,50,53,54,55,57,59,61,62,63,64,67,70,73,75,76,79,81,84,85,87,89,90,92,94,96,97,98,100,102,105,106,107,108,109,110,112,114,117,120,123,124,125,126,129,132,133,134,136,138,141,143,147,151,152,153,155,157,160,163,166,169,173,174,175,177,179,181,185,187,188,189,191,193,195,199,201,202,203,205,207,210,212,213,214,216,217,218,220,224,226,227,228,230,232,235,236,237,238,239,240,241,244,247,250,252,253,254,256,258,260,264,268,269,270,272,274,277,279,280,281,283,285,288,291,392,293,295,297,299,303,306,310,311,312,314,316,319,320,321,322,323,326,328,330,334,338,339,340,342,344,346,350,352,353,354,356,358,361,364,367,369,370,373,376,377,378,380,382,385,388,391,393,397,400,401,402,404,406,409,412,415,416,417,419,421,425,426,427,429,431,434,345,437,439,442,444,446,447,448,450,452,455,457,461,465,466,467,469,471,474,476,477,478,480,482,485,487,489,491,493,496,499,500,501,502,503,504,506,508,511,514,515,516,517,518,519,521,523,526,527,529,531,533,535,537,538,539,541,543,546,549,552,555,559,560,561,563,265,268,571,572,573,574,575,576,578,580,583,586,589,592,596,597,598,600,602,605,606,608,610,612,614,616,617,618,620,623,624,625,627,629,632,635,636,640,641,642,644,646,648,652,654,655,656,658,660,663,666,669,672,676,677,678,680,682,685,688,691,692,693,695,697,701,702,703,705,707,710,711,713,715,717,719,721,722,723,725,727,730,731,733,735,738,740,742,743,744,746,748,751,754,757,760,764,765,766,768,770,773,775,779,783,784,785,786,789,792,795,797,798,799,801,803,806,809,812,815,819,820,821,823,825,828,829,831,833,834,836,838,840,842,843,844,845,847,849,852,855,858,861,865,866,867,869,871,874,877,878,879,882,883,884,886,888,891,892,893,896,867,898,900,902,905,908,911,914,918,919,920,922,924,927,928,930,932,934,936,938,939,940,942,944,947,950,953,956,960,961,962,964,966,969,975,973,977,978,979,981,983,987,988,989,991,993,995,999,1003,1004,1005,1007,1009,1012,1015,1016,1017,1020,1021,1022,1024,1026,1029,1031,1035,1039,1040,1041,1043,1045,1048,1049,1051,1053,1055,1057,1059,1060,1061

    for x in range(len(ATOMS)):
        for i in range(1, 60):
            print("""
S2CM ...
NH_ATOMS=x,x+2
HEAVY_ATOMS=heavy_atoms_nh
LABEL=S2nh-%d
R_EFF=0.10
PREFACTOR_A=0.80
EXPONENT_B=1.0
OFFSET_C=0.10
N_I=1
NOPBC
... S2CM
""" % (x,i)file=f)

I am just learning Python and Linux this summer as I am getting involved with computational biochemistry research, so if there is a simple fix I am very sorry for the waste of time, and I appreciate any and all time and attention to this matter.

python3 --version
Python 3.6.13

Thank You,
David Cummins
Masters Student at Western Washington University
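For what it's worth, the SyntaxError itself is a missing comma: in Python 3, the interpolated string and the keyword argument must be separated, as in print(template % value, file=f). Note also that the template contains a single %d placeholder, so passing the two values (x, i) would raise a TypeError at runtime once the comma is fixed. A syntactically valid version of the final call (keeping the template unchanged, so only i is interpolated) would be:

            print("""
S2CM ...
NH_ATOMS=x,x+2
HEAVY_ATOMS=heavy_atoms_nh
LABEL=S2nh-%d
R_EFF=0.10
PREFACTOR_A=0.80
EXPONENT_B=1.0
OFFSET_C=0.10
N_I=1
NOPBC
... S2CM
""" % i, file=f)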
Linux cat and less output for my text file is different from gedit and other gnome editors
I redirected ri's Ruby Array doc into a file, but it didn't look good in gedit, while the text looks just fine in the CLI. This is how my file looks in terminal pagers; everything is fine here:

= Array#to_param (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_param()
------------------------------------------------------------------------------
Calls to_param on all its elements and joins the result with slashes. This is
used by url_for in Action Pack.

= Array#to_query (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_query(key)
------------------------------------------------------------------------------
Converts an array into a string suitable for use as a URL query string, using
the given key as the param name.

['Rails', 'coding'].to_query('hobbies') # => "hobbies%5B%5D=Rails&hobbies%5B%5D=coding"

= Array#to_s (from ruby core)
------------------------------------------------------------------------------
to_s()
------------------------------------------------------------------------------
(from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_s(format = :default)
------------------------------------------------------------------------------

= Array#to_sentence (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_sentence(options = {})
------------------------------------------------------------------------------

But when I open it in gedit or other gnome editors, this is how it looks; some specific words appear in an absurd format. Any suggestions or help will be appreciated.

= AArrrraayy##ttoo__ffoorrmmaatttteedd__ss (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_formatted_s(format = :default)
------------------------------------------------------------------------------
Extends Array#to_s to convert a collection of elements into a comma separated
id list if :db argument is given as the format.

Blog.all.to_formatted_s(:db)  # => "1,2,3"
Blog.none.to_formatted_s(:db) # => "null"
[1,2].to_formatted_s          # => "[1, 2]"

= AArrrraayy##ttoo__ppaarraamm (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_param()
------------------------------------------------------------------------------
Calls to_param on all its elements and joins the result with slashes. This is
used by url_for in Action Pack.

= AArrrraayy##ttoo__qquueerryy (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_query(key)
------------------------------------------------------------------------------
Converts an array into a string suitable for use as a URL query string, using
the given key as the param name.

['Rails', 'coding'].to_query('hobbies') # => "hobbies%5B%5D=Rails&hobbies%5B%5D=coding"

= AArrrraayy##ttoo__ss (from ruby core)
------------------------------------------------------------------------------
to_s()
------------------------------------------------------------------------------
(from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_s(format = :default)
------------------------------------------------------------------------------

= AArrrraayy##ttoo__sseenntteennccee (from gem activesupport-5.1.3)
------------------------------------------------------------------------------
to_sentence(options = {})
------------------------------------------------------------------------------

I tried opening it on every system, but the text is still messed up everywhere except in terminal tools like cat or less. Does it have anything to do with text encoding?
ri outputs documentation formatted as ASCII text with overstriking. You can check this by running the file command on your file. Some parts of the documentation are bolded, which is represented as a character, a backspace (^H), and the same character again. It seems that gedit and other gnome editors ignore these backspace characters, leaving the actual character repeated.

You can output just the plain ASCII like this:

ri Array | col -bx > array.txt

An answer with more information about nroff formatting: https://unix.stackexchange.com/a/274795
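You can make the hidden control characters visible with cat -v, which prints each backspace as ^H; a bolded "A" then shows up literally as A^HA:

ri Array | cat -v | less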
Behave: Writing a Scenario Outline with dynamic examples
Gherkin / Behave Examples

Gherkin syntax features test automation using examples:

Feature: Scenario Outline (tutorial04)

  Scenario Outline: Use Blender with <thing>
    Given I put "<thing>" in a blender
    When I switch the blender on
    Then it should transform into "<other thing>"

    Examples: Amphibians
      | thing         | other thing |
      | Red Tree Frog | mush        |
      | apples        | apple juice |

    Examples: Consumer Electronics
      | thing        | other thing |
      | iPhone       | toxic waste |
      | Galaxy Nexus | toxic waste |

The test suite would run four times, once for each example.

My problem

How can I test using confidential data in the Examples section? For example, I would like to test an internal API with user ids or SSN numbers, without keeping the data hard-coded in the feature file. Is there a way to load the Examples dynamically from an external source?

Update: I opened a GitHub issue on the behave project.
I've come up with another solution (behave-1.2.6): I managed to dynamically create examples for a Scenario Outline by using before_feature.

Given a feature file (x.feature):

Feature: Verify squared numbers

  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Static
      | number | result |
      | 1      | 1      |
      | 2      | 4      |
      | 3      | 9      |
      | 4      | 16     |

  # Use the tag to mark this outline
  @dynamic
  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Dynamic
      | number | result |
      | .      | .      |

And the steps file (steps/x.py):

from behave import step

@step('the {number:d} squared is {result:d}')
def step_impl(context, number, result):
    assert number*number == result

The trick is to use before_feature in environment.py, as at that point behave has already parsed the examples tables of the scenario outlines but hasn't generated the scenarios from the outlines yet:

import behave
import copy

def before_feature(context, feature):
    features = (s for s in feature.scenarios
                if type(s) == behave.model.ScenarioOutline
                and 'dynamic' in s.tags)
    for s in features:
        for e in s.examples:
            orig = copy.deepcopy(e.table.rows[0])
            e.table.rows = []
            for num in range(1, 5):
                n = copy.deepcopy(orig)
                # This relies on knowing that the table has two columns.
                n.cells = ['{}'.format(num), '{}'.format(num*num)]
                e.table.rows.append(n)

This will only operate on Scenario Outlines that are tagged with @dynamic. The result is:

behave -k --no-capture

Feature: Verify squared numbers # features/x.feature:1

  Scenario Outline: Verify square for 1 -- @1.1 Static  # features/x.feature:8
    Then the 1 squared is 1  # features/steps/x.py:3

  Scenario Outline: Verify square for 2 -- @1.2 Static  # features/x.feature:9
    Then the 2 squared is 4  # features/steps/x.py:3

  Scenario Outline: Verify square for 3 -- @1.3 Static  # features/x.feature:10
    Then the 3 squared is 9  # features/steps/x.py:3

  Scenario Outline: Verify square for 4 -- @1.4 Static  # features/x.feature:11
    Then the 4 squared is 16  # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 1 -- @1.1 Dynamic  # features/x.feature:19
    Then the 1 squared is 1  # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 2 -- @1.2 Dynamic  # features/x.feature:19
    Then the 2 squared is 4  # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 3 -- @1.3 Dynamic  # features/x.feature:19
    Then the 3 squared is 9  # features/steps/x.py:3

  @dynamic
  Scenario Outline: Verify square for 4 -- @1.4 Dynamic  # features/x.feature:19
    Then the 4 squared is 16  # features/steps/x.py:3

1 feature passed, 0 failed, 0 skipped
8 scenarios passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.005s

This relies on having an Examples table with the same shape as the final table; in my example, two columns. I also don't fuss with creating new behave.model.Row objects; I just copy the one from the table and update it. For extra ugliness, if you're using a file, you can put the file name in the Examples table.
Got here looking for something else, but since I've been in a similar situation with Cucumber before, maybe someone will also end up at this question looking for a possible solution.

My approach to this problem is to use BDD variables that I can later handle at runtime in my step_definitions. In my Python code I can check the value of the Gherkin variable and map it to what's needed. For this example:

Scenario Outline: Use Blender with <thing>
  Given I put "<thing>" in a blender
  When I switch the blender on
  Then it should transform into "<other thing>"

  Examples: Amphibians
    | thing         | other thing            |
    | Red Tree Frog | mush                   |
    | iPhone        | data.iPhone.secret_key |  # can use .yaml syntax here as well

Would translate to such step_def code:

@then('it should transform into "{other_thing}"')
def step_then_should_transform_into(context, other_thing):
    if other_thing == BddVariablesEnum.SECRET_KEY:
        basic_actions.load_secrets(context, other_thing)

So all you have to do is have a well-defined DSL layer.
Regarding the issue of using SSN numbers in testing, I'd just use fake SSNs and not worry that I'm leaking people's private information.

OK, but what about the larger issue? You want to use a scenario outline with examples that you cannot put in your feature file. Whenever I've run into this problem, what I did was give a description of the data I need and let the step implementation either create the actual data set used for testing or fetch the data set from an existing test database.

Scenario Outline: Accessing the admin interface
  Given a user who <status> an admin has logged in
  Then the user <ability> see the admin interface

  Examples: Users
    | status | ability |
    | is     | can     |
    | is not | cannot  |

There's no need to show any details about the user in the feature file. The step implementation is responsible for either creating or fetching the appropriate type of user depending on the value of status, along the lines of the sketch below.
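A minimal sketch of such step implementations (create_user, log_in, and can_see_admin_interface are hypothetical helpers standing in for your test fixtures):

from behave import given, then

@given('a user who {status} an admin has logged in')
def step_login_user(context, status):
    # Build or fetch a user of the described type; the concrete data
    # never appears in the feature file.
    context.user = create_user(admin=(status == 'is'))  # hypothetical helper
    log_in(context, context.user)                       # hypothetical helper

@then('the user {ability} see the admin interface')
def step_check_admin_access(context, ability):
    expected = (ability == 'can')
    assert can_see_admin_interface(context.user) == expected  # hypothetical helper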
Create Bayesian Network and learn parameters with Python3.x [closed]
I'm searching for the most appropriate tool for Python 3.x on Windows to create a Bayesian Network, learn its parameters from data, and perform inference. The network structure, which I want to define myself, is taken from this paper. All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as a Mixture of Gaussians. The authors use the Expectation-Maximization algorithm to learn the parameters of the conditional probability tables and the Junction-Tree algorithm to compute exact inference. As I understand it, all of this is realized in MatLab with the Bayes Net Toolbox by Murphy.

I tried to find something similar in Python, and here are my results:

Python Bayesian Network Toolbox (http://sourceforge.net/projects/pbnt.berlios/, http://pbnt.berlios.de/): the website doesn't work and the project doesn't seem to be supported.

BayesPy (https://github.com/bayespy/bayespy): I think this is what I actually need, but I fail to find examples similar to my case to understand how to approach construction of the network structure.

PyMC seems to be a powerful module, but I have problems importing it on Windows 64, Python 3.3. I get this error when I install the development version:

WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.

UPDATE:

libpgm (http://pythonhosted.org/libpgm/): exactly what I need, unfortunately not supported by Python 3.x.

pgmpy (https://github.com/pgmpy/pgmpy/): a very interesting, actively developing library. Unfortunately, continuous variables and learning from data are not supported yet.

Any advice and concrete examples will be highly appreciated.
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Try the bnlearn library; it contains many functions to learn parameters from data and perform the inference.

pip install bnlearn

Your use case would look like this:

# Import the library
import bnlearn

# Define the network structure
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar')]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# DAG is stored in an adjacency matrix
print(DAG['adjmat'])

# target          task   size  lat var  ...  side graspable  GrasPose  latvar
# source                                ...
# task           False   True    False  ...            True      True   False
# size           False  False    False  ...           False      True   False
# lat var        False   True    False  ...           False     False   False
# fill level     False  False    False  ...           False      True   False
# object shape   False  False    False  ...           False      True   False
# side graspable False  False    False  ...           False      True   False
# GrasPose       False  False    False  ...           False     False    True
# latvar         False  False    False  ...           False     False   False
#
# [8 rows x 8 columns]

# No CPDs are in the DAG yet. Let's see what happens if we print it.
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

# Plot DAG. Note that it can be differently orientated if you re-make the plot.
bnlearn.plot(DAG)

Now we need the data to learn its parameters. Suppose these are stored in your df. The variable names in the data file must be present in the DAG.

# Read data
import pandas as pd
df = pd.read_csv('path_to_your_data.csv')

# Learn the parameters and store CPDs in the DAG. Use the methodtype you desire.
# Options are maximumlikelihood or bayes.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')

# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)

# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})

Below is a working example with a demo dataset (sprinkler). You can play around with this.

# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
#      Cloudy  Sprinkler  Rain  Wet_Grass
# 0         0          0     0          0
# 1         1          0     1          1
# 2         0          1     0          1
# 3         1          1     1          1
# 4         1          1     1          1
# ..      ...        ...   ...        ...
# 995       1          0     1          1
# 996       1          0     1          1
# 997       1          0     1          1
# 998       0          0     0          0
# 999       0          1     1          1
#
# [1000 rows x 4 columns]

# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)

# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)

# Print the learned CPDs
bnlearn.print_CPD(DAG)

# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]

# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy       | Cloudy(0)          | Cloudy(1)          |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy  | Cloudy(0)          | Cloudy(1)           |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695  |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain         | Rain(0)            | Rain(0)             | Rain(1)             | Rain(1)             |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler    | Sprinkler(0)       | Sprinkler(1)        | Sprinkler(0)        | Sprinkler(1)        |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663  | 0.7441176470588236  | 0.6208955223880597  |
# +--------------+--------------------+---------------------+---------------------+---------------------+

# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})

# +--------------+------------------+
# | Wet_Grass    | phi(Wet_Grass)   |
# +==============+==================+
# | Wet_Grass(0) | 0.2559           |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441           |
# +--------------+------------------+

print(q1.values)
# array([0.25588235, 0.74411765])

More examples can be found on the documentation pages of bnlearn or in the blog.
I was looking for a similar library, and I found that pomegranate is a good one. Thanks, James Atwood. Here is an example of how to use it:

from pomegranate import *
import numpy as np

# toy dataset: each row is one observation of three variables
mydb = np.array([[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,3,8],[2,3,8],[1,2,4]])

# learn both structure and parameters from the samples
bnet = BayesianNetwork.from_samples(mydb)

print(bnet.node_count())
print(bnet.probability([[1,2,3]]))
print(bnet.probability([[1,2,8]]))
For pymc's g++ problem, I highly recommend getting a g++ installation done; it hugely boosts the sampling process. Otherwise you will have to live with this warning and sit there for an hour for a 2000-sample run. The way to get the warning fixed is:

1. Get g++ installed: download Cygwin and install g++ (you can google that). To check this, just go to "cmd" and type "g++"; if it says "require input file", great, you got g++ installed.

2. Install the Python packages mingw and libpython.

3. Install the Python package theano.

This should get the problem fixed. I am currently working on the same problem as you, good luck!
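On a conda-based Windows setup, the whole sequence would look something like this (a sketch only; the package names follow the steps above, and your channels and versions may differ):

g++ --version
conda install mingw libpython
pip install theano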
Late to the party, as always, but I've wrapped up the BayesServer Java API using JPype; it might not have all the functionality that you need, but you would create the above network using something like:

from bayesianpy.network import Builder as builder
import bayesianpy.network

nt = bayesianpy.network.create_network()

# where df is your dataframe
task = builder.create_discrete_variable(nt, df, 'task')

size = builder.create_continuous_variable(nt, 'size')
grasp_pose = builder.create_continuous_variable(nt, 'GraspPose')

builder.create_link(nt, size, grasp_pose)
builder.create_link(nt, task, grasp_pose)

for v in ['fill level', 'object shape', 'side graspable']:
    va = builder.create_discrete_variable(nt, df, v)
    builder.create_link(nt, va, grasp_pose)
    builder.create_link(nt, task, va)

# write df to data store
with bayesianpy.data.DataSet(df, bayesianpy.utils.get_path_to_parent_dir(__file__), logger) as dataset:
    model = bayesianpy.model.NetworkModel(nt, logger)
    model.train(dataset)

    # to query the model multi-threaded
    results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()], append_to_df=False)

I'm not affiliated with Bayes Server, and the Python wrapper is not 'official' (you can use the Java API via Python directly). My wrapper makes some assumptions and places limitations on functions that I don't use very much. The repo is here: github.com/morganics/bayesianpy