Varnish histogram showing late hits - varnish

Is there an explanation for so many hits arriving so late?
This is just after a Varnish reload (Debian 10, Varnish 6.1.1).
1:10, n = 1485, d = 1
|
200_ |
|
||
||
||
150_ ||
||
||#
| #|# #
| | ### #
100_ | | #### #
| | #### #
|||| #### ##
|||| #### ##
|||| #### ##
50_ ||||| #### ##
||||| #### |##
||||| #### ###
||||| # #### ###
||||||## |####|###
+-------+-------+-------+-------+-------+-------+-------+-------+-------
|1e-6 |1e-5 |1e-4 |1e-3 |1e-2 |1e-1 |1e0 |1e1 |1e2
Seeing 2 or 3 happens; I assumed that is when requests arrive for the same URL while a backend fetch is in progress. Is that right?
I suspect these could actually be grace hits, with the hit recorded at the time the background fetch completes (is_bgfetch). Is that true?
But I have never seen so many before.
I can't spot any hits that took more than 4 ms (nginx logs this at the frontend), so I shouldn't be too worried, but this happened just after a big refactoring of a complex setup, so I'm freaking out.
Thanks

The default profile is responsetime, so it just looks like you are seeing a lot of misses and going to the backend a lot. (In varnishhist, | marks a hit and # marks a miss.)
By default, the grouping is request, and grace is going to trigger an asynchronous fetch that will not delay the Process time, so it can't be that.
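To compare the nginx timings with the histogram, it helps to remember that the x-axis is logarithmic, with one labelled tick per power of ten (in seconds). The sketch below is plain Python, not part of Varnish, and decade_bucket is a hypothetical helper name; it only shows which decade a given response time falls into.

```python
import math


def decade_bucket(seconds: float) -> int:
    """Return floor(log10(seconds)), i.e. the tick to the left of the value."""
    if seconds <= 0:
        raise ValueError("response time must be positive")
    return math.floor(math.log10(seconds))


# A 4 ms response lands between the 1e-3 and 1e-2 ticks.
print(decade_bucket(0.004))    # -3
# A 50 microsecond response lands between the 1e-5 and 1e-4 ticks.
print(decade_bucket(0.00005))  # -5
```

On that scale, anything nginx logs at 4 ms or less sits to the left of the 1e-2 tick.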


Linux memory allocation on Rstudio/Rstudio Server

I am trying to do clustering with CLARA in RStudio on Linux, and I have a very large dataset.
However, it seems there is not enough memory for the whole dataset.
## Estimating the number of clusters ----
fviz_nbclust(df, clara, method = "silhouette", k.max = 15)
It showed me this:
Error: cannot allocate vector of size 339.8 GB
I tried all of the following and it still didn't work. memory.limit is also specific to Windows (I still gave it a try, though).
# devtools::install_github("krlmlr/ulimit")
# gc()
# memory.limit(9999999999)
#
#
# install.packages("devtools", dependencies = TRUE)
# devtools::install_github("krlmlr/ulimit")
# ulimit::memory_limit(2000)
#
# devtools::install_github("jeroen/unix")
#
#
# if(.Platform$OS.type == "windows") withAutoprint({
# memory.size()
# memory.size(TRUE)
# memory.limit()
# })
# memory.limit(size=56000)
# memory.size(max = FALSE)
Can somebody help me with this?
Any help would be appreciated!
The error simply means that R cannot allocate a 339.8 GB vector in RAM. Do you have 360 GB of RAM?
If not, you will have to take a sample with dplyr::sample_n() and run the function on a subset of your dataset.
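To get a feel for how small the subset has to be, here is a rough back-of-the-envelope sketch in Python. It rests on two assumptions that the error message does not confirm: that the failed allocation was a pairwise distance object holding n(n-1)/2 doubles, and that R's "GB" is a binary gigabyte (1024^3 bytes). The function names are made up for illustration.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3  # assumption: the "GB" in R's message is a binary gigabyte


def dist_bytes(n_rows: int) -> int:
    """Bytes needed for n_rows*(n_rows-1)/2 pairwise distances stored as doubles."""
    return n_rows * (n_rows - 1) // 2 * BYTES_PER_DOUBLE


def rows_for_bytes(total_bytes: float) -> float:
    """Invert dist_bytes: solve n*(n-1)/2*8 = total_bytes for n."""
    return (1 + math.sqrt(1 + total_bytes)) / 2


n_estimate = rows_for_bytes(339.8 * GIB)
print(round(n_estimate))          # roughly 302,000 rows under these assumptions
print(dist_bytes(5000) / 2**20)   # a 5,000-row sample needs about 95 MiB
```

Because the requirement grows with the square of the row count, halving the sample cuts the memory needed to about a quarter.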

NEO M8T with RTKRCV on Raspberry Pi 3

I am currently doing a university project and my goal is to find the position of a rover in real time.
My set up is the following: 2x NEO M8T boards connected to a Raspberry Pi 3 (updated to latest version GNU/Linux 8).
Both are connected to the Pi because I am not sure that my SiK telemetry radios transmit anything: even in RTK Navi on a laptop I don't get base station data (the radios are matched).
The M8Ts are set to a baud rate of 115200 using u-center (latest version). NMEA messages are turned off and UBX messages are turned on.
I installed the latest version of RTKLIB from tomojitakasu's GitHub on the Pi and ran make in the rtkrcv folder.
I ran chmod +x rtkstart.sh and chmod +x rtkshut.sh, as it wanted permissions.
I started rtkrcv with sudo ./rtkrcv
I get "invalid option value pos-1snrmask" but the program still runs.
I load a conf file that I created, but I don't know if it is correct.
It says "startup script ok" and "rtk server start error", and that's it... nothing else.
The conf file I use is as follows:
# RTKRCV options for RTK (2014/10/24, tyan)
console-passwd =admin
console-timetype =gpst # (0:gpst,1:utc,2:jst,3:tow)
console-soltype =dms # (0:dms,1:deg,2:xyz,3:enu,4:pyl)
console-solflag =1 # (0:off,1:std+2:age/ratio/ns)
## Specify connection type for Rover (1), Base (2) and Correction (3) streams
inpstr1-type =serial # (0:off,1:serial,2:file,3:tcpsvr,4:tcpcli,7:ntripcli,8:ftp,9:http)
inpstr2-type =serial # (0:off,1:serial,2:file,3:tcpsvr,4:tcpcli,7:ntripcli,8:ftp,9:http)
##inpstr3-type =serial # (0:off,1:serial,2:file,3:tcpsvr,4:tcpcli,7:ntripcli,8:ftp,9:http)
## Specify connection parameters for each stream
inpstr1-path = ttyACM0:115200:8n:1off
inpstr2-path = ttyACM1:115200:8n:1off
##inpstr3-path =
## Specify data format for each stream
inpstr1-format =ubx # (0:rtcm2,1:rtcm3,2:oem4,3:oem3,4:ubx,5:ss2,6:hemis,7:skytraq,8:sp3)
inpstr2-format =ubx # (0:rtcm2,1:rtcm3,2:oem4,3:oem3,4:ubx,5:ss2,6:hemis,7:skytraq,8:sp3)
##inpstr3-format = # (0:rtcm2,1:rtcm3,2:oem4,3:oem3,4:ubx,5:ss2,6:hemis,7:skytraq,8:sp3)
## Configure the NMEA string to send to get Base stream. Required for VRS.
inpstr2-nmeareq =off # (0:off,1:latlon,2:single)
inpstr2-nmealat =0 # (deg)
inpstr2-nmealon =0 # (deg)
## Configure where to send the solutions
outstr1-type =off # (0:off,1:serial,2:file,3:tcpsvr,4:tcpcli,6:ntripsvr)
outstr2-type =off # (0:off,1:serial,2:file,3:tcpsvr,4:tcpcli,6:ntripsvr)
## Specify here which stream contains the navigation message.
misc-navmsgsel =corr # (0:all,1:rover,1:base,2:corr)
misc-startcmd =./rtkstart.sh
misc-stopcmd =./rtkshut.sh
## Set the command file to send prior to requesting stream (if required)
file-cmdfile1 =/home/pi/rtklib/app/rtkrcv/gcc/m8t.cmd
file-cmdfile2 =/home/pi/rtklib/app/rtkrcv/gcc/m8t.cmd
## file-cmdfile3 =
pos1-posmode =static # (0:single,1:dgps,2:kinematic,3:static,4:movingbase,5:fixed,6:ppp-kine,7:ppp-static)
pos1-frequency =l1 # (1:l1,2:l1+l2,3:l1+l2+l5)
pos1-soltype =forward # (0:forward,1:backward,2:combined)
pos1-elmask =15 # (deg)
pos1-snrmask_L1 =0 # (dBHz)
pos1-dynamics =off # (0:off,1:on)
pos1-tidecorr =off # (0:off,1:on)
pos1-ionoopt =brdc # (0:off,1:brdc,2:sbas,3:dual-freq,4:est-stec)
pos1-tropopt =saas # (0:off,1:saas,2:sbas,3:est-ztd,4:est-ztdgrad)
pos1-sateph =brdc # (0:brdc,1:precise,2:brdc+sbas,3:brdc+ssrapc,4:brdc+ssrcom)
pos1-exclsats = # (prn ...)
## Set which GNSS to use. 1 is GPS only, 4 is GLONASS only. Add codes for multiple systems. Eg. (1+4)=5 is GPS+GLONASS.
pos1-navsys =7 # (1:gps+2:sbas+4:glo+8:gal+16:qzs+32:comp)
## Ambiguity Resolution mode, set to continuous to obtain fixed solutions
pos2-armode =fix-and-hold # (0:off,1:continuous,2:instantaneous,3:fix-and-hold)
pos2-gloarmode =off # (0:off,1:on,2:autocal)
pos2-arthres =3
pos2-arlockcnt =0
pos2-arelmask =0 # (deg)
pos2-aroutcnt =5
pos2-arminfix =10
pos2-slipthres =0.05 # (m)
pos2-maxage =30 # (s)
pos2-rejionno =30 # (m)
pos2-niter =1
pos2-baselen =0 # (m)
pos2-basesig =0 # (m)
out-solformat =llh # (0:llh,1:xyz,2:enu,3:nmea)
out-outhead =on # (0:off,1:on)
out-outopt =off # (0:off,1:on)
out-timesys =gpst # (0:gpst,1:utc,2:jst)
out-timeform =tow # (0:tow,1:hms)
out-timendec =3
out-degform =deg # (0:deg,1:dms)
out-fieldsep =
out-height =ellipsoidal # (0:ellipsoidal,1:geodetic)
out-geoid =internal # (0:internal,1:egm96,2:egm08_2.5,3:egm08_1,4:gsi2000)
out-solstatic =all # (0:all,1:single)
out-nmeaintv1 =0 # (s)
out-nmeaintv2 =0 # (s)
out-outstat =off # (0:off,1:state,2:residual)
stats-errratio =100
stats-errphase =0.003 # (m)
stats-errphaseel =0.003 # (m)
stats-errphasebl =0 # (m/10km)
stats-errdoppler =1 # (Hz)
stats-stdbias =30 # (m)
stats-stdiono =0.03 # (m)
stats-stdtrop =0.3 # (m)
stats-prnaccelh =1 # (m/s^2)
stats-prnaccelv =0.1 # (m/s^2)
stats-prnbias =0.0001 # (m)
stats-prniono =0.001 # (m)
stats-prntrop =0.0001 # (m)
stats-clkstab =5e-12 # (s/s)
ant1-postype =llh # (0:llh,1:xyz,2:single,3:posfile,4:rinexhead,5:rtcm)
ant1-pos1 =0 # (deg|m)
ant1-pos2 =0 # (deg|m)
ant1-pos3 =0 # (m|m)
ant1-anttype =
ant1-antdele =0 # (m)
ant1-antdeln =0 # (m)
ant1-antdelu =0 # (m)
Please, help!!
Regards
Arnaudov

Behave: Writing a Scenario Outline with dynamic examples

Gherkin / Behave Examples
Gherkin syntax features test automation using examples:
Feature: Scenario Outline (tutorial04)
Scenario Outline: Use Blender with <thing>
Given I put "<thing>" in a blender
When I switch the blender on
Then it should transform into "<other thing>"
Examples: Amphibians
| thing | other thing |
| Red Tree Frog | mush |
| apples | apple juice |
Examples: Consumer Electronics
| thing | other thing |
| iPhone | toxic waste |
| Galaxy Nexus | toxic waste |
The test suite would run four times, once for each example.
My problem
How can I test using confidential data in the Examples section? For example, I would like to test an internal API with user ids or SSN numbers, without keeping the data hard coded in the feature file.
Is there a way to load the Examples dynamically from an external source?
Update: Opened a github issue on the behave project.
I've come up with another solution (behave-1.2.6):
I managed to dynamically create examples for a Scenario Outline by using before_feature.
Given a feature file (x.feature):
Feature: Verify squared numbers

  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Static
      | number | result |
      | 1      | 1      |
      | 2      | 4      |
      | 3      | 9      |
      | 4      | 16     |
  # Use the tag to mark this outline
  @dynamic
  Scenario Outline: Verify square for <number>
    Then the <number> squared is <result>

    Examples: Dynamic
      | number | result |
      | .      | .      |
And the steps file (steps/x.py):
from behave import step

@step('the {number:d} squared is {result:d}')
def step_impl(context, number, result):
    assert number * number == result
The trick is to use before_feature in environment.py as it has already parsed the examples tables to the scenario outlines, but hasn't generated the scenarios from the outline yet.
import copy

import behave.model


def before_feature(context, feature):
    # Select only the Scenario Outlines tagged @dynamic.
    outlines = (s for s in feature.scenarios
                if isinstance(s, behave.model.ScenarioOutline) and 'dynamic' in s.tags)
    for s in outlines:
        for e in s.examples:
            # Keep one existing row as a template, then rebuild the table.
            orig = copy.deepcopy(e.table.rows[0])
            e.table.rows = []
            for num in range(1, 5):
                n = copy.deepcopy(orig)
                # This relies on knowing that the table has two columns.
                n.cells = ['{}'.format(num), '{}'.format(num * num)]
                e.table.rows.append(n)
This will only operate on Scenario Outlines that are tagged with @dynamic.
The result is:
behave -k --no-capture
Feature: Verify squared numbers # features/x.feature:1
Scenario Outline: Verify square for 1 -- @1.1 Static # features/x.feature:8
Then the 1 squared is 1 # features/steps/x.py:3
Scenario Outline: Verify square for 2 -- @1.2 Static # features/x.feature:9
Then the 2 squared is 4 # features/steps/x.py:3
Scenario Outline: Verify square for 3 -- @1.3 Static # features/x.feature:10
Then the 3 squared is 9 # features/steps/x.py:3
Scenario Outline: Verify square for 4 -- @1.4 Static # features/x.feature:11
Then the 4 squared is 16 # features/steps/x.py:3
@dynamic
Scenario Outline: Verify square for 1 -- @1.1 Dynamic # features/x.feature:19
Then the 1 squared is 1 # features/steps/x.py:3
@dynamic
Scenario Outline: Verify square for 2 -- @1.2 Dynamic # features/x.feature:19
Then the 2 squared is 4 # features/steps/x.py:3
@dynamic
Scenario Outline: Verify square for 3 -- @1.3 Dynamic # features/x.feature:19
Then the 3 squared is 9 # features/steps/x.py:3
@dynamic
Scenario Outline: Verify square for 4 -- @1.4 Dynamic # features/x.feature:19
Then the 4 squared is 16 # features/steps/x.py:3
1 feature passed, 0 failed, 0 skipped
8 scenarios passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.005s
This relies on having an Examples table with the correct shape as the final table; in my example, with two columns. I also don't fuss with creating new behave.model.Row objects; I just copy one from the table and update it. For extra ugliness, if you're using a file, you can put the file name in the Examples table.
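The same trick can be pointed at an external source instead of range(1, 5). The following is only a sketch under the same assumptions as the answer (behave 1.2.6 internals, where an Examples object exposes table.rows and each row has a cells list); fill_examples_from_csv is a hypothetical helper name, and the demo uses SimpleNamespace stand-ins rather than real behave objects so that it runs on its own.

```python
import copy
import csv
import io
from types import SimpleNamespace


def fill_examples_from_csv(example, csv_text):
    """Replace the placeholder rows of one Examples table with rows from CSV text.

    `example` must expose `.table.rows`, and each row must have a `.cells`
    list of strings, as in the before_feature hook above.
    """
    template = copy.deepcopy(example.table.rows[0])
    example.table.rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:  # skip blank lines
            continue
        row = copy.deepcopy(template)
        row.cells = [cell.strip() for cell in record]
        example.table.rows.append(row)


# Stand-in for a parsed Examples table with one placeholder row.
demo = SimpleNamespace(table=SimpleNamespace(rows=[SimpleNamespace(cells=['.', '.'])]))
fill_examples_from_csv(demo, "5,25\n6,36\n")
print([row.cells for row in demo.table.rows])  # [['5', '25'], ['6', '36']]
```

Inside before_feature, the CSV text would come from wherever the confidential data lives (a file outside the repository, for instance), so nothing sensitive needs to appear in the feature file.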
Got here looking for something else, but since I've been in similar situation with Cucumber before, maybe someone will also end up at this question, looking for a possible solution. My approach to this problem is to use BDD variables that I can later handle at runtime in my step_definitions. In my python code I can check what is the value of the Gherkin variable and map it to what's needed.
For this example:
Scenario Outline: Use Blender with <thing>
Given I put "<thing>" in a blender
When I switch the blender on
Then it should transform into "<other thing>"
Examples: Amphibians
| thing | other thing |
| Red Tree Frog | mush |
| iPhone | data.iPhone.secret_key | # can use .yaml syntax here as well
Would translate to such step_def code:
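from behave import then

@then('it should transform into "{other_thing}"')
def step_then_should_transform_into(context, other_thing):
    # BddVariablesEnum and basic_actions are project-specific helpers (not shown).
    if other_thing == BddVariablesEnum.SECRET_KEY:
        basic_actions.load_secrets(context, other_thing)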
So all you have to do is to have well defined DSL layer.
Regarding the issue of using SSN numbers in testing, I'd just use fake SSNs and not worry that I'm leaking people's private information.
Ok, but what about the larger issue? You want to use a scenario outline with examples that you cannot put in your feature file. Whenever I've run into this problem what I did was to give a description of the data I need and let the step implementation either create the actual data set used for testing or fetch the data set from an existing test database.
Scenario Outline: Accessing the admin interface
Given a user who <status> an admin has logged in
Then the user <ability> see the admin interface
Examples: Users
| status | ability |
| is | can |
| is not | cannot |
There's no need to show any details about the user in the feature file. The step implementation is responsible for either creating or fetching the appropriate type of user depending on the value of status.

Mysql seconds_behind master very high

Hi, we have MySQL master-slave replication: the master is MySQL 5.6 and the slave is MySQL 5.7. Seconds_Behind_Master is 245000. How can I make it catch up faster? Right now it takes more than 6 hours to copy 100 000 seconds.
My slave ram is 128 GB. Below is my my.cnf
[mysqld]
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
innodb_buffer_pool_size = 110G
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
# These are commonly set, remove the # and set as required.
basedir = /usr/local/mysql
datadir = /disk1/mysqldata
port = 3306
#server_id = 3
socket = /var/run/mysqld/mysqld.sock
user=mysql
log_error = /var/log/mysql/error.log
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
join_buffer_size = 256M
sort_buffer_size = 128M
read_rnd_buffer_size = 2M
#copied from old config
#key_buffer = 16M
max_allowed_packet = 256M
thread_stack = 192K
thread_cache_size = 8
query_cache_limit = 1M
#disabling query_cache_size and type, for replication purpose, need to enable it when going live
query_cache_size = 0
#query_cache_size = 64M
#query_cache_type = 1
query_cache_type = OFF
#GroupBy
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
#sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
enforce-gtid-consistency
gtid-mode = ON
log_slave_updates=0
slave_transaction_retries = 100
#replication related changes
server-id = 2
relay-log = /disk1/mysqllog/mysql-relay-bin.log
log_bin = /disk1/mysqllog/binlog/mysql-bin.log
binlog_do_db = brandmanagement
#replicate_wild_do_table=brandmanagement.%
replicate-wild-ignore-table=brandmanagement.t\_gnip\_data\_recent
replicate-wild-ignore-table=brandmanagement.t\_gnip\_data
replicate-wild-ignore-table=brandmanagement.t\_fb\_rt\_data
replicate-wild-ignore-table=brandmanagement.t\_keyword\_tweets
replicate-wild-ignore-table=brandmanagement.t\_gnip\_data\_old
replicate-wild-ignore-table=brandmanagement.t\_gnip\_data\_new
binlog_format=row
report-host=10.125.133.220
report-port=3306
#sync-master-info=1
read-only=1
net_read_timeout = 7200
net_write_timeout = 7200
innodb_flush_log_at_trx_commit = 2
sync_binlog=0
sync_relay_log_info=0
max_relay_log_size=268435456
Lots of possible solutions. But I'll go with the simplest one. Have you got enough network bandwidth to send all changes over the network? You're using "row" binlog, which may be good in case of random, unindexed updates. But if you're changing a lot of data using indexes only, then "mixed" binlog may be better.
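For a sense of scale, the figures in the question allow a rough catch-up estimate. This is a sketch under stated assumptions, not a measurement: it reads "6 hours to copy 100 000 seconds" as the slave replaying 100 000 seconds of master history per 6 hours of wall-clock time, and it assumes the master keeps writing at a steady rate meanwhile. The function name is made up for illustration.

```python
def catch_up_seconds(lag_s: float, applied_s: float, wall_s: float) -> float:
    """Estimate the wall-clock time until replication lag reaches zero.

    applied_s seconds of master history are replayed every wall_s seconds,
    while the master keeps producing one second of history per second.
    """
    net_rate = applied_s / wall_s - 1.0  # lag removed per wall-clock second
    if net_rate <= 0:
        raise ValueError("the replica is not catching up at this rate")
    return lag_s / net_rate


eta = catch_up_seconds(lag_s=245_000, applied_s=100_000, wall_s=6 * 3600)
print(eta / 3600)  # about 18.75 hours under these assumptions
```

If the lag itself is what shrinks by 100 000 seconds every 6 hours, the estimate is simply 245 000 / 100 000 × 6, or about 14.7 hours.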

Create Bayesian Network and learn parameters with Python3.x [closed]

I'm searching for the most appropriate tool for python3.x on Windows to create a Bayesian Network, learn its parameters from data and perform the inference.
The network structure I want to define myself as follows:
It is taken from this paper.
All the variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as Mixture of Gaussians.
Authors use Expectation-Maximization algorithm to learn the parameters for conditional probability tables and Junction-Tree algorithm to compute the exact inference.
As I understand it, everything is implemented in MATLAB with Murphy's Bayes Net Toolbox.
I searched for something similar in Python, and here are my results:
Python Bayesian Network Toolbox http://sourceforge.net/projects/pbnt.berlios/ (http://pbnt.berlios.de/). Web-site doesn't work, project doesn't seem to be supported.
BayesPy https://github.com/bayespy/bayespy
I think this is what I actually need, but I can't find examples similar to my case to understand how to construct the network structure.
PyMC seems to be a powerful module, but I have problems importing it on 64-bit Windows with Python 3.3. I get this error when I install the development version:
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
UPDATE:
libpgm (http://pythonhosted.org/libpgm/). Exactly what I need; unfortunately it does not support Python 3.x.
pgmpy (https://github.com/pgmpy/pgmpy/): a very interesting, actively developed library. Unfortunately, continuous variables and learning from data are not supported yet.
Any advice and concrete examples will be highly appreciated.
It looks like pomegranate was recently updated to include Bayesian Networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.
Try the bnlearn library, it contains many functions to learn parameters from data and perform the inference.
pip install bnlearn
Your use-case would be like this:
# Import the library
import bnlearn
# Define the network structure
edges = [('task', 'size'),
('lat var', 'size'),
('task', 'fill level'),
('task', 'object shape'),
('task', 'side graspable'),
('size', 'GrasPose'),
('task', 'GrasPose'),
('fill level', 'GrasPose'),
('object shape', 'GrasPose'),
('side graspable', 'GrasPose'),
('GrasPose', 'latvar'),
]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# DAG is stored in adjacency matrix
print(DAG['adjmat'])
# target task size lat var ... side graspable GrasPose latvar
# source ...
# task False True False ... True True False
# size False False False ... False True False
# lat var False True False ... False False False
# fill level False False False ... False True False
# object shape False False False ... False True False
# side graspable False False False ... False True False
# GrasPose False False False ... False False True
# latvar False False False ... False False False
#
# [8 rows x 8 columns]
# No CPDs are in the DAG. Lets see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot DAG. Note that it can be differently orientated if you re-make the plot.
bnlearn.plot(DAG)
Now we need the data to learn its parameters. Suppose these are stored in your df. The variable names in the data-file must be present in the DAG.
# Read data
import pandas as pd
df = pd.read_csv('path_to_your_data.csv')
# Learn the parameters and store the CPDs in the DAG. Use the methodtype you desire. Options are maximumlikelihood or bayes.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')
# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)
# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})
Below is a working example with a demo dataset (sprinkler). You can play around with this.
# Import example dataset
df = bnlearn.import_example('sprinkler')
print(df)
# Cloudy Sprinkler Rain Wet_Grass
# 0 0 0 0 0
# 1 1 0 1 1
# 2 0 1 0 1
# 3 1 1 1 1
# 4 1 1 1 1
# .. ... ... ... ...
# 995 1 0 1 1
# 996 1 0 1 1
# 997 1 0 1 1
# 998 0 0 0 0
# 999 0 1 1 1
# [1000 rows x 4 columns]
# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
('Cloudy', 'Rain'),
('Sprinkler', 'Wet_Grass'),
('Rain', 'Wet_Grass')]
# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)
# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)
# Print the learned CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy | Cloudy(0) | Cloudy(1) |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695 |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain | Rain(0) | Rain(0) | Rain(1) | Rain(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler | Sprinkler(0) | Sprinkler(1) | Sprinkler(0) | Sprinkler(1) |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663 | 0.7441176470588236 | 0.6208955223880597 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
# +--------------+------------------+
# | Wet_Grass | phi(Wet_Grass) |
# +==============+==================+
# | Wet_Grass(0) | 0.2559 |
# +--------------+------------------+
# | Wet_Grass(1) | 0.7441 |
# +--------------+------------------+
print(q1.values)
# array([0.25588235, 0.74411765])
More examples can be found on the documentation pages of bnlearn or in the blog.
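To see what the inference step is doing with those tables, here is a small hand calculation in plain Python, independent of bnlearn. The probabilities are copied (rounded to four decimals) from the printed sprinkler CPDs; the dictionary layout is only an illustration.

```python
# CPD values copied (rounded) from the printed sprinkler tables.
p_cloudy = {0: 0.494, 1: 0.506}

# P(Rain | Cloudy), keyed by cloudy, then rain
p_rain_given_cloudy = {
    0: {0: 0.6518, 1: 0.3482},
    1: {0: 0.3370, 1: 0.6630},
}

# P(Wet_Grass | Rain, Sprinkler), keyed by (rain, sprinkler), then wet_grass.
# Only the column needed for the query is included here.
p_wet = {(1, 0): {0: 0.2559, 1: 0.7441}}

# Every parent of Wet_Grass is observed in the query, so the answer is just
# the matching CPD column; the Cloudy evidence adds nothing further.
print(p_wet[(1, 0)])

# A marginal needs a sum over the unobserved parent: P(Rain=1).
p_rain1 = sum(p_cloudy[c] * p_rain_given_cloudy[c][1] for c in (0, 1))
print(round(p_rain1, 4))
```

The first result matches the q1 table above, which is expected because Wet_Grass is independent of Cloudy once Rain and Sprinkler are known.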
I was looking for a similar library, and I found that the pomegranate is a good one. Thanks James Atwood
Here is an example how to use it.
from pomegranate import *
import numpy as np
mydb=np.array([[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,3,8],[2,3,8],[1,2,4]])
bnet = BayesianNetwork.from_samples(mydb)
print(bnet.node_count())
print(bnet.probability([[1,2,3]]))
print (bnet.probability([[1,2,8]]))
For PyMC's g++ problem, I highly recommend getting g++ installed. It hugely speeds up sampling; otherwise you will have to live with this warning and wait about an hour for a run of 2000 samples.
The way to get the warning fixed is:
1. Install g++: download Cygwin and install g++ through it (you can google that). To check, open "cmd" and type "g++"; if it says "require input file", great, g++ is installed.
2. Install the Python packages mingw and libpython.
3. Install the Python package theano.
This should fix the problem.
I am currently working on the same problem as you, good luck!
Late to the party, as always, but I've wrapped up the BayesServer Java API using JPype; it might not have all the functionality that you need but you would create the above network using something like:
from bayesianpy.network import Builder as builder
import bayesianpy.network
nt = bayesianpy.network.create_network()
# where df is your dataframe
task = builder.create_discrete_variable(nt, df, 'task')
size = builder.create_continuous_variable(nt, 'size')
grasp_pose = builder.create_continuous_variable(nt, 'GraspPose')
builder.create_link(nt, size, grasp_pose)
builder.create_link(nt, task, grasp_pose)
for v in ['fill level', 'object shape', 'side graspable']:
    va = builder.create_discrete_variable(nt, df, v)
    builder.create_link(nt, va, grasp_pose)
    builder.create_link(nt, task, va)
# write df to data store
with bayesianpy.data.DataSet(df, bayesianpy.utils.get_path_to_parent_dir(__file__), logger) as dataset:
    model = bayesianpy.model.NetworkModel(nt, logger)
    model.train(dataset)
    # to query model multi-threaded
    results = model.batch_query(dataset, [bayesianpy.model.QueryModelStatistics()], append_to_df=False)
I'm not affiliated with Bayes Server - and the Python wrapper is not 'official' (you can use the Java API via Python directly). My wrapper makes some assumptions and places limitations on functions that I don't use very much. The repo is here: github.com/morganics/bayesianpy
