I have a large number of priors that can be put into a dictionary. For the sake of simplicity, let's use the following example containing only 3 priors:
d = {'a1': {'name': 'a1', 'lower': 0, 'upper': 10},
     'a2': {'name': 'a2', 'lower': 0, 'upper': 10},
     'a3': {'name': 'a3', 'lower': 0, 'upper': 10}}
I can create these variables manually with:
import pymc3

model = pymc3.Model()
with model:
    a1 = pymc3.Uniform('a1', lower=0, upper=10)
    a2 = pymc3.Uniform('a2', lower=0, upper=10)
    a3 = pymc3.Uniform('a3', lower=0, upper=10)
But the number of priors that I have makes this approach painful. Is there a proper way to define the priors from a dictionary in pymc3? So far, the only automatic solution that I've found is:
list_prior_names = ['a1', 'a2', 'a3']
for prior_name in list_prior_names:
    exec(prior_name + "=pymc3.Uniform(prior_name, lower=d[prior_name]['lower'], upper=d[prior_name]['upper'])")
Is there a better way to proceed?
Similarly, I have a dictionary that gives the relations between these priors and other variables. For the sake of simplicity, let's use the following example defining linear relations between the priors and these new variables:
relations = {'a1': {'b1': 2, 'b2': 4}, 'a2': {'b1': 1}, 'a3': {'b3': 5}}
Once again, I could create b1, b2 and b3 manually with the following code:
b1 = 2*a1 + a2
b2 = 4*a1
b3 = 5*a3
I could use a solution similar to the previous one, but I suspect there is a better way to create b1, b2 and b3 here too.
The code I'm currently using is:
import pymc3

model = pymc3.Model()
obs1, obs2, obs3 = 2, 4, 5
d = {'a1': {'name': 'a1', 'lower': 0, 'upper': 10},
     'a2': {'name': 'a2', 'lower': 0, 'upper': 10},
     'a3': {'name': 'a3', 'lower': 0, 'upper': 10}}
with model:
    list_prior_names = ['a1', 'a2', 'a3']
    for prior_name in list_prior_names:
        exec(prior_name + "=pymc3.Uniform(prior_name, lower=d[prior_name]['lower'], upper=d[prior_name]['upper'])")
    b1 = 2*a1 + a2
    b2 = 4*a1
    b3 = 5*a3
    m1 = pymc3.Normal('M1', mu=b1, sd=0.1, observed=obs1)
    m2 = pymc3.Normal('M2', mu=b2, sd=0.1, observed=obs2)
    m3 = pymc3.Normal('M3', mu=b3, sd=0.1, observed=obs3)
    trace = pymc3.sample(1000)
If anyone has an idea of a proper way to create a1, a2, a3, b1, b2 and b3, I'd be grateful.
In fact, pymc3 can do its calculations with variables that are stored in dictionaries. I've therefore written the following code to solve my problem. I'm posting it here in case someone has to deal with the same question some day.
import pymc3

model = pymc3.Model()
obs1, obs2, obs3 = 2, 4, 5
d = {'a1': {'name': 'a1', 'lower': 0, 'upper': 10},
     'a2': {'name': 'a2', 'lower': 0, 'upper': 10},
     'a3': {'name': 'a3', 'lower': 0, 'upper': 10}}
relations = {'b1': {'a1': 2, 'a2': 1}, 'b2': {'a1': 4}, 'b3': {'a3': 5}}
correspondances_dict = {'b1': {'random_var_name': 'm1', 'observation': obs1},
                        'b2': {'random_var_name': 'm2', 'observation': obs2},
                        'b3': {'random_var_name': 'm3', 'observation': obs3}}
with model:
    # One Uniform prior per entry of d, keyed by its name
    priors = {prior_name: pymc3.Uniform(prior_name,
                                        lower=d[prior_name]['lower'],
                                        upper=d[prior_name]['upper'])
              for prior_name in d}
    # Each intermediate variable is a linear combination of priors
    intermediate_vars = {intermediate_var: sum(coeff * priors[prior_name]
                                               for prior_name, coeff in relations[intermediate_var].items())
                         for intermediate_var in relations}
    # One observed Normal per intermediate variable
    observed_vars = {correspondances_dict[intermediate_var]['random_var_name']:
                     pymc3.Normal(correspondances_dict[intermediate_var]['random_var_name'],
                                  mu=intermediate_vars[intermediate_var],
                                  sd=0.1,
                                  observed=correspondances_dict[intermediate_var]['observation'])
                     for intermediate_var in intermediate_vars}
    trace = pymc3.sample(1000)
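If the relations are kept in the prior-keyed layout from the question (`{'a1': {'b1': 2, 'b2': 4}, ...}`), they can be inverted into the intermediate-keyed layout the solution uses, and the weighted sum can live in one small helper. This is a minimal sketch in plain Python; `invert_relations` and `linear_combination` are hypothetical helper names, and plain floats stand in for the pymc3 priors so the arithmetic can be checked without a model.

```python
from collections import defaultdict


def invert_relations(by_prior):
    """Turn {prior: {intermediate: coeff}} into {intermediate: {prior: coeff}}."""
    by_intermediate = defaultdict(dict)
    for prior_name, targets in by_prior.items():
        for intermediate_name, coeff in targets.items():
            by_intermediate[intermediate_name][prior_name] = coeff
    return dict(by_intermediate)


def linear_combination(coeffs, values):
    """Sum of coeff * values[name]; works for numbers or any objects supporting * and +."""
    return sum(coeff * values[name] for name, coeff in coeffs.items())


# Relations in the prior-keyed layout from the question
relations_by_prior = {'a1': {'b1': 2, 'b2': 4}, 'a2': {'b1': 1}, 'a3': {'b3': 5}}
relations = invert_relations(relations_by_prior)
print(relations)

# Plain floats stand in for the priors to check the arithmetic
values = {'a1': 1.0, 'a2': 2.0, 'a3': 0.5}
b = {name: linear_combination(coeffs, values) for name, coeffs in relations.items()}
print(b)
```

Inside `with model:` the same helper would be called with the `priors` dict instead of `values`, e.g. `intermediate_vars = {name: linear_combination(coeffs, priors) for name, coeffs in relations.items()}`, which builds the same sums as the dictionary comprehension in the solution.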
I have code that calculates some values and contains a lot of hardcoded values and duplication. I just need help arranging it so that it doesn't look messy.
This is not my actual code, but it shows the basic structure:
def my_code(x):
    a = [func1(x), func2(x), func3(x)]
    return a

def func1(x):
    func1_x = ...  # calculation using x formula
    func1_dict = {}
    func1_dict["name"] = ...  # some f1 hardcoded name
    func1_dict["school"] = "some f1 hardcoded value"
    func1_dict["school_address"] = "some f1 hardcoded value"
    func1_dict["score_in_written"] = f" func1 student scored {func1_x} percentage "
    func1_dict["score_in_perc"] = func1_x
    return func1_dict

def func2(x):
    func2_x = ...  # some calculation using y formula
    func2_dict = {}
    func2_dict["name"] = ...  # some f2 hardcoded name
    func2_dict["school"] = "some f2 hardcoded value"
    func2_dict["school_address"] = "some f2 hardcoded value"
    func2_dict["score_in_written"] = f" func1 student scored {func2_x} percentage "
    func2_dict["score_in_perc"] = func2_x
    return func2_dict

def func3(x):
    func3_x = ...  # some calculation using z formula
    func3_dict = {}
    func3_dict["name"] = ...  # f3 related some hardcoded name
    func3_dict["school"] = " f3 related some hardcoded value"
    func3_dict["school_address"] = " f3 related hardcoded value"
    func3_dict["score_in_written"] = f" func1 student scored {func3_x} percentage "
    func3_dict["score_in_perc"] = func3_x
    return func3_dict
The hardcoded values never change for any x within a particular function; only score_in_perc and score_in_written change.
As I have many functions like this (up to func9), is there any way I can restructure this code to make it tidier and cleaner?
You can use a class to structure the repeated values like this:
class StudentAcademics:
    def __init__(self, name, school, schoolAddress, score):
        self.name = name
        self.school = school
        self.schoolAddress = schoolAddress
        self.scoreInWritten = f" func1 student scored {score} percentage "
        self.scoreInPercent = score

score = ...  # calculation using x formula
student_academics_1 = StudentAcademics("John", "xyz school", "xyz address", score)
score = ...  # calculation using y formula
student_academics_2 = StudentAcademics("Tom", "abc school", "abc address", score)
score = ...  # calculation using z formula
student_academics_3 = StudentAcademics("Bill", "jkl school", "jkl address", score)
Now when you want to access the data again you can simply do:
print("name {} studied at {}".format(student_academics_1.name, student_academics_1.school))
Let me know if you have any further questions.
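Taking the class idea one step further, the nine near-identical functions can collapse into a table of data: each row holds the fixed values plus the one formula that differs. This is a minimal sketch, assuming each funcN differs only in its hardcoded strings and its score formula; the `Student` dataclass, `STUDENTS` list and the lambda formulas are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Student:
    """Hardcoded per-student data plus the formula that varies between students."""
    name: str
    school: str
    school_address: str
    formula: Callable[[float], float]

    def report(self, x):
        # Only the score depends on x; everything else is fixed per student
        score = self.formula(x)
        return {
            "name": self.name,
            "school": self.school,
            "school_address": self.school_address,
            "score_in_written": f"{self.name} student scored {score} percentage",
            "score_in_perc": score,
        }


# One row per former funcN; the formulas below are hypothetical placeholders
STUDENTS = [
    Student("John", "xyz school", "xyz address", lambda x: x * 0.5),
    Student("Tom", "abc school", "abc address", lambda x: x + 10),
    Student("Bill", "jkl school", "jkl address", lambda x: x ** 0.5),
]


def my_code(x):
    return [student.report(x) for student in STUDENTS]


for row in my_code(64):
    print(row["score_in_written"])
```

With this layout, adding func4 through func9 means adding one line each to `STUDENTS` rather than another copy of the dictionary-building code.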
You could create multiple classes, one for each group of similar operations, and then a main class that combines them. That keeps the code clean and structured. It's worth reading up on Python classes and how to create objects.
I'm making a simple sudoku generator and have 81 different entries (e1, e2, e3, e4, etc.).
I would like to know if there is any way to select a random entry to insert a number into.
So kind of like this:
num = randint(1, 81)  # randint includes both ends, so this covers e1 to e81
entry = "e" + str(num)
entry.insert()
With the above code you get the error
AttributeError: 'str' object has no attribute 'insert'
which makes sense, but is there any way to 'convert' a string to a variable name?
Thanks in advance.
Store the entries in a list, then use random.choice to pick one of the entries.
import random
import tkinter as tk

entries = []
for i in range(81):
    entry = tk.Entry(...)
    entries.append(entry)
...
random_entry = random.choice(entries)
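If you still want to address entries by the names e1 to e81, a dictionary does the "string to variable" lookup without exec or globals(). This is a minimal sketch; `FakeEntry` is a hypothetical stand-in for `tk.Entry` so the code runs without a display, and it only mimics `insert(index, string)` for integer indices.

```python
import random


class FakeEntry:
    """Hypothetical stand-in for tk.Entry so this sketch runs without a display."""

    def __init__(self):
        self.text = ""

    def insert(self, index, string):
        # Mimics Entry.insert(index, string) for integer indices
        self.text = self.text[:index] + string + self.text[index:]


# In a real GUI this would be tk.Entry(parent) instead of FakeEntry()
entries = [FakeEntry() for _ in range(81)]

# A dict gives the "variable name from a string" behaviour without exec/globals
entries_by_name = {f"e{i}": entry for i, entry in enumerate(entries, start=1)}

# Option 1: pick an entry directly
random.choice(entries).insert(0, "5")

# Option 2: build the name, then look it up (randint includes both ends)
num = random.randint(1, 81)
entries_by_name["e" + str(num)].insert(0, "7")
```

Both containers hold the same objects, so the list is convenient for random.choice and iteration while the dict is convenient for name-based access.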
I have 2 text files as follows: animals = ['tiger'; 'lion'] and birds = ['parrot'; 'eagle']
Now I have to load these values into a NumPy array in which each file's values form one column.
In other words, I want to add the data from each new text file as the next available column, but I could only add it row-wise, not column-wise.
I have tried the following code:
import os
import numpy as np

h = np.array([])
for c in file_names:  # e.g. ['animals', 'birds']
    s = np.genfromtxt(os.path.join(os.getcwd(), c + '.txt'), dtype='str', delimiter=';')
    # s = np.reshape(s, (-1, 2))
    h = np.concatenate([h, s], axis=1)
I am getting an error as follows: "AxisError: axis 1 is out of bounds for array of dimension 1"
I've tried several techniques, but I only ever get the data stacked row-wise.
Can someone please help me with this?
You can use the logic below. I assume you have two lists, animals and birds, and that your required array is list_req.
import numpy as np

animals = ['tiger', 'lion']
birds = ['parrot', 'eagle']
list_req = []
list_req.append(animals)
list_req.append(birds)
list_req = np.transpose(list_req)
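The same idea applied to the files: collect one 1-D array per file in a Python list and stack them once at the end with `np.column_stack`, which turns each 1-D array into a column. This is a minimal sketch, assuming each file holds a single line such as `tiger;lion`; it reads the files with plain string splitting instead of `np.genfromtxt`, and `read_row` and `files_to_columns` are hypothetical helper names.

```python
import os
import tempfile

import numpy as np


def read_row(path):
    """Read one ';'-separated line, e.g. 'tiger;lion' -> ['tiger', 'lion'] (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [item.strip() for item in f.read().strip().split(";")]


def files_to_columns(names, directory="."):
    # One 1-D array per file, stacked once so each file becomes a column
    columns = [np.array(read_row(os.path.join(directory, name + ".txt"))) for name in names]
    return np.column_stack(columns)


# Demo with temporary files standing in for animals.txt and birds.txt
with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("animals", "tiger;lion\n"), ("birds", "parrot;eagle\n")]:
        with open(os.path.join(tmp, name + ".txt"), "w", encoding="utf-8") as f:
            f.write(content)
    table = files_to_columns(["animals", "birds"], tmp)

print(table)
```

The original `np.concatenate(..., axis=1)` fails because both `h` and `s` are 1-D; concatenating along axis 1 only works on 2-D pieces such as `s[:, None]`. Stacking once at the end also avoids re-copying the growing array on every iteration.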
I am new to this so any help is much appreciated.
I would like to add a new column to a data frame that is a function of both values in the data frame and a python object.
The format is as follows:
df['col_3'] = list(map(function, df['col_1'], df['col_2'], instance_of_class))

def function(a, b, instance):
    return a + b + instance.attribute
where one of the parameters needs to be an instance of a class.
When I do this, Python raises an error saying the object is not iterable. I assume this is because map expects every argument after the function to be iterable. I'm not sure how to get around this without substantially slowing things down. Thanks!
This could be done with map or apply, but since you are just starting, why not keep it simple?
col_3 = []
for c1, c2 in df[['col_1', 'col_2']].values:
    col_3.append(function(c1, c2, instance))
df['col_3'] = col_3
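For comparison, here is a minimal sketch of three ways to pass a single object alongside two columns. `Settings` is a hypothetical class standing in for the question's `instance_of_class`, and the data is made up. `itertools.repeat` makes the lone instance iterable for `map`; the last variant calls the function on whole columns, which works only because `function` uses nothing but `+`.

```python
import itertools

import pandas as pd


class Settings:
    """Hypothetical class standing in for the question's instance_of_class."""

    def __init__(self, attribute):
        self.attribute = attribute


def function(a, b, instance):
    return a + b + instance.attribute


df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30]})
instance = Settings(100)

# 1) map: itertools.repeat turns the single instance into an iterable;
#    map stops at the shortest input, so the infinite repeat is safe
df["col_3"] = list(map(function, df["col_1"], df["col_2"], itertools.repeat(instance)))

# 2) Plain comprehension over zipped columns
df["col_3_loop"] = [function(a, b, instance) for a, b in zip(df["col_1"], df["col_2"])]

# 3) Vectorized: function only uses +, so it also accepts whole Series
df["col_3_vec"] = function(df["col_1"], df["col_2"], instance)

print(df)
```

The vectorized form is usually the fastest, but it breaks as soon as the function contains per-row Python logic such as `if` statements; in that case one of the first two variants is the safer choice.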
I'm trying to calculate the 95th percentile for multiple water quality values grouped by watershed, for example:
Watershed WQ
50500101 62.370661
50500101 65.505046
50500101 58.741477
50500105 71.220034
50500105 57.917249
I reviewed the question "Percentile for Each Observation w/r/t Grouping Variable". It is very close to what I want, but it computes the percentile for each observation, whereas I need it for each group. Ideally, the result would look like this:
Watershed WQ - 95th
50500101 x
50500105 y
This can be achieved using the plyr library. We specify the grouping variable Watershed and ask for the 95% quantile of WQ.
library(plyr)
#Random seed
set.seed(42)
#Sample data
dat <- data.frame(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
#plyr call
ddply(dat, "Watershed", summarise, WQ95 = quantile(WQ, .95))
which gives:
Watershed WQ95
1 a 1.353993
2 b 1.461711
I hope I understand your question correctly. Is this what you're looking for?
my.df <- data.frame(group = gl(3, 5), var = runif(15))
aggregate(my.df$var, by = list(my.df$group), FUN = function(x) quantile(x, probs = 0.95))
Group.1 x
1 1 0.6913747
2 2 0.8067847
3 3 0.9643744
EDIT
Based on Vincent's answer,
aggregate(my.df$var, by = list(my.df$group), FUN = quantile, probs = 0.95)
also works (you can skin a cat 1001 ways, I've been told). As a side note, you can pass a vector of desired quantiles, e.g. c(0.1, 0.2, ..., 0.9) for deciles. Alternatively, use summary for some predefined statistics.
aggregate(my.df$var, by = list(my.df$group), FUN = summary)
Use a combination of the tapply and quantile functions. For example, if your dataset looks like this:
DF <- data.frame('watershed'=sample(c('a','b','c','d'), 1000, replace=T), wq=rnorm(1000))
Use this:
with(DF, tapply(wq, watershed, quantile, probs=0.95))
In Excel, you're going to want to use an array formula to make this easy. I suggest the following:
{=PERCENTILE(IF($A$2:$A$6 = Watershed ID, $B$2:$B$6), 0.95)}
Column A would be the Watershed ids, and Column B would be the WQ values.
Be sure to enter it as an array formula by pressing Ctrl+Shift+Enter when entering the formula.
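R's default `quantile` (type 7) and Excel's `PERCENTILE` both interpolate linearly between sorted values: with n values, the p-th quantile sits at position (n - 1)·p in the sorted list (counting from 0). This is a minimal plain-Python sketch of that definition applied per group, using the sample rows from the question; `percentile_linear` and `grouped_percentile` are hypothetical names, not part of any library.

```python
import math
from collections import defaultdict


def percentile_linear(values, p):
    """Percentile by linear interpolation between sorted values (R's default type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so p = 1 returns the maximum
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def grouped_percentile(rows, p):
    """Group (key, value) pairs by key and compute one percentile per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: percentile_linear(vals, p) for key, vals in groups.items()}


# Sample rows from the question: (Watershed, WQ)
rows = [
    (50500101, 62.370661),
    (50500101, 65.505046),
    (50500101, 58.741477),
    (50500105, 71.220034),
    (50500105, 57.917249),
]

wq95 = grouped_percentile(rows, 0.95)
for watershed, value in wq95.items():
    print(watershed, value)
```

With only two or three values per watershed, the 95th percentile lands between the two largest values of each group, so it says little about the tail; the grouped R and Excel answers become meaningful with more observations per group.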
Using the data.table package, you can do:
library(data.table)
set.seed(42)
#Sample data
dt <- data.table(Watershed = sample(letters[1:2], 100, TRUE), WQ = rnorm(100))
dt[ ,
   j = .(WQ95 = quantile(WQ, .95, na.rm = TRUE)),
   by = Watershed]