I am trying to solve a stochastic problem. The problem runs over 24 time steps, and in each step I solve it for 450 different instances.
It is possible to solve the problem serially, but that takes a lot of time, so I wanted to solve those 450 instances in parallel.
I approached this in two ways:
1. Using solver_manager and multiple Pyro MIP servers. Here I just have to queue the instances and the solver manager assigns the problems to the Pyro MIP servers. But the Pyro MIP servers occupy a lot of memory and retain it even after the final result is computed.
2. Using the opt solver.
a) I can solve using the opt solver directly, but it is slow.
b) Using multithreading. I tried to create a ThreadPoolExecutor with 5 threads and then solve those 450 instances in different threads. But just shifting to multithreading causes Pyomo to throw errors.
# optimize() and thread_solver() are methods of my class (imports and class definition omitted)
def optimize(self):
    optsolver = SolverFactory(self.solver_name)
    value = self.initialize_all_values()
    for timestep in range(24):
        instance_list = self.create_instance_list(value)
        futures = set()
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            for instance_object in instance_list:
                futures.add(executor.submit(self.thread_solver, instance_object["instance"], optsolver, timestep))
            for future in concurrent.futures.as_completed(futures):
                try:
                    result = future.result()
                    value.update(result)
                except Exception as e:
                    self.logger.error(e)
def thread_solver(self, instance, optsolver, timestep):
    result = optsolver.solve(instance)
    if (result.solver.status == SolverStatus.ok) and (
            result.solver.termination_condition == TerminationCondition.optimal):
        instance.solutions.load_from(result)
        my_dict = {}
        for v in instance.component_objects(Var, active=True):
            varobject = getattr(instance, str(v))
            var_list = []
            try:
                for index in varobject:
                    var_list.append(varobject[index].value)
                my_dict[str(v)] = var_list
            except Exception as e:
                self.logger.error("error reading result " + str(e))
        value = {key: my_dict[key][0] for key in my_dict}
        return value
    elif result.solver.termination_condition == TerminationCondition.infeasible:
        self.logger.info("Termination condition is infeasible")
        return {}
    else:
        self.logger.info("Nothing fits")
        return {}
I get the following errors/outputs:
Nothing fits
Solver failed to locate input problem file: /usr/src/app/temp/pyomo/tmpn23yhmz_.pyomo.lp
I have set the Pyomo temporary file location to "/usr/src/app/temp/pyomo".
The same code runs perfectly fine without multithreading.
Is there any reason why Pyomo fails to solve the problem?
UPDATE 1:
I tried using a mutex around the call to optsolver.solve(instance). This reduced the above errors but did not completely eliminate them. Whenever these errors occur, I try to solve the instance again, and on the second attempt the instance is solved. So multithreading does work, but I don't know why the errors still occur.
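For reference, the mutex approach looks roughly like this (a minimal sketch; the module-level lock is my addition and only serializes the solve call, it is not necessarily a root-cause fix):

import threading

solve_lock = threading.Lock()  # guards the shared solver while it writes/reads its temporary LP file

def thread_solver(self, instance, optsolver, timestep):
    # Serialize only the call that touches the solver's temporary files
    with solve_lock:
        result = optsolver.solve(instance)
    # ... the rest of the result handling stays as above ...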
First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing pools in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the multiprocessing documentation. I therefore ask for your help, your kindness and your patience.
I am trying to build a parallel tempering monte-carlo algorithm, from a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I have simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more measurement lists besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.Pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE/pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])

    for i in range(N//2):
        dE = G[i].E - G[i+1].E
        pT = G[i].T + G[i+1].T
        p = proba(dE, pT)  # proba is a function giving a probability depending on dE
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i+1].T
            G[i+1].T = T_temp
Synthesis: I want to run several instances of my Monte Carlo class in parallel child processes, with different values for a parameter T, then periodically pause everything to change the different T's, and resume the child processes/class instances from where they paused.
In doing this, I want each class instance/child process to stay independent of the others, to save its current state with all internal variables while it is paused, and to make as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), so copying them would very quickly become time-consuming.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a remote machine with many (64) CPUs, running Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class instances, not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the remote machine I usually use to run my code. I am therefore marking it as the accepted answer.
However, I want to report here that when I insert my actual, more complicated code instead of the oversimplified monte_carlo class, the remote machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows only once, in the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to add one by one the variables, methods, etc... that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child-process with the actual code, is too large for the distant machine to accept it, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either use shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of a user-defined type. Keep in mind that using managers will affect the speed of your code, depending on the complexity of the arguments that you pass to and receive from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects and allow access to them through proxies. I have answered a question with more detail on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that when we create objects of monte_carlo, we do so using managers and the above proxy. For that, we add a class method called create; all monte_carlo objects should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result


class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst


def proba(dE, pT):
    return np.exp(-dE / pT)


if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server where the object is actually stored. The method is executed there, and the results (if any) are serialized and returned to the main process. All of this using only the standard library!
Output
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
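Concretely, that change would look something like this (a sketch of just the top of the __main__ block; everything else stays as in the solution above):

from multiprocessing import Pool, set_start_method

if __name__ == "__main__":
    # A fresh interpreter is started for each child, so it does not
    # inherit the variables defined below in this clause.
    set_start_method("spawn")

    Tlist = [1.1, 1.2, 1.3]
    # ... rest of the code from the solution above ...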
I tried optimizing my code by running it on the GPU, but I encountered the following error.
Also, I'm fairly new to this, so I have little idea how to work with Numba decorators; my aim here is to speed up my program.
ValueError: cannot determine Numba type of <class 'collections.defaultdict'>
#jit(target = "cuda")
def initialize(foreign_no_of_words,foreign_l,english_l,num_dict_dutch,num_dict_eng):
probabilities = {} # Initializing proablities
#count = {} # Count
counter = 1
index = -1*(foreign_no_of_words)
num_dict_dutch = make_dict_dutch(foreign_l,num_dict_dutch)
for i in english_l.keys():
for j in foreign_l.keys():
s = i+"_"+j
probabilities[s] = 1/foreign_no_of_words
#count[s] = 0
index = write_to_file(probabilities,i,counter,PROB_FILE,foreign_no_of_words,index,num_dict_eng)
#write_to_file(count,i,counter,COUNT_FILE)
counter +=1
probabilities ={}
return True
The goal is to reduce the function's running time.
Numba does not support the entire Python language; instead, it is capable of jitting a (growing) subset of its features. Please see:
https://numba.pydata.org/numba-doc/dev/reference/pysupported.html
https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
Currently, Numba only supports namedtuples from the collections module. Also please note that writing code for the GPU is even more restricted:
http://numba.pydata.org/numba-doc/latest/cuda/index.html#cuda-index
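To give an idea of the kind of code Numba can compile in nopython mode, here is a small sketch of my own (not a rewrite of the function above; the name uniform_probabilities and the idea of indexing by word IDs are my assumptions): the string-keyed dictionary is replaced by a dense NumPy array, which Numba handles well.

import numpy as np
from numba import njit

@njit
def uniform_probabilities(n_english, n_foreign):
    # Dense table of translation probabilities, each entry initialized
    # to 1 / n_foreign, mirroring the initialization in the original loop.
    return np.ones((n_english, n_foreign)) / n_foreign

probs = uniform_probabilities(1000, 2000)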
I am working on processing a dataset that includes dense GPS data. My goal is to use parallel processing to test my dataset against all possible distributions and return the best one, together with the parameters generated for that distribution.
Currently, I have code that does this serially thanks to this answer https://stackoverflow.com/a/37616966. Of course, it is going to take entirely too long to process my full dataset. I have been playing around with multiprocessing, but can't seem to get it to work right. I want it to test multiple distributions in parallel, keeping track of the sum of squared errors. Then I want to select the distribution with the lowest SSE and return its name along with the parameters generated for it.
def fit_dist(distribution, data=data, bins=200, ax=None):
    # Block of code that tests the distribution and generates params
    return (distribution.name, best_params, sse)


if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()
I need some help with how to actually make use of the return values from each of the iterations in the multiprocessing so I can compare those values. I'm really new to Python, especially multiprocessing, so please be patient with me and explain as much as possible.
The problem I'm having is that it gives me an "UnboundLocalError" on the variables that I'm trying to return from my fit_dist function. The DISTRIBUTIONS list contains 89 objects. Could this be related to the parallel processing, or is it something to do with the definition of fit_dist?
With the help of Tomerikoo's comment and some further struggling, I got the code working the way I wanted it to. The UnboundLocalError was due to me not putting the return statement in the correct block of code within my fit_dist function. To answer the question, I did the following.
from multiprocessing import Pool


def fit_dist(distribution, data=data, bins=200, ax=None):
    # put this return under the right section of this method
    return [distribution.name, params, sse]


if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()

    # Filter out the None results. Due to the nature of the distribution fitting,
    # some distributions are so far off that they result in None objects.
    res = list(filter(None, result))

    # Iterate over the nested list, storing the lowest sum of squared errors in best_sse.
    # best_sse starts at infinity so the first valid SSE becomes the current best.
    best_sse = float('inf')
    for dist in res:
        if best_sse > dist[2] > 0:
            best_sse = dist[2]
        else:
            continue

    # Iterate over the list, pulling out the sublist of the distribution with the best SSE.
    # The sublists are made up of a string, a tuple with the parameters,
    # and a float value for the SSE, which is why sse is always index 2.
    for dist in res:
        if dist[2] == best_sse:
            best_dist_list = dist
        else:
            continue
The rest of the code simply consists of me using that list to construct charts and plots with that best distribution overtop of a histogram of my raw data.
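As an aside, the two loops over res could also be collapsed into a single call to min (a small variation, assuming as above that each sublist is [name, params, sse] and that only positive SSE values count):

# Pick the sublist with the smallest positive SSE in one pass
best_dist_list = min((d for d in res if d[2] > 0), key=lambda d: d[2])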
I have a list of issues (jira issues):
listOfKeys = [id1,id2,id3,id4,id5...id30000]
I want to get the worklogs of these issues; for this I used the jira-python library and this code:
listOfWorklogs = pd.DataFrame()  # I used the pandas (pd) lib
lst = {}  # helper dictionary where the worklogs will be stored

for i in range(len(listOfKeys)):
    worklogs = jira.worklogs(listOfKeys[i])  # getting the list of worklogs
    if len(worklogs) == 0:
        i += 1
    else:
        for j in range(len(worklogs)):
            lst = {
                'self': worklogs[j].self,
                'author': worklogs[j].author,
                'started': worklogs[j].started,
                'created': worklogs[j].created,
                'updated': worklogs[j].updated,
                'timespent': worklogs[j].timeSpentSeconds
            }
            listOfWorklogs = listOfWorklogs.append(lst, ignore_index=True)
########### Below there is the recording to the .xlsx file ################
So I simply fetch the worklog of each issue in a simple loop, which is equivalent to requesting the link
https://jira.mycompany.com/rest/api/2/issue/issueid/worklogs and retrieving the information from it.
The problem is that there are more than 30,000 such issues,
and the loop is very slow (approximately 3 seconds per issue).
Can I somehow start multiple loops/processes/threads in parallel to speed up the process of getting the worklogs (maybe even without the jira-python library)?
I recycled a piece of my own code into yours; I hope it helps:
from multiprocessing import Manager, Process, cpu_count


def insert_into_list(worklog, queue):
    lst = {
        'self': worklog.self,
        'author': worklog.author,
        'started': worklog.started,
        'created': worklog.created,
        'updated': worklog.updated,
        'timespent': worklog.timeSpentSeconds
    }
    queue.put(lst)
    return


# Number of cpus in the pc
num_cpus = cpu_count()
index = 0

# Manager and queue to hold the results
manager = Manager()
# The queue has controlled insertion, so processes don't step on each other
queue = manager.Queue()

listOfWorklogs = pd.DataFrame()
lst = {}

for i in range(len(listOfKeys)):
    worklogs = jira.worklogs(listOfKeys[i])  # getting list of worklogs
    if len(worklogs) == 0:
        i += 1
    else:
        # This loop replaces your "for j in range(len(worklogs))" loop
        while index < len(worklogs):
            processes = []
            elements = min(num_cpus, len(worklogs) - index)

            # Create a process for each cpu
            for i in range(elements):
                process = Process(target=insert_into_list, args=(worklogs[i + index], queue))
                processes.append(process)

            # Run the processes
            for i in range(elements):
                processes[i].start()

            # Wait for them to finish
            for i in range(elements):
                processes[i].join(timeout=10)

            index += num_cpus

# Dump the queue into the dataframe
while queue.qsize() != 0:
    listOfWorklogs = listOfWorklogs.append(queue.get(), ignore_index=True)
This should work and reduce the time by a factor a little less than the number of CPUs in your machine. You can try changing that number manually for better performance. In any case, I find it very strange that it takes about 3 seconds per operation.
PS: I couldn't try the code because I have no examples; it probably has some bugs.
I am having some trouble:
1) indentation in the code where the first "for" loop appears and the first "if" statement begins (this statement and everything below it should be inside the loop, right?)
for i in range(len(listOfKeys)-99):
    worklogs = jira.worklogs(listOfKeys[i])  # getting list of worklogs
    if len(worklogs) == 0:
        ....
2) cmd, the conda prompt and Spyder did not allow your code to run, because of this error:
Python multiprocessing error: AttributeError: module '__main__' has no attribute '__spec__'
After searching on Google, I had to set __spec__ = None a bit higher up in the code (though I'm not sure whether this is correct), and the error disappeared.
By the way, the code ran in a Jupyter Notebook without this error, but listOfWorklogs is empty, and that is not right.
3) when I corrected the indentation and set __spec__ = None, a new error occurred at this line:
processes[i].start()
an error like this:
"PicklingError: Can't pickle : attribute lookup PropertyHolder on jira.resources failed"
If I remove the parentheses from the start and join methods, the code runs, but then I get no entries at all in listOfWorklogs.
I ask again for your help!
How about thinking about it not from a technical standpoint but a logical one? You know your code works, but at a rate of 3 seconds per issue it would take 25 hours to complete. If you have the ability to split up the set of Jira issues that are passed into the script (maybe by date or issue key, etc.), you could create multiple .py files with basically the same code and just pass each one a different list of Jira tickets. You could then run, say, four of them at the same time and reduce the total time to about 6.25 hours.
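For example, the split itself could look something like this (just a sketch; the chunk count and the file names are arbitrary choices of mine):

# Split listOfKeys into 4 roughly equal chunks and write each chunk to its own file,
# so that four copies of the script can each work through one chunk.
chunk_count = 4
chunk_size = (len(listOfKeys) + chunk_count - 1) // chunk_count
for n in range(chunk_count):
    chunk = listOfKeys[n * chunk_size:(n + 1) * chunk_size]
    with open("keys_part_{}.txt".format(n), "w") as f:
        f.write("\n".join(str(k) for k in chunk))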
I am attempting to make a few thousand DNS queries. I have written my script to use python-adns. I have attempted to add threading and queues to ensure the script runs optimally and efficiently.
However, I can only achieve mediocre results. The responses are choppy/intermittent. They start and stop, and most of the time pause for 10 to 20 seconds.
tlock = threading.Lock()  # printing to screen

def async_dns(i):
    s = adns.init()
    for i in names:
        tlock.acquire()
        q.put(s.synchronous(i, adns.rr.NS)[0])
        response = q.get()
        q.task_done()
        if response == 0:
            dot_net.append("Y")
            print(i + ", is Y")
        elif response == 300:
            dot_net.append("N")
            print(i + ", is N")
        tlock.release()

q = queue.Queue()
threads = []

for i in range(100):
    t = threading.Thread(target=async_dns, args=(i,))
    threads.append(t)
    t.start()

print(threads)
I have spent countless hours on this. I would appreciate some input from experienced Pythonistas. Is this a networking issue? Can this bottleneck/these intermittent responses be solved by switching servers?
Thanks.
Without answers to the questions I asked in the comments above, I'm not sure how well I can diagnose the issue you're seeing, but here are some thoughts:
It looks like each thread is processing all names instead of just a portion of them.
Your Queue seems to be doing nothing at all.
Your lock seems to guarantee that you actually only do one query at a time (defeating the purpose of having multiple threads).
Rather than trying to fix up this code, might I suggest using multiprocessing.pool.ThreadPool instead? Below is a full working example. (You could use adns instead of socket if you want; I just couldn't easily get it installed, so I stuck with the built-in socket.)
In my testing, I also sometimes see pauses; my assumption is that I'm getting throttled somewhere.
import itertools
from multiprocessing.pool import ThreadPool
import socket
import string


def is_available(host):
    print('Testing {}'.format(host))
    try:
        socket.gethostbyname(host)
        return False
    except socket.gaierror:
        return True


# Test the first 1000 three-letter .com hosts
hosts = [''.join(tla) + '.com' for tla in itertools.permutations(string.ascii_lowercase, 3)][:1000]

with ThreadPool(100) as p:
    results = p.map(is_available, hosts)

for host, available in zip(hosts, results):
    print('{} is {}'.format(host, 'available' if available else 'not available'))
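If you do get adns installed, the same ThreadPool pattern should carry over roughly like this (an untested sketch on my part, reusing the synchronous NS lookup from your original code; names is your existing list of hostnames):

from multiprocessing.pool import ThreadPool
import adns

def ns_status(name):
    # One resolver per call, since sharing a single adns resolver across threads may not be safe
    resolver = adns.init()
    return name, resolver.synchronous(name, adns.rr.NS)[0]

with ThreadPool(100) as p:
    results = p.map(ns_status, names)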