I tried optimizing my code by running it on the GPU, but I encountered the following error.
I'm fairly new to this, so I have little idea how to work with Numba decorators; my aim is simply to speed up my program.
ValueError: cannot determine Numba type of <class 'collections.defaultdict'>
@jit(target="cuda")
def initialize(foreign_no_of_words, foreign_l, english_l, num_dict_dutch, num_dict_eng):
    probabilities = {}  # Initializing probabilities
    # count = {}  # Count
    counter = 1
    index = -1 * foreign_no_of_words
    num_dict_dutch = make_dict_dutch(foreign_l, num_dict_dutch)
    for i in english_l.keys():
        for j in foreign_l.keys():
            s = i + "_" + j
            probabilities[s] = 1 / foreign_no_of_words
            # count[s] = 0
        index = write_to_file(probabilities, i, counter, PROB_FILE, foreign_no_of_words, index, num_dict_eng)
        # write_to_file(count, i, counter, COUNT_FILE)
        counter += 1
        probabilities = {}
    return True
The goal is to reduce the function's running time.
Numba does not support the entire python language, but instead is capable of jitting a (growing) subset of its features. Please see:
https://numba.pydata.org/numba-doc/dev/reference/pysupported.html
https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html
Currently, numba only supports namedtuples from the collections module. Also please note that writing code for the GPU is even more restricted:
http://numba.pydata.org/numba-doc/latest/cuda/index.html#cuda-index
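In practice that means the defaultdict and dict-of-string-keys bookkeeping in initialize cannot be compiled. A minimal sketch of one workaround, assuming the English/foreign word-to-index mappings already exist, is to hold the probabilities in a NumPy array indexed by word IDs instead of "english_foreign" string keys (the function and variable names below are hypothetical, not from the original code):
import numpy as np
from numba import njit

@njit
def init_uniform_probabilities(n_english, n_foreign):
    # Each (english, foreign) pair starts at 1 / n_foreign,
    # stored in a 2-D array instead of a dict keyed by "word_word" strings.
    return np.full((n_english, n_foreign), 1.0 / n_foreign)

# Usage: probs[eng_id, for_id] replaces probabilities["eng_for"]
probs = init_uniform_probabilities(5000, 8000)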
I want to extract the VGG features of a set of images and keep them in memory in a dictionary. The dictionary ends up holding 8091 tensors, each of shape (1, 4096), but my machine crashes with an out-of-memory error about 6% of the way through. Does anybody have a clue why this is happening and how to prevent it?
In fact, this seems to be triggered by the call to VGG rather than by the storage itself, since merely storing the VGG classification is enough to trigger the error.
Below is the simplest code I've found to reproduce the error. Once a helper function is defined:
import torch, torchvision
from tqdm import tqdm

vgg = torchvision.models.vgg16(weights='DEFAULT')

def try_and_crash(gen_data):
    store_out = {}
    for i in tqdm(range(8091)):
        my_output = gen_data(torch.randn(1, 3, 224, 224))
        store_out[i] = my_output
    return store_out
Calling it to quickly produce a large tensor doesn't cause a fuss
just_fine = try_and_crash(lambda x: torch.randn(1,4096))
but calling it to use vgg causes the machine to crash:
will_crash = try_and_crash(vgg)
The problem is that each element of the dictionary, store_out[i], also keeps the autograd graph that led to its computation, and therefore ends up being much larger than a simple 1x4096 tensor.
Running the code under torch.no_grad(), or equivalently with torch.set_grad_enabled(False), solves the issue. We can test this by slightly changing the helper function:
def try_and_crash_grad(gen_data, grad_enabled):
    store_out = {}
    for i in tqdm(range(8091)):
        with torch.set_grad_enabled(grad_enabled):
            my_output = gen_data(torch.randn(1, 3, 224, 224))
            store_out[i] = my_output
    return store_out
Now the following works
works_fine = try_and_crash_grad(vgg, False)
while the following throws an out of memory error
crashes = try_and_crash_grad(vgg, True)
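As a side note (my addition, not part of the original answer), another option when gradients must stay enabled elsewhere is to detach each output before storing it, so the saved tensor no longer references the autograd graph:
# Uses the same try_and_crash helper defined above
no_graph = try_and_crash(lambda x: vgg(x).detach())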
I am trying to solve a stochastic problem. The problem runs over 24 time steps, and in each step I solve it for 450 different instances.
It is possible to solve the problem serially, but that takes a lot of time, so I wanted to solve those 450 instances in parallel.
I approached this in 2 ways:
1) using a solver_manager and multiple Pyro MIP servers. Here I just have to queue the instances and the solver manager assigns the problems to the Pyro MIP servers. But the Pyro MIP servers occupy a lot of memory and retain it even after the final result is computed.
2) using the opt solver.
a) I can solve using the opt solver directly, but it is slow.
b) using multithreading. I tried to create a ThreadPoolExecutor with 5 threads and then solve those 450 instances in different threads. But just shifting to multithreading causes Pyomo to throw errors.
def optimize():
    optsolver = SolverFactory(self.solver_name)
    value = self.initialize_all_values()
    for timestep in range(24):
        instance_list = self.create_instance_list(value)
        futures = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            for instance_object in instance_list:
                futures = {executor.submit(self.thread_solver, instance_object["instance"], optsolver, timestep)}
            for future in concurrent.futures.as_completed(futures):
                try:
                    result = future.result()
                    value.update(result)
                except Exception as e:
                    self.logger.error(e)

def thread_solver(instance, optsolver, timestep):
    result = optsolver.solve(instance)
    if (result.solver.status == SolverStatus.ok) and (
            result.solver.termination_condition == TerminationCondition.optimal):
        instance.solutions.load_from(result)
        my_dict = {}
        for v in instance.component_objects(Var, active=True):
            varobject = getattr(instance, str(v))
            var_list = []
            try:
                for index in varobject:
                    var_list.append(varobject[index].value)
                my_dict[str(v)] = var_list
            except Exception as e:
                self.logger.error("error reading result " + str(e))
        value = {key: my_dict[k][0]}
        return value
    elif result.solver.termination_condition == TerminationCondition.infeasible:
        self.logger.info("Termination condition is infeasible")
        return {}
    else:
        self.logger.info("Nothing fits")
        return {}
I get the following errors/outputs:
Nothing fits
Solver failed to locate input problem file: /usr/src/app/temp/pyomo/tmpn23yhmz_.pyomo.lp
I have set the Pyomo temp file location to "/usr/src/app/temp/pyomo".
The same code runs perfectly fine without multithreading.
Is there any reason why Pyomo fails to solve the problem?
UPDATE 1:
I tried using a mutex around the call to optsolver.solve(instance). This reduced the above error but did not completely eliminate it. Whenever the error occurs, I try to solve the instance again, and on the second attempt it is solved. So multithreading does work, but I don't know why the errors still occur.
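For reference, a minimal sketch of the mutex approach mentioned above; giving each worker its own SolverFactory object (rather than sharing one across threads) is an extra assumption, not part of the original code:
import threading
from pyomo.environ import SolverFactory

solve_lock = threading.Lock()

def thread_solver_locked(instance, solver_name, timestep):
    # Assumption: a per-thread solver object avoids sharing solver state across threads.
    optsolver = SolverFactory(solver_name)
    with solve_lock:
        # Serialize the actual solve call, as described in the update.
        result = optsolver.solve(instance)
    return result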
I am working on processing a dataset that includes dense GPS data. My goal is to use parallel processing to test my dataset against all possible distributions and return the best one with the parameters generated for said distribution.
Currently, I have code that does this in serial thanks to this answer https://stackoverflow.com/a/37616966. Of course, it is going to take entirely too long to process my full dataset. I have been playing around with multiprocessing, but can't seem to get it to work right. I want it to test multiple distributions in parallel, keeping track of sum of square error. Then I want to select the distribution with the lowest SSE and return its name along with the parameters generated for it.
def fit_dist(distribution, data=data, bins=200, ax=None):
    # Block of code that tests the distribution and generates params
    return (distribution.name, best_params, sse)

if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()
I need some help with how to actually make use of the return values from each iteration of the multiprocessing so I can compare them. I'm really new to Python, especially multiprocessing, so please be patient with me and explain as much as possible.
The problem I'm having is that it's giving me an "UnboundLocalError" on the variables I'm trying to return from my fit_dist function. The DISTRIBUTIONS list contains 89 objects. Could this be related to the parallel processing, or is it something to do with the definition of fit_dist?
With the help of Tomerikoo's comment and some further struggling, I got the code working the way I wanted it to. The UnboundLocalError was due to me not putting the return statement in the correct block of code within my fit_dist function. To answer the question, I did the following:
from multiprocessing import Pool

def fit_dist(distribution, data=data, bins=200, ax=None):
    # put this return under the right section of this method
    return [distribution.name, params, sse]

if __name__ == '__main__':
    p = Pool()
    result = p.map(fit_dist, DISTRIBUTIONS)
    p.close()
    p.join()

    # Filter out the None results. Due to the nature of the distribution fitting,
    # some distributions are so far off that they result in None objects.
    res = list(filter(None, result))

    # Iterate over the nested list, keeping the lowest sum of squared errors in best_sse.
    best_sse = float('inf')  # initialize before comparing
    for dist in res:
        if best_sse > dist[2] > 0:
            best_sse = dist[2]

    # Iterate over the list, pulling out the sublist of the distribution with the best sse.
    # The sublists are made up of a string, a tuple with the parameters,
    # and a float value for sse, which is why sse is always index 2.
    for dist in res:
        if dist[2] == best_sse:
            best_dist_list = dist
The rest of the code simply consists of me using that list to construct charts and plots with that best distribution overtop of a histogram of my raw data.
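A more compact way to pick the winner (my addition, assuming the same [name, params, sse] layout and a non-empty res) is to let min do both passes at once:
# Select the sublist with the smallest positive SSE in a single pass.
best_dist_list = min((d for d in res if d[2] > 0), key=lambda d: d[2])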
I have a list of issues (jira issues):
listOfKeys = [id1,id2,id3,id4,id5...id30000]
I want to get the worklogs of these issues; for this I used the jira-python library and this code:
listOfWorklogs = pd.DataFrame()  # I used the pandas (pd) lib
lst = {}  # helper dictionary where the worklogs will be stored
for i in range(len(listOfKeys)):
    worklogs = jira.worklogs(listOfKeys[i])  # getting list of worklogs
    if len(worklogs) == 0:
        i += 1
    else:
        for j in range(len(worklogs)):
            lst = {
                'self': worklogs[j].self,
                'author': worklogs[j].author,
                'started': worklogs[j].started,
                'created': worklogs[j].created,
                'updated': worklogs[j].updated,
                'timespent': worklogs[j].timeSpentSeconds
            }
            listOfWorklogs = listOfWorklogs.append(lst, ignore_index=True)
########### Below there is the recording to the .xlsx file ################
So I simply go into the worklogs of each issue in a simple loop, which is equivalent to requesting the link
https://jira.mycompany.com/rest/api/2/issue/issueid/worklogs and retrieving the information from it.
The problem is that there are more than 30,000 such issues, and the loop is very slow (approximately 3 seconds per issue).
Can I somehow start multiple loops / processes / threads in parallel to speed up the process of getting the worklogs (maybe without the jira-python library)?
I recycled a piece of my own code into your code; I hope it helps:
from multiprocessing import Manager, Process, cpu_count

def insert_into_list(worklog, queue):
    lst = {
        'self': worklog.self,
        'author': worklog.author,
        'started': worklog.started,
        'created': worklog.created,
        'updated': worklog.updated,
        'timespent': worklog.timeSpentSeconds
    }
    queue.put(lst)
    return

# Number of cpus in the pc
num_cpus = cpu_count()
index = 0

# Manager and queue to hold the results
manager = Manager()
# The queue has controlled insertion, so processes don't step on each other
queue = manager.Queue()

listOfWorklogs = pd.DataFrame()
lst = {}
for i in range(len(listOfKeys)):
    worklogs = jira.worklogs(listOfKeys[i])  # getting list of worklogs
    if len(worklogs) == 0:
        i += 1
    else:
        # This loop replaces your "for j in range(len(worklogs))" loop
        while index < len(worklogs):
            processes = []
            elements = min(num_cpus, len(worklogs) - index)
            # Create a process for each cpu
            for i in range(elements):
                process = Process(target=insert_into_list, args=(worklogs[i + index], queue))
                processes.append(process)
            # Run the processes
            for i in range(elements):
                processes[i].start()
            # Wait for them to finish
            for i in range(elements):
                processes[i].join(timeout=10)
            index += num_cpus

# Dump the queue into the dataframe
while queue.qsize() != 0:
    listOfWorklogs = listOfWorklogs.append(queue.get(), ignore_index=True)
This should work and reduce the time by a factor of a little less than the number of CPUs in your machine. You can try changing that number manually for better performance. In any case, I find it very strange that it takes about 3 seconds per operation.
PS: I couldn't try the code because I have no example data, so it probably has some bugs.
I'm running into some problems:
1) the indentation in the code where the first "for" loop appears and the first "if" statement begins (this statement and everything below it should be inside the loop, right?)
for i in range(len(listOfKeys)-99):
    worklogs = jira.worklogs(listOfKeys[i])  # getting list of worklogs
    if len(worklogs) == 0:
        ....
2) cmd, the conda prompt and Spyder would not run your code, giving this error:
Python multiprocessing error: AttributeError: module '__main__' has no attribute '__spec__'
After researching on Google, I set __spec__ = None a bit higher in the code (but I'm not sure if this is correct) and the error disappeared.
By the way, the code in a Jupyter Notebook worked without this error, but listOfWorklogs ends up empty, and that is not right.
3) when I corrected the indentation and set __spec__ = None, a new error occurred at this line:
processes[i].start()
with an error like this:
PicklingError: Can't pickle <class 'jira.resources.PropertyHolder'>: attribute lookup PropertyHolder on jira.resources failed
If I remove the parentheses from the start and join calls, the code runs, but I don't get any entries in listOfWorklogs.
I'm asking for your help again!
How about thinking about it not from a technical standpoint but a logical one? You know your code works, but at a rate of 3 seconds per issue, which means it would take about 25 hours to complete. If you have the ability to split up the number of Jira issues that are passed into the script (maybe by date or issue key, etc.), you could create multiple .py files with basically the same code and just pass each one a different list of Jira tickets. You could then run, say, 4 of them at the same time and reduce the time to about 6.25 hours each.
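A sketch of the same splitting idea inside a single script, using a thread pool instead of separate .py files (my assumptions: the jira client tolerates concurrent read calls, and listOfKeys, jira, and pd are defined as above):
from concurrent.futures import ThreadPoolExecutor

def fetch_worklogs(key):
    # Network-bound call, so threads (rather than processes) sidestep
    # the PicklingError on jira.resources objects.
    return jira.worklogs(key)

listOfWorklogs = pd.DataFrame()
with ThreadPoolExecutor(max_workers=4) as executor:
    for worklogs in executor.map(fetch_worklogs, listOfKeys):
        for w in worklogs:
            listOfWorklogs = listOfWorklogs.append({
                'self': w.self, 'author': w.author, 'started': w.started,
                'created': w.created, 'updated': w.updated,
                'timespent': w.timeSpentSeconds
            }, ignore_index=True)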
I am quite new to the numba package in Python. I am not sure if I am using numba.jit correctly, but the code just runs too slowly, at 23.7 s per loop over the line Z1 = mmd(X,Y,20).
What is the correct way to optimize the code? I need your help guys. Thank you.
Here is my code:
import pandas as pd
import numba as nb
import numpy as np

@nb.jit
def mmd(array1, array2, n):
    n1 = array1.shape[0]
    MMD = np.empty(n1, dtype='float64')
    for i in range(n-1, n1):
        MMD[i] = np.average(abs(array1[i+1-n:i+1] - array2[i]))
    return MMD

X = np.array([i**2 for i in range(1000000)])
Y = np.array([i for i in range(1000000)])
Z1 = mmd(X, Y, 20)
EDIT: simplified the code even further
EDIT 2: I tried @nb.jit(nopython = True) and got this error message:
KeyError: "<class 'numba.targets.cpu.CPUTargetOptions'> does not support option: 'nonpython'"
I also tried:
@nb.jit(nb.float32[:](nb.float32[:], nb.float32[:], nb.int8))
To make Numba work well you need to use "nopython" mode, as you mentioned. To enable this, simply run the program with jit replaced by njit (or equivalently, jit(nopython=True)), and fix the errors one by one:
np.empty() doesn't support the dtype='float64' argument in Numba. That's OK though, because float64 is the default. Just remove it.
np.average() is not supported in Numba. That's OK, since we are not passing any weights anyway, it's the same as np.mean(). Replace it.
The built-in abs() is not supported in Numba. Use np.abs() instead.
We end up with this:
@nb.njit
def mmd(array1, array2, n):
    n1 = array1.shape[0]
    MMD = np.empty(n1)
    for i in range(n-1, n1):
        MMD[i] = np.mean(np.abs(array1[i+1-n:i+1] - array2[i]))
    return MMD
And it is 100x faster.
Bonus tips:
You can initialize your sample data more concisely and faster like this:
Y = np.arange(1000000)
X = Y * Y
The first n-1 values in the result are uninitialized garbage. You might want to clean that up somehow.
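For example (my addition, not part of the original answer), allocating with np.full keeps those leading positions as NaN instead of garbage:
@nb.njit
def mmd(array1, array2, n):
    n1 = array1.shape[0]
    MMD = np.full(n1, np.nan)  # indices 0 .. n-2 stay NaN
    for i in range(n - 1, n1):
        MMD[i] = np.mean(np.abs(array1[i + 1 - n:i + 1] - array2[i]))
    return MMD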