tqdm notebook: increase bar width - python-3.x

I have a task I'd like to monitor the progress of; it's a brute-force NP problem running in a while loop.
For the first x (unknown) iterations of the loop it discovers an unknown number of additional future combinations (many per loop). Eventually it reaches a point where it is solving puzzles (one solution per loop) faster than it is finding new possible puzzles, and it finishes when it solves the last puzzle it found (100%).
I've created some fake growth to provide a repeatable example:
from tqdm import tqdm_notebook as tqdm

growthFactorA = 19
growthFactorB = 2

prog = tqdm(total=50, dynamic_ncols=True)  # the initial total is only a guess
done = []
todo = [1]
while len(todo) > 0:
    current = todo.pop(0)
    if current < growthFactorA:
        # fake growth: discover new "puzzles" faster than they are solved
        todo.extend(range(current + 1, growthFactorA + growthFactorB))
    done.append(current)
    prog.total = len(todo) + len(done)  # grow the total as new work is found
    prog.update()
You'll see the total eventually stops at 389814. At first it grows much faster than the loop solves puzzles, but at some point the system stops growing.
It is impossible to calculate the number of iterations before running the algorithm.
The blue bar is confined to the original total used at initialization. My goal is to get something similar to what I'd see if the initial total had been set to 389814; it's okay that during the growth period (early in the run) the progress bar appears to move backwards or stall as the total increases.

As posted in https://github.com/tqdm/tqdm/issues/883#issuecomment-575873544, for now you could do:
prog.container.children[0].max = prog.total
(after setting the new prog.total).
This is even more annoying when writing code meant to run on both notebooks and the CLI (from tqdm.auto import tqdm), where you first have to check hasattr(prog, 'container').
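Applied to the example above, a minimal sketch of that workaround (assuming the notebook widget backend is active, so prog.container exists; the hasattr guard keeps it safe under plain CLI tqdm):
from tqdm.auto import tqdm

growthFactorA = 19
growthFactorB = 2

prog = tqdm(total=50, dynamic_ncols=True)
done = []
todo = [1]
while len(todo) > 0:
    current = todo.pop(0)
    if current < growthFactorA:
        todo.extend(range(current + 1, growthFactorA + growthFactorB))
    done.append(current)
    prog.total = len(todo) + len(done)
    # Workaround from tqdm issue #883: also widen the notebook widget's bar.
    # Guarded with hasattr so the same code still runs in a terminal.
    if hasattr(prog, 'container'):
        prog.container.children[0].max = prog.total
    prog.update()
prog.close()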

Related

Changing the speed of a periodically appearing object in grid

I'm working in the minigrid environment. I currently have an object moving across the screen at one grid space per step (from left to right). The object appears periodically: currently it spends half the time on screen and half off. I'd like to slow the object down so it moves across the screen more slowly, but I'm not sure how to do that without losing the periodic appearance. Current code is below:
idx = (self.step_count + 2) % (2 * self.width)  # 2 is the on/off ratio: half the period on screen, half off
if idx < self.width:
    try:
        self.put_obj(self.obstacles[i_obst], idx, old_pos[1])
        self.grid.set(*old_pos, None)  # delete the old obstacle
    except Exception:
        pass
else:
    self.grid.set(*old_pos, None)  # delete the old obstacle
I got something to work. The snippet below introduces an integer named slow_factor that reduces the speed while keeping idx usable for its original purpose:
idx = (self.step_count + 2) // slow_factor % (2 * self.width)
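As a quick sanity check outside minigrid (a standalone sketch where plain variables stand in for self.step_count and self.width), printing idx for a few steps shows the object now advances one cell only every slow_factor steps while still spending half of each period off screen:
width = 5
slow_factor = 2  # advance one grid cell every 2 steps instead of every step

for step_count in range(24):
    idx = (step_count + 2) // slow_factor % (2 * width)
    visible = idx < width
    print(step_count, idx, "on screen" if visible else "off screen")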

How to measure the time of selection sort?

When I measure the time of selection sort with a random array of size 10000 (random numbers in the range of 1000), it takes a long time, about 14 sec; when the size is 1000000 it takes about 1 min. I think it should take less than 5 sec.
Can you help me change the algorithm to lower the time?
import time
import random

def selection_sort(selection_array):
    for i in range(len(selection_array) - 1):
        minimum_index = i
        # find the index of the smallest remaining element
        for j in range(i + 1, len(selection_array)):
            if selection_array[j] < selection_array[minimum_index]:
                minimum_index = j
        # swap it into position i
        selection_array[i], selection_array[minimum_index] = selection_array[minimum_index], selection_array[i]
    return selection_array

# array of 10000 random numbers in the range 0-999, as described above
selection_random_array = [random.randrange(1000) for _ in range(10000)]

print("--------selection_sort----------")
start1 = time.time()
selection_sort(selection_random_array)
end1 = time.time()
print(f"random array: {end1 - start1}")
You seem to be asking two questions: how to improve selection sort and how to time it exactly.
The short answer for both is: you can't. If you modify the sorting algorithm it is no longer selection sort. If that's okay, the industry standard is quicksort, so take a look at that algorithm (it's much more complicated, but runs in O(n log n) time instead of selection sort's O(n^2) time).
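For reference, a simple quicksort sketch (not in-place, so it uses extra memory; in practice Python's built-in sorted() is the easier drop-in) looks like this:
def quicksort(arr):
    # average O(n log n) versus selection sort's O(n^2)
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)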
As for your other question, "how do I time it exactly", you also can't. Computers don't handle only one thing anymore; your operating system is constantly interleaving tasks. There is a 0% chance that your CPU is dedicated entirely to this program while it runs, which means the time it takes for the program to finish will change each time you run it. Beyond that, the time it takes to call time.time() itself also needs to be taken into account.
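To at least reduce the noise from other processes, one common approach (a sketch using the standard-library timeit module, assuming selection_sort and selection_random_array from the question are already defined) is to repeat the measurement and report the best run:
import timeit

# Sort a fresh copy on every run so each measurement does the same amount of work;
# the minimum over several runs is the least-disturbed measurement.
times = timeit.repeat(lambda: selection_sort(selection_random_array.copy()),
                      number=1, repeat=5)
print(f"best of 5 runs: {min(times):.2f} s")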

linearK - large time difference between empirical and acceptance envelopes in spatstat

I am interested in the correlation between points at distances of 0 to 2 km on a linear network. I am using the following statement for the empirical data; it completes in 2 minutes.
obs<-linearK(c, r=seq(0,2,by=0.20))
Now I want to check the acceptance of randomness, so I used envelopes over the same r range.
acceptance_enve<-envelope(c, linearK, nsim=19, fix.n = TRUE, funargs = list(r=seq(0,2,by=0.20)))
But this shows an estimated time of a little less than 3 hours. I just want to ask whether this large time difference is normal. Is my syntax correct for passing the extra r argument (as a sequence) to the envelope function call?
Is there an efficient way to shorten this 3-hour execution time for envelopes?
I have the road network of a whole city, so it is quite large, and I have checked that there are no disconnected subgraphs.
c
Point pattern on linear network
96 points
Linear network with 13954 vertices and 19421 lines
Enclosing window: rectangle = [559.653, 575.4999] x [4174.833, 4189.85] Km
thank you.
EDIT AFTER COMMENT
system.time({s <- runiflpp(npoints(c), as.linnet(c));
+ linearK(s, r=seq(0,2,by=0.20))})
user system elapsed
343.047 104.428 449.650
EDIT 2
I made some really small changes by deleting some peripheral network segments that seem to have little or no effect on the overall network. This also led to some long segments being split into smaller ones. But now, on the same network with a different point pattern, I get an even longer estimated time:
> month1envelope=envelope(months[[1]], linearK ,nsim = 39, r=seq(0,2,0.2))
Generating 39 simulations of CSR ...
1, 2, [etd 12:03:43]
The new network is
> months[[1]]
Point pattern on linear network
310 points
Linear network with 13642 vertices and 18392 lines
Enclosing window: rectangle = [560.0924, 575.4999] x [4175.113, 4189.85] Km
System Config: MacOS 10.9, 2.5Ghz, 16GB, R 3.3.3, RStudio Version 1.0.143
You don't need to use funargs in this context. Arguments can be passed directly through the ... argument. So I suggest
acceptance_enve <- envelope(c, linearK, nsim=19,
fix.n = TRUE, r=seq(0,2,by=0.20))
Please try this to see if it accelerates the execution.

Folium + Bokeh: Poor performance and massive memory usage

I'm using Folium and Bokeh together in a Jupyter notebook. I'm looping through a dataframe, and for each row inserting a marker on the Folium map, pulling some data from a separate dataframe, creating a Bokeh chart out of that data, and then embedding the Bokeh chart in the Folium map popup in an IFrame. Code is as follows:
map = folium.Map(location=[36.710021, 35.086146], zoom_start=6)
for i in range(0, len(duty_station_totals)):
    # data for this duty station's popup chart
    popup_table = station_dept_totals.loc[station_dept_totals['Duty Station'] == duty_station_totals.iloc[i, 0]]
    chart = Bar(popup_table, label=CatAttr(columns=['Department / Program'], sort=False), values='dept_totals',
                title=duty_station_totals.iloc[i, 0] + ' Staff', xlabel='Department / Program', ylabel='Staff',
                plot_width=350, plot_height=350)
    hover = HoverTool(point_policy='follow_mouse')
    hover.tooltips = [('Staff', '@height'), ('Department / Program', '@{Department / Program}'),
                      ('Duty Station', duty_station_totals.iloc[i, 0])]
    chart.add_tools(hover)
    # render the Bokeh chart to standalone HTML and embed it in the marker popup
    html = file_html(chart, INLINE, "my plot")
    iframe = folium.element.IFrame(html=html, width=400, height=400)
    popup = folium.Popup(iframe, max_width=400)
    marker = folium.CircleMarker(duty_station_totals.iloc[i, 2],
                                 radius=duty_station_totals.iloc[i, 1] * 150,
                                 color=duty_station_totals.iloc[i, 3],
                                 fill_color=duty_station_totals.iloc[i, 3])
    marker.add_to(map)
    folium.Marker(duty_station_totals.iloc[i, 2], icon=folium.Icon(color='black', icon_color=duty_station_totals.iloc[i, 3]), popup=popup).add_to(map)
map
This loop runs extremely slowly, and adds approximately 200 MB to the memory usage of the associated Python 3.5 process per run of the loop! In fact, after running the loop a couple of times my entire MacBook slows to a crawl - even the mouse lags. The associated map also lags heavily when scrolling and zooming, and the popups are slow to open. In case it isn't obvious, I'm pretty new to the Python analytics and web visualization world, so maybe there is something clearly very inefficient here.
I'm wondering why this is and if there is a better way of having Bokeh charts appear in the map popups. From some basic experiments I've done, it doesn't seem that the issue is with the calls to Bar - the memory usage seems to really skyrocket when I include calls to file_html and just get worse as calls to folium.element.IFrame are added. Seems like there is some sort of memory leak going on due to the increasing memory usage on re-running of the same code.
If anyone has ideas as to how to achieve the same effect (Bokeh charts opening when clicking a Folium marker) in a more efficient manner I would really appreciate it!
Update following some experimentation
I've run through the loop step by step and observed changes in memory usage as more steps are added, to try and isolate what piece of code is driving this issue. On the Bokeh side, the biggest culprit seems to be the calls to file_html() - when running the loop through this step it adds about 5 MB of memory usage to the associated Python 3.5 process per run (the loop creates 18 charts), even when including bokeh.io.curdoc().clear().
The bigger issue, however, seems to be driven by Folium. Running the whole loop, including the creation of the Folium IFrames with the Bokeh-generated HTML and the map markers linked to the IFrames, adds 25-30 MB to the memory usage of the Python process per run.
So, I guess this is turning into more of a Folium question. Why is this structure so memory intensive, and is there a better way? By the way, saving the resulting Folium map as an HTML file with map.save('map.html') creates a huge, 22 MB HTML file.
There are lots of different use-cases, and some of them come with unavoidable trade-offs. In order to make some other use-cases very simple and convenient, Bokeh has an implicit "current document" and keeps accumulating things there. For the particular use-case of generating a bunch of plots sequentially in a loop, you will want to call bokeh.io.reset_output() in between each, to prevent this accumulation.
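A minimal sketch of how that fits into the loop above (build_chart is a hypothetical stand-in for the Bar/HoverTool code from the question, and map and duty_station_totals are the question's own variables; reset_output is the real bokeh.io call, though how much of the Folium-side memory it recovers is untested here):
import bokeh.io
import folium
from bokeh.embed import file_html
from bokeh.resources import INLINE

for i in range(0, len(duty_station_totals)):
    chart = build_chart(i)  # hypothetical helper wrapping the Bar/HoverTool construction above
    html = file_html(chart, INLINE, "my plot")
    iframe = folium.element.IFrame(html=html, width=400, height=400)
    folium.Marker(duty_station_totals.iloc[i, 2],
                  popup=folium.Popup(iframe, max_width=400)).add_to(map)
    # Clear Bokeh's implicit "current document" so charts stop accumulating between iterations.
    bokeh.io.reset_output()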

Joblib parallel increases time by n jobs

While trying to get multiprocessing to work (and to understand it) in Python 3.3, I quickly reverted to joblib to make my life easier. But I'm experiencing something very strange (from my point of view). When running this code (just to test whether it works):
Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(200000))
It takes about 9 seconds, but increasing n_jobs actually makes it take longer: with n_jobs=2 it takes 25 seconds and with n_jobs=4 it takes 27 seconds.
Correct me if I'm wrong, but shouldn't it be much faster as n_jobs increases? I have an Intel i7-3770K, so I guess the problem isn't my CPU.
Perhaps describing my original problem will increase the chance of an answer or solution.
I have a list of 30k+ strings, data, and I need to do something with each string (independently of the other strings); it takes about 14 seconds. This is only a test case to see whether my code works. In real applications there will probably be 100k+ entries, so multiprocessing is needed, since this is only a small part of the entire calculation.
This is what needs to be done in this part of the calculation:
from nltk.corpus import wordnet

data_syno = []
for entry in data:
    w = wordnet.synsets(entry)
    # first lemma of the first synset if one exists, otherwise keep the original string
    if len(w) > 0: data_syno.append(w[0].lemma_names[0])
    else: data_syno.append(entry)
The n_jobs parameter is counter-intuitive: the maximum number of cores is used at -1, while at 1 only one core is used. At -2 it uses max-1 cores, at -3 it uses max-2 cores, and so on. That's how I read it:
from the docs:
n_jobs: int :
The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
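Based on the quoted docs, using all cores for the test snippet would look like the sketch below (whether it actually beats n_jobs=1 still depends on each task being expensive enough to outweigh the process overhead):
from math import sqrt
from joblib import Parallel, delayed

# n_jobs=1  -> no parallel code at all (useful for debugging)
# n_jobs=-1 -> use all CPUs
# n_jobs=-2 -> all CPUs but one, and so on
results = Parallel(n_jobs=-1)(delayed(sqrt)(i**2) for i in range(200000))
print(len(results))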

Resources