How do I speed up this nested for loop in Python?

The function shown below runs quite slowly even though I used swifter to call it. Does anyone know how to speed it up? My Python knowledge is limited at this point and I would appreciate any help I could get. I tried using the map() function, but somehow it didn't work for me. I guess the nested for loop is what makes it slow, right?
BR,
Hannes
def polyData(uniqueIds):
    for index in range(len(uniqueIds) - 1):
        element = uniqueIds[index]
        polyData1 = df[df['id'] == element]
        poly1 = build_poly(polyData1)
        poly1 = poly1.buffer(0)
        for secondIndex in range(index + 1, len(uniqueIds)):
            otherElement = uniqueIds[secondIndex]
            polyData2 = df[df['id'] == otherElement]
            poly2 = build_poly(polyData2)
            poly2 = poly2.buffer(0)
            # Calculate overlap percentage-wise
            overlap_pct = poly1.intersection(poly2).area / poly1.area
            # Form new DF
            df_ol = pd.DataFrame({'id_1': [element], 'id_2': [otherElement], 'overlap_pct': [overlap_pct]})
            # Write to SQL database
            df_ol.to_sql(name='df_overlap', con=e, if_exists='append', index=False)

This function is inherently slow for large inputs because it tries every 2-combination of the ids, which is O(n²). On top of that, you're building the poly for the same ids multiple times, even though you can build each one once beforehand (which may itself be expensive) and store it for later use. So try to extract the building of the polys.
def getPolyForUniqueId(uid):
    polyData = df[df['id'] == uid]
    poly = build_poly(polyData)
    poly = poly.buffer(0)
    return poly

def polyData(uniqueIds):
    polyDataList = [getPolyForUniqueId(uid) for uid in uniqueIds]
    for index in range(len(uniqueIds) - 1):
        id_1 = uniqueIds[index]
        poly_1 = polyDataList[index]
        for secondIndex in range(index + 1, len(uniqueIds)):
            id_2 = uniqueIds[secondIndex]
            poly_2 = polyDataList[secondIndex]
            ...
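The per-pair to_sql call is another likely bottleneck, since every pair costs a database round-trip. Here is a minimal sketch of collecting all rows and writing once, assuming the same df, build_poly, pandas import pd, and SQLAlchemy engine e as in the question (the function name is illustrative):

def polyData_batched(uniqueIds):
    polys = [getPolyForUniqueId(uid) for uid in uniqueIds]
    rows = []
    for i in range(len(uniqueIds) - 1):
        for j in range(i + 1, len(uniqueIds)):
            overlap_pct = polys[i].intersection(polys[j]).area / polys[i].area
            rows.append({'id_1': uniqueIds[i],
                         'id_2': uniqueIds[j],
                         'overlap_pct': overlap_pct})
    # One bulk insert instead of one INSERT per pair
    pd.DataFrame(rows).to_sql(name='df_overlap', con=e,
                              if_exists='append', index=False)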

Related

Find the point on the top or bottom of a sequence of numbers

I have a problem like this and I would like to write a snippet of code to solve it.
Sequences like [1,2,3,2], [1,3,2], [1,3,2,1] -> I want to output 3 (the maximum) because the sequence increases to 3 and then decreases again.
Sequences like [3,2,1,2], [3,1,2], [3,1,2,3] -> I want to output 1 (the minimum) because the sequence decreases to 1 and then increases again.
Any idea how to do this automatically?
Try getting the local maxima and/or local minima:
import numpy as np
from scipy.signal import argrelextrema

a = np.array([3,2,1,3])
res = a[np.hstack([argrelextrema(a, np.greater), argrelextrema(a, np.less)]).ravel()]
This returns both the local maxima and the local minima. You can mark them separately if that's better for your use case; from your question I assumed there is just one extremum. Also, depending on your data, you might consider using np.less_equal or np.greater_equal instead of np.less or np.greater, respectively.
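For example (a small illustration of the _equal variants, not from the original answer), a flat peak is missed by the strict comparator but caught by np.greater_equal:

import numpy as np
from scipy.signal import argrelextrema

a = np.array([1, 3, 3, 2])
print(argrelextrema(a, np.greater))        # empty: the 3,3 plateau is not a strict maximum
print(argrelextrema(a, np.greater_equal))  # (array([1, 2]),): both plateau indices qualify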
I found it interesting to implement this algorithm in Python 3.
The basic idea is to find the minima and maxima of a sequence of numbers. A sequence, however, can have several maximum points and several minimum points to take into consideration. This is the algorithm I implemented; I hope it's useful.
sequence = [1,2,3,4,5,4,3,2,8,10,1]
index = 1
max_points = []  # A sequence may have multiple relative maxima
relative_maximum = sequence[0]
change_direction = False  # initialize so the first else branch can't hit an unbound name
for element in range(len(sequence)):
    if index == len(sequence):
        break
    if sequence[element] < sequence[index]:
        relative_maximum = sequence[index]
        change_direction = True
    else:
        if change_direction == True:
            max_points.append(relative_maximum)
        change_direction = False
    index = index + 1

index = 1
min_points = []  # A sequence may have multiple relative minima
relative_minimum = sequence[0]
change_direction = False
for element in range(len(sequence)):
    if index == len(sequence):
        break
    if sequence[element] > sequence[index]:
        relative_minimum = sequence[index]
        change_direction = True
    else:
        if change_direction == True:
            min_points.append(relative_minimum)
        change_direction = False
    index = index + 1

print("The max points: " + str(max_points))
print("The min points: " + str(min_points))
Result:
The max points: [5, 10]
The min points: [2]

Is there any method in DASK for creating parallelism while counting distinct values from a dataset

I have successfully extracted the count of a specific word from a dataset, but it is taking too much time. I am new to parallel programming.
How can I create parallelism in the following code:
df = dd.read_csv('crime.csv', encoding="ISO-8859-1")
distinct_values = df.YEAR.unique().compute()
counter = len(distinct_values)
values_count = {}
for i in distinct_values:
    count = df[df.YEAR == i].YEAR.value_counts().compute()
    values_count.update(count)

list = []
for x, y in values_count.items():
    dict = {}
    for i in x, y:
        dict['name'] = x
        dict['value'] = y
    # print(dict)
    list.append(dict)
# print(list)
maximum = max(distinct_values)
mininmum = min(distinct_values)
Maybe you're looking for a groupby aggregation like the following?
df.groupby("YEAR").count().compute()
Or, if you need to do this as many separate operations, you should at least use the dask.compute function with many inputs rather than calling the .compute method many times.
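Here is a minimal sketch of that second suggestion, assuming the same df and distinct_values as in the question: build all the lazy per-year counts first, then evaluate them in a single call so dask can run them in parallel and share the CSV-reading work.

import dask

# One lazy expression per year; nothing is computed yet
lazy_counts = [df[df.YEAR == i].YEAR.count() for i in distinct_values]

# A single call evaluates all of them together
results = dask.compute(*lazy_counts)
values_count = dict(zip(distinct_values, results))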

A strategy-proof method of finding the time complexity of complex algorithms?

I have a question regarding time complexity (big-O) in Python. I want to understand the general method for finding the big-O of a complex algorithm. I understand the reasoning behind calculating the time complexity of simple algorithms, such as a for loop iterating over a list of n elements being O(n), or two nested for loops, each iterating over a list of n elements, being O(n**2). But for more complex algorithms that combine if-elif-else statements with for loops, I want to know whether there is a strategy for determining the big-O iteratively, based purely on the code, using simple heuristics (such as ignoring constant-time if statements, multiplying by n for each enclosing for loop, or doing something specific when encountering an else statement).
I have created a battleship game, for which I would like to find the time complexity, using such an aforementioned strategy.
from random import randint

class Battle:
    def __init__(self):
        self.my_grid = [[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False],[False,False,False,False,False,False,False,False,False,False]]

    def putting_ship(self, x, y):
        breaker = False
        while breaker == False:
            r1 = x
            r2 = y
            element = self.my_grid[r1][r2]
            if element == True:
                continue
            else:
                self.my_grid[r1][r2] = True
                break

    def printing_grid(self):
        return self.my_grid

    def striking(self, r1, r2):
        element = self.my_grid[r1][r2]
        if element == True:
            print("STRIKE!")
            self.my_grid[r1][r2] = False
            return True
        elif element == False:
            print("Miss")
            return False

def game():
    battle_1 = Battle()
    battle_2 = Battle()
    score_player1 = 0
    score_player2 = 0
    turns = 5
    counter_ships = 2
    while True:
        input_x_player_1 = input("give x coordinate for the ship, player 1\n")
        input_y_player_1 = input("give y coordinate for the ship, player 1\n")
        battle_1.putting_ship(int(input_x_player_1), int(input_y_player_1))
        input_x_player_2 = randint(0, 9)
        input_y_player_2 = randint(0, 9)
        battle_2.putting_ship(int(input_x_player_2), int(input_y_player_2))
        counter_ships -= 1
        if counter_ships == 0:
            break
    while True:
        input_x_player_1 = input("give x coordinate for the ship\n")
        input_y_player_1 = input("give y coordinate for the ship\n")
        my_var = battle_1.striking(int(input_x_player_1), int(input_y_player_1))
        if my_var == True:
            score_player1 += 1
            print(score_player1)
        input_x_player_2 = randint(0, 9)
        input_y_player_2 = randint(0, 9)
        my_var_2 = battle_2.striking(int(input_x_player_2), int(input_y_player_2))
        if my_var_2 == True:
            score_player2 += 1
            print(score_player2)
        counter_ships -= 1
        if counter_ships == 0:
            break
    print("the score for player 1 is", score_player1)
    print("the score for player 2 is", score_player2)

print(game())
If it's just nested for loops and if/else statements, you can take the approach ibonyun has suggested - assume all if/else cases are covered and look at the deepest loops (being aware that some operations, like sorting or copying an array, might hide loops of their own).
However, your code also has while loops. In this particular example it's not too hard to replace them with for loops, but for code containing nontrivial while loops there is no general strategy that will always give you the complexity - this is a consequence of the halting problem.
For example:
def collatz(n):
    n = int(abs(n))
    steps = 0
    while n != 1:
        if n % 2 == 1:
            n = 3*n + 1
        else:
            n = n // 2
        steps += 1
        print(n)
    print("Finished in", steps, "steps!")
So far nobody has been able to prove that this will even finish for all n, let alone show an upper bound on the run-time.
Side note: instead of the screen-breaking
self.my_grid = [[False,False,...False],[False,False,...,False],...,[False,False,...False]]
consider something like:
grid_size = 10
self.my_grid = [[False for i in range(grid_size)] for j in range(grid_size)]
which is easier to read and check.
Empirical:
You could do some time trials while increasing n (so maybe increasing the board size?) and plot the resulting data. You could tell from the curve/slope of the line what the time complexity is.
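A rough sketch of such a time trial, where f below is a hypothetical stand-in for the code under test:

import time

def f(n):
    # Hypothetical stand-in for the code under test
    # (e.g. a game simulation on an n x n board).
    return sorted(range(n, 0, -1))

for n in [1000, 2000, 4000, 8000, 16000]:
    start = time.perf_counter()
    f(n)
    elapsed = time.perf_counter() - start
    print(f"n={n:6d}  t={elapsed:.6f}s")
# If the time roughly doubles when n doubles, growth is linear;
# if it roughly quadruples, growth is quadratic, and so on.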
Theoretical:
Parse the script and keep track of the biggest O() you find for any given line or function call. Any sorting operation will give you n log n. A for loop inside a for loop will give you n^2 (assuming they're both iterating over the input data), etc. Time complexity is about the broad strokes: O(n) and O(3n) are both linear time, and that's what really matters. I don't think you need to worry about the minutiae of all your if-elif-else logic. Maybe just focus on the worst-case scenario?
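As a toy illustration of that line-by-line bookkeeping (hypothetical code, not from the question):

def example(data):            # n = len(data)
    data = sorted(data)       # O(n log n) - sorting hides a loop
    total = 0                 # O(1)
    for x in data:            # runs n times
        for y in data:        # runs n times per outer pass -> O(n^2)
            total += x * y    # O(1) body
    return total              # overall: O(n log n + n^2) = O(n^2)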

speeding up CVXPY processing speed

I have written some code that uses the cvxpy library to solve an integer programming problem, but it is taking so long to run that I was wondering whether there is any way to make it faster.
The integer programming problem in this case takes in a matrix of shape (1,569 x 3,071) and has 3,071 constraints to satisfy. The code is as follows:
import sys
import json
import time
import numpy as np
import cvxpy as cp

mat_f = sys.argv[1]
matIdx2genome_dic_f = sys.argv[2]
genomes_f = sys.argv[3]

with open(matIdx2genome_dic_f, 'r') as in_f:
    matIdx2genome_dic = json.load(in_f)

M = np.load(mat_f)
selection = cp.Variable(M.shape[1], boolean=True)
ones_vec = np.ones(M.shape[1])
constraints = []
for i in range(len(M)):
    constraints.append(M[i] * selection >= 1)
total_genomes = ones_vec * selection
problem = cp.Problem(cp.Minimize(total_genomes), constraints)
print('solving the integer programming problem: ')
start = time.time()  # don't shadow the time module with a variable named "time"
problem.solve(parallel=True)
print('problem solved in: ' + str(time.time() - start))
solution = selection.value
solution = list(map(round, solution))
solution = np.array(solution)
which_genomes = np.where(solution == 1.0)[0]
with open(genomes_f, 'w') as out_f:
    for idx in which_genomes:
        out_f.write(matIdx2genome_dic[idx] + '\n')
The first command line argument is what's important here, it's a numpy binary matrix that is of shape (1569, 3071).
The problem here is to minimize the number of columns of the matrix needed such that every row has at least one 1 in it.
My question is: how can I write this script so that it runs faster? Is there a way to parallelize it? I have set the parallel parameter to True in the solve method, but I don't think it's doing much; I'm monitoring the CPU utilization and it's only at 100%, so the parallel option doesn't seem to be helping.
Or is there another way (maybe another solver I should call) that would solve this faster?
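One thing that may help, sketched here as an assumption rather than a tested answer: cvxpy usually canonicalizes a single vectorized constraint much faster than thousands of scalar constraints appended in a Python loop, so the row-by-row loop above can be collapsed into one matrix inequality.

import numpy as np
import cvxpy as cp

M = np.load(mat_f)  # same binary matrix as above, shape (1569, 3071)
selection = cp.Variable(M.shape[1], boolean=True)

# One vector constraint: every row must be covered by >= 1 selected column.
constraints = [M @ selection >= 1]
problem = cp.Problem(cp.Minimize(cp.sum(selection)), constraints)

# A dedicated MIP solver (e.g. cp.GLPK_MI or cp.CBC, if installed)
# may also be faster than the default for boolean problems.
problem.solve()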

Runner technique to combine two equal Linked Lists of general size

So, I am facing a doubt here.
I was reading the book Cracking the Coding Interview, where the following text appears:
Suppose you had a linked list a1->a2->...->an->b1->b2->...->bn, and you want to rearrange it into a1->b1->a2->b2->...->an->bn. You don't know the length of the linked list, but all you know is that it is an even number.
(Here both halves of the linked list are of the same length.)
You could have one pointer p1 (fast pointer) move every two elements for every one move that p2 makes. When p1 hits the end of the linked list, p2 will be at the midpoint. Then, move p1 back to the front and begin "weaving" the elements. On each iteration, p2 selects an element and inserts it after p1.
I don't understand how, when p1 hits the end of the linked list, p2 will be at the midpoint; I tried imagining it for n = 3 (length = 6).
I tried with a linked list consisting of 4 elements and was successful in achieving the result. However, I can't solve the general case, because my pointers get dangly. Would it be possible to provide code for the problem in Python? I am stuck. This is my code:
def runner_technique_ex(self, head):
    """
    Assume the length of the ll that we will run thru will be even
    :param head:
    :return:
    """
    slow = head
    fast = head.next
    while fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    fast = head
    slow = slow.next
    while slow.next is not None:
        tempSlow = slow
        tempFast = fast.next
        fast.next = tempSlow
        slow = slow.next
        tempSlow.next = tempFast
        tempFast.next = slow
Figured out after some struggle
def runner_technique_ex(self, head):
    """
    Assume the length of the ll that we will run thru will be even
    :param head:
    :return:
    """
    slow = head
    fast = head.next
    while fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    fast = head
    slow = slow.next
    newHead = Node(fast.data)
    newHeadExtraPointer = newHead
    newHead.next = Node(slow.data)
    newHead = newHead.next
    while slow.next is not None:
        fast = fast.next
        slow = slow.next
        fastNextNode = Node(fast.data)
        slowNextNode = Node(slow.data)
        fastNextNode.next = slowNextNode
        newHead.next = fastNextNode
        newHead = fastNextNode.next
    return newHeadExtraPointer
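For reference, the solution above allocates new nodes. The weaving the book describes can also be done in place, without copying. Here is a minimal sketch under the same even-length assumption (the Node class is a hypothetical stand-in for whatever node type you use):

class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def weave_in_place(head):
    # Find the last node of the first half: slow moves one step
    # for every two steps fast takes
    slow, fast = head, head.next
    while fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    second = slow.next   # first node of the second half (b1)
    slow.next = None     # cut the list into two halves
    first = head
    while second is not None:
        first_next, second_next = first.next, second.next
        first.next = second        # insert the b-node after its a-node
        second.next = first_next
        first, second = first_next, second_next
    return head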
