Python loop performance - python-3.x

If I loop through 1 big list like:
range1 = list(range(100))
for i in range1:
    print('i')
Will it be faster than looping through nested lists with the same total number of iterations:
range2 = list(range(10))
range3 = list(range(10))
for i in range2:
    for ii in range3:
        print('i')
I'm guessing it will be the same, but I'm not sure if the iteration count (correct word?) is the only determinant of loop performance. Thought I would check with the coding gods!
Cheers,
Tim

Yes, it is effectively the same, because of something called "time complexity": how the running time scales with the size of the input. In both versions the loop body executes exactly 100 times, so the total work is the same; the nested version adds only a tiny constant overhead for restarting the inner loop ten times.
I recommend this video where it is explained perfectly: https://www.youtube.com/watch?v=D6xkbGLQesk
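To check this empirically, timeit can compare the two shapes directly. A minimal sketch (absolute numbers will vary by machine; the point is that the two totals come out nearly identical):
import timeit

flat = "for i in range(100): pass"
nested = (
    "for i in range(10):\n"
    "    for ii in range(10):\n"
    "        pass"
)

print(timeit.timeit(flat, number=100_000))    # 100 body executions per run
print(timeit.timeit(nested, number=100_000))  # 10 x 10 body executions per run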

Related

Valid Sudoku: How to decrease runtime

The problem is to check whether a given 2D array represents a valid Sudoku or not. Given below are the required conditions:
Each row must contain the digits 1-9 without repetition.
Each column must contain the digits 1-9 without repetition.
Each of the 9 3x3 sub-boxes of the grid must contain the digits 1-9 without repetition.
Here is the code I prepared for this. Please give me tips on how I can make it faster and reduce its runtime, and whether the dictionaries are slowing my program down.
def isValidSudoku(self, boards: List[List[str]]) -> bool:
    r = {}  # digits already seen per column (key: digit + column index)
    a = {}  # digits already seen per 3x3 box (key: digit + box coordinates)
    for i in range(len(boards)):
        c = {}  # digits already seen in the current row
        for j in range(len(boards[i])):
            if boards[i][j] != '.':
                x, y = r.get(boards[i][j] + f'{j}', 0), c.get(boards[i][j], 0)
                u, v = (i + 3) // 3, (j + 3) // 3
                z = a.get(boards[i][j] + f'{u}{v}', 0)
                if x == 0 and y == 0 and z == 0:
                    r[boards[i][j] + f'{j}'] = x + 1
                    c[boards[i][j]] = y + 1
                    a[boards[i][j] + f'{u}{v}'] = z + 1
                else:
                    return False
    return True
Simply optimizing assignment without rethinking your algorithm limits your overall efficiency by a lot. When you make a choice, you generally go a long way before discovering a contradiction.
Instead of representing, "Here are the values that I have figured out", try to represent, "Here are the values that I have left to try in each spot." And now your fundamental operation is, "Eliminate this value from this spot." (Remember, getting it down to 1 propagates to eliminating the value from all of its peers, potentially recursively.)
Assignment is now "Eliminate all values but this one from this spot."
And now your fundamental search operation is, "Find the square with the least number of remaining possibilities > 1. Try each possibility in turn."
This may feel heavyweight. But the immediate propagation of constraints results in very quickly discovering constraints on the rest of the solution, which is far faster than having to do exponential amounts of reasoning before finding the logical contradiction in your partial solution so far.
I recommend doing this yourself. But https://norvig.com/sudoku.html has full working code that you can consult as needed.
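Here is a minimal sketch of the two operations described above, assuming candidates maps each cell to the set of digits it can still take and peers maps each cell to the 20 cells sharing its row, column, or box (both names are illustrative, not taken from the linked code):
def eliminate(candidates, cell, digit, peers):
    # Remove digit from cell's candidate set, propagating when a cell is decided.
    if digit not in candidates[cell]:
        return True  # already eliminated, nothing to do
    candidates[cell].discard(digit)
    if not candidates[cell]:
        return False  # contradiction: no possible value left for this cell
    if len(candidates[cell]) == 1:
        forced = next(iter(candidates[cell]))  # cell is decided...
        for peer in peers[cell]:               # ...so every peer loses that digit
            if not eliminate(candidates, peer, forced, peers):
                return False
    return True

def assign(candidates, cell, digit, peers):
    # "Eliminate all values but this one from this spot."
    for other in list(candidates[cell]):
        if other != digit and not eliminate(candidates, cell, other, peers):
            return False
    return True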

Find the closest distance between every galaxy in the data and create pairs based on closest distance between them

My task is to pair up galaxies that are closest together from a large list of galaxies. I have the RA, DEC and Z of each, and a formula to work out the distance between each one from the data given. However, I can't work out an efficient method of iterating over the whole list to find the distance between EACH galaxy and EVERY other galaxy in the list, with the intention of then matching each galaxy with its nearest neighbour.
The data has been imported in the following way:
from astropy.io import fits  # fits.open below suggests astropy (or the older pyfits)

hdulist = fits.open("documents/RADECMASSmatch.fits")
data = hdulist[1].data  # assumption: the catalogue table is in the first extension HDU
CATAID = data['CATAID_1']
Xpos_DEIMOS_1 = data['Xpos_DEIMOS_1']
z = data['Z_1']
RA = data['RA']
DEC = data['DEC']
I have tried something like:
radiff = []
for i in range(0, n):
    for j in range(i + 1, n):
        radiff.append(abs(RA[i] - RA[j]))
to initially work out difference in RA and DEC between every galaxy, which does actually work but I feel like there must be a better way.
A friend suggested something along the lines of:
galaxy_coords = (data['RA'], data['DEC'], data['Z'])
separation_matrix = np.zeros((len(galaxy_coords), len(galaxy_coords)))
done = []
for i, coords1 in enumerate(galaxy_coords):
    for j, coords2 in enumerate(galaxy_coords):
        if (j, i) in done:
            separation_matrix[i, j] += separation_matrix[j, i]
            continue
        separation = your_formula(coords1, coords2)
        separation_matrix[i, j] += separation
        done.append((i, j))
But I don't really understand this, so I can't readily apply it. I've tried, but it yields nothing useful.
Any help with this would be much appreciated, thanks
Your friend's code seems to be generating a 2D array of the distances between each pair, taking advantage of the symmetry (distance(x,y) = distance(y,x)). It would be slightly better to use itertools to generate the combinations and assign your_formula(coords1, coords2) to both separation_matrix[i,j] and separation_matrix[j,i] in the same iteration, rather than iterating separately over (i,j) and (j,i).
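A sketch of that improvement, with a placeholder formula and made-up coordinates standing in for the question's data:
import itertools
import numpy as np

def your_formula(c1, c2):
    # placeholder: plain Euclidean distance; substitute the real separation formula
    return np.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

galaxy_coords = [(10.1, -2.3, 0.05), (10.4, -2.1, 0.06), (12.0, -3.0, 0.04)]
n = len(galaxy_coords)
separation_matrix = np.zeros((n, n))
for i, j in itertools.combinations(range(n), 2):
    s = your_formula(galaxy_coords[i], galaxy_coords[j])
    separation_matrix[i, j] = separation_matrix[j, i] = s  # fill both halves at once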
Even better would probably be scipy's tree-based nearest-neighbour search: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html . It works on rectilinear (Cartesian) coordinates, but converting RA/DEC/z to Cartesian positions is a linear-time preprocessing step.
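A sketch of that route, assuming the RA/DEC/z values have already been converted to Cartesian positions (random points stand in for the real catalogue):
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
coords = rng.random((1000, 3))  # placeholder for the converted (x, y, z) positions

tree = cKDTree(coords)
dist, idx = tree.query(coords, k=2)  # k=2 because each point's nearest hit is itself
nearest = idx[:, 1]                  # index of each galaxy's closest neighbour
nearest_dist = dist[:, 1]            # distance to that neighbour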

How to analyze a Python code's performance?

Regards. To analyze Python code's performance, is the code below the right approach?
import time

to = time.clock()  # note: time.clock() was removed in Python 3.8; time.perf_counter() is the modern replacement
x = []
for i in range(0, 4):
    x.append(i * 0.1)
tend = time.clock()
print(tend - to)

to = time.clock()
y = list(map(lambda x: x * 0.1, list(range(0, 4))))
tend = time.clock()
print(tend - to)
The timings are inconsistent from run to run, and the comparison between the two approaches is inconsistent too (sometimes the first timer reports faster, sometimes the second, although the first tends to be faster). Some outputs:
4.631622925399206e-05
4.4898385501326854e-05
4.9624531343562917e-05
6.852911471254275e-05
5.0569760512011734e-05
4.867930217511418e-05
3.78091667379527e-05
2.5993802132341648e-05
My questions pertain to the code above: I thought a timer used to measure code performance should be consistent? How can I tell that one syntax or tactic performs better (runs more efficiently) than another? Any thoughts on this?
Thanks in advance. Regards, Arief
From the official documentation:
>>> import timeit
>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.3018611848820001
>>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.2727368790656328
>>> timeit.timeit('"-".join(map(str, range(100)))', number=10000)
0.23702679807320237
In other words: a single run of a tiny snippet is dominated by noise from the operating system, caches, and the timer's own resolution. To talk about precision in execution-time measurements, you have to repeat the statement thousands of times and compare totals, which is exactly what timeit does.
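Applied to the two snippets from the question, that looks like the following sketch (numbers will vary from machine to machine, but repeated runs give a stable ranking):
import timeit

loop_version = """
x = []
for i in range(4):
    x.append(i * 0.1)
"""
map_version = "y = list(map(lambda v: v * 0.1, range(4)))"

print(timeit.timeit(loop_version, number=100_000))
print(timeit.timeit(map_version, number=100_000))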

Playing consecutive pitches in MATLAB

So I've been struggling with this for a while. I am supposed to make a sequence of tones play with only one soundsc(wave,fs) call, but when I try to put the tone waves in an array, it just plays them at the same time instead of consecutively. For example:
pitch1 = sin(2*pi*freq1*t);
pitch2 = sin(2*pi*freq2*t);
pitch3 = sin(2*pi*freq3*t);
concat_pitch = [pitch1; pitch2; pitch3]; % I want them to play in order, not together
soundsc(concat_pitch, fs); % this just plays them all together
Can anyone help me out? Thanks.
Change your concatenation to form a single row vector:
concat_pitch = [pitch1, pitch2, pitch3];
Or if the concatenation you specified is important and has to stay as is, then you can loop through the rows of the 2-d matrix:
for ind = 1:size(concat_pitch, 1)  % loop over the rows; length() would return the longer dimension (the sample count)
    soundsc(concat_pitch(ind,:), fs);
end
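For readers following along in Python, the same row-versus-sequence distinction looks like this in NumPy (a sketch; only the array shapes matter here):
import numpy as np

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
pitch1 = np.sin(2 * np.pi * 440 * t)
pitch2 = np.sin(2 * np.pi * 554 * t)
pitch3 = np.sin(2 * np.pi * 659 * t)

consecutive = np.concatenate([pitch1, pitch2, pitch3])  # one long 1-D signal: notes follow in time
stacked = np.vstack([pitch1, pitch2, pitch3])           # 3 x N matrix: one note per row, as in the question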

MATLAB: fastest way to do a root-mean-squared error between a vector and array of vectors

I have a question regarding the fastest way to compute the RMSE between a single vector and an array of vectors. Specifically, I have a vector A representing a point and would like to find the index in a list B of points that A is closest to. Right now I am using:
tempmat = bsxfun(@minus, A, B);
tempmat1 = sqrt(sum(tempmat.^2, 2));
index = find(tempmat1 == min(tempmat1));
This takes about 0.058 seconds to calculate the index. Is there a faster way of doing this in MATLAB? I perform this calculation literally millions of times.
Many thanks for reading,
Joe
tempmat = bsxfun(@minus, A, B);
tempmat1 = sum(tempmat.^2, 2);
[m, index] = min(tempmat1);
m = sqrt(m); %# optional, only if you need the actual numerical value
This avoids calculating sqrt on the whole array, since the minimum of the squared differences will have the same index. It also uses the second output of min to avoid the second pass of find.
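The same trick in NumPy terms, with made-up data standing in for A (the query point) and B (the array of points):
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(3)          # the single point
B = rng.random((1000, 3))  # the array of points, one per row

sq = ((B - A) ** 2).sum(axis=1)  # squared distances; sqrt is monotonic, so skip it
index = np.argmin(sq)            # argmin replaces the min-then-find double pass
m = np.sqrt(sq[index])           # root of a single value, only if you need it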
You'll probably find that
tempmat = A - B(ones(1, size(A,1)), :)
is faster than the bsxfun version, unless size(A,1) is exceptionally large.
This assumes that A is your array and B is your vector. The RSS calculation implies that you have row vectors.
Also, I presume you know that you're calculating the RSS not RMS.
