The question is: how can I use two np.where calls in the same statement, like this (oversimplified):
np.where((ndarr1==ndarr2),np.where((ndarr1+ndarr2==ndarr3),True,False),False)
The goal is to avoid computing the second condition wherever the first one is not met.
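(For context: np.where evaluates all three of its arguments in full before selecting, so nesting it as above does not short-circuit anything. A minimal demonstration:)
import numpy as np

a = np.array([4.0, -1.0])
# Both branches are computed for every element BEFORE np.where selects:
# np.log(-1.0) still runs and emits a RuntimeWarning, even though the
# condition excludes that element from the result.
result = np.where(a > 0, np.log(a), 0.0)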
My underlying objective is to find the intersection of a ray with a triangle, if there is one. This problem can be solved by the following algorithm (found on Stack Overflow):
def intersect_line_triangle(q1, q2, p1, p2, p3):
    def signed_tetra_volume(a, b, c, d):
        return np.sign(np.dot(np.cross(b - a, c - a), d - a) / 6.0)

    s1 = signed_tetra_volume(q1, p1, p2, p3)
    s2 = signed_tetra_volume(q2, p1, p2, p3)

    if s1 != s2:
        s3 = signed_tetra_volume(q1, q2, p1, p2)
        s4 = signed_tetra_volume(q1, q2, p2, p3)
        s5 = signed_tetra_volume(q1, q2, p3, p1)
        if s3 == s4 and s4 == s5:
            n = np.cross(p2 - p1, p3 - p1)
            t = np.dot(p1 - q1, n) / np.dot(q2 - q1, n)
            return q1 + t * (q2 - q1)
    return None
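For reference, a minimal call with made-up points (the values here are hypothetical, just to exercise the function):
import numpy as np

q1, q2 = np.array([0., 0., -1.]), np.array([0., 0., 1.])  # ray endpoints
p1, p2, p3 = np.array([-1., -1., 0.]), np.array([2., -1., 0.]), np.array([0., 2., 0.])  # triangle
print(intersect_line_triangle(q1, q2, p1, p2, p3))  # -> [0. 0. 0.]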
There are two conditional tests here:
s1 != s2
s3 == s4 and s4 == s5
Now, since I have more than 20k triangles to check, I want to apply this function to all triangles at the same time.
A first solution is:
s1 = vol(r0, tri[:,0,:], tri[:,1,:], tri[:,2,:])
s2 = vol(r1, tri[:,0,:], tri[:,1,:], tri[:,2,:])
s3 = vol(r1, r2, tri[:,0,:], tri[:,1,:])
s4 = vol(r1, r2, tri[:,1,:], tri[:,2,:])
s5 = vol(r1, r2, tri[:,2,:], tri[:,0,:])
np.where((s1 != s2) & (s3 == s4) & (s4 == s5), intersect(), False)
where s1, s2, s3, s4 and s5 are arrays containing the volume sign for each triangle. The problem is that this computes s3, s4 and s5 for all triangles.
Ideally, the second condition (and s3, s4, s5) would be computed only where the first condition is True, with something like this:
check = np.where((s1 != s2), np.where((compute(s3) == compute(s4)) & (compute(s4) == compute(s5)), compute(intersection), False), False)
(To simplify the explanation, I just wrote 'compute' instead of the whole computing process; the point is that 'compute' runs only on the appropriate triangles.)
Of course this option doesn't work as written (and it computes s4 twice), but I'd gladly take recommendations on a similar approach.
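(A common way to get this effect with plain numpy is to shrink the data to the rows passing the first test before running the expensive second step, then scatter the results back. A toy sketch with made-up arrays:)
import numpy as np

s1 = np.array([0, 1, 1, 0])
s2 = np.array([0, 0, 1, 1])
data = np.arange(8.0).reshape(4, 2)

keep = np.flatnonzero(s1 != s2)      # indices where the first condition holds
expensive = data[keep].sum(axis=1)   # stand-in for computing s3, s4 and s5

result = np.full(len(data), np.nan)  # scatter the partial results back
result[keep] = expensive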
Here's how I used masked arrays to answer this problem:
loTrue = np.where((s1 != s2), False, True)
s3 = ma.masked_array(np.sign(dot(np.cross(r0r1, r0t0), r0t1)), mask=loTrue)
s4 = ma.masked_array(np.sign(dot(np.cross(r0r1, r0t1), r0t2)), mask=loTrue)
s5 = ma.masked_array(np.sign(dot(np.cross(r0r1, r0t2), r0t0)), mask=loTrue)
loTrue = ma.masked_array(np.where((abs(s3 - s4) < 1e-4) & (abs(s5 - s4) < 1e-4), True, False), mask=loTrue)

# This also works when computing s3, s4 and s5 inside loTrue directly, like this:
loTrue = np.where((s1 != s2), False, True)
loTrue = ma.masked_array(np.where(
    (abs(np.sign(dot(np.cross(r0r1, r0t0), r0t1)) - np.sign(dot(np.cross(r0r1, r0t1), r0t2))) < 1e-4) &
    (abs(np.sign(dot(np.cross(r0r1, r0t2), r0t0)) - np.sign(dot(np.cross(r0r1, r0t1), r0t2))) < 1e-4),
    True, False), mask=loTrue)
Note that the same process, without masked arrays, is done like this:
s3 = np.sign(dot(np.cross(r0r1, r0t0), r0t1) / 6.0)
s4 = np.sign(dot(np.cross(r0r1, r0t1), r0t2) / 6.0)
s5 = np.sign(dot(np.cross(r0r1, r0t2), r0t0) / 6.0)
loTrue = np.where((s1 != s2) & (abs(s3 - s4) < 1e-4) & (abs(s5 - s4) < 1e-4), True, False)
Both give the same results. However, when looping over this process for 10k iterations, NOT using masked arrays is faster: 26 s without masked arrays, 31 s with masked arrays, and 33 s when using masked arrays in one statement only (computing s3, s4 and s5 inline rather than separately).
Conclusion: nesting the two conditions is solved this way (note that the mask marks where values will NOT be computed, hence the first loTrue must be set to False (0) where the condition is verified). However, in this scenario, it is not faster.
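(To illustrate the mask convention with a minimal sketch: in numpy.ma, a mask value of True means "masked out", so elements that should participate in the computation need mask value False.)
import numpy as np
import numpy.ma as ma

x = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(x.sum())  # 4.0 -- the masked middle element is skipped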
I can get a small speedup from short circuiting but I'm not convinced it is worth the additional admin.
full computation 4.463818839867599 ms per iteration (one ray, 20,000 triangles)
short circuiting 3.0060838296776637 ms per iteration (one ray, 20,000 triangles)
Code:
import numpy as np

def ilt_cut(q1, q2, p1, p2, p3):
    qm = (q1 + q2) / 2
    qd = qm - q2
    p12 = p1 - p2
    aux = np.cross(qd, q2 - p2)
    s3 = np.einsum("ij,ij->i", aux, p12)
    s4 = np.einsum("ij,ij->i", aux, p2 - p3)
    ge = (s3 >= 0) & (s4 >= 0)
    le = (s3 <= 0) & (s4 <= 0)
    keep = np.flatnonzero(ge | le)
    aux = p1[keep]
    qpm1 = qm - aux
    p31 = p3[keep] - aux
    s5 = np.einsum("ij,ij->i", np.cross(qpm1, p31), qd)
    ge = ge[keep] & (s5 >= 0)
    le = le[keep] & (s5 <= 0)
    flt = np.flatnonzero(ge | le)
    keep = keep[flt]
    n = np.cross(p31[flt], p12[keep])
    s12 = np.einsum("ij,ij->i", n, qpm1[flt])
    flt = np.abs(s12) <= np.abs(s3[keep] + s4[keep] + s5[flt])
    return keep[flt], qm - (s12[flt] / np.einsum("ij,ij->i", qd, n[flt]))[:, None] * qd

def ilt_full(q1, q2, p1, p2, p3):
    qm = (q1 + q2) / 2
    qd = qm - q2
    p12 = p1 - p2
    qpm1 = qm - p1
    p31 = p3 - p1
    aux = np.cross(qd, q2 - p2)
    s3 = np.einsum("ij,ij->i", aux, p12)
    s4 = np.einsum("ij,ij->i", aux, p2 - p3)
    s5 = np.einsum("ij,ij->i", np.cross(qpm1, p31), qd)
    n = np.cross(p31, p12)
    s12 = np.einsum("ij,ij->i", n, qpm1)
    ge = (s3 >= 0) & (s4 >= 0) & (s5 >= 0)
    le = (s3 <= 0) & (s4 <= 0) & (s5 <= 0)
    keep = np.flatnonzero((np.abs(s12) <= np.abs(s3 + s4 + s5)) & (ge | le))
    return keep, qm - (s12[keep] / np.einsum("ij,ij->i", qd, n[keep]))[:, None] * qd

tri = np.random.uniform(1, 10, (20_000, 3, 3))
p0, p1 = np.random.uniform(1, 10, (2, 3))

from timeit import timeit

A, B, C = tri.transpose(1, 0, 2)
print('full computation', timeit(lambda: ilt_full(p0[None], p1[None], A, B, C), number=100) * 10, 'ms per iteration (one ray, 20,000 triangles)')
print('short circuiting', timeit(lambda: ilt_cut(p0[None], p1[None], A, B, C), number=100) * 10, 'ms per iteration (one ray, 20,000 triangles)')
Note that I played with the algorithm a bit, so this may not give the same result as yours in every edge case.
What I changed:
I inlined the tetra volume, which saves a few repeated subcomputations.
I replaced one of the ray ends with the midpoint M of the ray. This saves computing one tetra volume (s1 or s2), because one can check whether the ray crosses the plane of triangle ABC by comparing the volume of tetrahedron ABCM to the sum of s3, s4 and s5 (when they have the same sign).
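A rough way to sanity-check the vectorized routines against the scalar original (assuming the functions and benchmark data defined above; edge cases may legitimately differ, as noted):
idx, pts = ilt_full(p0[None], p1[None], A, B, C)
for i, pt in zip(idx, pts):
    ref = intersect_line_triangle(p0, p1, tri[i, 0], tri[i, 1], tri[i, 2])
    if ref is None or not np.allclose(ref, pt, atol=1e-6):
        print("mismatch at triangle", i)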
Related
I tried the following code to find the ranges of one dataframe that are not within the ranges of another dataframe. However, it takes more than a day to compute for large files because, in the last two for-loops, every row is compared against every other row. Each of my 24 dataframes has around 10^8 rows. Is there an efficient alternative to the following approach?
Please refer to this thread for a better understanding of my I/O: Return the range of a dataframe not within a range of another dataframe
My approach:
I initially created tuple pairs from (df1['first.start'], df1['first.end']) and (df2['first.start'], df2['first.end']) in order to apply the range() function. After that, I checked whether each df1 range is within the ranges of df2 or not. The edge case here was df1['first.start'] == df1['first.end']. I collected the filtered indices from the iterations and then passed them into df1.
df2_lst = []
for i, j in zip(temp_df2['first.start'], temp_df2['first.end']):
    df2_lst.append(i)
    df2_lst.append(j)

df1_lst = []
for i, j in zip(df1['first.start'], df1['first.end']):
    df1_lst.append(i)
    df1_lst.append(j)

def range_subset(range1, range2):
    """Whether range1 is a subset of range2."""
    if not range1:
        return True   # empty range is a subset of anything
    if not range2:
        return False  # non-empty range can't be a subset of empty range
    if len(range1) > 1 and range1.step % range2.step:
        return False  # must have a single value or integer multiple step
    return range1.start in range2 and range1[-1] in range2

##### FUNCTION FOR CREATING CHUNKS OF LISTS ####
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i], lst[i+1]

df1_lst2 = list(chunks(df1_lst, 2))
df2_lst2 = list(chunks(df2_lst, 2))

indices = []
for idx, i in enumerate(df1_lst2):  # main list
    x, y = i
    for j in df2_lst2:  # filter list
        m, n = j
        if (x != y) and range_subset(range(x, y), range(m, n)):  # does the main range lie within the filter range?
            indices.append(idx)  # collect the filtered indices

df1.iloc[indices]
If n and m are the number of rows in df1 and df2, any algorithm needs to make at least n * m comparisons to check every range in df1 against every range in df2. The problem with your code as posted is that (a) it has too many intermediate steps and (b) it uses slow Python loops. If you switch to numpy broadcasting, which uses highly optimized C loops under the hood, it will be a lot faster.
The downside of numpy broadcasting is memory: it will create a comparison matrix of n * m bytes, and at the size of your problem this may run your computer out of memory. We can mitigate that by chunking df1, trading performance for lower memory usage.
import numpy as np
import pandas as pd

# Sample data
def random_dataframe(size):
    a = np.random.randint(1, 100, 2*size).cumsum()
    return pd.DataFrame({
        'first.start': a[::2],
        'first.end': a[1::2]
    })

n, m = 10_000_000, 1000
np.random.seed(42)
df1 = random_dataframe(n)
df2 = random_dataframe(m)
# ---------------------------
# Prepare the start and end times of df2 for comparison.
# [:, None] raises the array by one dimension, which is necessary
# for array broadcasting.
s2 = df2['first.start'].to_numpy()[:, None]
e2 = df2['first.end'].to_numpy()[:, None]

# A chunk_size that is too small or too big will lower performance.
# Experiment to find a sweet spot.
chunk_size = 100_000
offset = 0
mask = []
while offset < len(df1):
    s1 = df1['first.start'].to_numpy()[offset:offset+chunk_size]
    e1 = df1['first.end'].to_numpy()[offset:offset+chunk_size]
    mask.append(
        ((s2 <= s1) & (s1 <= e2) & (s2 <= e1) & (e1 <= e2)).any(axis=0)
    )
    offset += chunk_size
mask = np.hstack(mask)
# ---------------------------
# If memory is not a concern, use the following code. However, this
# may run slower than the chunking approach due to increased size of
# the array broadcasting operation. Profile your code to find out.
s2 = df2['first.start'].to_numpy()[:, None]
e2 = df2['first.end'].to_numpy()[:, None]
s1 = df1['first.start'].to_numpy()
e1 = df1['first.end'].to_numpy()
mask = ((s2 <= s1) & (s1 <= e2) & (s2 <= e1) & (e1 <= e2)).any(axis=0)
The chunking code took 30s on my computer. To access the result:
df1[mask] # ranges in df1 that are completely surrounded by a range in df2
df1[~mask] # ranges in df1 that are NOT completely surrounded by any range in df2
By tweaking the comparison, you can check for overlapping ranges too.
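For instance, a sketch of the overlap variant (any intersection at all, rather than full containment), reusing the s1, e1, s2, e2 arrays from the non-chunked code above; two ranges intersect iff each one starts before the other ends:
overlap = ((s1 <= e2) & (s2 <= e1)).any(axis=0)
df1[overlap]  # ranges in df1 that overlap at least one range in df2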
I have a nested loop that has to loop through a huge amount of data.
Assume a dataframe of random values with a size of 1,000,000 rows, where each row is an X,Y location in 2D space. A window of length 10 goes through all 1M data rows one by one until all the calculations are done.
Explaining what the code is supposed to do:
Each row represents a coordinate in the X-Y plane.
r_test contains the diameters of the different circles of investigation in our 2D (X-Y) plane.
For each window of 10 points/rows and for every single diameter in r_test, we compare the distance between every point and the remaining 9 points, and if that distance is less than the current r_test value we add 2 to H. Then we calculate H/(N**2) and store it in c_10 at the index corresponding to that diameter of investigation.
Finally, for these first 10 points, once the loop has gone through all the diameters in r_test, we read the slope of the fitted line and save it to S_wind[ii]. The first 9 data points get no value calculated for them, so they are left as np.inf to be distinguished later.
Then the window moves one row down and the process repeats until S_wind is complete.
What would be a potentially better algorithm to solve this, in Python 3.x?
Many thanks in advance!
import numpy as np
import pandas as pd

#### generating input data frame
df = pd.DataFrame(data=np.random.randint(2000, 6000, (1000000, 2)))
df.columns = ['X', 'Y']

####==== creating upper and lower bounds for the diameter of the investigation circles
x_range = max(df['X']) - min(df['X'])
y_range = max(df['Y']) - min(df['Y'])
R = max(x_range, y_range)/20
d = 2
N = 10  #### number of points in each window
#r1 = 2*R*(1/N)**(1/d)
#r2 = (R)/(1+d)
#r_test = np.arange(r1, r2, 0.05)
##=== avoiding generation of an empty r_test
r1 = 80
r2 = 800
r_test = np.arange(r1, r2, 5)

S_wind = np.zeros(len(df['X'])) + np.inf
for ii in range(10, len(df['X'])):  #### maybe the code runs slower because of using len() instead of a fixed number
    c_10 = np.zeros(len(r_test)) + np.inf
    H = 0
    C = 0
    N = 10  ##### maybe I should also remove this
    for ind in range(len(r_test)):
        for i in range(ii-10, ii):
            for j in range(ii-10, ii):
                dd = r_test[ind] - np.sqrt((df['X'][i] - df['X'][j])**2 + (df['Y'][i] - df['Y'][j])**2)
                if dd > 0:
                    H += 1
        c_10[ind] = H/(N**2)
    S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0]
You can use numpy broadcasting to eliminate all of the inner loops. I'm not sure if there's an easy way to get rid of the outermost loop, but the others are not too hard to avoid.
The inner loops are comparing ten 2D points against each other in pairs. That's just dying for using a 10x10x2 numpy array:
# replacing the `for ind` loop and its contents:
points = np.hstack((np.asarray(df['X'])[ii-10:ii, None], np.asarray(df['Y'])[ii-10:ii, None]))
differences = np.subtract(points[None, :, :], points[:, None, :]) # broadcast to 10x10x2
squared_distances = (differences * differences).sum(axis=2)
within_range = squared_distances[None,:,:] < (r_test*r_test)[:, None, None] # compare squares
c_10 = within_range.sum(axis=(1,2)).cumsum() * 2 / (N**2)
S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0] # this is unchanged...
I'm not very pandas savvy, so there's probably a better way to get the X and Y values into a single 2-dimensional numpy array. You generated the random data in the format that I'd find most useful, then converted into something less immediately useful for numeric operations!
Note that this code matches the output of your loop code. I'm not sure that's actually doing what you want it to do, as there are several slightly strange things in your current code. For example, you may not want the cumsum in my code, which corresponds to only re-initializing H to zero in the outermost loop. If you don't want the matches for smaller values of r_test to be counted again for the larger values, you can skip that sum (or equivalently, move the H = 0 line to in between the for ind and the for i loops in your original code).
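For completeness, a minimal sketch of the no-cumsum variant described above (equivalent to moving the H = 0 line between the for ind and for i loops), assuming within_range and N as defined earlier:
c_10 = within_range.sum(axis=(1, 2)) * 2 / (N**2)  # each r_test value counts only its own matches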
In the code supplied below I am trying to iterate over a 2D numpy array with indices [i][k].
Originally it is code written in Fortran 77, older than my grandfather, which I am trying to adapt to Python.
(For those interested in the whereabouts: it is a simple hydraulic transient event solver.)
Bear in mind that all variables are defined in parts of my code which I don't paste here.
H = np.zeros((NS,50))
Q = np.zeros((NS,50))
Here I am assigning the first row values:
for i in range(NS):
    H[0][i] = HR - i*R*Q0**2
    Q[0][i] = Q0

CVP = .5*Q0**2/H[N]
T = 0
k = 0
TAU = 1

# Interior points:
HP = np.zeros((NS, 50))
QP = np.zeros((NS, 50))
while T <= Tmax:
    T += dt
    k += 1
    for i in range(1, N):
        CP = H[k][i-1] + Q[k][i-1]*(B - R*abs(Q[k][i-1]))
        CM = H[k][i+1] - Q[k][i+1]*(B - R*abs(Q[k][i+1]))
        HP[k][i-1] = 0.5*(CP + CM)
        QP[k][i-1] = (HP[k][i-1] - CM)/B
    # Boundary conditions:
    HP[k][0] = HR
    QP[k][0] = Q[k][1] + (HP[k][0] - H[k][1] - R*Q[k][1]*abs(Q[k][1]))/B
    if T == Tc:
        TAU = 0
        CV = 0
    else:
        TAU = (1. - T/Tc)**Em
        CV = CVP*TAU**2
    CP = H[k][N-1] + Q[k][N-1]*(B - R*abs(Q[k][N-1]))
    QP[k][N] = -CV*B + np.sqrt(CV**2*(B**2) + 2*CV*CP)
    HP[k][N] = CP - B*QP[k][N]
    for i in range(NS):
        H[k][i] = HP[k][i]
        Q[k][i] = QP[k][i]
Remember i is for rows and k is for columns
What I am expecting is that, for all k columns, the values are calculated until the T <= Tmax condition is met. I cannot figure out what my mistake is; I am getting the following errors:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
RuntimeWarning: invalid value encountered in multiply
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.
Looking at your first iteration:
H = np.zeros((NS, 50))
Q = np.zeros((NS, 50))
for i in range(NS):
    H[0][i] = HR - i*R*Q0**2
    Q[0][i] = Q0
The shape of H is (NS, 50), but when you iterate over range(NS) you apply that index to the second dimension (the one with size 50). Why? Shouldn't it apply to the dimension with size NS?
In numpy, arrays have 'C' order by default: the last dimension is the innermost. (They can also have 'F' (Fortran) order, but let's not go there.) Thinking of a 2D array as a table, we typically talk of rows and columns, though these have no formal definition in numpy.
Let's assume you want to set the first column to these values:
for i in range(NS):
    H[i, 0] = HR - i*R*Q0**2
    Q[i, 0] = Q0
But we can also do the assignment a whole row or column at a time. I believe newer versions of Fortran have these 'whole-array' operations too:
Q[:, 0] = Q0
H[:, 0] = HR - np.arange(NS) * R * Q0**2
One point of caution when translating to Python: indexing starts at 0, and so do range and np.arange(...).
H[0][i] is functionally the same as H[0, i]. But when using slices you have to use the H[:, i] form.
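A quick demonstration of the difference (a minimal sketch):
import numpy as np

H = np.arange(12).reshape(3, 4)
print(H[0][1] == H[0, 1])  # True: both pick the element at row 0, column 1
print(H[:][1])             # H[:] is just the whole array, so this is row 1
print(H[:, 1])             # whereas this is column 1, which is what we want here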
I suspect your other iterations have similar problems, but I'll stop here for now.
Regarding the errors:
The first:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
You initialize H as zeros, so it is normal that it complains about division by zero. Maybe you should add a conditional.
The third:
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.
You define CVP = .5*Q0**2/H[N], where H[N] is a whole row, so CVP is an array; CV = CVP*TAU**2 is therefore also a sequence. You then try to assign a value derived from it to QP[N][k], which is a single element. You are trying to put an array into a scalar slot.
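A minimal reproduction of that error, with hypothetical shapes:
import numpy as np

QP = np.zeros(5)
CV = np.zeros(3)   # an array, like CVP when H[N] is a whole row
QP[0] = -CV * 2.0  # raises ValueError: setting an array element with a sequence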
As for the second error, I think it is related to the third. If you could provide more information, I would like to try to understand what is happening.
Hope this has helped.
I am running a simulation to solve the advection diffusion equation. I wish to parallelize the part of the code where I calculate the partial derivatives so as to speed up my computation. Here is what I am doing:
p1 = np.zeros((len(r), len(th)-1))  # the solution matrix

def func(i):
    pti = np.zeros(len(th)-1)
    for j in range(len(pti)):
        abc = f(p1)  # some function calculating the derivatives at each point
        pti[j] = p1[i][j] + dt*abc  # dt is some small float number
    return pti

# Setting the initial condition of the p1 matrix
for i in range(len(p1[:, 0])):
    for j in range(len(p1[0])):
        p1[i][j] = 0.01

# Final loop calculating the integral by a finite difference scheme
p = Pool(args.cores)
for k in range(0, args.iterations):  # this is the integration in time
    p1 = p.map(func, range(len(r)))
print(p1)
The problem that I am facing here is that my p1 matrix is not updating after each iteration in k. In the end when I print p1 I get the same matrix that I initialized.
Also, the linear version of this code is working (but it takes too long).
Okay I solved this myself. Apparently putting the line
p = Pool(args.cores)
inside the loop
for k in range (0,args.iterations):
does the trick.
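In other words, a sketch of the fix (assuming func, args, r and p1 as defined in the question); re-creating the Pool each iteration re-forks the worker processes, so each new pool sees the p1 produced by the previous iteration instead of a stale copy:
from multiprocessing import Pool

for k in range(0, args.iterations):
    with Pool(args.cores) as p:        # fresh workers capture the current p1
        p1 = p.map(func, range(len(r)))
print(p1)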
I would like to convert my data from the frequency domain into the time domain. In the attached Excel sheet (Book1.xlsx), column A is the frequency, and columns B and C are the real and imaginary data (B+jC). Also attached you can see my code, but it's not working. I would like my result to look like the one shown in the figure in the time domain (green curve, part 1).
[num, data, raw] = xlsread('Book1.xlsx');
ln = length(raw) - 1;  % find the length of the sequence
xk = zeros(1, ln);     % initialise an array of the same size as the input sequence
ixk = zeros(1, ln);    % initialise an array of the same size as the input sequence
rx = zeros(1, ln);     % real part of the input data
ix = zeros(1, ln);     % imaginary part of the input data

for i = 2:length(raw)
    rx(i-1) = cell2mat(raw(i, 2));
    ix(i-1) = cell2mat(raw(i, 3));
    xk(i-1) = sqrt(rx(i-1)^2 + ix(i-1)^2);
end

for n = 0:ln-1
    for k = 0:ln-1
        ixk(n+1) = ixk(n+1) + (xk(k+1)*exp(i*2*pi*k*n/ln));
    end
end
ixk = 10*log(ixk./ln);

t = 0:ln-1;
plot(t, ixk)
This code should give me a result similar to the green curve (part 1) in the image.
Instead of doing the FFT yourself, you could use the built-in Matlab functions to do it - much easier.
A good example from MathWorks is given here. The following is some code I have based on it. The passed-in parameter f is your time-domain trace, and fsampling is your sampling rate. The returned parameters freq and finv are your frequency vector and Fourier transform, respectively.
function [freq, finv] = FourierTransform(f, fsampling)
    % Fast Fourier Transform
    fsampling = round(fsampling);
    finv = fft(f, fsampling);
    finv = finv(1:length(finv)/2 + 1);  % truncate away the second half, due to symmetry
    finv(2:end-1) = 2*finv(2:end-1);    % adjust amplitude to account for truncation
    finv = finv./length(f);
    freq = 0:fsampling/2;
end
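For readers following along in Python (as in the rest of this page), here is a rough numpy equivalent of the same one-sided-spectrum logic; a sketch under the same assumptions, not a drop-in replacement:
import numpy as np

def fourier_transform(f, fsampling):
    # One-sided spectrum of a real signal, amplitude-corrected for the
    # discarded symmetric half (mirrors the MATLAB function above).
    fsampling = int(round(fsampling))
    finv = np.fft.fft(f, n=fsampling)
    finv = finv[:fsampling // 2 + 1]  # keep the first half (symmetry)
    finv[1:-1] *= 2                   # account for the dropped half
    finv = finv / len(f)
    freq = np.arange(fsampling // 2 + 1)
    return freq, finv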