I want to create an SPC chart that will detect data points that are out of specification limits using python.
I have a data set that contains column [XX] which is the one that I'd like to test and DateTime type data as an index.
I already have a way to detect individual points that are out of spec, as well as runs of more than k consecutive points that are out of spec, but I assume there is a better way to achieve the same outcome. My code is below.
`# first part: detect points that are out of spec
import plotly.graph_objects as go
# create upper and lower spec limits (used to plot lines on the SPC chart)
df['USL_MarginesG'] = 5.5
df['LSL_MarginesG'] = 3
# create empty lists to hold data points that are out of spec
occ_trace_x = []
occ_trace_y = []
# go through all elements in df['XX'], find the ones that are out of spec and append them to the lists
for y in range(len(df['XX'])):
    if df['XX'].iloc[y] > 5.5 or df['XX'].iloc[y] < 3:
        occ_trace_x.append(df.index[y])
        occ_trace_y.append(df['XX'].iloc[y])`
The second part of the code (this part detects k points in a row that are out of spec):
`# create containers for detected data points
list_k = []
list_index = []
# ask the user for the number k of points to detect
k = int(input("Put a number"))
# for each position x in df['XX'], test whether the slice [x:x+k+1] is entirely above or entirely below the spec limits
for x in range(len(df['XX'])):
    if all(df['XX'].iloc[x:x+k+1] > 5.5) or all(df['XX'].iloc[x:x+k+1] < 3):
        # convert the slice and its index to lists and append them to the containers
        s = df['XX'].iloc[x:x+k+1].to_list()
        s_index = df.index[x:x+k+1].to_list()
        list_k.append(s)
        list_index.append(s_index)`
The next step is to unpack the nested list:
`c = []
for x in list_k:
    c = c + x
v = []
for b in list_index:
    v = v + b`
Last step is to plot data set on a chart:
`fig = go.Figure()
fig.add_trace(go.Scatter(x=df.index, y=df['XX'],
                         mode='lines',
                         name='Margines_Górny'))
fig.add_trace(go.Scatter(
    x=occ_trace_x,
    y=occ_trace_y,
    name="Out of Control",
    mode="markers",
    marker=dict(color="rgba(210, 77, 87, 0.7)", symbol="square", size=4)))
fig.add_trace(go.Scatter(
    x=v,
    y=c,
    name=f'{k}' + ' parameters in a row are out of control',
    mode="markers",
    marker=dict(color="yellow", symbol="square", size=4)))
fig.show()`
As a result, I got a plot with data:
blue line describes the data set
red squares detect data points out of spec
yellow squares detect k data points in a row that are out of spec (k = 2)
I am looking for a way to optimize the code, that is, to get the same results faster.
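One way to avoid the Python-level loops is to work with boolean masks. The following is only a minimal sketch, not a drop-in replacement: the names `flag_runs` and `min_run` are made up for illustration, the toy Series stands in for df['XX'], and it assumes that a run means at least `min_run` consecutive points on the same side of the limits (so `min_run = k + 1` roughly corresponds to the slices `[x:x+k+1]` used above).

```python
import pandas as pd

def flag_runs(mask, min_run):
    """Return a boolean Series that is True where `mask` is True
    inside a run of at least `min_run` consecutive True values."""
    m = mask.astype(int)
    # a new run id starts whenever the value changes
    run_id = m.diff().ne(0).cumsum()
    # length of the True run each point belongs to (0 for runs of False)
    run_len = m.groupby(run_id.to_numpy()).transform("sum")
    return mask & (run_len >= min_run)

# toy data standing in for df['XX']; the limits match the question
xx = pd.Series([4, 6, 6, 6, 4, 2, 6, 2, 2, 4], dtype=float)
usl, lsl = 5.5, 3

above = xx > usl
below = xx < lsl
out_of_spec = above | below                          # single points out of spec
in_run = flag_runs(above, 2) | flag_runs(below, 2)   # runs on one side of the limits

print(xx[out_of_spec].index.tolist())
print(xx[in_run].index.tolist())
```

The filtered Series (`xx[out_of_spec]` and `xx[in_run]`) carry both the index and the values, so they can be passed as `x` and `y` to `go.Scatter` in place of the hand-built lists.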
I have a nested loop that has to loop through a huge amount of data.
Assume a data frame of 1,000,000 rows of random values, where each row has an X, Y location in 2D space. A window of length 10 moves through all 1M rows one by one until all the calculations are done.
Explaining what the code is supposed to do:
Each row represents a coordinate in the X-Y plane.
r_test contains the diameters of the different circles of investigation in our 2D plane (X-Y plane).
For each window of 10 points/rows and for every single diameter in r_test, we compare the distance between every point and the remaining 9 points, and if the value is less than the diameter we add 2 to H. Then we calculate H/(N**2) and store it in c_10 at the index corresponding to that diameter of investigation.
Once the loop has gone through all the diameters in r_test for these first 10 points, we read the slope of the fitted line and save it to S_wind[ii]. So the first 10 data points have no value calculated for them and keep np.inf so they can be distinguished later.
Then the window moves one point down the rows and repeats this process until S_wind is completed.
What's a potentially better algorithm to solve this in Python 3.x than the one I'm using?
Many thanks in advance!
import numpy as np
import pandas as pd
####generating input data frame
df = pd.DataFrame(data = np.random.randint(2000, 6000, (1000000, 2)))
df.columns= ['X','Y']
####====creating upper and lower bound for the diameter of the investigation circles
x_range =max(df['X']) - min(df['X'])
y_range = max(df['Y']) - min(df['Y'])
R = max(x_range,y_range)/20
d = 2
N = 10 #### Number of points in each window
#r1 = 2*R*(1/N)**(1/d)
#r2 = (R)/(1+d)
#r_test = np.arange(r1, r2, 0.05)
##===avoiding generation of empty r_test
r1 = 80
r2= 800
r_test = np.arange(r1, r2, 5)
S_wind = np.zeros(len(df['X'])) + np.inf
for ii in range(10, len(df['X'])):  #### maybe the code run slower because of using len() function instead of a number
    c_10 = np.zeros(len(r_test)) + np.inf
    H = 0
    C = 0
    N = 10  ##### maybe I should also remove this
    for ind in range(len(r_test)):
        for i in range(ii-10, ii):
            for j in range(ii-10, ii):
                dd = r_test[ind] - np.sqrt((df['X'][i] - df['X'][j])**2 + (df['Y'][i] - df['Y'][j])**2)
                if dd > 0:
                    H += 2
        c_10[ind] = (H/(N**2))
    S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0]
You can use numpy broadcasting to eliminate all of the inner loops. I'm not sure if there's an easy way to get rid of the outermost loop, but the others are not too hard to avoid.
The inner loops are comparing ten 2D points against each other in pairs. That's just dying for using a 10x10x2 numpy array:
# replacing the `for ind` loop and its contents:
points = np.hstack((np.asarray(df['X'])[ii-10:ii, None], np.asarray(df['Y'])[ii-10:ii, None]))
differences = np.subtract(points[None, :, :], points[:, None, :]) # broadcast to 10x10x2
squared_distances = (differences * differences).sum(axis=2)
within_range = squared_distances[None,:,:] < (r_test*r_test)[:, None, None] # compare squares
c_10 = within_range.sum(axis=(1,2)).cumsum() * 2 / (N**2)
S_wind[ii] = np.polyfit(np.log10(r_test), np.log10(c_10), 1)[0] # this is unchanged...
I'm not very pandas savvy, so there's probably a better way to get the X and Y values into a single 2-dimensional numpy array. You generated the random data in the format that I'd find most useful, then converted into something less immediately useful for numeric operations!
Note that this code matches the output of your loop code. I'm not sure that's actually doing what you want it to do, as there are several slightly strange things in your current code. For example, you may not want the cumsum in my code, which corresponds to only re-initializing H to zero in the outermost loop. If you don't want the matches for smaller values of r_test to be counted again for the larger values, you can skip that sum (or equivalently, move the H = 0 line to in between the for ind and the for i loops in your original code).
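To make the equivalence concrete, here is a small self-contained check for a single window. It is only a sketch: the function names `counts_loop` and `counts_broadcast` are made up for illustration, and the ten random integer points stand in for one 10-row window of the data frame. The loop version keeps accumulating H across the values of r_test, which is what the cumsum reproduces.

```python
import numpy as np

def counts_loop(points, r_test):
    """Nested-loop version: H is never reset between values of r_test."""
    n = len(points)
    h = 0
    out = np.empty(len(r_test))
    for ind, r in enumerate(r_test):
        for i in range(n):
            for j in range(n):
                dist = np.sqrt(((points[i] - points[j]) ** 2).sum())
                if r - dist > 0:
                    h += 2
        out[ind] = h / n**2
    return out

def counts_broadcast(points, r_test):
    """Broadcast version: compare squared distances against squared radii."""
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]            # n x n x 2
    sq_dist = (diff * diff).sum(axis=2)                       # n x n
    within = sq_dist[None, :, :] < (r_test * r_test)[:, None, None]
    return within.sum(axis=(1, 2)).cumsum() * 2 / n**2

rng = np.random.default_rng(0)
window = rng.integers(2000, 6000, (10, 2))   # one window of 10 (X, Y) points
r_test = np.arange(80, 800, 5)

print(np.allclose(counts_loop(window, r_test), counts_broadcast(window, r_test)))
```

Dropping the `.cumsum()` in `counts_broadcast` gives the variant in which each diameter is counted independently.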
I’m trying to solve the minimum graph coloring problem. I’m trying to solve it as an mip using cvxpy. I’m following the outline of a solution described in this url:
https://manas.tech/blog/2010/09/16/modelling-graph-coloring-with-integer-linear-programming.html
I’m not sure if I’m understanding how the cvxpy variables are created correctly, and how I’m defining my constraints. I have sample input data below along with the code creating the variables, constraints, and objective function, solving the problem and the solution returned.
I think the correct answer for this input should be:
‘2 1\n0 1 0 0’
That is that the minimum number colors required is 2 and all nodes are the same color except node 1.
I’m creating the w variable to count the number of colors used:
w = cvxpy.Variable(j, boolean=True)
What I think I am doing is creating a binary variable of length equal to the number of nodes. The idea being that the maximum number of colors you could use would be equal to the number of nodes. So maximum colors:
w=[1,1,1,1]
I’m picturing w as binary variable like a list where the values can be 0 or 1 indicating if that color is used by any of the nodes.
Then to create the objective function:
obj=cvxpy.sum(w,axis=0)
I think I’m summing the entries in the row which are 1, so for example:
w=[1,1,0,0]
obj=2
I also create a variable x to indicate the color of a given node:
x = cvxpy.Variable((j,int(first_line[0])), boolean=True)
I’m picturing this as a 2 dimensional array with binary values, where the column indicates the node and the row indicates the color.
So for example if node 0 had color 0, node 1 had color 1, node 2 had color 2, and node 3 had color 2, I would imagine x to look like:
[[1,0,0,0],[0,1,0,0],[0,0,1,1],[0,0,0,0]]
Can someone please tell me if I’m understanding and creating my selection variables correctly? Also, have I understood and created my objective function correctly, that is, does the way I’ve described it match the way I’ve created it? Any input on the other constraints I’ve defined or on my code would be greatly appreciated. I’m learning linear programming and I’m trying to understand cvxpy syntax.
Sample Data:
input_data
'4 3\n0 1\n1 2\n1 3\n'
# parse the input
lines = input_data.split('\n')
first_line = lines[0].split()
node_count = int(first_line[0])
edge_count = int(first_line[1])
edges = []
for i in range(1, edge_count + 1):
    line = lines[i]
    parts = line.split()
    edges.append((int(parts[0]), int(parts[1])))
edges
# Output:
[(0, 1), (1, 2), (1, 3)]
# solution using cvxpy solver
import numpy as np
import cvxpy
from collections import namedtuple
# selection variables
# binary variable if at least one node is color j
j=int(first_line[0])
# w=1 if at least one node has color j
w = cvxpy.Variable(j, boolean=True)
# x=1 if node i is color j
x = cvxpy.Variable((j,int(first_line[0])), boolean=True)
# Objective function
# minimize number of colors needed
obj=cvxpy.sum(w,axis=0)
# constraints
# 1 color per node
node_color=cvxpy.sum(x,axis=1)==1
# for adjacent nodes at most 1 node has color
diff_col = []
for edge in edges:
    for k in range(node_count):
        diff_col += [
            # x[edge[0],k]+x[edge[1],k]<=1
            x[k,edge[0]]+x[k,edge[1]]<=1
        ]
# w is upper bound for color of node x<=w
upper_bound = []
for i in range(j):
    for k in range(j):
        upper_bound += [
            x[k,i]<=w[i]
        ]
# constraints
constraints=[node_color]+diff_col+upper_bound
# solving problem
# cvxpy must be passed as a list
graph_problem = cvxpy.Problem(cvxpy.Minimize(obj), constraints)
# Solving the problem
graph_problem.solve(solver=cvxpy.GLPK_MI)
value2=int(graph_problem.solve(solver=cvxpy.GLPK_MI))
# taken2=[int(i) for i in selection.value.tolist()]
# taken2=[int(i) for i in w.value.tolist()]
taken2=[int(i) for i in w.value.tolist()]
# prepare the solution in the specified output format
output_data2 = str(value2) + ' ' + str(0) + '\n'
output_data2 += ' '.join(map(str, taken2))
output_data2
'1 0\n0 0 0 1'
Your solution is almost correct. The main problem here is the definition of variable x. According to the blog post
x_{ij} variables that will be true if and only if node i is assigned color j
which indicates that the size of x is (nb of nodes, nb of colors).
In your code you need to change x to:
x = cvxpy.Variable((node_count, j), boolean=True)
and then, consequently, the second and third constraints:
# for adjacent nodes at most 1 node has color
diff_col = []
for edge in edges:
    for k in range(j):
        diff_col += [
            x[edge[0],k]+x[edge[1],k]<=1
        ]
# w is upper bound for color of node x<=w
upper_bound = []
for i in range(node_count):
    for k in range(j):
        upper_bound += [
            x[i,k]<=w[k]
        ]
Then the output is as expected i.e. 2 colors are used: one color for node 1 and another color for nodes 0, 2, 3 (because they are not adjacent).
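For a graph this small, the expected answer can also be checked independently of any solver by brute-force enumeration. This is just a sanity check and not part of the MIP model; the function name `min_colors_brute_force` is made up for illustration, and the approach is only practical for a handful of nodes.

```python
from itertools import product

def min_colors_brute_force(node_count, edges):
    """Try 1, 2, ... colors and return the first valid coloring found."""
    for n_colors in range(1, node_count + 1):
        for coloring in product(range(n_colors), repeat=node_count):
            # valid if no edge joins two nodes of the same color
            if all(coloring[u] != coloring[v] for u, v in edges):
                return n_colors, list(coloring)
    # unreachable for simple graphs: one color per node always works
    return node_count, list(range(node_count))

edges = [(0, 1), (1, 2), (1, 3)]
n_colors, coloring = min_colors_brute_force(4, edges)
print(n_colors, coloring)  # 2 [0, 1, 0, 0]
```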
I am having a bit of a problem with an algorithm that I am currently using. I wanted it to make a boundary.
Here is an example of the current behavior:
Here is an MSPaint example of wanted behavior:
Current code of Convex Hull in C#:https://hastebin.com/dudejesuja.cs
So here are my questions:
1) Is this even possible?
R: Yes
2) Is this even called Convex Hull? (I don't think so)
R: Nope it is called boundary, link: https://www.mathworks.com/help/matlab/ref/boundary.html
3) Will this be less performance friendly than a conventional convex hull?
R: Well as far as I researched it should be the same performance
4) Example of this algorithm in pseudo code or something similar?
R: Not answered yet or I didn't find a solution yet
Here is some Python code that computes the alpha-shape (concave hull) and keeps only the outer boundary. This is probably what matlab's boundary does inside.
from scipy.spatial import Delaunay
import numpy as np
def alpha_shape(points, alpha, only_outer=True):
    """
    Compute the alpha shape (concave hull) of a set of points.
    :param points: np.array of shape (n,2) points.
    :param alpha: alpha value.
    :param only_outer: boolean value to specify if we keep only the outer border
    or also inner edges.
    :return: set of (i,j) pairs representing edges of the alpha-shape. (i,j) are
    the indices in the points array.
    """
    assert points.shape[0] > 3, "Need at least four points"
    def add_edge(edges, i, j):
        """
        Add an edge between the i-th and j-th points,
        if not in the list already
        """
        if (i, j) in edges or (j, i) in edges:
            # already added
            assert (j, i) in edges, "Can't go twice over same directed edge right?"
            if only_outer:
                # if both neighboring triangles are in shape, it's not a boundary edge
                edges.remove((j, i))
            return
        edges.add((i, j))
    tri = Delaunay(points)
    edges = set()
    # Loop over triangles:
    # ia, ib, ic = indices of corner points of the triangle
    # (tri.simplices replaces the older tri.vertices attribute)
    for ia, ib, ic in tri.simplices:
        pa = points[ia]
        pb = points[ib]
        pc = points[ic]
        # Computing radius of triangle circumcircle
        # www.mathalino.com/reviewer/derivation-of-formulas/derivation-of-formula-for-radius-of-circumcircle
        a = np.sqrt((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2)
        b = np.sqrt((pb[0] - pc[0]) ** 2 + (pb[1] - pc[1]) ** 2)
        c = np.sqrt((pc[0] - pa[0]) ** 2 + (pc[1] - pa[1]) ** 2)
        s = (a + b + c) / 2.0
        area = np.sqrt(s * (s - a) * (s - b) * (s - c))
        circum_r = a * b * c / (4.0 * area)
        if circum_r < alpha:
            add_edge(edges, ia, ib)
            add_edge(edges, ib, ic)
            add_edge(edges, ic, ia)
    return edges
If you run it with the following test code you will get this figure, which looks like what you need:
from matplotlib.pyplot import *
# Constructing the input point data
np.random.seed(0)
x = 3.0 * np.random.rand(2000)
y = 2.0 * np.random.rand(2000) - 1.0
inside = ((x ** 2 + y ** 2 > 1.0) & ((x - 3) ** 2 + y ** 2 > 1.0))
points = np.vstack([x[inside], y[inside]]).T
# Computing the alpha shape
edges = alpha_shape(points, alpha=0.25, only_outer=True)
# Plotting the output
figure()
axis('equal')
plot(points[:, 0], points[:, 1], '.')
for i, j in edges:
    plot(points[[i, j], 0], points[[i, j], 1])
show()
EDIT: Following a request in a comment, here is some code that "stitches" the output edge set into sequences of consecutive edges.
def find_edges_with(i, edge_set):
    i_first = [j for (x,j) in edge_set if x==i]
    i_second = [j for (j,x) in edge_set if x==i]
    return i_first,i_second
def stitch_boundaries(edges):
    edge_set = edges.copy()
    boundary_lst = []
    while len(edge_set) > 0:
        boundary = []
        edge0 = edge_set.pop()
        boundary.append(edge0)
        last_edge = edge0
        while len(edge_set) > 0:
            i,j = last_edge
            j_first, j_second = find_edges_with(j, edge_set)
            if j_first:
                edge_set.remove((j, j_first[0]))
                edge_with_j = (j, j_first[0])
                boundary.append(edge_with_j)
                last_edge = edge_with_j
            elif j_second:
                edge_set.remove((j_second[0], j))
                edge_with_j = (j, j_second[0]) # flip edge rep
                boundary.append(edge_with_j)
                last_edge = edge_with_j
            if edge0[0] == last_edge[1]:
                break
        boundary_lst.append(boundary)
    return boundary_lst
You can then go over the list of boundary lists and append the points corresponding to the first index in each edge to get a boundary polygon.
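That last step could look roughly like the sketch below. The helper name `boundaries_to_polygons` is made up for illustration, and the input here is a hand-written, already stitched boundary of a unit square rather than real alpha-shape output.

```python
import numpy as np

def boundaries_to_polygons(points, boundary_lst):
    """Turn each stitched boundary (a list of (i, j) edges) into an (m, 2)
    array of vertices by taking the first index of every edge."""
    return [points[[i for i, _ in boundary]] for boundary in boundary_lst]

# hypothetical input: four corner points and one closed boundary around them
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
boundary_lst = [[(0, 1), (1, 2), (2, 3), (3, 0)]]

polygons = boundaries_to_polygons(points, boundary_lst)
print(polygons[0])
```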
I would use a different approach to solve this problem. Since we are working with a 2-D set of points, it is straightforward to compute the bounding rectangle of the points’ region. Then I would divide this rectangle into “cells” by horizontal and vertical lines, and for each cell simply count the number of points located within its bounds. Since each cell can have only 4 adjacent cells (adjacent by cell sides), the boundary cells would be the non-empty ones that have at least one empty adjacent cell or have a cell side located on the bounding rectangle boundary. Then the boundary would be constructed along the boundary cell sides. The boundary would look like a “staircase”, but choosing a smaller cell size would improve the result. As a matter of fact, the cell size should be determined experimentally; it cannot be too small, otherwise empty cells may appear inside the region. The average distance between the points could be used as a lower bound for the cell size.
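A minimal sketch of the cell-marking step, assuming NumPy and a made-up function name `boundary_cells`; it only finds which occupied cells lie on the boundary and does not trace the staircase outline itself.

```python
import numpy as np

def boundary_cells(points, cell_size):
    """Return a boolean grid that is True for occupied cells with at least one
    empty 4-neighbor or a side on the edge of the bounding rectangle."""
    mins = points.min(axis=0)
    idx = np.floor((points - mins) / cell_size).astype(int)
    nx, ny = idx.max(axis=0) + 1
    occupied = np.zeros((nx, ny), dtype=bool)
    occupied[idx[:, 0], idx[:, 1]] = True
    # pad with empty cells so cells on the rectangle edge count as boundary
    padded = np.pad(occupied, 1, constant_values=False)
    all_neighbors = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                     & padded[1:-1, :-2] & padded[1:-1, 2:])
    return occupied & ~all_neighbors

# toy input: one point in the middle of each cell of a 3 x 3 grid
pts = np.array([[i + 0.5, j + 0.5] for i in range(3) for j in range(3)])
grid = boundary_cells(pts, cell_size=1.0)
print(grid)
```

In this toy case every cell except the central one is a boundary cell.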
Consider using an Alpha Shape, sometimes called a Concave Hull. https://en.wikipedia.org/wiki/Alpha_shape
It can be built from the Delaunay triangulation, in time O(N log N).
As pointed out by most previous experts, this might not be a convex hull but a concave hull, or an Alpha Shape in other words. Iddo provides a clean Python code to acquire this shape. However, you can also directly utilize some existing packages to realize that, perhaps with a faster speed and less computational memory if you are working with a large number of point clouds.
[1] Alpha Shape Toolbox: a toolbox for generating n-dimensional alpha shapes.
https://plotly.com/python/v3/alpha-shapes/
[2] Plotly: It can generate a Mesh3d object that, depending on a key-value, can be the convex hull of that set, its Delaunay triangulation, or an alpha set.
https://plotly.com/python/v3/alpha-shapes/
Here is the JavaScript code that builds concave hull: https://github.com/AndriiHeonia/hull Probably you can port it to C#.
One idea is to create triangles (a mesh) from the point cloud, perhaps through Delaunay triangulation,
and fill those triangles with a color, then run level set or active contour segmentation, which will find the outer boundary of the shape whose color is now different than the outside "background" color.
https://xphilipp.developpez.com/contribuez/SnakeAnimation.gif
The animation above did not go all the way but many such algorithms can be configured to do that.
Note: The triangulation algorithm has to be tuned so that it doesn't merely create a convex hull, for example by removing triangles with too-large angles and sides from the Delaunay result. Preliminary code could look like
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Delaunay
points = np.array([[13.43, 12.89], [14.44, 13.86], [13.67, 15.87], [13.39, 14.95],\
[12.66, 13.86], [10.93, 14.24], [11.69, 15.16], [13.06, 16.24], [11.29, 16.35],\
[10.28, 17.33], [10.12, 15.49], [9.03, 13.76], [10.12, 14.08], [9.07, 15.87], \
[9.6, 16.68], [7.18, 16.19], [7.62, 14.95], [8.39, 16.79], [8.59, 14.51], \
[8.1, 13.43], [6.57, 11.59], [7.66, 11.97], [6.94, 13.86], [6.53, 14.84], \
[5.48, 12.84], [6.57, 12.56], [5.6, 11.27], [6.29, 10.08], [7.46, 10.45], \
[7.78, 7.21], [7.34, 8.72], [6.53, 8.29], [5.85, 8.83], [5.56, 10.24], [5.32, 7.8], \
[5.08, 9.86], [6.01, 5.75], [6.41, 7.48], [8.19, 5.69], [8.23, 4.72], [6.85, 6.34], \
[7.02, 4.07], [9.4, 3.2], [9.31, 4.99], [7.86, 3.15], [10.73, 2.82], [10.32, 4.88], \
[9.72, 1.58], [11.85, 5.15], [12.46, 3.47], [12.18, 1.58], [11.49, 3.69], \
[13.1, 4.99], [13.63, 2.61]])
tri = Delaunay(points,furthest_site=False)
res = []
for t in tri.simplices:
    A,B,C = points[t[0]],points[t[1]],points[t[2]]
    e1 = B-A; e2 = C-A
    num = np.dot(e1, e2)
    n1 = np.linalg.norm(e1); n2 = np.linalg.norm(e2)
    denom = n1 * n2
    d1 = np.rad2deg(np.arccos(num/denom))
    e1 = C-B; e2 = A-B
    num = np.dot(e1, e2)
    denom = np.linalg.norm(e1) * np.linalg.norm(e2)
    d2 = np.rad2deg(np.arccos(num/denom))
    d3 = 180-d1-d2
    res.append([n1,n2,d1,d2,d3])
res = np.array(res)
m = res[:,[0,1]].mean()*res[:,[0,1]].std()
mask = np.any(res[:,[2,3,4]] > 110) & (res[:,0] < m) & (res[:,1] < m )
plt.triplot(points[:,0], points[:,1], tri.simplices[mask])
Then fill with color and segment.
I'm writing a program which randomly chooses two integers within a certain interval. I also wrote a class (which I didn't add below) which uses two numbers 'a' and 'b' and creates an elliptic curve of the form:
y^2 = x^3 + ax + b
I've written the following to create the two random numbers.
import random
def numbers():
    n = 1
    while n>0:
        a = random.randint(-100,100)
        b = random.randint(-100,100)
        if -16 * (4 * a ** 3 + 27 * b ** 2) != 0:
            result = [a,b]
            return result
        n = n+1
Now I would like to generate a random point on this elliptic curve. How do I do that?
The curve has an infinite length, as for every y ϵ ℝ there is at least one x ϵ ℝ so that (x, y) is on the curve. So if we speak of a random point on the curve we cannot hope to have a homogeneous distribution of the random point over the whole curve.
But if that is not important, you could take a random value for y within some range, and then calculate the roots of the following function:
f(x) = x^3 + ax + b - y^2
This will result in three roots, of which possibly two are complex (not real numbers). You can take a random real root from that. This will be the x coordinate for the random point.
With the help of numpy, getting the roots is easy, so this is the function for getting a random point on the curve, given values for a and b:
import random
import numpy
def randomPoint(a, b):
    y = random.randint(-100,100)
    # Get roots of: f(x) = x^3 + ax + b - y^2
    roots = numpy.roots([1, 0, a, b - y**2])
    # 3 roots are returned, but ignore potential complex roots
    # At least one will be real
    roots = [val.real for val in roots if val.imag == 0]
    # Choose a random root among those real root(s)
    x = random.choice(roots)
    return [x, y]
See it run on repl.it.
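As a quick sanity check, one can verify numerically that a generated point satisfies the curve equation. The sketch below is a self-contained variant of the function above, with two assumptions that are not in the original: the names `random_point` and `on_curve` are made up, and roots are treated as real when their imaginary part is close to zero (rather than exactly zero) to be tolerant of rounding.

```python
import random
import numpy as np

def random_point(a, b, y_range=100):
    """Pick a random integer y, then a random real root x of x^3 + a*x + b - y^2."""
    y = random.randint(-y_range, y_range)
    roots = np.roots([1, 0, a, b - y**2])
    # a real cubic always has at least one real root
    real_roots = [r.real for r in roots if np.isclose(r.imag, 0.0)]
    return random.choice(real_roots), y

def on_curve(x, y, a, b):
    """Check y^2 == x^3 + a*x + b up to floating-point tolerance."""
    return bool(np.isclose(x**3 + a * x + b, y**2, rtol=1e-6, atol=1e-6))

random.seed(0)
a, b = -3, 5          # example coefficients with a non-zero discriminant
x, y = random_point(a, b)
print(x, y, on_curve(x, y, a, b))
```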