Correlation between independent explanatory variables in regression - statistics

sample_size = 100
x1 = rnorm(sample_size, mean = 0, sd = 0.5)
x2 = rnorm(sample_size, mean = 0, sd = 0.5)
x3 = 0.5*x2+rnorm(sample_size, mean = 0, sd = 0.5)
# generate errors
epsilon = rnorm(sample_size, mean = 0, sd = 0.1)
# generate dependent variable
y = 0+0.1*x1+0.2*x2+0.3*x3+epsilon
Case 1: x1 and x2 are independent variables, so their correlation, cor(x1, x2), should approach 0 as the sample size increases.
Case 2: But rewriting the equation in terms of x2,
x2 = 5*y - 0.5*x1 - 1.5*x3 - 5*epsilon
says the beta (coefficient) of x1 should be -0.5. Using the relation
correlation = beta*sd(x)/sd(y)
gives
correlation = -0.5*sd(x1)/sd(x2)
which says the correlation should not be zero.
Where am I going wrong in this second case?
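A minimal simulation sketch of this setup, written in Python with NumPy (the question uses R; the seed, the larger sample size of 100,000 and the variable names r_x1_x2 and beta_x1 are illustrative assumptions). It shows that both statements can hold at once. The marginal correlation cor(x1, x2) is near 0, while the coefficient of x1 in a multiple regression of x2 on y, x1 and x3 is clearly negative. That coefficient is a partial effect that holds y fixed. Because y depends on both x1 and x2, conditioning on y makes them dependent. The identity correlation = beta*sd(x)/sd(y) only applies to a simple regression with a single predictor. The fitted coefficient is also not exactly -0.5, because the rearranged equation's error term (-5*epsilon) is correlated with y.

```python
import numpy as np

# Illustrative assumptions: fixed seed and a large sample so estimates are stable
rng = np.random.default_rng(0)
n = 100_000

# Same data-generating process as the R snippet in the question
x1 = rng.normal(0, 0.5, n)
x2 = rng.normal(0, 0.5, n)
x3 = 0.5 * x2 + rng.normal(0, 0.5, n)
epsilon = rng.normal(0, 0.1, n)
y = 0.1 * x1 + 0.2 * x2 + 0.3 * x3 + epsilon

# Marginal (simple) correlation between x1 and x2: close to 0
r_x1_x2 = np.corrcoef(x1, x2)[0, 1]

# Multiple regression of x2 on an intercept, y, x1 and x3 by least squares
X = np.column_stack([np.ones(n), y, x1, x3])
coef, *_ = np.linalg.lstsq(X, x2, rcond=None)
beta_x1 = coef[2]  # partial coefficient of x1, holding y and x3 fixed

print(f"cor(x1, x2)        = {r_x1_x2:.3f}")
print(f"partial beta of x1 = {beta_x1:.3f}")
```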

Related

Not getting the correct result when trying to get the slope from 2 points

I'm trying to make something that gives me the slope and y-intercept from 2 points. Sometimes it gives me the correct values, but other times it gives me something close to correct but wrong.
Does anyone know what I could have done wrong? (I should also mention that I'm just starting to learn Python.)
y2 = input('Y2: ')
y1 = input('Y1: ')
x2 = input('X2: ')
x1 = input('X1: ')
y2 = float(y2)
y1 = float(y1)
x2 = float(x2)
x1 = float(x1)
over = y2-y1
under = x2-x1
m = over/under
y = float(y2)
x = float(x2)
m = float(m)
ym = y-m
b = ym/x
print(f'Y = {m}x + {b}')
It appears that the math for the intercept b is incorrect.
It would be more logical to use the point (x1, y1) together with the slope to compute b.
Suppose you want the point (x0, y0) where the line crosses the y-axis, so x0 = 0 and y0 = b. Then you can do a similar slope calculation (having already calculated m correctly, as you did).
So (y1 - b)/(x1 - 0) = m, which you can rearrange to get:
b = y1 - m*x1.
So this works:
x1, y1 = 1, 1
x2, y2 = 2, 2
over = y2-y1
under = x2-x1
m = over/under
b = y1 - m*x1
print(f'Y = {m}x + {b}')
prints:
Y = 1.0x + 0.0
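The same fix wrapped in a small function, as a sketch. The name slope_intercept and the choice to raise ValueError for a vertical line are assumptions and are not part of the answer above. A vertical line has x1 == x2, so the slope is undefined and the original code would divide by zero.

```python
def slope_intercept(p1, p2):
    """Return (m, b) for the line y = m*x + b through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical line: slope is undefined (assumption: report it as an error)
        raise ValueError("x1 == x2: a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1  # from (y1 - b) / (x1 - 0) = m
    return m, b

print(slope_intercept((1, 1), (2, 2)))  # (1.0, 0.0)
print(slope_intercept((0, 3), (2, 7)))  # (2.0, 3.0)
```

For comparison, the question's formula b = (y - m)/x gives 2.5 for the points (0, 3) and (2, 7), instead of the correct 3.0.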

How to calculate the skew slopes of a skewed document using connected components (OpenCV, Python)?

I am using the connected components (NN) method to detect and correct the skew of a document. I have an image of a skewed document and have done the following steps:
1. Document image preprocessing
2. Selection of eligible connected components
import cv2
import numpy as np
import matplotlib.pyplot as plt
def imshow(image1):
    plt.figure(figsize=(20, 10))
    # OpenCV colour images are BGR; matplotlib expects RGB (assumes a 3-channel image)
    plt.imshow(cv2.cvtColor(image1, cv2.COLOR_BGR2RGB))
    plt.show()
# `image`, `invr_binary` (inverted binary image) and `connectivity` (4 or 8)
# come from the preprocessing step
output = cv2.connectedComponentsWithStats(invr_binary, connectivity=connectivity, ltype=cv2.CV_32S)
(numLabels, labels, stats, centroids) = output
## non text removal: label 0 is the background, so it is skipped
w_avg = stats[1:, cv2.CC_STAT_WIDTH].mean()
h_avg = stats[1:, cv2.CC_STAT_HEIGHT].mean()
B_max = (w_avg * h_avg) * 4
B_min = (w_avg * h_avg) * 0.25
result = np.zeros(labels.shape, np.uint8)
output1 = image.copy()
a, b = 0.6, 2  # allowed range of the width/height ratio
# Iterate over labels 1..numLabels-1 so that stats[i], centroids[i] and
# labels == i all refer to the same component
for i in range(1, numLabels):
    area = stats[i, cv2.CC_STAT_AREA]
    if B_min < area < B_max:  ## non text removal
        result[labels == i] = 255
        x = stats[i, cv2.CC_STAT_LEFT]
        y = stats[i, cv2.CC_STAT_TOP]
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        (cX, cY) = centroids[i]
        c = w / h
        if a < c < b:  ## A and C type filtering
            cv2.rectangle(output1, (x, y), (x + w, y + h), (0, 255, 0), 1)
            cv2.circle(output1, (int(cX), int(cY)), 1, (0, 0, 255), -1)
imshow(output1)
[Input image]
[Output image]
After finding the center points of the text components (shown in the output image), the next step is the skew slope calculation, but I don't understand how to calculate it. I am following Section 3.3 (page 7) of this research paper:
https://www.mdpi.com/2079-9292/9/1/55/pdf
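One common way to turn component centroids into a skew estimate is the nearest-neighbour approach. Link each centroid to its nearest neighbour, measure the angle of each link, and take a robust average. The sketch below does this with NumPy only. It is not necessarily the exact procedure of Section 3.3 of the paper, and the function name estimate_skew_nn, the 45-degree cut-off and the use of the median are assumptions. You would pass it the (cX, cY) centroids of the components that pass both filters, for example collected into a list inside the loop. The synthetic grid at the end is only a self-check.

```python
import numpy as np

def estimate_skew_nn(centroids, max_angle=45.0):
    """Estimate the skew angle (degrees) from an (N, 2) array of (x, y) centroids.

    Image coordinates are assumed (y grows downwards), so a positive angle
    means the text lines descend from left to right.
    """
    pts = np.asarray(centroids, dtype=float)
    # All pairwise distances (O(N^2) memory; fine for a few thousand components)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbour
    nn = dist.argmin(axis=1)                # index of each point's nearest neighbour
    vec = pts[nn] - pts                     # link from each point to its neighbour
    ang = np.degrees(np.arctan2(vec[:, 1], vec[:, 0]))
    ang = (ang + 90.0) % 180.0 - 90.0       # fold so left/right links give the same angle
    ang = ang[np.abs(ang) < max_angle]      # keep near-horizontal (same-line) links
    if ang.size == 0:
        return 0.0
    return float(np.median(ang))

# Synthetic check: "characters" 10 px apart on lines 30 px apart, rotated by 5 degrees
theta = np.radians(5.0)
xs, ys = np.meshgrid(np.arange(20) * 10.0, np.arange(5) * 30.0)
grid = np.column_stack([xs.ravel(), ys.ravel()])
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
skewed = grid @ rot.T
print(estimate_skew_nn(skewed))
```

For many components, scipy.spatial.cKDTree(pts).query(pts, k=2) is a lighter way to find nearest neighbours than the full distance matrix. The estimated angle can then be used with cv2.getRotationMatrix2D and cv2.warpAffine to deskew; check the sign convention on a test image.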

Sphere-Sphere Intersection

I have two spheres that are intersecting, and I'm trying to find the intersection point nearest in the direction of the point (0,0,1)
My first sphere's (c1) center is at (c1x = 0, c1y = 0, c1z = 0) and has a radius of r1 = 2.0
My second sphere's (c2) center is at (c2x = 2, c2y = 0, c2z = 0) and has a radius of r2 = 2.0
I've been following the logic on this identical question for the 'Typical intersections' part, but was having some trouble understanding it and was hoping someone could help me.
First I'm finding the center of intersection c_i and radius of the intersecting circle r_i:
Here the first sphere has center c_1 and radius r_1, the second c_2 and r_2, and their intersection has center c_i and radius r_i. Let d = ||c_2 - c_1||, the distance between the sphere centers.
So sphere1 has center c_1 = (0,0,0) with r_1 = 2. Sphere2 has c_2 = (2,0,0) with r_2 = 2.0.
d = ||c_2 - c_1|| = 2
h = 1/2 + (r_1^2 - r_2^2)/(2* d^2)
Substituting the values, I get h = 0.5:
h = .5 + (2^2 - 2^2)/(2*2^2)
h = .5 + (0)/(8)
h = 0.5
We can sub this into our formula for c_i above to find the center of the circle of intersections.
c_i = c_1 + h * (c_2 - c_1)
(This equation was my original question, but a comment on this post helped me understand that I need to solve it separately for each of x, y and z.)
c_i_x = c_1_x + h * (c_2_x - c_1_x)
c_i_x = 0 + 0.5 * (2 - 0) = 0.5 * 2
1 = c_i_x
c_i_y = c_1_y + h * (c_2_y - c_1_y)
c_i_y = 0 + 0.5 * (0- 0)
0 = c_i_y
c_i_z = c_1_z + h * (c_2_z - c_1_z)
c_i_z = 0 + 0.5 * (0 - 0)
0 = c_i_z
c_i = (c_i_x, c_i_y, c_i_z) = (1, 0, 0)
Then, reversing one of our earlier Pythagorean relations to find r_i:
r_i = sqrt(r_1*r_1 - h*h*d*d)
r_i = sqrt(4 - .5*.5*2*2)
r_i = sqrt(4 - 1)
r_i = sqrt(3)
r_i = 1.73205081
So if my calculations are correct, I know the circle where my two spheres intersect is centered at (1, 0, 0) and has a radius of 1.73205081
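These hand calculations can be checked with a short NumPy sketch of the same formulas. Writing them with vectors handles x, y and z in one step. The function name intersection_circle and the check for spheres that do not intersect are my own additions.

```python
import numpy as np

def intersection_circle(c1, r1, c2, r2):
    """Center c_i, radius r_i and unit normal n_i of the circle where two spheres meet."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    d = np.linalg.norm(c2 - c1)                 # distance between the centers
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        raise ValueError("spheres do not intersect in a circle")
    h = 0.5 + (r1**2 - r2**2) / (2 * d**2)      # fraction of the way from c1 to c2
    c_i = c1 + h * (c2 - c1)                    # vector form: x, y and z at once
    r_i = np.sqrt(r1**2 - h**2 * d**2)          # Pythagoras: r1^2 = (h*d)^2 + r_i^2
    n_i = (c2 - c1) / d                         # normal of the circle's plane
    return c_i, r_i, n_i

c_i, r_i, n_i = intersection_circle((0, 0, 0), 2.0, (2, 0, 0), 2.0)
print(c_i, r_i, n_i)
```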
I feel somewhat confident about all the calculations above; the steps make sense as long as I didn't make any math mistakes. I know I'm getting closer, but my understanding begins to weaken from this point on. My end goal is to find the intersection point nearest to (0,0,1). Since I have the circle of intersection, I think what I need to do is find the point on that circle nearest to (0,0,1), right?
The next step from this solution says:
So, now we have the center and radius of our intersection. Now we can revolve this around the separating axis to get our full circle of solutions. The circle lies in a plane perpendicular to the separating axis, so we can take n_i = (c_2 - c_1)/d as the normal of this plane.
So finding the normal of the plane involves n_i = (c_2 - c_1)/d. Do I need to compute n_i separately for x, y and z again, like this?
n_i_x = (c_2_x - c_1_x)/d = (2-0)/2 = 2/2 = 1
n_i_y = (c_2_y - c_1_y)/d = (0-0)/2 = 0/2 = 0
n_i_z = (c_2_z - c_1_z)/d = (0-0)/2 = 0/2 = 0
After choosing a tangent and bitangent t_i and b_i perpendicular to this normal and each other, you can write any point on this circle as: p_i(theta) = c_i + r_i * (t_i * cos(theta) + b_i * sin(theta));
Could I choose t_i and b_i based on the point I want to be nearest to, (0,0,1)?
Because of the Hairy Ball Theorem, there's no one universal way to choose the tangent/bitangent to use. My recommendation would be to pick one of the coordinate axes not parallel to n_i, and set t_i = normalize(cross(axis, n_i)), and b_i = cross(t_i, n_i) or somesuch.
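Here is a sketch of that recommendation in NumPy. The helper names circle_basis and circle_point are hypothetical. It picks the coordinate axis least parallel to n_i, which is one safe way to satisfy "not parallel". It then builds t_i and b_i with cross products as suggested and evaluates p_i(theta) with the values from the question. Every generated point should be at distance 2 from both sphere centers. The basis does not need to be chosen from (0,0,1), because any perpendicular unit pair in the plane traces the full circle.

```python
import numpy as np

def circle_basis(n):
    """Unit tangent t and bitangent b, perpendicular to unit normal n and to each other."""
    n = np.asarray(n, float)
    axis = np.eye(3)[np.argmin(np.abs(n))]  # coordinate axis least parallel to n
    t = np.cross(axis, n)
    t /= np.linalg.norm(t)
    b = np.cross(t, n)                      # unit length because t is perpendicular to n
    return t, b

def circle_point(c_i, r_i, t, b, theta):
    """p_i(theta) = c_i + r_i * (t*cos(theta) + b*sin(theta))."""
    return np.asarray(c_i, float) + r_i * (t * np.cos(theta) + b * np.sin(theta))

# Values from the question: the circle where the two radius-2 spheres meet
c_i, r_i, n_i = np.array([1.0, 0.0, 0.0]), np.sqrt(3.0), np.array([1.0, 0.0, 0.0])
t, b = circle_basis(n_i)
for theta in np.linspace(0, 2 * np.pi, 5):
    p = circle_point(c_i, r_i, t, b, theta)
    # distance to each sphere center should equal that sphere's radius (2.0)
    print(p, np.linalg.norm(p), np.linalg.norm(p - np.array([2.0, 0.0, 0.0])))
```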
c_i = c_1 + h * (c_2 - c_1)
This is a vector expression; you have to write a similar one for every component, like this:
c_i.x = c_1.x + h * (c_2.x - c_1.x)
and similarly for y and z.
As a result, you'll get circle center coordinates:
c_i = (1, 0, 0)
As your quote says, choose an axis not parallel to the vector n, for example the y-axis. Take its direction vector Y_dir = (0,1,0) and compute its cross product with n:
t = Y_dir x n = (0, 0, -1)
b = n x t = (0, 1, 0)
Now you have two vectors t and b in the circle's plane to build the circumference points.
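To finish the original goal of finding the point on the circle nearest to (0,0,1), one approach avoids searching over theta. It is offered as a suggestion and is not the linked answer's method. Project the vector from c_i to the target onto the circle's plane, normalize it, and scale it by r_i. The function name nearest_point_on_circle is hypothetical. The sketch assumes the target is not on the circle's axis, where every circle point is equally near.

```python
import numpy as np

def nearest_point_on_circle(c_i, r_i, n_i, p):
    """Point of the circle (center c_i, radius r_i, unit normal n_i) closest to p."""
    c_i, n_i, p = (np.asarray(v, float) for v in (c_i, n_i, p))
    v = p - c_i
    v_plane = v - np.dot(v, n_i) * n_i      # remove the component along the normal
    length = np.linalg.norm(v_plane)
    if np.isclose(length, 0.0):
        raise ValueError("p is on the circle's axis: all circle points are equally near")
    return c_i + r_i * v_plane / length

q = nearest_point_on_circle([1, 0, 0], np.sqrt(3.0), [1, 0, 0], [0, 0, 1])
print(q)
```

With the question's numbers this gives (1, 0, sqrt(3)), roughly (1, 0, 1.732). That point is at distance 2 from both (0,0,0) and (2,0,0).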

Decision Boundary Plot for Support Vector Classifier (distance from separating hyperplane)

I am working through the book "Hands-on Machine Learning with Scikit-Learn and TensorFlow" by Aurélien Géron. The code below is written in Python 3.
On the GitHub page for the Chap. 5 solutions to the Support Vector Machine problems there is the following code for plotting the SVC decision boundary (https://github.com/ageron/handson-ml/blob/master/05_support_vector_machines.ipynb):
import numpy as np
import matplotlib.pyplot as plt
def plot_svc_decision_boundary(svm_clf, xmin, xmax):
    w = svm_clf.coef_[0]
    b = svm_clf.intercept_[0]
    # At the decision boundary, w0*x0 + w1*x1 + b = 0
    # => x1 = -w0/w1 * x0 - b/w1
    x0 = np.linspace(xmin, xmax, 200)
    decision_boundary = -w[0]/w[1] * x0 - b/w[1]
    margin = 1/w[1]
    gutter_up = decision_boundary + margin
    gutter_down = decision_boundary - margin
    svs = svm_clf.support_vectors_
    plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='#FFAAAA')
    plt.plot(x0, decision_boundary, "k-", linewidth=2)
    plt.plot(x0, gutter_up, "k--", linewidth=2)
    plt.plot(x0, gutter_down, "k--", linewidth=2)
My question is: why is the margin defined as 1/w[1]? I believe the margin should be 1/sqrt(w[0]^2 + w[1]^2); that is, half of 2/||w||_2, which is 1/||w||_2, where w is the weight vector. See https://math.stackexchange.com/questions/1305925/why-does-the-svm-margin-is-frac2-mathbfw.
Is this an error in the code?
Given:
decision boundary: w0*x0 + w1*x1 + b = 0
gutter_up: w0*x0 + w1*x1 + b = 1, i.e. w0*x0 + w1*(x1 - 1/w1) + b = 0
gutter_down: w0*x0 + w1*x1 + b = -1, i.e. w0*x0 + w1*(x1 + 1/w1) + b = 0
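So for a point (x0, x1) on the decision boundary line, (x0, x1 + 1/w1) and (x0, x1 - 1/w1) are points on the gutter_up and gutter_down lines respectively. In other words, margin = 1/w[1] is the vertical offset (along the x1 axis) used to draw the gutters, not the perpendicular distance between the lines. The perpendicular distance is still 1/||w||: a vertical offset of 1/|w1| across a line with unit normal w/||w|| corresponds to a perpendicular distance of (1/|w1|)*(|w1|/||w||) = 1/||w||. So the code is not wrong.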
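A quick numerical check of this answer, using made-up weights w and b (not from a fitted model). The point on gutter_up directly above a boundary point is 1/w1 away vertically. Its perpendicular distance to the boundary, however, is 1/||w||, the geometric margin the question refers to.

```python
import numpy as np

w = np.array([1.5, 0.8])  # illustrative weights, not from a trained SVC
b = 0.3

x0 = 2.0
x1_boundary = -w[0] / w[1] * x0 - b / w[1]   # point on the decision boundary
x1_gutter = x1_boundary + 1 / w[1]           # same x0 on gutter_up

vertical_offset = x1_gutter - x1_boundary    # equals margin = 1/w[1] in the plot code
point = np.array([x0, x1_gutter])
# Perpendicular distance from the gutter point to the line w.x + b = 0
perp = abs(w @ point + b) / np.linalg.norm(w)

print(f"vertical offset 1/w1      = {vertical_offset:.3f}")
print(f"perpendicular distance    = {perp:.3f}")
print(f"1/||w||                   = {1 / np.linalg.norm(w):.3f}")
```

Here the gutter point satisfies w.x + b = 1, and the script prints roughly 1.250 for the vertical offset and 0.588 for both perpendicular quantities.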

Finding relative position in plus shape

I am making a dithering library. To find the relative position of an absolute point a in a 2-dimensional plane tiled with 4x4 unit squares, I use rel.x = abs.x % 4; rel.y = abs.y % 4. This works and produces the expected results. But what if I tile the plane with plus shapes, which are 3 units wide? How do I find the relative position then? The tile shape is shown here; 1's are parts of the shape and 0's are empty areas.
0 1 0
1 1 1
0 1 0
For example, if point a is at x = 1, y = 1, then the relative position should be x = 1, y = 1. But if it is at, say, x = 4, y = 1, then the relative position should be x = 1, y = 2, because that point is the bottom cell of another plus. How is this accomplished mathematically? Any language or pseudocode is fine. :)
The tiling is periodic along the X and Y axes with period 5, so a long switch expression might look like this, where (cx, cy) is the center of the plus that contains (x, y):
case y % 5 of
  0: case x % 5 of
       0: cx = x - 1; cy = y;
       1: cx = x;     cy = y + 1;
       2: cx = x;     cy = y - 1;
       3: cx = x + 1; cy = y;
       4: cx = x;     cy = y;
  1: ...
Or we can create constant 5x5 arrays, indexed as [y % 5][x % 5], and fill them with the shifts -1, 0, 1:
dx: [[-1,0,0,1,0],[1,0,-1,0,0],[0,0,1,0,-1],[0,-1,0,0,1],[0,1,0,-1,0]]
dy: [[0,1,-1,0,0],[0,0,0,1,-1],[1,-1,0,0,0],[0,0,1,-1,0],[-1,0,0,0,1]]
I feel that some simple formula might exist.
Edit: simpler version:
const dx0: [-1,0,0,1,0]
const dy0: [0,1,-1,0,0]
ixy = (x - 2 * (y % 5) + 10) % 5; // y % 5 keeps the left side non-negative for x >= 0
dx = dx0[ixy];
dy = dy0[ixy];
And finally crazy one-liners without constant arrays
dx = ((((11 + x - 2 * (y%5)) % 5) ^ 1) - 2) / 2 // ^ = xor; / = integer division truncating toward zero
dy = ((13 + x - 2 * (y%5)) % 5 - 2) / 2
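Here is a Python sketch of the "simpler version", extended to return the relative position the question asks for inside the 3x3 box around the plus. The function name plus_relative_position is hypothetical. Python's % never returns a negative value for a positive divisor, so the +10 offset is unnecessary here and negative coordinates also work.

```python
# Shift tables from the "simpler version" above, indexed by ixy
DX0 = [-1, 0, 0, 1, 0]
DY0 = [0, 1, -1, 0, 0]

def plus_relative_position(x, y):
    """Return (rel_x, rel_y) of cell (x, y) inside its plus tile, where the plus
    occupies (1,0), (0,1), (1,1), (2,1), (1,2) of a 3x3 box."""
    ixy = (x - 2 * y) % 5                    # Python's % is already non-negative
    cx, cy = x + DX0[ixy], y + DY0[ixy]      # center of the plus containing (x, y)
    return x - cx + 1, y - cy + 1            # center maps to (1, 1)

print(plus_relative_position(1, 1))  # (1, 1)
print(plus_relative_position(4, 1))  # (1, 2)
```

This reproduces the question's examples: (1, 1) maps to (1, 1), and (4, 1) maps to (1, 2).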
