Apply an affine transform to a bounding rectangle - python-3.x

I am working on a pedestrian tracking algorithm using Python3 & OpenCV.
We can use SIFT keypoints as an identifier of a pedestrian silhouette on a frame and then perform brute force matching between two sets of SIFT keypoints (i.e. between one frame and the next one) to find the pedestrian in the next frame.
To visualize this on the sequence of frames, we can draw a bounding rectangle delimiting the pedestrian. This is what it looks like :
The main problem is about characterizing the motion of the pedestrian using the keypoints. The idea here is to find an affine transform (that is, translation in x & y, rotation & scaling) using the coordinates of the keypoints on two successive frames. Ideally, this affine transform somehow corresponds to the motion of the pedestrian. To track this pedestrian, we would then just have to apply the same affine transform to the bounding rectangle coordinates.
That last part doesn’t work well. The rectangle consistently shrinks over several frames to inevitably disappear or drifts away from the pedestrian, as you see below or on the previous image :
To specify, we characterize the bounding rectangle with 2 extreme points :
There are some built-in cv2 functions that can apply an affine transform to an image, like cv2.warpAffine(), but I want to apply it only to the bounding rectangle coordinates (i.e 2 points or 1 point + width & height).
To find the affine transform between the 2 sets of keypoints, I’ve written my own function (I can post the code if it helps), but I’ve observed similar results when using cv2.getAffineTransform() for instance.
Do you know how to properly apply an affine transform to this bounding rectangle ?
EDIT : here’s some explanation & code for better context :
The pedestrian detection is done with the pre-trained SVM classifier available in openCV : hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) & hog.detectMultiScale()
Once a first pedestrian is detected, the SVM returns the coordinates of the associated bounding rectangle (xA, yA, w, h) (we stop using the SVM after the 1st detection as it is quite slow, and we are focusing on one pedestrian for now)
We select the corresponding region of the current frame, with image[yA: yA+h, xA: xA+w] and search for SURF keypoints within with surf.detectAndCompute()
This returns the keypoints & their associated descriptors (an array of 64 characteristics for each keypoint)
We perform brute force matching, based on the L2-norm between the descriptors and the distance in pixels between the keypoints to construct pairs of keypoints between the current frame & the previous one. The code for this function is pretty long, but should be similar to cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
Once we have the matched pairs of keypoints, we can use them to find the affine transform with this function :
previousKpts = previousKpts[:5]  # select the 5 best matches
currentKpts = currentKpts[:5]
# build A matrix of shape [2 * Nb of keypoints, 4]
A = np.ndarray(((2 * len(previousKpts), 4)))
for idx, keypoint in enumerate(previousKpts):
    # Keypoint.pt = (x-coord, y-coord)
    A[2 * idx, :] = [keypoint.pt[0], -keypoint.pt[1], 1, 0]
    A[2 * idx + 1, :] = [keypoint.pt[1], keypoint.pt[0], 0, 1]
# build b matrix of shape [2 * Nb of keypoints, 1]
b = np.ndarray((2 * len(previousKpts), 1))
for idx, keypoint in enumerate(currentKpts):
    b[2 * idx, :] = keypoint.pt[0]
    b[2 * idx + 1, :] = keypoint.pt[1]
# convert the numpy.ndarrays to matrix :
A = np.matrix(A)
b = np.matrix(b)
# solution of the form x = [x1, x2, x3, x4]' = ((A' * A)^-1) * A' * b
x = np.linalg.inv(A.T * A) * A.T * b
theta = math.atan2(x[1, 0], x[0, 0]) # outputs rotation angle in [-pi, pi]
alpha = math.sqrt(x[0, 0] ** 2 + x[1, 0] ** 2) # scaling parameter
bx = x[2, 0] # translation along x-axis
by = x[3, 0] # translation along y-axis
return theta, alpha, bx, by
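For reference, the same least-squares fit can be written more compactly with np.linalg.lstsq, which avoids forming the inverse of A' * A explicitly. This is only a sketch: the function name estimate_similarity and the (N, 2) array inputs are assumptions made for illustration, not the code used above.

```python
import numpy as np

def estimate_similarity(prev_pts, curr_pts):
    """Least-squares fit of curr = alpha * R(theta) @ prev + t.

    prev_pts, curr_pts: arrays of shape (N, 2) holding (x, y) coordinates.
    Returns (theta, alpha, tx, ty).
    """
    prev_pts = np.asarray(prev_pts, dtype=np.float64)
    curr_pts = np.asarray(curr_pts, dtype=np.float64)
    n = len(prev_pts)
    A = np.zeros((2 * n, 4))
    # even rows: x' = a*x - b*y + tx ; odd rows: y' = b*x + a*y + ty
    A[0::2] = np.column_stack([prev_pts[:, 0], -prev_pts[:, 1], np.ones(n), np.zeros(n)])
    A[1::2] = np.column_stack([prev_pts[:, 1], prev_pts[:, 0], np.zeros(n), np.ones(n)])
    rhs = curr_pts.reshape(-1)  # interleaved x0', y0', x1', y1', ...
    (a, b, tx, ty), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    theta = np.arctan2(b, a)   # rotation angle in [-pi, pi]
    alpha = np.hypot(a, b)     # scaling parameter
    return theta, alpha, tx, ty
```

With exact correspondences this should recover the parameters that generated the points; with noisy matches it returns the least-squares compromise, so outlier matches can still bias the result.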
We then just have to apply the same affine transform to the corner points of the bounding rectangle :
# define the 4 bounding points using xA, yA
xB = xA + w
yB = yA + h
rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
# warp the affine transform into a full perspective transform
affine_warp = np.array([[alpha*np.cos(theta), -alpha*np.sin(theta), tx],
                        [alpha*np.sin(theta), alpha*np.cos(theta), ty],
                        [0, 0, 1]], dtype=np.float32)
# apply affine transform
rect_pts = cv2.perspectiveTransform(rect_pts, affine_warp)
xA = rect_pts[0, 0, 0]
yA = rect_pts[0, 0, 1]
xB = rect_pts[3, 0, 0]
yB = rect_pts[3, 0, 1]
return xA, yA, xB, yB
Save the updated rectangle coordinates (xA, yA, xB, yB), all current keypoints & descriptors, and iterate over the next frame : select image[yA: yB, xA: xB] using the (xA, yA, xB, yB) we previously saved, get SURF keypoints etc.

As Micka suggested, cv2.perspectiveTransform() is an easy way to accomplish this. You'll just need to turn your affine warp into a full perspective transform (homography) by adding a third row at the bottom with the values [0, 0, 1]. For example, let's put a box with w, h = 100, 200 at the point (10, 20) and then use an affine transformation to shift the points so that the box is moved to (0, 0) (i.e. shift 10 pixels to the left and 20 pixels up):
>>> xA, yA, w, h = (10, 20, 100, 200)
>>> xB, yB = xA + w, yA + h
>>> rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
>>> affine_warp = np.array([[1, 0, -10], [0, 1, -20], [0, 0, 1]], dtype=np.float32)
>>> cv2.perspectiveTransform(rect_pts, affine_warp)
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
So that works perfectly as expected. You could also just simply transform the points yourself with matrix multiplication:
>>> rect_pts.dot(affine_warp[:2, :2].T) + affine_warp[:2, 2]
array([[[ 0., 0.]],
[[ 100., 0.]],
[[ 0., 200.]],
[[ 100., 200.]]], dtype=float32)
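One thing that may be worth checking in the question's code: after the transform, only corners 0 and 3 are kept, and once a rotation is involved those two points no longer span the whole rotated rectangle. The following is a rough sketch, not a confirmed fix for the shrinking: it transforms all four corners and returns the axis-aligned box that encloses them. The function name transform_rect is an assumption for illustration; the parameters mirror the (theta, alpha, tx, ty) used in the question.

```python
import numpy as np

def transform_rect(xA, yA, w, h, theta, alpha, tx, ty):
    """Apply a similarity transform to a rectangle and return the
    axis-aligned box (xA, yA, xB, yB) enclosing the transformed corners."""
    corners = np.array([[xA, yA], [xA + w, yA],
                        [xA, yA + h], [xA + w, yA + h]], dtype=np.float64)
    c, s = alpha * np.cos(theta), alpha * np.sin(theta)
    M = np.array([[c, -s], [s, c]])
    # points are stored as rows, so multiply by the transpose
    moved = corners @ M.T + np.array([tx, ty])
    x_min, y_min = moved.min(axis=0)
    x_max, y_max = moved.max(axis=0)
    return x_min, y_min, x_max, y_max
```

For a pure translation this gives the same result as the perspectiveTransform example above; for a rotation it returns a box that still contains every transformed corner.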

Related

How can i get the inner contour points without redundancy in OpenCV - Python

I'm new to OpenCV and I need to get all the contour points. This is easy using the cv2.RETR_TREE mode in the findContours method, but it returns redundant coordinates. For example, in this polygon, I don't want to get the contour points like this:
But like this:
In the first image, the green lines are the contours detected with RETR_TREE mode, and points 1-2, 3-5, 4-6, ... are redundant because they are so close to each other. I need to merge each group of redundant points into one and append it to the customContours array.
For the moment, I only have the code for the first picture, which labels each point with its index and coordinates:
def getContours(img, minArea=20000, cThr=[100, 100]):
    font = cv2.FONT_HERSHEY_COMPLEX
    imgColor = img
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    imgBlur = cv2.GaussianBlur(imgGray, (5, 5), 1)
    imgCanny = cv2.Canny(imgBlur, cThr[0], cThr[1])
    kernel = np.ones((5, 5))
    imgDial = cv2.dilate(imgCanny, kernel, iterations=3)
    imgThre = cv2.erode(imgDial, kernel, iterations=2)
    cv2.imshow('threshold', imgThre)
    contours, hierachy = cv2.findContours(imgThre, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    customContours = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > minArea:
            peri = cv2.arcLength(cnt, True)
            approx = cv2.approxPolyDP(cnt, 0.009*peri, True)
            bbox = cv2.boundingRect(approx)
            customContours.append([len(approx), area, approx, bbox, cnt])
            print('points: ', len(approx))
            n = approx.ravel()
            i = 0
            for j in n:
                if i % 2 == 0:
                    x = n[i]
                    y = n[i + 1]
                    string = str(x)+" " + str(y)
                    cv2.putText(imgColor, str(i//2+1) + ': ' + string, (x, y), font, 2, (0, 0, 0), 2)
                i = i + 1
    customContours = sorted(customContours, key=lambda x: x[1], reverse=True)
    for cnt in customContours:
        cv2.drawContours(imgColor, [cnt[2]], 0, (0, 0, 255), 5)
    return imgColor, customContours
Could you help me get the actual points, i.e. as in the second picture?
(EDIT 01/07/21)
I want a generic solution, because the image could be more complex, such as the following picture:
NOTE: the middle arrow (points 17 and 18) doesn't enclose a closed area, so it isn't a polygon to study and I'm not interested in obtaining its points. Also, the order of the points isn't important, but if the input is the whole image, the solution should recognize that there are 4 polygons, so for each polygon the point numbering starts at 0, then 1, etc.
Here's my approach. It is mainly morphological-based. It involves convolving the image with a special kernel. This convolution identifies the end-points of the triangle as well as the intersection points where the middle line is present. This will result in a points mask containing the pixel that matches the points you are looking for. After that, we can apply a little bit of morphology to join possible duplicated points. What remains is to get a list of the coordinate of these points for further processing.
These are the steps:
Get a binary image of the input via Otsu's thresholding
Get the skeleton of the binary image
Define the special kernel and convolve the skeleton image
Apply a morphological dilate to join possible duplicated points
Get the centroids of the points and store them in a list
Here's the code:
# Imports:
import numpy as np
import cv2
# image path
path = "D://opencvImages//"
fileName = "triangle.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Prepare a deep copy for results:
inputImageCopy = inputImage.copy()
# Convert BGR to Grayscale
grayImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
_, binaryImage = cv2.threshold(grayImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
The first bit computes the binary image. Very straightforward. I'm using this image as base, which is just a cleaned-up version of what you posted without the annotations. This is the resulting binary image:
Now, to perform the convolution we must first get the image "skeleton". The skeleton is a version of the binary image where lines have been normalized to have a width of 1 pixel. This is useful because we can then convolve the image with a 3 x 3 kernel and look for specific pixel patterns. Let's compute the skeleton using OpenCV's extended image processing module:
# Get image skeleton:
skeleton = cv2.ximgproc.thinning(binaryImage, None, 1)
This is the image obtained:
We can now apply the convolution. The approach is based on Mark Setchell's info in this post. The post mainly shows the method for finding end-points of a shape, but I extended it to also identify line intersections, such as the middle portion of the triangle. The main idea is that the convolution yields a very specific value where patterns of black and white pixels are found in the input image. Refer to the post for the theory behind this idea, but here, we are looking for two values: 110 and 40. The first one occurs when an end-point has been found, the second one when a line intersection is found. Let's set up the convolution:
# Threshold the skeleton so that white (skeleton) pixels get a value of 10
# and black (background) pixels a value of 0:
_, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)
# Set the convolution kernel:
h = np.array([[1, 1, 1],
              [1, 10, 1],
              [1, 1, 1]])
# Convolve the image with the kernel:
imgFiltered = cv2.filter2D(binaryImage, -1, h)
# Create list of thresholds:
thresh = [110, 40]
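To see where the value 110 comes from, here is a small NumPy-only sketch of the same weighted sum on a toy skeleton. It is a stand-in for cv2.filter2D, not the code used in this answer: the helper name neighbour_score is made up, and it pads with zeros, whereas filter2D uses a reflected border by default.

```python
import numpy as np

def neighbour_score(skeleton):
    """Weighted 3x3 sum: 10x the centre pixel plus 1x each of its 8 neighbours.

    skeleton: 2-D array with 10 on skeleton pixels and 0 elsewhere.
    """
    padded = np.pad(skeleton.astype(np.int32), 1)  # zero border (assumption)
    h, w = skeleton.shape
    score = 10 * padded[1:-1, 1:-1]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:  # skip the centre, it is already counted
                score = score + padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return score

# A short horizontal line segment: its two ends are the end-points.
skel = np.zeros((5, 7), dtype=np.uint8)
skel[2, 1:6] = 10
score = neighbour_score(skel)
end_points = np.argwhere(score == 110)  # (row, col) pairs
```

An end-point is a skeleton pixel (10 * 10 = 100) with exactly one skeleton neighbour (1 * 10 = 10), hence 110; interior pixels of the line have two neighbours and score 120.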
The first part is done. We are going to detect end-points and intersections in two separate steps. Each step will produce a partial result, and we can OR both results to get a final mask:
# Prepare the final mask of points:
(height, width) = binaryImage.shape
pointsMask = np.zeros((height, width, 1), np.uint8)
# Perform convolution and create points mask:
for t in range(len(thresh)):
    # Get current threshold:
    currentThresh = thresh[t]
    # Locate the threshold in the filtered image:
    tempMat = np.where(imgFiltered == currentThresh, 255, 0)
    # Convert and shape the image to a uint8 height x width x channels
    # numpy array:
    tempMat = tempMat.astype(np.uint8)
    tempMat = tempMat.reshape(height,width,1)
    # Accumulate mask:
    pointsMask = cv2.bitwise_or(pointsMask, tempMat)
This is the final mask of points:
Note that the white pixels are the locations that matched our target patterns. Those are the points we are looking for. As the shape is not a perfect triangle, some points could be duplicated. We can "merge" neighboring blobs by applying a morphological dilation:
# Set kernel (structuring element) size:
kernelSize = 7
# Set operation iterations:
opIterations = 3
# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform Dilate:
morphoImage = cv2.morphologyEx(pointsMask, cv2.MORPH_DILATE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
This is the result:
Very nice, we have now big clusters of pixels (or blobs). To get their coordinates, one possible approach would be to get the bounding rectangles of these contours and compute their centroids:
# Look for the outer contours (no children):
contours, _ = cv2.findContours(morphoImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Store the points here:
pointsList = []
# Loop through the contours:
for i, c in enumerate(contours):
    # Get the contours bounding rectangle:
    boundRect = cv2.boundingRect(c)
    # Get the centroid of the rectangle:
    cx = int(boundRect[0] + 0.5 * boundRect[2])
    cy = int(boundRect[1] + 0.5 * boundRect[3])
    # Store centroid into list:
    pointsList.append( (cx,cy) )
    # Set centroid circle and text:
    color = (0, 0, 255)
    cv2.circle(inputImageCopy, (cx, cy), 3, color, -1)
    font = cv2.FONT_HERSHEY_COMPLEX
    string = str(cx) + ", " + str(cy)
    cv2.putText(inputImageCopy, str(i) + ':' + string, (cx, cy), font, 0.5, (255, 0, 0), 1)
# Show image:
cv2.imshow("Circles", inputImageCopy)
cv2.waitKey(0)
These are the points located in the original input:
Note also that I've stored their coordinates in the pointsList list:
# Print the list of points:
print(pointsList)
This prints the centroids as the tuple (centroidX, centroidY):
[(717, 971), (22, 960), (183, 587), (568, 586), (388, 98)]

Pytorch gridsample - Different Resolutions?

Does PyTorch's grid_sample work for this particular case?
I have an image of size [B, 100, 200] that I want to map into a smaller [B, 50, 60] space. I have the 1-1 pixel mappings stored in a [B, 100, 200, 2] tensor, most of which I guess is 0.
Is this actually possible?
The answer is yes, it is possible! But not with grid_sample alone.
You can check the documentation here:
grid_sample: https://pytorch.org/docs/stable/nn.functional.html#grid-sample
interpolate: https://pytorch.org/docs/stable/nn.functional.html#interpolate
You will need the following:
import torch.nn.functional as F
warped_img = F.grid_sample(input, grid)
small_img = F.interpolate(warped_img, size=(50, 60))
input is the image you have, [B, 100, 200]. Note that F.grid_sample expects a 4-D input of shape [B, C, H, W], so add a channel dimension first (for example with input.unsqueeze(1));
grid is the 1-1 pixel mapping [B, 100, 200, 2]; its values should be normalized to the range [-1, 1], with (-1, -1) being the top-left corner. Notice that if most of the values are zeros, most output pixels will sample the center of the input image. That is because grid holds the absolute location to sample for each pixel, not a displacement.
warped_img = F.grid_sample(input, grid)
Now, to downsize the image you have to use F.interpolate
small_img = F.interpolate(warped_img, size=(50, 60))
Notice that you can also downsize first (not sure how that will impact the end-result). That is because the grid is normalized!
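import torch.nn.functional as F
# F.interpolate resizes the trailing spatial dims, so move the grid's
# coordinate axis to dim 1 before resizing and move it back afterwards
small_grid = F.interpolate(grid.permute(0, 3, 1, 2), size=(50, 60)).permute(0, 2, 3, 1)
warped_img = F.grid_sample(F.interpolate(input, size=(50, 60)), small_grid)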
Notice that grid is the end location of each pixel. And it is not the displacement. If you have a flow field (just the displacement of each pixel) you can turn that into a grid with the following:
import torch
import torch.nn.functional as F

def warp(img, flow, size):
    B, C, H, W = img.size()
    # mesh grid of pixel coordinates, shape [2, H, W] (x first, then y)
    grid_x = torch.arange(W, device=img.device)
    grid_y = torch.arange(H, device=img.device)
    yy, xx = torch.meshgrid(grid_y, grid_x)
    grid = torch.cat((xx.unsqueeze(0), yy.unsqueeze(0)), dim=0)
    # add the displacement (flow is [B, 2, H, W]) to get absolute locations
    vgrid = grid + flow.clamp(min=-1000., max=1000.)
    # scale grid to [-1,1]
    vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
    vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0
    vgrid = vgrid.permute(0, 2, 3, 1)
    # the scaling above maps pixel 0 to -1 and pixel W - 1 to 1, which is
    # the align_corners=True convention
    warped_img = F.grid_sample(img, vgrid, align_corners=True)
    small_img = F.interpolate(warped_img, size=size)
    return small_img
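Since the [-1, 1] normalization is the part that is easiest to get wrong, here is a small NumPy sketch of the pixel-index-to-grid conversion for both align_corners settings of grid_sample. The helper name normalize_coords is hypothetical; the formulas are my reading of how grid_sample interprets the grid, so treat them as an assumption to verify against the documentation linked above.

```python
import numpy as np

def normalize_coords(x, size, align_corners=True):
    """Map pixel indices in [0, size - 1] to grid_sample's [-1, 1] range."""
    x = np.asarray(x, dtype=np.float64)
    if align_corners:
        # -1 and 1 refer to the centres of the first and last pixels
        return 2.0 * x / max(size - 1, 1) - 1.0
    # -1 and 1 refer to the outer edges of the first and last pixels
    return (2.0 * x + 1.0) / size - 1.0
```

For a width of 200, pixel 0 and pixel 199 map exactly to -1 and 1 with align_corners=True, and to -0.995 and 0.995 with align_corners=False.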

Quiver plot with optical flow?

Recently I've been working on cloud motion tracking using images. Many video examples show a quiver plot that moves according to the tracked object.
The quiver documentation takes four main arguments ([X, Y], U, V), where X and Y are the starting points and U and V the directions. On the other hand, optical flow based on this example returns p1 (the displacements) with a shape (m, n, l) for an image of shape (200, 200). My confusion is about how to order the parameters, because goodFeaturesToTrack also returns the same as p1.
How can I join both components to plot a quiver of the cloud motion?
I found a pretty good solution. I explain all my example here using the Hamburg taxi sequence:
Download the taxi sequence.
$ curl -O ftp://ftp.ira.uka.de/pub/vid-text/image_sequences/taxi/taxi.zip
$ unzip -q taxi.zip
Get all images and pick two random frames
from pathlib import Path
import numpy as np
import cv2 as cv
from PIL import Image
import matplotlib.pyplot as plt
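taxis_fnames = sorted(Path('taxi').iterdir())
# pick a random start index, leaving room for the frame 4 steps later
rand_idx = np.random.randint(len(taxis_fnames) - 4)
taxi1 = Image.open(taxis_fnames[rand_idx])
taxi2 = Image.open(taxis_fnames[rand_idx + 4])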
Compute the optical flow
flow = cv.calcOpticalFlowFarneback(np.array(taxi1),
                                   np.array(taxi2),
                                   None, 0.5, 3, 15, 3, 5, 1.2, 0)
Plot the quiver
step = 3
# start the y positions at height - 1 so their count always matches flow[::step]
plt.quiver(np.arange(0, flow.shape[1], step), np.arange(flow.shape[0] - 1, -1, -step),
           flow[::step, ::step, 0], flow[::step, ::step, 1])
The step downsamples the number of optical flow vectors picked. The x positions go from 0 to the image width, while the y positions are reversed (otherwise the optical flow would be upside down), going from the image height down to 0. In some cases you may have to change the step so that the number of grid positions matches the number of sampled flow vectors.
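If the shapes of the positions and the sampled vectors ever disagree, one option is to build both from the same slices. This is a minimal sketch under that assumption; quiver_grid is a made-up helper name, and it leaves the y axis un-flipped (you could call plt.gca().invert_yaxis() instead of reversing the positions).

```python
import numpy as np

def quiver_grid(flow, step):
    """Return (x, y, u, v) sub-sampled every `step` pixels, all the same shape."""
    h, w = flow.shape[:2]
    # np.mgrid with the same start/stop/step as the slices below, so the
    # position arrays always match the sampled flow components
    ys, xs = np.mgrid[0:h:step, 0:w:step]
    u = flow[::step, ::step, 0]
    v = flow[::step, ::step, 1]
    return xs, ys, u, v
```

The four arrays can then be passed straight to plt.quiver(xs, ys, u, v).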
The resulting image:
Here is a general method for plotting a quiver field easily and accurately.
def plot_quiver(ax, flow, spacing, margin=0, **kwargs):
    """Plots less dense quiver field.
    Args:
        ax: Matplotlib axis
        flow: motion vectors
        spacing: space (px) between each arrow in grid
        margin: width (px) of enclosing region without arrows
        kwargs: quiver kwargs (default: angles="xy", scale_units="xy")
    """
    h, w, *_ = flow.shape
    nx = int((w - 2 * margin) / spacing)
    ny = int((h - 2 * margin) / spacing)
    x = np.linspace(margin, w - margin - 1, nx, dtype=np.int64)
    y = np.linspace(margin, h - margin - 1, ny, dtype=np.int64)
    flow = flow[np.ix_(y, x)]
    u = flow[:, :, 0]
    v = flow[:, :, 1]
    kwargs = {**dict(angles="xy", scale_units="xy"), **kwargs}
    ax.quiver(x, y, u, v, **kwargs)
    ax.set_ylim(sorted(ax.get_ylim(), reverse=True))
    ax.set_aspect("equal")
Example usage:
flow = cv2.calcOpticalFlowFarneback(
frame_1, frame_2, None, 0.5, 3, 15, 3, 5, 1.2, 0
)
fig, ax = plt.subplots()
plot_quiver(ax, flow, spacing=10, scale=1, color="#ff44ff")

Rotating 2D grayscale image with transformation matrix

I am new to image processing, so I am really confused regarding the coordinate system of images. I have a sample image and I am trying to rotate it 45 degrees clockwise. My transformation matrix is T = [ [cos45 sin45] [-sin45 cos45] ]
Here is the code:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io
image = io.imread('sample_image')
img_transformed = np.zeros((image.shape), dtype=np.uint8)
trans_matrix = np.array([[np.cos(45), np.sin(45)], [-np.sin(45), np.cos(45)]])
for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of pixel at corresponding location
        input_coord = np.array([i, j])  # this will be my [x,y] matrix
        result = trans_matrix @ input_coord
        i_out, j_out = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data
plt.imshow(img_transformed, cmap='gray')
The image comes out distorted and doesn't seem right. I know that in pixel coordinates, the origin is at the top left corner (row, column). Is the rotation happening with respect to the origin at the top left corner? Is there a way to shift the origin to the center or any other given point?
Thank you all!
Yes, as you suspect, the rotation is happening with respect to the top left corner, which has coordinates (0, 0). (Also: the NumPy trigonometric functions use radians rather than degrees, so you need to convert your angle.) To compute a rotation with respect to the center, you use a little trick: you compute the transformation for moving the image so that it is centered on (0, 0), then you rotate it, then you move the result back. You need to combine these transformations into a single matrix, because if you apply them to the image one after the other, you'll lose everything that lands at negative coordinates in the intermediate steps.
It's much, much easier to do this using Homogeneous coordinates, which add an extra "dummy" dimension to your image. Here's what your code would look like in homogeneous coordinates:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io
image = io.imread('sample_image')
img_transformed = np.zeros((image.shape), dtype=np.uint8)
c, s = np.cos(np.radians(45)), np.sin(np.radians(45))
rot_matrix = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
x, y = np.array(image.shape) // 2
# move center to (0, 0)
translate1 = np.array([[1, 0, -x], [0, 1, -y], [0, 0, 1]])
# move center back to (x, y)
translate2 = np.array([[1, 0, x], [0, 1, y], [0, 0, 1]])
# compose all three transformations together
trans_matrix = translate2 @ rot_matrix @ translate1
for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of pixel at corresponding location
        input_coord = np.array([i, j, 1])  # homogeneous [i, j, 1] coordinate
        result = trans_matrix @ input_coord
        i_out, j_out, _ = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data
plt.imshow(img_transformed, cmap='gray')
The above should work ok, but you will probably get some black spots due to aliasing. What can happen is that no coordinates i, j from the input land exactly on an output pixel, so that pixel never gets updated. Instead, what you need to do is iterate over the pixels of the output image, then use the inverse transform to find which pixel in the input image maps closest to that output pixel. Something like:
inverse_tform = np.linalg.inv(trans_matrix)
for i, j in np.ndindex(img_transformed.shape):
    i_orig, j_orig, _ = np.round(inverse_tform @ [i, j, 1]).astype(int)
    if i_orig in range(image.shape[0]) and j_orig in range(image.shape[1]):
        img_transformed[i, j] = image[i_orig, j_orig]
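A quick way to convince yourself that the composed matrix really rotates about the center is to check that the center maps to itself. This is a small sketch using the same matrix layout as above; the helper name rotation_about and the example center (50, 50) are just for illustration.

```python
import numpy as np

def rotation_about(center, degrees):
    """3x3 homogeneous matrix rotating (row, col) coordinates about `center`."""
    ci, cj = center
    c, s = np.cos(np.radians(degrees)), np.sin(np.radians(degrees))
    rotate = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
    to_origin = np.array([[1, 0, -ci], [0, 1, -cj], [0, 0, 1]])
    back = np.array([[1, 0, ci], [0, 1, cj], [0, 0, 1]])
    # rightmost matrix is applied first: shift, rotate, shift back
    return back @ rotate @ to_origin

T = rotation_about((50, 50), 90)
center_out = T @ np.array([50, 50, 1])  # stays at (50, 50)
moved_out = T @ np.array([50, 60, 1])   # 10 columns right of center moves to 10 rows below it
```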
Hope this helps!

Why does contourf (matplotlib) switch x and y coordinates?

I am trying to get contourf to plot my stuff right, but it seems to switch the x and y coordinates. In the example below, I show this by evaluating a 2d Gaussian function that has different widths in x and y directions. With the values given, the width in y direction should be larger. Here is the script:
from numpy import *
from matplotlib.pyplot import *
xMax = 50
xNum = 100
w0x = 10
w0y = 15
dx = xMax/xNum
xGrid = linspace(-xMax/2+dx/2, xMax/2-dx/2, xNum, endpoint=True)
yGrid = xGrid
Int = zeros((xNum, xNum))
for idX in range(xNum):
    for idY in range(xNum):
        Int[idX, idY] = exp(-((xGrid[idX]/w0x)**2 + (yGrid[idY]/(w0y))**2))
fig = figure(6)
clf()
ax = subplot(2,1,1)
X, Y = meshgrid(xGrid, yGrid)
contour(X, Y, Int, colors='k')
plot(array([-xMax, xMax])/2, array([0, 0]), '-b')
plot(array([0, 0]), array([-xMax, xMax])/2, '-r')
ax.set_aspect('equal')
xlabel("x")
ylabel("y")
subplot(2,1,2)
plot(xGrid, Int[:, int(xNum/2)], '-b', label='I(x, y=max/2)')
plot(xGrid, Int[int(xNum/2), :], '-r', label='I(x=max/2, y)')
ax.set_aspect('equal')
legend()
xlabel(r"x or y")
ylabel(r"I(x or y)")
The figure thrown out is this:
On top the contour plot which has the larger width in x direction (not y). Below are slices shown, one across x direction (at constant y=0, blue), the other in y direction (at constant x=0, red). Here, everything seems fine, the y direction is broader than the x direction. So why would I have to transpose the array in order to have it plotted as I want? This seems unintuitive to me and not in agreement with the documentation.
It helps if you think of a 2D array's shape not as (x, y) but as (rows, columns), because that is how most math routines interpret them - including matplotlib's 2D plotting functions. Therefore, the first dimension is vertical (which you call y) and the second dimension is horizontal (which you call x).
Note that this convention is very prominent, even in numpy. The function np.vstack, which is supposed to concatenate arrays vertically, works along the first dimension, and np.hstack works horizontally along the second dimension.
To illustrate the point:
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])
a[:, 2] = 2 # set column
print(a)
plt.imshow(a)
plt.contour(a, colors='k')
This prints
[[0 0 2 0 0]
[0 1 2 1 0]
[1 1 2 1 1]]
and consistently plots
According to your convention that an array is (x, y) the command a[:, 2] = 2 should have assigned to the third row, but numpy and matplotlib both agree that it was the column :)
You can of course use your own convention how to interpret the dimensions of your arrays, but in the long run it will be more consistent to treat them as (y, x).
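Applied to the script in the question, this means an array filled as Int[idX, idY] is the transpose of what contour(X, Y, ...) expects when X, Y come from meshgrid with its default indexing. Here is a small sketch with made-up grid sizes (5 samples in x, 4 in y) so that the two shapes are easy to tell apart:

```python
import numpy as np

x = np.linspace(-25, 25, 5)   # 5 samples along x
y = np.linspace(-25, 25, 4)   # 4 samples along y
X, Y = np.meshgrid(x, y)      # default indexing="xy": shape (len(y), len(x))
Z = np.exp(-((X / 10) ** 2 + (Y / 15) ** 2))

# Same function filled in with the (x, y) index order used in the question.
Z_loop = np.zeros((len(x), len(y)))
for ix in range(len(x)):
    for iy in range(len(y)):
        Z_loop[ix, iy] = np.exp(-((x[ix] / 10) ** 2 + (y[iy] / 15) ** 2))
```

Z has shape (4, 5), i.e. (rows, columns) = (y, x), and equals Z_loop.T, so either compute the array directly from X and Y or pass the transpose to contour.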
