Rotating 2D grayscale image with transformation matrix - python-3.x

I am new to image processing, so I am really confused about the coordinate system with images. I have a sample image and I am trying to rotate it 45° clockwise. My transformation matrix is T = [[cos45, sin45], [-sin45, cos45]]
Here is the code:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io

image = io.imread('sample_image')
img_transformed = np.zeros((image.shape), dtype=np.uint8)
trans_matrix = np.array([[np.cos(45), np.sin(45)], [-np.sin(45), np.cos(45)]])
for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of the pixel at the corresponding location
        input_coord = np.array([i, j])  # this will be my [x, y] vector
        result = trans_matrix @ input_coord
        i_out, j_out = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data

plt.imshow(img_transformed, cmap='gray')
The image comes out distorted and doesn't seem right. I know that in pixel coordinates the origin is at the top left corner (row, column). Is the rotation happening with respect to the origin at the top left corner? Is there a way to shift the origin to the center, or to any other given point?
Thank you all!

Yes, as you suspect, the rotation is happening with respect to the top left corner, which has coordinates (0, 0). (Also: the NumPy trigonometric functions use radians rather than degrees, so you need to convert your angle.) To compute a rotation with respect to the center, you do a little hack: you compute the transformation for moving the image so that it is centered on (0, 0), then you rotate it, then you move the result back. You need to compose these transformations into a single matrix before applying it, because if you apply them to the image one after the other, you'll lose everything that falls in negative coordinates.
It's much, much easier to do this using Homogeneous coordinates, which add an extra "dummy" dimension to your image. Here's what your code would look like in homogeneous coordinates:
import numpy as np
from matplotlib import pyplot as plt
from skimage import io

image = io.imread('sample_image')
img_transformed = np.zeros((image.shape), dtype=np.uint8)
c, s = np.cos(np.radians(45)), np.sin(np.radians(45))
rot_matrix = np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])
x, y = np.array(image.shape) // 2

# move center to (0, 0)
translate1 = np.array([[1, 0, -x], [0, 1, -y], [0, 0, 1]])
# move center back to (x, y)
translate2 = np.array([[1, 0, x], [0, 1, y], [0, 0, 1]])
# compose all three transformations together
trans_matrix = translate2 @ rot_matrix @ translate1

for i, row in enumerate(image):
    for j, col in enumerate(row):
        pixel_data = image[i, j]  # get the value of the pixel at the corresponding location
        input_coord = np.array([i, j, 1])  # homogeneous coordinates of the pixel
        result = trans_matrix @ input_coord
        i_out, j_out, _ = result  # store the resulting coordinate location
        # make sure the i and j values remain within the index range
        if (0 < int(i_out) < image.shape[0]) and (0 < int(j_out) < image.shape[1]):
            img_transformed[int(i_out)][int(j_out)] = pixel_data

plt.imshow(img_transformed, cmap='gray')
The above should work ok, but you will probably get some black spots due to aliasing. What can happen is that no coordinates i, j from the input land exactly on an output pixel, so that pixel never gets updated. Instead, what you need to do is iterate over the pixels of the output image, then use the inverse transform to find which pixel in the input image maps closest to that output pixel. Something like:
inverse_tform = np.linalg.inv(trans_matrix)
for i, j in np.ndindex(img_transformed.shape):
    i_orig, j_orig, _ = np.round(inverse_tform @ [i, j, 1]).astype(int)
    if i_orig in range(image.shape[0]) and j_orig in range(image.shape[1]):
        img_transformed[i, j] = image[i_orig, j_orig]
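As a sanity check (not part of the original answer), you can compare your result against scikit-image's built-in rotate, which performs exactly this kind of inverse mapping with interpolation about the image center. A minimal sketch, reusing the image and plt from above:

from skimage.transform import rotate

# skimage's rotate() takes the angle in degrees (counter-clockwise, so -45
# for a 45° clockwise turn), rotates about the image center by default, and
# returns a float image; preserve_range keeps the 0-255 scale before casting.
rotated = rotate(image, angle=-45, preserve_range=True).astype(np.uint8)
plt.imshow(rotated, cmap='gray')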
Hope this helps!

Related

Image intensity distribution changes during opencv warp affine

I am using Python 3.8.5 and OpenCV 4.5.1 on Windows 7.
I am using the following code to rotate images.
def pad_rotate(image, ang, pad, pad_value=0):
    (h, w) = image.shape[:2]
    # create larger image and paste original image at the center.
    # this is done to avoid any cropping during rotation
    nH, nW = h + 2*pad, w + 2*pad  # new height and width
    cY, cX = nW//2, nH//2  # center of the new image
    # create new image filled with pad_value
    newImg = np.zeros((h + 2*pad, w + 2*pad), dtype=image.dtype)
    newImg[:, :] = pad_value
    # paste original image at the center
    newImg[pad:pad+h, pad:pad+w] = image
    # rotate CCW (for positive angles)
    M = cv2.getRotationMatrix2D(center=(cX, cY), angle=ang, scale=1.0)
    rotImg = cv2.warpAffine(newImg, M, (nW, nH), cv2.INTER_CUBIC,
                            borderMode=cv2.BORDER_CONSTANT, borderValue=pad_value)
    return rotImg
My issue is that after the rotation, the image intensity distribution is different from the original.
The following part of the question was edited to clarify the issue.
img = np.random.rand(500,500)
Rimg = pad_rotate(img, 15, 300, np.nan)
Here is what these images look like:
Their intensities have clearly shifted:
np.percentile(img, [20, 50, 80])
# prints array([0.20061218, 0.50015415, 0.79989986])
np.nanpercentile(Rimg, [20, 50, 80])
# prints array([0.32420028, 0.50031483, 0.67656537])
Can someone please tell me how to avoid this normalization?
The averaging effect of the interpolation changes the distribution...
Note:
There is a mistake in your code sample (not related to the percentiles).
The 4th argument of warpAffine is dst.
Replace cv2.warpAffine(newImg, M, (nW, nH), cv2.INTER_CUBIC with:
cv2.warpAffine(newImg, M, (nW, nH), flags=cv2.INTER_CUBIC
I tried to simplify the code sample that reproduces the problem.
The code sample uses linear interpolation, 1 degree rotation, and no NaN values.
import numpy as np
import cv2
img = np.random.rand(1000, 1000)
M = cv2.getRotationMatrix2D((img.shape[1]//2, img.shape[0]//2), 1, 1) # Rotate by 1 degree
Rimg = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]), flags=cv2.INTER_LINEAR) # Use Linear interpolation
Rimg = Rimg[20:-20, 20:-20] # Crop the part without the margins.
print(np.percentile(img, [20, 50, 80])) #[0.20005696 0.49990526 0.79954818]
print(np.percentile(Rimg, [20, 50, 80])) #[0.32244747 0.4998595 0.67698961]
cv2.imshow('img', img)
cv2.imshow('Rimg', Rimg)
cv2.waitKey()
cv2.destroyAllWindows()
When we disable the interpolation,
Rimg = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]), flags=cv2.INTER_NEAREST)
The percentiles are: [0.19943713 0.50004768 0.7995525 ].
A simpler example showing that averaging elements changes the distribution:
A = np.random.rand(10000000)
B = (A[0:-1:2] + A[1::2])/2 # Averaging every two elements.
print(np.percentile(A, [20, 50, 80])) # [0.19995436 0.49999472 0.80007232]
print(np.percentile(B, [20, 50, 80])) # [0.31617922 0.50000145 0.68377251]
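If it helps to see the effect rather than read the numbers, plotting the histograms of A and B makes it obvious (a small sketch along the lines of the snippet above, not part of the original answer):

import numpy as np
from matplotlib import pyplot as plt

A = np.random.rand(10000000)
B = (A[0:-1:2] + A[1::2]) / 2  # averaging every two elements

# A stays flat (uniform), while B piles up around 0.5 (triangular),
# which is exactly the shift visible in the percentiles above.
plt.hist(A, bins=100, density=True, alpha=0.5, label='A (original)')
plt.hist(B, bins=100, density=True, alpha=0.5, label='B (averaged)')
plt.legend()
plt.show()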
Why does interpolation skew the distribution toward the median?
I am not a mathematician.
I am sure you can get a better explanation...
Here is an intuitive example:
Assume there is a list of values with uniform distribution in the range [0, 1].
Assume there is a zero value in the list:
[0.2, 0.7, 0, 0.5... ]
After averaging every two sequential elements, the probability of getting a zero element in the output list is very small (only two sequential zeros result in a zero).
The example shows that averaging pushes the extreme values toward the center.
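For what it's worth (this is not part of the original answer), the shift matches theory exactly: the average of two independent uniform [0, 1] samples follows a triangular distribution, whose 20th and 80th percentiles land at √0.1 ≈ 0.316 and 1 − √0.1 ≈ 0.684, just as observed above:

import numpy as np

# The mean of two independent U(0, 1) samples has a triangular distribution:
# CDF F(x) = 2*x**2 for x <= 0.5 and F(x) = 1 - 2*(1 - x)**2 for x >= 0.5.
p20 = np.sqrt(0.2 / 2)      # solve 2*x**2 = 0.2          -> ~0.3162
p80 = 1 - np.sqrt(0.2 / 2)  # solve 1 - 2*(1 - x)**2 = 0.8 -> ~0.6838
print(p20, p80)  # matches the 20th/80th percentiles of B printed above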

How can I get the inner contour points without redundancy in OpenCV - Python

I'm new to OpenCV, and the thing is that I need to get all the contour points. This is easy by setting the cv2.RETR_TREE mode in the findContours method. The problem is that, done this way, it returns redundant coordinates. So, for example, in this polygon, I don't want to get the contour points like this:
But like this:
So, according to the first image, the green color marks the contours detected with RETR_TREE mode, and points 1-2, 3-5, 4-6, ... are redundant, because they are so close to each other. I need to merge those redundant points into one and append it to the customContours array.
For the moment, I only have the code corresponding to the first picture, which sets up the distance between the points and the point coordinates:
def getContours(img, minArea=20000, cThr=[100, 100]):
    font = cv2.FONT_HERSHEY_COMPLEX
    imgColor = img
    imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    imgBlur = cv2.GaussianBlur(imgGray, (5, 5), 1)
    imgCanny = cv2.Canny(imgBlur, cThr[0], cThr[1])
    kernel = np.ones((5, 5))
    imgDial = cv2.dilate(imgCanny, kernel, iterations=3)
    imgThre = cv2.erode(imgDial, kernel, iterations=2)
    cv2.imshow('threshold', imgThre)
    contours, hierachy = cv2.findContours(imgThre, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    customContours = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > minArea:
            peri = cv2.arcLength(cnt, True)
            approx = cv2.approxPolyDP(cnt, 0.009*peri, True)
            bbox = cv2.boundingRect(approx)
            customContours.append([len(approx), area, approx, bbox, cnt])
            print('points: ', len(approx))
            n = approx.ravel()
            i = 0
            for j in n:
                if i % 2 == 0:
                    x = n[i]
                    y = n[i + 1]
                    string = str(x) + " " + str(y)
                    cv2.putText(imgColor, str(i//2+1) + ': ' + string, (x, y), font, 2, (0, 0, 0), 2)
                i = i + 1
    customContours = sorted(customContours, key=lambda x: x[1], reverse=True)
    for cnt in customContours:
        cv2.drawContours(imgColor, [cnt[2]], 0, (0, 0, 255), 5)
    return imgColor, customContours
Could you help me get the actual points, i.e. as in the second picture?
(EDIT 01/07/21)
I want a generic solution, because the image could be more complex, such as the following picture:
NOTE: notice that the middle arrow (points 17 and 18) doesn't have a closed area, so it isn't a polygon to study and we are not interested in obtaining its points. Also, notice that the order of the points isn't important, but if the input is the whole image, it should know that there are 4 polygons, so for each polygon the points start with 0, then 1, etc.
Here's my approach. It is mainly morphology-based. It involves convolving the image with a special kernel. This convolution identifies the end-points of the triangle as well as the intersection points where the middle line is present. This results in a points mask containing the pixels that match the points you are looking for. After that, we can apply a little bit of morphology to join possibly duplicated points. What remains is to get a list of the coordinates of these points for further processing.
These are the steps:
Get a binary image of the input via Otsu's thresholding
Get the skeleton of the binary image
Define the special kernel and convolve the skeleton image
Apply a morphological dilate to join possible duplicated points
Get the centroids of the points and store them in a list
Here's the code:
# Imports:
import numpy as np
import cv2
# image path
path = "D://opencvImages//"
fileName = "triangle.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Prepare a deep copy for results:
inputImageCopy = inputImage.copy()
# Convert BGR to Grayscale
grayImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
_, binaryImage = cv2.threshold(grayImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
The first bit computes the binary image. Very straightforward. I'm using this image as base, which is just a cleaned-up version of what you posted without the annotations. This is the resulting binary image:
Now, to perform the convolution we must first get the image "skeleton". The skeleton is a version of the binary image where lines have been normalized to have a width of 1 pixel. This is useful because we can then convolve the image with a 3 x 3 kernel and look for specific pixel patterns. Let's compute the skeleton using OpenCV's extended image processing module:
# Get image skeleton:
skeleton = cv2.ximgproc.thinning(binaryImage, None, 1)
This is the image obtained:
We can now apply the convolution. The approach is based on Mark Setchell's info on this post. The post mainly shows the method for finding end-points of a shape, but I extended it to also identify line intersections, such as the middle portion of the triangle. The main idea is that the convolution yields a very specific value where certain patterns of black and white pixels are found in the input image. Refer to the post for the theory behind this idea, but here we are looking for two values: 110 and 40. The first one occurs when an end-point has been found (the center line pixel contributes 10 × 10 = 100 and its single line neighbor adds another 10), the second one when a line intersection is found. Let's set up the convolution:
# Threshold the skeleton so that background pixels get a value of 0 and
# line (white) pixels a value of 10:
_, binaryImage = cv2.threshold(skeleton, 128, 10, cv2.THRESH_BINARY)

# Set the convolution kernel:
h = np.array([[1, 1, 1],
              [1, 10, 1],
              [1, 1, 1]])
# Convolve the image with the kernel:
imgFiltered = cv2.filter2D(binaryImage, -1, h)
# Create list of thresholds:
thresh = [110, 40]
The first part is done. We are going to detect end-points and intersections in two separate steps. Each step will produce a partial result, and we can OR both results to get the final mask:
# Prepare the final mask of points:
(height, width) = binaryImage.shape
pointsMask = np.zeros((height, width, 1), np.uint8)

# Perform convolution and create points mask:
for t in range(len(thresh)):
    # Get current threshold:
    currentThresh = thresh[t]
    # Locate the threshold in the filtered image:
    tempMat = np.where(imgFiltered == currentThresh, 255, 0)
    # Convert and shape the image to a uint8 height x width x channels
    # numpy array:
    tempMat = tempMat.astype(np.uint8)
    tempMat = tempMat.reshape(height, width, 1)
    # Accumulate mask:
    pointsMask = cv2.bitwise_or(pointsMask, tempMat)
This is the final mask of points:
Note that the white pixels are the locations that matched our target patterns. Those are the points we are looking for. As the shape is not a perfect triangle, some points could be duplicated. We can "merge" neighboring blobs by applying a morphological dilation:
# Set kernel (structuring element) size:
kernelSize = 7
# Set operation iterations:
opIterations = 3
# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform Dilate:
morphoImage = cv2.morphologyEx(pointsMask, cv2.MORPH_DILATE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
This is the result:
Very nice, we have now big clusters of pixels (or blobs). To get their coordinates, one possible approach would be to get the bounding rectangles of these contours and compute their centroids:
# Look for the outer contours (no children):
contours, _ = cv2.findContours(morphoImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Store the points here:
pointsList = []

# Loop through the contours:
for i, c in enumerate(contours):
    # Get the contour's bounding rectangle:
    boundRect = cv2.boundingRect(c)
    # Get the centroid of the rectangle:
    cx = int(boundRect[0] + 0.5 * boundRect[2])
    cy = int(boundRect[1] + 0.5 * boundRect[3])
    # Store centroid into list:
    pointsList.append((cx, cy))
    # Set centroid circle and text:
    color = (0, 0, 255)
    cv2.circle(inputImageCopy, (cx, cy), 3, color, -1)
    font = cv2.FONT_HERSHEY_COMPLEX
    string = str(cx) + ", " + str(cy)
    cv2.putText(inputImageCopy, str(i) + ':' + string, (cx, cy), font, 0.5, (255, 0, 0), 1)

# Show image:
cv2.imshow("Circles", inputImageCopy)
cv2.waitKey(0)
These are the points located in the original input:
Note also that I've stored their coordinates in the pointsList list:
# Print the list of points:
print(pointsList)
This prints the centroids as the tuple (centroidX, centroidY):
[(717, 971), (22, 960), (183, 587), (568, 586), (388, 98)]

Flip with any degree of angle?

This is a Python example of flipping an image together with its corresponding bounding boxes.
Is it possible to write this as a flip at a 20 degree angle, or at 20, 25, or 45 degrees?
How can I change this code to handle any angle?
import torchvision.transforms.functional as FT

def flip(image, boxes):
    """
    Flip image horizontally.
    :param image: image, a PIL Image
    :param boxes: bounding boxes in boundary coordinates, a tensor of dimensions (n_objects, 4)
    :return: flipped image, updated bounding box coordinates
    """
    # Flip image
    new_image = FT.hflip(image)
    # print('new_image->', new_image)

    # Flip boxes
    new_boxes = boxes
    new_boxes[:, 0] = image.width - boxes[:, 0] - 1
    new_boxes[:, 2] = image.width - boxes[:, 2] - 1
    new_boxes = new_boxes[:, [2, 1, 0, 3]]
    return new_image, new_boxes
You can use torchvision.transforms.functional.rotate to rotate an image by an angle, e.g.:
import torchvision.transforms.functional as FT
image = FT.rotate(image, angle)
You can also randomly select the angle, either using
import random
# randomly select an angle between 20 and 45
angle = random.randint(20, 45)
or using torchvision.transforms.RandomRotation, which selects a random rotation between min and max degrees:
loader_transform = torchvision.transforms.RandomRotation(degrees=(min, max))
img = loader_transform(img)
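As a usage note (not in the original answer), the random rotation can be dropped straight into a transform pipeline; a minimal sketch, assuming img is a PIL Image as in the question:

import torchvision.transforms as T

# rotate by a random angle between 20 and 45 degrees, then convert to a tensor
loader_transform = T.Compose([
    T.RandomRotation(degrees=(20, 45)),
    T.ToTensor(),
])
img_tensor = loader_transform(img)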

Apply an affine transform to a bounding rectangle

I am working on a pedestrian tracking algorithm using Python3 & OpenCV.
We can use SIFT keypoints as an identifier of a pedestrian silhouette on a frame and then perform brute force matching between two sets of SIFT keypoints (i.e. between one frame and the next one) to find the pedestrian in the next frame.
To visualize this on the sequence of frames, we can draw a bounding rectangle delimiting the pedestrian. This is what it looks like:
The main problem is about characterizing the motion of the pedestrian using the keypoints. The idea here is to find an affine transform (that is, translation in x & y, rotation & scaling) using the coordinates of the keypoints on 2 successive frames. Ideally, this affine transform somehow corresponds to the motion of the pedestrian. To track this pedestrian, we would then just have to apply the same affine transform to the bounding rectangle coordinates.
That last part doesn't work well. The rectangle consistently shrinks over several frames until it inevitably disappears, or drifts away from the pedestrian, as you can see below or in the previous image:
To be specific, we characterize the bounding rectangle with 2 extreme points:
There are some built-in cv2 functions that can apply an affine transform to an image, like cv2.warpAffine(), but I want to apply it only to the bounding rectangle coordinates (i.e. 2 points, or 1 point + width & height).
To find the affine transform between the 2 sets of keypoints, I've written my own function (I can post the code if it helps), but I've observed similar results when using cv2.getAffineTransform(), for instance.
Do you know how to properly apply an affine transform to this bounding rectangle?
EDIT : here’s some explanation & code for better context :
The pedestrian detection is done with the pre-trained SVM classifier available in openCV : hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector()) & hog.detectMultiScale()
Once a first pedestrian is detected, the SVM returns the coordinates of the associated bounding rectangle (xA, yA, w, h) (we stop using the SVM after the 1st detection as it is quite slow, and we are focusing on one pedestrian for now)
We select the corresponding region of the current frame with image[yA: yA+h, xA: xA+w] and search for SURF keypoints within it with surf.detectAndCompute()
This returns the keypoints & their associated descriptors (an array of 64 characteristics for each keypoint)
We perform brute force matching, based on the L2-norm between the descriptors and the distance in pixels between the keypoints to construct pairs of keypoints between the current frame & the previous one. The code for this function is pretty long, but should be similar to cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
Once we have the matched pairs of keypoints, we can use them to find the affine transform with this function :
previousKpts = previousKpts[:5]  # select best matches
currentKpts = currentKpts[:5]

# build A matrix of shape [2 * Nb of keypoints, 4]
A = np.ndarray(((2 * len(previousKpts), 4)))
for idx, keypoint in enumerate(previousKpts):
    # Keypoint.pt = (x-coord, y-coord)
    A[2 * idx, :] = [keypoint.pt[0], -keypoint.pt[1], 1, 0]
    A[2 * idx + 1, :] = [keypoint.pt[1], keypoint.pt[0], 0, 1]

# build b matrix of shape [2 * Nb of keypoints, 1]
b = np.ndarray((2 * len(previousKpts), 1))
for idx, keypoint in enumerate(currentKpts):
    b[2 * idx, :] = keypoint.pt[0]
    b[2 * idx + 1, :] = keypoint.pt[1]

# convert the numpy.ndarrays to matrix:
A = np.matrix(A)
b = np.matrix(b)

# solution of the form x = [x1, x2, x3, x4]' = ((A' * A)^-1) * A' * b
x = np.linalg.inv(A.T * A) * A.T * b

theta = math.atan2(x[1, 0], x[0, 0])  # outputs rotation angle in [-pi, pi]
alpha = math.sqrt(x[0, 0] ** 2 + x[1, 0] ** 2)  # scaling parameter
bx = x[2, 0]  # translation along x-axis
by = x[3, 0]  # translation along y-axis

return theta, alpha, bx, by
We then just have to apply the same affine transform to the corner points of the bounding rectangle :
# define the 4 bounding points using xA, yA
xB = xA + w
yB = yA + h
rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)

# warp the affine transform into a full perspective transform
affine_warp = np.array([[alpha*np.cos(theta), -alpha*np.sin(theta), bx],
                        [alpha*np.sin(theta),  alpha*np.cos(theta), by],
                        [0, 0, 1]], dtype=np.float32)

# apply the affine transform
rect_pts = cv2.perspectiveTransform(rect_pts, affine_warp)
xA = rect_pts[0, 0, 0]
yA = rect_pts[0, 0, 1]
xB = rect_pts[3, 0, 0]
yB = rect_pts[3, 0, 1]
return xA, yA, xB, yB
Save the updated rectangle coordinates (xA, yA, xB, yB) plus all current keypoints & descriptors, and iterate over the next frame: select image[yA: yB, xA: xB] using the (xA, yA, xB, yB) we previously saved, get SURF keypoints, etc.
As Micka suggested, cv2.perspectiveTransform() is an easy way to accomplish this. You'll just need to turn your affine warp into a full perspective transform (homography) by adding a third row at the bottom with the values [0, 0, 1]. For example, let's put a box with w, h = 100, 200 at the point (10, 20) and then use an affine transformation to shift the points so that the box is moved to (0, 0) (i.e. shift 10 pixels to the left and 20 pixels up):
>>> xA, yA, w, h = (10, 20, 100, 200)
>>> xB, yB = xA + w, yA + h
>>> rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]], dtype=np.float32)
>>> affine_warp = np.array([[1, 0, -10], [0, 1, -20], [0, 0, 1]], dtype=np.float32)
>>> cv2.perspectiveTransform(rect_pts, affine_warp)
array([[[   0.,    0.]],
       [[ 100.,    0.]],
       [[   0.,  200.]],
       [[ 100.,  200.]]], dtype=float32)
So that works perfectly as expected. You could also simply transform the points yourself with matrix multiplication:
>>> rect_pts.dot(affine_warp[:2, :2].T) + affine_warp[:2, 2]
array([[[   0.,    0.]],
       [[ 100.,    0.]],
       [[   0.,  200.]],
       [[ 100.,  200.]]], dtype=float32)
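As a side note (not part of the original answer), OpenCV can also estimate a similarity transform (rotation, uniform scale, translation) directly from the matched keypoints, which avoids building the least-squares system by hand. A rough sketch, where previous_pts and current_pts are hypothetical N x 2 float32 arrays of matched keypoint coordinates (placeholder names, not from the question's code):

import numpy as np
import cv2

# estimate a 2x3 similarity matrix (with outlier rejection) from the matches
M, inliers = cv2.estimateAffinePartial2D(previous_pts, current_pts)

# apply the estimated 2x3 matrix to the bounding-rectangle corners
rect_pts = np.array([[[xA, yA]], [[xB, yA]], [[xA, yB]], [[xB, yB]]],
                    dtype=np.float32)
new_pts = cv2.transform(rect_pts, M)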

Why does contourf (matplotlib) switch x and y coordinates?

I am trying to get contourf to plot my stuff right, but it seems to switch the x and y coordinates. In the example below, I show this by evaluating a 2d Gaussian function that has different widths in x and y directions. With the values given, the width in y direction should be larger. Here is the script:
from numpy import *
from matplotlib.pyplot import *
xMax = 50
xNum = 100
w0x = 10
w0y = 15
dx = xMax/xNum
xGrid = linspace(-xMax/2+dx/2, xMax/2-dx/2, xNum, endpoint=True)
yGrid = xGrid
Int = zeros((xNum, xNum))
for idX in range(xNum):
    for idY in range(xNum):
        Int[idX, idY] = exp(-((xGrid[idX]/w0x)**2 + (yGrid[idY]/(w0y))**2))
fig = figure(6)
clf()
ax = subplot(2,1,1)
X, Y = meshgrid(xGrid, yGrid)
contour(X, Y, Int, colors='k')
plot(array([-xMax, xMax])/2, array([0, 0]), '-b')
plot(array([0, 0]), array([-xMax, xMax])/2, '-r')
ax.set_aspect('equal')
xlabel("x")
ylabel("y")
subplot(2,1,2)
plot(xGrid, Int[:, int(xNum/2)], '-b', label='I(x, y=max/2)')
plot(xGrid, Int[int(xNum/2), :], '-r', label='I(x=max/2, y)')
ax.set_aspect('equal')
legend()
xlabel(r"x or y")
ylabel(r"I(x or y)")
The figure thrown out is this:
On top is the contour plot, which has the larger width in the x direction (not y). Below, slices are shown: one along the x direction (at constant y=0, blue), the other along the y direction (at constant x=0, red). Here everything seems fine: the y direction is broader than the x direction. So why would I have to transpose the array in order to have it plotted as I want? This seems unintuitive to me and not in agreement with the documentation.
It helps if you think of a 2D array's shape not as (x, y) but as (rows, columns), because that is how most math routines interpret them - including matplotlib's 2D plotting functions. Therefore, the first dimension is vertical (which you call y) and the second dimension is horizontal (which you call x).
Note that this convention is very prominent, even in NumPy: np.vstack, which concatenates arrays vertically, works along the first dimension, and np.hstack, which stacks them horizontally, works along the second dimension.
To illustrate the point:
import numpy as np
import matplotlib.pyplot as plt

a = np.array([[0, 0, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [1, 1, 1, 1, 1]])
a[:, 2] = 2  # set column
print(a)
plt.imshow(a)
plt.contour(a, colors='k')
This prints
[[0 0 2 0 0]
[0 1 2 1 0]
[1 1 2 1 1]]
and consistently plots
According to your convention that an array is (x, y) the command a[:, 2] = 2 should have assigned to the third row, but numpy and matplotlib both agree that it was the column :)
You can of course use your own convention how to interpret the dimensions of your arrays, but in the long run it will be more consistent to treat them as (y, x).
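Applied to the script in the question, that means either filling the array as Int[idY, idX] inside the loop or transposing it when plotting. A minimal tweak to the existing contour call (keeping the question's pylab-style imports):

# Int was filled as Int[idX, idY] (first axis = x), while contour()
# expects Z[row, column] = Z[y, x], so pass the transpose:
contour(X, Y, Int.T, colors='k')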
