I'd like to use the node indico API. I need to convert the image to grayscale and then to arrays containing arrays/rows of pixel values. Where do I start?
These tools take a specific format for images, a list of lists, each
sub-list containing a 'row' of values corresponding to n pixels in the
image.
e.g. [[float, float, float ... *n ], [float, float, float ... *n ], ... *n]
Since pixels tend to be represented by RGBA values, you can use the
following formula to convert to grayscale.
Y = (0.2126 * R + 0.7152 * G + 0.0722 * B) * A
We're working on automatically scaling images, but for the moment it's up to you to provide a square image.
It looks like Node's image manipulation tools are sadly a little lacking, but there is a good solution.
get-pixels allows reading in images either from a URL or from a local path and converts them into an ndarray that should work excellently for the API.
The API will accept RGB images in the format that get-pixels produces, but if you're still interested in converting the images to grayscale (which can be helpful for other applications), the conversion is actually a little unusual.
In a standard RGB image, each color is essentially given a luminance score, which is how bright the color appears. Based on the luminance, the conversion to grayscale for each pixel happens as follows:
Grayscale = 0.2126*R + 0.7152*G + 0.0722*B
Soon the API will also support the direct use of URLs; stay tuned on that front.
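For illustration only, here is a minimal Python/numpy sketch of that conversion into the list-of-rows format the API expects (Python rather than Node, purely to show the math; the H x W x 4 float layout is an assumption):
import numpy as np

def to_grayscale_rows(rgba):
    # rgba: assumed H x W x 4 array of floats in [0, 1]
    r, g, b, a = rgba[..., 0], rgba[..., 1], rgba[..., 2], rgba[..., 3]
    # luminance formula from the quoted docs, weighted by alpha
    y = (0.2126 * r + 0.7152 * g + 0.0722 * b) * a
    return y.tolist()  # list of rows, each a list of floats

rows = to_grayscale_rows(np.random.rand(4, 4, 4))  # tiny random "image" as a smoke test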
I maintain the sharp Node.js module that may be able to get you a little closer to what you need.
The following example will convert the input to greyscale and generate a Buffer of integer values, one byte per pixel.
You'll need to add logic to divide by 255 to convert to floats, then split the result into an array of arrays, to keep the Indico API happy.
const sharp = require('sharp');

sharp(input)
  .resize(width, height)
  .grayscale()
  .raw()
  .toBuffer(function (err, data) {
    // data is a Buffer containing uint8 values (0-255)
    // with each byte representing one pixel
  });
Related
The PyTorch function torch.nn.functional.interpolate contains several modes for upsampling, such as: nearest, linear, bilinear, bicubic, trilinear, area.
What is the area upsampling mode used for?
As jodag said, it is resizing using adaptive average pooling. While the answer at the link aims to explain what adaptive average pooling is, I find the explanation a bit vague.
TL;DR: the area mode of torch.nn.functional.interpolate is probably one of the most intuitive ways to downsample an image.
You can think of it as applying an averaging low-pass filter (LPF) to the original image and then sampling. Applying an LPF before sampling prevents potential aliasing in the downsampled image; aliasing can result in Moiré patterns in the downscaled image.
It is probably called "area" because it (roughly) preserves the area ratio between the input and output shapes when averaging the input pixels. More specifically, every pixel in the output image is the average of a respective region in the input image, and the area of that region is roughly the ratio between the input image's area and the output image's area.
Furthermore, the interpolate function with mode = 'area' calls the source function adaptive_avg_pool2d (implemented in C++), which assigns each pixel in the output tensor the average of all pixel intensities within a computed region of the input. That region is computed per output pixel and can vary in size across pixels. It is computed by multiplying the output pixel's height and width indices by the input-to-output ratio of the height and width (respectively), then taking the floor of the result for the region's starting index and the ceil for the region's ending index.
Here's an in-depth analysis of what happens in nn.AdaptiveAvgPool2d:
First of all, as stated there, you can find the source code for adaptive average pooling (in C++) here: source
Taking a look at the function where the magic happens (or at least the magic on CPU for a single frame), static void adaptive_avg_pool2d_single_out_frame, we have 5 nested loops running over the channel dimension, then width, then height, and within the body of the 3rd loop the magic happens:
First compute the region within the input image which is used to calculate the value of the current pixel (recall we had width and height loop to run over all pixels in the output).
How is this done?
Using a simple computation of start and end indices for height and width as follows: floor((input_height/output_height) * current_output_pixel_height) for the start, ceil((input_height/output_height) * (current_output_pixel_height+1)) for the end, and similarly for the width.
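As a quick worked example: with input_height = 15 and output_height = 10 (the shapes used in the snippet below), output row 3 averages input rows floor(3 * 15/10) = 4 up to (but not including) ceil(4 * 15/10) = 6, i.e. input rows 4 and 5.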
Then, all that is done is to simply average the intensities of all pixels in that region and current channel and place the result in the current output pixel.
I wrote a simple Python snippet that does the same thing, in the same (naive, loop-based) fashion, and produces equivalent results. It takes a tensor a and uses adaptive average pooling to resize a to shape out_shape in two ways: once using the built-in nn.AdaptiveAvgPool2d, and once with my Python translation of the C++ source function static void adaptive_avg_pool2d_single_out_frame. The built-in function's result is saved into b and my translation's into b_hat. You can see that the results are equivalent (you can further play with the spatial shapes and validate this):
import torch
from math import floor, ceil
from torch import nn

a = torch.randn(1, 3, 15, 17)
out_shape = (10, 11)

# built-in adaptive average pooling
b = nn.AdaptiveAvgPool2d(out_shape)(a)

# naive Python translation of adaptive_avg_pool2d_single_out_frame
b_hat = torch.zeros(b.shape)
for d in range(a.shape[1]):
    for w in range(b_hat.shape[3]):
        for h in range(b_hat.shape[2]):
            startW = floor(w * a.shape[3] / out_shape[1])
            endW = ceil((w + 1) * a.shape[3] / out_shape[1])
            startH = floor(h * a.shape[2] / out_shape[0])
            endH = ceil((h + 1) * a.shape[2] / out_shape[0])
            b_hat[0, d, h, w] = torch.mean(a[0, d, startH:endH, startW:endW])

'''
Prints Mean Squared Error = 0 (or a very small number, due to precision error)
as both outputs are the same, proof of output equivalence:
'''
print(nn.MSELoss()(b_hat, b))
Looking at the source code it appears area interpolation is equivalent to resizing a tensor via adaptive average pooling. You can refer to this question for an explanation of adaptive average pooling. Therefore area interpolation is more applicable to downsampling than upsampling.
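For a quick sanity check of this equivalence, here is a minimal sketch (arbitrary shapes; downsampling only):
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 15, 17)
# 'area' downsampling should match adaptive average pooling to the same size
y_area = F.interpolate(x, size=(10, 11), mode='area')
y_pool = F.adaptive_avg_pool2d(x, (10, 11))
print(torch.allclose(y_area, y_pool))  # expected: True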
Suppose I have two grayscale images img1 and img2 loaded with OpenCV (Python package cv2), both of the same dimensions. Now, I wish to take the mean of img1 and img2. Here are two ways to do it:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1,0.5,img2,0.5,0)
However, mean is visually different in both methods, when I display them using cv2.imshow. Why is this so?
I am glad that you have found a working solution to your problem, but this seems to be a workaround; the real reason for this behaviour lies elsewhere. The problem here is that mean = (img1 * 0.5) + (img2 * 0.5) returns a matrix with a floating-point data type containing values in the range 0.0-255.0. You can verify this by using print(mean.dtype). Since the matrix values have been converted to float unintentionally, we can revert this operation by using (img1 * 0.5 + img2 * 0.5).astype("uint8"). In the case of cv2.addWeighted() the result is automatically returned as a matrix of data type uint8, and everything works fine.
My concern is with the conclusion that you have drawn:
The issue is that the cv2.imshow() method used to display images,
expects your image arrays to be normalized, i.e. in the range [0,1].
cv2.imshow() works just fine with values in the range [0, 255] (uint8) and [0.0, 1.0] (float), but the issue arises when you pass a matrix whose values are in the range [0, 255] while its dtype is float instead of uint8.
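Here is a minimal sketch of that dtype difference (the file names are placeholders; any two same-sized grayscale images will do):
import cv2

img1 = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)

mean_np = img1 * 0.5 + img2 * 0.5                      # numpy promotes uint8 * float to float64
mean_cv = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)     # stays uint8
print(mean_np.dtype, mean_cv.dtype)                    # float64 vs uint8

# imshow treats float images as [0, 1], so cast back (or divide by 255) before display
cv2.imshow("Averaged", mean_np.astype("uint8"))
cv2.waitKey(0)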
Answering my own question, to help others who get confused by this:
Both methods 1 and 2 yield the same result. You can verify this by writing the mean image to disk using cv2.imwrite. The issue is not with the methods.
The issue is that the cv2.imshow method used to display images expects your image arrays to be normalized, i.e. in the range [0, 1]. In my case, both image arrays are 8-bit unsigned integers, so their pixel values are in the range [0, 255]. Since mean is an average of the two arrays, its pixel values are also in the range [0, 255]. So when I passed mean to cv2.imshow, pixels with values greater than 1 were interpreted as having a value of 255, resulting in vastly different visuals.
The solution is to normalize mean before passing it to cv2.imshow:
# Method 1
mean = (img1 * 0.5) + (img2 * 0.5)
# Method 2
mean = cv2.addWeighted(img1,0.5,img2,0.5,0)
# Note that the division by 255 results in the image array values being squeezed to [0,1].
cv2.imshow("Averaged", mean/255.)
I am trying to use K-Means to find the dominant color of each image in an array of images. The example below uses KMeans from Python's sklearn.cluster module.
Say, for example, I have a 100x100 pixel image; I would like to find the dominant color of each 5x5 block within that 100x100 image.
My current implementation (below) is using K-Means to analyze each 5x5 block one at a time, which is extremely slow for larger image sizes. I would like to feed an array of images in to K-Means and have it return an array of dominant colors, where each index in the returned array corresponds to the index in the images array.
Current implementation:
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans

def get_dominant_color(image):
    # flatten the image into a list of RGB pixels
    image = image.reshape((image.shape[0] * image.shape[1], 3))
    clt = KMeans(n_clusters=4)
    labels = clt.fit_predict(image)
    label_counts = Counter(labels)
    # the center of the most common cluster is the dominant color
    dominant_color = clt.cluster_centers_[label_counts.most_common(1)[0][0]]
    divisor = np.sum(dominant_color)
    if divisor != 0:
        # normalize the rgb values
        dominant_color = dominant_color / np.sum(dominant_color)
    return dominant_color
I've tried modifying this to call clt.fit_predict(images), where images is an array of 5x5 blocks, but I believe this will mix together the colors of all the images and produce a single output. If possible, how can I manipulate this to analyze each individual image independently?
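For context, the per-block loop described above might look something like the following naive sketch (block_size and the image layout are assumptions; it simply calls the get_dominant_color function above on every 5x5 tile):
import numpy as np

def dominant_colors_per_block(image, block_size=5):
    # image: H x W x 3 array whose sides are multiples of block_size
    h, w = image.shape[:2]
    colors = []
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = image[y:y + block_size, x:x + block_size]
            colors.append(get_dominant_color(block))  # function defined above
    return np.array(colors)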
I am trying to get 3 new random floats into my pixel shader for each pixel. Based on what I have read here, here, and also here, I believe that I need to generate a large texture containing random RGB values and then, during each draw call, randomly generate a couple of texture coordinate offset values to produce a pseudo-random effect. Is the only way to do this through the LockRect and UnlockRect API? I hope not.
The only way I have found to do this is the lock and unlock rectangle method. But it is much easier than I initially thought. Here is how I filled the texture.
'Create Random texture for dithering shader
rando = New Random()
randomText = New Texture(device, 1000, 1000, 1, Usage.Dynamic, Format.A16B16G16R16, Pool.Default)

'89599
'89510
Dim data(1000 * 1000 * 8 + 1000 * 63 + 936) As Byte
rando.NextBytes(data)

Dim dataBox As DataRectangle = randomText.GetSurfaceLevel(0).LockRectangle(LockFlags.None)
dataBox.Data.Seek(0, IO.SeekOrigin.Begin)
dataBox.Data.Write(data, 0, data.Length)
dataBox.Data.Close()
As you can see from the code, I had to add a lot of extra bytes to completely fill the texture with random values. I used a 1000 x 1000 64-bit texture, so you would think I would need 1000*1000*8 bytes of data, but I needed an extra 63,936 bytes to fill the texture and I am not sure why. But it seems to work for my needs.
I've successfully packed floats with values in [0,1] without losing too much precision using:
byte packedVal = floatVal * 255.0f ; // [0,1] -> [0,255]
Then when I want to unpack the packedVal back into a float, I simply do
float unpacked = packedVal / 255.0f ; // [0,255] -> [0,1]
That works fine, as long as the floats are between 0 and 1.
Now here's the real deal. I'm trying to turn a 3d space vector (with 3 float components) into 4 bytes. The reason I'm doing this is that I am using a texture to store these vectors, with 1 pixel per vector. It should be something like a "normal map" (but not exactly this, you'll see why shortly).
In such a map, each pixel represents a 3d space vector: where the value is very red, the normal vector's direction is mostly +x (to the right).
So of course, normals are normalized, so they don't require a magnitude (scaling) factor. But I'm trying to store a vector with arbitrary magnitude, 1 vector per pixel.
Because textures have 4 components (RGBA), I am thinking of storing a scale factor in the w component.
Any other suggestions for packing an arbitrary sized 3 space vector, (say with upper limit on magnitude of 200 or so on each of x,y,z), into a 4-byte pixel color value?
Storing the magnitude in the 4th component sounds very reasonable, as long as the magnitude is bounded to something sensible and not completely arbitrary.
If you want a more flexible range of magnitudes, you can pre-multiply the normalized direction vector by a factor in (0.5, 1.0] when you store it, and when you unpack it multiply it by pow(2, w).
Such a method is used for storing high dynamic range images: RGBM encoding (M stands for magnitude). One of its drawbacks is incorrect results from interpolation, so you can't use bilinear filtering for your texture.
You can look at other options among HDR encodings: here is a small list of the most popular ones.
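As a rough sketch of the simpler fixed-range option (magnitude stored in the 4th byte, assuming the upper bound of about 200 mentioned in the question):
import numpy as np

MAX_MAGNITUDE = 200.0  # assumed upper bound, from the question

def pack(vec):
    v = np.asarray(vec, dtype=np.float64)
    mag = np.linalg.norm(v)
    n = v / mag if mag > 0 else np.zeros(3)
    rgb = np.round((n * 0.5 + 0.5) * 255)                     # direction: [-1, 1] -> [0, 255]
    a = round(min(mag, MAX_MAGNITUDE) / MAX_MAGNITUDE * 255)  # magnitude: [0, 200] -> [0, 255]
    return np.append(rgb, a).astype(np.uint8)

def unpack(rgba):
    n = rgba[:3].astype(np.float64) / 255.0 * 2.0 - 1.0       # back to [-1, 1]
    mag = rgba[3] / 255.0 * MAX_MAGNITUDE
    return n * mag

print(unpack(pack([30.0, -40.0, 120.0])))  # roughly recovers the input (quantization error aside)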