crop xarray unstructured mesh by geographic extension - geospatial

The problem
Hi all,
I am working with an unstructured triangular mesh in Xarray and I have problems cropping it by spatial extent in an efficient way. My xarray dims and coordinates are as follows:
Dimensions:
time: 25, nMesh2D_face: 149596, max_nMesh2D_face_nodes: 3, nMesh2D_node: 77059
Coordinates:
time(time)datetime64[ns] 2022-01-01 ... 2022-01-01T04:00:00
Mesh2D_node_x(nMesh2D_node)float64 -81.54 -81.54 ... -81.53 -81.55
Mesh2D_node_y(nMesh2D_node)float64 30.41 30.41 30.41 ... 30.4 30.4
Mesh2D_face_x(nMesh2D_face)float32 ...
Mesh2D_face_y(nMesh2D_face)float32 ...
I am trying to crop this efficiently by x and y values.
What have I tried
The common solution for other types of data is the .sel() function (see for example here or here, as in:
cropped_data = data.sel(Mesh2D_node_x=slice(min_x, max_x), Mesh2D_node_x=slice(min_y, max_y))
but this doesn't work in my case because Mesh2D_node_x and Mesh2D_node_y are just coordinates and not dimensions, but my dimension do not have geographic coordinates themselves, so this throws an error when I try to run it.
I have also tried to use the .where() such as :
mask_x = (data.Mesh2D_node_x >= xmin) & (data.Mesh2D_node_x <= xmax)
mask_y = (data.Mesh2D_node_y >= ymin) & (data.Mesh2D_node_y <= ymax)
masked_data = data.where(mask_x & mask_y, drop=True)
but this loads the whole dataset in memory and causes memory issues aside form being extremely inefficient.
I would try RioXarray but it seems it doesn't work for unstructured mesh before translating to grid which makes very inefficient.
So what are my options here, is there something else I should try?

The bets solution I found was to crop the vertex-face layer. Once you eliminate your vertex outside the bounding box from there, the other layers get cropped as well

Related

Generate an Image Dataset from a Single Image

I have a single image that looks like this:
And I need to generate an image dataset that keeps the basic characteristics of this image but adds some noise, such as we see a line at 1:30 time in the image.
Mainly, there's the pink part of the image (vertical lines), blue part (central bluesh hue) and yellow/green part at the edges. I'm looking to "learn" the image in a way that I could control these 3 things and randomly generate:
bluesh central hue's small colors changes and size
vertical pink lines thickness and color
Yellow/Green edges and their size (I could expand them at the expense of blue in the middle or vice virsa
CONSTRAINT: The yellowish circle (which is image of a semi-conductor wafer) cannot change in size or shape. It can move on top of the black square though. structures inside it can change as well, as mentioned in above 3 points.
This might be an easy question for people with experience in computer vision but I, unfortunately, don't have a lot of experience in this domain. So, I'd love to get any ideas on making progress in this direction. Thanks.
Changing the shape of your inner structures while safely keeping all possible characteristics seems non-trivial to me. There are however a number of simple transformation you could do to create an augmented dataset such as:
Mirroring: Horizontally, vertically, diagonally - will keep all of your line characteristics
Rotation: Normally you would also do some rotations, but this will obviously change the orientation of your lines which you want to preserve, so this does not apply in your case
Shearing: Might still apply and work nicely to add some robustness, as long as you don't overdo it and end up bending your features too much
Other than that you might also want to add some noise to your image, or transformed versions of it as listed above, such as Gaussian noise or salt and pepper noise.
You could also play around with the color values, e.g. by slighly shifting the saturation of different hue values in HSV space.
You can combine any of those methods in different combinations, if you try all possible permutations with different amount/type of noise you will get quite a big dataset.
One approach is using keras's ImageDataGenerator
Decide how many samples you want? Assume 5.
total_number = 5
Initialize ImageDataGenerator class. For instance
data_gen = ImageDataGenerator(rescale=1. / 255, shear_range=0.2,
zoom_range=0.2, horizontal_flip=True)
Turn your image to the tensor.
img = load_img("xIzEG.png", grayscale=False) # You can also create gray-images.
arr = img_to_array(img)
tensor_img = arr.reshape((1, ) + arr.shape)
Create a folder you want to store the result, i.e. populated, then Populate
for i, _ in enumerate(data_gen.flow(x=tensor_img,
batch_size=1,
save_to_dir="populated",
save_prefix="generated",
save_format=".png")):
if i > total_number:
break
Now, if you look at your populated folder:
Code
from keras.preprocessing.image import load_img, img_to_array
from keras.preprocessing.image import ImageDataGenerator
# Total Generated number
total_number = 5
data_gen = ImageDataGenerator(rescale=1. / 255, shear_range=0.2,
zoom_range=0.2, horizontal_flip=True)
# Create image to tensor
img = load_img("xIzEG.png", grayscale=False)
arr = img_to_array(img)
tensor_image = arr.reshape((1, ) + arr.shape)
for i, _ in enumerate(data_gen.flow(x=tensor_image,
batch_size=1,
save_to_dir="populated",
save_prefix="generated",
save_format=".png")):
if i > total_number:
break

Comparing 2 image content using python [duplicate]

I'm trying to compare images to each other to find out whether they are different. First I tried to make a Pearson correleation of the RGB values, which works also quite good unless the pictures are a litte bit shifted. So if a have a 100% identical images but one is a little bit moved, I get a bad correlation value.
Any suggestions for a better algorithm?
BTW, I'm talking about to compare thousand of imgages...
Edit:
Here is an example of my pictures (microscopic):
im1:
im2:
im3:
im1 and im2 are the same but a little bit shifted/cutted, im3 should be recognized as completly different...
Edit:
Problem is solved with the suggestions of Peter Hansen! Works very well! Thanks to all answers! Some results can be found here
http://labtools.ipk-gatersleben.de/image%20comparison/image%20comparision.pdf
A similar question was asked a year ago and has numerous responses, including one regarding pixelizing the images, which I was going to suggest as at least a pre-qualification step (as it would exclude very non-similar images quite quickly).
There are also links there to still-earlier questions which have even more references and good answers.
Here's an implementation using some of the ideas with Scipy, using your above three images (saved as im1.jpg, im2.jpg, im3.jpg, respectively). The final output shows im1 compared with itself, as a baseline, and then each image compared with the others.
>>> import scipy as sp
>>> from scipy.misc import imread
>>> from scipy.signal.signaltools import correlate2d as c2d
>>>
>>> def get(i):
... # get JPG image as Scipy array, RGB (3 layer)
... data = imread('im%s.jpg' % i)
... # convert to grey-scale using W3C luminance calc
... data = sp.inner(data, [299, 587, 114]) / 1000.0
... # normalize per http://en.wikipedia.org/wiki/Cross-correlation
... return (data - data.mean()) / data.std()
...
>>> im1 = get(1)
>>> im2 = get(2)
>>> im3 = get(3)
>>> im1.shape
(105, 401)
>>> im2.shape
(109, 373)
>>> im3.shape
(121, 457)
>>> c11 = c2d(im1, im1, mode='same') # baseline
>>> c12 = c2d(im1, im2, mode='same')
>>> c13 = c2d(im1, im3, mode='same')
>>> c23 = c2d(im2, im3, mode='same')
>>> c11.max(), c12.max(), c13.max(), c23.max()
(42105.00000000259, 39898.103896795357, 16482.883608327804, 15873.465425120798)
So note that im1 compared with itself gives a score of 42105, im2 compared with im1 is not far off that, but im3 compared with either of the others gives well under half that value. You'd have to experiment with other images to see how well this might perform and how you might improve it.
Run time is long... several minutes on my machine. I would try some pre-filtering to avoid wasting time comparing very dissimilar images, maybe with the "compare jpg file size" trick mentioned in responses to the other question, or with pixelization. The fact that you have images of different sizes complicates things, but you didn't give enough information about the extent of butchering one might expect, so it's hard to give a specific answer that takes that into account.
I have one done this with an image histogram comparison. My basic algorithm was this:
Split image into red, green and blue
Create normalized histograms for red, green and blue channel and concatenate them into a vector (r0...rn, g0...gn, b0...bn) where n is the number of "buckets", 256 should be enough
subtract this histogram from the histogram of another image and calculate the distance
here is some code with numpy and pil
r = numpy.asarray(im.convert( "RGB", (1,0,0,0, 1,0,0,0, 1,0,0,0) ))
g = numpy.asarray(im.convert( "RGB", (0,1,0,0, 0,1,0,0, 0,1,0,0) ))
b = numpy.asarray(im.convert( "RGB", (0,0,1,0, 0,0,1,0, 0,0,1,0) ))
hr, h_bins = numpy.histogram(r, bins=256, new=True, normed=True)
hg, h_bins = numpy.histogram(g, bins=256, new=True, normed=True)
hb, h_bins = numpy.histogram(b, bins=256, new=True, normed=True)
hist = numpy.array([hr, hg, hb]).ravel()
if you have two histograms, you can get the distance like this:
diff = hist1 - hist2
distance = numpy.sqrt(numpy.dot(diff, diff))
If the two images are identical, the distance is 0, the more they diverge, the greater the distance.
It worked quite well for photos for me but failed on graphics like texts and logos.
You really need to specify the question better, but, looking at those 5 images, the organisms all seem to be oriented the same way. If this is always the case, you can try doing a normalized cross-correlation between the two images and taking the peak value as your degree of similarity. I don't know of a normalized cross-correlation function in Python, but there is a similar fftconvolve() function and you can do the circular cross-correlation yourself:
a = asarray(Image.open('c603225337.jpg').convert('L'))
b = asarray(Image.open('9b78f22f42.jpg').convert('L'))
f1 = rfftn(a)
f2 = rfftn(b)
g = f1 * f2
c = irfftn(g)
This won't work as written since the images are different sizes, and the output isn't weighted or normalized at all.
The location of the peak value of the output indicates the offset between the two images, and the magnitude of the peak indicates the similarity. There should be a way to weight/normalize it so that you can tell the difference between a good match and a poor match.
This isn't as good of an answer as I want, since I haven't figured out how to normalize it yet, but I'll update it if I figure it out, and it will give you an idea to look into.
If your problem is about shifted pixels, maybe you should compare against a frequency transform.
The FFT should be OK (numpy has an implementation for 2D matrices), but I'm always hearing that Wavelets are better for this kind of tasks ^_^
About the performance, if all the images are of the same size, if I remember well, the FFTW package created an specialised function for each FFT input size, so you can get a nice performance boost reusing the same code... I don't know if numpy is based on FFTW, but if it's not maybe you could try to investigate a little bit there.
Here you have a prototype... you can play a little bit with it to see which threshold fits with your images.
import Image
import numpy
import sys
def main():
img1 = Image.open(sys.argv[1])
img2 = Image.open(sys.argv[2])
if img1.size != img2.size or img1.getbands() != img2.getbands():
return -1
s = 0
for band_index, band in enumerate(img1.getbands()):
m1 = numpy.fft.fft2(numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size))
m2 = numpy.fft.fft2(numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size))
s += numpy.sum(numpy.abs(m1-m2))
print s
if __name__ == "__main__":
sys.exit(main())
Another way to proceed might be blurring the images, then subtracting the pixel values from the two images. If the difference is non nil, then you can shift one of the images 1 px in each direction and compare again, if the difference is lower than in the previous step, you can repeat shifting in the direction of the gradient and subtracting until the difference is lower than a certain threshold or increases again. That should work if the radius of the blurring kernel is larger than the shift of the images.
Also, you can try with some of the tools that are commonly used in the photography workflow for blending multiple expositions or doing panoramas, like the Pano Tools.
I have done some image processing course long ago, and remember that when matching I normally started with making the image grayscale, and then sharpening the edges of the image so you only see edges. You (the software) can then shift and subtract the images until the difference is minimal.
If that difference is larger than the treshold you set, the images are not equal and you can move on to the next. Images with a smaller treshold can then be analyzed next.
I do think that at best you can radically thin out possible matches, but will need to personally compare possible matches to determine they're really equal.
I can't really show code as it was a long time ago, and I used Khoros/Cantata for that course.
First off, correlation is a very CPU intensive rather inaccurate measure for similarity. Why not just go for the sum of the squares if differences between individual pixels?
A simple solution, if the maximum shift is limited: generate all possible shifted images and find the one that is the best match. Make sure you calculate your match variable (i.e. correllation) only over the subset of pixels that can be matched in all shifted images. Also, your maximum shift should be significantly smaller than the size of your images.
If you want to use some more advances image processing techniques I suggest you look at SIFT this is a very powerfull method that (theoretically anyway) can properly match items in images independent of translation, rotation and scale.
I guess you could do something like this:
estimate vertical / horizontal displacement of reference image vs the comparison image. a
simple SAD (sum of absolute difference) with motion vectors would do to.
shift the comparison image accordingly
compute the pearson correlation you were trying to do
Shift measurement is not difficult.
Take a region (say about 32x32) in comparison image.
Shift it by x pixels in horizontal and y pixels in vertical direction.
Compute the SAD (sum of absolute difference) w.r.t. original image
Do this for several values of x and y in a small range (-10, +10)
Find the place where the difference is minimum
Pick that value as the shift motion vector
Note:
If the SAD is coming very high for all values of x and y then you can anyway assume that the images are highly dissimilar and shift measurement is not necessary.
To get the imports to work correctly on my Ubuntu 16.04 (as of April 2017), I installed python 2.7 and these:
sudo apt-get install python-dev
sudo apt-get install libtiff5-dev libjpeg8-dev zlib1g-dev libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python-tk
sudo apt-get install python-scipy
sudo pip install pillow
Then I changed Snowflake's imports to these:
import scipy as sp
from scipy.ndimage import imread
from scipy.signal.signaltools import correlate2d as c2d
How awesome that Snowflake's scripted worked for me 8 years later!
I propose a solution based on the Jaccard index of similarity on the image histograms. See: https://en.wikipedia.org/wiki/Jaccard_index#Weighted_Jaccard_similarity_and_distance
You can compute the difference in the distribution of the pixel colors. This is indeed pretty invariant to translations.
from PIL.Image import Image
from typing import List
def jaccard_similarity(im1: Image, im2: Image) -> float:
"""Compute the similarity between two images.
First, for each image an histogram of the pixels distribution is extracted.
Then, the similarity between the histograms is compared using the weighted Jaccard index of similarity, defined as:
Jsimilarity = sum(min(b1_i, b2_i)) / sum(max(b1_i, b2_i)
where b1_i, and b2_i are the ith histogram bin of images 1 and 2, respectively.
The two images must have same resolution and number of channels (depth).
See: https://en.wikipedia.org/wiki/Jaccard_index
Where it is also called Ruzicka similarity."""
if im1.size != im2.size:
raise Exception("Images must have the same size. Found {} and {}".format(im1.size, im2.size))
n_channels_1 = len(im1.getbands())
n_channels_2 = len(im2.getbands())
if n_channels_1 != n_channels_2:
raise Exception("Images must have the same number of channels. Found {} and {}".format(n_channels_1, n_channels_2))
assert n_channels_1 == n_channels_2
sum_mins = 0
sum_maxs = 0
hi1 = im1.histogram() # type: List[int]
hi2 = im2.histogram() # type: List[int]
# Since the two images have the same amount of channels, they must have the same amount of bins in the histogram.
assert len(hi1) == len(hi2)
for b1, b2 in zip(hi1, hi2):
min_b = min(b1, b2)
sum_mins += min_b
max_b = max(b1, b2)
sum_maxs += max_b
jaccard_index = sum_mins / sum_maxs
return jaccard_index
With respect to mean squared error, the Jaccard index lies always in the range [0,1], thus allowing for comparisons among different image sizes.
Then, you can compare the two images, but after rescaling to the same size! Or pixel counts will have to be somehow normalized. I used this:
import sys
from skincare.common.utils import jaccard_similarity
import PIL.Image
from PIL.Image import Image
file1 = sys.argv[1]
file2 = sys.argv[2]
im1 = PIL.Image.open(file1) # type: Image
im2 = PIL.Image.open(file2) # type: Image
print("Image 1: mode={}, size={}".format(im1.mode, im1.size))
print("Image 2: mode={}, size={}".format(im2.mode, im2.size))
if im1.size != im2.size:
print("Resizing image 2 to {}".format(im1.size))
im2 = im2.resize(im1.size, resample=PIL.Image.BILINEAR)
j = jaccard_similarity(im1, im2)
print("Jaccard similarity index = {}".format(j))
Testing on your images:
$ python CompareTwoImages.py im1.jpg im2.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(373, 109)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.7238955686269157
$ python CompareTwoImages.py im1.jpg im3.jpg
Image 1: mode=RGB, size=(401, 105)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (401, 105)
Jaccard similarity index = 0.22785529941822316
$ python CompareTwoImages.py im2.jpg im3.jpg
Image 1: mode=RGB, size=(373, 109)
Image 2: mode=RGB, size=(457, 121)
Resizing image 2 to (373, 109)
Jaccard similarity index = 0.29066426814105445
You might also consider experimenting with different resampling filters (like NEAREST or LANCZOS), as they, of course, alter the color distribution when resizing.
Additionally, consider that swapping images change the results, as the second image might be downsampled instead of upsampled (After all, cropping might better suit your case rather than rescaling.)

How can I use VIPS for image normalization?

I want to normalize the exposure and color palettes of a set of images. For context, this is for training a neural net in image classification on medical images. I'm also doing this for hundreds of thousands of images, so efficiency is very important.
So far I've been using VIPS, specifically PyVIPS, and would prefer a solution using that library. After finding this answer and looking through the documentation, I tried
x = pyvips.Image.new_from_file('test.ndpi')
x = x.hist_norm()
x.write_to_file('test_normalized.tiff')
but that seems to always produce a pure-white image.
You need hist_equal for histogram equalisation.
The main docs are here:
https://libvips.github.io/libvips/API/current/libvips-histogram.html
However, that will be extremely slow for large slide images. It will need to scan the whole slide once to build the histogram, then scan again to equalise it. It would be much faster to find the histogram of a low-res layer, then use that to equalise the high-res one.
For example:
#!/usr/bin/env python3
import sys
import pyvips
# open the slide image and get the number of layers ... we are not fetching
# pixels, so this is quick
x = pyvips.Image.new_from_file(sys.argv[1])
levels = int(x.get("openslide.level-count"))
# find the histogram of the highest level ... again, this should be quick
x = pyvips.Image.new_from_file(sys.argv[1],
level=levels - 1)
hist = x.hist_find()
# from that, compute the transform for histogram equalisation
equalise = hist.hist_cum().hist_norm()
# and use that on the full-res image
x = pyvips.Image.new_from_file(sys.argv[1])
x = x.maplut(equalise)
x.write_to_file(sys.argv[2])
Another factor is that histogram equalisation is non-linear, so it will distort lightness relationships. It can also distort colour relationships and make noise and compression artifacts look crazy. I tried that program on an image I have here:
$ ~/try/equal.py bild.ndpi[level=7] y.jpg
The stripes are from the slide scanner and the ugly fringes from compression.
I think I would instead find image max and min from the low-res level, then use them to do a simple linear stretch of pixel values.
Something like:
x = pyvips.Image.new_from_file(sys.argv[1])
levels = int(x.get("openslide.level-count"))
x = pyvips.Image.new_from_file(sys.argv[1],
level=levels - 1)
mn = x.min()
mx = x.max()
x = pyvips.Image.new_from_file(sys.argv[1])
x = (x - mn) * (256 / (mx - mn))
x.write_to_file(sys.argv[2])
Did you find the new Region feature in pyvips? It makes generating patches for training MUCH faster, up to 100x faster in some cases:
https://github.com/libvips/pyvips/issues/100#issuecomment-493960943

Geospatial fixed radius cluster hunting in python

I want to take an input of millions of lat long points (with a numerical attribute) and then find all fixed radius geospatial clusters where the sum of the attribute within the circle is above a defined threshold.
I started by using sklearn BallTree to sum the attribute within any defined circle, with the intention of then expanding this out to run across a grid or lattice of circles. The run time for one circle is around 0.01s, so this is fine for small lattices, but won't scale if I want to run 200m radius circles across the whole of the UK.
#example data (use 2m rows from postcode centroid file)
df = pandas.read_csv('National_Statistics_Postcode_Lookup_Latest_Centroids.csv', usecols=[0,1], nrows=2000000)
#this will be our grid of points (or lattice) use points from same file for example
df2 = pandas.read_csv('National_Statistics_Postcode_Lookup_Latest_Centroids.csv', usecols=[0,1], nrows=2000)
#reorder lat long columns for balltree input
columnTitles=["Y","X"]
df = df.reindex(columns=columnTitles)
df2 = df2.reindex(columns=columnTitles)
# assign new columns to existing dataframe. attribute will hold the data we want to sum over (set to 1 for now)
df['attribute'] = 1
df2['aggregation'] = 0
RADIANT_TO_KM_CONSTANT = 6367
class BallTreeIndex:
def __init__(self, lat_longs):
self.lat_longs = np.radians(lat_longs)
self.ball_tree_index =BallTree(self.lat_longs, metric='haversine')
def query_radius(self,query,radius):
radius_km = radius/1000
radius_radiant = radius_km / RADIANT_TO_KM_CONSTANT
query = np.radians(np.array([query]))
indices = self.ball_tree_index.query_radius(query,r=radius_radiant)
return indices[0]
#index the base data
a=BallTreeIndex(df.iloc[:,0:2])
#begin to loop over the lattice to test performance
for i in range(0,100):
b = df2.iloc[i,0:2]
output = a.query_radius(b, 200)
accumulation = sum(df.iloc[output, 2])
df2.iloc[i,2] = accumulation
It feels as if the above code is really inefficient as I don't need to run the calculation across all circles on my lattice (as most will be well below my threshold - or will have no data points in at all).
Instead of this for loop, is there a better way of scaling this algorithm to give me the most dense circles?
I'm new to python, so any help would be massively appreciated!!
First don't try to do this on a sphere! GB is small and we have a well defined geographic projection that will work. So use the oseast1m and osnorth1m columns as X and Y. They are in metres so no need to convert (roughly) to degrees and use Haversine. That should help.
Next add a spatial index to speed up lookups.
If you need more speed there are various tricks like loading a 2R strip across the country into memory and then running your circles across that strip, then moving down a grid step and updating that strip (checking Y values against a fixed value is quick, especially if you store the data sorted on Y then X value). If you need more speed then look at any of the papers the Stan Openshaw (and sometimes I) wrote about parallelising the GAM. There are examples of implementing GAM in python (e.g. this paper, this paper) that may also point to better ways.

Find contour of a sock that contains patterns

I am trying to figure out how I can isolate a non-uniform sock on a picture.
For now I am using edge detection principally as you can see in my code :
main :
# We import the image
image = importImage(filename)
# Save the shapes variables
height, width, _ = np.shape(image)
# Get the gray scale image in a foot shape
grayImage, bigContourArea = getFootShapeImage(image, True)
minArea = width * height / 50
# Extract all contours
contours = getAllContours(grayImage)
# Keep only the contours that are not too big nor too small
relevantContours = getRelevantContours(contours, minArea, maxArea)
And getAllContours does the following :
kernel = np.ones((5, 5), np.uint8)
# Apply Canny Edge detection algorithm
# We apply a Gaussian blur first
edges = cv2.GaussianBlur(grayIm, (5, 5), 0)
# Then we apply Edge detection
edges = cv2.Canny(edges, 10, 100)
# And we do a dilatation followed by erosion to fill gaps
edges = cv2.dilate(edges, kernel, iterations=2)
edges = cv2.erode(edges, kernel, iterations=2)
_, contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
Here are some pictures resulting from my code :
Original picture with foot on the drawed shape
Only the biggers contours
All contours
So as you can see there are some parts of the socks that are not taken in the sock contour, and I tried to include the whole sock with several techniques but never succeeded.
I tried the following :
Segmentation using Otsu thresholding, Itti's saliency (In order to have a mask of the sock in the image and avoid all the remaining)
Regroup the smaller contours with the big one to create an even bigger one (But then I can't avoid taking others that are outside the socks)
Do you have an idea on how i can proceed ?
Thanks in advance ! I hope it is clear enough, if you need clarifications just ask.
In order to solve this I had to perform some color detection algorithm in order to detect the white sheet of paper that is here for this special purpose. I did so with the following :
# Define a mask for color I want to isolate
mask = cv2.inRange(image, lowerWhiteVals, upperWhiteVals)
# I also applied some morphological operations on the mask to make it cleaner
Here is the mask image obtained doing so before and after operations:
Then I detect the paper on the image by taking the left-most contour on the mask, and use it as a left boundary, I also split the paper contour to get a bottom boundary well representative.
And for the top and right I used the first sock contour I had, assuming this one will always at least have theses 2 boundaries because of how the socks are.
Once this was done I just took all the contours in my boundaries and created a new contour from that by drawing them all onto a blank image and finding the new contour again (Thanks to #Alexander Reynolds).
I also had to fine tune a bit my algorithm in order to have the more representative contour of the sock at the end and you can see what my final result is on this following image, even if it's not perfect it's more than enough for this small trial with opencv.
Thanks #Alexander for your help. And hope it will help others someday !

Resources