OpenAI Gym Observation Space for a list with unspecified size

I have an observation space that could be an array of any size (0,1,2,..., self.max_size) and all elements are anywhere from 1 to self.size. How can I implement this in code (which gym space can I use)?
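The thread shows no answer here, so as a hedged sketch only: one common workaround is to pad the variable-length observation to max_size with a sentinel value of 0 (valid elements are 1 to size) and declare a fixed-size MultiDiscrete space over the padded array. The names max_size and size below stand in for the question's self.max_size and self.size.

import numpy as np
from gym import spaces

max_size = 10  # stands in for self.max_size
size = 5       # stands in for self.size

# Each slot takes a value in {0, 1, ..., size}; 0 marks an empty slot.
observation_space = spaces.MultiDiscrete([size + 1] * max_size)

def pad_observation(arr):
    """Pad a variable-length observation (values 1..size) to max_size with 0s."""
    obs = np.zeros(max_size, dtype=np.int64)
    obs[:len(arr)] = arr
    return obs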

Related

(Python 3) Mapping objects to colors in an image according to their size

Say, after some segmentation, you have the following image:
It doesn't look polished, I know, but after some morphological operations it will turn into exactly what I need.
My question is: is there any scikit-image / OpenCV / other library method that takes this image as input, computes the objects' sizes, and returns a mapped/labeled result as follows: all objects bigger than the average size colored red, the rest blue?
I've used scipy.ndimage.label() to get the total number of objects and the number of pixels per object, and then calculated the average object size from that array.
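A minimal sketch of that thresholding-and-coloring step with scipy.ndimage (the toy mask and array names are made up for illustration):

import numpy as np
from scipy import ndimage

# Toy binary segmentation mask (True = object pixel).
mask = np.zeros((100, 100), dtype=bool)
mask[10:30, 10:30] = True   # a large object
mask[60:64, 60:64] = True   # a small object

labels, n = ndimage.label(mask)
sizes = ndimage.sum(mask, labels, index=range(1, n + 1))  # pixels per object
avg = sizes.mean()

# RGB output: red for above-average objects, blue for the rest.
out = np.zeros(mask.shape + (3,), dtype=np.uint8)
big = np.isin(labels, np.flatnonzero(sizes > avg) + 1)
out[big] = (255, 0, 0)
out[mask & ~big] = (0, 0, 255)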

How to fit a huge distance matrix into memory?

I have a huge distance matrix of size around 590000 * 590000 (the data type of each element is float16). Will it fit in memory for a clustering algorithm? If not, could anyone suggest a way to use it with the DBSCAN clustering algorithm?
590000 * 590000 * 2 bytes (float16 size) = 696.2 GB of RAM
It won't fit in memory on a standard computer. Moreover, float16 values are converted to float32 in order to perform computations (see Python numpy float16 datatype operations, and float8?), so it might use a lot more than 700 GB of RAM.
Why do you have a square matrix? Can't you use a condensed matrix? It will use half the memory of the full square matrix.
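For reference, SciPy already produces the condensed form directly; a small sketch (the data here is made up):

import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.random.rand(1000, 8)   # 1000 hypothetical points

d = pdist(X)        # condensed form: n*(n-1)/2 entries (no diagonal, no duplicates)
D = squareform(d)   # full square matrix: roughly twice the memory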
Chunking the data to decrease the problem size for DBSCAN can be done, for example, by splitting it into areas with overlapping regions.
The size of the overlapping regions has to fit your problem.
Find a reasonable size for the chunks and for the overlapping region.
Afterwards, stitch the results manually by iterating over and comparing the clusters found in the overlapping regions.
You have to check whether the elements in one cluster are also present in other chunks.
You might have to apply some stitching parameters, e.g. if some number of elements are shared between clusters in two different chunks, treat them as the same cluster.
I just saw this:
The problem apparently is a non-standard DBSCAN implementation in scikit-learn. DBSCAN does not need a distance matrix.
But this has probably been fixed years ago.
Which implementation are you using?
DBSCAN only needs the neighbors of each point.
So if you knew the appropriate parameters (which I doubt), you could read the huge matrix one row at a time and build a list of neighbors within your distance threshold. Assuming that fewer than 1% of points are neighbors (on such huge data, you'll likely want to go even lower), that would reduce the memory needed 100x.
But usually you want to avoid computing such a matrix at all!
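A sketch of the row-at-a-time idea (the file name and the raw-float16-rows on-disk layout are assumptions):

import numpy as np

n = 590_000
eps = 0.5  # hypothetical DBSCAN radius

neighbors = []
with open("distances.f16", "rb") as f:   # assumed: raw float16 rows on disk
    for i in range(n):
        row = np.frombuffer(f.read(n * 2), dtype=np.float16)
        neighbors.append(np.flatnonzero(row <= eps))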

python 3.x: which is more efficient: list of lists vs. dict?

Imagine you are doing a BFS over a grid (e.g. the shortest distance between two cells). Two data structures can be used to store the visited info:
1) List of lists, i.e. data = [[False for _ in range(cols)] for _ in range(rows)]. Later we can access the data in a certain cell by data[r][c].
2) Dict, i.e. data = dict(). Later we can access the data in a certain cell by data[(r, c)].
My question is: which is computationally more efficient in such BFS scenario?
Coding-wise, the dict approach saves a few characters/lines. Memory-wise, the dict approach can potentially save space for untouched cells, but it can also waste space on the hash table's overhead.
EDIT
@Peteris mentioned numpy arrays. The advantage over a list of lists is obvious: numpy arrays operate on contiguous blocks of memory, which allows faster addressing and more cache hits. However, I'm not sure how they compare to hash tables (i.e. dict). If the algorithm touches a relatively small number of elements, a hash table might get more cache hits given its potentially smaller memory footprint.
Also, the truth is that numpy arrays are unavailable to me, so I really need to compare a list of lists against a dict.
A 2D array
The efficient answer to storing 2D data is a 2D array/matrix allocated in a contiguous area of memory (not a list of lists). This avoids the multiple memory lookups otherwise required, as well as the hash-value calculation a dict needs on every lookup.
The standard way to do this in Python is with the numpy library; here's a simple example:
import numpy as np
data = np.zeros((100, 200))  # create a 100x200 array initialized with zeros
data[10, 20] = 1             # set the element at coordinates (10, 20) to 1
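To address the edited question directly (numpy unavailable), here is a rough micro-benchmark sketch comparing the two structures; the exact numbers depend on your Python build and on how sparse the visits are:

import timeit

rows, cols = 1000, 1000
lol = [[False] * cols for _ in range(rows)]
d = {}

def mark_lol():
    for r in range(rows):
        for c in range(cols):
            lol[r][c] = True

def mark_dict():
    for r in range(rows):
        for c in range(cols):
            d[(r, c)] = True

# Indexing a list of lists avoids hashing the (r, c) tuple on every access,
# so it usually wins when most cells are touched; the dict wins on memory
# only when visits are sparse.
print("list of lists:", timeit.timeit(mark_lol, number=1))
print("dict:         ", timeit.timeit(mark_dict, number=1))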

How to reduce unknown-size data to fixed-size data? Please read details

Example:
Given n images numbered 1 to n, where n is unknown, I can calculate a property of every image as a scalar quantity. Now I have to represent this property of all the images in a fixed-size vector (say of size 5 or 10).
One naive approach could be the vector [avg, max, min, std_deviation].
And I also want to include the effect of the relative positions of those images.
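A sketch of the naive vector, plus one way to keep relative-position information by resampling the sequence of per-image scalars to a fixed length (both function names are made up):

import numpy as np

def summarize(values):
    """The naive fixed-size vector from the question: [avg, max, min, std]."""
    v = np.asarray(values, dtype=np.float64)
    return np.array([v.mean(), v.max(), v.min(), v.std()])

def resample(values, k=10):
    """Resample the per-image scalars (in image order) to a fixed length k,
    which preserves where along the sequence each value occurred."""
    v = np.asarray(values, dtype=np.float64)
    return np.interp(np.linspace(0, len(v) - 1, k), np.arange(len(v)), v)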
What you are looking for is called feature extraction.
There are many techniques for this. For your purpose, try:
PCA
Auto-encoders
Convolutional Auto-encoders
You could also look into conventional (older) methods like SIFT, HOG, and edge detection, but they will all need an extra step to bring their output down to a smaller, fixed size.
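As a sketch of the PCA route (assuming scikit-learn, and assuming each image has already been flattened into a feature row):

import numpy as np
from sklearn.decomposition import PCA

features = np.random.rand(50, 4096)    # hypothetical: 50 images, 64x64 flattened

pca = PCA(n_components=5)
reduced = pca.fit_transform(features)  # a fixed-size (5-dim) vector per image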

process 35 x 35 kernel using convolution method

Dear all, I would like to do a convolution using a 35 x 35 kernel. Any suggestions, or is there a method already in OpenCV that I can use? Currently, cvFilter2D only supports kernels up to 10 x 10.
If you just need quick-and-dirty solution due to OpenCV's size limitation, then you can divide the 35x35 kernel into a 5x5 set of 7x7 "kernel tiles", apply each "kernel tile" to the image to get an output, then shift the result and combine them to get the final sum.
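A sketch of that tiling trick (correlation/filtering shown; for true convolution, flip the kernel first; np.roll wraps at the borders, so pad the image in practice):

import numpy as np
from scipy.ndimage import correlate

image = np.random.rand(256, 256)   # hypothetical image
kernel = np.random.rand(35, 35)    # hypothetical 35x35 kernel

out = np.zeros_like(image)
for bi in range(5):
    for bj in range(5):
        tile = kernel[7*bi:7*bi + 7, 7*bj:7*bj + 7]
        partial = correlate(image, tile, mode="constant")
        # Offset of this tile's center from the full kernel's center (17, 17).
        dy, dx = 7*bi + 3 - 17, 7*bj + 3 - 17
        out += np.roll(partial, shift=(-dy, -dx), axis=(0, 1))

# Away from the borders this matches filtering with the full kernel:
ref = correlate(image, kernel, mode="constant")
assert np.allclose(out[20:-20, 20:-20], ref[20:-20, 20:-20])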
General suggestions for convolution with large 2D kernels:
Try to use kernels that are separable, i.e. a kernel that is the outer product of a column vector and a row vector. In other words, the matrix that represents the kernel is rank-1.
Try the FFT method: convolution in the spatial domain is equivalent to elementwise multiplication in the frequency domain.
If the kernel is full-rank and, for the application's purpose, cannot be modified, then consider using SVD to decompose the kernel into a sum of 35 rank-1 matrices (each of which can be expressed as the outer product of a column vector and a row vector), and perform convolution only with the matrices associated with the largest singular values. This introduces error into the result, but the error can be estimated from the discarded singular values. (This is a.k.a. the MATLAB method; a sketch follows after the special cases below.)
Other special cases:
Kernels that can be expressed as sum of overlapping rectangular blocks can be computed using the integral image (the method used in Viola-Jones face detection).
Kernels that are smooth and modal (with a small number of peaks) can be approximated by sum of 2D Gaussians.
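A sketch of the SVD suggestion above (SciPy's fftconvolve does the 1-D passes; the truncation rank k = 3 is an arbitrary choice for illustration):

import numpy as np
from scipy.signal import fftconvolve

image = np.random.rand(512, 512)   # hypothetical image
kernel = np.random.rand(35, 35)    # hypothetical full-rank kernel

U, s, Vt = np.linalg.svd(kernel)
k = 3  # keep the top-k singular values; the discarded s[k:] bound the error

out = np.zeros_like(image)
for i in range(k):
    col = (s[i] * U[:, i])[:, np.newaxis]  # 35x1 column vector
    row = Vt[i, :][np.newaxis, :]          # 1x35 row vector
    # Each rank-1 term col @ row separates into two cheap 1-D convolutions.
    out += fftconvolve(fftconvolve(image, col, mode="same"), row, mode="same")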
