How to post-process a Yolov7 ONNX model? - pytorch

I trained a YOLOv7 model on a custom dataset and converted it to ONNX. The input of the model shown in Netron reads "Float32(1,3,640,640)", which I understand. The output, however, is unclear to me: other tutorials mention there should be 6 elements representing the bounding box position and size (x, y, w, h, objectness, and class number), but this model outputs 7 elements, with an extra 0, as follows:
Float32(concatoutput_dim_0,7)
The output (sample data):
0, 24.744838, 50, 24.744838, 70.46938, 1, 1, 0, 40.495939, 30.95939, 40.495939, 123.2848439, 1, 1, ...
(Each group of 7 values starts with zero, and the second and fourth values are almost always equal.) Does the 0 mean dimension 0?
I successfully converted YOLOv7 to ONNX and pre-processed the input image, but it is unclear why the output has a 7th element.
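For reference, YOLOv7 models exported with the end-to-end NMS option typically emit one row per detection with the layout (batch_index, x1, y1, x2, y2, class_id, score), which would explain the leading 0 (the index of the image in the batch). Assuming that layout, a minimal post-processing sketch might look like this (the model file name, the image variable, and the confidence threshold are assumptions, not taken from the original post):

import onnxruntime as ort

# Hypothetical file name; `image` is assumed to be the pre-processed
# float32 input of shape (1, 3, 640, 640).
session = ort.InferenceSession("yolov7_custom.onnx")
input_name = session.get_inputs()[0].name

detections = session.run(None, {input_name: image})[0]  # shape (num_detections, 7)

for det in detections:
    batch_idx, x1, y1, x2, y2, class_id, score = det
    if score < 0.25:  # assumed confidence threshold
        continue
    print(int(class_id), float(score), (x1, y1, x2, y2))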

Related

Flat-field correction on hyperspectral data

I am working on a hyperspectral data set using the spectral Python library. I started using Python for the first time on Monday, so everything is taking me a long time.
My data is in ENVI format, and I believe I have successfully read it in and converted it to numpy arrays.
I am attempting a flat-field correction using this code:
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr), np.subtract(white_nparr, dark_nparr))
ValueError: operands could not be broadcast together with shapes (1367,384,288) (100,384,288)
This doesn't work because my white reference and dark reference are a different size from the data capture.
print(white_nparr.shape)
(297, 384, 288)
print(dark_nparr.shape)
(100, 384, 288)
print(data_nparr.shape)
(1367, 384, 288)
So, I understand why I am getting the error: the original white and dark references were captured using different image sizes from the dataset. My problem is creating a correction for the dataset while only having access to references of different sizes.
Has anyone handled this before? What approach did you use?
By the way, the data I am using is mineral hyperspectral data captured from drill core; there is a huge dataset held by Geological Survey Ireland that is free upon request.
So, I received an extremely helpful answer, which actually sparked a further question.
# Created these files to broadcast, as they are a horizontal line of spectra:
# a 2D array which captures the variation
white_nparr_horiz = white_nparr[-2]
dark_nparr_horiz = dark_nparr[-2]
corrected_nparr = np.divide(np.subtract(data_nparr, dark_nparr_horiz), np.subtract(white_nparr_horiz, dark_nparr_horiz))
white_nparr_horiz.shape
Out[28]: (384, 288)
dark_nparr_horiz.shape
Out[29]: (384, 288)
So the shapes of these arrays are broadcastable across data_nparr, and I have tested on a few different indices that it works as I expect, and it does.
a = white_nparr_horiz[150, 144]
b = dark_nparr_horiz[150, 144]
c = data_nparr[500, 150, 144]
d = (c - b)/(a-b)
test = d == corrected_nparr[500, 150, 144]
print(test)
The output from this looks much more like I would expect reflectance data for this material to look, so I believe I am on the right path.
What I would like to do now is have white_nparr_horiz be the mean of each band along the original first axis of the white_ref (297, 384, 288), returned as an array of shape (384, 288), as opposed to the single slice (one line of spectra) it is now. I am sure that this is possible, but I cannot figure out how.
As I said above, I am very new to Python, numpy and image analysis, so apologies if this is obvious or I am going in the wrong direction.
The problem is that your white and dark references should each be a single spectrum (1D array with 288 values), whereas yours are both 3-dimensional arrays (likely corresponding to image regions). To convert them to 1D, you can compute the mean, max, or min of each array, as appropriate. For example, to take the min of the dark reference and max of the white reference, you could convert them as follows:
dark_nparr = np.min(dark_nparr.reshape(-1, dark_nparr.shape[-1]), axis=0)
white_nparr = np.max(white_nparr.reshape(-1, white_nparr.shape[-1]), axis=0)
The lines above reshape the arrays to 2 dimensions and compute the max (or min) of the reshaped arrays.
If you prefer to use the spectral mean of each array instead, just replace np.max and np.min above with np.mean.
If you just want each array to be reduced over its first dimension (i.e., to have shape (384, 288)), then don't reshape the arrays when doing the reduction:
dark_nparr = np.min(dark_nparr, axis=0)
white_nparr = np.max(white_nparr, axis=0)
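To directly address the follow-up (making each reference the per-band mean over its original first axis, giving shape (384, 288)), a minimal sketch along the same lines:

import numpy as np

# Average the reference frames along axis 0: (297, 384, 288) -> (384, 288)
white_mean = np.mean(white_nparr, axis=0)
# Same for the dark reference: (100, 384, 288) -> (384, 288)
dark_mean = np.mean(dark_nparr, axis=0)

# These broadcast against data_nparr with shape (1367, 384, 288)
corrected_nparr = np.divide(np.subtract(data_nparr, dark_mean),
                            np.subtract(white_mean, dark_mean))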

How do I train a sklearn model on a list of numbers?

I have a sample of the following type:
Text   Target
TEXT   Yes
TEXT   No
TEXT   Yes
...    ...
Each text can only belong to one class, but the sample contains items with only 2 out of 3 possible target values.
I use the GradientBoostingClassifier model to train text classification, and the .predict_proba function to get a probabilistic answer. Because the sample contains only 2 of the 3 possible values, the function returns answers of the form [float, float] (e.g. [0.8, 0.2]), although I want an answer of the form [float, float, float] (e.g. [0.7, 0.2, 0.1]). So I converted the sample values as follows:
Text   Target
TEXT   [1,0,0]
TEXT   [0,1,0]
TEXT   [1,0,0]
...    ...
But the model doesn't want to learn from them. An error is displayed.
~/.local/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
108 # for object dtype data, we only check for NaNs (GH-13254)
109 elif X.dtype == np.dtype('object') and not allow_nan:
--> 110 if _object_dtype_isnan(X).any():
111 raise ValueError("Input contains NaN")
112
AttributeError: 'bool' object has no attribute 'any'
How can I make the model be trained on lists?
My code:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=1, random_state=0)
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', gbc)])
text_clf.fit(x, y)
classes = text_clf.classes_
logits = text_clf.predict_proba(X_val)
P.S. I'm going to use not only GradientBoostingClassifier, but also GaussianProcessClassifier, LinearSVC, LogisticRegression, LogisticRegressionCV.
Try to add a dummy line with the missing target value to see if anything changes.
This kind of situation could also mean that you don't have enough data for a good machine learning model in general (more data -> more accurate results, no data -> ???)
The Target column should have 3 distinct classes, for example:
Text   Target
TEXT   One
TEXT   Two
TEXT   Three
TEXT   Two
...    ...
If the training set contains items with only 2 out of 3 possible target values, then the model will be trained as a 2-class classification model. You need to provide all 3 classes (preferably in roughly equal proportion) to train it as a 3-class classification model.
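If retraining with all three classes is not possible, another option (a sketch, assuming the full label set is known up front, here called ALL_CLASSES) is to re-index the predict_proba output against the complete class list, filling zeros for classes the model never saw:

import numpy as np

ALL_CLASSES = ["One", "Two", "Three"]  # assumed full set of target labels

def full_proba(clf, X):
    # Probabilities only for the classes seen during training
    proba = clf.predict_proba(X)
    out = np.zeros((proba.shape[0], len(ALL_CLASSES)))
    # clf.classes_ lists the classes the model was actually trained on
    for j, cls in enumerate(clf.classes_):
        out[:, ALL_CLASSES.index(cls)] = proba[:, j]
    return out

logits = full_proba(text_clf, X_val)  # shape (n_samples, 3)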

Why, before embedding, do the items have to be sequential, starting at zero?

I am learning collaborative filtering from this blog, Deep Learning With Keras: Recommender Systems.
The tutorial is good and the code works well. Here is my code.
There is one thing that confuses me. The author said:
The user/movie fields are currently non-sequential integers representing some unique ID for that entity. We need them to be sequential starting at zero to use for modeling (you'll see why later).
from sklearn.preprocessing import LabelEncoder

user_enc = LabelEncoder()
ratings['user'] = user_enc.fit_transform(ratings['userId'].values)
n_users = ratings['user'].nunique()
But he didn't seem to mention the reason, and I don't know why I need to do that. Can someone explain it for me?
Embeddings are assumed to be sequential.
The first argument of Embedding is the input dimension.
So, if an input value exceeds the input dimension, it is ignored.
Embedding assumes that the max value in the input is input dimension - 1 (it starts from 0).
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding?hl=ja
I think it is clearer to explain it with TensorFlow. As an example, the following code will generate embeddings only for the input [4, 3] and will skip the input [7, 8], since the input dimension is 5:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential()
model.add(Embedding(5, 1, input_length=2))
input_array = np.array([[4, 3], [7, 8]])
model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
You can increase the input dimension to 9 and then you will get embeddings for both inputs.
You could set the input dimension to the max value + 1 in the original data set, but this is not efficient.
It is actually similar to one-hot encoding, where sequential data saves a great amount of memory.
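This is exactly what the LabelEncoder step in the tutorial takes care of: it remaps arbitrary IDs to 0..n-1, so the embedding's input dimension only has to be the number of unique users rather than the largest raw ID. A small sketch (the IDs below are made up for illustration):

import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.layers import Embedding

raw_user_ids = np.array([17, 42, 42, 1003, 17])   # non-sequential raw IDs
user_enc = LabelEncoder()
users = user_enc.fit_transform(raw_user_ids)      # -> [0, 1, 1, 2, 0]
n_users = len(user_enc.classes_)                  # 3 unique users

# input_dim can be n_users (3) instead of max(raw_user_ids) + 1 (1004)
user_embedding = Embedding(input_dim=n_users, output_dim=8)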

How to set Keras TimeseriesGenerator to predict the second next value?

Currently I have the following code using TimeseriesGenerator from Keras:
TimeseriesGenerator(train, prediction, length=TIME_STEPS, batch_size=1)
Currently this shifts the prediction one value backwards, so the training data for t will have the output of t+1. That makes sense, but I want to predict t+2, so the training data for t should have the output of t+2.
Is there any way to do it using TimeseriesGenerator?
The quickest solution is to just shift your predictions by 1, i.e.:
TimeseriesGenerator(train[:-1], prediction[1:], length=TIME_STEPS, batch_size=1)
Note that you have to trim the train set, so both datasets have equal lengths.
You can also use the timeseries_dataset_from_array function where you can align the data and targets according to your needs as you can read in the documentation:
data: Numpy array or eager tensor containing consecutive data points (timesteps). Axis 0 is expected to be the time dimension.
targets: Targets corresponding to timesteps in data. It should have same length as data. targets[i] should be the target corresponding to the window that starts at index i (see example 2 below). Pass None if you don't have target data (in this case the dataset will only yield the input data).
So in your case it would be something like this:
tf.keras.preprocessing.timeseries_dataset_from_array(
    train[:-TIME_STEPS-2],
    prediction[TIME_STEPS+2:],
    sequence_length=TIME_STEPS,
    batch_size=1
)

How to load image data by keys in keras?

I've been working on a sequential model which takes images as inputs. However, the difference here is that the input images are actually determined by keys.
For example, the training sequence is (you may assume fi is the frame id of a video):
{ f1, f2, f3, ..., fn }
and the corresponding image sequence is
{ M[f1], M[f2], M[f3], ..., M[fn] }
where M is a map storing {fi->image} mapping.
Suppose in the next batch, my training sequence become
{ f2, f3, ..., fn+1 }
and the image sequence becomes
{ M[f2], M[f3], M[f4], ..., M[fn+1] }
As you can see, if I directly save the image sequences to disk, there is a lot of redundancy (in the above case, M[f2] to M[fn] are saved twice). So it seems necessary that the images are referenced by keys, and thus a standard image data loader class cannot be used directly.
[EDIT]
My model is a 2-class classifier that takes image sequences as input, in which the images are mapped with the frame id (fi). Whether an image sequence is positive or negative is pre-generated in my data_preprocess code.
Positive samples may look like this:
{f3, f4, f5, f6, f7} 1
{f4, f5, f6, f7, f8} 1
{f5, f6, f7, f8, f9} 1
...
While negative samples look like this:
{f1, f2, f3, f4, f5} 0
{f2, f3, f4, f5, f6} 0
{f10, f11, f12, f13, f14} 0
...
So, it is not like an image-classification problem, where an image has exactly one fixed label. In my case, every image will be used many times, and whether a sample is positive or negative is determined by the whole sequence, not by an individual image.
[EDIT II]
The images are frames of N videos and are stored on disk like this:
|-data_root/
|-Video 1/
| |-frame_1_1.jpg
| |-frame_1_2.jpg
| ...
|-Video 2/
| |-frame_2_1.jpg
| |-frame_2_2.jpg
| ...
...
...
|-Video N/
| |-frame_N_1.jpg
| |-frame_N_2.jpg
...
What I'd like to do is, given two sequences of frames/images of scenes, have the model predict whether the two scenes are of the same kind.
Since a video may contain a long time span for each scene, I divide the whole sequence of a scene into a number of non-overlapping sub-sequences (video indexes omitted):
Sequence of scene i: frame_1, frame_2, frame_3, ..., frame_n
Sub-sequence i_1: frame_1, frame_2, frame_3, ..., frame_10
Sub-sequence i_2: frame_11, frame_12, frame_13, ..., frame_20
Sub-sequence i_3: frame_21, frame_22, frame_23, ..., frame_30
...
Then, I randomly generate positive samples Pi (pairs of sub-sequences generated from the same sequence), like:
<Pair of sub-sequences> <Labels>
P1 {sub-sequence i_4, sub-sequence i_2}, 1
P2 {sub-sequence i_3, sub-sequence i_5}, 1
... ...
For negative samples, I generate pairs of sub-sequences (Ni) from different scenes:
<Pair of sub-sequences> <Labels>
N1 {sub-sequence i_1, sub-sequence j_6}, 0
N2 {sub-sequence i_2, sub-sequence j_4}, 0
... ...
It is obvious that one frame/image can occur multiple times in different training samples; e.g. in the above case, both N2 and P1 contain sub-sequence i_2. So I choose to save the generated sample pairs as sequences of frame ids (fi) and, during training, fetch the corresponding frames/images of a sequence by frame id (fi).
How should I do it elegantly with Keras?
Not sure how you build your sequences, but have you considered using the ImageDataGenerator from keras.preprocessing.image?
Once you have built this object with whatever parameters you want, you can call its flow_from_directory(directory_path) method, which returns an iterator. Once you have done this, you can use the filenames attribute of that iterator:
my_generator = ImageDataGenerator(...)
flow = my_generator.flow_from_directory(path_dir)
list_of_file_names = flow.filenames
You now have a mapping between the indexes of the list and its elements (the file paths).
I hope this helps.
EDIT :
From this, you can build a mapping dictionary:
import os

map_images = {str(os.path.splitext(os.path.split(file_path)[1])[0]): file_path for file_path in list_of_file_names}
This takes each file_path retrieved from your image folder using ImageDataGenerator, extracts the file name, removes the file extension and turns the name of the file into a string, which is your frame_id.
You now have a map between frame_id and file_path that you can use with load_img() and img_to_array() from keras.preprocessing.image
The function load_img() is defined like this and returns a PIL Image instance:
def load_img(path, grayscale=False, target_size=None):
"""Loads an image into PIL format.
# Arguments
path: Path to image file
grayscale: Boolean, whether to load the image as grayscale.
target_size: Either 'None' (default to original size)
or tuple of ints '(img_height, img_width)'.
# Returns
A PIL Image instance.
# Raises
ImportError: if PIL is not available.
"""
Then img_to_array() is defined like this and returns a 3D numpy array to feed your model:
def img_to_array(img, dim_ordering='default'):
"""Converts a PIL Image instance to a Numpy array.
# Arguments
img: PIL Image instance.
dim_ordering: Image data format.
# Returns
A 3D Numpy array.
# Raises
ValueError: if invalid 'img' or 'dim_ordering' is passed.
"""
So to summarize: first build a mapping between your frame_id and the path of the corresponding file, then load the file using load_img() and img_to_array(). I hope I have understood your question correctly!
EDIT 2:
Seeing your new edit, now that I understand the structure of your file system, we can even add the video id to your dictionary like this:
# list of the video_id of each frame
videos = flow.classes
# mapping of frame_id to (path_of_file, vid_id)
map_images = {str(os.path.splitext(os.path.split(file_path)[1])[0]): (file_path, vid_id) for file_path, vid_id in zip(list_of_file_names, videos)}
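To tie this together with the sequence pairs described in the question, one possible approach (a sketch, assuming you have a list of ((seq_a_ids, seq_b_ids), label) samples from your pre-processing step and a frame_id -> file_path map such as map_images above, using just the path) is a keras.utils.Sequence that loads the images on the fly for each batch:

import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.preprocessing.image import load_img, img_to_array

class FramePairSequence(Sequence):
    # samples:   list of ((seq_a_ids, seq_b_ids), label) tuples (assumed pre-generated)
    # frame_map: dict mapping frame_id -> file_path
    def __init__(self, samples, frame_map, batch_size=8, target_size=(224, 224)):
        self.samples = samples
        self.frame_map = frame_map
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        return int(np.ceil(len(self.samples) / self.batch_size))

    def _load_sequence(self, frame_ids):
        # Fetch each frame by its id and stack into (seq_len, H, W, 3)
        imgs = [img_to_array(load_img(self.frame_map[fid], target_size=self.target_size))
                for fid in frame_ids]
        return np.stack(imgs)

    def __getitem__(self, idx):
        batch = self.samples[idx * self.batch_size:(idx + 1) * self.batch_size]
        seq_a = np.stack([self._load_sequence(a) for (a, b), _ in batch])
        seq_b = np.stack([self._load_sequence(b) for (a, b), _ in batch])
        labels = np.array([label for _, label in batch])
        return [seq_a, seq_b], labels

An instance of this class can then be passed directly to model.fit(), so only the frame-id sequences and the path map are stored on disk, never duplicated images.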
