Why are all of MaskRCNN's segm IoU metrics 0? - pytorch

When training a MaskRCNN on my custom multi-class instance segmentation data set, given an input formatted as:
image -> shape: torch.Size([3, 850, 600]), dtype: torch.float32, min: tensor(0.0431), max: tensor(0.9137)
boxes -> shape: torch.Size([4, 4]), dtype: torch.float32, min: tensor(47.), max: tensor(807.)
masks -> shape: torch.Size([850, 600, 600]), dtype: torch.uint8, min: tensor(0, dtype=torch.uint8), max: tensor(1, dtype=torch.uint8)
areas -> shape: torch.Size([4]), dtype: torch.float32, min: tensor(1479.), max: tensor(8014.)
labels -> shape: torch.Size([4]), dtype: torch.int64, min: tensor(1), max: tensor(1)
iscrowd -> shape: torch.Size([4]), dtype: torch.int64, min: tensor(0), max: tensor(0)
I consistently obtain all segmentation IoU metrics as shown below:
DONE (t=0.03s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.004
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.010
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.001
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
How should I think about this, debug it, and fix it?

Your input image size is (850, 600) (H, W), and this image contains 4 objects, so you need 4 masks of shape (850, 600), not 850 masks of shape (600, 600).
Your masks tensor should therefore have shape (number of objects, 850, 600), and your input should be:
image -> shape: torch.Size([3, 850, 600]), dtype: torch.float32, min: tensor(0.0431), max: tensor(0.9137)
boxes -> shape: torch.Size([4, 4]), dtype: torch.float32, min: tensor(47.), max: tensor(807.)
masks -> shape: torch.Size([4, 850, 600]), dtype: torch.uint8, min: tensor(0, dtype=torch.uint8), max: tensor(1, dtype=torch.uint8)
areas -> shape: torch.Size([4]), dtype: torch.float32, min: tensor(1479.), max: tensor(8014.)
labels -> shape: torch.Size([4]), dtype: torch.int64, min: tensor(1), max: tensor(1)
iscrowd -> shape: torch.Size([4]), dtype: torch.int64, min: tensor(0), max: tensor(0)
How to fix it
Because you are trying to solve an instance segmentation problem, make sure your individual (850, 600) masks are stacked into a single tensor of shape (number of masks, 850, 600).
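For illustration, here is a minimal sketch of that stacking step, assuming a hypothetical list mask_list that holds one binary (850, 600) array per object (it is not a variable from the question):
import torch
# stack N individual (H, W) masks into a single (N, H, W) uint8 tensor
masks = torch.stack([torch.as_tensor(m, dtype=torch.uint8) for m in mask_list])
print(masks.shape)  # torch.Size([4, 850, 600]) when the image has 4 objects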

Related

Why not look at the precision and recall of both classes combined in a classification report?

I was looking at the classification report from sklearn. I am wondering, why did they omit a potential third row with precision and recall values for both classes together? Why were they split apart, and what's the disadvantage to considering these metrics with both classes combined?
"Precision and recall values for both classes together" is contained in the classification_report as macro averages and weighted averages for precision, recall, and f1-score.
Compare the precision column in classification_report to the values computed by calling precision_score(y_true, y_pred):
from sklearn.metrics import classification_report
from sklearn.metrics import precision_score
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 0]
print(classification_report(y_true, y_pred))
print(round(precision_score(y_true, y_pred, average='macro'), 2))
print(round(precision_score(y_true, y_pred, average='weighted'), 2))
Running this results in the following. Notice that macro-averaged precision is 0.64 and weighted-average precision is 0.67, and both those are listed in the bottom rows of the table:
              precision    recall  f1-score   support
           0       0.43      0.60      0.50         5
           1       0.50      0.57      0.53         7
           2       1.00      0.57      0.73         7
    accuracy                           0.58        19
   macro avg       0.64      0.58      0.59        19
weighted avg       0.67      0.58      0.60        19
0.64
0.67
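As a quick extension of the same check (not part of the original answer), recall and f1-score behave the same way; the printed values match the "macro avg" and "weighted avg" rows of the report above:
from sklearn.metrics import recall_score, f1_score
print(round(recall_score(y_true, y_pred, average='macro'), 2))     # macro avg recall
print(round(recall_score(y_true, y_pred, average='weighted'), 2))  # weighted avg recall
print(round(f1_score(y_true, y_pred, average='macro'), 2))         # macro avg f1-score
print(round(f1_score(y_true, y_pred, average='weighted'), 2))      # weighted avg f1-score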

How to create this year_month sales and previous year_month sales in two different columns?

I need to create two different columns from transaction-level data: one for the current year_month's sales and one for the previous year_month's sales.
Data format:-
Date | bill amount
2019-07-22 | 500
2019-07-25 | 200
2020-11-15 | 100
2020-11-06 | 900
2020-12-09 | 50
2020-12-21 | 600
Required format:-
Year_month |This month Sales | Prev month sales
2019_07 | 700 | -
2020_11 | 1000 | -
2020_12 | 650 | 1000
The relatively tricky bit is figuring out what the previous month is. We do it by computing the beginning of the month for each date and then rolling back by one month; note that this also handles the January -> December-of-the-previous-year wraparound.
We start by importing some useful modules and creating a sample dataframe.
import pandas as pd
from io import StringIO
from datetime import datetime
from dateutil.relativedelta import relativedelta
data = StringIO(
"""
date|amount
2019-07-22|500
2019-07-25|200
2020-11-15|100
2020-11-06|900
2020-12-09|50
2020-12-21|600
""")
df = pd.read_csv(data,sep='|')
df['date'] = pd.to_datetime(df['date'])
df
we get
date amount
0 2019-07-22 500
1 2019-07-25 200
2 2020-11-15 100
3 2020-11-06 900
4 2020-12-09 50
5 2020-12-21 600
Then we figure out the month start and the previous month start using datetime utilities
df['month_start'] = df['date'].apply(lambda d:datetime(year = d.year, month = d.month, day = 1))
df['prev_month_start'] = df['month_start'].apply(lambda d:d+relativedelta(months = -1))
Then we summarize monthly sales using groupby on month start
ms_df = df.drop(columns = 'date').groupby('month_start').agg({'prev_month_start':'first','amount':sum}).reset_index()
ms_df
so we get
month_start prev_month_start amount
0 2019-07-01 2019-06-01 700
1 2020-11-01 2020-10-01 1000
2 2020-12-01 2020-11-01 650
Then we join (merge) ms_df on itself by mapping 'prev_month_start' to 'month_start'
ms_df2 = ms_df.merge(ms_df, left_on='prev_month_start', right_on='month_start', how = 'left', suffixes = ('','_prev'))
We are more or less there but now make it pretty by getting rid of superfluous columns, adding labels, etc
ms_df2['label'] = ms_df2['month_start'].dt.strftime('%Y_%m')
ms_df2 = ms_df2.drop(columns = ['month_start','prev_month_start','month_start_prev','prev_month_start_prev'])
columns = ['label','amount','amount_prev']
ms_df2 = ms_df2[columns]
and we get
| | label | amount | amount_prev |
|---:|--------:|---------:|--------------:|
| 0 | 2019_07 | 700 | nan |
| 1 | 2020_11 | 1000 | nan |
| 2 | 2020_12 | 650 | 1000 |
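If you also want the exact column headers from the required format (my assumption about the desired presentation, not part of the original answer), a final rename does it:
ms_df2 = ms_df2.rename(columns={'label': 'Year_month',
                                'amount': 'This month Sales',
                                'amount_prev': 'Prev month sales'})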
Using @piterbarg's data, we can use resample, combined with shift and concat, to get your desired data:
import pandas as pd
from io import StringIO
data = StringIO(
"""
date|amount
2019-07-22|500
2019-07-25|200
2020-11-15|100
2020-11-06|900
2020-12-09|50
2020-12-21|600
"""
)
df = pd.read_csv(data, sep="|", parse_dates=["date"])
df
date amount
0 2019-07-22 500
1 2019-07-25 200
2 2020-11-15 100
3 2020-11-06 900
4 2020-12-09 50
5 2020-12-21 600
Get the sum for current sales:
data = df.resample(on="date", rule="1M").amount.sum().rename("This_month")
data
date
2019-07-31 700
2019-08-31 0
2019-09-30 0
2019-10-31 0
2019-11-30 0
2019-12-31 0
2020-01-31 0
2020-02-29 0
2020-03-31 0
2020-04-30 0
2020-05-31 0
2020-06-30 0
2020-07-31 0
2020-08-31 0
2020-09-30 0
2020-10-31 0
2020-11-30 1000
2020-12-31 650
Freq: M, Name: This_month, dtype: int64
Now, we can shift the month to get values for previous month, and drop rows that have 0 as total sales to get your final output:
(pd.concat([data, data.shift().rename("previous_month")], axis=1)
   .query("This_month != 0")
   .fillna(0))
This_month previous_month
date
2019-07-31 700 0.0
2020-11-30 1000 0.0
2020-12-31 650 1000.0
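If you prefer the Year_month labels from the required format (my assumption, not part of the original answer), you can relabel the date index after the concat step above:
result = (pd.concat([data, data.shift().rename("previous_month")], axis=1)
            .query("This_month != 0")
            .fillna(0))
result.index = result.index.strftime("%Y_%m")  # e.g. "2019_07" instead of 2019-07-31
result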

gnuplot frequency table stats

I have a data frequency table and would like to calculate its mean and standard deviation. The first column is the frequency and the second is the data value. The mean I need is the frequency-weighted one, (446*0 + 864*1 + 277*2 + ... + 1*12)/(446 + 864 + 277 + ... + 1) = ~1.35, yet when I use gnuplot stats it only gives me the statistics of the separate columns. How can I change my code so that it gives me the output that I want?
Data table:
446 0
864 1
277 2
111 3
62 4
32 5
19 6
9 7
8 8
3 10
3 11
1 12
Gnuplot code:
stats "$input" using 2:1
Output:
* FILE:
Records: 12
Out of range: 0
Invalid: 0
Column headers: 0
Blank: 0
Data Blocks: 1
* COLUMNS:
Mean: 5.7500 152.9167
Std Dev: 3.7887 251.5374
Sample StdDev: 3.9572 262.7223
Skewness: 0.1569 1.9131
Kurtosis: 1.8227 5.5436
Avg Dev: 3.2500 188.0417
Sum: 69.0000 1835.0000
Sum Sq.: 569.0000 1.03986e+06
Mean Err.: 1.0937 72.6126
Std Dev Err.: 0.7734 51.3449
Skewness Err.: 0.7071 0.7071
Kurtosis Err.: 1.4142 1.4142
Minimum: 0.0000 [ 0] 1.0000 [11]
Maximum: 12.0000 [11] 864.0000 [ 1]
Quartile: 2.5000 5.5000
Median: 5.5000 25.5000
Quartile: 9.0000 194.0000
Linear Model: y = -46.89 x + 422.5
Slope: -46.89 +- 14.86
Intercept: 422.5 +- 102.4
Correlation: r = -0.7062
Sum xy: 2475
Try this:
Code:
### special mean
reset session
$Data <<EOD
446 0
864 1
277 2
111 3
62 4
32 5
19 6
9 7
8 8
3 10
3 11
1 12
EOD
stats $Data u ($1*$2):1    # x = frequency*value, y = frequency
print STATS_sum_x, STATS_sum_y
print STATS_sum_x/STATS_sum_y    # weighted mean = sum(f*v) / sum(f)
### end of code
Result:
* FILE:
Records: 12
Out of range: 0
Invalid: 0
Column headers: 0
Blank: 0
Data Blocks: 1
* COLUMNS:
Mean: 206.2500 152.9167
Std Dev: 252.3441 251.5374
Sample StdDev: 263.5648 262.7223
Skewness: 1.5312 1.9131
Kurtosis: 4.2761 5.5436
Avg Dev: 195.6667 188.0417
Sum: 2475.0000 1835.0000
Sum Sq.: 1.27460e+06 1.03986e+06
Mean Err.: 72.8455 72.6126
Std Dev Err.: 51.5095 51.3449
Skewness Err.: 0.7071 0.7071
Kurtosis Err.: 1.4142 1.4142
Minimum: 0.0000 [ 0] 1.0000 [11]
Maximum: 864.0000 [ 1] 864.0000 [ 1]
Quartile: 31.5000 5.5000
Median: 89.0000 25.5000
Quartile: 290.5000 194.0000
Linear Model: y = 0.7622 x - 4.279
Slope: 0.7622 +- 0.2032
Intercept: -4.279 +- 66.21
Correlation: r = 0.7646
Sum xy: 9.609e+05
Your values:
2475.0 1835.0
1.34877384196185
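As a sanity check outside gnuplot (my own addition, not part of the original answer), the same weighted mean and a frequency-weighted standard deviation can be computed in a few lines of Python; freq and value mirror the two columns of the data table above:
freq  = [446, 864, 277, 111, 62, 32, 19, 9, 8, 3, 3, 1]
value = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12]
n = sum(freq)                                        # 1835 observations in total
mean = sum(f * v for f, v in zip(freq, value)) / n   # 2475 / 1835, i.e. about 1.3488
var = sum(f * (v - mean) ** 2 for f, v in zip(freq, value)) / n
print(mean, var ** 0.5)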

Plot Shaded Error Bars from Pandas Agg

I have data in the following format:
| | Measurement 1 | | Measurement 2 | |
|------|---------------|------|---------------|------|
| | Mean | Std | Mean | Std |
| Time | | | | |
| 0 | 17 | 1.10 | 21 | 1.33 |
| 1 | 16 | 1.08 | 21 | 1.34 |
| 2 | 14 | 0.87 | 21 | 1.35 |
| 3 | 11 | 0.86 | 21 | 1.33 |
I am using the following code to generate a matplotlib line graph from this data, which shows the standard deviation as a filled in area, see below:
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
def seconds_to_minutes(x, pos):
    minutes = f'{round(x/60, 0)}'
    return minutes
fig, ax = plt.subplots()
mean_temperature_over_time['Measurement 1']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 1']['std'], alpha=0.15, ax=ax)
mean_temperature_over_time['Measurement 2']['mean'].plot(kind='line', yerr=mean_temperature_over_time['Measurement 2']['std'], alpha=0.15, ax=ax)
ax.set(title="A Line Graph with Shaded Error Regions", xlabel="x", ylabel="y")
formatter = FuncFormatter(seconds_to_minutes)
ax.xaxis.set_major_formatter(formatter)
ax.grid()
ax.legend(['Mean 1', 'Mean 2'])
Output:
This seems like a very messy solution, and only actually produces shaded output because I have so much data. What is the correct way to produce a line graph from the dataframe I have with shaded error regions? I've looked at Plot yerr/xerr as shaded region rather than error bars, but am unable to adapt it for my case.
What's wrong with the linked solution? It seems pretty straightforward.
Allow me to rearrange your dataset so it's easier to load into a Pandas DataFrame:
Time Measurement Mean Std
0 0 1 17 1.10
1 1 1 16 1.08
2 2 1 14 0.87
3 3 1 11 0.86
4 0 2 21 1.33
5 1 2 21 1.34
6 2 2 21 1.35
7 3 2 21 1.33
for i, m in df.groupby("Measurement"):
    ax.plot(m.Time, m.Mean)
    ax.fill_between(m.Time, m.Mean - m.Std, m.Mean + m.Std, alpha=0.35)
And here's the result with some randomly generated data:
EDIT
Since the issue is apparently iterating over your particular dataframe format, let me show how you could do it (I'm new to pandas so there may be better ways). If I understood your screenshot correctly, you should have something like:
Measurement 1 2
Mean Std Mean Std
Time
0 17 1.10 21 1.33
1 16 1.08 21 1.34
2 14 0.87 21 1.35
3 11 0.86 21 1.33
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
(1, Mean) 4 non-null int64
(1, Std) 4 non-null float64
(2, Mean) 4 non-null int64
(2, Std) 4 non-null float64
dtypes: float64(2), int64(2)
memory usage: 160.0 bytes
df.columns
MultiIndex(levels=[[1, 2], [u'Mean', u'Std']],
           labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
           names=[u'Measurement', None])
And you should be able to iterate over it and obtain the same plot:
for i, m in df.groupby("Measurement"):
    ax.plot(m["Time"], m['Mean'])
    ax.fill_between(m["Time"],
                    m['Mean'] - m['Std'],
                    m['Mean'] + m['Std'], alpha=0.35)
Or you could restack it to the format above with
(df.stack("Measurement") # stack "Measurement" columns row by row
.reset_index() # make "Time" a normal column, add a new index
.sort_values("Measurement") # group values from the same Measurement
.reset_index(drop=True)) # drop sorted index and make a new one
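For a self-contained version (my own sketch, not the answerer's exact code), you can rebuild the MultiIndex frame from the table in the question and plot each measurement's mean with a shaded +/- std band via fill_between:
import pandas as pd
import matplotlib.pyplot as plt
columns = pd.MultiIndex.from_product([[1, 2], ['Mean', 'Std']], names=['Measurement', None])
df = pd.DataFrame([[17, 1.10, 21, 1.33],
                   [16, 1.08, 21, 1.34],
                   [14, 0.87, 21, 1.35],
                   [11, 0.86, 21, 1.33]],
                  index=pd.Index(range(4), name='Time'), columns=columns)
fig, ax = plt.subplots()
for i in [1, 2]:  # the two measurements
    mean, std = df[(i, 'Mean')], df[(i, 'Std')]
    ax.plot(df.index, mean, label=f'Mean {i}')
    ax.fill_between(df.index, mean - std, mean + std, alpha=0.35)
ax.legend()
plt.show()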

How is this infinite list computed?

The problem is the following:
Define a Haskell variable dollars that is the infinite list of amounts
of money you have every year, assuming you start with $100 and get
paid 5% interest, compounded yearly. (Ignore inflation, deflation,
taxes, bailouts, the possibility of total economic collapse, and other
such details.) So dollars should be equal to: [100.0, 105.0, 110.25,
...].
My solution is the following and it works:
dollars::[Double]
dollars = 100.0 : [1.05 * x | x<- dollars ]
The problem is that I have trouble understanding how the list is computed practically:
dollars= 100.0 : [1.05 * x | x<- dollars ]
= 100.0 : [1.05 * x | x<- 100.0 : [1.05 * x | x<- dollars ] ]
= 100.0 : (1.05 * 100.0) : [1.05 * x | x<- [1.05 * x | x<- dollars ] ]
= 100.0 : 105.0 : [1.05 * x | x<- [1.05 * x | x<- dollars ] ]
= 100.0 : 105.0 : [1.05 * x | x<- [1.05 * x | x<- 100.0 : [1.05 * x | x<- dollars ] ] ]
= 100.0 : 105.0 : [1.05 * x | x<- 105.0:[1.05 * x | x<-[1.05 * x | x<- dollars ] ] ]
= 100.0 : 105.0 : 110.25 :[1.05 * x | x<-[1.05 * x | x<-[1.05 * x | x<- dollars ] ] ]
etc.
Is that how it is computed? If not then how? If yes, is there a simpler way to conceptualize these kinds of computations?
You are pretty much correct. It might help if you de-sugared the list comprehension into a function call. The equivalent is
dollars = 100.0 : map (* 1.05) dollars
This then evaluates to
= 100.0 : let dollars1 = 100 * 1.05 : map (*1.05) dollars1 in dollars1
= 100.0 : 105.0 : let dollars2 = 105 * 1.05 : map (*1.05) dollars2 in dollars2
and so on. I'm using dollars1, dollars2 as identifiers, even though they don't really exist.
That is more or less correct. The order in which the substitutions happen depends on the code that consumes (e.g. prints) the results; the substitutions in the second and third lines could just as well happen in the other order.
