VOC2012: PIL Image.open converts PNG to 2d array - keras

I am working with the VOC2012 dataset. The input image is a PNG that has shape (375, 500, 4) when I open it with imageio, but when I open the same file with PIL the size suddenly becomes (500, 375). I expected a PNG to have four channels on the last axis: r, g, b and alpha.
The image is clearly a colour image, so it should have three dimensions (height, width, depth), yet PIL seems to say it has only two: width and height.
Can a PNG image really be represented by a 2d array? Please help! So lost at the moment. Thanks!
from PIL import Image
from keras.preprocessing.image import img_to_array
import os, imageio
import numpy as np

root_path = '/Users/johnson/Downloads/'

imageio_img = imageio.imread(
    os.path.join(root_path, '2009_003193.png')
)
# (375, 500, 4)
print(imageio_img.shape)
# [  0 128 192 224 255]
print(np.unique(imageio_img))

PIL_img = Image.open(
    os.path.join(root_path, '2009_003193.png')
)
# (500, 375)
print(PIL_img.size)

PIL_img_to_array = img_to_array(PIL_img)
# (375, 500, 1)
print(PIL_img_to_array.shape)
# [  0.   2. 255.]
print(np.unique(PIL_img_to_array))
It's also quite magical that PIL seems to know how VOC2012 labels the data. PIL_img_to_array has unique values of [0, 2, 255]. Conveniently, 2 denotes bicycle in VOC2012, 0 means background, and 255 probably marks the yellowish boundary around the bicycle. But in the first code snippet I never passed the pascal classes to PIL for conversion.
def pascal_classes():
    classes = {'aeroplane' : 1, 'bicycle' : 2, 'bird' : 3, 'boat' : 4,
               'bottle' : 5, 'bus' : 6, 'car' : 7, 'cat' : 8,
               'chair' : 9, 'cow' : 10, 'diningtable' : 11, 'dog' : 12,
               'horse' : 13, 'motorbike' : 14, 'person' : 15, 'potted-plant' : 16,
               'sheep' : 17, 'sofa' : 18, 'train' : 19, 'tv/monitor' : 20}
    return classes
def pascal_palette():
    palette = {(  0,   0,   0) : 0,
               (128,   0,   0) : 1,
               (  0, 128,   0) : 2,
               (128, 128,   0) : 3,
               (  0,   0, 128) : 4,
               (128,   0, 128) : 5,
               (  0, 128, 128) : 6,
               (128, 128, 128) : 7,
               ( 64,   0,   0) : 8,
               (192,   0,   0) : 9,
               ( 64, 128,   0) : 10,
               (192, 128,   0) : 11,
               ( 64,   0, 128) : 12,
               (192,   0, 128) : 13,
               ( 64, 128, 128) : 14,
               (192, 128, 128) : 15,
               (  0,  64,   0) : 16,
               (128,  64,   0) : 17,
               (  0, 192,   0) : 18,
               (128, 192,   0) : 19,
               (  0,  64, 128) : 20}
    return palette

Your image is palettised, not RGB. Each pixel is represented by an 8-bit index into a palette. You can see this by looking at image.mode, which shows up as P.
If you want an RGB image, use:
rgb = Image.open('bike.png').convert('RGB')
If you want an RGBA image with transparency, use:
RGBA = Image.open('bike.png').convert('RGBA')
However, there is no useful information in the alpha channel, so that seems pointless.
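Conversely, if what you actually want is the label mask for training (rather than colours), you can keep the image in P mode and read the raw palette indices, which for VOC2012 segmentation masks are the class ids themselves. A minimal sketch, assuming the same 2009_003193.png file from the question:
import numpy as np
from PIL import Image

# Open the palettised label image; each pixel is a palette index,
# i.e. the VOC class id, not a colour.
label = Image.open('2009_003193.png')   # mode 'P'
mask = np.array(label)                  # shape (375, 500), dtype uint8

print(mask.shape)       # (375, 500)
print(np.unique(mask))  # e.g. [  0   2 255] -> background, bicycle, void border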
Regarding the pascal palette, you can get that via PIL like this:
im = Image.open('bike.png')
p = im.getpalette()
for i in range(256):
    print(p[3*i:3*i+3])
[0, 0, 0]
[128, 0, 0]
[0, 128, 0]
[128, 128, 0]
[0, 0, 128]
[128, 0, 128]
[0, 128, 128]
[128, 128, 128]
[64, 0, 0]
[192, 0, 0]
[64, 128, 0]
[192, 128, 0]
[64, 0, 128]
[192, 0, 128]
[64, 128, 128]
[192, 128, 128]
[0, 64, 0]
[128, 64, 0]
[0, 192, 0]
[128, 192, 0]
[0, 64, 128]
[128, 64, 128]
[0, 192, 128]
[128, 192, 128]
[64, 64, 0]
[192, 64, 0]
[64, 192, 0]
[192, 192, 0]
[64, 64, 128]
[192, 64, 128]
[64, 192, 128]
[192, 192, 128]
[0, 0, 64]
[128, 0, 64]
[0, 128, 64]
[128, 128, 64]
[0, 0, 192]
[128, 0, 192]
[0, 128, 192]
[128, 128, 192]
[64, 0, 64]
[192, 0, 64]
[64, 128, 64]
[192, 128, 64]
[64, 0, 192]
[192, 0, 192]
[64, 128, 192]
[192, 128, 192]
[0, 64, 64]
[128, 64, 64]
[0, 192, 64]
[128, 192, 64]
[0, 64, 192]
[128, 64, 192]
[0, 192, 192]
[128, 192, 192]
[64, 64, 64]
[192, 64, 64]
[64, 192, 64]
[192, 192, 64]
[64, 64, 192]
[192, 64, 192]
[64, 192, 192]
[192, 192, 192]
[32, 0, 0]
[160, 0, 0]
[32, 128, 0]
[160, 128, 0]
[32, 0, 128]
[160, 0, 128]
[32, 128, 128]
[160, 128, 128]
[96, 0, 0]
[224, 0, 0]
[96, 128, 0]
[224, 128, 0]
[96, 0, 128]
[224, 0, 128]
[96, 128, 128]
[224, 128, 128]
[32, 64, 0]
[160, 64, 0]
[32, 192, 0]
[160, 192, 0]
[32, 64, 128]
[160, 64, 128]
[32, 192, 128]
[160, 192, 128]
[96, 64, 0]
[224, 64, 0]
[96, 192, 0]
[224, 192, 0]
[96, 64, 128]
[224, 64, 128]
[96, 192, 128]
[224, 192, 128]
[32, 0, 64]
[160, 0, 64]
[32, 128, 64]
[160, 128, 64]
[32, 0, 192]
[160, 0, 192]
[32, 128, 192]
[160, 128, 192]
[96, 0, 64]
[224, 0, 64]
[96, 128, 64]
[224, 128, 64]
[96, 0, 192]
[224, 0, 192]
[96, 128, 192]
[224, 128, 192]
[32, 64, 64]
[160, 64, 64]
[32, 192, 64]
[160, 192, 64]
[32, 64, 192]
[160, 64, 192]
[32, 192, 192]
[160, 192, 192]
[96, 64, 64]
[224, 64, 64]
[96, 192, 64]
[224, 192, 64]
[96, 64, 192]
[224, 64, 192]
[96, 192, 192]
[224, 192, 192]
[0, 32, 0]
[128, 32, 0]
[0, 160, 0]
[128, 160, 0]
[0, 32, 128]
[128, 32, 128]
[0, 160, 128]
[128, 160, 128]
[64, 32, 0]
[192, 32, 0]
[64, 160, 0]
[192, 160, 0]
[64, 32, 128]
[192, 32, 128]
[64, 160, 128]
[192, 160, 128]
[0, 96, 0]
[128, 96, 0]
[0, 224, 0]
[128, 224, 0]
[0, 96, 128]
[128, 96, 128]
[0, 224, 128]
[128, 224, 128]
[64, 96, 0]
[192, 96, 0]
[64, 224, 0]
[192, 224, 0]
[64, 96, 128]
[192, 96, 128]
[64, 224, 128]
[192, 224, 128]
[0, 32, 64]
[128, 32, 64]
[0, 160, 64]
[128, 160, 64]
[0, 32, 192]
[128, 32, 192]
[0, 160, 192]
[128, 160, 192]
[64, 32, 64]
[192, 32, 64]
[64, 160, 64]
[192, 160, 64]
[64, 32, 192]
[192, 32, 192]
[64, 160, 192]
[192, 160, 192]
[0, 96, 64]
[128, 96, 64]
[0, 224, 64]
[128, 224, 64]
[0, 96, 192]
[128, 96, 192]
[0, 224, 192]
[128, 224, 192]
[64, 96, 64]
[192, 96, 64]
[64, 224, 64]
[192, 224, 64]
[64, 96, 192]
[192, 96, 192]
[64, 224, 192]
[192, 224, 192]
[32, 32, 0]
[160, 32, 0]
[32, 160, 0]
[160, 160, 0]
[32, 32, 128]
[160, 32, 128]
[32, 160, 128]
[160, 160, 128]
[96, 32, 0]
[224, 32, 0]
[96, 160, 0]
[224, 160, 0]
[96, 32, 128]
[224, 32, 128]
[96, 160, 128]
[224, 160, 128]
[32, 96, 0]
[160, 96, 0]
[32, 224, 0]
[160, 224, 0]
[32, 96, 128]
[160, 96, 128]
[32, 224, 128]
[160, 224, 128]
[96, 96, 0]
[224, 96, 0]
[96, 224, 0]
[224, 224, 0]
[96, 96, 128]
[224, 96, 128]
[96, 224, 128]
[224, 224, 128]
[32, 32, 64]
[160, 32, 64]
[32, 160, 64]
[160, 160, 64]
[32, 32, 192]
[160, 32, 192]
[32, 160, 192]
[160, 160, 192]
[96, 32, 64]
[224, 32, 64]
[96, 160, 64]
[224, 160, 64]
[96, 32, 192]
[224, 32, 192]
[96, 160, 192]
[224, 160, 192]
[32, 96, 64]
[160, 96, 64]
[32, 224, 64]
[160, 224, 64]
[32, 96, 192]
[160, 96, 192]
[32, 224, 192]
[160, 224, 192]
[96, 96, 64]
[224, 96, 64]
[96, 224, 64]
[224, 224, 64]
[96, 96, 192]
[224, 96, 192]
[96, 224, 192]
[224, 224, 192]
Then, if you want to make the bicycle red, you can do:
# Load the image and make Numpy version
im = Image.open('bike.png')
n = np.array(im)
# Make all pixels belonging to bike (2) into red (palette index 9)
n[n==2] = 9
# Make all pixels not red (9) into grey (palette index 7)
n[n!=9] = 7
# Convert back into PIL palettised image and re-apply original palette
r = Image.fromarray(n,mode='P')
r.putpalette(im.getpalette())
r.save('result.png')
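If you also want to turn the indices back into the VOC class names, you can invert the pascal_classes dictionary from the question. A small sketch, not part of the original answer, assuming pascal_classes() is in scope:
# Invert the name -> index mapping so we can look up a class name
# for every index that actually occurs in the mask.
classes = pascal_classes()
index_to_name = {v: k for k, v in classes.items()}
index_to_name[0] = 'background'
index_to_name[255] = 'void'   # boundary label used by VOC

for idx in np.unique(np.array(Image.open('bike.png'))):
    print(idx, index_to_name.get(idx, 'unknown'))
# 0 background
# 2 bicycle
# 255 void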
Keywords: Python, PIL, Pillow, image processing, palette, palette operations, masked image, mask, extract palette, apply palette.

Related

Problem with pytorch hooks? Activation maps always positive

I was looking at the activation maps of vgg19 in pytorch.
I found that all the values of the maps are positive, even before the ReLU is applied.
This seems very strange to me... If this were correct (or could it be that I did not use the register_forward_hook method correctly?), why would one apply ReLU at all?
This is my code to produce this:
import torch
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
from torchsummary import summary
import os, glob
import matplotlib.pyplot as plt
import numpy as np

# settings:
batch_size = 4

# load the model
model = models.vgg19(pretrained=True)
summary(model.cuda(), (3, 32, 32))
model.cpu()

# how to preprocess??? See here:
# https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683/2
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
transform = transforms.Compose(
    [transforms.ToTensor(),
     normalize])

# build data loader
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

# show one image
dataiter = iter(trainloader)
images, labels = dataiter.next()

# set a hook
activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

# hook at the first conv layer
hook = model.features[0].register_forward_hook(get_activation("firstConv"))
model(images)
hook.remove()

# show results:
flatted_feat_maps = activation["firstConv"].detach().numpy().flatten()
print("All positiv??? --> ", np.all(flatted_feat_maps >= 0))
plt.hist(flatted_feat_maps)
plt.show()
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 32, 32] 1,792
ReLU-2 [-1, 64, 32, 32] 0
Conv2d-3 [-1, 64, 32, 32] 36,928
ReLU-4 [-1, 64, 32, 32] 0
MaxPool2d-5 [-1, 64, 16, 16] 0
Conv2d-6 [-1, 128, 16, 16] 73,856
ReLU-7 [-1, 128, 16, 16] 0
Conv2d-8 [-1, 128, 16, 16] 147,584
ReLU-9 [-1, 128, 16, 16] 0
MaxPool2d-10 [-1, 128, 8, 8] 0
Conv2d-11 [-1, 256, 8, 8] 295,168
ReLU-12 [-1, 256, 8, 8] 0
Conv2d-13 [-1, 256, 8, 8] 590,080
ReLU-14 [-1, 256, 8, 8] 0
Conv2d-15 [-1, 256, 8, 8] 590,080
ReLU-16 [-1, 256, 8, 8] 0
Conv2d-17 [-1, 256, 8, 8] 590,080
ReLU-18 [-1, 256, 8, 8] 0
MaxPool2d-19 [-1, 256, 4, 4] 0
Conv2d-20 [-1, 512, 4, 4] 1,180,160
ReLU-21 [-1, 512, 4, 4] 0
Conv2d-22 [-1, 512, 4, 4] 2,359,808
ReLU-23 [-1, 512, 4, 4] 0
Conv2d-24 [-1, 512, 4, 4] 2,359,808
ReLU-25 [-1, 512, 4, 4] 0
Conv2d-26 [-1, 512, 4, 4] 2,359,808
ReLU-27 [-1, 512, 4, 4] 0
MaxPool2d-28 [-1, 512, 2, 2] 0
Conv2d-29 [-1, 512, 2, 2] 2,359,808
ReLU-30 [-1, 512, 2, 2] 0
Conv2d-31 [-1, 512, 2, 2] 2,359,808
ReLU-32 [-1, 512, 2, 2] 0
Conv2d-33 [-1, 512, 2, 2] 2,359,808
ReLU-34 [-1, 512, 2, 2] 0
Conv2d-35 [-1, 512, 2, 2] 2,359,808
ReLU-36 [-1, 512, 2, 2] 0
MaxPool2d-37 [-1, 512, 1, 1] 0
AdaptiveAvgPool2d-38 [-1, 512, 7, 7] 0
Linear-39 [-1, 4096] 102,764,544
ReLU-40 [-1, 4096] 0
Dropout-41 [-1, 4096] 0
Linear-42 [-1, 4096] 16,781,312
ReLU-43 [-1, 4096] 0
Dropout-44 [-1, 4096] 0
Linear-45 [-1, 1000] 4,097,000
================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 5.25
Params size (MB): 548.05
Estimated Total Size (MB): 553.31
----------------------------------------------------------------
Could it be that I somehow did not use the register_forward_hook correctly?
You should clone the output in the hook:
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach().clone()  # clone so the stored tensor owns its own storage
    return hook
Note that Tensor.detach only detaches the tensor from the graph, but both tensors will still share the same underlying storage.
Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.
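A minimal toy sketch (not the VGG code itself) of why the clone matters here: torchvision's VGG applies ReLU(inplace=True) right after each convolution, and because the detached tensor still aliases the hooked layer's storage, that in-place ReLU rewrites the values the hook saved, which is why the stored "pre-ReLU" maps all end up non-negative.
import torch

y = torch.randn(5)

detached_view = y.detach()      # no copy: shares the same storage as y
safe_copy = y.detach().clone()  # clone makes an independent copy

# simulate what an in-place ReLU further down the network would do
y.relu_()

print(detached_view)  # now all >= 0, because it aliases y's storage
print(safe_copy)      # unchanged, may still contain negative values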

Get the length of every sentence before padding in torchtext bucketiterator

Is it possible to get the length of every sentence before padding in a torchtext BucketIterator:
train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False, sort_within_batch=True, device = device)
bucketiterator dataloader:
inputs: tensor([[ 34, 87, 2, ..., 227, 239, 263],
[ 138, 7, 1006, ..., 840, 142, 665],
[ 549, 4, 1028, ..., 11, 14, 4],
...,
[ 1, 1, 5, ..., 66, 23, 13],
[ 1, 1, 1062, ..., 177, 252, 1587],
[ 1, 1, 66, ..., 553, 52, 73]]), shape: torch.Size([64, 91])
Like when using pytorch dataloader:
train_loader = data.DataLoader(train_data, batch_size = 64, shuffle=True, collate_fn=padding)
def padding(batch):
    doc = [doc['input'] for doc in batch]
    len_doc = [len(doc['input']) for doc in batch]
    doc_pad = pad_sequence(doc, batch_first=True, padding_value=0)
    return doc_pad, len_doc
pytorch dataloader:
inputs: tensor([[ 2, 1396, 2686, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0],
[ 2, 2018, 2597, ..., 0, 0, 0],
...,
[ 2, 1546, 1623, ..., 0, 0, 0],
[ 2, 1435, 1396, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0]]), shape: torch.Size([64, 40])
inputs_len_before_padding: tensor([18, 8, 21, 16, 16, 12, 40, 12, 9, 12, 17, 12, 17, 15, 16, 12, 8, 24,
25, 10, 22, 8, 8, 13, 12, 22, 17, 14, 21, 14, 19, 13, 21, 8, 28, 16,
31, 24, 23, 19, 10, 7, 16, 12, 16, 12, 17, 12, 18, 11, 8, 13, 17, 14,
11, 13, 13, 20, 8, 12, 22, 7, 9, 11]), shape: torch.Size([64])
Here is a minimal example that uses torchtext.data.Field and torchtext.data.BucketIterator:
import torchtext.data as data

# sample data
text = [
    'This is sentence 1.',
    'This sentence is a bit longer than the previous sentence.'
]

# define field -- notice include_lengths is set to True
text_field = data.Field(include_lengths=True, tokenize=lambda x: x.split())
fields = [('text', text_field)]

# create dataset and build vocabulary
examples = [data.Example.fromlist([t], fields) for t in text]
dataset = data.Dataset(examples, fields)
text_field.build_vocab(dataset)

# create iterator
data_iter = data.BucketIterator(dataset, batch_size=2, shuffle=False)

# the text field will now return both the data tensor and the length of the input text
for x in data_iter:
    print('Data:', x.text[0])
    print('Lengths:', x.text[1])
This should print (data tensor shortened for brevity):
Data: tensor([[ 2, 2],
...
[ 1, 10]])
Lengths: tensor([ 4, 10])
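Those per-sentence lengths are exactly what you would feed into pack_padded_sequence before an RNN. A hedged sketch of that usage, continuing from data_iter above; the embedding and LSTM sizes are made up for illustration:
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# hypothetical sizes, just to show the call pattern
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32, padding_idx=1)
lstm = nn.LSTM(input_size=32, hidden_size=64)

for x in data_iter:
    tokens, lengths = x.text            # (seq_len, batch) tensor plus lengths
    embedded = embedding(tokens)
    packed = pack_padded_sequence(embedded, lengths.cpu(), enforce_sorted=False)
    output, (h, c) = lstm(packed)       # the padded positions are skipped
    output, _ = pad_packed_sequence(output)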

Looping over list in reverse, removing elements as we go

I'm comparing two lists, A and B. (Hi A, how you doin'? I'm okay, B.) If an item from list A appears in list B, it should be removed from list A. The items being compared are lists of RGB colours.
Only I get an index-out-of-range error, and I don't know why, since I'm looping over the list in reverse.
data = [
[224, 96, 96],
[128, 32, 192], # match
[192, 32, 160], # match
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
for x in list(range(len(data)-1, -1, -1)):
    for y in range(0, len(myColours)):
        if myColours[y] == data[x]:
            print(data[x])
            data.remove(data[x])  # list index out of range
The list data should end up as
data = [
[224, 96, 96]
]
Data will be generated by code, myColours by hand. Their order is not important, just the values.
The error happens because once you remove data[x], the inner loop keeps running (there is no break) and evaluates data[x] again, and x can now be past the end of the shortened list. The simplest fix is a list comprehension:
data = [
[224, 96, 96],
[128, 32, 192], # match
[192, 32, 160], # match
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
data = [item for item in data if item not in myColours]
#to iterate backwards:
#data = [item for item in reversed(data) if item not in myColours]
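If the colour lists get large, a variant of the same idea converts myColours to a set of tuples first, so each membership test is O(1) instead of a linear scan; this sketch assumes you only care about the values, not list identity:
# Convert to tuples because lists are not hashable and cannot go in a set.
unwanted = {tuple(c) for c in myColours}
data = [item for item in data if tuple(item) not in unwanted]
# -> [[224, 96, 96]]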
There are two ways to solve your problem:
first:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207, 159, 30],
[79, 207, 30],
[32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
a = 0
x = len(data)
while a < len(data):
    for i in myColours:
        if data[a] == i:
            del(data[a])
            a = 0
            break
    if a < len(data):
        a += 1
print(data)
second:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207, 159, 30],
[79, 207, 30],
[32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
data2 = data.copy()
x = 0
for i in range(len(data)):
    for i2 in myColours:
        if data[i] == i2:
            del(data2[i-x])
            x += 1
            break
print(data2)
There is also a way with nested loops, but it's not a perfect approach.
The code can be:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
l = len(myColours)
for i in range(0, l):
    for x in data:
        if x in myColours:
            data.remove(x)
print(data)
After running this code, the output is:
[[224, 96, 96]]

How to get information like the output of Model.get_config() in keras?

I want to get some information like this: the input and the output of each layer.
You can try summary from torchsummary; here is an example:
import torchvision.models as models
from torchsummary import summary

vgg = models.vgg16()
summary(vgg, (3, 224, 224))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1180160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2359808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2359808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2359808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2359808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2359808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
Linear-32 [-1, 4096] 102764544
ReLU-33 [-1, 4096] 0
Dropout-34 [-1, 4096] 0
Linear-35 [-1, 4096] 16781312
ReLU-36 [-1, 4096] 0
Dropout-37 [-1, 4096] 0
Linear-38 [-1, 1000] 4097000
================================================================
Total params: 138357544
Trainable params: 138357544
Non-trainable params: 0
----------------------------------------------------------------
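Since the question is about Keras, a comparable sketch on that side: model.summary() prints the same kind of per-layer table, model.get_config() returns the full layer configuration, and iterating over model.layers gives each layer's input and output shapes programmatically. Shown here on a small made-up model, not your actual network:
from tensorflow import keras
from tensorflow.keras import layers

# a small made-up model, just to illustrate the calls
model = keras.Sequential([
    layers.Dense(16, activation='relu', input_shape=(8,)),
    layers.Dense(2, activation='softmax'),
])

model.summary()               # table of layers, output shapes, param counts
config = model.get_config()   # nested dict describing every layer

for layer in model.layers:
    print(layer.name, layer.input_shape, layer.output_shape)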

Error when checking target: expected softmax to have shape (1100,)

I'm trying to create a model on some data with 2 classes, but I keep getting an error saying:
ValueError: Error when checking target: expected softmax to have shape (1100,) but got array with shape (2,)
I know it's a fairly common error, but I can't seem to fix mine. I believe it means that the model's output has shape (1100,) while the targets have shape (2,). Does anyone know how to fix it?
Here's my model:
def TestModel(nb_classes=2, inputs=(3, 224, 224)):
    input_img = Input(shape=inputs)
    conv1 = Convolution2D(
        96, 7, 7, activation='relu', init='glorot_uniform',
        subsample=(2, 2), border_mode='same', name='conv1')(input_img)
    maxpool1 = MaxPooling2D(
        pool_size=(3, 3), strides=(2, 2), name='maxpool1', dim_ordering="th")(conv1)
    fire2_squeeze = Convolution2D(
        16, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire2_squeeze')(maxpool1)
    fire2_expand1 = Convolution2D(
        64, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire2_expand1')(fire2_squeeze)
    fire2_expand2 = Convolution2D(
        64, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire2_expand2')(fire2_squeeze)
    merge2 = merge(
        [fire2_expand1, fire2_expand2], mode='concat', concat_axis=1)
    fire3_squeeze = Convolution2D(
        16, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire3_squeeze')(merge2)
    fire3_expand1 = Convolution2D(
        64, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire3_expand1')(fire3_squeeze)
    fire3_expand2 = Convolution2D(
        64, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire3_expand2')(fire3_squeeze)
    merge3 = merge(
        [fire3_expand1, fire3_expand2], mode='concat', concat_axis=1)
    fire4_squeeze = Convolution2D(
        32, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire4_squeeze')(merge3)
    fire4_expand1 = Convolution2D(
        128, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire4_expand1')(fire4_squeeze)
    fire4_expand2 = Convolution2D(
        128, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire4_expand2')(fire4_squeeze)
    merge4 = merge(
        [fire4_expand1, fire4_expand2], mode='concat', concat_axis=1)
    maxpool4 = MaxPooling2D(
        pool_size=(3, 3), strides=(2, 2), name='maxpool4')(merge4)
    fire5_squeeze = Convolution2D(
        32, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire5_squeeze')(maxpool4)
    fire5_expand1 = Convolution2D(
        128, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire5_expand1')(fire5_squeeze)
    fire5_expand2 = Convolution2D(
        128, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire5_expand2')(fire5_squeeze)
    merge5 = merge(
        [fire5_expand1, fire5_expand2], mode='concat', concat_axis=1)
    fire6_squeeze = Convolution2D(
        48, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire6_squeeze')(merge5)
    fire6_expand1 = Convolution2D(
        192, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire6_expand1')(fire6_squeeze)
    fire6_expand2 = Convolution2D(
        192, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire6_expand2')(fire6_squeeze)
    merge6 = merge(
        [fire6_expand1, fire6_expand2], mode='concat', concat_axis=1)
    fire7_squeeze = Convolution2D(
        48, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire7_squeeze')(merge6)
    fire7_expand1 = Convolution2D(
        192, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire7_expand1')(fire7_squeeze)
    fire7_expand2 = Convolution2D(
        192, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire7_expand2')(fire7_squeeze)
    merge7 = merge(
        [fire7_expand1, fire7_expand2], mode='concat', concat_axis=1)
    fire8_squeeze = Convolution2D(
        64, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire8_squeeze')(merge7)
    fire8_expand1 = Convolution2D(
        256, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire8_expand1')(fire8_squeeze)
    fire8_expand2 = Convolution2D(
        256, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire8_expand2')(fire8_squeeze)
    merge8 = merge(
        [fire8_expand1, fire8_expand2], mode='concat', concat_axis=1)
    maxpool8 = MaxPooling2D(
        pool_size=(3, 3), strides=(2, 2), name='maxpool8')(merge8)
    fire9_squeeze = Convolution2D(
        64, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire9_squeeze')(maxpool8)
    fire9_expand1 = Convolution2D(
        256, 1, 1, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire9_expand1')(fire9_squeeze)
    fire9_expand2 = Convolution2D(
        256, 3, 3, activation='relu', init='glorot_uniform',
        border_mode='same', name='fire9_expand2')(fire9_squeeze)
    merge9 = merge(
        [fire9_expand1, fire9_expand2], mode='concat', concat_axis=1)
    fire9_dropout = Dropout(0.5, name='fire9_dropout')(merge9)
    conv10 = Convolution2D(
        nb_classes, 1, 1, init='glorot_uniform',
        border_mode='valid', name='conv10')(fire9_dropout)
    # The size should match the output of conv10
    avgpool10 = AveragePooling2D((13, 13), name='avgpool10')(conv10)
    flatten = Flatten(name='flatten')(avgpool10)
    softmax = Activation("softmax", name='softmax')(flatten)
    return Model(input=input_img, output=softmax)
Here's the code creating the model:
def main():
    np.random.seed(45)
    nb_class = 2
    width, height = 224, 224

    sn = model.TestModel(nb_classes=nb_class, inputs=(height, width, 3))
    print('Build model')
    sgd = SGD(lr=0.001, decay=0.0002, momentum=0.9, nesterov=True)
    sn.compile(
        optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
    print(sn.summary())

    # Training
    train_data_dir = 'data/train'
    validation_data_dir = 'data/validation'
    nb_train_samples = 2000
    nb_validation_samples = 800
    nb_epoch = 500

    # Generator
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
    #train_datagen = ImageDataGenerator(rescale=1./255)
    test_datagen = ImageDataGenerator(rescale=1./255)
    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(width, height),
        batch_size=32,
        class_mode='categorical')
    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(width, height),
        batch_size=32,
        class_mode='categorical')

    # Instantiate AccLossPlotter to visualise training
    plotter = AccLossPlotter(graphs=['acc', 'loss'], save_graph=True)
    early_stopping = EarlyStopping(monitor='val_loss', patience=3, verbose=0)
    checkpoint = ModelCheckpoint(
        'weights.{epoch:02d}-{val_loss:.2f}.h5',
        monitor='val_loss',
        verbose=0,
        save_best_only=True,
        save_weights_only=True,
        mode='min',
        period=1)

    sn.fit_generator(
        train_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nb_validation_samples,
        callbacks=[plotter, checkpoint])
    sn.save_weights('weights.h5')
Here's summary():
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
____________________________________________________________________________________________________
conv1 (Convolution2D) (None, 112, 112, 96) 14208 input_1[0][0]
____________________________________________________________________________________________________
maxpool1 (MaxPooling2D) (None, 112, 55, 47) 0 conv1[0][0]
____________________________________________________________________________________________________
fire2_squeeze (Convolution2D) (None, 112, 55, 16) 768 maxpool1[0][0]
____________________________________________________________________________________________________
fire2_expand1 (Convolution2D) (None, 112, 55, 64) 1088 fire2_squeeze[0][0]
____________________________________________________________________________________________________
fire2_expand2 (Convolution2D) (None, 112, 55, 64) 9280 fire2_squeeze[0][0]
____________________________________________________________________________________________________
merge_1 (Merge) (None, 224, 55, 64) 0 fire2_expand1[0][0]
fire2_expand2[0][0]
____________________________________________________________________________________________________
fire3_squeeze (Convolution2D) (None, 224, 55, 16) 1040 merge_1[0][0]
____________________________________________________________________________________________________
fire3_expand1 (Convolution2D) (None, 224, 55, 64) 1088 fire3_squeeze[0][0]
____________________________________________________________________________________________________
fire3_expand2 (Convolution2D) (None, 224, 55, 64) 9280 fire3_squeeze[0][0]
____________________________________________________________________________________________________
merge_2 (Merge) (None, 448, 55, 64) 0 fire3_expand1[0][0]
fire3_expand2[0][0]
____________________________________________________________________________________________________
fire4_squeeze (Convolution2D) (None, 448, 55, 32) 2080 merge_2[0][0]
____________________________________________________________________________________________________
fire4_expand1 (Convolution2D) (None, 448, 55, 128) 4224 fire4_squeeze[0][0]
____________________________________________________________________________________________________
fire4_expand2 (Convolution2D) (None, 448, 55, 128) 36992 fire4_squeeze[0][0]
____________________________________________________________________________________________________
merge_3 (Merge) (None, 896, 55, 128) 0 fire4_expand1[0][0]
fire4_expand2[0][0]
____________________________________________________________________________________________________
maxpool4 (MaxPooling2D) (None, 447, 27, 128) 0 merge_3[0][0]
____________________________________________________________________________________________________
fire5_squeeze (Convolution2D) (None, 447, 27, 32) 4128 maxpool4[0][0]
____________________________________________________________________________________________________
fire5_expand1 (Convolution2D) (None, 447, 27, 128) 4224 fire5_squeeze[0][0]
____________________________________________________________________________________________________
fire5_expand2 (Convolution2D) (None, 447, 27, 128) 36992 fire5_squeeze[0][0]
____________________________________________________________________________________________________
merge_4 (Merge) (None, 894, 27, 128) 0 fire5_expand1[0][0]
fire5_expand2[0][0]
____________________________________________________________________________________________________
fire6_squeeze (Convolution2D) (None, 894, 27, 48) 6192 merge_4[0][0]
____________________________________________________________________________________________________
fire6_expand1 (Convolution2D) (None, 894, 27, 192) 9408 fire6_squeeze[0][0]
____________________________________________________________________________________________________
fire6_expand2 (Convolution2D) (None, 894, 27, 192) 83136 fire6_squeeze[0][0]
____________________________________________________________________________________________________
merge_5 (Merge) (None, 1788, 27, 192) 0 fire6_expand1[0][0]
fire6_expand2[0][0]
____________________________________________________________________________________________________
fire7_squeeze (Convolution2D) (None, 1788, 27, 48) 9264 merge_5[0][0]
____________________________________________________________________________________________________
fire7_expand1 (Convolution2D) (None, 1788, 27, 192) 9408 fire7_squeeze[0][0]
____________________________________________________________________________________________________
fire7_expand2 (Convolution2D) (None, 1788, 27, 192) 83136 fire7_squeeze[0][0]
____________________________________________________________________________________________________
merge_6 (Merge) (None, 3576, 27, 192) 0 fire7_expand1[0][0]
fire7_expand2[0][0]
____________________________________________________________________________________________________
fire8_squeeze (Convolution2D) (None, 3576, 27, 64) 12352 merge_6[0][0]
____________________________________________________________________________________________________
fire8_expand1 (Convolution2D) (None, 3576, 27, 256) 16640 fire8_squeeze[0][0]
____________________________________________________________________________________________________
fire8_expand2 (Convolution2D) (None, 3576, 27, 256) 147712 fire8_squeeze[0][0]
____________________________________________________________________________________________________
merge_7 (Merge) (None, 7152, 27, 256) 0 fire8_expand1[0][0]
fire8_expand2[0][0]
____________________________________________________________________________________________________
maxpool8 (MaxPooling2D) (None, 3575, 13, 256) 0 merge_7[0][0]
____________________________________________________________________________________________________
fire9_squeeze (Convolution2D) (None, 3575, 13, 64) 16448 maxpool8[0][0]
____________________________________________________________________________________________________
fire9_expand1 (Convolution2D) (None, 3575, 13, 256) 16640 fire9_squeeze[0][0]
____________________________________________________________________________________________________
fire9_expand2 (Convolution2D) (None, 3575, 13, 256) 147712 fire9_squeeze[0][0]
____________________________________________________________________________________________________
merge_8 (Merge) (None, 7150, 13, 256) 0 fire9_expand1[0][0]
fire9_expand2[0][0]
____________________________________________________________________________________________________
fire9_dropout (Dropout) (None, 7150, 13, 256) 0 merge_8[0][0]
____________________________________________________________________________________________________
conv10 (Convolution2D) (None, 7150, 13, 2) 514 fire9_dropout[0][0]
____________________________________________________________________________________________________
avgpool10 (AveragePooling2D) (None, 550, 1, 2) 0 conv10[0][0]
____________________________________________________________________________________________________
flatten (Flatten) (None, 1100) 0 avgpool10[0][0]
____________________________________________________________________________________________________
softmax (Activation) (None, 1100) 0 flatten[0][0]
====================================================================================================
Total params: 683,954
Trainable params: 683,954
Non-trainable params: 0
____________________________________________________________________________________________________
None
Found 22778 images belonging to 2 classes.
Found 2222 images belonging to 2 classes.
Epoch 1/500
Any thoughts appreciated.
You shouldn't be using AveragePooling2D but GlobalAveragePooling2D; that will reduce the spatial dimensions to 1, so the Flatten works and the model produces an output of shape (None, 2).
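Concretely, a sketch of how the tail of TestModel would change under that suggestion, keeping the question's old Keras 1-style API; this is only a fragment of the function body, not a standalone script:
from keras.layers import GlobalAveragePooling2D

# Inside TestModel, replace the avgpool10 / flatten / softmax block with:
# global average pooling collapses the spatial dimensions completely,
# so its output is already (None, nb_classes) and no fixed pool size is needed.
avgpool10 = GlobalAveragePooling2D(name='avgpool10')(conv10)
softmax = Activation('softmax', name='softmax')(avgpool10)
# ...and keep `return Model(input=input_img, output=softmax)` as before.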
