Looping over list in reverse, removing elements as we go - python-3.x

I'm comparing two lists. A & B. Hi A, how you doin'? I'm okay, B. If item A appears in list B it should be removed from list A. The comparisons are lists of RGB colours.
However, I get a "list index out of range" error and I don't know why, since I'm looping over the list in reverse.
data = [
[224, 96, 96],
[128, 32, 192], # match
[192, 32, 160], # match
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
for x in list(range(len(data)-1, -1, -1)):
    for y in range(0, len(myColours)):
        if myColours[y] == data[x]:
            print (data[x])
            data.remove(data[x]) # list index out of range
The list data should end up as
data = [
[224, 96, 96]
]
Data will be generated by code, myColours by hand. Their order is not important, just the values.

Use a list comprehension:
data = [
[224, 96, 96],
[128, 32, 192], # match
[192, 32, 160], # match
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
data = [item for item in data if item not in myColours]
#to iterate backwards:
#data = [item for item in reversed(data) if item not in myColours]
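For reference, the reverse loop in the question fails because the inner loop keeps running after data.remove(...): once the list has shrunk, data[x] can point past the end. If you prefer to keep the nested loops and modify the list in place, a minimal sketch (remove_matches is a hypothetical helper name, not something from the question) is to stop the inner loop as soon as a match is removed:

```python
def remove_matches(data, colours):
    """Remove from `data`, in place, every item that also appears in `colours`."""
    for x in range(len(data) - 1, -1, -1):
        for colour in colours:
            if colour == data[x]:
                del data[x]
                break  # stop scanning: index x no longer refers to the same item
    return data


data = [[224, 96, 96], [128, 32, 192], [192, 32, 160]]
myColours = [[207, 30, 30], [207, 159, 30], [79, 207, 30],
             [32, 64, 192], [128, 32, 192], [192, 32, 160]]
print(remove_matches(data, myColours))  # [[224, 96, 96]]
```

Iterating in reverse means a deletion only shifts items that have already been checked, so no index is skipped.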

There are two ways to solve your problem.
First:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207, 159, 30],
[79, 207, 30],
[32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
a = 0
while a < len(data):
    for i in myColours:
        if data[a] == i:
            # delete the match and stay at index a: the next item shifts into it
            del data[a]
            break
    else:
        # no colour matched data[a], so move on to the next item
        a += 1
print(data)
Second:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207, 159, 30],
[79, 207, 30],
[32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
data2 = data.copy()
x = 0
for i in range(len(data)):
    for i2 in myColours:
        if data[i] == i2:
            del(data2[i-x])
            x += 1
            break
print(data2)
There is also a way using branches, but it is not an ideal approach.

The code can be:
data = [
[224, 96, 96],
[128, 32, 192],
[192, 32, 160],
]
myColours = [
[207, 30, 30],
[207,159, 30],
[ 79,207, 30],
[ 32, 64, 192],
[128, 32, 192],
[192, 32, 160],
]
l = len(myColours)
# removing while iterating can skip items, so the pass is repeated l times
for i in range(0, l):
    for x in data:
        if x in myColours:
            data.remove(x)
print(data)
Running this code gives the output:
[[224, 96, 96]]

Related

Randomly sample from a high-dimensional array along a specific dimension

I have a 3-dimensional array x of shape (2000, 60, 5). If we think of it as a video, the 2000 can represent 2000 frames. I would like to randomly sample it along the first dimension, i.e., get a set of frame samples. For instance, how can I get an array of shape (500, 60, 5) that is randomly sampled from x along the first dimension?
You can pass x as the first argument of the choice method. If you don't want repeated frames in your sample, use replace=False.
For example,
In [10]: x = np.arange(72).reshape(9, 2, 4) # Small array for the demo.
In [11]: x
Out[11]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55]],
[[56, 57, 58, 59],
[60, 61, 62, 63]],
[[64, 65, 66, 67],
[68, 69, 70, 71]]])
Sample "frames" from x with the choice method of a NumPy random generator instance.
In [12]: rng = np.random.default_rng()
In [13]: rng.choice(x, size=3)
Out[13]:
array([[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [14]: rng.choice(x, size=3, replace=False)
Out[14]:
array([[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]]])
Note that the frames will be in random order; if you want to preserve the order, you could use choice to generate an array of indices, then use the sorted indices to pull the frames out of x.
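The order-preserving variant described above could look roughly like this. It is a sketch on the same small demo array; the seed and the names idx and sample are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
x = np.arange(72).reshape(9, 2, 4)  # same small demo array as above

# Draw 3 distinct frame indices, then sort them to keep the original frame order.
idx = np.sort(rng.choice(x.shape[0], size=3, replace=False))
sample = x[idx]

print(idx)
print(sample.shape)  # (3, 2, 4)
```

For the array in the question, the same call with size=500 on an x of shape (2000, 60, 5) would give a sample of shape (500, 60, 5).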

Get the length of every sentence before padding in torchtext bucketiterator

Is it possible to get the length of every sentence before padding in a torchtext BucketIterator?
train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False, sort_within_batch=True, device = device)
bucketiterator dataloader :
inputs: tensor([[ 34, 87, 2, ..., 227, 239, 263],
[ 138, 7, 1006, ..., 840, 142, 665],
[ 549, 4, 1028, ..., 11, 14, 4],
...,
[ 1, 1, 5, ..., 66, 23, 13],
[ 1, 1, 1062, ..., 177, 252, 1587],
[ 1, 1, 66, ..., 553, 52, 73]]), shape: torch.Size([64, 91])
Like when using a PyTorch DataLoader with a custom collate function:
def padding(batch):
    doc = [doc['input'] for doc in batch]
    # lengths are recorded before the sequences are padded
    len_doc = [len(doc['input']) for doc in batch]
    doc_pad = pad_sequence(doc, batch_first=True, padding_value=0)
    return doc_pad, len_doc
train_loader = data.DataLoader(train_data, batch_size = 64, shuffle=True, collate_fn=padding)
pytorch dataloader :
inputs: tensor([[ 2, 1396, 2686, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0],
[ 2, 2018, 2597, ..., 0, 0, 0],
...,
[ 2, 1546, 1623, ..., 0, 0, 0],
[ 2, 1435, 1396, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0]]), shape: torch.Size([64, 40])
inputs_len_before_padding: tensor([18, 8, 21, 16, 16, 12, 40, 12, 9, 12, 17, 12, 17, 15, 16, 12, 8, 24,
25, 10, 22, 8, 8, 13, 12, 22, 17, 14, 21, 14, 19, 13, 21, 8, 28, 16,
31, 24, 23, 19, 10, 7, 16, 12, 16, 12, 17, 12, 18, 11, 8, 13, 17, 14,
11, 13, 13, 20, 8, 12, 22, 7, 9, 11]), shape: torch.Size([64])
Here is a minimal example that uses torchtext.data.Field and torchtext.data.BucketIterator:
import torchtext.data as data
# sample data
text = [
'This is sentence 1.',
'This sentence is a bit longer than the previous sentence.'
]
# define field -- notice include_lengths is set to True
text_field = data.Field(include_lengths=True, tokenize=lambda x: x.split())
fields = [('text', text_field)]
# create dataset and build vocabulary
examples = [data.Example.fromlist([t], fields) for t in text]
dataset = data.Dataset(examples, fields)
text_field.build_vocab(dataset)
# create iterator
data_iter = data.BucketIterator(dataset, batch_size=2, shuffle=False)
# the text field will now return both the data tensor and the length of the input text
for x in data_iter:
    print('Data:', x.text[0])
    print('Lengths:', x.text[1])
This should print (data tensor shortened for brevity):
Data: tensor([[ 2, 2],
...
[ 1, 10]])
Lengths: tensor([ 4, 10])
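To make the idea concrete without torchtext, here is a plain-Python sketch of what "padded data plus original lengths" means. The helper pad_with_lengths and the token ids are hypothetical, and the padding value 1 is only chosen to mirror the padding id visible in the output above. Unlike the tensor above, this sketch keeps one row per sentence:

```python
def pad_with_lengths(token_ids, pad_value=1):
    """Pad variable-length lists of token ids and return (padded, lengths)."""
    lengths = [len(seq) for seq in token_ids]  # lengths before padding
    max_len = max(lengths, default=0)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in token_ids]
    return padded, lengths


# Two made-up sentences of 4 and 10 tokens, as in the example above.
batch = [[2, 3, 4, 5], [2, 4, 3, 6, 7, 8, 9, 10, 11, 12]]
padded, lengths = pad_with_lengths(batch)
print(lengths)         # [4, 10]
print(len(padded[0]))  # 10
```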

Merging pickled .npz files in a desired format

I have multiple .npz files which I want to merge into one .npz file with a format similar to "mnist.npz".
The format of mnist.npz is:
((array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8),
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8))
Here two arrays are merged into one big npz file.
My two npz arrays are:
x_array:
[[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]],
[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]]
y_array:
[1, 1, 1, 1, 1, 1]
When I tried to merge my files, the output I got is:
(array([[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]]], dtype=uint8), array([[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]], dtype=uint8),1, 1, 1, 1, 1, 1)
So in the last line, my array is formatted as
1, 1, 1, 1, 1, 1
instead of something like:
array([1, 1, 1, 1, 1, 1], dtype=uint8)
My code for merging two npz files is:
import numpy as np
from numpy import load
data = load('x_array.npz',allow_pickle=True)
lst = data.files
for item in lst:
    x_train = data[item]
    #print((x_item,x_train))
data1 = load('y_array.npz',allow_pickle=True)
lst1 = data1.files
for item in lst1:
    y_train = data1[item]
out1 = (*x_train,*y_train)
np.savez('out1.npz',out1)
print(out1)
Can anyone please suggest how I can convert my second array of (1, 1, 1, 1, 1, 1) to array([1, 1, 1, 1, 1, 1], dtype=uint8)? Any suggestions are helpful.
After going through my code, I found that the problem is solved by changing the line
out1 = (*x_train,*y_train)
to
out1 = (*x_train,y_train)
so that y_train is kept as a single array instead of being unpacked into its individual elements.
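An alternative worth considering, assuming the goal is simply one archive that holds both arrays, is to pass them to np.savez as keyword arguments so that each is stored under its own name. This is only a sketch: the arrays are random stand-ins for the question's data, the key names x_train and y_train are my choice, and an in-memory buffer replaces the file on disk.

```python
import io
import numpy as np

# Hypothetical stand-ins for the question's arrays (random images, six labels).
x_train = np.random.default_rng(0).integers(0, 256, size=(6, 4, 4, 3), dtype=np.uint8)
y_train = np.ones(6, dtype=np.uint8)

# An in-memory buffer is used here; a path such as 'out1.npz' works the same way.
buffer = io.BytesIO()
np.savez(buffer, x_train=x_train, y_train=y_train)

# Reload the archive and read each array back by its name.
buffer.seek(0)
with np.load(buffer) as merged:
    x_loaded, y_loaded = merged['x_train'], merged['y_train']

print(x_loaded.shape, y_loaded)
```

Each array keeps its own shape and dtype this way, and neither has to be unpacked into a tuple.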

VOC2012: PIL Image.open converts PNG to 2d array

I am working with the VOC2012 dataset. The input image is in PNG format and has a shape of (375, 500, 4) when I use imageio to open it. When I use PIL to open the image, the shape suddenly becomes (500, 375). PNG images should have four channels on the last axis: r, g, b and alpha.
The image is obviously colored image, so it should have 3 dimensions (height, width, depth). PIL seems to suggest that it only has two dimensions: width & height.
Can PNG images be represented by a 2d array? Please help! So lost at the moment. Thanks!
from PIL import Image
from keras.preprocessing.image import img_to_array
import os, imageio
import numpy as np
root_path = '/Users/johnson/Downloads/'
imageio_img = imageio.imread(
os.path.join(root_path, '2009_003193.png')
)
# (375, 500, 4)
print(imageio_img.shape)
# [ 0 128 192 224 255]
print(np.unique(imageio_img))
PIL_img = Image.open(
os.path.join(root_path, '2009_003193.png')
)
# (500, 375)
print(PIL_img.size)
PIL_img_to_array = img_to_array(PIL_img)
# (375, 500, 1)
print(PIL_img_to_array.shape)
# [ 0. 2. 255.]
print(np.unique(PIL_img_to_array))
It's also quite magical that PIL seems to know how VOC2012 labels the data. PIL_image_to_array has a unique value of [0, 2, 255]. Conveniently, 2 denotes bicycle in VOC2012. 0 means background and 255 probably means the yellowish boundary around the bicycle. But from the first code snippet, I never passed the pascal classes to PIL for conversion.
def pascal_classes():
    classes = {'aeroplane' : 1, 'bicycle' : 2, 'bird' : 3, 'boat' : 4,
               'bottle' : 5, 'bus' : 6, 'car' : 7, 'cat' : 8,
               'chair' : 9, 'cow' : 10, 'diningtable' : 11, 'dog' : 12,
               'horse' : 13, 'motorbike' : 14, 'person' : 15, 'potted-plant' : 16,
               'sheep' : 17, 'sofa' : 18, 'train' : 19, 'tv/monitor' : 20}
    return classes
def pascal_palette():
    palette = {( 0, 0, 0) : 0 ,
               (128, 0, 0) : 1 ,
               ( 0, 128, 0) : 2 ,
               (128, 128, 0) : 3 ,
               ( 0, 0, 128) : 4 ,
               (128, 0, 128) : 5 ,
               ( 0, 128, 128) : 6 ,
               (128, 128, 128) : 7 ,
               ( 64, 0, 0) : 8 ,
               (192, 0, 0) : 9 ,
               ( 64, 128, 0) : 10,
               (192, 128, 0) : 11,
               ( 64, 0, 128) : 12,
               (192, 0, 128) : 13,
               ( 64, 128, 128) : 14,
               (192, 128, 128) : 15,
               ( 0, 64, 0) : 16,
               (128, 64, 0) : 17,
               ( 0, 192, 0) : 18,
               (128, 192, 0) : 19,
               ( 0, 64, 128) : 20 }
    return palette
Your image is palettised, not RGB. Each pixel is represented by an 8-bit index into a palette. You can see this by looking at image.mode, which shows up as P.
If you want an RGB image, use:
rgb = Image.open('bike.png').convert('RGB')
If you want an RGBA image with transparency, use:
RGBA = Image.open('bike.png').convert('RGBA')
However, there is no useful information in the alpha channel, so that seems pointless.
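To see the difference between palette indices and RGB values without the VOC file at hand, here is a small sketch that builds a tiny palettised image in memory. The 4x3 image and its three-colour palette are made up for illustration:

```python
import numpy as np
from PIL import Image

# Build a tiny palettised image (hypothetical stand-in for the VOC label PNG).
im = Image.new('P', (4, 3))  # size is (width, height)
palette = [0, 0, 0, 128, 0, 0, 0, 128, 0] + [0] * (768 - 9)
im.putpalette(palette)       # indices 0, 1, 2 map to black, dark red, dark green
im.putpixel((0, 0), 2)       # store palette index 2 at x=0, y=0

indices = np.array(im)             # 2-D array of palette indices
rgb = np.array(im.convert('RGB'))  # 3-D array of looked-up colours

print(im.mode, indices.shape, rgb.shape)  # P (3, 4) (3, 4, 3)
print(indices[0, 0], rgb[0, 0])
```

The 2-D array holds the class-like indices, which is why values such as 2 appear directly in the array from the question, while the converted array holds the colours those indices point to.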
Regarding the pascal palette, you can get that via PIL like this:
im = Image.open('bike.png')
p = im.getpalette()
for i in range(256):
    print(p[3*i:3*i+3])
[0, 0, 0]
[128, 0, 0]
[0, 128, 0]
[128, 128, 0]
[0, 0, 128]
[128, 0, 128]
[0, 128, 128]
[128, 128, 128]
[64, 0, 0]
[192, 0, 0]
[64, 128, 0]
[192, 128, 0]
[64, 0, 128]
[192, 0, 128]
[64, 128, 128]
[192, 128, 128]
[0, 64, 0]
[128, 64, 0]
[0, 192, 0]
[128, 192, 0]
[0, 64, 128]
[128, 64, 128]
[0, 192, 128]
[128, 192, 128]
[64, 64, 0]
[192, 64, 0]
[64, 192, 0]
[192, 192, 0]
[64, 64, 128]
[192, 64, 128]
[64, 192, 128]
[192, 192, 128]
[0, 0, 64]
[128, 0, 64]
[0, 128, 64]
[128, 128, 64]
[0, 0, 192]
[128, 0, 192]
[0, 128, 192]
[128, 128, 192]
[64, 0, 64]
[192, 0, 64]
[64, 128, 64]
[192, 128, 64]
[64, 0, 192]
[192, 0, 192]
[64, 128, 192]
[192, 128, 192]
[0, 64, 64]
[128, 64, 64]
[0, 192, 64]
[128, 192, 64]
[0, 64, 192]
[128, 64, 192]
[0, 192, 192]
[128, 192, 192]
[64, 64, 64]
[192, 64, 64]
[64, 192, 64]
[192, 192, 64]
[64, 64, 192]
[192, 64, 192]
[64, 192, 192]
[192, 192, 192]
[32, 0, 0]
[160, 0, 0]
[32, 128, 0]
[160, 128, 0]
[32, 0, 128]
[160, 0, 128]
[32, 128, 128]
[160, 128, 128]
[96, 0, 0]
[224, 0, 0]
[96, 128, 0]
[224, 128, 0]
[96, 0, 128]
[224, 0, 128]
[96, 128, 128]
[224, 128, 128]
[32, 64, 0]
[160, 64, 0]
[32, 192, 0]
[160, 192, 0]
[32, 64, 128]
[160, 64, 128]
[32, 192, 128]
[160, 192, 128]
[96, 64, 0]
[224, 64, 0]
[96, 192, 0]
[224, 192, 0]
[96, 64, 128]
[224, 64, 128]
[96, 192, 128]
[224, 192, 128]
[32, 0, 64]
[160, 0, 64]
[32, 128, 64]
[160, 128, 64]
[32, 0, 192]
[160, 0, 192]
[32, 128, 192]
[160, 128, 192]
[96, 0, 64]
[224, 0, 64]
[96, 128, 64]
[224, 128, 64]
[96, 0, 192]
[224, 0, 192]
[96, 128, 192]
[224, 128, 192]
[32, 64, 64]
[160, 64, 64]
[32, 192, 64]
[160, 192, 64]
[32, 64, 192]
[160, 64, 192]
[32, 192, 192]
[160, 192, 192]
[96, 64, 64]
[224, 64, 64]
[96, 192, 64]
[224, 192, 64]
[96, 64, 192]
[224, 64, 192]
[96, 192, 192]
[224, 192, 192]
[0, 32, 0]
[128, 32, 0]
[0, 160, 0]
[128, 160, 0]
[0, 32, 128]
[128, 32, 128]
[0, 160, 128]
[128, 160, 128]
[64, 32, 0]
[192, 32, 0]
[64, 160, 0]
[192, 160, 0]
[64, 32, 128]
[192, 32, 128]
[64, 160, 128]
[192, 160, 128]
[0, 96, 0]
[128, 96, 0]
[0, 224, 0]
[128, 224, 0]
[0, 96, 128]
[128, 96, 128]
[0, 224, 128]
[128, 224, 128]
[64, 96, 0]
[192, 96, 0]
[64, 224, 0]
[192, 224, 0]
[64, 96, 128]
[192, 96, 128]
[64, 224, 128]
[192, 224, 128]
[0, 32, 64]
[128, 32, 64]
[0, 160, 64]
[128, 160, 64]
[0, 32, 192]
[128, 32, 192]
[0, 160, 192]
[128, 160, 192]
[64, 32, 64]
[192, 32, 64]
[64, 160, 64]
[192, 160, 64]
[64, 32, 192]
[192, 32, 192]
[64, 160, 192]
[192, 160, 192]
[0, 96, 64]
[128, 96, 64]
[0, 224, 64]
[128, 224, 64]
[0, 96, 192]
[128, 96, 192]
[0, 224, 192]
[128, 224, 192]
[64, 96, 64]
[192, 96, 64]
[64, 224, 64]
[192, 224, 64]
[64, 96, 192]
[192, 96, 192]
[64, 224, 192]
[192, 224, 192]
[32, 32, 0]
[160, 32, 0]
[32, 160, 0]
[160, 160, 0]
[32, 32, 128]
[160, 32, 128]
[32, 160, 128]
[160, 160, 128]
[96, 32, 0]
[224, 32, 0]
[96, 160, 0]
[224, 160, 0]
[96, 32, 128]
[224, 32, 128]
[96, 160, 128]
[224, 160, 128]
[32, 96, 0]
[160, 96, 0]
[32, 224, 0]
[160, 224, 0]
[32, 96, 128]
[160, 96, 128]
[32, 224, 128]
[160, 224, 128]
[96, 96, 0]
[224, 96, 0]
[96, 224, 0]
[224, 224, 0]
[96, 96, 128]
[224, 96, 128]
[96, 224, 128]
[224, 224, 128]
[32, 32, 64]
[160, 32, 64]
[32, 160, 64]
[160, 160, 64]
[32, 32, 192]
[160, 32, 192]
[32, 160, 192]
[160, 160, 192]
[96, 32, 64]
[224, 32, 64]
[96, 160, 64]
[224, 160, 64]
[96, 32, 192]
[224, 32, 192]
[96, 160, 192]
[224, 160, 192]
[32, 96, 64]
[160, 96, 64]
[32, 224, 64]
[160, 224, 64]
[32, 96, 192]
[160, 96, 192]
[32, 224, 192]
[160, 224, 192]
[96, 96, 64]
[224, 96, 64]
[96, 224, 64]
[224, 224, 64]
[96, 96, 192]
[224, 96, 192]
[96, 224, 192]
[224, 224, 192]
Then, if you want to make the bicycle red, you can do:
# Load the image and make Numpy version
im = Image.open('bike.png')
n = np.array(im)
# Make all pixels belonging to bike (2) into red (palette index 9)
n[n==2] = 9
# Make all pixels not red (9) into grey (palette index 7)
n[n!=9] = 7
# Convert back into PIL palettised image and re-apply original palette
r = Image.fromarray(n,mode='P')
r.putpalette(im.getpalette())
r.save('result.png')
Keywords: Python, PIL, Pillow, image processing, palette, palette operations, masked image, mask, extract palette, apply palette.

Generate random int in 3D array

I would like to generate a random 3D array containing random integers (coordinates) in the interval [0, 100].
So the shape of coordinates should be (30, 10, 2).
What have I tried?
coordinates = [[random.randint(0,100), random.randint(0,100)] for _i in range(30)]
which returns
array([[97, 68],
[11, 23],
[47, 99],
[52, 58],
[95, 60],
[89, 29],
[71, 47],
[80, 52],
[ 7, 83],
[30, 87],
[53, 96],
[70, 33],
[36, 12],
[15, 52],
[30, 76],
[61, 52],
[87, 99],
[19, 74],
[37, 63],
[40, 2],
[ 8, 84],
[70, 32],
[63, 8],
[98, 89],
[27, 12],
[75, 59],
[76, 17],
[27, 12],
[48, 61],
[39, 98]])
of shape (30, 2).
What am I supposed to get?
A shape of (30, 10, 2) rather than (30, 2).
Use the size parameter:
import numpy as np
coordinates = np.random.randint(0, 101, size=(30, 10, 2))  # the upper bound is exclusive
This will produce a NumPy array of shape (30, 10, 2) with integer values between 0 and 100 inclusive.
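Note that np.random.randint treats its upper bound as exclusive. If you use NumPy's newer Generator interface, a sketch of the same idea with an inclusive upper bound could look like this (the variable names are only illustrative):

```python
import numpy as np

rng = np.random.default_rng()
# endpoint=True makes the upper bound inclusive, so values lie in [0, 100].
coordinates = rng.integers(0, 100, size=(30, 10, 2), endpoint=True)

print(coordinates.shape)  # (30, 10, 2)
```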
