Related
Is it possible to get the length of every sentence before padding in a torchtext BucketIterator?
train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False, sort_within_batch=True, device = device)
BucketIterator output:
inputs: tensor([[ 34, 87, 2, ..., 227, 239, 263],
[ 138, 7, 1006, ..., 840, 142, 665],
[ 549, 4, 1028, ..., 11, 14, 4],
...,
[ 1, 1, 5, ..., 66, 23, 13],
[ 1, 1, 1062, ..., 177, 252, 1587],
[ 1, 1, 66, ..., 553, 52, 73]]), shape: torch.Size([64, 91])
Like when using a plain PyTorch DataLoader:
from torch.nn.utils.rnn import pad_sequence

def padding(batch):
    docs = [item['input'] for item in batch]          # gather the input tensors
    len_doc = [len(item['input']) for item in batch]  # lengths before padding
    doc_pad = pad_sequence(docs, batch_first=True, padding_value=0)
    return doc_pad, len_doc

train_loader = data.DataLoader(train_data, batch_size=64, shuffle=True, collate_fn=padding)
PyTorch DataLoader output:
inputs: tensor([[ 2, 1396, 2686, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0],
[ 2, 2018, 2597, ..., 0, 0, 0],
...,
[ 2, 1546, 1623, ..., 0, 0, 0],
[ 2, 1435, 1396, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0]]), shape: torch.Size([64, 40])
inputs_len_before_padding: tensor([18, 8, 21, 16, 16, 12, 40, 12, 9, 12, 17, 12, 17, 15, 16, 12, 8, 24,
25, 10, 22, 8, 8, 13, 12, 22, 17, 14, 21, 14, 19, 13, 21, 8, 28, 16,
31, 24, 23, 19, 10, 7, 16, 12, 16, 12, 17, 12, 18, 11, 8, 13, 17, 14,
11, 13, 13, 20, 8, 12, 22, 7, 9, 11]), shape: torch.Size([64])
Here is a minimal example that uses torchtext.data.Field and torchtext.data.BucketIterator:
import torchtext.data as data

# sample data
text = [
    'This is sentence 1.',
    'This sentence is a bit longer than the previous sentence.'
]

# define field -- notice include_lengths is set to True
text_field = data.Field(include_lengths=True, tokenize=lambda x: x.split())
fields = [('text', text_field)]

# create dataset and build vocabulary
examples = [data.Example.fromlist([t], fields) for t in text]
dataset = data.Dataset(examples, fields)
text_field.build_vocab(dataset)

# create iterator
data_iter = data.BucketIterator(dataset, batch_size=2, shuffle=False)

# the text field will now return both the data tensor and the length of the input text
for x in data_iter:
    print('Data:', x.text[0])
    print('Lengths:', x.text[1])
This should print (data tensor shortened for brevity):
Data: tensor([[ 2, 2],
...
[ 1, 10]])
Lengths: tensor([ 4, 10])
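If you need those lengths downstream, e.g. to feed the padded batch into an RNN, a common pattern is torch.nn.utils.rnn.pack_padded_sequence. A minimal sketch with toy dimensions (none of these names come from the question):

import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# toy padded batch: (seq_len, batch, embedding_dim), matching the iterator's seq-first layout
padded = torch.randn(10, 2, 8)
lengths = torch.tensor([4, 10])  # lengths before padding, as returned by the field

rnn = torch.nn.LSTM(input_size=8, hidden_size=16)
packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
packed_out, _ = rnn(packed)      # the LSTM skips the padded positions
out, out_lengths = pad_packed_sequence(packed_out)
print(out.shape, out_lengths)    # torch.Size([10, 2, 16]) tensor([ 4, 10])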
I am using a stack class to store 2d lists of strings and integers.
The lists serve as tables and I have the following code:
print('pushing')
print(lookup_table)
tables_to_be_tested.push(lookup_table)
print('new table')
print(lookup_table)
print('top of stack: ')
print(tables_to_be_tested.peek())
lookup_table[0][c2index] = c1_value
print('top of stack 2: ')
print(tables_to_be_tested.peek())
The line lookup_table[0][c2index] = c1_value only updates one value in the first inner list.
Here is my output:
pushing
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
new table
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack 2:
[[0, 1, 2, 3, 4, 10, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
The lists are created independently like this: lookup_table = [[],[],[]] and are appended to in a for loop.
The calculation should not affect the 2d list in the stack and yet it does. Why is this? What is a solution?
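This happens because Python stores references to objects: push(lookup_table) puts a reference to the very same list object on the stack, so mutating lookup_table afterwards also changes what peek() returns. One fix is to push a deep copy instead. A minimal sketch, assuming a simple list-backed stack class (your Stack class may differ):

import copy

class Stack:
    def __init__(self):
        self._items = []
    def push(self, item):
        # store an independent snapshot, not a reference to the caller's list
        self._items.append(copy.deepcopy(item))
    def peek(self):
        return self._items[-1]

tables_to_be_tested = Stack()
lookup_table = [[0, 1, 2], [39, 50, 38], [1, 1, 1]]
tables_to_be_tested.push(lookup_table)
lookup_table[0][1] = 10                  # mutate the original after pushing
print(tables_to_be_tested.peek())        # [[0, 1, 2], [39, 50, 38], [1, 1, 1]] -- unchanged

A shallow copy (list(lookup_table)) would not be enough here, because the inner lists would still be shared.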
I'm trying to find out the color of a pixel from its x and y coordinates. The colors are from this image.
Sampling the colors with Photoshop, I got this list of colors:
"#5D385A", "#6D3B47", "#6F5C4B", "#50717A", "#547057", "#4C6180", "#717080", "#705574", "#726B59", "#5E4854", "#415A4B", "#425A64", "#3A4E6F"
However, when I try to get the color of a pixel from the image, the color doesn't match any in the previous list. And I get 95 different colors when there are only 13 different colors in the image.
I open the image and get the color from a pixel with this class:
import PIL.Image

class Image:
    def __init__(self, file):
        self.image = PIL.Image.open(file).convert("RGB")

    def get_color(self, x, y):
        color = self.image.getpixel((x, y))
        color = ("#%02x%02x%02x" % color).upper()
        return color
Here is a short list of x and y positions where I sample the color:
144, 74
140, 46
150, 53
85, 87
160, 48
147, 60
137, 49
149, 53
148, 60
143, 52
161, 30
166, 23
134, 38
146, 29
155, 40
129, 37
154, 66
153, 38
151, 33
128, 36
How is that possible? How can I get 95 different colors from the image when there are only 13 different colors?
Edit I:
I have collected the colors of every pixel in the image, and none of them matches the colors I got with Photoshop.
I got 256 different colors; this is the list along with the number of times each was found.
{'#885F7D': 15, '#541B47': 15, '#68355B': 819, '#65355D': 17, '#78384A': 19, '#7E3942': 19, '#7B3846': 4588, '#7C3346': 39, '#7D3046': 50, '#773F4C': 21, '#785A49': 4, '#775F49': 35, '#765C49': 17540, '#7A4648': 21, '#756349': 62, '#785B49': 56, '#7C3546': 14, '#765D49': 12, '#7A4F48': 14, '#7C3746': 29, '#785549': 7, '#775D4A': 8, '#785749': 8, '#551743': 1, '#6A3158': 39, '#68325A': 6, '#86617E': 1, '#66385D': 31, '#6C2C56': 6, '#6C2A56': 7, '#6D2B54': 3, '#678D97': 88, '#2C5B6A': 60, '#416C79': 43, '#3F717A': 7, '#43686A': 64, '#5C5F71': 32, '#465771': 3, '#5E5666': 14, '#5D4C66': 7, '#644160': 2, '#683C5F': 2, '#659197': 2, '#1C606C': 88, '#32767E': 61, '#227B84': 59, '#3A757A': 60, '#803342': 16, '#7D3745': 6, '#3A727B': 7374, '#3B7479': 3, '#36747C': 11, '#6C4450': 104, '#82303F': 18, '#852B3B': 28, '#694A56': 3, '#3D7179': 15, '#694E59': 15, '#7D3545': 11, '#387283': 30, '#3B717B': 17, '#3A727D': 16, '#7B5A48': 37, '#832B43': 11, '#3B7184': 21, '#2A7C66': 1, '#5D5D4E': 2, '#3B7180': 23, '#41715A': 6, '#45714D': 44, '#297D59': 6, '#407256': 32, '#417160': 13, '#437155': 5275, '#467055': 16, '#327A58': 7, '#68514E': 4, '#407756': 2, '#3C7356': 22, '#56654F': 17, '#437154': 15, '#387457': 30, '#3F7169': 14, '#4B6D54': 9, '#805C49': 105, '#735E4A': 10, '#7F5747': 63, '#755C49': 9, '#457154': 16, '#337558': 45, '#536B52': 18, '#735944': 95, '#7B614F': 96, '#5D6750': 36, '#437156': 43, '#69624D': 21, '#457151': 29, '#3D7172': 10, '#70604B': 10, '#487458': 2, '#45744D': 96, '#447352': 2, '#23596C': 2, '#3C6A7E': 59, '#3F696B': 41, '#64819B': 37, '#204D73': 92, '#3C5E82': 60, '#3A5E8A': 93, '#385B92': 1, '#3C6182': 4212, '#5D7F9A': 1, '#0C4A72': 2, '#305E82': 118, '#5C6982': 118, '#8F8D9B': 26, '#646473': 3, '#7B7482': 118, '#5A7169': 14, '#39714D': 12, '#727182': 2691, '#797189': 13, '#3E724D': 1, '#3B7155': 51, '#885947': 1, '#7D5744': 1, '#866251': 1, '#4F7056': 51, '#675C48': 106, '#707289': 10, '#736E6C': 11, '#746B51': 12, '#756C58': 116, '#82705C': 27, '#135941': 27, '#235D44': 105, '#255B44': 24, '#1D5943': 45, '#2B5C46': 108, '#2B5C45': 8, '#746C58': 5469, '#2E5C46': 17561, '#7E705B': 32, '#4F634D': 10, '#7B6E5A': 32, '#45614B': 14, '#707584': 117, '#6E788C': 1, '#72716E': 1, '#75677E': 117, '#746684': 1, '#766D59': 26, '#3D5F49': 11, '#255943': 33, '#957890': 39, '#7A5174': 117, '#7C4B7C': 2, '#775E6A': 62, '#727152': 39, '#726C58': 32, '#365E47': 12, '#683F63': 37, '#7A5476': 4212, '#79507A': 37, '#766166': 38, '#7A6D57': 15, '#6E6B56': 13, '#2D5D46': 5, '#696A54': 4, '#2C5B45': 8, '#626852': 8, '#305C46': 24, '#2E5C44': 26, '#7E577B': 2, '#7C567A': 55, '#7A517A': 58, '#784F79': 1, '#5F3855': 1, '#724F68': 57, '#727053': 59, '#856C77': 89, '#51303E': 91, '#62444F': 56, '#60404E': 1, '#767558': 56, '#654654': 7521, '#623F53': 16, '#674B54': 7, '#747057': 25, '#746B58': 40, '#623E53': 15, '#654754': 40, '#757158': 11, '#6F6C56': 2, '#644554': 29, '#613D53': 16, '#6B5555': 15, '#6F5E56': 15, '#756D57': 11, '#634354': 7, '#634153': 13, '#716457': 7, '#644254': 7, '#654354': 4, '#305C48': 3, '#726C59': 2, '#7E7055': 6, '#817155': 7, '#48615F': 4, '#0A5649': 1, '#2E5C3E': 26, '#135669': 2, '#2C5B68': 34, '#2B5C53': 21, '#2E5C41': 58, '#415F60': 3, '#0F5667': 5, '#2C5B64': 4676, '#2C5B66': 19, '#2C5B5B': 17, '#2E5C4D': 8, '#175966': 7, '#375D61': 2, '#61675B': 1, '#2F5B64': 20, '#2C5B60': 16, '#2F5B4A': 3, '#55675E': 2, '#2E5C4A': 8, '#275C64': 23, '#674654': 10, '#385260': 1, '#684553': 26, '#1C5E66': 46, '#564D59': 5, '#3D5660': 8, '#4F4F5B': 10, '#5E4A57': 7, '#365961': 5, '#47525D': 
8, '#5C4B57': 4, '#614756': 2, '#5A4759': 36, '#504A60': 10, '#404B67': 7, '#2C5667': 18, '#8B6B75': 1, '#2B4D71': 876, '#2D5D62': 18, '#7C6D7B': 1, '#58728D': 16, '#0A365F': 16, '#21553E': 4, '#335F4B': 1, '#35624D': 20, '#3D6752': 4}
I don't understand it at all. How is it possible that not a single pixel has the color that I got in Photoshop?
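For reference, a minimal sketch of how such a per-pixel count can be produced with PIL (the filename map.png is a placeholder, not from the question):

from collections import Counter
import PIL.Image

im = PIL.Image.open("map.png").convert("RGB")   # placeholder path
counts = Counter(im.getdata())                  # getdata() yields one RGB tuple per pixel
hex_counts = {"#%02X%02X%02X" % rgb: n for rgb, n in counts.items()}
print(len(hex_counts), "distinct colors")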
Edit II:
With the same code, I have got the color map of another image. This is the image:
The predominant colors that you can see in this image are these:
"#F50A22", "#00EC83", "#00A200", "#0007A4", "#9D132B", "#734500", "#6230FF", "#F42AFF", "#BEFF00", "#EC7800", "#65DCD1", "#FF6D00" : "#004500"
Running the test with the same code, as I said, I found that all of these colors appear in the image, among others! Unlike the first image, where none of them were found.
The results are:
Colors matched: {'#F50A22': 2245, '#00EC83': 9437, '#00A200': 21039, '#0007A4': 8772, '#9D132B': 99, '#734500': 2970, '#6230FF': 112, '#F42AFF': 5271, '#BEFF00': 2380, '#EC7800': 3076, '#65DCD1': 6503, '#FF6D00': 4709, '#004500': 6612}
colors matched: 13
The other colors found in the image are:
Other colors: {'#FFFFFF': 1931, '#FCFFFD': 27, '#FAFFFB': 2, '#F7FEF9': 12, '#F4FEF7': 10, '#F6FEF8': 20, '#F6FDF8': 1, '#F9FEFA': 12, '#FBFEFC': 9, '#FEFFFE': 40, '#FAFEFB': 12, '#FBFFFC': 7, '#F3FEF6': 7, '#F4FDF6': 2, '#F5FDF7': 1, '#F2FDF5': 3, '#EEFDF2': 3, '#F2FDF6': 7, '#F4FEF8': 12, '#EFFDF4': 3, '#E5FCEC': 4, '#DAFAE5': 1, '#D3FAE0': 3, '#D4FAE0': 1, '#DAFAE4': 1, '#DFFBE8': 1, '#E9FCEF': 3, '#EDFDF2': 2, '#EFFDF3': 3, '#E2FBEA': 3, '#E2FCEA': 3, '#EFFEF3': 1, '#F2FEF5': 1, '#EDFCF1': 2, '#EBFDF0': 1, '#F1FDF4': 1, '#F3FEF7': 4, '#EDFDF1': 2, '#E7FCEE': 3, '#E3FCEB': 1, '#E0FCE9': 1, '#DCFBE6': 5, '#DAFBE5': 1, '#D9FAE4': 1, '#D9FAE3': 1, '#E3FCEC': 1, '#EEFDF3': 1, '#D7FAE2': 1, '#D1FADF': 1, '#D1FADE': 1, '#D6FAE2': 1, '#E1FBEA': 2, '#EBFDF1': 1, '#DFFBE9': 1, '#DEFBE7': 2, '#DBFBE5': 1, '#F6132A': 111, '#00EC84': 33, '#00EC85': 16, '#04EC86': 11, '#14EC87': 3, '#F40D23': 3, '#F20E24': 1, '#F50B22': 8, '#F11426': 2, '#F40C23': 1, '#EF1A28': 1, '#EE1B29': 1, '#F01827': 1, '#F21125': 1, '#F40D24': 1, '#F40E24': 1, '#774A03': 165, '#F40E23': 1, '#F50C22': 1, '#F6142A': 3, '#00EC82': 1, '#00EB82': 2, '#00EA7F': 1, '#00EB81': 1, '#6FE09C': 1, '#7E5416': 2, '#00A300': 78, '#00A500': 43, '#D9403B': 1, '#00AB16': 1, '#00A600': 40, '#00A700': 1123, '#5E2AFF': 2471, '#00B213': 2, '#00AA00': 6, '#7A4F0D': 3, '#6636FF': 2, '#00AE02': 2, '#00AC00': 3, '#00AB08': 2, '#00A800': 12, '#00A900': 8, '#00B317': 1, '#6C3CFF': 1, '#00AE00': 2, '#00AE14': 1, '#00A903': 1, '#7F55FE': 1, '#6CEE9F': 1, '#00AD00': 2, '#6CDCD2': 268, '#6A3CFE': 2, '#7549FF': 1, '#4ED688': 1, '#6B3DFF': 1, '#5E2BFF': 24, '#6839FD': 1, '#6231FE': 1, '#5E31FC': 2, '#00AF08': 1, '#00AC07': 1, '#6339FA': 1, '#5F33FB': 3, '#5F30FD': 3, '#00B10E': 1, '#656565': 1, '#00AB00': 2, '#00B02D': 2, '#6037F9': 1, '#5F2EFE': 2, '#5F3EF5': 1, '#5F32FC': 1, '#6040F4': 1, '#5F32FB': 2, '#6041F3': 1, '#6042F2': 1, '#7145FC': 1, '#5F2CFF': 10, '#6147EF': 1, '#6454EA': 1, '#6036F9': 1, '#685AEA': 1, '#00AF2F': 1, '#6B57EE': 1, '#00B110': 1, '#00AA02': 1, '#8ADBD3': 3, '#683CFB': 1, '#72DDD2': 3, '#6D47F8': 1, '#775EF3': 1, '#9CD7D1': 2, '#5E31FD': 1, '#00AB18': 1, '#82DCD3': 1, '#673EFB': 1, '#7450F9': 1, '#612EFF': 8, '#6236FB': 1, '#602CFF': 5, '#6B49F7': 1, '#602DFF': 7, '#5F2BFF': 6, '#6334FD': 1, '#2EEB8B': 1, '#704AFB': 1, '#6231FF': 1, '#6738FE': 1, '#612DFF': 3, '#3FEB8F': 1, '#66DBD1': 5, '#67D8D2': 1, '#00AE2B': 1, '#65DAD2': 1, '#F42DFF': 15, '#FC67FF': 6, '#F246FA': 1, '#F84CFF': 7, '#6233FF': 1, '#6ADCD2': 22, '#6132FE': 1, '#FBFEFE': 2, '#F434FF': 5, '#F8FDFC': 1, '#68DCD1': 33, '#6034FE': 1, '#FB5DFF': 2, '#FAFEFD': 2, '#F2FBFA': 1, '#6442FA': 1, '#6031FF': 1, '#F539FF': 7, '#F5FCFC': 1, '#E7F9F6': 1, '#F02AFF': 5, '#EFFBF9': 2, '#DDF6F3': 1, '#5F2EFF': 1, '#DD2BFF': 1, '#E82AFF': 1, '#F32AFF': 8, '#F744FF': 3, '#E7F9F7': 1, '#CFF2EF': 1, '#6136FD': 1, '#5F2AFF': 1, '#DD2AFF': 1, '#E42AFF': 1, '#EC2AFF': 2, '#E1F7F4': 1, '#C3EFEA': 1, '#6031FE': 1, '#EA2AFF': 2, '#ED2AFF': 1, '#DAF6F2': 1, '#BAEEE7': 1, '#6DDDD3': 2, '#6937FF': 1, '#ED37FE': 1, '#D7F5F1': 2, '#B6EDE6': 1, '#69DDD2': 2, '#74DFD4': 1, '#81DED9': 1, '#EF2BFF': 1, '#B3ECE7': 1, '#7ED7D4': 1, '#F22AFF': 2, '#D9F5F2': 1, '#B7EDE7': 1, '#DB39FC': 1, '#F12EFF': 1, '#E0F7F4': 1, '#C2EFEA': 1, '#87DBD3': 1, '#E737FE': 1, '#E6F8F6': 2, '#CCF2EE': 1, '#84DCD3': 1, '#ECFAF9': 1, '#D8F5F2': 1, '#65DCD0': 5, '#69DBCF': 6, '#6ADCD1': 1, '#98D3CD': 1, '#F440FC': 1, '#F42CFF': 7, '#F4FCFB': 1, '#6FD9CC': 2, '#6FD9CB': 3, '#6BDBCF': 1, '#7ED7C8': 1, '#80D3C1': 1, '#F531FF': 3, 
'#F42BFF': 35, '#FDFEFE': 2, '#F8FDFD': 1, '#83D8CB': 1, '#7ED3C2': 1, '#FF7100': 78, '#FEFFFF': 1, '#97D2CC': 1, '#FF7000': 40, '#FF6E00': 40, '#FF7925': 1, '#F33FF7': 1, '#6FDDD2': 2, '#FF6B00': 8, '#F62DF4': 1, '#F52BFB': 1, '#FF7409': 1, '#F62DF3': 1, '#F52BFC': 3, '#A2CFCA': 1, '#F73FE3': 1, '#F52DF9': 1, '#F42AFE': 1, '#FF7400': 4, '#FF730E': 1, '#FC36D5': 1, '#F62DF1': 1, '#F52BFD': 1, '#F52CFF': 6, '#F52DFF': 13, '#76DDD3': 2, '#FF6C00': 8, '#F831EA': 1, '#F52BFA': 3, '#F632FF': 1, '#8DDAD2': 1, '#F836E6': 1, '#F52BF9': 2, '#A4CCC8': 1, '#FF6A08': 1, '#7ADDD3': 1, '#FF690B': 1, '#F42BFE': 1, '#92D9D2': 2, '#FF6E0B': 1, '#F031FA': 1, '#A7C8C5': 1, '#FF6429': 1, '#FF7200': 62, '#FF671A': 1, '#7EDCD3': 1, '#EC35F6': 1, '#6CDACE': 1, '#6DDBD0': 1, '#FF671C': 1, '#FF7104': 1, '#FF6911': 1, '#FF642C': 1, '#FF6B23': 1, '#FF6E13': 1, '#FF7300': 5, '#F530FF': 1, '#F532FF': 3, '#6DDDD2': 1, '#F533FF': 1, '#F635FF': 1, '#F537FF': 8, '#F539FE': 2, '#F538FF': 9, '#00AC1A': 4, '#FF780E': 1, '#004B04': 29, '#FF873C': 1, '#FF7C1B': 1, '#FF7606': 4, '#FF780C': 1, '#FF7502': 1, '#FF7504': 1, '#FF770A': 2, '#004A03': 9, '#004A02': 5, '#F73FFF': 2, '#F435FF': 1, '#004700': 9, '#FF7A0F': 1, '#F52EFF': 1, '#F63BFF': 1, '#F638FF': 1, '#004600': 22, '#004B03': 3, '#004901': 8, '#FF7D1D': 1, '#F43EFB': 1, '#FF8533': 1, '#F62DF6': 1, '#FF7F24': 1, '#004902': 3, '#004900': 1, '#F441FC': 1, '#C1E057': 1, '#C2FD00': 5, '#C1F700': 1, '#C0FE00': 176, '#C4EE30': 2, '#C3E846': 1, '#C2FB00': 2, '#FEFEFE': 9, '#004C07': 11, '#B8FB00': 2, '#C3FB00': 1, '#FDFEFD': 5, '#BAFB00': 2, '#C5F11A': 2, '#B3F600': 2, '#BEFC00': 1, '#C1FD00': 7, '#FBFCFB': 3, '#BCDE52': 1, '#BBFE00': 9, '#FAFBFA': 2, '#B6F700': 1, '#BDFB00': 1, '#C3F800': 5, '#F331FF': 1, '#B2F500': 1, '#BDF900': 1, '#BDFD00': 1, '#BBFC00': 2, '#BDFE00': 10, '#C3EA40': 1, '#FCFDFC': 4, '#B5F600': 2, '#BCFD00': 7, '#C4E847': 1, '#CDFD09': 3, '#2337B3': 1, '#4251B6': 1, '#C5ED37': 1, '#D5FF3E': 17, '#0012A7': 625, '#004B06': 1, '#CFFE22': 5, '#B6F900': 2, '#C5FD00': 5, '#D3FF3C': 3, '#005010': 1, '#CBFD00': 5, '#C2FE00': 3, '#B8F900': 2, '#D2FE31': 8, '#C8FD00': 3, '#B9FA00': 2, '#C4FD00': 3, '#F8FBF9': 2, '#CCFE08': 3, '#F4F6F4': 2, '#C7FD00': 5, '#EBF1EC': 1, '#F8F9F7': 1, '#E1EAE4': 1, '#004701': 1, '#132AAF': 2, '#D5E2D8': 1, '#F1F5F2': 2, '#D1DFD5': 1, '#EC8417': 1, '#D4E1D7': 1, '#F3F5F3': 1, '#D9E4DC': 1, '#EB7800': 15, '#ED7B00': 133, '#F6F8F5': 1, '#DEE8E0': 1, '#E6EDE6': 1, '#FAFDFB': 1, '#EAF0EB': 1, '#EEF3EF': 1, '#EA7700': 2, '#F5F7F5': 1, '#C2E64E': 1, '#CAFD00': 2, '#F7FAF8': 1, '#E87700': 2, '#EA7800': 7, '#004C06': 2, '#CFFE20': 2, '#004A05': 1, '#E37600': 1, '#E67700': 1, '#00591D': 1, '#990A22': 2077, '#A6293C': 1, '#021EAA': 1, '#0007A3': 6, '#0009A1': 2, '#001697': 2, '#000B9F': 2, '#00119B': 1})
total other colors: 448
Both images are PNG.
How is it possible that I found all the searched colors (among others) in the second image, but not a single one of them in the first image?
You can see 13 colors, yes! But the code sees more, because it's more precise than your eyes.
Try zooming into the picture; you'll see that between the flat color regions there is another, lighter one, which can consist of more than one intermediate color blending from one region to the other. I also noticed some black and white at the left side (maybe it's just from your snipping tool or something).
But what I'm saying is, the code is right :)
You can try creating an image in Paint using only two colors with the fill tool, and make sure it's only one color without any gradient.
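If you want to map every sampled pixel back to one of the 13 reference colors despite the anti-aliasing, here is a minimal sketch that snaps a pixel to the nearest palette entry in RGB space (the palette is the list from the question; everything else is illustrative):

palette_hex = ["#5D385A", "#6D3B47", "#6F5C4B", "#50717A", "#547057", "#4C6180",
               "#717080", "#705574", "#726B59", "#5E4854", "#415A4B", "#425A64",
               "#3A4E6F"]

def hex_to_rgb(h):
    h = h.lstrip("#")
    return tuple(int(h[i:i+2], 16) for i in (0, 2, 4))

palette_rgb = [hex_to_rgb(h) for h in palette_hex]

def nearest_palette_color(rgb):
    # squared Euclidean distance in RGB space is enough for ranking
    return min(palette_rgb, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, rgb)))

print(nearest_palette_color((95, 58, 92)))  # -> (93, 56, 90), i.e. "#5D385A"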
I found the problem and the solution. The problem is that I was using images created from a previous export. I mean, I had resized and exported from an original image, and at that moment something happens in Photoshop (or whatever other program) that produces an image with many other colors, not just the original ones.
So, you have to run the process over the original version of the image, the export from the vectorized image. If you make an export of that export and then run the process, you will have problems like I did.
I'm using pytesseract to return the coordinates of the objects in an image.
By using this piece of code:
import pytesseract
from pytesseract import Output
import cv2

img = cv2.imread('wine.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
print(d)

n_boxes = len(d['text'])  # one entry per detected element
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
I get that:
{'level': [1, 2, 3, 4, 5, 5, 2, 3, 4, 5, 4, 5, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'par_num': [0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1], 'left': [0, 485, 485, 485, 485, 612, 537, 537, 555, 555, 537, 537, 454, 454, 454, 454], 'top': [0, 323, 323, 323, 323, 324, 400, 400, 400, 400, 426, 426, 0, 0, 0, 0], 'width': [1200, 229, 229, 229, 115, 102, 123, 123, 89, 89, 123, 123, 296, 296, 296, 296], 'height': [900, 29, 29, 29, 28, 28, 40, 40, 15, 15, 14, 14, 892, 892, 892, 892], 'conf': ['-1', '-1', '-1', '-1', 58, 96, '-1', '-1', '-1', 95, '-1', 95, '-1', '-1', '-1', 95], 'text': ['', '', '', '', "JACOB'S", 'CREEK', '', '', '', 'SHIRAZ', '', 'CABERNET', '', '', '', '']}
(image used)
However, when I use this image:
I get that:
{'level': [1, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1], 'left': [0, 0, 0, 0, 0], 'top': [0, 162, 162, 162, 162], 'width': [1200, 0, 0, 0, 0], 'height': [900, 276, 276, 276, 276], 'conf': ['-1', '-1', '-1', '-1', 95], 'text': ['', '', '', '', '']}
Any idea why some images work and some don't?
It is mainly caused by differences in quality and contrast; it is much easier for the OCR engine to detect text in clean, high-contrast images.
You can add a few pre-processing routines, including thresholding, blurring, histogram equalization and lots of other techniques. It is mainly subjective, so I cannot provide you with working code; it is more trial and error to find the best technique for your scope.
UPDATE:
here is some code that might help you:
def preprocessing_typing_detection(inputImage):
    inputImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
    inputImage = cv2.Laplacian(inputImage, cv2.CV_8U)
    return inputImage
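A sketch of how it could be wired into the snippet from the question (assuming the function above is defined; 'wine.jpg' is the filename from the question):

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('wine.jpg')
preprocessed = preprocessing_typing_detection(img)  # grayscale + Laplacian edges
d = pytesseract.image_to_data(preprocessed, output_type=Output.DICT)
print(d['text'])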
I have a list of numbers, and I want to slice all the elements between occurrences of the number 192 and pass each slice to its own list.
My list:
[192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
I want something like this:
[192, 0, 1, 0, 1 ]
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108]
[192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155]
and so on, until the end of the list.
Here's one possible way to do it:
# input list
lst = [192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
# list of indexes where 192 is found,
# plus one extra index for the final slice
indexes = [i for i, n in enumerate(lst) if n == 192] + [len(lst)]
# create the slices between consecutive indexes
[lst[indexes[i]:indexes[i+1]] for i in range(len(indexes) - 1)]
The result will be:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]
You can build a generator with itertools.groupby, using 192's equality method (192).__eq__ as the key function so that each run of 192s and each run of other numbers forms its own group, pair consecutive groups with zip, and then use itertools.chain.from_iterable to join each pair (the example below assumes your list is stored in the variable l):
from itertools import groupby, chain
i = (list(g) for _, g in groupby(l, key=(192).__eq__))
[list(chain.from_iterable(p)) for p in zip(i, i)]
This returns:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]