PyTesseract image_to_data function isn't recognizing my image - python-3.x

I'm using pytesseract to get the coordinates of the text detected in an image, with this piece of code:
import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('wine.jpg')
d = pytesseract.image_to_data(img, output_type=Output.DICT)
print(d)

# one entry per detected element (page, block, paragraph, line, word)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)
I get that:
{'level': [1, 2, 3, 4, 5, 5, 2, 3, 4, 5, 4, 5, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'par_num': [0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1], 'left': [0, 485, 485, 485, 485, 612, 537, 537, 555, 555, 537, 537, 454, 454, 454, 454], 'top': [0, 323, 323, 323, 323, 324, 400, 400, 400, 400, 426, 426, 0, 0, 0, 0], 'width': [1200, 229, 229, 229, 115, 102, 123, 123, 89, 89, 123, 123, 296, 296, 296, 296], 'height': [900, 29, 29, 29, 28, 28, 40, 40, 15, 15, 14, 14, 892, 892, 892, 892], 'conf': ['-1', '-1', '-1', '-1', 58, 96, '-1', '-1', '-1', 95, '-1', 95, '-1', '-1', '-1', 95], 'text': ['', '', '', '', "JACOB'S", 'CREEK', '', '', '', 'SHIRAZ', '', 'CABERNET', '', '', '', '']}
[image used]
However, when I use this image:
I get that:
{'level': [1, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1], 'left': [0, 0, 0, 0, 0], 'top': [0, 162, 162, 162, 162], 'width': [1200, 0, 0, 0, 0], 'height': [900, 276, 276, 276, 276], 'conf': ['-1', '-1', '-1', '-1', 95], 'text': ['', '', '', '', '']}
Any idea why some images work and some don't?

This is mainly caused by differences in image quality and contrast. The OCR engine detects text much more easily in clean, high-contrast images.
You can add a few pre-processing steps, such as thresholding, blurring, histogram equalization, and many other techniques. Which one works best depends on your images, so I cannot give you code that is guaranteed to work; it takes trial and error to find the best technique for your use case.
UPDATE:
Here is some code that might help you:
import cv2

def preprocessing_typing_detection(inputImage):
    # convert to grayscale, then apply a Laplacian filter to emphasize edges
    inputImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
    inputImage = cv2.Laplacian(inputImage, cv2.CV_8U)
    return inputImage
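While experimenting with pre-processing, it can help to check quickly whether any words were actually recognized. The following is only a sketch: `word_boxes` and `min_conf` are hypothetical names, not part of pytesseract, and `sample` is a trimmed copy of the first output shown in the question. It keeps only the entries of an `image_to_data` dictionary that have non-empty text and a confidence at or above a threshold.

```python
def word_boxes(data, min_conf=0):
    """Return (text, left, top, width, height) for entries with recognized text.

    `data` is assumed to be a dict shaped like the Output.DICT result above.
    """
    boxes = []
    for i, text in enumerate(data["text"]):
        # conf values may be ints or strings such as '-1', so normalize them
        conf = float(data["conf"][i])
        if text.strip() and conf >= min_conf:
            boxes.append((text, data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i]))
    return boxes


# trimmed copy of the first output in the question
sample = {
    "left": [0, 485, 612, 555],
    "top": [0, 323, 324, 400],
    "width": [1200, 115, 102, 89],
    "height": [900, 28, 28, 15],
    "conf": ["-1", 58, 96, 95],
    "text": ["", "JACOB'S", "CREEK", "SHIRAZ"],
}

print(word_boxes(sample, min_conf=60))
```

An empty result, as with the second image in the question, suggests that the pre-processing step still needs tuning.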

Related

Colors detected are not equals to colors image

I'm trying to find out the color of a pixel from its x and y coordinates. The colors come from this image.
Sampling the colors with Photoshop, I got this list of colors:
"#5D385A", "#6D3B47", "#6F5C4B", "#50717A", "#547057", "#4C6180", "#717080", "#705574", "#726B59", "#5E4854", "#415A4B", "#425A64", "#3A4E6F"
However, when I read the color of a pixel from the image, it doesn't match any color in the previous list. I also get 95 different colors, although the image contains only 13.
I open the image and read the color of a pixel with this class:
import PIL.Image

class Image:
    def __init__(self, file):
        self.image = PIL.Image.open(file).convert("RGB")

    def get_color(self, x, y):
        color = self.image.getpixel((x, y))
        color = ("#%02x%02x%02x" % color).upper()
        return color
Here is a short list of the x and y positions where I sample the color:
144, 74
140, 46
150, 53
85, 87
160, 48
147, 60
137, 49
149, 53
148, 60
143, 52
161, 30
166, 23
134, 38
146, 29
155, 40
129, 37
154, 66
153, 38
151, 33
128, 36
How is that possible? How can I get 95 different colors from an image that contains only 13?
Edit I:
I collected the color of every pixel in the image, and none of them matches the colors I got with Photoshop.
I found 256 different colors. This is the list, with the number of times each one was found:
{'#885F7D': 15, '#541B47': 15, '#68355B': 819, '#65355D': 17, '#78384A': 19, '#7E3942': 19, '#7B3846': 4588, '#7C3346': 39, '#7D3046': 50, '#773F4C': 21, '#785A49': 4, '#775F49': 35, '#765C49': 17540, '#7A4648': 21, '#756349': 62, '#785B49': 56, '#7C3546': 14, '#765D49': 12, '#7A4F48': 14, '#7C3746': 29, '#785549': 7, '#775D4A': 8, '#785749': 8, '#551743': 1, '#6A3158': 39, '#68325A': 6, '#86617E': 1, '#66385D': 31, '#6C2C56': 6, '#6C2A56': 7, '#6D2B54': 3, '#678D97': 88, '#2C5B6A': 60, '#416C79': 43, '#3F717A': 7, '#43686A': 64, '#5C5F71': 32, '#465771': 3, '#5E5666': 14, '#5D4C66': 7, '#644160': 2, '#683C5F': 2, '#659197': 2, '#1C606C': 88, '#32767E': 61, '#227B84': 59, '#3A757A': 60, '#803342': 16, '#7D3745': 6, '#3A727B': 7374, '#3B7479': 3, '#36747C': 11, '#6C4450': 104, '#82303F': 18, '#852B3B': 28, '#694A56': 3, '#3D7179': 15, '#694E59': 15, '#7D3545': 11, '#387283': 30, '#3B717B': 17, '#3A727D': 16, '#7B5A48': 37, '#832B43': 11, '#3B7184': 21, '#2A7C66': 1, '#5D5D4E': 2, '#3B7180': 23, '#41715A': 6, '#45714D': 44, '#297D59': 6, '#407256': 32, '#417160': 13, '#437155': 5275, '#467055': 16, '#327A58': 7, '#68514E': 4, '#407756': 2, '#3C7356': 22, '#56654F': 17, '#437154': 15, '#387457': 30, '#3F7169': 14, '#4B6D54': 9, '#805C49': 105, '#735E4A': 10, '#7F5747': 63, '#755C49': 9, '#457154': 16, '#337558': 45, '#536B52': 18, '#735944': 95, '#7B614F': 96, '#5D6750': 36, '#437156': 43, '#69624D': 21, '#457151': 29, '#3D7172': 10, '#70604B': 10, '#487458': 2, '#45744D': 96, '#447352': 2, '#23596C': 2, '#3C6A7E': 59, '#3F696B': 41, '#64819B': 37, '#204D73': 92, '#3C5E82': 60, '#3A5E8A': 93, '#385B92': 1, '#3C6182': 4212, '#5D7F9A': 1, '#0C4A72': 2, '#305E82': 118, '#5C6982': 118, '#8F8D9B': 26, '#646473': 3, '#7B7482': 118, '#5A7169': 14, '#39714D': 12, '#727182': 2691, '#797189': 13, '#3E724D': 1, '#3B7155': 51, '#885947': 1, '#7D5744': 1, '#866251': 1, '#4F7056': 51, '#675C48': 106, '#707289': 10, '#736E6C': 11, '#746B51': 12, '#756C58': 116, '#82705C': 27, 
'#135941': 27, '#235D44': 105, '#255B44': 24, '#1D5943': 45, '#2B5C46': 108, '#2B5C45': 8, '#746C58': 5469, '#2E5C46': 17561, '#7E705B': 32, '#4F634D': 10, '#7B6E5A': 32, '#45614B': 14, '#707584': 117, '#6E788C': 1, '#72716E': 1, '#75677E': 117, '#746684': 1, '#766D59': 26, '#3D5F49': 11, '#255943': 33, '#957890': 39, '#7A5174': 117, '#7C4B7C': 2, '#775E6A': 62, '#727152': 39, '#726C58': 32, '#365E47': 12, '#683F63': 37, '#7A5476': 4212, '#79507A': 37, '#766166': 38, '#7A6D57': 15, '#6E6B56': 13, '#2D5D46': 5, '#696A54': 4, '#2C5B45': 8, '#626852': 8, '#305C46': 24, '#2E5C44': 26, '#7E577B': 2, '#7C567A': 55, '#7A517A': 58, '#784F79': 1, '#5F3855': 1, '#724F68': 57, '#727053': 59, '#856C77': 89, '#51303E': 91, '#62444F': 56, '#60404E': 1, '#767558': 56, '#654654': 7521, '#623F53': 16, '#674B54': 7, '#747057': 25, '#746B58': 40, '#623E53': 15, '#654754': 40, '#757158': 11, '#6F6C56': 2, '#644554': 29, '#613D53': 16, '#6B5555': 15, '#6F5E56': 15, '#756D57': 11, '#634354': 7, '#634153': 13, '#716457': 7, '#644254': 7, '#654354': 4, '#305C48': 3, '#726C59': 2, '#7E7055': 6, '#817155': 7, '#48615F': 4, '#0A5649': 1, '#2E5C3E': 26, '#135669': 2, '#2C5B68': 34, '#2B5C53': 21, '#2E5C41': 58, '#415F60': 3, '#0F5667': 5, '#2C5B64': 4676, '#2C5B66': 19, '#2C5B5B': 17, '#2E5C4D': 8, '#175966': 7, '#375D61': 2, '#61675B': 1, '#2F5B64': 20, '#2C5B60': 16, '#2F5B4A': 3, '#55675E': 2, '#2E5C4A': 8, '#275C64': 23, '#674654': 10, '#385260': 1, '#684553': 26, '#1C5E66': 46, '#564D59': 5, '#3D5660': 8, '#4F4F5B': 10, '#5E4A57': 7, '#365961': 5, '#47525D': 8, '#5C4B57': 4, '#614756': 2, '#5A4759': 36, '#504A60': 10, '#404B67': 7, '#2C5667': 18, '#8B6B75': 1, '#2B4D71': 876, '#2D5D62': 18, '#7C6D7B': 1, '#58728D': 16, '#0A365F': 16, '#21553E': 4, '#335F4B': 1, '#35624D': 20, '#3D6752': 4}
I don't understand this at all. How is it possible that not a single pixel has a color that I got in Photoshop?
Edit II:
With the same code, I built the color map of another image. This is the image:
The predominant colors that you can see in this image are these:
"#F50A22", "#00EC83", "#00A200", "#0007A4", "#9D132B", "#734500", "#6230FF", "#F42AFF", "#BEFF00", "#EC7800", "#65DCD1", "#FF6D00", "#004500"
Running the test with the same code, as I said, all of these colors are found in the image, among others, unlike the first image, where none of them was found.
The results are:
Colors matched: {'#F50A22': 2245, '#00EC83': 9437, '#00A200': 21039, '#0007A4': 8772, '#9D132B': 99, '#734500': 2970, '#6230FF': 112, '#F42AFF': 5271, '#BEFF00': 2380, '#EC7800': 3076, '#65DCD1': 6503, '#FF6D00': 4709, '#004500': 6612}
colors matched: 13
The other colors found in the image are:
Other colors: {'#FFFFFF': 1931, '#FCFFFD': 27, '#FAFFFB': 2, '#F7FEF9': 12, '#F4FEF7': 10, '#F6FEF8': 20, '#F6FDF8': 1, '#F9FEFA': 12, '#FBFEFC': 9, '#FEFFFE': 40, '#FAFEFB': 12, '#FBFFFC': 7, '#F3FEF6': 7, '#F4FDF6': 2, '#F5FDF7': 1, '#F2FDF5': 3, '#EEFDF2': 3, '#F2FDF6': 7, '#F4FEF8': 12, '#EFFDF4': 3, '#E5FCEC': 4, '#DAFAE5': 1, '#D3FAE0': 3, '#D4FAE0': 1, '#DAFAE4': 1, '#DFFBE8': 1, '#E9FCEF': 3, '#EDFDF2': 2, '#EFFDF3': 3, '#E2FBEA': 3, '#E2FCEA': 3, '#EFFEF3': 1, '#F2FEF5': 1, '#EDFCF1': 2, '#EBFDF0': 1, '#F1FDF4': 1, '#F3FEF7': 4, '#EDFDF1': 2, '#E7FCEE': 3, '#E3FCEB': 1, '#E0FCE9': 1, '#DCFBE6': 5, '#DAFBE5': 1, '#D9FAE4': 1, '#D9FAE3': 1, '#E3FCEC': 1, '#EEFDF3': 1, '#D7FAE2': 1, '#D1FADF': 1, '#D1FADE': 1, '#D6FAE2': 1, '#E1FBEA': 2, '#EBFDF1': 1, '#DFFBE9': 1, '#DEFBE7': 2, '#DBFBE5': 1, '#F6132A': 111, '#00EC84': 33, '#00EC85': 16, '#04EC86': 11, '#14EC87': 3, '#F40D23': 3, '#F20E24': 1, '#F50B22': 8, '#F11426': 2, '#F40C23': 1, '#EF1A28': 1, '#EE1B29': 1, '#F01827': 1, '#F21125': 1, '#F40D24': 1, '#F40E24': 1, '#774A03': 165, '#F40E23': 1, '#F50C22': 1, '#F6142A': 3, '#00EC82': 1, '#00EB82': 2, '#00EA7F': 1, '#00EB81': 1, '#6FE09C': 1, '#7E5416': 2, '#00A300': 78, '#00A500': 43, '#D9403B': 1, '#00AB16': 1, '#00A600': 40, '#00A700': 1123, '#5E2AFF': 2471, '#00B213': 2, '#00AA00': 6, '#7A4F0D': 3, '#6636FF': 2, '#00AE02': 2, '#00AC00': 3, '#00AB08': 2, '#00A800': 12, '#00A900': 8, '#00B317': 1, '#6C3CFF': 1, '#00AE00': 2, '#00AE14': 1, '#00A903': 1, '#7F55FE': 1, '#6CEE9F': 1, '#00AD00': 2, '#6CDCD2': 268, '#6A3CFE': 2, '#7549FF': 1, '#4ED688': 1, '#6B3DFF': 1, '#5E2BFF': 24, '#6839FD': 1, '#6231FE': 1, '#5E31FC': 2, '#00AF08': 1, '#00AC07': 1, '#6339FA': 1, '#5F33FB': 3, '#5F30FD': 3, '#00B10E': 1, '#656565': 1, '#00AB00': 2, '#00B02D': 2, '#6037F9': 1, '#5F2EFE': 2, '#5F3EF5': 1, '#5F32FC': 1, '#6040F4': 1, '#5F32FB': 2, '#6041F3': 1, '#6042F2': 1, '#7145FC': 1, '#5F2CFF': 10, '#6147EF': 1, '#6454EA': 1, '#6036F9': 1, '#685AEA': 1, '#00AF2F': 1, 
'#6B57EE': 1, '#00B110': 1, '#00AA02': 1, '#8ADBD3': 3, '#683CFB': 1, '#72DDD2': 3, '#6D47F8': 1, '#775EF3': 1, '#9CD7D1': 2, '#5E31FD': 1, '#00AB18': 1, '#82DCD3': 1, '#673EFB': 1, '#7450F9': 1, '#612EFF': 8, '#6236FB': 1, '#602CFF': 5, '#6B49F7': 1, '#602DFF': 7, '#5F2BFF': 6, '#6334FD': 1, '#2EEB8B': 1, '#704AFB': 1, '#6231FF': 1, '#6738FE': 1, '#612DFF': 3, '#3FEB8F': 1, '#66DBD1': 5, '#67D8D2': 1, '#00AE2B': 1, '#65DAD2': 1, '#F42DFF': 15, '#FC67FF': 6, '#F246FA': 1, '#F84CFF': 7, '#6233FF': 1, '#6ADCD2': 22, '#6132FE': 1, '#FBFEFE': 2, '#F434FF': 5, '#F8FDFC': 1, '#68DCD1': 33, '#6034FE': 1, '#FB5DFF': 2, '#FAFEFD': 2, '#F2FBFA': 1, '#6442FA': 1, '#6031FF': 1, '#F539FF': 7, '#F5FCFC': 1, '#E7F9F6': 1, '#F02AFF': 5, '#EFFBF9': 2, '#DDF6F3': 1, '#5F2EFF': 1, '#DD2BFF': 1, '#E82AFF': 1, '#F32AFF': 8, '#F744FF': 3, '#E7F9F7': 1, '#CFF2EF': 1, '#6136FD': 1, '#5F2AFF': 1, '#DD2AFF': 1, '#E42AFF': 1, '#EC2AFF': 2, '#E1F7F4': 1, '#C3EFEA': 1, '#6031FE': 1, '#EA2AFF': 2, '#ED2AFF': 1, '#DAF6F2': 1, '#BAEEE7': 1, '#6DDDD3': 2, '#6937FF': 1, '#ED37FE': 1, '#D7F5F1': 2, '#B6EDE6': 1, '#69DDD2': 2, '#74DFD4': 1, '#81DED9': 1, '#EF2BFF': 1, '#B3ECE7': 1, '#7ED7D4': 1, '#F22AFF': 2, '#D9F5F2': 1, '#B7EDE7': 1, '#DB39FC': 1, '#F12EFF': 1, '#E0F7F4': 1, '#C2EFEA': 1, '#87DBD3': 1, '#E737FE': 1, '#E6F8F6': 2, '#CCF2EE': 1, '#84DCD3': 1, '#ECFAF9': 1, '#D8F5F2': 1, '#65DCD0': 5, '#69DBCF': 6, '#6ADCD1': 1, '#98D3CD': 1, '#F440FC': 1, '#F42CFF': 7, '#F4FCFB': 1, '#6FD9CC': 2, '#6FD9CB': 3, '#6BDBCF': 1, '#7ED7C8': 1, '#80D3C1': 1, '#F531FF': 3, '#F42BFF': 35, '#FDFEFE': 2, '#F8FDFD': 1, '#83D8CB': 1, '#7ED3C2': 1, '#FF7100': 78, '#FEFFFF': 1, '#97D2CC': 1, '#FF7000': 40, '#FF6E00': 40, '#FF7925': 1, '#F33FF7': 1, '#6FDDD2': 2, '#FF6B00': 8, '#F62DF4': 1, '#F52BFB': 1, '#FF7409': 1, '#F62DF3': 1, '#F52BFC': 3, '#A2CFCA': 1, '#F73FE3': 1, '#F52DF9': 1, '#F42AFE': 1, '#FF7400': 4, '#FF730E': 1, '#FC36D5': 1, '#F62DF1': 1, '#F52BFD': 1, '#F52CFF': 6, '#F52DFF': 13, '#76DDD3': 2, 
'#FF6C00': 8, '#F831EA': 1, '#F52BFA': 3, '#F632FF': 1, '#8DDAD2': 1, '#F836E6': 1, '#F52BF9': 2, '#A4CCC8': 1, '#FF6A08': 1, '#7ADDD3': 1, '#FF690B': 1, '#F42BFE': 1, '#92D9D2': 2, '#FF6E0B': 1, '#F031FA': 1, '#A7C8C5': 1, '#FF6429': 1, '#FF7200': 62, '#FF671A': 1, '#7EDCD3': 1, '#EC35F6': 1, '#6CDACE': 1, '#6DDBD0': 1, '#FF671C': 1, '#FF7104': 1, '#FF6911': 1, '#FF642C': 1, '#FF6B23': 1, '#FF6E13': 1, '#FF7300': 5, '#F530FF': 1, '#F532FF': 3, '#6DDDD2': 1, '#F533FF': 1, '#F635FF': 1, '#F537FF': 8, '#F539FE': 2, '#F538FF': 9, '#00AC1A': 4, '#FF780E': 1, '#004B04': 29, '#FF873C': 1, '#FF7C1B': 1, '#FF7606': 4, '#FF780C': 1, '#FF7502': 1, '#FF7504': 1, '#FF770A': 2, '#004A03': 9, '#004A02': 5, '#F73FFF': 2, '#F435FF': 1, '#004700': 9, '#FF7A0F': 1, '#F52EFF': 1, '#F63BFF': 1, '#F638FF': 1, '#004600': 22, '#004B03': 3, '#004901': 8, '#FF7D1D': 1, '#F43EFB': 1, '#FF8533': 1, '#F62DF6': 1, '#FF7F24': 1, '#004902': 3, '#004900': 1, '#F441FC': 1, '#C1E057': 1, '#C2FD00': 5, '#C1F700': 1, '#C0FE00': 176, '#C4EE30': 2, '#C3E846': 1, '#C2FB00': 2, '#FEFEFE': 9, '#004C07': 11, '#B8FB00': 2, '#C3FB00': 1, '#FDFEFD': 5, '#BAFB00': 2, '#C5F11A': 2, '#B3F600': 2, '#BEFC00': 1, '#C1FD00': 7, '#FBFCFB': 3, '#BCDE52': 1, '#BBFE00': 9, '#FAFBFA': 2, '#B6F700': 1, '#BDFB00': 1, '#C3F800': 5, '#F331FF': 1, '#B2F500': 1, '#BDF900': 1, '#BDFD00': 1, '#BBFC00': 2, '#BDFE00': 10, '#C3EA40': 1, '#FCFDFC': 4, '#B5F600': 2, '#BCFD00': 7, '#C4E847': 1, '#CDFD09': 3, '#2337B3': 1, '#4251B6': 1, '#C5ED37': 1, '#D5FF3E': 17, '#0012A7': 625, '#004B06': 1, '#CFFE22': 5, '#B6F900': 2, '#C5FD00': 5, '#D3FF3C': 3, '#005010': 1, '#CBFD00': 5, '#C2FE00': 3, '#B8F900': 2, '#D2FE31': 8, '#C8FD00': 3, '#B9FA00': 2, '#C4FD00': 3, '#F8FBF9': 2, '#CCFE08': 3, '#F4F6F4': 2, '#C7FD00': 5, '#EBF1EC': 1, '#F8F9F7': 1, '#E1EAE4': 1, '#004701': 1, '#132AAF': 2, '#D5E2D8': 1, '#F1F5F2': 2, '#D1DFD5': 1, '#EC8417': 1, '#D4E1D7': 1, '#F3F5F3': 1, '#D9E4DC': 1, '#EB7800': 15, '#ED7B00': 133, '#F6F8F5': 1, '#DEE8E0': 
1, '#E6EDE6': 1, '#FAFDFB': 1, '#EAF0EB': 1, '#EEF3EF': 1, '#EA7700': 2, '#F5F7F5': 1, '#C2E64E': 1, '#CAFD00': 2, '#F7FAF8': 1, '#E87700': 2, '#EA7800': 7, '#004C06': 2, '#CFFE20': 2, '#004A05': 1, '#E37600': 1, '#E67700': 1, '#00591D': 1, '#990A22': 2077, '#A6293C': 1, '#021EAA': 1, '#0007A3': 6, '#0009A1': 2, '#001697': 2, '#000B9F': 2, '#00119B': 1})
total other colors: 448
Both images are png.
How is it possible that I find all the searched colors (among others) in the second image, but none of them in the first?
You can see 13 colors, yes, but the code sees more because it is more precise than your eyes.
Try zooming further into the picture: between two colors you will see a lighter band, which can consist of several intermediate colors blending from one into the other. I also noticed some black and white on the left side (maybe it comes from your snipping tool or something similar).
What I'm saying is that the code is right :)
To check, you can create an image in Paint with only two colors using the fill tool, and make sure each region is a single color without any gradient.
I found the problem and the solution. The problem is that I was using images created from a previous export. That is, I resized the original image and exported it, and at that point Photoshop (or whatever program is used) produces an image with many additional colors instead of the original ones.
So you have to run the process on the original version of the image, the export from the vectorized image. If you export from that export and then run the process, you will have the same problems I had.
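If only a resized or re-exported image is available, one possible workaround (not what the posts above did, just a sketch of a different technique) is to map every observed color to the nearest expected color by squared RGB distance and add up the counts. The helper names `hex_to_rgb`, `nearest`, and `snap_counts` are hypothetical, and the sample counts are a small excerpt of the figures from Edit II.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def nearest(color, palette):
    """Return the palette entry with the smallest squared RGB distance."""
    rgb = hex_to_rgb(color)
    return min(
        palette,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(rgb, hex_to_rgb(p))),
    )


def snap_counts(counts, palette):
    """Merge the counts of all observed colors into their nearest palette color."""
    snapped = {}
    for color, n in counts.items():
        key = nearest(color, palette)
        snapped[key] = snapped.get(key, 0) + n
    return snapped


palette = ["#F50A22", "#00EC83"]
observed = {"#F50A22": 2245, "#F6132A": 111, "#00EC84": 33}
print(snap_counts(observed, palette))
```

Plain RGB distance is a crude similarity measure, so colors that sit halfway between two palette entries may be assigned to either one.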

Merging pickled .npz files in a desired format

I have multiple npz files which I want to merge into one npz file with a format similar to "mnist.npz".
The format of mnist.npz is:
((array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8),
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8))
Here two arrays are merged into one big npz file.
My two npz arrays are:
x_array:
[[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]],
[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]]
y_array:
[1, 1, 1, 1, 1, 1]
When I tried to merge my files, the output I got was:
(array([[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]]], dtype=uint8), array([[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]], dtype=uint8),1, 1, 1, 1, 1, 1)
So in the last line, my array is formatted as
1, 1, 1, 1, 1, 1
instead of something like:
array([1, 1, 1, 1, 1, 1], dtype=uint8)
My code for merging two npz files is:
import numpy as np
from numpy import load

data = load('x_array.npz', allow_pickle=True)
lst = data.files
for item in lst:
    x_train = data[item]
    #print((x_item,x_train))
data1 = load('y_array.npz', allow_pickle=True)
lst1 = data1.files
for item in lst1:
    y_train = data1[item]
out1 = (*x_train, *y_train)
np.savez('out1.npz', out1)
print(out1)
Can anyone please suggest how I can convert my second array from (1, 1, 1, 1, 1, 1) to array([1, 1, 1, 1, 1, 1], dtype=uint8)? Any suggestions are helpful.
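After going through my code, I found that the problem is fixed by changing the line
out1 = (*x_train,*y_train)
to
out1 = (*x_train,y_train)
so that y_train is kept as a single array instead of being unpacked into separate elements.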
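The difference between the two lines comes from how the `*` operator unpacks a sequence inside a tuple. A minimal sketch, using plain Python lists as stand-ins for the NumPy arrays (an assumption made so that the example runs without NumPy):

```python
# plain lists standing in for the arrays loaded from the .npz files
x_train = [[252, 251, 253], [50, 92, 163]]
y_train = [1, 1, 1]

# *y_train spreads the labels into separate tuple elements
unpacked = (*x_train, *y_train)

# without the star, y_train stays a single element of the tuple
kept = (*x_train, y_train)

print(unpacked)
print(kept)
```

In `unpacked` the labels appear as three loose integers at the end of the tuple, while in `kept` they remain one sequence, which corresponds to the behavior described in the question.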

How to slice elements between other elements with python

I have a list of numbers and I want to slice out the elements between each occurrence of the number 192 in the list and put each slice into its own list.
My list:
[192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
I want something like this:
[192, 0, 1, 0, 1]
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108]
[192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155]
until the end of the list.
Here's one possible way to do it:
# input list
lst = [192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
# list of indexes where 192 is found,
# plus one extra index for the final slice
indexes = [i for i, n in enumerate(lst) if n == 192] + [len(lst)]
# create the slices between consecutive indexes
[lst[indexes[i]:indexes[i+1]] for i in range(len(indexes) - 1)]
The result will be:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]
You can build a generator with itertools.groupby that uses 192's equality method as a key function, pair the output of the generator with zip and then use itertools.chain.from_iterable to join the pairs (the example below assumes your list is stored in variable l):
from itertools import groupby, chain
i = (list(g) for _, g in groupby(l, key=(192).__eq__))
[list(chain.from_iterable(p)) for p in zip(i, i)]
This returns:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]
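A third option, shown here only as a sketch, is a small generator that starts a new chunk each time the marker value appears. The name `split_at` and its `marker` parameter are hypothetical. Unlike the index-based answer, it also yields any elements that come before the first 192 as a chunk of their own.

```python
def split_at(values, marker=192):
    """Yield consecutive chunks of `values`, each starting at `marker`."""
    chunk = []
    for value in values:
        # a marker closes the current chunk (if any) and opens a new one
        if value == marker and chunk:
            yield chunk
            chunk = []
        chunk.append(value)
    if chunk:
        yield chunk


short = [192, 0, 1, 0, 1, 192, 20, 192, 53, 0, 1]
print(list(split_at(short)))
```

Because it is a generator, it works on any iterable and does not need to hold the index list in memory.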

Plotting the frequency associated with bigrams

I have the frequency of each bigram in a dataset. I need to sort the bigrams in descending order of frequency and visualise the top n. These are the frequencies associated with each bigram:
{('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1, ('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1, ('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14, ('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
Can anyone help me visualise these bigrams?
If I understand you correctly, this is what you need
import seaborn as sns
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1,
('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1,
('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14,
('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
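# sort the (bigram, count) pairs by count, highest first
bg_dict_sorted = sorted(bg_dict.items(), key=lambda kv: kv[1], reverse=True)
bg, counts = list(zip(*bg_dict_sorted))
# join each bigram into one label, e.g. 'price-range'
bg_str = list(map(lambda x: '-'.join(x), bg))
# recent seaborn versions require x and y as keyword arguments
sns.barplot(x=bg_str, y=list(counts))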
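The code above plots every bigram. To keep only the top n before plotting, `collections.Counter.most_common` can do the sorting and the truncation in one step. This is a minimal sketch; `top_bigrams` is a hypothetical helper name and `sample` is an excerpt of the data from the question.

```python
from collections import Counter


def top_bigrams(freqs, n=5):
    """Return the n most frequent bigrams as ('word1-word2', count) pairs."""
    return [("-".join(bigram), count)
            for bigram, count in Counter(freqs).most_common(n)]


sample = {
    ("best", "price"): 95,
    ("price", "range"): 190,
    ("range", "got"): 5,
    ("camera", "clarity"): 35,
}

print(top_bigrams(sample, n=2))
```

The labels and counts returned this way can then be passed to the bar plot in place of `bg_str` and `counts`.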

How to correctly insert integers sqlite3 python

I need to insert rows from an Excel file into a sqlite3 database I created.
So far I have converted the Excel file into a DataFrame and created the database and the table with the fields I wanted. I used a for loop to insert my rows into the table with "insert into tablename values (?..,?)" , (value1,...valuen). However, only the data with the text type is stored readably in the database; all the integers are stored as bytes, and even int.from_bytes() doesn't give me my integers back in the right form.
Can anyone help?
devices = df['id_device']
time = df['utc_datetime']
vote_yes = df['yes']
vote_neutre = df['neutre']
vote_no = df['no']
questions = ['question']*len(df)
kpi = ['KPI']*len(df)
id_status = [None]*len(df)
indexing = [index for index in range(len(df))]
base = list(map(lambda l,t,x,y,z,k,status , quest , index : [l,t.to_datetime(),x,y,z , k , status , quest , index] , devices , time , vote_yes , vote_neutre , vote_no , kpi , id_status , questions , indexing ))
base = [[507, datetime.datetime(2016, 8, 1, 11, 10, 30), 1, 0, 0, 'KPI', None, 'question', 0],
[507, datetime.datetime(2016, 8, 1, 11, 40, 33), 2, 0, 0, 'KPI', None, 'question', 1],
[507, datetime.datetime(2016, 8, 1, 12, 10, 39), 5, 3, 1, 'KPI', None, 'question', 2],
[507, datetime.datetime(2016, 8, 1, 13, 10, 43), 1, 0, 0, 'KPI', None, 'question', 3],
[507, datetime.datetime(2016, 8, 1, 14, 40, 43), 2, 1, 0, 'KPI', None, 'question', 4],
[507, datetime.datetime(2016, 8, 1, 15, 10, 47), 2, 0, 0, 'KPI', None, 'question', 5],
[507, datetime.datetime(2016, 8, 1, 16, 10, 47), 2, 0, 0, 'KPI', None, 'question', 6],
[507, datetime.datetime(2016, 8, 1, 16, 40, 51), 2, 1, 0, 'KPI', None, 'question', 7],
[507, datetime.datetime(2016, 8, 1, 17, 10, 56), 1, 2, 0, 'KPI', None, 'question', 8],
[507, datetime.datetime(2016, 8, 1, 17, 40, 57), 1, 0, 0, 'KPI', None, 'question', 9]]
cur = conn.cursor()
cur.execute('''create table if not exists coord4 (device int , time text)''')
for line in base:
    cur.execute('''insert into coord4 values (?,?)''', (line[0], line[1]))
conn.commit()
res = cur.execute('select * from coord4')
print(res.fetchone())
#output
(b'\xfb\x01\x00\x00\x00\x00\x00\x00', '2016-08-01 11:10:30')
This is my code, in case you need it.
The solution I was looking for was to convert the value with int() before inserting it:
for line in base:
    cur.execute('''insert into coord4 values (?,?)''', (int(line[0]), line[1]))
conn.commit()
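A likely explanation is that the values taken from the DataFrame are NumPy integers rather than Python `int` objects, and `sqlite3` stores objects it cannot adapt to a native type as raw bytes. The sketch below uses an in-memory database and only the standard library. It assumes that the blob from the question's output is an 8-byte little-endian signed integer, which would explain why `int.from_bytes()` appears to fail with a different byte order.

```python
import sqlite3

# blob copied from the question's output; assumed to be 8-byte little-endian
blob = b'\xfb\x01\x00\x00\x00\x00\x00\x00'
device = int.from_bytes(blob, byteorder="little", signed=True)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
cur = conn.cursor()
cur.execute("create table coord4 (device int, time text)")

# a plain Python int is stored as an SQLite INTEGER
cur.execute("insert into coord4 values (?, ?)",
            (int(device), "2016-08-01 11:10:30"))
conn.commit()

row = cur.execute("select device, time from coord4").fetchone()
print(row)
conn.close()
```

Rows that were already stored as bytes can be repaired the same way, by decoding them and writing the resulting integers back.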
