Strange Behaviour when Updating Cassandra row - cassandra

I am using pyspark and pyspark-cassandra.
I have noticed this behaviour on multiple versions of Cassandra (3.0.x and 3.6.x) using COPY, sstableloader, and now saveToCassandra in pyspark.
I have the following schema:
CREATE TABLE test (
id int,
time timestamp,
a int,
b int,
c int,
PRIMARY KEY ((id), time)
) WITH CLUSTERING ORDER BY (time DESC);
and the following data
(1, datetime.datetime(2015, 3, 1, 0, 18, 18, tzinfo=<UTC>), 1, 0, 0)
(1, datetime.datetime(2015, 3, 1, 0, 19, 12, tzinfo=<UTC>), 0, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 22, 59, tzinfo=<UTC>), 1, 0, 0)
(1, datetime.datetime(2015, 3, 1, 0, 23, 52, tzinfo=<UTC>), 0, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 32, 2, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 32, 8, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 43, 30, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 44, 12, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 48, 49, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 49, 7, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 50, 5, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 50, 53, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 51, 53, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 51, 59, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 54, 35, tzinfo=<UTC>), 1, 1, 0)
(1, datetime.datetime(2015, 3, 1, 0, 55, 28, tzinfo=<UTC>), 0, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 55, 55, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 0, 56, 24, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 11, 14, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 11, 17, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 12, 8, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 12, 10, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 17, 43, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 17, 49, tzinfo=<UTC>), 0, 3, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 12, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 1, 2, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 24, tzinfo=<UTC>), 2, 1, 0)
Towards the end of the data, there are two rows which have the same timestamp.
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 2, 1, 0)
(1, datetime.datetime(2015, 3, 1, 1, 24, 18, tzinfo=<UTC>), 1, 2, 0)
It is my understanding that when I save to Cassandra, one of these will "win", so there will be only one row.
After writing to Cassandra using
rdd.saveToCassandra(keyspace, table, ['id', 'time', 'a', 'b', 'c'])
Neither row appears to have won. Rather, the rows seem to have "merged".
1 | 2015-03-01 01:17:43+0000 | 1 | 2 | 0
1 | 2015-03-01 01:17:49+0000 | 0 | 3 | 0
1 | 2015-03-01 01:24:12+0000 | 1 | 2 | 0
1 | 2015-03-01 01:24:18+0000 | 2 | 2 | 0
1 | 2015-03-01 01:24:24+0000 | 2 | 1 | 0
Rather than the 2015-03-01 01:24:18+0000 row containing (1, 2, 0) or (2, 1, 0), it contains (2, 2, 0).
What is happening here? I can't for the life of me figure out how this behaviour is being caused.

This is a little-known effect of batching writes together. A batch assigns the same write timestamp to every insert it contains. When two writes carry exactly the same timestamp, there is no "last" write, so a special merge rule applies. The Spark Cassandra Connector uses intra-partition batches by default, so this is very likely to happen when several input rows clobber the same primary key.
With two identical write timestamps, the merge is done column by column, and the greater value wins.
Given Table (key, a, b)
Batch
Insert "foo", 2, 1
Insert "foo", 1, 2
End batch
The batch gives both mutations the same timestamp. Cassandra cannot choose a "last-written" value because both writes happened at the same time, so for each column it simply keeps the greater of the two values. The merged result will be
"foo", 2, 2

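The merge rule described in the answer can be sketched in plain Python. This is only a simplified model, not Cassandra's actual code: the function names `merge_cells` and `apply_batch` are made up for illustration, and it assumes each column is reconciled independently, with the greater value winning when write timestamps tie.

```python
def merge_cells(existing, incoming):
    """Reconcile two (value, write_timestamp) cells for a single column."""
    if incoming[1] != existing[1]:
        # Normal case: the cell with the later write timestamp wins.
        return incoming if incoming[1] > existing[1] else existing
    # Tie on timestamp: there is no "last" write, so keep the greater value.
    return incoming if incoming[0] > existing[0] else existing


def apply_batch(rows, batch_timestamp):
    """Apply several inserts for the SAME primary key in one batch.

    Every insert in the batch gets the same write timestamp, as described above.
    """
    merged = {}
    for row in rows:
        for column, value in row.items():
            cell = (value, batch_timestamp)
            merged[column] = merge_cells(merged[column], cell) if column in merged else cell
    return {column: value for column, (value, _) in merged.items()}


# The two conflicting rows from the question (same id and time, columns a, b, c).
result = apply_batch(
    [{"a": 2, "b": 1, "c": 0}, {"a": 1, "b": 2, "c": 0}],
    batch_timestamp=1000,
)
print(result)
```

Under these assumptions the two rows collapse into a=2, b=2, c=0, which matches the "merged" row seen in the question. If exactly one input row should win, one option is to deduplicate the RDD on (id, time) before saving.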
Related

A calculation affects an identical (but different) variable in a stack elsewhere in python-3.x?

I am using a stack class to store 2d lists of strings and integers.
The lists serve as tables and I have the following code:
print('pushing')
print(lookup_table)
tables_to_be_tested.push(lookup_table)
print('new table')
print(lookup_table)
print('top of stack: ')
print(tables_to_be_tested.peek())
lookup_table[0][c2index] = c1_value
print('top of stack 2: ')
print(tables_to_be_tested.peek())
The line lookup_table[0][c2index] = c1_value is only meant to update one value in the first list of lookup_table.
Here is my output:
pushing
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
new table
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack:
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
top of stack 2:
[[0, 1, 2, 3, 4, 10, 6, 7, 8, 9], [39, 50, 38, 53, 28, 37, 49, 52, 31, 42], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
The lists are created independently like this: lookup_table = [[],[],[]] and are appended to in a for loop.
The calculation should not affect the 2d list in the stack and yet it does. Why is this? What is a solution?
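A likely explanation, assuming the stack's push simply stores the object it is given, is that the stack and lookup_table refer to the same nested lists, so a later change through either name is visible through both. The sketch below uses a hypothetical minimal `Stack` class (the class from the question is not shown) and a shortened table to contrast pushing a reference with pushing a `copy.deepcopy` snapshot.

```python
import copy


class Stack:
    """Hypothetical minimal stack; stands in for the class used in the question."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)  # stores a reference, not a copy

    def peek(self):
        return self._items[-1]


lookup_table = [[0, 1, 2], [39, 50, 38], [1, 1, 1]]

aliased = Stack()
aliased.push(lookup_table)  # same object as lookup_table

snapshot = Stack()
snapshot.push(copy.deepcopy(lookup_table))  # independent copy of the outer and inner lists

lookup_table[0][1] = 10  # modify the original after pushing

print(aliased.peek()[0])   # reflects the change
print(snapshot.peek()[0])  # unchanged
```

A shallow copy (list(lookup_table) or lookup_table[:]) would not be enough here, because the inner lists would still be shared.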

Colors detected are not equal to the colors in the image

I'm trying to find the color of a pixel from its x and y coordinates. The colors are from this image.
Capturing the colors with Photoshop, I got this list of colors:
"#5D385A", "#6D3B47", "#6F5C4B", "#50717A", "#547057", "#4C6180", "#717080", "#705574", "#726B59", "#5E4854", "#415A4B", "#425A64", "#3A4E6F"
However, when I read the color of a pixel from the image, it doesn't match any color in that list. I also get 95 different colors, although the image contains only 13.
I open the image and read a pixel's color with this class:
import PIL.Image
class Image:
    def __init__(self, file):
        self.image = PIL.Image.open(file).convert("RGB")
    def get_color(self, x, y):
        color = self.image.getpixel((x, y))
        color = ("#%02x%02x%02x" % color).upper()
        return color
Here is a short list of the x and y positions where I sample the color:
144, 74
140, 46
150, 53
85, 87
160, 48
147, 60
137, 49
149, 53
148, 60
143, 52
161, 30
166, 23
134, 38
146, 29
155, 40
129, 37
154, 66
153, 38
151, 33
128, 36
How is that possible? How can I get 95 different colors from an image that contains only 13?
Edit I:
I have collected the color of every pixel in the image, and none of them matches a color I got from Photoshop.
I got 256 different colors; this is the list, with the number of times each one was found.
{'#885F7D': 15, '#541B47': 15, '#68355B': 819, '#65355D': 17, '#78384A': 19, '#7E3942': 19, '#7B3846': 4588, '#7C3346': 39, '#7D3046': 50, '#773F4C': 21, '#785A49': 4, '#775F49': 35, '#765C49': 17540, '#7A4648': 21, '#756349': 62, '#785B49': 56, '#7C3546': 14, '#765D49': 12, '#7A4F48': 14, '#7C3746': 29, '#785549': 7, '#775D4A': 8, '#785749': 8, '#551743': 1, '#6A3158': 39, '#68325A': 6, '#86617E': 1, '#66385D': 31, '#6C2C56': 6, '#6C2A56': 7, '#6D2B54': 3, '#678D97': 88, '#2C5B6A': 60, '#416C79': 43, '#3F717A': 7, '#43686A': 64, '#5C5F71': 32, '#465771': 3, '#5E5666': 14, '#5D4C66': 7, '#644160': 2, '#683C5F': 2, '#659197': 2, '#1C606C': 88, '#32767E': 61, '#227B84': 59, '#3A757A': 60, '#803342': 16, '#7D3745': 6, '#3A727B': 7374, '#3B7479': 3, '#36747C': 11, '#6C4450': 104, '#82303F': 18, '#852B3B': 28, '#694A56': 3, '#3D7179': 15, '#694E59': 15, '#7D3545': 11, '#387283': 30, '#3B717B': 17, '#3A727D': 16, '#7B5A48': 37, '#832B43': 11, '#3B7184': 21, '#2A7C66': 1, '#5D5D4E': 2, '#3B7180': 23, '#41715A': 6, '#45714D': 44, '#297D59': 6, '#407256': 32, '#417160': 13, '#437155': 5275, '#467055': 16, '#327A58': 7, '#68514E': 4, '#407756': 2, '#3C7356': 22, '#56654F': 17, '#437154': 15, '#387457': 30, '#3F7169': 14, '#4B6D54': 9, '#805C49': 105, '#735E4A': 10, '#7F5747': 63, '#755C49': 9, '#457154': 16, '#337558': 45, '#536B52': 18, '#735944': 95, '#7B614F': 96, '#5D6750': 36, '#437156': 43, '#69624D': 21, '#457151': 29, '#3D7172': 10, '#70604B': 10, '#487458': 2, '#45744D': 96, '#447352': 2, '#23596C': 2, '#3C6A7E': 59, '#3F696B': 41, '#64819B': 37, '#204D73': 92, '#3C5E82': 60, '#3A5E8A': 93, '#385B92': 1, '#3C6182': 4212, '#5D7F9A': 1, '#0C4A72': 2, '#305E82': 118, '#5C6982': 118, '#8F8D9B': 26, '#646473': 3, '#7B7482': 118, '#5A7169': 14, '#39714D': 12, '#727182': 2691, '#797189': 13, '#3E724D': 1, '#3B7155': 51, '#885947': 1, '#7D5744': 1, '#866251': 1, '#4F7056': 51, '#675C48': 106, '#707289': 10, '#736E6C': 11, '#746B51': 12, '#756C58': 116, '#82705C': 27, 
'#135941': 27, '#235D44': 105, '#255B44': 24, '#1D5943': 45, '#2B5C46': 108, '#2B5C45': 8, '#746C58': 5469, '#2E5C46': 17561, '#7E705B': 32, '#4F634D': 10, '#7B6E5A': 32, '#45614B': 14, '#707584': 117, '#6E788C': 1, '#72716E': 1, '#75677E': 117, '#746684': 1, '#766D59': 26, '#3D5F49': 11, '#255943': 33, '#957890': 39, '#7A5174': 117, '#7C4B7C': 2, '#775E6A': 62, '#727152': 39, '#726C58': 32, '#365E47': 12, '#683F63': 37, '#7A5476': 4212, '#79507A': 37, '#766166': 38, '#7A6D57': 15, '#6E6B56': 13, '#2D5D46': 5, '#696A54': 4, '#2C5B45': 8, '#626852': 8, '#305C46': 24, '#2E5C44': 26, '#7E577B': 2, '#7C567A': 55, '#7A517A': 58, '#784F79': 1, '#5F3855': 1, '#724F68': 57, '#727053': 59, '#856C77': 89, '#51303E': 91, '#62444F': 56, '#60404E': 1, '#767558': 56, '#654654': 7521, '#623F53': 16, '#674B54': 7, '#747057': 25, '#746B58': 40, '#623E53': 15, '#654754': 40, '#757158': 11, '#6F6C56': 2, '#644554': 29, '#613D53': 16, '#6B5555': 15, '#6F5E56': 15, '#756D57': 11, '#634354': 7, '#634153': 13, '#716457': 7, '#644254': 7, '#654354': 4, '#305C48': 3, '#726C59': 2, '#7E7055': 6, '#817155': 7, '#48615F': 4, '#0A5649': 1, '#2E5C3E': 26, '#135669': 2, '#2C5B68': 34, '#2B5C53': 21, '#2E5C41': 58, '#415F60': 3, '#0F5667': 5, '#2C5B64': 4676, '#2C5B66': 19, '#2C5B5B': 17, '#2E5C4D': 8, '#175966': 7, '#375D61': 2, '#61675B': 1, '#2F5B64': 20, '#2C5B60': 16, '#2F5B4A': 3, '#55675E': 2, '#2E5C4A': 8, '#275C64': 23, '#674654': 10, '#385260': 1, '#684553': 26, '#1C5E66': 46, '#564D59': 5, '#3D5660': 8, '#4F4F5B': 10, '#5E4A57': 7, '#365961': 5, '#47525D': 8, '#5C4B57': 4, '#614756': 2, '#5A4759': 36, '#504A60': 10, '#404B67': 7, '#2C5667': 18, '#8B6B75': 1, '#2B4D71': 876, '#2D5D62': 18, '#7C6D7B': 1, '#58728D': 16, '#0A365F': 16, '#21553E': 4, '#335F4B': 1, '#35624D': 20, '#3D6752': 4}
I don't understand this at all. How is it possible that not a single pixel has a color that I got from Photoshop?
Edit II:
With the same code, I built the color map of another image. This is the image:
The predominant colors that you can see in this image are these:
"#F50A22", "#00EC83", "#00A200", "#0007A4", "#9D132B", "#734500", "#6230FF", "#F42AFF", "#BEFF00", "#EC7800", "#65DCD1", "#FF6D00", "#004500"
Running the same test with, as I said, the same code, all of these colors are found in the image, among others. That is unlike the first image, where none of the expected colors were found.
The results are:
Colors matched: {'#F50A22': 2245, '#00EC83': 9437, '#00A200': 21039, '#0007A4': 8772, '#9D132B': 99, '#734500': 2970, '#6230FF': 112, '#F42AFF': 5271, '#BEFF00': 2380, '#EC7800': 3076, '#65DCD1': 6503, '#FF6D00': 4709, '#004500': 6612}
colors matched: 13
The other colors found in the image are:
Other colors: {'#FFFFFF': 1931, '#FCFFFD': 27, '#FAFFFB': 2, '#F7FEF9': 12, '#F4FEF7': 10, '#F6FEF8': 20, '#F6FDF8': 1, '#F9FEFA': 12, '#FBFEFC': 9, '#FEFFFE': 40, '#FAFEFB': 12, '#FBFFFC': 7, '#F3FEF6': 7, '#F4FDF6': 2, '#F5FDF7': 1, '#F2FDF5': 3, '#EEFDF2': 3, '#F2FDF6': 7, '#F4FEF8': 12, '#EFFDF4': 3, '#E5FCEC': 4, '#DAFAE5': 1, '#D3FAE0': 3, '#D4FAE0': 1, '#DAFAE4': 1, '#DFFBE8': 1, '#E9FCEF': 3, '#EDFDF2': 2, '#EFFDF3': 3, '#E2FBEA': 3, '#E2FCEA': 3, '#EFFEF3': 1, '#F2FEF5': 1, '#EDFCF1': 2, '#EBFDF0': 1, '#F1FDF4': 1, '#F3FEF7': 4, '#EDFDF1': 2, '#E7FCEE': 3, '#E3FCEB': 1, '#E0FCE9': 1, '#DCFBE6': 5, '#DAFBE5': 1, '#D9FAE4': 1, '#D9FAE3': 1, '#E3FCEC': 1, '#EEFDF3': 1, '#D7FAE2': 1, '#D1FADF': 1, '#D1FADE': 1, '#D6FAE2': 1, '#E1FBEA': 2, '#EBFDF1': 1, '#DFFBE9': 1, '#DEFBE7': 2, '#DBFBE5': 1, '#F6132A': 111, '#00EC84': 33, '#00EC85': 16, '#04EC86': 11, '#14EC87': 3, '#F40D23': 3, '#F20E24': 1, '#F50B22': 8, '#F11426': 2, '#F40C23': 1, '#EF1A28': 1, '#EE1B29': 1, '#F01827': 1, '#F21125': 1, '#F40D24': 1, '#F40E24': 1, '#774A03': 165, '#F40E23': 1, '#F50C22': 1, '#F6142A': 3, '#00EC82': 1, '#00EB82': 2, '#00EA7F': 1, '#00EB81': 1, '#6FE09C': 1, '#7E5416': 2, '#00A300': 78, '#00A500': 43, '#D9403B': 1, '#00AB16': 1, '#00A600': 40, '#00A700': 1123, '#5E2AFF': 2471, '#00B213': 2, '#00AA00': 6, '#7A4F0D': 3, '#6636FF': 2, '#00AE02': 2, '#00AC00': 3, '#00AB08': 2, '#00A800': 12, '#00A900': 8, '#00B317': 1, '#6C3CFF': 1, '#00AE00': 2, '#00AE14': 1, '#00A903': 1, '#7F55FE': 1, '#6CEE9F': 1, '#00AD00': 2, '#6CDCD2': 268, '#6A3CFE': 2, '#7549FF': 1, '#4ED688': 1, '#6B3DFF': 1, '#5E2BFF': 24, '#6839FD': 1, '#6231FE': 1, '#5E31FC': 2, '#00AF08': 1, '#00AC07': 1, '#6339FA': 1, '#5F33FB': 3, '#5F30FD': 3, '#00B10E': 1, '#656565': 1, '#00AB00': 2, '#00B02D': 2, '#6037F9': 1, '#5F2EFE': 2, '#5F3EF5': 1, '#5F32FC': 1, '#6040F4': 1, '#5F32FB': 2, '#6041F3': 1, '#6042F2': 1, '#7145FC': 1, '#5F2CFF': 10, '#6147EF': 1, '#6454EA': 1, '#6036F9': 1, '#685AEA': 1, '#00AF2F': 1, 
'#6B57EE': 1, '#00B110': 1, '#00AA02': 1, '#8ADBD3': 3, '#683CFB': 1, '#72DDD2': 3, '#6D47F8': 1, '#775EF3': 1, '#9CD7D1': 2, '#5E31FD': 1, '#00AB18': 1, '#82DCD3': 1, '#673EFB': 1, '#7450F9': 1, '#612EFF': 8, '#6236FB': 1, '#602CFF': 5, '#6B49F7': 1, '#602DFF': 7, '#5F2BFF': 6, '#6334FD': 1, '#2EEB8B': 1, '#704AFB': 1, '#6231FF': 1, '#6738FE': 1, '#612DFF': 3, '#3FEB8F': 1, '#66DBD1': 5, '#67D8D2': 1, '#00AE2B': 1, '#65DAD2': 1, '#F42DFF': 15, '#FC67FF': 6, '#F246FA': 1, '#F84CFF': 7, '#6233FF': 1, '#6ADCD2': 22, '#6132FE': 1, '#FBFEFE': 2, '#F434FF': 5, '#F8FDFC': 1, '#68DCD1': 33, '#6034FE': 1, '#FB5DFF': 2, '#FAFEFD': 2, '#F2FBFA': 1, '#6442FA': 1, '#6031FF': 1, '#F539FF': 7, '#F5FCFC': 1, '#E7F9F6': 1, '#F02AFF': 5, '#EFFBF9': 2, '#DDF6F3': 1, '#5F2EFF': 1, '#DD2BFF': 1, '#E82AFF': 1, '#F32AFF': 8, '#F744FF': 3, '#E7F9F7': 1, '#CFF2EF': 1, '#6136FD': 1, '#5F2AFF': 1, '#DD2AFF': 1, '#E42AFF': 1, '#EC2AFF': 2, '#E1F7F4': 1, '#C3EFEA': 1, '#6031FE': 1, '#EA2AFF': 2, '#ED2AFF': 1, '#DAF6F2': 1, '#BAEEE7': 1, '#6DDDD3': 2, '#6937FF': 1, '#ED37FE': 1, '#D7F5F1': 2, '#B6EDE6': 1, '#69DDD2': 2, '#74DFD4': 1, '#81DED9': 1, '#EF2BFF': 1, '#B3ECE7': 1, '#7ED7D4': 1, '#F22AFF': 2, '#D9F5F2': 1, '#B7EDE7': 1, '#DB39FC': 1, '#F12EFF': 1, '#E0F7F4': 1, '#C2EFEA': 1, '#87DBD3': 1, '#E737FE': 1, '#E6F8F6': 2, '#CCF2EE': 1, '#84DCD3': 1, '#ECFAF9': 1, '#D8F5F2': 1, '#65DCD0': 5, '#69DBCF': 6, '#6ADCD1': 1, '#98D3CD': 1, '#F440FC': 1, '#F42CFF': 7, '#F4FCFB': 1, '#6FD9CC': 2, '#6FD9CB': 3, '#6BDBCF': 1, '#7ED7C8': 1, '#80D3C1': 1, '#F531FF': 3, '#F42BFF': 35, '#FDFEFE': 2, '#F8FDFD': 1, '#83D8CB': 1, '#7ED3C2': 1, '#FF7100': 78, '#FEFFFF': 1, '#97D2CC': 1, '#FF7000': 40, '#FF6E00': 40, '#FF7925': 1, '#F33FF7': 1, '#6FDDD2': 2, '#FF6B00': 8, '#F62DF4': 1, '#F52BFB': 1, '#FF7409': 1, '#F62DF3': 1, '#F52BFC': 3, '#A2CFCA': 1, '#F73FE3': 1, '#F52DF9': 1, '#F42AFE': 1, '#FF7400': 4, '#FF730E': 1, '#FC36D5': 1, '#F62DF1': 1, '#F52BFD': 1, '#F52CFF': 6, '#F52DFF': 13, '#76DDD3': 2, 
'#FF6C00': 8, '#F831EA': 1, '#F52BFA': 3, '#F632FF': 1, '#8DDAD2': 1, '#F836E6': 1, '#F52BF9': 2, '#A4CCC8': 1, '#FF6A08': 1, '#7ADDD3': 1, '#FF690B': 1, '#F42BFE': 1, '#92D9D2': 2, '#FF6E0B': 1, '#F031FA': 1, '#A7C8C5': 1, '#FF6429': 1, '#FF7200': 62, '#FF671A': 1, '#7EDCD3': 1, '#EC35F6': 1, '#6CDACE': 1, '#6DDBD0': 1, '#FF671C': 1, '#FF7104': 1, '#FF6911': 1, '#FF642C': 1, '#FF6B23': 1, '#FF6E13': 1, '#FF7300': 5, '#F530FF': 1, '#F532FF': 3, '#6DDDD2': 1, '#F533FF': 1, '#F635FF': 1, '#F537FF': 8, '#F539FE': 2, '#F538FF': 9, '#00AC1A': 4, '#FF780E': 1, '#004B04': 29, '#FF873C': 1, '#FF7C1B': 1, '#FF7606': 4, '#FF780C': 1, '#FF7502': 1, '#FF7504': 1, '#FF770A': 2, '#004A03': 9, '#004A02': 5, '#F73FFF': 2, '#F435FF': 1, '#004700': 9, '#FF7A0F': 1, '#F52EFF': 1, '#F63BFF': 1, '#F638FF': 1, '#004600': 22, '#004B03': 3, '#004901': 8, '#FF7D1D': 1, '#F43EFB': 1, '#FF8533': 1, '#F62DF6': 1, '#FF7F24': 1, '#004902': 3, '#004900': 1, '#F441FC': 1, '#C1E057': 1, '#C2FD00': 5, '#C1F700': 1, '#C0FE00': 176, '#C4EE30': 2, '#C3E846': 1, '#C2FB00': 2, '#FEFEFE': 9, '#004C07': 11, '#B8FB00': 2, '#C3FB00': 1, '#FDFEFD': 5, '#BAFB00': 2, '#C5F11A': 2, '#B3F600': 2, '#BEFC00': 1, '#C1FD00': 7, '#FBFCFB': 3, '#BCDE52': 1, '#BBFE00': 9, '#FAFBFA': 2, '#B6F700': 1, '#BDFB00': 1, '#C3F800': 5, '#F331FF': 1, '#B2F500': 1, '#BDF900': 1, '#BDFD00': 1, '#BBFC00': 2, '#BDFE00': 10, '#C3EA40': 1, '#FCFDFC': 4, '#B5F600': 2, '#BCFD00': 7, '#C4E847': 1, '#CDFD09': 3, '#2337B3': 1, '#4251B6': 1, '#C5ED37': 1, '#D5FF3E': 17, '#0012A7': 625, '#004B06': 1, '#CFFE22': 5, '#B6F900': 2, '#C5FD00': 5, '#D3FF3C': 3, '#005010': 1, '#CBFD00': 5, '#C2FE00': 3, '#B8F900': 2, '#D2FE31': 8, '#C8FD00': 3, '#B9FA00': 2, '#C4FD00': 3, '#F8FBF9': 2, '#CCFE08': 3, '#F4F6F4': 2, '#C7FD00': 5, '#EBF1EC': 1, '#F8F9F7': 1, '#E1EAE4': 1, '#004701': 1, '#132AAF': 2, '#D5E2D8': 1, '#F1F5F2': 2, '#D1DFD5': 1, '#EC8417': 1, '#D4E1D7': 1, '#F3F5F3': 1, '#D9E4DC': 1, '#EB7800': 15, '#ED7B00': 133, '#F6F8F5': 1, '#DEE8E0': 
1, '#E6EDE6': 1, '#FAFDFB': 1, '#EAF0EB': 1, '#EEF3EF': 1, '#EA7700': 2, '#F5F7F5': 1, '#C2E64E': 1, '#CAFD00': 2, '#F7FAF8': 1, '#E87700': 2, '#EA7800': 7, '#004C06': 2, '#CFFE20': 2, '#004A05': 1, '#E37600': 1, '#E67700': 1, '#00591D': 1, '#990A22': 2077, '#A6293C': 1, '#021EAA': 1, '#0007A3': 6, '#0009A1': 2, '#001697': 2, '#000B9F': 2, '#00119B': 1})
total other colors: 448
Both images are PNG.
How is it possible that I find all the expected colors (among others) in the second image, yet none of them in the first?
You can see 13 colors, yes, but the code sees more because it is more precise than your eyes.
Try zooming into the picture: between two colors you'll see a lighter band, which can consist of more than one color forming the transition from one to the other. I also noticed some black and white at the left side (maybe that's just from your snipping tool or something).
What I'm saying is that the code is right :)
You can check this by creating a picture in Paint with only two colors using the fill tool, making sure each area is a single color without any gradient.
I found the problem and the solution. The problem is that I was using images created from a previous export. That is, I resized an original image and exported it, and at that point Photoshop (or whatever other program) produced an image with many additional colors instead of only the original ones.
So, you have to run the process on the original version of the image, the export from the vectorized image. If you make an export from that export and then run the process, you will have the same problems I did.
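The effect both answers describe can be illustrated without any image library. The sketch below is a toy model, not what Photoshop actually does: it shrinks one row of pixels by averaging neighbouring pairs (the helper `downscale_by_average` is made up for this example), which is roughly how resampling introduces blended colors along an edge.

```python
from collections import Counter


def to_hex(rgb):
    """Format an (r, g, b) tuple the same way as get_color in the question."""
    return "#%02X%02X%02X" % rgb


def downscale_by_average(row, factor):
    """Shrink a row of RGB pixels by averaging each group of `factor` pixels."""
    out = []
    for start in range(0, len(row) - factor + 1, factor):
        group = row[start:start + factor]
        out.append(tuple(sum(p[ch] for p in group) // factor for ch in range(3)))
    return out


purple = (0x5D, 0x38, 0x5A)  # "#5D385A" from the Photoshop list
teal = (0x50, 0x71, 0x7A)    # "#50717A" from the Photoshop list

# A row with a hard edge: exactly two colors.
original = [purple] * 5 + [teal] * 5
resized = downscale_by_average(original, 2)

original_colors = Counter(to_hex(p) for p in original)
resized_colors = Counter(to_hex(p) for p in resized)

print(len(original_colors), "colors before resizing")
print(len(resized_colors), "colors after resizing:", sorted(resized_colors))
```

The pixel pair that straddles the edge is averaged into a color that was in neither region. Real resampling filters and lossy re-exports can produce many such in-between shades, which is consistent with a 13-color design turning into hundreds of distinct pixel values.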

Compare two lists of lists and fill in blank values

I'm reading data from an API and have a list of lists like this:
listData = [[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
I need to create a complete list filling in the missing values. I've created a destination, like this:
listDest = [[datetime.datetime(2018, 1, 1, 5, 0), None],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), None],
[datetime.datetime(2018, 1, 1, 8, 0), None]]
The end result should look like this:
[[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
Here is the code I've tried:
for blankTime, blankValue in listDest:
    for dataTime, dataValue in listData:
        if blankTime == dataTime:
            blankIndex = listDest.index(blankTime)
            dataIndex = listData.index(dataTime)
            listDest[blankIndex] = tempRm7[dataIndex]
This returns the following error, which is confusing since I know that value is in both lists.
ValueError: datetime.datetime(2018, 1, 1, 5, 0) is not in list
I attempted to adapt the methods in this answer but that's for a 1D list and I couldn't figure out how to make it work for my 2D list.
If both lists are sorted, you can merge them and then group them (using heapq.merge/itertools.groupby):
import datetime
from heapq import merge
from itertools import groupby
listData = [[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
listDest = [[datetime.datetime(2018, 1, 1, 5, 0), None],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), None],
[datetime.datetime(2018, 1, 1, 8, 0), None]]
out = [next(g) for _, g in groupby(merge(listData, listDest, key=lambda k: k[0]), lambda k: k[0])]
# pretty print to screen:
from pprint import pprint
pprint(out)
Prints:
[[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
[datetime.datetime(2018, 1, 1, 6, 0), None],
[datetime.datetime(2018, 1, 1, 7, 0), -6.125],
[datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
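As a side note on the error in the question: listDest.index(blankTime) searches for a bare datetime, but every element of listDest is a [datetime, value] list, so no element compares equal and ValueError is raised. If the lists are not guaranteed to be sorted, a dictionary lookup is another option. The following is a minimal sketch of that alternative (the names `values_by_time` and `filled` are introduced here for illustration), not a replacement for the merge/groupby answer above.

```python
import datetime

listData = [[datetime.datetime(2018, 1, 1, 5, 0), -6.78125],
            [datetime.datetime(2018, 1, 1, 7, 0), -6.125],
            [datetime.datetime(2018, 1, 1, 8, 0), -5.90625]]
listDest = [[datetime.datetime(2018, 1, 1, 5, 0), None],
            [datetime.datetime(2018, 1, 1, 6, 0), None],
            [datetime.datetime(2018, 1, 1, 7, 0), None],
            [datetime.datetime(2018, 1, 1, 8, 0), None]]

# Map each timestamp that has data to its value.
values_by_time = {time: value for time, value in listData}

# Walk the destination timestamps; dict.get returns None where data is missing.
filled = [[time, values_by_time.get(time)] for time, _ in listDest]

for row in filled:
    print(row)
```

This builds a new list rather than modifying listDest in place, and it does one dictionary lookup per destination row instead of scanning listData each time.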

Plotting the frequency associated with bigrams

I have the frequency of each bigram in a dataset. I need to sort them in descending order and visualise the top n bigrams. These are the frequencies associated with each bigram:
{('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1, ('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1, ('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14, ('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1
Can anyone help me visualise these bigrams?
If I understand you correctly, this is what you need
import seaborn as sns
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5, ('got', 'diwali'): 2, ('diwali', 'sale'): 2, ('sale', 'simply'): 1,
('simply', 'amazed'): 1, ('amazed', 'performance'): 1, ('performance', 'camera'): 30, ('camera', 'clarity'): 35, ('clarity', 'device'): 1,
('device', 'speed'): 1, ('speed', 'looks'): 1, ('looks', 'display'): 1, ('display', 'everything'): 2, ('everything', 'nice'): 5, ('nice', 'heats'): 2, ('heats', 'lot'): 14,
('lot', 'u'): 2, ('u', 'using'): 3, ('using', 'months'): 20, ('months', 'no'): 10, ('no', 'problems'): 8, ('problems', 'whatsoever'): 1, ('whatsoever', 'great'): 1}
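# Sort the bigrams by frequency, highest first
bg_dict_sorted = sorted(bg_dict.items(), key=lambda kv: kv[1], reverse=True)
bg, counts = list(zip(*bg_dict_sorted))
# Join each bigram tuple into a single label such as 'price-range'
bg_str = ['-'.join(x) for x in bg]
# Pass x and y by keyword; recent seaborn versions no longer accept them positionally
sns.barplot(x=bg_str, y=list(counts))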
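The question also asks for only the top n bigrams. One way to select them before plotting, sketched here with the standard library only, is collections.Counter.most_common. The value of `n` and the shortened dictionary are assumptions made for this example.

```python
from collections import Counter

# A subset of the frequencies from the question, to keep the example short.
bg_dict = {('best', 'price'): 95, ('price', 'range'): 190, ('range', 'got'): 5,
           ('performance', 'camera'): 30, ('camera', 'clarity'): 35,
           ('heats', 'lot'): 14, ('using', 'months'): 20}

n = 3  # number of bigrams to keep (chosen arbitrarily here)

# most_common(n) returns the n highest-count (bigram, count) pairs, sorted descending.
top_n = Counter(bg_dict).most_common(n)

labels = ['-'.join(bigram) for bigram, _ in top_n]
counts = [count for _, count in top_n]

print(labels)
print(counts)
```

The resulting `labels` and `counts` lists can then be handed to the bar plot in place of the full sorted lists.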

Get the list of RGB pixel values of each superpixel

I have an RGB image of dimension (224,224,3). I applied superpixel segmentation to it using the SLIC algorithm,
as follows:
img= skimageIO.imread("first_image.jpeg")
print('img shape', img.shape) # (224,224,3)
segments_slic = slic(img, n_segments=1000, compactness=0.01, sigma=1) # Up to 1000 segments
segments_slic.shape
(224,224)
The number of returned segments is:
np.max(segments_slic)
Out[49]: 595
From 0 to 595. So, we have 596 superpixels (regions).
Let's take a look at segments_slic[0]
segments_slic[0]
Out[51]:
array([ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5,
5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9,
10, 10, 10, 10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 12,
12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14,
14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16,
16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18,
18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20,
20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21,
21, 21, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23,
23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25,
25, 25, 25])
What would I like to get?
For each superpixel region, make two arrays as follows:
1) Array: contains the indexes of the pixels belonging to the same superpixel.
For instance,
superpixel_list[0] contains all the indexes of the pixels belonging to superpixel 0.
superpixel_list[400] contains all the indexes of the pixels belonging to superpixel 400.
2) superpixel_pixel_values[0]: contains the pixel values (in RGB) of the pixels belonging to superpixel 0.
For instance, let's say that pixels 0, 24, 29, 53 belong to superpixel 0. Then we get
superpixel[0]= [[223,118,33],[245,222,198],[98,17,255],[255,255,0]]# RGB values of pixels belonging to superpixel 0
What is an efficient/optimized way to do that? (I have a dataset of images to loop over.)
EDIT-1
def sp_idx(s, index = True):
    u = np.unique(s)
    if index:
        return [np.where(s == i) for i in u]
    else:
        return [s[s == i] for i in u]
        #return [s[np.where(s == i)] for i in u] gives the same but is slower
superpixel_list = sp_idx(segments_slic)
superpixel = sp_idx(segments_slic, index = False)
In superpixel_list we are supposed to get a list containing the indexes of the pixels belonging to the same superpixel.
For instance,
superpixel_list[0] is supposed to contain the indexes of all the pixels assigned to superpixel 0.
However, I get the following:
superpixel_list[0]
Out[73]:
(array([ 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5,
5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10,
10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13]),
array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5,
6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6,
7, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 0, 1,
2, 3, 4, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2]))
Why two arrays?
In superpixel[0], for instance, we are supposed to get the RGB pixel values of each pixel assigned to superpixel 0, as follows:
for instance, if pixels 0, 24, 29, 53 are assigned to superpixel 0, then:
superpixel[0]= [[223,118,33],[245,222,198],[98,17,255],[255,255,0]]
However, when I use your function I get the following:
superpixel[0]
Out[79]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Thank you for your help
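This can be done with np.where, using the resulting indices to index the image. For a 2D label map, np.where returns one index array per axis (row indices and column indices), which is why two arrays appear.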
def sp_idx(s, index = True):
    u = np.unique(s)
    return [np.where(s == i) for i in u]
superpixel_list = sp_idx(segments_slic)
superpixel = [img[idx] for idx in superpixel_list]
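To see what the indexing in this answer produces, here is a small sketch on made-up data: a 2x3 label map stands in for segments_slic and an arbitrary (2, 3, 3) array stands in for the image. The names `segments`, `pixels` and `flat` are introduced for this example only.

```python
import numpy as np

# Tiny stand-in for a SLIC label map: 2 rows x 3 columns, labels 0 and 1.
segments = np.array([[0, 0, 1],
                     [0, 1, 1]])

# Matching "RGB image" of shape (2, 3, 3); the values are arbitrary.
img = np.arange(2 * 3 * 3).reshape(2, 3, 3)

# np.where on a 2D array returns one index array per axis: rows and columns.
rows, cols = np.where(segments == 0)

# Indexing the image with both arrays yields one RGB triple per pixel in the region.
pixels = img[rows, cols]  # shape (number_of_pixels, 3)

# If a single number per pixel is preferred, convert (row, col) to flat indices.
flat = np.ravel_multi_index((rows, cols), segments.shape)

print(rows, cols)
print(pixels)
print(flat)
```

Indexing with the tuple returned by np.where (img[idx], as in the answer) is equivalent to img[rows, cols], so each entry of superpixel is an array with one row of RGB values per pixel of that region.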
