How do I create a similarity matrix based on the below code? - python-3.x

I'm trying to use the gower function from this link https://sourceforge.net/projects/gower-distance-4python/files/. I'm trying to apply it to my dataframe of categorical variables. However, when I use the gower_distances function, I get some non-zero values on the diagonal (I need them all to be 0).
I've been trying to debug the code. I think the problem occurs in the _gower_distance_row function, in this line, which I don't understand: sij_cat = np.where(xi_cat == xj_cat,np.zeros_like(xi_cat),np.ones_like(xi_cat)). Here it is in a form that is easier to follow.
Say I have:
xi=np.array(['cat','dog','monkey'])
xj=np.array([['cat','dog','monkey'],['horse','dog','hairy']])
sij_cat = np.where(xi == xj,np.zeros_like(xi),np.ones_like(xi))
I get this as my result:
array([['', '', ''],
       ['1', '', '1']], dtype='<U6')
Since I am comparing cat with cat, I want to assign zero there, and where the values differ (e.g. cat vs horse and monkey vs hairy) it should be 1. I don't get why I am getting '' in the above result. I want zeroes here. How do I fix this?

np.logical_not(xi == xj).astype(int)
The output will be:
array([[0, 0, 0],
       [1, 0, 1]])
Explanation:
np.logical_not flips True to False and False to True, and astype(int) converts the resulting booleans to 0 and 1.
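As for why the original line produces '': xi is a string array, so np.zeros_like(xi) and np.ones_like(xi) are string arrays too, and the "zero" of a string dtype is the empty string. A minimal sketch of two ways around that, reusing the arrays from the question (the names sij_cat and sij_alt are only for illustration):

```python
import numpy as np

xi = np.array(['cat', 'dog', 'monkey'])
xj = np.array([['cat', 'dog', 'monkey'], ['horse', 'dog', 'hairy']])

# zeros_like inherits the string dtype of xi, which is where '' comes from
print(np.zeros_like(xi).dtype.kind)  # 'U' means a unicode string dtype

# Option 1: pass plain integers to np.where instead of string arrays
sij_cat = np.where(xi == xj, 0, 1)

# Option 2: compare with != and convert the booleans to integers
sij_alt = (xi != xj).astype(int)

print(sij_cat)
print(sij_alt)
```

Both options yield an integer array with 0 where the categories match and 1 where they differ.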

Related

Comparing values from nested dictionaries in Python 3

I have some code that uses a nested dictionary to hold results from a linear regression. The program runs a regression on multiple sets of data ('targets') and stores the regression variables for each target in a dictionary. These dictionaries are held in an outer dictionary, keyed by target name. It's initialized like this (note that self.targets is just a list of the different targets):
self.results = {}
for keys in self.targets:
    self.results[keys] = {
        'lod': '', 'p': '', 'm': '', 'b': '', 'xr': ''}
In another method that gets called within a for-loop, the values in the dictionary for that target are filled:
for tar in targets:
    # ...some calculations to produce the values...
    self.results[tar].update(
        {'lod': lod, 'p': p, 'm': m, 'b': b, 'xr': xr})
Later on, I want to plot the different regressions on one plot. To do that, I want to find the maximum value of xr from the sub-dictionaries (xr is the right-bound needed to plot each regression).
I tried accessing all the xr values like self.results[:]['xr']. That didn't work, so I pulled the xr values in a for-loop:
xrs = []
for tar in self.targets:
    xrs.append(self.results[tar]['xr'])
and this gets the job done, as I can just do max(xrs) to find the largest value. I was wondering, though, if there was a more direct way to pull the maximum value of the 'xr' keys, without having to create a separate list as a middleman.
A generator expression might help:
data = {
"a" : {'lod': '', 'p': '', 'm': '', 'b': '', 'xr': 0},
"b" : {'lod': '', 'p': '', 'm': '', 'b': '', 'xr': 1}
}
max_xr = max(value["xr"] for value in data.values())
print(max_xr)
That prints 1.
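If you also need to know which target has the largest xr, max() accepts a key function. A small sketch, assuming a made-up results dictionary standing in for self.results (the target names and numbers are invented for illustration):

```python
# Hypothetical stand-in for self.results; names and values are invented
results = {
    "target_a": {'lod': 0.1, 'p': 0.01, 'm': 2.0, 'b': 0.5, 'xr': 3.5},
    "target_b": {'lod': 0.2, 'p': 0.03, 'm': 1.5, 'b': 0.1, 'xr': 7.25},
    "target_c": {'lod': 0.3, 'p': 0.02, 'm': 0.8, 'b': 0.9, 'xr': 5.0},
}

# Largest xr value, without building an intermediate list
max_xr = max(sub['xr'] for sub in results.values())

# Name of the target that holds the largest xr
widest_target = max(results, key=lambda name: results[name]['xr'])

print(max_xr, widest_target)
```

Note that this only works once every 'xr' has been filled with a number. If some entries still hold the initial '' placeholder, Python 3 raises a TypeError when comparing a string with a number.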
The only way I can come up with is this. You essentially do what max() does without building the list (this assumes every xr is > 0; otherwise initialise themax to float('-inf')):
themax = 0  # initialise maximum
for tar in self.targets:
    val = self.results[tar]['xr']
    if val > themax:
        themax = val
print(themax)
It will only save a minor amount of computation time at most, but I hope it helps.

can't remove some duplicate elements from a list in python

I was trying to remove duplicate items from a list so that it only has unique items, and I wanted to do it with for and if. My code worked well, but in one case I ran into something I don't understand. This is the example:
a = [1,2,2,3,3,3,21,21,16,20,28,28,7]
for x in a:
    if a.count(x) > 1:
        for z in range(a.count(x)):
            a.remove(x)
print(a)
[1, 21, 21, 16, 20, 7]
I don't understand why! It removes 2, 3 and 28 as expected, but not 21.
Any help would be great, thanks.
The best solution for this case is using set(). list(set(a)) removes the duplicates, keeping one copy of each value (note that a set does not preserve the original order).
Notice that a set is not the same as a list, so be sure to turn it back into a list if you want to keep using list methods.
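If the order of the items matters, one alternative (not what the answer above uses) is dict.fromkeys, since dictionaries keep insertion order in Python 3.7+. A minimal sketch with the list from the question:

```python
a = [1, 2, 2, 3, 3, 3, 21, 21, 16, 20, 28, 28, 7]

# set() drops duplicates but gives no guarantee about order
unordered = list(set(a))

# dict keys are unique and keep first-seen order (Python 3.7+)
unique = list(dict.fromkeys(a))

print(unique)
```

Both variants keep one copy of each value, which is different from the question's loop, which removes every value that occurs more than once.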
As for your code, the problem is that you're iterating over the list while you're changing it.
As items are removed, the indexes of the remaining items shift, and that's why the loop skips some of them.
You can see more clearly what happens if you add a print to show the value of x on each iteration:
a = [1,2,2,3,3,3,21,21,16,20,28,28,7]
for x in a:
    print(x)
    if a.count(x) > 1:
        for z in range(a.count(x)):
            a.remove(x)
I believe your issue is that you are changing the list while you're looping over it.
Try using a.copy() to create a new copy of the list to loop over, like so:
a = [1,2,2,3,3,3,21,21,16,20,28,28,7]
for x in a.copy():
    if a.count(x) > 1:
        for z in range(a.count(x)):
            a.remove(x)
print(a)
This code will output
[1, 16, 20, 7]
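Another way to get the same result without modifying the list during iteration is to count the items first and then build a new list. This is a sketch using collections.Counter, which is a swapped-in technique rather than what the answers above use (the name singles is just for illustration):

```python
from collections import Counter

a = [1, 2, 2, 3, 3, 3, 21, 21, 16, 20, 28, 28, 7]

# Count every value once, up front
counts = Counter(a)

# Keep only the values that occur exactly once, in their original order
singles = [x for x in a if counts[x] == 1]

print(singles)
```

This avoids the repeated a.count() and a.remove() calls, each of which scans the list.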

Robot framework: Lists should be equal problem with empty values: "None" and ''

The problem is that I am comparing two lists from different sources. One list comes from Excel and the other comes from a table that shows the values imported from that same Excel file.
All the values are correct, but Excel gives one or possibly more values that are None, while the table returns those same values as empty strings (''). How can I change None to '' or vice versa?
In this particular case None and '' are in the 10th slot of the lists, but that can change over time because different values are put into the Excel file.
So how can I remove or replace these None values with '' or vice versa?
Excel list: [1, 'X', 'Y', 200, 1999, 'Z', 'W', 4, 'V', None, 2, 1100]
Table list: [1, 'X', 'Y', 200, 1999, 'Z', 'W', 4, 'V', '', 2, 1100]
I'm using a mixture of keywords from ExcelLibrary and ExcelRobot. Below is a similar approach:
${iTotalRows} =    Get Row Count    Sheet1    # Excel (etc.)
${item1} =    Get Table Cell    //table[@class="xx"]    2    1
${item1} =    Get Table Cell    //table[@class="xx"]    2    2    # etc.
Lists Should Be Equal    ${x}    ${y}
Thank you in advance
I don't think there is a prepared keyword for this (e.g. in Collections library). If I'm wrong and I'm reinventing the wheel, please let me know, I can edit or delete my answer.
I'd create a custom keyword in Python and import it as a library into RF. This could be easily done in Python (one line in fact), so it doesn't even take much time or effort to create it.
Libraries/ListUtils.py:
def substitute_values_in_list(list, value_to_substitute, substitute_to):
    return [substitute_to if ele == value_to_substitute else ele for ele in list]
Then in a test or in keywords:
*** Settings ***
Library    ../Libraries/ListUtils.py

*** Test Cases ***
Empty List Value
    ${list}=    Create List    1    2    ${None}
    Log To Console    ${list}
    ${new_list}=    Substitute Values In List    ${list}    ${None}    ${Empty}
    Log To Console    ${new_list}
The first console output will be:
['1', '2', None]
and the second one with substituted values:
['1', '2', '']
You can also call the custom keyword Substitute Values In List with the arguments the other way around, for example to replace empty strings with None.
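To check the helper outside Robot Framework, here is a small plain-Python sketch applied to the two lists from the question. The parameter is renamed to values here only to avoid shadowing the built-in list; that rename is my assumption, not part of the answer above.

```python
def substitute_values_in_list(values, value_to_substitute, substitute_to):
    # Replace every element equal to value_to_substitute, keep the rest as-is
    return [substitute_to if ele == value_to_substitute else ele for ele in values]

excel_list = [1, 'X', 'Y', 200, 1999, 'Z', 'W', 4, 'V', None, 2, 1100]
table_list = [1, 'X', 'Y', 200, 1999, 'Z', 'W', 4, 'V', '', 2, 1100]

# Normalize the Excel side so that None becomes ''
normalized = substitute_values_in_list(excel_list, None, '')

print(normalized == table_list)
```

Because the function replaces by value rather than by position, it keeps working when the None entries move to other slots.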

Save Results from cursor.callproc in dict

I'm trying to get some data from a MySQL database with a stored procedure and store it in a dict (as mentioned by blair here):
def createProduct(self):
    self.cursor.callproc('newProduct')
    result = []
    for recordset in self.cursor.stored_results():
        for row in recordset:
            result.append(dict(zip(recordset.column_names,row)))
    print(result)
The cursor is created with the option dictionary=True. The output for print(result) is:
[{'manufacturers_id': 0, 'alarm_threshold': 10, 'users_id_tech': 0, 'name': 'item1',
'entities_id': 0, 'notepad': None, 'locations_id': 0, 'groups_id_tech': 0,
'consumableitemtypes_id': 0, 'id': 1, 'comment': '', 'is_deleted': 0, 'ref': ''}]
I tried to access the value of the key name (which is item1) with this code:
print(result['name'])
TypeError: list indices must be integers, not str
and:
print(result(name))
NameError: name 'name' is not defined
I thought the code from blair would create a dict whose values are accessible by keys (for example 'name')? Is this wrong or what am I doing wrong?
Looking at the list of dictionaries you posted, I think the problem is the way you index the result.
I think this
print(result['name'])
should become this
print(result[0]['name'])
since result is a list and you are trying to access a dictionary inside it.
Hope it works.
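If the stored procedure can return more than one row, you may want to loop over the list instead of hard-coding index 0. A minimal sketch using a trimmed, hard-coded copy of the output from the question (no database involved):

```python
# Trimmed copy of the printed result from the question
result = [{'id': 1, 'name': 'item1', 'alarm_threshold': 10}]

# result is a list of row dictionaries, so index the list first
first_name = result[0]['name']

# Or collect the 'name' value of every row
names = [row['name'] for row in result]

print(first_name, names)
```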

Python3.4 Dictionary value replacement issue

I have some code which takes a list of dictionaries and creates another list of dictionaries.
Each dictionary in the list has two key/value pairs "ID" and "opcode", where "opcode" is a 32 bit number.
My code needs to create a second list of dictionaries where the opcodes are separated, i.e. a dictionary with opcode=5 would become two dictionaries with opcode=1 and opcode=4.
(opcode is a 32 bit number and my requirement is that only 1 bit is high, ie opcode=1,2,4,8,16 etc)
I've simplified the problem into the following; my code needs to turn this:
part=[{"ID":1,"opcode":4},{"ID":2,"opcode":5},{"ID":3,"opcode":6}]
into this:
part_=[{"ID":1,"opcode":4},{"ID":2,"opcode":1},{"ID":2,"opcode":4},{"ID":3,"opcode":2},{"ID":3,"opcode":4}]
Currently my code is the following
def bit_set(theNumber,bit):
    return theNumber&(1<<bit)!=0
part=[{"ID":1,"opcode":4},{"ID":2,"opcode":5},{"ID":3,"opcode":6}]
part_=[]
for i in part:
    for j in range(32):
        if bit_set(i["opcode"],j):
            part_.append(i)
            part_[-1]["opcode"]=(1<<j)
for i in part_:
    print(i)
The output of the code is:
{'opcode': 4, 'ID': 1}
{'opcode': 1, 'ID': 2}
{'opcode': 2, 'ID': 3}
Interestingly if I modify the code slightly so that the value modification line is not there, the extra dictionaries are created, but obviously the opcode is not correct.
def bit_set(theNumber,bit):
    return theNumber&(1<<bit)!=0
part=[{"ID":1,"opcode":4},{"ID":2,"opcode":5},{"ID":3,"opcode":6}]
part_=[]
for i in part:
    for j in range(32):
        if bit_set(i["opcode"],j):
            part_.append(i)
            #part_[-1]["opcode"]=(1<<j)
for i in part_:
    print(i)
The output is
{'ID': 1, 'opcode': 4}
{'ID': 2, 'opcode': 5}
{'ID': 2, 'opcode': 5}
{'ID': 3, 'opcode': 6}
{'ID': 3, 'opcode': 6}
I can get around the issue by approaching the problem a different way, but I'd like to learn what is going on here, and I'm out of my depth.
This happens because appending i to the new list does not create a copy of the dictionary; it adds a reference to the original dictionary. When you change the dictionary on the next line, you also change the value in part, so the remaining bits of the original opcode no longer match in the loop. You can see this if you print out the values of part at the end of your code.
The Python documentation for the copy module explains this as:
Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.
You can fix this by creating a copy of the dictionary when you append it. This lets you change the value without affecting the original dictionary. Python lets you copy objects using the copy module.
Just import copy and then do part_.append(copy.copy(i)) instead of part_.append(i).
import copy
def bit_set(theNumber,bit):
    return theNumber&(1<<bit)!=0
part = [{"ID": 1, "opcode": 4}, {"ID": 2, "opcode": 5}, {"ID": 3, "opcode": 6}]
part_=[]
for i in part:
    for j in range(32):
        if bit_set(i["opcode"],j):
            part_.append(copy.copy(i))
            part_[-1]["opcode"]=(1<<j)
for i in part_:
    print(i)
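An alternative that avoids copying altogether is to build a fresh dictionary for every set bit, so the originals in part are never touched. This is a sketch of that idea, assuming each dictionary only has the "ID" and "opcode" keys shown in the question:

```python
def bit_set(number, bit):
    # True if the given bit position is set in number
    return number & (1 << bit) != 0

part = [{"ID": 1, "opcode": 4}, {"ID": 2, "opcode": 5}, {"ID": 3, "opcode": 6}]

# One new dictionary per set bit; the dictionaries in part stay unchanged
part_ = [
    {"ID": item["ID"], "opcode": 1 << j}
    for item in part
    for j in range(32)
    if bit_set(item["opcode"], j)
]

for entry in part_:
    print(entry)
```

If the dictionaries carried more keys, copy.copy(i) as in the answer above would be the simpler choice, since it keeps every key without listing them.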
