Two dictionary nested inside - python-3.x

I have a nested dictionary like this:
dic = {'dic1': {'a': ..., 'b': ...}, 'dic2': {'a': ..., 'b': ...}, 'dic3': {'a': ..., 'b': ...}}
Each inner dictionary has many rows of data.
There are two problems:
1. I want to compare the values of 'a' in each nested dictionary with one dataset of an HDF5 file. The file contains two datasets, dataset1 and dataset2; if a value of 'a' exists in dataset1, I want to access the corresponding dataset2 values.
2. How do I access the 'b' information that corresponds to the 'a' data?
For the first part I'm using the procedure below, which never finishes, and for the second I don't know how to access the 'b' in the same tuple as 'a'!
Does anybody have any clue how I can solve this?
for key, value in dic.items():
    for k, v in value.items():
        if 'a' in k:
            for t in entry[key][k]:
                if t in file['/dataset1']:
                    joint = file['/dataset2'][file['/dataset1'] == t]

You probably don't need the second loop, if your 'a' and 'b' keys are always present and known in advance (if not, you could add a test if 'a' in inner_dict and 'b' in inner_dict). Your test 'a' in k probably doesn't do what you expect (it's doing a substring match on an inner key string, which might give false positives if not all the keys are single characters).
Try something like this:
for outer_key, inner_dict in dic.items():
    for t in inner_dict['a']:
        if t in file['/dataset1']:
            joint = file['/dataset2'][file['/dataset1'] == t]  # not sure this makes sense
            b_value = inner_dict['b']
            # I think you want to do something with b_value here, but I'm not sure what
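To make the pairing of 'a' and 'b' concrete, here is a minimal sketch in plain Python. It rests on several assumptions that the question does not confirm: the two datasets have already been read into memory (the lists dataset1 and dataset2 below are stand-ins for them), values in dataset1 are unique, and the 'a' and 'b' entries of each inner dictionary line up position by position. All sample values are made up for illustration.

```python
# Stand-ins for the two HDF5 datasets, assumed to be already loaded into memory
dataset1 = [10, 20, 30, 40]
dataset2 = ['w', 'x', 'y', 'z']

# Hypothetical nested dictionary; 'a' and 'b' are assumed to be aligned lists
dic = {
    'dic1': {'a': [10, 99], 'b': ['b10', 'b99']},
    'dic2': {'a': [30], 'b': ['b30']},
}

# Map each dataset1 value to the dataset2 value at the same position,
# so each membership test is a dictionary lookup instead of a scan
lookup = dict(zip(dataset1, dataset2))

matches = []
for outer_key, inner_dict in dic.items():
    # zip() walks 'a' and 'b' together, so b_value belongs to a_value
    for a_value, b_value in zip(inner_dict['a'], inner_dict['b']):
        if a_value in lookup:
            matches.append((outer_key, a_value, b_value, lookup[a_value]))

print(matches)  # [('dic1', 10, 'b10', 'w'), ('dic2', 30, 'b30', 'y')]
```

Building the lookup once avoids re-reading dataset1 for every value of 'a', which may be what makes the original loop feel never-ending.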

Related

how to best iterate through dictionary keys and compare the values?

I am very new to programming and only started learning Python 3 about two weeks ago.
I'm doing an exercise that I found rather difficult: write a function that accepts a dictionary as an argument and determines whether the dictionary represents a "valid" chessboard. Please note that the following code only addresses a single aspect of the function, the part I struggled with most.
I spent quite a bit of time working on this project and trying to ensure that both options are "valid" code, so as far as I know there are no errors in either.
Imagine a grid (I will print the list) that represents the squares on a chessboard. Could someone tell me which code would be deemed more acceptable, and why? Or is there a simpler way I could have done this? I will only post what I feel is relevant to my question; if more is needed, please let me know.
# checks that dictionary keys are valid Chessboard Squares
# acceptable range is columns 1 - 8 rows a - h
for board_squares in dic:
    try:  # this will accept any value as int
        if (int(board_squares[0:-1]) <= 8  # as slice up to last key char
                and board_squares[-1] <= 'h') \
                is False:
            print((Err) + ' Square outside range')
            return False
    except ValueError as e:
        print((Err) + ' Improper dictionary')
        return False  # when testing function this can be replaced with return False
Important note: here "board_squares" refers to the dictionary keys. This is the first code I came up with after a lot of effort. It slices each dictionary key and compares the pieces to what is supposed to represent a "valid" chessboard square. I got a bit of negative feedback on it, so I went back to the drawing board and came up with this code:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
chessboard_squares = []
for chr1 in range(1, 9):
    for chr2 in char_range('a', 'h'):
        chessboard_squares.append(str(chr1) + chr2)
print(chessboard_squares)  # this is simply to print list so I have a visual representation
for key in dic:
    if key not in chessboard_squares:
        print((Err) + ' Square outside range')
        return False
Important note: here chessboard_squares refers to the list of values that the dictionary keys are compared against. This second version requires the function at the top to range over letters. I tried to ensure it was very readable by using clearly defined variable names. It creates a list of what the valid dictionary keys should be to represent chessboard squares. Lastly, here is the printed list of what the valid dictionary keys should be, laid out like a chessboard for clarity.
['1a', '1b', '1c', '1d', '1e', '1f', '1g', '1h',
'2a', '2b', '2c', '2d', '2e', '2f', '2g', '2h',
'3a', '3b', '3c', '3d', '3e', '3f', '3g', '3h',
'4a', '4b', '4c', '4d', '4e', '4f', '4g', '4h',
'5a', '5b', '5c', '5d', '5e', '5f', '5g', '5h',
'6a', '6b', '6c', '6d', '6e', '6f', '6g', '6h',
'7a', '7b', '7c', '7d', '7e', '7f', '7g', '7h',
'8a', '8b', '8c', '8d', '8e', '8f', '8g', '8h']
Since I posted this question I've learned a lot of new things, so I decided to answer my own question. Here is a much cleaner option, which I'd now consider the better one.
try:
    return all(
        (1 <= int(row) <= 8) and ('a' <= col <= 'h')
        for row, col in dic
    )
except ValueError:
    return False
First we use the all() function, which takes a single iterable and returns True if every element is truthy. An empty iterable is a special case: all() returns True for it.
All our dictionary keys are (supposed to be) 2-character strings, which are themselves iterable, so I can use multiple assignment (aka tuple unpacking) as long as I unpack into exactly as many variables as there are characters in the key. Here the 1st character of the key is assigned to row and the 2nd to col(umn). I can still use try/except ValueError, because a key that isn't exactly 2 characters long raises the same error during unpacking as int() does for a non-numeric row.
A simplified way to read a list comprehension or generator expression is "do something for variable in iterable"; the version without square brackets used here is a generator expression. What we end up with is:
Do something: compare int(row) to 1 - 8 and col to 'a' - 'h'
for: row (1st char of dict key), col (2nd char of dict key)
in: dictionary keys.
Because this is a generator expression, it produces one value per loop iteration, lazily, for example: True, False, False, True, etc.
These values are passed to all(), which consumes them and returns True if ALL are True, else False (it stops at the first False).
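Wrapped in a function, the check described above might look like the following sketch. The function name has_valid_squares and the sample boards are made up for illustration, and the sketch assumes every key is a string (a non-string key such as an integer would raise TypeError, which is not handled here).

```python
def has_valid_squares(board):
    """Return True if every key of `board` names a square from '1a' to '8h'."""
    try:
        return all(
            1 <= int(row) <= 8 and 'a' <= col <= 'h'
            for row, col in board  # unpack each 2-character key
        )
    except ValueError:
        # Raised by int() for a non-numeric row, or by the unpacking
        # when a key is not exactly 2 characters long
        return False


print(has_valid_squares({'1a': 'wking', '8h': 'bking'}))  # True
print(has_valid_squares({'9a': 'wking'}))                 # False
print(has_valid_squares({'1ab': 'wking'}))                # False
```

Note that an empty dictionary passes this check, because all() returns True for an empty iterable.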
Here are several resources to help understand the code, should anyone wish to look further:
the all function:
https://docs.python.org/3/library/functions.html#all
understanding list comprehension:
https://medium.com/swlh/list-comprehensions-in-python-3-for-beginners-8c2b18966d93
this is great in that it explains "yield" which is vital in understanding generator comprehension:
What does the "yield" keyword do?
Multiple Assignment:
https://treyhunner.com/2018/03/tuple-unpacking-improves-python-code-readability/

How to get the innermost value among dictionaries at one go?

Here is my dictionary:
my_dict = {'00.Life': help}
help ={'A.Death':['dying','dead','mourir','pass away']}
I have one dictionary inside the other.
How can I get the innermost value at one go?
I'd like to input just 'dying' (one of the elements in the list) and get back the list ['dying','dead','mourir','pass away'].
How can I do that?
You can't do it "at one go" using your existing data structure. You will have to either iterate over all the values, or construct a reverse-lookup dictionary first. For example:
>>> my_help ={'A.Death':['dying','dead','mourir','pass away']}
>>> my_dict = {'00.Life': my_help}
>>> lookup_dict = {k: v for v in my_dict["00.Life"].values() for k in v}
>>> lookup_dict["dying"]
['dying', 'dead', 'mourir', 'pass away']
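If the outer dictionary holds more than one inner dictionary, the same idea can be extended to cover all of them. The sketch below is one way to do it; the 'B.Birth' entry is a hypothetical addition used only to show that several word lists can share one lookup table.

```python
my_dict = {
    '00.Life': {
        'A.Death': ['dying', 'dead', 'mourir', 'pass away'],
        'B.Birth': ['born', 'newborn'],  # hypothetical extra entry
    },
}

# Map every word to the list it belongs to, across all inner dictionaries
lookup_dict = {
    word: words
    for inner in my_dict.values()
    for words in inner.values()
    for word in words
}

print(lookup_dict['dying'])  # ['dying', 'dead', 'mourir', 'pass away']
print(lookup_dict['born'])   # ['born', 'newborn']
```

The lookup table has to be rebuilt whenever my_dict changes, and a word that appears in two lists will point to whichever list was visited last.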

I want to arrange the list of strings with a certain condition

I want to arrange the list of strings alphabetically but with the condition that strings that start with x go first. For example, the input is list=['apple','pear','xanadu','stop'].
I'm sure you need to add some condition at the sort function but I'm not sure what to put.
list2 = []
string = input("Enter a string:")
list2.append(string)
while string != "stop":
    string = input("Enter a string:")
    list2.append(string)
list2.remove("stop")
print("Your list is:", list2)
print("Sorted list:", sorted(list2))
I want the output to be list=['xanadu','apple','pear']. I removed the 'stop' btw.
Use the key function that will determine the ordering of elements:
>>> sorted(['apple','pear','xanadu','stop'], key=lambda val: (0, val) if val.startswith('x') else (1, val))
['xanadu', 'apple', 'pear', 'stop']
The lambda means the following:
lambda val: (               # determine the ordering of the element `val`
    (0, val)                # make the algorithm compare tuples!
    if val.startswith('x')
    else (1, val)           # use default alphabetical ordering otherwise
)
Since we're now comparing tuples (but ordering the actual values), tuples whose first element is 0 always sort before those whose first element is 1; within each group, the second element gives the usual alphabetical order.
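The same ordering can be packaged as a small helper. This is only a sketch: the name sort_x_first is made up, and it uses a boolean as the first tuple element instead of 0 and 1, relying on False sorting before True.

```python
def sort_x_first(words):
    """Sort alphabetically, but put words starting with 'x' first."""
    # not word.startswith('x') is False for 'x' words, and False sorts before True
    return sorted(words, key=lambda word: (not word.startswith('x'), word))


print(sort_x_first(['apple', 'pear', 'xanadu']))
# ['xanadu', 'apple', 'pear']
print(sort_x_first(['xylophone', 'apple', 'xanadu']))
# ['xanadu', 'xylophone', 'apple']
```

In the question's script this would replace the plain sorted(list2) call after "stop" has been removed.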

How to Sort Alphabets

Input : abcdABCD
Output : AaBbCcDd
ms = []
n = input()
for i in n:
    ms.append(i)
ms.sort()
print(ms)
It gives me ABCDabcd.
How to sort this in python?
Without having to import anything, you could probably do something like this:
arr = "abcdeABCDE"
temp = sorted(arr, key = lambda i: (i.lower(), i))
result = "".join(temp)
print(result) # AaBbCcDdEe
The key will take in each element of arr and sort it first by lower-casing it, then if it ties, it will sort it based on its original value. It will group all similar letters together (A with a, B with b) and then put the capital first.
Use a sorting key:
ms = "abcdABCD"
sorted_ms = sorted(ms, key=lambda letter:(letter.upper(), letter.islower()))
# sorted_ms = ['A', 'a', 'B', 'b', 'C', 'c', 'D', 'd']
sorted_str = ''.join(sorted_ms)
# sorted_str = 'AaBbCcDd'
Why this works:
You can specify the criterion to sort by using the key argument of the sorted function or the list.sort() method. It expects a function or lambda that takes the element in question and returns a value to sort by in its place. If that value is a tuple, the first element takes precedence; if it's equal, then the second element, and so on.
So, the lambda I provided here returns a 2-tuple:
(letter.upper(), letter.islower())
letter.upper() as the first element here means that the strings are going to be sorted lexicographically, but case-insensitively (as it will sort them as if they were all uppercase). Then, I use letter.islower() as the second element, which is True if the letter was lowercase and False otherwise. When sorting, False comes before True, which means that if you give a capital letter and a lowercase letter, the capital letter will come first.
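The two facts this relies on, that False sorts before True and that tuples are compared element by element, can be checked directly. The sample tuples below are made up for illustration.

```python
# Booleans behave like 0 and 1 when compared
print(False < True)  # True

# Tuples compare element by element: the first items tie, so the second decides
print(('A', False) < ('A', True))  # True

# The same rule drives sorting of the key tuples
key_tuples = [('B', False), ('A', True), ('A', False)]
sorted_tuples = sorted(key_tuples)
print(sorted_tuples)  # [('A', False), ('A', True), ('B', False)]
```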
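Try this (sorted() is stable, so letters with equal keys keep their input order; here that puts each lowercase letter before its capital):
>>> s = 'abcdABCD'
>>> ''.join(sorted(s, key=lambda x: x.lower()))
'aAbBcCdD'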

Find distinct values for each column in an RDD in PySpark

I have an RDD that is both very long (a few billion rows) and decently wide (a few hundred columns). I want to create sets of the unique values in each column (these sets don't need to be parallelized, as they will contain no more than 500 unique values per column).
Here is what I have so far:
data = sc.parallelize([["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]])
num_columns = len(data.first())
empty_sets = [set() for index in xrange(num_columns)]
d2 = data.aggregate((empty_sets), (lambda a, b: a.add(b)), (lambda x, y: x.union(y)))
What I am doing here is trying to initialize a list of empty sets, one for each column in my RDD. For the first part of the aggregation, I want to iterate row by row through data, adding the value in column n to the nth set in my list of sets. If the value already exists, nothing happens. Then the sets are unioned afterwards, so only distinct values are returned across all partitions.
When I try to run this code, I get the following error:
AttributeError: 'list' object has no attribute 'add'
I believe the issue is that I am not accurately making it clear that I am iterating through the list of sets (empty_sets) and that I am iterating through the columns of each row in data. I believe in (lambda a, b: a.add(b)) that a is empty_sets and b is data.first() (the entire row, not a single value). This obviously doesn't work, and isn't my intended aggregation.
How can I iterate through my list of sets, and through each row of my dataframe, to add each value to its corresponding set object?
The desired output would look like:
[set(['a', 'b', 'c']), set(['one', 'two']), set(['x', 'y'])]
P.S I've looked at this example here, which is extremely similar to my use case (it's where I got the idea to use aggregate in the first place). However, I find the code very difficult to convert into PySpark, and I'm very unclear what the case and zip code is doing.
There are two problems. One, your combiner functions assume each row is a single set, but you're operating on a list of sets. Two, add doesn't return anything (try a = set(); b = a.add('1'); print(b)), so your first combiner function would return None instead of the updated sets. To fix this, make your first combiner function non-anonymous and have both of them loop over the lists of sets:
def set_plus_row(sets, row):
    for i in range(len(sets)):
        sets[i].add(row[i])
    return sets

unique_values_per_column = data.aggregate(
    empty_sets,
    set_plus_row,  # can't be lambda b/c add doesn't return anything
    lambda x, y: [a.union(b) for a, b in zip(x, y)]
)
I'm not sure what zip does in Scala, but in Python, it takes two lists and pairs up their corresponding elements as tuples (try x = [1, 2, 3]; y = ['a', 'b', 'c']; print(list(zip(x, y)))), so you can loop over two lists simultaneously.
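To see how the two functions cooperate without a Spark cluster, here is a plain-Python sketch that imitates the aggregation with functools.reduce. It is not PySpark code: the split of the rows into two partitions is made up, and the helper names merge_sets and fresh_sets are hypothetical.

```python
from functools import reduce


def set_plus_row(sets, row):
    """Add each value of `row` to the set for its column."""
    for column_set, value in zip(sets, row):
        column_set.add(value)
    return sets


def merge_sets(x, y):
    """Union two lists of per-column sets, column by column."""
    return [a | b for a, b in zip(x, y)]


rows = [["a", "one", "x"], ["b", "one", "y"], ["a", "two", "x"], ["c", "two", "x"]]
partitions = [rows[:2], rows[2:]]  # made-up split standing in for RDD partitions
num_columns = len(rows[0])


def fresh_sets():
    # Each partition starts from its own copy of the zero value
    return [set() for _ in range(num_columns)]


# Step 1: fold the rows of each partition into a list of sets
partials = [reduce(set_plus_row, part, fresh_sets()) for part in partitions]
# Step 2: merge the per-partition results
unique_values_per_column = reduce(merge_sets, partials, fresh_sets())
print(unique_values_per_column)
```

The first step plays the role of the second argument to aggregate, and the merge step plays the role of the third.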

Resources