Sort a dictionary by list size and then alphabetically by key in Python? - python-3.x

So I have a small problem. I got help here a while ago with sorting a dictionary, where each key maps to a list, by the number of items in each list: keys with the shortest lists on the left and keys with the longest lists on the right. That worked great. Now, I know how to sort dictionary keys alphabetically, but I can't get that to work combined with the above.
I'm trying to sort the dictionary below first by how many values each key's list contains, and then alphabetically by key when two lists contain the same number of values.
So before, I would have this:
Dict = {"anna":[1,2,3],"billy":[1,2],"cilla":[1,2,3,4],"cecillia":[1,2,3,4],"dan":[1]}
And after if everything goes well I would like to have...
Dict = {"dan":[1],"billy":[1,2],"anna":[1,2,3],"cecillia":[1,2,3,4],"cilla":[1,2,3,4]}
As you see above, cecillia comes before cilla since they both have 4 values in their lists, and dan comes first since he has the fewest values in his list. I hope this makes sense. What I have right now to get the result below is:
ascending = sorted(Dict, key=lambda x: len(Dict[x]))
this gives me for example:
{"dan":[1],"billy":[1,2],"anna":[1,2,3],"cilla":[1,2,3,4],"cecillia":[1,2,3,4]}
So it works, but only by the number of values in each list. Now when I do
ascending.sort()
it sorts the keys alphabetically, but then the ordering from fewest to most values is gone. Does anyone know how to combine the two? I would greatly appreciate it.

A dictionary won't stay sorted just by sorting its keys, so convert it to a list of (key, value) tuples:
D = [(x, Dict[x]) for x in Dict]
ascending = sorted(D)                      # secondary sort: alphabetical by key
ascending.sort(key=lambda x: len(x[1]))    # primary sort: by list length; ties keep their alphabetical order
See http://wiki.python.org/moin/HowTo/Sorting . BTW, the trick of sorting twice, secondary key first, works because Python's sorting algorithm is stable (which apparently was not the case back when I was programming in Python).
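Alternatively, a single tuple key does both comparisons at once, and since Python 3.7 plain dicts preserve insertion order, so the sorted items can be rebuilt into a dict. A minimal sketch using the example data above:
Dict = {"anna": [1, 2, 3], "billy": [1, 2], "cilla": [1, 2, 3, 4], "cecillia": [1, 2, 3, 4], "dan": [1]}

# Sort by (list length, key): length first, alphabetical key as the tie-breaker.
ordered = dict(sorted(Dict.items(), key=lambda kv: (len(kv[1]), kv[0])))
print(ordered)
# {'dan': [1], 'billy': [1, 2], 'anna': [1, 2, 3], 'cecillia': [1, 2, 3, 4], 'cilla': [1, 2, 3, 4]}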

Related

When I combine two pandas columns with zip into a dict it reduces my samples

I have two columns in pandas: df.lat and df.lon.
Both have a length of 3897 and 556 NaN values.
My goal is to combine both columns and make a dict out of them.
I use the code:
dict(zip(df.lat,df.lon))
This creates a dict, but with one element fewer than my original columns.
I used len() to confirm this. I cannot figure out why the dict has one element fewer than my columns when both columns have the same length.
Another problem is that the dict contains only the raw values, not the column names "lat" and "lon" as keys.
Maybe someone here has an idea?
You may get a different length if there are repeated values in df.lat: you can't have duplicate keys in a dictionary, so a repeated lat keeps only the last lon paired with it and the earlier pairs are dropped.
A more flexible approach may be to use the df.to_dict() native method in pandas. In this example the orientation you want is probably 'records'. Full code:
df[['lat', 'lon']].to_dict('records')
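A minimal sketch of both behaviours (the coordinate values here are made up for illustration):
import pandas as pd

df = pd.DataFrame({"lat": [52.5, 52.5, 48.1], "lon": [13.4, 13.5, 11.6]})

# Duplicate keys collapse: the repeated lat 52.5 keeps only the last lon paired with it.
print(dict(zip(df.lat, df.lon)))            # {52.5: 13.5, 48.1: 11.6} -- one entry fewer than len(df)

# to_dict('records') keeps every row and labels the values with the column names.
print(df[['lat', 'lon']].to_dict('records'))
# [{'lat': 52.5, 'lon': 13.4}, {'lat': 52.5, 'lon': 13.5}, {'lat': 48.1, 'lon': 11.6}]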

How to identify unique combinations of pairs in Excel?

My goal is to get a two-column array of all the possible unique pairs of N items (so N*(N-1)/2 pairs).
I have done this in the past using extra calculation columns, but with the advent of LET() I was wondering if I can get a single function call.
My example data is a short list of values in the range B2:B5.
In the end I came up with this, which is a little unwieldy but does the job:
=LET(x,B2:B5,INDEX(x,LET(n,ROWS(x),s,(1+SEQUENCE((n*n*2),2))/2,r,INT((s-1)/n)+1,LET(a,IF(INT(s)=s,r,INT(s)-((r-1)*n)),FILTER(a,INDEX(a,,1)<INDEX(a,,2))))))

How to remove a dataframe from an array

Hello programmers around the world! I'm having a problem with removing dataframes from a list based on a condition.
Here is what I tried:
1. Loop through every dataframe in the list.
2. Check whether a particular column is missing from the current dataframe.
3. If it is missing, remove the dataframe.
For some reason I get the following error:
ValueError: Can only compare identically-labeled DataFrame objects
def removeCorruptData(array):
    for dataframe in array:
        if 'LoC' not in dataframe.columns:
            array.remove(dataframe)
I expected that to work, but instead I get the ValueError shown above.
Sadly, other than what is provided in the error message itself, I haven't been able to find a solution to my problem. If anybody can help it would be greatly appreciated.
Your Problem
You are calling remove on your list with a DataFrame as the value. To find the element to remove, Python compares that DataFrame against the others with ==, and comparing DataFrames with different labels raises exactly the ValueError you see.
What I'd do
This is going to mutate your existing list rather than returning a copy of your list that satisfies a condition.
Iterate through the list backwards while keeping track of where you are at. You can use the _i index value to determine what you should pop off. Because you are going backwards, you don't have to worry about the list's positions changing index values beneath you.
def removeCorruptData(array):
    n = len(array)
    # Walk the list in reverse so popping doesn't shift the indices still to visit.
    for _i, d in enumerate(array[::-1], 1):
        if 'LoC' not in d:          # for a DataFrame, `in` checks the column labels
            array.pop(n - _i)
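If mutating the list in place isn't a requirement, a list comprehension that keeps only the frames containing the column is a simpler alternative (a sketch; the function name is just for illustration):
def keepValidData(array):
    # Return a new list with only the dataframes that have a 'LoC' column.
    return [d for d in array if 'LoC' in d.columns]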

How to compare two dicts based on roundoff values in python?

I need to check if two dicts are equal. If the values rounded off to 6 decimal places are equal, then the program must say that they are equal. For example, the following two dicts are equal:
{'A': 0.00025037208557341116}
and
{'A': 0.000250372085573415}
Can anyone suggest how to do this? My dictionaries are big (more than 8000 entries) and I need to access these values multiple times for other calculations.
Test each key as you produce the second dict iteratively. Looking up a key/value pair in the dict you are comparing against is cheap (average constant time), and you can round the values as you find them.
You are essentially performing a set difference to test that the keys are equal, which requires at least a full loop over the smaller of the two key sets. If you already need a loop to generate one of the dicts, you are at an advantage, as that gives you the shortest route to detecting inequality early.
To test for two floats being the same within a set tolerance, see What is the best way to compare floats for almost-equality in Python?.
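A minimal sketch of the rounding approach (the helper name is just for illustration; math.isclose, as in the linked question, is the more robust tolerance check):
def dicts_equal_rounded(d1, d2, ndigits=6):
    # Same keys, and every value equal after rounding to `ndigits` decimal places.
    if d1.keys() != d2.keys():
        return False
    return all(round(d1[k], ndigits) == round(d2[k], ndigits) for k in d1)

a = {'A': 0.00025037208557341116}
b = {'A': 0.000250372085573415}
print(dicts_equal_rounded(a, b))  # True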

Array Algorithms

I have a question about algorithms. I have been asked to write an algorithm for the following. I'm not asking you to write it for me, just to tell me the efficient approach I should take:
There is an unsorted array of n elements (like the contents of a book, e.g. the Bible), and suppose the input string "Gaurav Agarwal" has been inserted into it. You need to fetch the elements in the array that match that string. I just need an algorithm for how to proceed (the array is unsorted).
If you did not understand then let me know and I will try to help on this.
One good way to find duplicates in an unsorted array is to sort it on the string elements, so the algorithm for your homework question would be (a Python sketch follows below):
Sort the array.
Check your array for occurrences of "Gaurav Agarwal". Since the array is sorted, equal strings end up as neighbours, so keep a counter and increment it until you find the first element that is not equal to the string you're looking for.
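In Python that could look something like this (a sketch; the function name is illustrative, and bisect finds the run of equal neighbours in the sorted list):
from bisect import bisect_left, bisect_right

def count_after_sort(arr, target):
    # Sort once (O(n log n)); the equal elements then form one contiguous run.
    arr = sorted(arr)
    return bisect_right(arr, target) - bisect_left(arr, target)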
It will take some time to sort the string array and then to scan it. I would recommend just scanning the array of strings and checking whether the length of your string equals the length of the string at the current position of the array; if the lengths are equal, compare the two strings.
I don't think sorting and then searching is the most efficient solution to your problem.
Sorting itself has O(n log n) complexity.
Just doing a brute-force search of the array is more efficient (it has complexity O(n)); a one-line sketch is below.
This is the case if you are finding matches for one string or only a few strings.
If you are looking up many input strings rather than just one, then sorting makes sense.
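The brute-force scan is a one-liner in Python (a sketch; arr.count(target) does the same single pass):
def count_linear(arr, target):
    # One O(n) pass, no sorting needed.
    return sum(1 for s in arr if s == target)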
I would proceed in the following steps:
Use a hash table with chaining, with a hash function that works well for strings.
Compute the hash of the new string and search the linked list of the slot corresponding to that hash for duplicates.
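In Python a dict is already such a hash table, so a sketch of this idea (names are illustrative) could be:
from collections import defaultdict

def count_with_hash(arr, target):
    # One O(n) pass building a hash table of occurrence counts.
    counts = defaultdict(int)
    for s in arr:
        counts[s] += 1
    return counts[target]

collections.Counter(arr)[target] gives the same result in one call.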
