Example: I have two data frames, df1 and df2.
Q. I want to select rows of df2 in a for loop based on 'favorite_color'. If
'favorite_color' is "'blue','yellow'", then return all rows of df2 whose 'favorite_color'
is 'blue', 'yellow', or both 'blue' and 'yellow'.
import pandas as pd

raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
            'age': [20, 19, 22, 21],
            'favorite_color': ['blue', "'blue','yellow'", 'yellow', "'green','yellow'"],
            'grade': [88, 92, 95, 70]}
df1 = pd.DataFrame(raw_data)
df1.head()

raw_data = {'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel',
                     'Omar', 'Spencer', 'abds'],
            'age': [20, 19, 22, 21, 24, 30, 35],
            'favorite_color': ['blue', "'blue','yellow'", "'green','yellow'",
                               "green", "'green','blue'", 'yellow', "'blue','green','yellow'"],
            'grade': [88, 92, 95, 70, 80, 75, 85]}
df2 = pd.DataFrame(raw_data)
df2.head(7)
I have the following lists:
TAKEOFF LIST: [['EHAM', 55], ['EGLL', 46], ['LOWW', 44], ['LFPG', 43], ['LTBA', 38], ['EDDF', 37], ['LEMD', 34], ['EKCH', 33], ['EGKK', 31], ['LFPO', 30], ['LEBL', 28], ['LEPA', 28], ['LIRF', 25], ['EBBR', 25], ['ENGM', 23]]
LANDING LIST: [['LEMD', 38], ['EDDM', 35], ['LEBL', 34], ['LFPO', 33], ['LFPG', 32], ['EKCH', 29], ['LTBA', 27], ['ENGM', 25], ['LSZH', 25], ['LTFJ', 24], ['LOWW', 23], ['EHAM', 23], ['EGKK', 22], ['EDDL', 22], ['ESSA', 21]]
I want to compare both lists: if the first item of a row exists in both lists, add it to a new list together with both values and their sum.
Example output (top 3 by the sum of the values):
OUTPUT_LIST: [['EHAM', 78, 55, 23], ['LFPG', 75, 43, 32], ['LEMD', 72, 34, 38]]
Constraints: The final structure must be a list of lists, and this is just a simplified part of the original code. The actual lists are built by reading a file with thousands of lines (up to 500000).
Lists are not really ideal for that kind of data. You should consider storing the data in e.g. dicts in the first place. However given input and output lists as above, we could first convert to dicts (using dict comprehensions) and then back to a double list using a list comprehension:
takeoffs = {k: v for k, v in takeoff_list}
landings = {k: v for k, v in landing_list}
output = [[k, v1+landings[k], v1, landings[k]] for k, v1 in takeoffs.items() if k in landings]
Note that this code does not check for errors, e.g. for duplicate items in either list.
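The snippet above stops at the merged list and, as noted, skips error checking. Below is a minimal sketch of one way to add a duplicate check and the top-3 selection the question asks for. The helper names to_dict and top_by_sum are made up for illustration, and raising ValueError on a duplicate code is only an assumption about the desired behaviour.

```python
takeoff_list = [['EHAM', 55], ['EGLL', 46], ['LOWW', 44], ['LFPG', 43], ['LTBA', 38],
                ['EDDF', 37], ['LEMD', 34], ['EKCH', 33], ['EGKK', 31], ['LFPO', 30],
                ['LEBL', 28], ['LEPA', 28], ['LIRF', 25], ['EBBR', 25], ['ENGM', 23]]
landing_list = [['LEMD', 38], ['EDDM', 35], ['LEBL', 34], ['LFPO', 33], ['LFPG', 32],
                ['EKCH', 29], ['LTBA', 27], ['ENGM', 25], ['LSZH', 25], ['LTFJ', 24],
                ['LOWW', 23], ['EHAM', 23], ['EGKK', 22], ['EDDL', 22], ['ESSA', 21]]


def to_dict(pairs):
    """Convert [[code, count], ...] into a dict, rejecting duplicate codes."""
    result = {}
    for code, count in pairs:
        if code in result:
            # Assumption: a repeated code is an input error, not something to sum up.
            raise ValueError(f"duplicate entry for {code!r}")
        result[code] = count
    return result


def top_by_sum(takeoff_pairs, landing_pairs, n=3):
    """Return the n rows [code, total, takeoffs, landings] with the largest total."""
    takeoffs = to_dict(takeoff_pairs)
    landings = to_dict(landing_pairs)
    merged = [[code, t + landings[code], t, landings[code]]
              for code, t in takeoffs.items() if code in landings]
    # Sort by the total (second column), largest first, and keep the first n rows.
    merged.sort(key=lambda row: row[1], reverse=True)
    return merged[:n]


top3 = top_by_sum(takeoff_list, landing_list)
print(top3)
```

If only a handful of rows are needed from a very large merged list, heapq.nlargest(n, merged, key=lambda row: row[1]) is an alternative that avoids sorting everything.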
My problem is that I am trying to sort lists within a dictionary, where the dictionary keys are the field names and the lists hold the data. I have a feeling I am using the wrong approach (should this be a tuple, a DataFrame, or something else?).
I have tried using sorted() but can't get past sorting only by the key (e.g. name, score1, score2). I want to keep the key order but rearrange the values while preserving their relationship across keys.
This is a sample dataset that I want to sort by score1 (or score2):
scores = {'name': ['joe', 'pete', 'betsy', 'susan', 'pat'], 'score1': [99, 90, 84, 65, 100], 'score2': [85, 91, 90, 55, 98]}
After sorting for score1, I would like to see:
pat, joe, pete, betsy, susan
After sorting for score2, I would like to see:
pat, pete, betsy, joe, susan
If you create scores yourself, it will be much easier if it is structured differently, i.e. as a list of dicts: [{'name': 'joe', 'score1': 1, 'score2': 2}, ...].
Then the usage of sorted is quite simple:
scores = [{'name': 'joe', 'score1': 99, 'score2': 85},
{'name': 'pete', 'score1': 90, 'score2': 91},
{'name': 'betsy', 'score1': 84, 'score2': 90},
{'name': 'susan', 'score1': 65, 'score2': 55},
{'name': 'pat', 'score1': 100, 'score2': 98}]
by_score_1 = [d['name'] for d in sorted(scores, key=lambda d: d['score1'], reverse=True)]
print(by_score_1)
by_score_2 = [d['name'] for d in sorted(scores, key=lambda d: d['score2'], reverse=True)]
print(by_score_2)
Outputs:
['pat', 'joe', 'pete', 'betsy', 'susan']
['pat', 'pete', 'betsy', 'joe', 'susan']
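If the data already exists as a dict of lists, it does not have to be retyped by hand. The following is a small sketch of converting the original structure into the list of dicts used above; it assumes all lists have the same length, and the name records is just illustrative.

```python
scores = {'name': ['joe', 'pete', 'betsy', 'susan', 'pat'],
          'score1': [99, 90, 84, 65, 100],
          'score2': [85, 91, 90, 55, 98]}

# zip(*scores.values()) yields one tuple per person; pairing each tuple
# with the dict keys turns it into a per-person record.
records = [dict(zip(scores, row)) for row in zip(*scores.values())]

by_score_1 = [d['name'] for d in sorted(records, key=lambda d: d['score1'], reverse=True)]
print(by_score_1)
```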
The other answer is nice. You could also turn it into a list of tuples then easily sort it:
scores = {
'name': ['joe', 'pete', 'betsy', 'susan', 'pat'],
'score1': [99, 90, 84, 65, 100],
'score2': [85, 91, 90, 55, 98]}
t = list(zip(*scores.values()))
print(t)
Output:
[('joe', 99, 85), ('pete', 90, 91), ('betsy', 84, 90), ('susan', 65, 55), ('pat', 100, 98)]
Then you can sort it:
# Sort by score1
print(sorted(t, key=lambda x: (x[1]), reverse=True))
# Sort by score2
print(sorted(t, key=lambda x: (x[2]), reverse=True))
# Sort by both scores:
print(sorted(t, key=lambda x: (x[1], x[2]), reverse=True))
Just a different way to attack the same problem. Doing it this way you can easily print the scores of the individuals as well.
This is a possibility:
print(sorted(scores['name'], reverse=True,
key=lambda x: scores['score1'][scores['name'].index(x)]))
# ['pat', 'joe', 'pete', 'betsy', 'susan']
print(sorted(scores['name'], reverse=True,
key=lambda x: scores['score2'][scores['name'].index(x)]))
# ['pat', 'pete', 'betsy', 'joe', 'susan']
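One caveat with the version above: list.index scans the list on every call and always returns the first match, so it gets slow on long lists and picks the wrong score when two people share a name. A possible alternative is to sort the positions rather than the names and then reorder every list with the same permutation, which also keeps the dict-of-lists shape the question asked for. The function name sort_columns is hypothetical.

```python
scores = {'name': ['joe', 'pete', 'betsy', 'susan', 'pat'],
          'score1': [99, 90, 84, 65, 100],
          'score2': [85, 91, 90, 55, 98]}


def sort_columns(table, by, reverse=True):
    """Return a new dict of lists with every list reordered by table[by]."""
    # Sort the row positions by the value found in the chosen column.
    order = sorted(range(len(table[by])), key=lambda i: table[by][i], reverse=reverse)
    # Apply the same ordering to each column so rows stay aligned.
    return {key: [values[i] for i in order] for key, values in table.items()}


by_score1 = sort_columns(scores, 'score1')
by_score2 = sort_columns(scores, 'score2')
print(by_score1['name'])
print(by_score2['name'])
```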
I was looking for a way to get part of each row as a list.
Name x y r
a 9 81 63
a 98 5 89
b 51 50 73
b 41 22 14
c 6 18 1
c 1 93 55
d 57 2 90
d 58 24 20
So I was trying to get a dictionary like this:
di = {'a': {0: [9, 81, 63], 1: [98, 5, 89]},
      'b': {0: [51, 50, 73], 1: [41, 22, 14]},
      'c': {0: [6, 18, 1], 1: [1, 93, 55]},
      'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
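Use groupby with a custom function that builds a dict of lists for each group, then convert the resulting Series with to_dict. Note that the columns are selected with a list (double brackets); selecting them with a bare tuple of names no longer works in recent pandas versions:
di = (df.groupby('Name')[['x','y','r']]
        .apply(lambda x: dict(zip(range(len(x)), x.values.tolist())))
        .to_dict())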
print (di)
{'b': {0: [51, 50, 73], 1: [41, 22, 14]},
'a': {0: [9, 81, 63], 1: [98, 5, 89]},
'c': {0: [6, 18, 1], 1: [1, 93, 55]},
'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
Detail:
print (df.groupby('Name')[['x','y','r']]
         .apply(lambda x: dict(zip(range(len(x)), x.values.tolist()))))
Name
a {0: [9, 81, 63], 1: [98, 5, 89]}
b {0: [51, 50, 73], 1: [41, 22, 14]}
c {0: [6, 18, 1], 1: [1, 93, 55]}
d {0: [57, 2, 90], 1: [58, 24, 20]}
dtype: object
Thanks to volcano for suggesting enumerate:
di = (df.groupby('Name')[['x','y','r']]
        .apply(lambda x: dict(enumerate(x.values.tolist())))
        .to_dict())
For easier debugging, you can use a named function that prints the intermediate values:
def f(x):
    #print (x)
    a = range(len(x))
    b = x.values.tolist()
    print (a)
    print (b)
    return dict(zip(a,b))
[[9, 81, 63], [98, 5, 89]]
range(0, 2)
[[9, 81, 63], [98, 5, 89]]
range(0, 2)
[[51, 50, 73], [41, 22, 14]]
range(0, 2)
[[6, 18, 1], [1, 93, 55]]
range(0, 2)
[[57, 2, 90], [58, 24, 20]]
di = df.groupby('Name')[['x','y','r']].apply(f).to_dict()
print (di)
Sometimes it is best to minimize the footprint and overhead.
Using itertools.count and collections.defaultdict:
from itertools import count
from collections import defaultdict
counts = {k: count(0) for k in df.Name.unique()}
d = defaultdict(dict)
for k, *v in df.values.tolist():
    d[k][next(counts[k])] = v
dict(d)
{'a': {0: [9, 81, 63], 1: [98, 5, 89]},
'b': {0: [51, 50, 73], 1: [41, 22, 14]},
'c': {0: [6, 18, 1], 1: [1, 93, 55]},
'd': {0: [57, 2, 90], 1: [58, 24, 20]}}
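The same idea can be tried without pandas at all. The sketch below assumes the rows are already available as plain lists, in the shape that df.values.tolist() would return, and it uses the current size of each inner dict as the next index in place of a separate itertools.count per key. The names rows and grouped are illustrative.

```python
from collections import defaultdict

# Assumed input: one [Name, x, y, r] list per row of the table above.
rows = [
    ['a', 9, 81, 63], ['a', 98, 5, 89],
    ['b', 51, 50, 73], ['b', 41, 22, 14],
    ['c', 6, 18, 1], ['c', 1, 93, 55],
    ['d', 57, 2, 90], ['d', 58, 24, 20],
]

grouped = defaultdict(dict)
for name, *values in rows:
    inner = grouped[name]
    # len(inner) is 0 for the first row of a name, 1 for the second, and so on.
    inner[len(inner)] = values

result = dict(grouped)
print(result)
```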
I have these 2 dicts:
a={"test1":90, "test2":45, "test3":67, "test4":74}
b={"test1":32, "test2":45, "test3":82, "test4":100}
How can I extract the maximum value for each shared key to get a new dict like the one below?
c={"test1":90, "test2":45, "test3":82, "test4":100}
You can try it like this (items() works in Python 3; the original Python 2 spelling was iteritems()):
>>> a={"test1":90, "test2":45, "test3":67, "test4":74}
>>> b={"test1":32, "test2":45, "test3":82, "test4":100}
>>> c = { key:max(value,b[key]) for key, value in a.items() }
>>> c
{'test1': 90, 'test2': 45, 'test3': 82, 'test4': 100}
Try this (the membership test "k in b" is used so that keys whose value in b is 0 are not skipped):
>>> a={"test1":90, "test2":45, "test3":67, "test4":74}
>>> b={"test1":32, "test2":45, "test3":82, "test4":100}
>>> c={ k:max(a[k],b[k]) for k in a if k in b}
>>> c
{'test1': 90, 'test2': 45, 'test3': 82, 'test4': 100}
Not the best, but still a variant:
from itertools import chain
a = {'test1':90, 'test2': 45, 'test3': 67, 'test4': 74}
b = {'test1':32, 'test2': 45, 'test3': 82, 'test4': 100, 'test5': 1}
c = dict(sorted(chain(a.items(), b.items()), key=lambda t: t[1]))
assert c == {'test1': 90, 'test2': 45, 'test3': 82, 'test4': 100, 'test5': 1}
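The variant above works because sorting by value puts the largest value for each key last, so it overwrites the smaller ones when the dict is built. For comparison, here is a sketch of a more direct single pass that keeps the largest value seen for each key and accepts any number of dicts. The function name merge_max is made up for this example.

```python
def merge_max(*dicts):
    """Merge dicts, keeping the largest value seen for each key."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # Keep the value if the key is new or the value beats the current one.
            if key not in merged or value > merged[key]:
                merged[key] = value
    return merged


a = {'test1': 90, 'test2': 45, 'test3': 67, 'test4': 74}
b = {'test1': 32, 'test2': 45, 'test3': 82, 'test4': 100, 'test5': 1}
c = merge_max(a, b)
print(c)
```

This avoids the sort, so the work grows linearly with the total number of items, and keys that appear in only one dict (such as 'test5') are kept.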