Dictionary value travel - python-3.x

I need to travel values of the dictionary in a single for loop to add and compare values. Here is my dictionary.
I am aware of iterating dictionary based on key but not sure if we can iterate based on an index of values of the dictionary.
dict = {
0: [1,2,3,4,7]
1: [2,4,6,9,0]
2: [4,6,8,2,1]
}
if I iterate over key-value, it will not help me. I need to execute for loop for each index of the value list.
Assumption: length of list for each dictionary key will be the same.
Total highest_class
7 2
12 2
17 2
15 1
8 0
Basically, sum all number from the value of the dictionary at the same index and also return which key has the highest number.

import pandas as pd
dict = {
0: [1,2,3,4,7],
1: [2,4,6,9,0],
2: [4,6,8,2,1]
}
data = [(sum(item),item.index(max(item))) for item in list(zip(*dict.values()))]
df = pd.DataFrame(data, columns =['Total', 'highest_class'])
print (df)
output:
Total highest_class
0 7 2
1 12 2
2 17 2
3 15 1
4 8 0
EDIT:
data = [(sum(item),item.index(max(item))) for item in list(zip(*dict.values()))]
print ('Total', 'highest_class')
for item in data:
print ("{:<10} {:<10}".format(item[0], item[1]))
output:
Total highest_class
7 2
12 2
17 2
15 1
8 0

As much as I understood what you want. This might help.
mydict = {
0: [1,2,3,4,7],
1: [20,4,6,9,0],
2: [4,6,8,2,1]
}
maxElement={"highest_class":None,
"Total":-1
}
for key,value in mydict.items():
print(sum(value) , maxElement["Total"])
if sum(value) > maxElement["Total"]:
maxElement["highest_class"] = key
maxElement["Total"] = sum(value)
print(maxElement)

Related

Finding biggest element index in a dictionary?

Sample test cases:
Input
3 ------> length of dictionary
abcd 5 3 6 2 //dict elements (key=list[0], value=list[1:])
ok 1 8 5 3 //
best 9 1 3 5 //
Expected output
abcd 2 ----> key and index of max ele of list
ok 1
best 0
are you looking for the just the highest key ?
d = {15:3, 4:1, 11:2, 14:9}
print(max(d)) #gives max key
print(d[max(d)]) # gives max value

How to extract data from data frame when value of column change

I want to extract part of the data frame when value change from 0 to 1.
logic1: when value change from 0 to 1, start to save data until value again change to 0. (also points before 1 and after 1)
logic2: when value change from 0 to 1, start to save data until value again change to 0. (don't need to save points before 1 and after 1)
only save data when the first time value of flag change from 0 to 1, after this if again value change from 0 to 1 don't need to do anything
df=pd.DataFrame({'value':[3,4,7,8,11,1,15,20,15,16,87],'flag':[0,0,0,1,1,1,0,0,1,1,0]})
Desired output:
df_out_1=pd.DataFrame({'value':[7,8,11,1,15]})
Desired output:
df_out_2=pd.DataFrame({'value':[8,11,1]})
Idea is get consecutive groups of 1 and 0 consecutive groups to s, filter only 1 groups and get first 1 group by compare by minimal value:
df = df.reset_index(drop=True)
s = df['flag'].ne(df['flag'].shift()).cumsum()
m = s.eq(s[df['flag'].eq(1)].min())
df2 = df.loc[m, ['value']]
print (df2)
value
3 8
4 11
5 1
And then filter values with aff and remove 1 to default RangeIndex:
df1 = df.loc[(df2.index + 1).union(df2.index - 1), ['value']]
print (df1)
value
2 7
3 8
4 11
5 1
6 15

Highest frequency in a dataframe

I am looking for a way to get the highest frequency in the entire pandas, not in a particular column. I have looked at value count, but it seems that works in a column specific way. Any way to do that?
Use DataFrame.stack with Series.mode for top values, for first select by position:
df = pd.DataFrame({
'B':[4,5,4,5,4,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
})
a = df.stack().mode().iat[0]
print (a)
4
Or if need also frequency is possible use Series.value_counts:
s = df.stack().value_counts()
print (s)
4 6
5 4
3 3
9 2
7 2
2 2
1 2
8 1
6 1
0 1
dtype: int64
print (s.index[0])
4
print (s.iat[0])
6

adding 1 to the previous row based on conditions

I have a pandas dataframe like below:
data=[['A',1,30],
['A',1,2],
['A',0,4],
['A',1,4],
['B',0,5],
['B',1,1],
['B',0,5],
['B',1,8]]
df = pd.DataFrame(data,columns=['group','var_1','var_2'])
I want to create a series of values with index based on below condition:
Step 1) Increment should always happen from 1st row of 'var_2'of each group. For example: for group A, the increment should start from 30 and for group B,
increment should start from 5
Step 2) Incremented value where 'var_1" = 1
My desired output:
0 30
1 31
3 32
5 6
7 7
IIUC:
#Get first index in each group and union index where var_1 ==1
indx = df.drop_duplicates('group').index.union(df[(df['var_1']==1)].index)
#Reindex dataframe group by group, add cusum value to other present values in group.
#Use .loc to filter where var_1 != 0 and get column var_2
df.reindex(indx).groupby('group')\
.transform(lambda x: x.iloc[0] + x.shift().notna().cumsum())\
.loc[lambda x: x.var_1 !=0, 'var_2']
Output:
0 30
1 31
3 32
5 6
7 7
Name: var_2, dtype: int64
Try groupby cumcount and first
df1 = df.loc[df.var_1.eq(1)]
g = df1.groupby('group')['var_2']
g.transform('first') + g.cumcount()
Out[66]:
0 30
1 31
3 32
5 1
7 2
dtype: int64
Or use duplicated with df.where and cumsum
df1 = df.loc[df.var_1.eq(1)]
df1.var_2.where(~df1.duplicated('group'), 1).groupby(df1.group).cumsum()
Out[77]:
0 30
1 31
3 32
5 1
7 2
Name: var_2, dtype: int64

if specific value/string occurs in the entire dataframe I want to sum its index values

i have a dataframe in which I need to find a specific image name in the entire dataframe and sum its index values every time they are found. SO my data frame looks like:
c 1 2 3 4
g
0 180731-1-61.jpg 180731-1-61.jpg 180731-1-61.jpg 180731-1-61.jpg
1 1209270004-2.jpg 180609-2-31.jpg 1209270004-2.jpg 1209270004-2.jpg
2 1209270004-1.jpg 180414-2-38.jpg 180707-1-31.jpg 1209050002-1.jpg
3 1708260004-1.jpg 1209270004-2.jpg 180609-2-31.jpg 1209270004-1.jpg
4 1108220001-5.jpg 1209270004-1.jpg 1108220001-5.jpg 1108220001-2.jpg
I need to find the 1209270004-2.jpg in entire dataframe. And as it is found at index 1 and 3 I want to add the index values so it should be
1+3+1+1=6.
I tried the code:
img_fname = '1209270004-2.jpg'
df2 = df1[df1.eq(img_fname).any(1)]
sum = int(np.sum(df2.index.values))
print(sum)
I am getting the answer of sum 4 i.e 1+3=4. But it should be 6.
If the string occurence is only once or twice or thrice or four times like for eg 180707-1-31 is in column 3. then the sum should be 45+45+3+45 = 138. Which signifies that if the string is not present in the dataframe take vallue as 45 instead the index value.
You can multiple boolean mask by index values and then sum:
img_fname = '1209270004-1.jpg'
s = df1.eq(img_fname).mul(df1.index.to_series(), 0).sum()
print (s)
1 2
2 4
3 0
4 3
dtype: int64
out = np.where(s == 0, 45, s).sum()
print (out)
54
If dataset does not have many columns, this can also work with your original question
df1 = pd.DataFrame({"A":["aa","ab", "cd", "ab", "aa"], "B":["ab","ab", "ab", "aa", "ab"]})
s = 0
for i in df1.columns:
s= s+ sum(df1.index[df1.loc[:,i] == "ab"].tolist())
Input :
A B
0 aa ab
1 ab ab
2 cd ab
3 ab aa
4 aa ab
Output :11
Based on second requirement:

Resources