How to count how many values a key has in a dictionary - python-3.x

Hello, I'm new to Python, so I'm sure this has a simple answer: I'm trying to count the number of values stored under a certain key in a dictionary, with the key taken from an input.
I've tried something like:
fruits = {'apple' : 1, 2, 3 , 'banana' : 4, 5, 6}
search_fruit = input('Enter the fruit name:')
count = len(fruits[search_fruit])
print(f'{search_fruit} has {count} values that are {fruits[search_fruit]}')
the output should be for example:
apple has 3 values that are 1 2 3
but instead I'm getting:
apple has 5 values that are 1 2 3

The values should be either tuples, (1, 2, 3), or lists, [1, 2, 3]; a key cannot map to several bare comma-separated values, so your dict literal is a syntax error as written. With lists as values, len() gives the count you want. The input function is not practical for a demo; a plain variable works best. Hope this helps!
input_key = "apple"
fruits = {'apple' : [1, 2, 3] , 'banana' : [4, 5, 6]}
length = len(fruits[input_key])
print(length)
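Putting the fix back into the original script, a minimal end-to-end sketch (using a plain variable where the original used input(), so it runs non-interactively):

```python
fruits = {'apple': [1, 2, 3], 'banana': [4, 5, 6]}

search_fruit = 'apple'  # in the original this came from input()
count = len(fruits[search_fruit])
# join the values with spaces so the output reads "1 2 3" rather than "[1, 2, 3]"
values = ' '.join(str(v) for v in fruits[search_fruit])
print(f'{search_fruit} has {count} values that are {values}')
```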

Related

How to structure vlookup calculations in python?

I am having trouble understanding how to structure, in Python, an Excel model that is basically a bunch of top-down customer calculations.
Excel calcs
The Excel model consists of many worksheets that look up values from different worksheets and perform calculations at the customer level.
Each customer starts off with an amount, a starting year, an end year, and a starting state.
example of calculation in excel:
customer 1:
amount 100, at starting state B, and starting year 3.
Multiplied by matrix (worksheet1)
The matrix consists of 10 3D arrays with 5 states (A-E). Each of the 10 arrays represents a year (1-10).
I multiply the amount 100 by the matrix at year 3 and get an array, [800, 650, 400, 300, 840].
I then take this array and do another vlookup calculation, from another worksheet.
example limits (worksheet2). Which consists of years and limit %.
Year | Limit%
1    | 0.32
2    | 0.23
3    | 0.11
4    | 0.21
I vlookup the customer's year, year 3 in this case, and then multiply [800, 650, 400, 300, 840] * 0.11.
I then need to do a few more vlookup calculations like the one above.
After that, I need to multiply the result by the matrix at year 4, then do the vlookup calcs for year 4 like I did for year 3, and continue until year 10 is reached.
It is very difficult to understand what the data looks like from your description. However, I would suggest creating a pd.DataFrame and pd.Series of the constant data, with the identifier as the index value. Then you can use .loc[] to retrieve the relevant row and use this data. If you feed the result back into the function until n reaches 0, you can then return the final value.
A simple example that might help you start:
matrix1 has columns as states ("A" and "B"), rows as years (1, 2, 3, 4, 5).
matrix2 has rows as year (1, 2, 3, 4, 5), and the data is the "limit%".
Code:
import numpy as np
import pandas as pd

# matrix1 = pd.DataFrame({'A': {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
#                         'B': {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}})
matrix1 = pd.DataFrame(data=np.random.rand(5, 5, 5).tolist(),
                       columns=['A', 'B', 'C', 'D', 'E'],
                       index=[1, 2, 3, 4, 5])
matrix2 = pd.Series({1: 0.32, 2: 0.23, 3: 0.11, 4: 0.21, 5: 0.2})
def calculation(amount, year, starting_state, n, calcs={}):
    """
    This is the function that calculates everything.
    :input amount: initial amount as int
    :input year: starting year as int
    :input starting_state: state in ["A", "B"] as str
    :input n: (end_year - starting_year) where end_year <= 5 and n > 0 as int
    :input calcs: dictionary of calculations, starting as empty as dictionary
        (note: a mutable default argument is shared between calls, so pass
        calcs={} explicitly if you call the function more than once)
    :return: final amount
    """
    # when there are no more years remaining
    if n == 0:
        # return amount
        return calcs
    # multiply by matrix1
    amount *= np.asarray(matrix1.loc[year, starting_state])
    # multiply by matrix2
    amount *= matrix2.loc[year]
    # more calculations ...
    # add end result to dictionary
    calcs[year] = amount
    # return the new data to the function
    return calculation(amount, year + 1, starting_state, n - 1, calcs)
calculation(10, 2, "A", 5-2)
#Out: 1.27512
Running through the iterations:
#10 * 2 = 20
#20 * 0.23 = 4.6
#return amount=4.6, year=3, "A", 3-1
#4.6 * 3 = 13.8
#13.8 * 0.11 = 1.518
#return amount=1.518, year=4, "A", 2-1
#1.518 * 4 = 6.072
#6.072 * 0.21 = 1.27512
#return amount=1.27512, year=5, "A", 1-1
#as n in now 0
#return 1.27512
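The trace above uses the commented-out 2-D matrix1. A self-contained sketch of that pre-EDIT variant (returning the running amount rather than the calcs dict, which is my reconstruction, not the answerer's exact code) reproduces it:

```python
import pandas as pd

matrix1 = pd.DataFrame({'A': {1: 1, 2: 2, 3: 3, 4: 4, 5: 5},
                        'B': {1: 2, 2: 3, 3: 4, 4: 5, 5: 6}})
matrix2 = pd.Series({1: 0.32, 2: 0.23, 3: 0.11, 4: 0.21, 5: 0.2})

def calculation(amount, year, state, n):
    # base case: no more years remaining
    if n == 0:
        return amount
    amount *= matrix1.loc[year, state]  # look up the year/state factor
    amount *= matrix2.loc[year]         # apply that year's limit
    return calculation(amount, year + 1, state, n - 1)

result = calculation(10, 2, "A", 5 - 2)
# 10*2*0.23 = 4.6; 4.6*3*0.11 = 1.518; 1.518*4*0.21 = 1.27512
```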
If you have an initial dataframe with the input data, you can then append the end result to the data:
people = pd.DataFrame({"amount": [10, 11, 12, 13],
                       "starting_year": [2, 2, 1, 3],
                       "end_year": [5, 5, 4, 5],
                       "state": ["A", "A", "B", "A"]})
people["output"] = people.apply(lambda x: calculation(
    x["amount"], x["starting_year"],
    x["state"], x["end_year"] - x["starting_year"]), axis=1)
people
#Out:
# amount starting_year end_year state output
#0 10 2 5 A 1.275120
#1 11 2 5 A 1.402632
#2 12 1 4 B 2.331648
#3 13 3 5 A 3.603600
EDIT
The changes made to reflect your additions:
matrix1 is now built from a 3-dimensional array. This changed the creation of the dataframe (obviously), and the multiplication by matrix1 now needs to convert the returned list to an np.array so that it can be multiplied.
There is now an additional input to the function (with a default of {} if no argument is passed, which is what you want at the start). Then, in the last line of the function before the return, I have added calcs[year] = amount, which adds the year and the array for that year to the dictionary. This means that the output of running for people is now a column of dictionaries. If you want to expand this to columns for each year, you can add a line afterwards: people = pd.concat([people, people["output"].apply(pd.Series)], axis=1).

Check if all list values in dataframe column are the same [duplicate]

If the type of a column in a dataframe is int, float or string, we can get its unique values with df['columnName'].unique().
But what if the column's values are lists, e.g. [1, 2, 3]?
How could I get the unique values of this column?
I think you can convert values to tuples and then unique works nice:
import pandas as pd

df = pd.DataFrame({'col': [[1,1,2], [2,1,3,3], [1,1,2], [1,1,2]]})
print (df)
            col
0     [1, 1, 2]
1  [2, 1, 3, 3]
2     [1, 1, 2]
3     [1, 1, 2]
print (df['col'].apply(tuple).unique())
[(1, 1, 2) (2, 1, 3, 3)]
L = [list(x) for x in df['col'].apply(tuple).unique()]
print (L)
[[1, 1, 2], [2, 1, 3, 3]]
You cannot apply unique() on a non-hashable type such as list. You need to convert to a hashable type to do that.
A better solution, using a recent version of pandas, is to use duplicated(), which avoids iterating over the values and converting them back to lists:
df[~df.col.apply(tuple).duplicated()]
That returns the unique rows, with the values still stored as lists.
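Both approaches, as one self-contained sketch (the variable names are mine):

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# unique values recovered as lists, via the tuple conversion
unique_lists = [list(t) for t in df['col'].apply(tuple).unique()]

# unique rows via duplicated(); the values stay as lists throughout
unique_rows = df[~df['col'].apply(tuple).duplicated()]
```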

How can I iterate through a DataFrame with a conditional to reorganize my data?

I have a DataFrame in the following format, and I would like to rearrange it based on a conditional using one of the columns of data.
My current DataFrame has the following format:
df.head()
Room Temp1 Temp2 Temp3 Temp4
R1 1 2 1 3
R1 2 3 2 4
R1 3 4 3 5
R2 1 1 2 2
R2 2 2 3 3
...
R15 1 1 1 1
I would like to 'pivot' this DataFrame to look like this:
Room
R1 = [1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5]
R2 = [1, 2, 1, 2, 2, 3, 2, 3]
...
R15 = [1, 1, 1, 1]
Where:
R1 = Temp1 + Temp2 + Temp3 + Temp4
So that:
R1 = [1, 2, 3, 2, 3, 4, 1, 2, 3, 3, 4, 5]
First: I tried creating a list from each column using np.where with the condition Room == 'R1':
room1 = np.where(df["Room"] == 'R1', df["Temp1"], 0).tolist()
It works, but I would need to do this individually for every column, of which there are many more than 4 in my other datasets.
Second: I tried to iterate through them:
i = ['Temp1', 'Temp2', 'Temp3', 'Temp4']
room1 = []
for i in df[i]:
    for row in df["Room"]:
        while row == "R1":
...and this is where I get very lost. Where do I go next? How can I iterate through the rest of the columns and end up with the DataFrame I have above?
This should work (although it's not very efficient and will be slow on a big DataFrame):
results = {}  # dict to store results
cols = ['Temp1', 'Temp2', 'Temp3', 'Temp4']
for r in df['Room'].unique():
    room_list = []
    sub_frame = df[df['Room'] == r]
    for col in cols:
        sub_col = sub_frame[col]
        for val in sub_col:
            room_list.append(val)
    results[r] = room_list
The results are stored in the results dict, so you can access, say, R1 with:
results['R1']
Usually iterating over DataFrames is a bad idea though, I'm sure there's a better solution!
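One vectorised alternative, assuming the goal is exactly the column-by-column concatenation described above: melt() orders the long frame by variable (all Temp1 values first, then Temp2, and so on), and groupby preserves that order within each room. This is a sketch, not the answerer's code:

```python
import pandas as pd

df = pd.DataFrame({'Room':  ['R1', 'R1', 'R1', 'R2', 'R2'],
                   'Temp1': [1, 2, 3, 1, 2],
                   'Temp2': [2, 3, 4, 1, 2],
                   'Temp3': [1, 2, 3, 2, 3],
                   'Temp4': [3, 4, 5, 2, 3]})

# melt to long form: rows are ordered by variable, then original row order
long = df.melt(id_vars='Room', value_vars=['Temp1', 'Temp2', 'Temp3', 'Temp4'])

# collect each room's values in that order into one list per room
results = long.groupby('Room')['value'].apply(list).to_dict()
```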
I found the answer!
The trick is to use the .pivot() function to rearrange the columns accordingly. I had an additional column called 'Time' which I did not include in the original post, thinking it was not relevant to the solution.
What I ended up doing is pivoting the table based on Columns and Values using index as the rooms:
df = df.pivot(index="Room", columns="Time", values=["Temp1", "Temp2", "Temp3", "Temp4"])
Thank you to those who helped me on the way!

How can I drop rows in data frames which contains empty lists?

I have created a data frame with 3 columns; the third one contains lists. I want to drop the rows that contain an empty list in that cell.
I have tried with
df[df.numbers == []] and df[df.numbers == null]
but nothing works.
name country numbers
Lewis Spain [1,4,6]
Nora UK []
Andrew UK [3,5]
The result will be a data frame without Nora's row
Umm, check bool:
df[df.numbers.astype(bool)]
Use Series.str.len() to check the length of the lists and then filter out the rows where it equals 0:
df[~df.numbers.str.len().eq(0)]
name country numbers
0 Lewis Spain [1, 4, 6]
2 Andrew UK [3, 5]
just check len > 0
df[df['numbers'].str.len()>0]
Using the idea that the result of any list multiplied by 0 gives an empty list, one way to do this is:
In [29]: df[df.numbers != df.numbers * 0]
Out[29]:
name numbers country
0 Lewis [1, 4, 6] Spain
2 Andrew [3, 5] UK
One way to do it is to create a new column containing the length of df.numbers by:
df['len'] = df.apply(lambda row: len(row.numbers), axis=1)
and then filter by that column by doing:
df[df.len > 0]
Let's say your data is set up like this:
import pandas as pd
df = pd.DataFrame([{'name': "Lewis", 'country': "Spain", "numbers": [1,4,6]},
{'name': "Nora", 'country': "UK", "numbers": []},
{'name': "Andrew", 'country': "UK", "numbers": [3,5]}])
You could iterate over the dataframe and add only the rows that don't have an empty numbers array to a new dataframe called "newDF". For example:
newDFArray = []
for index, row in df.iterrows():
    emptyArrayCheck = row["numbers"]
    if len(emptyArrayCheck) > 0:
        newDFArray.append(row)
newDF = pd.DataFrame(newDFArray)
newDF
This will yield:
country name numbers
0 Spain Lewis [1, 4, 6]
2 UK Andrew [3, 5]
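For completeness, two of the one-liners above applied to the sample data as one self-contained sketch (the variable names are mine):

```python
import pandas as pd

df = pd.DataFrame({'name': ['Lewis', 'Nora', 'Andrew'],
                   'country': ['Spain', 'UK', 'UK'],
                   'numbers': [[1, 4, 6], [], [3, 5]]})

# empty lists are falsy, so astype(bool) marks them False and they drop out
filtered = df[df['numbers'].astype(bool)]

# equivalent: keep the rows where the list length is non-zero
filtered2 = df[df['numbers'].str.len() > 0]
```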

How to sort a list in Python using two attribute values?

Suppose I have a list named L and two attribute dictionaries named arr1 and arr2, whose keys are the elements of L. Now I want to sort L in the following manner.
L should be sorted in ascending order by the attribute values in arr1.
If two elements i and j of L have the same arr1 attribute, i.e. if arr1[i] and arr1[j] are equal, then the attribute values in arr2 should decide the order.
To give an example, suppose
L=[0,1,2,3,4,5,6]
arr1={0:30,1:15,2:15,3:20,4:23,5:20,6:35}
arr2={0:6,1:8,2:6,3:17,4:65,5:65,6:34}
Sorted L should be [2,1,5,3,4,0,6]; the ordering between 1 and 2 is decided by arr2, as is the ordering between 3 and 5. The rest of the ordering is decided by arr1.
Simply use a tuple with the values from arr1 and arr2 as sort key:
L.sort(key=lambda x: (arr1[x], arr2[x]))
# [2, 1, 3, 5, 4, 0, 6]
This is different from your expected result in the ordering of 5 and 3, which would be consistent if the sort order were descending based on arr2:
L.sort(key=lambda x: (arr1[x], -arr2[x]))
# [1, 2, 5, 3, 4, 0, 6]
But now the ordering of 1 and 2 is different; your example doesn't seem to be ordered in a consistent way.
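A self-contained check of the ascending version with the data from the question:

```python
L = [0, 1, 2, 3, 4, 5, 6]
arr1 = {0: 30, 1: 15, 2: 15, 3: 20, 4: 23, 5: 20, 6: 35}
arr2 = {0: 6, 1: 8, 2: 6, 3: 17, 4: 65, 5: 65, 6: 34}

# sort by arr1 first, breaking ties with arr2 (both ascending)
L.sort(key=lambda x: (arr1[x], arr2[x]))
```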
