How to calculate WAPE for a given dataframe in Python

I want to know how to calculate the WAPE value for a dataframe in the format below.
I am using Python and need it to evaluate forecasts.

According to Wikipedia, the WAPE (Weighted Absolute Percent Error) can be calculated by
dividing the sum of the absolute deviations by the total sales of all products.
In pandas, you can do that with the - operator, which works element-wise between Series, combined with the Series methods abs() and sum() and ordinary float division:
import pandas as pd
df = pd.DataFrame({'Actual': [23, 32, 44, 37, 48, 42, 39],
                   'Forecasted': [25, 30, 41, 34, 45, 47, 40]})
wape = (df.Actual - df.Forecasted).abs().sum() / df.Actual.sum()
print(wape)
0.07169811320754717
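As a cross-check on the arithmetic, the same figure can be reproduced without pandas. This is only a sketch; the names abs_errors and wape_plain are introduced here for illustration and do not come from the answer.

```python
actual = [23, 32, 44, 37, 48, 42, 39]
forecasted = [25, 30, 41, 34, 45, 47, 40]

# Absolute deviation per row: |actual - forecast|
abs_errors = [abs(a - f) for a, f in zip(actual, forecasted)]

# WAPE = sum of absolute deviations / sum of actuals
wape_plain = sum(abs_errors) / sum(actual)

print(sum(abs_errors), sum(actual))  # 19 265
print(wape_plain)
```

The absolute deviations sum to 19 and the actuals to 265, so the WAPE is 19 / 265, or roughly 7.2%.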

Related

Make predictions on a dataframe with list categorical columns and other types of data

I have a dataframe that looks like this:
df = {'user_id': [23, 34, 12, 9],
'car_id': [[22, 132, 999], [22, 345, 2], [134], [87, 44, 3, 222]],
'start_date': ['2012-02-17', '2013-11-22', '2013-11-22', '2014-03-15'],
'cat_col1': ['str1', 'str2', 'str3', 'str3'],
'cat_col2': [['str1', 'str2'], ['str4'], ['str5, str1'], ['str6', 'str2']],
'cat_col3': [['str11', 'str22', 'str34'], ['str444'], ['str51, str111'], ['str62', 'str233']],
'num_sold': [23, 43, 111, 23],
'to_predict': [0.4, 0.5, 0.22, 0.9]}
There are around 100 000 unique user_ids and 200 000 unique car_ids, and the categorical columns have thousands of unique values, so one-hot encoding (OHE) is not an option. I need to predict to_predict for a given value of cat_col1, cat_col2, cat_col3 (I need to have their original values at the end for predictions). There is a relationship between those categorical columns, but it is not clearly defined. Is it possible to do this in keras, perhaps with embedding layers, and would that make sense for the categorical columns? If so, would it make sense to utilise the date column and convert it into a time series using LSTMs? Or what would be the best approach for this kind of prediction in general?

Taking a 3*3 subset matrix from a really large numpy ndarray in Python

I am trying to take a 3*3 subset from a really large 400 x 500 numpy ndarray. For some reason, I am not getting the desired result; instead, it takes the first three rows as a whole.
Here is the code that I wrote:
subset_matrix = mat[0:3][0:3]
But this is what I get as output in my Jupyter Notebook:
array([[91, 88, 87, ..., 66, 75, 82],
       [91, 89, 88, ..., 68, 78, 84],
       [91, 89, 89, ..., 72, 80, 87]], dtype=uint8)
mat[0:3][0:3] slices axis 0 of the 2D array twice, so it is equivalent to mat[0:3]. What you need is mat[0:3, 0:3], which slices rows and columns in a single indexing operation.
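The difference can be checked on a stand-in array. This is a minimal sketch: the array contents are made up, and only the 400 x 500 shape is taken from the question.

```python
import numpy as np

# Stand-in for the 400 x 500 array from the question (values are made up)
mat = np.arange(400 * 500).reshape(400, 500)

chained = mat[0:3][0:3]  # slices rows, then slices rows again: 3 full rows
joint = mat[0:3, 0:3]    # slices rows and columns in one indexing operation

print(chained.shape)  # (3, 500)
print(joint.shape)    # (3, 3)
```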

DataFrame display not as expected

I tried to color the title of the columns in purple, but what I got from the output doesn't seem aligned. Is there a way to fix it?
import pandas as pd
purple_text = '\033[35m'
reset = '\033[m'
list_1 = [12, 27, 33, 56, 11, 90]
list_2 = [43, 55, 76, 26, 84, 62]
df = pd.DataFrame({f'{purple_text} Numbers_1 {reset}': list_1,
                   f'{purple_text} Numbers_2 {reset}': list_2})
print(df.to_string(index=False))
Your issue comes from the fact that the color formatting changes the computed size of the header text, so the headers are padded incorrectly.
To remedy this, adjust the display settings; this one works fine:
pd.set_option('display.colheader_justify', 'left')
Results: aligned.
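To see why the header size is off, compare the length of a header with and without the escape codes. This is a small sketch; the names visible and colored are introduced here for illustration, and it assumes the column width is derived from the string length.

```python
purple_text = '\033[35m'
reset = '\033[m'

visible = ' Numbers_1 '
colored = f'{purple_text}{visible}{reset}'

# len() counts the escape characters even though a terminal does not draw them
print(len(visible))  # 11
print(len(colored))  # 19
```

The colored header is 8 characters longer than what appears on screen, which is the extra width that throws the layout off.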

Impute missing data in a DataFrame with predicted data

I have a dataset with 200 columns and 30k rows. Some of the data is missing, and I would like to predict the None values and put the predicted data in their place.
I want to split the data by index, train a model on the known data, predict the unknown values, then join the known and predicted values and put them back into the data in exactly the same places.
P.S. I am not interested in median, dropna or similar methods, only in predicting the missing values.
import pandas as pd
df = {'First': [30, 22, 18, 49, 22], 'Second': [80, 28, 16, 56, 30], 'Third': [14, None, None, 30, 27], 'Fourth': [14, 85, 17, 22, 14], 'Fifth': [22, 33, 45, 72, 11]}
df = pd.DataFrame(df, columns=['First', 'Second', 'Third', 'Fourth'])
Expected result: the same DataFrame with all columns completed with data.
I do not fully understand your question, but I might have an idea for you. Have a look at the fancyimpute package, which offers imputation methods based on predictive models (e.g. KNN). I hope this solves your problem.
It is hard to understand the question. However, it seems you may be interested in this question and its answer:
Using a custom function Series in fillna
Basically (from the link), you would
create a column with predicted values
use fillna with that column as parameter
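The two steps above can be sketched as follows. This is only an illustration under stated assumptions: the least-squares fit via np.linalg.lstsq, the choice of predictor columns, and the names target, features and predicted are all hypothetical stand-ins for whatever model you actually train. The toy data has only three known rows, so the fit itself is not meaningful.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First': [30, 22, 18, 49, 22],
                   'Second': [80, 28, 16, 56, 30],
                   'Third': [14, None, None, 30, 27],
                   'Fourth': [14, 85, 17, 22, 14]})

target = 'Third'
features = ['First', 'Second', 'Fourth']  # assumed predictors (illustrative)
known = df[target].notna()

# Design matrix with an intercept column
X = np.column_stack([np.ones(len(df)), df[features].to_numpy(dtype=float)])
y = df.loc[known, target].to_numpy(dtype=float)

# Fit least squares only on the rows where the target is known
coef, *_ = np.linalg.lstsq(X[known.to_numpy()], y, rcond=None)

# Step 1: create a column (Series) with predicted values for every row
predicted = pd.Series(X @ coef, index=df.index)

# Step 2: fillna aligns on the index and replaces only the missing cells
df[target] = df[target].fillna(predicted)
print(df)
```

Because fillna only touches missing cells, the known values stay in place and the predictions land at exactly the original positions.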

map python list of months to values

I have 2 lists whereby the sequence of values in the second list map to the months in the first list:
['Apr-16', 'Jul-16', 'Dec-15', 'Sep-16', 'Aug-16', 'Feb-16', 'Mar-16', 'Jan-16', 'May-16', 'Jun-16', 'Oct-15', 'Nov-15']
[15, 15, 6, 81, 60, 36, 6, 18, 36, 27, 24, 29]
I need to retain 2 separate lists for use in another function. Using Python, how do I sort the lists into chronological (monthly) order while retaining the existing mapping of values to months?
The idea is to
associate both lists
sort the resulting list of pairs by year and month (months must first be converted to month indexes using an auxiliary dictionary)
then separate the list of pairs back into 2 lists, now sorted by date.
Here is commented code that does what you want; it may not be the most compact or academic, but it works and is simple enough.
a = ['Apr-16', 'Jul-16', 'Dec-15', 'Sep-16', 'Aug-16', 'Feb-16', 'Mar-16', 'Jan-16', 'May-16', 'Jun-16', 'Oct-15', 'Nov-15']
b = [15, 15, 6, 81, 60, 36, 6, 18, 36, 27, 24, 29]
# create a dictionary key=month, value=month index
m = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
monthdict = dict(zip(m,range(len(m))))
# the sort function: returns sort key as (year as integer,month index)
def date_sort(d):
    month,year = d[0].split("-")
    return int(year),monthdict[month]
# zip both lists together and apply sort
t = sorted(zip(a,b),key=date_sort)
# unzip lists
asort = [e[0] for e in t]
bsort = [e[1] for e in t]
print(asort)
print(bsort)
result:
['Oct-15', 'Nov-15', 'Dec-15', 'Jan-16', 'Feb-16', 'Mar-16', 'Apr-16', 'May-16', 'Jun-16', 'Jul-16', 'Aug-16', 'Sep-16']
[24, 29, 6, 18, 36, 6, 15, 36, 27, 15, 60, 81]
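A more compact variant is possible if you let datetime.strptime parse the labels instead of using a month dictionary. This is a sketch, assuming the default C/English locale (the %b abbreviations are locale-dependent); the name pairs is introduced here for illustration.

```python
from datetime import datetime

a = ['Apr-16', 'Jul-16', 'Dec-15', 'Sep-16', 'Aug-16', 'Feb-16',
     'Mar-16', 'Jan-16', 'May-16', 'Jun-16', 'Oct-15', 'Nov-15']
b = [15, 15, 6, 81, 60, 36, 6, 18, 36, 27, 24, 29]

# Sort the (month, value) pairs by the parsed date of the month label
pairs = sorted(zip(a, b), key=lambda pair: datetime.strptime(pair[0], '%b-%y'))

# Unzip back into two separate lists
asort, bsort = (list(t) for t in zip(*pairs))
print(asort)
print(bsort)
```

It produces the same two sorted lists as the version above.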
