DataFrame display not as expected - python-3.x

I tried to color the column titles purple, but the output doesn't come out aligned. Is there a way to fix it?
import pandas as pd
purple_text = '\033[35m'
reset = '\033[m'
list_1 = [12, 27, 33, 56, 11, 90]
list_2 = [43, 55, 76, 26, 84, 62]
df = pd.DataFrame({f'{purple_text} Numbers_1 {reset}': list_1,
                   f'{purple_text} Numbers_2 {reset}': list_2})
print(df.to_string(index=False))

Your issue comes from the fact that the ANSI escape sequences are invisible when printed, but pandas still counts them as characters when it pads the columns, so the headers end up wider than their visible text and the values no longer line up underneath them.
To remedy this, you can use a display setting; this one works fine:
pd.set_option('display.colheader_justify', 'left')
Result: the columns now line up with their headers.
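For reference, here is a sketch of the full snippet with that option applied, using the same data as above (this only restates the suggestion; exact alignment can vary with terminal and pandas version):
import pandas as pd

pd.set_option('display.colheader_justify', 'left')

purple_text = '\033[35m'
reset = '\033[m'
list_1 = [12, 27, 33, 56, 11, 90]
list_2 = [43, 55, 76, 26, 84, 62]

df = pd.DataFrame({f'{purple_text} Numbers_1 {reset}': list_1,
                   f'{purple_text} Numbers_2 {reset}': list_2})
print(df.to_string(index=False))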

Convert list with str elements to list with integer elements [duplicate]

I am importing data from a CSV file into Python with the code below:
import csv
excel = open(r"C:\Users\JP Dayao\Desktop\Python\Constraints Study\sample.csv")
raw = csv.reader(excel)
for i in raw:
    print(i)
excel.close()
The output is below:
['21', '34', '25', '31', '27', '36', '24']
Desired output is:
[21, 34, 25, 31, 27, 36, 24]
Please help... thank you!
import csv
excel = open(r"C:\Users\JP Dayao\Desktop\Python\Constraints Study\sample.csv")
raw = csv.reader(excel)
raw_int = []
for i in raw:              # each i is one CSV row, as a list of strings
    for j in i:
        raw_int.append(int(j))
print(raw_int)
excel.close()
I am assuming that your CSV file has a line like 21,34,25,31,27,36,24 and that it may contain multiple lines. In that case you can use a list comprehension like this:
raw = csv.reader(excel)
print([int(x) for line in raw for x in line])
output for a single line csv file:
[21, 34, 25, 31, 27, 36, 24]
This is a nested list comprehension. The first part, for line in raw, loops over all lines in the CSV file, and the second, for x in line, loops over every element in that line and converts it to an int. If the CSV has more than one line, the elements of all lines end up flattened into a single list.
The simpler [int(x) for x in raw] suggested by @MohitC doesn't work because there x is itself a list containing the elements of one line, and int() will not work on a list. That is why you need the second loop in the comprehension.
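If you would rather keep each row of the CSV as its own list of integers instead of flattening everything, a small variant of the same idea (a sketch, with the file path shortened for readability):
import csv

with open("sample.csv") as excel:
    rows = [[int(x) for x in line] for line in csv.reader(excel)]

print(rows)   # e.g. [[21, 34, 25, 31, 27, 36, 24]] for a single-line file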
myList = ['21', '34', '25', '31', '27', '36', '24']
newList = [int(i) for i in myList]
[21, 34, 25, 31, 27, 36, 24]
or
newList2 = list(map(int, myList))
[21, 34, 25, 31, 27, 36, 24]

How to calculate WAPE for given dataframe in python

I want to know how to calculate the WAPE value if we have a dataframe in the format below.
I am using Python. I need it for evaluating forecasts.
According to Wikipedia, the WAPE (Weighted Absolute Percent Error) can be calculated by
dividing the sum of the absolute deviations by the total sales of all products.
In pandas, you can do that by using the - operator, which will work element-wise between series, combined with the abs() and sum() methods for series (and regular float division):
import pandas as pd
df = pd.DataFrame({'Actual': [23, 32, 44, 37, 48, 42, 39],
                   'Forecasted': [25, 30, 41, 34, 45, 47, 40]})
wape = (df.Actual - df.Forecasted).abs().sum() / df.Actual.sum()
print(wape)
0.07169811320754717
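As a sanity check, the same number falls out of plain Python arithmetic on the values above (no pandas involved):
actual = [23, 32, 44, 37, 48, 42, 39]
forecast = [25, 30, 41, 34, 45, 47, 40]

abs_dev = sum(abs(a - f) for a, f in zip(actual, forecast))   # 2+2+3+3+3+5+1 = 19
total = sum(actual)                                           # 265

print(abs_dev / total)   # 19 / 265 ≈ 0.0717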

Merge two keys of a single dictionary in python

For a dictionary "a" with the keys "x", "y" and "z", each holding integer values:
What is the most efficient way to produce a joint list if I want to merge two keys of the dictionary (given that the keys hold collections of identical size and the values are of integer type), e.g. x+y and y+z?
Explanation:
Suppose you have to merge two keys into a new list or a new dict without altering the original dictionary.
Example:
a = {"x" : {1,2,3,....,50}
"y" : {1,2,3,.....50}
"z" : {1,2,3,.....50}
}
Desired list:
x+y = [2,4,6,8.....,100]
y+z = [2,4,6,......,100]
A very efficient way is to convert the dictionary to a pandas dataframe and let it do the job for you with its vectorized methods:
import pandas as pd
a = {"x" : range(1,51), "y" : range(1,51), "z" : range(1,51)}
df = pd.DataFrame(a)
x_plus_y = (df['x'] + df['y']).to_list()
y_plus_z = (df['y'] + df['z']).to_list()
print(x_plus_y)
#[2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100]
It seems like you're trying to mimic a join-type operation. That is not native to python dicts, so if you really want that type of functionality, I'd recommend looking at the pandas library.
If you just want to merge dict keys without more advanced features, this function should help:
from itertools import chain
from collections import Counter
from typing import Dict, List, Set

def merge_keys(data: Dict[str, Set[int]], *merge_list: str) -> Dict[str, List[int]]:
    merged_data = dict()
    # count how often each element occurs across the selected keys ...
    merged_counts = Counter(chain(*(data.get(k, set()) for k in data if k in merge_list)))
    # ... then build the merged values by multiplying each element by its count
    merged_data['+'.join(merge_list)] = [k * v for k, v in merged_counts.items()]
    return merged_data
You can run this with merge_keys(a, "x", "y", "z", ...), where a is the name of your dict; you can pass in as many keys as you want ("x", "y", "z", ...), since the function takes a variable number of arguments.
If you want two separate merges in the same dict, all you need to do is (the | dict-merge operator requires Python 3.9+):
b = merge_keys(a, "x", "y") | merge_keys(a, "y", "z")
Note that the order of the keys changes the final merged key ("y+z" vs "z+y") but not the value of their merged sets.
P.S.: This was actually a little tricky, since the original dict had set values rather than lists; sets aren't ordered, so you can't just add them element-wise. That's why I used Counter here, in case you were wondering.
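For illustration, here is what calling the function looks like on a smaller version of the example dict (a sketch that assumes the imports and definition above; the output order comes from Counter's insertion order, which for small integers happens to be ascending):
a = {"x": set(range(1, 6)), "y": set(range(1, 6)), "z": set(range(1, 6))}

print(merge_keys(a, "x", "y"))
# {'x+y': [2, 4, 6, 8, 10]}
print(merge_keys(a, "x", "y") | merge_keys(a, "y", "z"))
# {'x+y': [2, 4, 6, 8, 10], 'y+z': [2, 4, 6, 8, 10]}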

Pandas Dataframe plot not showing dates when matplotlib.dates used

I have the following code that plots COVID-19 confirmed cases country-wise against some dates.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.DataFrame({'Countries': ['Australia', 'India', 'UAE', 'UK'],
                   '3/1/20': [27,  3, 21, 36],
                   '3/2/20': [30,  5, 21, 40],
                   '3/3/20': [39,  5, 27, 51],
                   '3/4/20': [52, 28, 27, 86],
                   },
                  index=[0, 1, 2, 3])
print('Dataframe:\n')
print(df)
dft=df.T
print('\n Transposed data:\n')
print(dft)
print(dft.columns)
dft.columns=dft.iloc[0]
dft=dft[1:]
print('\n Final data:\n')
print(dft)
dft.plot.bar(align='center')
# Set date ticks with 2-day interval
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=2))
# Change date format
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
''' Note: If I comment above two lines, I get back x-axis ticks. '''
# Autoformatting dates ticks
plt.gcf().autofmt_xdate()
plt.title('COVID-19 confirmed cases')
plt.show()
Here I intended to show the dates on the x-axis ticks at 2-day intervals and to format the dates in a different style. However, in the plot I don't get any ticks or labels on the x-axis at all.
When I comment out the two matplotlib.dates lines, the x-ticks and labels come back.
Can this be explained and fixed in a simple way? Also, can we get the same result using fig, ax = plt.subplots()?
You were almost there. The catch is that a pandas bar plot places the bars at categorical integer positions (0, 1, 2, ...) rather than at real dates, so the matplotlib.dates locator and formatter have no actual dates to work with on that axis. What you need to do is restructure your dataframe so that the dates become the index. One way to do this is as follows:
Data
df = pd.DataFrame({'Countries': ['Australia', 'India', 'UAE', 'UK'],
                   '3/1/20': [27,  3, 21, 36],
                   '3/2/20': [30,  5, 21, 40],
                   '3/3/20': [39,  5, 27, 51],
                   '3/4/20': [52, 28, 27, 86],
                   },
                  index=[0, 1, 2, 3])
df2 = df.set_index('Countries').T.unstack().reset_index()
df2  # .plot(kind='bar')
df2.columns = ['Countries', 'Date', 'Count']
df2['Date'] = pd.to_datetime(df2['Date'])
df2.dtypes
Set the datetime column as the index
df2.set_index('Date', inplace=True)
Group by date and country, then unstack before plotting
df2.groupby([df2.index.date,df2['Countries']])['Count'].sum().unstack().plot.bar()
Outcome: a grouped bar chart with the dates on the x-axis (plot image not reproduced here).
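As for the second part of the question: yes, the same plot can be produced through the explicit fig, ax = plt.subplots() interface. A minimal sketch, assuming the df2 built above (the strftime step is just one way to get dd-mm-yyyy labels on a categorical bar axis):
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

counts = df2.groupby([df2.index.date, df2['Countries']])['Count'].sum().unstack()
# bar plots use categorical tick labels, so format the dates before plotting
counts.index = [d.strftime('%d-%m-%Y') for d in counts.index]

counts.plot.bar(ax=ax)      # draw onto the explicitly created axes
ax.set_title('COVID-19 confirmed cases')
fig.autofmt_xdate()         # rotate and right-align the tick labels
plt.show()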

Impute missing data in DF with predicted data

I have data with 200 columns and 30k rows. Some values are missing, and I'd like to predict them to fill in the gaps. I want to predict the None values and put the predicted data in their place.
I want to split the data by index, train a model on the known values, predict the unknown ones, join the known and predicted values, and put them back into the data in exactly the same places.
P.S. Median, dropna and other such methods are not what I'm after; I only want prediction of the missing values.
import pandas as pd

df = {'First': [30, 22, 18, 49, 22],
      'Second': [80, 28, 16, 56, 30],
      'Third': [14, None, None, 30, 27],
      'Fourth': [14, 85, 17, 22, 14],
      'Fifth': [22, 33, 45, 72, 11]}
df = pd.DataFrame(df, columns=['First', 'Second', 'Third', 'Fourth'])
Desired result: the same DF with all columns completed with data.
I do not really understand your question all that well, but I might have an idea for you. Have a look at the fancyimpute package. It offers imputation methods based on predictive models (e.g. KNN). Hope this solves your problem.
It is hard to understand the question. However, it seems like you may be interested in this question and its answer:
Using a custom function Series in fillna
Basically (from the link), you would:
create a column with predicted values
use fillna with that column as the parameter, as in the sketch below
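Here is a minimal sketch of that idea on the example frame. The model choice (scikit-learn's LinearRegression) is just a stand-in assumption, not part of the original answers; any regressor trained on the rows where the target is known would fit the same pattern.
import pandas as pd
from sklearn.linear_model import LinearRegression   # assumption: any regressor would do

df = pd.DataFrame({'First':  [30, 22, 18, 49, 22],
                   'Second': [80, 28, 16, 56, 30],
                   'Third':  [14, None, None, 30, 27],
                   'Fourth': [14, 85, 17, 22, 14]})

target = 'Third'
features = [c for c in df.columns if c != target]

known = df[df[target].notna()]      # rows where the target is present
unknown = df[df[target].isna()]     # rows whose target we want to predict

model = LinearRegression().fit(known[features], known[target])
predicted = pd.Series(model.predict(unknown[features]), index=unknown.index)

# fillna aligns on the index, so the predictions land in exactly the same places
df[target] = df[target].fillna(predicted)
print(df)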
