Visual Studio Code autocomplete comma to '%%!' - python-3.x

While using the Jupyter extension in VS Code, every time I type a comma, VS Code suggests %%! for some reason, which means I have to hit Esc each time to write comma-separated lists over multiple lines. Can anyone tell me why this is happening or how to stop it? It doesn't happen in a blank notebook, but after running two cells it's back again.
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import datetime as dt
%matplotlib inline
sns.set()
def open_hbal(file):
    df_balance = pd.read_excel(file) #open xlsx
    t_0 = dt.datetime(2018, 1, 1) #set start point for time series
    #add Date and time in fraction steps starting from t_0
    df_balance["Date and Time"] = t_0 + pd.to_timedelta(df_balance["Time"],
                                                        unit='h')
    #convert to datetime object
    df_balance["Date and Time"] = pd.to_datetime(df_balance["Date and Time"])
    #replace Index with Date and Time. inplace overwrites df
    df_balance.set_index("Date and Time", inplace=True)
    #remove Time column as it is no longer needed, axis 0 = row, 1 = column
    df_balance.drop("Time", axis=1, inplace=True)
    #df_balance = df_balance / 1000 # convert to kWh
    #replace units in all columns
    #df_balance.columns = df_balance.columns.str.replace(", W", ", kWh")
    df_balance.rename(columns={"Net losses, W": "_Net losses, W"},
                      inplace=True)
    return df_balance

In case anyone else lands here, other suggested solutions did not work for me on v2021.10.1101450599. As per this issue, rolling back to v2021.8.1236758218 removes the problem until it gets fixed.

Related

matplotlib not displaying all axis values

I have a small program that is plotting some data. The program runs without any errors and displays the plot, but it is removing every other x-axis value. What should I be doing to get all twelve axis labels to display properly?
The program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas.plotting import register_matplotlib_converters
print('NumPy: {}'.format(np.__version__))
print('Pandas: {}'.format(pd.__version__))
print('-----')
display_settings = {
    'max_columns': 14,
    'expand_frame_repr': False,  # Wrap to multiple pages
    'max_rows': 50,
    'show_dimensions': False
}
pd.options.display.float_format = '{:,.2f}'.format
for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)
file = "e:\\python\\dataquest\\unrate.csv"
unrate = pd.read_csv(file)
print(unrate.shape, '\n')
unrate['DATE'] = pd.to_datetime(unrate['DATE'])
unrate.info()  # info() prints the summary itself and returns None
print()
print(unrate.head(12), '\n')
register_matplotlib_converters()
plt.xticks(rotation=90)
plt.plot(unrate['DATE'][0:12], unrate['VALUE'][0:12])
plt.show()
I am getting this output (I am using PyCharm):
I believe I should be getting this (from Dataquest's built-in IDE):
@Quang Hong, you were on the right track. I had to import matplotlib.dates and adjust the interval value and the date format as follows:
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
plt.gca().xaxis.set_major_locator(mdates.DayLocator(interval=30))
Now I get this output:
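For monthly data like unrate.csv, a DayLocator with interval=30 slowly drifts away from month boundaries, because months are 28 to 31 days long. A possible alternative is matplotlib's MonthLocator, which puts one tick on the first day of every month. The sketch below uses placeholder dates and dummy values rather than the real CSV, and the Agg backend so it runs without a display; treat it as an illustration, not the fix the poster used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: 12 month-start dates and dummy values (not the real unrate.csv)
dates = pd.date_range("1948-01-01", periods=12, freq="MS")
values = list(range(12))

fig, ax = plt.subplots()
ax.plot(dates, values)
# One major tick on the first day of every month, labelled like "Jan 1948"
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.tick_params(axis="x", labelrotation=90)
fig.canvas.draw()  # run the locator and formatter so the labels are populated

labels = [lbl.get_text() for lbl in ax.get_xticklabels()]
print(labels)
```

Because MonthLocator follows the calendar, all twelve months get a label regardless of their length.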

How to fix 'RuntimeError: Locator ... exceeds Locator.MAXTICKS - matplotlib'

I'm plotting campaign data on a timeline where only the time sent (rather than the date) is relevant, hence the column contains only time data (imported from a CSV).
It displays the various line graphs (a spaghetti plot); however, when I add labels to the x-axis, I receive:
RuntimeError: Locator attempting to generate 4473217 ticks from 30282.0 to 76878.0: exceeds Locator.MAXTICKS
I have 140 rows of data in this test file; the times are between 9:05 and 20:55, and my code is supposed to place a tick every 15 minutes.
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
pandas: 0.23.4
matplotlib: 3.0.2
My actual code looks like:
import pandas
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
file_name = r'C:\Users\A_B_testing.csv'
df1 = pandas.read_csv(file_name, encoding='utf-8')
df_Campaign1 = df1[df1['DataSource ID'].str.contains('Campaign1')==True]
Campaign1_times = df_Campaign1['time sent'].tolist()
Campaign1_revenue = df_Campaign1['EstValue/sms'].tolist()
Campaign1_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign1_times]
df_Campaign2 = df1[df1['DataSource ID'].str.contains('Campaign2')==True]
Campaign2_times = df_Campaign2['time sent'].tolist()
Campaign2_revenue = df_Campaign2['EstValue/sms'].tolist()
Campaign2_times = [datetime.strptime(slot,"%H:%M").time() for slot in Campaign2_times]
fig, ax = plt.subplots(1, 1, figsize=(16, 8))
xlocator = mdates.MinuteLocator(byminute=None, interval=15) # tick every 15 minutes
xformatter = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True)
plt.plot(Campaign1_times, Campaign1_revenue, c = 'g', linewidth = 1)
plt.plot(Campaign2_times, Campaign2_revenue, c = 'y', linewidth = 2)
plt.show()
I tried reducing the number of values to be plotted, and it worked fine on a dummy set:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import HourLocator, MinuteLocator, DateFormatter
from datetime import datetime
fig, ax = plt.subplots(1, figsize=(16, 6))
xlocator = MinuteLocator(interval=15)
xformatter = DateFormatter('%H:%M')
ax.xaxis.set_major_locator(xlocator)
ax.xaxis.set_major_formatter(xformatter)
ax.minorticks_off()
plt.grid(True)
xvalues = ['9:05', '10:35', '12:05', '12:35', '13:05']
xvalues = [datetime.strptime(slot,"%H:%M") for slot in xvalues]
yvalues = [2.2, 2.4, 1.7, 3, 2]
zvalues = [3.2, 1.4, 1.8, 2.7, 2.2]
plt.plot(xvalues, yvalues, c = 'g')
plt.plot(xvalues, zvalues, c = 'b')
plt.show()
So I think the issue is related to the way I'm declaring the ticks. I tried to find a relevant post here, but none solved my problem. Can anyone please point me in the right direction? Thanks in advance.
I had a similar issue, which was fixed by using datetime objects instead of time objects on the x-axis.
Similarly, in the question's code, using the full datetime instead of just the time should fix the issue.
Replace:
[datetime.strptime(slot,"%H:%M").time() for slot in ...
with:
[datetime.strptime(slot,"<full date format>") for slot in ...
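A minimal sketch of that fix, assuming the same "HH:MM" strings as the question (the sample values below are hypothetical). Simply dropping .time() is already enough, because strptime with "%H:%M" returns a full datetime on the default date 1900-01-01, which matplotlib's date converter understands. The error's range of 30282.0 to 76878.0 looks like seconds since midnight, which is plausibly how pandas' registered converter encodes time objects, while MinuteLocator reads axis values as days; that would explain the millions of ticks, but it is an inference, not something the question confirms.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical "HH:MM" strings and values in the question's format
slots = ["9:05", "10:35", "12:05", "12:35", "20:55"]
revenue = [2.2, 2.4, 1.7, 3.0, 2.0]

# No .time(): each entry stays a datetime on 1900-01-01
times = [datetime.strptime(slot, "%H:%M") for slot in slots]

fig, ax = plt.subplots(figsize=(16, 8))
ax.plot(times, revenue, c="g", linewidth=1)
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=15))  # tick every 15 minutes
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.canvas.draw()  # the locator runs at draw time, where MAXTICKS would be checked

labels = [lbl.get_text() for lbl in ax.get_xticklabels()]
print(labels[:4])
```

Since every point shares the same dummy date, only the time of day matters for the plot, and the formatter hides the date.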

KeyError when adding a column to a DataFrame (Pandas)

My pandas DataFrame is not accepting a second column, and I cannot troubleshoot the issue. I am trying to display moving averages. The code works fine for the first one (MA_9), but gives me an error as soon as I try to add another MA (MA_20).
Is it not possible in this case to add more than one column?
The code:
import numpy as np
import pandas as pd
import pandas_datareader as pdr
import matplotlib.pyplot as plt
symbol = 'GOOG.US'
start = '20140314'
end = '20180414'
google = pdr.DataReader(symbol, 'stooq', start, end)
print(google.head())
google_close = pd.DataFrame(google.Close)
print(google_close.last_valid_index())
google_close['MA_9'] = google_close.rolling(9).mean()
google_close['MA_20'] = google_close.rolling(20).mean()
# google_close['MA_60'] = google_close.rolling(60).mean()
# print(google_close)
plt.figure(figsize=(15, 10))
plt.grid(True)
# display MA's
plt.plot(google_close['Close'], label='Google_Cls')
plt.plot(google_close['MA_9'], label='MA 9 day')
plt.plot(google_close['MA_20'], label='MA 20 day')
# plt.plot(google_close['MA_60'], label='MA 60 day')
plt.legend(loc=2)
plt.show()
Please update your code as below and then it should work:
google_close['MA_9'] = google_close.Close.rolling(9).mean()
google_close['MA_20'] = google_close.Close.rolling(20).mean()
Initially the DataFrame had only one column (Close), so your old code google_close['MA_9'] = google_close.rolling(9).mean() worked: the rolling mean of a one-column DataFrame is itself one column. After that line, the DataFrame has two columns (Close and MA_9), so google_close.rolling(20).mean() returns two columns of means, which pandas cannot assign to the single column MA_20. Selecting the column you want to average fixes it: google_close['MA_20'] = google_close.Close.rolling(20).mean()
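A small offline sketch of the difference, with synthetic prices standing in for the stooq download (an assumption so it runs without pandas_datareader or network access). It shows that rolling over the whole frame yields one column per existing column, while rolling over the Close Series yields exactly one.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices 0.0 .. 29.0 in place of the GOOG download
idx = pd.date_range("2018-01-01", periods=30, freq="D")
google_close = pd.DataFrame({"Close": np.arange(30, dtype=float)}, index=idx)

# One column in the frame, so one column of means: assignment works
google_close["MA_9"] = google_close["Close"].rolling(9).mean()

# Rolling the whole frame now produces a mean for every column
whole_frame = google_close.rolling(20).mean()
print(whole_frame.columns.tolist())  # ['Close', 'MA_9']

# Selecting the Close Series first gives a single column to assign
google_close["MA_20"] = google_close["Close"].rolling(20).mean()
print(google_close.columns.tolist())  # ['Close', 'MA_9', 'MA_20']
```

The first 19 rows of MA_20 are NaN because a 20-day window needs 20 observations.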

Negative forecasts using Facebook Prophet

I have daily time series data covering almost 2 years of a cluster's available space (in GB). I am trying to use Facebook's Prophet to make future forecasts, but some forecasts have negative values. Since negative values do not make sense, and I read that using a carrying capacity with the logistic growth model helps eliminate negative forecasts via cap values, I am not sure whether this applies to my case or how to get the cap value for my time series. Please help, as I am new to this and confused. I am using Python 3.6.
import numpy as np
import pandas as pd
import xlrd
import openpyxl
from pandas import datetime
import csv
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from fbprophet import Prophet
import os
import sys
import signal
df = pd.read_excel("Data_Per_day.xlsx")
df1=df.filter(['cluster_guid','date','avail_capacity'],axis=1)
uniquevalues = np.unique(df1[['cluster_guid']].values)
for id in uniquevalues:
    newdf = df1[df1['cluster_guid'] == id]
    newdf1 = newdf.groupby(['cluster_guid', 'date'], as_index=False)['avail_capacity'].sum()
    #newdf11=newdf.groupby(['cluster_guid','date'],as_index=False)['total_capacity'].sum()
    #cap[id]=newdf11['total_capacity'].max()
    #print(cap[id])
    newdf1.set_index('cluster_guid', inplace=True)
    newdf1.to_csv('my_csv.csv', mode='a', header=None)
with open('my_csv.csv', newline='') as f:
    r = csv.reader(f)
    data = [line for line in r]
with open('my_csv.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['cluster_guid', 'DATE_TAKEN', 'avail_capacity'])
    w.writerows(data)
in_df = pd.read_csv('my_csv.csv', parse_dates=True, index_col='DATE_TAKEN')
in_df.to_csv('my_csv.csv')
dfs = pd.read_csv('my_csv.csv')
uni = dfs.cluster_guid.unique()
while True:
    try:
        print("Press Ctrl+C to exit or enter the cluster guid to be forecast")
        i = input('Please enter the cluster guid: ')
        if i not in uni:
            print('Please enter a valid cluster guid')
            continue
        else:
            # filter dfs (not the original df) so the mask matches dfs' index
            dfs1 = dfs.loc[dfs['cluster_guid'] == i]
            # drop returns a copy instead of modifying a slice in place
            dfs1 = dfs1.drop('cluster_guid', axis=1)
            dfs1.to_csv('dataframe' + i + '.csv', index=False)
            dfs2 = pd.read_csv('dataframe' + i + '.csv')
            dfs2['DATE_TAKEN'] = pd.DatetimeIndex(dfs2['DATE_TAKEN'])
            dfs2 = dfs2.rename(columns={'DATE_TAKEN': 'ds', 'avail_capacity': 'y'})
            my_model = Prophet(interval_width=0.99)
            my_model.fit(dfs2)
            future_dates = my_model.make_future_dataframe(periods=30, freq='D')
            forecast = my_model.predict(future_dates)
            print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])
            my_model.plot(forecast, uncertainty=True)
            my_model.plot_components(forecast)
            plt.show()
            os.remove('dataframe' + i + '.csv')
            os.remove('my_csv.csv')
    except KeyboardInterrupt:
        try:
            os.remove('my_csv.csv')
        except OSError:
            pass
        sys.exit(0)
A Box-Cox transform with λ = 0 (a log transform) does the trick. Here are the steps:
1. Add 1 to each value (to avoid log(0))
2. Take the natural log of each value
3. Make forecasts
4. Take the exponent and subtract 1
Since exp(z) - 1 is always greater than -1, the back-transformed forecasts cannot become meaningfully negative. The log also has the nice property of converting multiplicative seasonality into additive form.
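The four steps map onto NumPy's log1p, which computes log(1 + y), and expm1, which computes exp(z) - 1. Prophet is not needed to show the transform itself, so this sketch uses a hypothetical fit_linear_trend function (a straight-line extrapolation) as a stand-in forecaster and a synthetic, steadily shrinking capacity series; neither comes from the question.

```python
import numpy as np
import pandas as pd


def fit_linear_trend(y, periods):
    """Hypothetical stand-in for Prophet: extrapolate a straight-line trend."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    t_future = np.arange(len(y), len(y) + periods)
    return slope * t_future + intercept


# Synthetic available capacity in GB, falling from 100 to 5 over 60 days
y = pd.Series(np.linspace(100.0, 5.0, 60))

# Forecasting the raw values lets the trend run below zero
raw_forecast = fit_linear_trend(y.to_numpy(), periods=30)

# Steps 1-2: add 1 and take the natural log
y_log = np.log1p(y.to_numpy())
# Step 3: forecast on the transformed scale
log_forecast = fit_linear_trend(y_log, periods=30)
# Step 4: take the exponent and subtract 1
forecast = np.expm1(log_forecast)

print(raw_forecast.min(), forecast.min())
```

With Prophet, the same idea would mean applying np.log1p to the y column of dfs2 before my_model.fit(dfs2) and applying np.expm1 to the yhat, yhat_lower and yhat_upper columns of the forecast afterwards.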

Time series datetick problems when using pandas.DataFrame.plot method

I just discovered something really strange when using the plot method of pandas.DataFrame. I am using pandas 0.19.1. Here is my MWE:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
fig, axe = plt.subplots()
x.plot(ax=axe)
plt.show()
xt = axe.get_xticks()
When I try to format my xticklabels I get strange behaviour, so I inspected the objects to understand it and found the following:
t[-1] - t[0] = Timedelta('7 days 00:00:00'), confirming the DateTimeIndex is what I expect;
xt = [175320, 175488]: the xticks are integers, but they are not equal to a number of days since the epoch (I have no idea what they are);
xt[-1] - xt[0] = 168: they behave more like an index, matching len(x) = 169 (168 steps between 169 points).
This explains why I cannot manage to format my axis using:
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0,6,12,18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
The first raises an error saying there are too many ticks to generate.
The second shows my first tick as Fri 00:00, but it should be Mon 00:00 (in fact matplotlib interprets the first tick as 0481-01-03 00:00; this is where my bug is).
It looks like there is some incompatibility between the pandas and matplotlib integer-to-date conversions, but I cannot find out how to fix this issue.
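The numbers can be checked with plain arithmetic. Assuming (an inference, not something the question confirms) that DataFrame.plot placed the regular hourly index on an axis of hourly period ordinals, 175320 is the number of hours from 1970-01-01 to 1990-01-01, while 726468 from axe.plot is the proleptic Gregorian day ordinal that matplotlib used as its date unit before version 3.3 changed the default epoch to 1970. Reading the hour count as a day ordinal lands on 0481-01-03, a Friday, which matches the wrong first tick.

```python
from datetime import date

import pandas as pd

start = pd.Timestamp("1990-01-01")
end = pd.Timestamp("1990-01-08")

# Hours since the Unix epoch: matches xt[0] from DataFrame.plot
hours_since_epoch = int((start - pd.Timestamp("1970-01-01")) // pd.Timedelta(hours=1))
print(hours_since_epoch)  # 175320
print(int((end - start) // pd.Timedelta(hours=1)))  # 168

# Day ordinal (0001-01-01 is day 1): matches xt[0] from axe.plot in older matplotlib
print(start.date().toordinal())  # 726468

# Treating the hour count as a day ordinal gives the bogus year-481 date
wrong_day = date.fromordinal(hours_since_epoch)
print(wrong_day, wrong_day.weekday())  # 0481-01-03 4  (weekday 4 is Friday)
```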
If I run instead:
fig, axe = plt.subplots()
axe.plot(x)
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
plt.show()
xt = axe.get_xticks()
Everything works as expected, but I lose all the nice features of the pandas.DataFrame.plot method, such as curve labeling. And here xt = [726468. 726475.].
How can I properly format my ticks using the pandas.DataFrame.plot method instead of axe.plot, while avoiding this issue?
Update
The problem seems to be about the origin and scale (units) of the underlying numbers used for date representation. However, I cannot control it, even by forcing it to the correct type:
t = pd.date_range('1990-01-01', '1990-01-08', freq='1H', origin='unix', units='D')
There is a discrepancy between the matplotlib and pandas representations, and I could not find any documentation of this problem.
Is this what you are going for? Note I shortened the date_range to make it easier to see the labels.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
t = pd.date_range('1990-01-01', '1990-01-04', freq='1H')
x = pd.DataFrame(np.random.rand(len(t)), index=t)
# resample the df to get the index at 6-hour intervals
l = x.resample('6H').first().index
# set the ticks when you plot; this appears to position them but not set the labels
ax = x.plot(xticks=l)
# set the display value of the tick labels
ax.set_xticklabels(l.strftime("%a %H:%M"))
# hide the labels from the initial pandas plot
ax.set_xticklabels([], minor=True)
# make pretty
ax.get_figure().autofmt_xdate()
plt.show()
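Another option that keeps DataFrame.plot (and its legend) while letting matplotlib.dates locators work is the x_compat=True plotting option. The pandas visualization guide describes it as suppressing pandas' own tick-resolution adjustment, so the x values stay in matplotlib's date units instead of the period ordinals seen in the question. This is a sketch, not the answerer's method; it uses the Agg backend to run headless, and behaviour may differ on older pandas/matplotlib combinations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = pd.date_range("1990-01-01", "1990-01-04", freq=pd.offsets.Hour(1))
x = pd.DataFrame(rng.random(len(t)), index=t)

fig, axe = plt.subplots()
# x_compat=True keeps matplotlib's date units on the x-axis
x.plot(ax=axe, x_compat=True)
axe.xaxis.set_major_locator(mdates.HourLocator(byhour=(0, 6, 12, 18)))
axe.xaxis.set_major_formatter(mdates.DateFormatter("%a %H:%M"))
fig.autofmt_xdate()
fig.canvas.draw()  # run the locator and formatter so the labels are populated

labels = [lbl.get_text() for lbl in axe.get_xticklabels()]
print(labels[:2])
```

Since 1990-01-01 was a Monday, the first label should read Mon 00:00 rather than the Fri 00:00 from the question.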
