Empty plot on Bokeh Tutorial Exercise - python-3.x

I'm following the Bokeh tutorial, and in the basic plotting section I can't manage to show a plot: I only get the axes. What am I missing?
Here is the code:
df = pd.DataFrame.from_dict(AAPL)
weekapple = df.loc["2000-03-01":"2000-04-01"]
p = figure(x_axis_type="datetime", title="AAPL", plot_height=350, plot_width=800)
p.xgrid.grid_line_color=None
p.ygrid.grid_line_alpha=0.5
p.xaxis.axis_label = 'Time'
p.yaxis.axis_label = 'Value'
p.line(weekapple.date, weekapple.close)
show(p)
I get this: [screenshot "My result": only the axes are drawn]
I'm trying to complete the exercise here (10th code cell, "Exercise with AAPL data"). I was able to follow all previous code up to that point correctly.
Thanks in advance!

In case this is still relevant, this is how you should do your selection:
df = pd.DataFrame.from_dict(AAPL)
# Convert date column in df from strings to the proper datetime format
date_format="%Y-%m-%d"
df["date"] = pd.to_datetime(df['date'], format=date_format)
# Use the same conversion for selected dates
weekapple = df[(df.date>=dt.strptime("2000-03-01", date_format)) &
(df.date<=dt.strptime("2000-04-01", date_format))]
p = figure(x_axis_type="datetime", title="AAPL", plot_height=350, plot_width=800)
p.xgrid.grid_line_color=None
p.ygrid.grid_line_alpha=0.5
p.xaxis.axis_label = 'Time'
p.yaxis.axis_label = 'Value'
p.line(weekapple.date, weekapple.close)
show(p)
To make this work, before this code, I have (in my Jupyter notebook):
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
import bokeh.sampledata  # import the submodule explicitly so bokeh.sampledata.download() resolves
import pandas as pd
from datetime import datetime as dt
bokeh.sampledata.download()
from bokeh.sampledata.stocks import AAPL
output_notebook()
As described at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html, .loc is used in operations with the index (or boolean lists); in your DataFrame, date is not the index (it is a regular column).
I hope this helps.
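If you would rather keep the .loc slice from the question, one option is to make the dates the index first. This is only a minimal sketch on made-up data: the frame below is a hypothetical stand-in for the AAPL sample (same column names, invented values), so it runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in for the AAPL sample data: daily dates stored as strings.
dates = pd.date_range("2000-02-15", "2000-04-15", freq="D")
df = pd.DataFrame({
    "date": dates.strftime("%Y-%m-%d"),
    "close": range(len(dates)),  # invented values, not real prices
})

# Convert the strings to datetimes and make them the index.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
indexed = df.set_index("date")

# With a DatetimeIndex, .loc accepts date strings and includes both endpoints.
weekapple = indexed.loc["2000-03-01":"2000-04-01"]
print(len(weekapple))
print(weekapple.index.min(), weekapple.index.max())
```

With the dates in the index you would then plot weekapple.index against weekapple.close instead of weekapple.date.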

Your DataFrame sub-view is empty:
In [3]: import pandas as pd
...: from bokeh.sampledata.stocks import AAPL
...: df = pd.DataFrame.from_dict(AAPL)
...: weekapple = df.loc["2000-03-01":"2000-04-01"]
In [4]: weekapple
Out[4]:
Empty DataFrame
Columns: [date, open, high, low, close, volume, adj_close]
Index: []
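A quick way to catch this before plotting is to look at the index type and check whether the selection is empty. The sketch below uses a tiny made-up frame (the column names mirror the AAPL sample, the values are hypothetical), and it selects on the date column with a boolean mask rather than on the index.

```python
import pandas as pd

# Hypothetical miniature of the AAPL frame: dates are plain strings in a column.
df = pd.DataFrame({
    "date": ["2000-02-29", "2000-03-01", "2000-03-15", "2000-04-03"],
    "close": [114.6, 130.3, 116.3, 133.3],  # invented values
})

# The index is the default RangeIndex (0, 1, 2, ...), not the dates.
index_kind = type(df.index).__name__

# A boolean mask on the column works even while the dates are strings,
# because ISO-formatted dates sort the same way as text.
mask = df["date"].between("2000-03-01", "2000-04-01")
weekapple = df[mask]

if weekapple.empty:
    print("Nothing selected; the plot would show only the axes.")
else:
    print(index_kind, len(weekapple))
```

For the actual plot the date column still needs to be converted with pd.to_datetime, since a datetime axis expects datetime values rather than strings.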

Related

Altair adding date slider for interactive scatter chart plot

Has anyone tried using date as a slider for Altair interactive scatter plots?
I'm trying to reproduce a similar plot to the gapminder scatter:
1) Instead of a year filter I'm trying to use a date, e.g. '2020-01-05', and I get the following error:
altair.vegalite.v4.schema.core.BindRange->max, validating 'type'
'2020-05-17T00:00:00' is not of type 'number'
2) When I try to parse it as an int, nothing shows up in the plot
3) Examples of using the Year slider: https://www.datacamp.com/community/tutorials/altair-in-python
https://altair-viz.github.io/gallery/multiple_interactions.html
4) Also a timestamp option wouldn't be ideal as the date needs to be readable
Would appreciate any help. Thanks
#Date Slider
from altair import datum
from datetime import datetime
import altair as alt
import pandas as pd
import numpy as np
import datetime as dt
date_slider = alt.binding_range(min=min(df['date']), max=max(df['date']), step=1)
slider_selection = alt.selection_single(bind=date_slider, fields=['date'], name="Date", init={'week_starting': max(df['date'])})
alt.Chart(df).mark_point(filled=True).encode(
x='mom_pct',
y='yoy_pct',
size='n_queries',
color='vertical',
tooltip = ['vertical', 'yoy_pct', 'mom_pct']
).properties(
width=800,
height=600
).add_selection(slider_selection).transform_filter(slider_selection)
Vega-Lite sliders do not support datetime display, but it is possible to display timestamps. Here is a full example (I didn't base it off of your code, because you did not provide any data):
import altair as alt
import pandas as pd
import numpy as np
from datetime import datetime
datelist = pd.date_range(datetime.today(), periods=100).tolist()
rand = np.random.RandomState(42)
df = pd.DataFrame({
'xval': datelist,
'yval': rand.randn(100).cumsum(),
})
def timestamp(t):
    return pd.to_datetime(t).timestamp() * 1000
slider = alt.binding_range(name='cutoff:', min=timestamp(min(datelist)), max=timestamp(max(datelist)))
selector = alt.selection_single(name="SelectorName", fields=['cutoff'],
bind=slider,init={"cutoff": timestamp("2020-05-05")})
alt.Chart(df).mark_point().encode(
x='xval',
y='yval',
opacity=alt.condition(
'toDate(datum.xval) < SelectorName.cutoff[0]',
alt.value(1), alt.value(0)
)
).add_selection(
selector
)
Unfortunately, Vega-Lite does not currently provide any native way to create a slider that displays a formatted date.
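Since the slider only understands numbers, it can help to see the date-to-number conversion on its own. Here is a small sketch of the round trip between dates and epoch milliseconds using pandas only; the helper names to_epoch_ms and from_epoch_ms and the constant MS_PER_DAY are made up for illustration and are not part of Altair or Vega-Lite.

```python
import pandas as pd

# One day expressed in milliseconds.
MS_PER_DAY = 24 * 60 * 60 * 1000


def to_epoch_ms(value):
    """Convert a date-like value to milliseconds since 1970-01-01.

    Naive (timezone-free) input is interpreted as UTC by pandas.
    """
    return pd.Timestamp(value).timestamp() * 1000


def from_epoch_ms(ms):
    """Convert epoch milliseconds back to a pandas Timestamp."""
    return pd.to_datetime(ms, unit="ms")


cutoff_ms = to_epoch_ms("2020-05-05")
print(cutoff_ms)
print(from_epoch_ms(cutoff_ms))
```

Passing something like MS_PER_DAY as the step of alt.binding_range should make the slider move one day at a time, although the value shown next to the slider is still the raw number.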
Another way to work around this issue is to use another chart in place of the slider. This lets you see the date and also lets you select a range, which is not possible with sliders at the moment.
import altair as alt
from vega_datasets import data
import pandas as pd
# Convert date column to an actual date and filter the data.
movies = (
data.movies()
.assign(Release_Date=lambda df: pd.to_datetime(df['Release_Date']))
.query('1960 < Release_Date < 2010')
.sample(1_000, random_state=90384))
select_year = alt.selection_interval(encodings=['x'])
bar_slider = alt.Chart(movies).mark_bar().encode(
x='year(Release_Date)',
y='count()').properties(height=50).add_selection(select_year)
scatter_plot = alt.Chart(movies).mark_circle().encode(
x='Rotten_Tomatoes_Rating',
y='IMDB_Rating',
opacity=alt.condition(
select_year,
alt.value(0.7), alt.value(0.1)))
scatter_plot & bar_slider

Pandas plotting graph with timestamp

pandas 0.23.4
python 3.5.3
I have some code that looks like below
import pandas as pd
from datetime import datetime
from matplotlib import pyplot
def dateparse():
    return datetime.strptime("2019-05-28T00:06:20,927", '%Y-%m-%dT%H:%M:%S,%f')
series = pd.read_csv('sample.csv', delimiter=";", parse_dates=True,
date_parser=dateparse, header=None)
series.plot()
pyplot.show()
The CSV file looks like below
2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453
2019-05-28T00:06:22,688;2464
2019-05-28T00:06:27,260;216
As you can see 2019-05-28T00:06:20,167 is the timestamp with milliseconds and 2070 is the value that I want plotted.
When I run this the graph gets printed however on the X-Axis I see numbers which is a bit odd. I was expecting to see actual timestamps (like MS Excel). Can someone tell me what I am doing wrong?
You did not set the datetime column as the index. Also, you don't need a date parser; just pass the columns you want to parse:
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO  # pd.compat.StringIO was removed in later pandas releases
dfstr = '''2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453
2019-05-28T00:06:22,688;2464
2019-05-28T00:06:27,260;216'''
df = pd.read_csv(StringIO(dfstr), sep=';',
header=None, parse_dates=[0])
plt.plot(df[0], df[1])
plt.show()
Output: [plot]
Or:
df.set_index(0)[1].plot()
which gives a slightly better plot: [plot]
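If the automatic date parsing does not pick up the comma before the milliseconds in your pandas version, you can spell out the format yourself. A minimal sketch, reusing the sample rows from the question; the column names timestamp and value are my own choice, not something from the original file.

```python
from io import StringIO

import pandas as pd

raw = """2019-05-28T00:06:20,167;2070
2019-05-28T00:06:20,426;147
2019-05-28T00:06:20,927;453
2019-05-28T00:06:22,688;2464
2019-05-28T00:06:27,260;216"""

# Read both columns as-is; the timestamp column stays a string for now.
df = pd.read_csv(StringIO(raw), sep=";", header=None, names=["timestamp", "value"])

# Parse with an explicit format; "," separates the seconds from the fraction.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%dT%H:%M:%S,%f")

# Put the timestamps in the index so plotting uses them for the x-axis.
series = df.set_index("timestamp")["value"]
print(series.index.dtype)
print(series.head())
```

Calling series.plot() on this should then label the x-axis with times instead of row numbers.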

Keyerror in time/Date Components of datetime - what to do?

I am using a pandas DataFrame with a datetime index. I know from the xarray documentation that datetime components can be accessed as ds['date.year'], with ds being the xarray DataArray, date the date coordinate, and year the component of the dates. The xarray documentation points to datetime components, which in turn links to DatetimeIndex in the pandas documentation. So I thought of doing the same with pandas, as I really like this feature.
However, it is not working for me. Here is what I did so far:
# Import required modules
import pandas as pd
import numpy as np
# Create DataFrame (name: df)
df=pd.DataFrame({'Date': ['2017-04-01','2017-04-01',
'2017-04-02','2017-04-02'],
'Time': ['06:00:00','18:00:00',
'06:00:00','18:00:00'],
'Active': [True,False,False,True],
'Value': np.random.rand(4)})
# Combine str() information of Date and Time and format to datetime
df['Date']=pd.to_datetime(df['Date'] + ' ' + df['Time'],format = '%Y-%m-%d %H:%M:%S')
# Make the combined data the index
df = df.set_index(df['Date'])
# Erase the rest, as it is not required anymore
df = df.drop(['Time','Date'], axis=1)
# Show me the first day
df['2017-04-01']
Ok, so this shows me only the first entries. So far, so good.
However
df['Date.year']
results in KeyError: 'Date.year'
I would expect an output like
array([2017,2017,2017,2017])
What am I doing wrong?
EDIT:
I have a workaround, which I am able to go on with, but I am still not satisfied, as this doesn't explain my question. I did not use a pandas DataFrame, but an xarray Dataset and now this works:
# Load modules
import pandas as pd
import numpy as np
import xarray as xr
# Prepare time array
Date = ['2017-04-01','2017-04-01', '2017-04-02','2017-04-02']
Time = ['06:00:00','18:00:00', '06:00:00','18:00:00']
time = [Date[i] + ' ' + Time[i] for i in range(len(Date))]
time = pd.to_datetime(time,format = '%Y-%m-%d %H:%M:%S')
# Create Dataset (name: ds)
ds=xr.Dataset({'time': time,
'Active': [True,False,False,True],
'Value': np.random.rand(4)})
ds['time.year']
which gives:
<xarray.DataArray 'year' (time: 4)>
array([2017, 2017, 2017, 2017])
Coordinates:
* time (time) datetime64[ns] 2017-04-01T06:00:00 ... 2017-04-02T18:00:00
As for what you're doing wrong, you are
a) trying to access the index as if it were a column (a Series), and
b) chaining commands inside a string: df['Date'] is a single column, whereas df['Date.year'] looks for a column literally called 'Date.year'.
If your datetime is the index, use .year on the index; if it's a Series, use .dt.year.
df.index.year
# or, if Date is still a column with a proper datetime dtype (your to_datetime call suggests it is)
df.Date.dt.year
hope that helps bud.
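To make the difference concrete, here is a small self-contained sketch with the same four timestamps as in the question. The Value numbers are made up, and the variable names are just for illustration.

```python
import pandas as pd

stamps = pd.to_datetime([
    "2017-04-01 06:00:00", "2017-04-01 18:00:00",
    "2017-04-02 06:00:00", "2017-04-02 18:00:00",
])
df = pd.DataFrame({"Value": [0.1, 0.2, 0.3, 0.4]}, index=stamps)
df.index.name = "Date"

# Datetime in the index: components are attributes of the index itself.
years_from_index = df.index.year
days_from_index = df.index.day

# Datetime in a regular column: go through the .dt accessor.
as_column = df.reset_index()
years_from_column = as_column["Date"].dt.year

# Selecting one day by label goes through .loc.
first_day = df.loc["2017-04-01"]

print(list(years_from_index), list(days_from_index), len(first_day))
```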

Displaying Pandas Series with Asian text in index

If the index of a Pandas Series is in one of the Asian languages and has variable lengths, then the print-out would not be aligned correctly.
import pandas as pd
from IPython.display import display
df = pd.Series( range(2), index = [ 'ミートボールスパゲッティ', 'ご飯' ] )
display(df)
print(df)
df
Note that this only happens with a Series; a DataFrame is displayed nicely.
How can I fix the output here?
Convert the Series to a DataFrame for display. Currently, Series have no to_html() method. Therefore, they cannot be displayed in this format directly.
import pandas as pd
from IPython.display import display
df = pd.Series( range(2), index = [ 'ミートボールスパゲッティ', 'ご飯' ] )
df.to_frame()
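For plain-text output (print rather than the notebook's HTML table), pandas also has a display option, display.unicode.east_asian_width, that is meant to account for double-width characters when padding. A hedged sketch of both approaches; whether the text version actually lines up on screen still depends on the font in use, and the series name "count" is just something I picked.

```python
import pandas as pd

s = pd.Series(range(2), index=["ミートボールスパゲッティ", "ご飯"], name="count")

# Option 1: convert to a DataFrame, which a notebook renders as an HTML table.
frame = s.to_frame()

# Option 2: keep the Series, but let pandas treat East Asian characters
# as double-width while building the text representation.
with pd.option_context("display.unicode.east_asian_width", True):
    text = repr(s)

print(text)
```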

Parse Years in Python 3.4 Pandas and Bokeh from counter dictionary

I'm struggling with creating a Bokeh time series graph from the output of the counter function from collections.
import pandas as pd
from bokeh.plotting import figure, output_file, show
import collections
plotyears = []
counter = collections.Counter(plotyears)
output_file("years.html")
p = figure(width=800, height=250, x_axis_type="datetime")
for number in sorted(counter):
    yearvalue = number, counter[number]
    p.line(yearvalue, color='navy', alpha=0.5)
show(p)
The output of yearvalue when printed is:
(2013, 132)
(2014, 188)
(2015, 233)
How can I make Bokeh use the years as the x-axis and the counts as the y-axis? I have tried to follow the time series tutorial, but I can't use the pd.read_csv and parse_dates=['Date'] functionality since I'm not reading a CSV file.
The simple way is to convert your data into a pandas DataFrame (with pd.DataFrame) and then create a datetime column from your year column.
Simple example:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
output_notebook()
years = [2012,2013,2014,2015]
val = [230,120,200,340]
# Convert your data into a pandas DataFrame
data=pd.DataFrame({'year':years, 'value':val})
# Create a new column (yearDate) equal to the year Column but with a datetime format
data['yearDate']=pd.to_datetime(data['year'],format='%Y')
# Create a line graph with datetime x axis and use datetime column(yearDate) for this axis
p = figure(width=800, height=250, x_axis_type="datetime")
p.line(x=data['yearDate'],y=data['value'])
show(p)
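To start from the Counter in the question rather than from two hand-written lists, something along these lines should work. It is only a sketch: plotyears below is a made-up list, since the question does not show the real one.

```python
import collections

import pandas as pd

# Hypothetical input; the real plotyears list is not shown in the question.
plotyears = [2013, 2013, 2013, 2014, 2014, 2015]
counter = collections.Counter(plotyears)

# One row per (year, count) pair, sorted by year.
data = pd.DataFrame(sorted(counter.items()), columns=["year", "value"])

# Turn the integer years into datetimes (January 1st of each year).
data["yearDate"] = pd.to_datetime(data["year"].astype(str), format="%Y")

print(data)
```

From there, p.line(x=data['yearDate'], y=data['value']) on a figure created with x_axis_type="datetime" draws a single line through all the years, instead of one p.line call per year.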
