I have 6 graphs built with Plotly subplots, each showing a stock price and its simple moving average (20 days, for instance). For each subplot except one, I would like to shade the area red where the current price is below the simple moving average and green where it is above. As an optional extra, there would be no color where the price crosses over or under the average, so each colored area would start and end at a marker.
I understand I could use add_vrect to define the colored areas, but that seems suited to static data; here each subplot would need its own conditional coloring.
For clarity, I used only 3 stocks/graphs in my example (and the coloring would be on the second and third graphs):
import yfinance as yf
import pandas as pd
import numpy as np
from plotly.subplots import make_subplots
import plotly.graph_objects as go
tickers = ['AAPL', 'MSFT']
df = yf.download(tickers, period='1y', interval='1d', progress=False)['Close']
df_bench = yf.download('SPY', period='1y', interval='1d', progress=False)
df['AAPL_SMA'] = df['AAPL'].rolling(20).mean()
df['MSFT_SMA'] = df['MSFT'].rolling(20).mean()
fig = make_subplots(
    rows=1, cols=3,
    # one title per subplot, in column order (benchmark first)
    subplot_titles=('Benchmark', 'AAPL', 'MSFT')
)
fig.add_trace(
    go.Candlestick(x=df_bench.index,
                   open=df_bench['Open'],
                   high=df_bench['High'],
                   low=df_bench['Low'],
                   close=df_bench['Close'],
                   name='Benchmark'),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=df.index, y=df['AAPL'], name='AAPL', marker_color='blue'),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=df.index, y=df['AAPL_SMA'], name='AAPL_SMA', marker_color='red'),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=df.index, y=df['MSFT'], name='MSFT', marker_color='blue'),
    row=1, col=3
)
fig.add_trace(
    go.Scatter(x=df.index, y=df['MSFT_SMA'], name='MSFT_SMA', marker_color='red'),
    row=1, col=3
)
fig.update_layout(height=1200, width=2400, title_text='Dynamic coloring test',
                  showlegend=False, template='plotly_white',
                  hovermode='x unified', xaxis_rangeslider_visible=False)
fig.show()
I have added two signal columns to my dataframe df that flag whether the area should be red (-1) or green (1):
df['AAPL_Sig'] = np.where(df['AAPL'] < df['AAPL_SMA'], -1, 1)
df['MSFT_Sig'] = np.where(df['MSFT'] < df['MSFT_SMA'], -1, 1)
Now I am stuck and would appreciate some pointers on how to use add_vrect dynamically on only some of the subplots (or on whether a better function exists).
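One possible direction, sketched under assumptions: collapse each signal column into contiguous runs and draw one add_vrect per run, passing row and col so the shape lands on a single subplot. The helper name signal_runs and the COLOURS mapping are hypothetical, and the plotly call is shown only as a comment because it depends on the figure built above.

```python
from itertools import groupby


def signal_runs(x_values, signals):
    """Collapse a per-row signal into contiguous (x_start, x_end, signal) runs."""
    runs = []
    position = 0
    for value, group in groupby(signals):
        length = len(list(group))
        # a run spans from its first x value to its last x value
        runs.append((x_values[position], x_values[position + length - 1], value))
        position += length
    return runs


# Assumed mapping, matching the -1 / 1 values of the signal columns
COLOURS = {1: "green", -1: "red"}

# Toy data standing in for df.index and df['AAPL_Sig']
dates = ["d1", "d2", "d3", "d4", "d5", "d6"]
signal = [1, 1, -1, -1, -1, 1]
runs = signal_runs(dates, signal)
print(runs)

# Hypothetical use with the figure from the question (not executed here):
# for x0, x1, value in signal_runs(list(df.index), list(df['AAPL_Sig'])):
#     fig.add_vrect(x0=x0, x1=x1, fillcolor=COLOURS[value],
#                   opacity=0.2, line_width=0, row=1, col=2)
```

Because each rectangle spans only the first to the last point of a run, the interval between two runs (where the price crosses the average) stays uncolored, and a run of a single point has zero width. Note also that np.where marks the first 19 rows, where the 20-day average is NaN, as 1 (green), since a comparison with NaN is False; those rows may need to be dropped first.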
I have a plot made up of 3 choropleth subplots next to each other. I set the overall height and width to my desired dimensions (800 x 400 pixels). I want each subplot to go from top to bottom, but as it stands, the subplots retain the aspect ratio of 2:1, meaning I have wide margins at top and bottom. Those I want to remove.
As a minimum example, I am attaching the data and plot code:
The toy dataset:
import geopandas as gpd
from shapely.geometry.polygon import Polygon
minidf = gpd.GeoDataFrame(dict(
    krs_code = ["08111", "08118"],
    m_rugged = [42.795776, 37.324421],
    bip = [83747, 43122],
    cm3_over_1999 = [47.454688, 47.545940],
    geometry = [Polygon(((9.0397, 48.6873),
                         (9.0397, 48.8557),
                         (9.3152, 48.8557),
                         (9.3152, 48.6873),
                         (9.0397, 48.6873))),
                Polygon(((8.8757, 48.7536),
                         (8.8757, 49.0643),
                         (9.4167, 49.0643),
                         (9.4167, 48.7536),
                         (8.8757, 48.7536)))]
)).set_index("krs_code")
The plotting code:
import json
from plotly.subplots import make_subplots
import plotly.graph_objects as go
fig = make_subplots(rows = 1, cols = 3,
                    specs = [[{"type": "choropleth"}, {"type": "choropleth"}, {"type": "choropleth"}]],
                    horizontal_spacing = 0.0025 )
fig.update_layout(height = 400, width = 800,
                  margin = dict(t=0, r=0, b=0, l=0),
                  coloraxis_showscale=False )
for i, column in enumerate(["m_rugged", "cm3_over_1999", "bip"]):
    fig.add_trace(
        go.Choropleth(
            locations = minidf.index,
            z = minidf[column].astype(float), # Data to be color-coded
            geojson = json.loads(minidf[["geometry"]].to_json()),
            showscale = False
        ),
        col = i+1, row = 1)
fig.update_geos(fitbounds="locations", visible=True)
fig.show()
Notice the margins at top and bottom, which retain the aspect ratio of each subplot, while they are supposed to stretch from top to bottom:
I tried several parameters within go.Choropleth() and .update_layout(), but to no avail.
I would like to know how I can create a gridded map of a country (e.g. Singapore) with a resolution of 200 m x 200 m squares (50 m or 100 m is OK too).
I would then use the 'nearest neighbour' technique to assign rainfall data and a colour code to each square based on the nearest rainfall station's data.
[I have the latitude, longitude & rainfall data for all the stations for each date.]
Then, I would like to store the data in an array for each 'gridded map', one per date (e.g. from 1-Jan-1980 to 31-Dec-2021).
Can this be done using python?
P.S. Below is a 'simple' version I made as an example of how the 'gridded' map should look for one particular day.
https://i.stack.imgur.com/9vIeQ.png
Thank you so much!
Can this be done using python? YES
I have previously provided a similar answer, binning spatial dataframe. Refer to that as well for the concepts.
You have noted that you are working with Singapore geometry and rainfall data. To set up an answer, I have sourced this data from government sources.
For the purposes of this answer I have used a 2 km x 2 km grid, which reduces resource use when plotting to demonstrate the answer.
Core concept: create a grid of box polygons that cover the total bounds of the geometry. Note that it's important to use a UTM CRS here so that bounds in meters make sense. Once the boxes are created, remove those that are within the total bounds but do not intersect the actual geometry.
Next, create a geopandas dataframe of rainfall data, using the longitude and latitude of each weather station to create points.
Final step: sjoin_nearest() the grid geometry with the rainfall data geometry and data.
Clearly this final data frame, gdf_grid_rainfall, is a data frame, which is effectively an array. You can use it as an array as you please ...
I have provided folium and plotly interactive visualisations that demonstrate clearly that the solution is working.
solution
Dependent on data sourcing
# number of meters
STEP = 2000
a, b, c, d = gdf_sg.to_crs(gdf_sg.estimate_utm_crs()).total_bounds
# create a grid for Singapore
gdf_grid = gpd.GeoDataFrame(
    geometry=[
        shapely.geometry.box(minx, miny, maxx, maxy)
        for minx, maxx in zip(np.arange(a, c, STEP), np.arange(a, c, STEP)[1:])
        for miny, maxy in zip(np.arange(b, d, STEP), np.arange(b, d, STEP)[1:])
    ],
    crs=gdf_sg.estimate_utm_crs(),
).to_crs(gdf_sg.crs)
# restrict grid to only squares that intersect with Singapore geometry
gdf_grid = (
    gdf_grid.sjoin(gdf_sg)
    .pipe(lambda d: d.groupby(d.index).first())
    .set_crs(gdf_grid.crs)
    .drop(columns=["index_right"])
)
# geodataframe of weather station locations and rainfall by date
gdf_rainfall = gpd.GeoDataFrame(
    df_stations.merge(df, on="id")
    .assign(
        geometry=lambda d: gpd.points_from_xy(
            d["location.longitude"], d["location.latitude"]
        )
    )
    .drop(columns=["location.latitude", "location.longitude"]),
    crs=gdf_sg.crs,
)
# weather station to nearest grid
gdf_grid_rainfall = gpd.sjoin_nearest(gdf_grid, gdf_rainfall).drop(
    columns=["Description", "index_right"]
)
# does it work? let's visualize with folium
gdf_grid_rainfall.loc[lambda d: d["Date"].eq("20220622")].explore("Rainfall (mm)", height=400, width=600)
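A side note on the grid step, offered as a sketch rather than a correction: pairing np.arange(a, c, STEP) with its own [1:] slice stops at the last full step, so a strip up to STEP wide along the upper x and y edges of the bounds can end up without cells. The hypothetical helper grid_cells below (not part of the answer's code) rounds the cell count up instead, so the cells always reach the upper bounds. Each tuple is in (minx, miny, maxx, maxy) order, the same order shapely.geometry.box takes.

```python
import math


def grid_cells(minx, miny, maxx, maxy, step):
    """Return (x0, y0, x1, y1) square cells of side `step` that cover the bounds."""
    nx = math.ceil((maxx - minx) / step)  # round up so the last column reaches maxx
    ny = math.ceil((maxy - miny) / step)  # round up so the last row reaches maxy
    return [
        (minx + i * step, miny + j * step,
         minx + (i + 1) * step, miny + (j + 1) * step)
        for i in range(nx)
        for j in range(ny)
    ]


# Toy bounds: 5 units wide, 3 units tall, cells of side 2
cells = grid_cells(0, 0, 5, 3, 2)
print(len(cells))
print(cells[0], cells[-1])
```

The cells that overshoot the bounds are harmless here, because the later sjoin with the country geometry drops every cell that does not intersect it.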
data sourcing
import requests, itertools, io
from pathlib import Path
import urllib
from zipfile import ZipFile
import fiona.drvsupport
import geopandas as gpd
import numpy as np
import pandas as pd
import shapely.geometry
# get official Singapore planning area geometry
url = "https://geo.data.gov.sg/planning-area-census2010/2014/04/14/kml/planning-area-census2010.zip"
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
if not f.exists():
    r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
    with open(f, "wb") as fd:
        for chunk in r.iter_content(chunk_size=128):
            fd.write(chunk)
    zfile = ZipFile(f)
    zfile.extractall(f.stem)
fiona.drvsupport.supported_drivers['KML'] = 'rw'
gdf_sg = gpd.read_file(
    [_ for _ in Path.cwd().joinpath(f.stem).glob("*.kml")][0], driver="KML"
)
# get data about Singapore weather stations
df_stations = pd.json_normalize(
    requests.get("https://api.data.gov.sg/v1/environment/rainfall").json()["metadata"][
        "stations"
    ]
)
# dates to get data from weather.gov.sg
dates = pd.date_range("20220601", "20220730", freq="MS").strftime("%Y%m")
df = pd.DataFrame()
# fmt: off
bad = ['S100', 'S201', 'S202', 'S203', 'S204', 'S205', 'S207', 'S208',
'S209', 'S211', 'S212', 'S213', 'S214', 'S215', 'S216', 'S217',
'S218', 'S219', 'S220', 'S221', 'S222', 'S223', 'S224', 'S226',
'S227', 'S228', 'S229', 'S230', 'S900']
# fmt: on
for stat, month in itertools.product(df_stations["id"], dates):
    if stat not in bad:
        try:
            df_ = pd.read_csv(
                io.StringIO(
                    requests.get(
                        f"http://www.weather.gov.sg/files/dailydata/DAILYDATA_{stat}_{month}.csv"
                    ).text
                )
            ).iloc[:, 0:5]
        except pd.errors.ParserError as e:
            bad.append(stat)
            print(f"failed {stat} {month}")
            # skip the concat so a stale df_ is not stored under this station
            continue
        df = pd.concat([df, df_.assign(id=stat)])
df["Rainfall (mm)"] = pd.to_numeric(
    df["Daily Rainfall Total (mm)"], errors="coerce"
)
df["Date"] = pd.to_datetime(df[["Year","Month","Day"]]).dt.strftime("%Y%m%d")
df = df.loc[:,["id","Date","Rainfall (mm)", "Station"]]
visualisation using plotly animation
import plotly.express as px
# reduce dates so figure builds in sensible time
gdf_px = gdf_grid_rainfall.loc[
    lambda d: d["Date"].isin(
        gdf_grid_rainfall["Date"].value_counts().sort_index().index[0:15]
    )
]
px.choropleth_mapbox(
    gdf_px,
    geojson=gdf_px.geometry,
    locations=gdf_px.index,
    color="Rainfall (mm)",
    hover_data=gdf_px.columns[1:].tolist(),
    animation_frame="Date",
    mapbox_style="carto-positron",
    center={"lat":gdf_px.unary_union.centroid.y, "lon":gdf_px.unary_union.centroid.x},
    zoom=8.5
).update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0, "pad": 4})
I'm trying to plot around 300 users and how many purchases they have made. My data is in a pandas dataframe, where the column 'ID' refers to a user and 'Number' to the number of purchases.
So far I have tried the following code, which I found, but I never manage to get all the IDs on one plot.
This is the code:
import random
import matplotlib.pyplot as plt
# Prepare Data
n = subs_['Number'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)
# Plot Bars
plt.figure(figsize=(16,10), dpi= 60)
plt.bar(subs_['ID'], subs_['Number'], color=c, width=.5)
for i, val in enumerate(subs_['Number'].values):
    plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':10})
# Decoration
plt.gca().set_xticklabels(subs_['ID'], rotation=60, horizontalalignment= 'right')
plt.title("Number of purchases by user", fontsize=22)
plt.ylabel('# Purchases')
plt.ylim(0, 45)
plt.show()
bar chart of user purchases:
I think that your problem is coming from your IDE:
import random
import matplotlib.pyplot as plt
import pandas as pd
# Prepare Data
d = {'ID': range(1, 300), 'Number': range(1, 300)}
subs_ = pd.DataFrame(data=d)
n = subs_['Number'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)
# Plot Bars
plt.figure(figsize=(16,10), dpi= 60)
plt.bar(subs_['ID'], subs_['Number'], color=c, width=.5)
for i, val in enumerate(subs_['Number'].values):
    plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':10})
# Decoration
plt.gca().set_xticklabels(subs_['ID'], rotation=60, horizontalalignment= 'right')
plt.title("Number of purchases by user", fontsize=22)
plt.ylabel('# Purchases')
plt.ylim(0, 45)
plt.show()
This works fine for me:
I am new to coding in Python and am trying to make a bar chart with a percentage on top of each bar. I have a sample data frame, Quiz2. The code I wrote shows only 1600% above the first bar. Could anyone help me do this correctly?
#Approach 2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
%matplotlib inline
Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data=Quiz2.rename(index={0: "Male", 1: "Female"})
data=data.astype(float)
Q1p = data[['Kaha','Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height:.0%}', (x + width/2, y + height*1.02), ha='center')
    plt.show()
I want the percentages for Kaha (total of 21) to appear as 76.2% for Male and 23.8% for Female, and those for Shiny (total of 26) as 61.5% for Male and 38.5% for Female. Any help would be appreciated.
In approach 2, only one value is displayed because plt.show() is inside the for loop; it should be outdented so that it runs after the loop has finished. You are getting 1600% because the raw bar height is formatted as a percentage in the line beginning with Q1p.annotate(f'{height:.0%}'. Instead of height, this should be height divided by the appropriate total to give the percentage.
Here is a solution, but not sure if I am computing the percentages correctly:
Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data=Quiz2.rename(index={0: "Male", 1: "Female"})
data=data.astype(float)
total = len(data)*10
Q1p = data[['Kaha','Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height/total:.0%}', (x + width/2, y + height*1.02), ha='center')
plt.show()
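If the goal is the per-column share the question asks for (Kaha out of 21, Shiny out of 26), one way to get the labels is to divide each value by its own column total. This is a minimal sketch, not the code above: the helper name percent_labels is hypothetical, and it assumes that the pandas bar plot lists its patches column by column (all Kaha bars first, then all Shiny bars).

```python
def percent_labels(columns):
    """Return one label per bar, column by column, as a share of its column total."""
    labels = []
    for values in columns.values():
        total = sum(values)  # e.g. 16 + 5 = 21 for Kaha
        labels.extend(f"{value / total:.1%}" for value in values)
    return labels


# Same numbers as Quiz2; within each list the order is Male, Female
counts = {"Kaha": [16, 5], "Shiny": [16, 10]}
labels = percent_labels(counts)
print(labels)  # ['76.2%', '23.8%', '61.5%', '38.5%']

# Hypothetical use with the axes from the code above (not executed here):
# for p, label in zip(Q1p.patches, labels):
#     Q1p.annotate(label, (p.get_x() + p.get_width() / 2, p.get_height() * 1.02),
#                  ha='center')
```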
I'd like to adjust the colour scheme of this boxplot so that the group on the left is dark and light blue, and the group on the right is dark and light red. I've made the colours I want in my_colours, but I still can't figure out how to apply them. Here's the code for the data:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
a1 = list(np.random.normal(.70, .20, 20))
a2 = list(np.random.normal(.5, .15, 20))
b1 = list(np.random.normal(.78, .20, 20))
b2 = list(np.random.normal(.4, .25, 20))
levsA = ['a' for i in range(40)]
levsB = ['b' for i in range(40)]
itemsa = [1 for i in range(20)] + [2 for i in range(20)]
itemsb = [1 for i in range(20)] + [2 for i in range(20)]
df = pd.DataFrame({'cs':a1 + a2 + b1+ b2,
                   'levels':levsA + levsB,
                   'type':itemsa + itemsb})
my_colours = ((0.1216, 0.4667, 0.7059),
              (0.8392, 0.1529, 0.1569),
              (0.6824, 0.7804, 0.9098),
              (1, 0.5961, 0.5882))
sns.set_palette(my_colours)
sns.boxplot(x='type', y='cs', hue='levels', data=df)
I would like them in this order:
The boxes are PathPatches. You may loop over them and set their color. One would need to pay attention to the order they appear in the axes though.
import matplotlib.patches
ax = plt.gca()  # assumes the boxplot above was drawn on the current axes
boxes = ax.findobj(matplotlib.patches.PathPatch)
for color, box in zip(my_colours[::2]+my_colours[1::2], boxes):
    box.set_facecolor(color)
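To make the colour ordering concrete, here is a small sketch of what the slicing expression produces. The colour names in the comments are my own labels for the question's tuples. Whether this order lines up with the boxes depends on the order in which seaborn adds them to the axes, which may differ between versions, so it is worth checking on the actual plot.

```python
my_colours = ((0.1216, 0.4667, 0.7059),  # dark blue
              (0.8392, 0.1529, 0.1569),  # dark red
              (0.6824, 0.7804, 0.9098),  # light blue
              (1, 0.5961, 0.5882))       # light red

# Even positions (the two blues) followed by odd positions (the two reds)
ordered = my_colours[::2] + my_colours[1::2]
for colour in ordered:
    print(colour)
```

The result is dark blue, light blue, dark red, light red, which is the sequence zipped against the boxes above.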