How to get error bars on a bar chart in Power BI? - statistics

I want a bar chart like this:
The error bar on each column should show the dispersion (I have it calculated in one of the columns), and the lines on top should show whether there is a significant difference. So far I have only managed to get this chart:
I am using a simple clustered bar chart in Power BI Desktop. Is there another visual for this, or another program that could do it? Maybe Python somehow?

As mentioned here, you can do that with matplotlib in Python. Just as an example:
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(1000)
# Histogram counts and bin edges
y, binEdges = np.histogram(data, bins=10)
bincenters = 0.5 * (binEdges[1:] + binEdges[:-1])
# Poisson-style uncertainty on each count: sqrt(N)
yerr = np.sqrt(y)
width = 0.05
plt.bar(bincenters, y, width=width, color='r', yerr=yerr)
plt.show()
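The histogram above does not quite match the chart in the question, which has one bar per group, an error bar taken from an existing dispersion column, and a bracket marking a significant difference. A minimal matplotlib sketch of that layout follows, under a few assumptions: the DataFrame, its column names (group, mean, sd) and the helpers bar_with_errors and add_significance are hypothetical, and the bracket is drawn wherever you tell it to, because the statistical test itself is not computed here. If you run this inside Power BI Desktop's Python visual, the selected fields arrive as a pandas DataFrame named dataset, which would take the place of data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def bar_with_errors(df, group_col, value_col, err_col):
    """Draw one bar per row, with error bars taken from a precomputed dispersion column."""
    x = np.arange(len(df))
    fig, ax = plt.subplots()
    bars = ax.bar(x, df[value_col], yerr=df[err_col], capsize=6, color="tab:blue")
    ax.set_xticks(x)
    ax.set_xticklabels(df[group_col])
    return fig, ax, bars


def add_significance(ax, x1, x2, y, h, label="*"):
    """Draw a bracket from bar x1 to bar x2 starting at height y, rising by h, with a label on top."""
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], color="black", linewidth=1)
    ax.text((x1 + x2) / 2, y + h, label, ha="center", va="bottom")


# Hypothetical data: one row per bar, with the dispersion already computed
data = pd.DataFrame({
    "group": ["A", "B", "C"],
    "mean": [4.0, 6.5, 5.0],
    "sd": [0.5, 0.8, 0.6],
})

fig, ax, bars = bar_with_errors(data, "group", "mean", "sd")
top = (data["mean"] + data["sd"]).max()  # highest error-bar tip
add_significance(ax, 0, 1, top + 0.3, 0.2, "*")  # mark A vs B (result of your own test)
```

Call plt.show() to display it; further brackets can be stacked by calling add_significance again with a larger y.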

Related

How to add a regression line for the entire data in seaborn.lmplot?

I'm trying to make a scatter plot in which each point is colored according to the variable Points. I also want to add a regression line.
import pandas as pd
import urllib3
import seaborn as sns
decathlon = pd.read_csv("https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/decathlon.txt", sep='\t')
g = sns.lmplot(
data = decathlon,
x="100m", y="Long.jump",
hue = 'Points', palette = 'viridis'
)
It seems that there are two regression lines, one for each group of the data. This is not what I want: I would like a single regression line for the entire data set. Also, how can I hide the legend on the right-hand side?
Could you please elaborate on how to do so?
You should not use lmplot unless you need a FacetGrid to split your dataset into several subplots.
Since your example does not use any of the functionality provided by FacetGrid, you should instead build the plot from a combination of scatterplot() and regplot():
import seaborn as sns
tips = sns.load_dataset('tips')
# Points colored by group, then a single regression fitted to all points
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
sns.regplot(data=tips, x="total_bill", y="tip", scatter=False, ax=ax)
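To also hide the legend with the seaborn approach, pass legend=False to sns.scatterplot. If you would rather see the idea without seaborn, here is a plain matplotlib sketch: points are colored by a numeric column through a colormap (with a colorbar instead of a legend), and np.polyfit is fitted once over all rows, so there is exactly one line however the points are colored. The data is a synthetic stand-in for the decathlon file and scatter_with_single_fit is a hypothetical helper; unlike regplot, it draws no confidence band.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_with_single_fit(df, x, y, color_by):
    """Scatter colored by a numeric column, plus ONE least-squares line fitted to all rows."""
    fig, ax = plt.subplots()
    sc = ax.scatter(df[x], df[y], c=df[color_by], cmap="viridis")
    fig.colorbar(sc, ax=ax, label=color_by)  # a colorbar instead of a legend
    slope, intercept = np.polyfit(df[x], df[y], deg=1)
    xs = np.linspace(df[x].min(), df[x].max(), 100)
    ax.plot(xs, slope * xs + intercept, color="black")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return fig, ax, (slope, intercept)


# Hypothetical stand-in for the decathlon columns
rng = np.random.default_rng(0)
demo = pd.DataFrame({"100m": rng.uniform(10.5, 11.5, 50)})
demo["Long.jump"] = 20 - 1.2 * demo["100m"] + rng.normal(0, 0.1, 50)
demo["Points"] = rng.integers(7500, 8800, 50)

fig, ax, (slope, intercept) = scatter_with_single_fit(demo, "100m", "Long.jump", "Points")
```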

Using "hue" for a Seaborn visual: how to get legend in one graph?

I created a scatter plot in seaborn using seaborn.relplot, but I'm having trouble getting the plot and its legend into a single graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is a single graph where I can specify the figure dimensions and have the legend at the bottom, in two rows below the x-axis (if that is too difficult or not supported, the default legend position on the right of the same graph would work too). I know the problem lies with "ax=ax" and the way I specify the dimensions as a matplotlib figure, but I'd like to understand specifically why this causes a problem so I can learn from it.
Thank you for your time.
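The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page), so it always creates a figure of its own; passing ax does not prevent that, which is why you get a second figure holding the legend. If you want to keep relplot, set its size with the height and aspect parameters instead of passing ax. sns.scatterplot (the default kind of plot drawn by sns.relplot) is axes-level and accepts ax, so with it your code works (changed to use reproducible data):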
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize=(5, 5))
sns.scatterplot(x='Sepal.Length', y='Sepal.Width',
                hue='Species', legend='brief',
                ax=ax, data=df)
plt.show()
Further edits to the legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which (depending on your seaborn version) is actually a legend entry, by getting and slicing the handles and labels; check labels[0] first, because where the title is already a real legend title, slicing would drop your first group instead
Set a new title that is actually a title
Move the legend and use bbox_to_anchor to place it outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
hue = 'Species', legend = 'brief',
ax=ax, data = df)
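handles, labels = ax.get_legend_handles_labels()
# Drop the pseudo-title entry, add a real title, and center the legend below the axes
# (loc=8 means 'lower center': the legend's bottom edge sits at y=-0.3 in axes coordinates)
ax.legend(handles=handles[1:], labels=labels[1:], title='Species', loc=8,
          ncol=2, bbox_to_anchor=[0.5, -.3, 0, 0])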
plt.show()
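For the layout the question asked for (one figure of a chosen size, legend below the x-axis in two rows), an axes-level approach is enough. This plain matplotlib sketch draws one scatter series per group on a single Axes and anchors the legend under it; the helper name, the random group data and the offsets (-0.12 for the legend, bottom=0.25 for the margin) are assumptions to tweak for your figure size. With sns.scatterplot(..., ax=ax), the same ax.legend(...) call applies.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_legend_below(groups, figsize=(5, 5), ncol=2):
    """Scatter one series per group on a single Axes and put the legend under the x-axis.

    groups: dict mapping legend label -> (x, y) arrays.
    """
    fig, ax = plt.subplots(figsize=figsize)  # figure size fixed up front
    for label, (x, y) in groups.items():
        ax.scatter(x, y, label=label)
    # Anchor the legend's top-center just below the axes; ncol=2 gives two rows for 3-4 groups
    leg = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.12), ncol=ncol, frameon=False)
    fig.subplots_adjust(bottom=0.25)  # leave room for the legend inside the figure
    return fig, ax, leg


# Hypothetical groups standing in for the question's "outcome" categories
rng = np.random.default_rng(1)
groups = {name: (rng.normal(size=20), rng.normal(size=20)) for name in ["A", "B", "C"]}
fig, ax, leg = scatter_legend_below(groups)
```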

Plotting d orbital diagrams using matplotlib (or seaborn)

Hi, I'm a chemist and I've finished an experiment that gave me the energies of a metal's d orbitals.
It is relatively easy to get the correct energy proportions in Excel (1) and then use a drawing program like Inkscape to draw the molecular orbital diagram (as I did for the one below (2)), but I'd love to use Python to get a beautiful diagram that reflects the actual energies of my orbitals, like the ones we see in textbooks.
My first attempt, using seaborn's swarmplot, is obviously far from the correct approach and maybe (probably!) not the right way to get there. I'd be more than happy to achieve something like the right-hand side of (3).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["dz2", "dxy", "dyz", "dx2y2", "dxz"]
df = pd.DataFrame(Energies)
df["Orbitals"] = pd.DataFrame(orbitals)
sns.swarmplot(y=df[0], size=16)
Thanks for any help.
(1) The Excel version
(2) Drawn by hand, using the Excel version as the model
(3) Taken from the literature
You can draw anything you like from the basic shapes and functions in matplotlib. Energy levels can be simple markers, and the labels can be produced with annotate.
import numpy as np
import matplotlib.pyplot as plt
Energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["$d_{z^2}$", "$d_{xy}$", "$d_{yz}$", "$d_{x^2 - y^2}$", "$d_{xz}$"]
x = np.arange(len(Energies))
fig, ax = plt.subplots()
ax.scatter(x, Energies, s=1444, marker="_", linewidth=3, zorder=3)
ax.grid(axis='y')
for xi, yi, tx in zip(x, Energies, orbitals):
    ax.annotate(tx, xy=(xi, yi), xytext=(0, -4), size=18,
                ha="center", va="top", textcoords="offset points")
ax.margins(0.2)
plt.show()
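Textbook diagrams usually draw (near-)degenerate orbitals side by side at their own energy rather than spreading every level evenly along x. A sketch of that, building on the answer's approach: levels within a tolerance of the previous one are chained into a group, and each level is drawn as a short ax.hlines segment, with each group centered on x = 0. The tolerance tol=0.03 (same units as the energies) is an arbitrary illustrative threshold, not a chemical criterion, and group_levels and level_diagram are hypothetical helpers.

```python
import matplotlib.pyplot as plt


def group_levels(energies, tol):
    """Sort levels by energy and chain together those within tol of the previous one."""
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    groups = []
    for i in order:
        if groups and energies[i] - energies[groups[-1][-1]] <= tol:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def level_diagram(energies, labels, tol, width=0.8, gap=0.2):
    """Draw each level as a short horizontal bar; near-degenerate levels sit side by side."""
    fig, ax = plt.subplots()
    for group in group_levels(energies, tol):
        n = len(group)
        start = -(n * width + (n - 1) * gap) / 2  # center the group on x = 0
        for k, i in enumerate(group):
            x0 = start + k * (width + gap)
            ax.hlines(energies[i], x0, x0 + width, linewidth=3)
            ax.annotate(labels[i], xy=(x0 + width / 2, energies[i]), xytext=(0, -4),
                        textcoords="offset points", ha="center", va="top")
    ax.set_xlim(-2, 2)
    ax.set_xticks([])
    ax.set_ylabel("Energy")
    return fig, ax


Energies = [-0.40008, -0.39583, -0.38466, -0.23478, -0.21239]
orbitals = ["$d_{z^2}$", "$d_{xy}$", "$d_{yz}$", "$d_{x^2 - y^2}$", "$d_{xz}$"]
fig, ax = level_diagram(Energies, orbitals, tol=0.03)
```

With tol=0.03 the five levels split into a group of three ($d_{z^2}$, $d_{xy}$, $d_{yz}$) and a pair ($d_{x^2 - y^2}$, $d_{xz}$); a smaller tol separates them further.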

Generating different marker shapes in plotly/cufflinks

This post is similar to this one (Change Marker Shapes in Plotly .js), but I can't seem to get anything to work in Python. I am trying to make a multi-line graph (which I have done in both plt and plotly; code below). Being colorblind, I often can't tell what I am looking at in plotly, because the markers are always circles. The label is included, but it sometimes gets cut off (e.g., when the labels are too long), and then I can't figure out which line is which. The plotly/cufflinks graphs are much better in terms of interactivity, and since I do a lot of data presentations, this will be my preferred method going forward if I can figure out how to change the marker for each line.
I am using Jupyter Notebook (version: 5.4.0) and Python (version 3.6.4)
Screenshot of the dummy_data file.
In matplotlib, I did the following to get the output attached (note the different shape markers):
import matplotlib.pyplot as plt
import matplotlib as mpl ##(version: 2.1.2)
import pandas as pd ##(version: 0.22.0)
import numpy as np ##(version: 1.14.0)
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs,init_notebook_mode,plot,iplot
import cufflinks as cf ##(version: 0.12.1)
init_notebook_mode(connected=True)
cf.go_offline()
%matplotlib notebook
df = pd.read_csv(r"desktop\dummy_data.csv")
fx = df.groupby(['studyarm', 'visit'])\
    ['totdiffic_chg'].mean().unstack('studyarm').drop(['02_UNSCH','ZEOS'])
valid_markers = ([item[0] for item in
                  mpl.markers.MarkerStyle.markers.items() if
                  item[1] != 'nothing' and not item[1].startswith('tick')
                  and not item[1].startswith('caret')])
# one distinct marker per plotted line (one line per studyarm column of fx)
markers = np.random.choice(valid_markers, fx.shape[1], replace=False)
ax = fx.plot(kind='line', linestyle='-')
for i, line in enumerate(ax.get_lines()):
    line.set_marker(markers[i])
ax.legend(loc='best')
ax.set_xticks(range(len(fx.index)))
ax.set_xticklabels(fx.index, rotation=45)
plt.title('Some Made Up Data')
plt.ylabel('Score', fontsize=14)
plt.autoscale(enable=True, axis='x', tight=True)
plt.tight_layout()
I used the code below and it created the graph via plotly/cufflinks:
fx.iplot(kind='line', yTitle='Score', title='Some Made Up Data',
mode=markers, filename='cufflinks/simple-line')
I have searched the web for the last few days and I can see many options to change the marker color, opacity, etc., but I can't seem to figure out a way to automatically and randomly change the shape of the markers OR to manually give each line its own marker shape.
I am sure this is a simple fix, but I can't figure it out. Any help (or nudge in the right direction) would be very much appreciated!
You can specify the marker shape for scatter plots using the symbol property, like below:
go.Scatter(x=..., y=..., mode='lines+markers',
           marker=dict(size=10, symbol=1, ...))
For example:
0 gives circles
1 gives squares
3 gives '+' signs
5 gives triangles, etc.
Have a look at the 'symbol' entry in Plotly's reference docs here: https://plot.ly/python/reference/#box-marker-symbol

Matplotlib: personalize imshow axis

I have the results of a (H,ranges) = numpy.histogram2d() computation and I'm trying to plot it.
Given H I can easily put it into plt.imshow(H) to get the corresponding image. (see http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow )
My problem is that the axes of the produced image show the cell indices of H and are completely unrelated to the values in ranges.
I know I can use the keyword extent (as pointed out in: Change values on matplotlib imshow() graph axis). But this solution does not work for me: my range values do not grow linearly (they actually grow exponentially).
My question is: how can I put the values of ranges into plt.imshow()? Or, at least, can I manually set the tick labels of the plt.imshow result?
Editing the extent is not a good solution.
You can just change the tick labels to something more appropriate for your data.
For example, here we label every 5th pixel with the value of an exponential function:
import numpy as np
import matplotlib.pyplot as plt
im = np.random.rand(21,21)
fig,(ax1,ax2) = plt.subplots(1,2)
ax1.imshow(im)
ax2.imshow(im)
# Where we want the ticks, in pixel locations
ticks = np.linspace(0,20,5)
# What those pixel locations correspond to in data coordinates.
# Also set the float format here
ticklabels = ["{:6.2f}".format(i) for i in np.exp(ticks/5)]
ax2.set_xticks(ticks)
ax2.set_xticklabels(ticklabels)
ax2.set_yticks(ticks)
ax2.set_yticklabels(ticklabels)
plt.show()
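If you want to keep imshow, the tick labels can come from the actual bin edges returned by numpy.histogram2d instead of a formula. Because imshow centers pixel i at coordinate i, bin edge j sits at j - 0.5, and that is where each label is placed in this sketch. The helpers edge_ticks and imshow_with_edge_labels, the lognormal sample and the 10 logarithmic bins are hypothetical; the image is transposed because histogram2d puts x along the first axis of H.

```python
import numpy as np
import matplotlib.pyplot as plt


def edge_ticks(edges, nticks=5, fmt="{:.3g}"):
    """Pick nticks bin edges and return (pixel positions, labels) for an imshow axis.

    imshow centers pixel i at coordinate i, so bin edge j sits at j - 0.5.
    """
    edges = np.asarray(edges)
    idx = np.linspace(0, len(edges) - 1, nticks).round().astype(int)
    return idx - 0.5, [fmt.format(v) for v in edges[idx]]


def imshow_with_edge_labels(H, xedges, yedges, nticks=5):
    fig, ax = plt.subplots()
    # histogram2d puts x along the first axis of H; imshow wants rows = y, columns = x
    ax.imshow(H.T, origin="lower")
    xpos, xlab = edge_ticks(xedges, nticks)
    ypos, ylab = edge_ticks(yedges, nticks)
    ax.set_xticks(xpos)
    ax.set_xticklabels(xlab)
    ax.set_yticks(ypos)
    ax.set_yticklabels(ylab)
    return fig, ax


rng = np.random.default_rng(0)
x, y = rng.lognormal(1.0, 1.0, (2, 5000))  # hypothetical data
bins = np.logspace(-1, 3, 11)  # hypothetical: 10 exponentially growing bins
H, xedges, yedges = np.histogram2d(x, y, bins=[bins, bins])
fig, ax = imshow_with_edge_labels(H, xedges, yedges)
```

The pixels stay equally sized, so this only relabels the axes; anything else you over-plot still lives in pixel coordinates.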
Expanding a bit on @thomas's answer:
import numpy as np
import matplotlib.pyplot as plt
im = np.random.rand(20, 20)
# 21 exponentially spaced edges bound a 20x20 grid of cells
ticks = np.exp(np.linspace(0, 10, 21))
fig, ax = plt.subplots()
ax.pcolor(ticks, ticks, im, cmap='viridis')
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlim([1, np.exp(10)])
ax.set_ylim([1, np.exp(10)])
plt.show()
By letting mpl take care of the non-linear mapping you can now accurately over-plot other artists. There is a performance hit for this (as pcolor is more expensive to draw than AxesImage), but getting accurate ticks is worth it.
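imshow is for displaying images, so it does not support x and y bins.
You could use pcolor instead (note the transpose: histogram2d puts x along the first axis of H),
H, xedges, yedges = np.histogram2d(x, y, bins=bins)  # x, y, bins: your data and bin edges
plt.pcolor(xedges, yedges, H.T)
or use plt.hist2d, which computes and plots the histogram directly.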
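A fuller sketch of the pcolor suggestion, using the actual outputs of histogram2d with exponentially growing bins. It uses pcolormesh, which matplotlib's documentation recommends over pcolor for speed on rectilinear grids, and log-scaled axes so that each cell lands at its true position. The lognormal data and the bin choice are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.lognormal(1.0, 1.0, (2, 5000))  # hypothetical data
bins = np.logspace(-1, 3, 21)  # 20 exponentially growing bins per axis

H, xedges, yedges = np.histogram2d(x, y, bins=[bins, bins])

fig, ax = plt.subplots()
# H[i, j] counts x-bin i and y-bin j; pcolormesh wants rows = y, so pass H.T
mesh = ax.pcolormesh(xedges, yedges, H.T)
ax.set_xscale("log")
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="counts")
```

plt.hist2d(x, y, bins=[bins, bins]) followed by the same set_xscale/set_yscale calls should give a similar picture in one step.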
