I have two data-frames in python.
data_A
Name X Y
A 1 0
B 1 1
C 0 0
data_B
Name X Y
A 0 1
B 1 1
C 0 1
I would like to overlap these heatmaps, where if it is a 1 in data_frame A, then the tile is colored purple (or any color), but if it's a 1 in data_frame B, then a circle is drawn (preferably the first one).
So for example, the heatmap would show A[,X][1] colored purple, but those with 1 in both data frames would be purple with a dot. C[,Y][3] would have just a dot, while C[,X][3] would have nothing.
I can mask with seaborn and plot two heatmaps with different colors, but the color difference isn't clear enough for a user to see at a glance whether a tile is positive in only one matrix or in both. I think having a circle to denote a positive in one matrix would be better.
Does anyone have an idea of how to plot circles onto a heatmap using seaborn?
To show a heatmap you may use an imshow plot. To show some dots, you may use a scatter plot. Then just plot both in the same axes.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
dfA = pd.DataFrame([[1,0],[1,1],[0,0]], columns=list("XY"), index=list("ABC"))
dfB = pd.DataFrame([[0,1],[1,1],[0,1]], columns=list("XY"), index=list("ABC"))
assert dfA.shape == dfB.shape
x = np.arange(0,len(dfA.columns))
y = np.arange(0,len(dfB.index))
X,Y=np.meshgrid(x,y)
fig, ax = plt.subplots(figsize=(2.6,3))
ax.invert_yaxis()
ax.imshow(dfA.values, aspect="auto", cmap="Purples")
cond = dfB.values == 1
ax.scatter(X[cond], Y[cond], c="crimson", s=100)
ax.set_xticks(x)
ax.set_yticks(y)
ax.set_xticklabels(dfA.columns)
ax.set_yticklabels(dfA.index)
plt.show()
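Since the question asks about seaborn specifically, here is a minimal sketch of the same idea with sns.heatmap instead of imshow. It assumes the same dfA and dfB as above. The one difference to watch for is that seaborn draws each cell from n to n+1, so the cell centres sit at half-integer positions and the dots need a 0.5 offset. The variable names rows, cols and dots are only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

dfA = pd.DataFrame([[1, 0], [1, 1], [0, 0]], columns=list("XY"), index=list("ABC"))
dfB = pd.DataFrame([[0, 1], [1, 1], [0, 1]], columns=list("XY"), index=list("ABC"))

fig, ax = plt.subplots(figsize=(2.6, 3))
# Colored tiles for dfA; seaborn labels the ticks from the index and columns.
sns.heatmap(dfA, cmap="Purples", cbar=False, linewidths=0.5, ax=ax)

# Row and column positions of the 1s in dfB.
rows, cols = np.nonzero(dfB.values == 1)
# Heatmap cell (row, col) is centred at (col + 0.5, row + 0.5).
dots = ax.scatter(cols + 0.5, rows + 0.5, c="crimson", s=100)
```

Call plt.show() afterwards to display the figure.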
Alternatives to using a dot to show several datasets on the same heatmap are discussed in these questions:
Plotting two distance matrices together on same plot?
something like plt.matshow but with triangles
You can now also plot complex heatmaps directly with the Python package PyComplexHeatmap: https://github.com/DingWB/PyComplexHeatmap
https://github.com/DingWB/PyComplexHeatmap/blob/main/examples.ipynb
Related
I am programming in Python 3 and I have data structured like this:
coordinates = [(0.15,0.25),(0.35,0.25),(0.55,0.45),(0.65,0.10),(0.15,0.25)]
These are coordinates. Within each pair, the first number is the x coordinate and the second one the y coordinate. Some of the coordinates repeat themselves. I want to plot these data like this:
The coordinates that are most frequently found should appear either as higher intensity (i.e., brighter) points or as points with a different color (for example, red for very frequent coordinates and blue for very infrequent coordinates). Don't worry about the circle and semicircle. That's irrelevant. Is there a matplotlib plot that can do this? Scatter plots do not work because they do not report on the frequency with which each coordinate is found. They just create a cloud.
The answer is:
import matplotlib.pyplot as plt
# import gaussian_kde directly; the scipy.stats.kde namespace is deprecated
from scipy.stats import gaussian_kde
import numpy as np
xvalues = np.random.normal(loc=0.5,scale=0.01,size=50000)
yvalues = np.random.normal(loc=0.25,scale=0.1,size=50000)
nbins=300
k = gaussian_kde([xvalues,yvalues])
xi, yi = np.mgrid[0:1:nbins*1j,0:1:nbins*1j]
zi = k(np.vstack([xi.flatten(),yi.flatten()]))
fig, ax = plt.subplots()
ax.pcolormesh(xi, yi, zi.reshape(xi.shape), shading='auto', cmap=plt.cm.hot)
x = np.arange(0.0,1.01,0.01,dtype=np.float64)
y = np.sqrt((0.5*0.5)-((x-0.5)*(x-0.5)))
ax.axis([0,1,0,0.55])
ax.set_ylabel('S', fontsize=16)
ax.set_xlabel('G', fontsize=16)
ax.tick_params(labelsize=12, width=3)
ax.plot(x,y,'w--')
plt.show()
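If the data really are a short list of exactly repeating coordinates, as in the question, a simpler option than the density estimate above is to count the duplicates and color a scatter plot by that count. This is a different technique from the KDE answer, sketched here under the assumption that repeated points match exactly as floats; the names counts, points and freq are illustrative.

```python
from collections import Counter

import matplotlib.pyplot as plt
import numpy as np

coordinates = [(0.15, 0.25), (0.35, 0.25), (0.55, 0.45), (0.65, 0.10), (0.15, 0.25)]

# Count how often each exact (x, y) pair occurs.
counts = Counter(coordinates)
points = np.array(list(counts.keys()))
freq = np.array(list(counts.values()))

fig, ax = plt.subplots()
# "coolwarm" maps low counts to blue and high counts to red.
sc = ax.scatter(points[:, 0], points[:, 1], c=freq, cmap="coolwarm", s=80)
fig.colorbar(sc, ax=ax, label="occurrences")
```

For noisy measurements that never repeat exactly, the KDE approach (or a 2D histogram) is the better fit.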
I'm trying to plot boxplots for two different datasets on the same plot. The x axis shows the hours in a day, while the y axis goes from 0 to 1 (let's call it Efficiency). I would like to have different markers for the means of each dataset's boxes. I use 'meanprops' in seaborn, but that changes the marker style for both datasets at the same time. I've added 2000 lines of data to the Excel file that can be downloaded here. The values might not coincide with the ones in the picture but should be enough.
Basically I want the red squares to be blue on the orange boxplot, and red on the blue boxplot. Here is what I managed to do so far:
I tried changing the meanprops by using a dictionary with the labels as keys, but it seems to enter an endless loop (PyCharm says Evaluating...)
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
#make sure you have your path sorted out
group1 = pd.read_excel('group1.xls')
# plt.subplots returns (figure, axes), in that order
fig, ax = plt.subplots(figsize = (20,10))
#does not work
#ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
# meanprops={"marker":{'7':"s",'8':'s'},"markerfacecolor":{'7':"white",'8':'white'},
#"markeredgecolor":{'7':"blue",'8':'red'})
#works but produces similar markers
ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
meanprops={"marker":"s","markerfacecolor":"white", "markeredgecolor":"blue"})
plt.legend(title='Groups', loc=2, bbox_to_anchor=(1, 1),borderaxespad=0.5)
# Add transparency to colors
for patch in ax.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .4))
ax.set_xlabel("Hours",fontsize=14)
ax.set_ylabel("M1 Efficiency",fontsize=14)
ax.tick_params(labelsize=10)
plt.show()
I also tried the FacetGrid but to no avail (Stops at 'Evaluating...'):
g = sns.FacetGrid(group1, col="M1_eff", hue="labels",hue_kws=dict(marker=["^", "v"]))
g = (g.map(plt.boxplot, "hour", "M1_eff")
.add_legend())
g.show()
Any help is appreciated!
I don't think you can do this using sns.boxplot() directly. I think you'll have to draw the means "by hand".
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
N=100
df = pd.DataFrame({'hour':np.random.randint(0,3,size=(N,)),
'M1_eff': np.random.random(size=(N,)),
'labels':np.random.choice([7,8],size=(N,))})
x_col = 'hour'
y_col = 'M1_eff'
hue_col = 'labels'
width = 0.8
hue_order=[7,8]
marker_colors = ['red','blue']
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
fig, ax = plt.subplots()
ax = sns.boxplot(data=df, x=x_col, y=y_col, hue=hue_col, hue_order=hue_order, showfliers=False, showmeans=False)
means = df.groupby([hue_col,x_col])[y_col].mean()
# one marker series per hue level, shifted to sit on top of its own boxes
for (gr,temp),o,c in zip(means.groupby(level=0),offsets,marker_colors):
    ax.plot(np.arange(temp.values.size)+o, temp.values, 's', c=c)
plt.show()
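To make the offset arithmetic above easier to check, here is the same calculation pulled out into a small helper. The function name hue_offsets is made up for this illustration; the formula is the one from the answer, which mirrors the seaborn source linked there, so other seaborn versions may place the boxes differently.

```python
import numpy as np


def hue_offsets(n_levels, width=0.8):
    """Horizontal shift of each hue level's box relative to the category tick."""
    each_width = width / n_levels
    # Left edges of the boxes, then centred around zero.
    offsets = np.linspace(0, width - each_width, n_levels)
    return offsets - offsets.mean()


print(hue_offsets(2))  # two hue levels: one box left of the tick, one right
print(hue_offsets(3))  # three hue levels: the middle box sits on the tick
```

With two hue levels and the default width of 0.8 the shifts are -0.2 and +0.2, which is why the means are drawn at np.arange(...) + o.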
I have a function and I want to draw points on that function.
For example:
def f(x):
    return x ** 2 + 2 * x + 4
x_val= np.linspace(-6,6)
graph = f(x_val)
plt.plot(x_val, graph)
This will give the function evaluated at x_val.
I want to plot points on the graph at f(-2), f(2) like this.
You can use plot with the third parameter being a marker. The different markers can be found here. The third parameter of plot is a string where the first letter indicates the color ('r' for red, 'g' for green, ...) and the second letter the marker. More information can be found in the official docs.
import matplotlib.pyplot as plt
import numpy as np
def f(x):
    return x ** 2 + 2 * x + 4
x_val= np.linspace(-6,6)
graph = f(x_val)
plt.plot(x_val, graph)
plt.plot(-2, f(-2), 'ro')
plt.plot(2, f(2), 'ro')
plt.show()
In case you have several points, you have two options:
Plot them individually using a for loop
Plot them all at once using a scatter plot via vectorised operation on a NumPy array of points (shown below)
points = np.array([-2, 2])
plt.plot(x_val, graph)
plt.scatter(points, f(points), c='r')
plt.show()
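For completeness, the first option (a plain for loop) could look like the sketch below. It reuses f from above; the list name markers is only there so the created line objects can be inspected or restyled later.

```python
import matplotlib.pyplot as plt
import numpy as np


def f(x):
    return x ** 2 + 2 * x + 4


x_val = np.linspace(-6, 6)
fig, ax = plt.subplots()
ax.plot(x_val, f(x_val))

markers = []
for px in [-2, 2]:
    # Each call draws one red circle and returns a list with one Line2D.
    (line,) = ax.plot(px, f(px), "ro")
    markers.append(line)
```

The loop is handy when each point needs its own style or label; for many identical points the scatter version above is shorter.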
I am trying to fill the area between two vertical curves (RHOB and NPHI) using matplotlib.pyplot. RHOB and NPHI use different x-axis scales.
But when I plot them, fill_betweenx fills the area between RHOB and NPHI as if both were on the same scale.
#well_data is the data frame i am reading to get my data
#creating my subplot
fig, ax=plt.subplots(1,2,figsize=(8,6),sharey=True)
ax[0].get_xaxis().set_visible(False)
ax[0].invert_yaxis()
#subplot 1:
#ax01 to house the NPHI curve (NPHI curve are having values between 0-45)
ax01=ax[0].twiny()
ax01.set_xlim(-15,45)
ax01.invert_xaxis()
ax01.set_xlabel('NPHI',color='blue')
ax01.spines['top'].set_position(('outward',0))
ax01.tick_params(axis='x',colors='blue')
ax01.plot(well_data.NPHI,well_data.index,color='blue')
#ax02 to house the RHOB curve (RHOB curve having values between 1.95,2.95)
ax02=ax[0].twiny()
ax02.set_xlim(1.95,2.95)
ax02.set_xlabel('RHOB',color='red')
ax02.spines['top'].set_position(('outward',40))
ax02.tick_params(axis='x',colors='red')
ax02.plot(well_data.RHOB,well_data.index,color='red')
# ax03=ax[0].twiny()
# ax03.set_xlim(0,50)
# ax03.spines['top'].set_position(('outward',80))
# ax03.fill_betweenx(well_data.index,well_data.RHOB,well_data.NPHI,alpha=0.5)
plt.show()
ax03=ax[0].twiny()
ax03.set_xlim(0,50)
ax03.spines['top'].set_position(('outward',80))
ax03.fill_betweenx(well_data.index,well_data.RHOB,well_data.NPHI,alpha=0.5)
Above is the code that I tried, but the end result is not what I expected.
It fills the area between RHOB and NPHI as if both were on the same scale.
How can I fill the area between the blue and the red curve?
Since the data are on two different axes, but each artist needs to be on one axes alone, this is hard. What would need to be done here is to calculate all data in a single unit system. You might opt to transform both datasets to display-space first (meaning pixels), then plot those transformed data via fill_betweenx without transforming again (transform=None).
import numpy as np
import matplotlib.pyplot as plt
y = np.linspace(0, 22, 101)
x1 = np.sin(y)/2
x2 = np.cos(y/2)+20
fig, ax1 = plt.subplots()
ax2 = ax1.twiny()
ax1.tick_params(axis="x", colors="C0", labelcolor="C0")
ax2.tick_params(axis="x", colors="C1", labelcolor="C1")
ax1.set_xlim(-1,3)
ax2.set_xlim(15,22)
ax1.plot(x1,y, color="C0")
ax2.plot(x2,y, color="C1")
# draw once so the axes limits are final before converting to pixels
fig.canvas.draw()
x1p, yp = ax1.transData.transform(np.c_[x1,y]).T
x2p, _ = ax2.transData.transform(np.c_[x2,y]).T
ax1.autoscale(False)
ax1.fill_betweenx(yp, x1p, x2p, color="C9", alpha=0.4, transform=None)
plt.show()
We might equally opt to transform the data from the second axes to the first. This has the advantage that it's not defined in pixel space and hence circumvents a problem that occurs when the figure size is changed after the figure is created.
x2p, _ = (ax2.transData + ax1.transData.inverted()).transform(np.c_[x2,y]).T
ax1.autoscale(False)
ax1.fill_betweenx(y, x1, x2p, color="grey", alpha=0.4)
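Put together with the setup from the first snippet, the second variant might look like the following self-contained sketch. The names to_ax1 and x2_in_ax1 are illustrative. It assumes the x limits of both axes are set explicitly before the transform is evaluated, as they are here.

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.linspace(0, 22, 101)
x1 = np.sin(y) / 2
x2 = np.cos(y / 2) + 20

fig, ax1 = plt.subplots()
ax2 = ax1.twiny()
ax1.set_xlim(-1, 3)
ax2.set_xlim(15, 22)
ax1.plot(x1, y, color="C0")
ax2.plot(x2, y, color="C1")

# Data coordinates of ax2 -> display -> data coordinates of ax1.
to_ax1 = ax2.transData + ax1.transData.inverted()
x2_in_ax1, _ = to_ax1.transform(np.c_[x2, y]).T

# Keep the limits fixed, then fill between both curves in ax1's units.
ax1.autoscale(False)
ax1.fill_betweenx(y, x1, x2_in_ax1, color="grey", alpha=0.4)
```

Because both axes occupy the same rectangle, the transform amounts to a linear rescaling from the ax2 x range (15 to 22) onto the ax1 x range (-1 to 3). If the limits change later, the fill has to be recomputed.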
I am currently doing my internship program in a company and my educational background is actually Petroleum Geoscience, nothing related to programming. So I apologize for any mistakes that I have made or I am about to make.
I was tasked by my supervisor to produce a polar contour plot just like the example below. The example was generated with OriginPro (Trial). Once the trial period expires I can no longer use the commercial software to produce polar contour plots, so I really need help producing exactly the same kind of plot in Python, with different sets of data in the future.
The data for this plot are imported from an Excel spreadsheet. There is no problem importing the data and plotting a heat map or a contour map; the problem only arises when I attempt to produce a polar contour plot from the data. From what I have read, heat maps and contour maps are projected on a Cartesian plane, which makes them pretty straightforward, but polar plots need some calculation to convert from Cartesian coordinates to polar coordinates? Please correct me if I'm wrong.
This is what I got when I tried it in Python; it should look like the example I've given above.
And this is the script that I used for the failed plot:
import numpy as np
import matplotlib.pyplot as plt
x = df_ODD.loc[:, 'Azimuth'].values.reshape(19,74)
y = df_ODD.loc[:, 'Inclination'].values.reshape(19,74)
z = df_ODD.loc[:, 'Values'].values.reshape(19,74)
fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
# keep the contour set so the colorbar can use it
cax = ax.contour(x,y,z)
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
cb = fig.colorbar(cax)
cb.set_label("Normalized deviatoric stress")
plt.show()
Below is the form of the data imported from the Excel spreadsheet, in case you are wondering. Only the columns labelled "X", "Y" and "Z" are used; ignore the column labelled "β, Azimuth". The data run to more than 1400 rows.
I really need help in solving this problem, I hope any of you could give me a hand. Thanks
Below is the plot I get after changing the degrees to radians.
Here is the script; I added the conversion from degrees to radians:
import numpy as np
import matplotlib.pyplot as plt
import math
x = df_ODD.loc[:, 'Azimuth'].values.reshape(19,74)
y = df_ODD.loc[:, 'Inclination'].values.reshape(19,74)
z = df_ODD.loc[:, 'Values'].values.reshape(19,74)
xi = x * math.pi/180
yi = y * math.pi/180
zi = z * math.pi/180
fig, ax = plt.subplots(subplot_kw=dict(projection='polar'))
cax = ax.contour(xi,yi,zi) # choose 20 contour levels, just to show how good its interpolation is
#ax[1].plot(x,y, 'ko ')
ax.set_theta_zero_location("N")
ax.set_theta_direction(-1)
cb = fig.colorbar(cax)
cb.set_label("Normalized deviatoric stress")
#plt.savefig('attempt polar contour.png')
plt.show()
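As a rough sketch of what a working polar contour could look like: on a matplotlib polar axes the first argument is the angle in radians and the second is the radius, so it may be enough to convert only the azimuth to radians and leave the inclination and the values untouched. The grid and the values below are synthetic stand-ins, not the spreadsheet data, and the 19 by 73 shape is an assumption chosen for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the spreadsheet: azimuth 0..360 deg, inclination 0..90 deg.
azimuth_deg = np.linspace(0, 360, 73)
inclination_deg = np.linspace(0, 90, 19)
theta_deg, r = np.meshgrid(azimuth_deg, inclination_deg)  # both of shape (19, 73)
values = np.cos(np.radians(2 * theta_deg)) * r / 90  # made-up data for illustration

# Only the angle is converted; radius and values keep their original units.
theta = np.radians(theta_deg)

fig, ax = plt.subplots(subplot_kw=dict(projection="polar"))
ax.set_theta_zero_location("N")  # 0 degrees at the top
ax.set_theta_direction(-1)       # angles increase clockwise
cax = ax.contourf(theta, r, values, 30)
cb = fig.colorbar(cax)
cb.set_label("Normalized deviatoric stress")
```

With real data, the reshaped Azimuth column would take the place of theta_deg, Inclination the place of r, and Values the place of values, provided the rows are ordered so that the reshape yields a regular grid.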