Python - Add new curve from a df into existing lineplot - python-3.x

I created a plot with seaborn (sns) based on a DataFrame.
Now I would like to add a new curve from another DataFrame to the previously created plot.
This is the code for my plot:
tline = sns.lineplot(x='reads', y='time', data=df, hue='method', style='method', markers=True, dashes=False, ax=axs[0, 0])
tline.set_xlabel('Numero di reads')
tline.set_ylabel ('Time [s]')
tline.legend(loc='lower right')
tline.set_yscale('log')
tline.autoscale(enable=True, axis='x')
tline.autoscale(enable=True, axis='y')
Now I have another DataFrame with the same columns as the first one. How can I add it as a new curve with a custom entry in the legend?
This is the structure of the DataFrame:
Dataset  Method  Reads     Time   Peak-memory
14M      Set     14000000  7.33   1035204
20K      Set     200000    0.38   107464
200K     Set     20000     0.07   42936
2M       Set     28428648  16.09  2347740
28M      Set     2000000   1.41   240240

I suggest using matplotlib's object-oriented interface, like this:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
# generate sample data
time_column = np.arange(10)
data_column1 = np.random.randint(0, 10, 10)
data_column2 = np.random.randint(0, 10, 10)
# store in pandas dfs
df1 = pd.DataFrame(zip(time_column, data_column1), columns=['Time', 'Data'])
df2 = pd.DataFrame(zip(time_column, data_column2), columns=['Time', 'Data'])
f, ax = plt.subplots()
# pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.lineplot(x=df1.Time, y=df1.Data, label='foo', ax=ax)
sns.lineplot(x=df2.Time, y=df2.Data, label='bar', ax=ax)
ax.legend()
plt.show()
This draws both curves in one plot, each with its own legend entry.
The important thing is that both lineplots are drawn on the same subplot (ax in this case).

Related

Need to force overlapping for seaborn's heatmap and kdeplot

I'm trying to combine seaborn's heatmap and kdeplot in one figure, but so far the result is not very promising since I cannot find a way to make them overlap. As a result, the heatmap is just squeezed to the left side of the figure.
I think the reason is that seaborn doesn't seem to recognize the x-axis as the same one in two charts (see picture below), although the data points are exactly the same. The only difference is that for heatmap I needed to pivot them, while for the kdeplot pivoting is not needed.
Therefore, the data for the axes come from the same dataset, but in different forms, as can be seen in the code below.
The dataset sample looks something like this:
X Y Z
7,75 280 52,73
3,25 340 54,19
5,75 340 53,61
2,5 180 54,67
3 340 53,66
1,75 340 54,81
4,5 380 55,18
4 240 56,49
4,75 380 55,17
4,25 180 55,40
2 420 56,42
2,25 380 54,90
My code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
f, ax = plt.subplots(figsize=(11, 9), dpi=300)
plt.tick_params(bottom='on')
# dataset is just a pandas frame with data
X1 = dataset.iloc[:, :3].pivot("X", "Y", "Z")
X2 = dataset.iloc[:, :2]
ax = sns.heatmap(X1, cmap="Spectral")
ax.invert_yaxis()
ax2 = plt.twinx()
sns.kdeplot(X2.iloc[:, 1], X2.iloc[:, 0], ax=ax2, zorder=2)
ax.axis('tight')
plt.show()
Please help me with placing kdeplot on top of the heatmap. Ideally, I would like my final plot to look something like this:
Any tips or hints will be greatly appreciated!
The question can be a bit hard to understand, because the dataset can't be "just some data". The X and Y values need to lie on a very regular grid. No X,Y combination can be repeated, but not all values appear. The kdeplot will then show where the used values of X,Y are concentrated.
Such a dataset can be simulated by first generating dummy data for a full grid, and then taking a subset.
Now, a seaborn heatmap uses categorical X and Y axes. Such axes are very hard to align with the kdeplot. To obtain a similar heatmap with numerical axes, ax.pcolor() can be used.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
xs = np.arange(2, 10, 0.25)
ys = np.arange(150, 400, 10)
# first create a dummy dataset over a full grid
dataset = pd.DataFrame({'X': np.repeat(xs, len(ys)),
                        'Y': np.tile(ys, len(xs)),
                        'Z': np.random.uniform(50, 60, len(xs) * len(ys))})
# take a random subset of the rows
dataset = dataset.sample(200)
fig, ax = plt.subplots(figsize=(11, 9), dpi=300)
# keyword arguments are required by recent pandas versions
X1 = dataset.pivot(index="X", columns="Y", values="Z")
collection = ax.pcolor(X1.columns, X1.index, X1, shading='nearest', cmap="Spectral")
plt.colorbar(collection, ax=ax, pad=0.02)
# default, cut=3, which causes a lot of surrounding whitespace
sns.kdeplot(x=dataset["Y"], y=dataset["X"], cut=1.5, ax=ax)
fig.tight_layout()
plt.show()
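To illustrate why the two kinds of axes do not line up, here is a small sketch on a hypothetical 3x2 grid (the values are made up). It compares the x-limits that sns.heatmap and ax.pcolor leave on their axes: the heatmap places its cells at positions 0, 1, 2, ..., while pcolor uses the actual Y values.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

# Hypothetical pivoted grid: X in {2.0, 2.5, 3.0} as rows, Y in {150, 160} as columns
grid = pd.DataFrame(
    np.arange(6, dtype=float).reshape(3, 2),
    index=pd.Index([2.0, 2.5, 3.0], name="X"),
    columns=pd.Index([150, 160], name="Y"),
)

fig, (ax_heat, ax_pcolor) = plt.subplots(ncols=2)
# seaborn heatmap: cells are positioned by column/row number, not by value
sns.heatmap(grid, ax=ax_heat, cbar=False)
# pcolor: cells are centred on the numeric coordinates
ax_pcolor.pcolor(grid.columns.to_numpy(), grid.index.to_numpy(),
                 grid.to_numpy(), shading='nearest')

heat_xlim = tuple(float(v) for v in ax_heat.get_xlim())
pcolor_xlim = tuple(float(v) for v in ax_pcolor.get_xlim())
print("heatmap x-limits:", heat_xlim)
print("pcolor x-limits: ", pcolor_xlim)
```

A kdeplot drawn from the raw Y values (around 150 to 160 here) therefore falls far outside the heatmap's range, but inside the pcolor range.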

Create 3D Plot- Depth/Time/Temp From Large .csv file_Python 3.x

I am trying to create a 3D Temperature plot vs Depth vs Time with a large .csv data-set. The example below is created in matlab. I want a similar output using Python 3.x with reverse scales on the Temperature and Depth axis.
Example output with a few mods needed
I have started off with the following code:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# Get the data (csv file is hosted on the web)
data = pd.read_csv('C;\\Path\\TestData_Temp-Time-Depth_3DPlot.csv')
# Transform it to a long format
df = data.unstack().reset_index()
df.columns = ["X", "Y", "Z"]
# And transform the old column name in something numeric
df['X'] = pd.Categorical(df['X'])
df['X'] = df['X'].cat.codes
# Make the plot
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib versions
ax = fig.add_subplot(projection='3d')
ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.jet, linewidth=0.2)
plt.show()
# to Add a color bar which maps values to colors.
surf = ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.jet, linewidth=0.2)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
# Rotate it
ax.view_init(30, 45)
plt.show()
# Other palette
ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.jet, linewidth=0.01)
plt.show()
I am having trouble understanding how to assign values from the csv to the x, y and z axes.
The example data I am using is formatted like:
csv data structure
Example data download: Download Example Data
Thank you in advance.
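For illustration only, here is a minimal sketch of what the unstack().reset_index() step in the code above produces. The tiny wide table is hypothetical (column names 't0' and 't1' and all values are made up), since the real csv layout is not shown. After the transformation, X holds the code of the original column name, Y the original row index, and Z the cell value.

```python
import pandas as pd

# Hypothetical wide table: one column per time step, one row per depth sample
data = pd.DataFrame({'t0': [10.0, 9.0], 't1': [11.0, 8.5]}, index=[0, 5])

# Same steps as in the question: wide -> long format
df = data.unstack().reset_index()
df.columns = ["X", "Y", "Z"]  # X: column name, Y: row index, Z: cell value

# Turn the old column names into numeric codes
df['X'] = pd.Categorical(df['X']).codes
print(df)
```

With pd.read_csv and no index column, the row index is simply 0, 1, 2, ..., so Y would be the row number in the file.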

Seaborn barplot with two y-axis

Consider the following pandas DataFrame:
labels values_a values_b values_x values_y
0 date1 1 3 150 170
1 date2 2 6 200 180
It is easy to plot this with Seaborn (see the example code below). However, because of the large difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are barely visible (the dataset above is just a sample; in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes: one for values_a/values_b and one for values_x/values_y. I tried plt.twinx() to get a second axis, but the plot then shows only the two bars for values_x and values_y, even though both y-axes have the right scaling. :) Do you have an idea how to fix that and get four bars per label, with the values_a/values_b bars relating to the left y-axis and the values_x/values_y bars to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
/test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
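# Ensure ticks occur at the same positions, then modify labels
# (fix the tick positions first so that the relabelling below matches them one-to-one)
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))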
ax2.set_ylabel('A and B')
plt.show()
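The multiple-subplot alternative mentioned at the start of this answer could be sketched roughly as follows. It uses the same sample data; the names small, large, ax_small and ax_large are chosen here for illustration only.

```python
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records(
    [("date1", 1, 3, 150, 170), ("date2", 2, 6, 200, 180)], columns=columns)

# One long-format frame per value range
small = pd.melt(test_data, id_vars="labels", value_vars=["values_a", "values_b"],
                var_name="source", value_name="value_numbers")
large = pd.melt(test_data, id_vars="labels", value_vars=["values_x", "values_y"],
                var_name="source", value_name="value_numbers")

# Two side-by-side axes, each with its own independent y-scale
fig, (ax_small, ax_large) = plt.subplots(ncols=2, figsize=(9, 4))
sns.barplot(x="labels", y="value_numbers", hue="source", data=small, ax=ax_small)
sns.barplot(x="labels", y="value_numbers", hue="source", data=large, ax=ax_large)
ax_small.set_ylabel("A and B")
ax_large.set_ylabel("X and Y")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. No rescaling or tick relabelling is needed here, at the cost of the two groups of bars no longer sharing one set of axes.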

How can I add a normal distribution curve to multiple histograms?

With the following code I create four histograms:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.normal((1, 2, 3 , 4), size=(100, 4)))
data.hist(bins=10)
I want the histograms to look like this:
I know how to do it for one graph at a time (see the linked example).
But how can I do it for multiple histograms without specifying each single one? Ideally I could use 'pd.scatter_matrix'.
Plot each histogram separately and fit a curve to each one, as in the example you linked, or take a look at the hist API example. Essentially, what should be done is:
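import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm  # replaces matplotlib.mlab.normpdf, which was removed in Matplotlib 3.1
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
# your_data_here is a placeholder: a sequence of four 1-D arrays, one per histogram
for ax, values in zip([ax1, ax2, ax3, ax4], your_data_here):
    # density=True replaces the removed normed=1 argument
    n, bins, patches = ax.hist(values, 50, density=True, facecolor='green', alpha=0.75)
    bincenters = 0.5*(bins[1:]+bins[:-1])
    # assumption: the curve uses the sample mean and standard deviation of each dataset
    mu, sigma = np.mean(values), np.std(values)
    y = norm.pdf(bincenters, mu, sigma)
    l = ax.plot(bincenters, y, 'r--', linewidth=1)
plt.show()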
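To avoid creating each subplot by hand, one possible variant is to reuse the axes that DataFrame.hist returns and draw a curve on each of them. This is only a sketch: normal_pdf is a hypothetical helper written in plain NumPy (so that SciPy is not needed), and the curve parameters are simply the sample mean and standard deviation of each column.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution (hypothetical helper, plain NumPy)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Same kind of data as in the question, with a seeded generator for repeatability
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal((1, 2, 3, 4), size=(100, 4)))

# density=True normalises each histogram so that it is comparable with a pdf
axes = data.hist(bins=10, density=True)

# data.hist fills the grid of axes in column order
for ax, col in zip(axes.flat, data.columns):
    values = data[col]
    mu, sigma = values.mean(), values.std()
    xs = np.linspace(values.min(), values.max(), 200)
    ax.plot(xs, normal_pdf(xs, mu, sigma), 'r--', linewidth=1)
```

Call plt.show() afterwards to display the figure.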

Plotting a timeseries graph from pandas dataframe using matplotlib

I have the following data in a csv file
SourceID BSs hour Type
7208 87 11 MAIN
11060 67 11 MAIN
3737 88 11 MAIN
9683 69 11 MAIN
I have the following Python code. I want to plot a graph with the following specifications.
For each SourceID and Type I want to plot a graph of BSs over time. I would prefer each SourceID and Type combination to be a subplot in a single figure. I have tried a lot of options using groupby, but can't seem to get it to work.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
COLLECTION = 'NEW'
DATA = r'C:\Analysis\Test\{}'.format(COLLECTION)
INPUT_FILE = DATA + r'\in.csv'
OUTPUT_FILE = DATA + r'\out.csv'
with open(INPUT_FILE) as fin:
    df = pd.read_csv(INPUT_FILE,
                     usecols=["SourceID", 'hour','BSs','Type'],
                     header=0)
df.drop_duplicates(inplace=True)
df.reset_index(inplace=True)
It's still not 100% clear to me what sort of plot you actually want, but my guess is that you're looking for something like this:
import numpy as np
from matplotlib import pyplot as plt
# group by SourceID and Type, find out how many unique combinations there are
grps = df.groupby(['SourceID', 'Type'])
ngrps = len(grps)
# make a grid of axes
ncols = int(np.sqrt(ngrps))
nrows = -(-ngrps // ncols)  # ceiling division
# squeeze=False keeps ax a 2-D array even when there is only one group
fig, ax = plt.subplots(nrows, ncols, sharex=True, sharey=True, squeeze=False)
# iterate over the groups, plot into each axis
for ii, (idx, rows) in enumerate(grps):
    rows.plot(x='hour', y='BSs', style='-s', ax=ax.flat[ii], legend=False,
              scalex=False, scaley=False)
# hide any unused axes
for aa in ax.flat[ngrps:]:
    aa.set_axis_off()
# set the axis limits
ax.flat[0].set_xlim(df['hour'].min() - 1, df['hour'].max() + 1)
ax.flat[0].set_ylim(df['BSs'].min() - 5, df['BSs'].max() + 5)
fig.tight_layout()
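The grid-size arithmetic in the code above (ncols = int(np.sqrt(ngrps)), nrows = -(-ngrps // ncols)) can be checked in isolation. The sketch below wraps it in a hypothetical helper named grid_shape; the second line is ceiling division, so there are always at least as many axes as groups.

```python
import math

def grid_shape(ngrps):
    """Return (nrows, ncols) for a near-square grid with at least ngrps cells.

    Hypothetical helper; it mirrors the two lines used in the answer.
    """
    ncols = int(math.sqrt(ngrps))
    nrows = -(-ngrps // ncols)  # ceiling division: rounds ngrps / ncols up
    return nrows, ncols

for n in (1, 4, 5, 7):
    nrows, ncols = grid_shape(n)
    print(n, "groups ->", nrows, "x", ncols, "grid,", nrows * ncols - n, "unused axes")
```

The unused axes are the ones that the loop over ax.flat[ngrps:] switches off.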
