My data-frame contains the following column headers: subject, Group, MASQ_GDA, MASQ_AA, MASQ_GDD, MASQ_AD
I was successfully able to plot one of them using a bar plot with the following specifications:
bar_plot = sns.barplot(x="Group", y='MASQ_GDA', units="subject", ci = 68, hue="Group", data=demo_masq)
However, I am attempting to create several such bar plots side by side. Might anyone know how I can accomplish this, with one plot for each of the remaining 3 variables (MASQ_AA, MASQ_GDD, MASQ_AD)? Here is an example of what I am trying to achieve.
If you look in the documentation for sns.barplot(), you will see that the function accepts a parameter ax= that tells seaborn which Axes object to draw the result on:
ax : matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
Therefore, the simple way to obtain the desired output is to create the Axes beforehand, and then call sns.barplot() with the corresponding ax parameter:
fig, axs = plt.subplots(1,4) # create 4 subplots on 1 row
for ax,col in zip(axs,["MASQ_GDA", "MASQ_AA", "MASQ_GDD", "MASQ_AD"]):
    sns.barplot(x="Group", y=col, units="subject", ci = 68, hue="Group", data=demo_masq, ax=ax) # <- notice ax= argument
Another option, and maybe one that is more in line with the philosophy of seaborn, is to use a FacetGrid. This would allow you to automatically create the required number of subplots depending on the number of categories in your dataset. However, it requires reshaping your dataframe so that the contents of your MASQ_* columns are in a single column, with a new column showing which category each value corresponds to.
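A minimal sketch of that reshaping, assuming a small made-up stand-in for demo_masq (the values, and the column names "scale" and "score", are hypothetical). It uses DataFrame.melt to go from wide to long and sns.catplot, which builds the FacetGrid for you:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for demo_masq; the real data would come from your own file.
demo_masq = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "Group": ["A", "A", "B", "B"],
    "MASQ_GDA": [10, 12, 14, 16],
    "MASQ_AA": [20, 22, 24, 26],
    "MASQ_GDD": [30, 32, 34, 36],
    "MASQ_AD": [40, 42, 44, 46],
})

# Wide -> long: one row per (subject, Group, scale) with the value in "score".
long_df = demo_masq.melt(
    id_vars=["subject", "Group"],
    var_name="scale",    # hypothetical name for the new category column
    value_name="score",  # hypothetical name for the new value column
)

# One bar-plot panel per MASQ_* column; sharey=False gives each panel its own y scale.
g = sns.catplot(data=long_df, x="Group", y="score", col="scale",
                kind="bar", sharey=False)
```

The error-bar settings from the question are left out here on purpose, since the keyword for them differs between seaborn versions.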
I am very new to python and plotly.express, and I find it very confusing...
I am trying to use the principle of adding different traces to my figure, using example code shown here https://plotly.com/python/line-charts/, Line Plot Modes, #Create traces.
BUT I get my data from a .CSV file.
import plotly.express as px
import plotly as plotly
import plotly.graph_objs as go
import pandas as pd
data = pd.read_csv(r"C:\Users\x.csv")
fig = px.scatter(data, x="Time", y="OD", color="C-source", size="C:A 1 ratio")
fig = px.line(data, x="Time", y="OD", color="C-source")
fig.show()
The above lines produce scatter/line plots with the correct data, but the data is mixed together. I have data from 2 different sources marked by a column named "Strain" in my .csv file that I would like the chart to reflect.
Is the traces option a possible way to do it, or is there another way?
You can add traces using an Express plot by using .select_traces(). Something like:
fig.add_traces(
list(px.line(...).select_traces())
)
Note the need to convert to list, since .select_traces() returns a generator.
It looks like you probably want the lines with the scatter dots as well on a single plot?
You're setting fig to equal px.scatter() and then setting (changing) it to equal px.line(). When set to line, the scatter plot is overwritten.
You're already importing graph objects so you can use add_trace with go, something like this:
fig.add_trace(go.Scatter(x=data["Time"], y=data["OD"], mode='markers', marker=dict(color=data["C-source"], size=data["C:A 1 ratio"])))
Depending on how your data is set up, you may need to add each C-source separately doing something like:
x=data.query("`C-source` == 'Term'")["Time"], ... , name='Term'
Here are a few references with examples and options you can use to set up your scatter:
Scatter plot examples
Marker styles
Scatter arguments and attributes
You can use the approach stated in Plotly: How to combine scatter and line plots using Plotly Express?
fig3 = go.Figure(data=fig1.data + fig2.data)
fig1.data and fig2.data are ordinary tuples that hold all the info needed for a plot, and the + just concatenates them.
A more convenient and scalable approach:
# this will hold all figures until they are combined
all_figures = []
# data_collection: dictionary with Pandas dataframes
for df_label in data_collection:
    df = data_collection[df_label]
    fig = px.line(df, x='Date', y=['Value'])
    all_figures.append(fig)
import operator
import functools
# now you can concatenate all the data tuples
# by using the programmatic add operator
fig3 = go.Figure(data=functools.reduce(operator.add, [_.data for _ in all_figures]))
fig3.show()
thanks for taking the time to help me out. I ended up with two solutions that worked, of which using "facet_col" to divide the plot into two subplots (1 for each strain) was the most simple solution.
https://plotly.com/python/axes/
Thanks. this worked for me also where Fig_Set_B is a list of scatter plots
# create a tuple of first line plots in first 6 plots from plot set Fig_Set_B
fig_combined = go.Figure(data= tuple(Fig_Set_B[x].data[0] for x in range(6)) )
fig_combined.show()
I'm trying to write a simple program that reads in a CSV with various datasets (all of the same length) and automatically plots them all (as a Pandas Dataframe scatter plot) on the same figure. My current code does this well, but all the marker colors are the same (blue). I'd like to figure out how to make a colormap so that in the future, if I have much larger data sets (let's say, 100+ different X-Y pairings), it will automatically color each series as it plots. Eventually, I would like for this to be a quick and easy method to run from the command line. I did not have luck reading the documentation or stack exchange, hopefully this is not a duplicate!
I've tried the recommendations from these posts:
1) Setting different color for each series in scatter plot on matplotlib
2) https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.scatter.html
3) https://matplotlib.org/users/colormaps.html
However, the first one essentially grouped the data points according to their position on the x-axis and made those groups of data the same color (not what I want, each series of data is roughly a linearly increasing function). The second and third links seemed to have worked, but I don't like the colormap choices (e.g. "viridis", many colors are too similar and it's hard to distinguish data points).
This is a simplified version of my code so far (took out other lines that automatically named axes, etc. to make it easier to read). I've also removed any attempts I've made to specify a colormap, for more of a blank canvas feel:
''' Importing multiple scatter data and plotting '''
import pandas as pd
import matplotlib.pyplot as plt
### Data file path (please enter Dataframe however you like)
path = r'/Users/.../test_data.csv'
### Read in data CSV
data = pd.read_csv(path)
### List of headers
header_list = list(data)
### Set data type to float so modified data frame can be plotted
data = data.astype(float)
### X-axis limits
xmin = 1e-4;
xmax = 3e-3;
## Create subplots to be plotted together after loop
fig, ax = plt.subplots()
### Since there are multiple X-axes (every other column), this loop only plots every other x-y column pair
for i in range(len(header_list)):
    if i % 2 == 0:
        dfplot = data.plot.scatter(x = "{}".format(header_list[i]), y = "{}".format(header_list[i + 1]), ax=ax)
        dfplot.set_xlim(xmin,xmax) # Setting limits on X axis
plt.show()
The dataset can be found in the google drive link below. Thanks for your help!
https://drive.google.com/drive/folders/1DSEs8D7lIDUW4NIPBl2qW2EZiZxslGyM?usp=sharing
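One possible approach, not taken from this thread: sample as many colors as there are series from a colormap and pass one to each scatter call. The sketch below assumes made-up data with the same x/y column-pair layout, and the choice of "nipy_spectral" is only an example of a colormap with more contrast than "viridis":

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: columns alternate x, y, x, y, ...
x = np.linspace(1e-4, 3e-3, 5)
data = pd.DataFrame({
    "x1": x, "y1": 1.0 * x,
    "x2": x, "y2": 2.0 * x,
    "x3": x, "y3": 3.0 * x,
})

header_list = list(data)
# Pair every even-indexed column (x) with the column right after it (y).
pairs = list(zip(header_list[0::2], header_list[1::2]))

# Sample one RGBA color per series, evenly spaced across the colormap.
cmap = plt.get_cmap("nipy_spectral")
colors = cmap(np.linspace(0, 1, len(pairs)))

fig, ax = plt.subplots()
for (x_col, y_col), color in zip(pairs, colors):
    ax.scatter(data[x_col], data[y_col], color=tuple(color), label=y_col)
ax.set_xlim(1e-4, 3e-3)
ax.legend()
# plt.show() would display the figure in an interactive session.
```

With 100+ series, neighbouring colors will inevitably look similar whichever colormap is used, so varying the marker shape as well may help.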
I'm trying to be able to control the colour of an individual data point using a corresponding rgb tuple. I've tried looping through the data set and plotting individual data points however I get the same effect as the code I have below; all that happens is it refuses to produce a graph.
This is an example of the data type I'm working with
Any tips?
import matplotlib.pyplot as plt
y=[(0.200,0.1100,0.520)]
for i in range(4):
    y.append(y)
plt.plot([1,2,3,4], [3,4,5,2],c=y)
plt.show()
One problem is that you are appending the list to itself. Instead, try appending the tuple to the list. Moreover, you need to use a scatter plot, whose color argument accepts an RGB tuple for each point. However, in your case, I see only a single color for all the scatter points.
tup=(0.200,0.1100,0.520)
y = []
for i in range(4):
    y.append(tup)
plt.scatter([1,2,3,4], [3,4,5,2], c=y)
A shorter version of your code uses a list comprehension:
tup=(0.200,0.1100,0.520)
y = [tup for _ in range(4)]
plt.scatter([1,2,3,4], [3,4,5,2], c=y)
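If the goal is a different color for each point, the same call works with one RGB tuple per point. A small sketch, where the three extra colors are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# One RGB tuple per data point; only the first comes from the question.
colors = [
    (0.200, 0.1100, 0.520),
    (0.9, 0.1, 0.1),
    (0.1, 0.7, 0.2),
    (0.1, 0.3, 0.9),
]

fig, ax = plt.subplots()
points = ax.scatter([1, 2, 3, 4], [3, 4, 5, 2], c=colors)
# plt.show() would display the four differently colored points.
```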
I have a question that is basically the same as a question back from 2014 (see here). However, my script still throws an error.
Here is what I do: I have a pandas dataframe with a few columns. I plot a simple boxplot comparison.
g = sns.boxplot(x='categories', y='oxygen', hue='target', data=df)
g.set_xticklabels(rotation=30)
The graph looks like this:
I'd like to rotate the x-labels by 30 degrees. Hence I use g.set_xticklabels(rotation=30). However, I get the following error:
set_xticklabels() missing 1 required positional argument: 'labels'
I don't know how to pass the matplotlib labels argument to seaborn's sns.boxplot. Any ideas?
The question you link to uses a factorplot (renamed catplot in seaborn 0.9). A factorplot returns its own class, a FacetGrid, which has a method called set_xticklabels(rotation). This is different from the set_xticklabels method of the matplotlib Axes that sns.boxplot returns, which requires the labels as its first argument.
In the linked question's answers there are also other options which you may use
ax = sns.boxplot(x='categories', y='oxygen', hue='target', data=df)
ax.set_xticklabels(ax.get_xticklabels(),rotation=30)
or
ax = sns.boxplot(x='categories', y='oxygen', hue='target', data=df)
plt.setp(ax.get_xticklabels(), rotation=45)
If you do not need to reset labels: ax.tick_params(axis='x', labelrotation=90)
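To compare the options that do not require passing labels, here is a small sketch on plain matplotlib Axes. The bar data is made up, and a bar chart stands in for the boxplot, since both calls act on the Axes regardless of what was drawn:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2)
categories = ["low", "medium", "high"]  # hypothetical category names
for ax in (ax1, ax2):
    ax.bar(categories, [3, 5, 2])

# Option 1: set the rotation on the tick parameters, no labels needed.
ax1.tick_params(axis="x", labelrotation=30)

# Option 2: set the rotation property on the existing label objects.
plt.setp(ax2.get_xticklabels(), rotation=30)

rotations = [
    [label.get_rotation() for label in ax.get_xticklabels()]
    for ax in (ax1, ax2)
]
```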
So in a figure where three vertical subplots have been added with add_subplot, how can I select let's say the middle one?
Right now I do this list comprehension:
[r[0] for r in sorted([[ax, ax.get_geometry()[2]] for ax in self.figure.get_axes()], key=itemgetter(1))]
where I can simply select the index I want, with the corresponding axes. Is there a more straightforward way of doing this?
From the matplotlib documentation:
If the figure already has a subplot with key (args, kwargs) then it will simply make that subplot current and return it.
Here's an example:
import matplotlib.pyplot as plt
fig = plt.figure()
for vplot in [1,2,3]:
    ax = fig.add_subplot(3,1,vplot)
    ax.plot(range(10),range(10))
ax_again = fig.add_subplot(3,1,2)
ax_again.annotate("The middle one",xy=(7,5),xytext=(7,5))
plt.show()
The middle plot is called again so that it can be annotated.
What if I set the background with my original call, do I need to set it again when I get the subplot the second time?
Yes. The arguments and keywords for the original call are used to make a unique identifier. So for the figure to generate this unique identifier again, you need to pass the same arguments (grid definition, position) and keywords again. For example:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(2,1,1,facecolor='red') # facecolor replaces the removed axisbg keyword
ax.plot(range(10),range(10))
ax = fig.add_subplot(2,1,2)
ax.plot(range(10),range(10))
ax_again = fig.add_subplot(2,1,1,facecolor='red')
ax_again.annotate("The top one",xy=(7,5),xytext=(7,5))
plt.show()
What if I use ax_again.change_geometry() ?
You would think change_geometry, e.g. from a 312 to a 422, would change how you use add_subplot, but it doesn't. There appears to be a bug or undefined behavior when you call change_geometry. The unique key that was originally generated from the arguments and keywords of the first add_subplot call does not get updated. Therefore, if you want to get an axis back with an add_subplot call, you need to call add_subplot with the original arguments and keywords. For more info, follow this issue report:
https://github.com/matplotlib/matplotlib/issues/429
My guess for now is that if you change any property of the subplot after generating it with an add_subplot call, the unique key will not be updated. So just use the original arguments and keywords, and hopefully this will work out.
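Since the lookup-by-key behavior described above may depend on the matplotlib version (an assumption worth checking against your install, as newer releases may create a new Axes on every add_subplot call), a version-independent sketch is to keep the Axes references when they are created, or to index fig.axes, which lists the Axes in creation order:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plt.subplots returns the Axes as an array, so no lookup is needed later.
fig, axs = plt.subplots(3, 1)
for ax in axs:
    ax.plot(range(10), range(10))

# Select the middle subplot by index.
middle = axs[1]
middle.annotate("The middle one", xy=(7, 5), xytext=(7, 5))

# fig.axes holds the same objects in the order they were added.
same_middle = fig.axes[1]
```

This also avoids having to repeat keywords such as the background color just to get the subplot back.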